# Week 4: Loops, Functions, Dictionaries and List Comprehension

## Load the example data

In [None]:
import pandas as pd
# Load the example DataFrame
df = pd.read_csv('/mnt/data/example_dataframe.csv')

### What Are Loops, and How Do They Work?

A `for` loop is a control flow statement that executes a block of code once for each element of an iterable. The basic structure is shown below; note that the loop body must be indented:

```python
for element in iterable:
    # code to execute
```

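#### Example 1: Basic For Loop

In this example, we first use basic `for` loops over `range()` to print sequences of numbers: `range(5)` counts from 0 to 4, `range(1, 5)` counts from 1 to 4, and `range(0, 10, 2)` steps from 0 to 8 in increments of 2. We then iterate over the 'Name' column of our DataFrame and print a greeting for each name.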

In [27]:
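# range(stop): counts from 0 up to, but not including, stop
for i in range(5):
    print(f"These are numbers in a range, {i}")
print("")
# range(start, stop): counts from start up to, but not including, stop
for i in range(1, 5):
    print(f"These are numbers in a range, {i}")
print("")
# range(start, stop, step): counts from start in increments of step
for i in range(0, 10, 2):
    print(f"These are numbers in a range, {i}")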



These are numbers in a range, 0
These are numbers in a range, 1
These are numbers in a range, 2
These are numbers in a range, 3
These are numbers in a range, 4

These are numbers in a range, 1
These are numbers in a range, 2
These are numbers in a range, 3
These are numbers in a range, 4

These are numbers in a range, 0
These are numbers in a range, 2
These are numbers in a range, 4
These are numbers in a range, 6
These are numbers in a range, 8


In [None]:
# Example of a basic for loop to print names from the dataset
for name in df['Name']:
    print(f"Hello, {name}!")

#### Example 2: Nested For Loop

In this example, we use a nested `for` loop: the outer loop iterates through names, and the inner loop iterates through each character of the current name. This demonstrates how loops can be nested within each other for more complex operations.

In [None]:
# Example of a nested for loop to print each character of each name
for name in df['Name']:
    print(f"Name: {name}")
    for char in name:
        print(f"  Character: {char}")

## Introduction to Functions

### Basics of Functions

In [None]:
def filter_by_age(df, min_age, max_age):
    """Return the rows of df whose 'Age' lies between min_age and max_age, inclusive."""
    return df[(df['Age'] >= min_age) & (df['Age'] <= max_age)]

# Apply the function to the example dataset
filter_by_age(df, 30, 40)
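The `filter_by_age` function above follows the general pattern of any Python function: `def`, a name, parameters, and a `return` statement. As a minimal sketch of that pattern, the block below defines a hypothetical helper (`describe_age` is illustrative and not part of the original notebook) with a default parameter value, then calls it with and without overriding the default.

```python
def describe_age(age, threshold=40):
    """Return 'older' if age is above threshold, otherwise 'younger'."""
    if age > threshold:
        return 'older'
    return 'younger'

# Call with the default threshold of 40
print(describe_age(45))

# Override the default by passing a keyword argument
print(describe_age(45, threshold=50))
```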


## Introduction to Lambda Functions

### Basics of Lambda Functions

Lambda functions are small anonymous functions defined with the `lambda` keyword. Each one consists of a single expression, which makes them convenient for quick, inline operations on DataFrames.
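As a minimal sketch of the syntax, the block below writes the same squaring operation twice, once with `def` and once with `lambda`. The names `square_def` and `square_lambda` and the sample `ages` list are illustrative assumptions and do not come from the example dataset.

```python
# A regular function defined with `def`
def square_def(x):
    return x ** 2

# The equivalent anonymous function: lambda arguments: expression
square_lambda = lambda x: x ** 2

ages = [25, 32, 47]  # made-up sample values
squared = list(map(square_lambda, ages))
print(squared)
```

In practice a lambda is usually passed directly to another function, such as `apply`, `map`, or `filter`, rather than assigned to a name.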

### How to Use Them

In [None]:
# Example 1: Using lambda to square the 'Age' column
df['Age_squared'] = df['Age'].apply(lambda x: x ** 2)
print(df)

# Example 2: Using lambda with `filter` to get the ages that are greater than 30
older_than_30 = list(filter(lambda x: x > 30, df['Age']))
print(older_than_30)

# Example 3: Using lambda to create a new column with length of names
df['Name_length'] = df['Name'].apply(lambda x: len(x))
print(df)

## Introduction to Dictionaries

### Basics of Dictionaries


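A dictionary stores key-value pairs and looks up each value by its key. In the context of DataFrames, dictionaries are often used for renaming columns, mapping or replacing values, and specifying per-column aggregations.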
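Before applying dictionaries to a DataFrame, it may help to see the plain-Python operations they rest on. The sketch below uses made-up keys and values (the `occupation_map` name and the sample Series are assumptions, not part of the example dataset) and then passes the dictionary to `Series.map`.

```python
import pandas as pd

# A dictionary maps keys to values
occupation_map = {'Engineer': 'Technical', 'Doctor': 'Medical'}

# Look up a value by its key
print(occupation_map['Engineer'])

# Add a new key-value pair
occupation_map['Teacher'] = 'Education'

# .get() returns a default instead of raising KeyError when the key is missing
print(occupation_map.get('Pilot', 'Other'))

# Map each value in a Series through the dictionary
occupations = pd.Series(['Engineer', 'Doctor', 'Teacher'])
sectors = occupations.map(occupation_map)
print(sectors.tolist())
```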

### Creating and Manipulating Dictionaries

In [28]:
# Example 1: Renaming columns using a dictionary
df.rename(columns={'Name': 'Full Name', 'Age': 'Age in Years'}, inplace=True)
print(df)

# Example 2: Replacing values using a dictionary
# Assign the result back rather than calling replace(..., inplace=True) on a column selection
df['Occupation'] = df['Occupation'].replace({'Engineer': 'Mechanical Engineer', 'Doctor': 'Physician'})
print(df)

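# Sample DataFrame (given its own name so the example dataset in `df` is not overwritten)
sample_df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Example 3: Aggregating using a dictionary
# Each key is a column name and each value is the aggregation to apply to it
agg_rules = {'A': 'sum', 'B': 'mean'}
result = sample_df.agg(agg_rules)
print(result)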



### Dictionary Comprehension
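A dictionary comprehension builds a dictionary in a single expression of the form `{key: value for item in iterable}`, optionally followed by an `if` condition. The sketch below uses made-up names and ages rather than the example dataset.

```python
names = ['Alice', 'Bob', 'Charlie']  # made-up sample values
ages = [28, 35, 42]

# Pair each name with its age
age_by_name = {name: age for name, age in zip(names, ages)}
print(age_by_name)

# Add a condition to keep only some entries
over_30 = {name: age for name, age in age_by_name.items() if age > 30}
print(over_30)
```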

## List Comprehension

### Basics of List Comprehension

List comprehensions can be used to create new lists or DataFrame columns in a concise manner.
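To make the syntax concrete, the sketch below builds the same list twice, first with an explicit `for` loop and then with a comprehension of the form `[expression for item in iterable if condition]`. The `ages` list is made up for illustration.

```python
ages = [22, 35, 41, 58]  # made-up sample values

# Version 1: explicit loop with an if statement
doubled_loop = []
for age in ages:
    if age > 30:
        doubled_loop.append(age * 2)

# Version 2: the equivalent list comprehension
doubled_comp = [age * 2 for age in ages if age > 30]

# Both versions produce the same list
print(doubled_loop == doubled_comp)
```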


### When and How to Use It

In [None]:
# Example 1: Creating a new list of names in uppercase
upper_names = [name.upper() for name in df['Full Name']]
print(upper_names)

# Example 2: Creating a new DataFrame column using list comprehension
df['Is_Elderly'] = ['Yes' if age > 40 else 'No' for age in df['Age in Years']]
print(df)

# Example 3: Using list comprehension with multiple conditions
df['Life_Stage'] = ['Young' if age < 30 else 'Middle-aged' if age < 50 else 'Old' for age in df['Age in Years']]
print(df)

## Merging and Joining Data

### Basics of Joining Data

Pandas provides several ways to combine DataFrames, including `pd.merge()` (and the equivalent `DataFrame.merge()` method) for database-style joins.

In [None]:
# Example 1: Creating another DataFrame to join with the original
data2 = {'Full Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]}
df2 = pd.DataFrame(data2)

# Merging the two DataFrames on 'Full Name'
merged_df = pd.merge(df, df2, on='Full Name')
print(merged_df)

# Example 2: Left join
left_joined_df = pd.merge(df, df2, on='Full Name', how='left')
print(left_joined_df)

# Example 3: Checking the join by validating the number of rows and key uniqueness
# Note: this is a heuristic. An inner join has min(len(df1), len(df2)) rows only when
# the key is unique in both DataFrames and every key of the smaller one appears in the other.
def is_join_correct(df1, df2, key, how='inner'):
    joined_df = pd.merge(df1, df2, on=key, how=how)
    if how == 'inner':
        return len(joined_df) == min(len(df1), len(df2)) and joined_df[key].is_unique
    return True

print(is_join_correct(df, df2, 'Full Name'))

### Types of Joins
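The `how` argument of `pd.merge` selects the join type. The sketch below uses two small made-up DataFrames (`left` and `right` are illustrative, not the example dataset) to show how the row count changes across `inner`, `left`, `right`, and `outer` joins.

```python
import pandas as pd

left = pd.DataFrame({'Full Name': ['Alice', 'Bob', 'Carol'], 'Age': [28, 35, 42]})
right = pd.DataFrame({'Full Name': ['Bob', 'Carol', 'Dave'], 'Salary': [60000, 70000, 80000]})

# Run the same merge with each join type and record the number of rows
row_counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    joined = pd.merge(left, right, on='Full Name', how=how)
    row_counts[how] = len(joined)
    print(f"{how}: {len(joined)} rows")
```

An inner join keeps only keys present in both tables, a left join keeps every row of the left table, a right join keeps every row of the right table, and an outer join keeps all keys from either side, filling the missing values with `NaN`.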

### Common Mistakes When Joining Data

In [None]:

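# Example 4: Checking the row count for each join type
# These checks assume the key is unique in both DataFrames; duplicate keys add extra rows.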
def is_join_correct(df1, df2, key, join_type='inner'):
    joined_df = pd.merge(df1, df2, on=key, how=join_type)
    if join_type == 'inner':
        return len(joined_df) == min(len(df1), len(df2))
    elif join_type == 'left':
        return len(joined_df) == len(df1)
    elif join_type == 'right':
        return len(joined_df) == len(df2)
    elif join_type == 'outer':
        return len(joined_df) >= max(len(df1), len(df2))
    return False

# Check if the inner join was correct
is_join_correct(df, df2, 'Full Name', 'inner')
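One common mistake is joining on a key that is duplicated in one table, which silently multiplies rows. As a sketch on made-up data (the `people` and `salaries` DataFrames are assumptions for illustration), the block below shows the row inflation and then uses the `validate` argument of `pd.merge`, which raises `pandas.errors.MergeError` when the keys do not satisfy the stated relationship.

```python
import pandas as pd

people = pd.DataFrame({'Full Name': ['Alice', 'Bob'], 'Age': [28, 35]})
# 'Alice' appears twice, so the key is not unique in this table
salaries = pd.DataFrame({'Full Name': ['Alice', 'Alice', 'Bob'],
                         'Salary': [50000, 52000, 60000]})

# The left join returns more rows than `people` has, because 'Alice' matches twice
inflated = pd.merge(people, salaries, on='Full Name', how='left')
print(len(people), len(inflated))

# validate='one_to_one' makes pandas check key uniqueness on both sides
try:
    pd.merge(people, salaries, on='Full Name', how='left', validate='one_to_one')
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)
```

Checking `df[key].is_unique` on each table before merging is another way to catch the same problem early.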