# Week 4 Worksheet: Personal Income Analysis
## Nested Loops, Lists of Lists, and Nested Functions

**Learning Objectives:**
- Work with CSV data as lists of lists
- Use nested loops to filter and process data
- Create nested functions for data visualization
- Compare income across age groups and gender

**Time Estimate:** 60-75 minutes

---

## Background

You've been given a dataset (`personal_income.csv`) containing income information for different age groups and genders in Denmark from 2014-2023. Each row contains:
- Age group (e.g., "20-24 years")
- Gender ("men" or "women")
- Year (2014-2023)
- Income (in DKK)

Your task is to analyze this data to understand income trends and the gender pay gap across different age groups.

---
## Exercise 1: Reading CSV into a List of Lists

**Goal:** Read the CSV file and store it as a list of lists, where each inner list represents one row.

**Instructions:**
1. Import the `csv` module
2. Create an empty list called `data`
3. Open the file using `with open()` and `csv.reader()`
4. Skip the header row using `next(csv_reader)`
5. Loop through each row and append it to `data`
6. Print the first 5 rows to verify

**Expected output format:**
```
First 5 rows:
['15-19 years', 'men', '2014', '28000']
['15-19 years', 'men', '2015', '27900']
...

Total rows loaded: 280
```

In [None]:
# TODO: Import the csv module
import csv

# TODO: Create an empty list to store the data
data = []

# TODO: Open the CSV file
# Hint: Use with open('personal_income.csv', 'r', encoding='utf-8-sig') as file:
with open('___', 'r', encoding='utf-8-sig') as file:
    # TODO: Create a csv reader (hint: pass it the file object)
    csv_reader = csv.reader(___)
    
    # TODO: Skip the header row
    # Hint: Use next(csv_reader)
    header = ___(csv_reader)
    
    # TODO: Loop through remaining rows and append to data
    for row in csv_reader:
        data.___(___)  # Append each row to data list

# TODO: Print first 5 rows to verify
print("First 5 rows:")
for i in range(___):
    print(data[i])

# TODO: Print total number of rows
print(f"\nTotal rows loaded: {len(___)}")

**✓ Check Your Understanding:**
- What is `data[0][2]`? (Answer: the year from the first row, stored as a string such as `'2014'`)
- How would you access the income from the third row? (Answer: `data[2][3]`)
- Why do we skip the header row?
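
To see how double indexing works, here is a minimal sketch that reads a few illustrative rows from an in-memory string instead of the real file. The header names and the women's row are assumptions made up for this example; only the four-column layout matches `personal_income.csv`.

```python
import csv
import io

# Illustrative sample in the worksheet's column layout (header names are assumed)
sample_text = (
    "age,gender,year,income\n"
    "15-19 years,men,2014,28000\n"
    "15-19 years,men,2015,27900\n"
    "15-19 years,women,2014,25000\n"
)

sample = []
csv_reader = csv.reader(io.StringIO(sample_text))
header = next(csv_reader)      # the first row holds column names, not data
for row in csv_reader:
    sample.append(row)

print(header)                  # the column names
print(sample[0][2])            # first row, third column (year)
print(sample[2][3])            # third row, fourth column (income)
print(type(sample[2][3]))      # csv.reader returns every value as a string
```

Because every value arrives as a string, the income must be converted with `int()` before doing arithmetic on it. If the header were not skipped, that conversion would fail on the column name.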

---
## Exercise 2: Filtering Data with Nested Conditions

**Goal:** Filter the data to extract income for men and women in the "20-24 years" age group.

**Instructions:**
1. Create 4 empty lists: `years_men`, `income_men`, `years_women`, `income_women`
2. Set `target_age = "20-24 years"`
3. Loop through each row in `data`
4. For each row:
   - Extract age, gender, year, and income from the row
   - Check if age matches `target_age`
   - If it matches, convert income to thousands (divide by 1000)
   - Based on gender, append to the appropriate lists
5. Calculate the average pay gap percentage

**Expected output:**
```
Age Group: 20-24 years

Years: ['2014', '2015', ..., '2023']

Men's income (thousands DKK): [128.7, 131.8, ..., 170.7]

Women's income (thousands DKK): [100.2, 102.3, ..., 132.2]

Average pay gap: 22.5%
```

In [None]:
# TODO: Create empty lists for filtered data
years_men = []
income_men = []
years_women = []
income_women = []

# TODO: Set the target age group
target_age = "___"  # Fill in: "20-24 years"

# TODO: Loop through data and filter
for row in ___:
    # TODO: Extract values from row
    # Hint: age is row[0], gender is row[1], year is row[2], income is row[3]
    age = row[___]
    gender = row[___]
    year = row[___]
    income = row[___]
    
    # TODO: Check if age matches target_age
    if age == ___:
        # TODO: Convert income to integer and divide by 1000
        # Hint: income_thousands = int(income) / 1000
        income_thousands = int(___) / ___
        
        # TODO: Check gender and append to appropriate lists
        if gender == "men":
            years_men.___(year)
            income_men.___(income_thousands)
        elif gender == "___":  # Fill in: "women"
            years_women.___(___)
            income_women.___(___)

# TODO: Print results
print(f"Age Group: {target_age}")
print(f"\nYears: {years_men}")
print(f"\nMen's income (thousands DKK): {income_men}")
print(f"\nWomen's income (thousands DKK): {income_women}")

# TODO: Calculate average pay gap
# Formula: gap = (men's income - women's income) / men's income * 100
total_gap = 0
for i in range(len(income_men)):
    gap = (income_men[i] - income_women[i]) / income_men[i] * 100
    total_gap = total_gap + gap

avg_gap = total_gap / len(income_men)
print(f"\nAverage pay gap: {avg_gap:.1f}%")

**💡 Understanding Pay Gap:**
- A 22.5% pay gap means women earn 22.5% less than men
- If men earn 100k DKK, women earn about 77.5k DKK
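
The arithmetic behind these two bullets can be checked with a short sketch. The helper name `pay_gap_percent` is hypothetical (it is not part of the worksheet code) and the incomes are the round numbers from the bullet above, not values from the dataset.

```python
def pay_gap_percent(men_income, women_income):
    """Gap as a percentage of men's income, using the worksheet's formula."""
    return (men_income - women_income) / men_income * 100

# Round numbers from the explanation: men 100k DKK, women 77.5k DKK
gap = pay_gap_percent(100.0, 77.5)
print(f"Pay gap: {gap:.1f}%")  # Pay gap: 22.5%

# Going the other way: what women earn for a given gap
women = 100.0 * (1 - 22.5 / 100)
print(f"Women's income: {women:.1f}k DKK")  # Women's income: 77.5k DKK
```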

**✓ Check Your Understanding:**
- Why do we need separate lists for men and women?
- What does `income_men[i] - income_women[i]` represent?
- How would you change the code to analyze a different age group?

---
## Exercise 3: Creating a Reusable Filter Function

**Goal:** Turn your filtering code into a reusable function.

**Instructions:**
1. Define a function `filter_by_age(data, age_group)`
2. Move your filtering logic from Exercise 2 into this function
3. Return 4 lists: `years_men, income_men, years_women, income_women`
4. Test it with 3 different age groups

**Why make it a function?**
- Reuse the same logic for different age groups
- Make code more organized and readable
- Easier to debug and modify

In [None]:
# TODO: Define the function
def filter_by_age(data, age_group):
    """
    Filter income data by age group.
    
    Args:
        data: List of lists containing income data
        age_group: String like "20-24 years"
        
    Returns:
        years_men, income_men, years_women, income_women (all lists)
    """
    # TODO: Initialize empty lists
    years_men = []
    income_men = []
    years_women = []
    income_women = []
    
    # TODO: Loop through data and filter
    # Hint: Copy your code from Exercise 2, but use the age_group parameter
    for row in ___:
        age = row[0]
        gender = row[1]
        year = row[2]
        income = int(row[3]) / 1000  # Convert to thousands
        
        # TODO: Filter by age group
        if age == ___:  # Use the parameter, not a fixed value!
            if gender == "men":
                years_men.___(___)
                income_men.___(___)
            elif gender == "women":
                years_women.___(___)
                income_women.___(___)
    
    # TODO: Return the four lists
    return ___, ___, ___, ___

# TODO: Test the function with different age groups
test_ages = ["20-24 years", "35-39 years", "55-59 years"]

for age in test_ages:
    # TODO: Call the function and unpack the returned values
    years_men, income_men, years_women, income_women = filter_by_age(___, ___)
    
    # Print the most recent year (2023) income for each group
    print(f"{age}:")
    print(f"  Men: {income_men[-1]:.0f}k DKK")  # [-1] gets last element
    print(f"  Women: {income_women[-1]:.0f}k DKK")
    print()

**✓ Check Your Understanding:**
- What does `income_men[-1]` do? (Answer: gets the last element)
- Why return 4 separate lists instead of one big list?
- How would you modify the function to also return the average income?
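
As a starting point for the last question, here is one possible sketch of a helper that computes an average, plus a function that returns several values at once. The names `average` and `last_and_average` are hypothetical and the income values are made up.

```python
def average(values):
    """Return the mean of a list of numbers, or None for an empty list."""
    if not values:
        return None
    return sum(values) / len(values)

def last_and_average(values):
    """Return two results at once; Python packs them into a tuple."""
    return values[-1], average(values)

incomes = [100.0, 110.0, 120.0]  # made-up values in thousands DKK

# Unpack the returned tuple into two variables, as in Exercise 3
latest, mean_income = last_and_average(incomes)
print(latest)        # 120.0
print(mean_income)   # 110.0
```

The same pattern would let `filter_by_age` return a fifth value after the four lists, as long as every caller unpacks five values instead of four.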

---
## Exercise 4: Visualizing with Nested Functions

**Goal:** Create a function that uses a nested function to plot income trends.

**Instructions:**
1. Import `matplotlib.pyplot` as `plt`
2. Create `plot_income_comparison(data, age_groups)` function
3. Inside it, define a nested function `plot_age_group(age, ax)`, where `ax` is the subplot to draw on
4. The nested function should:
   - Call `filter_by_age()` to get the data
   - Plot men's income and women's income
5. Loop through age groups and call the nested function

**Why use a nested function?**
- The plotting logic is only used within this function
- It has access to the outer function's variables
- Keeps code organized
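
The second point is easiest to see in a small example without any plotting. This is a minimal sketch; the names `describe_groups` and `add_line` are hypothetical and only illustrate how a nested function reads variables from the function that contains it.

```python
def describe_groups(prefix, names):
    """Outer function: builds one labelled line per name."""
    lines = []

    def add_line(name):
        # The nested function can use `prefix` and `lines` from the
        # enclosing function without receiving them as arguments.
        lines.append(f"{prefix}: {name}")

    for name in names:
        add_line(name)
    return lines

result = describe_groups("Age Group", ["20-24 years", "35-39 years"])
print(result)  # ['Age Group: 20-24 years', 'Age Group: 35-39 years']

# add_line only exists while describe_groups is running,
# so it cannot be called from outside.
```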

In [None]:
# TODO: Import matplotlib
import matplotlib.pyplot as plt

def plot_income_comparison(data, age_groups):
    """
    Plot income comparison for men and women across age groups.
    
    Uses a nested function to handle the plotting for each age group.
    """
    # TODO: Create a figure with subplots
    # Hint: plt.subplots(1, 3, figsize=(15, 5))
    fig, axes = plt.subplots(___, ___, figsize=(15, 5))
    
    # TODO: Define nested function to plot one age group
    def plot_age_group(age, ax):
        """Nested function to plot data for one age group."""
        # TODO: Get filtered data using filter_by_age
        years_men, income_men, years_women, income_women = filter_by_age(___, ___)
        
        # TODO: Plot men's income (blue line with circles)
        # Hint: ax.plot(years, income, 'o-', label='Men', color='steelblue')
        ax.plot(years_men, ___, '___', label='___', color='steelblue', linewidth=2)
        
        # TODO: Plot women's income (coral line with circles)
        ax.plot(___, income_women, '___', label='___', color='coral', linewidth=2)
        
        # TODO: Set title and labels
        ax.set_title(f'Age Group: {age}', fontsize=12, fontweight='bold')
        ax.set_xlabel('___', fontsize=10)  # Fill in: 'Year'
        ax.set_ylabel('___', fontsize=10)  # Fill in: 'Income (thousands DKK)'
        
        # Add legend and grid
        ax.legend()
        ax.grid(True, alpha=0.3)
        ax.tick_params(axis='x', rotation=45)
    
    # TODO: Loop through age groups and call nested function
    for i, age in enumerate(age_groups):
        plot_age_group(___, axes[i])
    
    plt.tight_layout()
    plt.show()

# TODO: Test the function
age_groups_to_plot = ["20-24 years", "35-39 years", "55-59 years"]
plot_income_comparison(___, ___)

**✓ Check Your Understanding:**
- Why is `plot_age_group` nested inside `plot_income_comparison`?
- What does `enumerate()` do in the loop?
- How does the nested function access the `data` variable?
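
For the `enumerate()` question, a short sketch shows what the loop receives on each pass. The list is the same one used in the test call above.

```python
age_groups = ["20-24 years", "35-39 years", "55-59 years"]

# enumerate() pairs each item with its position, starting at 0
pairs = list(enumerate(age_groups))
print(pairs)

for i, age in enumerate(age_groups):
    # In Exercise 4, i selects the subplot and age selects the data
    print(f"subplot index {i} shows {age}")
```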

---
## Exercise 5: Calculate and Plot Pay Gap Trends

**Goal:** Create a function that calculates the pay gap for each year and plots the trends.

**Instructions:**
1. Create `calculate_pay_gap(data, age_groups)` function
2. For each age group:
   - Get filtered data
   - Calculate gap percentage for each year
   - Store in a dictionary
3. Plot the pay gap trends
4. Print summary statistics

In [None]:
def calculate_pay_gap(data, age_groups):
    """
    Calculate gender pay gap for multiple age groups.
    
    Returns:
        years: List of years
        gaps: Dictionary with age groups as keys and gap percentages as values
    """
    # TODO: Initialize empty dictionary for gaps
    gaps = {}
    years = None
    
    # TODO: Calculate gap for each age group
    for age in ___:
        # TODO: Get filtered data
        years_men, income_men, years_women, income_women = filter_by_age(___, ___)
        
        # Save years (same for all age groups)
        if years is None:
            years = years_men
        
        # TODO: Calculate gap percentage for each year
        gap_percentages = []
        for i in range(len(income_men)):
            # Formula: (men's income - women's income) / men's income * 100
            gap = (___ - ___) / ___ * 100
            gap_percentages.append(gap)
        
        # TODO: Store in dictionary
        gaps[___] = ___
    
    return years, gaps

# TODO: Calculate gaps
age_groups = ["20-24 years", "35-39 years", "55-59 years"]
years, gaps = calculate_pay_gap(___, ___)

# TODO: Plot the pay gaps
plt.figure(figsize=(12, 6))

# Different colors and markers for each age group
colors = ['steelblue', 'coral', 'green']
markers = ['o', 's', '^']  # circle, square, triangle

# TODO: Plot each age group
for i, age in enumerate(age_groups):
    plt.plot(years, gaps[age], f'{markers[i]}-', 
            label=age, color=colors[i], linewidth=2.5, markersize=7)

# TODO: Add labels and title
plt.xlabel('___', fontsize=12)
plt.ylabel('___', fontsize=12)
plt.title('Gender Pay Gap Over Time by Age Group\n(Higher values = larger gap)', 
         fontsize=14, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# TODO: Print summary statistics
print("\n" + "="*60)
print("Gender Pay Gap Summary")
print("="*60)

for age in age_groups:
    gap_2014 = gaps[age][0]   # First year
    gap_2023 = gaps[age][-1]  # Last year
    change = gap_2023 - gap_2014
    
    print(f"\n{age}:")
    print(f"  2014: {gap_2014:.1f}%")
    print(f"  2023: {gap_2023:.1f}%")
    print(f"  Change: {change:+.1f} percentage points")
    
    # TODO: Determine if gap improved or worsened
    if change < 0:
        print("  Status: ✓ Gap decreased (improved)")
    elif change > 0:
        print("  Status: ✗ Gap increased (worsened)")
    else:
        print("  Status: Gap unchanged")

print("\n" + "="*60)

**📊 Interpret the Results:**
- Which age group has the largest pay gap?
- Has the gap improved or worsened over time?
- Why might older age groups have larger gaps?

---
## Bonus Challenge: Income Growth Analysis (Optional)

**Goal:** Calculate how much income has grown from 2014 to 2023 for each age group and gender.

**Instructions:**
1. Create `analyze_income_growth(data, age_groups)` function
2. For each age group:
   - Calculate percentage growth for men
   - Calculate percentage growth for women
   - Compare who had faster growth
3. Print formatted results

**Formula:** `growth% = (final - initial) / initial * 100`
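
A quick worked example of the formula, using made-up incomes rather than values from the dataset. The helper name `growth_percent` is hypothetical.

```python
def growth_percent(initial, final):
    """Percentage change from initial to final."""
    return (final - initial) / initial * 100

incomes = [200.0, 210.0, 250.0]  # made-up values, oldest year first

# The first element is the earliest year, the last element the latest
growth = growth_percent(incomes[0], incomes[-1])
print(f"Growth: {growth:.1f}%")  # Growth: 25.0%
```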

In [None]:
def analyze_income_growth(data, age_groups):
    """
    Analyze income growth from 2014 to 2023 for different age groups.
    """
    print("\n" + "="*70)
    print("Income Growth Analysis (2014-2023)")
    print("="*70)
    
    for age in age_groups:
        # TODO: Get filtered data (use short variable names)
        y_m, i_m, y_w, i_w = filter_by_age(___, ___)
        
        # TODO: Calculate growth percentage
        # Formula: (final - initial) / initial * 100
        men_growth = (i_m[___] - i_m[___]) / i_m[___] * 100
        women_growth = (i_w[___] - i_w[___]) / i_w[___] * 100
        
        # TODO: Print formatted results
        print(f"\n{age}:")
        print(f"  Men:")
        print(f"    2014: {i_m[0]:.0f}k DKK")
        print(f"    2023: {i_m[-1]:.0f}k DKK")
        print(f"    Growth: {men_growth:.1f}%")
        
        print(f"  Women:")
        print(f"    2014: {i_w[0]:.0f}k DKK")
        print(f"    2023: {i_w[-1]:.0f}k DKK")
        print(f"    Growth: {women_growth:.1f}%")
        
        # TODO: Compare growth rates
        if women_growth > men_growth:
            print(f"  → Women's income grew faster (+{women_growth - men_growth:.1f}pp)")
        elif men_growth > women_growth:
            print(f"  → Men's income grew faster (+{men_growth - women_growth:.1f}pp)")
        else:
            print("  → Men's and women's income grew at the same rate")
    
    print("\n" + "="*70)

# TODO: Run the analysis
analyze_income_growth(___, ___)

---
## Reflection Questions

Answer these questions to solidify your understanding:

**1. Nested Loops:**
- Where did we use nested loops or nested conditions in this worksheet?
- Why were they necessary?
- Could you use a list comprehension instead? When would that be better?
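
To help with the last question, this sketch filters the same rows twice: once with a loop and nested conditions, and once with a list comprehension. The rows are illustrative and only follow the worksheet's column layout.

```python
rows = [
    ["20-24 years", "men", "2014", "128700"],
    ["20-24 years", "women", "2014", "100200"],
    ["35-39 years", "men", "2014", "400000"],  # made-up value
]

# Version 1: loop with nested conditions, as in Exercise 2
loop_result = []
for row in rows:
    if row[0] == "20-24 years":
        if row[1] == "men":
            loop_result.append(int(row[3]) / 1000)

# Version 2: a list comprehension with both conditions combined
comp_result = [
    int(row[3]) / 1000
    for row in rows
    if row[0] == "20-24 years" and row[1] == "men"
]

print(loop_result)                 # [128.7]
print(loop_result == comp_result)  # True
```

The comprehension is compact when one list is being built. The loop is easier to extend when a single pass has to fill several lists, as in Exercise 2.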

**2. Nested Functions:**
- What was the purpose of the nested function in Exercise 4?
- What variables from the outer function did it access?
- When should you use a nested function vs. a separate function?

**3. Data Insights:**
- Which age group has the highest income?
- How does the pay gap change with age?
- Has the pay gap improved or worsened from 2014 to 2023?
- What surprised you most about the data?

**4. List of Lists vs Dictionary:**
- What are the advantages of using a list of lists for this data?
- What are the disadvantages?
- When would a dictionary be better?
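
One way to explore these questions is to regroup a few rows into a dictionary and compare how lookups feel. This is a sketch with illustrative rows; the key design `(age, gender)` is just one possible choice.

```python
rows = [
    ["20-24 years", "men", "2014", "128700"],
    ["20-24 years", "women", "2014", "100200"],
    ["20-24 years", "men", "2015", "131800"],
]

# Group rows under an (age, gender) key
by_group = {}
for age, gender, year, income in rows:
    key = (age, gender)
    if key not in by_group:
        by_group[key] = []
    by_group[key].append((year, int(income)))

# Direct lookup, no loop over every row needed
print(by_group[("20-24 years", "men")])
# [('2014', 128700), ('2015', 131800)]
```

With a list of lists, finding one group means scanning every row. With the dictionary, the group is found by its key, at the cost of an extra step to build the dictionary first.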

---
## Key Takeaways

**Lists of Lists:** Good for CSV data, easy to loop through

**Nested Conditions:** Check multiple criteria (age AND gender)

**Functions:** Make code reusable and organized

**Nested Functions:** Keep helper functions close to where they're used

**Data Visualization:** Plots reveal patterns you can't see in numbers

**Real-World Data:** Programming skills help understand social issues

---

## Next Steps

1. **Try different age groups:** Modify the code to analyze other age ranges
2. **Add more visualizations:** Create bar charts, scatter plots, etc.
3. **Calculate more statistics:** Median, standard deviation, correlation
4. **Compare with other countries:** Find similar datasets for comparison
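
For step 3, Python's built-in `statistics` module is one possible starting point. The sketch below uses made-up income values, not values from the dataset.

```python
import statistics

incomes = [100.0, 110.0, 120.0, 150.0]  # made-up values in thousands DKK

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)   # middle of the sorted values
spread = statistics.stdev(incomes)           # sample standard deviation

print(f"Mean: {mean_income:.1f}")      # Mean: 120.0
print(f"Median: {median_income:.1f}")  # Median: 115.0
print(f"Std dev: {spread:.1f}")
```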

**Great job! You've completed a real data analysis project!**