# Age Distribution Analysis

Compares median age and age index across Toronto neighbourhoods.

## 1. Data Reference

### Source Tables

| Table | Grain | Key Columns |
|-------|-------|-------------|
| `mart_neighbourhood_demographics` | neighbourhood × year | median_age, age_index, city_avg_age |

### SQL Query

In [1]:
import os

import pandas as pd
from dotenv import load_dotenv
from sqlalchemy import create_engine

# Load .env from project root
load_dotenv("../../.env")

engine = create_engine(os.environ["DATABASE_URL"])

query = """
SELECT
    neighbourhood_name,
    median_age,
    age_index,
    city_avg_age,
    population,
    income_quintile,
    pct_renter_occupied
FROM mart_toronto.mart_neighbourhood_demographics
WHERE year = (SELECT MAX(year) FROM mart_toronto.mart_neighbourhood_demographics)
  AND median_age IS NOT NULL
ORDER BY median_age DESC
"""

df = pd.read_sql(query, engine)
print(f"Loaded {len(df)} neighbourhoods with age data")

Loaded 158 neighbourhoods with age data
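The latest-year filter used in the query can be tried without a database connection. The following is a minimal sketch against an in-memory SQLite table; the table name `demo_demographics`, the years, and the values are invented for illustration and are not taken from the mart.

```python
import sqlite3

# Hypothetical stand-in for the mart table; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_demographics "
    "(neighbourhood_name TEXT, year INTEGER, median_age REAL)"
)
conn.executemany(
    "INSERT INTO demo_demographics VALUES (?, ?, ?)",
    [
        ("A", 2016, 38.0),
        ("A", 2021, 39.5),
        ("B", 2016, 41.0),
        ("B", 2021, None),  # missing age in the latest year, should be dropped
        ("C", 2021, 45.0),
    ],
)

# Same shape as the notebook query: keep only the newest year, skip NULL ages,
# and order from oldest to youngest.
latest_rows = conn.execute(
    """
    SELECT neighbourhood_name, year, median_age
    FROM demo_demographics
    WHERE year = (SELECT MAX(year) FROM demo_demographics)
      AND median_age IS NOT NULL
    ORDER BY median_age DESC
    """
).fetchall()
conn.close()

print(latest_rows)
```

Because the subquery resolves `MAX(year)` once for the whole table, a neighbourhood that has no row in the latest year simply drops out of the result rather than falling back to an earlier year.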


### Transformation Steps

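1. Filter to the most recent census year (handled by the `WHERE` clause in the SQL query)
2. Calculate each neighbourhood's deviation from the city average age
3. Classify each neighbourhood as younger or older than the city average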

In [2]:
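# city_avg_age is assumed to hold the same city-wide value on every row
city_avg = df["city_avg_age"].iloc[0]
# Label each neighbourhood relative to the city average (ties count as "Older")
df["age_category"] = df["median_age"].apply(
    lambda x: "Younger" if x < city_avg else "Older"
)
# Signed deviation in years: positive means older than the city average
df["age_deviation"] = df["median_age"] - city_avg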

data = df.to_dict("records")
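The deviation and classification steps can also be written without a row-wise `apply`. The sketch below uses `numpy.where` on a toy frame; the rows are invented and only the column names mirror the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; only the column names match the notebook.
toy = pd.DataFrame(
    {
        "neighbourhood_name": ["A", "B", "C"],
        "median_age": [35.0, 40.5, 47.2],
        "city_avg_age": [40.5, 40.5, 40.5],
    }
)

toy_city_avg = toy["city_avg_age"].iloc[0]

# Deviation first, then classify on its sign; a deviation of exactly 0
# falls into "Older", matching the `x < city_avg` rule in the cell above.
toy["age_deviation"] = toy["median_age"] - toy_city_avg
toy["age_category"] = np.where(toy["age_deviation"] < 0, "Younger", "Older")

print(toy[["neighbourhood_name", "age_deviation", "age_category"]])
```

Computing the deviation first lets the category be derived from a single column, so the two values cannot drift apart if the threshold logic changes later.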

### Sample Output

In [3]:
print(f"City Average Age: {city_avg:.1f}")
print("\nYoungest Neighbourhoods:")
display(
    df.tail(5)[["neighbourhood_name", "median_age", "age_index", "pct_renter_occupied"]]
)
print("\nOldest Neighbourhoods:")
display(
    df.head(5)[["neighbourhood_name", "median_age", "age_index", "pct_renter_occupied"]]
)

City Average Age: 40.5

Youngest Neighbourhoods:


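|     | neighbourhood_name | median_age | age_index | pct_renter_occupied |
|-----|--------------------|------------|-----------|---------------------|
| 153 | Fort York-Liberty Village | 32.4 | 79.9 | 53.4 |
| 154 | Harbourfront-CityPlace | 32.0 | 78.9 | 60.0 |
| 155 | Yonge-Bay Corridor | 31.0 | 76.5 | 67.7 |
| 156 | Wellington Place | 30.8 | 76.0 | 61.6 |
| 157 | Bay-Cloverhill | 29.8 | 73.5 | 69.7 |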



Oldest Neighbourhoods:


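|     | neighbourhood_name | median_age | age_index | pct_renter_occupied |
|-----|--------------------|------------|-----------|---------------------|
| 0 | Markland Wood | 50.8 | 125.3 | 26.7 |
| 1 | Rosedale-Moore Park | 50.0 | 123.4 | 42.4 |
| 2 | Bayview Woods-Steeles | 50.0 | 123.4 | 44.3 |
| 3 | Hillcrest Village | 49.6 | 122.4 | 27.5 |
| 4 | Bridle Path-Sunnybrook-York Mills | 49.2 | 121.4 | 11.5 |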
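
The notebook does not define `age_index`. One reading that fits the sample output is `age_index ≈ 100 × median_age / city_avg_age`; this is an assumption, not a documented formula. The sketch below checks it by back-computing the city average implied by each displayed row.

```python
# (median_age, age_index) pairs copied from the youngest and oldest sample rows.
rows = [
    (32.4, 79.9), (32.0, 78.9), (31.0, 76.5), (30.8, 76.0), (29.8, 73.5),
    (50.8, 125.3), (50.0, 123.4), (50.0, 123.4), (49.6, 122.4), (49.2, 121.4),
]

# If age_index = 100 * median_age / city_avg_age (assumed), then every row
# should imply roughly the same city average.
implied_city_avg = [100 * median_age / age_index for median_age, age_index in rows]

print([round(value, 2) for value in implied_city_avg])
print(f"spread: {max(implied_city_avg) - min(implied_city_avg):.3f}")
```

All ten rows imply a city average within a few hundredths of a year of the printed 40.5, which is what rounding of the displayed values would produce if the assumed formula holds.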


## 2. Data Visualization

### Figure Factory

Uses `create_ranking_bar` from `portfolio_app.figures.toronto.bar_charts`.

In [4]:
import sys

sys.path.insert(0, "../..")

from portfolio_app.figures.toronto.bar_charts import create_ranking_bar

fig = create_ranking_bar(
    data=data,
    name_column="neighbourhood_name",
    value_column="median_age",
    title="Youngest & Oldest Neighbourhoods (Median Age)",
    top_n=10,
    bottom_n=10,
    color_top="#FF9800",  # Orange for older
    color_bottom="#2196F3",  # Blue for younger
    value_format=".1f",
)

fig.show()
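
The internals of `create_ranking_bar` are not shown in this notebook. Assuming `top_n` and `bottom_n` pick the highest and lowest rows by `value_column`, the selection could look like the sketch below. `split_ranking` is a hypothetical helper written for illustration and is not part of `portfolio_app`; the records reuse a few rows from the sample output.

```python
import pandas as pd


def split_ranking(records, value_column, top_n, bottom_n):
    """Return (top, bottom) frames ranked by value_column (hypothetical helper)."""
    frame = pd.DataFrame(records)
    top = frame.nlargest(top_n, value_column)        # highest values first
    bottom = frame.nsmallest(bottom_n, value_column)  # lowest values first
    return top, bottom


# A few records in the same list-of-dicts shape as `df.to_dict("records")`.
sample_records = [
    {"neighbourhood_name": "Markland Wood", "median_age": 50.8},
    {"neighbourhood_name": "Hillcrest Village", "median_age": 49.6},
    {"neighbourhood_name": "Wellington Place", "median_age": 30.8},
    {"neighbourhood_name": "Bay-Cloverhill", "median_age": 29.8},
]

oldest, youngest = split_ranking(sample_records, "median_age", top_n=2, bottom_n=2)
print(oldest["neighbourhood_name"].tolist())
print(youngest["neighbourhood_name"].tolist())
```

Under this reading, `color_top` applies to the oldest neighbourhoods and `color_bottom` to the youngest, which matches the inline comments in the call above.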

### Age vs Income Correlation

In [5]:
# Mean of neighbourhood median ages within each income quintile
print("Median Age by Income Quintile:")
df.groupby("income_quintile")["median_age"].mean().round(1)

Median Age by Income Quintile:


income_quintile
1    37.8
2    40.0
3    40.1
4    40.7
5    44.2
Name: median_age, dtype: float64
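
The grouped means show a trend but not a correlation coefficient. A minimal sketch of one option follows: Spearman's rank correlation, computed as the Pearson correlation of the ranks so that it needs only pandas. `rank_correlation` is a hypothetical helper, and the demo frame reuses the five quintile means from the output above purely to exercise it.

```python
import pandas as pd


def rank_correlation(frame: pd.DataFrame, x: str, y: str) -> float:
    """Spearman's rho as the Pearson correlation of ranks (ties get average ranks)."""
    return frame[x].rank().corr(frame[y].rank())


# Quintile means copied from the output above; five aggregated points only.
quintile_means = pd.DataFrame(
    {
        "income_quintile": [1, 2, 3, 4, 5],
        "median_age": [37.8, 40.0, 40.1, 40.7, 44.2],
    }
)

rho = rank_correlation(quintile_means, "income_quintile", "median_age")
print(f"Spearman rho on quintile means: {rho:.2f}")
```

On the aggregated means the relationship is perfectly monotonic, so the coefficient is 1. Calling `rank_correlation(df, "income_quintile", "median_age")` on the neighbourhood-level frame would give a more informative figure, since it reflects the spread within each quintile.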