# Project 3 - Investigating Fine Particulate Matter (PM2.5) in New York City



## Introduction

Fine particulate matter (PM2.5) is a key air pollutant with serious health impacts, 
including cardiovascular and respiratory disease. Understanding how PM2.5 levels vary 
across time and space within New York City can inform environmental justice, public 
health, and regulatory decisions.

This project uses data from the NYC Environmental & Health Data Portal (Fine particles – PM2.5), 
which includes annual, summer, and winter mean PM2.5 concentrations by:

- Citywide
- Borough
- Community District
- UHF 42 (neighborhood groupings)

The dataset spans 2009–2024.


## Prompt

- **Dataset(s) to be used:** https://a816-dohbesp.nyc.gov/IndicatorPublic/data-explorer/air-quality/?id=2023#display=summary
  - **Source:** NYC Environmental & Health Data Portal – “Fine particles (PM2.5)” CSV export.  
  - **Unit:** micrograms per cubic meter (µg/m³).  
  - **Geographies included:**
    - Citywide
    - Borough
    - Community District
    - UHF 42 (not used directly in this project)
  - The dataset has 1,460 rows.
- **Analysis questions:**
  1. How have annual PM2.5 levels changed over time in New York City overall? 
  2. How do PM2.5 trends differ between boroughs and community districts? 
  3. Which boroughs and community districts experienced the largest absolute improvements in PM2.5 levels between the earliest and most recent years in the dataset?

- **Columns that will (likely) be used:**
  - `TimePeriod`: Calendar year (e.g., 2009–2024).  
  - `GeoTypeDesc`: Geographic aggregation level (Citywide, Borough, Community District, UHF 42).  
  - `GeoID`: ID for each geographic unit.  
  - `Borough`, `Geography`, `Area`: Names/labels for geographic units.  
  - `Annual mean mcg/m3`, `Summer mean mcg/m3`, `Winter mean mcg/m3`: PM2.5 concentrations.

- **Hypothesis**: 

  1. PM2.5 levels have decreased citywide over time, reflecting improved regulations, cleaner fuel, and other pollution-control measures.  
  2. All boroughs and community districts have seen declines, but the magnitude of improvement differs, with areas that started with higher PM2.5 levels experiencing larger absolute reductions.  
   
  However, I expect that disparities remain: some communities (especially dense or heavily trafficked areas) still have higher PM2.5 levels than others, even after overall improvements.

## Planned Analysis

1. Load and inspect the dataset.  
2. Clean and rename columns for easier use.  
3. Create separate DataFrames for:
   - Citywide
   - Borough
   - Community District
4. For **citywide**, use `melt` (reshaping) to compare annual, summer, and winter trends over time.  
5. For **boroughs**:
   - Compute annual trends over time.
   - Calculate absolute and percent change in annual PM2.5 from earliest to latest year for each borough.
6. For **community districts**:
   - Compute initial (earliest year) and final (latest year) annual PM2.5.
   - Calculate absolute and percent change in annual PM2.5.
   - Identify the highest PM2.5 districts in the earliest year and those with the largest improvements.
7. Perform at least one merge:
   - Merge borough-level data with citywide averages by year to see which boroughs are above/below citywide averages.


## Planned Visualizations

1. Line chart of citywide annual, summer, and winter PM2.5 over time.  
2. Line chart of annual PM2.5 trends by borough.  
3. Bar chart of borough-level absolute and percent changes from earliest to latest year.  
4. Bar charts for:
   - Community districts with the **highest PM2.5** in the earliest year.
   - Community districts with the **largest reductions** in PM2.5.

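```python
import pandas as pd
import plotly.express as px
import plotly.io as pio
from IPython.display import display  # predefined in Jupyter; imported so display() also works outside it

# Combine two renderers so figures show up in classic Notebook and in JupyterLab/exports
pio.renderers.default = "notebook_connected+plotly_mimetype"
```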

## Step 1 – Load and Inspect the Dataset

- Load data and inspect the first few rows and basic structure of the dataset to understand its columns, data types, and size.

```python
# Load the PM2.5 dataset
pm = pd.read_csv("NYC EH Data Portal - Fine particles (PM 2.5).csv")
pm.head()
```


| | TimePeriod | GeoTypeDesc | GeoID | GeoRank | BoroID | Borough | Geography | Area | Annual mean mcg/m3 | Summer mean mcg/m3 | Winter mean mcg/m3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024 | Citywide | 1 | 0 | - | - | New York City | New York City | 6.3 | 7.9 | 6.0 |
| 1 | 2024 | Borough | 3 | 1 | 3 | Manhattan | Manhattan | Manhattan | 7.7 | 9.6 | 7.2 |
| 2 | 2024 | Borough | 2 | 1 | 2 | Brooklyn | Brooklyn | Brooklyn | 6.4 | 8.1 | 6.3 |
| 3 | 2024 | Borough | 1 | 1 | 1 | Bronx | Bronx | Bronx | 6.4 | 8.2 | 6.1 |
| 4 | 2024 | Borough | 5 | 1 | 5 | Staten Island | Staten Island | Staten Island | 5.9 | 7.3 | 5.9 |


```python
pm.info()
pm.describe(include="all")  # only the last expression in a cell is rendered, so this table is not shown below
pm.columns
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   TimePeriod          1460 non-null   int64
 1   GeoTypeDesc         1460 non-null   object
 2   GeoID               1460 non-null   int64
 3   GeoRank             1460 non-null   int64
 4   BoroID              1460 non-null   object
 5   Borough             1460 non-null   object
 6   Geography           1460 non-null   object
 7   Area                1460 non-null   object
 8   Annual mean mcg/m3  1460 non-null   float64
 9   Summer mean mcg/m3  1460 non-null   float64
 10  Winter mean mcg/m3  1460 non-null   float64
dtypes: float64(3), int64(3), object(5)
memory usage: 125.6+ KB

Index(['TimePeriod', 'GeoTypeDesc', 'GeoID', 'GeoRank', 'BoroID', 'Borough',
       'Geography', 'Area', 'Annual mean mcg/m3', 'Summer mean mcg/m3',
       'Winter mean mcg/m3'],
      dtype='object')
```

## Step 2 – Clean and Rename Columns

- Rename long column names.  
- Ensure PM2.5 value columns are stored as numeric types.  
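The `info()` output above shows the three PM2.5 columns already load as `float64`, so the coercion step below is a defensive check. The portal does use non-numeric placeholders, such as the `-` in `BoroID` for the citywide row. The toy frame here is hypothetical and shows what `errors="coerce"` does when such a placeholder ends up in a value column.

```python
import pandas as pd

# Hypothetical toy frame: PM2.5 values read in as text, including a "-" placeholder
toy = pd.DataFrame({"annual_pm25": ["6.3", "7.7", "-"]})

# errors="coerce" converts unparseable entries to NaN instead of raising ValueError
toy["annual_pm25"] = pd.to_numeric(toy["annual_pm25"], errors="coerce")

print(toy["annual_pm25"].dtype)         # float64
print(toy["annual_pm25"].isna().sum())  # 1
```

Counting the NaNs after coercion shows how many entries were unparseable, instead of letting them vanish silently.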

```python
# Map the portal's long column names to short snake_case names
rename_map = {
    "TimePeriod": "year",
    "GeoTypeDesc": "geo_type",
    "GeoID": "geo_id",
    "GeoRank": "geo_rank",
    "BoroID": "boro_id",
    "Borough": "borough",
    "Geography": "geography",
    "Area": "area",
    "Annual mean mcg/m3": "annual_pm25",
    "Summer mean mcg/m3": "summer_pm25",
    "Winter mean mcg/m3": "winter_pm25",
}

# Apply the mapping (defining the dict alone does not change the DataFrame)
pm = pm.rename(columns=rename_map)
pm.head()
```

| | year | geo_type | geo_id | geo_rank | boro_id | borough | geography | area | annual_pm25 | summer_pm25 | winter_pm25 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024 | Citywide | 1 | 0 | - | - | New York City | New York City | 6.3 | 7.9 | 6.0 |
| 1 | 2024 | Borough | 3 | 1 | 3 | Manhattan | Manhattan | Manhattan | 7.7 | 9.6 | 7.2 |
| 2 | 2024 | Borough | 2 | 1 | 2 | Brooklyn | Brooklyn | Brooklyn | 6.4 | 8.1 | 6.3 |
| 3 | 2024 | Borough | 1 | 1 | 1 | Bronx | Bronx | Bronx | 6.4 | 8.2 | 6.1 |
| 4 | 2024 | Borough | 5 | 1 | 5 | Staten Island | Staten Island | Staten Island | 5.9 | 7.3 | 5.9 |


```python
# Ensure PM2.5 columns are numeric
value_cols = [c for c in ["annual_pm25", "summer_pm25", "winter_pm25"] if c in pm.columns]
pm[value_cols] = pm[value_cols].apply(pd.to_numeric, errors="coerce")

print(pm[value_cols].describe())

# Check geographic types and year range if those columns exist
if "geo_type" in pm.columns:
    print("\nGeo types:", pm["geo_type"].value_counts())

if "year" in pm.columns:
    print("Year range:", pm["year"].min(), "to", pm["year"].max())
```

```text
       annual_pm25  summer_pm25  winter_pm25
count  1460.000000  1460.000000  1460.000000
mean      7.925548     9.109589     8.805753
std       1.731963     1.659584     2.345762
min       5.000000     5.500000     5.000000
25%       6.500000     7.975000     7.000000
50%       7.500000     8.800000     8.200000
75%       9.100000    10.200000    10.200000
max      16.100000    16.100000    18.800000

Geo types: geo_type
Community District    944
UHF 42                420
Borough                80
Citywide               16
Name: count, dtype: int64
Year range: 2009 to 2024
```


## Step 3 – Create Separate DataFrames by Geography

I split the dataset into three main subsets:

- **Citywide**: A single time series for the entire city.  
- **Borough-level**: One row per borough per year.  
- **Community District-level**: One row per community district per year.
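The notebook filters `pm` three times, once per geography level. A `groupby` on `geo_type` is one compact alternative that builds every subset in a single pass. This is a sketch on a hypothetical six-row frame. Its values echo figures printed elsewhere in this notebook but are only illustrative.

```python
import pandas as pd

# Hypothetical miniature of the cleaned table (illustrative values only)
pm_demo = pd.DataFrame({
    "year": [2009, 2009, 2009, 2024, 2024, 2024],
    "geo_type": ["Citywide", "Borough", "Community District"] * 2,
    "annual_pm25": [10.4, 12.6, 16.1, 6.3, 7.7, 10.5],
})

# One DataFrame per geography level, each sorted by year
subsets = {
    level: frame.sort_values("year").reset_index(drop=True)
    for level, frame in pm_demo.groupby("geo_type")
}

print(sorted(subsets))  # ['Borough', 'Citywide', 'Community District']
```

With the real data, `subsets["Citywide"]` would play the role of `citywide`, and so on. Explicit filters, as used below, are easier to read when only three levels are needed.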

```python
# Citywide data
citywide = pm[pm["geo_type"] == "Citywide"].copy()
citywide = citywide.sort_values("year")

# Borough-level data
borough_df = pm[pm["geo_type"] == "Borough"].copy()
borough_df = borough_df.sort_values("year")

# Community District-level data
community_df = pm[pm["geo_type"] == "Community District"].copy()
community_df = community_df.sort_values(["geo_id", "year"])

citywide.head(), borough_df.head(), community_df.head()
```

```text
(      year  geo_type  geo_id  geo_rank boro_id borough      geography  \
 1395  2009  Citywide       1         0       -       -  New York City
 1330  2010  Citywide       1         0       -       -  New York City
 1265  2011  Citywide       1         0       -       -  New York City
 1200  2012  Citywide       1         0       -       -  New York City
 1135  2013  Citywide       1         0       -       -  New York City

                area  annual_pm25  summer_pm25  winter_pm25
 1395  New York City         10.4         10.7         12.9
 1330  New York City          9.5         11.8          9.9
 1265  New York City         10.1         11.5         12.6
 1200  New York City          8.9         10.3          9.2
 1135  New York City          8.6         10.2         10.3  ,
       year geo_type  geo_id  geo_rank boro_id        borough      geography  \
 1400  2009  Borough       5         1       5  Staten Island  Staten Island
 1396  2009  Borough
```

## Step 4 – Citywide PM2.5 Trends Over Time

Here I focus on how PM2.5 has changed for New York City overall.

Steps:

1. Reshape the citywide data to a long format using `melt`, with one row per year & measure (annual, summer, winter).  
2. Plot a line chart with separate lines for annual, summer, and winter mean PM2.5.

This shows whether PM2.5 is trending up or down overall and how seasons differ.
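Here is a minimal illustration of the wide-to-long reshape, using the 2009 and 2010 citywide values that appear in the outputs of this notebook. Each wide row with three measure columns becomes three long rows. A single `color="measure"` argument in Plotly can then draw one line per season.

```python
import pandas as pd

# Wide format: one row per year, one column per seasonal measure
wide = pd.DataFrame({
    "year": [2009, 2010],
    "annual_pm25": [10.4, 9.5],
    "summer_pm25": [10.7, 11.8],
    "winter_pm25": [12.9, 9.9],
})

# Long format: one row per (year, measure) pair
long_df = wide.melt(
    id_vars=["year"],
    value_vars=["annual_pm25", "summer_pm25", "winter_pm25"],
    var_name="measure",
    value_name="pm25",
)

print(long_df.shape)  # (6, 3): 2 years x 3 measures
```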


```python
# Reshape citywide data to long format: one row per (year, measure)
citywide_long = citywide.melt(
    id_vars=["year"],
    value_vars=value_cols,
    var_name="measure",
    value_name="pm25",
)
display(citywide_long.head())
```

| | year | measure | pm25 |
|---|---|---|---|
| 0 | 2009 | annual_pm25 | 10.4 |
| 1 | 2010 | annual_pm25 | 9.5 |
| 2 | 2011 | annual_pm25 | 10.1 |
| 3 | 2012 | annual_pm25 | 8.9 |
| 4 | 2013 | annual_pm25 | 8.6 |


```python
# Line plot of citywide annual, summer, and winter PM2.5 trends
fig = px.line(
    citywide_long,
    x="year",
    y="pm25",
    color="measure",
    markers=True,
    title="Citywide PM2.5 Trends (Annual, Summer, Winter)",
    labels={"year": "Year", "pm25": "PM2.5 (µg/m³)", "measure": "Measure"},
)
fig.show()
```

### Interpretation – Citywide Trends


- Annual PM2.5 decreases from 10.4 µg/m³ in 2009 to 6.3 µg/m³ in 2024.
- Since 2015, summer levels have mostly been higher than winter levels.
- This chart is consistent with my hypothesis that PM2.5 has fallen over time.
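The summer-versus-winter claim can be checked directly rather than read off the chart. This sketch computes a seasonal gap per year. The six years used here are only those whose citywide values are printed in this notebook, so running the same two lines on the full `citywide` DataFrame is what would confirm the "since 2015" pattern.

```python
import pandas as pd

# Citywide seasonal means for the years printed in this notebook
seasons = pd.DataFrame({
    "year": [2009, 2010, 2011, 2012, 2013, 2024],
    "summer_pm25": [10.7, 11.8, 11.5, 10.3, 10.2, 7.9],
    "winter_pm25": [12.9, 9.9, 12.6, 9.2, 10.3, 6.0],
})

# Positive gap = summer mean above winter mean
seasons["summer_minus_winter"] = seasons["summer_pm25"] - seasons["winter_pm25"]
summer_higher = seasons.loc[seasons["summer_minus_winter"] > 0, "year"].tolist()
print(summer_higher)  # [2010, 2012, 2024]
```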


## Step 5 – Borough-Level PM2.5 Trends

Next, I compare PM2.5 trends across the five boroughs. This helps identify whether 
some boroughs have systematically higher or lower PM2.5 levels, and whether the 
rate of improvement differs across space.
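Comparing only the first and last years is sensitive to single-year noise. For example, the citywide annual mean rose from 9.5 in 2010 to 10.1 µg/m³ in 2011. One optional way to compare *rates* of improvement is a least-squares slope per borough over all years. This sketch uses hypothetical series, not portal values, and swaps in `numpy.polyfit` as the fitting method; it is not part of the analysis below.

```python
import numpy as np
import pandas as pd

# Hypothetical borough series (not portal values), just to show the mechanics
demo = pd.DataFrame({
    "borough": ["A"] * 4 + ["B"] * 4,
    "year": [2009, 2010, 2011, 2012] * 2,
    "annual_pm25": [12.0, 11.5, 11.0, 10.5,   # falls 0.5 per year
                    10.0, 9.8, 9.6, 9.4],     # falls 0.2 per year
})

def ols_slope(group: pd.DataFrame) -> float:
    """Least-squares slope of annual PM2.5 against year (µg/m³ per year)."""
    slope, _intercept = np.polyfit(group["year"], group["annual_pm25"], deg=1)
    return slope

# One slope per borough; more negative means faster improvement
slopes = demo.groupby("borough")[["year", "annual_pm25"]].apply(ols_slope)
print(slopes)
```

Applied to `borough_df`, the same call would give one trend per borough in µg/m³ per year.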


```python
# Quick check of borough names
if not borough_df.empty and "borough" in borough_df.columns:
    print(borough_df["borough"].unique())
else:
    print("No borough-level data available.")
```

```text
['Staten Island' 'Manhattan' 'Queens' 'Brooklyn' 'Bronx']
```


```python
# Line plot of annual PM2.5 by borough
fig = px.line(
    borough_df,
    x="year",
    y="annual_pm25",
    color="borough",
    markers=True,
    title="Annual PM2.5 by Borough",
    labels={"year": "Year", "annual_pm25": "Annual PM2.5 (µg/m³)", "borough": "Borough"},
)
fig.show()
```

### Interpretation – Borough-Level Trends

- Annual PM2.5 declines in every borough over the period.
- Manhattan is systematically the most polluted borough, and Staten Island is generally the least polluted.

## Step 6 – Borough-Level Changes Over Time

To quantify how much PM2.5 has improved in each borough, I:

1. Identify the earliest and latest years in the borough-level data.  
2. Extract PM2.5 for each borough in these years.  
3. Merge the two tables (earliest & latest) on borough.  
4. Compute:
   - **Absolute change** = final − initial  
   - **Percent change** = (final − initial) / initial × 100  
5. Visualize the absolute change in a bar chart.
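The formulas above can be checked by hand with Manhattan's values. The absolute change is 7.7 − 12.6 = −4.9 µg/m³, and the percent change is −4.9 / 12.6 × 100 ≈ −38.9%. The notebook builds two year-specific subsets and merges them. This sketch shows an alternative that pivots years into columns instead, using two boroughs' values from the output below. `DataFrame.pivot` raises a `ValueError` if a (borough, year) pair appears twice, so it doubles as a duplicate check.

```python
import pandas as pd

# Earliest/latest annual PM2.5 for two boroughs (values from the output below)
rows = pd.DataFrame({
    "borough": ["Manhattan", "Manhattan", "Bronx", "Bronx"],
    "year": [2009, 2024, 2009, 2024],
    "annual_pm25": [12.6, 7.7, 11.0, 6.4],
})

# One row per borough, one column per year
wide = rows.pivot(index="borough", columns="year", values="annual_pm25")
first, last = wide.columns.min(), wide.columns.max()

change = pd.DataFrame({"annual_start": wide[first], "annual_end": wide[last]})
change["abs_change"] = change["annual_end"] - change["annual_start"]
change["pct_change"] = change["abs_change"] / change["annual_start"] * 100
print(change.round(1))
```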

```python
# Borough-level change from earliest to latest year
if not borough_df.empty and "annual_pm25" in borough_df.columns:
    min_year = borough_df["year"].min()
    max_year = borough_df["year"].max()
    print("Borough year range:", min_year, "to", max_year)

    # Subset to earliest and latest years
    start = (
        borough_df[borough_df["year"] == min_year]
        [["borough", "annual_pm25"]]
        .rename(columns={"annual_pm25": "annual_start"})
    )

    end = (
        borough_df[borough_df["year"] == max_year]
        [["borough", "annual_pm25"]]
        .rename(columns={"annual_pm25": "annual_end"})
    )

    # Merge start and end
    borough_change = start.merge(end, on="borough", how="inner")

    # Compute absolute and percent changes
    borough_change["abs_change"] = borough_change["annual_end"] - borough_change["annual_start"]
    borough_change["pct_change"] = borough_change["abs_change"] / borough_change["annual_start"] * 100

    display(borough_change)
else:
    borough_change = pd.DataFrame()
    print("No borough-level data for change calculation.")
```

```text
Borough year range: 2009 to 2024
```

| | borough | annual_start | annual_end | abs_change | pct_change |
|---|---|---|---|---|---|
| 0 | Staten Island | 9.8 | 5.9 | -3.9 | -39.795918 |
| 1 | Manhattan | 12.6 | 7.7 | -4.9 | -38.888889 |
| 2 | Queens | 10.0 | 5.9 | -4.1 | -41.0 |
| 3 | Brooklyn | 10.5 | 6.4 | -4.1 | -39.047619 |
| 4 | Bronx | 11.0 | 6.4 | -4.6 | -41.818182 |


```python
# Bar plot of borough changes (absolute change)
borough_change_sorted = borough_change.sort_values("abs_change")
fig = px.bar(
    borough_change_sorted,
    x="borough",
    y="abs_change",
    title=f"Absolute Change in Annual PM2.5 by Borough ({min_year} - {max_year})",
    labels={"borough": "Borough", "abs_change": "Change in PM2.5 (µg/m³)"},
)
fig.add_hline(y=0)
fig.show()

borough_change_sorted
```

| | borough | annual_start | annual_end | abs_change | pct_change |
|---|---|---|---|---|---|
| 1 | Manhattan | 12.6 | 7.7 | -4.9 | -38.888889 |
| 4 | Bronx | 11.0 | 6.4 | -4.6 | -41.818182 |
| 2 | Queens | 10.0 | 5.9 | -4.1 | -41.0 |
| 3 | Brooklyn | 10.5 | 6.4 | -4.1 | -39.047619 |
| 0 | Staten Island | 9.8 | 5.9 | -3.9 | -39.795918 |


### Interpretation – Borough Differences

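- All boroughs show a negative absolute change, meaning PM2.5 fell in every borough between 2009 and 2024.
- Manhattan has the largest absolute reduction (−4.9 µg/m³), while Staten Island has the smallest (−3.9 µg/m³).
- Percent changes are similar across boroughs (about −39% to −42%). The Bronx had the largest percent reduction (−41.8%), and Manhattan had the smallest (−38.9%), because its large absolute drop is measured against the highest starting level.
- This largely supports the hypothesis that areas with higher initial levels see larger absolute improvements. The ranking of absolute reductions mostly follows the ranking of 2009 levels; the exception is Brooklyn, which started above Queens but had the same −4.1 µg/m³ reduction.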
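One way to put a number on the second hypothesis is to correlate each borough's starting level with its absolute change. This is a descriptive sketch using the five borough values printed above. With only five points it shows the direction and strength of the association but is not a significance test.

```python
import pandas as pd

# Borough values copied from the borough_change output above (2009 -> 2024)
borough_change_demo = pd.DataFrame({
    "borough": ["Staten Island", "Manhattan", "Queens", "Brooklyn", "Bronx"],
    "annual_start": [9.8, 12.6, 10.0, 10.5, 11.0],
    "abs_change": [-3.9, -4.9, -4.1, -4.1, -4.6],
})

# Pearson correlation; a negative value means higher starting levels
# go with larger (more negative) absolute changes
r = borough_change_demo["annual_start"].corr(borough_change_demo["abs_change"])
print(round(r, 2))
```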


## Step 7 – Compare Boroughs to Citywide Averages

To see whether specific boroughs tend to be cleaner or dirtier than the city as a whole, I merge citywide annual PM2.5 with the borough-level table by year. Then I compute `diff_from_citywide = annual_pm25 - citywide_annual` for each borough-year. Positive values mean the borough is above the citywide average, and negative values mean it is below.
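A left merge on `year` should attach exactly one citywide value to each borough row. This sketch uses 2009 values from the outputs below and adds pandas' `validate="many_to_one"` option, which the notebook's merge does not use. With that option, the merge raises a `MergeError` if a year appears more than once in the citywide table, a situation that would otherwise duplicate borough rows silently.

```python
import pandas as pd

# Citywide annual mean (one row per year) and two borough rows, 2009 values
citywide_annual = pd.DataFrame({"year": [2009], "citywide_annual": [10.4]})
boroughs = pd.DataFrame({
    "year": [2009, 2009],
    "borough": ["Manhattan", "Staten Island"],
    "annual_pm25": [12.6, 9.8],
})

# validate="many_to_one" asserts that each year is unique on the citywide side
merged = boroughs.merge(citywide_annual, on="year", how="left", validate="many_to_one")
merged["diff_from_citywide"] = merged["annual_pm25"] - merged["citywide_annual"]
print(merged[["borough", "diff_from_citywide"]].round(1))
```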


```python
# Merge citywide annual PM2.5 with the borough-level table
citywide_annual = citywide[["year", "annual_pm25"]].rename(columns={"annual_pm25": "citywide_annual"})
borough_with_city = borough_df.merge(citywide_annual, on="year", how="left")
borough_with_city
```

| | year | geo_type | geo_id | geo_rank | boro_id | borough | geography | area | annual_pm25 | summer_pm25 | winter_pm25 | citywide_annual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2009 | Borough | 5 | 1 | 5 | Staten Island | Staten Island | Staten Island | 9.8 | 10.5 | 11.8 | 10.4 |
| 1 | 2009 | Borough | 3 | 1 | 3 | Manhattan | Manhattan | Manhattan | 12.6 | 12.6 | 15.3 | 10.4 |
| 2 | 2009 | Borough | 4 | 1 | 4 | Queens | Queens | Queens | 10.0 | 10.3 | 12.6 | 10.4 |
| 3 | 2009 | Borough | 2 | 1 | 2 | Brooklyn | Brooklyn | Brooklyn | 10.5 | 10.9 | 12.9 | 10.4 |
| 4 | 2009 | Borough | 1 | 1 | 1 | Bronx | Bronx | Bronx | 11.0 | 10.7 | 14.1 | 10.4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 75 | 2024 | Borough | 4 | 1 | 4 | Queens | Queens | Queens | 5.9 | 7.5 | 5.6 | 6.3 |
| 76 | 2024 | Borough | 5 | 1 | 5 | Staten Island | Staten Island | Staten Island | 5.9 | 7.3 | 5.9 | 6.3 |
| 77 | 2024 | Borough | 1 | 1 | 1 | Bronx | Bronx | Bronx | 6.4 | 8.2 | 6.1 | 6.3 |
| 78 | 2024 | Borough | 2 | 1 | 2 | Brooklyn | Brooklyn | Brooklyn | 6.4 | 8.1 | 6.3 | 6.3 |


```python
# Compute each borough's difference from the citywide annual mean
borough_with_city["diff_from_citywide"] = borough_with_city["annual_pm25"] - borough_with_city["citywide_annual"]
borough_with_city.head()
```

| | year | geo_type | geo_id | geo_rank | boro_id | borough | geography | area | annual_pm25 | summer_pm25 | winter_pm25 | citywide_annual | diff_from_citywide |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2009 | Borough | 5 | 1 | 5 | Staten Island | Staten Island | Staten Island | 9.8 | 10.5 | 11.8 | 10.4 | -0.6 |
| 1 | 2009 | Borough | 3 | 1 | 3 | Manhattan | Manhattan | Manhattan | 12.6 | 12.6 | 15.3 | 10.4 | 2.2 |
| 2 | 2009 | Borough | 4 | 1 | 4 | Queens | Queens | Queens | 10.0 | 10.3 | 12.6 | 10.4 | -0.4 |
| 3 | 2009 | Borough | 2 | 1 | 2 | Brooklyn | Brooklyn | Brooklyn | 10.5 | 10.9 | 12.9 | 10.4 | 0.1 |
| 4 | 2009 | Borough | 1 | 1 | 1 | Bronx | Bronx | Bronx | 11.0 | 10.7 | 14.1 | 10.4 | 0.6 |


```python
# Average difference by borough across all years
avg_diff_by_borough = (
    borough_with_city
    .groupby("borough", as_index=False)["diff_from_citywide"]
    .mean()
    .sort_values("diff_from_citywide", ascending=False)
)

fig = px.bar(
    avg_diff_by_borough,
    x="borough",
    y="diff_from_citywide",
    title="Average Difference from Citywide Annual PM2.5 by Borough",
    labels={"borough": "Borough", "diff_from_citywide": "Borough - Citywide PM2.5 (µg/m³)"},
)
fig.add_hline(y=0)
fig.show()

avg_diff_by_borough
```

| | borough | diff_from_citywide |
|---|---|---|
| 2 | Manhattan | 1.5625 |
| 0 | Bronx | 0.36875 |
| 1 | Brooklyn | 0.11875 |
| 3 | Queens | -0.2125 |
| 4 | Staten Island | -0.59375 |


### Interpretation
1. Manhattan has the highest PM2.5 levels relative to the citywide average, about +1.56 µg/m³ on average across years, far above any other borough. The Bronx (+0.37 µg/m³) and Brooklyn (+0.12 µg/m³) are slightly above average, while Queens (−0.21 µg/m³) is slightly below. Staten Island is the cleanest borough, about −0.59 µg/m³ below the citywide mean.
2. Overall, relative to the city as a whole, Manhattan has the worst average PM2.5 air quality over 2009–2024 and Staten Island the best.

## Step 8 – Community District Baseline Analysis

To explore finer spatial disparities, I analyze PM2.5 at the Community District level.

In this step, I:
1. Extract PM2.5 levels for all community districts in the earliest year as baseline.  
2. Identify the top 10 community districts with the highest baseline PM2.5.  
3. Visualize them in a horizontal bar chart.
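`sort_values(...).head(10)` cuts the list at exactly ten rows, even when the tenth value is tied. That happens here: Greenpoint/Williamsburg and Highbridge/Concourse both sit at 12.0 µg/m³. `DataFrame.nlargest` with its default `keep="first"` is equivalent to that sort-and-head pattern. Its `keep="all"` option instead keeps every row tied with the cutoff value, which reveals whether an eleventh district was left out. This sketch uses four 2009 baselines from the output below with a cutoff of three.

```python
import pandas as pd

# Four 2009 community-district baselines (values from the output below)
community_start_demo = pd.DataFrame({
    "geography": ["Midtown (CD5)", "Clinton and Chelsea (CD4)",
                  "Greenpoint and Williamsburg (CD1)", "Highbridge and Concourse (CD4)"],
    "annual_start": [16.1, 13.2, 12.0, 12.0],
})

# keep="all" retains every row tied with the n-th largest value,
# so this can return more than n rows
top3 = community_start_demo.nlargest(3, "annual_start", keep="all")
print(len(top3))  # 4: the 12.0 tie at the cutoff is kept
```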

```python
min_year_c = community_df["year"].min()
max_year_c = community_df["year"].max()

# Baseline year data
community_start = (
    community_df[community_df["year"] == min_year_c]
    [["geo_id", "geography", "borough", "annual_pm25"]]
    .rename(columns={"annual_pm25": "annual_start"})
)

# Top 10 highest PM2.5 community districts at baseline
top10_baseline = community_start.sort_values("annual_start", ascending=False).head(10)
top10_baseline
```

| | geo_id | geography | borough | annual_start |
|---|---|---|---|---|
| 1401 | 105 | Midtown (CD5) | Manhattan | 16.1 |
| 1402 | 106 | Stuyvesant Town and Turtle Bay (CD6) | Manhattan | 14.1 |
| 1405 | 104 | Clinton and Chelsea (CD4) | Manhattan | 13.2 |
| 1403 | 101 | Financial District (CD1) | Manhattan | 13.0 |
| 1406 | 108 | Upper East Side (CD8) | Manhattan | 12.9 |
| 1408 | 102 | Greenwich Village and Soho (CD2) | Manhattan | 12.8 |
| 1414 | 107 | Upper West Side (CD7) | Manhattan | 12.2 |
| 1404 | 205 | Fordham and University Heights (CD5) | Bronx | 12.1 |
| 1419 | 301 | Greenpoint and Williamsburg (CD1) | Brooklyn | 12.0 |
| 1407 | 204 | Highbridge and Concourse (CD4) | Bronx | 12.0 |


```python
# Bar plot of top 10 highest PM2.5 community districts (baseline year)
fig = px.bar(
    top10_baseline.sort_values("annual_start"),
    x="annual_start",
    y="geography",
    orientation="h",
    title=f"Top 10 Community Districts by PM2.5 in {min_year_c}",
    labels={"annual_start": "Annual PM2.5 (µg/m³)", "geography": "Community District"},
)
fig.show()
```

### Interpretation:
1. The community districts with the highest PM2.5 levels in 2009 are almost all in Manhattan (7 of the top 10). Midtown (CD5) leads at 16.1 µg/m³, followed by Stuyvesant Town/Turtle Bay (CD6), Clinton/Chelsea (CD4), and the Financial District (CD1). These four districts ranged from 13.0 to 16.1 µg/m³ in 2009, possibly reflecting dense commercial activity and heavy traffic.
2. Only three non-Manhattan districts appear in the top 10: Fordham/University Heights and Highbridge/Concourse in the Bronx, and Greenpoint/Williamsburg in Brooklyn. This suggests more localized pollution hotspots outside Manhattan.


## Step 9 – Community District Changes Over Time

To measure improvements at the community district level, I:

1. Extract annual PM2.5 for each district in the earliest and latest years.  
2. Merge these tables on `geo_id`.  
3. Compute:
   - **Absolute change** (`annual_end − annual_start`).  
   - **Percent change**.  
4. Identify the community districts with the largest reductions in PM2.5.  
5. Visualize these largest reductions in a bar chart.
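An inner merge on `geo_id` silently drops any district that is missing from either year. As a sanity check before computing changes, an outer merge with `indicator=True` labels each row `both`, `left_only`, or `right_only`. In this sketch, districts 101 and 105 use values from the outputs below, and district `999` is a hypothetical ID that exists only in the baseline year.

```python
import pandas as pd

# Hypothetical: district 999 appears only in the baseline year
start = pd.DataFrame({"geo_id": [101, 105, 999], "annual_start": [13.0, 16.1, 11.0]})
end = pd.DataFrame({"geo_id": [101, 105], "annual_end": [8.1, 10.5]})

# Outer merge keeps unmatched rows; _merge records where each row came from
check = start.merge(end, on="geo_id", how="outer", indicator=True)
dropped = check.loc[check["_merge"] != "both", "geo_id"].tolist()
print(dropped)  # [999]
```

If `dropped` is empty for the real `community_start` and `community_end`, the inner merge below loses nothing.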

```python
# Community district change over time
community_end = (
    community_df[community_df["year"] == max_year_c]
    [["geo_id", "annual_pm25"]]
    .rename(columns={"annual_pm25": "annual_end"})
)

# Merge baseline and final
community_change = community_start.merge(community_end, on="geo_id", how="inner")
community_change.head()
```

| | geo_id | geography | borough | annual_start | annual_end |
|---|---|---|---|---|---|
| 0 | 101 | Financial District (CD1) | Manhattan | 13.0 | 8.1 |
| 1 | 102 | Greenwich Village and Soho (CD2) | Manhattan | 12.8 | 9.4 |
| 2 | 103 | Lower East Side and Chinatown (CD3) | Manhattan | 11.8 | 8.1 |
| 3 | 104 | Clinton and Chelsea (CD4) | Manhattan | 13.2 | 8.8 |
| 4 | 105 | Midtown (CD5) | Manhattan | 16.1 | 10.5 |


```python
# Compute absolute and percent changes
community_change["abs_change"] = community_change["annual_end"] - community_change["annual_start"]
community_change["pct_change"] = community_change["abs_change"] / community_change["annual_start"] * 100

community_change.head()
```

| | geo_id | geography | borough | annual_start | annual_end | abs_change | pct_change |
|---|---|---|---|---|---|---|---|
| 0 | 101 | Financial District (CD1) | Manhattan | 13.0 | 8.1 | -4.9 | -37.692308 |
| 1 | 102 | Greenwich Village and Soho (CD2) | Manhattan | 12.8 | 9.4 | -3.4 | -26.5625 |
| 2 | 103 | Lower East Side and Chinatown (CD3) | Manhattan | 11.8 | 8.1 | -3.7 | -31.355932 |
| 3 | 104 | Clinton and Chelsea (CD4) | Manhattan | 13.2 | 8.8 | -4.4 | -33.333333 |
| 4 | 105 | Midtown (CD5) | Manhattan | 16.1 | 10.5 | -5.6 | -34.782609 |


```python
# Find the largest reductions in PM2.5 by community district
largest_reductions = community_change.sort_values("abs_change").head(10)
largest_reductions
```


| | geo_id | geography | borough | annual_start | annual_end | abs_change | pct_change |
|---|---|---|---|---|---|---|---|
| 7 | 108 | Upper East Side (CD8) | Manhattan | 12.9 | 7.1 | -5.8 | -44.96124 |
| 5 | 106 | Stuyvesant Town and Turtle Bay (CD6) | Manhattan | 14.1 | 8.4 | -5.7 | -40.425532 |
| 4 | 105 | Midtown (CD5) | Manhattan | 16.1 | 10.5 | -5.6 | -34.782609 |
| 16 | 205 | Fordham and University Heights (CD5) | Bronx | 12.1 | 6.5 | -5.6 | -46.280992 |
| 15 | 204 | Highbridge and Concourse (CD4) | Bronx | 12.0 | 6.5 | -5.5 | -45.833333 |
| 6 | 107 | Upper West Side (CD7) | Manhattan | 12.2 | 6.8 | -5.4 | -44.262295 |
| 12 | 201 | Mott Haven and Melrose (CD1) | Bronx | 11.9 | 6.8 | -5.1 | -42.857143 |
| 18 | 207 | Kingsbridge Heights and Bedford (CD7) | Bronx | 11.5 | 6.5 | -5.0 | -43.478261 |
| 11 | 112 | Washington Heights and Inwood (CD12) | Manhattan | 11.6 | 6.6 | -5.0 | -43.103448 |
| 17 | 206 | Belmont and East Tremont (CD6) | Bronx | 11.5 | 6.6 | -4.9 | -42.608696 |


```python
# Horizontal bar chart of the ten largest reductions
fig = px.bar(
    largest_reductions.sort_values("abs_change"),
    x="abs_change",
    y="geography",
    orientation="h",
    title=f"Largest Reductions in PM2.5 by Community District ({min_year_c} - {max_year_c})",
    labels={"abs_change": "Change in PM2.5 (µg/m³)", "geography": "Community District"},
)
fig.show()
```

### Interpretation:
1. The ten largest reductions in PM2.5 from 2009 to 2024 are split evenly between Manhattan (led by the Upper East Side, Stuyvesant Town/Turtle Bay, and Midtown) and the Bronx (including Fordham/University Heights, Highbridge/Concourse, and Belmont/East Tremont).
2. All of these districts reduced annual PM2.5 by roughly 4.9 to 5.8 µg/m³, indicating substantial air-quality improvements over the period.

## Conclusions

The analysis shows that PM2.5 levels in New York City have declined substantially from 2009 to 2024 across every level of geography—citywide, borough, and community district. This overall downward trend aligns with expectations based on stricter air-quality regulations and reductions in traffic emissions.

At the borough level, Manhattan consistently recorded the highest PM2.5 concentrations, averaging more than 1.5 µg/m³ above the citywide mean. The Bronx and Brooklyn also trended slightly above average, while Queens was slightly below. Staten Island had the lowest PM2.5 levels, averaging about 0.6 µg/m³ below the citywide mean. These patterns highlight persistent spatial differences in exposure that may be related to density, traffic intensity, land use, and built environment characteristics.

At the community district level, the areas with the highest pollution in 2009 were overwhelmingly located in Manhattan (7 of the top 10). Two Bronx districts and one Brooklyn district also appeared in the top 10. This suggests that early-period PM2.5 was heavily concentrated in commercial, highly trafficked parts of Manhattan, as well as in select high-density areas in other boroughs.

However, the largest reductions in PM2.5 occurred in a mix of Bronx and Manhattan neighborhoods. Districts such as the Upper East Side saw decreases of nearly 6 µg/m³ (5.8 µg/m³), reflecting meaningful improvements in air quality in communities that historically faced higher environmental burdens.

The results support the hypothesis that:
1. PM2.5 has declined across NYC, and
2. areas that started with higher levels often saw larger absolute reductions, though not enough to eliminate all spatial inequality.

These patterns suggest possible policy implications, such as continued enforcement of clean-fuel regulations, targeted air-quality monitoring in dense commercial corridors, and equity-focused pollution mitigation to ensure cleaner air for all New Yorkers. For future research, a valuable next step would be to examine the relationship between air pollution and vehicle ownership or traffic intensity. Since I am analyzing NYC's congestion pricing policy for another class, it would be interesting to see how these factors are related.
