# Project 3: Air Quality Analysis in NYC (2023)

Fine particulate matter (PM2.5, particles 2.5 micrometers or smaller in diameter) is among the most harmful forms of urban air pollution because it can penetrate deep into the lungs.  
New York City’s boroughs differ in population density, land use, and traffic volume, which may result in uneven exposure to PM2.5.

This project analyzes **2023 PM2.5 concentration data** to understand how pollution levels vary across NYC's five boroughs.




## Research Question

How do average PM2.5 concentrations vary across NYC boroughs in 2023, and what might explain these differences?

In [32]:
import pandas as pd
import plotly.express as px

In [33]:
# Load the nationwide annual summary file; the NYC filter in the next cell expects it as `df`
df = pd.read_csv("/Users/minxunxie/Desktop/mx2279-ux.github.io/annual_conc_by_monitor_2023.csv")
df.head()

| | State Code | County Code | Site Num | Parameter Code | POC | Latitude | Longitude | Datum | Parameter Name | Sample Duration | ... | 75th Percentile | 50th Percentile | 10th Percentile | Local Site Name | Address | State Name | County Name | City Name | CBSA Name | Date of Last Change |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | 10 | 44201 | 1 | 30.497478 | -87.880258 | NAD83 | Ozone | 1 HOUR | ... | 0.055 | 0.049 | 0.033 | FAIRHOPE, Alabama | FAIRHOPE HIGH SCHOOL, 1 PIRATE DRIVE, FAIRHOPE... | Alabama | Baldwin | Fairhope | Daphne-Fairhope-Foley, AL | 5/24/2024 |
| 1 | 1 | 3 | 10 | 44201 | 1 | 30.497478 | -87.880258 | NAD83 | Ozone | 8-HR RUN AVG BEGIN HOUR | ... | 0.051 | 0.044 | 0.028 | FAIRHOPE, Alabama | FAIRHOPE HIGH SCHOOL, 1 PIRATE DRIVE, FAIRHOPE... | Alabama | Baldwin | Fairhope | Daphne-Fairhope-Foley, AL | 5/24/2024 |
| 2 | 1 | 3 | 10 | 44201 | 1 | 30.497478 | -87.880258 | NAD83 | Ozone | 8-HR RUN AVG BEGIN HOUR | ... | 0.051 | 0.044 | 0.028 | FAIRHOPE, Alabama | FAIRHOPE HIGH SCHOOL, 1 PIRATE DRIVE, FAIRHOPE... | Alabama | Baldwin | Fairhope | Daphne-Fairhope-Foley, AL | 5/24/2024 |
| 3 | 1 | 3 | 10 | 44201 | 1 | 30.497478 | -87.880258 | NAD83 | Ozone | 8-HR RUN AVG BEGIN HOUR | ... | 0.051 | 0.045 | 0.028 | FAIRHOPE, Alabama | FAIRHOPE HIGH SCHOOL, 1 PIRATE DRIVE, FAIRHOPE... | Alabama | Baldwin | Fairhope | Daphne-Fairhope-Foley, AL | 5/24/2024 |
| 4 | 1 | 3 | 10 | 88101 | 3 | 30.497478 | -87.880258 | NAD83 | PM2.5 - Local Conditions | 1 HOUR | ... | 10.6 | 7.0 | 1.4 | FAIRHOPE, Alabama | FAIRHOPE HIGH SCHOOL, 1 PIRATE DRIVE, FAIRHOPE... | Alabama | Baldwin | Fairhope | Daphne-Fairhope-Foley, AL | 8/6/2024 |


In [34]:
nyc = df[
    (df["State Code"] == 36) &
    (df["County Code"].isin([5, 47, 61, 81, 85])) &
    (df["Parameter Code"] == 88101)  # PM2.5
]
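
The filter above relies on county FIPS codes within New York State (state code 36): 5 is the Bronx, 47 is Kings County (Brooklyn), 61 is New York County (Manhattan), 81 is Queens, and 85 is Richmond County (Staten Island). The sketch below shows one way to wrap the same filter in a function and check which boroughs survive it. The names `filter_nyc_pm25`, `NYC_COUNTY_FIPS`, and the toy rows are hypothetical and not part of the original notebook.

```python
import pandas as pd

# County FIPS codes for the five boroughs (within state code 36).
NYC_COUNTY_FIPS = {5: "Bronx", 47: "Brooklyn", 61: "Manhattan", 81: "Queens", 85: "Staten Island"}
PM25_CODE = 88101  # parameter code used for PM2.5 in the notebook's filter


def filter_nyc_pm25(frame):
    """Keep only PM2.5 rows from the five NYC counties (hypothetical helper)."""
    mask = (
        (frame["State Code"] == 36)
        & frame["County Code"].isin(list(NYC_COUNTY_FIPS))
        & (frame["Parameter Code"] == PM25_CODE)
    )
    return frame.loc[mask].copy()


# Made-up rows for illustration only; these are not values from the EPA file.
toy = pd.DataFrame({
    "State Code": [36, 36, 36, 1, 36],
    "County Code": [5, 61, 1, 5, 81],
    "Parameter Code": [88101, 88101, 88101, 88101, 44201],
    "Arithmetic Mean": [9.0, 9.6, 7.5, 8.0, 0.04],
})

toy_nyc = filter_nyc_pm25(toy)

# Boroughs with no rows left after filtering would silently vanish from later plots.
missing = sorted(set(NYC_COUNTY_FIPS) - set(toy_nyc["County Code"]))
print(len(toy_nyc), missing)
```

Running the same check on the real data would confirm that all five boroughs have at least one PM2.5 record before any averages are computed.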

## Data Cleaning

Although the dataset is structured, basic cleaning ensures the analysis is consistent and reliable.

Cleaning steps:
- Select relevant columns  
- Drop missing values  
- Remove duplicates  
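
The steps above can be sketched as a small function that also reports how many rows each step removes. This is an illustration under assumptions: `clean_with_report` is a hypothetical helper, and the toy rows (including site number 79) are made up rather than taken from the EPA file.

```python
import pandas as pd


def clean_with_report(frame, value_col="Arithmetic Mean"):
    """Apply the three cleaning steps and record the row count after each one."""
    keep = ["State Code", "County Code", "Site Num", value_col]
    report = {"start": len(frame)}

    out = frame[keep]                      # 1. select relevant columns
    out = out.dropna(subset=[value_col])   # 2. drop rows with a missing mean
    report["after_dropna"] = len(out)

    out = out.drop_duplicates()            # 3. remove exact duplicates
    report["after_drop_duplicates"] = len(out)
    return out, report


# Made-up rows: two monitors (POC 1 and 2) at the same site share the same mean.
toy = pd.DataFrame({
    "State Code": [36, 36, 36, 36],
    "County Code": [5, 5, 5, 61],
    "Site Num": [110, 110, 110, 79],
    "POC": [1, 2, 1, 1],
    "Arithmetic Mean": [8.6, 8.6, None, 9.7],
})

cleaned, report = clean_with_report(toy)
print(report)
```

One design point worth noting: because columns are selected before duplicates are removed, two rows that differed only in a dropped column (here `POC`) collapse into one. Whether that is desirable depends on whether those rows are true repeats or distinct measurements.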

In [35]:
nyc = nyc[["State Code", "County Code", "Site Num", "Arithmetic Mean"]]

nyc = nyc.dropna(subset=["Arithmetic Mean"])
nyc = nyc.drop_duplicates()

nyc.head()

| | State Code | County Code | Site Num | Arithmetic Mean |
|---|---|---|---|---|
| 49798 | 36 | 5 | 110 | 8.607438 |
| 49806 | 36 | 5 | 110 | 10.015433 |
| 49807 | 36 | 5 | 110 | 8.403875 |
| 49809 | 36 | 5 | 110 | 8.351515 |
| 49810 | 36 | 5 | 110 | 9.992244 |


In [36]:
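# The borough filter was already applied when `nyc` was created; repeating it is a harmless safeguard.
nyc = nyc[nyc["County Code"].isin([5, 47, 61, 81, 85])]
nyc.shape  # (rows, columns) remaining after cleaning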

(27, 4)

In [37]:
mean_pm25 = nyc.groupby("County Code")["Arithmetic Mean"].mean().reset_index()

borough_map = {
    5: "Bronx",
    47: "Brooklyn",
    61: "Manhattan",
    81: "Queens",
    85: "Staten Island"
}

mean_pm25["Borough"] = mean_pm25["County Code"].map(borough_map)
mean_pm25


| | County Code | Arithmetic Mean | Borough |
|---|---|---|---|
| 0 | 5 | 9.071013 | Bronx |
| 1 | 47 | 9.19206 | Brooklyn |
| 2 | 61 | 9.647312 | Manhattan |
| 3 | 81 | 9.834885 | Queens |
| 4 | 85 | 9.528814 | Staten Island |


In [38]:
fig = px.bar(
    mean_pm25,
    x="Borough",
    y="Arithmetic Mean",
    title="Average PM2.5 Levels by Borough (NYC, 2023)",
    text="Arithmetic Mean",
)

fig.update_traces(texttemplate='%{text:.2f}', textposition='outside')
fig.update_layout(yaxis_title="PM2.5 (µg/m³)", xaxis_title="Borough")

fig.show()

## Interpretation: Borough-level Differences

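Queens has the highest average PM2.5 level among the boroughs (9.83 µg/m³), with Manhattan close behind (9.65 µg/m³). Manhattan's value may reflect its high traffic volume, dense commercial activity, and tourism, although this analysis does not test those explanations.

The Bronx shows the lowest average (9.07 µg/m³), followed by Brooklyn (9.19 µg/m³), while Staten Island (9.53 µg/m³) falls in the middle.
The differences are small: the gap between the highest and lowest borough is about 0.8 µg/m³. For context, PM2.5 changes of 1–2 µg/m³ are associated with measurable health impacts, so the borough gap sits just below that range and should be read with caution.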
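
The ranking and the size of the gap can be checked directly from the borough means in the output table above. The snippet below is a minimal sketch that copies those five values by hand; the variable names are illustrative only.

```python
# Borough means copied from the groupby output above (µg/m³).
borough_means = {
    "Bronx": 9.071013,
    "Brooklyn": 9.19206,
    "Manhattan": 9.647312,
    "Queens": 9.834885,
    "Staten Island": 9.528814,
}

# Sort boroughs from highest to lowest mean.
ranked = sorted(borough_means.items(), key=lambda kv: kv[1], reverse=True)
highest, lowest = ranked[0], ranked[-1]
gap = highest[1] - lowest[1]

print([name for name, _ in ranked])
print(f"{highest[0]} minus {lowest[0]}: {gap:.2f} µg/m³")
```

This puts the five boroughs in order (Queens, Manhattan, Staten Island, Brooklyn, Bronx) and shows that the full range is well under 1 µg/m³.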


In [39]:
fig = px.box(
    nyc.assign(Borough=nyc["County Code"].map(borough_map)),  # label boxes by borough name
    x="Borough",
    y="Arithmetic Mean",
    title="Distribution of PM2.5 Levels by Borough (2023)",
)

fig.update_layout(
    xaxis_title="Borough",
    yaxis_title="PM2.5 (µg/m³)"
)

fig.show()

## Variation Within Boroughs

Although the borough averages are close, their distributions differ:

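- Manhattan and Queens show wider variation, which may point to larger differences between monitoring records within those boroughs.
- The Bronx has a narrower spread, suggesting more consistent readings across its records.
- Staten Island and Brooklyn fall in between.

Because the cleaned dataset contains only 27 rows across five boroughs, each box rests on a handful of values and these patterns are tentative. Even so, they highlight the importance of examining both means and distributions, not just a single measure.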
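
The spread comparison above can be made concrete with per-borough summary statistics rather than reading it off the box plot. The following is a minimal sketch, assuming a frame shaped like `nyc` with `County Code` and `Arithmetic Mean` columns; `spread_by_group` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd


def spread_by_group(frame, group_col="County Code", value_col="Arithmetic Mean"):
    """Return count, mean, standard deviation, range, and IQR for each group."""
    grouped = frame.groupby(group_col)[value_col]
    summary = grouped.agg(n="count", mean="mean", std="std", min="min", max="max")
    # Interquartile range: the height of the box in a box plot.
    summary["iqr"] = grouped.quantile(0.75) - grouped.quantile(0.25)
    return summary


# Made-up values: county 5 is tightly clustered, county 61 is spread out.
toy = pd.DataFrame({
    "County Code": [5, 5, 5, 5, 61, 61, 61, 61],
    "Arithmetic Mean": [9.0, 9.2, 9.1, 9.3, 8.0, 9.0, 10.0, 11.0],
})

summary = spread_by_group(toy)
print(summary)
```

Reporting `n` alongside the spread matters here: with only a few rows per borough, a wide box can come from a single unusual record.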


## Conclusion

Based on the 2023 EPA air quality monitoring data for New York City, average PM2.5 levels vary across the five boroughs, from about 9.1 µg/m³ in the Bronx to about 9.8 µg/m³ in Queens. Manhattan and Queens exhibit the widest distributions, indicating more variability among their annual monitoring records. Staten Island also demonstrates relatively high median values, while the Bronx shows lower and more tightly clustered PM2.5 levels.

The fact that Manhattan, despite its smaller geographic size, has both a high average and wide variability suggests that traffic density, commercial activity, and building emissions may be driving exposure. In contrast, the Bronx’s narrower distribution may reflect more uniform residential patterns and fewer major emission sources, though this does not necessarily imply better overall environmental health outcomes for its residents.

From a public-health perspective, this analysis highlights that exposure to fine particulate matter is not evenly distributed across the city. Boroughs with greater variability or higher baselines may face increased risks related to asthma, cardiovascular disease, and other pollution-linked conditions.

This project also illustrates how open environmental datasets can reveal meaningful spatial differences using relatively simple data cleaning and visualization techniques. While the dataset is limited to monitored locations and does not capture all pollution sources or micro-neighborhood variation, it provides a valuable starting point for understanding urban environmental inequality.