# Do Higher NYC Co-op Values Come with Cleaner Air?

## Introduction

New York City is famous for both its expensive housing and its environmental challenges.  
In this project, I combine two NYC datasets:

- A Department of Finance dataset on **cooperative (“co-op”) apartment buildings**, including their **market value per square foot**.
- An NYC air quality dataset measuring **fine particulate matter (PM2.5)** across boroughs.

My goal is to see whether boroughs with more expensive co-op buildings tend to have **better** air quality (lower PM2.5) or **worse** air quality (higher PM2.5).

I focus on data from **2019** to line up the time period across both datasets and keep things simple.


## Data Sources

- **NYC Air Quality (PM2.5)**  
  Source: NYC Open Data – “NYC Air Quality”  
  https://data.cityofnewyork.us/Environment/Air-Quality/c3uy-2p5r/about_data

- **DOF: Cooperative Comparable Rental Income (Citywide)**  
  Source: NYC Open Data – “DOF: Cooperative Comparable Rental Income (Citywide)”  
  https://data.cityofnewyork.us/City-Government/DOF-Cooperative-Comparable-Rental-Income-Citywide-/myei-c3fa/about_data

Both datasets were downloaded as CSV files and then loaded into Python using `pandas`.


In [34]:
import pandas as pd

# Read in the CSVs (make sure the filenames match what is in your repo)
air = pd.read_csv("NYC_Air_Quality.csv")

# low_memory=False reads each column in one pass; without it, pandas warns that
# columns (18,33,48,54) have mixed types. The air file has only 12 columns,
# so the warning refers to the co-op file.
rent = pd.read_csv("dof_coop_rent.csv", low_memory=False)

air.head(), rent.head()



(   Unique ID  Indicator ID                     Name Measure Measure Info  \
 0     878218           386               Ozone (O3)    Mean          ppb   
 1     876975           375   Nitrogen dioxide (NO2)    Mean          ppb   
 2     876900           375   Nitrogen dioxide (NO2)    Mean          ppb   
 3     877140           375   Nitrogen dioxide (NO2)    Mean          ppb   
 4     874556           365  Fine particles (PM 2.5)    Mean       mcg/m3   
 
   Geo Type Name  Geo Join ID                        Geo Place Name  \
 0         UHF42          402                           West Queens   
 1         UHF42          501                         Port Richmond   
 2         UHF42          207              East Flatbush - Flatbush   
 3            CD          205  Fordham and University Heights (CD5)   
 4         UHF34          410                             Rockaways   
 
    Time Period  Start_Date  Data Value  Message  
 0  Summer 2023  06/01/2023   34.365989      NaN  
 1  Su

In [35]:
# Keep only borough-level rows
air_borough = air[air["Geo Type Name"] == "Borough"].copy()

# Focus on PM2.5
air_pm = air_borough[air_borough["Name"] == "Fine particles (PM 2.5)"].copy()

# Pick 2019 annual average
air_pm_2019 = air_pm[air_pm["Time Period"] == "Annual Average 2019"].copy()

air_pm_2019


|      | Unique ID | Indicator ID | Name | Measure | Measure Info | Geo Type Name | Geo Join ID | Geo Place Name | Time Period | Start_Date | Data Value | Message |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4380 | 649660 | 365 | Fine particles (PM 2.5) | Mean | mcg/m3 | Borough | 5 | Staten Island | Annual Average 2019 | 01/01/2019 | 5.92 | |
| 4827 | 649648 | 365 | Fine particles (PM 2.5) | Mean | mcg/m3 | Borough | 1 | Bronx | Annual Average 2019 | 01/01/2019 | 6.91 | |
| 4883 | 649657 | 365 | Fine particles (PM 2.5) | Mean | mcg/m3 | Borough | 4 | Queens | Annual Average 2019 | 01/01/2019 | 6.47 | |
| 4891 | 649651 | 365 | Fine particles (PM 2.5) | Mean | mcg/m3 | Borough | 2 | Brooklyn | Annual Average 2019 | 01/01/2019 | 6.66 | |
| 4915 | 649654 | 365 | Fine particles (PM 2.5) | Mean | mcg/m3 | Borough | 3 | Manhattan | Annual Average 2019 | 01/01/2019 | 8.31 | |


In [36]:
air_pm_2019 = air_pm_2019[["Geo Place Name", "Data Value"]].rename(
    columns={
        "Geo Place Name": "Borough",
        "Data Value": "pm25_2019"
    }
)

air_pm_2019


|      | Borough | pm25_2019 |
| --- | --- | --- |
| 4380 | Staten Island | 5.92 |
| 4827 | Bronx | 6.91 |
| 4883 | Queens | 6.47 |
| 4891 | Brooklyn | 6.66 |
| 4915 | Manhattan | 8.31 |


In [37]:
import numpy as np

# Focus on 2019 to match the air data year
rent_2019 = rent[rent["Report Year"] == 2019].copy()

# Function to map Boro-Block-Lot -> Borough name
def boro_from_bbl(bbl):
    if isinstance(bbl, str):
        boro_code = bbl.split("-")[0]
    else:
        return np.nan
    mapping = {
        "1": "Manhattan",
        "2": "Bronx",
        "3": "Brooklyn",
        "4": "Queens",
        "5": "Staten Island"
    }
    return mapping.get(boro_code, np.nan)

rent_2019["Borough"] = rent_2019["Boro-Block-Lot"].apply(boro_from_bbl)

rent_2019[["Boro-Block-Lot", "Borough", "Market Value per SqFt"]].head()


|   | Boro-Block-Lot | Borough | Market Value per SqFt |
| --- | --- | --- | --- |
| 0 | 1-00011-0014 | Manhattan | 230.64 |
| 1 | 1-00028-0001 | Manhattan | 222.61 |
| 2 | 1-00094-0001 | Manhattan | 152.36 |
| 3 | 1-00100-0026 | Manhattan | 229.92 |
| 4 | 1-00117-0001 | Manhattan | 244.76 |
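
The row-by-row `apply` above works, but the same mapping can also be written with pandas string methods. The sketch below is one possible alternative, not the notebook's actual code: the `BORO_NAMES` dictionary and the `sample` frame are names I introduce for illustration. The first two BBL strings come from the output above, and the last three entries are made-up edge cases.

```python
import pandas as pd

# Borough codes, matching the mapping inside boro_from_bbl
BORO_NAMES = {
    "1": "Manhattan",
    "2": "Bronx",
    "3": "Brooklyn",
    "4": "Queens",
    "5": "Staten Island",
}

# Illustrative input: two real BBLs from the notebook output,
# plus a made-up Brooklyn BBL, a missing value, and an invalid borough code
sample = pd.DataFrame(
    {"Boro-Block-Lot": ["1-00011-0014", "1-00028-0001", "3-01234-0005", None, "9-00001-0001"]}
)

# Take the text before the first "-" and look it up;
# missing values and unknown codes both end up as NaN
codes = sample["Boro-Block-Lot"].str.split("-").str[0]
sample["Borough"] = codes.map(BORO_NAMES)

print(sample)
```

Both versions leave missing or unrecognized codes as NaN, so the later `groupby("Borough")` silently drops those rows either way.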


In [38]:
# Average market value per square foot by borough
rent_borough_2019 = (
    rent_2019
    .groupby("Borough", as_index=False)["Market Value per SqFt"]
    .mean()
    .rename(columns={"Market Value per SqFt": "market_value_per_sqft_2019"})
)

rent_borough_2019


|   | Borough | market_value_per_sqft_2019 |
| --- | --- | --- |
| 0 | Bronx | 54.682644 |
| 1 | Brooklyn | 94.877103 |
| 2 | Manhattan | 217.352952 |
| 3 | Queens | 86.964047 |
| 4 | Staten Island | 73.336154 |
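
A borough mean can be pulled around by a few very expensive or very cheap buildings, so it may help to report the median and the building count alongside it. The sketch below shows one way to do that with named aggregation. It is an illustration only: the `toy` frame holds just the five Manhattan rows shown in the earlier `head()` output, and the output column names (`mean_value`, `median_value`, `n_buildings`) are my own.

```python
import pandas as pd

# Toy input: the five Manhattan rows printed by rent_2019[...].head() above
toy = pd.DataFrame(
    {
        "Borough": ["Manhattan"] * 5,
        "Market Value per SqFt": [230.64, 222.61, 152.36, 229.92, 244.76],
    }
)

# One row per borough with the mean, the median, and the number of non-missing values
summary = toy.groupby("Borough", as_index=False).agg(
    mean_value=("Market Value per SqFt", "mean"),
    median_value=("Market Value per SqFt", "median"),
    n_buildings=("Market Value per SqFt", "count"),
)

print(summary)
```

Running the same `agg` call on `rent_2019` instead of `toy` would give the full borough summary, assuming the column names match those used in the notebook.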


In [39]:
merged = pd.merge(
    rent_borough_2019,
    air_pm_2019,
    on="Borough",
    how="inner"
)

merged


|   | Borough | market_value_per_sqft_2019 | pm25_2019 |
| --- | --- | --- | --- |
| 0 | Bronx | 54.682644 | 6.91 |
| 1 | Brooklyn | 94.877103 | 6.66 |
| 2 | Manhattan | 217.352952 | 8.31 |
| 3 | Queens | 86.964047 | 6.47 |
| 4 | Staten Island | 73.336154 | 5.92 |
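
An inner merge drops any borough that appears in only one table without warning. The sketch below is an optional sanity check, not part of the original analysis. It rebuilds the two borough-level tables from the values printed above, merges them with `how="outer"` and `indicator=True`, and uses `validate="one_to_one"` so pandas raises an error if a borough name is duplicated on either side. The names `check` and `unmatched` are my own.

```python
import pandas as pd

# Borough-level tables, re-entered from the outputs shown above
rent_borough_2019 = pd.DataFrame(
    {
        "Borough": ["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"],
        "market_value_per_sqft_2019": [54.682644, 94.877103, 217.352952, 86.964047, 73.336154],
    }
)
air_pm_2019 = pd.DataFrame(
    {
        "Borough": ["Staten Island", "Bronx", "Queens", "Brooklyn", "Manhattan"],
        "pm25_2019": [5.92, 6.91, 6.47, 6.66, 8.31],
    }
)

# Outer merge keeps unmatched rows; the _merge column says where each row came from
check = pd.merge(
    rent_borough_2019,
    air_pm_2019,
    on="Borough",
    how="outer",
    validate="one_to_one",
    indicator=True,
)

# Rows present in only one of the two tables (expected to be empty here)
unmatched = check[check["_merge"] != "both"]
print(unmatched)
```

If `unmatched` is empty, the inner merge used in the notebook loses nothing.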


## Visualization: Relationship Between Co-op Market Value and PM2.5

To understand whether housing prices and air quality are related in New York City,  
I created a scatter plot comparing:

- **Average co-op market value per square foot in 2019** (from the Department of Finance dataset), and  
- **Average annual PM2.5 concentration in 2019** (from the NYC Air Quality dataset).

A scatter plot is a good choice here because it helps reveal whether boroughs with
more expensive housing also tend to have cleaner (or dirtier) air.

In [40]:
import plotly.express as px

df = merged  # just to match the example style

fig = px.scatter(
    df,
    x="market_value_per_sqft_2019",
    y="pm25_2019",
    text="Borough",  # label each point with borough name
    title="Borough Co-op Market Value vs. PM2.5 (2019)",
    labels={
        "market_value_per_sqft_2019": "Average co-op market value per sq ft (USD, 2019)",
        "pm25_2019": "PM2.5 annual average (mcg/m³, 2019)"
    }
)

# make the labels not overlap too badly
fig.update_traces(textposition="top center")

fig.show()
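
The scatter plot can be paired with a simple numeric summary. The sketch below rebuilds the five-row merged table from the values shown earlier and computes the Pearson correlation between co-op value and PM2.5, once with all boroughs and once with Manhattan excluded. The variable names `r_all` and `r_without_manhattan` are my own, and with only five points the coefficient is a rough descriptive check rather than a test of significance.

```python
import pandas as pd

# Merged borough table, re-entered from the notebook output
merged = pd.DataFrame(
    {
        "Borough": ["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"],
        "market_value_per_sqft_2019": [54.682644, 94.877103, 217.352952, 86.964047, 73.336154],
        "pm25_2019": [6.91, 6.66, 8.31, 6.47, 5.92],
    }
)

# Pearson correlation across all five boroughs (Series.corr defaults to Pearson)
r_all = merged["market_value_per_sqft_2019"].corr(merged["pm25_2019"])

# Same calculation with Manhattan removed, to see how much one borough drives the result
others = merged[merged["Borough"] != "Manhattan"]
r_without_manhattan = others["market_value_per_sqft_2019"].corr(others["pm25_2019"])

print(f"All boroughs:      r = {r_all:.2f}")
print(f"Without Manhattan: r = {r_without_manhattan:.2f}")
```

Comparing the two coefficients shows how strongly a single borough shapes the overall relationship.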

## Conclusion
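By combining NYC’s co-op market value data with borough-level PM2.5 readings, this project finds that higher co-op values do not come with cleaner air at the borough level. Manhattan, the borough with the highest average co-op value (about $217 per square foot), also had the highest PM2.5 annual average in 2019 (8.31 mcg/m³). Among the other four boroughs there is no consistent pattern: the Bronx has the lowest average co-op value but the second-highest PM2.5 (6.91 mcg/m³), while Staten Island has the lowest PM2.5 (5.92 mcg/m³) and the second-lowest co-op value. With only five boroughs, and with Manhattan far from the rest on both measures, this comparison is descriptive and does not establish a causal link between affluence and air quality.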

Overall, this analysis shows how merging public datasets can reveal patterns that are not obvious at first glance and highlights the value of using open data to understand urban inequality and environmental conditions.
