# Median Household Incomes

This notebook uses median household income estimates from the 2023 5-Year American Community Survey to estimate the median household income in each NYC City Council district. The citywide median household income is calculated as the average of all district values.
<br>
<br>
A relationship file between 2020 census tracts and 2023 City Council district boundaries was used to assign each census tract to a council district. The median income of a council district is calculated as the mean of the median incomes of the census tracts assigned to that district.
<br>
<br>
This data analysis was produced for THE CITY's "Get to Know Your Council District" news app, found here: https://projects.thecity.nyc/new-york-city-council-district/
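
The method described above can be sketched on toy data. The tract IDs, district numbers, and incomes below are made up for illustration and are not ACS values. Note that a mean of tract medians approximates a district's median; it is not the median of all households in the district.

```python
import pandas as pd

# Hypothetical tract-level medians (illustrative values, not ACS data)
tracts = pd.DataFrame({
    "ct": ["A", "B", "C", "D"],
    "median_income": [40000.0, 60000.0, 90000.0, 110000.0],
})

# Hypothetical crosswalk assigning each tract to one council district
crosswalk = pd.DataFrame({
    "ct": ["A", "B", "C", "D"],
    "cd": ["1", "1", "2", "2"],
})

# Assign tracts to districts, then average the tract medians per district
merged = tracts.merge(crosswalk, on="ct")
district_estimate = merged.groupby("cd")["median_income"].mean()

# Citywide figure as described above: the average of the district values
citywide_estimate = district_estimate.mean()

print(district_estimate)
print(citywide_estimate)
```

In this toy case each district value is the simple average of its two tracts, and the citywide value is the simple average of the two districts.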

In [1]:
## import libraries
import pandas as pd
import geopandas as gpd

In [19]:
## read in file
income = pd.read_csv("../input/demographics/income/acs_5y_income.csv", dtype = {"ct":"str"})

In [20]:
## take a peek
income.head()

,ct,census_tract,county,city,tot_households,moe_households,tot_less_than_10K,moe_less_than_10K,tot_more_than_200K,moe_more_than_200K,median_income,moe_median_income,tot_families,moe_families
0,36005000100,Census Tract 1,Bronx County,New York,0.0,13.0,,,,,,,0.0,13.0
1,36005000200,Census Tract 2,Bronx County,New York,1446.0,212.0,2.8,3.1,17.6,7.5,121171.0,11360.0,1114.0,159.0
2,36005000400,Census Tract 4,Bronx County,New York,2246.0,286.0,3.6,3.7,11.2,6.0,98242.0,40861.0,1480.0,340.0
3,36005001600,Census Tract 16,Bronx County,New York,2149.0,184.0,10.3,6.6,5.7,3.1,42957.0,14062.0,1441.0,273.0
4,36005001901,Census Tract 19.01,Bronx County,New York,1092.0,116.0,8.9,5.6,10.2,6.0,67361.0,25254.0,615.0,133.0


In [21]:
## drop census tracts with no median income; the original data show there wasn't sufficient data for those tracts
nulls_dropped = income.dropna(subset = ['median_income'])

In [22]:
nulls_dropped.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2192 entries, 1 to 2325
Data columns (total 14 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   ct                  2192 non-null   object 
 1   census_tract        2192 non-null   object 
 2   county              2192 non-null   object 
 3   city                2192 non-null   object 
 4   tot_households      2192 non-null   float64
 5   moe_households      2192 non-null   float64
 6   tot_less_than_10K   2192 non-null   float64
 7   moe_less_than_10K   2192 non-null   float64
 8   tot_more_than_200K  2192 non-null   float64
 9   moe_more_than_200K  2192 non-null   float64
 10  median_income       2192 non-null   float64
 11  moe_median_income   2179 non-null   float64
 12  tot_families        2192 non-null   float64
 13  moe_families        2192 non-null   float64
dtypes: float64(10), object(4)
memory usage: 256.9+ KB
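
Before moving on, it can help to see which tracts were dropped and why. The sketch below rebuilds a three-row frame from the first rows shown in the `head()` output above (only three of the columns are kept, for brevity) and lists the tracts that have no median income. The variable names are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# First three tracts from the head() output above, with a subset of columns
toy_income = pd.DataFrame({
    "ct": ["36005000100", "36005000200", "36005000400"],
    "tot_households": [0.0, 1446.0, 2246.0],
    "median_income": [np.nan, 121171.0, 98242.0],
})

# Same drop as in the notebook
kept = toy_income.dropna(subset=["median_income"])

# Rows that the drop removes, for inspection
dropped = toy_income[toy_income["median_income"].isna()]
n_dropped = len(toy_income) - len(kept)

print(n_dropped)
print(dropped)
```

Here the only dropped tract is Census Tract 1, which reports zero households. The `info()` output above also shows 2,179 non-null values for `moe_median_income` against 2,192 for `median_income`, so 13 retained tracts have a median but no margin of error.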


In [23]:
## read in crosswalk, which maps 2020 census tracts to 2023 council districts
census_crosswalk = pd.read_csv("../input/crosswalks/ct20-to-cd23-crosswalk.csv",dtype = {"ct":"str",'cd':'str'})

In [24]:
## take a peek
census_crosswalk.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2327 entries, 0 to 2326
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   ct      2327 non-null   object
 1   cd      2325 non-null   object
dtypes: object(2)
memory usage: 36.5+ KB
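
The `info()` output above shows 2,325 non-null `cd` values for 2,327 tracts, so two tracts have no district assigned. A minimal sketch of how to list such rows, and how to check that no tract appears more than once, is shown below. The crosswalk rows and district numbers here are hypothetical.

```python
import pandas as pd

# Hypothetical crosswalk rows; the district assignments are invented
toy_crosswalk = pd.DataFrame({
    "ct": ["36005000100", "36005000200", "36005000400", "36005001600"],
    "cd": ["22", "18", None, "18"],
})

# Tracts with no council district assigned
unassigned = toy_crosswalk[toy_crosswalk["cd"].isna()]

# Tracts listed more than once (for example, a tract split across districts)
duplicated_tracts = toy_crosswalk[toy_crosswalk["ct"].duplicated(keep=False)]

print(unassigned)
print(duplicated_tracts)
```

A tract that appears twice in the crosswalk would be counted in two districts after the merge, so an empty `duplicated_tracts` frame is the result to look for.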


In [25]:
## combine income data with the crosswalk; the default inner join keeps only tracts found in both files
income_by_cd = pd.merge(nulls_dropped,
                        census_crosswalk,
                        on = "ct")

In [26]:
## take a peek
income_by_cd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2192 entries, 0 to 2191
Data columns (total 15 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   ct                  2192 non-null   object 
 1   census_tract        2192 non-null   object 
 2   county              2192 non-null   object 
 3   city                2192 non-null   object 
 4   tot_households      2192 non-null   float64
 5   moe_households      2192 non-null   float64
 6   tot_less_than_10K   2192 non-null   float64
 7   moe_less_than_10K   2192 non-null   float64
 8   tot_more_than_200K  2192 non-null   float64
 9   moe_more_than_200K  2192 non-null   float64
 10  median_income       2192 non-null   float64
 11  moe_median_income   2179 non-null   float64
 12  tot_families        2192 non-null   float64
 13  moe_families        2192 non-null   float64
 14  cd                  2192 non-null   object 
dtypes: float64(10), object(5)
memory usage: 257.0+ KB
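
The merged frame has the same number of rows as `nulls_dropped` (2,192), which suggests that every retained tract matched exactly one crosswalk row. One way to make that check explicit is with the `validate` and `indicator` arguments of `pd.merge`. The sketch below uses toy frames with invented keys and a left join so that unmatched tracts stay visible; it is not the merge used in this notebook.

```python
import pandas as pd

# Toy frames with invented tract keys
left = pd.DataFrame({
    "ct": ["A", "B", "C"],
    "median_income": [50000.0, 60000.0, 70000.0],
})
right = pd.DataFrame({
    "ct": ["A", "B", "D"],
    "cd": ["1", "2", "3"],
})

# validate raises MergeError if a tract appears more than once on the right;
# indicator adds a _merge column recording where each row came from
checked = pd.merge(
    left,
    right,
    on="ct",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Tracts with income data but no crosswalk entry
unmatched = checked[checked["_merge"] == "left_only"]

print(unmatched)
```

In this toy case tract `C` has income data but no district, so it shows up in `unmatched` instead of silently disappearing as it would in an inner join.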


In [33]:
## group the census tracts and incomes by city council district, take the mean of the incomes
district_incomes = income_by_cd.groupby("cd")["median_income"].mean().round(1).reset_index()

In [34]:
## take a peek
district_incomes.head()

,cd,median_income
0,1,131217.6
1,10,74274.6
2,11,78149.6
3,12,67325.1
4,13,81096.6
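
Because `cd` is stored as a string, the grouped result is ordered lexicographically (1, 10, 11, ...) and not numerically. If numeric order is wanted for display, one option is the `key` argument of `sort_values`. The sketch below uses the first three rows shown above plus an invented value for district 2.

```python
import pandas as pd

# Rows for districts 1, 10 and 11 come from the output above;
# the district 2 value is invented for illustration
toy_districts = pd.DataFrame({
    "cd": ["1", "10", "11", "2"],
    "median_income": [131217.6, 74274.6, 78149.6, 90000.0],
})

# Sort by the integer value of cd while keeping the column as a string
ordered = toy_districts.sort_values(
    "cd", key=lambda s: s.astype(int)
).reset_index(drop=True)

print(ordered)
```

Keeping `cd` as a string preserves compatibility with the crosswalk's dtype; only the sort order changes.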


In [35]:
district_incomes.to_csv("../output/demographics/district_incomes.csv")
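
Whether the row index ends up in the output file depends on the `index` argument of `to_csv`, which defaults to `True`. The sketch below writes a toy frame to an in-memory buffer both ways and reads it back, to show the difference a downstream reader would see. The frame and buffer names are illustrative.

```python
import io

import pandas as pd

toy_output = pd.DataFrame({
    "cd": ["1", "10"],
    "median_income": [131217.6, 74274.6],
})

# Default behavior: the row index is written as an unnamed first column
buf_with_index = io.StringIO()
toy_output.to_csv(buf_with_index)
buf_with_index.seek(0)
with_index = pd.read_csv(buf_with_index, dtype={"cd": "str"})

# index=False: only the data columns are written
buf_without_index = io.StringIO()
toy_output.to_csv(buf_without_index, index=False)
buf_without_index.seek(0)
without_index = pd.read_csv(buf_without_index, dtype={"cd": "str"})

print(list(with_index.columns))
print(list(without_index.columns))
```

Which form is preferable depends on what the consuming code expects, so this is a design choice to make deliberately.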