# 2022 Congressional Districts with Total Population from Census PL file 09/30/22

## Background:
We received a data request asking for total populations of the 2022 congressional districts.

Note that some states adjust their redistricting data; the processing for those adjustments can be found [here](https://github.com/nonpartisan-redistricting-datahub/Processing-Requests/blob/main/Adjusted_Districts_Pop_09_28_22/README.md).

## Approach:

- Concatenate the block-level PL data for all of the states
- Join to the block assignment file (BAF) available from the RDH
- Group by congressional district, and join to the national 2022 congressional district file
- Check that state population totals are preserved
- Export the file
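
The steps above can be sketched on toy data. This is a minimal illustration only, not the notebook's code: the GEOIDs, populations, and district assignments below are made up, and only the column names (`GEOID20`, `P0010001`, `STATEAB`, `CONG`, `CD_ID`) follow the notebook.

```python
import pandas as pd

# Hypothetical block-level PL tables for two states (values are invented).
pl_a = pd.DataFrame({"GEOID20": ["010010001001000", "010010001001001"],
                     "P0010001": ["10", "5"]})
pl_b = pd.DataFrame({"GEOID20": ["020010001001000"],
                     "P0010001": ["7"]})

# Hypothetical block assignment file: one row per block with its district.
baf = pd.DataFrame({
    "GEOID20": ["010010001001000", "010010001001001", "020010001001000"],
    "STATEAB": ["AL", "AL", "AK"],
    "CONG": ["1", "2", "AT-LARGE"],
})

# 1. Concatenate the state PL tables and make the population numeric.
natpl = pd.concat([pl_a, pl_b], ignore_index=True)
natpl["P0010001"] = natpl["P0010001"].astype(int)

# 2. Join blocks to their district assignments.
baf_pl = baf.merge(natpl, on="GEOID20", how="outer")

# 3. Build a district ID and sum population by district.
baf_pl["CD_ID"] = baf_pl["STATEAB"] + "-" + baf_pl["CONG"].str.zfill(3)
district_pop = baf_pl.groupby("CD_ID")["P0010001"].sum()

# 4. Check: the district totals should add up to the PL total.
totals_match = district_pop.sum() == natpl["P0010001"].sum()
print(district_pop)
print(totals_match)
```

In the real workflow the district totals are then joined to the national congressional district geometries before export.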

## Links to Download Raw Files 
- [National BAF for 2022 Districts](https://redistrictingdatahub.org/dataset/national-block-assignment-file-for-2022-state-legislative-and-congressional-districts/)
- [National Congressional Districts for 2022](https://redistrictingdatahub.org/dataset/national-congressional-districts-for-2022/)
- 2020 PL data by state is available from [the RDH](https://redistrictingdatahub.org/data/download-data/)

## Processing Steps:
See the attached notebook.

**Note:** A full "raw-from-source" file is also available upon request. Please email info@redistrictingdatahub.org for more info.


In [1]:
import pandas as pd
import geopandas as gp
import os

state_abrvs = ['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA',
           'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME',
           'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM',
           'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX',
           'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY']

In [45]:
def national_pl():
    # Read each state's PL file as strings, keeping only the block ID and total population.
    frames = []
    for state in state_abrvs:
        print(f"reading in {state}")
        pl = pd.read_csv(f'./csv_pl/{state.lower()}_pl2020_b.csv', dtype=str, low_memory=False)[['GEOID20', 'P0010001']]
        frames.append(pl)

    # Concatenate once at the end instead of growing a DataFrame inside the loop.
    return pd.concat(frames, sort=False)

In [54]:
print(baf.shape)
print(natpl.shape)

(8126712, 6)
(8126956, 2)


In [64]:
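def baf_pl_merge():
    # `baf` (the national block assignment file) must already be loaded;
    # the cell that reads it is not included here.
    global natpl
    natpl = national_pl()
    # Pad the block IDs to a common width so they match across the two sources.
    natpl['GEOID20'] = natpl['GEOID20'].str.zfill(16)
    baf['GEOID20'] = baf['GEOID20'].str.zfill(16)
    # Outer join keeps blocks that appear in only one of the two sources.
    baf_pl = baf.merge(natpl, on='GEOID20', how='outer', indicator=False)
    
    return baf_pl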

In [65]:
baf_pl = baf_pl_merge()

In [95]:
# Populated blocks that are in the PL data but have no assignment in the BAF
baf_pl[(baf_pl['STATEAB'].isna())&(baf_pl['P0010001']!=0)]

Unnamed: 0,GEOID20,STATEAB,CONG,SLDU,SLDL,FLOTERIAL,P0010001,statefips
8126797,20160001001440,,,,,,141,2
8126798,20160001001441,,,,,,91,2
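
A related way to surface blocks like the two above is the `indicator=True` option of `DataFrame.merge`, which labels each row by the side it came from. The sketch below uses invented block IDs; only the populations 141 and 91 are taken from the output above, and it is not the notebook's own check.

```python
import pandas as pd

# Hypothetical BAF and PL tables; block IDs are placeholders.
baf = pd.DataFrame({"GEOID20": ["A1", "A2"], "CONG": ["1", "1"]})
natpl = pd.DataFrame({"GEOID20": ["A1", "A2", "A3", "A4"],
                      "P0010001": [10, 0, 141, 91]})

# indicator=True adds a `_merge` column: 'both', 'left_only', or 'right_only'.
merged = baf.merge(natpl, on="GEOID20", how="outer", indicator=True)

# Populated blocks present in the PL data but absent from the BAF.
pl_only = merged[(merged["_merge"] == "right_only") & (merged["P0010001"] != 0)]
missing_pop = int(pl_only["P0010001"].sum())

print(pl_only[["GEOID20", "P0010001"]])
print(missing_pop)
```

The summed population of such rows is the amount that has to be accounted for separately when district totals are built from the BAF.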


In [66]:
len(baf_pl['GEOID20'].str.slice(stop=3).value_counts())

50

In [80]:
def check_state_totals():
    # State totals from the concatenated PL data
    natpl['statefips'] = natpl['GEOID20'].str.slice(stop=3)
    natpl['P0010001'] = natpl['P0010001'].astype(int)
    pl_gpby = natpl.groupby(['statefips'])[['P0010001']].sum()
    
    # State totals after the merge with the BAF
    baf_pl['statefips'] = baf_pl['GEOID20'].str.slice(stop=3)
    baf_pl['P0010001'] = baf_pl['P0010001'].astype(int)
    baf_pl_gpby = baf_pl.groupby(['statefips'])[['P0010001']].sum()
    
    # True where the two totals agree
    return pl_gpby == baf_pl_gpby
    
check_state_totals()

statefips,P0010001
1,True
2,True
4,True
5,True
6,True
8,True
9,True
10,True
12,True
13,True


In [149]:
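# Drop blocks that have neither a district assignment nor any population
baf_pl = baf_pl[(~baf_pl['STATEAB'].isna())|(baf_pl['P0010001']!=0)].copy()
baf_pl['CD_ID'] = baf_pl['STATEAB'].astype(str) + '-' + baf_pl['CONG'].astype(str).str.upper().str.zfill(3)

cd = gp.read_file('zip+s3://data.redistrictingdatahub.org/web_ready_stage/NATIONAL/national_cong_2022.zip')
cd['CD_ID'] = cd['STATE'].astype(str) + '-' + cd['DISTRICT'].astype(str).str.upper().str.zfill(3)

# Sum only the population column so that string columns are not aggregated
baf_pl_sum = baf_pl.groupby(['CD_ID'])[['P0010001']].sum()

# ms_dict (a mapping applied to the Mississippi CD_ID values) is not defined in the cells shown here
ms_mask = cd['CD_ID'].str.contains('MS-')
cd.loc[ms_mask, 'CD_ID'] = cd.loc[ms_mask, 'CD_ID'].map(ms_dict)
cd_pop_geo = cd.merge(baf_pl_sum, on="CD_ID", how='outer', indicator=True)
# Add the two populated Alaska blocks missing from the BAF (141 + 91 = 232) to the at-large district
cd_pop_geo.loc[cd_pop_geo['STATE']=='AK', 'P0010001'] = cd_pop_geo.loc[cd_pop_geo['STATE']=='AK', 'P0010001']+232
# Keep only rows that correspond to a district in the national file
cd_pop_geo = cd_pop_geo[~cd_pop_geo['STATE'].isna()]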

In [154]:
cd_pop_geo

Unnamed: 0,DISTRICT,STATE,geometry,CD_ID,P0010001,_merge
0,At-Large,AK,"MULTIPOLYGON (((-18455563.423 7215576.889, -18...",AK-AT-LARGE,733391,both
1,1,AL,"POLYGON ((-9751458.168 3632414.860, -9751458.1...",AL-001,726276,both
2,2,AL,"POLYGON ((-9563720.783 3799405.395, -9563217.0...",AL-002,693466,both
3,3,AL,"POLYGON ((-9635097.250 3897801.575, -9635008.9...",AL-003,735132,both
4,4,AL,"POLYGON ((-9590022.743 4100677.324, -9589939.0...",AL-004,702982,both
...,...,...,...,...,...,...
430,7,WI,"MULTIPOLYGON (((-9995933.342 5824402.270, -999...",WI-007,736715,both
431,8,WI,"MULTIPOLYGON (((-9910574.113 5503101.483, -991...",WI-008,736714,both
432,1,WV,"POLYGON ((-8938523.270 4535974.462, -8938520.5...",WV-001,896067,both
433,2,WV,"POLYGON ((-9030263.119 4787731.719, -9030173.7...",WV-002,897649,both


In [153]:
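def check_state_totals_from_CD():
    # State totals from the district-level file
    cd_gpby = cd_pop_geo.groupby(['STATE'])[['P0010001']].sum()
    # State totals from the block-level merge
    baf_pl['P0010001'] = baf_pl['P0010001'].astype(int)
    baf_cd_gpby = baf_pl.groupby(['STATEAB'])[['P0010001']].sum()
    
    # AK is expected to be False: the 232 people added to the district total above
    # belong to blocks with no STATEAB, so they are not counted under AK in baf_pl.
    return cd_gpby == baf_cd_gpby
    
check_state_totals_from_CD()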

STATE,P0010001
AK,False
AL,True
AR,True
AZ,True
CA,True
CO,True
CT,True
DE,True
FL,True
GA,True
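
A boolean comparison shows which states differ but not by how much. One possible follow-up, sketched here under assumptions, is to subtract the two sets of totals and keep the nonzero differences. The AK district total (733391) comes from the output above and the block-level AK figure is that total minus the 232 added manually; the AL and AR values are placeholders.

```python
import pandas as pd

# District-level and block-level state totals (AL and AR are placeholder values).
cd_totals = pd.Series({"AK": 733391, "AL": 100, "AR": 200}, name="P0010001")
block_totals = pd.Series({"AK": 733159, "AL": 100, "AR": 200}, name="P0010001")

# Align on the state index and subtract; a state missing on one side counts as 0.
diff = cd_totals.sub(block_totals, fill_value=0)

# Keep only the states where the two totals disagree.
mismatches = diff[diff != 0]
print(mismatches)
```

A difference equal to the manually added population indicates the mismatch is explained by that adjustment rather than by a processing error.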


In [158]:
cd_pop_geo.columns

Index(['DISTRICT', 'STATE', 'geometry', 'CD_ID', 'P0010001', '_merge'], dtype='object')

In [159]:
export = cd_pop_geo[['STATE','DISTRICT','CD_ID','P0010001','geometry']]

In [161]:
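# exist_ok=True lets the cell be re-run without failing if the folders already exist
os.makedirs('./cd_pop_2022_csv', exist_ok=True)
os.makedirs('./cd_pop_2022_shp', exist_ok=True)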

In [162]:
export.to_csv('./cd_pop_2022_csv/cd_pop_2022_csv.csv')
export.to_file('./cd_pop_2022_shp/cd_pop_2022_shp.shp')