## Data Preprocessing

### Creating precinct-level data for Jefferson County, Alabama, the includes 2020 Census demographic data and the 2020 election results

Data files:
- `al_vest_20.shp`: 2020 Alabama precinct and election results shapefile
- `al_b_2020_bound.shp`: 2020 Census Redistricting Data (P.L. 94-171) Block Shapefile for Alabama
- `al_pl2020_b.csv`: 2020 PL 94-171 Data Summary File for Alabama based on the Decennial Census at the Block level

All data files are from Redistricting data hub: https://www.redistrictingdatahub.org/

In [1]:
import maup
import pandas as pd
import geopandas as gp

### Load Precincts

In [2]:
precincts = gp.read_file('raw-data/al_vest_20/al_vest_20.shp')
precincts = precincts[precincts.GEOID20.str.startswith('01073')]
precincts.reset_index(drop=True, inplace=True)

precincts = precincts.to_crs(epsg=32616) # set crs to UTM

2020 State-wide offices in AL for election with both a Democratic and Republican candidate:
- President
- Sentate
- Alabama Public Service Commission President

In [3]:
precincts = precincts[['GEOID20', 'G20PRERTRU', 'G20PREDBID', 'G20PRELJOR', 'G20PREOWRI', 'G20USSRTUB', 'G20USSDJON', 'G20USSOWRI', 'G20PSCRCAV', 'G20PSCDCAS', 'G20PSCOWRI', 'geometry']]

In [4]:
precincts = precincts[['GEOID20', 'G20PRERTRU', 'G20PREDBID', 'G20PRELJOR', 'G20PREOWRI', 'G20USSRTUB', 'G20USSDJON', 'G20USSOWRI', 'G20PSCRCAV', 'G20PSCDCAS', 'G20PSCOWRI', 'geometry']]

# preprocessing
precincts.rename(columns={col: col[:-3] for col in [col for col in precincts.columns if col.startswith('G') and col!='GEOID']}, inplace=True)
precincts['G20PREDO'] = precincts['G20PREL'] + precincts['G20PREO']
precincts = precincts.drop(columns=['G20PREL', 'G20PREO'])
# get vote totals for each office
pres_cols = ['G20PRER', 'G20PRED', 'G20PREDO']
precincts['G20PRE_TOT'] = precincts[pres_cols].sum(axis=1)

senate_cols = ['G20USSR', 'G20USSD', 'G20USSO']
precincts['G20USS_TOT'] = precincts[senate_cols].sum(axis=1)

senate_cols = ['G20PSCR', 'G20PSCD', 'G20PSCO']
precincts['G20PSC_TOT'] = precincts[senate_cols].sum(axis=1)

In [5]:
precincts['prop_PRER']= precincts['G20PRER']/ precincts['G20PRE_TOT']
precincts['prop_PRED']= precincts['G20PRED']/ precincts['G20PRE_TOT']

precincts['prop_USSR']= precincts['G20USSR']/ precincts['G20USS_TOT']
precincts['prop_USSD']= precincts['G20USSD']/ precincts['G20USS_TOT']

precincts['prop_PSCR']= precincts['G20PSCR']/ precincts['G20PSC_TOT']
precincts['prop_PSCD']= precincts['G20PSCD']/ precincts['G20PSC_TOT']

### Load Census Blocks

In [6]:
blocks = gp.read_file('raw-data/2020_census_blocks/al_b_2020_bound/al_b_2020_bound.shp', mask=precincts)
blocks.rename(columns={"GEOID20": "GEOID"}, inplace=True)
blocks = blocks[blocks.COUNTYFP20 == '073']
blocks = blocks.to_crs(epsg=32616) # set crs to UTM


demo_blocks = pd.read_csv('raw-data/2020_census_blocks/al_pl2020_b/al_pl2020_b.csv', low_memory=False)
demo_blocks['GEOID'] = demo_blocks['GEOID'].str.slice(-15)

Census Public Law 94-171 fields: https://www2.census.gov/programs-surveys/decennial/2020/technical-documentation/complete-tech-docs/summary-file/2020Census_PL94_171Redistricting_StatesTechDoc_English.pdf

- P0040001 = total population 18 and over
- P0040002 = total population 18 and over: hispanic or latino
- P0040005 = total population 18 and over: white alone, not hispanic or latino
- P0040006 = total population 18 and over: black alone, not hispanic or latino
- P0040007 = total population 18 and over: american indian and alaska native alone, not hispanic or latino
- P0040008 = total population 18 and over: asian alone, not hispanic or latino
- P0040009 = total population 18 and over: native hawaiian and other pacific islander alone, not hispanic or latino
- P0040010 = total population 18 and over: some other, not hispanic or latino
- P0040011 = total population 18 and over: two or more, not hispanic or latino

In [7]:
# Attach demographic data to blocks
demo_cols = ['P0010001', 'P0040001', 'P0040002', 'P0040005', 'P0040006', 'P0040007', 'P0040008', 'P0040009', 'P0040010', 'P0040011']

blocks = \
  blocks\
  .merge(demo_blocks[['GEOID']+demo_cols], how='left', on='GEOID')\
  [['GEOID']+demo_cols+['geometry']]

In [8]:
# population check
blocks['P0010001'].sum() # 674,721

674721

### Prorate population data to precincts

In [9]:
blocks_to_precincts_assignment = maup.assign(blocks, precincts)
precincts[demo_cols] = blocks[demo_cols].groupby(blocks_to_precincts_assignment).sum()

In [10]:
# population check
print(precincts['P0010001'].sum())
precincts['P0010001'].sum() == blocks['P0010001'].sum() 

674721


True

In [11]:
precincts.head()

Unnamed: 0,GEOI,G20PRER,G20PRED,G20USSR,G20USSD,G20USSO,G20PSCR,G20PSCD,G20PSCO,geometry,...,P0010001,P0040001,P0040002,P0040005,P0040006,P0040007,P0040008,P0040009,P0040010,P0040011
0,1073005170,1287,1311,1228,1465,12,1324,1252,3,"POLYGON ((518375.238 3702058.931, 518215.629 3...",...,4849,3382,131,2795,250,5,107,0,4,90
1,1073005150,1311,1019,1225,1188,16,1317,1023,6,"POLYGON ((527413.935 3710128.995, 527402.089 3...",...,4787,4197,231,3472,340,6,51,0,0,97
2,1073005130,1797,772,1713,938,13,1842,719,2,"POLYGON ((526122.583 3704381.652, 526120.684 3...",...,4752,3499,44,3325,21,2,42,0,6,59
3,1073005120,1359,905,1322,1003,4,1354,888,1,"POLYGON ((519684.961 3699390.017, 519684.490 3...",...,5100,3893,143,2987,357,4,303,1,15,83
4,1073005100,1614,1561,1584,1726,13,1647,1556,6,"POLYGON ((516918.913 3700841.038, 516917.627 3...",...,5828,4323,155,3271,538,3,204,0,23,129


In [13]:
precincts.to_file('output-data/al_jefferson-2020_precincts.shp', driver='ESRI Shapefile')