In [1]:
import pandas as pd
import geopandas as gp
import os

# 2022 Districts using Adjusted Data - Populations 09_28_22

## Background:
- We received a data request asking for total populations of the 2022 districts.
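- Although most states draw their redistricting plans using the Census Bureau's population counts, a handful of states use adjusted data (for example, counts that reallocate incarcerated people to their last known residence).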
- The usage of adjusted data is made more complicated by the fact that not every state that produces adjusted data uses it for all levels of redistricting.
- Below is a list of the states that produce adjusted data and the level(s) of redistricting they use it for:
    - CA (Congressional and State Legislative)
    - CO (State Legislative)
    - CT (State Legislative)
    - DE (State Legislative)
    - HI (State Legislative)
    - MD (Congressional and State Legislative)
    - MT (State Legislative)
    - NJ (Congressional and State Legislative)
    - NV (Congressional and State Legislative)
    - NY (State Legislative)
    - PA (State Legislative)
    - RI (Congressional and State Legislative)
    - VA (Congressional and State Legislative)
    - WA (Congressional and State Legislative)
    
- Because of this nuance, we produced a dataset of the districts drawn with adjusted data and their adjusted populations, rather than adding an "adjusted population" column to the national block assignment file.
- RI did not release block-level adjusted data, but it did release district-level adjusted populations.
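One way to make the state-by-level rules above machine-readable is a small lookup table. The sketch below is a hypothetical encoding (the names ADJUSTED_LEVELS and uses_adjusted are illustrative and not part of this notebook), assuming "State Legislative" covers both the lower (SLDL) and upper (SLDU) chambers.

```python
# Hypothetical encoding of the list above: the district levels each state
# draws with adjusted data. "CONG" = congressional, "SLDL"/"SLDU" = lower/upper
# state legislative chambers (assumed to both count as "State Legislative").
ADJUSTED_LEVELS = {
    "CA": {"CONG", "SLDL", "SLDU"},
    "CO": {"SLDL", "SLDU"},
    "CT": {"SLDL", "SLDU"},
    "DE": {"SLDL", "SLDU"},
    "HI": {"SLDL", "SLDU"},
    "MD": {"CONG", "SLDL", "SLDU"},
    "MT": {"SLDL", "SLDU"},
    "NJ": {"CONG", "SLDL", "SLDU"},
    "NV": {"CONG", "SLDL", "SLDU"},
    "NY": {"SLDL", "SLDU"},
    "PA": {"SLDL", "SLDU"},
    "RI": {"CONG", "SLDL", "SLDU"},
    "VA": {"CONG", "SLDL", "SLDU"},
    "WA": {"CONG", "SLDL", "SLDU"},
}


def uses_adjusted(state: str, level: str) -> bool:
    """Return True if `state` uses adjusted data for districts at `level`."""
    return level in ADJUSTED_LEVELS.get(state, set())


# States whose congressional districts are drawn with adjusted data
cong_states = sorted(s for s, levels in ADJUSTED_LEVELS.items() if "CONG" in levels)
```

The resulting congressional subset is the same state list the notebook hard-codes later when it filters to states that use adjusted data for congressional redistricting.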

## Approach:
- For every state on the above list except RI, load files containing the adjusted population of each block.
  - Note: these files were produced for earlier work and involve processing states' adjusted datasets.
- Join the adjusted block-level populations to the national block assignment file.
- Transform the block assignment file so that each row is a congressional, state house, or state senate district with its adjusted population.
- For RI, transcribe the district populations from an official report and join these populations to the relevant districts.
- Check the district counts and population totals.
- Export the file.
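The core join-and-aggregate steps can be sketched on toy data. This is a minimal illustration, assuming block and district IDs are already clean strings; the tiny frames and their values are hypothetical stand-ins for one state's adjusted block file and its rows in the BAF.

```python
import pandas as pd

# Hypothetical stand-ins for one state's block-level adjusted counts
# and its rows in the national block assignment file (BAF)
adj_blocks = pd.DataFrame({
    "GEOID20": ["b1", "b2", "b3", "b4"],
    "Adj_Pop": [10, 20, 30, 40],
})
baf = pd.DataFrame({
    "GEOID20": ["b1", "b2", "b3", "b4"],
    "STATEAB": ["XX"] * 4,
    "CONG": ["1", "1", "2", "2"],
})

# Attach each block's adjusted population to its district assignment;
# validate="one_to_one" raises if either side has duplicate GEOIDs
blocks = baf.merge(adj_blocks, on="GEOID20", how="inner", validate="one_to_one")

# Build a unique district ID and sum block populations into districts
blocks["UNQ_CONG_DIST_ID"] = blocks["STATEAB"] + "-" + blocks["CONG"]
cong = blocks.groupby("UNQ_CONG_DIST_ID", as_index=False)["Adj_Pop"].sum()
print(cong)
```

Here district XX-1 collects blocks b1 and b2 (30 people) and XX-2 collects b3 and b4 (70 people).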

## Links to Download Raw Files
- RI District Population Reports
  - Official state reports, available upon request.
- RI District Population csv
  - Created by the RDH using the reports, available upon request.
- State Block-Level Adjusted Populations for all States except RI (where block-level data are not available)
  - Produced using official files on the RDH website, processed files available upon request.
- [National BAF for 2022 Districts](https://redistrictingdatahub.org/dataset/national-block-assignment-file-for-2022-state-legislative-and-congressional-districts/)

#### Note: A full "raw-from-source" file is also available upon request. Please email info@redistrictingdatahub.org


## Load in Data for Every State Except RI

Note: These are processed files created for the "States that Adjust the Census Data for Redistricting" page on the RDH website. PA needs extra handling for its split blocks, which is done in the loop below.

In [2]:
adjusted_data_state_subset = ['CA', 'CO', 'CT', 'DE', 'HI', 'MD', 'MT', 'NJ', 'NV',
                              'NY', 'PA', 'VA', 'WA']

#### Code for PA

Note: PA splits a handful of census blocks in its official file, but the national BAF uses the original PL 94-171 blocks. Because the pieces of each split block all fall in the same districts, they can be summed back into the original block.

In [3]:
def mod_census(block_id):
    block_id = str(block_id)

    # PA appends a letter (A, B, or C) to the end of the GEOID for split blocks
    if block_id[-1] in ("A", "B", "C"):

        # Return the GEOID without the split suffix so the pieces can be combined
        return block_id[:-1]

    # If it's not one of these split blocks, just return the GEOID
    return block_id
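A vectorized alternative to mod_census is sketched below, assuming split pieces are marked only by a single trailing A, B, or C (the same assumption the function above makes). The sample GEOIDs and populations are hypothetical.

```python
import pandas as pd

# Hypothetical PA-style rows: one block split into pieces "...A" and "...B",
# plus an ordinary unsplit block
pa = pd.DataFrame({
    "GEOID20": ["420010301011000A", "420010301011000B", "420010301011001"],
    "Adj_Pop": [12, 8, 5],
})

# Strip a trailing A/B/C suffix from every GEOID at once
pa["GEOID20"] = pa["GEOID20"].astype(str).str.replace(r"[ABC]$", "", regex=True)

# Sum the pieces back into their original census block
pa_unsplit = pa.groupby("GEOID20", as_index=False)["Adj_Pop"].sum()
print(pa_unsplit)
```

The regex anchor `$` restricts the match to the last character, so digits elsewhere in the GEOID are never touched.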

In [4]:
# Create a list to store the state data
adjusted_data_list = []

# Iterate over the states
for state in adjusted_data_state_subset:

    # Load and filter the data (reading GEOID20 as text keeps PA's lettered split-block IDs intact)
    adj_state = pd.read_csv(f"./raw-from-source/Adjusted_Counts/{state}_blocks.csv", dtype={"GEOID20": str})
    adj_state = adj_state[["GEOID20", "Adj_Pop"]].copy()

    # Deal with PA split blocks
    if state == "PA":

        # Use the above function to return the "unsplit" GEOID
        adj_state["GEOID20"] = adj_state["GEOID20"].apply(mod_census)

        # Because the split blocks are in the same districts, sum them to match PL geographies
        adj_state = adj_state.groupby("GEOID20", as_index=False)["Adj_Pop"].sum()

    # Add the state's data (recombined, for PA) to the list
    adjusted_data_list.append(adj_state)



In [5]:
# Transform the list to the dataframe
adj_state_data_df = pd.concat(adjusted_data_list)

# Clean the columns
adj_state_data_df["Adj_Pop"] = adj_state_data_df["Adj_Pop"].astype(int)
adj_state_data_df["GEOID20"] = adj_state_data_df["GEOID20"].astype(str).str.zfill(16)

# If desired, export this data to csv (CAUTION: NO RI DATA)
#adj_state_data_df.to_csv("./adjusted_data_pops.csv", index = False)

## Load in Block Assignment File

We will join the block-level adjusted population data to this file and aggregate it to the needed districts.

In [6]:
national_baf = pd.read_csv(
    "./raw-from-source/national_baf/national_baf.csv",
    dtype={"GEOID20": str, "STATEAB": str, "CONG": str, "SLDU": str, "SLDL": str, "FLOTERIAL": str},
)

# Create columns for the various districts
national_baf["UNQ_CONG_DIST_ID"] = national_baf["STATEAB"] + "-" + national_baf["CONG"].astype(str)
national_baf["UNQ_SLDL_DIST_ID"] = national_baf["STATEAB"] + "-" + national_baf["SLDL"].astype(str)
national_baf["UNQ_SLDU_DIST_ID"] = national_baf["STATEAB"] + "-" + national_baf["SLDU"].astype(str)

In [7]:
# Clean the GEOID column
national_baf["GEOID20"] = national_baf["GEOID20"].astype(str).str.zfill(16)

## Join to National BAF

Note: This file is available for download from the RDH website (https://redistrictingdatahub.org/dataset/national-block-assignment-file-for-2022-state-legislative-and-congressional-districts/)

In [8]:
# Join the two files together
adjusted_counts = pd.merge(national_baf, adj_state_data_df, how = "outer", on = "GEOID20", indicator = True)

# Check the join
adjusted_counts["_merge"].value_counts()

left_only     6066848
both          2059864
right_only         73
Name: _merge, dtype: int64

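Note: The "left_only" rows are BAF blocks with no block-level adjusted count, which is expected for the states that do not adjust their data and for RI. The "right_only" rows are blocks in the adjusted files that do not appear in the BAF. It is okay that these do not join, as long as they have no population, since some water blocks are not included in states' redistricting plans.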
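The join check can be reproduced on toy data to show what the "_merge" indicator reports. This is a minimal sketch: the frames are hypothetical, and it assumes, as the note above states, that rows found only in the adjusted file must carry zero population.

```python
import pandas as pd

# Hypothetical BAF and adjusted-count rows
baf = pd.DataFrame({"GEOID20": ["01", "02", "03"], "CONG": ["1", "1", "2"]})
adj = pd.DataFrame({"GEOID20": ["02", "03", "99"], "Adj_Pop": [5, 7, 0]})

# indicator=True adds a "_merge" column: left_only, right_only, or both
merged = baf.merge(adj, on="GEOID20", how="outer", indicator=True)
counts = merged["_merge"].value_counts()

# Blocks present only in the adjusted file must carry no population
unmatched_pop = merged.loc[merged["_merge"] == "right_only", "Adj_Pop"].sum()
assert unmatched_pop == 0, f"{unmatched_pop} people in blocks missing from the BAF"
print(counts)
```

Turning the visual check into an `assert` makes a rerun of the notebook fail loudly if populated blocks ever go unmatched.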

In [9]:
# Confirm that there is no population for any of the unjoined blocks
sum(adjusted_counts[adjusted_counts["_merge"]=="right_only"]["Adj_Pop"])

0.0

In [10]:
# Filter down to joined blocks or blocks in RI (RI rows are needed to get the list of districts in the state)
joined = adjusted_counts[(adjusted_counts["_merge"] == "both") | (adjusted_counts["STATEAB"] == "RI")].copy()

# Clean the columns (RI blocks have no block-level adjusted counts, so they start at 0)
joined["Adj_Pop"] = joined["Adj_Pop"].fillna(0).astype(int)


## Aggregate to Districts

In [11]:
len(joined["STATEAB"].unique())

14

In [12]:
joined["STATEAB"].unique()

array(['CA', 'CO', 'CT', 'DE', 'HI', 'MD', 'MT', 'NJ', 'NV', 'NY', 'PA',
       'RI', 'VA', 'WA'], dtype=object)

In [13]:
# Create a subset of states that use adjusted data for congressional redistricting
uses_cong = joined[joined["STATEAB"].isin(["CA", "MD", "NJ", "NV", "RI", "VA", "WA"])]

# Aggregate to the appropriate district levels
joined_cong = uses_cong.groupby("UNQ_CONG_DIST_ID")[["Adj_Pop"]].sum()
joined_sldl = joined.groupby("UNQ_SLDL_DIST_ID")[["Adj_Pop"]].sum()
joined_sldu = joined.groupby("UNQ_SLDU_DIST_ID")[["Adj_Pop"]].sum()

# Clean the aggregations
joined_cong.reset_index(inplace = True, drop = False)
joined_sldl.reset_index(inplace = True, drop = False)
joined_sldu.reset_index(inplace = True, drop = False)

joined_cong.columns = ["ID", "Adj_Pop"]
joined_sldl.columns = ["ID", "Adj_Pop"]
joined_sldu.columns = ["ID", "Adj_Pop"]

joined_cong["Level"] = "CONG"
joined_sldl["Level"] = "SLDL"
joined_sldu["Level"] = "SLDU"

# Join them back into one file
combined_files = pd.concat([joined_cong, joined_sldl, joined_sldu])

# Get the state abbreviation
combined_files["State"] = combined_files["ID"].apply(lambda x: x[0:2])
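The three separate group-bys can also be expressed as a single reshape with `melt`, producing one row per state, level, and district. This is a sketch on hypothetical toy rows, assuming the notebook's column names and that `cong_states` holds the states using adjusted data for congressional maps.

```python
import pandas as pd

# Hypothetical BAF rows already joined to adjusted populations
joined = pd.DataFrame({
    "STATEAB": ["CA", "CA", "NY", "NY"],
    "CONG": ["1", "2", "1", "1"],
    "SLDL": ["1", "1", "5", "6"],
    "SLDU": ["1", "1", "2", "2"],
    "Adj_Pop": [100, 200, 300, 400],
})
cong_states = {"CA"}  # assumed: states that adjust data for congressional maps

# One row per (block, level): the district column names become the "Level" values
long = joined.melt(
    id_vars=["STATEAB", "Adj_Pop"],
    value_vars=["CONG", "SLDL", "SLDU"],
    var_name="Level",
    value_name="District",
)

# Keep congressional rows only for states that adjust their congressional data
long = long[(long["Level"] != "CONG") | long["STATEAB"].isin(cong_states)]

# Aggregate blocks into districts at every level in one pass
districts = long.groupby(["STATEAB", "Level", "District"], as_index=False)["Adj_Pop"].sum()
print(districts)
```

Filtering after the melt keeps state legislative rows for every state while restricting congressional rows to the subset, mirroring the `uses_cong` filter above.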

## Add in RI Data

In [14]:
# Add in a leading zero for the RI data so it will join
ri_update_dict = {"RI-1":"RI-01",
"RI-2":"RI-02"}

# Apply this update
combined_files["ID"] = combined_files["ID"].map(ri_update_dict).fillna(combined_files["ID"])

In [15]:
# Create an ID of the level and the ID so we can join to RI
combined_files["unique_id"] = combined_files["Level"]+"-"+combined_files["ID"]

In [16]:
combined_files

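|     | ID    | Adj_Pop | Level | State | unique_id  |
|-----|-------|---------|-------|-------|------------|
| 0   | CA-1  | 760066  | CONG  | CA    | CONG-CA-1  |
| 1   | CA-10 | 760066  | CONG  | CA    | CONG-CA-10 |
| 2   | CA-11 | 760067  | CONG  | CA    | CONG-CA-11 |
| 3   | CA-12 | 760065  | CONG  | CA    | CONG-CA-12 |
| 4   | CA-13 | 760065  | CONG  | CA    | CONG-CA-13 |
| ... | ...   | ...     | ...   | ...   | ...        |
| 552 | WA-45 | 157270  | SLDU  | WA    | SLDU-WA-45 |
| 553 | WA-46 | 157255  | SLDU  | WA    | SLDU-WA-46 |
| 554 | WA-47 | 157240  | SLDU  | WA    | SLDU-WA-47 |
| 555 | WA-48 | 157252  | SLDU  | WA    | SLDU-WA-48 |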


In [17]:
# Load in the RI data
ri_data = pd.read_csv("./raw-from-source/ri_sizes.csv",dtype={"Number":str, "Adj_Pop":int, "Level":str})

# Create a unique ID to join with the pop. file
ri_data["unique_id"] = ri_data["Level"]+"-RI-"+ri_data["Number"].astype(str).str.zfill(2)

# Make the population an integer
ri_data["Adj_Pop"] = ri_data["Adj_Pop"].astype(int)

# Create a dictionary mapping from district ID to population in RI
ri_data_dict = dict(zip(ri_data["unique_id"], ri_data["Adj_Pop"]))

# Apply the above dictionary to the RI districts in the combined file
combined_files["Adj_Pop"] = combined_files["unique_id"].map(ri_data_dict).fillna(combined_files["Adj_Pop"])
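The map-then-fillna pattern above overwrites only the IDs found in the dictionary and leaves every other row unchanged. The sketch below shows it on hypothetical IDs and populations, plus an extra check (an addition, not part of the original notebook) that every RI key actually matched a district, which would catch ID-format mismatches such as RI-1 versus RI-01.

```python
import pandas as pd

# Hypothetical combined table: RI rows start at 0, other states already filled
combined = pd.DataFrame({
    "unique_id": ["CONG-RI-01", "CONG-RI-02", "CONG-CA-1"],
    "Adj_Pop": [0, 0, 760066],
})
# Hypothetical district totals transcribed from a state report
ri_pops = {"CONG-RI-01": 111, "CONG-RI-02": 222}

# map() yields NaN for IDs not in the dict; fillna() restores the existing value there
combined["Adj_Pop"] = combined["unique_id"].map(ri_pops).fillna(combined["Adj_Pop"]).astype(int)

# Any dictionary key that matched no row signals an ID-format mismatch
unmatched = set(ri_pops) - set(combined["unique_id"])
assert not unmatched, f"RI IDs with no matching district: {sorted(unmatched)}"
print(combined)
```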

In [18]:
combined_files["state-level"] = combined_files["State"]+"-"+combined_files["Level"]

In [19]:
combined_files["Adj_Pop"] = combined_files["Adj_Pop"].astype(int)

## Check Again

In [20]:
# Remove the "No Data" districts (these were unassigned blocks we kept in the BAF)
combined_files = combined_files[~combined_files["ID"].str.contains("NO")]

In [21]:
# Check how many districts of each type there are (the correct numbers are listed below)
combined_files["state-level"].value_counts()

PA-SLDL    203
CT-SLDL    151
NY-SLDL    150
VA-SLDL    100
MT-SLDL    100
CA-SLDL     80
RI-SLDL     75
MD-SLDL     71
CO-SLDL     65
NY-SLDU     63
CA-CONG     52
HI-SLDL     51
PA-SLDU     50
MT-SLDU     50
WA-SLDL     49
WA-SLDU     49
MD-SLDU     47
NV-SLDL     42
DE-SLDL     41
VA-SLDU     40
CA-SLDU     40
NJ-SLDU     40
NJ-SLDL     40
RI-SLDU     38
CT-SLDU     36
CO-SLDU     35
HI-SLDU     25
DE-SLDU     21
NV-SLDU     21
NJ-CONG     12
VA-CONG     11
WA-CONG     10
MD-CONG      8
NV-CONG      4
RI-CONG      2
Name: state-level, dtype: int64

Target District Numbers

- PA-SLDL    203
- CT-SLDL    151
- NY-SLDL    150
- VA-SLDL    100
- MT-SLDL    100
- CA-SLDL     80
- RI-SLDL     75
- MD-SLDL     71
- CO-SLDL     65
- NY-SLDU     63
- CA-CONG     52
- HI-SLDL     51
- PA-SLDU     50
- MT-SLDU     50
- WA-SLDL     49
- WA-SLDU     49
- MD-SLDU     47
- NV-SLDL     42
- DE-SLDL     41
- VA-SLDU     40
- CA-SLDU     40
- NJ-SLDU     40
- NJ-SLDL     40
- RI-SLDU     38
- CT-SLDU     36
- CO-SLDU     35
- HI-SLDU     25
- DE-SLDU     21
- NV-SLDU     21
- NJ-CONG     12
- VA-CONG     11
- WA-CONG     10
- MD-CONG      8
- NV-CONG      4
- RI-CONG      2
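Rather than comparing the two lists by eye, the counts can be checked programmatically. This is a minimal sketch, assuming the targets are entered as a dict; only three of the targets above are shown, and `observed` is a stand-in for the `value_counts()` output.

```python
import pandas as pd

# Target counts for a few state-levels, copied from the list above
targets = {"PA-SLDL": 203, "CT-SLDL": 151, "RI-CONG": 2}

# Stand-in for combined_files["state-level"].value_counts()
observed = pd.Series({"PA-SLDL": 203, "CT-SLDL": 151, "RI-CONG": 2})

# Align targets and observed counts on the state-level label
comparison = pd.DataFrame({"target": pd.Series(targets), "observed": observed})

# A missing state-level shows up as NaN, which never equals its target
mismatches = comparison[comparison["target"] != comparison["observed"]]
assert mismatches.empty, mismatches
```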

In [22]:
# Check the population totals for the various district types
state_sums = combined_files.groupby("state-level")[["Adj_Pop"]].sum()

In [23]:
state_sums["Adj_Pop"]

state-level
CA-CONG    39523437
CA-SLDL    39523437
CA-SLDU    39523437
CO-SLDL     5773714
CO-SLDU     5773714
CT-SLDL     3603566
CT-SLDU     3603566
DE-SLDL      989598
DE-SLDU      989598
HI-SLDL     1383606
HI-SLDU     1383606
MD-CONG     6175403
MD-SLDL     6175403
MD-SLDU     6175403
MT-SLDL     1082717
MT-SLDU     1082717
NJ-CONG     9283016
NJ-SLDL     9283016
NJ-SLDU     9283016
NV-CONG     3104614
NV-SLDL     3104614
NV-SLDU     3104614
NY-SLDL    20193858
NY-SLDU    20193858
PA-SLDL    13002700
PA-SLDU    13002700
RI-CONG     1097379
RI-SLDL     1097379
RI-SLDU     1097379
VA-CONG     8631393
VA-SLDL     8631393
VA-SLDU     8631393
WA-CONG     7705281
WA-SLDL     7705281
WA-SLDU     7705281
Name: Adj_Pop, dtype: int64

Target Pops

- CA    39523437
- CO     5773714
- CT     3603566
- DE      989598
- HI     1383606
- MD     6175403
- MT     1082717
- NJ     9283016
- NV     3104614
- NY    20193858
- PA    13002700
- RI     1097379
- VA     8631393
- WA     7705281
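The population check can be automated in the same spirit: every level within a state should sum to the same statewide target. This is a sketch on hypothetical states XX and YY with toy numbers, assuming a long table with State, Level, and Adj_Pop columns like combined_files.

```python
import pandas as pd

# Hypothetical statewide targets and district rows (toy numbers)
target_pops = {"XX": 30, "YY": 10}
districts = pd.DataFrame({
    "State": ["XX", "XX", "XX", "XX", "YY", "YY"],
    "Level": ["SLDL", "SLDL", "SLDU", "SLDU", "SLDL", "SLDU"],
    "Adj_Pop": [12, 18, 25, 5, 10, 10],
})

# Sum districts within each state and level
totals = districts.groupby(["State", "Level"], as_index=False)["Adj_Pop"].sum()
totals["target"] = totals["State"].map(target_pops)

# Every level in a state should account for the same statewide population
bad = totals[totals["Adj_Pop"] != totals["target"]]
assert bad.empty, bad
print(totals)
```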

## Prepare to Export File

In [24]:
# Drop the helper columns and keep only the district number (e.g. "CA-1" -> "1")
combined_files = combined_files.drop(columns=["unique_id", "state-level"]).assign(
    ID=lambda df: df["ID"].str.split("-").str[1]
)


In [25]:
# Take a look at the file
combined_files.head(10)

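|   | ID | Adj_Pop | Level | State |
|---|----|---------|-------|-------|
| 0 | 1  | 760066  | CONG  | CA    |
| 1 | 10 | 760066  | CONG  | CA    |
| 2 | 11 | 760067  | CONG  | CA    |
| 3 | 12 | 760065  | CONG  | CA    |
| 4 | 13 | 760065  | CONG  | CA    |
| 5 | 14 | 760065  | CONG  | CA    |
| 6 | 15 | 760066  | CONG  | CA    |
| 7 | 16 | 760067  | CONG  | CA    |
| 8 | 17 | 760066  | CONG  | CA    |
| 9 | 18 | 760065  | CONG  | CA    |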


In [26]:
# Export the file
combined_files.to_csv("./adjusted_districts_pop.csv", index = False)