2010 block data are scaled by 5-year ACS block-group data, assuming that the demographic distribution among the blocks within a block group does not change.
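
As a minimal sketch of this scaling, assuming made-up counts (the function name `scale_blocks` and all numbers below are illustrative and do not come from the data): each block keeps its 2010 share of the block group, and that share is applied to the ACS block-group estimate.

```python
def scale_blocks(block_counts_2010, acs_block_group_estimate):
    """Distribute an ACS block-group estimate over blocks by their 2010 shares."""
    total_2010 = sum(block_counts_2010.values())
    if total_2010 == 0:
        # Nothing to apportion by; assign zero to every block.
        return {block: 0.0 for block in block_counts_2010}
    return {
        block: acs_block_group_estimate * count / total_2010
        for block, count in block_counts_2010.items()
    }


# Hypothetical block group with three blocks (2010 counts) and one ACS estimate.
blocks_2010 = {"1007": 200, "1008": 300, "1013": 500}
scaled = scale_blocks(blocks_2010, acs_block_group_estimate=1100)
print(scaled)
```

The scaled values always sum to the ACS estimate, so the block-group total is preserved while the within-group distribution stays at its 2010 shape.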

Get Block from lat/long: https://www.fcc.gov/general/census-block-conversions-api-v100



What variables do we want to track?

https://www.jstor.org/stable/586227?seq=1: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/45484/11109_2004_Article_BF00991978.pdf;sequence=1

https://msaag.aag.org/wp-content/uploads/2013/04/3_Klos.pdf


Variables:
https://api.census.gov/data/2010/dec/sf1/variables.html

Considerations: 
  * Variables are present in both the decennial census (SF1) and the ACS 5-year data.
  
  
From: https://www.icpsr.umich.edu/icpsrweb/instructors/setups2012/voting.jsp  
  * Race and ethnicity. Minority groups, especially Blacks, are more Democratic in their voting than are Whites.
  * Social class or socio-economic status. Those who are better off in income or socio-economic status are more Republican than are those who are worse off.
  * Religion. Those who are more religious are more Republican than those who are less religious. In the past, White Catholics and Protestants differed considerably in their voting, but that distinction has declined in significance.
  * Region. Voters in the South, Great Plains, and Rocky Mountains regions are more Republican, while those in the Northeast and on the Pacific Coast are more Democratic.
  * Gender. Women are more Democratic than are men.
  * Marital status. Married individuals are more Republican than are single individuals.
  * Age. Younger voters are more Democratic than are older voters.

|SF1|ACS5|Reason|Description|
|-----|-----|-----|-----|
|H001001|B25001_001E||Housing units: total|
|H002002|None||Housing units: urban|
|H002005|None||Housing units: rural|
|H003002|B25002_002E||Occupancy: occupied|
|H003003|B25002_003E||Occupancy: vacant|
|H004002|B25003_002E||Owned with mortgage or loan|
|H004003|B25003_002E||Owned outright|
|H004004|B25003_003E||Renter occupied|
|H006002-8|B25006_008E||Race of householder|
|H007010|||Householder identifies as of Hispanic or Latino origin|
|H012002|||Average tenure, owner|
|H012003|||Average tenure, renter|
|H013002-8|||Household size|
|P001001|B01003_001E||Total population|
|P002002|||Total urban|
|P002005|||Total rural|
|P003002-8|B02001_002-8E||Races|
|P004003|B03003_003E||Total Hispanic|
|P012002-49|B01001_002-049E||Sex by age|
|P018002-9|B11001_002-9E||Household type|
|P038002|B11003_002E||Husband-wife family|
|P038003|B11003_003E||Husband-wife, own kids under 18|

In [1]:
# Placeholder mapping keyed by SF1 variable code; the tuple values are still empty.
variables = {
    "H001001": ("", ""),
    "H002002": ("", ""),
}

In [2]:
import time

import requests
import numpy as np
import pandas as pd
import geopandas as gpd

# Figure out the API

In [3]:
api_root = "https://api.census.gov/data/2010/dec/sf1?"
variables = ["P001001", "H001001", "NAME"]
for_block = {"block": "*"}
in_block = {"state": "08",
            "county": "013"}
api_key = "b19b387a819c219d7e4f0569c1ef8c5e3b237ace"

def block_substring(block):
    return "%20".join([key+":"+block[key] for key in block])

def compose_request_str(api_root, variables, for_block, in_block, key):
    request = (api_root + 
               "get=" + ",".join(variables) + 
               "&for=" + block_substring(for_block) + 
               "&in=" + block_substring(in_block) + 
               "&key=" + key )
    return request
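
For comparison, here is a sketch of the same request string built with the standard library's `urllib.parse`, which handles the percent-encoding so geography names can be written with a plain space (`"block group"` rather than `"block%20group"`). The function name `build_census_url` and the `YOUR_KEY` placeholder are illustrative assumptions, not part of the notebook.

```python
from urllib.parse import quote, urlencode


def build_census_url(root, variables, for_geo, in_geo, key):
    """Compose a Census API query URL from its parts."""
    def clause(geo):
        # {"state": "08", "county": "013"} -> "state:08 county:013"
        return " ".join(f"{name}:{value}" for name, value in geo.items())

    params = {
        "get": ",".join(variables),
        "for": clause(for_geo),
        "in": clause(in_geo),
        "key": key,
    }
    # quote (not quote_plus) encodes spaces as %20; keep ":", "*" and "," literal.
    return root + "?" + urlencode(params, quote_via=quote, safe=":*,")


url = build_census_url(
    "https://api.census.gov/data/2010/dec/sf1",
    ["P001001", "H001001", "NAME"],
    {"block": "*"},
    {"state": "08", "county": "013"},
    "YOUR_KEY",
)
print(url)
```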

In [4]:
response = requests.get(
    compose_request_str(api_root, variables, for_block, in_block, api_key)
)
response.status_code

200

In [5]:
df = pd.DataFrame(data=response.json()[1:], columns=response.json()[0])

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7368 entries, 0 to 7367
Data columns (total 7 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   P001001  7368 non-null   object
 1   H001001  7368 non-null   object
 2   NAME     7368 non-null   object
 3   state    7368 non-null   object
 4   county   7368 non-null   object
 5   tract    7368 non-null   object
 6   block    7368 non-null   object
dtypes: object(7)
memory usage: 403.1+ KB
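
The `info()` output shows every column as `object`: the API returns all values as strings, with the header as the first row. A minimal sketch of turning such a payload into records with integer counts follows; the helper name `rows_to_records` and the sample row are illustrative assumptions.

```python
def rows_to_records(payload, int_columns=("P001001", "H001001")):
    """Convert a header-plus-rows payload into dicts, casting count columns to int."""
    header, *rows = payload
    records = []
    for row in rows:
        record = dict(zip(header, row))
        for column in int_columns:
            record[column] = int(record[column])
        records.append(record)
    return records


# Illustrative payload in the same shape as response.json().
sample = [
    ["P001001", "H001001", "NAME", "state", "county", "tract", "block"],
    ["244", "71", "Block 1007", "08", "001", "007801", "1007"],
]
records = rows_to_records(sample)
print(records[0])
```

The FIPS codes are left as strings on purpose, since casting them to integers would drop their leading zeros.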


# Get State Data

## Get list of counties

In [7]:
precincts = gpd.read_file("../data/processed/precincts/precincts_2018.geojson")

In [8]:
county_fips = sorted(list(set(precincts.COUNTYFP.values)))

In [9]:
def scan_county():
    """Query every county in county_fips and stack the results into one DataFrame."""
    frames = []
    for i, county in enumerate(county_fips):
        in_block["county"] = county
        t1 = time.time()
        response = requests.get(
            compose_request_str(api_root, variables, for_block, in_block, api_key)
        )
        t2 = time.time()
        rows = response.json()  # first row is the header, the rest are data
        print(f"{i}/{len(county_fips)}: {county}, {response.status_code}, "
              f"{len(rows)}, dt = {t2-t1:.3F}")
        frames.append(pd.DataFrame(data=rows[1:], columns=rows[0]))
    # DataFrame.append was removed in pandas 2.0; concatenate once instead.
    return pd.concat(frames)

## Get All Blocks in CO

In [10]:
for_block = {"block": "*"}
blocks = scan_county()

0/64: 001, 200, 10352, dt = 86.228
1/64: 003, 200, 1126, dt = 2.807
2/64: 005, 200, 9119, dt = 5.108
3/64: 007, 200, 1184, dt = 2.845
4/64: 009, 200, 1189, dt = 10.006
5/64: 011, 200, 1139, dt = 10.955
6/64: 013, 200, 7369, dt = 3.945
7/64: 014, 200, 1420, dt = 2.938
8/64: 015, 200, 1899, dt = 16.081
9/64: 017, 200, 613, dt = 6.936
10/64: 019, 200, 1484, dt = 3.037
11/64: 021, 200, 1151, dt = 2.976
12/64: 023, 200, 2834, dt = 22.985
13/64: 025, 200, 538, dt = 7.008
14/64: 027, 200, 1207, dt = 3.949
15/64: 029, 200, 1990, dt = 3.938
16/64: 031, 200, 11040, dt = 65.938
17/64: 033, 200, 674, dt = 7.999
18/64: 035, 200, 5569, dt = 3.733
19/64: 037, 200, 3687, dt = 24.222
20/64: 039, 200, 1119, dt = 10.950
21/64: 041, 200, 18691, dt = 137.939
22/64: 043, 200, 3286, dt = 3.991
23/64: 045, 200, 4330, dt = 3.860
24/64: 047, 200, 836, dt = 9.001
25/64: 049, 200, 2750, dt = 21.959
26/64: 051, 200, 3629, dt = 3.978
27/64: 053, 200, 510, dt = 2.971
28/64: 055, 200, 2025, dt = 18.048
29/64: 057, 20

In [11]:
blocks.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 201062 entries, 0 to 2895
Data columns (total 7 columns):
 #   Column   Non-Null Count   Dtype 
---  ------   --------------   ----- 
 0   P001001  201062 non-null  object
 1   H001001  201062 non-null  object
 2   NAME     201062 non-null  object
 3   state    201062 non-null  object
 4   county   201062 non-null  object
 5   tract    201062 non-null  object
 6   block    201062 non-null  object
dtypes: object(7)
memory usage: 12.3+ MB


In [14]:
raw_path = "../data/raw/census_data/"
blocks.to_csv(raw_path + "basic_blocks.csv")

## Get All Block-Groups in CO

In [12]:
for_block = {"block%20group": "*"}
block_groups = scan_county()

0/64: 001, 200, 261, dt = 2.820
1/64: 003, 200, 16, dt = 2.969
2/64: 005, 200, 409, dt = 5.089
3/64: 007, 200, 11, dt = 2.988
4/64: 009, 200, 5, dt = 4.910
5/64: 011, 200, 6, dt = 3.070
6/64: 013, 200, 201, dt = 4.003
7/64: 014, 200, 47, dt = 2.952
8/64: 015, 200, 16, dt = 2.965
9/64: 017, 200, 4, dt = 3.013
10/64: 019, 200, 7, dt = 3.023
11/64: 021, 200, 7, dt = 2.983
12/64: 023, 200, 5, dt = 2.947
13/64: 025, 200, 5, dt = 3.068
14/64: 027, 200, 5, dt = 2.966
15/64: 029, 200, 25, dt = 2.991
16/64: 031, 200, 482, dt = 6.115
17/64: 033, 200, 3, dt = 2.864
18/64: 035, 200, 156, dt = 3.169
19/64: 037, 200, 30, dt = 2.864
20/64: 039, 200, 16, dt = 3.070
21/64: 041, 200, 366, dt = 3.900
22/64: 043, 200, 37, dt = 2.951
23/64: 045, 200, 37, dt = 3.066
24/64: 047, 200, 5, dt = 2.965
25/64: 049, 200, 10, dt = 2.966
26/64: 051, 200, 21, dt = 3.068
27/64: 053, 200, 2, dt = 2.956
28/64: 055, 200, 8, dt = 2.980
29/64: 057, 200, 3, dt = 3.082
30/64: 059, 200, 411, dt = 5.970
31/64: 061, 200, 3, dt =

In [13]:
block_groups.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3532 entries, 0 to 8
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   P001001      3532 non-null   object
 1   H001001      3532 non-null   object
 2   NAME         3532 non-null   object
 3   state        3532 non-null   object
 4   county       3532 non-null   object
 5   tract        3532 non-null   object
 6   block group  3532 non-null   object
dtypes: object(7)
memory usage: 220.8+ KB


In [15]:
block_groups.to_csv(raw_path + "basic_block_groups.csv")

# Clean Block Data

## Get Block Shapefiles

In [18]:
block_shapes = gpd.read_file("../data/raw/census_shape/tl_2010_08_tabblock10/")

In [85]:
block_shapes.crs

{'init': 'epsg:4269'}

In [19]:
block_shapes.head()

Unnamed: 0,STATEFP10,COUNTYFP10,TRACTCE10,BLOCKCE10,GEOID10,NAME10,MTFCC10,UR10,UACE10,UATYP10,FUNCSTAT10,ALAND10,AWATER10,INTPTLAT10,INTPTLON10,geometry
0,8,1,8535,1020,80010085351020,Block 1020,G5040,U,23527.0,U,S,1312,0,39.9143691,-104.8699465,"POLYGON ((-104.87058 39.91431, -104.87049 39.9..."
1,8,1,8535,1021,80010085351021,Block 1021,G5040,U,23527.0,U,S,11846,0,39.9174508,-104.8601251,"POLYGON ((-104.85744 39.92146, -104.85722 39.9..."
2,8,1,8537,1068,80010085371068,Block 1068,G5040,U,23527.0,U,S,94928,0,39.9012635,-104.8476875,"POLYGON ((-104.84661 39.90358, -104.84674 39.9..."
3,8,1,8543,3003,80010085433003,Block 3003,G5040,U,23527.0,U,S,6841,0,39.9839459,-104.7917056,"POLYGON ((-104.79228 39.98428, -104.79209 39.9..."
4,8,1,60000,1054,80010600001054,Block 1054,G5040,R,,,S,42382,0,39.9729774,-104.9912392,"POLYGON ((-104.99040 39.97199, -104.99244 39.9..."


In [20]:
block_shapes.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 201062 entries, 0 to 201061
Data columns (total 16 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   STATEFP10   201062 non-null  object  
 1   COUNTYFP10  201062 non-null  object  
 2   TRACTCE10   201062 non-null  object  
 3   BLOCKCE10   201062 non-null  object  
 4   GEOID10     201062 non-null  object  
 5   NAME10      201062 non-null  object  
 6   MTFCC10     201062 non-null  object  
 7   UR10        201062 non-null  object  
 8   UACE10      94167 non-null   object  
 9   UATYP10     94167 non-null   object  
 10  FUNCSTAT10  201062 non-null  object  
 11  ALAND10     201062 non-null  int64   
 12  AWATER10    201062 non-null  int64   
 13  INTPTLAT10  201062 non-null  object  
 14  INTPTLON10  201062 non-null  object  
 15  geometry    201062 non-null  geometry
dtypes: geometry(1), int64(2), object(13)
memory usage: 24.5+ MB


In [21]:
blocks.head()

Unnamed: 0,P001001,H001001,NAME,state,county,tract,block
0,244,71,"Block 1007, Block Group 1, Census Tract 78.01,...",8,1,7801,1007
1,310,110,"Block 1008, Block Group 1, Census Tract 78.01,...",8,1,7801,1008
2,56,18,"Block 1013, Block Group 1, Census Tract 78.01,...",8,1,7801,1013
3,11,4,"Block 1010, Block Group 1, Census Tract 78.01,...",8,1,7801,1010
4,68,47,"Block 1011, Block Group 1, Census Tract 78.01,...",8,1,7801,1011


## Merge Block Data and Shapefiles

In [22]:
def make_GEOID(df):
    return df.state + df.county + df.tract + df.block

blocks["GEOID10"] = blocks.apply(make_GEOID, axis=1)
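
A 2010 block GEOID is the concatenation of state (2 digits), county (3), tract (6) and block (4), and the first digit of the block code is the block-group number. The sketch below splits a GEOID back into those parts; the helper name `split_block_geoid` is an illustrative assumption.

```python
def split_block_geoid(geoid):
    """Split a 15-digit 2010 block GEOID into its component codes."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"expected a 15-digit GEOID, got {geoid!r}")
    return {
        "state": geoid[:2],
        "county": geoid[2:5],
        "tract": geoid[5:11],
        "block": geoid[11:],
        # Block-group GEOID: everything up to the first digit of the block code.
        "block_group": geoid[:12],
    }


parts = split_block_geoid("080010078011007")
print(parts)
```

The codes have to stay strings for this to work: once leading zeros are lost (as in the tabular previews in this notebook), the fixed-width slicing no longer lines up.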

In [23]:
blocks.head()

Unnamed: 0,P001001,H001001,NAME,state,county,tract,block,GEOID10
0,244,71,"Block 1007, Block Group 1, Census Tract 78.01,...",8,1,7801,1007,80010078011007
1,310,110,"Block 1008, Block Group 1, Census Tract 78.01,...",8,1,7801,1008,80010078011008
2,56,18,"Block 1013, Block Group 1, Census Tract 78.01,...",8,1,7801,1013,80010078011013
3,11,4,"Block 1010, Block Group 1, Census Tract 78.01,...",8,1,7801,1010,80010078011010
4,68,47,"Block 1011, Block Group 1, Census Tract 78.01,...",8,1,7801,1011,80010078011011


In [24]:
blocks = blocks.set_index("GEOID10")
block_shapes = block_shapes.set_index("GEOID10")

In [25]:
all_blocks = pd.concat([blocks, block_shapes], axis=1)

In [26]:
all_blocks.info()

<class 'pandas.core.frame.DataFrame'>
Index: 201062 entries, 080010078011007 to 081259632005151
Data columns (total 22 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   P001001     201062 non-null  object  
 1   H001001     201062 non-null  object  
 2   NAME        201062 non-null  object  
 3   state       201062 non-null  object  
 4   county      201062 non-null  object  
 5   tract       201062 non-null  object  
 6   block       201062 non-null  object  
 7   STATEFP10   201062 non-null  object  
 8   COUNTYFP10  201062 non-null  object  
 9   TRACTCE10   201062 non-null  object  
 10  BLOCKCE10   201062 non-null  object  
 11  NAME10      201062 non-null  object  
 12  MTFCC10     201062 non-null  object  
 13  UR10        201062 non-null  object  
 14  UACE10      94167 non-null   object  
 15  UATYP10     94167 non-null   object  
 16  FUNCSTAT10  201062 non-null  object  
 17  ALAND10     201062 non-null  int64   
 18  AWATER

In [27]:
trimmed_blocks = all_blocks.drop(["STATEFP10", "COUNTYFP10", "TRACTCE10", "BLOCKCE10", 
                                  "NAME10", "MTFCC10", "UR10", "UACE10", 
                                  "UATYP10", "FUNCSTAT10", "ALAND10", 
                                  "AWATER10"],
                                 axis=1)

In [28]:
def extract_bg(df):
    return df.state + df.county + df.tract + df.block[0]
    
trimmed_blocks["blockgroup"] = trimmed_blocks.apply(extract_bg, axis=1)

In [45]:
trimmed_blocks['area'] = trimmed_blocks.geometry.apply(lambda x: x.area)

In [38]:
trimmed_blocks = trimmed_blocks.rename(columns={'P001001': 'population', 'H001001': 'residences', 'NAME': 'name'})

In [39]:
trimmed_blocks[::10000]

Unnamed: 0,population,residences,name,state,county,tract,block,INTPTLAT10,INTPTLON10,geometry,blockgroup
80010078011007,244,71,"Block 1007, Block Group 1, Census Tract 78.01,...",8,1,7801,1007,39.7410946,-104.8746404,"POLYGON ((-104.87523 39.74021, -104.87523 39.7...",80010078011
80010601002040,46,14,"Block 2040, Block Group 2, Census Tract 601, A...",8,1,60100,2040,39.9296836,-105.011453,"POLYGON ((-105.01091 39.92974, -105.01015 39.9...",80010601002
80050862002010,32,13,"Block 2010, Block Group 2, Census Tract 862, A...",8,5,86200,2010,39.6152277,-104.7508516,"POLYGON ((-104.75167 39.61533, -104.75124 39.6...",80050862002
80130136021017,2,7,"Block 1017, Block Group 1, Census Tract 136.02...",8,13,13602,1017,40.2421819,-105.5128035,"POLYGON ((-105.51228 40.24292, -105.51225 40.2...",80130136021
80239727002605,3,2,"Block 2605, Block Group 2, Census Tract 9727, ...",8,23,972700,2605,37.0357247,-105.5426739,"POLYGON ((-105.54269 37.04064, -105.54257 37.0...",80239727002
80310040031013,52,27,"Block 1013, Block Group 1, Census Tract 40.03,...",8,31,4003,1013,39.6558139,-104.9204474,"POLYGON ((-104.91922 39.65373, -104.91918 39.6...",80310040031
80350142021045,0,0,"Block 1045, Block Group 1, Census Tract 142.02...",8,35,14202,1045,39.4405716,-105.042176,"POLYGON ((-105.04271 39.43778, -105.04275 39.4...",80350142021
80410025022004,0,0,"Block 2004, Block Group 2, Census Tract 25.02,...",8,41,2502,2004,38.8232486,-104.8472738,"POLYGON ((-104.84694 38.82332, -104.84695 38.8...",80410025022
80410051062002,4,1,"Block 2002, Block Group 2, Census Tract 51.06,...",8,41,5106,2002,38.9193076,-104.7180262,"POLYGON ((-104.71880 38.91109, -104.71911 38.9...",80410051062
80459518042017,32,12,"Block 2017, Block Group 2, Census Tract 9518.0...",8,45,951804,2017,39.3882075,-107.2087764,"POLYGON ((-107.20972 39.38819, -107.20944 39.3...",80459518042


# Clean Block-Group Data

In [30]:
block_groups.head()

Unnamed: 0,P001001,H001001,NAME,state,county,tract,block group
0,1355,543,"Block Group 1, Census Tract 78.01, Adams Count...",8,1,7801,1
1,2377,829,"Block Group 2, Census Tract 78.01, Adams Count...",8,1,7801,2
2,1275,548,"Block Group 1, Census Tract 78.02, Adams Count...",8,1,7802,1
3,1133,490,"Block Group 2, Census Tract 78.02, Adams Count...",8,1,7802,2
4,1350,478,"Block Group 3, Census Tract 78.02, Adams Count...",8,1,7802,3


In [31]:
def block_group_id(df):
    return df.tract + df["block group"]

block_groups["blockgroup"] = block_groups.apply(block_group_id, axis=1)

def block_group_geoid(df):
    return df.state + df.county + df.tract + df["block group"]

block_groups["geoid"] = block_groups.apply(block_group_geoid, axis=1)
block_groups = block_groups.set_index("geoid")

In [33]:
block_list = []
for idx, bg in block_groups.iterrows():
    block_list.append(trimmed_blocks[trimmed_blocks.blockgroup == idx].index.values)

In [40]:
bg_area = []
bg_pop = []
bg_res = []
for members in block_list:  # "members" avoids overwriting the blocks DataFrame
    bg_area.append(sum([trimmed_blocks.loc[block].geometry.area for block in members]))
    bg_pop.append(sum([int(trimmed_blocks.loc[block].population) for block in members]))
    bg_res.append(sum([int(trimmed_blocks.loc[block].residences) for block in members]))

In [41]:
block_groups["blocks"] = block_list
block_groups["area"] = bg_area
block_groups["population"] = bg_pop
block_groups["residences"] = bg_res
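
The per-block-group totals can also be accumulated in a single pass over the blocks instead of filtering the block table once per block group. Below is a plain-Python sketch of that idea; the function name `totals_by_block_group`, the record layout, and the third sample row are illustrative assumptions.

```python
from collections import defaultdict


def totals_by_block_group(block_records):
    """Sum population and residences per block group in one pass over the blocks."""
    totals = defaultdict(lambda: {"population": 0, "residences": 0})
    for record in block_records:
        # The first 12 digits of a block GEOID identify its block group.
        group = totals[record["geoid"][:12]]
        group["population"] += int(record["population"])
        group["residences"] += int(record["residences"])
    return dict(totals)


sample_blocks = [
    {"geoid": "080010078011007", "population": "244", "residences": "71"},
    {"geoid": "080010078011008", "population": "310", "residences": "110"},
    {"geoid": "080010078012010", "population": "5", "residences": "2"},
]
bg_totals = totals_by_block_group(sample_blocks)
print(bg_totals)
```

In pandas, a `groupby` on the `blockgroup` column followed by a sum over integer-cast columns would be the analogous one-pass approach.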

In [42]:
block_groups.head()

geoid,P001001,H001001,NAME,state,county,tract,block group,blockgroup,blocks,area,population,residences
80010078011,1355,543,"Block Group 1, Census Tract 78.01, Adams Count...",8,1,7801,1,78011,"[080010078011007, 080010078011008, 08001007801...",3.4e-05,1355,543
80010078012,2377,829,"Block Group 2, Census Tract 78.01, Adams Count...",8,1,7801,2,78012,"[080010078012010, 080010078012007, 08001007801...",3.4e-05,2377,829
80010078021,1275,548,"Block Group 1, Census Tract 78.02, Adams Count...",8,1,7802,1,78021,"[080010078021001, 080010078021002, 08001007802...",3.4e-05,1275,548
80010078022,1133,490,"Block Group 2, Census Tract 78.02, Adams Count...",8,1,7802,2,78022,"[080010078022000, 080010078022001, 08001007802...",1.7e-05,1133,490
80010078023,1350,478,"Block Group 3, Census Tract 78.02, Adams Count...",8,1,7802,3,78023,"[080010078023003, 080010078023004, 08001007802...",1.7e-05,1350,478


## What fraction of its block group is each block?

In [82]:
def assign_block_portion(block, bg, var):
    try:
        res = float(block[var])/float(bg.loc[block.blockgroup][var])
    except ZeroDivisionError:
        res = 0
    return res

for var in ['area', 'population', 'residences']:
    trimmed_blocks[var + '_prop'] = trimmed_blocks.apply(
        lambda x: assign_block_portion(x, block_groups, var), axis=1
    )
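
Written out, the proportion computed by `assign_block_portion` for a block $b$ in block group $g(b)$ and a variable $v$ (area, population or residences) is the following, where $V_g$ is the block-group total built earlier by summing over the member blocks (the symbols are introduced here for illustration):

$$
p_b^{(v)} =
\begin{cases}
\dfrac{v_b}{V_{g(b)}} & \text{if } V_{g(b)} \neq 0, \\
0 & \text{if } V_{g(b)} = 0,
\end{cases}
\qquad
V_g = \sum_{b' \in g} v_{b'} .
$$

For example, block 1007 has a population of 244 in a block group of 1355, giving $244 / 1355 \approx 0.1801$. Because each total is the sum over its own blocks, the proportions within any block group with a nonzero total sum to 1.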

In [83]:
trimmed_blocks.head()

Unnamed: 0,population,residences,name,state,county,tract,block,INTPTLAT10,INTPTLON10,geometry,blockgroup,area,area_prop,population_prop,residences_prop
80010078011007,244,71,"Block 1007, Block Group 1, Census Tract 78.01,...",8,1,7801,1007,39.7410946,-104.8746404,"POLYGON ((-104.87523 39.74021, -104.87523 39.7...",80010078011,2e-06,0.060947,0.180074,0.130755
80010078011008,310,110,"Block 1008, Block Group 1, Census Tract 78.01,...",8,1,7801,1008,39.7410876,-104.8734686,"POLYGON ((-104.87407 39.74021, -104.87404 39.7...",80010078011,2e-06,0.061142,0.228782,0.202578
80010078011013,56,18,"Block 1013, Block Group 1, Census Tract 78.01,...",8,1,7801,1013,39.7410877,-104.8675985,"POLYGON ((-104.86701 39.74197, -104.86701 39.7...",80010078011,2e-06,0.061274,0.041328,0.033149
80010078011010,11,4,"Block 1010, Block Group 1, Census Tract 78.01,...",8,1,7801,1010,39.7410904,-104.8711363,"POLYGON ((-104.87173 39.74021, -104.87174 39.7...",80010078011,2e-06,0.062724,0.008118,0.007366
80010078011011,68,47,"Block 1011, Block Group 1, Census Tract 78.01,...",8,1,7801,1011,39.74109,-104.8699492,"POLYGON ((-104.86936 39.74021, -104.87054 39.7...",80010078011,2e-06,0.061446,0.050185,0.086556


# Which Precinct is each Block in?

In [332]:
blocks_gpd = gpd.GeoDataFrame(trimmed_blocks, crs=block_shapes.crs)

In [333]:
precincts_2016 = gpd.read_file("../data/processed/precincts/precincts_2016.geojson")
precincts_2018 = gpd.read_file("../data/processed/precincts/precincts_2018.geojson")

In [334]:
def find_overlaps_precinct_block(precincts, blocks, counties):

    precinct_assignment = []

    for county in counties:
        dt1 = time.time()
        precincts_to_match = precincts[precincts.COUNTYFP == county]
        blocks_to_match = blocks[blocks.county == county]
        assignment = []
        for bname, block in blocks_to_match.iterrows():
            block_assign = [bname]
            for pname, prec in precincts_to_match.iterrows():
                if prec.geometry.intersects(block.geometry):
                    block_assign.append(prec.VTDST5)

            assignment.append(block_assign)

        precinct_assignment.append(assignment)
        dt2 = time.time()
        print(f"county: {county}, precincts: {len(precincts_to_match)}, "
              f"blocks: {len(blocks_to_match)}, time: {dt2-dt1:.2F}")
        
    return precinct_assignment

bp_overlaps_2016 = find_overlaps_precinct_block(precincts_2016, blocks_gpd, county_fips)
bp_overlaps_2018 = find_overlaps_precinct_block(precincts_2018, blocks_gpd, county_fips)

county: 001, precincts: 249, blocks: 10351, time: 283.43
county: 003, precincts: 8, blocks: 1125, time: 1.95
county: 005, precincts: 395, blocks: 9118, time: 392.55
county: 007, precincts: 8, blocks: 1183, time: 2.03
county: 009, precincts: 9, blocks: 1188, time: 1.70
county: 011, precincts: 5, blocks: 1138, time: 1.39
county: 013, precincts: 233, blocks: 7368, time: 188.21
county: 014, precincts: 37, blocks: 1419, time: 6.24
county: 015, precincts: 15, blocks: 1898, time: 4.32
county: 017, precincts: 5, blocks: 612, time: 0.60
county: 019, precincts: 9, blocks: 1483, time: 2.55
county: 021, precincts: 10, blocks: 1150, time: 2.05
county: 023, precincts: 8, blocks: 2833, time: 4.53
county: 025, precincts: 6, blocks: 537, time: 0.65
county: 027, precincts: 3, blocks: 1206, time: 1.11
county: 029, precincts: 20, blocks: 1989, time: 5.68
county: 031, precincts: 346, blocks: 11039, time: 419.54
county: 033, precincts: 4, blocks: 673, time: 0.77
county: 035, precincts: 155, blocks: 5568, ti
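
The nested loop runs an exact `intersects` test for every block-precinct pair in a county, which is why the large counties take minutes. One common way to cut the cost is a cheap bounding-box check that rules out most pairs before any exact geometry test. The sketch below shows that prefilter in plain Python on `(minx, miny, maxx, maxy)` tuples, the layout shapely's `.bounds` returns; the function names and the sample boxes are illustrative assumptions, and this is not what the notebook itself does.

```python
def boxes_overlap(a, b):
    """True if two (minx, miny, maxx, maxy) boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(block_bounds, precinct_bounds):
    """Yield (block_id, precinct_id) pairs whose bounding boxes overlap."""
    for bname, bbox in block_bounds.items():
        for pname, pbox in precinct_bounds.items():
            if boxes_overlap(bbox, pbox):
                yield bname, pname


# Illustrative boxes: only b1 and p1 can possibly intersect.
blocks_bb = {"b1": (0.0, 0.0, 1.0, 1.0), "b2": (5.0, 5.0, 6.0, 6.0)}
precincts_bb = {"p1": (0.5, 0.5, 2.0, 2.0), "p2": (10.0, 10.0, 11.0, 11.0)}
pairs = list(candidate_pairs(blocks_bb, precincts_bb))
print(pairs)
```

Only the surviving pairs would then need the exact `intersects` call. geopandas also offers a spatial index and `sjoin`, which could replace the loop altogether.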

In [335]:
def parse_precinct_overlaps_to_df(precinct_assignment):
    block_num = []
    assn = []
    for county in precinct_assignment:
        for block in county:
            block_num.append(block[0])
            assn.append(block[1:])
            
    block_assign = pd.DataFrame(data=np.zeros((len(assn), 2)), columns=['overlaps', 'idx'])
    block_assign.overlaps = assn
    block_assign.idx = block_num
    block_assign = block_assign.set_index('idx')
    return block_assign

In [336]:
block_overlaps_2016 = parse_precinct_overlaps_to_df(bp_overlaps_2016)
block_overlaps_2018 = parse_precinct_overlaps_to_df(bp_overlaps_2018)

In [337]:
block_overlaps_2016.head()

idx,overlaps
80010078011007,[01229]
80010078011008,[01229]
80010078011013,[01230]
80010078011010,[01229]
80010078011011,"[01229, 01230]"


In [382]:
def parse_matches(idx, matches, precinct_targets, length, trunc_thresh=0.01):
    """Rank the precincts overlapping block `idx` by their share of the block's area.

    Returns a flat list [match0, frac0, match1, frac1, ...], padded with
    [None, 0] pairs up to `length` pairs.
    """
    default = [None, 0]
    if len(matches.overlaps) == 1:
        # A single overlapping precinct gets the whole block.
        return [matches.overlaps[0], 1.0] + (length-1)*default

    block_geom = trimmed_blocks.loc[idx].geometry
    precincts = [precinct_targets.loc[match].geometry
                 for match in matches.overlaps]
    overlap = [block_geom.intersection(g) for g in precincts]
    frac_overlap = np.array([over.area/block_geom.area for over in overlap])
    # Sort by overlap fraction, largest first.
    sorter = np.argsort(frac_overlap).astype(int)
    ranked_match = np.flip(np.array(matches.overlaps)[sorter])
    ranked_frac = np.flip(frac_overlap[sorter])
    # Drop slivers below the threshold, keep at most `length` matches,
    # then renormalize so the kept fractions sum to 1.
    keep = ranked_frac > trunc_thresh
    ranked_match = ranked_match[keep][:length]
    ranked_frac = ranked_frac[keep][:length]
    ranked_frac = ranked_frac/np.sum(ranked_frac)
    ranked = [i for pair in zip(ranked_match, ranked_frac) for i in pair]
    return ranked + default*(length - len(ranked)//2)

def parse_overlaps_to_assignments(precinct_df, overlap_df, num_matches=5):
    precinct_targets = precinct_df.set_index("VTDST5")
    parsed_matches = []
    for idx, matches in overlap_df.iterrows():
        parsed_matches.append(parse_matches(idx, matches, precinct_targets, num_matches))
    return parsed_matches

def parse_precinct_assingments_to_df(assignments, overlap_df, num_matches=5):
    columns = [["match"+str(i), "frac"+str(i)] for i in range(num_matches)]
    columns = [i for j in columns for i in j]
    new_assign = pd.DataFrame(index=overlap_df.index, data=assignments,
                          columns=columns)
    block_assignments = pd.concat([overlap_df, new_assign], axis=1)
    return block_assignments
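
To make the ranking step in `parse_matches` concrete, here is a plain-Python sketch of the same sequence (drop slivers, sort, truncate, renormalize, pad) applied to overlap fractions that are already known. The function name `rank_overlaps` and the sample fractions are illustrative assumptions, and ties may be ordered differently than in the NumPy version.

```python
def rank_overlaps(fractions, length=5, trunc_thresh=0.01):
    """Return [match0, frac0, match1, frac1, ...] padded to `length` pairs."""
    # Drop slivers at or below the threshold.
    kept = [(name, frac) for name, frac in fractions.items() if frac > trunc_thresh]
    # Largest overlap first, then keep at most `length` matches.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    kept = kept[:length]
    # Renormalize so the kept fractions sum to 1.
    total = sum(frac for _, frac in kept)
    flat = []
    for name, frac in kept:
        flat += [name, frac / total]
    # Pad with empty matches so every block yields the same number of columns.
    return flat + [None, 0] * (length - len(kept))


# Hypothetical block: two real overlaps and one sliver below the 1% threshold.
result = rank_overlaps({"01230": 0.2, "01229": 0.6, "01231": 0.005}, length=3)
print(result)
```

Here the sliver is discarded and the two remaining fractions are rescaled to roughly 0.75 and 0.25, followed by one `[None, 0]` padding pair.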

In [383]:
block_assignments_2016 = parse_precinct_assingments_to_df(
            parse_overlaps_to_assignments(precincts_2016, block_overlaps_2016),
            block_overlaps_2016)

In [384]:
block_assignments_2018 = parse_precinct_assingments_to_df(
            parse_overlaps_to_assignments(precincts_2018, block_overlaps_2018),
            block_overlaps_2018)

In [388]:
block_assignments_all_2016 = gpd.GeoDataFrame(pd.concat([trimmed_blocks, block_assignments_2016], axis=1))
block_assignments_all_2018 = gpd.GeoDataFrame(pd.concat([trimmed_blocks, block_assignments_2018], axis=1))

In [396]:
assignments_to_save_2016 = block_assignments_all_2016.drop(['overlaps'], axis=1)
assignments_to_save_2018 = block_assignments_all_2018.drop(['overlaps'], axis=1)

assignments_to_save_2016.to_file("../data/processed/blocks/block_assignments_2016.geojson", driver='GeoJSON')
assignments_to_save_2018.to_file("../data/processed/blocks/block_assignments_2018.geojson", driver='GeoJSON')