I've downloaded a bunch of ACS data from IPUMS NHGIS, but because these are 5-year estimates from a sample survey, the smallest geographic unit they are reported at is the block group, not the block.

However, blocks nest neatly into block groups, so I should be able to disaggregate from block groups to blocks using maup, prorating by total population. That assumes each subgroup is spread across a block group's blocks in proportion to total population, which seems like a reasonable assumption to make.
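
To make the prorating idea concrete, here is a minimal sketch of the arithmetic in plain pandas rather than maup. Everything in it is made up for illustration: the IDs, the numbers, and the `block_pop` column, which assumes block-level total population is available from some other source.

```python
import pandas as pd

# Hypothetical toy data: two block groups with a value to disaggregate.
block_groups = pd.DataFrame(
    {"bg_id": ["A", "B"], "BlackPop": [100.0, 30.0]}
).set_index("bg_id")

# Hypothetical blocks nested inside those block groups, with their own
# total population counts (assumed to come from another source).
blocks = pd.DataFrame(
    {
        "block_id": ["A1", "A2", "B1", "B2", "B3"],
        "bg_id": ["A", "A", "B", "B", "B"],
        "block_pop": [300, 100, 50, 50, 0],
    }
)

# Each block's share of its block group's total population.
bg_totals = blocks.groupby("bg_id")["block_pop"].transform("sum")
blocks["weight"] = blocks["block_pop"] / bg_totals

# Prorate: block estimate = block-group value * the block's population share.
blocks["BlackPop"] = blocks["bg_id"].map(block_groups["BlackPop"]) * blocks["weight"]
print(blocks)
```

The block estimates sum back to the block-group values, so nothing is gained or lost. A block group whose blocks all have zero population would produce a division by zero (NaN weights) and would need separate handling.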

* `ADK5E001` = `pop`: Total population
* `ADK5E001` = `TotPop`: Total population (again)
* `ADK5E004` = `BlackPop`: Total Black population
* `ADK5E012` = `HispPop`: Total Hispanic Population

In [64]:
import geopandas
import pandas as pd

# read in block groups shapefile
blck_grp = geopandas.read_file("zip://C:/Users/madie/OneDrive/data/ipums/VA_blck_grp_2019.zip")
# keep only the useful cols
blck_grp = blck_grp[["GEOID", "GISJOIN", "geometry"]].copy()
# read in population data csv
data = pd.read_csv("C:/Users/madie/OneDrive/data/ipums/VA_blck_grp_2015_pop.zip")
# keep only the relevant columns: total, Black, and Hispanic population
data = data[["GISJOIN", "ADK5E001", "ADK5E001", "ADK5E004", "ADK5E012"]].copy()
# rename these cols to something more intelligible
data.columns = ["GISJOIN", "pop", "TotPop", "BlackPop", "HispPop"]
# merge the population data into the block groups shapefile
blck_grp = blck_grp.merge(data, on='GISJOIN')
blck_grp

Unnamed: 0,GEOID,GISJOIN,geometry,pop,TotPop,BlackPop,HispPop
0,510010901001,G51000100901001,"MULTIPOLYGON (((1781384.119 243080.884, 178137...",994,994,77,10
1,510010901002,G51000100901002,"POLYGON ((1782731.187 242733.316, 1782702.147 ...",816,816,0,0
2,510010901003,G51000100901003,"MULTIPOLYGON (((1781920.301 240213.966, 178191...",668,668,0,0
3,510010901004,G51000100901004,"MULTIPOLYGON (((1780346.252 237655.516, 178035...",452,452,0,28
4,510010902001,G51000100902001,"POLYGON ((1767188.764 240954.662, 1767173.531 ...",691,691,325,44
...,...,...,...,...,...,...,...
5316,518400003014,G51084000003014,"POLYGON ((1517324.368 330698.470, 1517321.984 ...",1141,1141,126,62
5317,518400003021,G51084000003021,"POLYGON ((1516961.823 327984.365, 1516917.605 ...",1578,1578,62,582
5318,518400003022,G51084000003022,"POLYGON ((1516283.567 328858.090, 1516269.411 ...",1360,1360,3,0
5319,518400003023,G51084000003023,"POLYGON ((1516961.823 327984.365, 1516947.695 ...",1635,1635,145,0


OK, so that's looking good. All that's missing is the total VAP and VAP by race. The Census Bureau is required by law to publish this specific table every year, but annoyingly it isn't available from IPUMS, so I had to download it from the Census Bureau directly.

In [63]:
# read in csv file
vap = pd.read_csv("C:/Users/madie/OneDrive/data/census/VA_blockgroup_2011-2015_vap.zip", encoding="latin1")
# split up GEONAME columns on commas into 4 different things
vap[["blck_grp", "tract", "county", "state"]] = vap["GEONAME"].str.split(pat=",", expand=True)
# remove leading and trailing spaces from state col
vap['state'] = vap['state'].str.strip()
# filter to only include Virginia block groups
vap = vap.loc[vap['state'] == "Virginia"]
# index by unique identifier, then by racial group (lnnumber)
vap = vap.set_index(['geoid', "lnnumber"])
vap = vap[["CVAP_EST"]]
# "pivot" with geoid as row and lnnumber (race) as col
df_vap = vap.unstack()
# drop the top column level (CVAP_EST), leaving lnnumber as the column labels
df_vap = df_vap.droplevel(0, axis=1)
df_vap.columns.name = None
df_vap = df_vap.reset_index()
# filter to only include geoid, total, Black, Hispanic
df_vap = df_vap.filter(items=["geoid", 1, 5, 13])
# rename cols
df_vap.columns = ["geoid", "VAP", "BlackVAP", "HISPVAP"]
# reformat geoid to match that in other table
df_vap[["prefix", "GEOID"]] = df_vap["geoid"].str.split(pat="US", expand=True)
df_vap = df_vap.drop(columns=["prefix", "geoid"])
df_vap

Unnamed: 0,VAP,BlackVAP,HISPVAP,GEOID
0,830,35,0,510010901001
1,685,0,0,510010901002
2,535,0,0,510010901003
3,415,0,30,510010901004
4,455,220,10,510010902001
...,...,...,...,...
5327,985,115,0,518400003014
5328,795,0,100,518400003021
5329,1030,4,0,518400003022
5330,1230,80,0,518400003023
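
The reshaping in the cell above is a bit dense, so here is a small sketch of the same steps on a made-up long-format table. The `15000US` prefix on `geoid` is an assumption for illustration, and the sketch takes the part after `US` directly instead of creating a throwaway `prefix` column.

```python
import pandas as pd

# Made-up long-format rows: one row per (geoid, lnnumber) pair.
# The "15000US" prefix is assumed for illustration only.
long = pd.DataFrame(
    {
        "geoid": ["15000US510010901001"] * 3 + ["15000US510010901002"] * 3,
        "lnnumber": [1, 5, 13, 1, 5, 13],
        "CVAP_EST": [830, 35, 0, 685, 0, 0],
    }
)

# Pivot: geoid becomes the row index, lnnumber values become columns.
wide = long.set_index(["geoid", "lnnumber"])[["CVAP_EST"]].unstack()
# Drop the outer "CVAP_EST" column level so the columns are just 1, 5, 13.
wide = wide.droplevel(0, axis=1)
wide.columns.name = None
wide = wide.reset_index()
wide.columns = ["geoid", "VAP", "BlackVAP", "HISPVAP"]

# Keep only the part of the identifier after "US".
wide["GEOID"] = wide["geoid"].str.split("US").str[1]
wide = wide.drop(columns="geoid")
print(wide)
```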


In [65]:
# merge in VAP
blck_grp = blck_grp.merge(df_vap, on='GEOID')
blck_grp

Unnamed: 0,GEOID,GISJOIN,geometry,pop,TotPop,BlackPop,HispPop,VAP,BlackVAP,HISPVAP
0,510010901001,G51000100901001,"MULTIPOLYGON (((1781384.119 243080.884, 178137...",994,994,77,10,830,35,0
1,510010901002,G51000100901002,"POLYGON ((1782731.187 242733.316, 1782702.147 ...",816,816,0,0,685,0,0
2,510010901003,G51000100901003,"MULTIPOLYGON (((1781920.301 240213.966, 178191...",668,668,0,0,535,0,0
3,510010901004,G51000100901004,"MULTIPOLYGON (((1780346.252 237655.516, 178035...",452,452,0,28,415,0,30
4,510010902001,G51000100902001,"POLYGON ((1767188.764 240954.662, 1767173.531 ...",691,691,325,44,455,220,10
...,...,...,...,...,...,...,...,...,...,...
5316,518400003014,G51084000003014,"POLYGON ((1517324.368 330698.470, 1517321.984 ...",1141,1141,126,62,985,115,0
5317,518400003021,G51084000003021,"POLYGON ((1516961.823 327984.365, 1516917.605 ...",1578,1578,62,582,795,0,100
5318,518400003022,G51084000003022,"POLYGON ((1516283.567 328858.090, 1516269.411 ...",1360,1360,3,0,1030,4,0
5319,518400003023,G51084000003023,"POLYGON ((1516961.823 327984.365, 1516947.695 ...",1635,1635,145,0,1230,80,0
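
The VAP table printed earlier runs to index 5330 while the merged table stops at 5319, so the default inner merge dropped some VAP rows that had no matching block group. One way to see which keys fail to match is an outer merge with `indicator=True`. This is a sketch on made-up frames (`left` and `right` are stand-ins, not the real tables):

```python
import pandas as pd

# Made-up stand-ins for the block group table and the VAP table.
left = pd.DataFrame({"GEOID": ["001", "002", "003"], "pop": [994, 816, 668]})
right = pd.DataFrame({"GEOID": ["001", "002", "004"], "VAP": [830, 685, 535]})

# An outer merge keeps every key; the _merge column records where each came from.
check = left.merge(right, on="GEOID", how="outer", indicator=True)

# Rows present in only one of the two tables.
unmatched = check.loc[check["_merge"] != "both", ["GEOID", "_merge"]]
counts = check["_merge"].value_counts().to_dict()
print(unmatched)
print(counts)
```

Running the same check on the real `blck_grp` and `df_vap` before the merge would list the unmatched GEOIDs.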


In [67]:
blck_grp.to_file("C:/Users/madie/OneDrive/data/blck_grp/VA_blck_grp_2015_pop_vap.shp")