## data
### for grabbing / processing (census) data

In [1]:
from evaltools.data import *
import geopandas as gpd
import pandas as pd
import us

_Note: When calling any of the functions below, you may occasionally get an error that looks like this:_
```
ValueError: Unexpected response (URL: ...): Sorry, the system is currently undergoing maintenance or is busy.  Please try again later. 
```
_This is due to a Census API issue and can't be fixed on our end. Usually, re-running the function works like a charm!_
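If you would rather not re-run cells by hand, a small retry wrapper is one option. The sketch below is not part of `evaltools`: `retry_on_busy` and its `attempts` and `delay` parameters are hypothetical names, and it assumes the transient failure always surfaces as a `ValueError` whose message contains "Unexpected response", as in the example above.

```python
import time


def retry_on_busy(fetch, *args, attempts=3, delay=2.0, **kwargs):
    """Call fetch(*args, **kwargs), retrying on the transient Census API error.

    Hypothetical helper, not part of evaltools. It assumes the transient
    failure is a ValueError whose message contains "Unexpected response".
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(*args, **kwargs)
        except ValueError as err:
            # Any other ValueError is a real problem: re-raise immediately.
            if "Unexpected response" not in str(err):
                raise
            # Out of attempts: let the caller see the original error.
            if attempt == attempts:
                raise
            time.sleep(delay)
```

Under those assumptions, a call such as `retry_on_busy(census, us.states.MA, table="P3", geometry="tract")` would try the request up to three times before giving up.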

### census
Uses the US Census Bureau's API to retrieve 2020 Decennial Census PL 94-171 data at the specified geometry level. The five tables are
 * P1: Race
 * P2: Hispanic or Latino, and Not Hispanic or Latino by Race
 * P3: Race for the Population 18 Years and Over (Race by VAP)
 * P4: Hispanic or Latino, and Not Hispanic or Latino by Race for the Population 18 Years and Over
 * P5: Group Quarters Population by Major Group Quarters Type

In [2]:
%%time
df = census(us.states.MA, 
            table="P3", # Table from which we retrieve data, defaults to "P1"
            columns={}, # mapping Census column names from the table to human-readable names, if desired
            geometry="tract", # data granularity, one of "block" (default), "block group", or "tract"
           )

CPU times: user 78.3 ms, sys: 12.8 ms, total: 91.1 ms
Wall time: 2.8 s


In [3]:
# The `variables()` function produces the default mapping that `census()` uses 
# to map Census column-names to human-readable ones
mapping = variables("P3")

### acs5
Uses the US Census Bureau's API to retrieve 5-year population estimates from the American Community Survey (ACS) for the provided state, geometry level, and year.

In [11]:
%%time
acs5_df = acs5(us.states.MA,
               geometry="tract", # data granularity, either "tract" (default) or "block group"
               year=2019, # Year for which data is retrieved. Defaults to 2019, i.e. 2015-19 ACS 5-year
              )

CPU times: user 370 ms, sys: 25.3 ms, total: 396 ms
Wall time: 11.8 s


### cvap
Uses the US Census Bureau's API to retrieve the 2019 5-year CVAP (Citizen Voting Age Population) data for the provided state at the specified geometry. Please note that the geometries are from the **2010 Census**.

In [12]:
%%time
cvap_df = cvap(us.states.MA,
               geometry="tract", # data granularity, either "tract" (default) or "block group"
              )

CPU times: user 3.25 s, sys: 236 ms, total: 3.49 s
Wall time: 3.5 s


### estimating cvap
This function wraps the above `cvap()` and `acs5()` functions to help users pull forward CVAP estimates from 2019 (on 2010 geometries) to estimates for 2020 (on 2020 geometries). To use it, supply a base geodataframe (`base`) with the 2020 geometries on which you want CVAP estimates, and specify the demographic groups whose CVAP statistics are to be estimated. For each group, specify a triple $(X, Y, Z)$ where $X$ is the old CVAP column for that group, $Y$ is the old VAP column for that group, and $Z$ is the new VAP column for that group, which must be an existing column on `base`. The estimated new CVAP for that group is then computed as $(X / Y) \cdot Z$ for each new geometry.
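As a worked illustration with made-up numbers (not taken from any real geometry), suppose that for one new geometry a group has old CVAP $X = 750$, old VAP $Y = 1000$, and new VAP $Z = 1200$. The estimate would then be

$$\frac{X}{Y} \cdot Z = \frac{750}{1000} \cdot 1200 = 900,$$

that is, the 75% citizenship share observed in the old data is applied to the new voting-age count.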

In [13]:
%%time
base = gpd.read_file("data/al_bg/") # Load AL 2020 block-group shapefile
cvap19 = cvap(us.states.AL) # Get CVAP19 estimates, for comparison

CPU times: user 1.05 s, sys: 82.9 ms, total: 1.14 s
Wall time: 12.6 s


In [14]:
estimates = estimatecvap(base,
                         us.states.AL,
                         groups=[ # (Old CVAP, Old VAP, new VAP)
                             ("WCVAP19", "WVAP19", "WVAP20"),
                             ("BCVAP19", "BVAP19", "BVAP20"),
                         ],
                         ceiling=1, # see below
                         zfill=0.1, # see below
                         geometry10="tract"
                        )

100%|██████████████████████████████████████| 1181/1181 [00:08<00:00, 131.74it/s]
100%|███████████████████████████████████████| 1181/1181 [00:15<00:00, 75.32it/s]


The `ceiling` parameter sets the threshold above which the CVAP / VAP ratio is capped at 1. With `ceiling=1`, whenever a geometry has more CVAP19 than VAP19, the CVAP20 estimate is capped at 100% of the VAP20. The `zfill` parameter controls what happens when a geometry has 0 CVAP19: with `zfill=0.1`, 10% of the VAP20 is estimated to be CVAP.
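To make the interaction of the ratio, `ceiling`, and `zfill` concrete, here is a minimal per-geometry sketch in plain Python. It is not the `estimatecvap()` implementation: `estimate_new_cvap` is a hypothetical helper, it ignores the step of moving data from 2010 to 2020 geometries, and its handling of a zero old VAP is an assumption, since that case is not described here.

```python
def estimate_new_cvap(old_cvap, old_vap, new_vap, ceiling=1.0, zfill=0.1):
    """Estimate new CVAP for one geometry as (old_cvap / old_vap) * new_vap.

    Hypothetical illustration only; not the evaltools implementation.
    """
    if old_cvap == 0:
        # No old CVAP recorded: fall back to the zfill share of new VAP.
        ratio = zfill
    elif old_vap == 0:
        # Assumption (not from the docs): CVAP with no VAP is treated like
        # a ratio above the ceiling, so it is capped at 1.
        ratio = 1.0
    else:
        ratio = old_cvap / old_vap
        if ratio > ceiling:
            # More old CVAP than the ceiling allows: cap the ratio at 1.
            ratio = 1.0
    return ratio * new_vap


# Made-up inputs covering the three cases described above.
typical = estimate_new_cvap(750, 1000, 1200)  # plain ratio: 0.75 * 1200
capped = estimate_new_cvap(120, 100, 90)      # ratio 1.2 exceeds ceiling=1
filled = estimate_new_cvap(0, 50, 200)        # zero old CVAP uses zfill
```

Summing such per-geometry values over a whole state gives a statewide total comparable to the `BCVAP20_EST` sums printed in this section.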

We can see that our estimate for the Black-alone Citizen Voting Age Population (CVAP) in Alabama in 2020 is 970,120, down slightly from 970,239 in 2019.

In [15]:
print(f"AL BCVAP20: {estimates.BCVAP20_EST.sum()}")
print(f"AL BCVAP19: {cvap19.BCVAP19.sum()}")

AL BCVAP20: 970120.3645540088
AL BCVAP19: 970239


We can also estimate Black CVAP in Alabama using `APBVAP`, the voting-age population of Alabamians who identified as Black alone or in combination with other races. This raises the estimate to around 1,007,363.

In [16]:
estimates = estimatecvap(base,
                         us.states.AL,
                         groups=[
                             # Changing the new VAP column from BVAP20 -> APBVAP20
                             ("BCVAP19", "BVAP19", "APBVAP20"),
                         ],
                         ceiling=1,
                         zfill=0.1,
                         geometry10="tract"
                        )

100%|██████████████████████████████████████| 1181/1181 [00:08<00:00, 134.11it/s]
100%|███████████████████████████████████████| 1181/1181 [00:15<00:00, 76.28it/s]


In [17]:
print(f"AL APBCVAP20 estimate: {estimates.BCVAP20_EST.sum()}")

AL APBCVAP20 estimate: 1007362.5586538106
