## Pulling a timeseries from the U.S. Census is easy with the census data collector

The `CensusTimeSeries` class wraps built-in methods to grab the most appropriate data and then builds a pandas DataFrame of data from 2000 through 2020.

Let's get started by importing the necessary packages:

In [1]:
import os
import shapefile
from censusdc.utils import CensusTimeSeries
from censusdc import AcsVariables, Sf3Variables


Now let's get set up by loading our Census API key. Each user needs a unique API key, which can be requested at https://api.census.gov/data/key_signup.html

In [2]:
# Path to a text file whose first line holds the API key
keyfile = os.path.join('..', "api_key.dat")
with open(keyfile) as api:
    apikey = api.readline().strip()
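
If you would rather not keep the key in a file, one common alternative is to read it from an environment variable and fall back to the file. The sketch below is only an illustration: the helper `load_api_key` and the variable name `CENSUS_API_KEY` are assumptions made for this example and are not part of censusdc.

```python
import os


def load_api_key(path, env_var="CENSUS_API_KEY"):
    """Return the API key from an environment variable, else from a file.

    Both this helper and the variable name are illustrative assumptions.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # Fall back to the first line of the key file, as in the cell above
    with open(path) as f:
        return f.readline().strip()
```

With this helper, the cell above could be written as `apikey = load_api_key(os.path.join("..", "api_key.dat"))`.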

## Using the `CensusTimeSeries` class:

The `CensusTimeSeries` class is initialized with a shapefile, much like the `TigerWeb` class shown in notebook 1. Parameters for the `CensusTimeSeries` class include:

   * `shp`: shapefile name
   * `apikey`: Census API key string
   * `field`: attribute field in the shapefile used to identify and tag feature groups
   * `radius`: if using a point shapefile, radius can be a shapefile attribute field or a floating point value

In [3]:
shp_file = os.path.join("..", "data", "Sacramento_neighborhoods_WGS.shp")

ts = CensusTimeSeries(shp_file, apikey, field="name")

The `get_timeseries()` method performs multiple functions behind the scenes.
   1) It efficiently queries TigerWeb to grab geocodes and tigerlines
   2) It queries ACS1, ACS5, and SF3 products
   3) It caches information from census products for computational efficiency
   4) It allows the user to intersect polygons with census data
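
To make the caching step concrete, here is a minimal sketch of the general idea. It is not the censusdc implementation: `YearCache` and `fetch_year` are hypothetical names, and `fetch_year` merely stands in for a slow request to TigerWeb or the Census API. Results are stored in a dictionary keyed by census year, so a second request for the same year skips the download.

```python
class YearCache:
    """Cache the result of an expensive per-year download (illustrative only)."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that performs the real request
        self._store = {}         # year -> downloaded data
        self.downloads = 0       # number of times fetch was actually called

    def get(self, year):
        if year not in self._store:
            self._store[year] = self._fetch(year)
            self.downloads += 1
        return self._store[year]


def fetch_year(year):
    # Hypothetical stand-in for a slow TigerWeb/Census request
    return {"year": year, "tracts": []}


cache = YearCache(fetch_year)
first = cache.get(2010)    # triggers a download
second = cache.get(2010)   # served from the cache
print(cache.downloads)     # prints 1
```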
   
The `get_timeseries()` method can be parameterized using:
   * `feature_name`: the name of the feature group for which the user is requesting a timeseries
   * `sf3_variables`: tuple of variables to grab from the SF3 census. Default is all variables defined in the `Sf3Variables` class
   * `acs_variables`: tuple of variables to grab from the ACS1 and ACS5 census. Default is all variables defined in the `AcsVariables` class
   * `hr_dict`: human-readable label dict that assists in aligning data. If `hr_dict` is None, defaults covering the `AcsVariables`, `Sf3Variables`, and `Sf3Variables1990` classes are used
   * `polygons`: list(shapely Polygon,), list(shapefile.Shape,), or list([x, y], [x_n, y_n]) shapes to intersect the timeseries with
   * `retry`: default 1000. Used to retry requests when there are intermittent connection issues
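
The `retry` parameter guards against intermittent connection failures. The sketch below shows the general pattern of retrying a request a bounded number of times. It is an assumption-laden illustration: `with_retry` and `flaky_request` are hypothetical and do not reflect how censusdc handles retries internally.

```python
import time


def with_retry(request, retry=1000, wait=0.0):
    """Call `request` until it succeeds or `retry` attempts are used up."""
    last_error = None
    for _ in range(retry):
        try:
            return request()
        except ConnectionError as err:
            last_error = err
            time.sleep(wait)   # brief pause before the next attempt
    raise ConnectionError(f"gave up after {retry} attempts") from last_error


attempts = {"n": 0}


def flaky_request():
    # Hypothetical request that fails twice, then succeeds
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = with_retry(flaky_request, retry=5)
print(result, attempts["n"])   # prints: ok 3
```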

__Example usage__: let's get default data for the La Riviera neighborhood.

_Note:_ you'll notice that this grabs data for both La Riviera and Tahoe Park. The data is cached, so if we want a timeseries for Tahoe Park, the `ts.get_timeseries` method will build it from the cached data!

_Note 2:_ For the years 2005 - 2009, data comes from county-level discretization and is therefore less representative than the other years, which use tract-level information. Population estimates for 2005 - 2009 may differ significantly when using area-weighted intersections because of the difference in population density between the area of interest and the entire county.
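
To see why county-level data can mislead, consider how an area-weighted estimate works: the population of each census unit is scaled by the fraction of its area that falls inside the polygon. The numbers below are made up purely for illustration, and `area_weighted_population` is a hypothetical helper rather than a censusdc function.

```python
def area_weighted_population(units):
    """Sum each unit's population scaled by its overlapping area fraction.

    `units` is an iterable of (population, unit_area, overlap_area) tuples.
    """
    total = 0.0
    for population, unit_area, overlap_area in units:
        total += population * (overlap_area / unit_area)
    return total


# Made-up numbers: a 4 km^2 neighborhood inside a 2048 km^2 county.
# Tract level: two dense tracts that overlap the neighborhood.
tracts = [(6000, 4.0, 3.0), (5000, 2.0, 1.0)]
# County level: the whole county, of which the neighborhood is a sliver.
county = [(1_536_000, 2048.0, 4.0)]

print(area_weighted_population(tracts))   # 7000.0
print(area_weighted_population(county))   # 3000.0
```

Because the hypothetical neighborhood is denser than the county average, spreading the county population evenly over its area underestimates the neighborhood population.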

In [4]:
shp = shapefile.Reader(shp_file)
polygons = [shp.shape(0), ]


df = ts.get_timeseries("la_riviera", polygons=polygons)
df

Getting Tigerline data for census year 2000
Getting data for census year 2000
Getting Tigerline data for census year 2005
Getting data for census year 2005
Getting Tigerline data for census year 2006
Getting data for census year 2006
Getting Tigerline data for census year 2007
Getting data for census year 2007
Getting Tigerline data for census year 2008
Getting data for census year 2008
Getting Tigerline data for census year 2009
Getting data for census year 2009
Getting Tigerline data for census year 2010
Getting data for census year 2010
Getting Tigerline data for census year 2011
Getting data for census year 2011
Getting Tigerline data for census year 2012
Getting data for census year 2012
Getting Tigerline data for census year 2013
Getting data for census year 2013
Getting Tigerline data for census year 2014
Getting data for census year 2014
Getting Tigerline data for census year 2015
Getting data for census year 2015
Getting Tigerline data for census year 2016
Getting data for cen

| | year | population | P052001 | P052002 | P052003 | P052004 | P052005 | P052006 | P052007 | P052008 | ... | h_age_2000_2004 | h_age_1990_1999 | h_age_1980_1989 | h_age_1970_1979 | h_age_1960_1969 | h_age_1950_1959 | h_age_1940_1949 | h_age_older_1939 | median_h_year | gini |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | 14543.120514 | 6298.044821 | 543.281303 | 253.56834 | 310.689948 | 522.759893 | 418.725151 | 495.346931 | 376.44675 | ... | | | | | | | | | | |
| 1 | 2005 | 4732.510526 | | | | | | | | | ... | 190.012521 | 242.348292 | 354.076461 | 417.550738 | 229.165309 | 234.644365 | 87.297028 | 98.658375 | 1977.0 | |
| 2 | 2006 | 4862.614224 | | | | | | | | | ... | 199.322318 | 249.40492 | 344.34928 | 390.827519 | 248.212898 | 245.482213 | 98.584095 | 92.096949 | 1977.0 | |
| 3 | 2007 | 4904.858486 | | | | | | | | | ... | 209.604832 | 231.330049 | 328.66199 | 396.1757 | 249.337714 | 251.969358 | 89.571419 | 106.089941 | 1977.0 | |
| 4 | 2008 | 4931.341179 | | | | | | | | | ... | 213.219807 | 243.349307 | 310.548211 | 395.447046 | 249.447366 | 244.297265 | 95.715461 | 103.419388 | 1977.0 | |
| 5 | 2009 | 16921.259978 | | | | | | | | | ... | 340.839218 | 273.315722 | 872.307571 | 2977.672953 | 1380.378531 | 588.873463 | 312.157808 | 403.15919 | 1972.286 | |
| 6 | 2010 | 15890.654204 | | | | | | | | | ... | 2.313275 | 207.491438 | 581.260086 | 3483.447094 | 2576.833024 | 285.303302 | 33.453046 | 54.589952 | 1972.143 | 0.412786 |
| 7 | 2011 | 15836.23273 | | | | | | | | | ... | 9.974665 | 208.514945 | 527.898546 | 3494.407352 | 2587.611999 | 249.396726 | 55.287049 | 61.767226 | -51280230.0 | 0.402877 |
| 8 | 2012 | 15717.275745 | | | | | | | | | ... | 16.621701 | 195.270438 | 600.10822 | 3474.00596 | 2482.648504 | 228.926734 | 58.485714 | 54.685949 | 1972.231 | 0.412177 |
| 9 | 2013 | 15749.537664 | | | | | | | | | ... | 15.497372 | 201.675978 | 531.412432 | 3652.918946 | 2327.223721 | 257.691745 | 76.794394 | 96.617692 | 1972.308 | 0.415238 |


Now let's do the same for Tahoe Park using the cached data

In [5]:
polygons = [shp.shape(1), ]

df = ts.get_timeseries('tahoe_park', polygons=polygons)
df

Performing intersections and building DataFrame


| | year | population | P052001 | P052002 | P052003 | P052004 | P052005 | P052006 | P052007 | P052008 | ... | h_age_2000_2004 | h_age_1990_1999 | h_age_1980_1989 | h_age_1970_1979 | h_age_1960_1969 | h_age_1950_1959 | h_age_1940_1949 | h_age_older_1939 | median_h_year | gini |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | 0.212571 | 0.068132 | 0.012165 | 0.004856 | 0.008126 | 0.00327 | 0.003558 | 0.005962 | 0.00476 | ... | | | | | | | | | | |
| 1 | 2005 | 2175.347832 | | | | | | | | | ... | 87.341238 | 111.397921 | 162.754939 | 191.931553 | 105.338225 | 107.85673 | 40.126989 | 45.349351 | 1977.0 | |
| 2 | 2006 | 2235.151354 | | | | | | | | | ... | 91.620583 | 114.641573 | 158.283739 | 179.647946 | 114.093648 | 112.83846 | 45.315208 | 42.333323 | 1977.0 | |
| 3 | 2007 | 2254.56937 | | | | | | | | | ... | 96.347048 | 106.33327 | 151.072912 | 182.106294 | 114.610681 | 115.820344 | 41.172437 | 48.765348 | 1977.0 | |
| 4 | 2008 | 2266.742416 | | | | | | | | | ... | 98.008708 | 111.858048 | 142.746725 | 181.77136 | 114.661084 | 112.293787 | 43.99661 | 47.537801 | 1977.0 | |
| 5 | 2009 | 7326.569508 | | | | | | | | | ... | 322.026548 | 202.735457 | 469.779905 | 491.724586 | 374.680939 | 430.83492 | 278.586676 | 382.306193 | 1971.0 | |
| 6 | 2010 | 6999.85811 | | | | | | | | | ... | 303.55351 | 57.557124 | 87.779623 | 268.768926 | 317.054039 | 514.856271 | 1104.076215 | 904.567372 | 1949.428571 | 0.419429 |
| 7 | 2011 | 6922.229783 | | | | | | | | | ... | 199.899354 | 39.578985 | 87.497762 | 275.137017 | 356.810843 | 538.057832 | 1085.809874 | 890.283958 | 1950.428571 | 0.450271 |
| 8 | 2012 | 7282.228129 | | | | | | | | | ... | 396.19488 | 61.786002 | 90.453724 | 305.4032 | 282.383087 | 543.423202 | 984.133406 | 917.704729 | 1948.428571 | 0.470643 |
| 9 | 2013 | 7789.038151 | | | | | | | | | ... | 401.494379 | 57.612607 | 82.715747 | 390.536889 | 303.26191 | 649.815919 | 943.8831 | 737.837777 | 1949.571429 | 0.484271 |


### Finally, let's use this again, but specify the census data we'd like to request from the Census API

In [6]:
ts = CensusTimeSeries(shp_file, apikey, field='name')

In [7]:
sf3vars = (Sf3Variables.population, Sf3Variables.median_income)
acsvars = (AcsVariables.population, AcsVariables.median_income)

polygon = shp.shape(0)
df = ts.get_timeseries("la_riviera", sf3_variables=sf3vars,
                       acs_variables=acsvars,
                       polygons=polygon)
df

Getting Tigerline data for census year 2000
Getting data for census year 2000
Getting Tigerline data for census year 2005
Getting data for census year 2005
Getting Tigerline data for census year 2006
Getting data for census year 2006
Getting Tigerline data for census year 2007
Getting data for census year 2007
Getting Tigerline data for census year 2008
Getting data for census year 2008
Getting Tigerline data for census year 2009
Getting data for census year 2009
Getting Tigerline data for census year 2010
Getting data for census year 2010
Getting Tigerline data for census year 2011
Getting data for census year 2011
Getting Tigerline data for census year 2012
Getting data for census year 2012
Getting Tigerline data for census year 2013
Getting data for census year 2013
Getting Tigerline data for census year 2014
Getting data for census year 2014
Getting Tigerline data for census year 2015
Getting data for census year 2015
Getting Tigerline data for census year 2016
Getting data for cen

| | year | population | HCT012001 | pop_density | median_income |
|---|---|---|---|---|---|
| 0 | 2000 | 14543.120514 | 50303.307692 | 6769820.0 | |
| 1 | 2005 | 4732.510526 | | 2110166.0 | 51793.0 |
| 2 | 2006 | 4862.614224 | | 2168178.0 | 53930.0 |
| 3 | 2007 | 4904.858486 | | 2187014.0 | 56987.0 |
| 4 | 2008 | 4931.341179 | | 2198822.0 | 56984.0 |
| 5 | 2009 | 16921.259978 | | 6159779.0 | 52394.571429 |
| 6 | 2010 | 15890.654204 | | 6528632.0 | 59605.642857 |
| 7 | 2011 | 15836.23273 | | 6842570.0 | 57320.461538 |
| 8 | 2012 | 15717.275745 | | 6766971.0 | 56320.153846 |
| 9 | 2013 | 15749.537664 | | 6745843.0 | 54183.769231 |


In [8]:
polygon = shp.shape(1)

df = ts.get_timeseries("tahoe_park", sf3_variables=sf3vars,
                       acs_variables=acsvars,
                       polygons=polygon)
df

Performing intersections and building DataFrame


| | year | population | HCT012001 | pop_density | median_income |
|---|---|---|---|---|---|
| 0 | 2000 | 0.212571 | 30846.0 | 12951550.0 | |
| 1 | 2005 | 2175.347832 | | 2110166.0 | 51793.0 |
| 2 | 2006 | 2235.151354 | | 2168178.0 | 53930.0 |
| 3 | 2007 | 2254.56937 | | 2187014.0 | 56987.0 |
| 4 | 2008 | 2266.742416 | | 2198822.0 | 56984.0 |
| 5 | 2009 | 7326.569508 | | 7107150.0 | 50381.0 |
| 6 | 2010 | 6999.85811 | | 9851311.0 | 42597.71 |
| 7 | 2011 | 6922.229783 | | 9931613.0 | 43395.43 |
| 8 | 2012 | 7282.228129 | | 10203060.0 | 43660.14 |
| 9 | 2013 | 7789.038151 | | 10428420.0 | 44068.71 |
