## Pulling a timeseries from the U.S. Census is easy with the census data collector

The `CensusTimeSeries` class wraps built-in methods to grab the most appropriate data and then builds a pandas DataFrame of data from 1990 through 2018.

Let's get started by importing the necessary packages.

In [1]:
import os
import shapefile
from censusdc.utils import CensusTimeSeries
from censusdc import AcsVariables, Sf3Variables, Sf3Variables1990


Now let's get set up by loading our Census API key. Each user needs a unique API key, which can be requested at https://api.census.gov/data/key_signup.html

In [2]:
# Read the key from the first line of the key file
apikey_file = os.path.join("..", "api_key.dat")
with open(apikey_file) as api:
    apikey = api.readline().strip()
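
A missing or empty key file would otherwise only show up later as a failed API request. A minimal sketch of a more defensive reader, assuming the same one-line `api_key.dat` layout; `read_api_key` is a hypothetical helper introduced here and is not part of censusdc:

```python
from pathlib import Path


def read_api_key(path):
    """Return the first line of a key file, stripped of whitespace.

    Hypothetical helper for illustration; not part of censusdc.
    """
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"API key file not found: {key_file}")
    with key_file.open() as f:
        key = f.readline().strip()
    if not key:
        raise ValueError(f"API key file is empty: {key_file}")
    return key
```

It could be called as `apikey = read_api_key(os.path.join("..", "api_key.dat"))`.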

## Using the `CensusTimeSeries` class:

The `CensusTimeSeries` class is instantiated with a shapefile, much like the `TigerWeb` class shown in notebook 1. Parameters for the `CensusTimeSeries` class include:

   * `shp`: shapefile name
   * `apikey`: Census API key string
   * `field`: attribute field in the shapefile used to identify and tag feature groups
   * `radius`: if using a point shapefile, radius can be a shapefile attribute field or a floating point value

In [3]:
shp_file = os.path.join("..", "data", "Sacramento_neighborhoods_WGS.shp")

ts = CensusTimeSeries(shp_file, apikey, field="name")

The `get_timeseries()` method performs several tasks behind the scenes:
   1) It efficiently queries TigerWeb to grab geocodes and tigerlines
   2) It queries ACS1, ACS5, and SF3 products
   3) It caches information from census products for computational efficiency
   4) It allows the user to intersect polygons with census data
   
The `get_timeseries()` method can be parameterized using:
   * `feature_name`: the name of the feature group for which the user is requesting a timeseries
   * `sf3_variables`: tuple of variables to grab from the SF3 census. Default is all variables defined in the `Sf3Variables` class
   * `sf3_variables_1990`: tuple of variables to grab from the 1990 SF3 census. Default is all variables defined in the `Sf3Variables1990` class
   * `acs_variables`: tuple of variables to grab from the ACS1 and ACS5 census. Default is all variables defined in the `AcsVariables` class
   * `hr_dict`: human-readable label dict that assists in aligning data. If `hr_dict` is None, defaults are used that cover the `AcsVariables`, `Sf3Variables`, and `Sf3Variables1990` classes
   * `polygons`: list(shapely Polygon,), list(shapefile.Shape,), or list([x,y],[x_n, y_n]) shapes to intersect the timeseries with
   * `retry`: default 1000. Used for intermittent connection issues
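
The role of `hr_dict` can be pictured with a small, library-independent sketch. Different census products label the same quantity with different variable codes, so a code-to-label mapping is what lets rows from different years land in the same column. The variable codes, the values, and the `align_records` helper below are all illustrative assumptions; they are not censusdc's actual defaults or implementation:

```python
# Hypothetical variable codes, for illustration only. Real codes differ by
# census product and should be taken from the Census API variable lists.
hr_dict = {
    "SF3_1990_POP": "population",
    "SF3_2000_POP": "population",
    "ACS_POP": "population",
    "SF3_1990_INC": "median_income",
    "SF3_2000_INC": "median_income",
    "ACS_INC": "median_income",
}


def align_records(records, labels):
    """Rename product-specific codes to shared labels; keep unknown keys."""
    return [
        {labels.get(key, key): value for key, value in record.items()}
        for record in records
    ]


# Made-up values: one record per census product.
records = [
    {"year": 1990, "SF3_1990_POP": 100.0, "SF3_1990_INC": 20000.0},
    {"year": 2000, "SF3_2000_POP": 110.0, "SF3_2000_INC": 25000.0},
    {"year": 2010, "ACS_POP": 125.0, "ACS_INC": 30000.0},
]

aligned = align_records(records, hr_dict)
for row in aligned:
    print(row)
```

After alignment every record carries the same `population` and `median_income` keys, so the rows can be stacked into a single table regardless of which product they came from.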

__Example usage__: let's get default data for the La Riviera neighborhood.

_Note:_ You'll notice that this grabs data for both La Riviera and Tahoe Park. The data is cached, so if we later want a timeseries for Tahoe Park, the `ts.get_timeseries` method will build it from the cached data!
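
The caching behaviour described in this note can be sketched without any network access. This is a generic fetch-once pattern with a hypothetical `fetch_all_features` stand-in for the web queries; it is not censusdc's internal cache:

```python
class CachedFetcher:
    """Fetch data for every feature once, then serve repeats from memory."""

    def __init__(self, fetch_all):
        self._fetch_all = fetch_all  # callable returning {feature_name: data}
        self._cache = None
        self.fetch_count = 0

    def get(self, feature_name):
        # Only the first request triggers the expensive fetch.
        if self._cache is None:
            self._cache = self._fetch_all()
            self.fetch_count += 1
        return self._cache[feature_name]


def fetch_all_features():
    # Stand-in for the expensive web queries; values are placeholders.
    return {
        "la_riviera": "data for la_riviera",
        "tahoe_park": "data for tahoe_park",
    }


fetcher = CachedFetcher(fetch_all_features)
fetcher.get("la_riviera")   # triggers the fetch for all features
fetcher.get("tahoe_park")   # served from the cache
print(fetcher.fetch_count)  # prints 1
```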

_Note 2:_ For the years 2005 - 2009, data come from county-level discretization and are therefore not as representative as the other years, which use tract-level information. Population estimates for 2005 - 2009 from area-weighted intersections may differ significantly because of the difference in population density between the area of interest and the entire county.
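
A short arithmetic sketch shows why the county-level years can differ so much. Area weighting assigns a polygon the share of a census unit's population equal to the share of the unit's area it covers, which implicitly assumes uniform density inside the unit. All areas and populations below are made-up numbers, and `area_weighted_population` is a hypothetical helper, not a censusdc function:

```python
def area_weighted_population(units):
    """Sum population * (overlap area / unit area) over census units.

    Each unit is a tuple: (population, unit_area, overlap_area).
    """
    return sum(pop * overlap / area for pop, area, overlap in units)


# Tract level: three small tracts that overlap the area of interest.
tracts = [
    (4000.0, 2.0, 2.0),  # fully inside the area of interest
    (5000.0, 2.5, 2.0),  # 80 percent inside
    (3000.0, 3.0, 1.0),  # one third inside
]
tract_estimate = area_weighted_population(tracts)

# County level: one large unit with a lower average density than the
# neighborhood, covering the same 5.0 area units of overlap.
county = [(1_400_000.0, 2500.0, 5.0)]
county_estimate = area_weighted_population(county)

print(tract_estimate)   # prints 9000.0
print(county_estimate)  # prints 2800.0
```

With the same overlap area, the county-level estimate is much lower here simply because the county's average density is lower than that of the tracts covering the neighborhood.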

In [4]:
shp = shapefile.Reader(shp_file)
polygons = [shp.shape(0), ]


df = ts.get_timeseries("la_riviera", polygons=polygons)
df

Received 13 entries, 13 total
Received 5 entries, 18 total
Received 2 entries, 20 total
Received 6 entries, 6 total
Received 2 entries, 8 total
Received 1 entries, 1 total
Received 1 entries, 1 total
Received 14 entries, 14 total
Received 6 entries, 20 total
Received 3 entries, 23 total
Received 1 entries, 24 total
Received 6 entries, 6 total
Received 2 entries, 8 total
Received 2 entries, 10 total
Received 13 entries, 13 total
Received 6 entries, 19 total
Received 4 entries, 23 total
Received 1 entries, 24 total
Received 6 entries, 6 total
Received 2 entries, 8 total
Received 1 entries, 9 total
Received 13 entries, 13 total
Received 5 entries, 18 total
Received 3 entries, 21 total
Received 2 entries, 23 total
Received 1 entries, 24 total
Received 6 entries, 6 total
Received 3 entries, 9 total
Received 13 entries, 13 total
Received 5 entries, 18 total
Received 3 entries, 21 total
Received 2 entries, 23 total
Received 1 entries, 24 total
Received 6 entries, 6 total
Received 3 entries, 9

Getting tract data for tahoe_park feature # 3
Getting tract data for tahoe_park feature # 4
Getting tract data for tahoe_park feature # 5
Getting data for census year 2015
Getting tract data for la_riviera feature # 0
Getting tract data for la_riviera feature # 1
Getting tract data for la_riviera feature # 2
Getting tract data for la_riviera feature # 3
Getting tract data for la_riviera feature # 4
Getting tract data for la_riviera feature # 5
Getting tract data for la_riviera feature # 6
Getting tract data for la_riviera feature # 7
Getting tract data for la_riviera feature # 8
Getting tract data for la_riviera feature # 9
Getting tract data for la_riviera feature # 10
Getting tract data for la_riviera feature # 11
Getting tract data for la_riviera feature # 12
Getting tract data for tahoe_park feature # 0
Getting tract data for tahoe_park feature # 1
Getting tract data for tahoe_park feature # 2
Getting tract data for tahoe_park feature # 3
Getting tract data for tahoe_park feature #



Unnamed: 0,households,households2,income_100k_125k,income_10K_15k,income_10k_12k,income_125k_150k,income_13k_15k,income_150k_200k,income_15k_17k,income_15k_20k,...,income_55k_60k,income_5K_10k,income_60k_75k,income_75k_100k,income_gt_200k,income_lt_10k,income_lt_5k,median_income,population,year
0,,,109.568208,,237.16351,32.355595,190.349525,,220.819167,,...,322.735406,568.62756,628.234816,376.733869,,,196.398162,41111.181818,15134.640992,1990
1,6298.605389,,342.909825,253.583314,,152.672071,,80.077888,,310.716347,...,,,832.858482,628.473357,42.539261,543.300964,,50303.307692,14544.594289,2000
2,,1756.716739,141.293104,97.19503,,83.487676,,58.113658,,80.777382,...,,,191.003079,231.618514,38.709372,114.147652,,51793.0,4740.151392,2005
3,,1772.468328,150.982847,84.51865,,92.950675,,70.722723,,88.812606,...,,,199.647676,237.726417,50.574435,103.944193,,53930.0,4870.465149,2006
4,,1783.327218,149.370842,98.236633,,96.628172,,89.563693,,87.735574,...,,,199.856705,256.220186,57.755828,83.416818,,56987.0,4912.777617,2007
5,,1778.735661,161.366992,93.545877,,93.184504,,86.548713,,78.821719,...,,,204.795463,246.052156,62.124184,76.85897,,56984.0,4939.303067,2008
6,,1797.498688,145.633116,104.457909,,87.437973,,84.178535,,88.199689,...,,,187.764898,234.945266,56.544167,101.729901,,52504.0,4963.376852,2009
7,,6643.746195,427.166442,319.956285,,350.366288,,196.683631,,353.260733,...,,,835.189904,903.55504,160.285878,380.554907,,59605.642857,15891.96914,2010
8,,6705.39254,604.770784,265.401077,,326.03674,,238.738311,,356.98801,...,,,803.281604,838.075509,127.30605,474.235055,,57320.461538,15837.636749,2011
9,,6598.456793,566.322775,276.96862,,388.757953,,210.594522,,326.109838,...,,,748.646202,862.29552,113.774823,480.691855,,56320.153846,15718.643943,2012


Now let's do the same for Tahoe Park using the cached data.

In [5]:
polygon = shp.shape(1)

df = ts.get_timeseries('tahoe_park', polygons=polygon)
df

Unnamed: 0,households,households2,income_100k_125k,income_10K_15k,income_10k_12k,income_125k_150k,income_13k_15k,income_150k_200k,income_15k_17k,income_15k_20k,...,income_55k_60k,income_5K_10k,income_60k_75k,income_75k_100k,income_gt_200k,income_lt_10k,income_lt_5k,median_income,population,year
0,,,20.94811,,148.852336,0.112576,188.07994,,168.79262,,...,181.680973,482.562411,211.979465,31.040102,,,157.369239,21734.0,7054.755632,1990
1,3381.303408,,166.584411,282.052309,,26.742608,,56.456392,,228.246607,...,,,345.327857,260.370345,22.063147,415.883011,,30835.285714,6957.108138,2000
2,,807.302417,64.931506,44.666155,,38.366915,,26.706239,,37.121395,...,,,87.775817,106.440715,17.788963,52.456764,,51793.0,2178.345313,2005
3,,814.541089,69.384446,38.840701,,42.715654,,32.500758,,40.813997,...,,,91.748458,109.247613,23.241575,47.767746,,53930.0,2238.23124,2006
4,,819.531311,68.643646,45.144826,,44.405654,,41.15916,,40.319045,...,,,91.844518,117.746459,26.541797,38.334353,,56987.0,2257.676013,2007
5,,817.421253,74.156499,42.98918,,42.823111,,39.773621,,36.222666,...,,,94.114133,113.073722,28.549283,35.320681,,56984.0,2269.865832,2008
6,,826.043837,66.925967,48.003825,,40.182282,,38.684401,,40.53233,...,,,86.287705,107.96953,25.984976,46.750164,,52504.0,2280.928985,2009
7,,3455.102872,245.729218,321.85674,,241.930743,,95.986745,,169.970499,...,,,290.841526,440.961931,88.451293,363.016787,,42597.714286,6999.967737,2010
8,,3415.905786,243.738077,374.283456,,190.28068,,138.801612,,161.966818,...,,,287.829641,467.376595,122.758515,259.292922,,43395.428571,6922.338582,2011
9,,3400.651609,195.325619,390.500079,,180.033796,,175.997621,,184.652373,...,,,297.052042,431.608552,159.100271,320.738339,,43660.142857,7282.330598,2012


### Finally, let's use this again, but specify the census data we'd like to request from the Census API

In [6]:
ts = CensusTimeSeries(shp_file, apikey, field='name')

In [7]:
sf3vars = (Sf3Variables.population, Sf3Variables.median_income)
sf3vars1990 = (Sf3Variables1990.population, Sf3Variables1990.median_income)
acsvars = (AcsVariables.population, AcsVariables.median_income)

polygon = shp.shape(0)
df = ts.get_timeseries("la_riviera", sf3_variables=sf3vars,
                       sf3_variables_1990=sf3vars1990, acs_variables=acsvars,
                       polygons=polygon)
df

Received 13 entries, 13 total
Received 5 entries, 18 total
Received 2 entries, 20 total
Received 6 entries, 6 total
Received 2 entries, 8 total
Received 1 entries, 1 total
Received 1 entries, 1 total
Received 14 entries, 14 total
Received 6 entries, 20 total
Received 3 entries, 23 total
Received 1 entries, 24 total
Received 6 entries, 6 total
Received 2 entries, 8 total
Received 2 entries, 10 total
Received 13 entries, 13 total
Received 6 entries, 19 total
Received 4 entries, 23 total
Received 1 entries, 24 total
Received 6 entries, 6 total
Received 2 entries, 8 total
Received 1 entries, 9 total
Received 13 entries, 13 total
Received 5 entries, 18 total
Received 3 entries, 21 total
Received 2 entries, 23 total
Received 1 entries, 24 total
Received 6 entries, 6 total
Received 3 entries, 9 total
Received 13 entries, 13 total
Received 5 entries, 18 total
Received 3 entries, 21 total
Received 2 entries, 23 total
Received 1 entries, 24 total
Received 6 entries, 6 total
Received 3 entries, 9

Getting tract data for tahoe_park feature # 3
Getting tract data for tahoe_park feature # 4
Getting tract data for tahoe_park feature # 5
Getting data for census year 2015
Getting tract data for la_riviera feature # 0
Getting tract data for la_riviera feature # 1
Getting tract data for la_riviera feature # 2
Getting tract data for la_riviera feature # 3
Getting tract data for la_riviera feature # 4
Getting tract data for la_riviera feature # 5
Getting tract data for la_riviera feature # 6
Getting tract data for la_riviera feature # 7
Getting tract data for la_riviera feature # 8
Getting tract data for la_riviera feature # 9
Getting tract data for la_riviera feature # 10
Getting tract data for la_riviera feature # 11
Getting tract data for la_riviera feature # 12
Getting tract data for tahoe_park feature # 0
Getting tract data for tahoe_park feature # 1
Getting tract data for tahoe_park feature # 2
Getting tract data for tahoe_park feature # 3
Getting tract data for tahoe_park feature #

|   | year | population | median_income |
|---|------|------------|---------------|
| 0 | 1990 | 15134.640992 | 41111.181818 |
| 1 | 2000 | 14544.594289 | 50303.307692 |
| 2 | 2005 | 4740.151392 | 51793.0 |
| 3 | 2006 | 4870.465149 | 53930.0 |
| 4 | 2007 | 4912.777617 | 56987.0 |
| 5 | 2008 | 4939.303067 | 56984.0 |
| 6 | 2009 | 4963.376852 | 52504.0 |
| 7 | 2010 | 15891.96914 | 59605.642857 |
| 8 | 2011 | 15837.636749 | 57320.461538 |
| 9 | 2012 | 15718.643943 | 56320.153846 |


In [8]:
polygon = shp.shape(1)

df = ts.get_timeseries("tahoe_park", sf3_variables=sf3vars,
                       sf3_variables_1990=sf3vars1990, acs_variables=acsvars,
                       polygons=polygon)
df

|   | year | population | median_income |
|---|------|------------|---------------|
| 0 | 1990 | 7054.755632 | 21734.0 |
| 1 | 2000 | 6957.108138 | 30835.285714 |
| 2 | 2005 | 2178.345313 | 51793.0 |
| 3 | 2006 | 2238.23124 | 53930.0 |
| 4 | 2007 | 2257.676013 | 56987.0 |
| 5 | 2008 | 2269.865832 | 56984.0 |
| 6 | 2009 | 2280.928985 | 52504.0 |
| 7 | 2010 | 6999.967737 | 42597.714286 |
| 8 | 2011 | 6922.338582 | 43395.428571 |
| 9 | 2012 | 7282.330598 | 43660.142857 |
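
Because the 2005 through 2009 rows come from county-level data (see Note 2), it can help to flag or drop them before plotting or computing trends. A minimal pandas sketch, with the year and population values copied from the Tahoe Park output; the `county_level` column name is an assumption introduced here, not something `get_timeseries()` returns:

```python
import pandas as pd

# Year and population values copied from the Tahoe Park timeseries.
df = pd.DataFrame(
    {
        "year": [1990, 2000, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012],
        "population": [
            7054.755632, 6957.108138, 2178.345313, 2238.23124, 2257.676013,
            2269.865832, 2280.928985, 6999.967737, 6922.338582, 7282.330598,
        ],
    }
)

# Flag the years that rely on county-level data (bounds are inclusive).
df["county_level"] = df["year"].between(2005, 2009)

# Keep only the tract-based years for trend analysis.
tract_only = df.loc[~df["county_level"]].reset_index(drop=True)
print(tract_only)
```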
