## Census ACS 2009 Through 2015 5-Year Data

"This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau."  

data source:  http://www.census.gov/data/developers/data-sets/acs-5year.html  

In [1]:
import pandas as pd
import numpy as np
import os
import sys

version = ".".join(map(str, sys.version_info[:3]))
print('python version ', version)
print('numpy version ', np.__version__)
print('pandas version ',pd.__version__)

python version  3.5.2
numpy version  1.11.1
pandas version  0.18.1


Census APIs that are currently available  
http://www.census.gov/data/developers/data-sets.html  
American Community Survey 5-Year Data (2009-2015)  
http://www.census.gov/data/developers/data-sets/acs-5year.html  

## Using the Python census library (not used)
* Currently supports years up to 2014
* If no year is specified, it defaults to 2013  
https://pypi.python.org/pypi/census

In [2]:
from census import Census
from census import __version__ as census__version__
from us import states

print('census library version ', census__version__)

census library version  0.8.1


In [3]:
# Read the Census API key from a file outside of the
# local git repository; the path is taken from CENSUS_KEY_PATH
api_key_filepath = os.environ['CENSUS_KEY_PATH']
with open(api_key_filepath, 'r') as fh:
    api_key = fh.read().strip()

In [4]:
c = Census(api_key)
c.acs5.get(('NAME', 'B25034_010E'),
            {'for': 'state:{}'.format(states.CA.fips)}, year=2014)

[{'B25034_010E': '1296802', 'NAME': 'California', 'state': '06'}]

## Newer census APIs using requests library
ACS 5 year  
http://www.census.gov/data/developers/data-sets/acs-5year.html  
Python requests library quick start guide  
http://docs.python-requests.org/en/master/user/quickstart/

In [5]:
import requests
print('requests version ', requests.__version__)

requests version  2.11.1


In [6]:
year = '2015'  # 5-year estimates covering 2011 through 2015
census_api_url = "http://api.census.gov/data/" + year + "/acs5"
# The API expects 'get' as a comma-separated string and 'for' as 'geography:value'
#payload = {'get':'NAME,B05003I_003E', 'for':'state:*', 'key':api_key}
payload = {'get':'B05003I_003E', 'for':'county:*', 'key':api_key}
r = requests.get(census_api_url, params=payload)
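
The Census Data API documents its query string as a comma-separated `get` list plus a `for` clause of the form `geography:value`. The following is a minimal sketch of how such a URL comes together, using only the standard library. The helper name `build_acs5_url` is hypothetical and is not part of requests or the census library; the API key is left out of the example call.

```python
from urllib.parse import urlencode

def build_acs5_url(year, variables, geography, key=None):
    """Build an ACS 5-year query URL (hypothetical helper, for illustration)."""
    params = {
        'get': ','.join(variables),  # variables are sent comma-separated
        'for': geography,            # e.g. 'county:*' or 'state:06'
    }
    if key is not None:
        params['key'] = key
    # Leave ',', ':' and '*' unescaped so the URL stays readable
    query = urlencode(params, safe=',:*')
    return 'http://api.census.gov/data/{}/acs5?{}'.format(year, query)

url = build_acs5_url('2015', ['NAME', 'B05003I_003E'], 'county:*')
print(url)
```

Printing the URL before sending a request is a cheap way to confirm that the parameters are encoded as intended.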

In [7]:
# The response body is a JSON list of lists, available as a string in r.text
# The first row contains the column headers
rows = r.text.split(',\n')
print('r.text type is ', type(r.text))
print('num_rows', len(rows))

r.text type is  <class 'str'>
num_rows 3221


In [8]:
rows[0:3]

['[["B05003I_003E","state","county"]',
 '["261","01","001"]',
 '["1505","01","003"]']

In [9]:
# Convert each row from a string to an actual list

# Strip characters from string and split
# on commas
def str_list2elements(s):
    s = s.replace('[','')
    s = s.replace(']','')
    s = s.replace('"','')
    elements = s.split(',')
    return elements

rows2 = [str_list2elements(s) for s in rows]

In [10]:
rows2[0:3]

[['B05003I_003E', 'state', 'county'],
 ['261', '01', '001'],
 ['1505', '01', '003']]
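
Because the response body is a JSON array of arrays, the split-and-strip steps above can also be handed to a JSON parser. A parser also copes with values that contain commas, such as county names in the `NAME` field. This is a minimal sketch that uses a small hand-written sample in place of `r.text`; with a live response, `r.json()` from the requests library should give the same structure.

```python
import json

# Hand-written sample mimicking the first rows of the API response
sample_text = ('[["B05003I_003E","state","county"],\n'
               '["261","01","001"],\n'
               '["1505","01","003"]]')

parsed = json.loads(sample_text)  # list of lists of strings
header, data = parsed[0], parsed[1:]
print(header)
print(data)
```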

In [11]:
# Pop the zeroth element of rows2
columns = rows2.pop(0)
# Construct a DataFrame from rows2
acs5_09to15_df = pd.DataFrame(rows2)
acs5_09to15_df.columns = columns
print('acs5_09to15_df (num_rows,num_cols) ', acs5_09to15_df.shape)
acs5_09to15_df.head(5)

acs5_09to15_df (num_rows,num_cols)  (3220, 3)


| | B05003I_003E | state | county |
|---|---|---|---|
| 0 | 261 | 01 | 001 |
| 1 | 1505 | 01 | 003 |
| 2 | 342 | 01 | 005 |
| 3 | 62 | 01 | 007 |
| 4 | 959 | 01 | 009 |


In [12]:
# GEOID is the state FIPS code followed by the county FIPS code
acs5_09to15_df['GEOID'] = acs5_09to15_df['state'] + acs5_09to15_df['county']
acs5_09to15_df.head(3)

| | B05003I_003E | state | county | GEOID |
|---|---|---|---|---|
| 0 | 261 | 01 | 001 | 01001 |
| 1 | 1505 | 01 | 003 | 01003 |
| 2 | 342 | 01 | 005 | 01005 |
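
A county GEOID is five characters long: a two-digit state FIPS code followed by a three-digit county FIPS code. The concatenation above relies on both columns being zero-padded strings. If the codes were ever loaded as integers, the padding would need to be restored first. A sketch with a hypothetical helper, `make_geoid`:

```python
def make_geoid(state_fips, county_fips):
    """Return a 5-character county GEOID from state and county FIPS codes."""
    # zfill restores leading zeros that are lost when codes are stored as ints
    return str(state_fips).zfill(2) + str(county_fips).zfill(3)

print(make_geoid('01', '001'))
print(make_geoid(1, 3))
```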


In [15]:
filename_out = '../output/census_acs5_09to15_population_by_county.csv'
acs5_09to15_df.columns = ['B05003I_003E','STATE_FIPS','COUNTY_FIPS','GEOID']
# Write all four renamed columns; the DataFrame index is not saved
acs5_09to15_df.to_csv(filename_out, index=False)

In [16]:
# Test loading of file
test_df = pd.read_csv(filename_out,
                      dtype={'B05003I_003E':int,
                             'STATE_FIPS':str,
                             'COUNTY_FIPS':str,
                             'GEOID':str})
print('(num_rows,num_cols) ', test_df.shape)
test_df.head(3)

(num_rows,num_cols)  (3220, 4)


| | B05003I_003E | STATE_FIPS | COUNTY_FIPS | GEOID |
|---|---|---|---|---|
| 0 | 261 | 01 | 001 | 01001 |
| 1 | 1505 | 01 | 003 | 01003 |
| 2 | 342 | 01 | 005 | 01005 |
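
One more sanity check on the written file is that every GEOID equals its state and county codes joined together. The following is a minimal sketch using the standard `csv` module on an in-memory sample with the same header. Pointing `csv.DictReader` at an open handle for `filename_out` is the assumed real usage.

```python
import csv
import io

# In-memory stand-in for the CSV file written above
sample_csv = io.StringIO(
    'B05003I_003E,STATE_FIPS,COUNTY_FIPS,GEOID\n'
    '261,01,001,01001\n'
    '1505,01,003,01003\n'
)

# csv returns every field as a string, so leading zeros are preserved
bad_rows = [row for row in csv.DictReader(sample_csv)
            if row['GEOID'] != row['STATE_FIPS'] + row['COUNTY_FIPS']]
print('inconsistent rows:', len(bad_rows))
```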
