# Census Data Prep
This notebook reads the zip codes CSV produced by the `01_food_inspections_data_prep` notebook, along with a US Census Data API key that I have stored in the directory above this git repository.  It then queries the US Census API for the median household income of each zip code pulled from the Chicago Food Inspections Data detailed in `01_food_inspections_data_prep`.

To understand how to use the US Census API, I read through the references linked below.  This analysis could be replicated or augmented with additional census data by swapping out or adding to the tables and fields queried.  I indicate where in the code this could be done.

### Disclaimer
As required by census.gov: "This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau."

### References
- US Census Data API Terms of Service: https://www.census.gov/data/developers/about/terms-of-service.html
- API Key Signup: https://api.census.gov/data/key_signup.html
- API User Guide: https://www.census.gov/data/developers/guidance/api-user-guide.html
- American Community Survey (ACS) 5-Year Data: https://www.census.gov/data/developers/data-sets/acs-5year.html
- American Community Survey (ACS) 5-Year Data Variables: https://api.census.gov/data/2017/acs/acs5/variables.html
- Blogpost I Read to Help with Setup: https://towardsdatascience.com/getting-census-data-in-5-easy-steps-a08eeb63995d

### Set Global Seed

In [1]:
SEED = 666

### Imports

In [2]:
import numpy as np
import pandas as pd
import requests
import json

### Read Census Data API Key
Stored in the directory one level up from this git repository.

In [3]:
with open('../../../us_census_api_key.txt') as f:
    # strip() drops the trailing newline so it does not end up inside the request URL
    api_key = f.readline().strip()

### Read Zip Codes from Chicago Food Inspections Data

In [4]:
zips_df = pd.read_csv('../data/Zips.csv')

### Remove NaNs, Cast to String, and Create Comma Separated String of All Zips

In [5]:
zips = zips_df['zip'].values
zips_array = zips[~np.isnan(zips)].astype(np.int64).astype(str)
zips_string = ','.join(zips_array)
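
To make the effect of this cell concrete, here is a small sketch on a toy frame. The values in `toy_zips_df` are hypothetical stand-ins for the contents of `Zips.csv`, and the pandas-only variant is an alternative I am assuming gives the same result, not something the original cell uses.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ../data/Zips.csv: zips load as floats because of the NaN
toy_zips_df = pd.DataFrame({"zip": [60601.0, np.nan, 60614.0]})

# Same steps as the cell above: drop NaNs, cast float -> int -> str, then join
zips = toy_zips_df["zip"].values
zips_array = zips[~np.isnan(zips)].astype(np.int64).astype(str)
zips_string = ",".join(zips_array)

# Equivalent pandas-only variant (an assumption, not the notebook's own code)
zips_string_pandas = ",".join(
    toy_zips_df["zip"].dropna().astype("int64").astype(str)
)

print(zips_string)
```

The intermediate cast to `int64` matters: casting the floats straight to `str` would produce values like `60601.0`.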

### Call the Census.gov API by Passing:
- The API key
- The string of zip codes
- The code representing the field of interest, in this case: B19013_001E for Median Household Income

**Note:** this `api_base` string could be used with a different `api_key` and `zips_string` to pull median household income for other zip codes.  It could also be modified to pull other fields by changing parameters in the string, such as the variable passed at "&get=", where I have passed the Median Household Income variable: B19013_001E.  The `2017/acs/acs5` portion of the path selects the survey year and dataset.

In [6]:
api_base = "https://api.census.gov/data/2017/acs/acs5?key=%s&get=B19013_001E&for=zip%%20code%%20tabulation%%20area:%s"
api_call = api_base % (api_key, zips_string)
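
As a sketch of how the query string could be assembled from named parameters instead of a `%`-formatted template, the snippet below uses the standard library's `urlencode`. The helper name `build_census_url`, the placeholder key `DEMO_KEY`, and the two zip codes are all hypothetical; the endpoint and parameter names are taken from `api_base` above.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from api_base above
CENSUS_ENDPOINT = "https://api.census.gov/data/2017/acs/acs5"


def build_census_url(api_key, zips_string, variable="B19013_001E"):
    """Hypothetical helper: build the same URL shape as api_base % (...)."""
    params = {
        "key": api_key,
        "get": variable,  # swap in another variable code to pull a different field
        "for": "zip code tabulation area:" + zips_string,
    }
    # quote_via=quote encodes spaces as %20; ':' and ',' are left unescaped
    query = urlencode(params, quote_via=quote, safe=":,")
    return CENSUS_ENDPOINT + "?" + query


# DEMO_KEY and the zip codes are placeholders, not real credentials or data
example_url = build_census_url("DEMO_KEY", "60601,60614")
print(example_url)
```

The resulting string can be passed to `requests.get` exactly like `api_call`. Keeping the variable code as an argument makes it easier to see which part of the request changes when pulling a different field.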

In [7]:
response = requests.get(api_call)

### Parse the Response
Note that the first element holds the names of the fields returned, so it is skipped when building the dataframe.

In [8]:
parsed_response = json.loads(response.text)[1:]
median_household_income_df = pd.DataFrame(columns=['median_household_income', 'zip'], data=parsed_response)
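
The shape of the response is easier to see on a small example. The payload below is made up for illustration (the incomes are not real figures), but it follows the header-row-then-data-rows layout described above. The API returns every value as a string, so this sketch also converts income to a number. Treating negative values as missing is an assumption on my part: as I understand it, the ACS uses large negative sentinel values such as -666666666 when an estimate is unavailable, which is worth checking against the Census documentation before relying on it.

```python
import json

import pandas as pd

# Made-up payload in the same layout as response.text; values are illustrative only
sample_text = (
    '[["B19013_001E","zip code tabulation area"],'
    '["50000","60601"],'
    '["-666666666","60602"]]'
)

# First row is the header, the rest are data rows
header, *rows = json.loads(sample_text)
sample_df = pd.DataFrame(rows, columns=["median_household_income", "zip"])

# Values arrive as strings; convert income to numeric
income = pd.to_numeric(sample_df["median_household_income"])

# Assumption: negative values are missing-data sentinels, so replace them with NaN
sample_df["median_household_income"] = income.where(income >= 0)

print(sample_df)
```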

### Write the Dataframe of Median Household Income and Zip Code to CSV

In [9]:
median_household_income_df.to_csv('../data/Census_Features.csv', index=False)
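
One thing to keep in mind when a later notebook reads this CSV back: the `zip` column is written as text, but `pd.read_csv` will infer it as an integer unless told otherwise. The sketch below shows the difference using an in-memory buffer and a hypothetical row. The zip code `02134` is chosen only because its leading zero makes the effect visible; Chicago zip codes do not have one, but the dtype mismatch can still matter when merging on `zip`.

```python
import io

import pandas as pd

# Hypothetical row, written to an in-memory buffer instead of ../data/Census_Features.csv
toy_df = pd.DataFrame({"median_household_income": [50000], "zip": ["02134"]})
buffer = io.StringIO()
toy_df.to_csv(buffer, index=False)

# Default read: zip is inferred as an integer and the leading zero is lost
buffer.seek(0)
default_read = pd.read_csv(buffer)

# Explicit dtype keeps zip as a string, matching what was written
buffer.seek(0)
string_read = pd.read_csv(buffer, dtype={"zip": str})

print(default_read["zip"].iloc[0], string_read["zip"].iloc[0])
```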