Goal: Merge several ACS variables of interest by zip code to one data frame. The variables we chose are:

* DP05_0032PE: Percent of nonwhite residents
* DP03_0062E: Median household income
* DP03_0120PE: Percent of households with an 18 year-old child or younger below the poverty line
* DP02_0066PE: Percent high school graduates
* DP02_0067PE: Percent college graduates

In [42]:
import numpy as np
import pandas as pd
import requests

The below function allows us to specify a variable and get data back at the zip code level

In [43]:
def get_ma_data(variable):
  api_base = "https://api.census.gov/data/2015/acs5/profile?for=zip%20code%20tabulation%20area:*&get=NAME,"
  return requests.get(api_base + variable).json()

We only needed data for the zip codes in our dataset, so we pulled a list from our school address file. Once we had the data from our API request, we passed it through the list of zip codes and kept only the relevant columns and rows. 

In [44]:
ma_zip_codes = pd.read_csv("school_addresses.csv", encoding = "ISO-8859-1", converters={'Zip Code': str})
zip_list = list(set(ma_zip_codes['Zip Code'].tolist()))

In [55]:
def create_df(data, var):
    df = []
    for i in data:
        if i[2] in zip_list:
            df.append(i)
    frame = pd.DataFrame(df, columns = ["Name", var, "Zip Code"])
    return frame

In [61]:
perc_white = get_ma_data("DP05_0032PE")
perc_white_df = create_df(perc_white, "Zip Perc White")

In [63]:
median_income = get_ma_data("DP03_0062E")
median_income_df = create_df(median_income, "Median Income")

In [65]:
perc_poverty_kids = get_ma_data("DP03_0120PE")
percent_poverty_kids_df = create_df(hs_grad, "Percent Poverty")

In [66]:
hs_grad = get_ma_data("DP02_0066PE")
hs_grad_df = create_df(hs_grad, "% HS Graduates")

In [67]:
college_grad = get_ma_data("DP02_0067PE")
college_grad_df = create_df(college_grad, "% College Graduates")

We then merged the dataframes together before exporting our master file.

In [70]:
acs_all = pd.merge(median_income_df, percent_poverty_kids_df)
acs_all = pd.merge(acs_all, hs_grad_df)
acs_all = pd.merge(acs_all, college_grad_df)
acs_all = pd.merge(acs_all, perc_white_df)
acs_all.head()

Unnamed: 0,Name,Median Income,Zip Code,Percent Poverty,% HS Graduates,% College Graduates,Zip Perc White
0,ZCTA5 01001,60161,1001,91.5,91.5,28.0,91.8
1,ZCTA5 01002,50540,1002,95.6,95.6,68.0,78.4
2,ZCTA5 01005,68786,1005,95.7,95.7,24.0,97.3
3,ZCTA5 01007,76881,1007,92.4,92.4,42.0,94.9
4,ZCTA5 01010,87961,1010,94.0,94.0,41.6,100.0


In [41]:
acs_all.to_csv("acs_indicators_by_zip.csv")