# Data Dive 2: Loading and Summarizing Data
### Part 1: Tapping the Census API for Valuable Insights

The U.S. Census API is an extremely rich data source by itself, but also one that can enhance and provide insights into other datasets. In the first part of today's exercise, we'll put 

Helpful links:
* [List](https://api.census.gov/data/2016/acs/acs5/variables.html) of census variables provided by the API. 
* [Sample queries](https://api.census.gov/data/2016/acs/acs5/examples.html)

Today we'll be using the 2016 ACS 5-year sample. The details of the different ACS samples are beyond the scope of this class, but suffice it to say that these estimates are built from five years of data collected by the ongoing ACS. 

The Census API, like many others, requires a *developer key* for regular use. You can sign up for one [here](). 
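
As a minimal sketch of how a key might be attached to a request: the Census API accepts the key as a `key` query parameter. The helper `build_params` and the environment variable name `CENSUS_API_KEY` are hypothetical names chosen for this example, not part of the original notebook.

```python
import os


def build_params(variables, api_key=None):
    """Build query parameters for a county-level request.

    Hypothetical helper: adds the `key` parameter only when a key is supplied.
    """
    params = {'get': ','.join(variables),
              'for': 'county:*',
              'in': 'state:*'}
    if api_key:
        params['key'] = api_key
    return params


# CENSUS_API_KEY is an assumed environment variable name; if it is unset,
# the request parameters simply omit the key.
params = build_params(['NAME', 'B01001_001E', 'B19013_001E'],
                      os.environ.get('CENSUS_API_KEY'))
```

Reading the key from the environment keeps it out of the notebook itself, which matters if the notebook is shared.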

In [1]:
import pandas as pd
import requests


In [2]:
url = 'https://api.census.gov/data/2016/acs/acs5?'
params = {'get' : 'NAME,B01001_001E,B19013_001E',
          'for' : 'county:*',
          'in' : 'state:*'}

r = requests.get(url, params=params)
print(r.url)

https://api.census.gov/data/2016/acs/acs5?get=NAME%2CB01001_001E%2CB19013_001E&for=county%3A%2A&in=state%3A%2A


Load our results into a data frame.

In [12]:
# The first row of the response holds the column headers; the remaining rows are the data.
data = r.json()
census_df = pd.DataFrame(data[1:], columns=data[0])
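
To make the shape of the response concrete, here is a small offline sketch. The `payload` list is a hand-built stand-in for `r.json()`, using values that appear in the results later in this notebook. The column order (requested variables first, then the geography codes) is an assumption about how the API lays out its rows.

```python
import pandas as pd

# Hand-built stand-in for r.json(): a header row followed by data rows.
# Every value, including the numbers, is a string.
payload = [
    ["NAME", "B01001_001E", "B19013_001E", "state", "county"],
    ["Loudoun County, Virginia", "362435", "125672", "51", "107"],
    ["Howard County, Maryland", "308447", "113800", "24", "027"],
]

# Split the header row from the data rows and build the frame.
header, *rows = payload
sample_df = pd.DataFrame(rows, columns=header)
print(sample_df)
```

Because the values are strings, the numeric columns need an explicit conversion before sorting or arithmetic.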


In [6]:

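# Combine the state and county codes into a single county identifier.
census_df["County Number"] = census_df.state.astype(str) + census_df.county.astype(str) 

# Clean things up a bit by renaming our columns, indexing by county name,
# and dropping the separate state and county code columns.
census_df = (census_df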
             .rename(columns={'NAME' : 'County Name',
                              'B01001_001E' : 'Total Population',
                              'B19013_001E' : 'Median Household Income',
                             })
             .set_index('County Name')
             .drop(columns=['state', 'county']))

# The API returns every value as a string, so convert the numeric columns to integers.
for col in ['Median Household Income', 'Total Population']:
    census_df[col] = census_df[col].astype(int)

census_df.sort_values(by='Median Household Income', ascending=False).head(10)

County Name,Total Population,Median Household Income,County Number
"Loudoun County, Virginia",362435,125672,51107
"Falls Church city, Virginia",13597,115244,51610
"Fairfax County, Virginia",1132887,114329,51059
"Howard County, Maryland",308447,113800,24027
"Arlington County, Virginia",226092,108706,51013
"Hunterdon County, New Jersey",125708,108177,34019
"Los Alamos County, New Mexico",17895,105902,35028
"Douglas County, Colorado",314238,105759,8035
"Fairfax city, Virginia",23620,104065,51600
"Morris County, New Jersey",498215,102798,34027
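
In the table above, Douglas County, Colorado appears with County Number 8035, although the five-digit county FIPS code is 08035. The leading zero appears to have been dropped at some point, which commonly happens when a code column is parsed as an integer. A minimal sketch of restoring the padding, using a small illustrative frame (`fips` is a hypothetical name) rather than the real results:

```python
import pandas as pd

# Illustrative state and county codes with their leading zeros already lost.
fips = pd.DataFrame({'state': ['51', '8', '6'],
                     'county': ['107', '35', '37']})

# State codes are two digits and county codes are three, so pad each part
# before concatenating them into a five-character identifier.
fips['County Number'] = fips['state'].str.zfill(2) + fips['county'].str.zfill(3)
print(fips['County Number'].tolist())
```

Keeping the identifier as a zero-padded string makes it safer to join against other datasets keyed on FIPS codes.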


In [7]:
census_df.sort_values(by='Total Population', ascending=False).head(10)

County Name,Total Population,Median Household Income,County Number
"Los Angeles County, California",10057155,57952,6037
"Cook County, Illinois",5227575,56902,17031
"Harris County, Texas",4434257,55584,48201
"Maricopa County, Arizona",4088549,55676,4013
"San Diego County, California",3253356,66529,6073
"Orange County, California",3132211,78145,6059
"Miami-Dade County, Florida",2664418,44224,12086
"Kings County, New York",2606852,50640,36047
"Dallas County, Texas",2513054,51411,48113
"Riverside County, California",2323892,57972,6065


In [11]:
# census_df.to_csv('../../../web/www/data/census_counties.csv')
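
If the results are written to CSV as in the commented-out line above, reading them back with default settings will parse the county identifier as an integer. A short sketch of a round trip that keeps it as text, using an in-memory buffer in place of a real file (the sample row is illustrative):

```python
import io

import pandas as pd

# In-memory stand-in for a saved CSV file.
csv_text = ('County Name,Total Population,County Number\n'
            '"Douglas County, Colorado",314238,08035\n')

# Passing dtype for the identifier column keeps it as a string,
# so the leading zero survives the round trip.
reloaded = pd.read_csv(io.StringIO(csv_text),
                       dtype={'County Number': str},
                       index_col='County Name')
print(reloaded)
```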