In [2]:
import glob
import json
import requests
import pandas as pd
from pprint import pprint

# Census Examples 

This notebook grabs data from the US Census in Python using DataMade's Census API wrapper:
- https://github.com/datamade/census
- https://pypi.org/project/census/

> 💡 Note: You may also want to check out [tidycensus in R](https://walker-data.com/tidycensus/). I might make a notebook later that does the same with that library.

### Step 1 | Get a Census API key and use it in place of the one in the cell below

In [4]:
from us import states # US state abbreviations, and a few other things

# datamade's Census package
from census import Census
c = Census("ce99dfcd9884743b2a3bfcef24bd4eb5e58f6e4a")
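A key pasted into a notebook is easy to share by accident. One alternative is to read it from an environment variable. This is a minimal sketch, not part of the original notebook: the helper `load_census_key` and the variable name `CENSUS_API_KEY` are assumptions chosen for illustration.

```python
import os


def load_census_key(env_var="CENSUS_API_KEY"):
    """Return the API key stored in the environment variable `env_var`.

    Both the helper and the variable name are hypothetical; pick any name you like.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Census API key"
        )
    return key


# Usage in the notebook, once the variable is set in your shell:
# c = Census(load_census_key())
```

Failing loudly when the variable is missing avoids sending requests with an empty key.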

### Step 2 | Figure out what tables you want data from

Use https://censusreporter.org/ to figure out which tables you want. 
- Scroll to the bottom of the page to see the tables. 
- If you already know the table ID, stick that in the "Explore" section to learn more about that table.

In [6]:
TABLE = 'B01003'  # total population

# Here I use data from the 5-year American Community Survey.
# In this Python package that is "c.acs5".
# Check DataMade's documentation for other options like acs1, pl, sf1, etc.
# https://pypi.org/project/census/
for t in c.acs5.tables():
    if TABLE in t['name']:
        pprint(t)
        print("\n")

        variables_url = t['variables']
        response = requests.get(variables_url).json()
        print(f"Variables for table {t['name']}, {t['description']}:")
        variables = pd.DataFrame(response['variables'])
        display(variables)

{'description': 'TOTAL POPULATION',
 'name': 'B01003',
 'universe ': 'TOTAL_POP',
 'variables': 'http://api.census.gov/data/2020/acs/acs5/groups/B01003.json'}


Variables for table B01003, TOTAL POPULATION:


Unnamed: 0,B01003_001E,B01003_001M,B01003_001MA,B01003_001EA
label,Estimate!!Total,Margin of Error!!Total,Annotation of Margin of Error!!Total,Annotation of Estimate!!Total
concept,TOTAL POPULATION,TOTAL POPULATION,TOTAL POPULATION,TOTAL POPULATION
predicateType,int,int,string,string
group,B01003,B01003,B01003,B01003
limit,0,0,0,0
predicateOnly,True,True,True,True
universe,TOTAL_POP,TOTAL_POP,TOTAL_POP,TOTAL_POP
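The loop above uses a substring test (`TABLE in t['name']`), which works for `B01003` but would also pick up lettered variants for IDs such as `B01001` (`B01001A`, `B01001B`, ...). A small sketch of an exact-match lookup follows; `find_tables` is a hypothetical helper, and the sample records only imitate the shape of what `c.acs5.tables()` returns.

```python
def find_tables(tables, table_id, exact=True):
    """Return the table records whose 'name' matches table_id.

    exact=True  -> only the table itself
    exact=False -> the table plus any variants that start with the same ID
    """
    if exact:
        return [t for t in tables if t["name"] == table_id]
    return [t for t in tables if t["name"].startswith(table_id)]


# Sample records shaped like the entries of c.acs5.tables() (trimmed for illustration)
sample = [
    {"name": "B01001", "description": "SEX BY AGE"},
    {"name": "B01001A", "description": "SEX BY AGE (WHITE ALONE)"},
    {"name": "B01003", "description": "TOTAL POPULATION"},
]

print([t["name"] for t in find_tables(sample, "B01001")])               # ['B01001']
print([t["name"] for t in find_tables(sample, "B01001", exact=False)])  # ['B01001', 'B01001A']
```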


In the cell below, I get the population by census tract from the 2019 5-year ACS for five counties in New York State: Kings (`047`), Bronx (`005`), Albany (`001`), Richmond (`085`) and New York (`061`). Note that `001` is Albany County; Queens County is `081`.

If you want the data by a different geography like county, ZIP code or "place" (NYC is a "census place"), see the documentation for the Census package:
https://pypi.org/project/census/

You may want to use `state_place`, `state_county` or some other function to get the data for a different geography.
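As a rough illustration of switching geography, the sketch below asks for county-level totals through `state_county`. The wrapper `county_population` is a hypothetical helper, not part of the package, and it assumes `client` is a `census.Census` instance like `c` above.

```python
def county_population(client, state_fips, county_fips="*", year=2019):
    """Fetch total population (B01003_001E) for counties in one state.

    `client` is assumed to be a census.Census instance; '*' asks for every county.
    """
    rows = client.acs5.state_county(
        ("NAME", "B01003_001E"), state_fips, county_fips, year=year
    )
    # Largest counties first; float() covers estimates returned as numbers or strings
    return sorted(rows, key=lambda r: float(r["B01003_001E"]), reverse=True)


# Usage with the objects defined earlier in this notebook:
# pd.DataFrame(county_population(c, states.NY.fips))
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before spending real API calls.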

In [8]:
year = 2019
state = states.NY
population = pd.DataFrame(
    c.acs5.state_county_tract(
        fields = ['NAME'] + list(variables.columns),
        state_fips = state.fips,
        county_fips = '047,005,001,085,061', 
        tract='*',
        year = year,
        table=[TABLE]))

population['state'] = population.state.apply(lambda x: states.lookup(x).name)
population = population[['state','county', 'tract', 'NAME'] + list(variables.columns)]
population

Unnamed: 0,state,county,tract,NAME,B01003_001E,B01003_001M,B01003_001MA,B01003_001EA
0,New York,047,037300,"Census Tract 373, Kings County, New York",4486.0,451.0,,
1,New York,047,024100,"Census Tract 241, Kings County, New York",2765.0,455.0,,
2,New York,005,039000,"Census Tract 390, Bronx County, New York",2861.0,530.0,,
3,New York,047,096400,"Census Tract 964, Kings County, New York",1978.0,328.0,,
4,New York,005,033202,"Census Tract 332.02, Bronx County, New York",4033.0,474.0,,
...,...,...,...,...,...,...,...,...
1568,New York,085,027900,"Census Tract 279, Richmond County, New York",2219.0,175.0,,
1569,New York,047,056800,"Census Tract 568, Kings County, New York",1455.0,178.0,,
1570,New York,061,017700,"Census Tract 177, New York County, New York",9265.0,913.0,,
1571,New York,047,026800,"Census Tract 268, Kings County, New York",3965.0,473.0,,


Great! We have the data. But I want to replace the headers with more human-readable labels. 

Let's grab those from the variables response we got earlier.

In [9]:
labels = dict(variables.loc['label'])
labels

{'B01003_001E': 'Estimate!!Total',
 'B01003_001M': 'Margin of Error!!Total',
 'B01003_001MA': 'Annotation of Margin of Error!!Total',
 'B01003_001EA': 'Annotation of Estimate!!Total'}

In [10]:
population.rename(columns=labels)

Unnamed: 0,state,county,tract,NAME,Estimate!!Total,Margin of Error!!Total,Annotation of Margin of Error!!Total,Annotation of Estimate!!Total
0,New York,047,037300,"Census Tract 373, Kings County, New York",4486.0,451.0,,
1,New York,047,024100,"Census Tract 241, Kings County, New York",2765.0,455.0,,
2,New York,005,039000,"Census Tract 390, Bronx County, New York",2861.0,530.0,,
3,New York,047,096400,"Census Tract 964, Kings County, New York",1978.0,328.0,,
4,New York,005,033202,"Census Tract 332.02, Bronx County, New York",4033.0,474.0,,
...,...,...,...,...,...,...,...,...
1568,New York,085,027900,"Census Tract 279, Richmond County, New York",2219.0,175.0,,
1569,New York,047,056800,"Census Tract 568, Kings County, New York",1455.0,178.0,,
1570,New York,061,017700,"Census Tract 177, New York County, New York",9265.0,913.0,,
1571,New York,047,026800,"Census Tract 268, Kings County, New York",3965.0,473.0,,
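The Census labels use `!!` as a separator, so the renamed headers are still a little noisy. Here is a sketch of one way to tidy them before renaming; `tidy_label` is a hypothetical helper, and the `": "` separator is just a stylistic choice.

```python
def tidy_label(label):
    """Turn 'Estimate!!Total' into 'Estimate: Total'.

    Trailing colons, which appear in some label vintages (e.g. 'Total:'), are dropped.
    """
    parts = (part.strip().rstrip(":") for part in label.split("!!"))
    return ": ".join(part for part in parts if part)


sample_labels = {
    "B01003_001E": "Estimate!!Total",
    "B01003_001M": "Margin of Error!!Total",
}
tidy = {code: tidy_label(text) for code, text in sample_labels.items()}
print(tidy)
# {'B01003_001E': 'Estimate: Total', 'B01003_001M': 'Margin of Error: Total'}
```

In the notebook this would be applied as `population = population.rename(columns=tidy)`. `rename` returns a new DataFrame, so the result has to be assigned if you want to keep the new headers.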


In [None]:
# TABLE = 'B01003' -> change this to a table for income, race, etc., then merge the resulting tables.
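The merge step could look something like the sketch below, which combines the raw API rows from two tables on their shared geography columns. `merge_rows` and `GEO_KEYS` are hypothetical names, `B19013_001E` (median household income) is only one possible second variable, and the income value is made up for illustration.

```python
GEO_KEYS = ("state", "county", "tract")


def merge_rows(*tables, keys=GEO_KEYS):
    """Combine rows from several tables that describe the same geography.

    Rows are matched on `keys`; a geography missing from one table is still kept
    (an outer join), just without that table's columns.
    """
    merged = {}
    for rows in tables:
        for row in rows:
            geo = tuple(row[k] for k in keys)
            merged.setdefault(geo, {}).update(row)
    return list(merged.values())


# Rows shaped like the API responses; the income figure is invented
pop_rows = [{"state": "36", "county": "047", "tract": "037300", "B01003_001E": 4486.0}]
income_rows = [{"state": "36", "county": "047", "tract": "037300", "B19013_001E": 50000.0}]

combined = merge_rows(pop_rows, income_rows)
print(combined)
```

The result can go straight into `pd.DataFrame(combined)`. If both tables are already DataFrames, `population.merge(income, on=["state", "county", "tract"])` is the pandas equivalent.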

# Hope that helps!