### Getting Census data from the API
(To reproduce the data below, you'll need to save your Census API key to `../data/census-api-key.txt`. You can obtain a key here: https://api.census.gov/data/key_signup.html)

The dependencies for this script are:
- pandas
- requests
- census
- us

In [1]:
import pandas as pd
import requests
from census import Census
from us import states


In [2]:
# Read the API key, closing the file once it has been read
with open("../data/census-api-key.txt") as key_file:
    api_key = key_file.read().strip()
c = Census(api_key)

Select data categories for gentrification measurement:

In [3]:
# Full API variable list available here https://api.census.gov/data/2016/acs/acs5/variables/
categories = [
    'NAME', # Tract name
    'B01001_001E', # Total population
    'B15002_001E', # Total population 25 and over
    'B19013_001E', # Median household income
    'B25077_001E', # Median home value
    'B15011_001E', # Total population age 25+ years with a bachelor's degree or higher
    'B03002_003E', # Not Hispanic or Latino!!White alone
    'B03002_004E', # Not Hispanic or Latino!!Black or African American alone
    'B02001_004E', # American Indian and Alaska Native alone (table B02001, so not limited to non-Hispanic)
    'B03002_006E', # Not Hispanic or Latino!!Asian alone
    'B03002_007E', # Not Hispanic or Latino!!Native Hawaiian and Other Pacific Islander alone
    'B03002_008E', # Not Hispanic or Latino!!Some other race alone
    'B03002_009E', # Not Hispanic or Latino!!Two or more races
    'B03002_012E', # Hispanic or Latino
]
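
Each variable code appears again later when the results are given readable names, so one option is to keep the codes and names together in a single mapping. The sketch below is an assumption about how that could look, not part of the original notebook: `VARIABLE_NAMES` and `rename_record` are hypothetical names, and only three of the variables are shown.

```python
# Hypothetical mapping from ACS variable codes to readable column names.
VARIABLE_NAMES = {
    "B01001_001E": "total_population",
    "B19013_001E": "median_income",
    "B25077_001E": "median_home_value",
}


def rename_record(record):
    """Return a copy of one API result keyed by readable names.

    Keys without an entry in VARIABLE_NAMES (such as 'state') are kept as-is.
    """
    return {VARIABLE_NAMES.get(key, key): value for key, value in record.items()}


# The list of fields to request can be derived from the same mapping.
fields = ["NAME"] + list(VARIABLE_NAMES)
```

With this layout, adding a variable means editing one dictionary rather than two separate lists.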

Function to query the API for every tract in a county:

In [4]:
def get_acs_data(state_code, county_code, timeperiod, city, metro_area):
    """Fetch the selected ACS 5-year variables for every tract in one county.

    `timeperiod` is the ACS year; `city` and `metro_area` are labels copied
    onto each returned row. Returns a list of dicts, one per tract.
    """
    results = c.acs5.state_county_tract(
        categories,
        state_code,
        county_code,
        Census.ALL,
        year=timeperiod
    )

    return [{
        # Tract GEOID: state code + county code + tract code
        'geoid': res['state'] + res['county'] + res['tract'],
        'name': res['NAME'],
        'total_population': res['B01001_001E'],
        'total_population_25_over': res['B15002_001E'],
        'median_income': res['B19013_001E'],
        'median_home_value': res['B25077_001E'],
        'educational_attainment': res['B15011_001E'],
        'white_alone': res['B03002_003E'],
        'black_alone': res['B03002_004E'],
        'native_alone': res['B02001_004E'],
        'asian_alone': res['B03002_006E'],
        'native_hawaiian_pacific_islander': res['B03002_007E'],
        'some_other_race_alone': res['B03002_008E'],
        'two_or_more': res['B03002_009E'],
        'hispanic_or_latino': res['B03002_012E'],
        'city': city,
        'metro_area': metro_area
    } for res in results]
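
For readers who want to see roughly what such a query looks like without the wrapper, the sketch below builds the URL and query parameters for an equivalent raw request to the ACS 5-year endpoint. It is an illustration under assumptions: `build_tract_query` is a hypothetical helper, the `census` package may construct its request differently, and the county code is assumed to be a three-digit FIPS string.

```python
def build_tract_query(fields, state_code, county_code, year, api_key):
    """Build the URL and query parameters for all tracts in one county.

    Hypothetical helper that follows the Census API query style
    (get=..., for=tract:*, in=state:.. county:..).
    """
    url = f"https://api.census.gov/data/{year}/acs/acs5"
    params = {
        "get": ",".join(fields),
        "for": "tract:*",
        "in": f"state:{state_code} county:{county_code}",
        "key": api_key,
    }
    return url, params


# "YOUR_KEY" is a placeholder, not a working key.
url, params = build_tract_query(["NAME", "B01001_001E"], "11", "001", 2017, "YOUR_KEY")
```

The returned pair could then be passed to `requests.get(url, params=params)` to run the query.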

This script downloads data for five Metropolitan Statistical Areas (MSAs), whose counties are listed in `county_names.csv` in the `data` folder:

- Atlanta-Sandy Springs-Alpharetta, GA (for Atlanta)
- Baltimore-Columbia-Towson, MD (for Baltimore)
- New York-Newark-Jersey City, NY-NJ-PA (for New York City)
- San Francisco-Oakland-Berkeley, CA (for Oakland)
- Washington-Arlington-Alexandria, DC-VA-MD-WV (for Washington, D.C.)

For each MSA, the code below fetches demographic data for every tract in every county within it. (The list of counties was [obtained from the Bureau of Economic Analysis](https://apps.bea.gov/regional/docs/msalist.cfm), under "Counties in Metropolitan Statistical Areas". You can modify `county_names.csv` to gather data for other cities.)

In [5]:
metro_area_counties = pd.read_csv(
    '../data/county_names.csv',
    dtype = {
        "full_county_geocode": str,
        "state_code": str,
        "county_code": str
    }
)

metro_area_counties.head()

| | county_name | full_county_geocode | state_code | county_code | metro_area_name | city | state |
|---|---|---|---|---|---|---|---|
| 0 | District of Columbia, DC | 11001 | 11 | 1 | Washington-Arlington-Alexandria | Washington | DC |
| 1 | Calvert, MD | 24009 | 24 | 9 | Washington-Arlington-Alexandria | Washington | DC |
| 2 | Charles, MD | 24017 | 24 | 17 | Washington-Arlington-Alexandria | Washington | DC |
| 3 | Frederick, MD | 24021 | 24 | 21 | Washington-Arlington-Alexandria | Washington | DC |
| 4 | Montgomery, MD | 24031 | 24 | 31 | Washington-Arlington-Alexandria | Washington | DC |
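
State FIPS codes are two digits and county FIPS codes are three, so a county code of `1` in state `11` corresponds to the full geocode `11001`. If the codes in your own county file have lost their leading zeros, they can be padded before querying. A minimal sketch, assuming the codes arrive as strings or integers; `pad_fips` is a hypothetical helper, not part of the notebook.

```python
def pad_fips(state_code, county_code):
    """Zero-pad state (2 digits) and county (3 digits) FIPS codes.

    Returns the padded state code, the padded county code, and the
    five-digit county geocode formed by joining them.
    """
    state = str(state_code).strip().zfill(2)
    county = str(county_code).strip().zfill(3)
    return state, county, state + county


state, county, geocode = pad_fips("11", "1")
```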


Go through every county and find census tracts for each:

In [6]:
census_data = []
for _, county in metro_area_counties.iterrows():
    print(county["county_name"])

    # year=2017 on the acs5 endpoint selects the 2013-2017 5-year estimates
    census_data += get_acs_data(
        county["state_code"],
        county["county_code"],
        2017,
        county["city"],
        county["metro_area_name"]
    )

census_data = pd.DataFrame(census_data)

District of Columbia, DC
Calvert, MD
Charles, MD
Frederick, MD
Montgomery, MD
Prince George's, MD
Arlington, VA
Clarke, VA
Culpeper, VA
Fauquier, VA
Loudoun, VA
Madison, VA
Rappahannock, VA
Stafford, VA
Warren, VA
Alexandria (Independent City), VA
Fairfax, Fairfax City + Falls Church, VA
Prince William, Manassas + Manassas Park, VA
Spotsylvania + Fredericksburg, VA
Jefferson, WV
Anne Arundel, MD
Baltimore, MD
Carroll, MD
Harford, MD
Howard, MD
Queen Anne's, MD
Baltimore (Independent City), MD
Barrow, GA
Bartow, GA
Butts, GA
Carroll, GA
Cherokee, GA
Clayton, GA
Cobb, GA
Coweta, GA
Dawson, GA
DeKalb, GA
Douglas, GA
Fayette, GA
Forsyth, GA
Fulton, GA
Gwinnett, GA
Haralson, GA
Heard, GA
Henry, GA
Jasper, GA
Lamar, GA
Meriwether, GA
Morgan, GA
Newton, GA
Paulding, GA
Pickens, GA
Pike, GA
Rockdale, GA
Spalding, GA
Walton, GA
Alameda, CA
Contra Costa, CA
Marin, CA
San Francisco, CA
San Mateo, CA
Bergen, NJ
Essex, NJ
Hudson, NJ
Hunterdon, NJ
Middlesex, NJ
Monmouth, NJ
Morris, NJ
Ocean, NJ
Pass
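
The ACS API reports some estimates that cannot be computed, such as a median for a tract with too few observations, as large negative sentinel values like `-666666666` rather than leaving them empty. The sketch below is one hedged way to blank these out before analysis. `clean_estimate` is a hypothetical helper, and treating every negative value as missing is an assumption that suits counts and dollar medians but not every variable.

```python
def clean_estimate(value):
    """Return value as a float, or None if it is missing or a negative sentinel."""
    if value is None:
        return None
    number = float(value)
    # Assumption: the counts and medians used here are never legitimately negative.
    return number if number >= 0 else None


# Illustrative row, not taken from the downloaded data.
row = {"median_income": -666666666.0, "median_home_value": 453200.0}
cleaned = {key: clean_estimate(val) for key, val in row.items()}
```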

In [7]:
census_data.head()

| | geoid | name | total_population | total_population_25_over | median_income | median_home_value | educational_attainment | white_alone | black_alone | native_alone | asian_alone | native_hawaiian_pacific_islander | some_other_race_alone | two_or_more | hispanic_or_latino | city | metro_area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11001002102 | Census Tract 21.02, District of Columbia, Dist... | 5264.0 | 4072.0 | 76836.0 | 453200.0 | 1991.0 | 503.0 | 3786.0 | 82.0 | 38.0 | 0.0 | 0.0 | 94.0 | 807.0 | Washington | Washington-Arlington-Alexandria |
| 1 | 11001002202 | Census Tract 22.02, District of Columbia, Dist... | 3347.0 | 2393.0 | 54817.0 | 464200.0 | 839.0 | 265.0 | 2081.0 | 7.0 | 63.0 | 0.0 | 0.0 | 79.0 | 859.0 | Washington | Washington-Arlington-Alexandria |
| 2 | 11001002302 | Census Tract 23.02, District of Columbia, Dist... | 1708.0 | 1386.0 | 60417.0 | 302800.0 | 797.0 | 685.0 | 764.0 | 0.0 | 67.0 | 0.0 | 52.0 | 67.0 | 73.0 | Washington | Washington-Arlington-Alexandria |
| 3 | 11001003400 | Census Tract 34, District of Columbia, Distric... | 4901.0 | 2210.0 | 88772.0 | 710200.0 | 1298.0 | 1287.0 | 2926.0 | 0.0 | 108.0 | 0.0 | 6.0 | 112.0 | 462.0 | Washington | Washington-Arlington-Alexandria |
| 4 | 11001005800 | Census Tract 58, District of Columbia, Distric... | 3039.0 | 2778.0 | 137259.0 | 510700.0 | 2237.0 | 1851.0 | 240.0 | 0.0 | 605.0 | 0.0 | 0.0 | 106.0 | 237.0 | Washington | Washington-Arlington-Alexandria |
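
Each `geoid` is the concatenation built in `get_acs_data`: a two-digit state code, a three-digit county code, and a six-digit tract code. The sketch below splits one back into its parts, using the first row above as input. `split_geoid` is a hypothetical helper added for illustration.

```python
def split_geoid(geoid):
    """Split an 11-digit tract GEOID into state, county, and tract codes."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return {"state": geoid[:2], "county": geoid[2:5], "tract": geoid[5:]}


parts = split_geoid("11001002102")
# parts == {"state": "11", "county": "001", "tract": "002102"}
```

Keeping the GEOID as a string matters here, since reading it back as an integer would drop the leading zero for states with codes below 10.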


In [8]:
census_data.to_csv("../output/census_tracts.csv", index=False)
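
The saved columns are raw counts, so comparing tracts usually means converting them to shares first. As an illustration only (the notebook does not define these measures), the sketch below computes the share of residents aged 25 and over with a bachelor's degree or higher from two of the saved columns. `share` is a hypothetical helper.

```python
def share(part, whole):
    """Return part / whole, or None when either value is missing or whole is zero."""
    if part is None or not whole:
        return None
    return part / whole


# educational_attainment and total_population_25_over for Census Tract 21.02, DC.
bachelors_share = share(1991.0, 4072.0)
```

Returning `None` for an empty denominator keeps unpopulated tracts from raising a division error or producing a misleading zero.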

---