### Getting Census data from the API
(To reproduce the data below, you'll need to save your Census API key to `../data/census-api-key.txt`. You can obtain a key here: https://api.census.gov/data/key_signup.html)

In [10]:
import time
import pandas as pd
import requests
from census import Census
from us import states

In [2]:
# Read the key inside a context manager so the file handle is closed promptly
with open("../data/census-api-key.txt") as f:
    api_key = f.read().strip()
c = Census(api_key)

Select data categories for gentrification measurement:

In [3]:
# Full API variable list: https://api.census.gov/data/2016/acs/acs5/variables/
categories = [
    'NAME',         # Tract name, e.g. "Census Tract 301.01, Callahan County, Texas"
    'B01001_001E',  # Total population
    'B15002_001E',  # Total population 25 and over
    'B19013_001E',  # Median household income
    'B25077_001E',  # Median home value
    'B15011_001E',  # Total population age 25+ years with a bachelor's degree or higher
    'B03002_003E',  # Not Hispanic or Latino!!White alone
    'B03002_004E',  # Not Hispanic or Latino!!Black or African American alone
    'B02001_004E',  # American Indian and Alaska Native alone (from table B02001, so not limited to non-Hispanic residents like the B03002 rows)
    'B03002_006E',  # Not Hispanic or Latino!!Asian alone
    'B03002_007E',  # Not Hispanic or Latino!!Native Hawaiian and Other Pacific Islander alone
    'B03002_008E',  # Not Hispanic or Latino!!Some other race alone
    'B03002_009E',  # Not Hispanic or Latino!!Two or more races
    'B03002_012E',  # Hispanic or Latino
]

Function to request the selected variables from the API for every tract in one county:

In [4]:
def get_acs_data(state_code, county_code, timeperiod, metro_area):
    """Fetch the selected ACS 5-year variables for every tract in one county."""
    results = c.acs5.state_county_tract(
        categories,
        state_code,
        county_code,
        Census.ALL,
        year=timeperiod,
    )

    # Rename the raw variable codes to readable column names
    return [{
        'geoid': res['state'] + res['county'] + res['tract'],
        'name': res['NAME'],
        'total_population': res['B01001_001E'],
        'total_population_25_over': res['B15002_001E'],
        'median_income': res['B19013_001E'],
        'median_home_value': res['B25077_001E'],
        'educational_attainment': res['B15011_001E'],
        'white_alone': res['B03002_003E'],
        'black_alone': res['B03002_004E'],
        'native_alone': res['B02001_004E'],
        'asian_alone': res['B03002_006E'],
        'native_hawaiian_pacific_islander': res['B03002_007E'],
        'some_other_race_alone': res['B03002_008E'],
        'two_or_more': res['B03002_009E'],
        'hispanic_or_latino': res['B03002_012E'],
        'metro_area': metro_area,
    } for res in results]
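
For orientation, each call to `state_county_tract` amounts to one HTTP request per county. The sketch below builds (but does not send) what such a request might look like, assuming the public ACS 5-year endpoint layout `https://api.census.gov/data/<year>/acs/acs5`. `build_tract_query` is a hypothetical helper for illustration, not part of the `census` package, whose internals may differ.

```python
from urllib.parse import urlencode


def build_tract_query(variables, state_code, county_code, year, api_key):
    """Return a URL asking for all tracts in one county (hypothetical helper)."""
    base = f"https://api.census.gov/data/{year}/acs/acs5"
    params = {
        "get": ",".join(variables),                        # comma-separated variable IDs
        "for": "tract:*",                                  # every tract...
        "in": f"state:{state_code} county:{county_code}",  # ...within this county
        "key": api_key,
    }
    # Keep ':', ',' and '*' readable instead of percent-encoding them
    return f"{base}?{urlencode(params, safe=':,*')}"


url = build_tract_query(["NAME", "B01001_001E"], "48", "059", 2019, "YOUR_KEY")
print(url)
```

Printing the URL for one county is a quick way to check in a browser that a variable code or county code is valid before running the full loop.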

This analysis focuses on 50 Metropolitan Statistical Areas (MSAs). For each MSA, the code below fetches demographic data for every tract in every county within it. (The list of counties was [obtained from the Bureau of Economic Analysis](https://apps.bea.gov/regional/docs/msalist.cfm), under "Counties in Metropolitan Statistical Areas".)

In [5]:
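# The leading spaces in some keys match the column names as they appear in the CSV header
metro_area_counties = pd.read_csv(
    "../data/county_names_bea_list.csv",
    dtype={
        "metro_area_code": str,
        " county_code": str,  # read as text so FIPS codes keep their leading zeros
    },
).rename(
    columns={
        " county_code": "full_geo_code",
        " county_name": "county_name",
        " metro_area_name": "metro_area",
    }
)
metro_area_counties.head()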

| | metro_area_code | metro_area | full_geo_code | county_name |
|---|---|---|---|---|
| 0 | 10180 | Abilene, TX (Metropolitan Statistical Area) | 48059 | Callahan, TX |
| 1 | 10180 | Abilene, TX (Metropolitan Statistical Area) | 48253 | Jones, TX |
| 2 | 10180 | Abilene, TX (Metropolitan Statistical Area) | 48441 | Taylor, TX |
| 3 | 10420 | Akron, OH (Metropolitan Statistical Area) | 39133 | Portage, OH |
| 4 | 10420 | Akron, OH (Metropolitan Statistical Area) | 39153 | Summit, OH |


In [6]:
metro_area_counties.columns

Index(['metro_area_code', 'metro_area', 'full_geo_code', 'county_name'], dtype='object')

In [7]:
# A 5-digit county FIPS code is the 2-digit state code followed by the 3-digit county code
metro_area_counties["state_code"] = metro_area_counties["full_geo_code"].str[:2]
metro_area_counties["county_code"] = metro_area_counties["full_geo_code"].str[2:5]
metro_area_counties.head()

| | metro_area_code | metro_area | full_geo_code | county_name | state_code | county_code |
|---|---|---|---|---|---|---|
| 0 | 10180 | Abilene, TX (Metropolitan Statistical Area) | 48059 | Callahan, TX | 48 | 059 |
| 1 | 10180 | Abilene, TX (Metropolitan Statistical Area) | 48253 | Jones, TX | 48 | 253 |
| 2 | 10180 | Abilene, TX (Metropolitan Statistical Area) | 48441 | Taylor, TX | 48 | 441 |
| 3 | 10420 | Akron, OH (Metropolitan Statistical Area) | 39133 | Portage, OH | 39 | 133 |
| 4 | 10420 | Akron, OH (Metropolitan Statistical Area) | 39153 | Summit, OH | 39 | 153 |
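
The slicing above only works if every code keeps its leading zeros, which is why the county code is read as a string. As a defensive sketch (not part of the original notebook), the hypothetical helper `split_county_fips` pads each code to five digits before splitting, so a value that was parsed as the integer `1001` still yields state `01` and county `001`.

```python
import pandas as pd


def split_county_fips(codes):
    """Split 5-digit county FIPS codes into state and county parts."""
    # Cast to string and restore any leading zeros lost to integer parsing
    padded = codes.astype(str).str.zfill(5)
    return pd.DataFrame({
        "state_code": padded.str[:2],   # first two digits: state FIPS
        "county_code": padded.str[2:],  # last three digits: county FIPS
    })


sample = pd.Series([48059, "39133", 1001])
parts = split_county_fips(sample)
print(parts)
```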


Go through every county and fetch the data for each of its census tracts:

In [9]:
census_data = []
for _, county in metro_area_counties.iterrows():
    print(county["county_name"])  # progress indicator

    # One API request per county; the rows for all counties accumulate in one list
    census_data += get_acs_data(
        county["state_code"],
        county["county_code"],
        2019,
        county["metro_area"],
    )

census_data = pd.DataFrame(census_data)

Callahan, TX
Jones, TX
Taylor, TX
Portage, OH
Summit, OH
Dougherty, GA
Lee, GA
Terrell, GA
Worth, GA
Linn, OR
Albany, NY
Rensselaer, NY
Saratoga, NY
Schenectady, NY
Schoharie, NY
Bernalillo, NM
Sandoval, NM
Torrance, NM
Valencia, NM
Grant, LA
Rapides, LA
Warren, NJ
Carbon, PA
Lehigh, PA
Northampton, PA
Blair, PA
Armstrong, TX
Carson, TX
Oldham, TX
Potter, TX
Randall, TX
Boone, IA
Story, IA
Anchorage Municipality, AK
Matanuska-Susitna Borough, AK
Washtenaw, MI
Calhoun, AL
Calumet, WI
Outagamie, WI
Buncombe, NC
Haywood, NC
Henderson, NC
Madison, NC
Clarke, GA
Madison, GA
Oconee, GA
Oglethorpe, GA
Barrow, GA
Bartow, GA
Butts, GA
Carroll, GA
Cherokee, GA
Clayton, GA
Cobb, GA
Coweta, GA
Dawson, GA
DeKalb, GA
Douglas, GA
Fayette, GA
Forsyth, GA
Fulton, GA
Gwinnett, GA
Haralson, GA
Heard, GA
Henry, GA
Jasper, GA
Lamar, GA
Meriwether, GA
Morgan, GA
Newton, GA
Paulding, GA
Pickens, GA
Pike, GA
Rockdale, GA
Spalding, GA
Walton, GA
Atlantic, NJ
Lee, AL
Burke, GA
Columbia, GA
Lincoln, GA
McDuffie,

Merced, CA
Broward, FL
Miami-Dade, FL
Palm Beach, FL
LaPorte, IN
Midland, MI
Martin, TX
Midland, TX
Milwaukee, WI
Ozaukee, WI
Washington, WI
Waukesha, WI
Anoka, MN
Carver, MN
Chisago, MN
Dakota, MN
Hennepin, MN
Isanti, MN
Le Sueur, MN
Mille Lacs, MN
Ramsey, MN
Scott, MN
Sherburne, MN
Washington, MN
Wright, MN
Pierce, WI
St. Croix, WI
Missoula, MT
Mobile, AL
Washington, AL
Stanislaus, CA
Morehouse, LA
Ouachita, LA
Union, LA
Monroe, MI
Autauga, AL
Elmore, AL
Lowndes, AL
Montgomery, AL
Monongalia, WV
Preston, WV
Grainger, TN
Hamblen, TN
Jefferson, TN
Skagit, WA
Delaware, IN
Muskegon, MI
Brunswick, NC
Horry, SC
Napa, CA
Collier, FL
Cannon, TN
Cheatham, TN
Davidson, TN
Dickson, TN
Macon, TN
Maury, TN
Robertson, TN
Rutherford, TN
Smith, TN
Sumner, TN
Trousdale, TN
Williamson, TN
Wilson, TN
Craven, NC
Jones, NC
Pamlico, NC
New Haven, CT
Jefferson, LA
Orleans, LA
Plaquemines, LA
St. Bernard, LA
St. Charles, LA
St. James, LA
St. John the Baptist, LA
St. Tammany, LA
Bergen, NJ
Essex, NJ
Hudson, 
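
The loop above makes one request per county, so a single transient network error partway through would discard everything fetched so far. One possible safeguard, sketched here as an assumption rather than something the notebook does, is a small retry wrapper. `fetch_with_retry` and `flaky_fetch` are hypothetical names; the latter stands in for `get_acs_data` so the example runs without network access.

```python
import time


def fetch_with_retry(fetch, *args, attempts=3, wait_seconds=1.0):
    """Call fetch(*args), retrying with a growing pause on any exception."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(*args)
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(wait_seconds * attempt)  # wait longer after each failure


# Stand-in for get_acs_data that fails twice, then succeeds
calls = {"count": 0}


def flaky_fetch(state_code, county_code):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return [{"geoid": state_code + county_code + "030101"}]


rows = fetch_with_retry(flaky_fetch, "48", "059", wait_seconds=0.01)
print(rows)
```

In the loop, the call would become `fetch_with_retry(get_acs_data, county["state_code"], county["county_code"], 2019, county["metro_area"])`.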

In [11]:
census_data.head()

| | geoid | name | total_population | total_population_25_over | median_income | median_home_value | educational_attainment | white_alone | black_alone | native_alone | asian_alone | native_hawaiian_pacific_islander | some_other_race_alone | two_or_more | hispanic_or_latino | metro_area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48059030101 | Census Tract 301.01, Callahan County, Texas | 4609.0 | 2848.0 | 49329.0 | 113100.0 | 571.0 | 4063.0 | 61.0 | 0.0 | 20.0 | 0.0 | 0.0 | 204.0 | 261.0 | Abilene, TX (Metropolitan Statistical Area) |
| 1 | 48059030102 | Census Tract 301.02, Callahan County, Texas | 4223.0 | 3284.0 | 48281.0 | 82500.0 | 585.0 | 3693.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 154.0 | 376.0 | Abilene, TX (Metropolitan Statistical Area) |
| 2 | 48059030200 | Census Tract 302, Callahan County, Texas | 4938.0 | 3505.0 | 39353.0 | 84400.0 | 683.0 | 4144.0 | 5.0 | 11.0 | 19.0 | 0.0 | 0.0 | 96.0 | 663.0 | Abilene, TX (Metropolitan Statistical Area) |
| 3 | 48253020102 | Census Tract 201.02, Jones County, Texas | 6959.0 | 5990.0 | -666666666.0 | -666666666.0 | 121.0 | 1658.0 | 2446.0 | 148.0 | 66.0 | 0.0 | 9.0 | 437.0 | 2318.0 | Abilene, TX (Metropolitan Statistical Area) |
| 4 | 48253020500 | Census Tract 205, Jones County, Texas | 4113.0 | 2784.0 | 54195.0 | 101700.0 | 533.0 | 3608.0 | 3.0 | 4.0 | 0.0 | 0.0 | 7.0 | 127.0 | 364.0 | Abilene, TX (Metropolitan Statistical Area) |
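
Row 3 above shows `-666666666.0` for both medians. That is a Census Bureau annotation code for an estimate that could not be computed, not a real dollar amount, so it would distort any average or comparison. Below is a minimal sketch of one way to handle it, assuming that treating the code as missing suits the analysis. `mask_missing` is a hypothetical helper, and the code list covers only the value seen above; other ACS annotation codes may also occur.

```python
import numpy as np
import pandas as pd

# Annotation code observed in the output above (assumed list, not exhaustive)
MISSING_CODES = [-666666666]


def mask_missing(df, columns):
    """Return a copy of df with Census annotation codes replaced by NaN."""
    cleaned = df.copy()
    cleaned[columns] = cleaned[columns].replace(MISSING_CODES, np.nan)
    return cleaned


sample = pd.DataFrame({
    "geoid": ["48059030101", "48253020102"],
    "median_income": [49329.0, -666666666.0],
    "median_home_value": [113100.0, -666666666.0],
})
cleaned = mask_missing(sample, ["median_income", "median_home_value"])
print(cleaned)
```

Note that `census_data.info()` reports no nulls for these columns, so the placeholder values would otherwise pass through unnoticed.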


In [15]:
census_data.to_csv("../output/census_tracts.csv", index = False)

In [16]:
len(census_data)

60341

In [18]:
census_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60341 entries, 0 to 60340
Data columns (total 16 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   geoid                             60341 non-null  object 
 1   name                              60341 non-null  object 
 2   total_population                  60341 non-null  float64
 3   total_population_25_over          60341 non-null  float64
 4   median_income                     60341 non-null  float64
 5   median_home_value                 60341 non-null  float64
 6   educational_attainment            60341 non-null  float64
 7   white_alone                       60341 non-null  float64
 8   black_alone                       60341 non-null  float64
 9   native_alone                      60341 non-null  float64
 10  asian_alone                       60341 non-null  float64
 11  native_hawaiian_pacific_islander  60341 non-null  float64
 12  some
