Commuting Data
==============
This notebook gathers US census data from the American Community Survey (ACS)
related to car ownership, commuting time, distance, and mode of transportation.


Working with Census Data
------------------------

The first step is determining the relevant variables and their codes.
[Tables and variables in Excel can be found here.](https://www.census.gov/programs-surveys/acs/technical-documentation/table-shells.html)

I found that the variable names from the spreadsheets don't exactly match the names in the API.
The API variables can be loaded as JSON (below), with a couple of helper methods to
look through that data.

The Python `census` package is a thin wrapper around the Census REST API. We don't need to
know the exact endpoints in order to use it.
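
For orientation, here is a rough sketch of the kind of URL such a wrapper presumably assembles for an ACS 5-year query. The helper name `build_acs5_url` is made up for illustration and is not part of the `census` package; the base path follows the same pattern as the `variables.json` URL used later in this notebook. No request is sent.

```python
from urllib.parse import urlencode


def build_acs5_url(year, variables, for_clause, in_clause=None, key=None):
    """Build an ACS 5-year query URL (hypothetical helper, sends nothing)."""
    params = {"get": ",".join(variables), "for": for_clause}
    if in_clause:
        params["in"] = in_clause
    if key:
        params["key"] = key
    # keep ':', ',' and '*' readable instead of percent-encoding them
    query = urlencode(params, safe=":,*")
    return f"https://api.census.gov/data/{year}/acs/acs5?{query}"


# total population for every county in New York State (state FIPS 36)
url = build_acs5_url(2022, ["NAME", "B01003_001E"], "county:*", "state:36")
print(url)
```

Pasting a URL like this into a browser (with a key appended) can be a quick way to check a variable code before writing any notebook code.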

You will need to [register for an API key](https://api.census.gov/data/key_signup.html) to use the API.
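If you keep your notebook private, you can include the key in the notebook itself.

I keep my key in a file called `.env` in my project root, in the format:
`CENSUS_API_KEY=3330943536436307591747355883236309887155`

(No, that's not my real key.)

The code below reads the key from the `CENSUS_API_KEY` environment variable, so the
`.env` file has to be loaded into the environment before the notebook runs.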
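
Since the import cell reads the key with `os.environ["CENSUS_API_KEY"]`, something has to copy the `.env` contents into the environment first. A minimal standard-library sketch follows; `load_env_file` is a hypothetical helper written for this example, and it only handles plain `KEY=VALUE` lines. The `python-dotenv` package provides `load_dotenv()` for the same job if you prefer a maintained tool.

```python
import os


def load_env_file(path=".env"):
    """Copy KEY=VALUE lines from a file into os.environ.

    Values already present in the environment are left untouched.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # skip blanks, comments, and anything that is not KEY=VALUE
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))


# only attempt the load when the file is actually there
if os.path.exists(".env"):
    load_env_file(".env")
```

Run this before the cell that sets `api_key`, and keep `.env` out of version control.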

My basic practice is to look in the Excel sheets for likely variables and/or ask ChatGPT
for suggestions, write some code to load some of this data and see what it looks like,
and then pull a more coherent dataset together.

The Census data is tied to geography, from largest to smallest (roughly):

- Nation
- Region
- Division
- State
- County
- Place
- Tract
- Block Group

Use a fairly large geography to start with because it will be faster and the results will
be smaller. Include `NAME` and `GEO_ID` in the query so you can see what you're looking at.
We can use `GEO_ID` later to join in geospatial data (e.g. a `GeoDataFrame`).
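
To make the `GEO_ID` values less opaque, here is a small sketch that splits one into its geography level and FIPS part. The value `0500000US36001` comes from the county query below; the first three digits are the Census summary level code, and everything after `US` is the FIPS code. The helper `split_geo_id` and the `SUMMARY_LEVELS` table are illustrative names, not part of any library. Census boundary files typically carry the short FIPS form in a `GEOID` column, though that is worth checking against the specific file you download.

```python
# summary level codes for the geographies listed above
SUMMARY_LEVELS = {
    "010": "Nation",
    "020": "Region",
    "030": "Division",
    "040": "State",
    "050": "County",
    "160": "Place",
    "140": "Tract",
    "150": "Block Group",
}


def split_geo_id(geo_id):
    """Split a GEO_ID into (geography level, FIPS code)."""
    prefix, _, fips = geo_id.partition("US")
    level = SUMMARY_LEVELS.get(prefix[:3], "unknown")
    return level, fips


print(split_geo_id("0500000US36001"))  # ('County', '36001')
```

For a county, the FIPS part is the two-digit state code followed by the three-digit county code, which matches the `state` and `county` fields returned by the query.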


In [1]:
from census import Census
import us
from us import states
import os
import pandas as pd
import geopandas as gpd
import requests

api_key = os.environ["CENSUS_API_KEY"]

In [2]:
# this is a full example of using Census
# get total population for each county in New York State

# put the variables we want in a dictionary
# we get the county and state as part of the query because they are in the 'for' and 'in' clauses
core_vars = {
    "NAME": "county",
    "B01003_001E": "total_pop",
    "GEO_ID": "geoid"
}

# get a list of the variable codes for the query
query_vars = list(core_vars.keys())

# build the Census object
c = Census(api_key, year=2022)
# send the query: county:* means all counties in the state
data = c.acs5.get(query_vars, {'for': 'county:*', 'in': f'state:{states.NY.fips}'})

# data is a list of dictionaries, so we can convert it to a DataFrame
df = pd.DataFrame(data)

# give our cols human-readable names
cols = core_vars.copy()
cols["state"] = "state_fips"
cols["county"] = "county_fips"  # the API returns this key in lowercase
df.rename(columns=cols, inplace=True)
# clean it a little bit
df.total_pop = df.total_pop.astype(int)
# save it as a CSV
df.to_csv("./data/ny_county_pop.csv", index=False)

df


Unnamed: 0,county,total_pop,geoid,state_fips,county_fips
0,"Albany County, New York",315041,0500000US36001,36,001
1,"Allegany County, New York",47222,0500000US36003,36,003
2,"Bronx County, New York",1443229,0500000US36005,36,005
3,"Broome County, New York",198365,0500000US36007,36,007
4,"Cattaraugus County, New York",77000,0500000US36009,36,009
...,...,...,...,...,...
57,"Washington County, New York",61310,0500000US36115,36,115
58,"Wayne County, New York",91324,0500000US36117,36,117
59,"Westchester County, New York",997904,0500000US36119,36,119
60,"Wyoming County, New York",40338,0500000US36121,36,121


Loading Variables from the API
==============================
Some helper data and functions to work with variables.

In [3]:
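# get all of the census variables (only needs to run once per session)
# note: this is the 2020 variable list; the queries in this notebook use year=2022
url = "https://api.census.gov/data/2020/acs/acs5/variables.json"
response = requests.get(url)
response.raise_for_status()  # fail loudly on a bad response
variable_data = response.json()["variables"]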
d = variable_data
all_vars = list(d.keys())
all_vars.sort()

In [4]:
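def pick(var_codes):
    """Return {code: "label | concept"} for the given variable codes."""

    def clean(label):
        # drop the common prefix and turn the "!!" separators into spaces
        return label.replace("Estimate!!Total:!!", "").replace("!!", " ")

    found = {
        k: f"{clean(variable_data[k]['label'])} | {variable_data[k]['concept']}"
        for k in var_codes
        if k in variable_data
    }
    not_found = set(var_codes) - set(found)
    if not_found:
        print("Unknown variables:", not_found)
    return found


def family(prefix):
    """Return pick() results for every variable whose code starts with prefix."""
    return pick([k for k in all_vars if k.startswith(prefix)])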

In [5]:
# use pick() and family() to investigate variables
display(pick(["B01003_001E", "B08135_007E"]))

# this "table" is a family of variables
# saying where people are from
family("B04004")

{'B01003_001E': 'Estimate Total | TOTAL POPULATION',
 'B08135_007E': 'Estimate Aggregate travel time to work (in minutes): 30 to 34 minutes | AGGREGATE TRAVEL TIME TO WORK (IN MINUTES) OF WORKERS BY TRAVEL TIME TO WORK'}

{'B04004_001E': 'Estimate Total: | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_002E': 'Afghan | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_003E': 'Albanian | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_004E': 'Alsatian | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_005E': 'American | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_006E': 'Arab: | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_007E': 'Arab: Egyptian | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_008E': 'Arab: Iraqi | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_009E': 'Arab: Jordanian | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_010E': 'Arab: Lebanese | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_011E': 'Arab: Moroccan | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_012E': 'Arab: Palestinian | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_013E': 'Arab: Syrian | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_014E': 'Arab: Arab | PEOPLE REPORTING SINGLE ANCESTRY',
 'B04004_015E': 'Arab: Other Arab | PEOPLE REPORTING SINGLE ANCESTRY',
 ...
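
When you don't know a code yet, searching the labels and concepts by keyword can narrow things down. The sketch below uses a hypothetical `search` helper and a tiny `sample_variables` dict shaped like the `variables` entry of `variables.json`; the raw labels are reconstructed from the cleaned-up output above, so treat them as approximate. In the notebook you would pass `variable_data` instead.

```python
# a few entries in the shape of variables.json["variables"] (labels reconstructed)
sample_variables = {
    "B01003_001E": {"label": "Estimate!!Total", "concept": "TOTAL POPULATION"},
    "B04004_002E": {
        "label": "Estimate!!Total:!!Afghan",
        "concept": "PEOPLE REPORTING SINGLE ANCESTRY",
    },
    "B08135_007E": {
        "label": "Estimate!!Aggregate travel time to work (in minutes):!!30 to 34 minutes",
        "concept": "AGGREGATE TRAVEL TIME TO WORK (IN MINUTES) OF WORKERS BY TRAVEL TIME TO WORK",
    },
}


def search(keyword, variables):
    """Return sorted codes whose label or concept contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return sorted(
        code
        for code, meta in variables.items()
        # .get() guards against entries that lack a label or concept
        if keyword in meta.get("label", "").lower()
        or keyword in meta.get("concept", "").lower()
    )


print(search("travel time", sample_variables))  # ['B08135_007E']
```

The returned codes can be handed straight to `pick()` to read the full labels.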

Commuting Data
==============

In [6]:
c = Census(api_key, year=2022)

state_fips = [states.NY.fips]
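# B08135 reports aggregate travel time to work (total minutes summed over workers),
# broken out by travel-time bucket; these values are not counts of workers
commute_time = {'B08135_001E': 'commute_time',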
                'B08135_002E': 'commute_under_10',
                'B08135_003E': 'commute_10_14',
                'B08135_004E': 'commute_15_19',
                'B08135_005E': 'commute_20_24',
                'B08135_006E': 'commute_25_29',
                'B08135_007E': 'commute_30_34',
                'B08135_008E': 'commute_35_44',
                'B08135_009E': 'commute_45_59',
                'B08135_010E': 'commute_60_more'}


commute_vars = ["NAME", "GEO_ID"] + list(commute_time.keys())
data = c.acs5.get(commute_vars, {'for': 'county:*', 'in': f'state:{",".join(state_fips)}'})
df = pd.DataFrame(data)

# the API returns the state and county keys in lowercase
cols = commute_time | {"state": "state_fips", "county": "county_fips", "NAME": "county", "GEO_ID": "geoid"}
df.rename(columns=cols, inplace=True)
df

Unnamed: 0,county,geoid,commute_time,commute_under_10,commute_10_14,commute_15_19,commute_20_24,commute_25_29,commute_30_34,commute_35_44,commute_45_59,commute_60_more,state_fips,county_fips
0,"Albany County, New York",0500000US36001,2930645.0,90985.0,249090.0,463800.0,519795.0,282080.0,440990.0,213290.0,233615.0,436995.0,36,001
1,"Allegany County, New York",0500000US36003,402110.0,21595.0,26710.0,31450.0,40660.0,27935.0,59030.0,41890.0,46185.0,106650.0,36,003
2,"Bronx County, New York",0500000US36005,23456755.0,91085.0,242690.0,524700.0,829870.0,435815.0,2165060.0,1845760.0,4107000.0,13214775.0,36,005
3,"Broome County, New York",0500000US36007,1552935.0,71480.0,177635.0,259760.0,261125.0,106565.0,157740.0,95545.0,98215.0,324875.0,36,007
4,"Cattaraugus County, New York",0500000US36009,678055.0,36840.0,52485.0,51695.0,68160.0,38035.0,76975.0,64260.0,97875.0,191735.0,36,009
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57,"Washington County, New York",0500000US36115,670295.0,21310.0,32525.0,52165.0,63115.0,39550.0,99505.0,86190.0,122905.0,153025.0,36,115
58,"Wayne County, New York",0500000US36117,964495.0,32990.0,46690.0,70010.0,97160.0,91865.0,161135.0,157565.0,159485.0,147585.0,36,117
59,"Westchester County, New York",0500000US36119,14339350.0,186255.0,404675.0,758015.0,962565.0,578775.0,1516515.0,1195695.0,2015030.0,6721830.0,36,119
60,"Wyoming County, New York",0500000US36121,419325.0,16150.0,23055.0,25465.0,41070.0,27985.0,44305.0,46860.0,82800.0,111630.0,36,121
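
One simple thing to read off this table is how much of a county's total commuting time comes from long commutes. The sketch below uses two rows copied from the output above and a hypothetical helper, `long_commute_share`. Because B08135 holds aggregate minutes, the result is a share of total travel minutes, not a share of workers.

```python
# values copied from the query output above (aggregate minutes)
rows = {
    "Albany County": {"commute_time": 2930645.0, "commute_60_more": 436995.0},
    "Bronx County": {"commute_time": 23456755.0, "commute_60_more": 13214775.0},
}


def long_commute_share(row):
    """Fraction of aggregate travel minutes spent on commutes of 60+ minutes."""
    return row["commute_60_more"] / row["commute_time"]


for name, row in rows.items():
    print(f"{name}: {long_commute_share(row):.1%}")
```

With the full DataFrame, the same ratio would be `df.commute_60_more / df.commute_time`. Getting a share of workers or a mean commute time would need a table of worker counts as well, which this query does not include.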
