# Data for multiple states and counties

In our [first notebook](01-census-basics.ipynb), we reviewed the basics of working with the Census API and some important considerations related to margins of error. 

In this notebook, we'll review how to pull data for more than a single state or county. This is handy when you want to see the big picture nationally, or within a state or region.

> Before proceeding, make sure you've obtained and stored a Census API key per [these instructions](README.md).


In [47]:
from census import Census
import pandas as pd

from census_api_key import KEY

## Start the Census client

In [48]:
client = Census(KEY)

## Downloading data

Let's define a few variables to make it easier to query the API and download the relevant data:

In [49]:
state_fips = '06' # California state FIPS code
county_fips = '085' # FIPS code for Santa Clara County

We also need to specify which fields we want to download from the table we've identified. I like to do this in a dictionary where the keys are the variable IDs and the values are human-readable names for the variables. That way I can easily see what each variable is in my code, and renaming columns later on is easier.

In [50]:
fields = {
    'B19013_001E': 'median_hh_income',
    'B19013_001M': 'margin_of_error',
}
# I also create a list of the variable IDs to give to the Census client
field_codes = list(fields.keys())
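
To see how the `fields` dictionary drives the column renaming later on, here is a minimal sketch that runs without an API call. The `sample_response` rows are made-up values, assumed to mimic the shape of the client's response (a list of dictionaries); they are not real estimates.

```python
import pandas as pd

fields = {
    'B19013_001E': 'median_hh_income',
    'B19013_001M': 'margin_of_error',
}
field_codes = list(fields.keys())

# Hypothetical rows, assumed to mimic the shape of the client's response
sample_response = [
    {'B19013_001E': 50000.0, 'B19013_001M': 400.0, 'state': '01'},
    {'B19013_001E': 80000.0, 'B19013_001M': 300.0, 'state': '06'},
]

# Keys found in `fields` are renamed; other columns (like `state`) are untouched
sample = pd.DataFrame(sample_response).rename(columns=fields)
print(sample.columns.tolist())
```

Because the variable IDs live in one place, adding another variable only means adding one more entry to `fields`.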

## Downloading multiple geographic areas

Often we don't want to look at just a single county or state. We can download all states in the nation, or all counties in a state, using a constant provided by the Python `census` library: `Census.ALL`. It is exactly the same as passing `'*'`, the character conventionally used as a wildcard.

### Download all states

In [None]:
raw = client.acs5.state(field_codes, Census.ALL)
len(raw)

In [None]:
data = pd.DataFrame(raw).rename(columns=fields)
data.head()

This data only includes FIPS codes, not state names. To add names, we can use the crosswalk the Census Bureau provides. The crosswalk is a pipe-delimited text file that pandas can read directly from its URL, without us saving it to disk first.

In [None]:
state_fips_url = 'https://www2.census.gov/geo/docs/reference/codes2020/national_state2020.txt'
# Notice we specified `|` for the delimiter
# We are also going to set the data type of all columns as a string 
# to keep leading zeroes in identifiers
state_codes = pd.read_csv(state_fips_url, delimiter='|', dtype=str) 
state_codes.head()
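
To make the leading-zero issue concrete, here is a small sketch that reads the same pipe-delimited text twice, once with pandas' default type inference and once with `dtype=str`. The `sample_text` is a hand-typed stand-in for the crosswalk, and `STATE_NAME` is an assumed column name used only for illustration.

```python
import io
import pandas as pd

# Hand-typed stand-in for the crosswalk; `STATE_NAME` is an assumed column name
sample_text = (
    "STATEFP|STATE_NAME\n"
    "01|Alabama\n"
    "06|California\n"
)

# Default type inference parses the codes as integers and drops the leading zero
inferred = pd.read_csv(io.StringIO(sample_text), delimiter='|')

# Reading every column as a string keeps the codes intact
as_text = pd.read_csv(io.StringIO(sample_text), delimiter='|', dtype=str)

print(inferred['STATEFP'].tolist())
print(as_text['STATEFP'].tolist())
```

The string version is the one that will line up with the `state` column returned by the API.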

Let's merge these together.

In [None]:
merged = data.merge(
    state_codes,
    how='left',
    left_on='state',
    right_on='STATEFP'
)
merged.head()
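
With a left join, rows that find no match in the crosswalk are kept but get empty name columns, which is easy to miss. One way to check, sketched here on made-up data, is the `indicator=True` option of `merge`. The frames below are hypothetical stand-ins, and the code `'99'` is invented so that one row fails to match.

```python
import pandas as pd

# Hypothetical stand-ins for the API data and the crosswalk
data = pd.DataFrame({
    'state': ['01', '06', '99'],  # '99' is a made-up code with no match
    'median_hh_income': [50000.0, 80000.0, 60000.0],  # made-up values
})
state_codes = pd.DataFrame({
    'STATEFP': ['01', '06'],
    'STATE_NAME': ['Alabama', 'California'],  # assumed column name
})

checked = data.merge(
    state_codes,
    how='left',
    left_on='state',
    right_on='STATEFP',
    indicator=True,  # adds a `_merge` column describing each row's match
)

# Rows found only in the left frame have no name in the crosswalk
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched['state'].tolist())
```

An empty `unmatched` frame would suggest every state in the data picked up a name.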

Great! We can now write out our results to a file for later use.

In [55]:
merged.to_csv('./median_hh_income_states.csv', index=False)
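
One thing to keep in mind for that later use: a CSV file stores no type information, so the FIPS codes need the same string treatment when the file is read back in. A minimal sketch, using an in-memory buffer and made-up values instead of the real file:

```python
import io
import pandas as pd

merged = pd.DataFrame({
    'state': ['01', '06'],
    'median_hh_income': [50000.0, 80000.0],  # made-up values
})

# Write to an in-memory buffer instead of a file to keep the sketch self-contained
buffer = io.StringIO()
merged.to_csv(buffer, index=False)

# Reading back with a per-column dtype keeps the codes as strings
# while still letting pandas infer numeric types for the other columns
buffer.seek(0)
reloaded = pd.read_csv(buffer, dtype={'state': str})
print(reloaded['state'].tolist())
```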

### Download multiple counties

The process is very similar to the one above, except that we use the client's `state_county` method to get county-level data.

#### For a single state

In [None]:
raw = client.acs5.state_county(field_codes, state_fips, Census.ALL)
len(raw)

#### For all states

In [None]:
raw = client.acs5.state_county(field_codes, Census.ALL, Census.ALL)
len(raw)

In [None]:
data = pd.DataFrame(raw).rename(columns=fields)
data

#### Add county names

We can also use FIPS codes to add county and state names. This time we will use a different reference file, one that includes county FIPS codes.

In [None]:
county_fips_url = 'https://www2.census.gov/geo/docs/reference/codes2020/national_county2020.txt'
county_codes = pd.read_csv(county_fips_url, delimiter='|', dtype=str)
county_codes

County FIPS codes are unique within a state, but not nationally, so we need to include both the state and county FIPS codes when joining the data.

In [None]:
data['geoid'] = data['state'] + data['county']
data.head()

In [None]:
county_codes['geoid'] = county_codes['STATEFP'] + county_codes['COUNTYFP']
county_codes
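
To illustrate why the county code alone is not enough, here is a small sketch on a hand-typed subset of codes in which county `'001'` appears under two different states. It works only because both columns are strings, so `+` concatenates rather than adds.

```python
import pandas as pd

# Hand-typed subset: county code '001' appears under two different states
county_codes = pd.DataFrame({
    'STATEFP': ['01', '06', '06'],
    'COUNTYFP': ['001', '001', '085'],
})

# The county code alone repeats across states
print(county_codes['COUNTYFP'].is_unique)

# Concatenating the two string columns gives a five-character identifier
county_codes['geoid'] = county_codes['STATEFP'] + county_codes['COUNTYFP']
print(county_codes['geoid'].tolist())
print(county_codes['geoid'].is_unique)
```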

Now we can join the two dataframes as we did above using the `geoid` columns we just created.

In [None]:
merged = data.merge(
    county_codes,
    how='left',
    on='geoid'
)
merged.head()
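
If the crosswalk ever contained a repeated `geoid`, a left join would silently duplicate rows in our data. As an optional safeguard, `merge` accepts a `validate` argument that raises an error in that case. The sketch below uses made-up values, a deliberately duplicated crosswalk row, and an assumed `COUNTYNAME` column name.

```python
import pandas as pd

data = pd.DataFrame({
    'geoid': ['06001', '06085'],
    'median_hh_income': [100000.0, 120000.0],  # made-up values
})
# Hypothetical crosswalk with an accidental duplicate row for '06085'
county_codes = pd.DataFrame({
    'geoid': ['06001', '06085', '06085'],
    'COUNTYNAME': ['Alameda County', 'Santa Clara County', 'Santa Clara County'],
})

# 'many_to_one' requires the join key to be unique in the right-hand frame
try:
    data.merge(county_codes, how='left', on='geoid', validate='many_to_one')
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True

# After dropping duplicates the same merge succeeds with one row per county
deduped = county_codes.drop_duplicates(subset='geoid')
merged = data.merge(deduped, how='left', on='geoid', validate='many_to_one')
print(duplicate_detected, len(merged))
```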

And once again write out our data:

In [63]:
merged.to_csv('./median_hh_income_counties.csv', index=False)