## County FIPS codes from census

This notebook loads FIPS codes from an Excel spreadsheet downloaded from a US Census Bureau website, which is probably more accurate than Wikipedia. It extracts the county FIPS/INCITS codes and the state codes into a county dataframe and a state dataframe, and it also collects the names of all the states as strings. The state names, the state dataframe, and the county dataframe are then stored together in a single pickle file.

In [1]:
import pandas as pd
import pickle

In [2]:
file_name = "all-geocodes-v2016.xlsx"
df = pd.read_excel(io=file_name)

In [3]:
df.head()

Unnamed: 0,Estimates Geography File: Vintage 2016,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6
0,"Source: U.S. Census Bureau, Population Division",,,,,,
1,Internet Release Date: May 2017,,,,,,
2,,,,,,,
3,Summary Level,State Code (FIPS),County Code (FIPS),County Subdivision Code (FIPS),Place Code (FIPS),Consolidtated City Code (FIPS),Area Name (including legal/statistical area de...
4,010,00,000,00000,00000,00000,United States
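The codes above display as text with leading zeros (`010`, `00`, `000`), and the later filters compare against strings such as `'050'`, so the rest of the notebook depends on them staying strings. As a minimal sketch, assuming a small in-memory CSV as a stand-in for the spreadsheet (the sample rows and the shortened `Area Name` header are illustrative only), this shows what `dtype=str` protects against. `pd.read_excel` accepts the same `dtype` argument.

```python
import io

import pandas as pd

# Hypothetical stand-in for a few spreadsheet rows (CSV used for brevity).
sample = io.StringIO(
    "Summary Level,State Code (FIPS),County Code (FIPS),Area Name\n"
    "040,01,000,Alabama\n"
    "050,01,001,Autauga County\n"
)

# dtype=str keeps every cell as text, so '01' is not parsed as the integer 1.
as_text = pd.read_csv(sample, dtype=str)

# Without dtype=str, pandas infers integers and the leading zeros are lost.
sample.seek(0)
as_inferred = pd.read_csv(sample)

print(as_text["State Code (FIPS)"].tolist())
print(as_inferred["State Code (FIPS)"].tolist())
```

If the codes were ever read as integers, a comparison like `== '050'` would match no rows at all, so forcing text on load is a cheap safeguard.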


In [4]:
desired_column_names = list(df.iloc[3])
current_column_names = df.columns
col_dict = dict(zip(current_column_names, desired_column_names))
col_dict

{'Estimates Geography File: Vintage 2016': 'Summary Level',
 'Unnamed: 1': 'State Code (FIPS)',
 'Unnamed: 2': 'County Code (FIPS)',
 'Unnamed: 3': 'County Subdivision Code (FIPS)',
 'Unnamed: 4': 'Place Code (FIPS)',
 'Unnamed: 5': 'Consolidtated City Code (FIPS)',
 'Unnamed: 6': 'Area Name (including legal/statistical area description)'}

In [5]:
# We'll clean up the column names that we're going to keep in the long run so that they're easier to refer to.
col_dict['Unnamed: 1'] = 'state_code'
col_dict['Unnamed: 2'] = 'county_code'
col_dict['Unnamed: 6'] = 'name'
col_dict

{'Estimates Geography File: Vintage 2016': 'Summary Level',
 'Unnamed: 1': 'state_code',
 'Unnamed: 2': 'county_code',
 'Unnamed: 3': 'County Subdivision Code (FIPS)',
 'Unnamed: 4': 'Place Code (FIPS)',
 'Unnamed: 5': 'Consolidtated City Code (FIPS)',
 'Unnamed: 6': 'name'}

In [6]:
clean_df = df.iloc[4:]
clean_df = clean_df.rename(col_dict, axis='columns')

In [7]:
clean_df.head()

Unnamed: 0,Summary Level,state_code,county_code,County Subdivision Code (FIPS),Place Code (FIPS),Consolidtated City Code (FIPS),name
4,010,00,000,00000,00000,00000,United States
5,040,01,000,00000,00000,00000,Alabama
6,050,01,001,00000,00000,00000,Autauga County
7,050,01,003,00000,00000,00000,Baldwin County
8,050,01,005,00000,00000,00000,Barbour County


In [8]:
county_df = clean_df[clean_df['Summary Level'] == '050']
state_df = clean_df[clean_df['Summary Level'] == '040']

In [9]:
county_df.tail()

Unnamed: 0,Summary Level,state_code,county_code,County Subdivision Code (FIPS),Place Code (FIPS),Consolidtated City Code (FIPS),name
43933,050,72,145,00000,00000,00000,Vega Baja Municipio
43934,050,72,147,00000,00000,00000,Vieques Municipio
43935,050,72,149,00000,00000,00000,Villalba Municipio
43936,050,72,151,00000,00000,00000,Yabucoa Municipio
43937,050,72,153,00000,00000,00000,Yauco Municipio


In [11]:
# This table includes Puerto Rico, whose state code is 72. We'll drop the rows with that code.
print(state_df["name"].unique())
state_df[state_df["name"] == 'Puerto Rico']

['Alabama' 'Alaska' 'Arizona' 'Arkansas' 'California' 'Colorado'
 'Connecticut' 'Delaware' 'District of Columbia' 'Florida' 'Georgia'
 'Hawaii' 'Idaho' 'Illinois' 'Indiana' 'Iowa' 'Kansas' 'Kentucky'
 'Louisiana' 'Maine' 'Maryland' 'Massachusetts' 'Michigan' 'Minnesota'
 'Mississippi' 'Missouri' 'Montana' 'Nebraska' 'Nevada' 'New Hampshire'
 'New Jersey' 'New Mexico' 'New York' 'North Carolina' 'North Dakota'
 'Ohio' 'Oklahoma' 'Oregon' 'Pennsylvania' 'Rhode Island' 'South Carolina'
 'South Dakota' 'Tennessee' 'Texas' 'Utah' 'Vermont' 'Virginia'
 'Washington' 'West Virginia' 'Wisconsin' 'Wyoming' 'Puerto Rico']


Unnamed: 0,Summary Level,state_code,county_code,County Subdivision Code (FIPS),Place Code (FIPS),Consolidtated City Code (FIPS),name
43859,040,72,000,00000,00000,00000,Puerto Rico


In [12]:
# Take out PR from both county and state dataframes.
county_df = county_df[county_df["state_code"] != "72"]
state_df = state_df[state_df["state_code"] != "72"]

In [13]:
county_df.shape

(3142, 7)

In [14]:
states = state_df["name"].unique()

In [15]:
unnecessary_columns = [0, 3, 4, 5]

drop_columns = [desired_column_names[num] for num in unnecessary_columns]
drop_columns

['Summary Level',
 'County Subdivision Code (FIPS)',
 'Place Code (FIPS)',
 'Consolidtated City Code (FIPS)']

In [16]:
# Let's drop unnecessary columns from both the county and state dataframes.
county_df = county_df.drop(columns=drop_columns)
state_df = state_df.drop(columns=drop_columns)

In [17]:
county_df.head()

Unnamed: 0,state_code,county_code,name
6,01,001,Autauga County
7,01,003,Baldwin County
8,01,005,Barbour County
9,01,007,Bibb County
10,01,009,Blount County


In [18]:
state_df.head()

Unnamed: 0,state_code,county_code,name
5,01,000,Alabama
534,02,000,Alaska
712,04,000,Arizona
819,05,000,Arkansas
1397,06,000,California


In [19]:
# Let's reset the index of each dataframe so that rows are numbered from 0.
county_df = county_df.reset_index(drop=True)
state_df = state_df.reset_index(drop=True)

In [20]:
state_df.head()

Unnamed: 0,state_code,county_code,name
0,01,000,Alabama
1,02,000,Alaska
2,04,000,Arizona
3,05,000,Arkansas
4,06,000,California


In [21]:
county_df.head()
county_df.columns

Index(['state_code', 'county_code', 'name'], dtype='object')

In [22]:
# Concatenate the 2-digit state code and the 3-digit county code into the 5-digit INCITS code.
incits = county_df['state_code'] + county_df['county_code']
county_df = county_df.assign(INCITS=incits.values)
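The INCITS code built here is the 2-digit state code followed by the 3-digit county code, so Autauga County, Alabama (state `01`, county `001`) becomes `01001`. The following is a minimal sketch of that rule in plain Python. `make_incits` is a hypothetical helper, not part of the notebook, and it zero-pads its inputs on the assumption that codes might arrive as integers or unpadded strings.

```python
def make_incits(state_code, county_code):
    """Return the 5-character code: 2-digit state + 3-digit county.

    Hypothetical helper for illustration; accepts strings or integers.
    """
    state = str(state_code).strip().zfill(2)
    county = str(county_code).strip().zfill(3)
    if not (state.isdigit() and county.isdigit()):
        raise ValueError(f"codes must be numeric: {state_code!r}, {county_code!r}")
    if len(state) != 2 or len(county) != 3:
        raise ValueError(f"codes are too long: {state_code!r}, {county_code!r}")
    return state + county


print(make_incits("01", "001"))  # 01001
print(make_incits(1, 3))         # 01003
print(make_incits(72, 153))      # 72153
```

A quick check that every value in the `INCITS` column is exactly five characters long would catch the same padding problems directly on the dataframe.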

In [23]:
# Save the state dataframe, the county dataframe, and the states to a pickle.
with open("statedf_countydf_states.pkl", "wb") as picklefile:
    pickle.dump([state_df, county_df, states], picklefile)
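Because the three objects are dumped as one list, they come back from `pickle.load` in the same order and can be unpacked in a single assignment. The sketch below shows that round trip. It assumes plain Python stand-ins for `state_df`, `county_df`, and `states` and writes to a temporary directory, so it runs without the spreadsheet; in the notebook the first two objects are DataFrames.

```python
import os
import pickle
import tempfile

# Stand-ins for state_df, county_df and states (plain Python objects here;
# the notebook pickles pandas DataFrames and an array of names instead).
state_rows = [{"state_code": "01", "county_code": "000", "name": "Alabama"}]
county_rows = [
    {"state_code": "01", "county_code": "001",
     "name": "Autauga County", "INCITS": "01001"},
]
states = ["Alabama"]

# Write to a temporary directory so the sketch does not touch the real file.
path = os.path.join(tempfile.mkdtemp(), "statedf_countydf_states.pkl")
with open(path, "wb") as picklefile:
    pickle.dump([state_rows, county_rows, states], picklefile)

# Read the file back; the list unpacks in the order it was dumped.
with open(path, "rb") as picklefile:
    loaded_state_rows, loaded_county_rows, loaded_states = pickle.load(picklefile)

print(loaded_county_rows[0]["INCITS"])
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.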