# Python Tutorial Program: Gathering and Exporting Census Data

By Kenneth Burchfiel

This code is released under the MIT license; the datasets produced by the code are in the public domain.

You can find my blog post on this code at https://kburchfiel3.wordpress.com/2021/08/12/python-tutorial-program-retrieving-u-s-census-data/ .

This program demonstrates how Python (along with the census library, available at https://github.com/datamade/census) can be used to retrieve and export US Census data at the zip code, county, and state level. Although this tutorial program will focus on gathering education, family type, and income/poverty statistics from the American Community Survey, it should be a useful reference for those wishing to gather other types of census data instead.

Before running the code below on your computer, you'll need to install the census library and obtain a free Census API key. See the above link for instructions.

First, I imported a number of libraries:

In [1]:
import time
start_time = time.time() # Allows the program's runtime to be measured
from census import Census
# import us  # I didn't end up using this library, but you may find it useful for your own Census query program. See https://github.com/datamade/census for more information.
import pandas as pd
import numpy as np
import statsmodels.api as sm

Instead of hard-coding the year into my Census queries, I stored it in a variable so that the queries could be modified more easily. I picked 2019 because it was the most recent year for which American Community Survey data were available at the time.

In [2]:
year = 2019

Next, I imported my Census API key into the code. I stored the path to the key and the key itself in separate file locations. 

In [3]:
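import os # Used to build file paths that work on any operating system
with open(os.path.join('..', 'key_paths', 'path_to_keys_folder.txt')) as fin:
    api_folder_path = fin.readline().strip() # strip() removes any trailing newline so the path can be joined correctly
with open(os.path.join(api_folder_path, 'census_api_key.txt')) as fin:
    api_key = fin.readline().strip() # A trailing newline in the key would make API requests fail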
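If you'd rather not keep the key in a text file, one common alternative is to read it from an environment variable and fall back to a file only if the variable isn't set. The sketch below is a hypothetical helper, not part of the census library; the variable name `CENSUS_API_KEY` is an assumption you can change.

```python
import os

def load_census_api_key(env_var="CENSUS_API_KEY", key_file=None):
    """Return a Census API key from an environment variable, falling back to a text file.

    CENSUS_API_KEY is an illustrative variable name chosen for this sketch;
    the census library itself doesn't require any particular name.
    """
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    if key_file is not None:
        with open(key_file) as fin:
            return fin.readline().strip()  # strip() drops the trailing newline
    raise RuntimeError(f"Set the {env_var} environment variable or pass key_file.")
```

Usage would then look like `c = Census(load_census_api_key())`.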

In [4]:
c = Census(api_key) # See https://github.com/datamade/census

The next step was to locate the source of the data that I was interested in. For this program, I chose to retrieve zip code statistics for the following variables:

1. Household types (mostly married households vs. ones led by a female householder with no spouse present, which, for brevity's sake, I'll abbreviate as 'female-householder' homes)
2. The presence of children within these households
3. Median household income
4. Poverty status by family type
5. Poverty status by family type and the highest level of education completed

To search for this data, I used the Census's API site (https://api.census.gov/data.html). This is a very helpful site that provides links to different data sources, along with lists of groups and variables within those data sources.

For example, to access data from the 2019 American Community Survey, I searched the above page in my web browser for 'acs5', then found the most recent year, which in this case happened to be 2019. To confirm that I could access data at the zip code level within this dataset, I clicked on its 'geography' hyperlink (https://api.census.gov/data/2019/acs/acs5/geography.html). To figure out what types of data this survey provides, I clicked on its 'groups' hyperlink (https://api.census.gov/data/2019/acs/acs5/groups.html).

This groups page had 1,136 (!) different types of data that I could choose from. Fortunately, there were lots of options available for my variables of interest (marriage, income, education, household type, etc.).

The Census data site also provided an 'examples' page for accessing American Community Survey data (https://api.census.gov/data/2019/acs/acs5/examples.html), although the query format I used differed somewhat from the examples shown there.
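Because these reference pages differ only by year, a small helper can build the links for whatever year is stored in `year`. This sketch assumes that the `/data/{year}/acs/acs5/...` layout seen for 2019 also applies to other years, which is worth confirming on https://api.census.gov/data.html before relying on it.

```python
def acs5_reference_urls(year):
    """Build links to the ACS 5-year API reference pages for a given year.

    Assumes the URL layout used for 2019 (e.g. .../data/2019/acs/acs5/groups.html)
    also holds for the requested year.
    """
    base = f"https://api.census.gov/data/{year}/acs/acs5"
    return {
        "geography": f"{base}/geography.html",
        "groups": f"{base}/groups.html",
        "examples": f"{base}/examples.html",
    }

for page, url in acs5_reference_urls(2019).items():
    print(page, url)
```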

I chose to query Census data in this program by:
1. Organizing different queries in dictionaries
2. Adding these dictionaries to a list (which I named 'metric_list')
3. Looping through this list
4. Storing the output of the queries in a DataFrame

The first two steps are shown below. I ended up adding many different queries to my list, but you may choose to retrieve data for only a couple of variables.

Each dictionary is based on information available on the Census Data page for a particular 'group.' For instance, to find data on the presence of children in households by household type, I chose to look into table B11005, 'HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE' (which can be found on https://api.census.gov/data/2019/acs/acs5/groups.html). Clicking the 'selected variables' link for that group took me to https://api.census.gov/data/2019/acs/acs5/groups/B11005.html. This page shows all the different statistics available for the 'HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE' group.
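The same variable information could, in principle, be read programmatically instead of being copied from the HTML page. The sketch below assumes that replacing `.html` with `.json` in a group URL (e.g. https://api.census.gov/data/2019/acs/acs5/groups/B11005.json) returns a JSON object whose `variables` entry maps each code to a dictionary with `label` and `concept` keys; please check that against the live endpoint before using it. Parsing is kept separate from downloading so the parser can be tried on a small hand-written sample.

```python
import json
import re
import urllib.request

def fetch_group_json(year, group):
    """Download the JSON version of a group's variable list (assumed endpoint)."""
    url = f"https://api.census.gov/data/{year}/acs/acs5/groups/{group}.json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def group_variables(group_json):
    """Return sorted (Name, Label, Concept) tuples for estimate variables only.

    Codes like B11005_001E are kept; margins of error (..._001M), annotations
    (..._001EA) and geography fields such as NAME are skipped.
    """
    rows = []
    for name, info in group_json.get("variables", {}).items():
        if re.fullmatch(r"[A-Z]\d+[A-Z]*_\d{3}E", name):
            rows.append((name, info.get("label", ""), info.get("concept", "")))
    return sorted(rows)

# Hand-written sample in the assumed format (no network access needed)
sample_group = {"variables": {
    "B11005_002E": {"label": "Estimate!!Total:!!Households with one or more people under 18 years:",
                    "concept": "HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE"},
    "B11005_001E": {"label": "Estimate!!Total:",
                    "concept": "HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE"},
    "B11005_001M": {"label": "Margin of Error!!Total:"},
    "NAME": {"label": "Geographic Area Name"},
}}
for name, label, concept in group_variables(sample_group):
    print(name, label)
```

With network access, `group_variables(fetch_group_json(2019, "B11005"))` would list every estimate in that group, assuming the endpoint behaves as described.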

I stored the following information from these pages within the dictionaries below:

1. 'Name': the code on the Census website for that particular variable (e.g. B11005_001E). 

2. 'Label': the Census's text description of that variable (e.g. 'Estimate!!Total:')

3. 'Concept': the Census's text description of the group to which the variable belongs (e.g. 'HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE').

I also added an 'Alias' key to store my own description of these metrics. These aliases then served as column names in the Pandas DataFrame that stored the results of these queries. That DataFrame will appear later in this program.

I could have made the dictionaries simpler by including only the 'Name' and 'Alias' components, as the 'Label' and 'Concept' keys are neither used in the census queries nor displayed in the table. However, they can serve as a helpful reference for distinguishing between subtly different variable types.
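Since every dictionary repeats the same four keys, a small helper function could reduce the repetition and catch mistakes such as a duplicated alias (which would produce duplicate column names later). The helper `add_metric` is hypothetical; the cell below builds each dictionary by hand instead.

```python
def add_metric(metric_list, name, alias, label="", concept=""):
    """Append a query dictionary in the same format used in this notebook.

    Raises ValueError if the variable code or alias is already in the list.
    """
    if any(m["Name"] == name for m in metric_list):
        raise ValueError(f"Duplicate variable code: {name}")
    if any(m["Alias"] == alias for m in metric_list):
        raise ValueError(f"Duplicate alias: {alias}")
    metric_list.append({"Name": name, "Label": label, "Concept": concept, "Alias": alias})

example_list = []
add_metric(example_list, "B11005_001E", "Households",
           label="Estimate!!Total:",
           concept="HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE")
print(example_list)
```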

In [5]:
metric_list = []

# Group 1: Information on households by presence of children (see https://api.census.gov/data/2019/acs/acs5/groups/B11005.html)

metric_list.append({'Name':'B11005_001E', 'Label':'Estimate!!Total:', 'Concept':'HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE','Alias':'Households'})

metric_list.append({'Name':'B11005_013E', 'Label':'Estimate!!Total:!!Households with no people under 18 years:!!Family households:!!Married-couple family', 'Concept':'HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE','Alias':'Married_couple_households_with_no_children'})

metric_list.append({'Name':'B11005_002E', 'Label':'Estimate!!Total:!!Households with one or more people under 18 years:', 'Concept':'HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE','Alias':'Households_with_1_or_more_children'})

metric_list.append({'Name':'B11005_004E', 'Label':'Estimate!!Total:!!Households with one or more people under 18 years:!!Family households:!!Married-couple family', 'Concept':'HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE','Alias':'Married_couple_households_with_1_or_more_children'})

# Group 2: median household income

metric_list.append({'Name':'B19013_001E', 'Label':'Estimate!!Median household income in the past 12 months (in YYYY inflation-adjusted dollars)', 'Concept':'MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN YYYY INFLATION-ADJUSTED DOLLARS)','Alias':'Median_household_income'})

# Group 3: Numbers of children below/not below the poverty level in different family types

metric_list.append({'Name':'B17006_002E', 'Label':'Estimate!!Total:!!Income in the past 12 months below poverty level:', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF RELATED CHILDREN UNDER 18 YEARS BY FAMILY TYPE BY AGE OF RELATED CHILDREN UNDER 18 YEARS','Alias':'Children_below_poverty_level'})

metric_list.append({'Name':'B17006_016E', 'Label':'Estimate!!Total:!!Income in the past 12 months at or above poverty level:', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF RELATED CHILDREN UNDER 18 YEARS BY FAMILY TYPE BY AGE OF RELATED CHILDREN UNDER 18 YEARS','Alias':'Children_at_or_above_poverty_level'})

metric_list.append({'Name':'B17006_003E', 'Label':'Estimate!!Total:!!Income in the past 12 months below poverty level:!!In married-couple family:', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF RELATED CHILDREN UNDER 18 YEARS BY FAMILY TYPE BY AGE OF RELATED CHILDREN UNDER 18 YEARS','Alias':'Children_in_married_couple_families_below_poverty_level'})

metric_list.append({'Name':'B17006_017E', 'Label':'Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!In married-couple family:', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF RELATED CHILDREN UNDER 18 YEARS BY FAMILY TYPE BY AGE OF RELATED CHILDREN UNDER 18 YEARS','Alias':'Children_in_married_couple_families_at_or_above_poverty_level'})

metric_list.append({'Name':'B17006_012E', 'Label':'Estimate!!Total:!!Income in the past 12 months below poverty level:!!In other family:!!Female householder, no spouse present:', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF RELATED CHILDREN UNDER 18 YEARS BY FAMILY TYPE BY AGE OF RELATED CHILDREN UNDER 18 YEARS','Alias':'Children_in_female_householder_families_below_poverty_level'})

metric_list.append({'Name':'B17006_026E', 'Label':'Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!In other family:!!Female householder, no spouse present:', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF RELATED CHILDREN UNDER 18 YEARS BY FAMILY TYPE BY AGE OF RELATED CHILDREN UNDER 18 YEARS','Alias':'Children_in_female_householder_families_at_or_above_poverty_level'})

metric_list.append({'Name':'B17006_008E', 'Label':'Estimate!!Total:!!Income in the past 12 months below poverty level:!!In other family:!!Male householder, no spouse present:', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF RELATED CHILDREN UNDER 18 YEARS BY FAMILY TYPE BY AGE OF RELATED CHILDREN UNDER 18 YEARS','Alias':'Children_in_male_householder_families_below_poverty_level'})

metric_list.append({'Name':'B17006_022E', 'Label':'Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!In other family:!!Male householder, no spouse present:', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF RELATED CHILDREN UNDER 18 YEARS BY FAMILY TYPE BY AGE OF RELATED CHILDREN UNDER 18 YEARS','Alias':'Children_in_male_householder_families_at_or_above_poverty_level'})

# Group 4: poverty status by household type by householder's highest education level

metric_list.append({'Name':'B17018_004E', 'Label':'Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family:!!Less than high school graduate', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_married-couple_families_below_the_poverty_level_where_householder_did_not_graduate_high_school'})

metric_list.append({'Name':'B17018_021E', 'Label':'Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family:!!Less than high school graduate', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_married-couple_families_at_or_above_the_poverty_level_where_householder_did_not_graduate_high_school'})

metric_list.append({'Name':'B17018_005E', 'Label':'Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family:!!High school graduate (includes equivalency)', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent'})

metric_list.append({'Name':'B17018_022E', 'Label':'Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family:!!High school graduate (includes equivalency)', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_married-couple_families_at_or_above_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent'})

metric_list.append({'Name':'B17018_006E', 'Label':'Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family:!!Some college, associate\'s degree', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree'})

metric_list.append({'Name':'B17018_023E', 'Label':'Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family:!!Some college, associate\'s degree', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_married-couple_families_at_or_above_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree'})

metric_list.append({'Name':'B17018_007E', 'Label':'Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family:!!Bachelor\'s degree or higher', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher'})

metric_list.append({'Name':'B17018_024E', 'Label':'Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family:!!Bachelor\'s degree or higher', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_married-couple_families_at_or_above_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher'})

metric_list.append({'Name':'B17018_015E', 'Label':'Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other families:!!Female householder, no spouse present:!!Less than high school graduate', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_female-householder_families_below_the_poverty_level_where_householder_did_not_graduate_high_school'})

metric_list.append({'Name':'B17018_032E', 'Label':'Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other families:!!Female householder, no spouse present:!!Less than high school graduate', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder_did_not_graduate_high_school'})

metric_list.append({'Name':'B17018_016E', 'Label':'Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other families:!!Female householder, no spouse present:!!High school graduate (includes equivalency)', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent'})

metric_list.append({'Name':'B17018_033E', 'Label':'Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other families:!!Female householder, no spouse present:!!High school graduate (includes equivalency)', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent'})

metric_list.append({'Name':'B17018_017E', 'Label':'Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other families:!!Female householder, no spouse present:!!Some college, associate\'s degree', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree'})

metric_list.append({'Name':'B17018_034E', 'Label':'Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other families:!!Female householder, no spouse present:!!Some college, associate\'s degree', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree'})

metric_list.append({'Name':'B17018_018E', 'Label':'Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other families:!!Female householder, no spouse present:!!Bachelor\'s degree or higher', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher'})

metric_list.append({'Name':'B17018_035E', 'Label':'Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other families:!!Female householder, no spouse present:!!Bachelor\'s degree or higher', 'Concept':'POVERTY STATUS IN THE PAST 12 MONTHS OF FAMILIES BY HOUSEHOLD TYPE BY EDUCATIONAL ATTAINMENT OF HOUSEHOLDER','Alias':'Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher'})


# Group 5: Highest education level completed 

metric_list.append({'Name':'B16010_001E', 'Label':'Estimate!!Total:', 'Concept':'EDUCATIONAL ATTAINMENT AND EMPLOYMENT STATUS BY LANGUAGE SPOKEN AT HOME FOR THE POPULATION 25 YEARS AND OVER','Alias':'Total_number_of_individuals_25+y/o_in_table_B16010'})

metric_list.append({'Name':'B16010_002E', 'Label':'Estimate!!Total:!!Less than high school graduate:', 'Concept':'EDUCATIONAL ATTAINMENT AND EMPLOYMENT STATUS BY LANGUAGE SPOKEN AT HOME FOR THE POPULATION 25 YEARS AND OVER','Alias':'Number_of_individuals_25+y/o_who_did_not_graduate_high_school'})

metric_list.append({'Name':'B16010_015E', 'Label':'Estimate!!Total:!!High school graduate (includes equivalency):', 'Concept':'EDUCATIONAL ATTAINMENT AND EMPLOYMENT STATUS BY LANGUAGE SPOKEN AT HOME FOR THE POPULATION 25 YEARS AND OVER','Alias':'Number_of_individuals_25+y/o_whose_highest_education_level_=_high_school_graduate/equivalent'})

metric_list.append({'Name':'B16010_028E', 'Label':'Estimate!!Total:!!Some college or associate\'s degree:', 'Concept':'EDUCATIONAL ATTAINMENT AND EMPLOYMENT STATUS BY LANGUAGE SPOKEN AT HOME FOR THE POPULATION 25 YEARS AND OVER','Alias':'Number_of_individuals_25+_y/o_whose_highest_education_level_=_some_college/associate\'s_degree'})

metric_list.append({'Name':'B16010_041E', 'Label':'Estimate!!Total:!!Bachelor\'s degree or higher:', 'Concept':'EDUCATIONAL ATTAINMENT AND EMPLOYMENT STATUS BY LANGUAGE SPOKEN AT HOME FOR THE POPULATION 25 YEARS AND OVER','Alias':'Number_of_individuals_25+_y/o_whose_highest_education_level_=_bachelor\'s_degree_or_higher'})



# metric_list.append({'Name':'', 'Label':'', 'Concept':'','Alias':''}) # Template for additional dictionaries

In [6]:
len(metric_list) # Shows the number of queries to be processed by the Census API

34
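As a quick sanity check, the queries can be tallied by table (the part of each code before the underscore). For the list above, this should give 4 B11005 queries, 1 B19013, 8 B17006, 16 B17018, and 5 B16010, which add up to the 34 shown. The function name is hypothetical; the example runs on a small sample list so it doesn't depend on the cell above.

```python
from collections import Counter

def queries_per_table(metric_list):
    """Count how many variable codes come from each Census table (e.g. 'B17018')."""
    return Counter(metric["Name"].split("_")[0] for metric in metric_list)

sample_metrics = [{"Name": "B11005_001E"}, {"Name": "B11005_002E"}, {"Name": "B19013_001E"}]
print(queries_per_table(sample_metrics))
# In the notebook itself: queries_per_table(metric_list)
```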

The following code provides an example of how the census library works. It derives from the examples shown at https://github.com/datamade/census, but differs in that it returns data for a particular zip code rather than for a particular state. 

The code following 'NAME' in this example (B17018_003E, the number of married-couple families below the poverty level) comes from table B17018, which also supplies several of the variable codes in the dictionary list above.

In [7]:
sample_year = 2019
sample_zip = 10940
sample_query = c.acs5.get(('NAME', 'B17018_003E'), {'for': 'zip code tabulation area:{}'.format(sample_zip), 'in':'state:36', 'year':sample_year}) # 36 is the FIPS code for New York. This can be looked up online; alternatively, you can find it using the us library referenced in https://github.com/datamade/census.
# The str.format() method inserts the value of sample_zip into the 'zip code tabulation area:' string in place of {}. For an overview of this method, see https://docs.python.org/3/library/string.html#formatstrings. 
sample_query

[{'NAME': 'ZCTA5 10940',
  'B17018_003E': 551.0,
  'state': '36',
  'zip code tabulation area': '10940'}]

The output of the above query is a list of dictionaries. Since only one zip code was entered, the length of this list is 1, but it will be over 33,000 in the for loop below. The number following B17018_003E is the value represented by that code (in this case, the number of married-couple families below the poverty level) for the sample zip code.
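Because the result is a list of dictionaries that all share the same keys, pandas can also turn it into a DataFrame directly. The sketch below uses the sample output shown above, typed in by hand so it runs without an API key. It reads the zip code from the 'zip code tabulation area' key, which in the sample holds the same five digits as the end of 'NAME'. The column name `Married_couple_families_below_poverty_level` is a hypothetical alias.

```python
import pandas as pd

# Hand-copied from the sample query output above
sample_query = [{'NAME': 'ZCTA5 10940',
                 'B17018_003E': 551.0,
                 'state': '36',
                 'zip code tabulation area': '10940'}]

df_sample = pd.DataFrame(sample_query)  # one row per dictionary, one column per key
df_sample = df_sample.rename(columns={
    'zip code tabulation area': 'Zip',
    'state': 'State',
    'B17018_003E': 'Married_couple_families_below_poverty_level',  # hypothetical alias
})
print(df_sample[['Zip', 'State', 'Married_couple_families_below_poverty_level']])
```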

The for loop below is the heart of this program. It performs census queries using the codes in the dictionaries within metric_list for all available zip codes, then converts the results into a Pandas DataFrame using list comprehensions and the DataFrame.merge() method. 
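This approach sends one request per variable, which keeps each step easy to follow. A possible alternative is to request several codes per call, since get() accepts a tuple of fields. As far as I know, the Census API caps the number of variables per request (50 is the commonly cited limit; please confirm it in the API documentation), so a long code list would need to be split into chunks. The sketch below only handles the chunking; the commented-out lines show how it might plug into a query and are untested. With the 34 codes in metric_list, everything would fit in a single request.

```python
def chunk_codes(codes, max_fields=50, reserved=1):
    """Split variable codes into groups small enough for one API request.

    max_fields=50 is an assumed per-request limit; `reserved` leaves room for
    extra fields such as 'NAME' that are requested alongside each chunk.
    """
    size = max_fields - reserved
    if size < 1:
        raise ValueError("max_fields must exceed reserved")
    return [codes[k:k + size] for k in range(0, len(codes), size)]

codes = [f"CODE_{n}" for n in range(120)]  # 120 placeholder codes
print([len(chunk) for chunk in chunk_codes(codes)])  # [49, 49, 22]

# Possible use with the census library (untested sketch):
# for chunk in chunk_codes([m['Name'] for m in metric_list]):
#     rows = c.acs5.get(('NAME', *chunk), {'for': 'zip code tabulation area:*', 'in': 'state:*', 'year': year})
```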

In [8]:
for i in range(len(metric_list)):
    census_query = c.acs5.get(('NAME', metric_list[i]['Name']), {'for': 'zip code tabulation area:*', 'in':'state:*', 'year':year})
    # metric_list[i]['Name'] returns the variable code (e.g. B17006_012E) that the get() function will process.
    # The asterisks after 'zip code tabulation area' and 'state' instruct get() to return results for all zip codes in all states. As shown in the earlier example, however, you can also select results for one particular zip code and/or state.
    zip_list = [census_query[j]['NAME'][-5:] for j in range(len(census_query))] # As the sample query showed, each 'NAME' value equals 'ZCTA5' plus a 5-digit zip code, so [-5:] selects only the zip code.
    # zip_list, state_list, and metric are all built with list comprehensions. See https://docs.python.org/3/tutorial/datastructures.html and the last example shown on https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.append.html .
    # j distinguishes the zip code results iterated over in each list comprehension from the queries iterated over (via i) in the outer for loop.
    state_list = [census_query[j]['state'] for j in range(len(census_query))] # Retrieves the state FIPS code from the output
    metric = [census_query[j][metric_list[i]['Name']] for j in range(len(census_query))] # census_query[j] retrieves the query result for a particular zip code. As shown in the sample query, the key for the metric equals the code for the current query, which can be accessed through metric_list[i]['Name'].
    df_metric = pd.DataFrame(data={'Zip':zip_list, 'State':state_list, metric_list[i]['Alias']:metric}) # The dictionary passed to pd.DataFrame has 3 key-value pairs: (1) 'Zip' and zip_list; (2) 'State' and state_list; and (3) the current query's 'Alias' value and metric. Each list becomes one column of df_metric.

    if i == 0:
        df_results = df_metric # The first df_metric serves as the basis for df_results.
    else:
        df_results = df_results.merge(right=df_metric, on=['Zip', 'State'], how='outer') # Each later df_metric is joined to df_results on the Zip and State columns, so each query's 'Alias' value becomes a new column label in df_results. how='outer' keeps zip codes that appear in only one of the two DataFrames.

df_results

Unnamed: 0,Zip,State,Households,Married_couple_households_with_no_children,Households_with_1_or_more_children,Married_couple_households_with_1_or_more_children,Median_household_income,Children_below_poverty_level,Children_at_or_above_poverty_level,Children_in_married_couple_families_below_poverty_level,...,Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder's_highest_education_=_high_school_graduate/equivalent,Number_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_=_some_college_or_associate's_degree,Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder's_highest_education_=_some_college_or_associate's_degree,Number_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_level_=_bachelor's_degree_or_higher,Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder's_highest_education_level_=_bachelor's_degree_or_higher,Total_number_of_individuals_25+y/o_in_table_B16010,Number_of_individuals_25+y/o_who_did_not_graduate_high_school,Number_of_individuals_25+y/o_whose_highest_education_level_=_high_school_graduate/equivalent,Number_of_individuals_25+_y/o_whose_highest_education_level_=_some_college/associate's_degree,Number_of_individuals_25+_y/o_whose_highest_education_level_=_bachelor's_degree_or_higher
0,72827,05,73.0,34.0,30.0,30.0,-666666666.0,0.0,27.0,0.0,...,0.0,0.0,0.0,0.0,0.0,139.0,26.0,33.0,72.0,8.0
1,72834,05,3594.0,1331.0,1329.0,733.0,43851.0,354.0,2264.0,67.0,...,82.0,88.0,207.0,8.0,29.0,6925.0,1363.0,2639.0,1948.0,975.0
2,72845,05,432.0,204.0,117.0,100.0,46786.0,56.0,139.0,56.0,...,3.0,0.0,0.0,0.0,6.0,785.0,160.0,267.0,217.0,141.0
3,72860,05,202.0,106.0,63.0,32.0,43698.0,0.0,133.0,0.0,...,0.0,0.0,0.0,0.0,31.0,441.0,29.0,235.0,75.0,102.0
4,72901,05,8870.0,1647.0,2356.0,1019.0,32424.0,1366.0,3028.0,750.0,...,317.0,105.0,344.0,14.0,77.0,14002.0,3292.0,4652.0,3909.0,2149.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33115,51007,19,204.0,87.0,70.0,49.0,77222.0,0.0,153.0,0.0,...,5.0,0.0,2.0,0.0,0.0,358.0,21.0,176.0,110.0,51.0
33116,51028,19,867.0,272.0,376.0,280.0,62981.0,52.0,640.0,0.0,...,2.0,27.0,37.0,4.0,0.0,1514.0,57.0,495.0,635.0,327.0
33117,51029,19,153.0,38.0,9.0,5.0,63015.0,0.0,16.0,0.0,...,5.0,0.0,9.0,0.0,0.0,242.0,10.0,89.0,103.0,40.0
33118,51047,19,303.0,118.0,69.0,57.0,48875.0,10.0,108.0,6.0,...,6.0,2.0,4.0,0.0,8.0,489.0,41.0,173.0,163.0,112.0


I admit that many of the column names are obscenely long and unwieldy. This is less of an issue when viewing the table as a CSV export (which I'll perform later), since spreadsheet software can make the columns a uniform width while allowing the full name to be displayed in a separate box. An alternative to these long names, though, would be to substitute in shorter names, then include a key explaining each of their meanings.
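One way to implement the shorter-name approach is to use the Census variable codes as column names and keep the long aliases in a separate key table that can be exported alongside the data. The sketch below uses a tiny hand-made DataFrame (values copied from one row of the output above); `build_short_names` is a hypothetical helper.

```python
import pandas as pd

def build_short_names(metric_list):
    """Return a rename mapping (alias -> code) and a key DataFrame explaining each code."""
    rename_map = {m['Alias']: m['Name'] for m in metric_list}
    key = pd.DataFrame({'Column': [m['Name'] for m in metric_list],
                        'Description': [m['Alias'] for m in metric_list]})
    return rename_map, key

example_metrics = [
    {'Name': 'B11005_001E', 'Alias': 'Households'},
    {'Name': 'B19013_001E', 'Alias': 'Median_household_income'},
]
df_example = pd.DataFrame({'Zip': ['72834'], 'State': ['05'],
                           'Households': [3594.0], 'Median_household_income': [43851.0]})

rename_map, key = build_short_names(example_metrics)
df_short = df_example.rename(columns=rename_map)
print(list(df_short.columns))  # ['Zip', 'State', 'B11005_001E', 'B19013_001E']
print(key)
```

The key could then be saved with `key.to_csv(...)` next to the main export.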

The following block of code reports the number of census data points that df_results currently contains. (Some row-column pairs are empty, so DataFrame.count(), which skips missing values, is used to determine the number of cells that do contain data.)

In [9]:
cell_counts = df_results.count() # Creates a Series that contains the number of cells with data in each column.
print(len(cell_counts)) # Number of columns in df_results
total_count = int(cell_counts.iloc[2:].sum()) # iloc[2:] skips the first two columns, which merely contain zip and state information.
print("There are",'{:,}'.format(total_count),"cells with census data in df_results so far.") # See https://docs.python.org/3/library/string.html#formatstrings for an overview of the .format() method



36
There are 1,125,030 cells with census data in df_results so far.
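This count includes placeholder values. The first row of df_results, for instance, shows a median household income of -666666666.0. As far as I know, the ACS API uses large negative numbers like this as annotation codes for estimates that are missing or couldn't be computed, rather than leaving the cells blank; check the Census's notes on ACS annotation values for the exact codes. The sketch below assumes that every non-ID column holds counts or medians, which can't legitimately be negative, and replaces any negative entry with NaN so it is excluded from count() and from later proportions.

```python
import pandas as pd

def replace_census_sentinels(df, id_columns=('Zip', 'State')):
    """Return a copy of df with negative values in the census columns set to NaN.

    Assumes all non-ID columns are counts or medians, so any negative entry
    (e.g. -666666666) is treated as a missing-data code.
    """
    df = df.copy()
    value_columns = [col for col in df.columns if col not in id_columns]
    values = df[value_columns].apply(pd.to_numeric, errors='coerce')
    df[value_columns] = values.mask(values < 0)  # mask() sets True positions to NaN
    return df

# Two rows copied from the output above (only one census column kept)
df_example = pd.DataFrame({'Zip': ['72827', '72834'], 'State': ['05', '05'],
                           'Median_household_income': [-666666666.0, 43851.0]})
cleaned = replace_census_sentinels(df_example)
print(cleaned['Median_household_income'].count())  # 1
```

Applying this to df_results before calculating proportions would lower the total above by the number of placeholder cells.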


So far, most of the values shown in the DataFrame are absolute counts. For example, the table reports the number of married-couple households with one or more children, but doesn't say what *proportion* have at least one child, which is much more useful when comparing different zip codes.

Therefore, in the following code block, I added columns to the DataFrame that calculate various proportions. Some of these used pre-existing totals as a denominator, whereas others used the sum of two different statistics as the denominator. (For example, to calculate the proportion of children below the poverty level for a given zip code, I divided the number of children below the poverty level by the sum of (1) children below the poverty level and (2) children at or above the poverty level.) This was a useful strategy when a given Census table didn't have a 'totals' row.

(When creating proportions, be careful about using a total from one table as the denominator for a proportion calculation that involves a separate table. For example, if Table A says that there are 10,000 kids in a zip code, and Table B says that there are 2,000 kids below the poverty line, you may be tempted to conclude that the proportion of children below the poverty line equals 2,000/10,000 = 20%. However, suppose not all the kids identified in Table A show up in Table B, and that Table B doesn't have a totals row. In that case, you'd want to divide the number of kids in Table B below the poverty level (2,000) by the sum of that number and the number in Table B at or above the poverty level (let's say it's 6,000) to arrive at a more accurate proportion: in this case, 2,000/(2,000+6,000) = 2,000/8,000 = 25%.)
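The calculation in the example above, which also repeats throughout the calculate_proportions() function, can be expressed as one small helper. `below_share` is a hypothetical name: it divides a 'below' column by the sum of the 'below' and 'at or above' columns from the same table. When both counts are 0, pandas returns NaN for 0/0 rather than raising an error, which seems like a reasonable result for a zip code with no relevant families.

```python
import pandas as pd

def below_share(df, below_col, at_or_above_col):
    """Proportion below the poverty level, using the same table's two counts as the denominator."""
    return df[below_col] / (df[below_col] + df[at_or_above_col])

# The Table B example from the text, plus a zip code with no children at all
df_example = pd.DataFrame({'Below': [2000, 0], 'At_or_above': [6000, 0]})
shares = below_share(df_example, 'Below', 'At_or_above')
print(shares.tolist())  # [0.25, nan]
```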

In [10]:
def calculate_proportions(df_results): 
    df_results['Married_couple_households_with_one_or_more_children_as_proportion_of_all_households'] = df_results['Married_couple_households_with_1_or_more_children']/df_results['Households']

    df_results['Married_couple_households_with_one_or_more_children_as_proportion_of_all_households_with_one_or_more_children'] = df_results['Married_couple_households_with_1_or_more_children']/df_results['Households_with_1_or_more_children']

    df_results['Proportion_of_children_below_poverty_level'] = df_results['Children_below_poverty_level']/(df_results['Children_below_poverty_level'] + df_results['Children_at_or_above_poverty_level'])

    df_results['Proportion_of_children_in_married_couple_families_below_poverty_level'] = df_results['Children_in_married_couple_families_below_poverty_level']/(df_results['Children_in_married_couple_families_below_poverty_level'] + df_results['Children_in_married_couple_families_at_or_above_poverty_level'])

    df_results['Proportion_of_children_in_female_householder_families_below_poverty_level'] = df_results['Children_in_female_householder_families_below_poverty_level']/(df_results['Children_in_female_householder_families_below_poverty_level'] + df_results['Children_in_female_householder_families_at_or_above_poverty_level'])

    df_results['Proportion_of_children_in_male_householder_families_below_poverty_level'] = df_results['Children_in_male_householder_families_below_poverty_level']/(df_results['Children_in_male_householder_families_below_poverty_level'] + df_results['Children_in_male_householder_families_at_or_above_poverty_level'])

    # Calculating proportions of residents living below the poverty level by education and household type

    df_results['Proportion_of_married-couple_families_below_the_poverty_level_where_householder_did_not_graduate_high_school'] = df_results['Number_of_married-couple_families_below_the_poverty_level_where_householder_did_not_graduate_high_school']/(df_results['Number_of_married-couple_families_below_the_poverty_level_where_householder_did_not_graduate_high_school']+df_results['Number_of_married-couple_families_at_or_above_the_poverty_level_where_householder_did_not_graduate_high_school'])

    df_results['Proportion_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent'] = df_results['Number_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent']/(df_results['Number_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent']+df_results['Number_of_married-couple_families_at_or_above_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent'])

    df_results['Proportion_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree'] = df_results['Number_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree']/(df_results['Number_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree']+df_results['Number_of_married-couple_families_at_or_above_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree'])

    df_results['Proportion_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher'] = df_results['Number_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher']/(df_results['Number_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher']+df_results['Number_of_married-couple_families_at_or_above_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher'])


    df_results['Proportion_of_female-householder_families_below_the_poverty_level_where_householder_did_not_graduate_high_school'] = df_results['Number_of_female-householder_families_below_the_poverty_level_where_householder_did_not_graduate_high_school']/(df_results['Number_of_female-householder_families_below_the_poverty_level_where_householder_did_not_graduate_high_school']+df_results['Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder_did_not_graduate_high_school'])

    df_results['Proportion_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent'] = df_results['Number_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent']/(df_results['Number_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent']+df_results['Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent'])

    df_results['Proportion_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree'] = df_results['Number_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree']/(df_results['Number_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree']+df_results['Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree'])

    df_results['Proportion_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher'] = df_results['Number_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher']/(df_results['Number_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher']+df_results['Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher'])

    df_results['Proportion_of_individuals_25+y/o_who_did_not_graduate_high_school'] = df_results['Number_of_individuals_25+y/o_who_did_not_graduate_high_school']/(df_results['Total_number_of_individuals_25+y/o_in_table_B16010'])

    df_results['Proportion_of_individuals_25+y/o_whose_highest_education_level_=_high_school_graduate/equivalent'] = df_results['Number_of_individuals_25+y/o_whose_highest_education_level_=_high_school_graduate/equivalent']/(df_results['Total_number_of_individuals_25+y/o_in_table_B16010'])

    df_results['Proportion_of_individuals_25+_y/o_whose_highest_education_level_=_some_college/associate\'s_degree'] = df_results['Number_of_individuals_25+_y/o_whose_highest_education_level_=_some_college/associate\'s_degree']/(df_results['Total_number_of_individuals_25+y/o_in_table_B16010'])

    df_results['Proportion_of_individuals_25+_y/o_whose_highest_education_level_=_bachelor\'s_degree_or_higher'] = df_results['Number_of_individuals_25+_y/o_whose_highest_education_level_=_bachelor\'s_degree_or_higher']/(df_results['Total_number_of_individuals_25+y/o_in_table_B16010'])


    # Template for adding another below / (below + at or above) proportion column:
    # df_results[''] = df_results['']/(df_results['']+df_results[''])


    df_results.sort_values('Married_couple_households_with_one_or_more_children_as_proportion_of_all_households',ascending=False,inplace=True)
    df_results.reset_index(drop=True,inplace=True)
    return df_results
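Most of the proportions above follow the same pattern: a "below poverty level" count divided by the sum of the "below" and "at or above" counts. The sketch below shows how a small helper could capture that pattern. `below_poverty_proportion` is a hypothetical name, and the shortened column names are made up; they are not the notebook's actual aliases. When both counts are 0, pandas returns NaN (0/0), which is why the proportion columns shown below contain empty cells for very small zip codes.

```python
import pandas as pd


def below_poverty_proportion(df: pd.DataFrame, below_col: str, above_col: str) -> pd.Series:
    """Hypothetical helper: return below / (below + at_or_above) for each row.

    Rows where both counts are 0 yield NaN (0/0), matching the explicit
    divisions in calculate_proportions().
    """
    return df[below_col] / (df[below_col] + df[above_col])


# Toy counts with shortened, made-up column names (not the notebook's aliases)
df = pd.DataFrame({
    "married_couple_below": [0.0, 20.0],
    "married_couple_at_or_above": [0.0, 80.0],
    "female_householder_below": [5.0, 30.0],
    "female_householder_at_or_above": [15.0, 70.0],
})

# One loop replaces a separate hand-written division per family type
for group in ["married_couple", "female_householder"]:
    df[f"Proportion_{group}_below"] = below_poverty_proportion(
        df, f"{group}_below", f"{group}_at_or_above"
    )

print(df.filter(like="Proportion"))
```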

In [13]:
df_results = calculate_proportions(df_results)


Here's how df_results looks with the additional proportion columns:

In [17]:
df_results

Unnamed: 0,Zip,State,Households,Married_couple_households_with_no_children,Households_with_1_or_more_children,Married_couple_households_with_1_or_more_children,Median_household_income,Children_below_poverty_level,Children_at_or_above_poverty_level,Children_in_married_couple_families_below_poverty_level,...,Proportion_of_married-couple_families_below_the_poverty_level_where_householder's_highest_education_=_some_college_or_associate's_degree,Proportion_of_married-couple_families_below_the_poverty_level_where_householder's_highest_education_level_=_bachelor's_degree_or_higher,Proportion_of_female-householder_families_below_the_poverty_level_where_householder_did_not_graduate_high_school,Proportion_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_=_high_school_graduate/equivalent,Proportion_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_=_some_college_or_associate's_degree,Proportion_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_level_=_bachelor's_degree_or_higher,Proportion_of_individuals_25+y/o_who_did_not_graduate_high_school,Proportion_of_individuals_25+y/o_whose_highest_education_level_=_high_school_graduate/equivalent,Proportion_of_individuals_25+_y/o_whose_highest_education_level_=_some_college/associate's_degree,Proportion_of_individuals_25+_y/o_whose_highest_education_level_=_bachelor's_degree_or_higher
0,46922,18,11.0,0.0,11.0,11.0,-666666666.0,0.0,11.0,0.0,...,,0.0,,,,,0.000000,0.000000,0.388889,0.611111
1,31045,13,19.0,0.0,19.0,19.0,-666666666.0,0.0,14.0,0.0,...,0.0,,,,,,0.000000,0.000000,1.000000,0.000000
2,17010,42,66.0,0.0,66.0,66.0,-666666666.0,0.0,172.0,0.0,...,0.0,,,,,,0.223077,0.223077,0.553846,0.000000
3,77436,48,13.0,0.0,13.0,13.0,-666666666.0,0.0,17.0,0.0,...,,0.0,,,,,0.000000,0.000000,0.000000,1.000000
4,84540,49,24.0,0.0,24.0,24.0,-666666666.0,0.0,35.0,0.0,...,0.0,,,,,,0.000000,0.555556,0.444444,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33115,55155,27,0.0,0.0,0.0,0.0,-666666666.0,0.0,0.0,0.0,...,,,,,,,,,,
33116,55814,27,0.0,0.0,0.0,0.0,-666666666.0,0.0,0.0,0.0,...,,,,,,,0.012579,0.529874,0.369497,0.088050
33117,55905,27,0.0,0.0,0.0,0.0,-666666666.0,0.0,0.0,0.0,...,,,,,,,,,,
33118,56711,27,0.0,0.0,0.0,0.0,-666666666.0,0.0,0.0,0.0,...,,,,,,,,,,


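A look at the first few rows of this table reveals that some median household income values are clearly invalid! The Census API uses -666666666 as a placeholder when an estimate could not be computed (typically because there were too few sample observations), so $-666,666,666 is *not* the actual median household income in any zip code. Yet that's the value listed for 2,194 zip codes, as shown below: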

In [15]:
len(df_results.query("Median_household_income == -666666666"))

2194

This means that you must be extremely careful when calculating averages across the entire dataset; otherwise, you'll end up with results like the one below:

In [16]:
np.mean(df_results['Median_household_income'])

-44154242.870303765

These results are, of course, skewed by the thousands of -666,666,666 values. The U.S. would be in dire shape if the average median household income among zip codes were truly $-44,154,242! 

I then exported two versions of this DataFrame to CSV files. The first version (df_results_1k_plus_households) only includes zip codes with more than 1,000 households, since the smaller samples in less-populated zip codes make their estimates less reliable. The second version contains all zip codes present in the dataset.

In [None]:
df_results_1k_plus_households = df_results.query("Households > 1000")
df_results_1k_plus_households.to_csv('census_query_results_1k_plus_households.csv')
df_results.to_csv('census_query_results.csv')

As shown below, running the same average median household income calculation on the reduced dataset produces a more plausible-looking number. Nevertheless, it would still be better to inspect the DataFrame beforehand and perform any necessary data cleaning.

In [None]:
np.mean(df_results_1k_plus_households['Median_household_income'])
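One way to perform that cleaning is to replace the placeholder with NaN before computing summary statistics. pandas' Series.mean() skips NaN values by default (skipna=True). The sketch below assumes -666666666 is the only placeholder present in the column; the Census API documents other negative placeholder codes as well, so check for those in your own data. The income values are illustrative. In the notebook, the equivalent would be something like `df_results['Median_household_income'].replace(-666666666, np.nan).mean()`.

```python
import numpy as np
import pandas as pd

CENSUS_NA_SENTINEL = -666666666  # placeholder value seen in the query results above

# Illustrative median household incomes for four zip codes, one of them a placeholder
incomes = pd.Series([54974.0, -666666666.0, 48022.0, 40769.0])

raw_mean = incomes.mean()  # dragged far below zero by the placeholder
cleaned = incomes.replace(CENSUS_NA_SENTINEL, np.nan)
clean_mean = cleaned.mean()  # NaN values are skipped by default (skipna=True)
n_missing = int(cleaned.isna().sum())

print(f"Raw mean: {raw_mean:,.2f}")
print(f"Cleaned mean: {clean_mean:,.2f} ({n_missing} placeholder value(s) excluded)")
```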

That concludes the main part of this tutorial program. I hope that you find these examples useful in performing your own census data analysis!

These census DataFrames can also be a great source of information for regression analyses. The following code blocks show how one of the DataFrames can be modified to serve as a data source for regressions (albeit without any data cleaning or checking). In the future, I may move these regressions over to a separate tutorial program and provide detailed explanations of the code. In the meantime, I've left the code in place and added some brief explanations. 

The first regression examined the relationship between poverty rates and whether children were in a married-couple family as opposed to a female-householder one. This involved creating a reduced version of the df_results_1k_plus_households DataFrame:

In [None]:
df_regression_test = df_results_1k_plus_households.copy()
df_regression_test.dropna(subset=['Proportion_of_children_in_female_householder_families_below_poverty_level','Proportion_of_children_in_married_couple_families_below_poverty_level'],inplace=True)
df_regression_test = df_regression_test[['Zip','Proportion_of_children_in_female_householder_families_below_poverty_level','Proportion_of_children_in_married_couple_families_below_poverty_level']].copy()

In [None]:
df_regression_test

I then used pd.melt() to convert the two proportion columns into two rows per zip code. This made it easier to create categorical (or 'dummy') variables for the regression analysis:

In [None]:
df_regression_test_melt = pd.melt(df_regression_test.copy(), id_vars = ['Zip']) # https://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.melt.html
df_regression_test_melt
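To illustrate what pd.melt() does here, the sketch below reshapes a tiny "wide" table with two made-up zip codes and shortened, made-up column names. Every non-id column is stacked into a `variable`/`value` pair, giving one row per (zip code, family type) combination.

```python
import pandas as pd

# Two made-up zip codes in "wide" format (one column per family type)
wide = pd.DataFrame({
    "Zip": ["00001", "00002"],
    "female_householder_below": [0.40, 0.30],
    "married_couple_below": [0.10, 0.05],
})

# melt() stacks the non-id columns into 'variable'/'value' pairs
long = pd.melt(wide, id_vars=["Zip"])
print(long)
```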

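The following code block uses pd.get_dummies() to generate a categorical variable, then renames the resulting columns for better legibility. Because drop_first=True is set, only one dummy column remains; it equals 1 for married-couple rows and 0 for female-householder rows.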

In [None]:
# dtype=int yields 0/1 integer columns (pandas 2.0 and later otherwise return booleans)
df_regression_test_melt = pd.get_dummies(data = df_regression_test_melt.copy(), columns=['variable'], drop_first=True, dtype=int)
df_regression_test_melt.rename(columns={'variable_Proportion_of_children_in_married_couple_families_below_poverty_level':'in_married_household','value':'proportion_below_poverty_level'},inplace=True)
df_regression_test_melt
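The sketch below shows the same get_dummies() step on a small made-up table. pandas sorts the categories before encoding, so drop_first=True drops the alphabetically first one (here the female-householder variable), leaving a single column that flags married-couple rows. Passing dtype=int keeps that column as 0/1 integers rather than booleans, which avoids mixing boolean and float columns in the regression input.

```python
import pandas as pd

# Made-up melted table with shortened variable names
long = pd.DataFrame({
    "Zip": ["00001", "00002", "00001", "00002"],
    "variable": ["female_householder_below"] * 2 + ["married_couple_below"] * 2,
    "value": [0.40, 0.30, 0.10, 0.05],
})

# drop_first=True drops 'female_householder_below', the first category in sorted order
dummies = pd.get_dummies(long, columns=["variable"], drop_first=True, dtype=int)
dummies = dummies.rename(columns={
    "variable_married_couple_below": "in_married_household",
    "value": "proportion_below_poverty_level",
})
print(dummies)
```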

With this table in place, I was able to perform the regression analysis.

In [None]:
y = df_regression_test_melt['proportion_below_poverty_level'] # Dependent variable: proportion of children below the poverty level
x_vars = df_regression_test_melt[['in_married_household']] # Independent variable: 1 = married-couple family, 0 = female-householder family
x_vars = sm.add_constant(x_vars) # Adds an intercept term
model = sm.OLS(y,x_vars)
results = model.fit() # The results variable stores the fitted coefficients and diagnostic statistics
results.summary()
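Because in_married_household is the only regressor, the fitted coefficients are easy to interpret. The constant equals the mean poverty proportion for the omitted (female-householder) group, and the in_married_household coefficient equals the difference between the married-couple mean and that baseline. The sketch below checks this on made-up numbers. It uses NumPy's least-squares solver instead of statsmodels so that it runs without the census data.

```python
import numpy as np

# Made-up poverty proportions: first four rows female-householder, last four married-couple
y = np.array([0.40, 0.30, 0.50, 0.20, 0.10, 0.05, 0.15, 0.10])
married = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Design matrix: a column of ones (the constant) plus the dummy, as sm.add_constant would build
X = np.column_stack([np.ones_like(married), married])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
const, married_effect = coef

female_mean = y[married == 0].mean()
married_mean = y[married == 1].mean()
print(f"const={const:.4f} (female-householder mean {female_mean:.4f})")
print(f"married coef={married_effect:.4f} (difference in means {married_mean - female_mean:.4f})")
```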

My second regression analysis examined how poverty status relates to both family type (married-couple vs. female-householder) and education level (no high school diploma; high school diploma/equivalent; some college/associate's degree; and bachelor's degree or higher). This first involved selecting the poverty-rate columns broken down by both family type and education level.

In [None]:
df_regression_test_2 = df_results_1k_plus_households.copy()
regression_2_columns = ['Proportion_of_married-couple_families_below_the_poverty_level_where_householder_did_not_graduate_high_school', 'Proportion_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent', 'Proportion_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree', 'Proportion_of_married-couple_families_below_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher',
                        'Proportion_of_female-householder_families_below_the_poverty_level_where_householder_did_not_graduate_high_school', 'Proportion_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_=_high_school_graduate/equivalent', 'Proportion_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_=_some_college_or_associate\'s_degree', 'Proportion_of_female-householder_families_below_the_poverty_level_where_householder\'s_highest_education_level_=_bachelor\'s_degree_or_higher']
df_regression_test_2.dropna(subset=regression_2_columns,inplace=True)
df_regression_test_2 = df_regression_test_2[['Zip'] + regression_2_columns].copy()

In [None]:
df_regression_test_2

Next, I once again 'melted' various columns into the same column in order to facilitate the creation of categorical variables. I also created columns that would store these categorical variables.

In [None]:
df_regression_test_2_melt = pd.melt(df_regression_test_2.copy(), id_vars = ['Zip'])
df_regression_test_2_melt['Married'] = 0
df_regression_test_2_melt['highest_ed_=_high_school_grad'] = 0
df_regression_test_2_melt['highest_ed_=_some_college_or_associate\'s'] = 0
df_regression_test_2_melt['highest_ed_=_bachelor\'s_or_higher'] = 0

In [None]:
df_regression_test_2_melt

The output of the following for loop served as a reference for which column numbers corresponded to which variables.

In [None]:
for i, column in enumerate(df_regression_test_2_melt.columns):
    print("Column",i,":\t",column)

In the next for loop, I filled in the categorical variables by checking whether certain keywords ('married', 'some_college', etc.) were present in the variable column. For instance, given the variable 'Proportion_of_married-couple_families_below_the_poverty_level_where_householder_did_not_graduate_high_school', the loop set the 'Married' column to 1 and left the other columns at 0.

In [None]:
for i in range(len(df_regression_test_2_melt)):
    variable = df_regression_test_2_melt.iloc[i, 1]
    if 'married' in variable:
        df_regression_test_2_melt.iloc[i, 3] = 1
    if 'high_school_graduate' in variable:
        df_regression_test_2_melt.iloc[i, 4] = 1
    if 'some_college' in variable:
        df_regression_test_2_melt.iloc[i, 5] = 1
    if 'bachelor' in variable:
        df_regression_test_2_melt.iloc[i, 6] = 1
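The row-by-row loop above works, but it can be slow on large DataFrames because each iloc assignment is handled individually. A vectorized alternative, sketched below on two variable names taken from this notebook (with made-up values), builds each 0/1 column in one step with pandas' Series.str.contains(). The keyword-to-column mapping mirrors the loop's if statements. `keyword_to_column` is a hypothetical name.

```python
import pandas as pd

melted = pd.DataFrame({
    "Zip": ["00001", "00001"],
    "variable": [
        "Proportion_of_married-couple_families_below_the_poverty_level_where_householder_did_not_graduate_high_school",
        "Proportion_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_level_=_bachelor's_degree_or_higher",
    ],
    "value": [0.12, 0.04],  # made-up proportions
})

# Keyword searched for in 'variable' -> name of the 0/1 column it sets
keyword_to_column = {
    "married": "Married",
    "high_school_graduate": "highest_ed_=_high_school_grad",
    "some_college": "highest_ed_=_some_college_or_associate's",
    "bachelor": "highest_ed_=_bachelor's_or_higher",
}
for keyword, column in keyword_to_column.items():
    # regex=False treats the keyword as a literal substring
    melted[column] = melted["variable"].str.contains(keyword, regex=False).astype(int)

print(melted.drop(columns="variable"))
```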


In [None]:
df_regression_test_2_melt.iloc[0,1]

In [None]:
df_regression_test_2_melt.rename(columns={'value':'proportion_below_poverty_level'},inplace=True)
df_regression_test_2.to_csv('marriage_education_poverty_regression.csv')
df_regression_test_2_melt

With the table complete, I performed a regression that used proportion_below_poverty_level as the dependent variable and various family type/education level values as the independent variables.

In [None]:
y = df_regression_test_2_melt['proportion_below_poverty_level']
x_vars = df_regression_test_2_melt[['Married',
       'highest_ed_=_high_school_grad',
       'highest_ed_=_some_college_or_associate\'s',
       'highest_ed_=_bachelor\'s_or_higher']]
x_vars = sm.add_constant(x_vars) 
model = sm.OLS(y,x_vars)
results_2 = model.fit() 
results_2.summary()
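Under this dummy coding, the omitted reference group is female-householder families whose householder did not graduate high school, so the constant corresponds to that group. The model's predicted proportion for any other group is the constant plus the coefficients of the dummies equal to 1. The sketch below generates exactly additive, made-up data for all eight family type/education combinations and uses NumPy least squares (not statsmodels) to confirm that the coefficients are recovered.

```python
import itertools

import numpy as np

# Made-up "true" effects, used only to generate exactly additive data
true_const = 0.40  # female householder, no high school diploma (reference group)
true_married = -0.20
true_ed = {"none": 0.0, "hs": -0.08, "some": -0.12, "bach": -0.15}

rows, y = [], []
for married, ed in itertools.product([0, 1], ["none", "hs", "some", "bach"]):
    # Columns: constant, Married, high school grad, some college/associate's, bachelor's+
    rows.append([1.0, married, ed == "hs", ed == "some", ed == "bach"])
    y.append(true_const + true_married * married + true_ed[ed])

X = np.array(rows, dtype=float)
coef, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)

# Predicted proportion for a married couple whose householder has a bachelor's degree
pred_married_bach = coef[0] + coef[1] + coef[4]
print(np.round(coef, 4), round(pred_married_bach, 4))
```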

Finally, I generated equivalent tables for US counties and states. Much of the code used to generate the zip code table also applies to the county and state tables.

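While the county table still uses the ACS 5-year survey, the state table uses the ACS 1-year survey, which provides more current data. ACS 1-year estimates are only published for areas with populations of 65,000 or more, so they aren't available for every county. See https://www.census.gov/programs-surveys/acs/guidance/estimates.html for guidance on when to use each survey option.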

Here's a look at county-level results for a single metric (B17018_003E):

In [56]:
sample_year = 2019
census_query = c.acs5.get(('NAME', 'B17018_003E'), {'for': 'county:*', 'in':'state:*', 'year':sample_year})
census_query

[{'NAME': 'Sedgwick County, Kansas',
  'B17018_003E': 4307.0,
  'state': '20',
  'county': '173'},
 {'NAME': 'Republic County, Kansas',
  'B17018_003E': 46.0,
  'state': '20',
  'county': '157'},
 {'NAME': 'Graham County, Kansas',
  'B17018_003E': 17.0,
  'state': '20',
  'county': '065'},
 {'NAME': 'Douglas County, Kansas',
  'B17018_003E': 511.0,
  'state': '20',
  'county': '045'},
 {'NAME': 'Sheridan County, Kansas',
  'B17018_003E': 8.0,
  'state': '20',
  'county': '179'},
 {'NAME': 'Gray County, Kansas',
  'B17018_003E': 41.0,
  'state': '20',
  'county': '069'},
 {'NAME': 'Cameron County, Pennsylvania',
  'B17018_003E': 36.0,
  'state': '42',
  'county': '023'},
 {'NAME': 'Bucks County, Pennsylvania',
  'B17018_003E': 3075.0,
  'state': '42',
  'county': '017'},
 {'NAME': 'Lehigh County, Pennsylvania',
  'B17018_003E': 2828.0,
  'state': '42',
  'county': '077'},
 {'NAME': 'Clarion County, Pennsylvania',
  'B17018_003E': 336.0,
  'state': '42',
  'county': '031'},
 {'NAME': 'Gr

In [None]:
census_query[0]['NAME'].split(',')[0]
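The cell above keeps only the text before the first comma (the county name). To split every NAME value into separate county and state columns at once, pandas' Series.str.split() with n=1 and expand=True could be used. The sketch below applies it to three records copied from the query output above. It assumes county names themselves contain no commas, which holds for these examples.

```python
import pandas as pd

# Three records copied from the county-level query output above
census_query = [
    {"NAME": "Sedgwick County, Kansas", "B17018_003E": 4307.0, "state": "20", "county": "173"},
    {"NAME": "Republic County, Kansas", "B17018_003E": 46.0, "state": "20", "county": "157"},
    {"NAME": "Cameron County, Pennsylvania", "B17018_003E": 36.0, "state": "42", "county": "023"},
]

df = pd.DataFrame(census_query)
# n=1 splits on the first ', ' only; expand=True returns two columns
df[["County_name", "State_name"]] = df["NAME"].str.split(", ", n=1, expand=True)
print(df[["County_name", "State_name", "county", "state"]])
```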

In [62]:
for i in range(len(metric_list)):
    census_query = c.acs5.get(('NAME', metric_list[i]['Name']), {'for': 'county:*', 'in':'state:*', 'year':year})
    county_list = [census_query[j]['NAME'] for j in range(len(census_query))] # The 'NAME' component of a county-based query includes the county name (including 'County'), a comma, and the state name. I chose to keep the state name within this result so that each county/state value would remain unique, which will prove helpful when creating choropleth maps.
    county_code_list = [census_query[j]['county'] for j in range(len(census_query))]
    state_list = [census_query[j]['state'] for j in range(len(census_query))]
    metric = [census_query[j][metric_list[i]['Name']] for j in range(len(census_query))]
    df_metric = pd.DataFrame(data={'County':county_list, 'County_Code': county_code_list, 'State':state_list, metric_list[i]['Alias']:metric}) 

    if i == 0: 
        county_results = df_metric
    else:
        county_results = county_results.merge(right=df_metric,how='outer')


county_results

Unnamed: 0,County,County_Code,State,Households,Married_couple_households_with_no_children,Households_with_1_or_more_children,Married_couple_households_with_1_or_more_children,Median_household_income,Children_below_poverty_level,Children_at_or_above_poverty_level,...,Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder's_highest_education_=_high_school_graduate/equivalent,Number_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_=_some_college_or_associate's_degree,Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder's_highest_education_=_some_college_or_associate's_degree,Number_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_level_=_bachelor's_degree_or_higher,Number_of_female-householder_families_at_or_above_the_poverty_level_where_householder's_highest_education_level_=_bachelor's_degree_or_higher,Total_number_of_individuals_25+y/o_in_table_B16010,Number_of_individuals_25+y/o_who_did_not_graduate_high_school,Number_of_individuals_25+y/o_whose_highest_education_level_=_high_school_graduate/equivalent,Number_of_individuals_25+_y/o_whose_highest_education_level_=_some_college/associate's_degree,Number_of_individuals_25+_y/o_whose_highest_education_level_=_bachelor's_degree_or_higher
0,"Sedgwick County, Kansas",173,20,195779.0,51310.0,63456.0,40484.0,54974.0,24090.0,107286.0,...,3747.0,2739.0,6761.0,457.0,4200.0,330375.0,35357.0,86432.0,106544.0,102042.0
1,"Republic County, Kansas",157,20,2248.0,827.0,462.0,361.0,48022.0,118.0,764.0,...,35.0,15.0,19.0,0.0,20.0,3488.0,169.0,1131.0,1333.0,855.0
2,"Graham County, Kansas",065,20,1216.0,379.0,276.0,202.0,40769.0,50.0,411.0,...,10.0,22.0,27.0,0.0,7.0,1901.0,168.0,665.0,569.0,499.0
3,"Douglas County, Kansas",045,20,46294.0,11383.0,11704.0,8368.0,55832.0,2597.0,19148.0,...,332.0,450.0,1013.0,111.0,1289.0,68491.0,3066.0,12342.0,18581.0,34502.0
4,"Sheridan County, Kansas",179,20,1134.0,386.0,301.0,203.0,56071.0,21.0,583.0,...,0.0,0.0,38.0,5.0,9.0,1826.0,99.0,607.0,730.0,390.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3215,"Adams County, Idaho",003,16,1675.0,687.0,346.0,218.0,45319.0,112.0,622.0,...,50.0,20.0,14.0,0.0,36.0,3065.0,318.0,1154.0,1009.0,584.0
3216,"Jerome County, Idaho",053,16,7828.0,2281.0,3053.0,2239.0,49306.0,1759.0,5083.0,...,136.0,137.0,135.0,22.0,125.0,14120.0,3150.0,4620.0,4252.0,2098.0
3217,"Lewis County, Idaho",061,16,1632.0,526.0,364.0,242.0,41326.0,151.0,721.0,...,19.0,9.0,15.0,11.0,12.0,2705.0,312.0,794.0,1130.0,469.0
3218,"Owyhee County, Idaho",073,16,4250.0,1405.0,1383.0,992.0,40430.0,936.0,2007.0,...,119.0,23.0,65.0,0.0,12.0,7436.0,1856.0,2552.0,2195.0,833.0
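Because each query returns a list of dictionaries, pd.DataFrame() can build each per-metric table directly, without a separate list comprehension for every field. The sketch below shows that approach together with functools.reduce for merging the per-metric tables on explicit key columns. The first metric's values are copied from the output above. The second metric (`METRIC_B`) and its values are made up, and `metric_frames` is a hypothetical name. Passing `on=` makes the join keys explicit instead of relying on merge()'s default of joining on every shared column.

```python
from functools import reduce

import pandas as pd

# Records shaped like the county query output above (B17018_003E values copied from it)
query_a = [
    {"NAME": "Sedgwick County, Kansas", "B17018_003E": 4307.0, "state": "20", "county": "173"},
    {"NAME": "Graham County, Kansas", "B17018_003E": 17.0, "state": "20", "county": "065"},
]
# Hypothetical second metric with made-up values (column name is a placeholder)
query_b = [
    {"NAME": "Graham County, Kansas", "METRIC_B": 12.0, "state": "20", "county": "065"},
    {"NAME": "Sedgwick County, Kansas", "METRIC_B": 900.0, "state": "20", "county": "173"},
]

keys = {"NAME": "County", "county": "County_Code", "state": "State"}
metric_frames = [pd.DataFrame(q).rename(columns=keys) for q in (query_a, query_b)]

# Outer-merge all per-metric tables on the explicit key columns
county_results = reduce(
    lambda left, right: left.merge(right, on=list(keys.values()), how="outer"),
    metric_frames,
)
print(county_results)
```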


In [63]:
county_results = calculate_proportions(county_results)

In [64]:
county_results

Unnamed: 0,County,County_Code,State,Households,Married_couple_households_with_no_children,Households_with_1_or_more_children,Married_couple_households_with_1_or_more_children,Median_household_income,Children_below_poverty_level,Children_at_or_above_poverty_level,...,Proportion_of_married-couple_families_below_the_poverty_level_where_householder's_highest_education_=_some_college_or_associate's_degree,Proportion_of_married-couple_families_below_the_poverty_level_where_householder's_highest_education_level_=_bachelor's_degree_or_higher,Proportion_of_female-householder_families_below_the_poverty_level_where_householder_did_not_graduate_high_school,Proportion_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_=_high_school_graduate/equivalent,Proportion_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_=_some_college_or_associate's_degree,Proportion_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_level_=_bachelor's_degree_or_higher,Proportion_of_individuals_25+y/o_who_did_not_graduate_high_school,Proportion_of_individuals_25+y/o_whose_highest_education_level_=_high_school_graduate/equivalent,Proportion_of_individuals_25+_y/o_whose_highest_education_level_=_some_college/associate's_degree,Proportion_of_individuals_25+_y/o_whose_highest_education_level_=_bachelor's_degree_or_higher
0,"Morgan County, Utah",029,49,3306.0,1264.0,1666.0,1514.0,89274.0,52.0,3970.0,...,0.041084,0.006601,1.000000,0.500000,0.000000,0.000000,0.024083,0.177397,0.403746,0.394774
1,"Utah County, Utah",049,49,160649.0,46274.0,78439.0,67001.0,70408.0,19118.0,180124.0,...,0.072782,0.034821,0.404117,0.265647,0.289057,0.113312,0.058300,0.163084,0.377538,0.401077
2,"Yoakum County, Texas",501,48,2676.0,763.0,1226.0,1097.0,68814.0,257.0,2561.0,...,0.029748,0.080925,0.240000,0.350000,0.000000,1.000000,0.318363,0.288423,0.230140,0.163074
3,"Loudoun County, Virginia",107,51,125309.0,32553.0,60066.0,50779.0,136268.0,3768.0,105847.0,...,0.021654,0.005514,0.123235,0.197128,0.089500,0.082667,0.064863,0.123795,0.203433,0.607909
4,"Chattahoochee County, Georgia",053,13,2550.0,479.0,1306.0,1018.0,46453.0,553.0,1715.0,...,0.063425,0.000000,0.250000,0.722222,0.078652,0.245902,0.071751,0.258918,0.335793,0.333538
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3215,"Mineral County, Colorado",079,08,375.0,224.0,38.0,19.0,61058.0,14.0,64.0,...,0.072917,0.000000,,0.000000,1.000000,0.000000,0.032123,0.192737,0.375698,0.399441
3216,"Perry County, Alabama",105,01,3079.0,510.0,646.0,143.0,23561.0,1163.0,884.0,...,0.043478,0.029703,0.691489,0.298805,0.490421,0.364865,0.219261,0.394325,0.229407,0.157008
3217,"Sumter County, Florida",119,12,54636.0,30989.0,3634.0,2294.0,55228.0,1946.0,6659.0,...,0.039827,0.022946,0.316397,0.315700,0.249700,0.066667,0.084280,0.302428,0.302317,0.310976
3218,"Jefferson County, Mississippi",063,28,2530.0,436.0,664.0,98.0,20188.0,1104.0,584.0,...,0.000000,0.000000,0.395722,0.782805,0.217195,0.448000,0.251898,0.361167,0.236916,0.150020


In [65]:
county_results_1k_plus_households = county_results.query("Households > 1000")
county_results_1k_plus_households.to_csv('census_county_query_results_1k_plus_households.csv')
county_results.to_csv('census_county_query_results.csv')

In [54]:
for i in range(len(metric_list)):
    census_query = c.acs1.get(('NAME', metric_list[i]['Name']), {'for': 'state:*', 'year':year})
    state_list = [census_query[j]['NAME'] for j in range(len(census_query))] 
    metric = [census_query[j][metric_list[i]['Name']] for j in range(len(census_query))]
    df_metric = pd.DataFrame(data={'State':state_list, metric_list[i]['Alias']:metric}) 

    if i == 0: 
        state_results = df_metric
    else:
        state_results = state_results.merge(right=df_metric,how='outer')

state_results = calculate_proportions(state_results)

state_results

Unnamed: 0,State,Households,Married_couple_households_with_no_children,Households_with_1_or_more_children,Married_couple_households_with_1_or_more_children,Median_household_income,Children_below_poverty_level,Children_at_or_above_poverty_level,Children_in_married_couple_families_below_poverty_level,Children_in_married_couple_families_at_or_above_poverty_level,...,Proportion_of_married-couple_families_below_the_poverty_level_where_householder's_highest_education_=_some_college_or_associate's_degree,Proportion_of_married-couple_families_below_the_poverty_level_where_householder's_highest_education_level_=_bachelor's_degree_or_higher,Proportion_of_female-householder_families_below_the_poverty_level_where_householder_did_not_graduate_high_school,Proportion_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_=_high_school_graduate/equivalent,Proportion_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_=_some_college_or_associate's_degree,Proportion_of_female-householder_families_below_the_poverty_level_where_householder's_highest_education_level_=_bachelor's_degree_or_higher,Proportion_of_individuals_25+y/o_who_did_not_graduate_high_school,Proportion_of_individuals_25+y/o_whose_highest_education_level_=_high_school_graduate/equivalent,Proportion_of_individuals_25+_y/o_whose_highest_education_level_=_some_college/associate's_degree,Proportion_of_individuals_25+_y/o_whose_highest_education_level_=_bachelor's_degree_or_higher
0,Utah,998891.0,291726.0,402100.0,316681.0,71414.0,85107.0,831284.0,35661.0,717052.0,...,0.035296,0.028653,0.482514,0.224251,0.204693,0.104639,0.075902,0.226191,0.34936,0.348547
1,Alaska,254551.0,67498.0,87148.0,60326.0,74346.0,24235.0,154802.0,9682.0,121297.0,...,0.036969,0.014952,0.511395,0.300421,0.142842,0.039212,0.066846,0.282872,0.348588,0.301694
2,Texas,9776083.0,2580101.0,3491908.0,2291873.0,60629.0,1522277.0,5763093.0,565024.0,4319511.0,...,0.050493,0.021615,0.461829,0.34433,0.250221,0.103376,0.160413,0.250464,0.286002,0.30312
3,Idaho,640270.0,202643.0,204505.0,146171.0,55583.0,61058.0,376332.0,24240.0,309008.0,...,0.047341,0.028906,0.4995,0.352743,0.2205,0.089711,0.090814,0.278224,0.354448,0.276514
4,California,13072122.0,3489337.0,4366906.0,2969086.0,75277.0,1504746.0,7320551.0,583498.0,5509268.0,...,0.044237,0.021018,0.379215,0.26088,0.193826,0.093569,0.162406,0.207318,0.288272,0.342005
5,New Jersey,3249567.0,940816.0,1026075.0,722978.0,81740.0,259521.0,1666275.0,83358.0,1302538.0,...,0.031132,0.015772,0.409942,0.231346,0.206674,0.091277,0.098154,0.263263,0.230619,0.407965
6,Hawaii,455309.0,135434.0,141643.0,99728.0,80212.0,34407.0,261742.0,8258.0,205336.0,...,0.028739,0.009005,0.305202,0.214158,0.168264,0.079962,0.080434,0.26791,0.316606,0.335051
7,Kansas,1133408.0,325328.0,353633.0,245412.0,58218.0,99649.0,588044.0,32110.0,465345.0,...,0.03369,0.011017,0.41611,0.333712,0.269059,0.093683,0.090002,0.254131,0.317824,0.338042
8,Colorado,2176757.0,605861.0,652257.0,462324.0,71953.0,143078.0,1104410.0,49947.0,867068.0,...,0.036061,0.014473,0.40469,0.233688,0.193471,0.10796,0.081306,0.208347,0.29357,0.416777
9,Nebraska,765490.0,218226.0,237634.0,161575.0,59566.0,57486.0,405287.0,19600.0,317325.0,...,0.034346,0.010351,0.403501,0.30833,0.253462,0.093585,0.085552,0.25773,0.332399,0.324319


In [55]:
state_results.to_csv('census_state_query_results.csv')

In [18]:
end_time = time.time()
run_time = end_time - start_time
run_minutes = run_time // 60
run_seconds = run_time % 60
print("Completed run at",time.ctime(end_time),"(local time)")
print("Total run time:",'{:.2f}'.format(run_time),"second(s) ("+str(run_minutes),"minute(s) and",'{:.2f}'.format(run_seconds),"second(s))") # Only valid when the program is run nonstop from start to finish

Completed run at Sat Jan  8 13:54:26 2022 (local time)
Total run time: 244.79 second(s) (4.0 minute(s) and 4.79 second(s))
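Because `//` applied to a float returns a float, run_minutes appears as "4.0" in the output above. The sketch below uses divmod() and int() to display a whole number of minutes instead. `format_run_time` is a hypothetical helper, and the 244.79-second input is taken from the output above.

```python
def format_run_time(run_time: float) -> str:
    """Hypothetical helper: format a duration in seconds as whole minutes plus remaining seconds."""
    minutes, seconds = divmod(run_time, 60)  # both are floats when run_time is a float
    return f"{run_time:.2f} second(s) ({int(minutes)} minute(s) and {seconds:.2f} second(s))"


print("Total run time:", format_run_time(244.79))
```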
