# Census Data Imports (Work in Progress)

By Kenneth Burchfiel

Released under the MIT License

The US Census is a fantastic source of free demographic data. Thankfully, we can easily access large amounts of this data at once via Python, as demonstrated by this notebook. The following code will retrieve data that can help answer two specific questions: (1) What would be the best place for college grads to raise a family? and (2) What is the correlation between marriage and poverty rates?

## Question 1: Deciding where to move to start a family

Let's say that some NVCU graduating seniors are interested in settling down and raising a family a few years after they graduate. Therefore, they'd like to know what parts of the US have especially high percentages of married families with kids. However, they'd also like to avoid having to spend too much money on a home, so they're interested in regions with relatively low property costs. And finally, because they'd prefer to live in a growing region rather than a declining one, they want to know which areas have the highest 5-year population growth rates.

To answer these questions, we'll use the Census API to retrieve marriage, home price, and population growth data for all US counties. We'll then combine these data to create a 'Fit Index' that we can share with these seniors.

## Question 2: Understanding the correlation between marriage and poverty rates

NVCU's College of Social Work is committed to finding ways to reduce the poverty rate in the United States. As part of this mission, they would like to determine whether married couples are less likely to live below the poverty line than are non-married couples.\* They are especially interested in child poverty, so they'd also like to know whether married couples with kids have a lower poverty rate than do non-married households with kids.

*(Although Census data can help us determine the relative poverty rates between married and non-married households, it can't answer the more important question: does getting married actually *reduce* poverty? After all, it's possible that the causal path between marriage and poverty may run in reverse: individuals who are better off financially might be more likely to marry. In other words, poverty might reduce the marriage rate, not the other way around. The College of Social Work is aware of the danger of conflating correlation and causation, but they believe that this demographic data will still be worth exploring.)*

We'll also visualize these Census data with choropleth maps in the Mapping section of Python for Nonprofits.


## An introduction to the American Community Survey

Many Americans probably associate the US Census Bureau with its decennial Census. However, the Census Bureau also conducts the [American Community Survey](https://www.census.gov/data/developers/data-sets/acs-5year.html) each year, making it a great resource for recent demographic data.

This notebook will source data from the American Community Survey's 5-year estimates, which show an average of results for the past 5 years. (For example, the 2022 ACS5 dataset shows results between 2018 and 2022.) The [1-year ACS estimates](https://www.census.gov/data/developers/data-sets/acs-1year.html) offer results for a more recent timeframe; however, because the 5-year estimates are sourced from a larger pool of data, they may be more reliable (especially for smaller regions). In addition, 1-year estimates aren't available for certain regions, such as zip codes.

For the sake of brevity, I'll often refer to the American Community Survey's 5-year estimates as the 'ACS5' survey.

# Part 1: Introducing the Census API and Answering Our First Question

In the process of working to answer the first question (regarding where to move to start a family), we'll also explore the Census API's capabilities and craft a Python function that will use it to efficiently retrieve data.

Importing relevant libraries and setting a few configuration variables:

In [1]:
import time
program_start_time = time.time()
import pandas as pd
import numpy as np
from iteration_utilities import duplicates
pd.set_option('display.max_columns', 1000)
import lxml # Necessary for reading online HTML tables into Pandas

render_for_pdf = False
if render_for_pdf == True:
    pd.set_option('display.max_columns', 4)


latest_acs5_year = 2022 # By updating this variable when future American 
# Community Surveys get released, you should be able to retrieve the most
# recent data possible. (If changes to the survey's format are made,
# however, updates to the scripts may be necessary.)
download_variable_list = False # If set to True, a new list of variables
# will be downloaded from the Census API website. If False, this list of 
# variables will instead be read in from a local .csv copy (thus saving
# processing time).

## Importing a Census API Key

You can obtain a free Census API key [at this website](https://api.census.gov/data/key_signup.html). The following cell imports my own personal key, so you'll need to replace this code with one that loads in your own API key.

In [2]:
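with open('census_api_key_path.txt') as file:
    key_path = file.read().strip()
with open(key_path) as file:
    key = file.read().strip() # .strip() removes any trailing newline
    # that would otherwise end up inside the API URL.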
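
If you'd rather not store your key in a text file, one alternative is to read it from an environment variable. The following is only a sketch and is not part of the original notebook: the variable name `CENSUS_API_KEY` and the helper function `load_census_key()` are hypothetical names chosen for illustration.

```python
import os

def load_census_key(env_var='CENSUS_API_KEY'):
    '''Returns the Census API key stored within the environment variable
    named env_var (a hypothetical name), with any surrounding whitespace
    removed. Raises a ValueError if no key is found.'''
    key = os.environ.get(env_var, '').strip()
    if key == '':
        raise ValueError(
            f"No API key was found in the {env_var} environment variable.")
    return key
```

You could then replace the cell above with `key = load_census_key()` after setting that environment variable on your own machine.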

The Census offers detailed API documentation that makes retrieving data from it relatively straightforward. For instance, you'll probably find the [Census Data API User Guide](https://www.census.gov/content/dam/Census/data/developers/api-user-guide/api-guide.pdf) to be helpful in applying the Census API.

[This list](https://api.census.gov/data/2022/acs/acs5/examples.html) of ACS5 API call examples is another great resource. One of the sample URLs shown on this page for retrieving county-level data appears as follows:

https://api.census.gov/data/2022/acs/acs5?get=NAME,B01001_001E&for=county:*&key=YOUR_KEY_GOES_HERE

'B01001_001E' refers to the total population estimates for a given county. We can find this out by going to the 2021 ACS5's Detailed Tables page (https://api.census.gov/data/2021/acs/acs5/variables.html) and navigating to the row with a 'Name' value of 'B01001_001E'. This link, which may take a little while to load, is available on the 2021 ACS5 API Documentation Page (https://www.census.gov/data/developers/data-sets/acs-5year.html).

If you replace the 'YOUR_KEY_GOES_HERE' component of the URL with your actual key, then enter this link into your web browser, you'll receive a very long list of counties, population values, and state and county codes. The top of the list for the 2022 ACS5 looked like this:

```
[["NAME","B01001_001E","state","county"],
["Autauga County, Alabama","58761","01","001"],
["Baldwin County, Alabama","233420","01","003"],
["Barbour County, Alabama","24877","01","005"],
["Bibb County, Alabama","22251","01","007"],
["Blount County, Alabama","59077","01","009"],
["Bullock County, Alabama","10328","01","011"],
```
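
The sample URL above has a regular structure (year, survey, variable codes, region, and key), so it can also be assembled in code. The following sketch shows one way to do so; `build_acs_url()` is a hypothetical helper introduced for illustration and only covers the ACS URL pattern shown in this notebook.

```python
def build_acs_url(year, survey, variable_list, region, key):
    '''Assembles an American Community Survey API URL that requests
    the NAME field plus each variable in variable_list for every
    instance of the specified region.'''
    variable_string = ','.join(variable_list)
    return (f'https://api.census.gov/data/{year}/acs/{survey}'
            f'?get=NAME,{variable_string}&for={region}:*&key={key}')

# Rebuilding the sample county-level URL (with a placeholder key):
sample_url = build_acs_url(
    year = 2022, survey = 'acs5', variable_list = ['B01001_001E'],
    region = 'county', key = 'YOUR_KEY_GOES_HERE')
print(sample_url)
```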

We can use `pd.read_json()` to easily read this same data into a DataFrame:

In [3]:
df_results = pd.read_json(
    f'https://api.census.gov/data/{latest_acs5_year}/\
acs/acs5?get=NAME,B01001_001E&for=county:*&key={key}')
# read_json documentation:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/
# pandas.read_json.html
df_results.head()

Unnamed: 0,0,1,2,3
0,NAME,B01001_001E,state,county
1,"Autauga County, Alabama",58761,01,001
2,"Baldwin County, Alabama",233420,01,003
3,"Barbour County, Alabama",24877,01,005
4,"Bibb County, Alabama",22251,01,007


At this point, the DataFrame's columns are [0, 1, 2, 3], whereas the columns we want to use are stored within the first row. The following code sets these row values as our column values, then deletes this row:

In [4]:
df_results.columns = df_results.iloc[0]
df_results.drop(0, inplace = True)

df_results.head()

Unnamed: 0,NAME,B01001_001E,state,county
1,"Autauga County, Alabama",58761,1,1
2,"Baldwin County, Alabama",233420,1,3
3,"Barbour County, Alabama",24877,1,5
4,"Bibb County, Alabama",22251,1,7
5,"Blount County, Alabama",59077,1,9
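
The header-promotion step can also be tried without calling the API. The following sketch applies the same approach to the first few rows of the sample response shown earlier; `sample_rows` and `df_sample` are names introduced here for illustration.

```python
import pandas as pd

# The first three rows of the sample API response shown earlier:
sample_rows = [
    ["NAME", "B01001_001E", "state", "county"],
    ["Autauga County, Alabama", "58761", "01", "001"],
    ["Baldwin County, Alabama", "233420", "01", "003"]]

df_sample = pd.DataFrame(sample_rows)
# Setting the first row's values as our column names, then removing
# that row from the table:
df_sample.columns = df_sample.iloc[0]
df_sample = df_sample.drop(0).reset_index(drop = True)
df_sample.columns.name = None # Clears the leftover '0' label that the
# columns inherit from the row we promoted
print(df_sample)
```

Because the sample rows store every value as a string, the state and county codes keep their leading zeros in this version.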


## Retrieving our Data

In order to determine which variable codes to enter into our script, we'll first need to review a list of all American Community Survey variables and the overall groups into which they fit. This list is available on the Census website ([here's the copy for 2022](https://api.census.gov/data/2022/acs/acs5/variables.html)), but we can also use Pandas to import them into a DataFrame, as shown below.

In [5]:
if download_variable_list == True:
    df_variables_page = pd.read_html(
        f'https://api.census.gov/data/{latest_acs5_year}/acs/acs5/variables.html')[0] 
    # [0] selects the first HTML table found on this page.
    # See https://pandas.pydata.org/pandas-docs/stable/reference/api/
    # pandas.read_html.html
    # for more information on pd.read_html().
        
    # Some rows in this table contain items other than demographic 
    # variables (e.g. region names). We can exclude them by selecting 
    # only rows whose 'Label' entries begin with 'Estimate'. (Another 
    # option would have been to filter out rows with N/A 'Group' 
    # entries (i.e. df_variables.query("Group.isna() == False")), 
    # but this would have left a couple non-variable rows in place.)
    
    df_variables = df_variables_page[
    df_variables_page['Label'].str[0:8] == 'Estimate'].copy(
    ).reset_index(drop=True)
    # Saving this table to a local .csv file:
    df_variables.to_csv(f'Datasets/{latest_acs5_year}_variables.csv', 
    index = False)
else: # Reading a local copy of this dataset instead, which should 
    # take much less time. 
    df_variables = pd.read_csv(
        f'Datasets/{latest_acs5_year}_variables.csv')
df_variables.head()

Unnamed: 0,Name,Label,Concept,Required,Attributes,Limit,Predicate Type,Group,Unnamed: 8
0,B01001_001E,Estimate!!Total:,Sex by Age,not required,"B01001_001EA, B01001_001M, B01001_001MA",0,int,B01001,
1,B01001_002E,Estimate!!Total:!!Male:,Sex by Age,not required,"B01001_002EA, B01001_002M, B01001_002MA",0,int,B01001,
2,B01001_003E,Estimate!!Total:!!Male:!!Under 5 years,Sex by Age,not required,"B01001_003EA, B01001_003M, B01001_003MA",0,int,B01001,
3,B01001_004E,Estimate!!Total:!!Male:!!5 to 9 years,Sex by Age,not required,"B01001_004EA, B01001_004M, B01001_004MA",0,int,B01001,
4,B01001_005E,Estimate!!Total:!!Male:!!10 to 14 years,Sex by Age,not required,"B01001_005EA, B01001_005M, B01001_005MA",0,int,B01001,


With over 28,000 individual variables, it could take a very long time to identify the items you'd like to retrieve from the Census. We can make this search process somewhat easier by creating a separate *groups* table that shows only unique group names and their written descriptions (e.g. 'Sex by Age').

In [6]:
df_groups = df_variables.drop_duplicates(
    'Group')[['Concept', 'Group']].copy(
    ).reset_index(drop=True)
df_groups.head()

Unnamed: 0,Concept,Group
0,Sex by Age,B01001
1,Sex by Age (White Alone),B01001A
2,Sex by Age (Black or African American Alone),B01001B
3,Sex by Age (American Indian and Alaska Native ...,B01001C
4,Sex by Age (Asian Alone),B01001D


We'll save this group table to a local .csv file as well:

In [7]:
df_groups.to_csv(f'Datasets/{latest_acs5_year}_groups.csv', 
                 index = False)

To find variables of interest, I recommend first searching for relevant keywords within the group table (which is much smaller in size) in order to identify relevant group IDs. Next, you can search for those group IDs inside the variables table in order to find the exact metrics to request from the Census API.
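
This two-step search could be sketched as follows. To keep the example self-contained, it uses small stand-in tables (`df_groups_sample` and `df_variables_sample`, both names introduced for illustration) whose rows come from tables used elsewhere in this notebook; in practice, you would run the same filters on `df_groups` and `df_variables`.

```python
import pandas as pd

# Small stand-ins for df_groups and df_variables:
df_groups_sample = pd.DataFrame({
    'Concept': [
        'Sex by Age',
        'Family Type by Presence and Age of Own Children Under 18 Years',
        'Poverty Status in the Past 12 Months by Household Type by '
        'Age of Householder'],
    'Group': ['B01001', 'B11003', 'B17017']})

df_variables_sample = pd.DataFrame({
    'Name': ['B01001_001E', 'B17017_002E', 'B17017_004E'],
    'Label': [
        'Estimate!!Total:',
        'Estimate!!Total:!!Income in the past 12 months below '
        'poverty level:',
        'Estimate!!Total:!!Income in the past 12 months below '
        'poverty level:!!Family households:!!Married-couple family:'],
    'Group': ['B01001', 'B17017', 'B17017']})

# Step 1: finding groups whose descriptions mention our keyword
# (case = False makes the search case-insensitive):
matching_groups = df_groups_sample[
    df_groups_sample['Concept'].str.contains('poverty', case = False)]
group_ids = list(matching_groups['Group'])

# Step 2: finding the variables that belong to those groups:
matching_variables = df_variables_sample[
    df_variables_sample['Group'].isin(group_ids)]
print(matching_variables[['Name', 'Label']])
```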

**[Note: the following variable list is just a placeholder. It will be replaced soon with the actual variables that can be used to answer Question 1.]**

In [8]:
grad_destinations_variable_list = [
    'B01001_001E', 'B01001_002E']

The demographic columns in the Census API's output are labeled with their variable names (e.g. 'B01001_001E'). These names are concise, but you'll need a copy of the original variable list to interpret them. Therefore, I chose to replace these column names with a combination of the 'Label', 'Concept', and 'Name' entries in the original variable list. These column names are very long, but they do make the output easier to interpret (while also preserving the original names for reference). 

In addition, if the description corresponding to a variable name happens to change from one year to another, the use of aliases will help you identify that change. (This will help prevent you from treating two different data types that happened to use the same variable code in different years as equal. However, I would imagine that the Census wouldn't repurpose variable codes in this way.)

The following function assists with this replacement by creating a dictionary whose keys are the original field names (e.g. 'B01001_001E') and whose values are the replacement names (e.g. 'Sex by Age_Estimate!!Total: (B01001_001E)').

In [9]:
def create_variable_aliases(df_variables, variable_list):
    '''This function creates a dictionary whose keys are 
    the original 'Name' values (e.g. 'B01001_001E') within a variable
    list on the Census API website and whose values are the replacement 
    names (e.g. 'Sex by Age_Estimate!!Total: (B01001_001E)').
    This resulting dictionary can then be passed to a df.rename() call
    within retrieve_census_data() in order to make the output of that
    function easier to interpret.
    
    df_variables: A DataFrame containing a list of Census variables. For
    an example of this list for the 2022 American Community Survey (5-Year 
    Estimates), visit: 
    https://api.census.gov/data/2022/acs/acs5/variables.html .
    
    variable_list: The list of variables to rename 
    (e.g. ['B01001_001E', 'B01001_002E']).
    '''
    # Creating a DataFrame that contains the information needed for the
    # updated column names:
    df_aliases = df_variables.query(
        "Name in @variable_list")[['Name', 'Label', 'Concept']].copy()
    # Creating a new 'Description' column that will replace the original
    # output field names:
    df_aliases['Description'] = (df_aliases['Concept'] 
                                 + '_' + df_aliases['Label'] 
                                 + ' (' + df_aliases['Name'] + ')')
    # Creating a dictionary whose keys are the original field names and 
    # whose values are the new 'Description' entries that were 
    # just created:
    alias_dict = df_aliases.set_index('Name').to_dict()['Description']
    # See https://pandas.pydata.org/pandas-docs/stable/reference/api/
    # pandas.DataFrame.to_dict.html
    return alias_dict

Creating our aliases:

In [10]:
grad_destinations_alias_dict = create_variable_aliases(
    df_variables = df_variables, 
    variable_list = grad_destinations_variable_list)
grad_destinations_alias_dict

{'B01001_001E': 'Sex by Age_Estimate!!Total: (B01001_001E)',
 'B01001_002E': 'Sex by Age_Estimate!!Total:!!Male: (B01001_002E)'}

## Defining a Census data retrieval function

The following function simplifies the process of retrieving data from the Census API. It also enables the user to rename variable fields (e.g. 'B01001_001E') with aliases for those fields (e.g. 'Sex by Age_Estimate!!Total: (B01001_001E)'), but this option is disabled by default. In addition, it allows more than 50 variables to be retrieved at the same time, thus making it easier to retrieve especially large datasets.

[Note: currently, this function only supports data retrieval for the ACS 5-year and 1-year estimates. However, I may add in the ability to retrieve decennial Census data in the future.]

In [11]:
def retrieve_census_data(survey, year, region, key, variable_list,
                         rename_data_fields = False, 
                         field_names_dict = {}):
    '''This function (which I plan to expand) retrieves data from the US
    Census API. It accommodates more than 50 variables.
    
    survey: the survey from which to retrieve data. The only arguments
    currently supported are 'acs5' and 'acs1' (for the American Community 
    Survey 5-Year and 1-Year estimates, respectively).
    
    year: the year for which you wish to retrieve survey data. Note that,
    when survey is set to 'acs5', the survey results will include data
    for the 5 years leading up to (and including) the 'year' argument.
    (For example, if you set 'year' to 2022, you'll retrieve ACS5 data
    from 2018 to 2022 (inclusive).)
    
    
    region: The geographic level at which you wish to retrieve data. 
    Examples include 'us', 'state', 'county', 'zip', 'msa' 
    (for metropolitan/micropolitan statistical area data), and 'csa' 
    (for combined statistical area data); 
    however, other regions are supported as well. Consult your survey's 
    API examples page for other options. (For instance, if you wanted to 
    retrieve data by urban area within the 2022 ACS5, you could go to 
    https://api.census.gov/data/2022/acs/acs5/examples.html, then search
    for 'urban area.' The Urban Area URL ends with
    '&for=urban%20area:*&key=YOUR_KEY_GOES_HERE'. Therefore, you'd want to
    use 'urban%20area' as your 'region' argument.)   

    (Note: 'zip' will retrieve results by Zip Code
    Tabulation Area, which are similar to (but not identical to)
    zip codes. See 
    https://en.wikipedia.org/wiki/ZIP_Code_Tabulation_Area
    for more information.)
    
    variable_list: The list of variables for which to retrieve data.

    key: your personal Census API key.

    rename_data_fields: set to True to replace column names in your 
    dataset with new entries of your choice.

    field_names_dict: A dictionary that stores the original variable names
    retrieved by the Census (e.g. 'B01001_001E') as keys and your desired
    replacements as values. Example: 
    {'B01001_001E': 'Sex by Age_Estimate!!Total: (B01001_001E)',
     'B01001_002E': 'Sex by Age_Estimate!!Total:!!Male: (B01001_002E)'}
     
    '''

    # Using the iteration_utilities library to check for duplicate
    # values within variable_list (which could cause issues later on):
    # The following code is based on
    # https://iteration-utilities.readthedocs.io/en/latest/generated/
    # duplicates.html
    duplicate_variables = list(duplicates(variable_list))
    
    if len(duplicate_variables) > 0:
        raise ValueError(f"The following variables appear more than once \
in your variable list: {duplicate_variables}")
    
    if survey == 'acs5':
        survey_string = 'acs/acs5'

    elif survey == 'acs1':
        survey_string = 'acs/acs1'
    
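    else:
        # Adjacent string literals are joined automatically, which keeps
        # indentation whitespace out of the error message.
        raise ValueError("This survey type is not currently supported by "
                         "the function.")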

    
    # Converting simplified region names into strings that the API 
    # function will recognize:
    if region == 'zip':
        region = 'zip%20code%20tabulation%20area' # Based on
        # the ZCTA example within
        # https://api.census.gov/data/2022/acs/acs5/examples.html
    
    if region == 'csa':
        region = 'combined%20statistical%20area'
    
    if region == 'msa':
        region = 'metropolitan%20statistical\
%20area/micropolitan%20statistical%20area'

    
    # Only 50 variables can be retrieved from the Census API at a time 
    # using the approach shown in this function. The following code 
    # accommodates this limitation by splitting variable_list into 
    # sublists of up to 49 variables. The data retrieved for the variables 
    # in these sublists will then get merged back together.
    # (49 variables are retrieved at a time instead of 50 because it 
    # appears that the initial 'NAME' variable also counts towards 
    # the 50-variable limit.)
    
    i = 0
       
    while i < len(variable_list): # i.e. while there
        # are still more variables to iterate through
        variable_sublist = variable_list[i:i+49] # This line reads the 
        # next 49 variables from variable_list into a sublist that can 
        # then be passed to the API
        # print("variable_sublist:", variable_sublist)
        # Converting the list of variables into a string that can be 
        # passed to the API call:
        # (The Census API guide at
        # https://www.census.gov/content/dam/Census/data/developers/
        # api-user-guide/api-guide.pdf
        # demonstrates how to call multiple census variables at once.)
        variable_string = ','.join(variable_sublist)
        # print("variable_string:",variable_string)
    
        # Retrieving data via the Census API:
        # This line was originally based on an example found in
        # https://api.census.gov/data/2022/acs/acs5/examples.html .
    
        # read_json documentation:
        # https://pandas.pydata.org/pandas-docs/stable/reference/api/
        # pandas.read_json.html

        api_url = f'https://api.census.gov/data/{year}/\
{survey_string}?get=NAME,{variable_string}&for={region}:*&key={key}'
        # print(api_url)
        
        df_results = pd.read_json(api_url)
    
        # At this point, the DataFrame's columns are a list of integers; 
        # the desired column names are stored within the first row. 
        # The following code resolves this issue by setting these row 
        # values as the column values and then deleting this row.
    
        df_results.columns = df_results.iloc[0]
        df_results.drop(0, inplace = True)


        # Determining which merge keys to use when combining API results
        # for different sublists together:
        # This is made more complicated by the fact that results for 
        # different regions will have different identifier
        # columns (e.g. 'NAME', 'county', and 'state' for county data but 
        # only 'NAME' and 'state' for state data). However, we can 
        # accommodate this behavior by simply initializing our list of 
        # merge keys as the set of all columns that are *not* also 
        # variable columns.
        if i == 0: # This step only needs to be performed for our first
            # sublist of variables, since merge keys for other sublists
            # will be identical.
            merge_keys = list(set(df_results.columns) 
              - set(variable_sublist))
            # print("merge_keys:",merge_keys)

        if i == 0: # Since this is the first set 
            # of results, we can initialize df_combined_results 
            # as a copy of df_results.
            df_combined_results = df_results.copy()
        else: # Merging our latest set of results into 
            # df_combined_results:
            df_combined_results = df_combined_results.merge(
                df_results, on = merge_keys,
                how = 'outer').copy()
            # Added .copy() here in response to a data fragmentation 
            # warning

        i += 49 
        # Allows the function to iterate through the next 49 variables
        # within variable_list

        
    # Converting variable columns to numeric data types:
    for column in variable_list:
        # print(f"Now converting {column} to a numeric type.")
        df_combined_results[column] = pd.to_numeric(
            df_combined_results[column])
        # pd.to_numeric() allows for either integer or float outputs
        # depending on the nature of the original data.
        # See https://pandas.pydata.org/pandas-docs/stable/reference/api/
        # pandas.to_numeric.html

    # Replacing column names with aliases if requested:
    if rename_data_fields == True:
        df_combined_results.rename(
            columns = field_names_dict, inplace = True)

    # The following for loop moves all of the merge keys (e.g. geographic
    # identifiers) to the left side of the table. This is particularly
    # useful when retrieving longer lists of variables, as otherwise,
    # certain keys can get buried in the middle of the dataset.
    for i in range(len(merge_keys)):
        df_combined_results.insert(
            i, merge_keys[i], 
            df_combined_results.pop(merge_keys[i]))
        
    
    return df_combined_results
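
The 49-variable batching logic within this function can be checked on its own, without any API calls. The following sketch uses a hypothetical helper, `split_into_sublists()`, along with made-up placeholder variable names to show how a 151-item list gets divided.

```python
def split_into_sublists(variable_list, sublist_size = 49):
    '''Splits variable_list into consecutive sublists that each contain
    up to sublist_size items. (49 is the default because the 'NAME'
    field appears to count toward the API's 50-variable limit.)'''
    return [variable_list[i:i + sublist_size] 
            for i in range(0, len(variable_list), sublist_size)]

# Creating 151 placeholder variable names (not real Census codes):
example_variables = [f'VAR_{n:03d}' for n in range(151)]
example_sublists = split_into_sublists(example_variables)
sublist_lengths = [len(sublist) for sublist in example_sublists]
print(sublist_lengths)
```

A list of 151 variables is split into three full sublists of 49 and a final sublist holding the remaining 4, so retrieving it would take four API calls.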

(The following code allowed me to test out retrieve_census_data for a particularly long variable list.)

In [12]:
# test_list = list(df_variables['Name'][0:151])

# test_alias_dict = create_variable_aliases(
#     df_variables = df_variables, 
#     variable_list = test_list)

# test_acs5_data = retrieve_census_data(
#     survey = 'acs5', year = latest_acs5_year, region = 'county',
#     variable_list = test_list, 
#     rename_data_fields = True, 
#     field_names_dict = test_alias_dict, key = key)

# test_acs5_data
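
The merge step within retrieve_census_data can likewise be tried offline. The following sketch builds two small tables that stand in for the results of two separate API calls (using county values shown elsewhere in this notebook), derives the merge keys via a set difference, and then combines the tables. Unlike the function above, it sorts the merge keys so that their order is predictable; that choice is mine rather than the notebook's.

```python
import pandas as pd

# Stand-ins for the results of two API calls covering the same counties:
df_first = pd.DataFrame({
    'NAME': ['Autauga County, Alabama', 'Baldwin County, Alabama'],
    'B01001_001E': ['58761', '233420'],
    'state': ['01', '01'], 'county': ['001', '003']})
df_second = pd.DataFrame({
    'NAME': ['Autauga County, Alabama', 'Baldwin County, Alabama'],
    'B01001_002E': ['28663', '114077'],
    'state': ['01', '01'], 'county': ['001', '003']})

first_sublist = ['B01001_001E']
# The merge keys are all columns that are *not* variable columns:
merge_keys = sorted(set(df_first.columns) - set(first_sublist))

df_combined = df_first.merge(df_second, on = merge_keys, how = 'outer')

# Converting the variable columns from strings to numeric types:
for column in ['B01001_001E', 'B01001_002E']:
    df_combined[column] = pd.to_numeric(df_combined[column])
print(df_combined)
```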

Calling retrieve_census_data to retrieve variables that can help us answer Question 1:

In [13]:
df_grad_destinations_acs5_data = retrieve_census_data(
    survey = 'acs5', year = latest_acs5_year, 
    region = 'county',
    variable_list = grad_destinations_variable_list, 
    rename_data_fields = True, field_names_dict = grad_destinations_alias_dict,
    key = key)

df_grad_destinations_acs5_data.head()

Unnamed: 0,state,county,NAME,Sex by Age_Estimate!!Total: (B01001_001E),Sex by Age_Estimate!!Total:!!Male: (B01001_002E)
1,1,1,"Autauga County, Alabama",58761,28663
2,1,3,"Baldwin County, Alabama",233420,114077
3,1,5,"Barbour County, Alabama",24877,12973
4,1,7,"Bibb County, Alabama",22251,11897
5,1,9,"Blount County, Alabama",59077,29864


Saving this data as a .csv file so that we can create maps of it within the Mapping section of Python for Nonprofits:

In [14]:
df_grad_destinations_acs5_data.to_csv(
    f'Datasets/grad_destinations_acs_data_{latest_acs5_year}.csv', 
    index = False)

## Part 2: Retrieving data on marriage and poverty in order to answer Question 2


In [15]:
marriage_poverty_variable_list = [
    'B01001_001E', 'B17010_001E', 'B17010_003E', 'B17010_004E',
    'B17010_011E', 'B17010_016E', 'B17010_017E', 'B17010_023E',
    'B17010_024E', 'B17010_031E', 'B17010_036E', 'B17010_037E',
    'B11003_001E', 'B11003_002E', 'B11003_003E', 'B11004_001E',
    'B11004_002E', 'B11004_003E', 'B17017_002E', 'B17017_004E',
    'B17017_015E', 'B17017_009E', 'B17017_020E', 'B17017_031E',
    'B17017_033E', 'B17017_038E', 'B17017_044E', 'B17017_049E'
]

marriage_poverty_alias_dict = create_variable_aliases(
    df_variables = df_variables, 
    variable_list = marriage_poverty_variable_list)
# marriage_poverty_alias_dict

In [16]:
df_marriage_poverty_acs5_data = retrieve_census_data(
    survey = 'acs5', year = latest_acs5_year, region = 'county',
    variable_list = marriage_poverty_variable_list, 
    rename_data_fields = True, 
    field_names_dict = marriage_poverty_alias_dict, key = key)

# Showing a shortened version of this DataFrame if render_for_pdf
# is set to True so as to prevent its text from getting cut off:

In [17]:
if render_for_pdf == True:
    pd.set_option('display.max_columns', 3)


df_marriage_poverty_acs5_data.head()

Unnamed: 0,state,county,NAME,Sex by Age_Estimate!!Total: (B01001_001E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total: (B17010_001E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family: (B17010_003E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family:!!With related children of the householder under 18 years: (B17010_004E),"Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Male householder, no spouse present:!!With related children of the householder under 18 years: (B17010_011E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Female householder, no spouse present: (B17010_016E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Female householder, no spouse present:!!With related children of the householder under 18 years: (B17010_017E)",Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family: (B17010_023E),Poverty Status in 
the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family:!!With related children of the householder under 18 years: (B17010_024E),"Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Male householder, no spouse present:!!With related children of the householder under 18 years: (B17010_031E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Female householder, no spouse present: (B17010_036E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Female householder, no spouse present:!!With related children of the householder under 18 years: (B17010_037E)",Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total: (B11003_001E),Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11003_002E),Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With own children of the householder under 18 years: (B11003_003E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total: (B11004_001E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11004_002E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With related children of the 
householder under 18 years: (B11004_003E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level: (B17017_002E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Married-couple family: (B17017_004E),"Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Other family:!!Female householder, no spouse present: (B17017_015E)",Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Other family: (B17017_009E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Nonfamily households: (B17017_020E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level: (B17017_031E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Married-couple family: (B17017_033E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Other family: (B17017_038E),"Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Other family:!!Female householder, no spouse present: (B17017_044E)",Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Nonfamily 
households: (B17017_049E)
1,1,1,"Autauga County, Alabama",58761,15363,707,321,50,509,393,11182,4931,404,2219,1088,15363,11889,5027,15363,11889,5252,2396,707,509,571,1118,19912,11182,2903,2219,5827
2,1,3,"Baldwin County, Alabama",233420,61277,1840,939,519,1888,1521,49179,17444,1393,5537,2962,61277,51019,16575,61277,51019,18383,10315,1840,1888,2473,6002,80487,49179,7785,5537,23523
3,1,5,"Barbour County, Alabama",24877,5722,236,115,31,875,721,3195,1126,186,938,521,5722,3431,1089,5722,3431,1241,2169,236,875,957,976,6847,3195,1334,938,2318
4,1,7,"Bibb County, Alabama",22251,4871,226,154,69,449,426,3391,1136,213,305,225,4871,3617,1069,4871,3617,1290,1569,226,449,568,775,5647,3391,686,305,1570
5,1,9,"Blount County, Alabama",59077,15416,895,540,78,506,385,11171,4471,445,1853,1132,15416,12066,4415,15416,12066,5011,3469,895,506,674,1900,18157,11171,2676,1853,4310


In [18]:
# Limiting the number of columns shown within subsequent DataFrame
# displays when rendering this notebook as a PDF:
if render_for_pdf:
    pd.set_option('display.max_columns', 4)

## Performing additional calculations

The following cell uses fields within df_marriage_poverty_acs5_data to calculate poverty rates for:

1. Married-couple households
2. Non-married-couple households
3. Households with 1+ kids below 18 headed by a married couple
4. Households with 1+ kids below 18 *not* headed by a married couple

It will also calculate the differences in poverty rates between:

1. Non-married and married couple households
2. Non-married households with 1+ kids below 18 and married households with 1+ kids below 18
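
To make these calculations concrete, here is a minimal sketch that applies them to the Autauga County, Alabama row shown in the table above. The short column names (`all_below`, `married_below`, and so on) are hypothetical stand-ins for the much longer Census field names; only the household counts come from the dataset itself.

```python
import pandas as pd

# Household counts copied from the Autauga County, Alabama row. The
# column names are illustrative placeholders, not actual field names.
df_example = pd.DataFrame({
    'NAME': ['Autauga County, Alabama'],
    'all_below': [2396],               # B17017_002E
    'married_below': [707],            # B17017_004E
    'all_at_or_above': [19912],        # B17017_031E
    'married_at_or_above': [11182]})   # B17017_033E

# Non-married counts equal all households minus married-couple households:
df_example['non_married_below'] = (
    df_example['all_below'] - df_example['married_below'])
df_example['non_married_at_or_above'] = (
    df_example['all_at_or_above'] - df_example['married_at_or_above'])

# Each poverty rate equals the households below the poverty level
# divided by all households of that type:
df_example['married_poverty_rate'] = 100 * (
    df_example['married_below'] / (
        df_example['married_below'] 
        + df_example['married_at_or_above']))
df_example['non_married_poverty_rate'] = 100 * (
    df_example['non_married_below'] / (
        df_example['non_married_below'] 
        + df_example['non_married_at_or_above']))

# The difference between the two rates, in percentage points:
df_example['rate_difference'] = (
    df_example['non_married_poverty_rate'] 
    - df_example['married_poverty_rate'])

print(df_example[['married_poverty_rate', 'non_married_poverty_rate', 
                  'rate_difference']].round(2))
```

The same pattern (subtract to get the non-married counts, then divide the below-poverty count by the combined count) applies to the households-with-kids columns as well.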

In [19]:
df_marriage_poverty_acs5_data['Non-married-couple households below \
poverty level'] = (df_marriage_poverty_acs5_data[
'Poverty Status in the Past 12 Months by Household Type by \
Age of Householder_Estimate!!Total:!!Income in the past 12 months \
below poverty level: (B17017_002E)'] 
- df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Family \
households:!!Married-couple family: (B17017_004E)'])

df_marriage_poverty_acs5_data['Non-married-couple households at or above \
poverty level'] = (df_marriage_poverty_acs5_data[
'Poverty Status in the Past 12 Months by Household Type \
by Age of Householder_Estimate!!Total:!!Income in the past 12 months \
at or above poverty level: (B17017_031E)'
] 
- df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income in the \
past 12 months at or above poverty level:!!\
Family households:!!Married-couple family: (B17017_033E)'])

df_marriage_poverty_acs5_data['Non-married households with 1+ kids \
below poverty level'] = (df_marriage_poverty_acs5_data['Poverty Status \
in the Past 12 Months of Families by Family Type by Presence \
of Related Children Under 18 Years by Age of Related Children_Estimate!!\
Total:!!Income in the past 12 months below poverty level:!!Other \
family:!!Male householder, no spouse present:!!With related \
children of the householder under 18 years: (B17010_011E)'] 
+ df_marriage_poverty_acs5_data[
'Poverty Status in the Past 12 Months of Families by Family Type \
by Presence of Related Children Under 18 Years by Age of \
Related Children_Estimate!!Total:!!Income in the past 12 months \
below poverty level:!!Other family:!!Female householder, \
no spouse present:!!With related children \
of the householder under 18 years: (B17010_017E)'])

df_marriage_poverty_acs5_data['Non-married households with 1+ kids \
at or above poverty level'] = (df_marriage_poverty_acs5_data['Poverty \
Status in the Past 12 Months of Families by Family Type by Presence \
of Related Children Under 18 Years by Age of Related Children_Estimate!!\
Total:!!Income in the past 12 months at or above poverty level:!!Other \
family:!!Male householder, no spouse present:!!With related \
children of the householder under 18 years: (B17010_031E)'] 
+ df_marriage_poverty_acs5_data[
'Poverty Status in the Past 12 Months of Families by Family Type \
by Presence of Related Children Under 18 Years by Age of \
Related Children_Estimate!!Total:!!Income in the past 12 months \
at or above poverty level:!!Other family:!!Female householder, \
no spouse present:!!With related children \
of the householder under 18 years: (B17010_037E)'])

df_marriage_poverty_acs5_data[
'% of Married Households Below Poverty Level'] = 100 * (
    df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Family \
households:!!Married-couple family: (B17017_004E)'] / 
    (df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income in the \
past 12 months at or above poverty level:!!\
Family households:!!Married-couple family: (B17017_033E)'] 
    + df_marriage_poverty_acs5_data['Poverty Status in the Past 12 \
Months by Household Type by Age of Householder_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Family \
households:!!Married-couple family: (B17017_004E)']))


df_marriage_poverty_acs5_data[
'% of Non-Married Households Below Poverty Level'] = 100 * (
    df_marriage_poverty_acs5_data['Non-married-couple households below \
poverty level'] / 
    (df_marriage_poverty_acs5_data['Non-married-couple households \
at or above poverty level'] 
+ df_marriage_poverty_acs5_data['Non-married-couple households below \
poverty level']))


df_marriage_poverty_acs5_data['% of Married Households With \
1+ Kids Below Poverty Level'] = 100 * (
df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
of Families by Family Type by Presence of Related Children \
Under 18 Years by Age of Related Children_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Married-couple \
family:!!With related children of the householder \
under 18 years: (B17010_004E)'] / 
(df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
of Families by Family Type by Presence of Related Children Under \
18 Years by Age of Related Children_Estimate!!Total:!!Income in the \
past 12 months at or above poverty level:!!Married-couple \
family:!!With related children of the householder \
under 18 years: (B17010_024E)'] 
+ df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
of Families by Family Type by Presence of Related Children \
Under 18 Years by Age of Related Children_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Married-couple \
family:!!With related children of the householder \
under 18 years: (B17010_004E)']))

df_marriage_poverty_acs5_data[
'% of Non-Married Households With 1+ Kids Below Poverty Level'] = 100 * (
df_marriage_poverty_acs5_data[
'Non-married households with 1+ kids below poverty level'] / (
df_marriage_poverty_acs5_data['Non-married households with 1+ kids \
below poverty level'] +
df_marriage_poverty_acs5_data['Non-married households with 1+ kids \
at or above poverty level']))

# Creating columns that show the difference in poverty rates between
# married and non-married households:

df_marriage_poverty_acs5_data['Non-Married/Married Household \
Poverty Rate Difference'] = (
    df_marriage_poverty_acs5_data[
    '% of Non-Married Households Below Poverty Level'] 
    - df_marriage_poverty_acs5_data[
    '% of Married Households Below Poverty Level'])

df_marriage_poverty_acs5_data['Non-Married Household With 1+ Kids/\
Married Household With 1+ Kids Poverty Rate Difference'] = (
    df_marriage_poverty_acs5_data[
    '% of Non-Married Households With 1+ Kids Below Poverty Level'] 
    - df_marriage_poverty_acs5_data[
    '% of Married Households With 1+ Kids Below Poverty Level'])

# Creating similar columns that show the *ratio* between these two rates:
# (This approach can better adjust for differences in overall poverty 
# rates across counties, but it does have one shortcoming that we'll 
# discuss later in this cell.)

df_marriage_poverty_acs5_data['Non-Married/Married Household \
Poverty Rate Ratio'] = (
    df_marriage_poverty_acs5_data[
    '% of Non-Married Households Below Poverty Level'] 
    / df_marriage_poverty_acs5_data[
    '% of Married Households Below Poverty Level'])

df_marriage_poverty_acs5_data['Non-Married Household With 1+ Kids/\
Married Household With 1+ Kids Poverty Rate Ratio'] = (
    df_marriage_poverty_acs5_data[
    '% of Non-Married Households With 1+ Kids Below Poverty Level'] 
    / df_marriage_poverty_acs5_data['% of Married Households \
With 1+ Kids Below Poverty Level'])

# Dividing a nonzero value by 0 within a DataFrame operation produces
# inf values (and 0 / 0 produces NaN), which may cause issues during
# subsequent calculations. Replacing the inf values created by the
# previous operations with NaNs:

df_marriage_poverty_acs5_data.replace(np.inf, np.nan, inplace = True)

df_marriage_poverty_acs5_data.head()

Unnamed: 0,state,county,NAME,Sex by Age_Estimate!!Total: (B01001_001E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total: (B17010_001E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family: (B17010_003E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family:!!With related children of the householder under 18 years: (B17010_004E),"Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Male householder, no spouse present:!!With related children of the householder under 18 years: (B17010_011E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Female householder, no spouse present: (B17010_016E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Female householder, no spouse present:!!With related children of the householder under 18 years: (B17010_017E)",Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family: (B17010_023E),Poverty Status in 
the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family:!!With related children of the householder under 18 years: (B17010_024E),"Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Male householder, no spouse present:!!With related children of the householder under 18 years: (B17010_031E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Female householder, no spouse present: (B17010_036E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Female householder, no spouse present:!!With related children of the householder under 18 years: (B17010_037E)",Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total: (B11003_001E),Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11003_002E),Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With own children of the householder under 18 years: (B11003_003E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total: (B11004_001E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11004_002E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With related children of the 
householder under 18 years: (B11004_003E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level: (B17017_002E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Married-couple family: (B17017_004E),"Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Other family:!!Female householder, no spouse present: (B17017_015E)",Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Other family: (B17017_009E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Nonfamily households: (B17017_020E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level: (B17017_031E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Married-couple family: (B17017_033E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Other family: (B17017_038E),"Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Other family:!!Female householder, no spouse present: (B17017_044E)",Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Nonfamily 
households: (B17017_049E),Non-married-couple households below poverty level,Non-married-couple households at or above poverty level,Non-married households with 1+ kids below poverty level,Non-married households with 1+ kids at or above poverty level,% of Married Households Below Poverty Level,% of Non-Married Households Below Poverty Level,% of Married Households With 1+ Kids Below Poverty Level,% of Non-Married Households With 1+ Kids Below Poverty Level,Non-Married/Married Household Poverty Rate Difference,Non-Married Household With 1+ Kids/Married Household With 1+ Kids Poverty Rate Difference,Non-Married/Married Household Poverty Rate Ratio,Non-Married Household With 1+ Kids/Married Household With 1+ Kids Poverty Rate Ratio
1,1,1,"Autauga County, Alabama",58761,15363,707,321,50,509,393,11182,4931,404,2219,1088,15363,11889,5027,15363,11889,5252,2396,707,509,571,1118,19912,11182,2903,2219,5827,1689,8730,443,1492,5.946673,16.210769,6.111957,22.894057,10.264095,16.782099,2.726023,3.745782
2,1,3,"Baldwin County, Alabama",233420,61277,1840,939,519,1888,1521,49179,17444,1393,5537,2962,61277,51019,16575,61277,51019,18383,10315,1840,1888,2473,6002,80487,49179,7785,5537,23523,8475,31308,2040,4355,3.6065,21.303069,5.10798,31.899922,17.69657,26.791942,5.906855,6.245115
3,1,5,"Barbour County, Alabama",24877,5722,236,115,31,875,721,3195,1126,186,938,521,5722,3431,1089,5722,3431,1241,2169,236,875,957,976,6847,3195,1334,938,2318,1933,3652,752,707,6.878461,34.610564,9.26672,51.542152,27.732103,42.275432,5.031731,5.562071
4,1,7,"Bibb County, Alabama",22251,4871,226,154,69,449,426,3391,1136,213,305,225,4871,3617,1069,4871,3617,1290,1569,226,449,568,775,5647,3391,686,305,1570,1343,2256,495,438,6.248272,37.315921,11.937984,53.054662,31.067649,41.116678,5.972199,4.444189
5,1,9,"Blount County, Alabama",59077,15416,895,540,78,506,385,11171,4471,445,1853,1132,15416,12066,4415,15416,12066,5011,3469,895,506,674,1900,18157,11171,2676,1853,4310,2574,6986,463,1577,7.417537,26.924686,10.776292,22.696078,19.507149,11.919786,3.629869,2.106112
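
The following sketch illustrates the division-by-zero behavior that the ratio columns need to account for. The rates are made-up values rather than actual county data: the second hypothetical county has a married poverty rate of 0, and the third has a rate of 0 for both groups.

```python
import numpy as np
import pandas as pd

# Hypothetical poverty rates for three counties:
non_married_rate = pd.Series([20.0, 15.0, 0.0])
married_rate = pd.Series([5.0, 0.0, 0.0])

# Pandas does not raise an error when dividing by 0. Instead, a nonzero
# value divided by 0 becomes inf, and 0 divided by 0 becomes NaN.
ratio = non_married_rate / married_rate
print(ratio.tolist())

# Replacing inf with NaN so that summary statistics skip these rows
# instead of returning inf themselves:
ratio_cleaned = ratio.replace(np.inf, np.nan)
print(ratio_cleaned.mean())
```

Because rows with NaN values are skipped by default, counties whose married poverty rate is 0 will most likely not contribute to any statistics calculated for the ratio columns.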


## Reviewing our marriage and poverty data

We can use `df.describe()` to take a quick look at the poverty rate columns that we just calculated. The 50% row is particularly helpful, as it shows the median results in our dataset.

The 2022 copy of these results showed that the median county-level poverty rate for married-couple households was 4.76%, compared to 22.04% for non-married-couple households. Meanwhile, married-couple households with kids had a median poverty rate of 6.17%, compared to 31.77% for non-married-couple households with kids. Therefore, although we can't determine the *direction* of causation within the marriage/poverty relationship using this data alone, it's evident that married households tend to have lower poverty rates than do non-married households.

We'll create choropleth maps that illustrate some of these data points within the Mapping section of Python for Nonprofits.

In [20]:
df_marriage_poverty_acs5_data[['% of Married Households Below \
Poverty Level',
       '% of Non-Married Households Below Poverty Level',
       '% of Married Households With 1+ Kids Below Poverty Level',
       '% of Non-Married Households With 1+ Kids Below Poverty Level',
       'Non-Married/Married Household Poverty Rate Difference',
       'Non-Married Household With 1+ Kids/Married Household \
With 1+ Kids Poverty Rate Difference',
       'Non-Married/Married Household Poverty Rate Ratio',
       'Non-Married Household With 1+ Kids/Married Household \
With 1+ Kids Poverty Rate Ratio']].describe()

Unnamed: 0,% of Married Households Below Poverty Level,% of Non-Married Households Below Poverty Level,% of Married Households With 1+ Kids Below Poverty Level,% of Non-Married Households With 1+ Kids Below Poverty Level,Non-Married/Married Household Poverty Rate Difference,Non-Married Household With 1+ Kids/Married Household With 1+ Kids Poverty Rate Difference,Non-Married/Married Household Poverty Rate Ratio,Non-Married Household With 1+ Kids/Married Household With 1+ Kids Poverty Rate Ratio
count,3222.0,3222.0,3221.0,3218.0,3222.0,3218.0,3205.0,3125.0
mean,5.98573,23.797786,7.926897,33.4297,17.812056,25.495413,5.394284,6.959111
std,5.168185,9.3747,7.177521,14.799942,6.576608,13.033753,5.982786,9.73897
min,0.0,0.0,0.0,0.0,-12.225705,-52.0,0.527273,0.0
25%,3.203336,17.453112,3.649704,23.834888,13.324396,17.867021,3.436623,3.299141
50%,4.764387,22.0414,6.170981,31.771598,16.947933,24.481687,4.631352,4.805861
75%,6.908005,28.248617,9.789157,41.17132,21.537258,32.149544,6.12061,7.269667
max,52.100457,72.039474,67.663551,100.0,49.85537,100.0,202.401575,158.195489
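
As a small illustration of how to read this output, the sketch below runs `describe()` on a made-up column (the values are hypothetical, not Census data). It shows that the 50% row matches the column's median and that the count row excludes NaN entries, which is one likely reason why the ratio columns above have lower counts than the rate columns.

```python
import numpy as np
import pandas as pd

# Hypothetical rates; the NaN stands in for a county whose ratio
# could not be calculated.
df_rates = pd.DataFrame({'rate': [2.0, 4.0, 6.0, 8.0, np.nan]})

summary = df_rates.describe()
print(summary)

# The 50% row of describe() is the median of the non-missing values:
median_from_describe = summary.loc['50%', 'rate']
median_direct = df_rates['rate'].median()
print(median_from_describe == median_direct)
```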


Saving this dataset as a .csv file:

In [21]:
df_marriage_poverty_acs5_data.to_csv(
    f'Datasets/marriage_poverty_acs5_data_{latest_acs5_year}.csv', 
    index = False)
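
For reference, here is a minimal sketch of the same save step followed by a reload. It assumes a throwaway folder and a hypothetical file name (`example_data.csv`) rather than the notebook's actual Datasets folder. Passing `index=False` keeps the DataFrame's index out of the file, so no extra unnamed column appears when the data is read back in.

```python
import tempfile
from pathlib import Path

import pandas as pd

df_example = pd.DataFrame({
    'NAME': ['Autauga County, Alabama'],
    'state': [1],
    'county': [1]})

# A temporary folder stands in for the notebook's Datasets folder:
with tempfile.TemporaryDirectory() as temp_dir:
    datasets_path = Path(temp_dir) / 'Datasets'
    # to_csv() will not create missing folders, so we create it first:
    datasets_path.mkdir(parents=True, exist_ok=True)
    csv_path = datasets_path / 'example_data.csv'
    df_example.to_csv(csv_path, index=False)
    df_reloaded = pd.read_csv(csv_path)

print(df_reloaded.columns.tolist())
```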

# Appendix

## 1: The Census Python Library

It's worth noting that there is also a 'census' Python library (available via PyPI and conda) that helps simplify the process of requesting API data. You may choose to use it for your own Census research, but I ended up not needing it for the data retrieval tasks shown above. In addition, forgoing the library allowed me to demonstrate how to retrieve data directly from an API, which you may find helpful when working with APIs that don't have a corresponding Python library.

Here's an example of the Census library in use:

In [22]:
# Example of reading data from the Census library into a 
# Pandas DataFrame:
from census import Census
c = Census(key)
pd.DataFrame(c.acs5.get(('NAME', 'B01001_001E'),
{'for': 'county:*'}))

Unnamed: 0,NAME,B01001_001E,state,county
0,"Autauga County, Alabama",58761.0,01,001
1,"Baldwin County, Alabama",233420.0,01,003
2,"Barbour County, Alabama",24877.0,01,005
3,"Bibb County, Alabama",22251.0,01,007
4,"Blount County, Alabama",59077.0,01,009
...,...,...,...,...
3217,"Vega Baja Municipio, Puerto Rico",54182.0,72,145
3218,"Vieques Municipio, Puerto Rico",8199.0,72,147
3219,"Villalba Municipio, Puerto Rico",21984.0,72,149
3220,"Yabucoa Municipio, Puerto Rico",30313.0,72,151


## 2: The requests library

We can also use Python's *requests* library to retrieve data from the Census API, then parse the JSON-formatted response into Python lists:

In [23]:
# The following code borrows from the requests library documentation at 
# https://docs.python-requests.org/en/latest/index.html
import requests
r = requests.get(f'https://api.census.gov/data/{latest_acs5_year}/\
acs/acs5?get=NAME,B01001_001E&for=county:*&key={key}')
# Printing the first 300 characters of this output:
print("r.text:\n",r.text[0:300],'\n')
# Printing the first 5 rows returned by r.json():
print("r.json:\n",r.json()[0:5],'\n')

r.text:
 [["NAME","B01001_001E","state","county"],
["Autauga County, Alabama","58761","01","001"],
["Baldwin County, Alabama","233420","01","003"],
["Barbour County, Alabama","24877","01","005"],
["Bibb County, Alabama","22251","01","007"],
["Blount County, Alabama","59077","01","009"],
["Bullock County, Ala 

r.json:
 [['NAME', 'B01001_001E', 'state', 'county'], ['Autauga County, Alabama', '58761', '01', '001'], ['Baldwin County, Alabama', '233420', '01', '003'], ['Barbour County, Alabama', '24877', '01', '005'], ['Bibb County, Alabama', '22251', '01', '007']] 



Parsing our response via `r.json()` produces a list of lists that can easily be read into a Pandas DataFrame, as shown below:

In [24]:
pd.DataFrame(r.json()).head()
# Note that pd.DataFrame(r.text) would produce the following error:
# "ValueError: DataFrame constructor not properly called!"

Unnamed: 0,0,1,2,3
0,NAME,B01001_001E,state,county
1,"Autauga County, Alabama",58761,01,001
2,"Baldwin County, Alabama",233420,01,003
3,"Barbour County, Alabama",24877,01,005
4,"Bibb County, Alabama",22251,01,007
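
In the output above, the field names ended up in row 0 rather than in the DataFrame's header. One possible way to handle this, sketched below with the first few rows of the `r.json()` output hard-coded as `census_rows` (a hypothetical variable name), is to pass the first row as the column names and the remaining rows as the data. The API returns every value as a string, so the sketch also converts the population column to integers.

```python
import pandas as pd

# The first few rows of the r.json() output shown earlier:
census_rows = [
    ['NAME', 'B01001_001E', 'state', 'county'],
    ['Autauga County, Alabama', '58761', '01', '001'],
    ['Baldwin County, Alabama', '233420', '01', '003'],
    ['Barbour County, Alabama', '24877', '01', '005']]

# Using the first row as the header and the rest as the data:
df_counties = pd.DataFrame(census_rows[1:], columns=census_rows[0])

# Converting the population column from strings to integers. (The state
# and county codes are left as strings to preserve their leading zeros.)
df_counties['B01001_001E'] = df_counties['B01001_001E'].astype(int)

print(df_counties)
```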


I included this approach in the appendix because you may find the requests library useful for other online data retrieval tasks. However, our use of `pd.read_json()` to import Census data rendered an explicit call to the requests library unnecessary.

In [25]:
program_end_time = time.time()
run_time = round(program_end_time - program_start_time, 3)
print(f"Finished running script in {run_time} seconds.")

Finished running script in 16.623 seconds.
