# Marriage & Poverty Census Data Analyses (Work in Progress)

By Kenneth Burchfiel

Released under the MIT License

This script will import Census data that will help answer two questions: 
1. Are married couples less likely to live below the poverty line than non-married couples?
2. Do married couples with kids have a lower poverty rate than do non-married households with kids?

*Note: (Although Census data can help us determine the relative poverty rates between married and non-married households, it can't answer the more important question: does getting married actually *reduce* poverty? After all, it's possible that the causal path between marriage and poverty may run in reverse: individuals who are better off financially might be more likely to marry. In other words, poverty might reduce the marriage rate, not the other way around. The College of Social Work is aware of the danger of conflating correlation and causation, but they believe that this demographic data will still be worth exploring.)*

Note: much of this code used to reside within [census_data_imports_v2.ipynb](https://github.com/kburchfiel/pfn/blob/main/Census_Data_Imports/census_data_imports_v2.ipynb) within my Python for Nonprofits GitHub project. In order to simplify that project, I decided to move that file (along with its corresponding mapping code) into a separate GitHub project.

Importing relevant libraries and setting two configuration variables:

In [1]:
import time
program_start_time = time.time()
import pandas as pd
import numpy as np
from iteration_utilities import duplicates
pd.set_option('display.max_columns', 1000)
import lxml # Necessary for reading online HTML tables into Pandas

render_for_pdf = False
if render_for_pdf == True:
    pd.set_option('display.max_columns', 4)


acs5_year = 2022 # By updating this variable when future American 
# Community Surveys get released, you should be able to retrieve the most
# recent data possible. (If changes to the survey's format are made,
# however, updates to the scripts may be necessary.)

# Note: I had originally set acs5_year to 2022, the latest year for which
# ACS5 data were available at the time. However, due to a recent change
# in Connecticut's county-equivalent boundaries (see
# https://www.federalregister.gov/documents/2022/06/06/2022-12063/
# change-to-county-equivalents-in-the-state-of-connecticut for more
# information), ACS5 population growth data between previous
# years and 2022 appeared to be unavailable for that state. Therefore,
# I chose to retrieve data for 2021 instead. 

download_variable_list = False # If set to True, a new list of variables
# will be downloaded from the Census API website. If False, this list of 
# variables will instead be read in from a local .csv copy (thus saving
# processing time).

## Importing a Census API Key

You can obtain a free Census API key [at this website](https://api.census.gov/data/key_signup.html). The following cell imports my own personal key, so you'll need to replace this code with one that loads in your own API key.

In [2]:
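with open('census_api_key_path.txt') as file:
    key_path = file.read().strip() # .strip() removes any trailing 
    # newline or whitespace that would otherwise break the path
with open(key_path) as file:
    key = file.read().strip()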

In [3]:
if download_variable_list == True:
    df_variables_page = pd.read_html(
        f'https://api.census.gov/data/{acs5_year}/acs/acs5/variables.html')[0] 
    # [0] selects the first HTML table found on this page.
    # See https://pandas.pydata.org/pandas-docs/stable/reference/api/
    # pandas.read_html.html
    # for more information on pd.read_html().
        
    # Some rows in this table contain items other than demographic 
    # variables (e.g. region names). We can exclude them by selecting 
    # only rows whose labels begin with 'Estimate'. (Another option would 
    # have been to filter out rows with N/A 'Group' entries, i.e. 
    # df_variables_page.query("Group.isna() == False"), 
    # but this would have left a couple of non-variable rows in place.)
    
    df_variables = df_variables_page[
    df_variables_page['Label'].str[0:8] == 'Estimate'].copy(
    ).reset_index(drop=True)
    # Saving this table to a local .csv file:
    df_variables.to_csv(f'Datasets/{acs5_year}_variables.csv', 
    index = False)
else: # Reading a local copy of this dataset instead, which should 
    # take much less time. 
    df_variables = pd.read_csv(
        f'Datasets/{acs5_year}_variables.csv')
df_variables.head()

Unnamed: 0,Name,Label,Concept,Required,Attributes,Limit,Predicate Type,Group,Unnamed: 8
0,B01001_001E,Estimate!!Total:,Sex by Age,not required,"B01001_001EA, B01001_001M, B01001_001MA",0,int,B01001,
1,B01001_002E,Estimate!!Total:!!Male:,Sex by Age,not required,"B01001_002EA, B01001_002M, B01001_002MA",0,int,B01001,
2,B01001_003E,Estimate!!Total:!!Male:!!Under 5 years,Sex by Age,not required,"B01001_003EA, B01001_003M, B01001_003MA",0,int,B01001,
3,B01001_004E,Estimate!!Total:!!Male:!!5 to 9 years,Sex by Age,not required,"B01001_004EA, B01001_004M, B01001_004MA",0,int,B01001,
4,B01001_005E,Estimate!!Total:!!Male:!!10 to 14 years,Sex by Age,not required,"B01001_005EA, B01001_005M, B01001_005MA",0,int,B01001,
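
The 'Estimate' filter used in the cell above can be tried out on a small stand-in table. The following is only a sketch: `df_page_example` and its first row are made-up placeholders rather than actual contents of the Census variables page, and `str.startswith()` is used here as an equivalent alternative to the `.str[0:8]` slice.

```python
import pandas as pd

# Hypothetical stand-in for the table returned by pd.read_html():
df_page_example = pd.DataFrame({
    'Name': ['for', 'B01001_001E', 'B01001_002E'],
    'Label': ['Placeholder non-variable row', 'Estimate!!Total:',
              'Estimate!!Total:!!Male:'],
    'Concept': [None, 'Sex by Age', 'Sex by Age']})

# Keeping only rows whose labels begin with 'Estimate'
# (na=False treats missing labels as non-matches):
df_variables_example = df_page_example[
    df_page_example['Label'].str.startswith(
        'Estimate', na=False)].copy().reset_index(drop=True)

print(df_variables_example['Name'].tolist())
```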


In [4]:
def create_variable_aliases(df_variables, variable_list):
    '''This function creates a dictionary whose keys are 
    the original 'Name' values (e.g. 'B01001_001E') within a variable
    list on the Census API website and whose values are the replacement 
    names (e.g. 'Sex by Age_Estimate!!Total: (B01001_001E)').
    This resulting dictionary can then be passed to a df.rename() call
    within retrieve_census_data() in order to make the output of that
    function easier to interpret.
    
    df_variables: A DataFrame containing a list of Census variables. For
    an example of this list for the 2021 American Community Survey (5-Year 
    Estimates), visit: 
    https://api.census.gov/data/2021/acs/acs5/variables.html .
    
    variable_list: The list of variables to rename 
    (e.g. ['B01001_001E', 'B01001_002E']).
    '''
    # Creating a DataFrame that contains the information needed for the
    # updated column names:
    df_aliases = df_variables.query(
        "Name in @variable_list")[['Name', 'Label', 'Concept']].copy()
    # Creating a new 'Description' column that will replace the original
    # output field names:
    df_aliases['Description'] = (df_aliases['Concept'] 
                                 + '_' + df_aliases['Label'] 
                                 + ' (' + df_aliases['Name'] + ')')
    # Creating a dictionary whose keys are the original field names and 
    # whose values are the new 'Description' entries that were 
    # just created:
    alias_dict = df_aliases.set_index('Name').to_dict()['Description']
    # See https://pandas.pydata.org/pandas-docs/stable/reference/api/
    # pandas.DataFrame.to_dict.html
    return alias_dict
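
To see what kind of dictionary this function produces, its steps can be reproduced on a tiny hand-built variable table. This is a minimal sketch rather than a replacement for the function: the `_example` names are hypothetical, and `isin()` stands in for the `query()` call so that the snippet does not depend on how local variables are resolved.

```python
import pandas as pd

# A three-row stand-in for df_variables (values taken from the 
# df_variables.head() output shown earlier):
df_variables_example = pd.DataFrame({
    'Name': ['B01001_001E', 'B01001_002E', 'B01001_003E'],
    'Label': ['Estimate!!Total:', 'Estimate!!Total:!!Male:',
              'Estimate!!Total:!!Male:!!Under 5 years'],
    'Concept': ['Sex by Age', 'Sex by Age', 'Sex by Age']})
variable_list_example = ['B01001_001E', 'B01001_002E']

# Keeping only the requested variables:
df_aliases_example = df_variables_example[
    df_variables_example['Name'].isin(variable_list_example)][
    ['Name', 'Label', 'Concept']].copy()

# Building the 'Concept_Label (Name)' description for each variable:
df_aliases_example['Description'] = (
    df_aliases_example['Concept'] + '_' + df_aliases_example['Label']
    + ' (' + df_aliases_example['Name'] + ')')

alias_dict_example = df_aliases_example.set_index(
    'Name')['Description'].to_dict()
print(alias_dict_example)
# {'B01001_001E': 'Sex by Age_Estimate!!Total: (B01001_001E)',
#  'B01001_002E': 'Sex by Age_Estimate!!Total:!!Male: (B01001_002E)'}
```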

Creating our aliases:

## Defining a Census data retrieval function

The following function simplifies the process of retrieving data from the Census API. It also enables the user to rename variable fields (e.g. 'B01001_001E') with aliases for those fields (e.g. 'Sex by Age_Estimate!!Total: (B01001_001E)'), but this option is disabled by default. In addition, it allows more than 50 variables to be retrieved at the same time, thus making it easier to retrieve especially large datasets.

[Note: currently, this function only supports data retrieval for the ACS 5-year and 1-year estimates. However, I may add in the ability to retrieve decennial Census data in the future.]
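
The way the function works around the 50-variable limit can be sketched on its own, without calling the API. In the snippet below, `build_census_urls` is a hypothetical helper (it is not part of this notebook), the `VAR_...` names are placeholders rather than real Census variables, and the URL pattern simply mirrors the one used inside retrieve_census_data().

```python
def build_census_urls(year, survey_string, region, key, variable_list,
                      chunk_size=49):
    '''Hypothetical helper: returns one Census API URL for each sublist
    of up to chunk_size variables. (49 is used because 'NAME' is also
    requested in every call.)'''
    urls = []
    for start in range(0, len(variable_list), chunk_size):
        variable_sublist = variable_list[start:start + chunk_size]
        variable_string = ','.join(variable_sublist)
        urls.append(
            f'https://api.census.gov/data/{year}/{survey_string}'
            f'?get=NAME,{variable_string}&for={region}:*&key={key}')
    return urls

# 60 placeholder variable names (not real Census variables):
example_variables = [f'VAR_{n:03d}E' for n in range(1, 61)]
example_urls = build_census_urls(
    year=2022, survey_string='acs/acs5', region='county',
    key='YOUR_KEY_GOES_HERE', variable_list=example_variables)

print(len(example_urls)) # 2 (one URL for 49 variables and one for 11)
```

Each URL would then be read into its own DataFrame, and the resulting tables would be merged on their shared geographic identifier columns.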

In [5]:
def retrieve_census_data(survey, year, region, key, variable_list,
                         rename_data_fields = False, 
                         field_names_dict = {}):
    '''This function (which I plan to expand) retrieves data from the US
    Census API. It accommodates more than 50 variables.
    
    survey: the survey from which to retrieve data. The only arguments
    currently supported are 'acs5' and 'acs1' (for the American Community 
    Survey 5-Year and 1-Year estimates, respectively).
    
    year: the year for which you wish to retrieve survey data. Note that,
    when survey is set to 'acs5', the survey results will include data
    for the 5 years leading up to (and including) the 'year' argument.
    (For example, if you set 'year' to 2021, you'll retrieve ACS5 data
    from 2017 to 2021 (inclusive).)
    
    
    region: The geographic level at which you wish to retrieve data. 
    Examples include 'us', 'state', 'county', 'zip', 'msa' 
    (for metropolitan/micropolitan statistical area data), and 'csa' 
    (for combined statistical area data); 
    however, other regions are supported as well. Consult your survey's 
    API examples page for other options. (For instance, if you wanted to 
    retrieve data by urban area within the 2021 ACS5, you could go to 
    https://api.census.gov/data/2021/acs/acs5/examples.html, then search
    for 'urban area.' The Urban Area URL ends with
    '&for=urban%20area:*&key=YOUR_KEY_GOES_HERE'. Therefore, you'd want to
    use 'urban%20area' as your 'region' argument.)   

    (Note: 'zip' will retrieve results by Zip Code
    Tabulation Area, which are similar to (but not identical to)
    zip codes. See 
    https://en.wikipedia.org/wiki/ZIP_Code_Tabulation_Area
    for more information.)
    
    variable_list: The list of variables for which to retrieve data.

    key: your personal Census API key.

    rename_data_fields: set to True to replace column names in your 
    dataset with new entries of your choice.

    field_names_dict: A dictionary that stores the original variable names
    retrieved by the Census (e.g. 'B01001_001E') as keys and your desired
    replacements as values. Example: 
    {'B01001_001E': 'Sex by Age_Estimate!!Total: (B01001_001E)',
     'B01001_002E': 'Sex by Age_Estimate!!Total:!!Male: (B01001_002E)'}
     
    '''

    # Using the iteration_utilities library to check for duplicate
    # values within variable_list (which could cause issues later on):
    # The following code is based on
    # https://iteration-utilities.readthedocs.io/en/latest/generated/
    # duplicates.html
    duplicate_variables = list(duplicates(variable_list))
    
    if len(duplicate_variables) > 0:
        raise ValueError(f"The following variables appear more than once \
in your variable list: {duplicate_variables}")
    
    if survey == 'acs5':
        survey_string = 'acs/acs5'

    elif survey == 'acs1':
        survey_string = 'acs/acs1'
    
    else:
        raise ValueError(
            "This survey type is not currently supported by the function.")

    
    # Converting simplified region names into strings that the API 
    # function will recognize:
    if region == 'zip':
        region = 'zip%20code%20tabulation%20area' # Based on
        # the ZCTA example within
        # https://api.census.gov/data/2021/acs/acs5/examples.html
    
    if region == 'csa':
        region = 'combined%20statistical%20area'
    
    if region == 'msa':
        region = 'metropolitan%20statistical\
%20area/micropolitan%20statistical%20area'

    
    # Only 50 variables can be retrieved from the Census API at a time 
    # using the approach shown in this function. The following code 
    # accommodates this limitation by splitting variable_list into 
    # sublists of up to 49 variables. The data retrieved for the variables 
    # in these sublists will then get merged back together.
    # (49 variables are retrieved at a time instead of 50 because it 
    # appears that the initial 'NAME' variable also counts towards 
    # the 50-variable limit.)
    
    i = 0
       
    while i < len(variable_list): # i.e. while there
        # are still more variables to iterate through
        variable_sublist = variable_list[i:i+49] # This line reads the 
        # next 49 variables from variable_list into a sublist that can 
        # then be passed to the API.
        # print("variable_sublist:", variable_sublist)
        # Converting the list of variables into a string that can be 
        # passed to the API call:
        # (The Census API guide at
        # https://www.census.gov/content/dam/Census/data/developers/
        # api-user-guide/api-guide.pdf
        # demonstrates how to call multiple census variables at once.)
        variable_string = ','.join(variable_sublist)
        # print("variable_string:",variable_string)
    
        # Retrieving data via the Census API:
        # This line was originally based on an example found in
        # https://api.census.gov/data/2022/acs/acs5/examples.html .
    
        # read_json documentation:
        # https://pandas.pydata.org/pandas-docs/stable/reference/api/
        # pandas.read_json.html

        api_url = f'https://api.census.gov/data/{year}/\
{survey_string}?get=NAME,{variable_string}&for={region}:*&key={key}'
        # print(api_url)
        
        df_results = pd.read_json(api_url)
    
        # At this point, the DataFrame's columns are a list of integers; 
        # the desired column names are stored within the first row. 
        # The following code resolves this issue by setting these row 
        # values as the column values and then deleting this row.
    
        df_results.columns = df_results.iloc[0]
        df_results.drop(0, inplace = True)


        # Determining which merge keys to use when combining API results
        # for different sublists together:
        # This is made more complicated by the fact that results for 
        # different regions will have different identifier
        # columns (e.g. 'NAME', 'county', and 'state' for county data but 
        # only 'NAME' and 'state' for state data). However, we can 
        # accommodate this behavior by simply initializing our list of 
        # merge keys as the set of all columns that are *not* also 
        # variable columns.
        if i == 0: # This step only needs to be performed for our first
            # sublist of variables, since merge keys for other sublists
            # will be identical.
            merge_keys = list(set(df_results.columns) 
              - set(variable_sublist))
            # print("merge_keys:",merge_keys)

        if i == 0: # Since this is the first set 
            # of results, we can initialize df_combined_results 
            # as a copy of df_results.
            df_combined_results = df_results.copy()
        else: # Merging our latest set of results into 
            # df_combined_results:
            df_combined_results = df_combined_results.merge(
                df_results, on = merge_keys,
                how = 'outer').copy()
            # Added .copy() here in response to a data fragmentation 
            # warning

        i += 49 
        # Allows the function to iterate through the next 49 variables
        # within variable_list

        
    # Converting variable columns to numeric data types:
    for column in variable_list:
        # print(f"Now converting {column} to a numeric type.")
        df_combined_results[column] = pd.to_numeric(
            df_combined_results[column])
        # pd.to_numeric() allows for either integer or float outputs
        # depending on the nature of the original data.
        # See https://pandas.pydata.org/pandas-docs/stable/reference/api/
        # pandas.to_numeric.html

    # Replacing column names with aliases if requested:
    if rename_data_fields == True:
        df_combined_results.rename(
            columns = field_names_dict, inplace = True)

    # The following for loop moves all of the merge keys (e.g. geographic
    # identifiers) to the left side of the table. This is particularly
    # useful when retrieving longer lists of variables, as otherwise,
    # certain keys can get buried in the middle of the dataset
    for i in range(len(merge_keys)):
        df_combined_results.insert(
            i, merge_keys[i], 
            df_combined_results.pop(merge_keys[i]))

    # Adding a 'Year' column to the left of all existing DataFrame columns:
    # (this will prove particularly
    # helpful when comparing data from different years.)
    df_combined_results.insert(0, 'Year', year)
    
    return df_combined_results
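
The header-promotion, merge, and numeric-conversion steps inside this function can be illustrated without an API call. The sketch below uses two small hand-built tables in place of the API responses: `VAR_A` and `VAR_B` are placeholder variable names, `promote_first_row` is a hypothetical helper, and the merge keys are built with a list comprehension (rather than a set difference) purely to keep their order predictable.

```python
import pandas as pd

# Stand-ins for two raw API responses: integer column labels, with the
# real column names stored in the first row and all values as strings.
raw_first = pd.DataFrame([
    ['NAME', 'VAR_A', 'state'],
    ['Alabama', '10', '01'],
    ['Alaska', '20', '02']])
raw_second = pd.DataFrame([
    ['NAME', 'VAR_B', 'state'],
    ['Alabama', '1.5', '01'],
    ['Alaska', '2.5', '02']])

def promote_first_row(df_raw):
    '''Hypothetical helper: uses the first row as the column names and
    then drops that row.'''
    df = df_raw.copy()
    df.columns = df.iloc[0]
    return df.drop(0)

df_first = promote_first_row(raw_first)
df_second = promote_first_row(raw_second)

# Every column that is not a requested variable acts as a merge key:
merge_keys = [column for column in df_first.columns 
              if column not in ['VAR_A']]

df_combined = df_first.merge(df_second, on=merge_keys, how='outer')

# Converting the variable columns from strings to numbers:
for column in ['VAR_A', 'VAR_B']:
    df_combined[column] = pd.to_numeric(df_combined[column])

print(df_combined)
```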

(The following code allowed me to test out retrieve_census_data for a particularly long variable list.)

Next, we'll define a list of years for which we would like to retrieve Census data. In order to make this code easier to use in future years, I'll define these years as an offset of acs5_year rather than hardcoding them.
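
As a rough illustration of the offset idea, the years could be derived as follows. The specific offsets are an assumption made for this sketch, not values taken from the original notebook; offsets that are multiples of 5 keep the 5-year ACS windows from overlapping.

```python
acs5_year = 2022

# Hypothetical offsets (in years) relative to acs5_year:
year_offsets = [-10, -5, 0]

acs5_years = [acs5_year + offset for offset in year_offsets]
print(acs5_years) # [2012, 2017, 2022]
```

Updating acs5_year would then shift every year in the list without any other changes to the code.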

## Retrieving data on marriage and poverty


In [6]:
marriage_poverty_variable_list = [
    'B01001_001E', 'B17010_001E', 'B17010_003E', 'B17010_004E',
    'B17010_011E', 'B17010_016E', 'B17010_017E', 'B17010_023E',
    'B17010_024E', 'B17010_031E', 'B17010_036E', 'B17010_037E',
    'B11003_001E', 'B11003_002E', 'B11003_003E', 'B11004_001E',
    'B11004_002E', 'B11004_003E', 'B17017_002E', 'B17017_004E',
    'B17017_015E', 'B17017_009E', 'B17017_020E', 'B17017_031E',
    'B17017_033E', 'B17017_038E', 'B17017_044E', 'B17017_049E'
]

marriage_poverty_alias_dict = create_variable_aliases(
    df_variables = df_variables, 
    variable_list = marriage_poverty_variable_list)
# marriage_poverty_alias_dict

In [7]:
df_marriage_poverty_acs5_data = retrieve_census_data(
    survey = 'acs5', year = acs5_year, region = 'county',
    variable_list = marriage_poverty_variable_list, 
    rename_data_fields = True, 
    field_names_dict = marriage_poverty_alias_dict, key = key)

# Showing a shortened version of this DataFrame if render_for_pdf
# is set to True so as to prevent its text from getting cut off:

In [8]:
if render_for_pdf == True:
    pd.set_option('display.max_columns', 3)


df_marriage_poverty_acs5_data.head()

Unnamed: 0,Year,county,NAME,state,Sex by Age_Estimate!!Total: (B01001_001E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total: (B17010_001E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family: (B17010_003E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family:!!With related children of the householder under 18 years: (B17010_004E),"Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Male householder, no spouse present:!!With related children of the householder under 18 years: (B17010_011E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Female householder, no spouse present: (B17010_016E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Female householder, no spouse present:!!With related children of the householder under 18 years: (B17010_017E)",Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family: (B17010_023E),Poverty 
Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family:!!With related children of the householder under 18 years: (B17010_024E),"Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Male householder, no spouse present:!!With related children of the householder under 18 years: (B17010_031E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Female householder, no spouse present: (B17010_036E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Female householder, no spouse present:!!With related children of the householder under 18 years: (B17010_037E)",Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total: (B11003_001E),Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11003_002E),Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With own children of the householder under 18 years: (B11003_003E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total: (B11004_001E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11004_002E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With related children 
of the householder under 18 years: (B11004_003E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level: (B17017_002E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Married-couple family: (B17017_004E),"Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Other family:!!Female householder, no spouse present: (B17017_015E)",Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Other family: (B17017_009E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Nonfamily households: (B17017_020E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level: (B17017_031E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Married-couple family: (B17017_033E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Other family: (B17017_038E),"Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Other family:!!Female householder, no spouse present: (B17017_044E)",Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty 
level:!!Nonfamily households: (B17017_049E)
1,2022,1,"Autauga County, Alabama",1,58761,15363,707,321,50,509,393,11182,4931,404,2219,1088,15363,11889,5027,15363,11889,5252,2396,707,509,571,1118,19912,11182,2903,2219,5827
2,2022,3,"Baldwin County, Alabama",1,233420,61277,1840,939,519,1888,1521,49179,17444,1393,5537,2962,61277,51019,16575,61277,51019,18383,10315,1840,1888,2473,6002,80487,49179,7785,5537,23523
3,2022,5,"Barbour County, Alabama",1,24877,5722,236,115,31,875,721,3195,1126,186,938,521,5722,3431,1089,5722,3431,1241,2169,236,875,957,976,6847,3195,1334,938,2318
4,2022,7,"Bibb County, Alabama",1,22251,4871,226,154,69,449,426,3391,1136,213,305,225,4871,3617,1069,4871,3617,1290,1569,226,449,568,775,5647,3391,686,305,1570
5,2022,9,"Blount County, Alabama",1,59077,15416,895,540,78,506,385,11171,4471,445,1853,1132,15416,12066,4415,15416,12066,5011,3469,895,506,674,1900,18157,11171,2676,1853,4310


In [9]:
# Allowing a larger number of columns to be displayed within
# subsequent DataFrame displays:
if render_for_pdf == True:
    pd.set_option('display.max_columns', 4)

## Performing additional calculations

The following cell uses fields within df_marriage_poverty_acs5_data to calculate poverty rates for:

1. Married-couple households
2. Non-married-couple households
3. Households with 1+ kids below 18 headed by a married couple
4. Households with 1+ kids below 18 *not* headed by a married couple

In addition, it will also calculate differences in poverty rates between:
1. Non-married and married couple households
2. Non-married households with 1+ kids below 18 and married households with 1+ kids below 18
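
As a quick check on the arithmetic, here is the household-level calculation worked through by hand for a single county. The four input values are copied from the Autauga County, Alabama row of the df_marriage_poverty_acs5_data.head() output shown earlier; the variable names are shorthand chosen for this sketch.

```python
# Values for Autauga County, Alabama (2022 ACS5):
below_total = 2396           # B17017_002E
below_married = 707          # B17017_004E
at_or_above_total = 19912    # B17017_031E
at_or_above_married = 11182  # B17017_033E

# Non-married-couple households = all households minus married-couple
# households:
below_non_married = below_total - below_married
at_or_above_non_married = at_or_above_total - at_or_above_married

married_rate = 100 * below_married / (
    below_married + at_or_above_married)
non_married_rate = 100 * below_non_married / (
    below_non_married + at_or_above_non_married)

rate_difference = non_married_rate - married_rate
rate_ratio = non_married_rate / married_rate

print(f'Married poverty rate: {married_rate:.2f}%')         # 5.95%
print(f'Non-married poverty rate: {non_married_rate:.2f}%') # 16.21%
print(f'Difference: {rate_difference:.2f}')                 # 10.26
print(f'Ratio: {rate_ratio:.2f}')                           # 2.73
```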

In [10]:
df_marriage_poverty_acs5_data['Non-married-couple households below \
poverty level'] = (df_marriage_poverty_acs5_data[
'Poverty Status in the Past 12 Months by Household Type by \
Age of Householder_Estimate!!Total:!!Income in the past 12 months \
below poverty level: (B17017_002E)'] 
- df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Family \
households:!!Married-couple family: (B17017_004E)'])

df_marriage_poverty_acs5_data['Non-married-couple households at or above \
poverty level'] = (df_marriage_poverty_acs5_data[
'Poverty Status in the Past 12 Months by Household Type \
by Age of Householder_Estimate!!Total:!!Income in the past 12 months \
at or above poverty level: (B17017_031E)'
] 
- df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income in the \
past 12 months at or above poverty level:!!\
Family households:!!Married-couple family: (B17017_033E)'])

df_marriage_poverty_acs5_data['Non-married households with 1+ kids \
below poverty level'] = (df_marriage_poverty_acs5_data['Poverty Status \
in the Past 12 Months of Families by Family Type by Presence \
of Related Children Under 18 Years by Age of Related Children_Estimate!!\
Total:!!Income in the past 12 months below poverty level:!!Other \
family:!!Male householder, no spouse present:!!With related \
children of the householder under 18 years: (B17010_011E)'] 
+ df_marriage_poverty_acs5_data[
'Poverty Status in the Past 12 Months of Families by Family Type \
by Presence of Related Children Under 18 Years by Age of \
Related Children_Estimate!!Total:!!Income in the past 12 months \
below poverty level:!!Other family:!!Female householder, \
no spouse present:!!With related children \
of the householder under 18 years: (B17010_017E)'])

df_marriage_poverty_acs5_data['Non-married households with 1+ kids \
at or above poverty level'] = (df_marriage_poverty_acs5_data['Poverty \
Status in the Past 12 Months of Families by Family Type by Presence \
of Related Children Under 18 Years by Age of Related Children_Estimate!!\
Total:!!Income in the past 12 months at or above poverty level:!!Other \
family:!!Male householder, no spouse present:!!With related \
children of the householder under 18 years: (B17010_031E)'] 
+ df_marriage_poverty_acs5_data[
'Poverty Status in the Past 12 Months of Families by Family Type \
by Presence of Related Children Under 18 Years by Age of \
Related Children_Estimate!!Total:!!Income in the past 12 months \
at or above poverty level:!!Other family:!!Female householder, \
no spouse present:!!With related children \
of the householder under 18 years: (B17010_037E)'])

df_marriage_poverty_acs5_data[
'% of Married Households Below Poverty Level'] = 100 * (
    df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Family \
households:!!Married-couple family: (B17017_004E)'] / 
    (df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income in the \
past 12 months at or above poverty level:!!\
Family households:!!Married-couple family: (B17017_033E)'] 
    + df_marriage_poverty_acs5_data['Poverty Status in the Past 12 \
Months by Household Type by Age of Householder_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Family \
households:!!Married-couple family: (B17017_004E)']))


df_marriage_poverty_acs5_data[
'% of Non-Married Households Below Poverty Level'] = 100 * (
    df_marriage_poverty_acs5_data['Non-married-couple households below \
poverty level'] / 
    (df_marriage_poverty_acs5_data['Non-married-couple households \
at or above poverty level'] 
+ df_marriage_poverty_acs5_data['Non-married-couple households below \
poverty level']))


df_marriage_poverty_acs5_data['% of Married Households With \
1+ Kids Below Poverty Level'] = 100* (
df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
of Families by Family Type by Presence of Related Children \
Under 18 Years by Age of Related Children_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Married-couple \
family:!!With related children of the householder \
under 18 years: (B17010_004E)'] / 
(df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
of Families by Family Type by Presence of Related Children Under \
18 Years by Age of Related Children_Estimate!!Total:!!Income in the \
past 12 months at or above poverty level:!!Married-couple \
family:!!With related children of the householder \
under 18 years: (B17010_024E)'] 
+ df_marriage_poverty_acs5_data['Poverty Status in the Past 12 Months \
of Families by Family Type by Presence of Related Children \
Under 18 Years by Age of Related Children_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Married-couple \
family:!!With related children of the householder \
under 18 years: (B17010_004E)']))

df_marriage_poverty_acs5_data[
'% of Non-Married Households With 1+ Kids Below Poverty Level'] = 100 * (
df_marriage_poverty_acs5_data[
'Non-married households with 1+ kids below poverty level'] / (
df_marriage_poverty_acs5_data['Non-married households with 1+ kids \
below poverty level'] +
df_marriage_poverty_acs5_data['Non-married households with 1+ kids \
at or above poverty level']))

# Creating columns that show the difference in poverty rates between
# married and non-married households:

df_marriage_poverty_acs5_data['Non-Married/Married Household \
Poverty Rate Difference'] = (
    df_marriage_poverty_acs5_data[
    '% of Non-Married Households Below Poverty Level'] 
    - df_marriage_poverty_acs5_data[
    '% of Married Households Below Poverty Level'])

df_marriage_poverty_acs5_data['Non-Married Household With 1+ Kids/\
Married Household With 1+ Kids Poverty Rate Difference'] = (
    df_marriage_poverty_acs5_data[
    '% of Non-Married Households With 1+ Kids Below Poverty Level'] 
    - df_marriage_poverty_acs5_data[
    '% of Married Households With 1+ Kids Below Poverty Level'])

# Creating similar columns that show the *ratio* between these two rates:
# (This approach can better adjust for differences in overall poverty 
# rates across counties, but it does have one shortcoming that we'll 
# discuss later in this cell.)

df_marriage_poverty_acs5_data['Non-Married/Married Household \
Poverty Rate Ratio'] = (
    df_marriage_poverty_acs5_data[
    '% of Non-Married Households Below Poverty Level'] 
    / df_marriage_poverty_acs5_data[
    '% of Married Households Below Poverty Level'])

df_marriage_poverty_acs5_data['Non-Married Household With 1+ Kids/\
Married Household With 1+ Kids Poverty Rate Ratio'] = (
    df_marriage_poverty_acs5_data[
    '% of Non-Married Households With 1+ Kids Below Poverty Level'] 
    / df_marriage_poverty_acs5_data['% of Married Households \
With 1+ Kids Below Poverty Level'])

# Dividing a nonzero value by 0 within a DataFrame operation produces
# inf rather than raising an error (0 / 0 produces NaN instead). These 
# inf values may cause issues during subsequent calculations. 
# Replacing inf values created by the previous operation with NaNs:

df_marriage_poverty_acs5_data.replace(np.inf, np.nan, inplace = True)

df_marriage_poverty_acs5_data.head()

Unnamed: 0,Year,county,NAME,state,Sex by Age_Estimate!!Total: (B01001_001E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total: (B17010_001E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family: (B17010_003E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family:!!With related children of the householder under 18 years: (B17010_004E),"Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Male householder, no spouse present:!!With related children of the householder under 18 years: (B17010_011E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Female householder, no spouse present: (B17010_016E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Female householder, no spouse present:!!With related children of the householder under 18 years: (B17010_017E)",Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family: (B17010_023E),Poverty 
Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family:!!With related children of the householder under 18 years: (B17010_024E),"Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Male householder, no spouse present:!!With related children of the householder under 18 years: (B17010_031E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Female householder, no spouse present: (B17010_036E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Female householder, no spouse present:!!With related children of the householder under 18 years: (B17010_037E)",Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total: (B11003_001E),Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11003_002E),Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With own children of the householder under 18 years: (B11003_003E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total: (B11004_001E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11004_002E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With related children 
of the householder under 18 years: (B11004_003E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level: (B17017_002E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Married-couple family: (B17017_004E),"Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Other family:!!Female householder, no spouse present: (B17017_015E)",Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Other family: (B17017_009E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Nonfamily households: (B17017_020E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level: (B17017_031E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Married-couple family: (B17017_033E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Other family: (B17017_038E),"Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Other family:!!Female householder, no spouse present: (B17017_044E)",Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty 
level:!!Nonfamily households: (B17017_049E),Non-married-couple households below poverty level,Non-married-couple households at or above poverty level,Non-married households with 1+ kids below poverty level,Non-married households with 1+ kids at or above poverty level,% of Married Households Below Poverty Level,% of Non-Married Households Below Poverty Level,% of Married Households With 1+ Kids Below Poverty Level,% of Non-Married Households With 1+ Kids Below Poverty Level,Non-Married/Married Household Poverty Rate Difference,Non-Married Household With 1+ Kids/Married Household With 1+ Kids Poverty Rate Difference,Non-Married/Married Household Poverty Rate Ratio,Non-Married Household With 1+ Kids/Married Household With 1+ Kids Poverty Rate Ratio
1,2022,1,"Autauga County, Alabama",1,58761,15363,707,321,50,509,393,11182,4931,404,2219,1088,15363,11889,5027,15363,11889,5252,2396,707,509,571,1118,19912,11182,2903,2219,5827,1689,8730,443,1492,5.946673,16.210769,6.111957,22.894057,10.264095,16.782099,2.726023,3.745782
2,2022,3,"Baldwin County, Alabama",1,233420,61277,1840,939,519,1888,1521,49179,17444,1393,5537,2962,61277,51019,16575,61277,51019,18383,10315,1840,1888,2473,6002,80487,49179,7785,5537,23523,8475,31308,2040,4355,3.6065,21.303069,5.10798,31.899922,17.69657,26.791942,5.906855,6.245115
3,2022,5,"Barbour County, Alabama",1,24877,5722,236,115,31,875,721,3195,1126,186,938,521,5722,3431,1089,5722,3431,1241,2169,236,875,957,976,6847,3195,1334,938,2318,1933,3652,752,707,6.878461,34.610564,9.26672,51.542152,27.732103,42.275432,5.031731,5.562071
4,2022,7,"Bibb County, Alabama",1,22251,4871,226,154,69,449,426,3391,1136,213,305,225,4871,3617,1069,4871,3617,1290,1569,226,449,568,775,5647,3391,686,305,1570,1343,2256,495,438,6.248272,37.315921,11.937984,53.054662,31.067649,41.116678,5.972199,4.444189
5,2022,9,"Blount County, Alabama",1,59077,15416,895,540,78,506,385,11171,4471,445,1853,1132,15416,12066,4415,15416,12066,5011,3469,895,506,674,1900,18157,11171,2676,1853,4310,2574,6986,463,1577,7.417537,26.924686,10.776292,22.696078,19.507149,11.919786,3.629869,2.106112
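
As a sanity check on the calculations above, here is a minimal sketch that recomputes the rate, difference, and ratio for one county by hand. The first row reuses the Autauga County counts shown in the output above (707 and 11,182 married-couple households below and at or above the poverty level; 1,689 and 8,730 non-married households). The second row is a made-up county with no married households in poverty, included only to show how a zero denominator turns into inf and then NaN. The short column names and `df_example` are my own placeholders, not names from the notebook.

```python
import numpy as np
import pandas as pd

# Row 0: counts copied from the Autauga County row in the output above.
# Row 1: a hypothetical county (not real data) with zero married 
# households below the poverty level.
df_example = pd.DataFrame({
    'NAME': ['Autauga County, Alabama', 'Hypothetical County'],
    'married_below': [707, 0],
    'married_at_or_above': [11182, 500],
    'non_married_below': [1689, 40],
    'non_married_at_or_above': [8730, 160]})

# Poverty rate = households below the poverty level / all households
# in that group, expressed as a percentage:
df_example['married_rate'] = 100 * df_example['married_below'] / (
    df_example['married_below'] + df_example['married_at_or_above'])
df_example['non_married_rate'] = 100 * df_example['non_married_below'] / (
    df_example['non_married_below'] 
    + df_example['non_married_at_or_above'])

# Difference (in percentage points) and ratio between the two rates:
df_example['rate_difference'] = (
    df_example['non_married_rate'] - df_example['married_rate'])
df_example['rate_ratio'] = (
    df_example['non_married_rate'] / df_example['married_rate'])

# The hypothetical county's ratio is 20 / 0, which pandas stores as inf.
# Replacing inf with NaN so that later statistics skip this row:
df_example['rate_ratio'] = df_example['rate_ratio'].replace(
    np.inf, np.nan)

print(df_example[['NAME', 'married_rate', 'non_married_rate', 
                  'rate_difference', 'rate_ratio']])
```

The Autauga County values should agree, to rounding, with the corresponding columns in the table above. Note that the difference column remains defined for the hypothetical county even though its ratio does not.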


## Reviewing our marriage and poverty data:

We can use `df.describe()` to take a quick look at the poverty rate columns that we just calculated. The 50% row is particularly helpful, as it shows the median results in our dataset.
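
To make the role of the 50% row concrete, here is a small sketch using made-up rates for five counties (these numbers are illustrative only and do not come from the Census data). It shows that the 50% row of `describe()` matches `median()`, and that a single extreme county pulls the mean upward much more than it moves the median.

```python
import pandas as pd

# Hypothetical poverty rates for five counties; the last county is
# deliberately extreme.
df_rates = pd.DataFrame({
    'married_rate': [3.0, 4.5, 5.0, 6.5, 40.0],
    'non_married_rate': [15.0, 20.0, 22.0, 28.0, 70.0]})

summary = df_rates.describe()

# The '50%' row is the 50th percentile, i.e. the median of each column:
medians_from_describe = summary.loc['50%']
medians_direct = df_rates.median()

print(summary.loc[['mean', '50%']])
print(medians_direct)
```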

The 2022 version of these results showed that the median county-level poverty rate for married-couple households was 4.76%, compared to 22.04% for non-married households. Meanwhile, married-couple households with kids had a median poverty rate of 6.17%, compared to 31.77% for non-married households with kids. Therefore, although we can't determine the *direction* of causation within the marriage/poverty relationship using this data alone, it's evident that married households tend to have lower poverty rates than do non-married households.

We'll create choropleth maps that illustrate some of these data points within the Mapping section of Python for Nonprofits.

In [11]:
df_marriage_poverty_acs5_data[['% of Married Households Below \
Poverty Level',
       '% of Non-Married Households Below Poverty Level',
       '% of Married Households With 1+ Kids Below Poverty Level',
       '% of Non-Married Households With 1+ Kids Below Poverty Level',
       'Non-Married/Married Household Poverty Rate Difference',
       'Non-Married Household With 1+ Kids/Married Household \
With 1+ Kids Poverty Rate Difference',
       'Non-Married/Married Household Poverty Rate Ratio',
       'Non-Married Household With 1+ Kids/Married Household \
With 1+ Kids Poverty Rate Ratio']].describe()

Unnamed: 0,% of Married Households Below Poverty Level,% of Non-Married Households Below Poverty Level,% of Married Households With 1+ Kids Below Poverty Level,% of Non-Married Households With 1+ Kids Below Poverty Level,Non-Married/Married Household Poverty Rate Difference,Non-Married Household With 1+ Kids/Married Household With 1+ Kids Poverty Rate Difference,Non-Married/Married Household Poverty Rate Ratio,Non-Married Household With 1+ Kids/Married Household With 1+ Kids Poverty Rate Ratio
count,3222.0,3222.0,3221.0,3218.0,3222.0,3218.0,3205.0,3125.0
mean,5.98573,23.797786,7.926897,33.4297,17.812056,25.495413,5.394284,6.959111
std,5.168185,9.3747,7.177521,14.799942,6.576608,13.033753,5.982786,9.73897
min,0.0,0.0,0.0,0.0,-12.225705,-52.0,0.527273,0.0
25%,3.203336,17.453112,3.649704,23.834888,13.324396,17.867021,3.436623,3.299141
50%,4.764387,22.0414,6.170981,31.771598,16.947933,24.481687,4.631352,4.805861
75%,6.908005,28.248617,9.789157,41.17132,21.537258,32.149544,6.12061,7.269667
max,52.100457,72.039474,67.663551,100.0,49.85537,100.0,202.401575,158.195489
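
The `count` row above is lower for the ratio columns than for the rate columns. One likely reason (my inference from how pandas behaves, not something the output itself confirms) is that `describe()` counts only non-null values, so any county whose ratio was undefined and stored as NaN drops out of that column's statistics. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second county has a 0% rate in the denominator
# column, so its ratio was stored as NaN.
df_counts = pd.DataFrame({
    'rate': [5.0, 0.0, 8.0, 10.0],
    'ratio': [3.0, np.nan, 4.0, 6.0]})

# describe() skips NaN entries when computing each column's statistics:
counts = df_counts.describe().loc['count']
print(counts)
```

This means that the ratio columns' summary statistics describe a slightly smaller set of counties than the rate and difference columns do.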


Saving this dataset as a .csv file:

In [12]:
df_marriage_poverty_acs5_data.to_csv(
    f'Datasets/marriage_poverty_acs5_data_{acs5_year}.csv', 
    index = False)

I included this approach in the appendix because you may find the requests library useful for other online data retrieval tasks. However, our use of `pd.read_json()` to import Census data rendered an explicit call to the requests library unnecessary.

In [13]:
program_end_time = time.time()
run_time = round(program_end_time - program_start_time, 3)
print(f"Finished running script in {run_time} seconds.")

Finished running script in 2.119 seconds.
