# Census Data Imports (Work in Progress)

By Kenneth Burchfiel

Released under the MIT License

The US Census is a fantastic source of free demographic data. Thankfully, we can easily access large amounts of this data at once via Python.

Let's say that some NVCU graduating seniors are interested in settling down and raising a family a few years after they graduate. Therefore, they'd like to know what parts of the US have especially high percentages of married families with kids. However, they'd also like to avoid having to spend too much money on a home, so they're interested in regions with relatively low property costs. And finally, because they'd prefer to live in a growing region rather than a declining one, they want to know which areas have the highest 5-year population growth rates.

To answer these questions, we'll use the Census API to retrieve marriage, home price, and population growth data for all US counties. We'll then combine these datasets to create a 'Fit Index' that we can share with these seniors.

In [1]:
mylist = [20, 30, 25, 20, 20]
from iteration_utilities import duplicates
list(duplicates(mylist))

[20, 20]

In [2]:
import pandas as pd
from iteration_utilities import duplicates
pd.set_option('display.max_columns', 1000)
import lxml # Necessary for reading online HTML tables into Pandas

latest_acs5_year = 2022 # By updating this variable when future American 
# Community Surveys get released, you should be able to retrieve the most
# recent data possible. (If changes to the survey's format are made,
# however, updates to the scripts may be necessary.)
download_variable_list = False # If set to True, a new list of variables
# will be downloaded from the Census API website. If False, this list of 
# variables will instead be read in from a local .csv copy (thus saving
# processing time).

# Part 1: Introducing the Census API

## Importing a Census API Key

You can obtain a free Census API key [at this website](https://api.census.gov/data/key_signup.html). The following cell imports my own personal key, so you'll need to replace this code with one that loads in your own API key.

In [3]:
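# census_api_key_path.txt stores the path to the file that contains the key.
# .strip() removes any trailing newline so that it doesn't end up in the
# file path or in the API request URL.
with open('census_api_key_path.txt') as file:
    key_path = file.read().strip()
with open(key_path) as file:
    key = file.read().strip()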
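
If you'd like a starting point for loading your own key, here is a minimal sketch. The function name `load_census_key`, the `CENSUS_API_KEY` environment variable, and the optional key file are all assumptions made for illustration; they aren't part of the Census API or of this notebook's own setup.

```python
import os
from pathlib import Path


def load_census_key(env_var="CENSUS_API_KEY", key_file=None):
    """Return a Census API key from an environment variable or, failing
    that, from a text file. Both names are hypothetical defaults."""
    key = os.environ.get(env_var)
    if not key and key_file is not None:
        # Fall back to a local file that contains only the key:
        key = Path(key_file).read_text()
    if not key or not key.strip():
        raise ValueError(
            f"No API key found in ${env_var} or in the key file.")
    # Stripping whitespace keeps stray newlines out of the request URL:
    return key.strip()
```

Keeping the key outside of the notebook itself (in an environment variable or an untracked file) makes it less likely that you'll accidentally publish it alongside your code.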

Note: There is also a 'census' Python library (available via PyPI and conda) that helps simplify the process of requesting API data. You may choose to use it for your own Census research, but I ended up not needing it for the data retrieval tasks shown below. In addition, forgoing the library allowed me to demonstrate how to retrieve data directly from an API, which you may find helpful when working with APIs that don't have a corresponding Python library.

In [4]:
## Example of reading data from the Census library into a Pandas DataFrame:
from census import Census
c = Census(key)
pd.DataFrame(c.acs5.get(('NAME', 'B01001_001E'),
{'for': 'county:*'}))

Unnamed: 0,NAME,B01001_001E,state,county
0,"Autauga County, Alabama",58761.0,01,001
1,"Baldwin County, Alabama",233420.0,01,003
2,"Barbour County, Alabama",24877.0,01,005
3,"Bibb County, Alabama",22251.0,01,007
4,"Blount County, Alabama",59077.0,01,009
...,...,...,...,...
3217,"Vega Baja Municipio, Puerto Rico",54182.0,72,145
3218,"Vieques Municipio, Puerto Rico",8199.0,72,147
3219,"Villalba Municipio, Puerto Rico",21984.0,72,149
3220,"Yabucoa Municipio, Puerto Rico",30313.0,72,151


As discussed above, we will bypass the Census Python library and work directly with the Census API. Thankfully, the Census has detailed API documentation that makes this process relatively straightforward.

[This list](https://api.census.gov/data/2022/acs/acs5/examples.html) (for the 2022 American Community Survey) provides a set of API call examples. For instance, here's one of the sample URLs shown on this page for retrieving county-level data:

https://api.census.gov/data/2022/acs/acs5?get=NAME,B01001_001E&for=county:*&key=YOUR_KEY_GOES_HERE

(B01001_001E refers to the total population estimate for a given county. We can find this out by going to the [Detailed Tables](https://api.census.gov/data/2022/acs/acs5/variables.html) page for the 2022 American Community Survey and navigating to the row with a 'Name' value of B01001_001E. This link, which may take a little while to load, is available on the [API Documentation Page](https://www.census.gov/data/developers/data-sets/acs-5year.html) for the 5-Year-Estimates version of the American Community Survey.)

If you replace the 'YOUR_KEY_GOES_HERE' component of the URL with your actual key, then enter this link into your web browser, you'll receive a very long list of counties, population values, and state and county codes. The top of the list looks like this:

```
[["NAME","B01001_001E","state","county"],
["Autauga County, Alabama","58761","01","001"],
["Baldwin County, Alabama","233420","01","003"],
["Barbour County, Alabama","24877","01","005"],
["Bibb County, Alabama","22251","01","007"],
["Blount County, Alabama","59077","01","009"],
["Bullock County, Alabama","10328","01","011"],
```
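
The sample URL above has three query parameters: `get` (a comma-separated list of variables), `for` (the geography) and `key`. As a rough sketch of how those pieces fit together, the hypothetical helper below (`build_census_url` is my own name, not a Census or library function) assembles the same URL with the standard library:

```python
from urllib.parse import urlencode


def build_census_url(year, dataset_path, variables, region, key):
    """Assemble a Census API URL that requests the given variables for
    every geography of the given type (e.g. every county)."""
    base = f"https://api.census.gov/data/{year}/{dataset_path}"
    # safe=',:*' keeps commas, colons, and asterisks unescaped so that
    # the result matches the format shown in the Census examples.
    query = urlencode(
        {"get": ",".join(variables), "for": f"{region}:*", "key": key},
        safe=",:*")
    return f"{base}?{query}"


url = build_census_url(2022, "acs/acs5", ["NAME", "B01001_001E"],
                       "county", "YOUR_KEY_GOES_HERE")
print(url)
```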

We can create a similar list within Python by using the requests library to retrieve data from this URL, then parsing the JSON text of the response into a Python list:

In [5]:
# The following code borrows from the requests library documentation at 
# https://docs.python-requests.org/en/latest/index.html
import requests
r = requests.get(f'https://api.census.gov/data/2022/\
acs/acs5?get=NAME,B01001_001E&for=county:*&key={key}')
# Printing the first 300 characters of this output:
print("r.text:\n",r.text[0:300],'\n')
# Printing the first 5 lines of r.json:
print("r.json:\n",r.json()[0:5],'\n')

r.text:
 [["NAME","B01001_001E","state","county"],
["Autauga County, Alabama","58761","01","001"],
["Baldwin County, Alabama","233420","01","003"],
["Barbour County, Alabama","24877","01","005"],
["Bibb County, Alabama","22251","01","007"],
["Blount County, Alabama","59077","01","009"],
["Bullock County, Ala 

r.json:
 [['NAME', 'B01001_001E', 'state', 'county'], ['Autauga County, Alabama', '58761', '01', '001'], ['Baldwin County, Alabama', '233420', '01', '003'], ['Barbour County, Alabama', '24877', '01', '005'], ['Bibb County, Alabama', '22251', '01', '007']] 



Parsing the response's JSON content (via r.json()) gives us a list of lists that can be easily read into a Pandas DataFrame, as shown below:

In [6]:
# pd.DataFrame(r.text) # Produces the following error:
# "ValueError: DataFrame constructor not properly called!"
pd.DataFrame(r.json()).head()

Unnamed: 0,0,1,2,3
0,NAME,B01001_001E,state,county
1,"Autauga County, Alabama",58761,01,001
2,"Baldwin County, Alabama",233420,01,003
3,"Barbour County, Alabama",24877,01,005
4,"Bibb County, Alabama",22251,01,007


I wanted to introduce the requests library here because you'll likely find it useful for extracting data from the internet. However, in this case, we can read Census data directly into a Pandas DataFrame via pd.read_json():

In [7]:
df_results = pd.read_json(
    f'https://api.census.gov/data/2022/\
acs/acs5?get=NAME,B01001_001E&for=county:*&key={key}')
# read_json documentation:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html
df_results.head()

Unnamed: 0,0,1,2,3
0,NAME,B01001_001E,state,county
1,"Autauga County, Alabama",58761,01,001
2,"Baldwin County, Alabama",233420,01,003
3,"Barbour County, Alabama",24877,01,005
4,"Bibb County, Alabama",22251,01,007


In [8]:
# At this point, the DataFrame's columns are [0, 1, 2, 3], whereas the columns
# we want to use are stored within the first row. The following code 
# sets these row values as our column values, then deletes this row:

df_results.columns = df_results.iloc[0]
df_results.drop(0, inplace = True)
df_results


Unnamed: 0,NAME,B01001_001E,state,county
1,"Autauga County, Alabama",58761,01,001
2,"Baldwin County, Alabama",233420,01,003
3,"Barbour County, Alabama",24877,01,005
4,"Bibb County, Alabama",22251,01,007
5,"Blount County, Alabama",59077,01,009
...,...,...,...,...
3218,"Vega Baja Municipio, Puerto Rico",54182,72,145
3219,"Vieques Municipio, Puerto Rico",8199,72,147
3220,"Villalba Municipio, Puerto Rico",21984,72,149
3221,"Yabucoa Municipio, Puerto Rico",30313,72,151
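
As an alternative to renaming the columns after the fact, the header row can be split off when the DataFrame is created. The following is only a sketch that uses a hand-copied sample of the API output shown earlier rather than a live request:

```python
import pandas as pd

# A small sample of the parsed JSON output (a list of lists whose first
# entry holds the column names):
rows = [["NAME", "B01001_001E", "state", "county"],
        ["Autauga County, Alabama", "58761", "01", "001"],
        ["Baldwin County, Alabama", "233420", "01", "003"]]

# Using the first row as the header and the remaining rows as data:
df_sample = pd.DataFrame(rows[1:], columns=rows[0])
print(df_sample)
```

One difference from the approach above is that the index here starts at 0 rather than 1, since no row gets dropped from the DataFrame.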


# Part 2: Retrieving our Data

In order to determine which variable codes to enter into our script, we'll first need to review a list of the variables and the overall groups into which they fit. You can find the 2022 variable names [on this site](https://api.census.gov/data/2022/acs/acs5/variables.html), but we can also use Pandas to import them into a DataFrame as shown below.

In [9]:
if download_variable_list:
    # pd.read_html() returns a list of every HTML table found on the page;
    # [0] selects the first one.
    df_variables_page = pd.read_html(
        'https://api.census.gov/data/2022/acs/acs5/variables.html')[0]
    # Some rows in this table contain items other than demographic variables
    # (e.g. region names). We can exclude them by selecting only rows
    # whose labels begin with 'Estimate'. (Another option would have been
    # to filter out rows with N/A 'Group' entries, i.e.
    # df_variables_page.query("Group.isna() == False"),
    # but this would have left a couple of non-variable rows in place.)
    df_variables = df_variables_page[
        df_variables_page['Label'].str[0:8] == 'Estimate'].copy(
        ).reset_index(drop=True)
    # Saving this table to a local .csv file:
    df_variables.to_csv('Datasets/2022_variables.csv', index = False)
else: # Reading a local copy of this dataset instead, which will probably
    # take much less time.
    df_variables = pd.read_csv('Datasets/2022_variables.csv')
df_variables

Unnamed: 0,Name,Label,Concept,Required,Attributes,Limit,Predicate Type,Group,Unnamed: 8
0,B01001_001E,Estimate!!Total:,Sex by Age,not required,"B01001_001EA, B01001_001M, B01001_001MA",0,int,B01001,
1,B01001_002E,Estimate!!Total:!!Male:,Sex by Age,not required,"B01001_002EA, B01001_002M, B01001_002MA",0,int,B01001,
2,B01001_003E,Estimate!!Total:!!Male:!!Under 5 years,Sex by Age,not required,"B01001_003EA, B01001_003M, B01001_003MA",0,int,B01001,
3,B01001_004E,Estimate!!Total:!!Male:!!5 to 9 years,Sex by Age,not required,"B01001_004EA, B01001_004M, B01001_004MA",0,int,B01001,
4,B01001_005E,Estimate!!Total:!!Male:!!10 to 14 years,Sex by Age,not required,"B01001_005EA, B01001_005M, B01001_005MA",0,int,B01001,
...,...,...,...,...,...,...,...,...,...
28147,C27021_011E,Estimate!!Total:!!In family households:!!In ot...,Health Insurance Coverage Status by Living Arr...,not required,"C27021_011EA, C27021_011M, C27021_011MA",0,int,C27021,
28148,C27021_012E,Estimate!!Total:!!In family households:!!In ot...,Health Insurance Coverage Status by Living Arr...,not required,"C27021_012EA, C27021_012M, C27021_012MA",0,int,C27021,
28149,C27021_013E,Estimate!!Total:!!In non-family households and...,Health Insurance Coverage Status by Living Arr...,not required,"C27021_013EA, C27021_013M, C27021_013MA",0,int,C27021,
28150,C27021_014E,Estimate!!Total:!!In non-family households and...,Health Insurance Coverage Status by Living Arr...,not required,"C27021_014EA, C27021_014M, C27021_014MA",0,int,C27021,


With over 28,000 individual variables, it could take a very long time to identify the items you'd like to retrieve from the Census. We can make this search process somewhat easier by creating a separate *groups* table that shows only unique group names, along with their written descriptions (e.g. 'Sex by Age' and corresponding attributes).

In [10]:
df_groups = df_variables.drop_duplicates(
    'Group')[['Concept', 'Attributes', 'Group']].copy(
    ).reset_index(drop=True)
df_groups

Unnamed: 0,Concept,Attributes,Group
0,Sex by Age,"B01001_001EA, B01001_001M, B01001_001MA",B01001
1,Sex by Age (White Alone),"B01001A_001EA, B01001A_001M, B01001A_001MA",B01001A
2,Sex by Age (Black or African American Alone),"B01001B_001EA, B01001B_001M, B01001B_001MA",B01001B
3,Sex by Age (American Indian and Alaska Native ...,"B01001C_001EA, B01001C_001M, B01001C_001MA",B01001C
4,Sex by Age (Asian Alone),"B01001D_001EA, B01001D_001M, B01001D_001MA",B01001D
...,...,...,...
1177,Public Health Insurance by Work Experience,"C27014_001EA, C27014_001M, C27014_001MA",C27014
1178,Health Insurance Coverage Status by Ratio of I...,"C27016_001EA, C27016_001M, C27016_001MA",C27016
1179,Private Health Insurance by Ratio of Income to...,"C27017_001EA, C27017_001M, C27017_001MA",C27017
1180,Public Health Insurance by Ratio of Income to ...,"C27018_001EA, C27018_001M, C27018_001MA",C27018


We'll save this group table to a local .csv file as well:

In [11]:
df_groups.to_csv('Datasets/2022_groups.csv', index = False)

To find variables of interest, I recommend first searching the group table (which is much smaller) for relevant keywords in order to identify group IDs. Next, you can search for those group IDs inside the variables table to find the exact metrics to request from the Census API.
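
Here is one way this two-step search could look in Pandas. This is a sketch only: `search_groups` is a hypothetical helper, and the small DataFrame below stands in for the full `df_groups` table.

```python
import pandas as pd

# A few rows copied from the groups table, used as a stand-in:
df_groups_sample = pd.DataFrame({
    'Concept': ['Sex by Age', 'Sex by Age (White Alone)',
                'Public Health Insurance by Work Experience'],
    'Group': ['B01001', 'B01001A', 'C27014']})


def search_groups(df_groups, keyword):
    """Return the rows whose 'Concept' text contains the keyword
    (ignoring case and skipping missing values)."""
    mask = df_groups['Concept'].str.contains(
        keyword, case=False, na=False, regex=False)
    return df_groups[mask]


matches = search_groups(df_groups_sample, 'insurance')
print(matches)
```

Once a group ID looks promising, a filter such as `df_variables.query("Group == 'C27014'")` would list the individual variables within that group.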

In [12]:
variable_list = ['B01001_001E', 'B01001_002E']

The demographic columns in the Census API's output are labeled with their variable names (e.g. 'B01001_001E'). These names are concise, but you'll need a copy of the original variable list to interpret them. Therefore, I chose to replace these column names with a combination of the 'Label', 'Concept', and 'Name' entries in the original variable list. These column names are very long, but they do make the output easier to interpret (while also preserving the original names for reference).

The following function assists with this replacement by creating a dictionary whose keys are the original field names (e.g. 'B01001_001E') and whose values are the replacement names (e.g. 'Sex by Age_Estimate!!Total: (B01001_001E)').

In [13]:
def create_variable_aliases(df_variables, variable_list):
    '''This function creates a dictionary whose keys are 
    the original 'Name' values (e.g. 'B01001_001E') within a variable
    list on the Census API website and whose values are the replacement 
    names (e.g. 'Sex by Age_Estimate!!Total: (B01001_001E)').
    This resulting dictionary can then be passed to a df.rename() call
    within retrieve_census_data() in order to make the output of that
    function easier to interpret.
    df_variables: A DataFrame containing a list of Census variables. For
    an example of this list, visit
    https://api.census.gov/data/2022/acs/acs5/variables.html .
    variable_list: The list of variables to rename 
    (e.g. ['B01001_001E', 'B01001_002E']).
    '''
    # Creating a DataFrame that contains the information needed for the
    # updated column names:
    df_aliases = df_variables.query(
        "Name in @variable_list")[['Name', 'Label', 'Concept']].copy()
    # Creating a new 'Description' column that will replace the original
    # output field names:
    df_aliases['Description'] = (df_aliases['Concept'] 
                                 + '_' + df_aliases['Label'] 
                                 + ' (' + df_aliases['Name'] + ')')
    # Creating a dictionary whose keys are the original field names and whose
    # values are the new 'Description' entries that were just created:
    alias_dict = df_aliases.set_index('Name').to_dict()['Description']
    # See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html
    return alias_dict

In [14]:
alias_dict = create_variable_aliases(df_variables = df_variables, 
                        variable_list = variable_list)
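
To make the shape of this dictionary concrete, here is a small sketch that applies the same naming pattern to a two-row stand-in for the variables table (the sample rows are copied from the table shown earlier; `dict(zip(...))` is just an alternative way to build the mapping):

```python
import pandas as pd

df_variables_sample = pd.DataFrame({
    'Name': ['B01001_001E', 'B01001_002E'],
    'Label': ['Estimate!!Total:', 'Estimate!!Total:!!Male:'],
    'Concept': ['Sex by Age', 'Sex by Age']})

# Combining 'Concept', 'Label', and 'Name' into a single description:
descriptions = (df_variables_sample['Concept'] + '_'
                + df_variables_sample['Label']
                + ' (' + df_variables_sample['Name'] + ')')

# Pairing each original name with its description:
alias_sample = dict(zip(df_variables_sample['Name'], descriptions))
print(alias_sample)
```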

In [15]:
def retrieve_census_data(dataset, year, region, key, variable_list,
                        rename_data_fields = False, field_names_dict = {}):
    '''This function (which I plan to expand) retrieves data from the US
    Census API.
    
    dataset: the dataset from which to retrieve data. The only argument
    currently supported is 'acs5' (for the American Community Survey
    5-Year Estimates), but this list will be expanded in the future.
    
    year: the year of data to retrieve.
    
    region: the geographic level of data to retrieve (e.g. 'county'). 
    This function can currently only retrieve county data, but this 
    will be expanded also.
    
    variable_list: The list of variables for which to retrieve data.

    key: your personal Census API key.

    rename_data_fields: set to True to replace column names in your dataset
    with new entries of your choice.

    field_names_dict: A dictionary that stores the original variable names
    retrieved by the Census (e.g. 'B01001_001E') as keys and your desired
    replacements as values. Example: 
    {'B01001_001E': 'Sex by Age_Estimate!!Total: (B01001_001E)',
     'B01001_002E': 'Sex by Age_Estimate!!Total:!!Male: (B01001_002E)'}
     
    '''

    # Using the iteration_utilities library to check for duplicate
    # values within variable_list (which could cause issues later on):
    # The following code is based on
    # https://iteration-utilities.readthedocs.io/en/latest/generated/duplicates.html
    duplicate_variables = list(duplicates(variable_list))
    
    if len(duplicate_variables) > 0:
        raise ValueError(f"The following variables appear more than once \
in your variable list: {duplicate_variables}")
    
    
    if dataset == 'acs5':
        dataset_string = 'acs/acs5'
    # Add other dataset options in here
    else:
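        # Adjacent string literals are joined automatically, which keeps
        # the indentation out of the error message:
        raise ValueError("This dataset type is not yet supported by "
                         "the function.")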
    # Converting the list of variables into a string that can be 
    # passed to the API call:
    # (The Census API guide at
    # https://www.census.gov/content/dam/Census/data/developers/api-user-guide/api-guide.pdf
    # demonstrates how to call multiple census variables at once.)
    variable_string = ','.join(variable_list)
    
    # Retrieving data via the Census API:
    # This line was originally based on an example found in
    # https://api.census.gov/data/2022/acs/acs5/examples.html .

    # read_json documentation:
    # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html
    
    df_results = pd.read_json(
        f'https://api.census.gov/data/{year}/\
{dataset_string}?get=NAME,{variable_string}&for={region}:*&key={key}')

    # At this point, the DataFrame's columns are a list of integers; the desired
    # column names are stored within the first row. The following code 
    # resolves this issue by setting these row values as the column values and
    # then deleting this row.

    df_results.columns = df_results.iloc[0]
    df_results.drop(0, inplace = True)

    # Converting variable columns to numeric data types:
    for column in variable_list:
        # print(f"Now converting {column} to a numeric type.")
        df_results[column] = pd.to_numeric(df_results[column])
        # pd.to_numeric() allows for either integer or float outputs
        # depending on the nature of the original data.
        # See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html

    if rename_data_fields:
        df_results.rename(columns = field_names_dict, inplace = True)

    return df_results
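
The duplicate check inside retrieve_census_data() relies on the iteration_utilities library. If you'd rather avoid that dependency, a similar check can be sketched with the standard library alone. (`find_duplicates` is a hypothetical helper; unlike `duplicates()`, it reports each repeated value only once.)

```python
from collections import Counter


def find_duplicates(items):
    """Return each value that appears more than once in items, in order
    of first appearance."""
    counts = Counter(items)
    return [item for item, count in counts.items() if count > 1]


print(find_duplicates(['B01001_001E', 'B01001_002E', 'B01001_001E']))
```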

In [16]:
df_census_data = retrieve_census_data(dataset = 'acs5', year = 2022, region = 'county',
                    variable_list = ['B01001_001E', 'B01001_002E'], 
                     rename_data_fields = True, field_names_dict = alias_dict,
                     key = key)

df_census_data

Unnamed: 0,NAME,Sex by Age_Estimate!!Total: (B01001_001E),Sex by Age_Estimate!!Total:!!Male: (B01001_002E),state,county
1,"Autauga County, Alabama",58761,28663,01,001
2,"Baldwin County, Alabama",233420,114077,01,003
3,"Barbour County, Alabama",24877,12973,01,005
4,"Bibb County, Alabama",22251,11897,01,007
5,"Blount County, Alabama",59077,29864,01,009
...,...,...,...,...,...
3218,"Vega Baja Municipio, Puerto Rico",54182,25849,72,145
3219,"Vieques Municipio, Puerto Rico",8199,4179,72,147
3220,"Villalba Municipio, Puerto Rico",21984,10634,72,149
3221,"Yabucoa Municipio, Puerto Rico",30313,14624,72,151


In [17]:
df_census_data.to_csv('Datasets/census_data.csv', index = False)

## Retrieving additional data for a comparison of poverty rates by marital status


In [18]:
marriage_poverty_variable_list = [
    'B01001_001E',
    'B17010_001E',
    'B17010_003E',
    'B17010_004E',
    'B17010_011E',
    'B17010_016E',
    'B17010_017E',
    'B17010_023E',
    'B17010_024E',
    'B17010_031E',
    'B17010_036E',
    'B17010_037E',
    'B11003_001E',
    'B11003_002E',
    'B11003_003E',
    'B11004_001E',
    'B11004_002E',
    'B11004_003E',
    'B17017_002E',
    'B17017_004E',
    'B17017_015E',
    'B17017_009E',
    'B17017_020E',
    'B17017_031E',
    'B17017_033E',
    'B17017_038E',
    'B17017_044E',
    'B17017_049E'
]

# Note: at least within B17017 ('Poverty Status in the Past 12 Months 
# by Household Type by Age of Householder'), every household appears
# to fall into one of 3 categories: (1) a married-couple family household; 
# (2) a non-married-couple family household (classified as 'other family');
# or (3) a non-family household. Therefore, in order to determine 
# the correlation between marriage and household poverty rates,
# we can compare the percentage of married couple households and
# non-married couple households below the poverty level (with the latter
# defined as non-married-couple family households plus 
# non-family households).



marriage_poverty_alias_dict = create_variable_aliases(
    df_variables = df_variables, 
    variable_list = marriage_poverty_variable_list)
marriage_poverty_alias_dict

{'B01001_001E': 'Sex by Age_Estimate!!Total: (B01001_001E)',
 'B11003_001E': 'Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total: (B11003_001E)',
 'B11003_002E': 'Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11003_002E)',
 'B11003_003E': 'Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With own children of the householder under 18 years: (B11003_003E)',
 'B11004_001E': 'Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total: (B11004_001E)',
 'B11004_002E': 'Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11004_002E)',
 'B11004_003E': 'Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With related children of the householder under 18 years: (B11004_003E)',
 'B17010_001E': 'Poverty Status in the Pa

In [19]:
df_marriage_poverty_census_data = retrieve_census_data(
    dataset = 'acs5', year = 2022, region = 'county',
    variable_list = marriage_poverty_variable_list, 
    rename_data_fields = True, 
    field_names_dict = marriage_poverty_alias_dict, key = key)

df_marriage_poverty_census_data.head()

Unnamed: 0,NAME,Sex by Age_Estimate!!Total: (B01001_001E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total: (B17010_001E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family: (B17010_003E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family:!!With related children of the householder under 18 years: (B17010_004E),"Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Male householder, no spouse present:!!With related children of the householder under 18 years: (B17010_011E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Female householder, no spouse present: (B17010_016E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Female householder, no spouse present:!!With related children of the householder under 18 years: (B17010_017E)",Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family: (B17010_023E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family:!!With related children of the householder under 18 years: (B17010_024E),"Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Male householder, no spouse present:!!With related children of the householder under 18 years: (B17010_031E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Female householder, no spouse present: (B17010_036E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Female householder, no spouse present:!!With related children of the householder under 18 years: (B17010_037E)",Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total: (B11003_001E),Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11003_002E),Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With own children of the householder under 18 years: (B11003_003E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total: (B11004_001E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11004_002E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With related children of the householder under 18 years: (B11004_003E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level: (B17017_002E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Married-couple family: (B17017_004E),"Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Other family:!!Female householder, no spouse present: (B17017_015E)",Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Other family: (B17017_009E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Nonfamily households: (B17017_020E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level: (B17017_031E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Married-couple family: (B17017_033E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Other family: (B17017_038E),"Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Other family:!!Female householder, no spouse present: (B17017_044E)",Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Nonfamily households: (B17017_049E),state,county
1,"Autauga County, Alabama",58761,15363,707,321,50,509,393,11182,4931,404,2219,1088,15363,11889,5027,15363,11889,5252,2396,707,509,571,1118,19912,11182,2903,2219,5827,1,1
2,"Baldwin County, Alabama",233420,61277,1840,939,519,1888,1521,49179,17444,1393,5537,2962,61277,51019,16575,61277,51019,18383,10315,1840,1888,2473,6002,80487,49179,7785,5537,23523,1,3
3,"Barbour County, Alabama",24877,5722,236,115,31,875,721,3195,1126,186,938,521,5722,3431,1089,5722,3431,1241,2169,236,875,957,976,6847,3195,1334,938,2318,1,5
4,"Bibb County, Alabama",22251,4871,226,154,69,449,426,3391,1136,213,305,225,4871,3617,1069,4871,3617,1290,1569,226,449,568,775,5647,3391,686,305,1570,1,7
5,"Blount County, Alabama",59077,15416,895,540,78,506,385,11171,4471,445,1853,1132,15416,12066,4415,15416,12066,5011,3469,895,506,674,1900,18157,11171,2676,1853,4310,1,9


In [20]:
# df_marriage_poverty_census_data.columns

The following cell uses certain columns retrieved from this dataset to calculate poverty rates for:

1. Married-couple households
2. Non-married-couple households
3. Households with 1+ kids below 18 headed by a married couple
4. Households with 1+ kids below 18 *not* headed by a married couple

It will also calculate differences in poverty rates between:

1. Non-married-couple and married-couple households
2. Non-married-couple households with 1+ kids below 18 and married-couple households with 1+ kids below 18
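
Because the renamed columns are so long, the arithmetic behind these rates can be hard to follow. The sketch below works through the two household-level rates for a single county (Autauga County, Alabama), using values copied from the output above and the short variable codes as column names. The 'Poverty Rate Difference' column name is my own placeholder.

```python
import pandas as pd

df_rates_sample = pd.DataFrame({
    'NAME': ['Autauga County, Alabama'],
    'B17017_002E': [2396],   # All households below the poverty level
    'B17017_004E': [707],    # Married-couple households below
    'B17017_031E': [19912],  # All households at or above
    'B17017_033E': [11182]}) # Married-couple households at or above

married_below = df_rates_sample['B17017_004E']
married_above = df_rates_sample['B17017_033E']
# Non-married-couple households = all households minus married-couple ones:
non_married_below = df_rates_sample['B17017_002E'] - married_below
non_married_above = df_rates_sample['B17017_031E'] - married_above

# Each rate divides the below-poverty count by the group's total count:
df_rates_sample['Married Couple Household Poverty Rate'] = (
    married_below / (married_below + married_above))
df_rates_sample['Non-Married Household Poverty Rate'] = (
    non_married_below / (non_married_below + non_married_above))
df_rates_sample['Poverty Rate Difference'] = (
    df_rates_sample['Non-Married Household Poverty Rate']
    - df_rates_sample['Married Couple Household Poverty Rate'])
print(df_rates_sample.iloc[0, -3:])
```

For this county, the married-couple rate works out to 707 / 11,889 (roughly 6%), while the non-married-couple rate is 1,689 / 10,419 (roughly 16%).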

In [21]:
df_marriage_poverty_census_data['Non-married-couple households below \
poverty level'] = (df_marriage_poverty_census_data[
'Poverty Status in the Past 12 Months by Household Type by \
Age of Householder_Estimate!!Total:!!Income in the past 12 months \
below poverty level: (B17017_002E)'] 
- df_marriage_poverty_census_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Family \
households:!!Married-couple family: (B17017_004E)'])

df_marriage_poverty_census_data['Non-married-couple households at or above \
poverty level'] = (df_marriage_poverty_census_data[
'Poverty Status in the Past 12 Months by Household Type \
by Age of Householder_Estimate!!Total:!!Income in the past 12 months \
at or above poverty level: (B17017_031E)'
] 
- df_marriage_poverty_census_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income in the \
past 12 months at or above poverty level:!!\
Family households:!!Married-couple family: (B17017_033E)'])

df_marriage_poverty_census_data['Non-married households with 1+ kids \
below poverty level'] = (df_marriage_poverty_census_data['Poverty Status \
in the Past 12 Months of Families by Family Type by Presence \
of Related Children Under 18 Years by Age of Related Children_Estimate!!\
Total:!!Income in the past 12 months below poverty level:!!Other \
family:!!Male householder, no spouse present:!!With related \
children of the householder under 18 years: (B17010_011E)'] 
+ df_marriage_poverty_census_data[
'Poverty Status in the Past 12 Months of Families by Family Type \
by Presence of Related Children Under 18 Years by Age of \
Related Children_Estimate!!Total:!!Income in the past 12 months \
below poverty level:!!Other family:!!Female householder, \
no spouse present:!!With related children \
of the householder under 18 years: (B17010_017E)'])

df_marriage_poverty_census_data['Non-married households with 1+ kids \
at or above poverty level'] = (df_marriage_poverty_census_data['Poverty Status \
in the Past 12 Months of Families by Family Type by Presence \
of Related Children Under 18 Years by Age of Related Children_Estimate!!\
Total:!!Income in the past 12 months at or above poverty level:!!Other \
family:!!Male householder, no spouse present:!!With related \
children of the householder under 18 years: (B17010_031E)'] 
+ df_marriage_poverty_census_data[
'Poverty Status in the Past 12 Months of Families by Family Type \
by Presence of Related Children Under 18 Years by Age of \
Related Children_Estimate!!Total:!!Income in the past 12 months \
at or above poverty level:!!Other family:!!Female householder, \
no spouse present:!!With related children \
of the householder under 18 years: (B17010_037E)'])

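# Calculating each poverty rate as the number of households below the
# poverty level divided by the sum of the below-poverty-level and 
# at-or-above-poverty-level counts:
df_marriage_poverty_census_data['Married Couple Household Poverty Rate'] = (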
    df_marriage_poverty_census_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Family \
households:!!Married-couple family: (B17017_004E)'] / 
    (df_marriage_poverty_census_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income in the \
past 12 months at or above poverty level:!!\
Family households:!!Married-couple family: (B17017_033E)'] 
    + df_marriage_poverty_census_data['Poverty Status in the Past 12 Months \
by Household Type by Age of Householder_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Family \
households:!!Married-couple family: (B17017_004E)']))


df_marriage_poverty_census_data[
'Non-Married Household Poverty Rate'] = (
    df_marriage_poverty_census_data['Non-married-couple households below \
poverty level'] / 
    (df_marriage_poverty_census_data['Non-married-couple households \
at or above poverty level'] 
+ df_marriage_poverty_census_data['Non-married-couple households below \
poverty level']))


df_marriage_poverty_census_data['Married Couple With 1+ Kids Poverty Rate'] = (
df_marriage_poverty_census_data['Poverty Status in the Past 12 Months \
of Families by Family Type by Presence of Related Children \
Under 18 Years by Age of Related Children_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Married-couple \
family:!!With related children of the householder \
under 18 years: (B17010_004E)'] / 
(df_marriage_poverty_census_data['Poverty Status in the Past 12 Months \
of Families by Family Type by Presence of Related Children Under \
18 Years by Age of Related Children_Estimate!!Total:!!Income in the \
past 12 months at or above poverty level:!!Married-couple \
family:!!With related children of the householder \
under 18 years: (B17010_024E)'] 
+ df_marriage_poverty_census_data['Poverty Status in the Past 12 Months \
of Families by Family Type by Presence of Related Children \
Under 18 Years by Age of Related Children_Estimate!!Total:!!Income \
in the past 12 months below poverty level:!!Married-couple \
family:!!With related children of the householder \
under 18 years: (B17010_004E)']))

df_marriage_poverty_census_data[
'Non-Married Household With 1+ Kids Poverty Rate'] = (
df_marriage_poverty_census_data[
'Non-married households with 1+ kids below poverty level'] / (
df_marriage_poverty_census_data['Non-married households with 1+ kids \
below poverty level'] +
df_marriage_poverty_census_data['Non-married households with 1+ kids \
at or above poverty level']))

# Calculating differences between non-married and married poverty rates:
df_marriage_poverty_census_data['Non-Married/Married Household \
Poverty Rate Difference'] = (
    df_marriage_poverty_census_data['Non-Married Household Poverty Rate'] 
    - df_marriage_poverty_census_data['Married Couple Household Poverty Rate'])

df_marriage_poverty_census_data['Non-Married Household With 1+ Kids/Married \
Household With 1+ Kids Poverty Rate Difference'] = (
    df_marriage_poverty_census_data['Non-Married Household With 1+ Kids Poverty Rate'] 
    - df_marriage_poverty_census_data['Married Couple With 1+ Kids Poverty Rate'])


df_marriage_poverty_census_data

Unnamed: 0,NAME,Sex by Age_Estimate!!Total: (B01001_001E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total: (B17010_001E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family: (B17010_003E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Married-couple family:!!With related children of the householder under 18 years: (B17010_004E),"Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Male householder, no spouse present:!!With related children of the householder under 18 years: (B17010_011E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Female householder, no spouse present: (B17010_016E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Other family:!!Female householder, no spouse present:!!With related children of the householder under 18 years: (B17010_017E)",Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family: (B17010_023E),Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Married-couple family:!!With related children of the householder under 18 years: (B17010_024E),"Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Male householder, no spouse present:!!With related children of the householder under 18 years: (B17010_031E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Female householder, no spouse present: (B17010_036E)","Poverty Status in the Past 12 Months of Families by Family Type by Presence of Related Children Under 18 Years by Age of Related Children_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Other family:!!Female householder, no spouse present:!!With related children of the householder under 18 years: (B17010_037E)",Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total: (B11003_001E),Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11003_002E),Family Type by Presence and Age of Own Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With own children of the householder under 18 years: (B11003_003E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total: (B11004_001E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family: (B11004_002E),Family Type by Presence and Age of Related Children Under 18 Years_Estimate!!Total:!!Married-couple family:!!With related children of the householder under 18 years: (B11004_003E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level: (B17017_002E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Married-couple family: (B17017_004E),"Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Other family:!!Female householder, no spouse present: (B17017_015E)",Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Family households:!!Other family: (B17017_009E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months below poverty level:!!Nonfamily households: (B17017_020E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level: (B17017_031E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Married-couple family: (B17017_033E),Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Other family: (B17017_038E),"Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Family households:!!Other family:!!Female householder, no spouse present: (B17017_044E)",Poverty Status in the Past 12 Months by Household Type by Age of Householder_Estimate!!Total:!!Income in the past 12 months at or above poverty level:!!Nonfamily households: (B17017_049E),state,county,Non-married-couple households below poverty level,Non-married-couple households at or above poverty level,Non-married households with 1+ kids below poverty level,Non-married households with 1+ kids at or above poverty level,Married Couple Household Poverty Rate,Non-Married Household Poverty Rate,Married Couple With 1+ Kids Poverty Rate,Non-Married Household With 1+ Kids Poverty Rate,Non-Married/Married Household Poverty Rate Difference,Non-Married Household With 1+ Kids/Married Household With 1+ Kids Poverty Rate Difference
1,"Autauga County, Alabama",58761,15363,707,321,50,509,393,11182,4931,404,2219,1088,15363,11889,5027,15363,11889,5252,2396,707,509,571,1118,19912,11182,2903,2219,5827,01,001,1689,8730,443,1492,0.059467,0.162108,0.061120,0.228941,0.102641,0.167821
2,"Baldwin County, Alabama",233420,61277,1840,939,519,1888,1521,49179,17444,1393,5537,2962,61277,51019,16575,61277,51019,18383,10315,1840,1888,2473,6002,80487,49179,7785,5537,23523,01,003,8475,31308,2040,4355,0.036065,0.213031,0.051080,0.318999,0.176966,0.267919
3,"Barbour County, Alabama",24877,5722,236,115,31,875,721,3195,1126,186,938,521,5722,3431,1089,5722,3431,1241,2169,236,875,957,976,6847,3195,1334,938,2318,01,005,1933,3652,752,707,0.068785,0.346106,0.092667,0.515422,0.277321,0.422754
4,"Bibb County, Alabama",22251,4871,226,154,69,449,426,3391,1136,213,305,225,4871,3617,1069,4871,3617,1290,1569,226,449,568,775,5647,3391,686,305,1570,01,007,1343,2256,495,438,0.062483,0.373159,0.119380,0.530547,0.310676,0.411167
5,"Blount County, Alabama",59077,15416,895,540,78,506,385,11171,4471,445,1853,1132,15416,12066,4415,15416,12066,5011,3469,895,506,674,1900,18157,11171,2676,1853,4310,01,009,2574,6986,463,1577,0.074175,0.269247,0.107763,0.226961,0.195071,0.119198
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3218,"Vega Baja Municipio, Puerto Rico",54182,13088,1897,793,525,2239,1488,5248,1366,504,1771,526,13088,7145,1784,13088,7145,2159,9042,1897,2239,3104,4041,11056,5248,2839,1771,2969,72,145,7145,5808,2013,1030,0.265500,0.551610,0.367300,0.661518,0.286109,0.294219
3219,"Vieques Municipio, Puerto Rico",8199,1588,196,61,33,472,275,561,44,0,182,105,1588,757,105,1588,757,105,1329,196,472,582,551,1279,561,249,182,469,72,147,1133,718,308,105,0.258917,0.612102,0.580952,0.745763,0.353185,0.164810
3220,"Villalba Municipio, Puerto Rico",21984,5861,855,288,340,966,578,2306,710,196,779,363,5861,3161,753,5861,3161,998,3606,855,966,1452,1299,4155,2306,1248,779,601,72,149,2751,1849,918,559,0.270484,0.598043,0.288577,0.621530,0.327559,0.332953
3221,"Yabucoa Municipio, Puerto Rico",30313,7791,1253,527,411,1657,989,3036,750,209,822,274,7791,4289,945,7791,4289,1277,5964,1253,1657,2241,2470,5718,3036,1261,822,1421,72,151,4711,2682,1400,483,0.292143,0.637224,0.412686,0.743494,0.345082,0.330808
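
Because these column names are so long, they can be easy to mistype. One possible alternative (a sketch only; `column_for_code()` is a hypothetical helper that isn't defined elsewhere in this notebook) is to look up each column by the variable code at the end of its name. This example assumes that each code appears at the end of exactly one column name, as is the case in the table above.

```python
import pandas as pd

def column_for_code(df, code):
    """Return the name of the single column that ends with '(code)'."""
    matches = [col for col in df.columns if col.endswith(f'({code})')]
    if len(matches) != 1:
        raise KeyError(
            f'Expected exactly one column for {code}; found {len(matches)}')
    return matches[0]

# Building a two-row stand-in table (Autauga and Baldwin Counties) that
# uses two of the real column names from the output above:
prefix = ('Poverty Status in the Past 12 Months by Household Type '
          'by Age of Householder_Estimate!!Total:!!Income in the past '
          '12 months below poverty level:')
df_demo = pd.DataFrame({
    f'{prefix} (B17017_002E)': [2396, 10315],
    f'{prefix}!!Family households:!!Married-couple family: (B17017_004E)': [
        707, 1840]})

# The subtraction now only needs the two variable codes:
non_married_below = (
    df_demo[column_for_code(df_demo, 'B17017_002E')]
    - df_demo[column_for_code(df_demo, 'B17017_004E')])
print(non_married_below.tolist())
```

Raising a `KeyError` when a code matches zero or several columns means that a mistyped code will fail loudly rather than silently selecting the wrong data.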


In [22]:
df_marriage_poverty_census_data.to_csv(
    'Datasets/marriage_poverty_census_data.csv', index = False)
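
One caveat when reading this file back in later: by default, `pd.read_csv()` will interpret codes like `01` and `001` as integers, which drops their leading zeros. The following sketch, which uses a one-row stand-in table and an in-memory buffer rather than the real file, shows one way to keep them as strings. It assumes that the `state` and `county` columns are stored as strings, as the leading zeros in the output above suggest.

```python
import io
import pandas as pd

# A one-row stand-in for the real dataset:
df_codes = pd.DataFrame({
    'NAME': ['Autauga County, Alabama'],
    'state': ['01'],
    'county': ['001']})

# Calling to_csv() without a path returns the CSV contents as a string:
csv_text = df_codes.to_csv(index=False)

# With the default settings, the codes get parsed as integers:
df_default = pd.read_csv(io.StringIO(csv_text))

# Passing a dtype for each code column preserves the leading zeros:
df_strings = pd.read_csv(
    io.StringIO(csv_text), dtype={'state': str, 'county': str})

print(df_default['state'].iloc[0])  # 1
print(df_strings['state'].iloc[0])  # 01
```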