# Census Folium Tutorial

## Kenneth Burchfiel

Released under the MIT license

This tutorial demonstrates how to use the functions in census_folium_viewer.py to generate interactive zip-, county-, and state-level choropleth maps based on census data and shapefiles from the US Census Bureau. Within the tutorial, I will generate maps of two data types: (1) median household income and (2) the percentage of households that consist of a married couple with at least one child below the age of 18.

The tutorial also demonstrates how to incorporate a custom vertical legend into Folium maps. These have advantages over the default horizontal legend in certain circumstances.
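
As a rough illustration of the idea (and not the code that census_folium_viewer.py actually uses), a vertical legend can be built as a block of HTML with one colored row per bin and then attached to a Folium map. The function name `build_vertical_legend_html`, the colors, and the bin labels below are all hypothetical.

```python
from html import escape


def build_vertical_legend_html(title, labels, colors):
    """Return an HTML snippet with one stacked row per (label, color) pair.

    Hypothetical helper for illustration; it is not part of
    census_folium_viewer.py.
    """
    if len(labels) != len(colors):
        raise ValueError("labels and colors must have the same length")
    rows = "".join(
        '<div><span style="display:inline-block;width:14px;height:14px;'
        f'background:{escape(color)};margin-right:6px;"></span>'
        f"{escape(label)}</div>"
        for label, color in zip(labels, colors)
    )
    # position:fixed keeps the legend in a corner of the browser window.
    return (
        '<div style="position:fixed;bottom:30px;left:30px;z-index:9999;'
        'background:white;padding:8px;border:1px solid gray;font-size:12px;">'
        f"<b>{escape(title)}</b>{rows}</div>"
    )


legend_html = build_vertical_legend_html(
    "Median Household Income",
    ["Under $40,000", "$40,000 to $80,000", "Over $80,000"],  # Made-up bins
    ["#d73027", "#ffffbf", "#1a9850"],
)

# With Folium installed, the snippet could be attached to a map like this:
# folium_map.get_root().html.add_child(folium.Element(legend_html))
```

Because each bin gets its own row, long labels stay readable, which is one situation in which a vertical layout can work better than a horizontal color bar.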

**Note**: Some files in this project (such as the zip-code-level HTML maps) were too large to upload to GitHub. You can instead access those files via the following Google Drive folder: https://drive.google.com/drive/folders/11h1jnaVOA5A6ubbOJnC-kPEvdnJU00yv?usp=sharing


Citation info for color_schemes_from_branca.json:

Source: https://github.com/python-visualization/branca/blob/master/branca/_schemes.json

I believe these schemes were originally created by Cynthia Brewer, and are licensed under the Apache License, Version 2.0. See http://www.personal.psu.edu/cab38/ColorBrewer/ColorBrewer_updates.html

## Preliminary steps

First, you'll need to download zip code, county, and state shapefiles from the US Census Bureau: https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html

I used 2020 zip code shapefiles for this project, which are available here: https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2020&layergroup=ZIP+Code+Tabulation+Areas

See this note regarding use of the shapefiles: https://www2.census.gov/geo/pdfs/maps-data/data/tiger/tgrshp2021/TGRSHP2021_TechDoc_Ch1.pdf

Once you download the shapefiles to your computer, extract them using an unzipping utility. The shapefile document within this unzipped folder ends in .shp; for the 2020 zip code data, the file name is tl_2020_us_zcta520.shp. 

Geopandas also appears to rely on the other files within this folder (such as the .shx and .dbf files) when creating GeoDataFrames, so I recommend reading the .shp file from within the unzipped folder rather than copying it on its own into your project folder.
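
A quick check along the following lines can confirm that the companion files are still sitting next to the .shp file before it gets passed to Geopandas. This is only a sketch: the helper name `missing_sidecar_files` is hypothetical, and the list of extensions reflects the usual shapefile layout rather than anything specific to this project.

```python
from pathlib import Path

# Companion files that normally accompany a .shp file:
# .shx (shape index), .dbf (attribute table), .prj (coordinate system).
EXPECTED_SIDECAR_EXTENSIONS = (".shx", ".dbf", ".prj")


def missing_sidecar_files(shapefile_path):
    """Return the expected companion files that are absent next to a .shp file."""
    shp = Path(shapefile_path)
    return [
        shp.with_suffix(ext).name
        for ext in EXPECTED_SIDECAR_EXTENSIONS
        if not shp.with_suffix(ext).exists()
    ]


# The relative path below is an assumption; substitute your own location.
missing = missing_sidecar_files("tl_2020_us_zcta520/tl_2020_us_zcta520.shp")
if missing:
    print("Missing companion files:", missing)
```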

This project already contains zip-, county-, and state-level American Community Survey data (5-year estimates) for various demographic variables. I created these data files via my Census query tutorial available at https://github.com/kburchfiel/census_query_tutorial . 


In [None]:
import time
start_time = time.time()
import census_folium_viewer
import geopandas
import numpy as np
import pandas as pd
import folium

## Part 1: Zip Code-Level Maps

The following code block uses census_folium_viewer.py's prepare_zip_table() function to create a GeoDataFrame storing both zip code and census data. It took 126.6 seconds for the function to create a table, whereas importing a saved version of this file took only 8.9 seconds. Therefore, to save time, I edited the code block so that it would only regenerate the GeoDataFrame if instructed to do so.
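
The same regenerate-or-reload idea can also be written as a small helper that rebuilds a file only when asked to, or when no saved copy exists yet. The sketch below uses a JSON file and a hypothetical `load_or_build` function so that it stays self-contained; in this tutorial, the slow step would be prepare_zip_table() and the saved copy would be the GeoJSON file.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(cache_path, build_func, force_rebuild=False):
    """Load cached data if it exists; otherwise build it and save it.

    Hypothetical helper for illustration. build_func is any zero-argument
    function that returns JSON-serializable data.
    """
    cache_file = Path(cache_path)
    if cache_file.exists() and not force_rebuild:
        return json.loads(cache_file.read_text(encoding="utf-8"))
    data = build_func()  # The slow step runs only when needed.
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data


with tempfile.TemporaryDirectory() as folder:
    cache = Path(folder) / "table.json"
    first = load_or_build(cache, lambda: {"rows": 3})    # Builds and saves
    second = load_or_build(cache, lambda: {"rows": 99})  # Reloads saved copy
    print(first, second)
```

Unlike a plain True/False flag, this version falls back to rebuilding when the saved file is missing, at the cost of a slightly less explicit control flow.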

I recommend reading the documentation in census_folium_viewer.py for prepare_zip_table so that you'll better understand the inputs used by this function.

The zip code census data file only includes zip codes with at least 1,000 households, preventing outliers in the data related to low sample sizes.

In [None]:
create_new_zip_census_table = False

if create_new_zip_census_table:

    zip_and_census_table = census_folium_viewer.prepare_zip_table(
        shapefile_path = r'C:\Users\kburc\Downloads\tl_2020_us_zcta520\tl_2020_us_zcta520.shp',
        shape_feature_name = 'ZCTA5CE20', tolerance = 0.005, data_path =
        'acs5_2019_zip_results_1k_plus_households.csv',
        data_feature_name = 'NAME')
    print("Exporting data:")
    zip_and_census_table.to_file('zip_and_census_table.geojson',
    driver = 'GeoJSON') 
    # The above line exports the GeoDataFrame created by prepare_zip_table 
    # so that it can be imported back into the program, which takes less time
    # than does recreating the GeoDataFrame.

The saved table is then read back into Python from the project folder. That way, the program still works when create_new_zip_census_table is set to False.

In [None]:
zip_and_census_table = geopandas.read_file('zip_and_census_table.geojson')

First, I will create a map displaying median household income by zip code. (Note: I will exclude Puerto Rico from these maps so as to focus on the 50 US states and DC. Puerto Rico has the code 72 in the 'state' column of the merged zip code data table, hence the use of "state != 72" in the query() call that supplies the merged_data_table argument. For the income map, the query also requires Median_household_income >= 0, which drops rows with negative income values.)
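
To show what this kind of query() filter does, here is a minimal sketch that runs on a tiny made-up table. The rows and values are invented for illustration, and the negative income is assumed to stand in for a placeholder that marks a missing estimate; only the column names and the query string come from this tutorial.

```python
import pandas as pd

# Made-up rows for illustration; the real table has many more columns.
sample_table = pd.DataFrame({
    "NAME": ["ZCTA5 00601", "ZCTA5 10001", "ZCTA5 99999"],
    "state": [72, 36, 2],
    "Median_household_income": [14000, 90000, -666666666],
})

# Keep rows that are outside Puerto Rico (state code 72) and that have a
# non-negative income value.
filtered_table = sample_table.query(
    "state != 72 & Median_household_income >= 0")
print(filtered_table["NAME"].tolist())
```

Only the middle row passes both conditions: the first is dropped because of its state code, and the third because of its negative income value.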

If you haven't done so already, I highly recommend reading the documentation for generate_map so that you'll know what inputs are necessary for the function to run correctly.
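
One input worth understanding is bin_type = 'percentiles'. The sketch below shows one way that percentile-based bin edges could be calculated, so that each bin holds roughly the same number of areas. The helper `percentile_bin_edges` is hypothetical, the values are made up, and generate_map may compute its bins differently.

```python
import statistics


def percentile_bin_edges(values, bin_count):
    """Return bin_count + 1 edges that split values into equal-sized groups.

    Hypothetical helper for illustration; generate_map may compute its
    bins differently.
    """
    cut_points = statistics.quantiles(values, n=bin_count, method="inclusive")
    return [min(values)] + cut_points + [max(values)]


incomes = list(range(0, 101))  # Made-up values from 0 through 100
print(percentile_bin_edges(incomes, 4))
```

With skewed data such as income, equal-count bins like these tend to spread the colors more evenly across a map than equal-width bins would.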

In [None]:
zip_and_census_table

In [None]:
zip_median_hh_income_map = census_folium_viewer.generate_map(
    merged_data_table = zip_and_census_table.query("state != 72 & Median_household_income >= 0"),
    shape_feature_name = 'ZCTA5CE20',
    data_variable = "Median_household_income", feature_text = 'Zip Code',
    data_variable_text = 'Median Household Income',
    popup_variable_text = 'Income', 
    variable_decimals = None,
    map_name = 'zip_median_hh_income',
    fill_color = 'RdYlGn',
    html_save_path = r'C:\Users\kburc\D1V1\Documents\!Dell64docs\Programming\py\kjb3_programs\census_folium_tutorial',
    screenshot_save_path = 'census_folium_map_screenshots',
    bin_type = 'percentiles', rows_to_map = 0, multiply_data_by = 1, 
    vertical_legend = True)


In [None]:
# zip_median_hh_income_map

Next, I'll create a map showing the proportion of households in each zip code in the dataset that consist of a married couple with at least one child.

In [None]:
zip_married_couples_with_children_map = census_folium_viewer.generate_map(
    merged_data_table = zip_and_census_table.query("state != 72"),
    shape_feature_name = 'ZCTA5CE20',
    data_variable = "Married_couple_households_with_one_or_more_children_as_proportion_of_all_households",
    feature_text = 'Zip Code',
    data_variable_text= '% of households that consist of a married couple \
with at least one child',
    popup_variable_text = 'Percentage', 
    map_name = 'zip_married_couples_with_kids', fill_color = 'PuOr', 
    bin_count = 8,
    html_save_path = r'C:\Users\kburc\D1V1\Documents\!Dell64docs\Programming\py\kjb3_programs\census_folium_tutorial',
    screenshot_save_path = 'census_folium_map_screenshots',
    bin_type = 'percentiles', rows_to_map = 0, multiply_data_by = 100,
    variable_decimals = 2, vertical_legend = True)

In [None]:
# zip_married_couples_with_children_map

## Part 2: County-Level Maps

The steps for generating county-level maps are similar. Note that the county census data file being imported only contains counties with at least 1,000 households.
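
Counties are identified by a state code plus a county code, which is why prepare_county_table takes a pair of columns for both the shapefile and the data file. The sketch below shows one way such a pair could be combined into a single 5-digit key for matching. The helper `county_fips_key` is hypothetical, and the assumption that one source stores the codes as zero-padded strings while the other stores them as integers is mine; prepare_county_table may match rows differently.

```python
def county_fips_key(state_code, county_code):
    """Combine state and county codes into a 5-digit, zero-padded string.

    Hypothetical helper for illustration; prepare_county_table may match
    rows differently.
    """
    return f"{int(state_code):02d}{int(county_code):03d}"


# String codes and integer codes both map to the same key, so rows from
# the two sources can be matched against each other.
print(county_fips_key("01", "001"))  # 01001
print(county_fips_key(1, 1))         # 01001
print(county_fips_key(51, 107))      # 51107
```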

In [None]:
county_and_census_table = census_folium_viewer.prepare_county_table(
    shapefile_path = r'C:\Users\kburc\Downloads\tl_2021_us_county\tl_2021_us_county.shp',
    shape_state_code_column = 'STATEFP', shape_county_code_column = 'COUNTYFP',
    tolerance = 0.005,
    data_path = 'acs5_2019_county_results_1k_plus_households.csv',
    data_state_code_column = 'state', data_county_code_column = 'county')
print("Exporting data:")
county_and_census_table.to_file('county_and_census_table.geojson',
    driver = 'GeoJSON') 

In [None]:
county_and_census_table

In [None]:
county_hh_income_map = census_folium_viewer.generate_map(
    merged_data_table = county_and_census_table.query("state != 72 & Median_household_income >= 0"), 
    shape_feature_name = 'NAME_y', # NAME_y is the copy of the 'NAME' column
    # from the data table. It contains both county and state names.
    data_variable = 'Median_household_income', feature_text = 'County',
    data_variable_text = 'Median Household Income',
    map_name = 'county_median_hh_income', 
    variable_decimals = None,
    html_save_path = r'C:\Users\kburc\D1V1\Documents\!Dell64docs\Programming\py\kjb3_programs\census_folium_tutorial',
    screenshot_save_path = 'census_folium_map_screenshots',
    popup_variable_text = 'Income', fill_color = 'RdYlGn', 
    rows_to_map = 0, bin_type = 'percentiles', multiply_data_by = 1, 
    vertical_legend = True)

# county_hh_income_map

In [None]:
county_married_couples_with_children_map = census_folium_viewer.generate_map(
    merged_data_table = county_and_census_table.query("state != 72"),
    shape_feature_name = 'NAME_y',
    data_variable = 'Married_couple_households_with_one_or_more_children_as_proportion_of_all_households',
    feature_text = 'County', data_variable_text = '% of households \
that consist of a married couple with at least one child', 
    map_name = 'county_married_couples_with_kids', 
    html_save_path = r'C:\Users\kburc\D1V1\Documents\!Dell64docs\Programming\py\kjb3_programs\census_folium_tutorial', 
    screenshot_save_path = 'census_folium_map_screenshots', 
    popup_variable_text = 'Percentage', fill_color = 'PuOr', bin_count = 8,
    rows_to_map = 0, bin_type = 'percentiles', multiply_data_by = 100,
    variable_decimals = 2, vertical_legend = True)

# county_married_couples_with_children_map

## Part 3: State-Level Maps

Finally, I'll create state-level maps of median household income and of the percentage of households that consist of a married couple with at least one child.

In [None]:
state_and_census_table = census_folium_viewer.prepare_state_table(
    shapefile_path = r'C:\Users\kburc\Downloads\tl_2020_us_state\tl_2020_us_state.shp', 
    shape_feature_name = 'NAME', tolerance = 0.005, data_path = 
    'acs5_2019_state_results.csv', data_feature_name = 'NAME')
print("Exporting data:")
state_and_census_table.to_file('state_and_census_table.geojson',
driver = 'GeoJSON') 

In [None]:
state_and_census_table

In [None]:
state_married_couples_with_children_map = census_folium_viewer.generate_map(
    merged_data_table = state_and_census_table.query("NAME != 'Puerto Rico'"),
    shape_feature_name = 'NAME',
    data_variable = 'Married_couple_households_with_one_or_more_children_as_proportion_of_all_households', 
    feature_text = 'State', 
    data_variable_text = '% of households that consist of a married \
couple with at least one child', 
    map_name = 'state_married_couples_with_kids', 
    html_save_path = r'C:\Users\kburc\D1V1\Documents\!Dell64docs\Programming\py\kjb3_programs\census_folium_tutorial',
    screenshot_save_path = 'census_folium_map_screenshots', 
    popup_variable_text = 'Percentage', fill_color = 'PuOr', 
    rows_to_map = 0, bin_type = 'percentiles', bin_count = 8, 
    multiply_data_by = 100,
    variable_decimals = 2, vertical_legend = True)

In [None]:
state_median_hh_income_map = census_folium_viewer.generate_map(
    merged_data_table = state_and_census_table.query("state != 72 & Median_household_income >= 0"),
    shape_feature_name = 'NAME',
    data_variable = 'Median_household_income', feature_text = 'State', 
    data_variable_text = 'Median Household Income', 
    map_name = 'state_median_hh_income', 
    variable_decimals = None,
    html_save_path = r'C:\Users\kburc\D1V1\Documents\!Dell64docs\Programming\py\kjb3_programs\census_folium_tutorial', 
    screenshot_save_path = 'census_folium_map_screenshots', 
    popup_variable_text = 'Income', fill_color = 'RdYlGn', 
    rows_to_map = 0, bin_count = 8,
    bin_type = 'percentiles', multiply_data_by = 1, vertical_legend = True, 
    generate_image = True)

In [None]:
end_time = time.time()
run_time = end_time - start_time
run_minutes = int(run_time // 60) # int() prevents output like "2.0 minute(s)"
run_seconds = run_time % 60
print("Completed run at",time.ctime(end_time),"(local time)")
print("Total run time:",'{:.2f}'.format(run_time),
"second(s) ("+str(run_minutes),"minute(s) and",'{:.2f}'.format(run_seconds),
"second(s))") 
# Only meaningful when the program is run nonstop from start to finish