# Creating Choropleth Maps with Folium [Work in progress]

By Kenneth Burchfiel

Released under the MIT license

*Note: In order to reduce this notebook's size (and thus allow it to display correctly on GitHub), the interactive maps created by this code will not be displayed here. However, you can find Google Sites-hosted copies of the maps via the following links:*

*[Link to Net Migration by County map](https://sites.google.com/view/pfn2-choropleth-maps/net-migration-by-county?authuser=0)*

*[Link to Net Migration by State map](https://sites.google.com/view/pfn2-choropleth-maps/net-migration-by-state?authuser=0)*

This code shows how to use Python's Folium library to generate choropleth maps of county- and state-level net domestic migration rates. The first part of the code demonstrates how to use Folium's built-in choropleth function; the second part demonstrates how to create a more efficient version of this same choropleth map using a separate function.

**A quick overview of domestic net migration**

Domestic net migration data comprises movements from one part of a country to another. For instance, if someone moves from Fairfax County, VA to Harris County, TX, Fairfax County's net migration totals for the year would decrease by 1 whereas Harris County's would increase by 1. Note that international migration, births, and deaths do *not* factor into net migration totals. 
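
To make this bookkeeping concrete, here is a minimal sketch of how a list of moves translates into net migration totals. The move list and the `net_migration_totals` helper are hypothetical and exist only for illustration; they are not part of the Census dataset or of the code used later in this notebook.

```python
from collections import Counter

def net_migration_totals(moves):
    """Tally net domestic migration from (origin, destination) pairs.

    Each move subtracts 1 from the origin's total and adds 1 to the
    destination's total.
    """
    totals = Counter()
    for origin, destination in moves:
        totals[origin] -= 1
        totals[destination] += 1
    return dict(totals)

# Hypothetical moves (not real data):
example_moves = [
    ('Fairfax County, VA', 'Harris County, TX'),
    ('Fairfax County, VA', 'Harris County, TX'),
    ('Harris County, TX', 'Fairfax County, VA'),
]
example_totals = net_migration_totals(example_moves)
print(example_totals)
```

In this toy model, every move has both an origin and a destination within the country, so the totals across all regions sum to zero.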

This data offers an intriguing look into which parts of a country are attracting residents who are already here--and which are failing to attract (or retain) them. Let's say that NVCU's seniors want to know which parts of the country are particularly popular destinations for people who move. In order to help answer this question, you've decided to create county-level net domestic migration maps. (The seniors will need to interpret this analysis with caution: for instance, they might not necessarily want to move to a place whose high net domestic migration rates are driven by retirees. Then again, that might be the *perfect* destination for a college grad who loves playing bridge and pickleball!)

**Choropleth maps**

Choropleth maps assign different colors to different regions depending on some underlying data point. Perhaps the most famous examples of choropleth maps, at least in the US, are presidential election maps. Recent versions of these maps assign red and blue colors to states that are won by the Republican and Democratic nominees for president, respectively. During election night, they provide a useful overview of which states have been called for a particular candidate so far and also reveal interesting geographic trends.

Our net migration maps will differ from these election maps in that colors will be assigned based on numerical data (net migration rates) rather than categorical data (a presidential party). States and counties with the lowest net migration rates will be colored blue, whereas those with the highest rates will be colored red. Those in the middle will take on a yellow hue. 

We'll use net migration *rates* (i.e. net migration totals divided by population totals) rather than nominal net migration counts because the former measure better reflects individuals' likelihood of moving to or from a given region. For instance, a county with 1 million residents and a positive net migration total of 2,000 would be a less 'hot' destination than a county with 100,000 residents and a net migration total of 1,000. The latter county's total is 50% that of the former's, but its rate (1,000 / 100,000, or 1%) is five times that of the larger county's (2,000 / 1,000,000, or 0.2%), thus indicating a greater interest in moving there relative to its size.
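
The arithmetic in this comparison can be checked with a few lines of Python. This is only a sketch: the `net_migration_rate` helper is a hypothetical name introduced here, and the two counties are the made-up ones from the paragraph above rather than rows from the Census file.

```python
def net_migration_rate(net_migration, population):
    """Return net migration as a percentage of population."""
    return 100 * net_migration / population

# The two hypothetical counties described above:
large_county_rate = net_migration_rate(2_000, 1_000_000)
small_county_rate = net_migration_rate(1_000, 100_000)

print(f"Large county rate: {large_county_rate:.1f}%")
print(f"Small county rate: {small_county_rate:.1f}%")
print(f"Small-to-large ratio: {small_county_rate / large_county_rate:.1f}")
```

The smaller county ends up with the higher rate even though its nominal net migration total is lower, which is why the maps in this notebook are based on rates.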

In [1]:
import time
program_start_time = time.time()
import pandas as pd
import folium
import geopandas
import numpy as np

from branca.utilities import color_brewer
# color_brewer source code: 
# https://github.com/python-visualization/branca/blob/main/branca/utilities.py

from selenium import webdriver # selenium will be used to generate
# screenshots of our HTML-based maps. I found that selenium didn't work 
# correctly within JupyterLab Desktop right after the library was 
# installed; however, when I restarted my kernel and then restarted 
# JupyterLab Desktop, it ended up working fine. (Restarting JupyterLab Desktop
# alone probably would have resolved this issue.)


import os

import branca.colormap as cm
# From https://python-visualization.github.io/folium/latest/advanced_guide/colormaps.html#StepColormap
from branca.colormap import StepColormap # From Folium's features.py
# source code: 
# https://github.com/python-visualization/folium/blob/main/folium/features.py

pd.set_option('display.max_columns', 1000) # Allows more columns to be viewed within the output

# Part 1: Gathering required data

The 'co-est2023-alldata.csv' file located within the same folder as this code contains net migration data for US states and counties during the 2020-2023 time period. I found this data on the Census site's [**County Population Totals and Components of Change: 2020-2023**](https://www.census.gov/data/tables/time-series/demo/popest/2020s-counties-total.html) page. The dataset is listed under a relatively long title:

"Annual Resident Population Estimates, Estimated Components of Resident Population Change, and Rates of the Components of Resident Population Change for States and Counties: April 1, 2020 to July 1, 2023 (CO-EST2023-alldata)"

Definitions of each column within this file can be found within CO-EST2023-ALLDATA.pdf, which I also downloaded from the aforementioned website to this section's folder.

You may want to check the Census website to see whether a later version of the file exists. As long as that newer file matches the format of 'co-est2023-alldata.csv', the rest of the code should still work correctly.

In [2]:
df_nm = pd.read_csv('co-est2023-alldata.csv', encoding = 'latin_1') # nm = 'net migration'. This dataset contains other data as well.
# I needed to add in 'encoding = latin_1' because the default encoding setting produced an error message.
df_nm.head()

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME,ESTIMATESBASE2020,POPESTIMATE2020,POPESTIMATE2021,POPESTIMATE2022,POPESTIMATE2023,NPOPCHG2020,NPOPCHG2021,NPOPCHG2022,NPOPCHG2023,BIRTHS2020,BIRTHS2021,BIRTHS2022,BIRTHS2023,DEATHS2020,DEATHS2021,DEATHS2022,DEATHS2023,NATURALCHG2020,NATURALCHG2021,NATURALCHG2022,NATURALCHG2023,INTERNATIONALMIG2020,INTERNATIONALMIG2021,INTERNATIONALMIG2022,INTERNATIONALMIG2023,DOMESTICMIG2020,DOMESTICMIG2021,DOMESTICMIG2022,DOMESTICMIG2023,NETMIG2020,NETMIG2021,NETMIG2022,NETMIG2023,RESIDUAL2020,RESIDUAL2021,RESIDUAL2022,RESIDUAL2023,GQESTIMATESBASE2020,GQESTIMATES2020,GQESTIMATES2021,GQESTIMATES2022,GQESTIMATES2023,RBIRTH2021,RBIRTH2022,RBIRTH2023,RDEATH2021,RDEATH2022,RDEATH2023,RNATURALCHG2021,RNATURALCHG2022,RNATURALCHG2023,RINTERNATIONALMIG2021,RINTERNATIONALMIG2022,RINTERNATIONALMIG2023,RDOMESTICMIG2021,RDOMESTICMIG2022,RDOMESTICMIG2023,RNETMIG2021,RNETMIG2022,RNETMIG2023
0,40,3,6,1,0,Alabama,Alabama,5024294,5031864,5050380,5073903,5108468,7570,18516,23523,34565,13867,57184,58106,58251,15165,69135,67208,59813,-1298,-11951,-9102,-1562,125,1806,4374,5384,9615,27715,28464,30744,9740,29521,32838,36128,-872,946,-213,-1,127914,131372,134653,141654,144014,11.343506,11.478541,11.441539,13.714209,13.276595,11.748344,-2.370702,-1.798053,-0.306805,0.358254,0.864061,1.057514,5.497784,5.622917,6.038672,5.856038,6.486978,7.096186
1,50,3,6,1,1,Alabama,Autauga County,58809,58915,59203,59726,60342,106,288,523,616,162,686,706,714,176,696,687,621,-14,-10,19,93,0,15,22,34,100,242,507,491,100,257,529,525,20,41,-25,-2,484,484,484,484,484,11.615503,11.87263,11.89326,11.784825,11.553112,10.344138,-0.169322,0.319518,1.549122,0.253983,0.369969,0.566346,4.097597,8.526095,8.178699,4.351581,8.896064,8.745044
2,50,3,6,1,3,Alabama,Baldwin County,231768,233227,239439,246531,253507,1459,6212,7092,6976,560,2337,2511,2531,602,2948,3022,2640,-42,-611,-511,-109,11,105,250,291,1613,6972,7036,6804,1624,7077,7286,7095,-123,-254,317,-10,3549,3448,3351,3468,3493,9.888589,10.333971,10.123231,12.473925,12.436982,10.559198,-2.585335,-2.10301,-0.435967,0.444288,1.02887,1.163912,29.500747,28.95652,27.213932,29.945035,29.98539,28.377843
3,50,3,6,1,5,Alabama,Barbour County,25229,24969,24533,24700,24585,-260,-436,167,-115,60,270,278,267,92,390,363,350,-32,-120,-85,-83,0,0,2,13,-186,-313,237,-45,-186,-313,239,-32,-42,-3,13,0,2721,2482,2248,2702,2702,10.90865,11.293238,10.83494,15.756939,14.746207,14.203104,-4.848289,-3.452969,-3.368165,0.0,0.081246,0.527544,-12.645954,9.627689,-1.826113,-12.645954,9.708935,-1.29857
4,50,3,6,1,7,Alabama,Bibb County,22301,22188,22359,21986,21868,-113,171,-373,-118,56,240,236,240,55,325,314,290,1,-85,-78,-50,0,1,1,1,-101,254,-303,-69,-101,255,-302,-68,-13,1,7,0,2141,2060,1994,2025,1894,10.775136,10.643816,10.94541,14.591331,14.161687,13.225703,-3.816194,-3.517871,-2.280294,0.044896,0.045101,0.045606,11.403686,-13.665577,-3.146805,11.448582,-13.620476,-3.101199
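
One plausible reason for the encoding error mentioned in the code above (this is an assumption, since the error message itself isn't shown) is that some county names contain non-ASCII characters, such as the 'ñ' in Doña Ana County, New Mexico. The sketch below uses a hypothetical byte string, not the actual file, to show how such a name decodes correctly as latin_1 but fails as UTF-8, which is the default encoding for `pd.read_csv`.

```python
# A county name encoded as Latin-1 bytes (a hypothetical sample that
# is not read from the Census file):
raw_name = 'Doña Ana County'.encode('latin_1')

# Decoding with the matching encoding recovers the name:
decoded_name = raw_name.decode('latin_1')

# Decoding the same bytes as UTF-8 fails, because the single byte that
# represents 'ñ' in Latin-1 does not form a valid UTF-8 sequence here:
try:
    raw_name.decode('utf-8')
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(decoded_name, utf8_failed)
```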


## Calculating total net migration across all years within the dataset:

As specified within the CO-EST2023-ALLDATA.pdf file, 'DOMESTICMIG' columns show net domestic migration values. Note that there's a column for each year, and that all but one of these periods begin on July 1 and end on June 30. (The one exception is DOMESTICMIG2020, which uses a date range of 2020-04-01 to 2020-06-30.)

For our choropleth map, we'll look into net migration across the entire date range in the dataset (i.e. 2020-04-01 to the latest date available). As of 2024-05-24, we had data up to June 30, 2023, but the Census will probably provide data for additional years in the future.

Therefore, rather than explicitly specifying the years available within our code, we'll instead add in Python code that determines the first and last years with net migration data. This *should* make it easier to update the code with a later dataset. (I say 'should' because, if the Census Bureau decides to change the column names within this dataset, we'll need to rewrite the code anyway to handle those new columns.)


In [3]:
# Determining all columns with nominal net migration data:
# (These columns all begin with DOMESTICMIG. There are also 
# RDOMESTICMIG columns that show migration rates, but checking
# to see whether the *first* 11 characters equal 'DOMESTICMIG'
# will exclude them.)
nm_cols = [column for column in df_nm.columns 
           if column[0:11] == 'DOMESTICMIG']
nm_cols.sort() # This ensures that the first and last columns in our 
# list will show the earliest and latest years with net migration data,
# respectively.
nm_cols

['DOMESTICMIG2020', 'DOMESTICMIG2021', 'DOMESTICMIG2022', 'DOMESTICMIG2023']

Using our list of net migration columns to identify the earliest and latest years with net migration data:

In [4]:
first_nm_year = nm_cols[0][-4:] # -4: retrieves the final 4 characters
# within the column names (i.e. the year).
last_nm_year = nm_cols[-1][-4:]
first_nm_year, last_nm_year

('2020', '2023')
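
If a future version of the file were to rename these columns, the two cells above would fail with a fairly unhelpful `IndexError` on an empty list. As a rough sketch of one way to guard against that, the hypothetical `migration_year_range` helper below (not part of the original code) combines both steps and raises a descriptive error when no matching columns are found. It assumes that each column name consists of the prefix followed by a four-digit year.

```python
import re

def migration_year_range(columns, prefix='DOMESTICMIG'):
    """Return (first_year, last_year) for columns named prefix + year.

    Raises ValueError if no column matches, which would suggest that
    the column naming scheme has changed.
    """
    pattern = re.compile(rf'{re.escape(prefix)}(\d{{4}})')
    matches = (pattern.fullmatch(column) for column in columns)
    years = sorted(match.group(1) for match in matches if match)
    if not years:
        raise ValueError(f'No columns named {prefix}<year> were found.')
    return years[0], years[-1]

# Column names from the dataset, deliberately out of order, plus one
# rate column that should be ignored:
sample_columns = ['DOMESTICMIG2021', 'RDOMESTICMIG2021',
                  'DOMESTICMIG2020', 'DOMESTICMIG2023', 'DOMESTICMIG2022']
year_range = migration_year_range(sample_columns)
print(year_range)
```

Using `fullmatch` means that RDOMESTICMIG columns are excluded for the same reason as in the original list comprehension: their names don't *start* with the prefix.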

We can now use our column list and starting/ending years to create a column that sums up all domestic net migration values for all of the years present in the dataset. In addition, we'll create a column that divides this sum by the population values in the first year of the dataset, thus allowing us to determine total net migration rates.

In [5]:
# Adding all of the net migration values within nm_cols together:
df_nm[f'{first_nm_year}-{last_nm_year} Total Domestic \
Net Migration'] = df_nm[nm_cols].sum(axis = 1)

# Creating our domestic migration rate column:
total_nm_rate_col = f'{first_nm_year}-{last_nm_year} Total Domestic Net \
Migration as % of {first_nm_year} Population'

df_nm[total_nm_rate_col] = (
    100 * df_nm[f'{first_nm_year}-{last_nm_year} Total Domestic Net Migration'] /
    df_nm[f'ESTIMATESBASE{first_nm_year}']) # Multiplying by 100 converts
    # the proportions into percentages
# As of 2024-05-24, the starting population selected by this code will be
# ESTIMATESBASE2020, which shows population estimates on 2020-04-01.
# There's also a POPESTIMATE2020 column, but its reference date is 
# 2020-07-01. Since we're incorporating net migration data from 2020-04-01
# to 2020-07-01 into our analysis, the ESTIMATESBASE2020 column is the best
# set of population data to use.

df_nm.head()

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME,ESTIMATESBASE2020,POPESTIMATE2020,POPESTIMATE2021,POPESTIMATE2022,POPESTIMATE2023,NPOPCHG2020,NPOPCHG2021,NPOPCHG2022,NPOPCHG2023,BIRTHS2020,BIRTHS2021,BIRTHS2022,BIRTHS2023,DEATHS2020,DEATHS2021,DEATHS2022,DEATHS2023,NATURALCHG2020,NATURALCHG2021,NATURALCHG2022,NATURALCHG2023,INTERNATIONALMIG2020,INTERNATIONALMIG2021,INTERNATIONALMIG2022,INTERNATIONALMIG2023,DOMESTICMIG2020,DOMESTICMIG2021,DOMESTICMIG2022,DOMESTICMIG2023,NETMIG2020,NETMIG2021,NETMIG2022,NETMIG2023,RESIDUAL2020,RESIDUAL2021,RESIDUAL2022,RESIDUAL2023,GQESTIMATESBASE2020,GQESTIMATES2020,GQESTIMATES2021,GQESTIMATES2022,GQESTIMATES2023,RBIRTH2021,RBIRTH2022,RBIRTH2023,RDEATH2021,RDEATH2022,RDEATH2023,RNATURALCHG2021,RNATURALCHG2022,RNATURALCHG2023,RINTERNATIONALMIG2021,RINTERNATIONALMIG2022,RINTERNATIONALMIG2023,RDOMESTICMIG2021,RDOMESTICMIG2022,RDOMESTICMIG2023,RNETMIG2021,RNETMIG2022,RNETMIG2023,2020-2023 Total Domestic Net Migration,2020-2023 Total Domestic Net Migration as % of 2020 Population
0,40,3,6,1,0,Alabama,Alabama,5024294,5031864,5050380,5073903,5108468,7570,18516,23523,34565,13867,57184,58106,58251,15165,69135,67208,59813,-1298,-11951,-9102,-1562,125,1806,4374,5384,9615,27715,28464,30744,9740,29521,32838,36128,-872,946,-213,-1,127914,131372,134653,141654,144014,11.343506,11.478541,11.441539,13.714209,13.276595,11.748344,-2.370702,-1.798053,-0.306805,0.358254,0.864061,1.057514,5.497784,5.622917,6.038672,5.856038,6.486978,7.096186,96538,1.921424
1,50,3,6,1,1,Alabama,Autauga County,58809,58915,59203,59726,60342,106,288,523,616,162,686,706,714,176,696,687,621,-14,-10,19,93,0,15,22,34,100,242,507,491,100,257,529,525,20,41,-25,-2,484,484,484,484,484,11.615503,11.87263,11.89326,11.784825,11.553112,10.344138,-0.169322,0.319518,1.549122,0.253983,0.369969,0.566346,4.097597,8.526095,8.178699,4.351581,8.896064,8.745044,1340,2.278563
2,50,3,6,1,3,Alabama,Baldwin County,231768,233227,239439,246531,253507,1459,6212,7092,6976,560,2337,2511,2531,602,2948,3022,2640,-42,-611,-511,-109,11,105,250,291,1613,6972,7036,6804,1624,7077,7286,7095,-123,-254,317,-10,3549,3448,3351,3468,3493,9.888589,10.333971,10.123231,12.473925,12.436982,10.559198,-2.585335,-2.10301,-0.435967,0.444288,1.02887,1.163912,29.500747,28.95652,27.213932,29.945035,29.98539,28.377843,22425,9.675624
3,50,3,6,1,5,Alabama,Barbour County,25229,24969,24533,24700,24585,-260,-436,167,-115,60,270,278,267,92,390,363,350,-32,-120,-85,-83,0,0,2,13,-186,-313,237,-45,-186,-313,239,-32,-42,-3,13,0,2721,2482,2248,2702,2702,10.90865,11.293238,10.83494,15.756939,14.746207,14.203104,-4.848289,-3.452969,-3.368165,0.0,0.081246,0.527544,-12.645954,9.627689,-1.826113,-12.645954,9.708935,-1.29857,-307,-1.216854
4,50,3,6,1,7,Alabama,Bibb County,22301,22188,22359,21986,21868,-113,171,-373,-118,56,240,236,240,55,325,314,290,1,-85,-78,-50,0,1,1,1,-101,254,-303,-69,-101,255,-302,-68,-13,1,7,0,2141,2060,1994,2025,1894,10.775136,10.643816,10.94541,14.591331,14.161687,13.225703,-3.816194,-3.517871,-2.280294,0.044896,0.045101,0.045606,11.403686,-13.665577,-3.146805,11.448582,-13.620476,-3.101199,-219,-0.982019
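
As a quick sanity check, the two new columns can be recomputed by hand for a single row. The sketch below copies Alabama's statewide values from the output above into plain Python variables (the variable names are introduced here for illustration) and repeats the sum and the division.

```python
# Values copied from the Alabama (statewide) row in the output above:
alabama_domestic_mig = [9615, 27715, 28464, 30744]  # DOMESTICMIG2020-2023
alabama_base_population = 5024294  # ESTIMATESBASE2020

# Total domestic net migration across all years:
alabama_total = sum(alabama_domestic_mig)

# Total as a percentage of the 2020-04-01 population:
alabama_rate = 100 * alabama_total / alabama_base_population

print(alabama_total)
print(round(alabama_rate, 6))
```

These results should agree with the 96538 and 1.921424 values shown in the final two columns of Alabama's row.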


This table contains both county-specific and statewide data. Rows with COUNTY values of 0 represent statewide totals; therefore, we can create a county-level dataset by choosing only rows with a non-zero county value and a statewide dataset by choosing only rows with 0 COUNTY values.

(We could also have used SUMLEV column values (50 for counties and 40 for states) as a basis for this split.)

In [6]:
df_nm_county = df_nm.query("COUNTY != 0").copy()
df_nm_county.head()

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME,ESTIMATESBASE2020,POPESTIMATE2020,POPESTIMATE2021,POPESTIMATE2022,POPESTIMATE2023,NPOPCHG2020,NPOPCHG2021,NPOPCHG2022,NPOPCHG2023,BIRTHS2020,BIRTHS2021,BIRTHS2022,BIRTHS2023,DEATHS2020,DEATHS2021,DEATHS2022,DEATHS2023,NATURALCHG2020,NATURALCHG2021,NATURALCHG2022,NATURALCHG2023,INTERNATIONALMIG2020,INTERNATIONALMIG2021,INTERNATIONALMIG2022,INTERNATIONALMIG2023,DOMESTICMIG2020,DOMESTICMIG2021,DOMESTICMIG2022,DOMESTICMIG2023,NETMIG2020,NETMIG2021,NETMIG2022,NETMIG2023,RESIDUAL2020,RESIDUAL2021,RESIDUAL2022,RESIDUAL2023,GQESTIMATESBASE2020,GQESTIMATES2020,GQESTIMATES2021,GQESTIMATES2022,GQESTIMATES2023,RBIRTH2021,RBIRTH2022,RBIRTH2023,RDEATH2021,RDEATH2022,RDEATH2023,RNATURALCHG2021,RNATURALCHG2022,RNATURALCHG2023,RINTERNATIONALMIG2021,RINTERNATIONALMIG2022,RINTERNATIONALMIG2023,RDOMESTICMIG2021,RDOMESTICMIG2022,RDOMESTICMIG2023,RNETMIG2021,RNETMIG2022,RNETMIG2023,2020-2023 Total Domestic Net Migration,2020-2023 Total Domestic Net Migration as % of 2020 Population
1,50,3,6,1,1,Alabama,Autauga County,58809,58915,59203,59726,60342,106,288,523,616,162,686,706,714,176,696,687,621,-14,-10,19,93,0,15,22,34,100,242,507,491,100,257,529,525,20,41,-25,-2,484,484,484,484,484,11.615503,11.87263,11.89326,11.784825,11.553112,10.344138,-0.169322,0.319518,1.549122,0.253983,0.369969,0.566346,4.097597,8.526095,8.178699,4.351581,8.896064,8.745044,1340,2.278563
2,50,3,6,1,3,Alabama,Baldwin County,231768,233227,239439,246531,253507,1459,6212,7092,6976,560,2337,2511,2531,602,2948,3022,2640,-42,-611,-511,-109,11,105,250,291,1613,6972,7036,6804,1624,7077,7286,7095,-123,-254,317,-10,3549,3448,3351,3468,3493,9.888589,10.333971,10.123231,12.473925,12.436982,10.559198,-2.585335,-2.10301,-0.435967,0.444288,1.02887,1.163912,29.500747,28.95652,27.213932,29.945035,29.98539,28.377843,22425,9.675624
3,50,3,6,1,5,Alabama,Barbour County,25229,24969,24533,24700,24585,-260,-436,167,-115,60,270,278,267,92,390,363,350,-32,-120,-85,-83,0,0,2,13,-186,-313,237,-45,-186,-313,239,-32,-42,-3,13,0,2721,2482,2248,2702,2702,10.90865,11.293238,10.83494,15.756939,14.746207,14.203104,-4.848289,-3.452969,-3.368165,0.0,0.081246,0.527544,-12.645954,9.627689,-1.826113,-12.645954,9.708935,-1.29857,-307,-1.216854
4,50,3,6,1,7,Alabama,Bibb County,22301,22188,22359,21986,21868,-113,171,-373,-118,56,240,236,240,55,325,314,290,1,-85,-78,-50,0,1,1,1,-101,254,-303,-69,-101,255,-302,-68,-13,1,7,0,2141,2060,1994,2025,1894,10.775136,10.643816,10.94541,14.591331,14.161687,13.225703,-3.816194,-3.517871,-2.280294,0.044896,0.045101,0.045606,11.403686,-13.665577,-3.146805,11.448582,-13.620476,-3.101199,-219,-0.982019
5,50,3,6,1,9,Alabama,Blount County,59130,59107,59079,59516,59816,-23,-28,437,300,137,654,693,698,199,875,846,776,-62,-221,-153,-78,1,9,8,24,21,141,589,358,22,150,597,382,17,43,-7,-4,616,616,616,616,616,11.067301,11.686833,11.698455,14.807168,14.267043,13.005732,-3.739868,-2.58021,-1.307277,0.152302,0.134913,0.402239,2.386069,9.932965,6.000067,2.538372,10.067878,6.402306,1109,1.875528


In [7]:
df_nm_state = df_nm.query("COUNTY == 0").copy()
df_nm_state.head()

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME,ESTIMATESBASE2020,POPESTIMATE2020,POPESTIMATE2021,POPESTIMATE2022,POPESTIMATE2023,NPOPCHG2020,NPOPCHG2021,NPOPCHG2022,NPOPCHG2023,BIRTHS2020,BIRTHS2021,BIRTHS2022,BIRTHS2023,DEATHS2020,DEATHS2021,DEATHS2022,DEATHS2023,NATURALCHG2020,NATURALCHG2021,NATURALCHG2022,NATURALCHG2023,INTERNATIONALMIG2020,INTERNATIONALMIG2021,INTERNATIONALMIG2022,INTERNATIONALMIG2023,DOMESTICMIG2020,DOMESTICMIG2021,DOMESTICMIG2022,DOMESTICMIG2023,NETMIG2020,NETMIG2021,NETMIG2022,NETMIG2023,RESIDUAL2020,RESIDUAL2021,RESIDUAL2022,RESIDUAL2023,GQESTIMATESBASE2020,GQESTIMATES2020,GQESTIMATES2021,GQESTIMATES2022,GQESTIMATES2023,RBIRTH2021,RBIRTH2022,RBIRTH2023,RDEATH2021,RDEATH2022,RDEATH2023,RNATURALCHG2021,RNATURALCHG2022,RNATURALCHG2023,RINTERNATIONALMIG2021,RINTERNATIONALMIG2022,RINTERNATIONALMIG2023,RDOMESTICMIG2021,RDOMESTICMIG2022,RDOMESTICMIG2023,RNETMIG2021,RNETMIG2022,RNETMIG2023,2020-2023 Total Domestic Net Migration,2020-2023 Total Domestic Net Migration as % of 2020 Population
0,40,3,6,1,0,Alabama,Alabama,5024294,5031864,5050380,5073903,5108468,7570,18516,23523,34565,13867,57184,58106,58251,15165,69135,67208,59813,-1298,-11951,-9102,-1562,125,1806,4374,5384,9615,27715,28464,30744,9740,29521,32838,36128,-872,946,-213,-1,127914,131372,134653,141654,144014,11.343506,11.478541,11.441539,13.714209,13.276595,11.748344,-2.370702,-1.798053,-0.306805,0.358254,0.864061,1.057514,5.497784,5.622917,6.038672,5.856038,6.486978,7.096186,96538,1.921424
68,40,4,9,2,0,Alaska,Alaska,733374,732964,734923,733276,733406,-410,1959,-1647,130,2406,9453,9356,9266,1171,5330,6340,5581,1235,4123,3016,3685,63,873,2356,2004,-1663,-2948,-7202,-5543,-1600,-2075,-4846,-3539,-45,-89,183,-16,30285,30262,30654,29833,29815,12.879738,12.744866,12.635322,7.262139,8.636431,7.610375,5.617599,4.108435,5.024947,1.189465,3.209374,2.732699,-4.016658,-9.810659,-7.558557,-2.827193,-6.601285,-4.825859,-17356,-2.366596
99,40,4,8,4,0,Arizona,Arizona,7157902,7186683,7272487,7365684,7431344,28781,85804,93197,65660,18110,75693,79137,78494,18025,80276,79807,70792,85,-4583,-670,7702,253,8010,22296,21635,29980,82290,69798,36179,30233,90300,92094,57814,-1537,87,1773,144,166668,165653,150826,161194,162728,10.469896,10.812416,10.609428,11.103819,10.903958,9.568408,-0.633923,-0.091541,1.04102,1.107947,3.046282,2.924236,11.382396,9.536437,4.890036,12.490343,12.58272,7.814272,218247,3.049036
115,40,3,7,5,0,Arkansas,Arkansas,3011490,3014348,3028443,3046404,3067732,2858,14095,17961,21328,8509,34927,36298,35566,8475,40180,40362,36473,34,-5253,-4064,-907,94,1346,3215,4096,2537,17806,18841,18106,2631,19152,22056,22202,193,196,-31,33,82490,82455,82617,84268,85120,11.55989,11.95026,11.634023,13.298491,13.288236,11.930713,-1.738601,-1.337976,-0.29669,0.44549,1.058463,1.339846,5.893303,6.202955,5.922668,6.338793,7.261417,7.262514,57290,1.902381
191,40,4,9,6,0,California,California,39538212,39503200,39145060,39040616,38965193,-35012,-358140,-104444,-75423,103133,412506,423922,414120,74587,345181,318145,302704,28546,67325,105777,111416,1383,44116,126517,150982,-67932,-458862,-332785,-338371,-66549,-414746,-206268,-187389,2991,-10719,-3953,550,917958,897669,771170,886334,891995,10.489895,10.843981,10.617671,8.777842,8.138191,7.761063,1.712053,2.70579,2.856608,1.121856,3.236322,3.871045,-11.668713,-8.512685,-8.675533,-10.546858,-5.276363,-4.804488,-1197950,-3.029854
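
The SUMLEV-based split mentioned earlier could look something like the following. This is a sketch on a toy DataFrame (`df_toy` is a hypothetical name, with three rows whose SUMLEV, STATE, COUNTY, and CTYNAME values are copied from the output above) rather than on the full dataset.

```python
import pandas as pd

# A toy table with one state row and two county rows:
df_toy = pd.DataFrame({
    'SUMLEV': [40, 50, 50],
    'STATE': [1, 1, 1],
    'COUNTY': [0, 1, 3],
    'CTYNAME': ['Alabama', 'Autauga County', 'Baldwin County']})

# Splitting on SUMLEV instead of COUNTY:
df_toy_county = df_toy.query("SUMLEV == 50").copy()
df_toy_state = df_toy.query("SUMLEV == 40").copy()

# Checking whether both approaches select the same rows:
same_counties = df_toy_county.equals(df_toy.query("COUNTY != 0"))
same_states = df_toy_state.equals(df_toy.query("COUNTY == 0"))
print(same_counties, same_states)
```

On the full dataset, the two approaches will only match if every row has a SUMLEV value of either 40 or 50, so it would be worth confirming that before switching.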


## Adding percentile data to each dataset

In order to make comparisons between net migration rates easier, we'll create a column that stores percentile data.

(Applying this code to each dataset separately prevents state values from influencing county percentiles and vice versa.)

In [8]:
percentile_col = f'{first_nm_year}-{last_nm_year} \
Domestic Net Migration Percentile'
percentile_col

for df in [df_nm_state, df_nm_county]: # Using a for loop allows us to apply
    # the same code to our state and county DataFrames.
    df[percentile_col] = 100 * df[total_nm_rate_col].rank(pct = True)

df_nm_county.head()

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME,ESTIMATESBASE2020,POPESTIMATE2020,POPESTIMATE2021,POPESTIMATE2022,POPESTIMATE2023,NPOPCHG2020,NPOPCHG2021,NPOPCHG2022,NPOPCHG2023,BIRTHS2020,BIRTHS2021,BIRTHS2022,BIRTHS2023,DEATHS2020,DEATHS2021,DEATHS2022,DEATHS2023,NATURALCHG2020,NATURALCHG2021,NATURALCHG2022,NATURALCHG2023,INTERNATIONALMIG2020,INTERNATIONALMIG2021,INTERNATIONALMIG2022,INTERNATIONALMIG2023,DOMESTICMIG2020,DOMESTICMIG2021,DOMESTICMIG2022,DOMESTICMIG2023,NETMIG2020,NETMIG2021,NETMIG2022,NETMIG2023,RESIDUAL2020,RESIDUAL2021,RESIDUAL2022,RESIDUAL2023,GQESTIMATESBASE2020,GQESTIMATES2020,GQESTIMATES2021,GQESTIMATES2022,GQESTIMATES2023,RBIRTH2021,RBIRTH2022,RBIRTH2023,RDEATH2021,RDEATH2022,RDEATH2023,RNATURALCHG2021,RNATURALCHG2022,RNATURALCHG2023,RINTERNATIONALMIG2021,RINTERNATIONALMIG2022,RINTERNATIONALMIG2023,RDOMESTICMIG2021,RDOMESTICMIG2022,RDOMESTICMIG2023,RNETMIG2021,RNETMIG2022,RNETMIG2023,2020-2023 Total Domestic Net Migration,2020-2023 Total Domestic Net Migration as % of 2020 Population,2020-2023 Domestic Net Migration Percentile
1,50,3,6,1,1,Alabama,Autauga County,58809,58915,59203,59726,60342,106,288,523,616,162,686,706,714,176,696,687,621,-14,-10,19,93,0,15,22,34,100,242,507,491,100,257,529,525,20,41,-25,-2,484,484,484,484,484,11.615503,11.87263,11.89326,11.784825,11.553112,10.344138,-0.169322,0.319518,1.549122,0.253983,0.369969,0.566346,4.097597,8.526095,8.178699,4.351581,8.896064,8.745044,1340,2.278563,68.225191
2,50,3,6,1,3,Alabama,Baldwin County,231768,233227,239439,246531,253507,1459,6212,7092,6976,560,2337,2511,2531,602,2948,3022,2640,-42,-611,-511,-109,11,105,250,291,1613,6972,7036,6804,1624,7077,7286,7095,-123,-254,317,-10,3549,3448,3351,3468,3493,9.888589,10.333971,10.123231,12.473925,12.436982,10.559198,-2.585335,-2.10301,-0.435967,0.444288,1.02887,1.163912,29.500747,28.95652,27.213932,29.945035,29.98539,28.377843,22425,9.675624,96.660305
3,50,3,6,1,5,Alabama,Barbour County,25229,24969,24533,24700,24585,-260,-436,167,-115,60,270,278,267,92,390,363,350,-32,-120,-85,-83,0,0,2,13,-186,-313,237,-45,-186,-313,239,-32,-42,-3,13,0,2721,2482,2248,2702,2702,10.90865,11.293238,10.83494,15.756939,14.746207,14.203104,-4.848289,-3.452969,-3.368165,0.0,0.081246,0.527544,-12.645954,9.627689,-1.826113,-12.645954,9.708935,-1.29857,-307,-1.216854,22.073791
4,50,3,6,1,7,Alabama,Bibb County,22301,22188,22359,21986,21868,-113,171,-373,-118,56,240,236,240,55,325,314,290,1,-85,-78,-50,0,1,1,1,-101,254,-303,-69,-101,255,-302,-68,-13,1,7,0,2141,2060,1994,2025,1894,10.775136,10.643816,10.94541,14.591331,14.161687,13.225703,-3.816194,-3.517871,-2.280294,0.044896,0.045101,0.045606,11.403686,-13.665577,-3.146805,11.448582,-13.620476,-3.101199,-219,-0.982019,24.80916
5,50,3,6,1,9,Alabama,Blount County,59130,59107,59079,59516,59816,-23,-28,437,300,137,654,693,698,199,875,846,776,-62,-221,-153,-78,1,9,8,24,21,141,589,358,22,150,597,382,17,43,-7,-4,616,616,616,616,616,11.067301,11.686833,11.698455,14.807168,14.267043,13.005732,-3.739868,-2.58021,-1.307277,0.152302,0.134913,0.402239,2.386069,9.932965,6.000067,2.538372,10.067878,6.402306,1109,1.875528,64.408397


Let's now take a look at the states with the highest and lowest net migration rates in our dataset.

Highest rates:

In [9]:
df_nm_state.sort_values(
    percentile_col, ascending = False).head()

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME,ESTIMATESBASE2020,POPESTIMATE2020,POPESTIMATE2021,POPESTIMATE2022,POPESTIMATE2023,NPOPCHG2020,NPOPCHG2021,NPOPCHG2022,NPOPCHG2023,BIRTHS2020,BIRTHS2021,BIRTHS2022,BIRTHS2023,DEATHS2020,DEATHS2021,DEATHS2022,DEATHS2023,NATURALCHG2020,NATURALCHG2021,NATURALCHG2022,NATURALCHG2023,INTERNATIONALMIG2020,INTERNATIONALMIG2021,INTERNATIONALMIG2022,INTERNATIONALMIG2023,DOMESTICMIG2020,DOMESTICMIG2021,DOMESTICMIG2022,DOMESTICMIG2023,NETMIG2020,NETMIG2021,NETMIG2022,NETMIG2023,RESIDUAL2020,RESIDUAL2021,RESIDUAL2022,RESIDUAL2023,GQESTIMATESBASE2020,GQESTIMATES2020,GQESTIMATES2021,GQESTIMATES2022,GQESTIMATES2023,RBIRTH2021,RBIRTH2022,RBIRTH2023,RDEATH2021,RDEATH2022,RDEATH2023,RNATURALCHG2021,RNATURALCHG2022,RNATURALCHG2023,RINTERNATIONALMIG2021,RINTERNATIONALMIG2022,RINTERNATIONALMIG2023,RDOMESTICMIG2021,RDOMESTICMIG2022,RDOMESTICMIG2023,RNETMIG2021,RNETMIG2022,RNETMIG2023,2020-2023 Total Domestic Net Migration,2020-2023 Total Domestic Net Migration as % of 2020 Population,2020-2023 Domestic Net Migration Percentile
565,40,4,8,16,0,Idaho,Idaho,1839117,1849339,1904537,1938996,1964726,10222,55198,34459,25730,5467,21850,22418,22456,3629,17141,18938,16817,1838,4709,3480,5639,58,698,1779,4664,8814,51524,28586,15389,8872,52222,30365,20053,-488,-1733,614,38,49704,49640,50083,50140,50256,11.641301,11.665309,11.504918,9.132427,9.854475,8.61588,2.508873,1.810834,2.889038,0.371882,0.925711,2.389514,27.451093,14.874856,7.88427,27.822976,15.800567,10.273785,104313,5.671907,100.0
2358,40,3,5,45,0,South Carolina,South Carolina,5118422,5132151,5193848,5282955,5373555,13729,61697,89107,90600,13492,55815,57508,57581,14065,64911,64636,58852,-573,-9096,-7128,-1271,335,4028,10467,9291,14724,67368,83401,82562,15059,71396,93868,91853,-757,-603,2367,18,138600,136629,135870,139073,139093,10.810576,10.978158,10.806728,12.572343,12.338879,11.045267,-1.761767,-1.36072,-0.23854,0.780167,1.998129,1.743723,13.048229,15.921078,15.495129,13.828396,17.919207,17.238852,248055,4.846318,98.039216
1626,40,4,8,30,0,Montana,Montana,1084244,1087211,1106366,1122878,1132812,2967,19155,16512,9934,2812,10841,11219,11206,2518,12151,12896,11377,294,-1310,-1677,-171,44,712,1793,609,2746,20376,15837,9485,2790,21088,17630,10094,-117,-623,559,11,29305,29291,29311,29306,29307,9.884312,10.065296,9.935762,11.078708,11.569842,10.087379,-1.194396,-1.504546,-0.151617,0.649168,1.608617,0.539968,18.577875,14.208404,8.409844,19.227043,15.817021,8.949811,48444,4.467998,96.078431
325,40,3,5,10,0,Delaware,Delaware,989946,991862,1004881,1019459,1031890,1916,13019,14578,12431,2579,10174,10883,10725,2959,11228,11488,10897,-380,-1054,-605,-172,62,939,2378,2277,2195,13284,12669,10320,2257,14223,15047,12597,39,-150,136,6,22718,22409,22717,23809,23737,10.190595,10.752146,10.456534,11.246315,11.349872,10.624228,-1.055719,-0.597726,-0.167695,0.940532,2.349408,2.220003,13.305668,12.516672,10.061672,14.2462,14.86608,12.281674,38468,3.885869,94.117647
331,40,3,5,12,0,Florida,Florida,21538216,21591299,21830708,22245521,22610726,53083,239409,414813,365205,50294,207942,222003,223578,56383,250389,260220,231181,-6089,-42447,-38217,-7603,2840,46865,121233,178432,61782,244619,317923,194438,64622,291484,439156,372870,-5450,-9628,13874,-62,464639,470385,475056,479972,480002,9.577724,10.073593,9.968645,11.532816,11.807725,10.307639,-1.955092,-1.734132,-0.338994,2.158583,5.50106,7.955726,11.267052,14.426053,8.669383,13.425635,19.927113,16.625109,818762,3.801438,92.156863


Lowest rates:

*Note that, while the percentile for the highest state is 100, the lowest state's percentile is 1.96 (100 / 51) rather than 0. This is because Pandas's `rank(pct = True)` divides each entry's rank by the number of entries, so (ties aside) each percentile equals the percentage of entries that are less than or equal to that entry. Since the lowest entry is equal to itself, its percentile can't be 0. This isn't the only way to define percentiles, but it works well enough for our purposes.*

In [10]:
df_nm_state.sort_values(percentile_col, ascending = True).head()

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME,ESTIMATESBASE2020,POPESTIMATE2020,POPESTIMATE2021,POPESTIMATE2022,POPESTIMATE2023,NPOPCHG2020,NPOPCHG2021,NPOPCHG2022,NPOPCHG2023,BIRTHS2020,BIRTHS2021,BIRTHS2022,BIRTHS2023,DEATHS2020,DEATHS2021,DEATHS2022,DEATHS2023,NATURALCHG2020,NATURALCHG2021,NATURALCHG2022,NATURALCHG2023,INTERNATIONALMIG2020,INTERNATIONALMIG2021,INTERNATIONALMIG2022,INTERNATIONALMIG2023,DOMESTICMIG2020,DOMESTICMIG2021,DOMESTICMIG2022,DOMESTICMIG2023,NETMIG2020,NETMIG2021,NETMIG2022,NETMIG2023,RESIDUAL2020,RESIDUAL2021,RESIDUAL2022,RESIDUAL2023,GQESTIMATESBASE2020,GQESTIMATES2020,GQESTIMATES2021,GQESTIMATES2022,GQESTIMATES2023,RBIRTH2021,RBIRTH2022,RBIRTH2023,RDEATH2021,RDEATH2022,RDEATH2023,RNATURALCHG2021,RNATURALCHG2022,RNATURALCHG2023,RINTERNATIONALMIG2021,RINTERNATIONALMIG2022,RINTERNATIONALMIG2023,RDOMESTICMIG2021,RDOMESTICMIG2022,RDOMESTICMIG2023,RNETMIG2021,RNETMIG2022,RNETMIG2023,2020-2023 Total Domestic Net Migration,2020-2023 Total Domestic Net Migration as % of 2020 Population,2020-2023 Domestic Net Migration Percentile
1862,40,1,2,36,0,New York,New York,20202320,20104710,19854526,19673200,19571216,-97610,-250184,-181326,-101984,53056,205414,210106,207450,72698,177088,176382,165914,-19642,28326,33724,41536,1343,28769,77285,73867,-72398,-295159,-298341,-216778,-71055,-266390,-221056,-142911,-6913,-12120,6006,-609,609872,576054,610499,613035,616970,10.281178,10.630816,10.572205,8.863433,8.92447,8.455419,1.417745,1.706347,2.116785,1.439917,3.91042,3.764459,-14.773005,-15.095278,-11.047584,-13.333088,-11.184858,-7.283125,-882676,-4.369181,1.960784
329,40,3,5,11,0,District of Columbia,District of Columbia,689548,670839,669037,670949,678972,-18709,-1802,1912,8023,2187,8508,8399,7627,2004,5837,5572,5020,183,2671,2827,2607,50,1745,4527,6969,-15475,-6551,-4917,-1509,-15425,-4806,-390,5460,-3467,333,-525,-44,40699,21719,38289,36723,37601,12.699683,12.535952,11.299921,8.712747,8.316505,7.437472,3.986936,4.219447,3.862448,2.604719,6.756787,10.325049,-9.778517,-7.338883,-2.235686,-7.173798,-0.582096,8.089362,-28452,-4.126181,3.921569
191,40,4,9,6,0,California,California,39538212,39503200,39145060,39040616,38965193,-35012,-358140,-104444,-75423,103133,412506,423922,414120,74587,345181,318145,302704,28546,67325,105777,111416,1383,44116,126517,150982,-67932,-458862,-332785,-338371,-66549,-414746,-206268,-187389,2991,-10719,-3953,550,917958,897669,771170,886334,891995,10.489895,10.843981,10.617671,8.777842,8.138191,7.761063,1.712053,2.70579,2.856608,1.121856,3.236322,3.871045,-11.668713,-8.512685,-8.675533,-10.546858,-5.276363,-4.804488,-1197950,-3.029854,5.882353
559,40,4,9,15,0,Hawaii,Hawaii,1455274,1451181,1446745,1439399,1435138,-4093,-4436,-7346,-4261,3836,15528,15746,15167,2843,12152,13375,12812,993,3376,2371,2355,119,2043,5917,4627,-4831,-9982,-15664,-11193,-4712,-7939,-9747,-6566,-374,127,30,-50,40759,38589,40519,40072,40128,10.71663,10.911444,10.552656,8.386688,9.268422,8.914131,2.329942,1.643023,1.638525,1.409974,4.100281,3.219301,-6.889065,-10.854621,-7.787689,-5.479091,-6.754341,-4.568388,-41670,-2.863378,7.843137
610,40,2,3,17,0,Illinois,Illinois,12813469,12790357,12690341,12582515,12549689,-23112,-100016,-107826,-32826,33452,131888,129614,127235,35296,128385,127064,116782,-1844,3503,2550,10453,504,11447,31224,40492,-22545,-115656,-142403,-83839,-22041,-104209,-111179,-43347,773,690,803,68,276332,278270,278180,278163,278186,10.351993,10.257171,10.125256,10.077039,10.055373,9.293415,0.274953,0.201798,0.831841,0.898484,2.470951,3.22232,-9.07793,-11.269245,-6.671838,-8.179446,-8.798293,-3.449518,-364443,-2.844218,9.803922


We'll now perform the same analysis for counties. However, in order to prevent unusual migration patterns within smaller counties from skewing our results, we'll limit our results to counties that had at least 100,000 residents in 2020.

In [11]:
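# Counting the counties that meet the population threshold first keeps
# the print statement easier to read:
n_large_counties = len(df_nm_county.query('POPESTIMATE2020 >= 100000'))
print(f"{n_large_counties} counties out of {len(df_nm_county)} "
      "had at least 100,000 residents in 2020.")

605 counties out of 3144 had at least 100,000 residents in 2020.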


In [12]:
df_nm_county.query("POPESTIMATE2020 >= 100000").sort_values(
    percentile_col, ascending = False).head()

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME,ESTIMATESBASE2020,POPESTIMATE2020,POPESTIMATE2021,POPESTIMATE2022,POPESTIMATE2023,NPOPCHG2020,NPOPCHG2021,NPOPCHG2022,NPOPCHG2023,BIRTHS2020,BIRTHS2021,BIRTHS2022,BIRTHS2023,DEATHS2020,DEATHS2021,DEATHS2022,DEATHS2023,NATURALCHG2020,NATURALCHG2021,NATURALCHG2022,NATURALCHG2023,INTERNATIONALMIG2020,INTERNATIONALMIG2021,INTERNATIONALMIG2022,INTERNATIONALMIG2023,DOMESTICMIG2020,DOMESTICMIG2021,DOMESTICMIG2022,DOMESTICMIG2023,NETMIG2020,NETMIG2021,NETMIG2022,NETMIG2023,RESIDUAL2020,RESIDUAL2021,RESIDUAL2022,RESIDUAL2023,GQESTIMATESBASE2020,GQESTIMATES2020,GQESTIMATES2021,GQESTIMATES2022,GQESTIMATES2023,RBIRTH2021,RBIRTH2022,RBIRTH2023,RDEATH2021,RDEATH2022,RDEATH2023,RNATURALCHG2021,RNATURALCHG2022,RNATURALCHG2023,RINTERNATIONALMIG2021,RINTERNATIONALMIG2022,RINTERNATIONALMIG2023,RDOMESTICMIG2021,RDOMESTICMIG2022,RDOMESTICMIG2023,RNETMIG2021,RNETMIG2022,RNETMIG2023,2020-2023 Total Domestic Net Migration,2020-2023 Total Domestic Net Migration as % of 2020 Population,2020-2023 Domestic Net Migration Percentile
2697,50,3,7,48,257,Texas,Kaufman County,145307,147140,158313,172611,185690,1833,11173,14298,13079,467,2124,2359,2412,288,1421,1458,1287,179,703,901,1125,1,52,166,179,1797,10962,12616,11819,1798,11014,12782,11998,-144,-544,615,-44,1523,1523,1530,1564,1568,13.907213,14.25705,13.46354,9.304214,8.811691,7.183904,4.602999,5.445359,6.279636,0.340478,1.003252,0.99916,71.775363,76.247114,65.972464,72.115841,77.250366,66.971624,37194,25.59684,100.0
391,50,3,5,12,119,Florida,Sumter County,129745,130290,134870,144767,151565,545,4580,9897,6798,105,483,532,541,573,2598,2830,2530,-468,-2115,-2298,-1989,2,32,51,104,1176,7073,11472,8684,1178,7105,11523,8788,-165,-410,672,-1,8542,7838,7476,8378,8389,3.643083,3.804933,3.65131,19.595716,20.240526,17.075442,-15.952632,-16.435593,-13.424132,0.241364,0.364759,0.701915,53.348921,82.049228,58.609938,53.590285,82.413987,59.311853,28405,21.892944,99.968193
2767,50,3,7,48,397,Texas,Rockwall County,107844,109158,116612,123342,131307,1314,7454,6730,7965,270,1210,1336,1367,165,848,868,781,105,362,468,586,2,66,139,156,1299,7380,5896,7267,1301,7446,6035,7423,-92,-354,227,-44,662,662,662,723,723,10.718873,11.135468,10.736347,7.51207,7.23472,6.133933,3.206803,3.900748,4.602414,0.584666,1.158555,1.225216,65.376268,49.142752,57.07464,65.960934,50.301308,58.299856,21842,20.253329,99.936387
2614,50,3,7,48,91,Texas,Comal County,161493,163659,174977,184749,193928,2166,11318,9772,9179,389,1722,1807,1822,355,1778,1751,1550,34,-56,56,272,3,10,42,45,2377,12003,9237,8891,2380,12013,9279,8936,-248,-639,437,-29,1517,1517,1483,1486,1486,10.170212,10.046535,9.622977,10.500951,9.735187,8.186396,-0.330739,0.311348,1.436581,0.05906,0.233511,0.23767,70.890277,51.355754,46.958226,70.949338,51.589265,47.195895,32508,20.129665,99.90458
1935,50,3,5,37,19,North Carolina,Brunswick County,136695,138168,144843,152908,159964,1473,6675,8065,7056,221,1082,1139,1127,416,1876,2102,1921,-195,-794,-963,-794,0,26,22,25,1930,7899,8446,7796,1930,7925,8468,7821,-262,-456,560,29,961,961,900,941,942,7.646346,7.650688,7.204224,13.257435,14.11918,12.279782,-5.611089,-6.468492,-5.075558,0.183738,0.147774,0.15981,55.821152,56.731967,49.835076,56.00489,56.879742,49.994886,26071,19.072387,99.840967


In [13]:
df_nm_county.query("POPESTIMATE2020 >= 100000").sort_values(
    percentile_col, ascending = True).head()

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME,ESTIMATESBASE2020,POPESTIMATE2020,POPESTIMATE2021,POPESTIMATE2022,POPESTIMATE2023,NPOPCHG2020,NPOPCHG2021,NPOPCHG2022,NPOPCHG2023,BIRTHS2020,BIRTHS2021,BIRTHS2022,BIRTHS2023,DEATHS2020,DEATHS2021,DEATHS2022,DEATHS2023,NATURALCHG2020,NATURALCHG2021,NATURALCHG2022,NATURALCHG2023,INTERNATIONALMIG2020,INTERNATIONALMIG2021,INTERNATIONALMIG2022,INTERNATIONALMIG2023,DOMESTICMIG2020,DOMESTICMIG2021,DOMESTICMIG2022,DOMESTICMIG2023,NETMIG2020,NETMIG2021,NETMIG2022,NETMIG2023,RESIDUAL2020,RESIDUAL2021,RESIDUAL2022,RESIDUAL2023,GQESTIMATESBASE2020,GQESTIMATES2020,GQESTIMATES2021,GQESTIMATES2022,GQESTIMATES2023,RBIRTH2021,RBIRTH2022,RBIRTH2023,RDEATH2021,RDEATH2022,RDEATH2023,RNATURALCHG2021,RNATURALCHG2022,RNATURALCHG2023,RINTERNATIONALMIG2021,RINTERNATIONALMIG2022,RINTERNATIONALMIG2023,RDOMESTICMIG2021,RDOMESTICMIG2022,RDOMESTICMIG2023,RNETMIG2021,RNETMIG2022,RNETMIG2023,2020-2023 Total Domestic Net Migration,2020-2023 Total Domestic Net Migration as % of 2020 Population,2020-2023 Domestic Net Migration Percentile
1865,50,1,2,36,5,New York,Bronx County,1472653,1461151,1424084,1381808,1356476,-11502,-37067,-42276,-25332,4391,16929,16893,16614,7099,11382,11243,10648,-2708,5547,5650,5966,294,4168,10618,10224,-8289,-44747,-61394,-41473,-7995,-40579,-50776,-31249,-799,-2035,2850,-49,52840,51113,53647,53512,53988,11.734919,12.041091,12.134607,7.889825,8.013851,7.777133,3.845094,4.02724,4.357474,2.889193,7.56836,7.46745,-31.017924,-43.760772,-30.291233,-28.128731,-36.192412,-22.823783,-155903,-10.58654,0.445293
1886,50,1,2,36,47,New York,Kings County,2736119,2718447,2637522,2589531,2561225,-17672,-80925,-47991,-28306,8395,33117,32458,31066,11048,18381,18101,16892,-2653,14736,14357,14174,255,5078,14105,13292,-14297,-95883,-78188,-55308,-14042,-90805,-64083,-42016,-977,-4856,1735,-464,45983,42094,43258,45331,45473,12.36639,12.419235,12.062695,6.863744,6.925891,6.559037,5.502646,5.493344,5.503658,1.896202,5.396923,5.161184,-35.804165,-29.916666,-21.475682,-33.907963,-24.519744,-16.314498,-243676,-8.905899,0.604326
229,50,4,9,6,75,California,San Francisco County,873950,870518,811935,807774,808988,-3432,-58583,-4161,1214,2098,7532,7343,7240,1601,6790,6510,6200,497,742,833,1040,84,2041,5601,6201,-3744,-57177,-10206,-5925,-3660,-55136,-4605,276,-269,-4189,-389,-102,27872,27831,22406,25648,26584,8.953593,9.067061,8.956173,8.071548,8.038481,7.669651,0.882045,1.02858,1.286522,2.426219,6.916057,7.670888,-67.968615,-12.602264,-7.329465,-65.542396,-5.686207,0.341423,-77052,-8.816523,0.636132
1903,50,1,2,36,81,New York,Queens County,2405425,2388864,2329008,2278558,2252196,-16561,-59856,-50450,-26362,6240,23189,23688,23304,10534,16443,16586,15544,-4294,6746,7102,7760,232,6388,16958,16290,-11650,-69771,-77029,-50161,-11418,-63383,-60071,-33871,-849,-3219,2519,-251,35116,33404,33843,34696,35103,9.830279,10.282218,10.287029,6.970516,7.199463,6.861551,2.859764,3.082756,3.425478,2.708001,7.360936,7.190856,-29.577318,-33.435875,-22.142451,-26.869317,-26.074938,-14.951595,-208611,-8.672521,0.763359
1253,50,1,1,25,25,Massachusetts,Suffolk County,800930,797181,774514,768812,768425,-3749,-22667,-5702,-387,2197,8070,8267,8210,2103,4993,5071,4932,94,3077,3196,3278,190,3508,8914,10606,-3813,-27830,-18094,-14207,-3623,-24322,-9180,-3601,-220,-1422,282,-64,50897,49884,49533,49918,49889,10.269168,10.713226,10.681502,6.35365,6.571522,6.416707,3.915518,4.141704,4.264795,4.46397,11.551675,13.798783,-35.413996,-23.44806,-18.483812,-30.950025,-11.896385,-4.685029,-63944,-7.983719,0.954198


# Importing county and state boundaries

In order to create maps of net migration data by state and county, we'll need to import *shapefiles* that show the outlines of each state and county. These can be accessed via [this Census link](https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html).

The shapefile folder contains unzipped 1:500,000-scale versions of the county and state shapefiles available on the Census website. (I could also have downloaded less detailed shapefiles, which offer smaller file sizes, but it's easy to create smaller versions of these files via Python, as I'll demonstrate soon.)

Note that each folder contains a number of files. My experience has been that all of these files need to be present in the folder containing a given shapefile in order for that shapefile to work correctly within Python. Your experience may vary, though.
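
The likely reason is that a 'shapefile' is really a bundle of files that share one base name: the .shp, .shx, and .dbf files are the mandatory parts of the format, while others (such as .prj, which stores the coordinate reference system) are optional. The sketch below checks whether the mandatory companions of a given .shp file are present. `missing_shapefile_parts` is a hypothetical helper written for this example, not a GeoPandas function.

```python
from pathlib import Path

# The mandatory parts of a shapefile. (.prj is optional, but it's
# worth keeping alongside the others.)
REQUIRED_SUFFIXES = ('.shp', '.shx', '.dbf')

def missing_shapefile_parts(shp_path):
    """Return the required suffixes that are absent for a given .shp
    path. (Hypothetical helper, not part of GeoPandas.)"""
    shp_path = Path(shp_path)
    return [suffix for suffix in REQUIRED_SUFFIXES
            if not shp_path.with_suffix(suffix).exists()]

print(missing_shapefile_parts(
    'shapefiles/cb_2023_us_county_500k/cb_2023_us_county_500k.shp'))
```

An empty list means that all three required files were found next to each other.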

In [14]:
gdf_counties = geopandas.read_file('shapefiles/cb_2023_us_county_500k/cb_2023_us_county_500k.shp')
# This abbreviation of 'GeoDataFrame' as 'gdf' derives from
# https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.html
# The current versions of the county outlines would result in a very large output file (e.g. around 42 MB).
# We can reduce this file size (and the time required to generate and save the map) by simplifying the
# county outlines.
gdf_counties['geometry'] = gdf_counties['geometry'].simplify(tolerance = 0.005)
# Source: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.simplify.html
gdf_counties.insert(0, 'County/State', gdf_counties['NAMELSAD'] + ', ' + gdf_counties['STUSPS'])
gdf_counties.head()


Unnamed: 0,County/State,STATEFP,COUNTYFP,COUNTYNS,GEOIDFQ,GEOID,NAME,NAMELSAD,STUSPS,STATE_NAME,LSAD,ALAND,AWATER,geometry
0,"Baldwin County, AL",1,3,161527,0500000US01003,1003,Baldwin,Baldwin County,AL,Alabama,6,4117725048,1132887203,"POLYGON ((-88.02858 30.22676, -87.96685 30.235..."
1,"Houston County, AL",1,69,161560,0500000US01069,1069,Houston,Houston County,AL,Alabama,6,1501742235,4795415,"POLYGON ((-85.71209 31.19727, -85.69231 31.210..."
2,"Barbour County, AL",1,5,161528,0500000US01005,1005,Barbour,Barbour County,AL,Alabama,6,2292160151,50523213,"POLYGON ((-85.74803 31.61918, -85.73117 31.629..."
3,"Sumter County, AL",1,119,161585,0500000US01119,1119,Sumter,Sumter County,AL,Alabama,6,2340898915,24634880,"POLYGON ((-88.42145 32.30868, -88.34043 32.991..."
4,"Miller County, AR",5,91,69166,0500000US05091,5091,Miller,Miller County,AR,Arkansas,6,1616257232,36848741,"POLYGON ((-94.04343 33.55158, -94.00037 33.564..."
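
The GeoPandas documentation linked in the cell above describes simplify() as using the Douglas-Peucker algorithm. The pure-Python sketch below is *not* GeoPandas' implementation; it's only meant to illustrate what the tolerance argument controls. The function names and sample coordinates are made up for this example.

```python
import math

def point_to_segment_distance(p, a, b):
    """Distance from point p to the segment running from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to its endpoints
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Simplify a list of (x, y) points, always keeping the endpoints."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the line between the endpoints
    distances = [point_to_segment_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    max_distance = max(distances)
    if max_distance <= tolerance:
        # Every interior point is close enough to the line to be dropped
        return [points[0], points[-1]]
    # Otherwise, keep the farthest point and simplify each half separately
    split = distances.index(max_distance) + 1
    left = douglas_peucker(points[:split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right

line = [(0, 0), (1, 0.002), (2, -0.003), (3, 0.5), (4, 0.001), (5, 0)]
simplified = douglas_peucker(line, tolerance=0.005)
print(len(line), len(simplified))
```

The tolerance is expressed in the same units as the coordinates. Judging from the longitude and latitude values in the output above, those units are degrees here, so a larger tolerance removes more detail and produces a smaller file.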


Since there are far fewer states than there are counties, our state dataset won't require as much processing time as our county dataset. In addition, maps that show only data at the state level will generally take up less storage space. The drawback of these maps, though, is that the state-level data will hide interesting variations in net migration data at the county level. 

In [15]:
gdf_states = geopandas.read_file(
    'shapefiles/cb_2023_us_state_500k/cb_2023_us_state_500k.shp')
gdf_states['geometry'] = gdf_states['geometry'].simplify(
    tolerance = 0.005)
gdf_states.head()

Unnamed: 0,STATEFP,STATENS,GEOIDFQ,GEOID,STUSPS,NAME,LSAD,ALAND,AWATER,geometry
0,35,897535,0400000US35,35,NM,New Mexico,0,314198587197,726463919,"POLYGON ((-109.05004 31.33250, -109.04522 36.9..."
1,46,1785534,0400000US46,46,SD,South Dakota,0,196341525171,3387709166,"POLYGON ((-104.05788 44.99761, -104.03969 44.9..."
2,6,1779778,0400000US06,6,CA,California,0,403673296401,20291770234,"MULTIPOLYGON (((-118.60442 33.47855, -118.5386..."
3,21,1779786,0400000US21,21,KY,Kentucky,0,102266598312,2384223544,"MULTIPOLYGON (((-89.41728 36.49901, -89.37637 ..."
4,1,1779775,0400000US01,1,AL,Alabama,0,131185049346,4582326383,"MULTIPOLYGON (((-88.05338 30.50699, -88.03867 ..."


## Merging shape and demographic data tables together:

Creating a table that stores both demographic and shape data may make graphing tasks easier. Therefore, we'll merge gdf_counties and df_nm_county together on their state and county ID columns. In order to make this process easier, we'll rename gdf_counties' copies of these ID fields and change their values to integers in the following cell.

In [16]:
# Renaming the ID fields within gdf_counties so that they match the names of their corresponding fields within df_nm_county:
gdf_counties.rename(columns = {'STATEFP':'STATE', 'COUNTYFP':'COUNTY'}, inplace = True)
# df_nm_county's ID fields are stored as integers, so we'll convert these to integers as well: 
# (The fact that the IDs have leading 0s is a dead giveaway that they're currently
# stored as strings, though you could also run gdf_counties.dtypes to confirm this.)
for column in ['STATE', 'COUNTY']:
    gdf_counties[column] = gdf_counties[column].astype('int')  
    
gdf_counties.head()

Unnamed: 0,County/State,STATE,COUNTY,COUNTYNS,GEOIDFQ,GEOID,NAME,NAMELSAD,STUSPS,STATE_NAME,LSAD,ALAND,AWATER,geometry
0,"Baldwin County, AL",1,3,161527,0500000US01003,1003,Baldwin,Baldwin County,AL,Alabama,6,4117725048,1132887203,"POLYGON ((-88.02858 30.22676, -87.96685 30.235..."
1,"Houston County, AL",1,69,161560,0500000US01069,1069,Houston,Houston County,AL,Alabama,6,1501742235,4795415,"POLYGON ((-85.71209 31.19727, -85.69231 31.210..."
2,"Barbour County, AL",1,5,161528,0500000US01005,1005,Barbour,Barbour County,AL,Alabama,6,2292160151,50523213,"POLYGON ((-85.74803 31.61918, -85.73117 31.629..."
3,"Sumter County, AL",1,119,161585,0500000US01119,1119,Sumter,Sumter County,AL,Alabama,6,2340898915,24634880,"POLYGON ((-88.42145 32.30868, -88.34043 32.991..."
4,"Miller County, AR",5,91,69166,0500000US05091,5091,Miller,Miller County,AR,Arkansas,6,1616257232,36848741,"POLYGON ((-94.04343 33.55158, -94.00037 33.564..."
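
As a small illustration of the conversion above, the following sketch applies the same astype('int') call to a made-up two-row table (not the actual shapefile data). It also shows one way to go in the opposite direction, which could be handy if you'd rather keep the zero-padded string codes and convert the other table instead.

```python
import pandas as pd

# Made-up sample that mimics the shapefile's zero-padded string codes
df_shapes = pd.DataFrame({'STATE': ['01', '05'], 'COUNTY': ['003', '091']})

for column in ['STATE', 'COUNTY']:
    df_shapes[column] = df_shapes[column].astype('int')

print(df_shapes.dtypes)
print(df_shapes)

# Going the other way (integers back to zero-padded strings):
county_codes = df_shapes['COUNTY'].astype('str').str.zfill(3).tolist()
print(county_codes)
```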


We're now ready to merge our shapefile and net migration data together:

In [17]:
gdf_counties_and_stats = gdf_counties.merge(df_nm_county, 
                                            on = ['STATE', 'COUNTY'])
# Because we did not specify an argument for the 'how' parameter, the default 'inner'
# option will be used. This option will exclude any counties that either (1) aren't present
# in the net migration stats table or (2) don't have a shape defined. This will prevent our 
# mapping code from (1) showing counties with missing net migration data and (2) attempting
# to map a county that doesn't have a corresponding outline.
gdf_counties_and_stats.head()

Unnamed: 0,County/State,STATE,COUNTY,COUNTYNS,GEOIDFQ,GEOID,NAME,NAMELSAD,STUSPS,STATE_NAME,LSAD,ALAND,AWATER,geometry,SUMLEV,REGION,DIVISION,STNAME,CTYNAME,ESTIMATESBASE2020,POPESTIMATE2020,POPESTIMATE2021,POPESTIMATE2022,POPESTIMATE2023,NPOPCHG2020,NPOPCHG2021,NPOPCHG2022,NPOPCHG2023,BIRTHS2020,BIRTHS2021,BIRTHS2022,BIRTHS2023,DEATHS2020,DEATHS2021,DEATHS2022,DEATHS2023,NATURALCHG2020,NATURALCHG2021,NATURALCHG2022,NATURALCHG2023,INTERNATIONALMIG2020,INTERNATIONALMIG2021,INTERNATIONALMIG2022,INTERNATIONALMIG2023,DOMESTICMIG2020,DOMESTICMIG2021,DOMESTICMIG2022,DOMESTICMIG2023,NETMIG2020,NETMIG2021,NETMIG2022,NETMIG2023,RESIDUAL2020,RESIDUAL2021,RESIDUAL2022,RESIDUAL2023,GQESTIMATESBASE2020,GQESTIMATES2020,GQESTIMATES2021,GQESTIMATES2022,GQESTIMATES2023,RBIRTH2021,RBIRTH2022,RBIRTH2023,RDEATH2021,RDEATH2022,RDEATH2023,RNATURALCHG2021,RNATURALCHG2022,RNATURALCHG2023,RINTERNATIONALMIG2021,RINTERNATIONALMIG2022,RINTERNATIONALMIG2023,RDOMESTICMIG2021,RDOMESTICMIG2022,RDOMESTICMIG2023,RNETMIG2021,RNETMIG2022,RNETMIG2023,2020-2023 Total Domestic Net Migration,2020-2023 Total Domestic Net Migration as % of 2020 Population,2020-2023 Domestic Net Migration Percentile
0,"Baldwin County, AL",1,3,161527,0500000US01003,1003,Baldwin,Baldwin County,AL,Alabama,6,4117725048,1132887203,"POLYGON ((-88.02858 30.22676, -87.96685 30.235...",50,3,6,Alabama,Baldwin County,231768,233227,239439,246531,253507,1459,6212,7092,6976,560,2337,2511,2531,602,2948,3022,2640,-42,-611,-511,-109,11,105,250,291,1613,6972,7036,6804,1624,7077,7286,7095,-123,-254,317,-10,3549,3448,3351,3468,3493,9.888589,10.333971,10.123231,12.473925,12.436982,10.559198,-2.585335,-2.10301,-0.435967,0.444288,1.02887,1.163912,29.500747,28.95652,27.213932,29.945035,29.98539,28.377843,22425,9.675624,96.660305
1,"Houston County, AL",1,69,161560,0500000US01069,1069,Houston,Houston County,AL,Alabama,6,1501742235,4795415,"POLYGON ((-85.71209 31.19727, -85.69231 31.210...",50,3,6,Alabama,Houston County,107202,107284,107470,108047,108462,82,186,577,415,312,1300,1299,1306,316,1579,1535,1349,-4,-279,-236,-43,0,21,64,83,54,385,766,374,54,406,830,457,32,59,-17,1,822,822,822,822,822,12.106876,12.054734,12.064164,14.705198,14.244816,12.461376,-2.598322,-2.190082,-0.397212,0.195573,0.593921,0.766712,3.585498,7.108488,3.454822,3.78107,7.702409,4.221534,1579,1.47292,58.842239
2,"Barbour County, AL",1,5,161528,0500000US01005,1005,Barbour,Barbour County,AL,Alabama,6,2292160151,50523213,"POLYGON ((-85.74803 31.61918, -85.73117 31.629...",50,3,6,Alabama,Barbour County,25229,24969,24533,24700,24585,-260,-436,167,-115,60,270,278,267,92,390,363,350,-32,-120,-85,-83,0,0,2,13,-186,-313,237,-45,-186,-313,239,-32,-42,-3,13,0,2721,2482,2248,2702,2702,10.90865,11.293238,10.83494,15.756939,14.746207,14.203104,-4.848289,-3.452969,-3.368165,0.0,0.081246,0.527544,-12.645954,9.627689,-1.826113,-12.645954,9.708935,-1.29857,-307,-1.216854,22.073791
3,"Sumter County, AL",1,119,161585,0500000US01119,1119,Sumter,Sumter County,AL,Alabama,6,2340898915,24634880,"POLYGON ((-88.42145 32.30868, -88.34043 32.991...",50,3,6,Alabama,Sumter County,12344,12191,11967,11889,11727,-153,-224,-78,-162,28,126,133,114,59,169,165,161,-31,-43,-32,-47,0,6,20,21,-101,-181,-61,-136,-101,-175,-41,-115,-21,-6,-5,0,1080,987,875,952,922,10.431327,11.150235,9.654472,13.991224,13.832998,13.634824,-3.559897,-2.682763,-3.980352,0.49673,1.676727,1.778455,-14.984684,-5.114017,-11.517615,-14.487954,-3.43729,-9.73916,-479,-3.880428,6.424936
4,"Miller County, AR",5,91,69166,0500000US05091,5091,Miller,Miller County,AR,Arkansas,6,1616257232,36848741,"POLYGON ((-94.04343 33.55158, -94.00037 33.564...",50,3,7,Arkansas,Miller County,42599,42588,42494,42608,42415,-11,-94,114,-193,132,499,621,356,139,628,641,573,-7,-129,-20,-217,0,-5,-29,-29,-16,13,174,52,-16,8,145,23,12,27,-11,1,1375,1375,1236,1467,1511,11.729861,14.594252,8.374205,14.762229,15.064276,13.478706,-3.032369,-0.470024,-5.104501,-0.117534,-0.681535,-0.682168,0.305588,4.089211,1.223198,0.188054,3.407675,0.54103,223,0.523486,46.024173
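
If you'd like to see which rows an inner merge would drop, one option is to run an outer merge with `indicator=True` and then inspect the rows that didn't match. The sketch below uses small made-up tables as stand-ins for gdf_counties and df_nm_county; the column name 'rate' and its values are invented for this example.

```python
import pandas as pd

# Made-up stand-ins for the outlines and net migration tables
df_outlines = pd.DataFrame({
    'STATE': [1, 1, 5], 'COUNTY': [3, 69, 91],
    'County/State': ['Baldwin County, AL', 'Houston County, AL',
                     'Miller County, AR']})
df_stats = pd.DataFrame({'STATE': [1, 5, 9], 'COUNTY': [3, 91, 1],
                         'rate': [9.68, 0.52, -1.0]})

df_check = df_outlines.merge(df_stats, on=['STATE', 'COUNTY'],
                             how='outer', indicator=True)
# '_merge' is 'both' for matched rows and 'left_only' or 'right_only'
# for rows that an inner merge would have dropped.
print(df_check[df_check['_merge'] != 'both'])

# The default inner merge keeps only the matched rows:
df_inner = df_outlines.merge(df_stats, on=['STATE', 'COUNTY'])
print(len(df_inner))
```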


In [18]:
# I'd like to use 'State' as the field name for state names,
# so I'll rename this field within both gdf_states and df_nm_state.
# I'll also rename a pre-existing 'STATE' column within df_nm_state
# so that it won't get confused with the new 'State' field.
gdf_states.rename(columns = {'NAME':'State'}, inplace = True)
df_nm_state.rename(columns = {'STNAME':'State', 
                              'STATE':'State_Code'}, inplace = True)


In [19]:
gdf_states_and_stats = gdf_states.merge(
    df_nm_state, on = 'State')
gdf_states_and_stats.head()

Unnamed: 0,STATEFP,STATENS,GEOIDFQ,GEOID,STUSPS,State,LSAD,ALAND,AWATER,geometry,SUMLEV,REGION,DIVISION,State_Code,COUNTY,CTYNAME,ESTIMATESBASE2020,POPESTIMATE2020,POPESTIMATE2021,POPESTIMATE2022,POPESTIMATE2023,NPOPCHG2020,NPOPCHG2021,NPOPCHG2022,NPOPCHG2023,BIRTHS2020,BIRTHS2021,BIRTHS2022,BIRTHS2023,DEATHS2020,DEATHS2021,DEATHS2022,DEATHS2023,NATURALCHG2020,NATURALCHG2021,NATURALCHG2022,NATURALCHG2023,INTERNATIONALMIG2020,INTERNATIONALMIG2021,INTERNATIONALMIG2022,INTERNATIONALMIG2023,DOMESTICMIG2020,DOMESTICMIG2021,DOMESTICMIG2022,DOMESTICMIG2023,NETMIG2020,NETMIG2021,NETMIG2022,NETMIG2023,RESIDUAL2020,RESIDUAL2021,RESIDUAL2022,RESIDUAL2023,GQESTIMATESBASE2020,GQESTIMATES2020,GQESTIMATES2021,GQESTIMATES2022,GQESTIMATES2023,RBIRTH2021,RBIRTH2022,RBIRTH2023,RDEATH2021,RDEATH2022,RDEATH2023,RNATURALCHG2021,RNATURALCHG2022,RNATURALCHG2023,RINTERNATIONALMIG2021,RINTERNATIONALMIG2022,RINTERNATIONALMIG2023,RDOMESTICMIG2021,RDOMESTICMIG2022,RDOMESTICMIG2023,RNETMIG2021,RNETMIG2022,RNETMIG2023,2020-2023 Total Domestic Net Migration,2020-2023 Total Domestic Net Migration as % of 2020 Population,2020-2023 Domestic Net Migration Percentile
0,35,897535,0400000US35,35,NM,New Mexico,0,314198587197,726463919,"POLYGON ((-109.05004 31.33250, -109.04522 36.9...",40,4,8,35,0,New Mexico,2117525,2118488,2116950,2113476,2114371,963,-1538,-3474,895,5327,21433,21151,20728,5523,24640,25470,22344,-196,-3207,-4319,-1616,158,2259,5839,3642,702,-1206,-4496,-1088,860,1053,1343,2554,299,616,-498,-43,42841,42792,42993,43017,43081,10.120795,9.999466,9.805464,11.63516,12.041341,10.569919,-1.514365,-2.041875,-0.764455,1.066714,2.760478,1.722863,-0.569481,-2.125554,-0.514683,0.497233,0.634924,1.20818,-6088,-0.287505,47.058824
1,46,1785534,0400000US46,46,SD,South Dakota,0,196341525171,3387709166,"POLYGON ((-104.05788 44.99761, -104.03969 44.9...",40,2,4,46,0,South Dakota,886668,887852,896299,909869,919318,1184,8447,13570,9449,2832,11063,11324,11369,2020,9779,9226,8556,812,1284,2098,2813,65,1100,2781,1788,299,6026,8374,4812,364,7126,11155,6600,8,37,317,36,32154,32122,32077,32214,32218,12.401417,12.539254,12.430659,10.962077,10.216104,9.354976,1.43934,2.32315,3.075683,1.233079,3.079448,1.954967,6.755034,9.27267,5.261354,7.988113,12.352118,7.216321,19511,2.200485,80.392157
2,6,1779778,0400000US06,6,CA,California,0,403673296401,20291770234,"MULTIPOLYGON (((-118.60442 33.47855, -118.5386...",40,4,9,6,0,California,39538212,39503200,39145060,39040616,38965193,-35012,-358140,-104444,-75423,103133,412506,423922,414120,74587,345181,318145,302704,28546,67325,105777,111416,1383,44116,126517,150982,-67932,-458862,-332785,-338371,-66549,-414746,-206268,-187389,2991,-10719,-3953,550,917958,897669,771170,886334,891995,10.489895,10.843981,10.617671,8.777842,8.138191,7.761063,1.712053,2.70579,2.856608,1.121856,3.236322,3.871045,-11.668713,-8.512685,-8.675533,-10.546858,-5.276363,-4.804488,-1197950,-3.029854,5.882353
3,21,1779786,0400000US21,21,KY,Kentucky,0,102266598312,2384223544,"MULTIPOLYGON (((-89.41728 36.49901, -89.37637 ...",40,3,6,21,0,Kentucky,4506297,4508155,4507600,4511563,4526154,1858,-555,3963,14591,12579,51364,52384,52380,12785,58401,61410,54385,-206,-7037,-9026,-2005,122,1757,4436,7627,1010,3216,9400,8965,1132,4973,13836,16592,932,1509,-847,4,125368,125292,116951,120265,121207,11.394276,11.616156,11.591423,12.955321,13.617672,12.035119,-1.561045,-2.001516,-0.443696,0.389762,0.983683,1.687816,0.713418,2.084451,1.983908,1.10318,3.068134,3.671724,22591,0.501321,58.823529
4,1,1779775,0400000US01,1,AL,Alabama,0,131185049346,4582326383,"MULTIPOLYGON (((-88.05338 30.50699, -88.03867 ...",40,3,6,1,0,Alabama,5024294,5031864,5050380,5073903,5108468,7570,18516,23523,34565,13867,57184,58106,58251,15165,69135,67208,59813,-1298,-11951,-9102,-1562,125,1806,4374,5384,9615,27715,28464,30744,9740,29521,32838,36128,-872,946,-213,-1,127914,131372,134653,141654,144014,11.343506,11.478541,11.441539,13.714209,13.276595,11.748344,-2.370702,-1.798053,-0.306805,0.358254,0.864061,1.057514,5.497784,5.622917,6.038672,5.856038,6.486978,7.096186,96538,1.921424,74.509804


In [20]:
df_nm_state.head()

Unnamed: 0,SUMLEV,REGION,DIVISION,State_Code,COUNTY,State,CTYNAME,ESTIMATESBASE2020,POPESTIMATE2020,POPESTIMATE2021,POPESTIMATE2022,POPESTIMATE2023,NPOPCHG2020,NPOPCHG2021,NPOPCHG2022,NPOPCHG2023,BIRTHS2020,BIRTHS2021,BIRTHS2022,BIRTHS2023,DEATHS2020,DEATHS2021,DEATHS2022,DEATHS2023,NATURALCHG2020,NATURALCHG2021,NATURALCHG2022,NATURALCHG2023,INTERNATIONALMIG2020,INTERNATIONALMIG2021,INTERNATIONALMIG2022,INTERNATIONALMIG2023,DOMESTICMIG2020,DOMESTICMIG2021,DOMESTICMIG2022,DOMESTICMIG2023,NETMIG2020,NETMIG2021,NETMIG2022,NETMIG2023,RESIDUAL2020,RESIDUAL2021,RESIDUAL2022,RESIDUAL2023,GQESTIMATESBASE2020,GQESTIMATES2020,GQESTIMATES2021,GQESTIMATES2022,GQESTIMATES2023,RBIRTH2021,RBIRTH2022,RBIRTH2023,RDEATH2021,RDEATH2022,RDEATH2023,RNATURALCHG2021,RNATURALCHG2022,RNATURALCHG2023,RINTERNATIONALMIG2021,RINTERNATIONALMIG2022,RINTERNATIONALMIG2023,RDOMESTICMIG2021,RDOMESTICMIG2022,RDOMESTICMIG2023,RNETMIG2021,RNETMIG2022,RNETMIG2023,2020-2023 Total Domestic Net Migration,2020-2023 Total Domestic Net Migration as % of 2020 Population,2020-2023 Domestic Net Migration Percentile
0,40,3,6,1,0,Alabama,Alabama,5024294,5031864,5050380,5073903,5108468,7570,18516,23523,34565,13867,57184,58106,58251,15165,69135,67208,59813,-1298,-11951,-9102,-1562,125,1806,4374,5384,9615,27715,28464,30744,9740,29521,32838,36128,-872,946,-213,-1,127914,131372,134653,141654,144014,11.343506,11.478541,11.441539,13.714209,13.276595,11.748344,-2.370702,-1.798053,-0.306805,0.358254,0.864061,1.057514,5.497784,5.622917,6.038672,5.856038,6.486978,7.096186,96538,1.921424,74.509804
68,40,4,9,2,0,Alaska,Alaska,733374,732964,734923,733276,733406,-410,1959,-1647,130,2406,9453,9356,9266,1171,5330,6340,5581,1235,4123,3016,3685,63,873,2356,2004,-1663,-2948,-7202,-5543,-1600,-2075,-4846,-3539,-45,-89,183,-16,30285,30262,30654,29833,29815,12.879738,12.744866,12.635322,7.262139,8.636431,7.610375,5.617599,4.108435,5.024947,1.189465,3.209374,2.732699,-4.016658,-9.810659,-7.558557,-2.827193,-6.601285,-4.825859,-17356,-2.366596,13.72549
99,40,4,8,4,0,Arizona,Arizona,7157902,7186683,7272487,7365684,7431344,28781,85804,93197,65660,18110,75693,79137,78494,18025,80276,79807,70792,85,-4583,-670,7702,253,8010,22296,21635,29980,82290,69798,36179,30233,90300,92094,57814,-1537,87,1773,144,166668,165653,150826,161194,162728,10.469896,10.812416,10.609428,11.103819,10.903958,9.568408,-0.633923,-0.091541,1.04102,1.107947,3.046282,2.924236,11.382396,9.536437,4.890036,12.490343,12.58272,7.814272,218247,3.049036,88.235294
115,40,3,7,5,0,Arkansas,Arkansas,3011490,3014348,3028443,3046404,3067732,2858,14095,17961,21328,8509,34927,36298,35566,8475,40180,40362,36473,34,-5253,-4064,-907,94,1346,3215,4096,2537,17806,18841,18106,2631,19152,22056,22202,193,196,-31,33,82490,82455,82617,84268,85120,11.55989,11.95026,11.634023,13.298491,13.288236,11.930713,-1.738601,-1.337976,-0.29669,0.44549,1.058463,1.339846,5.893303,6.202955,5.922668,6.338793,7.261417,7.262514,57290,1.902381,72.54902
191,40,4,9,6,0,California,California,39538212,39503200,39145060,39040616,38965193,-35012,-358140,-104444,-75423,103133,412506,423922,414120,74587,345181,318145,302704,28546,67325,105777,111416,1383,44116,126517,150982,-67932,-458862,-332785,-338371,-66549,-414746,-206268,-187389,2991,-10719,-3953,550,917958,897669,771170,886334,891995,10.489895,10.843981,10.617671,8.777842,8.138191,7.761063,1.712053,2.70579,2.856608,1.121856,3.236322,3.871045,-11.668713,-8.512685,-8.675533,-10.546858,-5.276363,-4.804488,-1197950,-3.029854,5.882353


In [21]:
gdf_counties_and_stats = gdf_counties.merge(df_nm_county, 
                                            on = ['STATE', 'COUNTY'])

In order to create our choropleth map, we'll need to determine which field to use as the key for matching our data to our county outlines. In our case, we'll want to use the 'County/State' column as the key, since it contains unique county names found in both the demographics and county outlines datasets.

However, if we attempt to pass 'County/State' to the `key_on` argument within the following function, we'll get the following error:

``key_on `'County/State'` not found in GeoJSON.``

That may strike you as strange: 'County/State' is definitely one of the columns in our GeoDataFrame. However, note that this code is referring to the *GeoJSON* version of this GeoDataFrame. The start of this file appears as follows:

In [22]:
gdf_counties.to_json()[0:1000]
# to_json() is listed at https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.html#geopandas.GeoDataFrame
# 0:1000 limits the output of this string to the first thousand characters.

'{"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"County/State": "Baldwin County, AL", "STATE": 1, "COUNTY": 3, "COUNTYNS": "00161527", "GEOIDFQ": "0500000US01003", "GEOID": "01003", "NAME": "Baldwin", "NAMELSAD": "Baldwin County", "STUSPS": "AL", "STATE_NAME": "Alabama", "LSAD": "06", "ALAND": 4117725048, "AWATER": 1132887203}, "geometry": {"type": "Polygon", "coordinates": [[[-88.02858, 30.226763], [-87.966847, 30.235618], [-87.936041, 30.261469], [-87.893201, 30.239237], [-87.78775, 30.254244], [-87.768003, 30.262455], [-87.747171, 30.287768], [-87.809266, 30.332702], [-87.82988, 30.353809], [-87.837368, 30.376334], [-87.865017, 30.38345], [-87.906343, 30.40938], [-87.921278, 30.473848], [-87.936969, 30.483105], [-87.901711, 30.550879], [-87.914956, 30.585893], [-87.91253, 30.615795], [-87.919346, 30.63606], [-87.936717, 30.657432], [-87.955989, 30.658862], [-88.0089012750009, 30.6834340138575], [-88.00166, 30.704619], [-88.026319, 30.753358]

Note that the 'County/State' column sits inside the 'properties' object of each 'Feature' entry. Meanwhile, the choropleth example [found in the Folium documentation](https://python-visualization.github.io/folium/latest/user_guide/geojson/choropleth.html) uses 'feature.id' as the key_on value. Here's what the beginning of [*that* GeoJSON file](https://raw.githubusercontent.com/python-visualization/folium-example-data/main/us_states.json) looks like:

{"type":"FeatureCollection","features":[

{"type":"Feature","id":"AL","properties":{"name":"Alabama"},"geometry":{"type":"Polygon","coordinates":[[[-87.359296,35.00118],

Note that 'id' is the field that identifies each state (as it's a state-level choropleth map), and that it sits directly within each feature, alongside 'properties' rather than inside it. In our GeoJSON file, by contrast, our county column is nested *within* 'properties'. Therefore, we can make an educated guess that 'feature.properties.County/State' is the right value to pass to the key_on argument, and thankfully this guess is correct!

([Folium's API reference](https://python-visualization.github.io/folium/latest/reference.html) also hints at this solution by listing 'feature.properties.statename' as an example of a key_on value.)

In my experience, figuring out what value to use for the 'key_on' argument is one of the trickier parts of creating choropleth maps. If you're having trouble with this step, my advice would be to always look at the GeoJSON version of your GeoDataFrame, see which keys precede your key field, and then incorporate those keys ('feature', 'properties', etc.) into your entry.
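
One way to double-check a key_on value before building a map is to walk through the GeoJSON yourself. The sketch below does this with Python's built-in json module on a trimmed-down, hand-written feature shaped like the output shown earlier. `resolve_key` is a hypothetical helper that mimics how a dotted path gets followed; it is not Folium's own code.

```python
import json

# A trimmed-down, hand-written feature shaped like the output above
geojson_text = '''{"type": "FeatureCollection", "features": [
  {"id": "0", "type": "Feature",
   "properties": {"County/State": "Baldwin County, AL", "STATE": 1},
   "geometry": null}]}'''

def resolve_key(feature, key_on):
    """Follow a dotted path such as 'feature.properties.NAME' through
    one GeoJSON feature. (Hypothetical helper for illustration.)"""
    parts = key_on.split('.')
    if parts[0] != 'feature':
        raise ValueError("key_on must start with 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]
    return value

first_feature = json.loads(geojson_text)['features'][0]
print(resolve_key(first_feature, 'feature.id'))
print(resolve_key(first_feature, 'feature.properties.County/State'))
# Listing the keys within 'properties' shows which fields are available:
print(list(first_feature['properties'].keys()))
```

With a real GeoDataFrame, you could replace `geojson_text` with the output of `to_json()` and run the same checks.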

# Part 2: Creating a choropleth map with interactive tooltips

We can now create a choropleth map that shows net migration data by county. However, if we feed our entire gdf_counties_and_stats DataFrame into the mapping code, the resulting HTML file will be much larger than necessary, as it will include a number of columns that the map doesn't use. Therefore, we'll use a condensed version of gdf_counties_and_stats as a basis for this map instead.

In [23]:
shape_col = 'geometry'
county_boundary_name_col = 'County/State'
state_boundary_name_col = 'State'
data_col = total_nm_rate_col

gdf_counties_and_stats_condensed = gdf_counties_and_stats[
[shape_col, county_boundary_name_col, data_col, percentile_col]].copy()

gdf_counties_and_stats_condensed.head()

| | geometry | County/State | 2020-2023 Total Domestic Net Migration as % of 2020 Population | 2020-2023 Domestic Net Migration Percentile |
|---|---|---|---|---|
| 0 | POLYGON ((-88.02858 30.22676, -87.96685 30.235... | Baldwin County, AL | 9.675624 | 96.660305 |
| 1 | POLYGON ((-85.71209 31.19727, -85.69231 31.210... | Houston County, AL | 1.47292 | 58.842239 |
| 2 | POLYGON ((-85.74803 31.61918, -85.73117 31.629... | Barbour County, AL | -1.216854 | 22.073791 |
| 3 | POLYGON ((-88.42145 32.30868, -88.34043 32.991... | Sumter County, AL | -3.880428 | 6.424936 |
| 4 | POLYGON ((-94.04343 33.55158, -94.00037 33.564... | Miller County, AR | 0.523486 | 46.024173 |


## Creating our bin ranges:

By default, folium.Choropleth() will group regions into 6 colored bins; the ranges of these bins will stretch, in equal segments, from the lowest to the highest value. However, because some counties have very high or low net migration rates, this approach will cause our map to look relatively dull, since most counties will fall into one of the middle bins.

Therefore, a better setup is to base our bin ranges on percentiles, as this will place a roughly equal number of counties in each bin. (The bins will no longer be equally wide, but that's an acceptable tradeoff here.) We can easily determine these bins using pandas' quantile() function; passing 11 evenly spaced quantiles (from 0 to 1) produces the 11 edges that define 10 bins:

In [24]:
bins = list(gdf_counties_and_stats_condensed[
            total_nm_rate_col].quantile(
    np.linspace(0, 1, 11)))
bins

[-26.57727350868145,
 -2.8124548451811697,
 -1.3842847014410304,
 -0.57942086570913,
 0.10509734758955473,
 0.803350933220706,
 1.5648502030833924,
 2.497913057064161,
 3.8660584451404807,
 5.966706571952004,
 25.59683979436641]
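To make the difference between equal-width and percentile-based bins more concrete, here is a small standard-library sketch. The `rates` list is made-up sample data (not the actual county values), `count_per_bin()` is a hypothetical helper, and `statistics.quantiles()` stands in for the pandas quantile() call used above.

```python
import bisect
import statistics

# Made-up sample: most values sit near zero, with two extreme outliers.
rates = [-26.0, -3.0, -2.0, -1.0, -0.5, 0.0,
         0.5, 1.0, 2.0, 3.0, 4.0, 25.0]

def count_per_bin(values, edges):
    """Count how many values fall between each pair of adjacent edges.
    (Hypothetical helper for illustration.)"""
    counts = [0] * (len(edges) - 1)
    for value in values:
        # bisect_right locates the bin; the min() call keeps the
        # maximum value inside the final bin.
        i = min(bisect.bisect_right(edges, value) - 1, len(counts) - 1)
        counts[i] += 1
    return counts

n_bins = 4
lowest, highest = min(rates), max(rates)

# Equal-width edges: the full range is split into same-sized segments.
equal_width_edges = [lowest + (highest - lowest) * i / n_bins
                     for i in range(n_bins + 1)]

# Quantile edges: the cut points are chosen so that each bin holds
# about the same number of values.
quantile_edges = ([lowest]
                  + statistics.quantiles(rates, n=n_bins,
                                         method='inclusive')
                  + [highest])

equal_width_counts = count_per_bin(rates, equal_width_edges)
quantile_counts = count_per_bin(rates, quantile_edges)
print(equal_width_counts)  # [1, 3, 7, 1]
print(quantile_counts)     # [3, 3, 3, 3]
```

With equal-width edges, the outliers stretch the range so far that most values crowd into one middle bin; the quantile edges spread them evenly.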

In [25]:
# The following code is based on: 
# https://python-visualization.github.io/folium/latest/user_guide/geojson/choropleth.html
# Note: the 'RdYlGn' (red-yellow-green) fill color 
# argument derives from https://colorbrewer2.org/#type=diverging&scheme=RdYlGn&n=3 ,
# a helpful interactive site referenced within the Folium API reference
# (https://python-visualization.github.io/folium/latest/reference.html#module-folium.features).
# This color palette isn't listed as one of the available palettes within the choropleth
# documentation, but it still works fine.


m = folium.Map([38, -95], zoom_start=6)

folium.Choropleth(
    geo_data=gdf_counties_and_stats_condensed,
    data=gdf_counties_and_stats_condensed,
    bins = bins,
    columns=[county_boundary_name_col, total_nm_rate_col],
    key_on=f'feature.properties.{county_boundary_name_col}',
    fill_color = 'RdYlGn',
    fill_opacity = 0.6 # fill_opacity is Choropleth's documented argument name
).add_to(m)

# m

<folium.features.Choropleth at 0x1f47ff7b920>

## Adding in tooltips

We have a choropleth map in place, but it's missing something very important: a *tooltip* that will display both the name of a county and its net migration data when the user hovers over it. These tooltips aren't present in the choropleth code by default, but [Folium's documentation](https://python-visualization.github.io/folium/latest/user_guide/geojson/geojson_popup_and_tooltip.html#) demonstrates how to easily add them in:

In [26]:
tooltip = folium.GeoJsonTooltip(
    fields=[county_boundary_name_col, total_nm_rate_col, 
            percentile_col],
    aliases=["County:", 
             f"{first_nm_year}-{last_nm_year} Net Migration \
as % of {first_nm_year} Population:", 'Percentile:'],
    localize=True,
    sticky=False,
    labels=True,
    style="""
        background-color: #FFFFFF;
        border: 1px;
        border-radius: 1px;
        box-shadow: 1px;
    """,
    max_width=800
)

# We'll now add these tooltips to our map by linking them
# to an invisible GeoJson object. (The county outlines were
# already present within the map as a result of the choropleth
# mapping code, so there's no need to add in those outlines a 
# second time here.)

g = folium.GeoJson(
    gdf_counties_and_stats_condensed,
    style_function=lambda x: {
        "fillOpacity": 0,
        "weight":0,
    },
    tooltip=tooltip
).add_to(m)

# We'll now go ahead and save the map:

m.save(f'maps/net_migration_rate_county_{first_nm_year}-{last_nm_year}.html')

# m

# Saving a screenshot of this map

The Plotly charting library that we explored earlier made saving static copies of charts simple: we could just call write_image(). Folium doesn't yet have this sort of functionality, but thankfully, creating static copies of its map files is still relatively simple. We'll just need to run some Selenium code that opens a web browser; navigates to our map; and then saves a screenshot of the map to a PNG file. The following code shows how to accomplish these steps.

In [27]:
options = webdriver.ChromeOptions()
# Source: https://www.selenium.dev/documentation/webdriver/browsers/chrome/
options.add_argument('--window-size=3000,1688') # I found that this window
# size, along with a starting zoom of 6 within our mapping code,
# created a relatively detailed map of the contiguous 48 US states. 
# If you'd like to create an even more detailed map, consider setting 
# your starting zoom to 7 and your window size to 6000,3375.
options.add_argument('--headless') # I found that this addition 
# (which prevents the Selenium-driven browser from displaying on your computer) 
# was necessary for getting 4K screenshots to actually display as a 3840x2160
# file. Without this line, the screenshot would get rendered
# as a 3814 x 1868-pixel file.
# Source of the above two lines:  
# https://www.selenium.dev/documentation/webdriver/browsers/chrome/
# and
# https://github.com/GoogleChrome/chrome-launcher/blob/main/docs/chrome-flags-for-tools.md

# Launching our Selenium driver:
driver = webdriver.Chrome(options=options) 
# Source: https://www.selenium.dev/documentation/webdriver/browsers/chrome/

# Navigating to our map:
from pathlib import Path
map_path = Path(os.getcwd()) / 'maps' / (
    f'net_migration_rate_county_{first_nm_year}-{last_nm_year}.html')
driver.get(map_path.as_uri())
# Source: https://www.selenium.dev/documentation/
# A relative path, by itself, wouldn't be compatible with Selenium, so we
# build an absolute path from os.getcwd() and convert it to a file:// URL
# via as_uri(). (A bare filesystem path may be rejected as an invalid URL 
# on some platforms; pathlib also handles path separators on Windows.)
time.sleep(3) # Helps ensure that the browser has enough time to download
# map contents from the tile provider
# Taking our screenshot and then saving it as a PNG image:
driver.get_screenshot_as_file(
    f'map_screenshots/net_migration_rate_county_{first_nm_year}-{last_nm_year}.png')
# Source: 
# https://selenium-python.readthedocs.io/api.html#selenium.webdriver.remote.webdriver.WebDriver.get_screenshot_as_file

# Exiting out of our webdriver:
driver.quit()
# Source: https://www.selenium.dev/documentation/

We now have a choropleth map that shows additional information when the user hovers over a given county. That's great! However, although this means of creating our map is relatively simple (and a good introduction to the use of Folium), it does have one significant flaw: it results in unnecessarily large file sizes.

On my computer, the first version of the map (with no tooltips) was around 6.04 MB in size. Once tooltips were added in, this map increased to 12.0 MB--a near doubling in size. This increase occurred because we added many items to the underlying HTML file twice, including our county boundaries. 

Now, 12 MB isn't that large of a file, but the larger your maps get, the harder (and potentially more expensive) it is to host them online, and the longer they'll take to load. (For instance, Google Sites currently limits single embedded HTML files to 15 MB in size, so our 12-megabyte map is getting pretty close to that limit.) If you're working with zip code-level data, this doubling in size could become a major issue. 

Therefore, I'll now demonstrate a method that bypasses the Choropleth map class and instead allows the same set of borders to be used for tooltips and region colors. The map created with this method will be only 6.04 MB in size (just like the first one), yet it will still have the same interactive tooltips as the map we just created.
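If you'd like to compare map sizes on your own machine, a small helper along these lines could do the job. This is only a sketch: `file_size_mb()` and `fits_within_limit()` are hypothetical names, the helper assumes that 1 MB equals 1,000,000 bytes, and the demonstration uses a throwaway file rather than an actual map.

```python
import os
import tempfile

def file_size_mb(path):
    """Return a file's size in megabytes, assuming that 1 MB equals
    1,000,000 bytes. (Hypothetical helper for illustration.)"""
    return os.path.getsize(path) / 1_000_000

def fits_within_limit(path, limit_mb):
    """Report whether a file is at or under a given size limit."""
    return file_size_mb(path) <= limit_mb

# Demonstration with a throwaway 2,500,000-byte file standing in
# for a saved map:
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo_map.html')
    with open(demo_path, 'wb') as f:
        f.write(b'x' * 2_500_000)
    demo_size = file_size_mb(demo_path)
    demo_fits = fits_within_limit(demo_path, limit_mb=15)

print(demo_size, demo_fits)  # 2.5 True
```

To check a real map, you could pass the path that was given to m.save() along with whatever limit your hosting provider enforces.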

# Part 3: Creating a more efficient version of this map

In order to create a choropleth map without relying on Folium's Choropleth class, we'll need to develop a color map that we can use to assign different colors to different net migration values. I borrowed from the Choropleth source code in order to determine how best to accomplish this.

In [28]:
color_range = color_brewer('RdYlBu', n = 10)
# Based on Choropleth() definition within
# https://github.com/python-visualization/folium/blob/main/folium/features.py

# To reverse this color scheme, '_r' can be added to the end of the first
# argument (e.g. 'RdYlBu_r'). 
# Source: https://github.com/python-visualization/branca/blob/main/branca/utilities.py

# The following output shows the 10 colors (in hexadecimal format) 
# that comprise this color range:
color_range

['#a50026',
 '#d73027',
 '#f46d43',
 '#fdae61',
 '#fee090',
 '#e0f3f8',
 '#abd9e9',
 '#74add1',
 '#4575b4',
 '#313695']

### Using this color range to initialize our colormap:

Note that passing our quantile-based bins to the index argument allows the colormap to reference those bins when determining which colors to assign to which values.

In [29]:
stepped_cm = StepColormap(
    colors = color_range, 
    vmin = bins[0], vmax = bins[-1],
    index = bins)
# Based on the self.color_scale initialization within Folium's Choropleth() 
# source code (available at
# https://github.com/python-visualization/folium/blob/main/folium/features.py)

stepped_cm

We can apply this colormap to determine the colors corresponding to a given net migration value as follows:

In [30]:
stepped_cm(gdf_counties_and_stats_condensed.iloc[0][total_nm_rate_col])

'#313695ff'
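Conceptually, a stepped colormap finds the bin that contains a value and returns that bin's color. The following pure-Python sketch mimics this lookup using rounded copies of our bin edges and the color list shown above. `step_color()` is a hypothetical helper, not branca's implementation, so details such as the handling of values that fall exactly on an edge (or the 'ff' opacity suffix in the output above) may differ from StepColormap's behavior.

```python
import bisect

# Rounded copies of the 11 quantile-based bin edges shown earlier:
bin_edges = [-26.58, -2.81, -1.38, -0.58, 0.11, 0.80,
             1.56, 2.50, 3.87, 5.97, 25.60]

# The 10 'RdYlBu' colors returned by color_brewer() above:
colors = ['#a50026', '#d73027', '#f46d43', '#fdae61', '#fee090',
          '#e0f3f8', '#abd9e9', '#74add1', '#4575b4', '#313695']

def step_color(value, edges, colors):
    """Return the color of the bin that contains value.
    (Hypothetical helper for illustration; not part of branca.)"""
    i = bisect.bisect_right(edges, value) - 1
    # Clamp the index so that values at or beyond the outer edges
    # still receive the first or last color.
    i = max(0, min(i, len(colors) - 1))
    return colors[i]

baldwin_color = step_color(9.675624, bin_edges, colors)
barbour_color = step_color(-1.216854, bin_edges, colors)
print(baldwin_color, barbour_color)  # #313695 #f46d43
```

Baldwin County's rate falls in the top bin, so it receives the darkest blue, which matches the StepColormap output above (minus the opacity suffix).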

## Creating our tooltips and choropleth map simultaneously via folium.GeoJson:

In [31]:
# Much of this cell is based on the sample code found at 
# https://python-visualization.github.io/folium/latest/user_guide/geojson/geojson_popup_and_tooltip.html# 
# and https://python-visualization.github.io/folium/latest/user_guide/geojson/geojson.html .

m = folium.Map([40, -95], zoom_start=5)

tooltip = folium.GeoJsonTooltip(
    fields=[county_boundary_name_col, total_nm_rate_col,
           percentile_col],
    aliases=["County:", 
             f"{first_nm_year}-{last_nm_year} Net Migration \
as % of {first_nm_year} Population:", 'Percentile:'],
    localize=True,
    sticky=False,
    labels=True,
    style="""
        background-color: #FFFFFF;
        border: 1px;
        border-radius: 1px;
        box-shadow: 1px;
    """,
    max_width=800
)

# We'll now link these tooltips to a GeoJson object. Unlike the
# invisible GeoJson layer in Part 2, this object is visible: its
# style_function colors each county using stepped_cm, so a single
# set of county boundaries handles both the colors and the tooltips.

g = folium.GeoJson(
    gdf_counties_and_stats_condensed,
    style_function=lambda x: {
        "fillColor": stepped_cm(
            x["properties"][total_nm_rate_col]),
        "fillOpacity": 0.6,
        "weight":1,
        "color":"black"
    },
    tooltip=tooltip
).add_to(m)
# The folium.GeoJson overview at
# https://python-visualization.github.io/folium/latest/user_guide/geojson/geojson.html
# contributed to this code as well.
# Note that we need to add ["properties"] in between x and 
# [total_nm_rate_col], likely because gdf_counties_and_stats_condensed
# is being interpreted as a GeoJSON object. I based this off of the
# "if "e" in feature["properties"]["name"].lower()" line within
# the above link.

stepped_cm.add_to(m)


m.save(f'maps/net_migration_rate_county_{first_nm_year}-{last_nm_year}.html')

# m

## Converting this second mapping approach into a function

In order to make this code easier to apply to other datasets, I converted it into a function (`cptt()`) by replacing hardcoded values with variables. (This is often how I create functions: I start with a working example that uses built-in values, then make that example more flexible by substituting variable names for those values.) **This function is located within choropleth_map_functions.py; I placed it there so that it could be more easily called within different notebooks.**

I also added code to this function that allows a column's data to be displayed directly within each boundary. This code works much better for state-level data (which we'll graph later) than for county-level data, since county shapes are too small to accommodate these labels at a nationwide zoom level.

I've found that, if I try to use the same zoom level for both an HTML file and a screenshot, one of the maps will end up showing too much or too little zoom. Therefore, it can be useful to call cptt() twice--once to create a PNG version of a map, and again to create an HTML version. `create_map_and_screenshot()`, also available within choropleth_map_functions.py, makes it easier to perform these two function calls. 

This method is slower than one that doesn't require creating and saving the map twice. Ideally, we could update the zoom level of the map within our code; this would allow us to save an HTML copy of the map for interactive viewing, then update its zoom to better accommodate a screenshot. Even better, we could try using Selenium to adjust the zoom, which would eliminate the need to create two separate HTML maps.

I tried out some code for both of these options, but neither worked successfully. Therefore, I'm sticking for now to the slower, yet reliable approach shown within `create_map_and_screenshot()`.
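As a rough illustration of the hardcoded-values-to-parameters approach described above, the following sketch shows how the tooltip fields and aliases from the earlier cells might be assembled from function arguments. `build_tooltip_lists()` is a hypothetical helper written for this example; it is not taken from choropleth_map_functions.py, and the actual `cptt()` code may be organized quite differently. The argument names simply mirror those used in the function calls below.

```python
def build_tooltip_lists(boundary_name_col, data_col,
                        boundary_name_alias, data_col_alias,
                        tooltip_variable_list=None,
                        tooltip_alias_list=None):
    """Combine the required tooltip columns with any optional extras.
    (Hypothetical helper for illustration; not the actual cptt() code.)"""
    extra_fields = list(tooltip_variable_list or [])
    extra_aliases = list(tooltip_alias_list or [])
    if len(extra_fields) != len(extra_aliases):
        raise ValueError(
            "Each extra tooltip column needs exactly one alias.")
    fields = [boundary_name_col, data_col] + extra_fields
    aliases = [boundary_name_alias, data_col_alias] + extra_aliases
    return fields, aliases

# Values that were hardcoded in the earlier tooltip code now arrive
# as arguments, so the same function can serve other datasets:
fields, aliases = build_tooltip_lists(
    boundary_name_col='County/State',
    data_col='Net Migration Rate',  # Placeholder column name
    boundary_name_alias='County:',
    data_col_alias='Net Migration Rate:',
    tooltip_variable_list=['Percentile'],  # Placeholder column name
    tooltip_alias_list=['Percentile:'])
print(fields)
print(aliases)
```

The resulting lists could then be passed to the fields and aliases arguments of folium.GeoJsonTooltip().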

In [32]:
# Importing the two functions described above:
from choropleth_map_functions import cptt, create_map_and_screenshot

## Applying these functions to create HTML and PNG versions of our net migration by county map with optimized zoom levels:

In [33]:
# Defining data_col_alias outside of the function call to make the latter
# a bit more readable:

data_col_alias = f"{first_nm_year}-{last_nm_year} Net Migration \
as % of {first_nm_year} Population:"

map_filename = f'net_migration_rate_county_{first_nm_year}-{last_nm_year}'

# Because cptt() automatically condenses the GeoDataFrame to include
# only those columns necessary for creating the map and tooltips,
# we can pass the original gdf_counties_and_stats GeoDataFrame as our 
# gdf argument.

m = create_map_and_screenshot(
    starting_lat = 38, starting_lon = -95, 
    html_zoom_start = 5,
    screenshot_zoom_start = 6,
    gdf = gdf_counties_and_stats, 
    data_col = total_nm_rate_col, boundary_name_col = county_boundary_name_col,
    data_col_alias = data_col_alias, boundary_name_alias = 'County:',
    tooltip_variable_list = [percentile_col], 
         tooltip_alias_list = ['Percentile:'],
    percentile_bins = True, bin_count = 10, color_scheme = 'RdYlBu',
        map_filename = map_filename,
        html_map_folder = os.getcwd()+'/maps',
        png_map_folder = os.getcwd()+'/map_screenshots')

['County/State', '2020-2023 Total Domestic Net Migration as % of 2020 Population', '2020-2023 Domestic Net Migration Percentile'] ['County:', '2020-2023 Net Migration as % of 2020 Population:', 'Percentile:']
Generating screenshot.
Removed HTML copy of map.
['County/State', '2020-2023 Total Domestic Net Migration as % of 2020 Population', '2020-2023 Domestic Net Migration Percentile'] ['County:', '2020-2023 Net Migration as % of 2020 Population:', 'Percentile:']


## Calling create_map_and_screenshot() again to create state-level net migration maps in HTML and PNG form:

For demonstration purposes, I set the 'tiles' argument to None in order to create a clean and simple map without any underlying tile data. Since there's nothing to show under the choropleth boundaries, we can then set choropleth_opacity to 1, resulting in a more vivid map.

I also chose to add net migration rates as boundary labels. In order to make the labels a bit easier to read, I switched the color scheme from RdYlBu to RdYlGn.

In [34]:
# We'll reuse the data_col_alias that we created for our county-level
# map, so only map_filename needs to be redefined here:

map_filename = f'net_migration_rate_state_{first_nm_year}-{last_nm_year}'

m = create_map_and_screenshot(
    starting_lat = 38, starting_lon = -95, 
    html_zoom_start = 5,
    screenshot_zoom_start= 6, 
    gdf = gdf_states_and_stats, 
    data_col = total_nm_rate_col, 
    boundary_name_col = state_boundary_name_col,
    data_col_alias = data_col_alias, boundary_name_alias = 'State:',
    tooltip_variable_list = [percentile_col], 
         tooltip_alias_list = ['Percentile:'],
    percentile_bins = True, bin_count = 10, color_scheme = 'RdYlGn',
        map_filename = map_filename, tiles = None,
        choropleth_opacity = 1,
        add_boundary_labels = True, 
        boundary_label_col = total_nm_rate_col,
        round_boundary_labels = True,
        boundary_label_round_val = 2,
        html_map_folder = os.getcwd()+'/maps',
        png_map_folder = os.getcwd()+'/map_screenshots')
 
# m

['State', '2020-2023 Total Domestic Net Migration as % of 2020 Population', '2020-2023 Domestic Net Migration Percentile'] ['State:', '2020-2023 Net Migration as % of 2020 Population:', 'Percentile:']
Generating screenshot.
Removed HTML copy of map.
['State', '2020-2023 Total Domestic Net Migration as % of 2020 Population', '2020-2023 Domestic Net Migration Percentile'] ['State:', '2020-2023 Net Migration as % of 2020 Population:', 'Percentile:']


That's it for this notebook! I would have liked to keep this section simpler, but I hope that you'll find the functions shown at the end useful for your own mapping projects as well.

In [35]:
program_end_time = time.time()
run_time = program_end_time - program_start_time
print(f"Finished running program in {round(run_time, 3)} seconds.")

Finished running program in 47.485 seconds.
