# Texas County-to-County Migration Flows with pytidycensus

This notebook demonstrates how to visualize migration flows using data retrieved from the Census Bureau's migration flows API via pytidycensus.

We'll use the `get_flows()` function to retrieve Texas county-to-county migration data for 2018, and visualize it using lonboard's BrushingExtension for interactive exploration.

This approach mirrors the [deck.gl brushing extension example](https://deck.gl/examples/brushing-extension) but uses Census API data directly through pytidycensus.

## Dependencies

You'll need the following packages:
```bash
pip install pytidycensus geopandas lonboard matplotlib numpy pandas pyarrow shapely
```

## Imports

In [1]:
import geopandas as gpd
import numpy as np
import pandas as pd
import pyarrow as pa
import shapely
from matplotlib.colors import Normalize

from lonboard import Map, ScatterplotLayer
from lonboard.experimental import ArcLayer
from lonboard.layer_extension import BrushingExtension
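# Note: this helper lives in a private lonboard module (leading underscore),
# so the import path may change between releases
from lonboard._geoarrow.geopandas_interop import geopandas_to_geoarrow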
import pytidycensus as tc

## Retrieve Migration Flows Data

Use pytidycensus to get Texas county-to-county migration flows for 2018.
The `geometry=True` parameter ensures we get centroid coordinates for mapping,
and `output="wide"` gives us a format suitable for building origin-destination pairs.

In [2]:
# Retrieve Texas county migration flows with geometry
flows_geo = tc.get_flows(
    geography="county",
    state="TX",
    year=2018,
    geometry=True,
    output="wide"
)

# Display the first few rows to understand the structure
flows_geo.head()



| | GEOID1 | GEOID2 | FULL1_NAME | FULL2_NAME | MOVEDIN | MOVEDIN_M | MOVEDOUT | MOVEDOUT_M | MOVEDNET | MOVEDNET_M | centroid1 | centroid2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48001 | | Anderson County, Texas | Africa | 38 | 52.0 | | | | | POINT (-95.65236 31.81326) | |
| 1 | 48001 | | Anderson County, Texas | Asia | 4 | 6.0 | | | | | POINT (-95.65236 31.81326) | |
| 2 | 48001 | | Anderson County, Texas | Central America | 2 | 3.0 | | | | | POINT (-95.65236 31.81326) | |
| 3 | 48001 | 1089.0 | Anderson County, Texas | Madison County, Alabama | 13 | 20.0 | 0.0 | 28.0 | 13.0 | 20.0 | POINT (-95.65236 31.81326) | POINT (-86.55022579567611 34.762959383197796) |
| 4 | 48001 | 2016.0 | Anderson County, Texas | Aleutians West Census Area, Alaska | 0 | 31.0 | 7.0 | 9.0 | -7.0 | 9.0 | POINT (-95.65236 31.81326) | POINT (-173.77316055538125 52.983281670565866) |


## Explore the Data Structure

Let's examine the columns and understand what we're working with:

In [3]:
print("Columns:", flows_geo.columns.tolist())
print("\nShape:", flows_geo.shape)
print("\nSample record:")
flows_geo.iloc[0]

Columns: ['GEOID1', 'GEOID2', 'FULL1_NAME', 'FULL2_NAME', 'MOVEDIN', 'MOVEDIN_M', 'MOVEDOUT', 'MOVEDOUT_M', 'MOVEDNET', 'MOVEDNET_M', 'centroid1', 'centroid2']

Shape: (36641, 12)

Sample record:


GEOID1                                                48001
GEOID2                                                 None
FULL1_NAME                           Anderson County, Texas
FULL2_NAME                                           Africa
MOVEDIN                                                  38
MOVEDIN_M                                              52.0
MOVEDOUT                                                NaN
MOVEDOUT_M                                              NaN
MOVEDNET                                                NaN
MOVEDNET_M                                              NaN
centroid1     POINT (-95.65236082688608 31.813262871603268)
centroid2                                               NaN
Name: 0, dtype: object
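
The sample record shows that some partners are world regions rather than counties: they have no `GEOID2` and no `centroid2`. A minimal sketch of separating the two kinds of rows follows. It uses a hand-built stand-in for `flows_geo`; the toy frame `toy_flows` and the assumption that `GEOID2` is a zero-padded five-character string are illustrative, not taken from the pytidycensus output.

```python
import pandas as pd

# Toy stand-in for flows_geo; names and MOVEDIN values copied from the rows
# shown above, with GEOID2 ASSUMED to be a zero-padded five-character string.
toy_flows = pd.DataFrame(
    {
        "GEOID1": ["48001", "48001", "48001", "48001"],
        "GEOID2": [None, None, "01089", "02016"],
        "FULL2_NAME": [
            "Africa",
            "Asia",
            "Madison County, Alabama",
            "Aleutians West Census Area, Alaska",
        ],
        "MOVEDIN": [38, 4, 13, 0],
    }
)

# Region-level partners have no GEOID2; county-level partners do.
is_county = toy_flows["GEOID2"].notna()
region_rows = toy_flows[~is_county]
county_rows = toy_flows[is_county]

# State FIPS code of each county-level partner (first two characters).
partner_states = county_rows["GEOID2"].str[:2].tolist()
print(len(region_rows), len(county_rows), partner_states)
```

The same mask is what the Texas-to-Texas filter in the next section builds on, with the extra condition that the state prefix is `48`.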

## Extract Centroids from Geometry

We need to extract the centroid coordinates for each county to use as positions for our visualization:

In [4]:
# Extract centroids from the centroid1 and centroid2 columns
# Filter to Texas-to-Texas flows: the query requested state="TX" for GEOID1,
# so only GEOID2 needs to be checked for the '48' state prefix
tx_flows = flows_geo[
    (flows_geo['GEOID2'].notna()) &
    (flows_geo['GEOID2'].str.startswith('48', na=False)) &
    (flows_geo['centroid1'].notna()) &
    (flows_geo['centroid2'].notna())
].copy()

print(f"Filtered to {len(tx_flows)} Texas-to-Texas flow records")

# Create a lookup dictionary for quick access to county info by GEOID
# We'll use both GEOID1 and GEOID2 to build a complete county list
county_lookup = {}

for idx, row in tx_flows.iterrows():
    # Add origin county
    if row['GEOID1'] not in county_lookup and row['centroid1'] is not None:
        county_lookup[row['GEOID1']] = {
            'name': row['FULL1_NAME'],
            'centroid': [row['centroid1'].x, row['centroid1'].y, 0],
            'geoid': row['GEOID1']
        }

    # Add destination county
    if row['GEOID2'] not in county_lookup and row['centroid2'] is not None:
        county_lookup[row['GEOID2']] = {
            'name': row['FULL2_NAME'],
            'centroid': [row['centroid2'].x, row['centroid2'].y, 0],
            'geoid': row['GEOID2']
        }

print(f"Loaded {len(county_lookup)} unique Texas counties")

Filtered to 14206 Texas-to-Texas flow records
Loaded 253 unique Texas counties


## Build Origin-Destination Arrays

Now we'll process the flows data to create arrays for:
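- **arcs**: one connection per county pair, drawn from the county that lost people to the county that gained them
- **sources**: one point per partner county for each flow that passes the size threshold
- **targets**: one point per Texas county, carrying its total gain, total loss, and net migration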

This mirrors the structure from the original migration.ipynb example.

In [5]:
arcs = []
targets = []
sources = []
pairs = {}

# Group by origin county to calculate flows
for geoid1, group in tx_flows.groupby('GEOID1'):
    if geoid1 not in county_lookup:
        continue
    
    origin_county = county_lookup[geoid1]
    origin_centroid = origin_county['centroid']
    
    total_value = {
        'gain': 0,
        'loss': 0,
    }
    
    # Process each destination county
    for idx, row in group.iterrows():
        geoid2 = row['GEOID2']
        
        # Skip if destination is not in lookup or is the same county
        if geoid2 not in county_lookup or geoid2 == geoid1:
            continue
        
        destination_county = county_lookup[geoid2]
        destination_centroid = destination_county['centroid']
        
        # Get the migration flow value
        # MOVEDNET = MOVEDIN - MOVEDOUT (net migration into GEOID1 from GEOID2)
        # Use MOVEDNET if available, otherwise calculate it
        if pd.notna(row.get('MOVEDNET')):
            value = row['MOVEDNET']
        else:
            movedin = row.get('MOVEDIN', 0) if pd.notna(row.get('MOVEDIN')) else 0
            movedout = row.get('MOVEDOUT', 0) if pd.notna(row.get('MOVEDOUT')) else 0
            value = movedin - movedout
        
        if value > 0:
            total_value['gain'] += value
        else:
            total_value['loss'] += value
        
        # Filter out small flows to reduce clutter
        if abs(value) < 50:
            continue
        
        # Create unique pair key to avoid duplicate arcs
        pair_key = '-'.join(sorted([geoid1, geoid2]))
        gain = np.sign(value)
        
        # Add source point
        sources.append({
            'position': destination_centroid,
            'target': origin_centroid,
            'name': destination_county['name'],
            'radius': 3,
            'gain': -gain,
        })
        
        # Eliminate duplicate arcs
        if pair_key in pairs:
            continue
        
        pairs[pair_key] = True
        
        # Add arc based on direction of flow
        if gain > 0:
            arcs.append({
                'target': origin_centroid,
                'source': destination_centroid,
                'value': value,
            })
        else:
            arcs.append({
                'target': destination_centroid,
                'source': origin_centroid,
                'value': value,
            })
    
    # Add target point
    targets.append({
        **total_value,
        'position': [origin_centroid[0], origin_centroid[1], 10],
        'net': total_value['gain'] + total_value['loss'],
        'name': origin_county['name'],
    })

# Sort targets by absolute net migration (largest first)
targets = sorted(targets, key=lambda d: abs(d['net']), reverse=True)

# Handle case where there are no targets
if len(targets) > 0 and targets[0]['net'] != 0:
    normalizer = Normalize(0, abs(targets[0]['net']))
else:
    normalizer = Normalize(0, 1)

print(f"Created {len(arcs)} arcs, {len(sources)} sources, and {len(targets)} targets")

Created 1593 arcs, 3186 sources, and 253 targets
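
The arc deduplication in the loop relies on an order-independent key: sorting the two GEOIDs before joining them means that the pair A, B and the pair B, A map to the same string. A standalone sketch of that idea follows, with a hypothetical `build_arcs` helper and made-up flow values (it covers only the arc logic, not the `sources` and `targets` lists).

```python
def build_arcs(net_flows, threshold=50):
    """Return one arc per county pair from (geoid1, geoid2, net) records.

    net is the net migration into geoid1 from geoid2, as in MOVEDNET.
    """
    seen = set()
    arcs = []
    for geoid1, geoid2, net in net_flows:
        # Skip self-flows and flows below the clutter threshold.
        if geoid1 == geoid2 or abs(net) < threshold:
            continue
        # Sorting makes the key identical for (A, B) and (B, A).
        pair_key = "-".join(sorted([geoid1, geoid2]))
        if pair_key in seen:
            continue
        seen.add(pair_key)
        # A positive net means people moved from geoid2 into geoid1.
        if net > 0:
            arcs.append({"source": geoid2, "target": geoid1, "value": net})
        else:
            arcs.append({"source": geoid1, "target": geoid2, "value": net})
    return arcs


# Made-up values: one pair reported from both sides, plus one small flow.
records = [
    ("48113", "48121", -120),
    ("48121", "48113", 120),
    ("48113", "48201", 10),
]
arcs = build_arcs(records)
print(arcs)
```

Only the first record produces an arc: the second is the same pair seen from the other side, and the third falls below the threshold.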


## Define Color Scheme

We'll use red for outward migration and blue for inward migration:

In [6]:
# migrate out (red)
SOURCE_COLOR = [166, 3, 3]
# migrate in (blue)
TARGET_COLOR = [35, 181, 184]
# Combine into a single array to use as a lookup table
COLORS = np.vstack(
    [np.array(SOURCE_COLOR, dtype=np.uint8), np.array(TARGET_COLOR, dtype=np.uint8)],
)
SOURCE_LOOKUP = 0
TARGET_LOOKUP = 1
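
`COLORS` works as a lookup table because indexing a NumPy array with an integer array returns one row per index. A small sketch with made-up `gain` values shows how a column of signs becomes a per-point color array:

```python
import numpy as np

SOURCE_COLOR = [166, 3, 3]
TARGET_COLOR = [35, 181, 184]
COLORS = np.array([SOURCE_COLOR, TARGET_COLOR], dtype=np.uint8)

# Made-up signs: +1 for a gain, -1 for a loss.
gain = np.array([1, -1, -1, 1])

# Index 1 selects TARGET_COLOR, index 0 selects SOURCE_COLOR.
lookup = np.where(gain > 0, 1, 0)
fill_colors = COLORS[lookup]

print(fill_colors.shape)  # one RGB row per point
```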

## Configure Brushing Extension

The BrushingExtension allows interactive exploration by highlighting flows near the cursor:

In [7]:
brushing_extension = BrushingExtension()
brushing_radius = 100000  # 100 km brushing radius, expressed in meters
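
To get a feel for what a 100 km brush covers, it can help to compute the distance between two centroids. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km. The `haversine_m` helper is hypothetical, and deck.gl's own distance calculation may differ in detail. The two centroids are rounded from the first rows of `flows_geo`.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


brushing_radius = 100_000  # meters, same value as in the cell above

anderson_tx = (-95.65236, 31.81326)  # Anderson County, Texas
madison_al = (-86.55023, 34.76296)   # Madison County, Alabama

distance = haversine_m(*anderson_tx, *madison_al)
within_brush = distance <= brushing_radius
print(round(distance / 1000), "km;", "inside brush" if within_brush else "outside brush")
```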

## Create Source Layer

The source layer draws one filled point per partner-county flow, colored by whether the partner county gained or lost people in that exchange:

In [8]:
# Convert sources list to GeoDataFrame
if len(sources) == 0:
    print("Warning: No source points found. Check the flow data and threshold values.")
    source_layer = None
else:
    # Extract positions as a proper 2D array
    source_positions_list = [source['position'] for source in sources]
    source_arr = np.array(source_positions_list)
    
    # Create shapely points from x, y coordinates (ignoring z)
    source_positions = shapely.points(source_arr[:, 0], source_arr[:, 1])
    source_gdf = gpd.GeoDataFrame(
        pd.DataFrame.from_records(sources)[['name', 'radius', 'gain']],
        geometry=source_positions,
        crs='EPSG:4326',
    )

    # Apply colors based on gain/loss
    source_colors_lookup = np.where(source_gdf['gain'] > 0, TARGET_LOOKUP, SOURCE_LOOKUP)
    source_fill_colors = COLORS[source_colors_lookup]

    # Create ScatterplotLayer for sources
    source_layer = ScatterplotLayer.from_geopandas(
        source_gdf,
        get_fill_color=source_fill_colors,
        radius_scale=3000,
        pickable=False,
        extensions=[brushing_extension],
        brushing_radius=brushing_radius,
    )
    print(f"Created source layer with {len(source_gdf)} points")

Created source layer with 3186 points


## Create Target Layer

The target layer draws one ring per Texas county, with the ring color indicating the sign of that county's net migration:

In [9]:
# Convert targets list to GeoDataFrame
if len(targets) == 0:
    print("Warning: No target points found. Check the flow data and threshold values.")
    target_ring_layer = None
else:
    # Extract positions as a proper 2D array
    targets_positions_list = [target['position'] for target in targets]
    targets_arr = np.array(targets_positions_list)
    
    # Create shapely points from x, y, z coordinates (ignoring z for shapely)
    target_positions = shapely.points(targets_arr[:, 0], targets_arr[:, 1])
    target_gdf = gpd.GeoDataFrame(
        pd.DataFrame.from_records(targets)[['name', 'gain', 'loss', 'net']],
        geometry=target_positions,
        crs='EPSG:4326',
    )

    # Apply colors based on net migration
    target_line_colors_lookup = np.where(
        target_gdf['net'] > 0,
        TARGET_LOOKUP,
        SOURCE_LOOKUP,
    )
    target_line_colors = COLORS[target_line_colors_lookup]

    # Create ScatterplotLayer for targets (rings)
    target_ring_layer = ScatterplotLayer.from_geopandas(
        target_gdf,
        get_line_color=target_line_colors,
        radius_scale=4000,
        pickable=True,
        stroked=True,
        filled=False,
        line_width_min_pixels=2,
        extensions=[brushing_extension],
        brushing_radius=brushing_radius,
    )
    print(f"Created target layer with {len(target_gdf)} points")

Created target layer with 253 points
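
The target layer cell above does not pass `get_radius`, so every ring is drawn at the same size. One possible way to derive a per-county radius from net migration is sketched below. The `ring_radii` helper and its `min_radius` floor are hypothetical; the square root makes ring area, rather than ring radius, roughly proportional to the magnitude.

```python
import numpy as np


def ring_radii(net, min_radius=0.25):
    """Map net migration values to relative radii in [min_radius, 1]."""
    magnitude = np.abs(np.asarray(net, dtype=np.float64))
    if magnitude.size == 0:
        return magnitude.astype(np.float32)
    vmax = magnitude.max()
    if vmax == 0:
        vmax = 1.0  # avoid dividing by zero when every county nets to zero
    scaled = np.sqrt(magnitude / vmax)
    # The floor keeps counties with tiny net flows visible.
    return np.maximum(scaled, min_radius).astype(np.float32)


# Net values for Dallas and Denton from this notebook's results,
# plus a made-up county with zero net flow.
radii = ring_radii([-29694.0, 17871.0, 0.0])
print(radii)
```

The resulting array could then be passed as `get_radius=` when building `target_ring_layer`, assuming the installed lonboard version accepts a NumPy float array for that accessor; `radius_scale` would still multiply these relative values into meters.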


## Create Arc Layer with BrushingExtension

The arc layer draws curved lines showing migration flows between counties.

**Important:** To use the BrushingExtension with ArcLayer, we need to convert our data to GeoArrow format. GeoArrow is an Arrow-native encoding of geospatial data that includes proper extension metadata. This metadata is required for the BrushingExtension to work correctly.

We do this by:
1. Creating GeoDataFrames for source and target positions
2. Converting them to GeoArrow using `geopandas_to_geoarrow()`
3. Extracting the geometry columns for `get_source_position` and `get_target_position`

With BrushingExtension enabled, the arcs will respond to hover interactions on the map!

In [10]:
# Convert arcs to GeoDataFrames for proper GeoArrow conversion
if len(arcs) == 0:
    print("Warning: No arcs found. Check the flow data and threshold values.")
    arc_layer = None
else:
    # Create GeoDataFrames for source and target positions
    arc_source_points = [shapely.Point(arc['source'][:2]) for arc in arcs]
    arc_target_points = [shapely.Point(arc['target'][:2]) for arc in arcs]
    
    arc_source_gdf = gpd.GeoDataFrame(
        {'value': [arc['value'] for arc in arcs]},
        geometry=arc_source_points,
        crs='EPSG:4326'
    )
    
    arc_target_gdf = gpd.GeoDataFrame(
        {'value': [arc['value'] for arc in arcs]},
        geometry=arc_target_points,
        crs='EPSG:4326'
    )
    
    # Convert to GeoArrow format (required for BrushingExtension to work)
    arc_source_table = geopandas_to_geoarrow(arc_source_gdf)
    arc_target_table = geopandas_to_geoarrow(arc_target_gdf)
    
    # Create a PyArrow table for non-geometry attributes
    value_array = np.array([arc['value'] for arc in arcs])
    attr_table = pa.table({'value': value_array})
    
    # Create ArcLayer with BrushingExtension
    arc_layer = ArcLayer(
        table=attr_table,
        get_source_position=arc_source_table['geometry'],
        get_target_position=arc_target_table['geometry'],
        get_source_color=SOURCE_COLOR,
        get_target_color=TARGET_COLOR,
        get_width=1,
        opacity=0.4,
        pickable=False,
        extensions=[brushing_extension],
        brushing_radius=brushing_radius,
    )
    print(f"Created arc layer with {len(arcs)} arcs and BrushingExtension enabled")

Created arc layer with 1593 arcs and BrushingExtension enabled


## Create Interactive Map with BrushingExtension

Now we combine all layers into an interactive map. All three layers have the BrushingExtension enabled, which means:

- **Hover Interaction**: Move your cursor over the map to see migration flows near your pointer
- **Brushing Radius**: Only arcs and points within the `brushing_radius` (100 km) of your cursor will be displayed
- **Dynamic Filtering**: As you move your cursor, the visible flows update in real-time

**Interpretation:**
- **Red arcs/points**: Outward migration (people leaving)
- **Blue arcs/points**: Inward migration (people arriving)
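- **Ring size**: Constant in this notebook, because no `get_radius` is passed to the target layer; supplying a per-county radius would let ring size reflect the magnitude of net migration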
- **Ring color**: Red for net loss, blue for net gain

**Tip:** Hover over different areas of Texas to explore local migration patterns!

In [15]:

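# Skip any layer that was set to None because its input list was empty
layers = [layer for layer in (source_layer, target_ring_layer, arc_layer) if layer is not None]
map_ = Map(layers, picking_radius=10)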
map_


Map(custom_attribution='', layers=(ScatterplotLayer(brushing_radius=100000.0, extensions=(BrushingExtension(),…

## Analysis: Top Migration Counties

Let's examine which Texas counties had the largest net migration gains and losses:

In [12]:
# Top 10 counties by net gain
if 'target_gdf' in locals() and len(target_gdf) > 0:
    print("Top 10 Counties by Net Migration Gain:")
    print(target_gdf.nlargest(10, 'net')[['name', 'gain', 'loss', 'net']])

    print("\nTop 10 Counties by Net Migration Loss:")
    print(target_gdf.nsmallest(10, 'net')[['name', 'gain', 'loss', 'net']])
else:
    print("No target data available for analysis.")

Top 10 Counties by Net Migration Gain:
                        name     gain    loss      net
2       Denton County, Texas  22222.0 -4351.0  17871.0
3       Brazos County, Texas  10653.0 -2834.0   7819.0
4   Williamson County, Texas  11431.0 -4030.0   7401.0
6      Lubbock County, Texas   9635.0 -2580.0   7055.0
7         Hays County, Texas   7992.0 -2081.0   5911.0
13  Montgomery County, Texas   9303.0 -5242.0   4061.0
14   Fort Bend County, Texas   9010.0 -5120.0   3890.0
16     Liberty County, Texas   5241.0 -1493.0   3748.0
17   Galveston County, Texas   6047.0 -2306.0   3741.0
18     Kaufman County, Texas   5109.0 -1807.0   3302.0

Top 10 Counties by Net Migration Loss:
                     name     gain     loss      net
0    Dallas County, Texas   4018.0 -33712.0 -29694.0
1    Harris County, Texas   5075.0 -30332.0 -25257.0
5    Travis County, Texas   9580.0 -16721.0  -7141.0
8   El Paso County, Texas   2492.0  -7292.0  -4800.0
9   Cameron County, Texas   1063.0  -5842.0  -4779.0

## Summary

This notebook demonstrated how to:
1. Retrieve migration flows data from the Census API using pytidycensus
2. Process the data into origin-destination arrays
3. Create an interactive visualization using lonboard's BrushingExtension
4. Analyze migration patterns in Texas counties

The BrushingExtension makes it easy to explore local migration patterns by hovering over different regions of the map.

### Next Steps
- Try different states or years
- Adjust the `brushing_radius` to change the hover area
- Modify the flow threshold (currently 50) to show more or fewer connections
- Add demographic breakdowns using the `breakdown` parameter in `get_flows()`