This analysis investigates whether eviction rates are higher near Purple Line light rail stations compared to other areas in Prince George’s and Montgomery Counties, Maryland. The research builds on prior tabular analyses by incorporating spatial methods to examine how proximity to transit infrastructure may correlate with housing instability. Using address-level eviction data from Maryland Judiciary Case Search and geospatial data for planned Line station locations, I conduct a proximity analysis to classify eviction events as either "near" or "far" from Purple Line stations (within 1 mile and beyond 1 mile, respectively). Eviction density is calculated using kernel density estimation and aggregated to Census tracts for comparison with demographic and housing variables.

In [2]:
import pandas as pd
import geopandas as gpd
import utils
import exercise03
import census_geocode

%load_ext autoreload
%autoreload 2

In [3]:
# Load warrants and make sure zip codes are stored as strings without decimals
warrants_df = pd.read_csv('md_eviction_warrents_through_2024.csv')
warrants_df['TenantZipCode'] = warrants_df['TenantZipCode'].astype('Int64').astype('string')
len(warrants_df) # How many warrants are we working with?

411040

In [4]:
# Rather than geocoding 400K+ addresses, could we get only the unique ones?
geocode_input_df = exercise03.prep_warrants_for_geocoding(warrants_df)

411040 warrants input
Reduced to 167949 unique addresses


In [5]:
# The Census Geocoder API can only accept up to 10K rows at a time, so we have to break
# our dataframe into chunks

# Split into dataframes with less than 10K rows each
geocode_input_dfs = utils.chunk_dataframe(geocode_input_df, 9999)

# Save each dataframe as a CSV without a header
utils.save_dfs_to_csv(geocode_input_dfs, 'geocode_inputs', header=False)

split dataframe into 17 chunks


In [6]:
# Geocode addresses with the Census Geocoder (set test=True to process only one file)
census_geocode.geocode_csvs('geocode_inputs', 'geocode_outputs', test=True)

TEST MODE: Processing only one file.
Processing file: geocode_inputs/df_14.csv
Saved results to: geocode_outputs/geocoderesult_df_14.csv


In [7]:
# Recombine outputs from geocoder into a single dataframe
geocode_output_df = exercise03.combine_census_geocoded_csvs('geocode_outputs')
len(geocode_output_df)

9999

In [8]:
# Merge geocoded address back onto the inputs with separate fields for address, city, state, and zip
geocoded_df = geocode_input_df.merge(geocode_output_df, left_index=True, right_index=True)
len(geocoded_df)

9999

In [9]:
# Use address, city, state, and zip columns to join geocodes onto original warrant records
warrants_df = warrants_df.merge(geocoded_df, on=['TenantAddress','TenantCity','TenantState','TenantZipCode'])
len(warrants_df)

15112

In [10]:
# Convert warrants into a geodataframe with points
warrants_gdf = utils.lonlat_str_to_geodataframe(warrants_df, 'match_lon_lat')

In [11]:
# What proportion of records have points?
len(warrants_gdf[warrants_gdf.lon.notnull()]) / len(warrants_gdf)

0.948782424563261

In [12]:
# What proportion of records have exact geocode matches?
len(warrants_gdf[warrants_gdf.match_type == 'Exact']) / len(warrants_gdf)

0.5974060349391213

In [13]:
!pip install pyarrow
warrants_gdf.to_parquet('md_eviction_warrants_through_2024.geoparquet')



In [14]:
gdf = gpd.read_parquet('md_eviction_warrants_through_2024.geoparquet')

In [15]:
gdf.columns.tolist()

['ID',
 'EventDate',
 'EventType',
 'EventComment',
 'County',
 'Location',
 'TenantAddress',
 'TenantCity',
 'TenantState',
 'TenantZipCode',
 'CaseType',
 'CaseNumber',
 'EvictedDate',
 'Source',
 'SourceDate',
 'Year',
 'EvictionYear',
 'unique_id',
 'input_address',
 'match_status',
 'match_type',
 'match_address',
 'match_lon_lat',
 'match_tiger_line_id',
 'match_tiger_line_side',
 'lon',
 'lat',
 'geometry']

In [60]:
#0: Filter eviction cases to only those in PG and Montgomery Counties
# Normalize County column for safe matching
warrants_gdf['County'] = warrants_gdf['County'].str.lower().str.strip()

# Filter for Prince George's and Montgomery counties
pg_mo_evictions = warrants_gdf[
    warrants_gdf['County'].str.contains("prince george") |
    warrants_gdf['County'].str.contains("montgomery")
].copy()

In [61]:
#1: Load Purple Line station data
purple_line_stations = gpd.read_file('Purple_Line_stations/Purple_Line_stattions.shp')

In [62]:
#2: Create a 1-mile buffer around each station to define “near” area

# Reproject to a projected CRS appropriate for Maryland (units = meters)
buffers = purple_line_stations.to_crs(epsg=2248)

# Buffer 1 mile (1,609.34 meters)
buffers['geometry'] = buffers.buffer(1609.34)

In [63]:
#3: Spatial join - is eviction inside buffer?
pg_mo_evictions = pg_mo_evictions.to_crs(epsg=2248)
joined = pg_mo_evictions.sjoin(buffers, how='left', predicate='within')
joined['near_pl_station'] = joined['index_right'].notnull()
pg_mo_evictions = joined.drop(columns=['index_right'])

In [64]:
#4: Compare eviction rates
near_pct = pg_mo_evictions['near_pl_station'].mean()
print(f"Eviction filings near Purple Line stations: {near_pct:.2%}")
print(f"Eviction filings elsewhere: {(1 - near_pct):.2%}")

Eviction filings near Purple Line stations: 1.25%
Eviction filings elsewhere: 98.75%


This doesn't really get at the question I was really trying to ask. Below is additional analysis that asks: Are eviction rates higher in census tracts that intersect a 1-mile buffer around Purple Line stations compared to those that don’t?

In [None]:
pip install censusdis geopandas pyarrow

In [None]:
#1: Pull tracts + renter data from ACS
from censusdis.data import download
import geopandas as gpd
import os

# Set your Census API key (replace with your own)
os.environ["CENSUS_API_KEY"] = "your_api_key_here"

# Download tracts for PG and Montgomery counties (FIPS: 033 and 031)
tracts = download(
    source="acs/acs5",
    vintage=2022,
    geography="tract",
    variables=["B25003_001E", "B25003_003E"],  # Total and renter-occupied units
    state="24",  # Maryland
    county=["033", "031"],  # PG and Montgomery
    with_geometry=True
)

# Rename for clarity
tracts = tracts.rename(columns={
    "B25003_001E": "total_housing_units",
    "B25003_003E": "renter_occupied_units"
})

# Eviction rate denominator
tracts["pct_renter"] = tracts["renter_occupied_units"] / tracts["total_housing_units"]
tracts = tracts.to_crs(epsg=2248)  # Match your projected CRS


In [None]:
#2: Create 1-mile buffers
purple_line_stations = gpd.read_file('Purple_Line_stations/Purple_Line_stattions.shp')
buffers = purple_line_stations.to_crs(epsg=2248)
buffers['geometry'] = buffers.buffer(1609.34)

In [None]:
#3: Flag tracts that intersect with buffer
tracts['near_pl_station'] = tracts.intersects(buffers.unary_union)

In [None]:
#4: Spatial join: assign evictions to tracts
# Make sure evictions are projected correctly
pg_mo_evictions = pg_mo_evictions.to_crs(epsg=2248)

# Spatial join: which tract each eviction falls in
evictions_with_tracts = gpd.sjoin(pg_mo_evictions, tracts, how='left', predicate='within')

# Count evictions per tract
eviction_counts = evictions_with_tracts.groupby('GEOID').size().reset_index(name='eviction_count')

# Merge back into tract GeoDataFrame
tracts = tracts.merge(eviction_counts, on='GEOID', how='left')
tracts['eviction_count'] = tracts['eviction_count'].fillna(0)

# Calculate eviction rate per 1,000 renter households
tracts['eviction_rate_per_1k_renters'] = (tracts['eviction_count'] / tracts['renter_occupied_units']) * 1000

In [None]:
#5: Boxplot comparing eviction rates by proximity
import matplotlib.pyplot as plt

tracts.boxplot(
    column='eviction_rate_per_1k_renters',
    by='near_pl_station',
    figsize=(8, 6)
)
plt.title('Eviction Rates per 1,000 Renters\nNear vs. Not Near Purple Line')
plt.suptitle('')
plt.xlabel('Near Purple Line Station (1-mile buffer)')
plt.ylabel('Eviction Rate (per 1,000 renters)')
plt.show()


In [None]:
#6: Map eviction rates
tracts.plot(
    column='eviction_rate_per_1k_renters',
    cmap='Reds',
    legend=True,
    figsize=(10, 10),
    edgecolor='grey'
)
plt.title('Eviction Rate per 1,000 Renters by Census Tract')
plt.axis('off')
plt.show()