# **Analysis of Eviction Patterns in Maryland (2022–2024)**

<p align="center">
<strong>Michael Bochynski</strong><br>
<em>URSP688Y: Data Science & Smart Cities</em><br>
Professor Chester Harvey<br>
April 7, 2025
</p>


# Abstract

This study examines eviction patterns across Maryland Census Tracts using descriptive statistics. Key metrics (mean, median, quartiles, and standard deviation) quantify disparities in eviction counts across geographic areas. The analysis shows wide variability: a few Census Tracts record thousands of evictions while many record almost none. The findings provide a statistical foundation for understanding eviction trends; socioeconomic variables and map-based visualization are outside the scope of this analysis.

**Research Question:**
How do eviction counts vary across Census Tracts, and what do statistical patterns reveal about the distribution of evictions?

# Methodology

This study examines eviction trends using Census Tract-level eviction counts alongside statewide eviction warrant records for Maryland (2022–2024). The dataset originates from the District Court of Maryland and Department of Housing and Community Development (DHCD), and includes address-level eviction filings, which were processed in a structured directory to ensure compliance with data privacy best practices.

The analysis uses Python with pandas, geopandas, matplotlib, and folium for geospatial and statistical operations. For ethical reasons, address-level records were not publicly stored or committed to version control. Additionally, AI-assisted tools, including ChatGPT and Copilot, were consulted throughout the coding process for debugging, optimization, and methodological refinement.

# Preliminary Data Preparation
The Maryland eviction warrant records through 2024 **(N=411,040)** were geocoded with the Census Geocoder so that they could be mapped, spatially joined to other layers, and analyzed geographically.

Steps taken:

- Loaded the eviction warrants dataset and ensured ZIP codes were properly formatted as strings.
- Extracted 167,949 unique addresses from the 411,040 records to minimize redundant geocoding.
- Split the addresses into 17 CSV files (each <10,000 rows) for batch geocoding.
- Used the Census Geocoder API to geocode all address chunks and recombined results.
- Merged geocoded results back onto the full eviction records using address fields.
- Converted the resulting dataset to a GeoDataFrame with lon, lat, and geometry columns.
- Checked the outcome: 94.6% of records received valid geocodes, and 56.0% were exact matches.
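
The deduplicate, chunk, and merge-back steps above can be sketched with plain pandas. This is a minimal illustration, not the code behind `utils.chunk_dataframe` or `exercise03.prep_warrants_for_geocoding`, whose implementations are not shown here. The helper names `unique_addresses` and `chunk_frame`, the toy records, and the stand-in geocoder statuses are all assumptions.

```python
import pandas as pd

ADDRESS_COLS = ["TenantAddress", "TenantCity", "TenantState", "TenantZipCode"]

def unique_addresses(records: pd.DataFrame) -> pd.DataFrame:
    """Return one row per distinct address (hypothetical helper)."""
    return records[ADDRESS_COLS].drop_duplicates().reset_index(drop=True)

def chunk_frame(frame: pd.DataFrame, max_rows: int) -> list:
    """Split a frame into consecutive pieces of at most max_rows rows."""
    return [frame.iloc[start:start + max_rows] for start in range(0, len(frame), max_rows)]

# Toy records: five warrants at three distinct addresses.
records = pd.DataFrame({
    "CaseNumber": ["c1", "c2", "c3", "c4", "c5"],
    "TenantAddress": ["1 Main St", "1 Main St", "2 Oak Ave", "3 Elm Rd", "2 Oak Ave"],
    "TenantCity": ["Essex", "Essex", "Bowie", "Laurel", "Bowie"],
    "TenantState": ["MD"] * 5,
    "TenantZipCode": ["21221", "21221", "20715", "20707", "20715"],
})

addresses = unique_addresses(records)
chunks = chunk_frame(addresses, max_rows=2)

# Stand-in for geocoder output: one made-up status per unique address.
geocoded = addresses.assign(match_status=["Match", "Match", "No_Match"])

# validate="many_to_one" raises if an address appears twice on the right side,
# which would otherwise silently duplicate warrant records.
merged = records.merge(geocoded, on=ADDRESS_COLS, how="left", validate="many_to_one")

print(len(addresses), [len(c) for c in chunks], len(merged))
```

Comparing `len(merged)` with `len(records)` after the merge is a cheap check that no warrant was dropped or duplicated along the way.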

In [1]:
# Import necessary modules and libraries
import pandas as pd
import geopandas as gpd
import utils
import census_geocode
from census_geocode import geocode_csvs
import exercise03
from exercise03 import prep_warrants_for_geocoding
import os
import re
from shapely.geometry import Point

# Enable autoreloading of modules to reflect changes automatically.
%load_ext autoreload
%autoreload 2

In [None]:
# Load warrants and make sure zip codes are stored as strings without decimals
warrants_df = pd.read_csv('md_eviction_warrants_through_2024.csv')
warrants_df['TenantZipCode'] = warrants_df['TenantZipCode'].astype('Int64').astype('string')
len(warrants_df)

# Prepare unique addresses for geocoding
geocode_input_df = exercise03.prep_warrants_for_geocoding(warrants_df)

# Split dataframe into smaller chunks (sub-dataframes) with fewer than 10,000 rows each
geocode_input_dfs = utils.chunk_dataframe(geocode_input_df, 9999)

# Save each dataframe as a CSV without a header
utils.save_dfs_to_csv(geocode_input_dfs, 'geocode_inputs', header=False)

# Geocode addresses with the Census Geocoder
census_geocode.geocode_csvs('geocode_inputs', 'geocode_outputs')

# Recombine outputs from geocoder into a single dataframe
geocode_output_df = exercise03.combine_census_geocoded_csvs('geocode_outputs')
len(geocode_output_df)

# Merge geocoded address back onto the inputs with separate fields for address, city, state, and zip
geocoded_df = geocode_input_df.merge(geocode_output_df, left_index=True, right_index=True)
len(geocoded_df)

# Use address, city, state, and zip columns to join geocodes onto original warrant records
warrants_df = warrants_df.merge(geocoded_df, on=['TenantAddress','TenantCity','TenantState','TenantZipCode'])
len(warrants_df)

# Convert warrants into a geodataframe with points
warrants_gdf = utils.lonlat_str_to_geodataframe(warrants_df, 'match_lon_lat')

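# Calculate proportion of records that received a valid geocode
# (printed explicitly, since only a cell's last expression is displayed)
valid_share = warrants_gdf['lon'].notnull().mean()
print(f"Valid geocodes: {valid_share:.1%}")

# Calculate proportion of records with exact geocode matches
exact_share = (warrants_gdf['match_type'] == 'Exact').mean()
print(f"Exact matches: {exact_share:.1%}")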

warrants_gdf.to_parquet('md_eviction_warrants_through_2024.geoparquet')

gdf = gpd.read_parquet('md_eviction_warrants_through_2024.geoparquet')

411040 warrants input
Reduced to 167949 unique addresses
split dataframe into 17 chunks
Processing file: geocode_inputs\df_0.csv
Saved results to: geocode_outputs\geocoderesult_df_0.csv
Processing file: geocode_inputs\df_1.csv


# Section 2: Assessing Eviction Records Geocoding Accuracy Bias

This section assesses the geocoding accuracy of the Maryland eviction warrant dataset and tests whether accuracy varies by eviction event type and by county. Chi-square tests find statistically significant associations for both, but the differences in mean accuracy are small for most groups: most counties in the output below score between about 1.46 and 1.52 on a 0-to-2 scale. Large urban jurisdictions such as Baltimore City and Montgomery score slightly above most rural counties, although the pattern is not uniform (rural Garrett County has the highest score shown, and Kent County the lowest).
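
As a rough consistency check on the 0-to-2 accuracy score used in this section, the statewide mean can be reconstructed from the match rates reported in the data preparation section. The sketch assumes every valid geocode is either an exact or a non-exact match; the variable names are illustrative.

```python
# Shares reported earlier: 94.6% of records geocoded, 56.0% exact matches.
share_valid = 0.946
share_exact = 0.560

# Assumption: every valid geocode is either Exact or Non_Exact.
share_non_exact = share_valid - share_exact
share_no_match = 1.0 - share_valid

# Score weights follow the notebook's mapping: Exact=2, Non_Exact=1, No_Match=0.
mean_score = 2 * share_exact + 1 * share_non_exact + 0 * share_no_match

print(f"non-exact share: {share_non_exact:.3f}")
print(f"expected mean score: {mean_score:.3f}")
```

A mean near 1.5 therefore reflects a mix of roughly 56% exact, 39% non-exact, and 5% unmatched records, which is in line with the group means of about 1.5 in the output.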

## Analysis Summary

1. **Missing Data**: Missing values were counted for every column. The integrity check in Section 3 shows 22,074 records without `lon`/`lat` coordinates (records that did not receive a geocode) and 2 records without a `CaseNumber`.
2. **Geometry Handling**: Geometries stored as WKB were converted to Shapely objects, and records lacking a stored geometry were assigned points built from their longitude and latitude. Records that were never geocoded have no coordinates and therefore no usable point.
3. **Geocoding Accuracy**: Mean accuracy (scored 2 for exact, 1 for non-exact, and 0 for no match) varied only modestly across eviction types and counties:
   - The main eviction types cluster between about 1.497 and 1.502. "Cancelled" appears twice in the output and a lowercase "petition" variant appears separately, which points to inconsistent `EventType` labels that should be normalized before drawing conclusions by type.
   - Accuracy also varied by county, from about 1.36 (Kent) to 1.62 (Garrett) among the counties displayed, with most counties near 1.5.
4. **Chi-Square Test Results**:
   - The chi-square test for eviction type yielded p < 0.0001 (printed as 0.0000 after rounding), indicating a statistically significant association between eviction type and geocoding accuracy.
   - The chi-square test for county also yielded p < 0.0001, indicating a statistically significant association between county and geocoding accuracy. With roughly 411,000 records, even small differences reach significance, so these results do not by themselves indicate large disparities.
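
Because the dataset holds several hundred thousand records, a chi-square test can return a vanishingly small p-value even when the differences between groups are small. One common companion measure is Cramér's V, which was not part of the original analysis. The sketch below computes both on a made-up contingency table, using `scipy.stats.chi2_contingency` as in the notebook. The counts and the `cramers_v` helper are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats

def cramers_v(table: pd.DataFrame) -> float:
    """Cramér's V effect size for an r x c contingency table (hypothetical helper)."""
    chi2 = stats.chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    min_dim = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim)))

# Made-up counts: rows are groups, columns are accuracy levels 0, 1, 2.
table = pd.DataFrame(
    {0: [500, 5_500], 1: [3_900, 38_000], 2: [5_600, 56_500]},
    index=["group_a", "group_b"],
)

chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.3g}, dof={dof}, V={cramers_v(table):.4f}")
```

Cramér's V ranges from 0 (no association) to 1 (perfect association), so reporting it next to the p-value helps separate statistical significance from practical importance.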

In [None]:
# 2.1 Check for Missing Data
missing_data = gdf.isnull().sum()
print(missing_data)

In [None]:
# 2.2 Geometry Handling

from shapely import wkb
from shapely.geometry import Point

# Load Parquet file.
df = pd.read_parquet("md_eviction_warrants_through_2024.geoparquet")

# Convert WKB geometry to Shapely geometry.
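def safe_wkb_loads(value):
    """Return a Shapely geometry for WKB bytes, or None if parsing fails."""
    try:
        return wkb.loads(value) if isinstance(value, bytes) else None
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        return None

df["geometry"] = df["geometry"].apply(safe_wkb_loads)

# Fill missing geometries with lon/lat points, but only where coordinates exist;
# otherwise Point(NaN, NaN) would be created for records that were never geocoded.
fillable = df["geometry"].isna() & df["lon"].notna() & df["lat"].notna()
if fillable.any():
    df.loc[fillable, "geometry"] = df.loc[fillable].apply(
        lambda row: Point(row["lon"], row["lat"]), axis=1
    )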

# Convert to GeoDataFrame and reproject to EPSG:4326
gdf = gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4269")
gdf = gdf.to_crs(epsg=4326)

# Display first rows and confirm CRS.
print(gdf.head())
print(gdf.crs)


In [105]:
# 2.3 Geocoding Accuracy Classification and Summary

import scipy.stats as stats

# Classify geocoding accuracy.
accuracy_levels = {
    "Exact": 2,
    "Non_Exact": 1,
    "No_Match": 0
}

df["geocode_accuracy"] = df["match_type"].map(accuracy_levels).fillna(0)  # Default to 0 if missing.

# Summarize accuracy across eviction types and counties.
accuracy_by_eviction = df.groupby("EventType")["geocode_accuracy"].mean()
accuracy_by_county = df.groupby("County")["geocode_accuracy"].mean()

# Print results
print("Geocoding Accuracy by Eviction Type:\n", accuracy_by_eviction)
print("\nGeocoding Accuracy by County:\n", accuracy_by_county)

Geocoding Accuracy by Eviction Type:
 EventType
Petition - For Warrant of Restitution Filed                1.497893
Warrant of Restitution - Return of Service - Cancelled     1.502320
Warrant of Restitution - Return of Service - Cancelled     1.800000
Warrant of Restitution - Return of Service - Evicted       1.502296
Warrant of Restitution - Return of Service - Expired       1.497003
petition - For Warrant of Restitution Filed                1.279070
Name: geocode_accuracy, dtype: float64

Geocoding Accuracy by County:
 County
Allegany           1.497156
Anne Arundel       1.519758
Baltimore          1.503065
Baltimore City     1.501311
Calvert            1.497758
Caroline           1.480916
Carroll            1.480880
Cecil              1.489447
Charles            1.479340
Dorchester         1.462409
Frederick          1.485777
Garrett            1.620968
Harford            1.487263
Howard             1.507268
Kent               1.358824
Montgomery         1.505331
Prince George's   
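
The event-type output above lists "Cancelled" twice and includes a lowercase "petition" variant, which suggests the `EventType` labels are not fully consistent. One possible cleanup, assuming the variants differ only in letter case or stray whitespace (not verified against the data), is to normalize the labels before grouping. The toy frame below is made up for illustration.

```python
import pandas as pd

# Made-up records with case and whitespace variants of the same label.
toy = pd.DataFrame({
    "EventType": [
        "Petition - For Warrant of Restitution Filed",
        "petition - For Warrant of Restitution Filed",
        "Warrant of Restitution - Return of Service - Cancelled",
        "Warrant of Restitution - Return of Service - Cancelled ",
    ],
    "geocode_accuracy": [2, 1, 2, 0],
})

# Collapse runs of whitespace, trim the ends, and casefold so that
# variants of one label map to a single key.
toy["EventTypeClean"] = (
    toy["EventType"]
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
    .str.casefold()
)

print(toy.groupby("EventTypeClean")["geocode_accuracy"].mean())
```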

In [101]:
# 2.4 Chi-Square Test for Bias Detection

# Check eviction counts for significant bias using Chi-square test
eviction_counts = df.groupby(["EventType", "geocode_accuracy"]).size().unstack(fill_value=0)
chi2_eviction, p_eviction = stats.chi2_contingency(eviction_counts)[:2]

# Check County counts for significant bias using Chi-square test
county_counts = df.groupby(["County", "geocode_accuracy"]).size().unstack(fill_value=0)
chi2_county, p_county = stats.chi2_contingency(county_counts)[:2]

print(f"\nChi-square test for Eviction Type Bias: p-value = {p_eviction:.4f}")
print(f"Chi-square test for County Bias: p-value = {p_county:.4f}")


Chi-square test for Eviction Type Bias: p-value = 0.0000
Chi-square test for County Bias: p-value = 0.0000


# Section 3: Geospatial Data Preparation
The eviction dataset was prepared for geospatial analysis by matching eviction records to Census Tracts. This involved converting the data into a format suitable for spatial analysis, validating coordinates, and performing a spatial join to link each eviction record with its Census Tract. Records that lacked geographic coordinates were removed to ensure accurate mapping, so every remaining record has a point geometry built from its longitude and latitude. Apart from 2 records without a `CaseNumber`, no values were missing in the columns used for spatial analysis.

## Results Summary

1. **Data Preparation**: The evictions dataset was converted into a GeoDataFrame with proper point geometries using latitude and longitude coordinates. Records missing location data (5.4% of total) were excluded to maintain spatial integrity. 
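2. **Spatial Join**: A left spatial join (`predicate="within"`) matched eviction records to Maryland Census Tracts based on their point locations. Records that did not fall within any tract received no `GEOID` and dropped out of the tract-level aggregation.
3. **Final Dataset**: Of the **N=388,966** geocoded records that entered the spatial join, **388,910** were counted within valid Census Tracts. The remaining 56 either did not fall within a tract polygon or lacked a `CaseNumber`, which the `count` aggregation skips.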
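
The gap between the number of records entering the join and the number counted per tract can be traced with a small accounting check. The sketch below uses a made-up joined table to show how `groupby` drops rows without a `GEOID` and how `count` skips null `CaseNumber` values, whereas `size` counts every row in a group. It is an illustration with invented identifiers, not a rerun of the analysis.

```python
import pandas as pd

# Made-up result of a left spatial join: GEOID is missing for a point that
# fell outside every tract, and one matched record lacks a CaseNumber.
joined = pd.DataFrame({
    "GEOID": ["tract_a", "tract_a", None, "tract_b"],
    "CaseNumber": ["c1", None, "c3", "c4"],
})

per_tract = joined.groupby("GEOID").agg(
    counted=("CaseNumber", "count"),  # non-null CaseNumber values only
    records=("CaseNumber", "size"),   # every row in the group
)

unmatched = int(joined["GEOID"].isna().sum())
missing_case = int((joined["GEOID"].notna() & joined["CaseNumber"].isna()).sum())

print(per_tract)
print("unmatched:", unmatched, "| matched but no CaseNumber:", missing_case)
```

The counted total plus the two exclusion categories adds back up to the number of input rows, which is the same reconciliation that separates 388,966 from 388,910.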

In [133]:
import pandas as pd
import geopandas as gpd

# **Step 1: Load Eviction Data**
evictions_df = pd.read_parquet("md_eviction_warrants_through_2024.geoparquet", engine="pyarrow")

# **Step 2: Verify Data Integrity**
print("Columns in eviction dataset:", evictions_df.columns)
print("Missing values:", evictions_df[["CaseNumber", "lon", "lat"]].isnull().sum())

# **Step 3: Handle Missing Location Data**
evictions_df_clean = evictions_df.dropna(subset=["lon", "lat"])  # Remove records without valid coordinates
print("Records after dropping missing locations:", len(evictions_df_clean))

# **Step 4: Convert to GeoDataFrame (Preserve All Columns)**
evictions_gdf = evictions_df_clean.copy()  # Ensure all attributes remain
evictions_gdf["geometry"] = gpd.points_from_xy(evictions_gdf["lon"], evictions_gdf["lat"])  # Add spatial data

# Convert to GeoDataFrame with CRS set
evictions_gdf = gpd.GeoDataFrame(evictions_gdf, crs="EPSG:4326")

# **Step 5: Verify Conversion Results**
print("Columns after GeoDataFrame conversion:", evictions_gdf.columns)
print(evictions_gdf.head())  # Confirm eviction case details remain

# **Step 6: Save Converted GeoDataFrame**
evictions_gdf.to_file("data/evictions_cleaned.gpkg", driver="GPKG")

print("✅ Eviction data successfully converted into GeoDataFrame!")

Columns in eviction dataset: Index(['ID', 'EventDate', 'EventType', 'EventComment', 'County', 'Location',
       'TenantAddress', 'TenantCity', 'TenantState', 'TenantZipCode',
       'CaseType', 'CaseNumber', 'EvictedDate', 'Source', 'SourceDate', 'Year',
       'EvictionYear', 'unique_id', 'input_address', 'match_status',
       'match_type', 'match_address', 'match_lon_lat', 'match_tiger_line_id',
       'match_tiger_line_side', 'lon', 'lat', 'geometry'],
      dtype='object')
Missing values: CaseNumber        2
lon           22074
lat           22074
dtype: int64
Records after dropping missing locations: 388966
Columns after GeoDataFrame conversion: Index(['ID', 'EventDate', 'EventType', 'EventComment', 'County', 'Location',
       'TenantAddress', 'TenantCity', 'TenantState', 'TenantZipCode',
       'CaseType', 'CaseNumber', 'EvictedDate', 'Source', 'SourceDate', 'Year',
       'EvictionYear', 'unique_id', 'input_address', 'match_status',
       'match_type', 'match_address', 'matc

In [143]:
# 3.2 Spatial Join

import geopandas as gpd
import pandas as pd

# **Step 1: Load Maryland Census Tract Shapefile**
tracts_gdf = gpd.read_file("shapefiles/tl_2024_24_tract.shp").to_crs("EPSG:4326")

# **Step 2: Load Cleaned Eviction Dataset & Verify Integrity**
evictions_gdf = gpd.read_file("data/evictions_cleaned.gpkg")  # This was processed in Section 3.1
print("Eviction dataset columns:", evictions_gdf.columns)

# **Step 3: Perform Spatial Join (Evictions ↔ Census Tracts)**
evictions_gdf = evictions_gdf.sjoin(tracts_gdf[["GEOID", "geometry"]], how="left", predicate="within")

# **Step 4: Aggregate Eviction Data by Census Tract**
evictions_by_tract = evictions_gdf.groupby("GEOID").agg(
    EvictionCount=("CaseNumber", "count"),  # Total evictions per tract
    CaseNumbersList=("CaseNumber", lambda x: list(x))  # Preserve case numbers
).reset_index()

# **Step 5: Merge Aggregated Eviction Data with Census Tracts**
tracts_gdf = tracts_gdf.merge(evictions_by_tract, on="GEOID", how="left")
tracts_gdf["EvictionCount"] = tracts_gdf["EvictionCount"].fillna(0)  # Fill missing values
tracts_gdf["CaseNumbersList"] = tracts_gdf["CaseNumbersList"].apply(lambda x: x if isinstance(x, list) else [])  # Ensure valid case lists

# **Step 6: Verify Spatial Join & Aggregation Results**
print("Total eviction cases assigned to Census Tracts:", tracts_gdf["EvictionCount"].sum())
print(tracts_gdf[["GEOID", "EvictionCount", "CaseNumbersList"]].sample(10))  # Random check

Eviction dataset columns: Index(['ID', 'EventDate', 'EventType', 'EventComment', 'County', 'Location',
       'TenantAddress', 'TenantCity', 'TenantState', 'TenantZipCode',
       'CaseType', 'CaseNumber', 'EvictedDate', 'Source', 'SourceDate', 'Year',
       'EvictionYear', 'unique_id', 'input_address', 'match_status',
       'match_type', 'match_address', 'match_lon_lat', 'match_tiger_line_id',
       'match_tiger_line_side', 'lon', 'lat', 'geometry'],
      dtype='object')
Total eviction cases assigned to Census Tracts: 388910.0
            GEOID  EvictionCount  \
824   24019970100           20.0   
990   24510260402         1451.0   
210   24017850600           13.0   
1128  24510200701          647.0   
916   24003740305          697.0   
1160  24017851200           74.0   
1466  24031701900          114.0   
1070  24510160400          383.0   
539   24027601206          114.0   
986   24033800514           35.0   

                                        CaseNumbersList  
824   [

In [144]:
# 3.3 Final Dataset
tracts_gdf.to_file("data/processed_evictions_by_tract.gpkg", driver="GPKG")

print("✅ Spatial join and eviction count aggregation completed successfully!")

✅ Spatial join and eviction count aggregation completed successfully!


# Section 4: Census Tract Eviction Hotspots

## Spatial Distribution
- **High Variability:** A standard deviation of 445.14, well above the mean of 263.67, shows that eviction counts vary widely across tracts.
- **Extreme Outliers:** The highest tract count is 4,618, more than 60 times the median of 74, pointing to severe housing instability in a few areas.
- **Lower Quartile:** 25% of Census Tracts had 16 or fewer evictions, so a large share of tracts see very little eviction activity.
- **Upper Quartile:** 75% of Census Tracts had 338 or fewer evictions, so counts in the thousands lie far out in the upper tail (the maximum is more than 13 times the third quartile).
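
One conventional way to make "extreme outlier" concrete is Tukey's rule, which flags values more than 1.5 interquartile ranges above the third quartile. The rule was not applied in the original analysis; the sketch below simply plugs in the quartiles and maximum reported in this section.

```python
# Quartiles and maximum as reported in the summary statistics.
q1 = 16.0
q3 = 338.0
max_count = 4618.0

iqr = q3 - q1                  # interquartile range
upper_fence = q3 + 1.5 * iqr   # Tukey's upper fence for outliers

print(f"IQR: {iqr:.0f}")
print(f"Upper fence: {upper_fence:.0f}")
print(f"Maximum exceeds fence: {max_count > upper_fence}")
```

Under this rule any tract with more than 821 records counts as an outlier, and the maximum of 4,618 is more than five times that threshold.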

## Extreme Hotspots:
- The highest-count tract is in Essex, followed by other Baltimore-area tracts, which suggests severe clustering in urban counties.
- This clustering is a descriptive observation based on tract counts; no formal test of spatial autocorrelation was run.
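
Ranking tracts by count and measuring how much of the total the top tracts hold is one simple way to describe this concentration. The sketch below uses made-up counts and identifiers; the column names match `tracts_gdf`, but the numbers do not come from the Maryland data.

```python
import pandas as pd

# Made-up tract counts for illustration only.
tracts = pd.DataFrame({
    "GEOID": [f"tract_{i}" for i in range(10)],
    "EvictionCount": [0, 3, 10, 20, 45, 80, 160, 300, 950, 2400],
})

# The three tracts with the most records.
top = tracts.nlargest(3, "EvictionCount")

# Share of all records that fall in those top tracts.
top_share = top["EvictionCount"].sum() / tracts["EvictionCount"].sum()

print(top[["GEOID", "EvictionCount"]])
print(f"Top 3 tracts hold {top_share:.1%} of records")
```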

In [148]:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# **Step 1: Load Processed Evictions Dataset**
tracts_gdf = gpd.read_file("data/processed_evictions_by_tract.gpkg")

# **Step 2: Compute Key Summary Statistics**
eviction_stats = tracts_gdf["EvictionCount"].describe()

mean_evictions = eviction_stats["mean"]
median_evictions = tracts_gdf["EvictionCount"].median()
std_dev_evictions = eviction_stats["std"]
min_evictions = eviction_stats["min"]
max_evictions = eviction_stats["max"]
quartiles = tracts_gdf["EvictionCount"].quantile([0.25, 0.75])

# **Step 3: Display Results**
print("📊 Eviction Count Statistics:")
print(f"Mean Evictions per Census Tract: {mean_evictions:.2f}")
print(f"Median Evictions per Census Tract: {median_evictions}")
print(f"Standard Deviation: {std_dev_evictions:.2f}")
print(f"Min Evictions: {min_evictions}, Max Evictions: {max_evictions}")
print(f"25th Percentile (Q1): {quartiles[0.25]}, 75th Percentile (Q3): {quartiles[0.75]}")

📊 Eviction Count Statistics:
Mean Evictions per Census Tract: 263.67
Median Evictions per Census Tract: 74.0
Standard Deviation: 445.14
Min Evictions: 0.0, Max Evictions: 4618.0
25th Percentile (Q1): 16.0, 75th Percentile (Q3): 338.0


# Conclusion

The eviction analysis reveals significant disparities across Census Tracts, with highly uneven distributions of eviction counts. While 25% of Census Tracts had 16 or fewer evictions, others experienced thousands, indicating extreme clustering in certain areas. 

The presence of outlier Census Tracts with exceptionally high eviction counts underscores the need for targeted policy interventions, including tenant protections and affordability initiatives in high-risk areas. Additionally, the gap between the mean (263.67) and median (74.0) eviction counts indicates a strongly right-skewed distribution in which a small subset of tracts accounts for a disproportionate share of all evictions.

## Follow-Up Research

Future studies should normalize eviction counts by renter population, housing density, or median income to give deeper insight into the systemic economic and policy drivers behind eviction trends. Additional research could explore transit accessibility, zoning restrictions, and rental affordability to determine whether high eviction rates correlate with urban development policies. Finally, incorporating tenant surveys or legal case outcomes would provide a more holistic view of eviction causes and long-term housing instability.
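
As a sketch of the normalization suggested above, counts could be divided by the number of renter-occupied households in each tract. The `renter_households` column and all values below are hypothetical; in practice the denominator would come from a source such as the American Community Survey.

```python
import numpy as np
import pandas as pd

# Hypothetical tract table: counts plus a renter-household denominator.
tracts = pd.DataFrame({
    "GEOID": ["tract_a", "tract_b", "tract_c"],
    "EvictionCount": [300, 300, 12],
    "renter_households": [3000, 600, 0],
})

# Records per 1,000 renter households; tracts with no renters get NaN
# rather than an infinite or misleading rate.
denominator = tracts["renter_households"].replace(0, np.nan)
tracts["records_per_1000_renters"] = 1000 * tracts["EvictionCount"] / denominator

print(tracts)
```

In this toy table two tracts with the same raw count differ fivefold once the renter base is taken into account, which is the kind of contrast that raw counts hide.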