# 17 — US County Mass Shooting Data Collection

Load mass shooting incident counts from the Gun Violence Archive (GVA) CSV,
merge them with existing county indicators, and compute per-capita rates.

**Definition:** A mass shooting is an incident in which 4 or more people are shot,
whether injured or killed, not including the shooter. Source: Gun Violence Archive, 2019-2023.

**Caveat:** Mass shootings are rare events. Even aggregated over 5 years,
many counties have only 1-4 incidents, so their per-capita rates can swing
sharply with a single incident. Interpret rates for small-population
counties with caution.

In [1]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from pathlib import Path

RAW_DIR = Path('../data/raw')
OUTPUT_DIR = Path('../data/processed')
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

## Load Mass Shooting Data

In [2]:
ms_df = pd.read_csv(RAW_DIR / 'us_county_mass_shootings.csv', dtype={'fips': str})
print(f"Mass shooting data: {len(ms_df)} counties")
ms_df.head()

Mass shooting data: 101 counties


Unnamed: 0,fips,county_name,state,mass_shooting_incidents,years
0,6037,Los Angeles County,CA,62,2019-2023
1,17031,Cook County,IL,88,2019-2023
2,48201,Harris County,TX,54,2019-2023
3,4013,Maricopa County,AZ,28,2019-2023
4,6073,San Diego County,CA,14,2019-2023
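The `fips` values printed above have four digits for the California and Arizona counties (`6037`, `4013`), whereas county FIPS codes are five digits with a leading state zero (`06037`, `04013`). Reading with `dtype={'fips': str}` keeps whatever text is in the file but cannot restore zeros that were dropped before the CSV was written. The merge in this notebook still works as long as both files share the same form, but joins against sources keyed on five-digit codes would miss these counties. A minimal sketch of a normalizer follows; `normalize_fips` is a hypothetical helper name, not part of the original notebook.

```python
def normalize_fips(value) -> str:
    """Return a 5-character county FIPS string, restoring dropped leading zeros."""
    text = str(value).strip()
    # Reject anything that is not a plain run of at most five digits.
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"unexpected FIPS value: {value!r}")
    return text.zfill(5)

# Values taken from the head() output above.
codes = [normalize_fips(v) for v in ['6037', '17031', '4013']]
print(codes)
```

In the notebook this could be applied with `ms_df['fips'].map(normalize_fips)`, and likewise to `county_df` before merging, so that both keys stay consistent.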


## Load Existing County Data & Compute Rate

In [3]:
county_df = pd.read_csv(OUTPUT_DIR / 'merged_us_county_data.csv', dtype={'fips': str})
print(f"Existing county data: {len(county_df)} counties")

# Merge mass shooting counts with all county indicators
merged = county_df.merge(
    ms_df[['fips', 'mass_shooting_incidents', 'years']],
    on='fips', how='left'
)

# Annual mass shooting rate per 100K residents.
# The GVA window is 2019-2023 inclusive, i.e. 5 years:
# rate = (incidents / N_YEARS) / (population / 100,000)
N_YEARS = 5
merged['mass_shooting_rate'] = (
    (merged['mass_shooting_incidents'] / N_YEARS) / (merged['population'] / 100_000)
)

print(f"Merged dataset: {len(merged)} counties")
print(f"Counties with mass shooting data: {merged['mass_shooting_rate'].notna().sum()}")
merged.head(10)

Existing county data: 101 counties
Merged dataset: 101 counties
Counties with mass shooting data: 101


Unnamed: 0,fips,county_name,state,gun_homicide_rate,population,gini,drug_offense_rate,poverty_rate,median_income,gun_ownership_pct,giffords_grade,giffords_numeric,region,mass_shooting_incidents,years,mass_shooting_rate
0,6037,Los Angeles County,CA,7.2,9829544,0.507,320.0,14.2,75235,28.3,A,11,West,62,2019-2023,0.12615
1,17031,Cook County,IL,14.5,5173146,0.504,580.0,13.1,72231,27.8,A,11,Midwest,88,2019-2023,0.340219
2,48201,Harris County,TX,12.8,4728030,0.498,490.0,15.8,63802,45.7,F,0,South,54,2019-2023,0.228425
3,4013,Maricopa County,AZ,7.1,4496588,0.46,420.0,13.5,72850,46.3,F,0,West,28,2019-2023,0.124539
4,6073,San Diego County,CA,3.5,3276208,0.465,340.0,11.8,88240,28.3,A,11,West,14,2019-2023,0.085465
5,6059,Orange County,CA,1.8,3162245,0.465,260.0,9.1,104419,28.3,A,11,West,6,2019-2023,0.037948
6,12086,Miami-Dade County,FL,10.2,2701767,0.512,410.0,16.5,57815,35.3,D,2,South,38,2019-2023,0.281297
7,48113,Dallas County,TX,11.4,2613539,0.504,470.0,15.3,62081,45.7,F,0,South,32,2019-2023,0.244879
8,36047,Kings County,NY,4.8,2559903,0.512,290.0,19.2,66850,19.9,A,11,Northeast,22,2019-2023,0.171882
9,6065,Riverside County,CA,5.6,2418185,0.435,380.0,13.4,73260,28.3,A,11,West,16,2019-2023,0.132331
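As a check on the rate formula, the sketch below recomputes the Los Angeles County value from the table above. It also derives the window length from the `years` string instead of hard-coding 5, on the assumption that the column always has the form `YYYY-YYYY`. Both function names are illustrative.

```python
def years_in_window(window: str) -> int:
    """Number of calendar years in an inclusive 'YYYY-YYYY' window."""
    start, end = (int(part) for part in window.split('-'))
    return end - start + 1

def annual_rate_per_100k(incidents: float, population: float,
                         window: str = '2019-2023') -> float:
    """Incidents per 100,000 residents per year over the given window."""
    return (incidents / years_in_window(window)) / (population / 100_000)

# Los Angeles County: 62 incidents, population 9,829,544 (from the table above).
la_rate = annual_rate_per_100k(62, 9_829_544)
print(round(la_rate, 5))  # matches the 0.12615 shown in the merged table
```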


## Small-Count Caveat

Mass shootings (4+ people shot) are thankfully rare. In this dataset, 28 of the
101 counties had only 1-4 incidents over the entire 5-year window. One additional
incident raises the annual rate by 0.2 per 100K for a county of 100,000 people,
which is about the median rate here (0.205), so rates for smaller populations are
especially unstable. Keep this in mind when interpreting scatter plots and
correlation statistics built from these rates.
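The size of that effect depends only on population and window length, not on the county's current count. A minimal sketch using this notebook's rate formula follows. `shift_per_extra_incident` is an illustrative name, and the two populations are the minimum and maximum in this dataset (99,500 and 9,829,544).

```python
def shift_per_extra_incident(population: float, years: int = 5) -> float:
    """Change in annual rate per 100K caused by one additional incident."""
    return (1 / years) / (population / 100_000)

small = shift_per_extra_incident(99_500)      # smallest county in the data
large = shift_per_extra_incident(9_829_544)   # largest county in the data

print(f"smallest county: +{small:.4f} per 100K per year")
print(f"largest county:  +{large:.4f} per 100K per year")
```

The same single incident moves the smallest county's rate about a hundred times more than the largest county's.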

In [4]:
counts = merged['mass_shooting_incidents']
print("Distribution of mass shooting incident counts (5-year total):")
print(counts.describe())
print()
print(f"Counties with <= 4 incidents: {(counts <= 4).sum()}")
print(f"Counties with 5-20 incidents: {counts.between(5, 20).sum()}")
print(f"Counties with > 20 incidents: {(counts > 20).sum()}")

Distribution of mass shooting incident counts (5-year total):
count    101.000000
mean      14.465347
std       16.144079
min        1.000000
25%        4.000000
50%       10.000000
75%       16.000000
max       95.000000
Name: mass_shooting_incidents, dtype: float64

Counties with <= 4 incidents: 28
Counties with 5-20 incidents: 54
Counties with > 20 incidents: 19
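With 28 counties at 4 or fewer incidents, it can help to see how wide the uncertainty on such small counts is. The sketch below computes an exact Poisson confidence interval for a count using only the standard library. It assumes incident counts are approximately Poisson distributed, which the original notebook does not claim. `poisson_cdf`, `exact_poisson_interval` and `rate_interval` are illustrative helper names.

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def _solve_decreasing(f, target, lo, hi, iters=200):
    """Bisection for f(mu) == target, where f is decreasing in mu."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid   # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_interval(k: int, alpha: float = 0.05):
    """Two-sided exact interval for the Poisson mean given an observed count k."""
    ceiling = k + 10 * math.sqrt(k + 1) + 20  # generous upper search bound
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda mu: poisson_cdf(k - 1, mu), 1 - alpha / 2, 0.0, ceiling)
    upper = _solve_decreasing(
        lambda mu: poisson_cdf(k, mu), alpha / 2, 0.0, ceiling)
    return lower, upper

def rate_interval(k: int, population: float, years: int = 5):
    """Interval for the annual rate per 100K, scaling the count interval."""
    scale = years * population / 100_000
    lower, upper = exact_poisson_interval(k)
    return lower / scale, upper / scale

# Illustrative pairing: 1 incident in a county of 99,500 people.
print(rate_interval(1, 99_500))
```

For an observed count of 1, the 95% interval on the expected count runs from roughly 0.03 to 5.6, so the corresponding rate is uncertain by more than two orders of magnitude.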


## Data Coverage Summary

In [5]:
coverage = pd.DataFrame({
    'Metric': ['Mass Shooting Rate', 'Gun Homicide Rate', 'Population',
               'Gini Coefficient', 'Drug Offense Rate', 'Poverty Rate',
               'Median Income', 'Gun Ownership %', 'Giffords Grade'],
    'Counties with data': [
        merged['mass_shooting_rate'].notna().sum(),
        merged['gun_homicide_rate'].notna().sum(),
        merged['population'].notna().sum(),
        merged['gini'].notna().sum(),
        merged['drug_offense_rate'].notna().sum(),
        merged['poverty_rate'].notna().sum(),
        merged['median_income'].notna().sum(),
        merged['gun_ownership_pct'].notna().sum(),
        merged['giffords_numeric'].notna().sum(),
    ]
})
print(f"Counties with ALL metrics: {merged.dropna().shape[0]}")
print()
coverage

Counties with ALL metrics: 101



Unnamed: 0,Metric,Counties with data
0,Mass Shooting Rate,101
1,Gun Homicide Rate,101
2,Population,101
3,Gini Coefficient,101
4,Drug Offense Rate,101
5,Poverty Rate,101
6,Median Income,101
7,Gun Ownership %,101
8,Giffords Grade,101
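Every metric covers all 101 counties, which means each county in the existing file found a match in the mass shooting file. If the two files ever diverge, the left merge would silently leave `NaN` counts. The sketch below shows one way to surface that, using the `validate` and `indicator` options of `DataFrame.merge`. The toy frames and their FIPS codes are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for county_df and ms_df.
counties = pd.DataFrame({
    'fips': ['00001', '00002', '00003'],
    'population': [500_000, 250_000, 120_000],
})
incidents = pd.DataFrame({
    'fips': ['00001', '00003'],
    'mass_shooting_incidents': [6, 1],
})

checked = counties.merge(
    incidents,
    on='fips',
    how='left',
    validate='one_to_one',  # raises if fips is duplicated on either side
    indicator=True,         # adds a _merge column: 'both' or 'left_only'
)

# Counties present in the indicator file but absent from the incident file.
unmatched = checked.loc[checked['_merge'] == 'left_only', 'fips'].tolist()
print(unmatched)
```

Whether an unmatched county should be treated as zero incidents or as missing depends on whether the incident file is meant to be exhaustive, so that choice is left open here.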


In [6]:
fig = go.Figure(data=[
    go.Bar(
        x=coverage['Metric'],
        y=coverage['Counties with data'],
        marker_color=['#c0392b', '#e74c3c', '#3498db', '#2ecc71', '#f39c12',
                      '#9b59b6', '#1abc9c', '#e67e22', '#34495e']
    )
])
fig.update_layout(
    title='Data Coverage by Metric (US Counties — Mass Shooting Analysis)',
    yaxis_title='Number of Counties',
    template='plotly_white',
    height=400,
)
fig.show()

## Save Merged Dataset

In [7]:
out_path = OUTPUT_DIR / 'merged_us_mass_shooting_data.csv'
merged.to_csv(out_path, index=False)
print(f"Saved merged data to {out_path}")
print(f"Shape: {merged.shape}")
merged.describe()

Saved merged data to ../data/processed/merged_us_mass_shooting_data.csv
Shape: (101, 16)


Unnamed: 0,gun_homicide_rate,population,gini,drug_offense_rate,poverty_rate,median_income,gun_ownership_pct,giffords_numeric,mass_shooting_incidents,mass_shooting_rate
count,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0
mean,8.29703,1244821.0,0.469515,397.524752,13.940594,70647.306931,39.711881,4.960396,14.465347,0.306792
std,6.927748,1272756.0,0.039671,119.602725,4.976247,20091.026633,12.617974,4.855761,16.144079,0.381444
min,0.7,99500.0,0.393,150.0,5.1,39820.0,8.1,0.0,1.0,0.012255
25%,3.4,570719.0,0.441,310.0,10.2,57350.0,28.3,0.0,4.0,0.124539
50%,6.2,921130.0,0.469,390.0,13.5,67850.0,44.4,2.0,10.0,0.205295
75%,11.4,1459762.0,0.496,480.0,16.5,76215.0,47.9,11.0,16.0,0.313979
max,34.2,9829544.0,0.6,720.0,28.6,140258.0,66.3,11.0,95.0,2.591115
