In [None]:
import pandas as pd
import viz_functions as viz
from scipy.stats import spearmanr

In [None]:
data = pd.read_pickle('Data/data_log_transformed.pkl')
original = pd.read_pickle('Data/data_cleaned.pkl')

In [None]:
labels = {
    '%_FEMALE': 'Share of Female Officers (log1p)',
    '%_BLACK': 'Share of Black Officers (log1p)',
    '%_HISP': 'Share of Hispanic Officers (log1p)',
    'CCRB': 'Share of Officers in CCRB-Covered Agencies (log1p)',
    'CFDBK_POLICY': 'Share of Officers in Agencies Using Community Feedback (log1p)',
    'STD_FORCE_TO_RESIDENT': 'Lethal Force Incidents / 100k Residents (log1p-scaled)',
    'STD_FORCE_TO_CRIME': 'Lethal Force Incidents / 1k Reported Crimes (log1p-scaled)'
}

In [None]:
# Figure 1: Lethal Force Incidents per 100k Residents by State and Year
viz.bar_by_state_year(original, '%_FORCE_TO_RESIDENT')

In [None]:
# Figure 2: Lethal Force Incidents per 1k Reported Crimes by State and Year
viz.bar_by_state_year(original, '%_FORCE_TO_CRIME')

In [None]:
# Figure 3: Spearman Correlation Matrix
corr = data[['%_FEMALE', '%_BLACK', '%_HISP', 'CCRB',
             'CFDBK_POLICY', 'STD_FORCE_TO_RESIDENT', 'STD_FORCE_TO_CRIME']]
viz.plot_correlation_matrix(corr)
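
Figure 3 reports Spearman correlations. The internals of `viz.plot_correlation_matrix` are not shown in this notebook, so the following is only a minimal sketch of the statistic itself, not of that helper. It uses toy values (illustrative only, not taken from the dataset) to show that Spearman's ρ is Pearson's r computed on ranks, which is why the log1p transform, being monotonic, does not change it.

```python
import pandas as pd

# Toy values standing in for two of the notebook's columns (illustrative only).
toy = pd.DataFrame({
    'CCRB': [0.10, 0.25, 0.40, 0.55, 0.90],
    'STD_FORCE_TO_RESIDENT': [0.20, 0.10, 0.50, 0.45, 0.80],
})

# Spearman correlation matrix computed directly by pandas.
spearman_matrix = toy.corr(method='spearman')

# The same matrix obtained as Pearson's r on the ranked data.
rank_pearson = toy.rank().corr(method='pearson')

print(spearman_matrix.round(3))
```
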

## Hypothesis 1 & Hypothesis 4.1 Analysis: Share of Female Officers vs. Lethal Force Incidents


### State level analysis
*States that have a higher proportion of female officers will have fewer lethal force incidents per year.*

In [None]:
viz.scatter_dual_year_highlight(data, '%_FEMALE', 'STD_FORCE_TO_RESIDENT', label_map=labels)

In [None]:
viz.scatter_dual_year_highlight(data, '%_FEMALE', 'STD_FORCE_TO_CRIME', label_map=labels)

### County level analysis
*Counties that have a higher proportion of female officers will have fewer lethal force incidents per year.*

## Hypothesis 2 & Hypothesis 4.2 Analysis: Minority Officer Representation vs. Lethal Force Incidents

### State level analysis
*States that have a higher proportion of minority officers (Black, Latino) will have fewer lethal force incidents per year.*

In [None]:
viz.scatter_dual_year_highlight(data, '%_BLACK', 'STD_FORCE_TO_RESIDENT', label_map=labels)

In [None]:
viz.scatter_dual_year_highlight(data, '%_BLACK', 'STD_FORCE_TO_CRIME', label_map=labels)

In [None]:
viz.scatter_dual_year_highlight(data, '%_HISP', 'STD_FORCE_TO_RESIDENT', label_map=labels)

In [None]:
viz.scatter_dual_year_highlight(data, '%_HISP', 'STD_FORCE_TO_CRIME', label_map=labels)

### County level analysis
*Counties that have a higher proportion of minority officers (Black, Latino) will have fewer lethal force incidents per year.*

## Hypothesis 3 & Hypothesis 4.3 Analysis: Civilian Complaint Review Board (CCRB) Presence vs. Lethal Force Incidents

### State level analysis
*States where a greater share of officers are employed in agencies with a civilian complaint review board will have fewer lethal force incidents per year.*

In [None]:
viz.scatter_dual_year_highlight(data, 'CCRB', 'STD_FORCE_TO_RESIDENT', label_map=labels)

In [None]:
viz.scatter_dual_year_highlight(data, 'CCRB', 'STD_FORCE_TO_CRIME', label_map=labels)

The scatter plots show no clear linear relationship between CCRB coverage and lethal force rates. Some states with low CCRB coverage had the highest rates, and the one state with full coverage still had above-average use of force. Comparing across years, CCRB coverage appears to have increased: in 2016 most states were clustered at the lower end, while in 2020 the points are more spread out and extend further to the right, suggesting broader or more varied adoption across states.

In [None]:
viz.plot_quartile_boxplot(data, x_var='CCRB', y_var='STD_FORCE_TO_RESIDENT', label_map=labels)

In [None]:
viz.plot_quartile_boxplot(data, x_var='CCRB', y_var='STD_FORCE_TO_CRIME', label_map=labels)

The boxplot shows that states in higher CCRB coverage quartiles tend to have higher median lethal force rates. The quartiles overlap considerably, however, and Q2 contains an outlier: a state with low CCRB coverage but the highest force rate (scaled to 1). The relationship is therefore not consistent across all states.
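
The quartile comparison above can be checked numerically. The sketch below is one way to bin a coverage variable into quartiles and compare medians. It is an assumption about the general approach, not a description of how `viz.plot_quartile_boxplot` is implemented. The function name `quartile_medians` and the toy values are hypothetical.

```python
import pandas as pd

def quartile_medians(df, x_var, y_var):
    """Bin x_var into quartiles and return the median of y_var in each bin."""
    # Ranking with method='first' breaks ties (e.g. many states at 0 coverage),
    # so qcut always finds four distinct bin edges.
    ranks = df[x_var].rank(method='first')
    bins = pd.qcut(ranks, q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])
    return df.groupby(bins, observed=True)[y_var].median()

# Toy values (illustrative only, not taken from the dataset).
toy = pd.DataFrame({
    'CCRB': [0.0, 0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 1.0],
    'STD_FORCE_TO_RESIDENT': [0.2, 0.3, 1.0, 0.25, 0.4, 0.5, 0.6, 0.7],
})

medians = quartile_medians(toy, 'CCRB', 'STD_FORCE_TO_RESIDENT')
print(medians)
```

In the toy data a single extreme value in Q2 pulls that quartile's median up, which is the kind of effect a small per-quartile sample can produce.
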

In [None]:
corr = data[['CCRB','STD_FORCE_TO_RESIDENT','STD_FORCE_TO_CRIME']]
viz.plot_correlation_matrix(corr)

The Spearman correlations show a **moderate positive relationship** (Spearman’s ρ = 0.38), suggesting that states with greater CCRB presence tend to have higher rates of lethal force incidents. This goes against our expectation that more accountability (via CCRBs) would be linked to lower lethal force rates. One likely explanation is that CCRBs are implemented reactively, in response to a high number of use-of-force incidents; in addition, as the bar plots showed, lethal force rates were generally higher in 2020. We look more closely at the difference in correlation between 2016 and 2020:

In [None]:
for col in ['STD_FORCE_TO_RESIDENT','STD_FORCE_TO_CRIME']:
    print(f'{col} vs. CCRB')
    for year in [2016, 2020]:
        year_sub = data[data['YEAR'] == year]
        r, p = spearmanr(year_sub['CCRB'], year_sub[col])
        print(f"Year {year}: Spearman r = {r:.3f}, p = {p:.3f}")

The relationship between CCRB coverage and standardized lethal force rates strengthened from 2016 to 2020. In both population- and crime-normalized metrics, the 2020 correlations were moderate and statistically significant, while 2016 showed weaker, non-significant results. This **supports Hypothesis 4**, though the positive direction of the relationship **contradicts Hypothesis 3**.
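
A significant correlation in 2020 and a non-significant one in 2016 do not by themselves show that the two correlations differ. One way to probe the change directly is a bootstrap over states, sketched below. This is not part of the original analysis. It assumes each state appears once per year and that a state identifier column exists. The column name `STATE`, the function name `bootstrap_rho_change`, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def bootstrap_rho_change(df, x_var, y_var, unit_col='STATE', year_col='YEAR',
                         years=(2016, 2020), n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for rho(years[1]) - rho(years[0])."""
    # One row per unit; columns become (variable, year) pairs.
    wide = df.pivot(index=unit_col, columns=year_col, values=[x_var, y_var]).dropna()
    rng = np.random.default_rng(seed)
    n = len(wide)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample whole units so both years of a state stay together.
        sample = wide.iloc[rng.integers(0, n, size=n)]
        rho_early = spearmanr(sample[(x_var, years[0])], sample[(y_var, years[0])])[0]
        rho_late = spearmanr(sample[(x_var, years[1])], sample[(y_var, years[1])])[0]
        diffs[b] = rho_late - rho_early
    # nanpercentile skips any resample where rho was undefined.
    return np.nanpercentile(diffs, [2.5, 97.5])

# Toy data: 30 hypothetical units observed in both years (random values).
toy_rng = np.random.default_rng(1)
units = [f'S{i}' for i in range(30)]
toy = pd.DataFrame({
    'STATE': units * 2,
    'YEAR': [2016] * 30 + [2020] * 30,
    'CCRB': toy_rng.random(60),
    'STD_FORCE_TO_RESIDENT': toy_rng.random(60),
})

lo, hi = bootstrap_rho_change(toy, 'CCRB', 'STD_FORCE_TO_RESIDENT', n_boot=500)
print(f"95% CI for change in rho: [{lo:.3f}, {hi:.3f}]")
```

If the resulting interval for the real data excluded zero, that would be more direct evidence that the relationship strengthened between the two years.
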

### County level analysis
*Counties where a greater share of officers are employed in agencies with a civilian complaint review board will have fewer lethal force incidents per year.*