#hide
import pandas as pd
pd.set_option('display.precision', 2)
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import sys

plt.style.use('ggplot')
%config InlineBackend.figure_format = 'retina'
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
colors_ = plt.get_cmap('Set2')(np.linspace(0, 1, 8))

from IPython.core.pylabtools import figsize
from IPython.display import display
figsize(8, 5)

%load_ext watermark
%load_ext autoreload
%autoreload 2
%matplotlib inline

%watermark -d -t -u -v -g -r -b -iv -a "Hongsup Shin"

Author: Hongsup Shin

Last updated: 2021-04-26 20:26:53

Python implementation: CPython
Python version       : 3.7.10
IPython version      : 7.20.0

Git hash: fbb36e500f156738a756cd5c20741a0bd0af8d0b

Git repo: https://github.com/hongsups/blog.git

Git branch: TJI_volunteering

matplotlib: 3.4.1
pandas    : 1.2.3
sys       : 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:59:12) 
[Clang 11.0.1 ]
seaborn   : 0.11.1
numpy     : 1.20.2



# Data Science Volunteering for Criminal Justice (1/2): Police Shooting in Texas

I have met several tech workers who want to use their technical skills in more meaningful projects, particularly for social good. As a tech worker myself, I've been volunteering at several organizations for about two years now. Among those, I have volunteered consistently at [Texas Justice Initiative (TJI)](https://texasjusticeinitiative.org/) for over a year, where I published [a data journalism report on police shootings in Texas](https://texasjusticeinitiative.org/static/TJI_OISReport_2020.pdf) with our TJI director, [Eva Ruth Moravec](https://www.linkedin.com/in/eva-ruth-moravec-m-a-1aa9332/).

In this post, I'd like to share some of my volunteer work at TJI, particularly on police shootings in Texas. In the follow-up post, I will share some tips on tech volunteering at nonprofits, with a focus on how to apply machine learning and data science knowledge.

## Background

### Texas Justice Initiative (TJI)
[Texas Justice Initiative (TJI)](https://texasjusticeinitiative.org/) is a criminal justice nonprofit in Austin, TX. It was founded in 2016 by Eva Ruth Moravec, a journalist, and Amanda Woog, a researcher. The main goal of TJI is to **create a data portal for criminal justice in Texas**. TJI mostly relies on tech volunteers. We currently have [11 active volunteers](https://texasjusticeinitiative.org/about), including myself.


### Police shooting in the US

It's 2021 now and almost everyone in the US is aware that there is an epidemic of police brutality all over the country. Academic researchers now agree that [police contact is a significant contributing factor to health inequality and particularly to early mortality for people of color](https://pubmed.ncbi.nlm.nih.gov/31383756/). Police brutality in the US is also recognized internationally, for example in [the 2014 report by the UN Committee Against Torture](https://www.justsecurity.org/wp-content/uploads/2014/11/UN-Committee-Against-Torture-Concluding-Observations-United-States.pdf):
> The Committee is concerned about numerous reports of police brutality and excessive use of force by law enforcement officials, in particular against persons belonging to certain racial and ethnic groups, immigrants and LGBTI individuals, racial profiling by police and immigration offices and growing militarization of policing activities.

If the issue of police brutality is this severe, one would expect there to be a national database of police shooting incidents. **Unfortunately, the US government does not maintain a national database of police shooting incidents.** Rather, most efforts to collect police shooting incidents have been made by journalists: D. Brian Burghart's [Fatal Encounters](https://fatalencounters.org/) and the Washington Post's [Fatal Force](https://www.washingtonpost.com/graphics/investigations/police-shootings-database/) are the most popular examples. Some local governments have public data portals for police shootings, but these are mostly at the city level and it's rare to find a state-level database.

### Officer-involved shooting (OIS) data in Texas

The state of Texas collects officer-involved shooting (OIS) data from all law enforcement agencies in the state. The term **officer-involved shooting** refers to two different types of incidents:
1. **Shooting *by* officers**. In this case, people harmed during a shooting incident are **civilians**.
2. **Shooting *of* officers**. Here, those harmed during an incident are **police officers.**

This was possible thanks to legislation passed in 2015 (HB 1036), which requires all officer-involved shootings to be reported to the **Office of the Attorney General (OAG) of Texas**. In 2017, another bill was passed, which now requires the OAG to investigate missing reports and to fine law enforcement agencies whose reports are delayed. This data is publicly available on [the OAG website](https://oagtx.force.com/oisreports/apex/OISReportsPage). Every report is saved as a PDF file, which makes the data challenging to digest.

### Data collection process

Many ML practitioners and data scientists are detached from the data collection process, which makes it easy for us to overlook the effort and cost of data collection. Thus, in this section, I'd like to briefly describe the data collection process for the OIS data to show you that **data doesn't just appear for free**.

First and foremost, as mentioned above, public data requires legislation and policies that establish a concerted, formal effort to collect it. Now every law enforcement agency has to file a one-page report (see the example [here](https://www.austintexas.gov/sites/default/files/files/Police/Olson_Dr._-_Peace_Officer_Involved_Injuries.pdf)) whenever it identifies an officer-involved shooting incident in its jurisdiction.

This OIS report contains many pieces of valuable information. To list a few:
- Date and location of the incident (location as detailed as street address)
- Demographic information (gender, race, and age) of the person who was shot by police and the police officer who shot the person
- Severity of the incident (binary; injury or death) and whether the person shot possessed a deadly weapon
- Whether multiple officers were involved, whether the officer was on duty, and so on.

To acquire this data, TJI submits an open records request every month via the OAG's online portal. Luckily, we get tabular data in CSV format (not the PDF files), and our director Eva manually inspects it and fixes errors. In this process, we contact the agencies and ask for clarification about errors. Finally, the new data is added to the existing data and the data on our website is updated. You can download the [OIS data on our website](https://texasjusticeinitiative.org/datasets/civilians-shot).
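To make the last step concrete, here is a minimal sketch of how a monthly batch could be appended to an existing table with pandas. It is not TJI's actual pipeline: the column names (`report_id`, `date_incident`, `agency`, `civilian_died`) and the sample rows are hypothetical, and the manual inspection and follow-up with agencies described above cannot be reduced to code.

```python
import io

import pandas as pd

# Hypothetical column names and made-up rows for illustration only;
# the real OAG export may be structured differently.
EXISTING_CSV = """report_id,date_incident,agency,civilian_died
r1,2021-01-03,AGENCY A,False
r2,2021-01-17,AGENCY B,True
"""

MONTHLY_CSV = """report_id,date_incident,agency,civilian_died
r2,2021-01-17,AGENCY B,True
r3,2021-02-02,agency c ,False
"""


def load_reports(csv_text: str) -> pd.DataFrame:
    """Read one CSV export and apply light, illustrative cleaning."""
    df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date_incident"])
    # Normalize agency names so the same agency is not spelled two ways.
    df["agency"] = df["agency"].str.strip().str.upper()
    return df


def append_monthly(existing: pd.DataFrame, monthly: pd.DataFrame) -> pd.DataFrame:
    """Append a monthly batch, keeping one row per report_id."""
    combined = pd.concat([existing, monthly], ignore_index=True)
    # If a report appears in both tables, keep the most recent copy.
    combined = combined.drop_duplicates(subset="report_id", keep="last")
    return combined.sort_values("date_incident").reset_index(drop=True)


existing = load_reports(EXISTING_CSV)
monthly = load_reports(MONTHLY_CSV)
updated = append_monthly(existing, monthly)
print(updated)
```

Deduplicating on a report identifier is one reasonable choice when the same report can show up in more than one monthly response. A real workflow would likely need a more careful matching rule, since corrected reports may not share a stable identifier.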

## Motivation and Data Preparation