# Bulletproofing custody deaths

This notebook checks the different ways the data can be wrong or incomplete.

There are also various data checks in the individual notebooks.

### Get the data

This is the same in all the notebooks. You have to run the [File processing](./) notebook first to get the cleaned files.

In [1]:
import agate
import datetime

In [2]:
# sets data types on fields agate got wrong
specified_data_types = {
    'tracked_cause': agate.Text(),
    'offense': agate.Text(),
    'case_study': agate.Text(),
    'official_discipline': agate.Text()
}

# this pulls the deaths file that is exported in the File processing notebook
deaths = agate.Table.from_csv('../exports/deaths_latest.csv', column_types=specified_data_types)

print(deaths)

| column               | data_type |
| -------------------- | --------- |
| id                   | Number    |
| ag_report_url        | Text      |
| first_name           | Text      |
| middle_name          | Text      |
| last_name            | Text      |
| suffix               | Text      |
| slug                 | Text      |
| race                 | Text      |
| gender               | Text      |
| date_of_birth        | Date      |
| date_of_death        | Date      |
| age                  | Number    |
| agency               | Number    |
| restrained           | Boolean   |
| tazed                | Boolean   |
| times_tazed          | Number    |
| pepper_sprayed       | Boolean   |
| official_discipline  | Text      |
| grand_jury_result    | Text      |
| mental_health_issues | Boolean   |
| manner_of_death      | Text      |
| drug_intoxication    | Boolean   |
| cause_of_death       | Text      |
| tracked_cause        | Text      |
| offense              | Text      |
|

### No ag_report_url

These are cases with no custody death report on file with the AG's office.

In [3]:
no_ag = deaths.where(lambda row: row['ag_report_url'] is None)

no_ag.select([
                'first_name',
                'middle_name',
                'last_name',
                'reporter_assigned']).rename(column_names={'reporter_assigned': 'reporter'}).print_table()

| first_name | middle_name | last_name | reporter |
| ---------- | ----------- | --------- | -------- |
| Dustin     | Keith       | Glover    | ed       |
| Mark       | Spotswood   | Green     | ed       |
| Thuan      |             | Tran      | ed       |
| Joel       | Don         | Casey     | ed       |
| Kenneth    |             | Eagleton  | ed       |
| Herman     |             | Barnes    | ed       |


### No birthdate

The `date_of_birth` field is required, so we used 1900-01-01 for records where we didn't have that date.

In [4]:
# Order all deaths by date of birth so the 1900-01-01 placeholders sort to the top.

print('There were three cases where there was no birthdate:\n')
deaths.select(['first_name',
               'middle_name',
               'last_name',
               'date_of_birth']).order_by('date_of_birth').print_table(5)

There were three cases where there was no birthdate:

| first_name | middle_name | last_name | date_of_birth |
| ---------- | ----------- | --------- | ------------- |
| Mark       | Spotswood   | Green     |    1900-01-01 |
| Kenneth    |             | Eagleton  |    1900-01-01 |
| Herman     |             | Barnes    |    1900-01-01 |
| Robert     | Earl        | Williams  |    1942-12-01 |
| William    |             | Livezey   |    1943-03-04 |
| ...        | ...         | ...       |           ... |
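
The count in the message above is typed by hand, so it can drift if the data changes. As a minimal sketch, the placeholder check could be computed instead. This version works on plain dictionaries rather than the agate table so it runs without the exported CSV; `PLACEHOLDER_DOB` and `missing_birthdates` are hypothetical names, and the sample rows are copied from the output above.

```python
import datetime

# Placeholder this notebook's data uses for unknown birthdates.
PLACEHOLDER_DOB = datetime.date(1900, 1, 1)


def missing_birthdates(rows):
    """Return the rows whose date_of_birth equals the placeholder date."""
    return [row for row in rows if row['date_of_birth'] == PLACEHOLDER_DOB]


# Sample rows copied from the table printed above.
sample = [
    {'last_name': 'Green', 'date_of_birth': datetime.date(1900, 1, 1)},
    {'last_name': 'Eagleton', 'date_of_birth': datetime.date(1900, 1, 1)},
    {'last_name': 'Barnes', 'date_of_birth': datetime.date(1900, 1, 1)},
    {'last_name': 'Williams', 'date_of_birth': datetime.date(1942, 12, 1)},
    {'last_name': 'Livezey', 'date_of_birth': datetime.date(1943, 3, 4)},
]

flagged = missing_birthdates(sample)
print('Cases with no birthdate: {}'.format(len(flagged)))
```

Against the real table, the same comparison could presumably be passed to `deaths.where(...)`, since agate `Date` columns hold `datetime.date` values.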


### Where 'cause of death' is null

In [5]:
# filter cases where cause of death is blank
cause_none = deaths.where(lambda row: row['cause_of_death'] is None)

print('\nList of records where cause_of_death is null:\n')

# print the results
cause_none.select([
                'first_name',
                'middle_name',
                'last_name',
                'reporter_assigned']).rename(column_names={'reporter_assigned': 'reporter'}).print_table()


List of records where cause_of_death is null:

| first_name | middle_name | last_name | reporter |
| ---------- | ----------- | --------- | -------- |
| Dustin     | Keith       | Glover    | ed       |
| Mario      | Anthony     | Pedigone  | js       |
| Manuel     |             | Delacruz  | js       |
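
The null checks in this notebook all repeat the same filter-then-list pattern. A rough sketch of how that pattern could be pulled into one helper is below. It is written against plain dictionaries, not the agate table, and `rows_missing`, `missing_report`, and the sample rows are hypothetical and for illustration only.

```python
def rows_missing(rows, column):
    """Return rows where `column` is absent, None, or an empty string."""
    return [row for row in rows if row.get(column) in (None, '')]


def missing_report(rows, columns):
    """Map each column name to the number of rows missing a value in it."""
    return {column: len(rows_missing(rows, column)) for column in columns}


# Hypothetical rows for illustration only; not taken from the deaths file.
sample = [
    {'id': 1, 'cause_of_death': 'Example cause', 'manner_of_death': None},
    {'id': 2, 'cause_of_death': None, 'manner_of_death': None},
    {'id': 3, 'cause_of_death': '', 'manner_of_death': 'Example manner'},
]

report = missing_report(sample, ['cause_of_death', 'manner_of_death'])
print(report)
```

A per-column summary like this gives a quick overview before drilling into the name lists for each field.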


### No manner of death chosen

In [6]:
# Filter to no manner of death:
manner_none = deaths.where(lambda row: row['manner_of_death'] is None)

print('\nList of records where no manner_of_death chosen:\n')
# print names and reporters where true
manner_none.select([
                'first_name',
                'middle_name',
                'last_name',
                'reporter_assigned']).rename(column_names={'reporter_assigned': 'reporter'}).print_table(max_rows=None)


List of records where no manner_of_death chosen:

| first_name | middle_name | last_name | reporter |
| ---------- | ----------- | --------- | -------- |
| Joshua     | Carl        | Ivy       |          |
| Sam        |             | Akin      |          |
| Larry      | Earl        | Leaks     |          |
| Greg       |             | Cranfill  |          |
| William    | Yendis      | Franklin  |          |
| Lorenzo    | Charles     | Stiggers  |          |
| Dennis     | Demarcus    | Oatis     |          |
| John       | Yongwoo     | Kim       |          |
| Efrain     | Agapito     | Sauceda   |          |
| Jerry      | Guildwin    | Choate    |          |
| Cory       | Christopher | Lucero    |          |
| Antonie    | D           | Bell      |          |
| Shawn      | Ellis       | Leflore   |          |
| Dustin     | Keith       | Glover    | ed       |
| Manuel     |             | Delacruz  | js       |
| David      | Mercado     | Gonzalez  | ab       |
| Macario    

### No grand jury result chosen

In [7]:
# pivot and print to see values
grandjury_pivot = deaths.pivot('grand_jury_name')
grandjury_pivot.print_table()

# Filter to no grand_jury_result chosen:
grandjury_none = deaths.where(lambda row: row['grand_jury_name'] is None)

print('\nList of records where no grand_jury_result chosen:\n')
# print names and reporters where true
grandjury_none.select([
                'first_name',
                'middle_name',
                'last_name',
                'reporter_assigned']).rename(column_names={'reporter_assigned': 'reporter'})\
               .order_by('last_name').order_by('reporter').print_table(max_rows=None)

| grand_jury_name | Count |
| --------------- | ----- |
|                 |   129 |
| Not brought     |    82 |
| No-bill         |    76 |
| Indictment      |     2 |

List of records where no grand_jury_result chosen:

| first_name  | middle_name   | last_name        | reporter |
| ----------- | ------------- | ---------------- | -------- |
| Weldon      | Davis         | Baker            | ab       |
| Derrick     | Anthony       | Birdow           | ab       |
| Charles     | Edward        | Boll             | ab       |
| Wilber      |               | Castillo-Gongora | ab       |
| Jesse       | Andrew        | DeBusk           | ab       |
| David       | Mercado       | Gonzalez         | ab       |
| Daniel      | Lee           | Guerra           | ab       |
| Jonas       |               | Hernandez        | ab       |
| Christopher | Stewart       | James            | ab       |
| Jorge       |               | Ochoa            | ab       |
| Kevin       | Ray           | Oma

### Restrained as no or none

Checking the restrained field for empty or "no" values.

In [8]:
# pivot on restrained to see the number of cases.
deaths.pivot('restrained').print_table()

# filter for rows where restrained is not selected at all
restrained_null = deaths.where(lambda row: row['restrained'] is None)

print('\nCases where Restrained is not selected\n')
restrained_null.select([
                'first_name',
                'middle_name',
                'last_name',
                'reporter_assigned']).rename(column_names={'reporter_assigned': 'reporter'}).print_table()

# filter for rows where restrained is explicitly set to False
restrained_no = deaths.where(lambda row: row['restrained'] is False)

print('\nCases where Restrained is set to False\n')
restrained_no.select([
                'first_name',
                'middle_name',
                'last_name',
                'reporter_assigned']).rename(column_names={'reporter_assigned': 'reporter'}).print_table()



| restrained | Count |
| ---------- | ----- |
|       True |   289 |

Cases where Restrained is not selected

| first_name | middle_name | last_name | reporter |
| ---------- | ----------- | --------- | -------- |

Cases where Restrained is set to False

| first_name | middle_name | last_name | reporter |
| ---------- | ----------- | --------- | -------- |
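
Both lists are empty, which matches the pivot: every record has `restrained` set to True. As a hedged sketch, the same conclusion could be checked in one step by tallying values and flagging anything that is not True. The `tally` helper and the sample rows are hypothetical, and the code uses plain dictionaries rather than the agate table.

```python
from collections import Counter


def tally(rows, column):
    """Count how often each value (including None) appears in `column`."""
    return Counter(row.get(column) for row in rows)


# Hypothetical rows for illustration only; not taken from the deaths file.
sample = [
    {'restrained': True},
    {'restrained': True},
    {'restrained': None},
    {'restrained': False},
]

counts = tally(sample, 'restrained')

# Anything other than True is a value a reporter should review.
unexpected = {value: n for value, n in counts.items() if value is not True}
print(unexpected)
```

An empty `unexpected` dictionary would correspond to the result shown above, where the pivot lists only True.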
