Import the required packages into the notebook.

In [1]:
import numpy as np
import pandas as pd
import xlrd
from functools import reduce

import warnings
warnings.filterwarnings('ignore')

# Search for Dataset

We were interested in exploring trends in injuries and deaths from gun violence, including accidental shootings, intentional shootings, mass shootings, and officer-involved shootings. After reviewing several datasets available online, we selected one from Kaggle that was sourced from the Gun Violence Archive. The Gun Violence Archive is a not-for-profit corporation formed in 2013 that provides free online public access to accurate information about gun-related violence in the United States, compiled daily from over 2500 law enforcement, media, commercial and government sources.

# Understanding the dataset

The data comes from a gun violence database covering incidents in the years 2014, 2015, and 2016. In total, we had 14 CSV files, categorized by the type of incident (accidental/intentional/mass shooting/police involvement) and by age group (children/teen/adult).

For each incident in a category and age group, the data provides a count of victims killed or injured. The mass shooting files contain occurrences that can also appear in one or more of the other categories. Each incident is recorded with its date, state, city or county, and address.

There are three groups for Age:
  - **Children**: Ages 0-11
  - **Teens**: Ages 12-17
  - **Adults**: Ages 18+

There are four categories of incidents:
  - **Accidental**: Incidents caused by playing with a gun, negligent handling, or unsupervised gun handling
  - **Intentional**: Incidents of murder or suicide. Mass shootings also count toward intentional incidents
  - **Officers Involved**: Incidents where police officers were involved (killed, injured, or fired a weapon)
  - **Mass Shootings**: Incidents where 4+ victims were injured or killed, excluding the subject/suspect/perpetrator, in one location

In each file, one row represents one gun violence incident for that particular age group and incident type.

Each file contained the same data structure with the following columns:

    
|Column Name|Explanation |
|-----------|------------|
|Incident Date|The date of the incident|
|State|The state in which the incident occurred|
|City Or County|The city or county of the state where the incident occurred|
|Address|The address of the location where the incident occurred|
|# Killed|The total count of people killed as a result of the incident|
|# Injured|The total count of people who were injured as a result of the incident|
|Operations|Empty (`NaN`) in the exported dataset|
  

**NOTE**: On the Gun Violence Archive website, the `Operations` column contained links to a report summarizing the incident, which were not included in the exported dataset.

We obtained our data set from Kaggle.
Data Source: https://www.kaggle.com/gunviolencearchive/gun-violence-database#teens_killed.csv

A more recent version (through 2019) is available on the Gun Violence Archive's website.
https://www.gunviolencearchive.org/reports

Although more recent data is available, we decided to keep only the data from 2014 - 2016 and focus our analysis on these three years.


### 1. Load and create columns for each CSV
 
 Accidental deaths of adults CSV

In [2]:
df = pd.read_csv('Data/accidental_deaths.csv')

In [3]:
df['Category'] = 'Accidental'

In [4]:
df['Age'] = 'Adult'

Accidental deaths of children CSV

In [5]:
df2 = pd.read_csv('Data/accidental_deaths_children.csv')

In [6]:
df2['Category'] = 'Accidental'

In [7]:
df2['Age'] = 'Child'

Accidental deaths of teens CSV

In [8]:
df3 = pd.read_csv('Data/accidental_deaths_teens.csv')

In [9]:
df3['Category'] = 'Accidental'

In [10]:
df3['Age'] = 'Teen'

Accidental injuries of adults CSV

In [11]:
df4 = pd.read_csv('Data/accidental_injuries.csv')

In [12]:
df4['Category'] = 'Accidental'

In [13]:
df4['Age'] = 'Adult'

Accidental injuries of children CSV

In [14]:
df5 = pd.read_csv('Data/accidental_injuries_children.csv')

In [15]:
df5['Category'] = 'Accidental'

In [16]:
df5['Age'] = 'Child'

Accidental injuries of teens CSV

In [17]:
df6 = pd.read_csv('Data/accidental_injuries_teens.csv')

In [18]:
df6['Category'] = 'Accidental'

In [19]:
df6['Age'] = 'Teen'

Intentional injuries of children CSV

In [20]:
df7 = pd.read_csv('Data/children_injured.csv')

In [21]:
df7['Category'] = 'Intentional'

In [22]:
df7['Age'] = 'Child'

Intentional deaths of children CSV

In [23]:
df8 = pd.read_csv('Data/children_killed.csv')

In [24]:
df8['Category'] = 'Intentional'

In [25]:
df8['Age'] = 'Child'

Mass shootings in 2014 CSV

In [26]:
df9 = pd.read_csv('Data/mass_shootings_2014.csv')

In [27]:
df9['Category'] = 'Mass Shootings'

In [28]:
df9['Age'] = 'N/A'

Mass shootings in 2015 CSV

In [29]:
df10 = pd.read_csv('Data/mass_shootings_2015.csv')

In [30]:
df10['Category'] = 'Mass Shootings'

In [31]:
df10['Age'] = 'N/A'

Mass shootings in 2016 CSV

In [32]:
df11 = pd.read_csv('Data/mass_shootings_2016.csv')

In [33]:
df11['Category'] = 'Mass Shootings'

In [34]:
df11['Age'] = 'N/A'

Officers involved CSV

In [35]:
df12 = pd.read_csv('Data/officer_involved_shootings.csv')

In [36]:
df12['Category'] = 'Police Involvement'

In [37]:
df12['Age'] = 'Adult'

Intentional injuries of teens CSV

In [38]:
df13 = pd.read_csv('Data/teens_injured.csv')

In [39]:
df13['Category'] = 'Intentional'

In [40]:
df13['Age'] = 'Teen'

Intentional deaths of teens CSV

In [41]:
df14 = pd.read_csv('Data/teens_killed.csv')

In [42]:
df14['Category'] = 'Intentional'

In [43]:
df14['Age'] = 'Teen'
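
The cells above repeat the same three steps (read, set `Category`, set `Age`) for every file. A minimal sketch of the same loading done in a loop is shown here, assuming the file names and labels used above. The names `FILE_LABELS` and `load_labeled` are hypothetical and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping: file name -> (Category, Age), copied from the cells above.
FILE_LABELS = {
    "accidental_deaths.csv": ("Accidental", "Adult"),
    "accidental_deaths_children.csv": ("Accidental", "Child"),
    "accidental_deaths_teens.csv": ("Accidental", "Teen"),
    "accidental_injuries.csv": ("Accidental", "Adult"),
    "accidental_injuries_children.csv": ("Accidental", "Child"),
    "accidental_injuries_teens.csv": ("Accidental", "Teen"),
    "children_injured.csv": ("Intentional", "Child"),
    "children_killed.csv": ("Intentional", "Child"),
    "mass_shootings_2014.csv": ("Mass Shootings", "N/A"),
    "mass_shootings_2015.csv": ("Mass Shootings", "N/A"),
    "mass_shootings_2016.csv": ("Mass Shootings", "N/A"),
    "officer_involved_shootings.csv": ("Police Involvement", "Adult"),
    "teens_injured.csv": ("Intentional", "Teen"),
    "teens_killed.csv": ("Intentional", "Teen"),
}


def load_labeled(data_dir, file_labels=FILE_LABELS):
    """Read each CSV named in file_labels and tag it with Category and Age."""
    frames = []
    for name, (category, age) in file_labels.items():
        frame = pd.read_csv(Path(data_dir) / name)
        frame["Category"] = category
        frame["Age"] = age
        frames.append(frame)
    return frames
```

Calling `load_labeled('Data')` should return fourteen dataframes in the same order as `df` through `df14`, so the result could be used directly as the list of frames to combine.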

We then proceed to clean and combine the data.

We append the individual dataframes into a single dataframe to make the analysis easier.

### 2. Append individual dataframes into one

In [44]:
dfs = [df,df2,df3,df4,df5,df6,df7,df8,df9,df10,df11,df12,df13,df14]

In [45]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
df_merged = pd.concat(dfs, sort=False)

In [46]:
df_merged.to_csv('Data/gun_violence.csv')

### 3. Verify row length matches after append

In [47]:
len(df_merged)

5457

In [48]:
len(df)+len(df2)+len(df3)+len(df4)+len(df5)+len(df6)+len(df7)+len(df8)+len(df9)+len(df10)+len(df11)+len(df12)+len(df13)+len(df14)

5457
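
The manual comparison above can also be expressed as an assertion, so that a mismatch stops the notebook instead of relying on a visual check. This is a minimal sketch: `rows_match` is a hypothetical helper, and the two small frames contain made-up values used only to exercise it.

```python
import pandas as pd


def rows_match(frames, merged):
    """Return True when merged has exactly as many rows as the inputs combined."""
    return len(merged) == sum(len(frame) for frame in frames)


# Tiny stand-in frames (made-up values, not taken from the dataset).
a = pd.DataFrame({"# Killed": [1, 0]})
b = pd.DataFrame({"# Killed": [2]})
merged = pd.concat([a, b], sort=False)

assert rows_match([a, b], merged)
```

In this notebook the equivalent check would be `assert rows_match(dfs, df_merged)`.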

# Cleaned data
Below is a sample of the combined data, showing the `Category` and `Age` columns that were added to describe each record.

In [55]:
df_merged.sample(10)

Unnamed: 0,Incident Date,State,City Or County,Address,# Killed,# Injured,Operations,Category,Age
318,"January 16, 2015",California,Oakland,105th and Edes,2,2,,Mass Shootings,
123,"July 27, 2014",Missouri,Saint Louis,2000 block of East John Avenue,0,5,,Mass Shootings,
263,"June 15, 2015",Georgia,Lithonia,2600 block of Embarcadero Drive,0,1,,Accidental,Child
214,"June 11, 2016",Minnesota,Webster,4000 block of Elmore Avenue,0,4,,Mass Shootings,
123,"May 16, 2015",California,Perris,140 block of Metz Road,1,0,,Accidental,Child
199,"June 22, 2016",Washington,Olympia,500 block of Dutterow Road SE,3,1,,Mass Shootings,
278,"May 2, 2016",Missouri,Saint Louis (St Louis),I-70 and Grand Ave,0,1,,Intentional,Child
451,"November 9, 2014",Ohio,Cleveland,13301 Eaglesmere Ave,0,1,,Accidental,Teen
308,"March 6, 2016",Georgia,Roswell,890 Atlanta St,0,4,,Mass Shootings,
111,"August 25, 2016",Colorado,Aurora,E. Ohio Place and S. Buckley Road,1,0,,Accidental,Adult


Before we begin our analysis, we first want to understand the spread of the data in each of the individual files.

Show spread of data: number of records, mean, median, etc.?
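
One possible way to produce such a summary is a grouped aggregation over the victim counts. The sketch below assumes the column names listed earlier and pandas 1.1 or newer (for `dropna=False`, which keeps rows whose `Age` is missing). `summarize_spread` is a hypothetical helper name.

```python
import pandas as pd


def summarize_spread(merged):
    """Count, mean and median of the victim columns per Category and Age."""
    return (
        merged
        # dropna=False keeps groups whose Age is missing (e.g. mass shootings)
        .groupby(["Category", "Age"], dropna=False)[["# Killed", "# Injured"]]
        .agg(["count", "mean", "median"])
    )
```

Running `summarize_spread(df_merged)` would give one row per category and age group, with the record count serving as the number of incidents in that group.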

# Limitations
1.  Data Duplication
    - Each mass shooting can also have an accompanying row in another category that describes the details of the event.
    - We have to filter `# Killed` and `# Injured` in our visualizations to avoid duplicates when showing the full picture, so that the totals are not overstated.

In [50]:
df_merged[(df_merged['City Or County'] == 'New Orleans') & (df_merged['Category'] == 'Mass Shootings')
         & (df_merged['# Injured'] == 17)]

Unnamed: 0,Incident Date,State,City Or County,Address,# Killed,# Injured,Operations,Category,Age
31,"November 22, 2015",Louisiana,New Orleans,1900 block of Gallier Street,0,17,,Mass Shootings,


In [51]:
df_merged[(df_merged['Incident Date'] == 'November 22, 2015') & (df_merged['# Injured'] == 17)]

Unnamed: 0,Incident Date,State,City Or County,Address,# Killed,# Injured,Operations,Category,Age
438,"November 22, 2015",Louisiana,New Orleans,1900 block of Gallier Street,0,17,,Intentional,Child
31,"November 22, 2015",Louisiana,New Orleans,1900 block of Gallier Street,0,17,,Mass Shootings,
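
A minimal sketch of one way to avoid this double counting when computing overall totals is to keep a single row per incident. It assumes that date, location, and victim counts together identify an incident, which is an inference from the pair of rows above rather than a key defined by the dataset. `INCIDENT_KEY` and `unique_incidents` are hypothetical names.

```python
import pandas as pd

# Columns assumed to identify one incident (an assumption, see the lead-in).
INCIDENT_KEY = ["Incident Date", "State", "City Or County", "Address",
                "# Killed", "# Injured"]


def unique_incidents(merged):
    """Keep one row per incident so victim totals are not double counted."""
    return merged.drop_duplicates(subset=INCIDENT_KEY, keep="first")


# The two New Orleans rows shown in the output above.
demo = pd.DataFrame({
    "Incident Date": ["November 22, 2015", "November 22, 2015"],
    "State": ["Louisiana", "Louisiana"],
    "City Or County": ["New Orleans", "New Orleans"],
    "Address": ["1900 block of Gallier Street"] * 2,
    "# Killed": [0, 0],
    "# Injured": [17, 17],
    "Category": ["Intentional", "Mass Shootings"],
})

deduped = unique_incidents(demo)
print(deduped["# Injured"].sum())  # 17 rather than 34
```

Because only the first row of each duplicate group is kept, the `Category` and `Age` labels of the dropped rows are lost, so this approach suits overall totals better than per-category breakdowns.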


2. Missing Data
    - The `Operations` column in the original data set contains a link to the incident report and source, which shows criminal profiles, weapons involved, and a description of the incident. Because the exported data excludes these links, our analysis cannot draw on this information.
    - As shown below, the `Operations` column contains only null values.

In [52]:
df_merged.Operations.unique()

array([nan])

3. Officers involved data is lacking detail
    - The data only states that an officer was involved in the occurrence; it does not show what the exact involvement was. For instance, the data does not mention whether the officer was injured or killed, or whether the officer did the injuring or killing.
    - Our data does not describe the person who committed the act of gun violence (e.g. child, police officer, gang member).

In [53]:
df_merged[(df_merged['Incident Date'] == 'July 17, 2016') & (df_merged['# Injured'] == 3)]

Unnamed: 0,Incident Date,State,City Or County,Address,# Killed,# Injured,Operations,Category,Age
153,"July 17, 2016",Texas,Houston,6610 Tidwell,1,3,,Mass Shootings,
154,"July 17, 2016",Louisiana,Baton Rouge,9611 Airline Highway,4,3,,Mass Shootings,
22,"July 17, 2016",Louisiana,Baton Rouge,9611 Airline Highway,4,3,,Police Involvement,Adult


4. The data is missing intentional adult injuries and deaths
    - As shown below, only the Child and Teen age groups appear in the Intentional category.

In [54]:
df_merged[df_merged.Category == 'Intentional'].Age.unique()

array(['Child', 'Teen'], dtype=object)

# How we plan to use the data
We plan to evaluate three potential analyses:
1. Is there a correlation between police involvement and crimes also involving children and teens?
2. Is there seasonality to gun violence?
3. How frequently are police involved in acts of gun violence?
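
As a starting point for the seasonality question, the incident dates could be parsed and counted per calendar month. This is a minimal sketch, assuming every `Incident Date` value follows the "January 16, 2015" pattern seen in the sample rows. `incidents_per_month` is a hypothetical helper, and the demo dates are taken from the sample output earlier in the notebook.

```python
import pandas as pd


def incidents_per_month(merged):
    """Count incidents per calendar month (1-12), pooled across years."""
    dates = pd.to_datetime(merged["Incident Date"], format="%B %d, %Y")
    return dates.dt.month.value_counts().sort_index()


# Dates copied from the sample rows shown earlier.
demo = pd.DataFrame({
    "Incident Date": [
        "January 16, 2015",
        "July 27, 2014",
        "June 15, 2015",
        "June 11, 2016",
        "May 2, 2016",
    ]
})

monthly = incidents_per_month(demo)
print(monthly)
```

Rows that describe the same incident under two categories would inflate these counts, so the duplicates described under Limitations should be handled before interpreting any monthly pattern.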