<a href="https://colab.research.google.com/github/pragyansharma24/Gun-Violence-in-United-States/blob/master/Gun_Violence_in_United_States.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Gun Violence in the United States

### Authors: Pragyan Sharma, Prerna Patil, Shriya Rao, Sonal Agrawal
---






Over the years, the number of gun violence incidents in the United States has risen steadily. In addition to killing hundreds of Americans, it has forced many to live in constant fear of the next shooting.

To gauge the magnitude of everyday gun violence, we analyzed the available data on:
<ul>
  <li><b>Crime statistics across the United States</b></li>
  <li><b>Crime statistics across the world</b></li>
</ul>

In [0]:
import pandas as pd
import numpy as np

## Crime statistics across United States 

### Introduction: 
According to an article in The New York Times, '*More people died from firearm injuries in the United States in 2017 than in any other year since at least 1968, according to new data from the Centers for Disease Control and Prevention*.' (Ref: https://www.nytimes.com/2018/12/18/us/gun-deaths.html)

In fact, 2017 saw the highest number of mass shootings, which killed or injured hundreds of people. To convey the seriousness of the situation, we plot each mass shooting incident on a map of the United States.

### Data Sources:
<ul>
  <li><b>Datasets:</b>
    <ul>
      <li><i>Mass Shootings: </i> (Ref: https://www.gunviolencearchive.org) </li>
      <li> <i>City Locations: </i> *to be added later* </li>
    </ul>
  </li>
  <li><b>Variables Used</b>:
    <ul>
      <li><i>Incident Date</i>: The date of the mass shooting incident</li>
      <li><i>State</i>: State where the mass shooting incident took place</li>
      <li><i>City</i>: City where the mass shooting incident took place</li>
      <li><i>Address</i>: Address where the mass shooting incident took place</li>
      <li><i>num_Killed</i>: Number of people killed in the mass shooting incident</li>
      <li><i>num_Injured</i>: Number of people injured in the mass shooting incident</li>
      <li><i>Latitude</i>: Latitude of the given location</li>
      <li><i>Longitude</i>: Longitude of the given location</li>
    </ul>
  </li>
</ul>

### Data Cleaning
The dataset needs some preparation before we explore and visualize it. The cleaning steps are:
<ul>
  <li>Column names are replaced with shorter names</li>
  <li>Rows containing NA values are removed</li>
  <li>Latitude and longitude are added for each location</li>
</ul>

In [0]:
# Merging data for mass shootings from 2013 to 2019
y2012_13 = pd.read_csv("2012&13.csv")
y2014 = pd.read_csv("2014.csv")
y2015 = pd.read_csv("2015.csv")
y2016 = pd.read_csv("2016.csv")
y2017 = pd.read_csv("2017.csv")
y2018 = pd.read_csv("2018.csv")
y2019 = pd.read_csv("2019.csv")

# ignore_index=True gives the combined frame one continuous row index
mass_shoot = pd.concat([y2012_13, y2014, y2015, y2016, y2017, y2018, y2019], ignore_index=True)
mass_shoot.head()

Unnamed: 0,Incident Date,State,City Or County,Address,# Killed,# Injured,Operations
0,"December 31, 2013",New York,Brooklyn,60 Glenmore Ave,0,6,
1,"December 28, 2013",Alabama,Montgomery,954 Highland Ave,3,5,
2,"December 26, 2013",Louisiana,Slidell,2144 First St,2,6,
3,"December 26, 2013",Louisiana,Lockport,313 Tenth St,3,3,
4,"December 25, 2013",New Jersey,Irvington,Nye Avenue and 21st Street,3,2,
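The concatenation step above can be sketched on toy data. This is a minimal illustration, not the notebook's actual files: the two in-memory CSV strings are hypothetical stand-ins for the yearly files, and their rows are made-up sample values. It shows why `ignore_index=True` can be worth passing to `pd.concat`: without it, each file's row labels restart at 0 and repeat in the combined frame.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two yearly CSV files (toy rows, not real data)
csv_a = '''Incident Date,State,# Killed,# Injured
"December 31, 2013",New York,0,6
"December 28, 2013",Alabama,3,5
'''
csv_b = '''Incident Date,State,# Killed,# Injured
"January 1, 2014",Virginia,2,2
'''

frames = [pd.read_csv(io.StringIO(text)) for text in (csv_a, csv_b)]

# Default behaviour keeps each frame's own labels, so 0 appears twice
repeated_labels = pd.concat(frames).index.tolist()

# ignore_index=True relabels the rows 0..n-1 in the combined frame
combined = pd.concat(frames, ignore_index=True)
unique_labels = combined.index.tolist()
```

Here `repeated_labels` contains a duplicate label while `unique_labels` does not, which matters for any later label-based lookup such as `.loc`.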


In [0]:
# Renaming columns for efficient data handling
mass_shoot.rename(columns={'Incident Date':'Date','City Or County':'City', '# Killed':'num_Killed', '# Injured':'num_Injured'}, inplace=True)
mass_shoot.head()

Unnamed: 0,Date,State,City,Address,num_Killed,num_Injured,Operations
0,"December 31, 2013",New York,Brooklyn,60 Glenmore Ave,0,6,
1,"December 28, 2013",Alabama,Montgomery,954 Highland Ave,3,5,
2,"December 26, 2013",Louisiana,Slidell,2144 First St,2,6,
3,"December 26, 2013",Louisiana,Lockport,313 Tenth St,3,3,
4,"December 25, 2013",New Jersey,Irvington,Nye Avenue and 21st Street,3,2,


In [0]:
# Extracting year from the incident date column
mass_shoot['Date'] = pd.to_datetime(mass_shoot['Date'])
mass_shoot['Year'] = mass_shoot['Date'].dt.year
mass_shoot.head()

Unnamed: 0,Date,State,City,Address,num_Killed,num_Injured,Operations,Year
0,2013-12-31,New York,Brooklyn,60 Glenmore Ave,0,6,,2013
1,2013-12-28,Alabama,Montgomery,954 Highland Ave,3,5,,2013
2,2013-12-26,Louisiana,Slidell,2144 First St,2,6,,2013
3,2013-12-26,Louisiana,Lockport,313 Tenth St,3,3,,2013
4,2013-12-25,New Jersey,Irvington,Nye Avenue and 21st Street,3,2,,2013
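As a hedged aside, the date strings shown above follow a "Month day, year" pattern, so the parse can be made explicit. The sketch below assumes that pattern holds for every row (an assumption, not something verified against the full dataset) and uses `errors="coerce"` so that any string that does not fit becomes `NaT` instead of raising.

```python
import pandas as pd

# Toy series: two dates in the format seen above plus one malformed entry
dates = pd.Series(["December 31, 2013", "December 28, 2013", "not a date"])

# An explicit format avoids per-row format guessing; coerce turns bad values into NaT
parsed = pd.to_datetime(dates, format="%B %d, %Y", errors="coerce")

# .dt.year is the pandas datetime accessor; it yields NaN where parsing failed
years = parsed.dt.year
n_failed = int(parsed.isna().sum())
```

Checking `n_failed` after parsing is a cheap way to confirm that no incident dates were silently lost.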


In [0]:
# Reading in data for latitudes and longitudes for cities in the United States
cities = pd.read_csv("uscitiesv1.5.csv")

# Selecting relevant columns for further analysis
cities_sub = cities[['city','state_id', 'state_name','lat', 'lng']]
cities_sub.head()

Unnamed: 0,city,state_id,state_name,lat,lng
0,Prairie Ridge,WA,Washington,47.1443,-122.1408
1,Edison,WA,Washington,48.5602,-122.4311
2,Packwood,WA,Washington,46.6085,-121.6702
3,Wautauga Beach,WA,Washington,47.5862,-122.5482
4,Harper,WA,Washington,47.5207,-122.5196


In [0]:
# Merging data for mass shootings and location data for cities
city_data = mass_shoot.merge(cities_sub,how='left', left_on=['City', 'State'], right_on=['city', 'state_name'])
city_data.head()

Unnamed: 0,Date,State,City,Address,num_Killed,num_Injured,Operations,Year,city,state_id,state_name,lat,lng
0,2013-12-31,New York,Brooklyn,60 Glenmore Ave,0,6,,2013,Brooklyn,NY,New York,40.6501,-73.9496
1,2013-12-28,Alabama,Montgomery,954 Highland Ave,3,5,,2013,Montgomery,AL,Alabama,32.347,-86.2663
2,2013-12-26,Louisiana,Slidell,2144 First St,2,6,,2013,Slidell,LA,Louisiana,30.2882,-89.7826
3,2013-12-26,Louisiana,Lockport,313 Tenth St,3,3,,2013,Lockport,LA,Louisiana,29.6418,-90.5376
4,2013-12-25,New Jersey,Irvington,Nye Avenue and 21st Street,3,2,,2013,,,,,
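The last row above (Irvington, New Jersey) found no match in the city table, so its coordinates are missing. One way to audit such rows is the `indicator=True` option of `DataFrame.merge`, sketched here on a two-row toy version of the data. The frame names `shootings`, `cities_toy`, and `unmatched` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Toy versions of the two tables, using values shown in the outputs above
shootings = pd.DataFrame({
    "City": ["Brooklyn", "Irvington"],
    "State": ["New York", "New Jersey"],
})
cities_toy = pd.DataFrame({
    "city": ["Brooklyn"],
    "state_name": ["New York"],
    "lat": [40.6501],
    "lng": [-73.9496],
})

# indicator=True adds a "_merge" column recording where each row was found
merged = shootings.merge(
    cities_toy,
    how="left",
    left_on=["City", "State"],
    right_on=["city", "state_name"],
    indicator=True,
)

# "left_only" rows are incidents with no matching city, hence no coordinates
unmatched = merged[merged["_merge"] == "left_only"]
```

Inspecting `unmatched` before dropping rows shows which locations are lost and whether a spelling or naming difference is the cause.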


In [0]:
# Dropping extra columns which we won't be using further
city_data.drop(columns=['Operations', 'city', 'state_name','state_id'], inplace=True)

In [0]:
# Checking number of rows in the combined dataframe
len(city_data)

2056

In [0]:
# Checking for NAs
city_data.isna().sum()

Date             0
State            0
City             0
Address          8
num_Killed       0
num_Injured      0
Year             0
lat            134
lng            134
dtype: int64

In [0]:
# Removing all the NA values
city_data = city_data.dropna()
city_data.head()

Unnamed: 0,Date,State,City,Address,num_Killed,num_Injured,Year,lat,lng
0,2013-12-31,New York,Brooklyn,60 Glenmore Ave,0,6,2013,40.6501,-73.9496
1,2013-12-28,Alabama,Montgomery,954 Highland Ave,3,5,2013,32.347,-86.2663
2,2013-12-26,Louisiana,Slidell,2144 First St,2,6,2013,30.2882,-89.7826
3,2013-12-26,Louisiana,Lockport,313 Tenth St,3,3,2013,29.6418,-90.5376
5,2013-12-25,New York,Medford,33A Cedarhurst Ave,1,3,2013,40.822,-72.9859
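Note that `dropna()` with no arguments removes a row if any column is missing, so rows that lack only an `Address` are dropped along with rows that lack coordinates. If only the coordinates are needed for mapping, the `subset` argument is one alternative. The toy frame below is a sketch; its missing `Address` value is invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame: one complete row, one without coordinates, one without an address
toy = pd.DataFrame({
    "City": ["Brooklyn", "Irvington", "Medford"],
    "Address": ["60 Glenmore Ave", "Nye Avenue and 21st Street", None],
    "lat": [40.6501, np.nan, 40.822],
    "lng": [-73.9496, np.nan, -72.9859],
})

# Drops any row with a missing value in any column
strict = toy.dropna()

# Drops only rows that lack coordinates; a missing address is tolerated
coords_only = toy.dropna(subset=["lat", "lng"])
```

Which variant is appropriate depends on whether the address is used downstream; the notebook as written uses the strict form.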


### Application
We will use the combined dataset to analyze the following:
<ul>
  <li><b><i>Mass shootings across the United States:</i></b> We will plot each incident on a map to visualize the distribution of mass shootings across the United States. This will help us identify the states where gun violence is comparatively higher.
    <ul>
      <li>X-variable: Longitude of the incident location</li>
      <li>Y-variable: Latitude of the incident location</li>
    </ul>
  </li>
  <li><b><i>Change in mass shooting incidents over the years:</i></b> We will plot the number of mass shooting incidents per year (2013 - 2019) across the United States. This will help reveal any trends in the data.
    <ul>
      <li>X-variable: Year of the mass shooting incident</li>
      <li>Y-variable: Number of incidents</li>
    </ul>
  </li>
</ul>
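
The yearly counts for the second plot can be derived with a group-by. The sketch below is one possible way to do it, run on a three-row toy frame rather than the real `city_data`; the output column names `incidents`, `killed`, and `injured` are illustrative choices.

```python
import pandas as pd

# Toy frame with the same column names as the cleaned dataset
toy = pd.DataFrame({
    "Year": [2013, 2013, 2014],
    "num_Killed": [0, 3, 2],
    "num_Injured": [6, 5, 2],
})

# One row per year: number of incidents plus total killed and injured
per_year = toy.groupby("Year").agg(
    incidents=("num_Killed", "size"),
    killed=("num_Killed", "sum"),
    injured=("num_Injured", "sum"),
)
```

The resulting frame is indexed by `Year`, so its `incidents` column can serve directly as the Y-variable of the trend plot.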

## Crime statistics across the world

### Data Cleaning
The dataset needs some preparation before we explore and visualize it:
- Column names are replaced with shorter names
- Rows containing NA values are removed


In [0]:
# Reading in data for world crimes
world_crime = pd.read_csv("world_crime.csv")
world_crime.head()

Unnamed: 0,Country/Territory,ISO code,Source,% of homicides by firearm,Number of homicides by firearm,"Homicide by firearm rate per 100,000 pop",Rank by rate of ownership,Average firearms per 100 people,Average total all civilian firearms
0,Albania,AL,CTS,65.9,56.0,1.76,70.0,8.6,270000.0
1,Algeria,DZ,CTS,4.8,20.0,0.06,78.0,7.6,1900000.0
2,Angola,AO,,,,,34.0,17.3,2800000.0
3,Anguilla,AI,WHO-MDB,24.0,1.0,7.14,,,
4,Argentina,AR,Ministry of Justice,52.0,1198.0,3.02,62.0,10.2,3950000.0


In [0]:
# Creating a new list of column names
cols = ['Country', 'Country_code', 'Source', '%_homi_gun', 'num_homi_gun', 'homi_gun_per100k', 'rank-rate_owner',
        'avg_gun_per100', 'avg_tot_all_guns']

In [0]:
# Replacing the existing column names with new names saved in the list 'cols'
world_crime.columns = cols
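
Assigning a list to `.columns` relies on the list being in exactly the same order as the file's columns. A hedged alternative is an explicit old-to-new mapping passed to `rename`, which keeps working if the column order changes. The sketch below uses a one-row toy frame with three of the original column names; `rename_map` is an illustrative name.

```python
import pandas as pd

# Toy frame with three of the original column names
raw = pd.DataFrame({
    "Country/Territory": ["Albania"],
    "ISO code": ["AL"],
    "% of homicides by firearm": [65.9],
})

# Explicit mapping from original names to the shorter names used above
rename_map = {
    "Country/Territory": "Country",
    "ISO code": "Country_code",
    "% of homicides by firearm": "%_homi_gun",
}

# rename matches columns by name, so their position does not matter
renamed = raw.rename(columns=rename_map)
```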

In [0]:
world_crime.head()

Unnamed: 0,Country,Country_code,Source,%_homi_gun,num_homi_gun,homi_gun_per100k,rank-rate_owner,avg_gun_per100,avg_tot_all_guns
0,Albania,AL,CTS,65.9,56.0,1.76,70.0,8.6,270000.0
1,Algeria,DZ,CTS,4.8,20.0,0.06,78.0,7.6,1900000.0
2,Angola,AO,,,,,34.0,17.3,2800000.0
3,Anguilla,AI,WHO-MDB,24.0,1.0,7.14,,,
4,Argentina,AR,Ministry of Justice,52.0,1198.0,3.02,62.0,10.2,3950000.0


In [0]:
len(world_crime)

185

In [0]:
# Checking columns with 'NA' values
world_crime.isna().sum()

Country              0
Country_code         1
Source              69
%_homi_gun          69
num_homi_gun        69
homi_gun_per100k    69
rank-rate_owner      9
avg_gun_per100       9
avg_tot_all_guns     9
dtype: int64

In [0]:
# Removing all the 'NA' values
world_crime = world_crime.dropna()

In [0]:
len(world_crime)

107

In [0]:
# Reading in data for growth rates for all 'Developed' countries
country_index = pd.read_csv("developed.csv")
country_index.head()

Unnamed: 0,cca2,name,area,pop2019,GrowthRate
0,US,United States,9372610,329093.11,1.007119
1,JP,Japan,377930,126854.745,0.997401
2,TR,Turkey,783562,82961.805,1.012756
3,DE,Germany,357114,82438.639,1.001764
4,GB,United Kingdom,242900,66959.016,1.005791


In [0]:
# Checking for NAs
country_index.isna().sum()

cca2          0
name          0
area          0
pop2019       0
GrowthRate    0
dtype: int64

In [0]:
# Merging world crimes and growth rates dataset
world_data = world_crime.merge(country_index, how="inner", left_on="Country", right_on="name")
world_data.head()

Unnamed: 0,Country,Country_code,Source,%_homi_gun,num_homi_gun,homi_gun_per100k,rank-rate_owner,avg_gun_per100,avg_tot_all_guns,cca2,name,area,pop2019,GrowthRate
0,Australia,AU,NSO,11.5,30.0,0.14,42.0,15.0,3050000.0,AU,Australia,7692024,25088.636,1.012772
1,Austria,AT,CTS,29.5,18.0,0.22,14.0,30.4,2500000.0,AT,Austria,83871,8766.201,1.001643
2,Belgium,BE,WHO-MDB,39.5,70.0,0.68,34.0,17.2,1800000.0,BE,Belgium,30528,11562.784,1.005589
3,Canada,CA,CTS,32.0,173.0,0.51,13.0,30.8,9950000.0,CA,Canada,9984670,37279.811,1.008823
4,Cyprus,CY,CTS,26.3,5.0,0.46,6.0,36.4,275000.0,CY,Cyprus,9251,1198.427,1.007856
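Joining on country names can silently drop rows when the two sources spell a name differently. Both tables also carry a two-letter code (`Country_code` and `cca2`), so joining on the codes is one alternative worth considering. The sketch below assumes the two code columns use the same coding scheme, which the matching rows above suggest but which is not verified for every country. The toy frames reuse values shown in the outputs.

```python
import pandas as pd

# Toy versions of the two tables, using values shown in the outputs above
crime = pd.DataFrame({
    "Country": ["Australia", "Austria"],
    "Country_code": ["AU", "AT"],
    "homi_gun_per100k": [0.14, 0.22],
})
index_toy = pd.DataFrame({
    "cca2": ["AU", "AT", "US"],
    "name": ["Australia", "Austria", "United States"],
    "pop2019": [25088.636, 8766.201, 329093.11],
})

# validate="one_to_one" raises an error if either key column has duplicates
merged = crime.merge(
    index_toy,
    how="inner",
    left_on="Country_code",
    right_on="cca2",
    validate="one_to_one",
)
```

The `validate` argument acts as a guard: a duplicated country code in either table stops the merge instead of quietly multiplying rows.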


### Application

Graph 1: Homicides by firearm per 1 million people

Graph 2: Population vs. ownership of firearms

Graph 3: Gun-related deaths per 100k people vs. guns per 100 people
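
The first graph is expressed per 1 million people, while the dataset reports the rate per 100,000. Assuming the graph is meant to reuse that column, the conversion is a multiplication by 10, sketched below on two rows taken from the merged output. The column name `homi_gun_per1m` is an illustrative choice.

```python
import pandas as pd

# Two rows from the merged output above
rates = pd.DataFrame({
    "Country": ["Australia", "Austria"],
    "homi_gun_per100k": [0.14, 0.22],
})

# 1,000,000 / 100,000 = 10, so a per-100k rate scales by 10 to per-million
rates["homi_gun_per1m"] = rates["homi_gun_per100k"] * 10
```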