# <font> Project 1: A Deep Analysis on Gun Violence in the U.S. </font>

# Team 8. Karan and Junming

## <font color='brightred'> Table of Contents </font>
1. [Motivation and Introduction](#1)
2. [Questions](#2)
3. [Glimpse of Data](#3)
    1. [Statistical Overview of the Data](#4)
4. [Preparing the Data](#5)
    1. [Check for Missing Data](#6)
    2. [Data Cleaning](#7)
5. [Time Series Analysis](#8)
6. [Geographical Analysis](#9)
    1. [Non-Calibrated for Population](#10)
    2. [Population Adjusted](#11)
    3. [Most Impactful Incidents](#12)
7. [People Involved in Gun Violence](#13)
    1. [Characteristics of Age](#14)
    2. [Characteristics of Gender](#15)
8. [Guns Involved in Gun Violence](#16)
    1. [The Most Used Guns](#17)
    2. [Guns that Caused the Most Harm](#18)
    3. [Distribution of Incidents among Different Types of Guns](#19)
9. [Conclusion](#20)
10. [Citations](#21)



In [1]:
# import sys
# !conda install --yes --prefix {sys.prefix} plotly

In [2]:
import pandas as pd
import numpy as np
import numbers
import plotly
from plotly.offline import init_notebook_mode, iplot
import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools
import folium 
from folium import plugins

init_notebook_mode(connected=True)

# <a id='1'> 1. Motivation and Introduction </a>

While some argue that gun violence in America has decreased, the number of premeditated mass shootings and the magnitude of each shooting have increased over the last 50 years. For example, the high-profile Columbine High School massacre in 1999 was a carefully planned attack that killed 13 people and injured 25 others. Last year, on October 1st, 2017, Stephen Paddock opened fire on a concert crowd in Las Vegas, causing about 58 deaths and over 500 injuries. The Las Vegas shooting is now regarded as the largest mass shooting in recent history. Given this upward trend in the magnitude of mass shootings, along with other statistics that illustrate the scale of everyday gun violence, the claim that gun violence is decreasing seems questionable. My partner and I are therefore interested in digging deeper into the issue of gun violence in America.

Researching gun violence can refine our existing domain knowledge of gun violence in America. It can also surface other useful information, such as which locations account for the highest share of gun violence and how much collateral damage it causes. We currently hypothesize that gun violence is more prevalent than it has ever been, and our goal is to examine that hypothesis closely. Through exploratory data analysis, we hope to gain a broader understanding of the dynamics of gun violence in America and to draw conclusions about its current state.

# <a id='2'> 2. Questions </a>

*1. What are some of the different trends associated with gun violence over time?*

*2. What geographical patterns of gun violence can we find in the U.S.?*

*3. What are the differences among different characteristics of gun violence?*

*4. What group has a higher probability of resorting to gun violence?*

# <a id='3'>3. Glimpse of Data</a>

In [3]:
gun_violence_df = pd.read_csv('gun_violence.csv') 
gun_violence_df.head(3)

Unnamed: 0,incident_id,date,state,city_or_county,address,n_killed,n_injured,incident_url,source_url,incident_url_fields_missing,...,participant_age,participant_age_group,participant_gender,participant_name,participant_relationship,participant_status,participant_type,sources,state_house_district,state_senate_district
0,461105,2013-01-01,Pennsylvania,Mckeesport,1506 Versailles Avenue and Coursin Street,0,4,http://www.gunviolencearchive.org/incident/461105,http://www.post-gazette.com/local/south/2013/0...,False,...,0::20,0::Adult 18+||1::Adult 18+||2::Adult 18+||3::A...,0::Male||1::Male||3::Male||4::Female,0::Julian Sims,,0::Arrested||1::Injured||2::Injured||3::Injure...,0::Victim||1::Victim||2::Victim||3::Victim||4:...,http://pittsburgh.cbslocal.com/2013/01/01/4-pe...,,
1,460726,2013-01-01,California,Hawthorne,13500 block of Cerise Avenue,1,3,http://www.gunviolencearchive.org/incident/460726,http://www.dailybulletin.com/article/zz/201301...,False,...,0::20,0::Adult 18+||1::Adult 18+||2::Adult 18+||3::A...,0::Male,0::Bernard Gillis,,0::Killed||1::Injured||2::Injured||3::Injured,0::Victim||1::Victim||2::Victim||3::Victim||4:...,http://losangeles.cbslocal.com/2013/01/01/man-...,62.0,35.0
2,478855,2013-01-01,Ohio,Lorain,1776 East 28th Street,1,3,http://www.gunviolencearchive.org/incident/478855,http://chronicle.northcoastnow.com/2013/02/14/...,False,...,0::25||1::31||2::33||3::34||4::33,0::Adult 18+||1::Adult 18+||2::Adult 18+||3::A...,0::Male||1::Male||2::Male||3::Male||4::Male,0::Damien Bell||1::Desmen Noble||2::Herman Sea...,,"0::Injured, Unharmed, Arrested||1::Unharmed, A...",0::Subject-Suspect||1::Subject-Suspect||2::Vic...,http://www.morningjournal.com/general-news/201...,56.0,13.0


## <a id='4'>3.1 Statistical Overview of Data</a>

In [4]:
gun_violence_df.describe() ##describes only numeric data

Unnamed: 0,incident_id,n_killed,n_injured,congressional_district,latitude,longitude,n_guns_involved,state_house_district,state_senate_district
count,239677.0,239677.0,239677.0,227733.0,231754.0,231754.0,140226.0,200905.0,207342.0
mean,559334.3,0.25229,0.494007,8.001265,37.546598,-89.338348,1.372442,55.447132,20.47711
std,293128.7,0.521779,0.729952,8.480835,5.130763,14.359546,4.678202,42.048117,14.20456
min,92114.0,0.0,0.0,0.0,19.1114,-171.429,1.0,1.0,1.0
25%,308545.0,0.0,0.0,2.0,33.9034,-94.158725,1.0,21.0,9.0
50%,543587.0,0.0,0.0,5.0,38.5706,-86.2496,1.0,47.0,19.0
75%,817228.0,0.0,1.0,10.0,41.437375,-80.048625,1.0,84.0,30.0
max,1083472.0,50.0,53.0,53.0,71.3368,97.4331,400.0,901.0,94.0


The table above summarizes only the numeric columns of the gun violence data and does not report missing values directly, so we wrote a more detailed helper below that describes all of the attributes.

# <a id='5'>4. Preparing the Data</a>

## <a id='6'>4.1 Check for missing data</a>

In [5]:
# Function to describe more information for all the attributes
def brief(data):
    
    df = data.copy()
    
    print("This dataset has {} Rows {} Attributes".format(df.shape[0],df.shape[1]), end='')
    print('\n')
    
    real_valued = {}
    symbolics = {}
    
    
    for i,col in enumerate(df.columns, 1):
        Missing = len(df[col]) - df[col].count()
        
        counter = 0
        for val in df[col].dropna():
            if isinstance(val, numbers.Number):
                counter += 1
        
        if counter != len(df[col].dropna()):
            arity = len(df[col].dropna().unique())
            symbolics[i] = [i, col, Missing, arity]  
        else:
            Mean, Median, Sdev, Min, Max = df[col].mean(), df[col].median(), df[col].std(), df[col].min(), df[col].max()
            real_valued[i] =  [i, col, Missing, Mean, Median, Sdev, Min, Max]
            
    
    #Create array containing list of real valued
    real_valued_array = [real_valued[keys] for keys in real_valued.keys()]
    real_valued_transformed = np.array(real_valued_array).T
    
    symbolic_array = [symbolics[keys] for keys in symbolics.keys()]
    symbolic_transformed = np.array(symbolic_array).T
    
    # return symbolic_transformed
    real_cols = ['Attribute_ID', 'Attribute_Name', 'Missing', 'Mean', 'Median', 'Sdev', 'Min', 'Max']
    sym_cols = ['Attribute_ID', 'Attribute_Name', 'Missing','arity']
    
    
   
    index = range(1, len(real_valued.keys())+1)
    real_val_df = pd.DataFrame(data={unit[0]:unit[1] for unit in zip(real_cols, real_valued_transformed)}, index = index, columns=real_cols)
    

    index_sym = range(1, len(symbolics.keys())+1)
    sym_val_df = pd.DataFrame(data={unit[0]:unit[1] for unit in zip(sym_cols, symbolic_transformed)}, index = index_sym, columns = sym_cols)
    
    text = ("real valued attributes" + "\n" + "---------------------" 
            + "\n" + str(real_val_df) + "\n"  + "non-real valued attributes"  
            + "\n" + "-------------------" + "\n" + str(sym_val_df))
        
    return text


In [6]:
%time
print(brief(gun_violence_df))

CPU times: user 3 µs, sys: 1e+03 ns, total: 4 µs
Wall time: 8.82 µs
This dataset has 239677 Rows 29 Attributes

real valued attributes
---------------------
   Attribute_ID               Attribute_Name Missing                 Mean  \
1             1                  incident_id       0    559334.3464037017   
2             6                     n_killed       0  0.25228953967214207   
3             7                    n_injured       0   0.4940065171042695   
4            10  incident_url_fields_missing       0                  0.0   
5            11       congressional_district   11944    8.001264638853394   
6            15                     latitude    7923   37.546598223116206   
7            17                    longitude    7923   -89.33834822915509   
8            18              n_guns_involved   99451   1.3724416299402393   
9            28         state_house_district   38772    55.44713172892661   
10           29        state_senate_district   32335   20.477110281563792

From the analysis above, we can see that some attributes, such as participant_name and participant_relationship, are missing in more than half of the records in the dataset.
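
A more compact way to obtain a similar missing-value and arity summary is sketched below using built-in pandas methods. The toy DataFrame and the `missing_summary` helper name are illustrative assumptions and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Return per-column missing counts, missing fractions, and distinct-value counts."""
    return pd.DataFrame({
        'Missing': df.isnull().sum(),
        'Missing_Frac': df.isnull().mean(),
        'Arity': df.nunique(),  # number of distinct non-null values
    })

# Toy data standing in for gun_violence_df (hypothetical values)
toy = pd.DataFrame({
    'n_killed': [0, 1, 1, 0],
    'participant_name': ['0::A', np.nan, np.nan, np.nan],
})
summary = missing_summary(toy)
print(summary)
```

Because the result is itself a DataFrame, it can be sorted by `Missing_Frac` to list the sparsest attributes first.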

In [7]:
gun_violence_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 239677 entries, 0 to 239676
Data columns (total 29 columns):
incident_id                    239677 non-null int64
date                           239677 non-null object
state                          239677 non-null object
city_or_county                 239677 non-null object
address                        223180 non-null object
n_killed                       239677 non-null int64
n_injured                      239677 non-null int64
incident_url                   239677 non-null object
source_url                     239209 non-null object
incident_url_fields_missing    239677 non-null bool
congressional_district         227733 non-null float64
gun_stolen                     140179 non-null object
gun_type                       140226 non-null object
incident_characteristics       239351 non-null object
latitude                       231754 non-null float64
location_description           42089 non-null object
longitude                    

We complement the earlier analysis with the additional information above, which makes it clear that some of the data needs to be cleaned.

## <a id='7'>4.2 Data Cleaning</a>

In [8]:
# Add an important incident missing from the dataset (the 2017 Las Vegas shooting), as noted in the dataset description on Kaggle
missing =  ['sban_1', '2017-10-01', 'Nevada', 'Las Vegas', 'Mandalay Bay 3950 Blvd S', 59, 489, 'https://en.wikipedia.org/wiki/2017_Las_Vegas_shooting', 'https://en.wikipedia.org/wiki/2017_Las_Vegas_shooting', '-', '-', '-', '-', '-', '36.095', 'Hotel', 
            '-115.171667', 47, 'Route 91 Harvest Festiva; concert, open fire from 32nd floor. 47 guns seized; TOTAL:59 kill, 489 inj, number shot TBD,girlfriend Marilou Danley POI', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-']
gun_violence_df.loc[len(gun_violence_df)] = missing

print(gun_violence_df.shape)
# Drop every column in which at least half of the values are missing
drop_columns = gun_violence_df.columns[gun_violence_df.apply(lambda col: col.isnull().sum() >= (0.5 * len(gun_violence_df)))]
gun_violence_filtered = gun_violence_df.drop(drop_columns, axis=1)
print(gun_violence_filtered.shape)
print('Dropped Columns:', list(drop_columns))

(239678, 29)
(239678, 26)
Dropped Columns: ['location_description', 'participant_name', 'participant_relationship']
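
The column filter above can also be expressed through the fraction of missing values per column. The following is a minimal sketch on a toy DataFrame; the data, the `threshold` variable, and the `filtered` name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    'state': ['Ohio', 'Texas', 'Nevada', 'Iowa'],
    'location_description': [np.nan, np.nan, 'Hotel', np.nan],  # 75% missing
    'address': ['a', np.nan, 'c', 'd'],                          # 25% missing
})

threshold = 0.5
missing_frac = toy.isnull().mean()  # fraction of missing values per column
drop_columns = missing_frac[missing_frac >= threshold].index
filtered = toy.drop(columns=drop_columns)
print(list(drop_columns))
```

Note that placeholder strings such as `'-'` count as present values in this kind of check, since only true nulls are detected by `isnull()`.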


# <a id='8'>5. Time Series Analysis</a>

In this section, we will observe some of the different time trends of gun violence over the following years: 2013 to 2018.

In [9]:
gun_violence_filtered['date'] = pd.to_datetime(gun_violence_filtered['date'])
# Derive calendar features; 'day' is the day of the week (Monday=0, Sunday=6)
gun_violence_filtered = gun_violence_filtered.assign(
    year=gun_violence_filtered['date'].dt.year,
    month=gun_violence_filtered['date'].dt.month,
    day=gun_violence_filtered['date'].dt.weekday,
)

y_yrs = gun_violence_filtered.groupby('year')['incident_id'].count().values
x_yrs = gun_violence_filtered.groupby('year')['incident_id'].count().index.values

y_months = gun_violence_filtered.\
            groupby(by=['year','month']).\
            agg('count').\
            groupby('month')['incident_id'].\
            mean().\
            values

x_months = ['Jan','Feb','Mar','Apr','May','June','July','Aug','Sep','Oct','Nov','Dec']

y_days = gun_violence_filtered.\
            groupby(['year','day']).\
            agg('count').\
            groupby('day')['incident_id'].\
            mean().\
            values

x_days = ['Mon','Tues','Wed','Thurs','Fri','Sat','Sun']


trace1 = go.Bar(
    x=x_yrs,
    y=y_yrs
)
trace2 = go.Bar(
    x=x_months,
    y=y_months,
    xaxis='x2',
    yaxis='y2'
)
trace3 = go.Bar(
    x=x_days,
    y=y_days,
    xaxis='x3',
    yaxis='y3'
)

data = [trace1, trace2, trace3]
fig = plotly.tools.make_subplots(rows=3, cols=1, specs = [[{}], [{}],[{}]],vertical_spacing = 0.25, subplot_titles=('Number of Incidents per Year', 
                                                                 'Average Number of Incidents per Month over Years',
                                                                 'Average Number of Incidents per Day over Years'))

fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 3, 1)

fig['layout']['xaxis1'].update(title='Years')
fig['layout']['xaxis2'].update(title='Months')
fig['layout']['xaxis3'].update(title='Days')


fig['layout']['yaxis1'].update(title='Count')
fig['layout']['yaxis2'].update(title='Avg. Frequency')
fig['layout']['yaxis3'].update(title='Avg. Frequency')


fig['layout'].update(showlegend=False, height=800, width=800, title='Incidents Over Time')
iplot(fig)  # offline rendering, consistent with init_notebook_mode and the other cells

This is the format of your plot grid:
[ (1,1) x1,y1 ]
[ (2,1) x2,y2 ]
[ (3,1) x3,y3 ]



Incidents per Year:
1. There is an upward trend in gun violence incidents from 2014 to 2017, from about 51,000 in 2014 to about 61,000 in 2017.
2. Since the data ends in March 2018 (only 3 months into the year), there isn't enough data to determine the total number of incidents in 2018. Even so, about 12,000 incidents had already been reported in those 3 months.

Average Number of Incidents per Month over Years:
1. Most incidents occur in the summer, with July and August having the highest counts. Each of these months averages approximately 4,000 gun violence incidents.

Average Number of Incidents per Day over Years:
1. The plot above shows that most incidents occur on the weekend. On Sundays, the average yearly count of incidents is about 6,000.
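
The monthly averages discussed above come from a two-step aggregation: count incidents per (year, month), then average each month across years. A minimal sketch on toy dates follows; the dates and variable names are assumptions for illustration only.

```python
import pandas as pd

# Toy incident dates (hypothetical)
toy = pd.DataFrame({'date': pd.to_datetime([
    '2016-01-05', '2016-01-20', '2016-07-04',
    '2017-01-10', '2017-07-01', '2017-07-15', '2017-07-30',
])})
toy['year'] = toy['date'].dt.year
toy['month'] = toy['date'].dt.month

# Step 1: incidents per (year, month); step 2: mean of each month across years
per_year_month = toy.groupby(['year', 'month']).size()
avg_per_month = per_year_month.groupby(level='month').mean()
print(avg_per_month)
```

A year with partial coverage contributes only to the months it actually contains, so months covered by fewer years are averaged over fewer values.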

In [10]:
# Daily totals of people killed and injured across the whole dataset
daily = gun_violence_filtered.groupby('date')

n_killed = daily['n_killed'].sum().values

n_injured = daily['n_injured'].sum().values

dates = daily.size().index

trace1 = go.Scatter(
    x = dates,
    y = n_killed,
    name = 'Number Killed',
    line = dict(
        dash = 'dot'
    )
)

trace2 = go.Scatter(
    x = dates,
    y = n_injured,
    name = 'Number Injured',
    line = dict(
        dash = 'dot'
    )
)

data = [trace1, trace2]

layout = dict(height=400,
              width=1000,
              title = 'Number of Total Incidents',
              xaxis = dict(title = 'Time'),
              yaxis = dict(title = 'Count'),
              )

fig = dict(data = data, layout=layout)
iplot(fig)

# Same daily totals, restricted to incidents in 2017
daily_2017 = gun_violence_filtered[gun_violence_filtered['year'] == 2017].groupby('date')

n_killed_2017 = daily_2017['n_killed'].sum().values

n_injured_2017 = daily_2017['n_injured'].sum().values

dates_2017 = daily_2017.size().index



trace1 = go.Scatter(
    x = dates_2017,
    y = n_killed_2017,
    name = 'Number Killed',
    line = dict(
        dash = 'dot'
    )
)

trace2 = go.Scatter(
    x = dates_2017,
    y = n_injured_2017,
    name = 'Number Injured',
    line = dict(
        dash = 'dot'
    )
)

data = [trace1, trace2]

layout = dict(height=400,
              width=1000,
              title = 'Number of Incidents in 2017',
              xaxis = dict(title = 'Time'),
              yaxis = dict(title = 'Count'),
              )

fig = dict(data = data, layout=layout)
iplot(fig)

Above are two time series plots, one covering all the years and the other covering only 2017. Both show the daily number of people killed and injured. In both plots, the number of people injured is generally higher than the number of people killed. The 2017 plot also shows a striking anomaly: the Las Vegas massacre, for which the dataset records 489 injuries.

# <a id='9'>6. Geographical Analysis</a>

## <a id='10'>6.1 Non-Calibrated for Population</a>

In [11]:
state = gun_violence_filtered.groupby('state')
state_incidents = state.count().sort_values(by='incident_id',ascending=False)['incident_id']
state_killed = state['n_killed'].sum()
state_injured = state['n_injured'].sum()

city = gun_violence_filtered.groupby('city_or_county')
city_incidents= city.count().sort_values(by='incident_id',ascending=False)['incident_id'].head(20)



trace = go.Bar(
    x = state_incidents.index,
    y = state_incidents,
)

layout = dict(height=400,
              width=1000,
              title =  'Top States with Highest Number of Gun Violence Incidents',
              yaxis = dict(title = 'Number of Incidents'),
              )

data = [trace]

fig = dict(data = data, layout=layout)
iplot(fig)

trace = go.Bar(
    x = city_incidents.index[:20],
    y = city_incidents,
)
    
layout = dict(height=400,
              width=1000,
              title = 'Top Twenty Cities with Highest Number of Gun Violence Incidents',
              yaxis = dict(title = 'Number of Incidents'),
             )
    
data = [trace]

fig = dict(data = data, layout=layout)
iplot(fig)

Without adjusting for differences in state population, it isn't surprising to see California among the states with the highest number of gun violence incidents.

In addition, the state with the most incidents is Illinois, and the city with the most incidents is Chicago.

But since these counts are not adjusted for population size, are these rankings meaningful? We will see shortly.

In [12]:
trace1 = go.Bar(
    x = state_killed.index,
    y = state_killed,
    name = 'Number Killed'
)

trace2 = go.Bar(
    x = state_injured.index,
    y = state_injured,
    name = 'Number Injured'
)


data = [trace1, trace2]

layout = dict(height=400,
              width=1000,
              title = 'Number of People Injured/Killed Across States',
              yaxis = dict(title = 'Frequency'),
              )

fig = dict(data = data, layout=layout)
iplot(fig)

Again, this chart shows that Illinois has not only the highest number of incidents but also the highest combined number of injuries and deaths. This might mean that gun violence incidents in Illinois are not only frequent but also more severe. Is this valid?

## <a id='11'>6.2 Population Adjusted</a>

In [13]:
population_adjusted_data = pd.read_html('https://www.enchantedlearning.com/usa/states/population.shtml')[1] #population data
population_adjusted_data['State'] = population_adjusted_data['State'].apply(lambda val: val[3:].strip())
pop_adj_dic = {k:v for k,v in population_adjusted_data.to_dict('split')['data']}

state_incidents = pd.DataFrame(state_incidents)
state_incidents['population'] = state_incidents.index.map(lambda states : pop_adj_dic[states])
state_incidents['adj_incidents'] = (state_incidents['incident_id']/state_incidents['population']) * 100000

state_incidents = state_incidents.sort_values(by='adj_incidents',ascending=False)['adj_incidents']

trace = go.Bar(
    x = state_incidents.index,
    y = state_incidents,
)

layout = dict(height=400,
              width=1000,
              title =  'Top States with Highest Number of Gun Violence Incidents Adjusted For Population',
              yaxis = dict(title = 'Incidents per 100,000 People'),
              )

data = [trace]

fig = dict(data = data, layout=layout)
iplot(fig)

After adjusting for population, it is surprising to see that the District of Columbia has the highest incident rate. One might expect otherwise, given that it is the nation's capital and has some of the tightest security and strictest gun laws in the country.

Furthermore, Illinois now ranks sixth, behind South Carolina and Louisiana.

Finally, notice that California has shifted dramatically toward the right-hand side of the plot. After adjusting for population size, there are only about 41 incidents per 100,000 people, which suggests that California has a comparatively low incident rate.
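
The population adjustment is simple arithmetic: divide each state's incident count by its population and multiply by 100,000. The sketch below uses made-up counts and populations for two hypothetical states to show how a large state can rank lower once the rate is used.

```python
import pandas as pd

# Hypothetical incident counts and populations, for illustration only
incidents = pd.Series({'State A': 16000, 'State B': 3000})
population = pd.Series({'State A': 40_000_000, 'State B': 700_000})

# Incidents per 100,000 people
rate_per_100k = incidents / population * 100_000
print(rate_per_100k.sort_values(ascending=False))
```

Here State A has more incidents in absolute terms but a much lower rate. Since the counts in this notebook span the whole dataset period, the resulting figures are cumulative rates rather than annual ones.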

In [14]:
state_killed = pd.DataFrame(state_killed)
state_killed['population'] = state_killed.index.map(lambda states : pop_adj_dic[states])
state_killed['adj_killings'] = (state_killed['n_killed']/state_killed['population']) * 100000

state_injured = pd.DataFrame(state_injured)
state_injured['population'] = state_injured.index.map(lambda states : pop_adj_dic[states])
state_injured['adj_injuries'] = (state_injured['n_injured']/state_injured['population']) * 100000




trace1 = go.Bar(
    x = state_killed.index,
    y = state_killed['adj_killings'],
    name = 'Number Killed'
)

trace2 = go.Bar(
    x = state_injured.index,
    y = state_injured['adj_injuries'],
    name = 'Number Injured'
)


data = [trace1, trace2]

layout = dict(height=400,
              width=1000,
              title = 'Number of People Injured/Killed Across States Adjusted for Population',
              yaxis = dict(title = 'Count per 100,000 People'),
              )

fig = dict(data = data, layout=layout)
iplot(fig)

Once again, this plot suggests that the District of Columbia not only has more incidents per capita, but its incidents also cause many injuries and deaths relative to its population. Additionally, Illinois appears to come second in terms of population-adjusted injuries.

## <a id='12'>6.3 Most Impactful Incidents</a>

In [15]:
gun_violence_filtered['total_damage'] = gun_violence_filtered['n_injured'] + gun_violence_filtered['n_killed']

gun_violence_filtered.\
        loc[:,['date','year','state', 'city_or_county', 'address', 'total_damage']].\
        sort_values(by='total_damage', ascending = False).\
        head(10)

Unnamed: 0,date,year,state,city_or_county,address,total_damage
239677,2017-10-01,2017,Nevada,Las Vegas,Mandalay Bay 3950 Blvd S,548
130448,2016-06-12,2016,Florida,Orlando,1912 S Orange Avenue,103
217151,2017-11-05,2017,Texas,Sutherland Springs,216 4th St,47
101531,2015-12-02,2015,California,San Bernardino,1365 South Waterman Avenue,35
232745,2018-02-14,2018,Florida,Pompano Beach (Parkland),5901 Pine Island Rd,34
70511,2015-05-17,2015,Texas,Waco,4671 S Jack Kultgen Fwy,27
195845,2017-07-01,2017,Arkansas,Little Rock,220 W 6th St,25
137328,2016-07-25,2016,Florida,Fort Myers,3580 Evans Ave,21
11566,2014-04-02,2014,Texas,Fort Hood,Motor Pool Road and Tank Destroyer Boulevard,20
92624,2015-10-01,2015,Oregon,Roseburg,1140 Umpqua College Rd,19


Above are the ten most impactful incidents, measured by total damage (injuries + deaths), across the time frame of the dataset.

In [16]:
# Keep incidents with at least 10 people killed or injured and with known coordinates
df = gun_violence_filtered[gun_violence_filtered['total_damage'] >= 10][['latitude', 'longitude', 'total_damage', 'n_killed']].dropna()
maps = folium.Map([39.50, -98.35],  zoom_start=4, tiles='Stamen Toner')
for idx, row in df.iterrows():
    # Scale the marker radius with the incident's total damage
    total = row['total_damage'] * 0.30
    folium.CircleMarker([float(row['latitude']), float(row['longitude'])], radius=float(total), color='#ef4f61', fill=True).add_to(maps)
maps

Here we plot the most impactful incidents (total damage of at least 10) on a map of the U.S. The size of each circle represents the magnitude of the incident.

# <a id='13'>7. People involved in Gun Violence </a>

Another pattern we want to examine is whether certain groups (e.g., by gender or age) have a higher probability of committing gun violence or of being a victim. For example, to measure gun violence by gender, we would count the male and female participants and plot the counts in a bar chart. More advanced visualizations could be created by joining other datasets, such as public population data.

## <a id='14'>7.1 Characteristics of Age</a>

To analyze this feature, we use the columns participant_age, participant_gender, and participant_type. As we can see below, the original columns are not easy to interpret, so we need some functions to create new, clean columns.

In [17]:
gun_violence_filtered[['participant_age','participant_type','participant_gender']].head(4)

Unnamed: 0,participant_age,participant_type,participant_gender
0,0::20,0::Victim||1::Victim||2::Victim||3::Victim||4:...,0::Male||1::Male||3::Male||4::Female
1,0::20,0::Victim||1::Victim||2::Victim||3::Victim||4:...,0::Male
2,0::25||1::31||2::33||3::34||4::33,0::Subject-Suspect||1::Subject-Suspect||2::Vic...,0::Male||1::Male||2::Male||3::Male||4::Male
3,0::29||1::33||2::56||3::33,0::Victim||1::Victim||2::Victim||3::Subject-Su...,0::Female||1::Male||2::Male||3::Male


In the code below, we first create new columns in which each cell is a dictionary: the keys are the indices of the people involved and the values are the corresponding information.

In [18]:
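# Convert a string such as '0::Male||1::Female' into a dictionary {'0': 'Male', '1': 'Female'}
def StringToDic(S1):
    dic1 = {}
    for item in str(S1).split('||'):
        parts = item.split('::')
        # Skip malformed entries with no '::' separator (e.g. 'nan' from missing values)
        if len(parts) < 2:
            continue
        dic1[parts[0]] = parts[1]

    return dic1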
        
    
# Apply the function above to each column, creating new column
gun_violence_filtered['participant_age_dic'] \
= gun_violence_filtered['participant_age'].apply(lambda x: StringToDic(x))

gun_violence_filtered['participant_type_dic'] \
= gun_violence_filtered['participant_type'].apply(lambda x: StringToDic(x)) 

gun_violence_filtered['participant_gender_dic'] \
= gun_violence_filtered['participant_gender'].apply(lambda x: StringToDic(x)) 


# Create another two new column, with new dictionary mapping type and age, type and gender
mappingCol1='participant_type_dic'
def MapThroughRow(df,mappingCol1,mappingCol2):
    newDic = {'Victim':[],'Suspect':[]}
    for rowName,row in df.iterrows():
        for keys,values in row[mappingCol1].items():
            if (keys in row[mappingCol2]) and (values =='Victim'):
                newDic['Victim'].append(row[mappingCol2][keys])
            elif (keys in row[mappingCol2]) and ('Suspect' in values):
                newDic['Suspect'].append(row[mappingCol2][keys])
                
    return newDic
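
To make the pairing step concrete, here is a minimal worked example on a single hypothetical record. It is a sketch that mirrors the logic above under the assumption that fields follow the `index::value||index::value` encoding; the `parse_field` helper and the sample strings are illustrative, not taken from the dataset.

```python
def parse_field(s):
    """Parse 'idx::value||idx::value' into a dict; entries without '::' are skipped."""
    out = {}
    for item in str(s).split('||'):
        parts = item.split('::')
        if len(parts) >= 2:
            out[parts[0]] = parts[1]
    return out

# One hypothetical record in the dataset's encoding
types = parse_field('0::Victim||1::Subject-Suspect||2::Victim')
ages = parse_field('0::25||1::31')  # the age of participant 2 is not recorded

paired = {'Victim': [], 'Suspect': []}
for idx, role in types.items():
    if idx not in ages:
        continue  # no age recorded for this participant
    if role == 'Victim':
        paired['Victim'].append(int(ages[idx]))
    elif 'Suspect' in role:
        paired['Suspect'].append(int(ages[idx]))

print(paired)
```

Participants whose index has no matching entry in the second field are dropped, which is why the age and gender totals can differ.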

In [19]:
%time
mappingCol2 = 'participant_age_dic'
mappingCol3 = 'participant_gender_dic'
df = gun_violence_filtered
MapTypeAge = MapThroughRow(df,mappingCol1,mappingCol2)
for key,values in MapTypeAge.items():
    MapTypeAge[key] = [int(i) for i in values]
    
MapTypeGender = MapThroughRow(df,mappingCol1,mappingCol3)

print(len(MapTypeAge['Victim']))
print(len(MapTypeAge['Suspect']))
print(len(MapTypeGender['Victim']))
print(len(MapTypeGender['Suspect']))

CPU times: user 4 µs, sys: 1 µs, total: 5 µs
Wall time: 10 µs
107429
110949
167025
179454


As the counts above show, age is recorded for fewer participants than gender, so some age and gender information is missing for both victims and suspects. Next, we visualize the age distributions of victims and suspects.

In [20]:
def countDic(L):
    dic = {}
    for i in L:
        if i not in dic:
            dic[i] = 1
        else:
            dic[i] += 1
    return dic
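
The same tally can be produced with `collections.Counter` from the Python standard library. This is an alternative sketch, not the notebook's own helper, and the sample ages are made up.

```python
from collections import Counter

ages = [20, 25, 20, 31, 25, 20]  # hypothetical ages
counts = Counter(ages)           # maps each age to its number of occurrences

print(dict(counts))
print(counts.most_common(1))     # the most frequent age and its count
```

`Counter` is a `dict` subclass, so `counts.keys()` and `counts.values()` can be passed to the plotting code in the same way as the output of `countDic`.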

In [21]:
VicageList = list(countDic(MapTypeAge['Victim']).keys())
VicageCount = list(countDic(MapTypeAge['Victim']).values())
SusageList = list(countDic(MapTypeAge['Suspect']).keys())
SusageCount = list(countDic(MapTypeAge['Suspect']).values())

In [22]:
# For Victim
trace1 = go.Bar(
    x=VicageList,
    y=VicageCount,
    name='Age distribution of Victim',
    marker=dict(
        color='rgb(55, 83, 109)'
    )
)


data = [trace1]
layout = go.Layout(
    title='Age Distribution of Victims',
    xaxis=dict(
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)',
        ),
        range=[0,100]
    ),
    yaxis=dict(
        title='Count',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [23]:
# For Suspects
trace1 = go.Bar(
    x=SusageList,
    y=SusageCount,
    name='Age distribution of Suspects',
    marker=dict(
        color='maroon'
    )
)


data = [trace1]
layout = go.Layout(
    title='Age Distribution of Suspects',
    xaxis=dict(
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)',
        ),
        range=[0,100]
    ),
    yaxis=dict(
        title='Count',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

## <a id='15'>7.2 Characteristics of Gender</a>

In [24]:
VicGenderList = list(countDic(MapTypeGender['Victim']).keys())
VicGenderCount = list(countDic(MapTypeGender['Victim']).values())
SusGenderList = list(countDic(MapTypeGender['Suspect']).keys())
SusGenderCount = list(countDic(MapTypeGender['Suspect']).values())

In [25]:
# One record has a malformed gender value ('Male, female'); it is negligible and is left out of the pie chart below
print((VicGenderList,VicGenderCount))
print(sum(VicGenderCount))

(['Male', 'Female', 'Male, female'], [136394, 30630, 1])
167025


In [26]:
print((SusGenderList,SusGenderCount))
print(sum(SusGenderCount))

(['Female', 'Male'], [11746, 167708])
179454


In [27]:
import plotly.plotly as py
import plotly.graph_objs as go

fig = {
  "data": [
    {
      "values": [136394, 30630],
      "labels": [
        "Male",
        "Female"
      ],
      "domain": {"x": [0, .48]},
      "name": "Victims",
      "hoverinfo":"label+percent+name",
      "hole": .4,
      "type": "pie"
    },
    {
      "values": [167708,11746],
      "labels": [
         "Male",
        "Female"
      ],
      "text":["Suspects"],
      "textposition":"inside",
      "domain": {"x": [.52, 1]},
      "name": "Suspects",
      "hoverinfo":"label+percent+name",
      "hole": .4,
      "type": "pie"
    }],
  "layout": {
        "title":"Proportion of Gender for Victims and Suspects",
        "annotations": [
            {
                "font": {
                    "size": 20
                },
                "showarrow": False,
                "text": "Victims",
                "x": 0.20,
                "y": 0.5
            },
            {
                "font": {
                    "size": 20
                },
                "showarrow": False,
                "text": "Suspects",
                "x": 0.8,
                "y": 0.5
            }
        ]
    }
}
iplot(fig, filename='donut')

# <a id='16'>8. Guns involved in Gun Violence </a>

After examining the data by location, time and people, we also look for insights in the characteristics of the guns themselves. This can be accomplished by exploring the gun type column. For instance, a basic bar graph of the count of each gun type, sorted in descending order, shows which guns suspects used most often. We can also measure the severity associated with each gun type, defined as the total number of people killed and injured, to see which type is the most dangerous.


## <a id='17'>8.1 The Most Used Guns</a>

In [28]:
gun_violence_filtered[['gun_type']].head(7)

Unnamed: 0,gun_type
0,
1,
2,0::Unknown||1::Unknown
3,
4,0::Handgun||1::Handgun
5,
6,0::22 LR||1::223 Rem [AR-15]


As we can see above, the gun type column cannot be used directly either. We first parse each entry into a dictionary, then build a dictionary that maps each gun type to the number of times it appears.
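
As a minimal sketch of that conversion, assuming the `index::value||index::value` layout shown in the preview above, the snippet below parses one field into a dictionary and tallies gun types with `collections.Counter`. The helper name `parse_indexed_field` and the sample rows are hypothetical; this is not the `StringToDic` function defined earlier in this notebook.

```python
from collections import Counter

def parse_indexed_field(raw):
    """Parse 'idx::value||idx::value' into {idx: value} (hypothetical helper)."""
    if not isinstance(raw, str) or not raw:
        return {}  # missing values (NaN, None, '') become an empty dict
    parsed = {}
    for part in raw.split('||'):
        key, sep, value = part.partition('::')
        if sep:  # skip fragments that lack the '::' separator
            parsed[key] = value
    return parsed

# Sample rows modeled on the preview above (None stands in for an empty cell)
sample_rows = [None, '0::Unknown||1::Unknown', '0::Handgun||1::Handgun',
               '0::22 LR||1::223 Rem [AR-15]']

gun_counts = Counter()
for raw in sample_rows:
    gun_counts.update(parse_indexed_field(raw).values())

gun_counts.pop('Unknown', None)  # drop the uninformative category
print(gun_counts.most_common())  # gun types sorted by descending count
```

`Counter.most_common()` returns the pairs already sorted by count, which may save the separate sorting step before plotting.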

In [29]:
# Apply the parsing function defined earlier to each row, creating a new dictionary column
gun_violence_filtered['gun_type_dic'] \
= gun_violence_filtered['gun_type'].apply(lambda x: StringToDic(x))

In [30]:
def CountDfValue(df,col='gun_type_dic'):
    newDic = {}
    for index,row in df.iterrows():
        for key,value in row[col].items():
            if value not in newDic:
                newDic[value] = 1
            else:
                newDic[value] += 1
                
    return newDic

dicGun = CountDfValue(gun_violence_filtered)
del dicGun['Unknown']

In [31]:
gun_violence_filtered[['gun_type_dic']].head(7)

Unnamed: 0,gun_type_dic
0,{}
1,{}
2,"{'0': 'Unknown', '1': 'Unknown'}"
3,{}
4,"{'0': 'Handgun', '1': 'Handgun'}"
5,{}
6,"{'0': '22 LR', '1': '223 Rem [AR-15]'}"


In [32]:
gunList = []
gunCount = []
for i in sorted(dicGun.items(),key=lambda items:items[1],reverse=True):
    gunList.append(i[0])
    gunCount.append(i[1])

In [33]:
# Bar chart of gun type counts
trace1 = go.Bar(
    x=gunList,
    y=gunCount,
    marker=dict(
        color='orange'
    )
)


data = [trace1]
layout = go.Layout(
    title='Distribution of Types of Gun',
    xaxis=dict(
        tickfont=dict(
            size=12,
            color='rgb(107, 107, 107)',
        )
    ),
    yaxis=dict(
        title='Count',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1,
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

## <a id='18'>8.2 Guns that Caused the Most Harm</a>

However, the total count of each gun type is not enough to tell which gun is the most dangerous. To understand this better, we should consider the total and average numbers of deaths and injuries associated with each gun type.
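
The idea can be sketched in a single pass, under the assumption that each incident carries a set of gun types plus injured and killed counts. The toy incidents below are made up for illustration, and the variable names are hypothetical rather than taken from this notebook.

```python
from collections import defaultdict

# Toy incidents: (set of gun types seen, n_injured, n_killed); values are made up
toy_incidents = [
    ({'Handgun'}, 1, 0),
    ({'Handgun', 'Rifle'}, 2, 1),
    ({'Rifle'}, 0, 3),
    (set(), 3, 0),  # no gun type recorded, so it contributes to no group
]

# gun type -> [incident count, total injured, total killed]
stats = defaultdict(lambda: [0, 0, 0])
for gun_types, injured, killed in toy_incidents:
    for gun in gun_types:
        stats[gun][0] += 1
        stats[gun][1] += injured
        stats[gun][2] += killed

for gun, (n, injured, killed) in sorted(stats.items()):
    total = injured + killed
    print(f'{gun}: incidents={n}, total={total}, average={total / n:.2f}')
```

Note that an incident involving two gun types counts toward both groups, so the per-type totals can add up to more than the overall number of casualties.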

In [34]:
df['gun_type_appear'] = df['gun_type_dic'].apply(lambda x: set(x.values()))
def FurthurColCal(df=gun_violence_filtered,colToCal='n_injured',colToMap='gun_type_appear'):
    # Map each gun type to [number of incidents, sum of colToCal over those incidents]
    dicGunCal = {}
    
    for index,row in df.iterrows():
        for item in row[colToMap]:
            if item not in dicGunCal:
                dicGunCal[item] = [1, int(row[colToCal])]
            else:
                dicGunCal[item][0] += 1
                dicGunCal[item][1] += int(row[colToCal])
    return dicGunCal  

In [35]:
dicGunInjured = FurthurColCal()
del dicGunInjured['Unknown']
dicGunKilled = FurthurColCal(colToCal='n_killed')
del dicGunKilled['Unknown']

In [36]:
GunInjuredList = [(key,values[1]/values[0],values[1]) for key,values in list(dicGunInjured.items())]
GunKilledList = [(key,values[1]/values[0],values[1]) for key,values in list(dicGunKilled.items())]
GunTotalList = [(injured[0],injured[1]+kill[1],injured[2]+kill[2]) for injured,kill in zip(GunInjuredList,GunKilledList)]
GunType = [i[0] for i in GunTotalList]
GunInjuredAverage = [i[1] for i in GunInjuredList]
GunInjuredTotal = [i[2] for i in GunInjuredList]
GunKilledAverage = [i[1] for i in GunKilledList]
GunKilledTotal = [i[2] for i in GunKilledList]
GunTotalAverage = [i[1] for i in GunTotalList]
GunTotalTotal = [i[2] for i in GunTotalList]

In [37]:
# A quick look at one of the lists: (gun type, average per incident, total)
GunTotalList[:5]

[('Handgun', 0.4321085808393435, 7609),
 ('223 Rem [AR-15]', 0.5761217948717949, 719),
 ('22 LR', 0.32266408018105397, 998),
 ('Shotgun', 0.4410112359550562, 1570),
 ('9mm', 0.3301324503311258, 1994)]

In [38]:
# Average injured and killed per incident, by gun type
trace1 = go.Bar(
    x=GunType,
    y=GunInjuredAverage,
    marker=dict(
        color='orange'
    ),
    name = 'Average Injured'
)
trace2 = go.Bar(
    x=GunType,
    y=GunKilledAverage,
    marker=dict(
        color='red'
    ),
    name = 'Average Killed'
)
data = [trace1,trace2]
layout = go.Layout(
    title='Number of Average Injured and Killed Caused by Each Gun Type',
    xaxis=dict(
        tickfont=dict(
            size=12,
            color='rgb(107, 107, 107)',
        )
    ),
    yaxis=dict(
        title='Count',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1,
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [39]:
# Combined average (injured + killed) per incident, by gun type
trace1 = go.Bar(
    x=GunType,
    y=GunTotalAverage,
    marker=dict(
        color='maroon'
    ),
    name = 'Average Injured & Killed'
)

data = [trace1]
layout = go.Layout(
    title='Average Combined Injured and Killed per Incident by Gun Type',
    xaxis=dict(
        tickfont=dict(
            size=12,
            color='rgb(107, 107, 107)',
        )
    ),
    yaxis=dict(
        title='Count',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1,
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

#### 223 Rem [AR-15]

![](https://cdn3.volusion.com/vrfwc.byruj/v/vspfiles/photos/60531-2.jpg?1539359654)

In [40]:
# Total injured and killed, by gun type
trace1 = go.Bar(
    x=GunType,
    y=GunInjuredTotal,
    marker=dict(
        color='orange'
    ),
    name = 'Total Injured'
)
trace2 = go.Bar(
    x=GunType,
    y=GunKilledTotal,
    marker=dict(
        color='red'
    ),
    name = 'Total Killed'
)
data = [trace1,trace2]
layout = go.Layout(
    title='Number of Total Injured and Killed Caused by Each Gun Type',
    xaxis=dict(
        tickfont=dict(
            size=10,
            color='rgb(107, 107, 107)',
        )
    ),
    yaxis=dict(
        title='Count',
        range = [0,5000],
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1,
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

It can be seen that even though the 223 Rem [AR-15] causes the highest average damage, the total damage it causes is far lower than that of handguns. This indicates that guns such as the AK-47 or the 223 Rem [AR-15] are used in relatively few incidents, but when they are used, the damage can be severe.
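
A short worked calculation makes the gap concrete. It assumes, as in the list comprehensions above, that each average is the total divided by the number of incidents involving that gun type, so the incident count can be recovered as total divided by average. The two tuples are copied from the `GunTotalList` preview; the variable names are hypothetical.

```python
# (gun type, average casualties per incident, total casualties),
# copied from the GunTotalList preview above
preview = [
    ('Handgun', 0.4321085808393435, 7609),
    ('223 Rem [AR-15]', 0.5761217948717949, 719),
]

implied_incidents = {}
for gun, average, total in preview:
    # average = total / incidents, so incidents = total / average
    implied_incidents[gun] = round(total / average)

for gun, incidents in implied_incidents.items():
    print(f'{gun}: about {incidents} incidents')
```

Under that assumption, the preview implies roughly 17,609 handgun incidents against roughly 1,248 incidents involving the 223 Rem [AR-15], which is why the handgun total is so much larger despite its lower average.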

Therefore, it is reasonable to ask why different types of guns cause different average losses. One plausible hypothesis is that certain guns, such as the AK-47, are more likely to be involved in especially dangerous incident types such as mass shootings, while common weapons such as handguns are less likely to be. To test this, we did the following analysis.


## <a id='19'>8.3 Distribution of Incidents among Different Types of Guns</a>

To achieve our goal, we need to focus on these two data columns.

In [41]:
gun_violence_filtered[['incident_characteristics','gun_type']].head(6)

Unnamed: 0,incident_characteristics,gun_type
0,Shot - Wounded/Injured||Mass Shooting (4+ vict...,
1,"Shot - Wounded/Injured||Shot - Dead (murder, a...",
2,"Shot - Wounded/Injured||Shot - Dead (murder, a...",0::Unknown||1::Unknown
3,"Shot - Dead (murder, accidental, suicide)||Off...",
4,"Shot - Wounded/Injured||Shot - Dead (murder, a...",0::Handgun||1::Handgun
5,"Shot - Dead (murder, accidental, suicide)||Hom...",


As we can see above, the data still need some cleaning before plotting, and the cleaning may be complex.
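
As a hedged sketch of one cleaning step, the helper below splits a `||`-delimited field into a list and treats missing values as an empty list. A plain `str(x).split('||')` would instead turn a missing value into the one-element list `['nan']`, which would then be tallied as if it were an incident type. The name `split_characteristics` and the sample string are hypothetical; the labels in the sample are taken from the outputs in this notebook.

```python
import math

def split_characteristics(raw, sep='||'):
    """Split a delimited field into a list of labels (hypothetical helper)."""
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return []  # treat missing values as "no characteristics"
    # Drop empty fragments and surrounding whitespace
    return [part.strip() for part in str(raw).split(sep) if part.strip()]

sample = 'Non-Shooting Incident||Drug involvement'
print(split_characteristics(sample))
print(split_characteristics(float('nan')))
```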

In [42]:
# Split a '||'-delimited string into a list
def StringToList(S1):
    list1 = str(S1).split('||')
        
    return list1

In [43]:
# Apply the function defined above to each row, creating a new list column
gun_violence_filtered['incident_dic'] \
= gun_violence_filtered['incident_characteristics'].apply(lambda x: StringToList(x))

In [44]:
gun_violence_filtered[['incident_dic']].head(4)

Unnamed: 0,incident_dic
0,"[Shot - Wounded/Injured, Mass Shooting (4+ vic..."
1,"[Shot - Wounded/Injured, Shot - Dead (murder, ..."
2,"[Shot - Wounded/Injured, Shot - Dead (murder, ..."
3,"[Shot - Dead (murder, accidental, suicide), Of..."


In [45]:
typeDic = {i:{'Shot - Wounded/Injured':0} for i in GunType}
typeDic['Unknown'] = {'Shot - Wounded/Injured':0}

In [46]:
# This function iterates through two columns and, for each gun type in one column (set),
# counts the appearances of each incident type in the other column (list).
# Note that it updates typeDic in place and returns the same object.
def countIncidentType(df=gun_violence_filtered,incidentCol='incident_dic',typeCol='gun_type_appear',typeDic=typeDic):
    for index,row in df.iterrows():
        for guntype in row[typeCol]:
            for incidentList in row[incidentCol]:
                if incidentList not in typeDic[guntype]:
                    typeDic[guntype][incidentList] = 1
                else:
                    typeDic[guntype][incidentList] += 1
             
    return typeDic

In [47]:
def sortDic(dic):
    sortedDic = sorted(dic.items(),key = lambda item: item[1],reverse=True)
    return sortedDic

In [48]:
typeIncidentDic = countIncidentType()

In [49]:
incidentHandGun = [i[0] for i in sortDic(typeIncidentDic['Handgun'])][:15]
incidentHandGunCount = [i[1] for i in sortDic(typeIncidentDic['Handgun'])][:15]
incidentAR = [i[0] for i in sortDic(typeIncidentDic['223 Rem [AR-15]'])][:15]
incidentARCount = [i[1] for i in sortDic(typeIncidentDic['223 Rem [AR-15]'])][:15]
incidentAK = [i[0] for i in sortDic(typeIncidentDic['7.62 [AK-47]'])][:15]
incidentAKCount = [i[1] for i in sortDic(typeIncidentDic['7.62 [AK-47]'])][:15]
incidentRifle = [i[0] for i in sortDic(typeIncidentDic['Rifle'])][:15]
incidentRifleCount = [i[1] for i in sortDic(typeIncidentDic['Rifle'])][:15]
incidentShotgun = [i[0] for i in sortDic(typeIncidentDic['Shotgun'])][:15]
incidentShotgunCount = [i[1] for i in sortDic(typeIncidentDic['Shotgun'])][:15]
incident9mm = [i[0] for i in sortDic(typeIncidentDic['9mm'])][:15]
incident9mmCount = [i[1] for i in sortDic(typeIncidentDic['9mm'])][:15]

In [50]:
# A glance at two of the lists
print('The most frequent incident for Handgun:',sortDic(typeIncidentDic['Handgun'])[:5])
print()
print('The most frequent incident for Rifle:',sortDic(typeIncidentDic['Rifle'])[:5])

The most frequent incident for Handgun: [('Non-Shooting Incident', 9565), ('Possession (gun(s) found during commission of other crimes)', 5787), ('ATF/LE Confiscation/Raid/Arrest', 4042), ('Drug involvement', 3436), ('Brandishing/flourishing/open carry/lost/found', 3357)]

The most frequent incident for Rifle: [('Non-Shooting Incident', 1809), ('Possession (gun(s) found during commission of other crimes)', 1113), ('ATF/LE Confiscation/Raid/Arrest', 930), ('Drug involvement', 704), ('Possession of gun by felon or prohibited person', 629)]


In [51]:
# Distribution of incidents among different guns (we take 6 types here)
trace1 = go.Bar(
    x=incidentHandGun,
    y=incidentHandGunCount,
    marker=dict(
        color='orange'
    ),
    name = 'HandGun'
)
trace2 = go.Bar(
    x=incidentAR,
    y=incidentARCount,
    marker=dict(
        color='red'
    ),
    name = '223 Rem [AR-15]'
)
trace3 = go.Bar(
    x=incidentAK,
    y=incidentAKCount,
    marker=dict(
        color='maroon'
    ),
    name = '7.62 [AK-47]'
)
trace4 = go.Bar(
    x=incidentRifle,
    y=incidentRifleCount,
    marker=dict(
        color='purple'
    ),
    name = 'Rifle'
)
trace5 = go.Bar(
    x=incidentShotgun,
    y=incidentShotgunCount,
    marker=dict(
        color='plum'
    ),
    name = 'Shotgun'
)

trace6 = go.Bar(
    x=incident9mm,
    y=incident9mmCount,
    marker=dict(
        color='tan'
    ),
    name = '9mm'
)


fig = tools.make_subplots(rows=3, cols=2,subplot_titles=('Handgun', '223 Rem [AR-15]',
                                                          '7.62 [AK-47]', 'Rifle',
                                                          'Shotgun','9mm'))

fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 2)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 2)
fig.append_trace(trace5, 3, 1)
fig.append_trace(trace6, 3, 2)

fig['layout'].update(height=1000, 
                     width=1000, 
                     title='Distribution of Incidents among Different Guns',
                     xaxis1=dict(
                            tickfont=dict(
                            size=7,
                            color='rgb(107, 107, 107)'
                            )),
                     xaxis2=dict(
                            tickfont=dict(
                            size=7,
                            color='rgb(107, 107, 107)'
                            )),
                     xaxis3=dict(
                            tickfont=dict(
                            size=7,
                            color='rgb(107, 107, 107)'
                             )),
                    xaxis4=dict(
                            tickfont=dict(
                            size=7,
                            color='rgb(107, 107, 107)'
                            )),
                    xaxis5=dict(
                            tickfont=dict(
                            size=7,
                            color='rgb(107, 107, 107)'
                             )),
                    xaxis6=dict(
                            tickfont=dict(
                            size=7,
                            color='rgb(107, 107, 107)'
                             )))
iplot(fig, filename='simple-subplot-with-annotations')

This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y2 ]
[ (2,1) x3,y3 ]  [ (2,2) x4,y4 ]
[ (3,1) x5,y5 ]  [ (3,2) x6,y6 ]



From the plots above, it can be seen that even though the total number of incidents differs across gun types, the distribution of incident types is similar. The most frequent incident types for these six gun types include assault weapon, non-shooting incident, possession, raid/arrest and drug involvement. This suggests that our hypothesis from the previous graphs may be wrong: common guns such as handguns are also likely to be involved in irregular and dangerous incidents. In other words, the enormous damage handguns have caused arises not only in situations like self-defense but also in situations like drug involvement. Therefore, different types of guns do not differ significantly in the kinds of incidents they are involved in. Any type of gun could be used if the suspect intends to do harm, regardless of the characteristics of the incident.
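
One way to put a number on "similar distributions" is to measure how much the top incident labels of two gun types overlap. The sketch below uses the Jaccard index (size of the intersection divided by size of the union) on the top-five lists printed earlier for Handgun and Rifle. The choice of metric and the function name `jaccard` are our illustrative assumptions, not part of the analysis above.

```python
# Top-five incident types, copied from the printed output above
handgun_top = [
    ('Non-Shooting Incident', 9565),
    ('Possession (gun(s) found during commission of other crimes)', 5787),
    ('ATF/LE Confiscation/Raid/Arrest', 4042),
    ('Drug involvement', 3436),
    ('Brandishing/flourishing/open carry/lost/found', 3357),
]
rifle_top = [
    ('Non-Shooting Incident', 1809),
    ('Possession (gun(s) found during commission of other crimes)', 1113),
    ('ATF/LE Confiscation/Raid/Arrest', 930),
    ('Drug involvement', 704),
    ('Possession of gun by felon or prohibited person', 629),
]

def jaccard(a, b):
    """Jaccard index of two label collections (1.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

handgun_labels = [label for label, _ in handgun_top]
rifle_labels = [label for label, _ in rifle_top]
shared = set(handgun_labels) & set(rifle_labels)
overlap = jaccard(handgun_labels, rifle_labels)

print(f'Shared labels: {len(shared)} of 5')
print(f'Jaccard overlap: {overlap:.2f}')
```

Four of the five labels are shared between the two lists. A fuller comparison would also normalize the counts to shares of each gun type's incidents, since the raw totals differ considerably.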

## <a id='20'>9. Conclusion</a>

We conclude that while gun violence incidents are increasing, the magnitude of violence has stayed roughly the same. We base this conclusion on the visualizations above. From the Incidents over Time visualization, we observed that gun violence incidents increased from 2014 to 2017. However, aside from the largest gun violence massacre in 2017, gun violence incidents resulted in approximately the same number of injuries and deaths. In addition, the time series plots show that there are more injuries than deaths. This allows us to conclude that the number of incidents has increased, but the total damage has stayed nearly the same (apart from the Las Vegas shooting).

Furthermore, the demographics show that most victims and suspects are between 18 and 30 years old, and most suspects are male. Additionally, the most common guns are handguns, while the most dangerous guns are the AR-15 and the AK-47. However, the distribution of incident types is relatively the same for each gun.

Finally, while alarming, we found it very interesting that the District of Columbia has the highest number of gun violence incidents even though it is our capital and has some of the strictest gun laws in the nation. We also found it bizarre that, among the states, Alaska had the highest number of gun violence incidents.

## <a id='21'>10. Citations/References</a>

1. A good amount of our inspiration came from the Kaggle notebook Deep Exploration of Gun Violence in US created by Shivam Bansal. It was inevitable that we would come across the notebook while learning about the dataset.
2. 95 percent of the code in this Jupyter notebook is our own. In terms of ideas, we've combined some of his with ours, but we've added our own touches throughout.
3. The folium map above comes entirely from the Kaggle notebook; we really liked the visualization, so we adapted it and included it in ours.