# Exploring Gun Violence Incidents in USA

Gun violence in the United States results in tens of thousands of deaths and injuries annually. In 2013, there were 73,505 nonfatal firearm injuries (23.2 injuries per 100,000 U.S. citizens), and 33,636 deaths due to "injury by firearms" (10.6 deaths per 100,000 U.S. citizens). These deaths consisted of 11,208 homicides, 21,175 suicides, 505 deaths due to accidental or negligent discharge of a firearm, and 281 deaths due to firearms use with "undetermined intent". Of the 2,596,993 total deaths in the US in 2013, 1.3% were related to firearms. The ownership and control of guns are among the most widely debated issues in the country.

This notebook explores and visualizes the Gun Violence [Dataset](https://www.kaggle.com/jameslko/gun-violence-data). The dataset consists of recorded gun violence incidents in the USA since 2013.

**Contents**

**1. Reading Data**  
**2. Creating Additional Features**   
**3. Exploration**    
&nbsp;&nbsp;&nbsp;&nbsp; **When did they occur ?**  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.1 Is the number of Gun Violence incidents per year increasing ? (from 2013 to 2017)  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.2 Which months have more Gun Violence Incidents ?  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.3 Which Days-of-the-Week are not safe ?   
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.4 Which Day-of-the-Month is most safe ?  
&nbsp;&nbsp;&nbsp;&nbsp; **Where did they occur ?**  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.5 Which State had the highest number of Gun Violence incidents ?  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.6.1 Which locations in US witnessed these incidents ?      
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.6.2 Which location types have witnessed these incidents ?      
&nbsp;&nbsp;&nbsp;&nbsp; **Cities with high risk**  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.7 California - Which Cities have a high risk of gun violence ?   
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.8 Texas - Which Cities have a high risk of gun violence ?    
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.9 Florida - Which Cities have a high risk of gun violence ?  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.10 Illinois - Which Cities have a high risk of gun violence ?  
&nbsp;&nbsp;&nbsp;&nbsp; **What was mentioned in Incident Notes ?**  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.11 California - Which keywords were used in Incident Notes  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.12 Texas - Which keywords were used in Incident Notes  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.13 Florida - Which keywords were used in Incident Notes  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.14 Illinois - Which keywords were used in Incident Notes  
&nbsp;&nbsp;&nbsp;&nbsp; **People Killed and Injured**  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.15 Total Loss and Total Incidents across different states  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.16 How many Guns were used in these incidents ?   
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.17 Understanding the State wise Number of Incidents+Number of Killed+Number of Injured   
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.18 Most Serious Incidents in last 4 years   
&nbsp;&nbsp;&nbsp;&nbsp; **Which Guns are used ?**   
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.19 Which Gun Types were used ?   
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.20 Which Guns are linked with highest loss ?    
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.21 Which Guns have killed maximum number of people ?    
**4. Population Adjusted Dataset**    
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4.1 Cities having highest ratio of Incidents and Population Size  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4.2 Cities having highest ratio of PeopleKilled and Population Size   
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4.3 Cities having highest ratio of PeopleInjured and Population Size     

In [None]:
# import the required libraries 

from plotly.offline import init_notebook_mode, iplot
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import plotly.graph_objs as go
from textblob import TextBlob 

import plotly.plotly as py
from plotly import tools
import seaborn as sns
import pandas as pd
import string, os, random
import calendar

init_notebook_mode(connected=True)
punc = string.punctuation

## 1. Read the dataset

Read the dataset using the pandas `read_csv` function. The data includes a column called "date", so we parse it as a datetime while reading.

In [None]:
path = "../input/gun-violence-data_01-2013_03-2018.csv"
df = pd.read_csv(path, parse_dates = ['date'])
df.head(10)
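
Before relying on the parsed column, it can help to confirm that `date` really came back as a datetime. The following is a minimal sketch on a tiny in-memory CSV; the two sample rows and the `incident_id` column are made up for illustration and are not taken from the dataset.

```python
import io
import pandas as pd

# Two made-up rows that mimic a few of the columns used in this notebook.
sample_csv = io.StringIO(
    "incident_id,date,state,n_killed,n_injured\n"
    "1,2013-01-01,Pennsylvania,0,4\n"
    "2,2013-01-05,California,1,3\n"
)

sample = pd.read_csv(sample_csv, parse_dates=["date"])

# parse_dates should yield a datetime64 column, which enables the .dt accessor.
is_datetime = pd.api.types.is_datetime64_any_dtype(sample["date"])
print(sample.dtypes)
print("date parsed as datetime:", is_datetime)
```

If the check prints `False`, the column was probably left as strings, and the `.dt` calls in the next section would fail.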

## 2. Feature Engineering - Create Additional features

Let's create some additional features in our dataframe from the existing columns:

- Year, Month, Monthday, Weekday
- Total Loss = sum of total killed and total injured

In [None]:
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['monthday'] = df['date'].dt.day
df['weekday'] = df['date'].dt.weekday
df['loss'] = df['n_killed'] + df['n_injured']
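
To make the derived columns concrete, here is a small sketch on a toy frame. The dates and casualty numbers are invented for illustration. Note that `dt.weekday` counts from Monday = 0 to Sunday = 6, which is the convention the weekday mapping later in the notebook relies on.

```python
import pandas as pd

# Toy rows with invented values; only the column names follow the dataset.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2017-10-01", "2018-03-05"]),
    "n_killed": [2, 1],
    "n_injured": [5, 0],
})

toy["year"] = toy["date"].dt.year
toy["month"] = toy["date"].dt.month
toy["monthday"] = toy["date"].dt.day
toy["weekday"] = toy["date"].dt.weekday  # Monday = 0 ... Sunday = 6
toy["loss"] = toy["n_killed"] + toy["n_injured"]

print(toy[["date", "year", "month", "monthday", "weekday", "loss"]])
```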

## 3. Exploration 

### 3.1. Is the number of Gun Violence incidents per year increasing ? (from 2013 to 2017)

In [None]:
# function to aggregate and return keys and values
def create_stack_bar_data(col):
    aggregated = df[col].value_counts()
    x_values = aggregated.index.tolist()
    y_values = aggregated.values.tolist()
    return x_values, y_values

x1, y1 = create_stack_bar_data('year')
x1 = x1[:-1]
y1 = y1[:-1]
trace1 = go.Bar(x=x1, y=y1, opacity=0.75, name="year count", marker=dict(color=['rgba(10, 220, 150, 0.6)', 'rgba(10, 220, 150, 0.6)', 'rgba(10, 220, 150, 0.6)', 'rgba(10, 220, 150, 0.6)', 'rgba(222,45,38,0.8)']))
layout = dict(height=400, title='Year wise Number of Gun Violence Incidents in US', legend=dict(orientation="h"));
fig = go.Figure(data=[trace1], layout=layout);
iplot(fig);

> - The number of gun violence incidents in the US increased by about 10,000 from 2014 to 2017, a clear upward trend in this dataset. The 2018 count is still rising, since the data covers only part of that year.
> - In this dataset, 2017 had the largest number of gun violence incidents, about 61,000.
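
The plot above relies on `value_counts` returning years ordered by count and then drops the last entry. A hedged alternative, sketched here on made-up year values, is to sort by the index so the years are chronological and to compute the year-over-year change explicitly:

```python
import pandas as pd

# Made-up year column; in the notebook this would be df['year'].
years = pd.Series([2014, 2014, 2015, 2015, 2015, 2016, 2016, 2016, 2016], name="year")

# sort_index puts the counts in chronological order instead of count order.
per_year = years.value_counts().sort_index()

# diff gives the change in incident count from one year to the next.
yoy_change = per_year.diff()

print(per_year)
print(yoy_change)
```

With chronological ordering, partial years can be excluded by label (for example with `per_year.drop(...)`) rather than by position.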

### 3.2. Which months have more Gun Violence Incidents ?

In [None]:
x2, y2 = create_stack_bar_data('month')
mapp = {}
for m,v in zip(x2, y2):
    mapp[m] = v
xn = [calendar.month_abbr[int(x)] for x in sorted(x2)]
vn = [mapp[x] for x in sorted(x2)]

trace1 = go.Bar(x=xn, y=vn, opacity=0.75, name="month", marker=dict(color='rgba(100, 20, 10, 0.6)'))
layout = dict(height=400, title='Month Wise Number of Gun Violence Incidents in US', legend=dict(orientation="h"));

fig = go.Figure(data=[trace1], layout=layout)
iplot(fig, filename='stacked-bar')

> - The first few months of the year have the highest number of gun violence incidents reported in the US, while November has the lowest.
> - On average, about 20,000 incidents are reported for each calendar month (summed over all years in the dataset).
> - March has more gun violence incidents (about 22,000) than any other month.

### 3.3. Which Days-of-the-Week are not safe ?

In [None]:
x1, y1 = create_stack_bar_data('weekday')
weekmap = {0:'Mon', 1:'Tue', 2:'Wed', 3:'Thu', 4:'Fri', 5:'Sat', 6:'Sun'}
x1 = [weekmap[x] for x in x1]
trace1 = go.Bar(x=x1, y=y1, opacity=0.75, name="weekday", marker=dict(color='rgba(21, 10, 250, 0.6)'))
layout = dict(height=400, title='Week Day wise number of Gun Violences', legend=dict(orientation="h"));

fig = go.Figure(data=[trace1], layout=layout)
iplot(fig, filename='stacked-bar')

> - The data shows that weekends are less safe: Saturdays and Sundays have more gun violence incidents than any other day.
> - Thursdays have the lowest number of gun violence incidents.
> - A possible explanation is that people are at work on weekdays and go out more on weekends.
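
Because `value_counts` orders by count, the bars in the chart run from the busiest day to the quietest rather than from Monday to Sunday. A minimal sketch, using made-up weekday codes, of putting the counts in calendar order with `reindex`:

```python
import pandas as pd

weekmap = {0: "Mon", 1: "Tue", 2: "Wed", 3: "Thu", 4: "Fri", 5: "Sat", 6: "Sun"}

# Made-up weekday codes; in the notebook this would be df['weekday'].
weekdays = pd.Series([5, 6, 6, 0, 5, 6])

# reindex forces Monday..Sunday order and fills days with no incidents with 0.
counts = weekdays.value_counts().reindex(range(7), fill_value=0)
labels = [weekmap[day] for day in counts.index]

print(dict(zip(labels, counts.tolist())))
```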

### 3.4. Which Day-of-the-Month is most safe ?

In [None]:
x1, y1 = create_stack_bar_data('monthday')
trace1 = go.Bar(x=x1, y=y1, opacity=0.75, name="monthday", marker=dict(color='rgba(210, 120, 10, 0.6)'))
layout = dict(height=400, title='Month Day wise number of Gun Violences', legend=dict(orientation="h"));

fig = go.Figure(data=[trace1], layout=layout)
iplot(fig, filename='stacked-bar')

> - From the above graph, the maximum number of incidents occurred on the 1st of the month (about 8,555), while the fewest occurred on the last days of the month (about 6,000 on average for the 30th and 31st).

### Plotting the time series of total people killed and injured 

In [None]:
# trace1 = go.Scatter(x = df.date,
#                     y = df.loss,
#                     mode = "lines",
#                     name = "citations",
#                     marker = dict(color = 'rgba(16, 112, 2, 0.8)'),
#                     )
# data = [trace1]
# layout = dict(title = 'Total Gun Violence Loss(Killed+Injured) over the time',
#               xaxis= dict(title='Date Time',ticklen= 5,zeroline= False)
#              )
# fig = dict(data = data, layout = layout)
# iplot(fig)

![](https://i.imgur.com/6gCx9pZ.png)

### 3.5. Which States are not safe ? 

In [None]:
states_df = df['state'].value_counts()

statesdf = pd.DataFrame()
statesdf['state'] = states_df.index
statesdf['counts'] = states_df.values

scl = [[0.0, 'rgb(242,240,247)'],[0.2, 'rgb(218,218,235)'],[0.4, 'rgb(188,189,220)'],\
            [0.6, 'rgb(158,154,200)'],[0.8, 'rgb(117,107,177)'],[1.0, 'rgb(84,39,143)']]

state_to_code = {'District of Columbia' : 'DC','Mississippi': 'MS', 'Oklahoma': 'OK', 'Delaware': 'DE', 'Minnesota': 'MN', 'Illinois': 'IL', 'Arkansas': 'AR', 'New Mexico': 'NM', 'Indiana': 'IN', 'Maryland': 'MD', 'Louisiana': 'LA', 'Idaho': 'ID', 'Wyoming': 'WY', 'Tennessee': 'TN', 'Arizona': 'AZ', 'Iowa': 'IA', 'Michigan': 'MI', 'Kansas': 'KS', 'Utah': 'UT', 'Virginia': 'VA', 'Oregon': 'OR', 'Connecticut': 'CT', 'Montana': 'MT', 'California': 'CA', 'Massachusetts': 'MA', 'West Virginia': 'WV', 'South Carolina': 'SC', 'New Hampshire': 'NH', 'Wisconsin': 'WI', 'Vermont': 'VT', 'Georgia': 'GA', 'North Dakota': 'ND', 'Pennsylvania': 'PA', 'Florida': 'FL', 'Alaska': 'AK', 'Kentucky': 'KY', 'Hawaii': 'HI', 'Nebraska': 'NE', 'Missouri': 'MO', 'Ohio': 'OH', 'Alabama': 'AL', 'Rhode Island': 'RI', 'South Dakota': 'SD', 'Colorado': 'CO', 'New Jersey': 'NJ', 'Washington': 'WA', 'North Carolina': 'NC', 'New York': 'NY', 'Texas': 'TX', 'Nevada': 'NV', 'Maine': 'ME'}
statesdf['state_code'] = statesdf['state'].apply(lambda x : state_to_code[x])

data = [ dict(
        type='choropleth',
        colorscale = scl,
        autocolorscale = False,
        locations = statesdf['state_code'],
        z = statesdf['counts'],
        locationmode = 'USA-states',
        text = statesdf['state'],
        marker = dict(
            line = dict (
                color = 'rgb(255,255,255)',
                width = 2
            ) ),
        colorbar = dict(
            title = "Gun Violence Incidents")
        ) ]

layout = dict(
        title = 'State wise number of Gun Violence Incidents',
        geo = dict(
            scope='usa',
            projection=dict( type='albers usa' ),
            showlakes = True,
            lakecolor = 'rgb(255, 255, 255)'),
             )
    
fig = dict( data=data, layout=layout )
iplot( fig, filename='d3-cloropleth-map' )

> - From the above visual, Illinois has the highest number of incidents (close to 17,000), followed by California with 16K, Florida with 15K, and Texas with 13K gun violence incidents reported over the last few years.
> - States with the fewest gun violence incidents include Hawaii (289 incidents), Vermont (472 incidents), Wyoming (494 incidents), South Dakota (544 incidents), and North Dakota (573 incidents).
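
The state-to-code lookup in the cell above raises a `KeyError` as soon as a state name is missing from the dictionary. As a hedged alternative, `Series.map` leaves unmatched names as missing values, which makes them easy to list. The shortened dictionary and the "Puerto Rico" entry below are illustrative assumptions, not values known to be in the dataset.

```python
import pandas as pd

# Shortened mapping for illustration; the notebook uses the full dictionary.
state_to_code = {"Illinois": "IL", "California": "CA", "District of Columbia": "DC"}

# "Puerto Rico" is a hypothetical name that the mapping does not cover.
states = pd.Series(["Illinois", "California", "District of Columbia", "Puerto Rico"])

codes = states.map(state_to_code)       # unmatched names become NaN
unmapped = states[codes.isna()].tolist()

print(codes.tolist())
print("unmapped states:", unmapped)
```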

### 3.6.1 Where in the US have these incidents occurred

Let's plot the latitudes and longitudes of the incidents to understand and visualize where in the US they occurred.

The code is commented out, but the following graph can be generated from it.

#### Locations of Gun Violence Incidents

In [None]:
# scl = [ [0,"rgb(5, 10, 172)"],[0.35,"rgb(40, 60, 190)"],[0.5,"rgb(70, 100, 245)"],\
#     [0.6,"rgb(90, 120, 245)"],[0.7,"rgb(106, 137, 247)"],[1,"rgb(220, 220, 220)"] ]

# data = [ dict(
#         type = 'scattergeo',
#         locationmode = 'USA-states',
#         lon = df['longitude'],
#         lat = df['latitude'],
#         text = df['city_or_county'],
#         mode = 'markers',
#         marker = dict(
#             size = 2,
#             opacity = 0.8,
#             reversescale = True,
#             autocolorscale = False,
#             symbol = 'point',
#             colorscale = scl,
#             cmin = 0,
#         ))]

# layout = dict(
#         title = 'Locations of Gun Violence Incidents in US<br>(Hover for city or county names)',
#         colorbar = True,
#         geo = dict(
#             projection=dict( type='albers usa' ),
#             showland = True,
#             landcolor = "rgb(250, 250, 250)",
#             subunitcolor = "rgb(217, 217, 217)",
#             countrycolor = "rgb(217, 217, 217)",
#             countrywidth = 0.5,
#             subunitwidth = 0.5
#         ),
#     )

# fig = dict( data=data, layout=layout )
# iplot( fig, validate=False, filename='d3-airports' )

![](https://image.ibb.co/kbbRmx/newplot_18.png)

> - From this graph, the eastern side of the US appears relatively less safe than the western side.
> - The central US has also witnessed fewer incidents than the far west or the far east.

### 3.6.2 Which location types ? 

In [None]:
td1 = df['location_description'].value_counts()

trace1 = go.Bar(
    y=list(reversed(list(td1.index[:15]))),
    x=list(reversed(list(td1.values[:15]))),
    name='Location Types',
    orientation = 'h',
    marker=dict(color='rgb(80,22,25)'),
    opacity=0.4
)

data = [trace1]
layout = go.Layout(
    width=800,
    margin=dict(l=300),
    barmode='group',
    legend=dict(dict(x=-.1, y=1.2)),
    title = 'Top 15 Location Descriptions of Gun Violence Incidents',
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='grouped-bar')

> - Places such as Walmart, Burger King, and McDonald's appear among the most frequent location descriptions for gun violence incidents.

### 3.7. California - Which Cities have a high risk of gun violence ?

In [None]:
# top cities of california 
california_df = df[df['state'] == 'California']
agg = california_df['city_or_county'].value_counts()[:10]
labels = list(reversed(list(agg.index )))
values = list(reversed(list(agg.values)))

trace1 = go.Pie(labels=labels, values=values, marker=dict(colors=['red']))
layout = dict(title='Top 10 Cities with Gun Violence in California', legend=dict(orientation="h"));


fig = go.Figure(data=[trace1], layout=layout)
iplot(fig, filename='stacked-bar')

# california_df['city_or_county'].value_counts()

> - Oakland with 1478, LA with 1066, and Fresno with 1057 incidents are the top 3 cities of California by number of gun violence incidents.
> - The safest areas, by incident count, include Lynwood, Yolo, and Chico.
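
The filter-then-count steps used for California are the same for every state, so they could be wrapped in a small helper. This is only a sketch: `top_cities` is a hypothetical function name, and the toy rows below are invented.

```python
import pandas as pd

def top_cities(frame, state, n=10):
    """Return the n most frequent city_or_county values for one state."""
    in_state = frame[frame["state"] == state]
    return in_state["city_or_county"].value_counts().head(n)

# Invented rows; only the column names follow the dataset.
toy = pd.DataFrame({
    "state": ["California", "California", "California", "Texas"],
    "city_or_county": ["Oakland", "Oakland", "Fresno", "Houston"],
})

ca_top = top_cities(toy, "California", n=2)
print(ca_top)
```

The returned Series can feed `go.Pie` directly through `ca_top.index` and `ca_top.values`.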

### 3.8. Texas - Which Cities have a high risk of gun violence ?

In [None]:
texas_df = df[df['state'] == 'Texas']
agg = texas_df['city_or_county'].value_counts()[:10]
labels = list(reversed(list(agg.index )))
values = list(reversed(list(agg.values)))

trace1 = go.Pie(labels=labels, values=values)
layout = dict(title='Top 10 Cities with Gun Violence Incidents in Texas', legend=dict(orientation="h"));


fig = go.Figure(data=[trace1], layout=layout)
iplot(fig, filename='stacked-bar')

# texas_df['city_or_county'].value_counts()

> - Houston (2480 incidents), San Antonio (1687 incidents), and Dallas (1152 incidents) are the cities in Texas with the highest number of gun violence incidents.
> - The safest areas of Texas, by incident count, include Sudan, Castroville, and McKinney.

### 3.9. Florida - Which Cities have a high risk of gun violence ?

In [None]:
florida_df = df[df['state'] == 'Florida']
agg = florida_df['city_or_county'].value_counts()[:10]
labels = list(reversed(list(agg.index )))
values = list(reversed(list(agg.values)))

trace1 = go.Pie(labels=labels, values=values)
layout = dict(title='Top 10 Cities with Gun Violence Incidents in Florida', legend=dict(orientation="h"));


fig = go.Figure(data=[trace1], layout=layout)
iplot(fig, filename='stacked-bar')

# florida_df['city_or_county'].value_counts()

> - Jacksonville (2317 incidents), Orlando (1020 incidents), and Miami (837 incidents) are the three Florida cities with the most gun violence incidents.
> - Homestead (Princeton), Tequesta, and Fort Pierce are some of the safest areas of Florida by incident count.

### 3.10. Illinois - Which Cities have a high risk of gun violence ?

In [None]:
illinois_df = df[df['state'] == 'Illinois']
agg = illinois_df['city_or_county'].value_counts()[:10]
labels = list(reversed(list(agg.index )))
values = list(reversed(list(agg.values)))

trace1 = go.Pie(labels=labels, values=values)
layout = dict(title='Top 10 Cities with Gun Violence Incidents in Illinois', legend=dict(orientation="h"));


fig = go.Figure(data=[trace1], layout=layout)
iplot(fig, filename='stacked-bar')

> - Chicago, with more than 10,000 gun violence incidents, is the city with the highest number of incidents.
> - This holds not only for Illinois but for the entire United States.

### 3.11. California - Which keywords were used in Incident Notes 

In [None]:
# california_df['notes'] = california_df['notes'].fillna(" ")
california_df['notes'] = california_df['notes'].fillna('').astype(str)
killed_notes = " ".join(california_df['notes']).lower()
# for noise in ['shot', 'near', 'victim', 'vic', 'suspect', 'killed','perp','man', 'nan' 'inj']:
#         killed_notes = killed_notes.replace(noise," ")

In [None]:
wordcloud = WordCloud(max_font_size=50, width=600, height=300).generate(killed_notes)
plt.figure(figsize=(15,8))
plt.imshow(wordcloud)
plt.title("Top Keywords in California incidents", fontsize=35)
plt.axis("off")
plt.show() 

### 3.12. Texas - Which keywords were used in Incident Notes 

In [None]:
texas_df['notes'] = texas_df['notes'].fillna(" ")

killed_notes = " ".join(texas_df['notes']).lower()
for noise in ['shot', 'near', 'victim', 'vic', 'suspect', 'killed','perp','man', 'inj']:
        killed_notes = killed_notes.replace(noise," ")

In [None]:
wordcloud = WordCloud(max_font_size=50, width=600, height=300).generate(killed_notes)
plt.figure(figsize=(15,8))
plt.imshow(wordcloud)
plt.title("Top Keywords in Texas incidents", fontsize=35)
plt.axis("off")
plt.show() 

### 3.13. Florida - Top Keywords in Incident Notes 

In [None]:
florida_df['notes'] = florida_df['notes'].fillna(" ")
killed_notes = " ".join(florida_df['notes']).lower()
for noise in ['shot', 'near', 'victim', 'vic', 'suspect', 'killed','perp','man', 'inj']:
        killed_notes = killed_notes.replace(noise," ")

In [None]:
wordcloud = WordCloud(max_font_size=50, width=600, height=300).generate(killed_notes)
plt.figure(figsize=(15,8))
plt.imshow(wordcloud)
plt.title("Top Keywords in Florida incidents", fontsize=35)
plt.axis("off")
plt.show() 

### 3.14. Illinois - Top Keywords in Incident Notes

In [None]:
illinois_df['notes'] = illinois_df['notes'].fillna(" ")
killed_notes = " ".join(illinois_df['notes']).lower()
for noise in ['shot', 'near', 'victim', 'vic', 'suspect', 'killed','perp','man', 'inj']:
        killed_notes = killed_notes.replace(noise," ")

In [None]:
wordcloud = WordCloud(max_font_size=50, width=600, height=300).generate(killed_notes)
plt.figure(figsize=(15,8))
plt.imshow(wordcloud)
plt.title("Top Keywords in Illinois incidents", fontsize=35)
plt.axis("off")
plt.show() 

> - Many keywords related to "gang" appear. This suggests that a large number of incidents in Illinois involved gangs, and probably more loss as well.
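
The noise removal above uses `str.replace`, which works on substrings: removing "man" also turns "woman" into "wo", and removing "inj" damages "injured". A hedged alternative is to split the notes into word tokens first and drop only whole words. The sketch below uses the standard library only; `keyword_counts` is a hypothetical helper and the sample notes are made up.

```python
import re
from collections import Counter

# Same noise words as in the cells above.
NOISE = {"shot", "near", "victim", "vic", "suspect", "killed", "perp", "man", "inj"}

def keyword_counts(notes, noise=NOISE):
    """Count lowercase word tokens, skipping missing notes and whole noise words."""
    counts = Counter()
    for note in notes:
        if not isinstance(note, str):
            continue  # skip NaN / None entries
        tokens = re.findall(r"[a-z']+", note.lower())
        counts.update(token for token in tokens if token not in noise)
    return counts

# Made-up notes for illustration.
toy_notes = ["Man shot near gas station", "Woman injured, suspect fled", None]
counts = keyword_counts(toy_notes)
print(counts.most_common())
```

The resulting counts can be passed to `WordCloud.generate_from_frequencies` instead of building one long string for `generate`.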

### 3.15. Total Loss and Total Incidents across different states

In [None]:
statdf = df.reset_index().groupby(by=['state']).agg({'loss':'sum', 'year':'count'}).rename(columns={'year':'count'})
statdf['state'] = statdf.index

trace1 = go.Bar(
    x=statdf['state'],
    y=statdf['count'],
    name='Count of Incidents',
    marker=dict(color='rgb(255,10,225)'),
    opacity=0.6
)
trace2 = go.Bar(
    x=statdf['state'],
    y=statdf['loss'],
    name='Total Loss',
    marker=dict(color='rgb(58,22,225)'),
    opacity=0.6
)

data = [trace1, trace2]
layout = go.Layout(
    barmode='group',
    legend=dict(dict(x=-.1, y=1.2)),
    title = 'State wise number of Gun Violence Incidents and Total Loss',
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='grouped-bar')

> - Apart from the states with the most incidents, total loss (number of people killed and injured) is also high in North Carolina, New York, Louisiana, and Georgia.
> - California may have a high number of gun violence incidents, but its total loss is not proportional to the number of incidents: the gap between incidents and loss is relatively larger there.
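
One way to make the gap between incidents and loss explicit is to compute the loss per incident for each state. The sketch below uses invented states "A" and "B" with made-up `loss` values; the column names `total_loss`, `incidents`, and `loss_per_incident` are illustrative choices.

```python
import pandas as pd

# Invented data: state A has many low-loss incidents, state B a few severe ones.
toy = pd.DataFrame({
    "state": ["A", "A", "A", "A", "B", "B"],
    "loss": [0, 1, 0, 1, 2, 4],
})

per_state = toy.groupby("state")["loss"].agg(total_loss="sum", incidents="count")
per_state["loss_per_incident"] = per_state["total_loss"] / per_state["incidents"]

print(per_state.sort_values("loss_per_incident", ascending=False))
```

A state with many incidents but a low ratio has a large share of incidents in which nobody was killed or injured.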

### 3.16. In what number of cases, Number of Guns Used was greater than 1 ? 

In [None]:
df['n_guns'] = df['n_guns_involved'].apply(lambda x : "10+" if x>=10 else str(x))
# df['n_guns'].value_counts()
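# name the columns explicitly so the result does not depend on the pandas version
tempdf = df['n_guns'].value_counts().rename_axis('n_guns').reset_index(name='count')
tempdf = tempdf[tempdf['n_guns'] != 'nan']
tempdf = tempdf[tempdf['n_guns'] != '1.0']

labels = list(tempdf['n_guns'])
values = list(tempdf['count'])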

trace1 = go.Pie(labels=labels, values=values, marker=dict(colors = ['#FEBFB3', '#E1396C', '#96D38C', '#D0F9B1', '#c0d1ed', '#efaceb', '#f5f794', '#94f794', '#fcc771']))
layout = dict(height=500, title='Number of Guns Used (More than 1)', legend=dict(orientation="h"));
fig = go.Figure(data=[trace1], layout=layout)
iplot(fig)

> - In about 8,000 cases, 2 guns were used in the incident. There are about 127,548 cases in which a single gun was used.
> - There are about 1,100 incidents in which 10 or more guns were used.
> - The number of guns is not known for about 99,451 cases.

### 3.17. Understanding which state has high number of incidents, number of people killed, and number of people injured 

In [None]:
stdf = df.reset_index().groupby(by=['state']).agg({'n_killed':'sum', 'n_injured':'sum', 'year': 'count'}).reset_index().rename(columns={'year':'count'})

def normalize_loss(num):
    ranges = [100,200,300,400,500,600,700,800,900,1000,2000,3000,5000,8000,10000,20000]
    sizes = [2,3,4,5,6,8,10,12,14,18,21,24,33,45,70,100]
    for i, lim in enumerate(ranges):
        if num <= lim:
            return sizes[i]

stdf['size'] = stdf['count'].apply(normalize_loss)

data = [
    {
        'x': stdf['n_killed'],
        'y': stdf['n_injured'],
        'text' : stdf['state'],
        'mode': 'markers',
        'marker': {
            'color': stdf['count'],
            'size': stdf['size'],
            'showscale': True,
            # a plain dict avoids the wildcard import and the deprecated ColorBar class
            'colorbar' : dict(
                title='Number of Incidents'
            ),
        }
    }
]

layout = go.Layout(
    title="Every Bubble: State | BubbleSize: TotalIncidents | X-axis:No. Killed | Y-axis: No. Injured",
    xaxis=dict(
        autorange=True,
        showgrid=True,
        zeroline=False,
        showline=True,
        autotick=True,
        ticks='',
        showticklabels=True,
        title="Number of People Killed"
    ),
    yaxis=dict(
        autorange=True,
        showgrid=True,
        zeroline=False,
        showline=True,
        autotick=True,
        ticks='',
        showticklabels=True,
        title="Number of People Injured"
    ))
fig = go.Figure(data=data, layout=layout)

iplot(fig, filename='scatter-colorscale')


> - From the above multidimensional graph, more people have been injured in Illinois than in any other state (about 13K).
> - More people have been killed in gun violence incidents in California (7,644 people) than in any other state.
> - Florida, Illinois, California, and Texas are again associated with high numbers of people killed and injured.
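
The `normalize_loss` function above returns `None` for any count larger than its last threshold (20000), which would leave a bubble without a size. A hedged sketch of the same threshold lookup using `bisect`, with the largest size used as a fallback; `bubble_size` is a hypothetical name, and the clamping behavior is an assumption rather than something the original code does.

```python
from bisect import bisect_left

# Same thresholds and marker sizes as normalize_loss above.
RANGES = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000,
          2000, 3000, 5000, 8000, 10000, 20000]
SIZES = [2, 3, 4, 5, 6, 8, 10, 12, 14, 18, 21, 24, 33, 45, 70, 100]

def bubble_size(num):
    """Return the size for the first threshold that is >= num, clamped to the largest size."""
    i = bisect_left(RANGES, num)
    return SIZES[min(i, len(SIZES) - 1)]

for count in (100, 101, 950, 17000, 25000):
    print(count, "->", bubble_size(count))
```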

### 3.18 Most Serious Incidents - (Maximum Injured + Killed)

In [None]:
df1 = df.sort_values(['loss'], ascending=[False])
df1[['date', 'state', 'city_or_county', 'address', 'n_killed', 'n_injured']].head(10)

In [None]:
limits = [(0,2),(3,10),(11,20),(21,50),(50,3000)]
colors = ["rgb(0,116,217)","rgb(255,65,54)","rgb(133,20,75)","rgb(255,133,27)","lightgrey"]


dfs = df1.head(20)
data = [ dict(
        type = 'scattergeo',
        locationmode = 'USA-states',
        lon = dfs['longitude'],
        lat = dfs['latitude'],
        text = dfs['city_or_county'],
        mode = 'markers',
        marker = dict(
            size = dfs['loss']/1.5,
            opacity = 0.7,
            cmin = 0,
        ))]

layout = dict(
        title = 'Most Serious Gun Violence Incidents in US',
        colorbar = True,
        geo = dict(
            projection=dict( type='albers usa' ),
            subunitcolor = "rgb(221, 221, 221)",
            subunitwidth = 1.0
        ),
    )

fig = dict( data=data, layout=layout )
iplot( fig, validate=False)

> - The Orlando shooting in 2016, in which about 50 people were killed and another 50 injured, remains the most serious incident of the past few years.
> - The Texas shooting in November 2017 was another serious incident, in which 25+ people were killed.


### 3.20 Gun Used in the Incidents

It will be interesting to note which guns were used in the incidents and which guns caused the most destruction.

In [None]:
# colks = ['gun_type', 'incident_characteristics', 'location_description']

df['gun_type_parsed'] = df['gun_type'].fillna('0:Unknown')
gt = df.groupby(by=['gun_type_parsed']).agg({'n_killed': 'sum', 'n_injured' : 'sum', 'state' : 'count'}).reset_index().rename(columns={'state':'count'})

results = {}
for i, each in gt.iterrows():
    wrds = each['gun_type_parsed'].split("||")
    for wrd in wrds:
        if "Unknown" in wrd:
            continue
        wrd = wrd.replace("::",":").replace("|1","")
        gtype = wrd.split(":")[1]
        if gtype not in results: 
            results[gtype] = {'killed' : 0, 'injured' : 0, 'used' : 0}
        results[gtype]['killed'] += each['n_killed']
        results[gtype]['injured'] +=  each['n_injured']
        results[gtype]['used'] +=  each['count']

gun_names = list(results.keys())
used = [each['used'] for each in list(results.values())]
killed = [each['killed'] for each in list(results.values())]
injured = [each['injured'] for each in list(results.values())]
danger = []
for i, x in enumerate(used):
    danger.append((killed[i] + injured[i]) / x)

trace1 = go.Bar(x=gun_names, y=used, name='Times Used', orientation = 'v',
    marker = dict(color = '#b58af2', 
        line = dict(color = '#b58af2', width = 1) ))
data = [trace1]
layout = dict(height=400, title='Usage of different Gun Types', legend=dict(orientation="h"));
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='marker-h-bar')

> - Handgun is the most used gun type in these incidents, with over 25,000 such cases.
> - Handgun is followed by 9mm with 6,459 and Shotgun with 4,270 incidents.
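
The parsing loop above implies that `gun_type` holds entries such as `0::Handgun||1::9mm`, sometimes with single `|` and `:` separators. Under that assumption, a small standalone parser could look like the sketch below. `parse_gun_types` is a hypothetical helper, and unlike the loop above it drops only entries whose type is exactly "Unknown".

```python
import re

def parse_gun_types(raw):
    """Split a gun_type string into a list of type names, dropping 'Unknown'."""
    if not isinstance(raw, str):
        return []  # missing values (NaN / None) carry no gun types
    types = []
    for entry in re.split(r"\|+", raw):
        # Each entry is assumed to look like "<index>::<type>" or "<index>:<type>".
        _, _, name = entry.rpartition(":")
        name = name.strip()
        if name and name != "Unknown":
            types.append(name)
    return types

print(parse_gun_types("0::Handgun||1::9mm"))
print(parse_gun_types("0:Handgun|1:Unknown"))
print(parse_gun_types(float("nan")))
```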

### 3.21 Which Gun is associated with highest number of People Killed and People Injured (Most Dangerous Gun Types)

To find which gun is the most dangerous, that is, associated with a high number of people killed and injured, I calculated the sum of people killed and injured divided by the total number of times that gun was used.

In [None]:
trace1 = go.Bar(x=gun_names, y=danger, name='Loss per Use', orientation = 'v', marker = dict(color = '#ef92ac',  line = dict(color = '#ef92ac', width = 1) ))
data = [trace1]
layout = dict(height=400, title='(PeopleKilled + PeopleInjured) / GunUsedCount', legend=dict(orientation="h"));
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='marker-h-bar')

> - AK 47 and AR 15 are the guns associated with a high number of people killed and injured per use, which indicates that these guns lead to the greatest loss.
> - Although handgun was the most used gun type, the number of people killed and injured relative to its usage is lower.
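
The ratio behind this chart can be written as a small function over the `results` structure built earlier. The sketch below uses invented numbers and a hypothetical helper name, `loss_per_use`. One caveat that follows from the aggregation code: when an incident lists several gun types, its killed and injured counts are added to every listed type, so the ratios overlap between types.

```python
def loss_per_use(results):
    """Return (killed + injured) / used for every gun type with at least one use."""
    ratios = {}
    for gun, stats in results.items():
        if stats["used"] == 0:
            continue  # avoid division by zero for types never counted
        ratios[gun] = (stats["killed"] + stats["injured"]) / stats["used"]
    return ratios

# Invented numbers, shaped like the `results` dictionary built above.
toy_results = {
    "Handgun": {"killed": 10, "injured": 30, "used": 80},
    "Rifle": {"killed": 6, "injured": 9, "used": 10},
}

ratios = loss_per_use(toy_results)
ranked = sorted(ratios, key=ratios.get, reverse=True)
print(ratios)
print("most loss per use first:", ranked)
```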

### 3.22 People Killed and People Injured with respect to gun type

In [None]:
# Create traces
# trace0 = go.Bar(x = gun_names, y = danger, name = '# Used')
trace1 = go.Bar(x = gun_names, y = killed, name = 'People Killed' ,   marker=dict(color='rgb(255,10,225)', opacity=0.6))
trace2 = go.Bar(x = gun_names, y = injured, name = 'People Injured',     marker=dict(color='rgb(58,22,225)', opacity=0.6))
data = [trace1, trace2]
iplot(data, filename='line-mode')

## 4. Population Adjusted Dataset

The above analysis was centered on absolute numbers of incidents and the top locations, but a better analysis requires a population-adjusted dataset. I added the population of the top 1000 cities, obtained from the following website, and derived some new population-adjusted features:

http://www.biggestuscities.com/top-1000 
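
As a rough sketch of what "population adjusted" can mean, the example below computes incidents per 100,000 residents. It assumes populations are stored as strings keyed by city name, as in the dictionary used in this section; the per-100,000 scaling, the helper name `incidents_per_100k`, and all numbers are illustrative assumptions.

```python
def incidents_per_100k(incident_counts, population):
    """Return incidents per 100,000 residents for cities with a known population."""
    rates = {}
    for city, count in incident_counts.items():
        if city not in population:
            continue  # no population figure, so no rate can be computed
        residents = int(population[city])  # populations are stored as strings
        rates[city] = count * 100_000 / residents
    return rates

# Invented cities and numbers for illustration.
toy_population = {"City A": "200000", "City B": "25000"}
toy_incidents = {"City A": 100, "City B": 50, "City C": 7}

rates = incidents_per_100k(toy_incidents, toy_population)
print(rates)
```

Matching on the city name alone is a simplification: a name that exists in more than one state would share a single population figure.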

In [None]:
import seaborn as sns

population = {'Carol Stream': '40069', 'La Quinta': '40956', 'Leander': '42761', 'Edinburg': '87650', 'Chino Hills': '78822', 'Pinellas Park': '52137', 'Pacifica': '39062', 'San Luis Obispo': '47536', 'Apache Junction': '39954', 'Toledo': '278508', 'Nampa': '91382', 'Jurupa Valley': '103541', 'Renton': '100953', 'Joplin': '52195', 'Pomona': '152494', 'Hoover': '84978', 'Columbia': '134309', 'Maple Grove': '69576', 'Rowlett': '61999', 'Bayonne': '66238', 'Fremont': '233136', 'Scottsdale': '246645', 'Bozeman': '45250', 'El Paso': '683080', 'Santa Maria': '106290', 'Cedar Park': '68918', 'Shelton': '41334', 'Atlanta': '472522', 'Kettering': '55306', 'Bartlett': '58622', 'Orem': '97499', 'Bullhead City': '39970', 'Bradenton': '55687', 'Waterloo': '67934', 'Scranton': '77291', 'Glendora': '51851', 'Lakeland': '106420', 'West Valley City': '136574', 'Moreno Valley': '205499', 'Brea': '42471', 'Normal': '54264', 'Bolingbrook': '74518', 'Rocky Mount': '55466', 'Redding': '91808', 'Chico': '91567', 'Virginia Beach': '452602', 'Charlotte': '842051', 'Parker': '51163', 'Bloomington': '85319', 'Wake Forest': '40112', 'Beaumont': '118299', 'Huntersville': '54839', 'Oakland Park': '44326', 'Beavercreek': '46376', 'Broken Arrow': '107403', 'Seattle': '704352', 'Santa Barbara': '91930', 'Sandy': '95836', 'El Monte': '115807', 'Grand Island': '51517', 'Santa Monica': '92478', 'Ocala': '59253', 'Oro Valley': '43781', 'Waco': '134432', 'Midwest City': '57305', 'Allentown': '120443', 'Garden Grove': '174858', 'Bellflower': '77790', 'Denver': '693060', 'Minot': '48743', 'Des Plaines': '58141', 'Burlington': '52709', 'Orange': '140504', 'Dunwoody': '48884', 'San Mateo': '103959', 'Minnetonka': '52369', 'Eagan': '66428', 'Oceanside': '175464', 'Avondale': '82881', 'Alameda': '78906', 'San Diego': '1406630', 'Haverhill': '62873', 'Waukegan': '88182', 'Daytona Beach': '66645', 'Grove City': '39721', 'Altoona': '44589', 'Hamilton': '62127', 'Kirkland': '87701', 'Honolulu': '351792', 
'Macon-Bibb County': '152555', 'Stillwater': '49504', 'Colorado Springs': '465101', 'Cedar Hill': '48343', 'Jupiter': '63813', 'Overland Park': '188966', 'Boise City': '223154', 'Missoula': '72364', 'Burbank': '104447', 'Youngstown': '64312', 'Sunnyvale': '152771', 'Hoboken': '54379', 'Ankeny': '58627', 'Harrisonburg': '53078', 'Rancho Santa Margarita': '48969', 'Fitchburg': '40414', 'Revere': '53157', 'Fountain Valley': '56529', 'San Bruno': '42957', 'Pembroke Pines': '168587', 'Friendswood': '39396', 'Bountiful': '44078', 'Terre Haute': '60852', 'Methuen': '49917', 'Las Vegas': '632912', 'South San Francisco': '66980', 'Cedar Falls': '41390', 'Carlsbad': '113952', 'Quincy': '93688', 'Waterbury': '108272', 'Thornton': '136703', 'Mount Pleasant': '84170', 'Livermore': '89115', 'Lansing': '116020', 'Richmond': '223170', 'Providence': '179219', 'Clearwater': '114361', 'Pasadena': '153351', 'Clovis': '106583', 'Augusta-Richmond County': '197081', 'Lacey': '47688', 'East Providence': '47337', 'Columbus': '860090', 'Indianapolis': '855164', 'Danbury': '84992', 'Sarasota': '56610', 'Mountain View': '80447', 'South Bend': '101735', 'Watsonville': '53796', 'Ann Arbor': '120782', 'Valdosta': '56474', 'Berwyn': '55748', 'Akron': '197633', 'Memphis': '652717', 'Enid': '51004', 'Brighton': '38314', 'Lake Charles': '76848', 'West Lafayette': '45872', 'Littleton': '46333', 'Orlando': '277173', 'Loveland': '76897', 'Spanish Fork': '38861', 'Edmonds': '41840', 'Goose Creek': '42039', 'Monterey Park': '61075', 'Woburn': '39452', 'Upland': '76684', 'Edina': '51350', 'Bowling Green': '65234', 'Oxnard': '207906', 'San Rafael': '58954', 'Costa Mesa': '112822', 'Roseville': '132671', 'Birmingham': '212157', 'Bothell': '44546', 'Buckeye': '64629', 'Oklahoma City': '638367', 'Broomfield': '66529', 'Hattiesburg': '46926', 'West Des Moines': '64560', 'Denton': '133808', 'Weslaco': '40033', 'Flagstaff': '71459', 'Annapolis': '39418', 'Yucaipa': '53309', 'West Palm Beach': '108161', 'West 
Jordan': '113699', 'Bonita Springs': '54198', 'Plainfield': '50636', 'Buffalo Grove': '41346', 'Utica': '60652', 'Paterson': '147000', 'Fond du Lac': '42951', 'Harlingen': '65539', 'Ocoee': '44820', 'Moline': '42250', 'Paramount': '54909', 'Winter Garden': '41988', 'Phoenix': '1615017', 'State College': '41992', 'Irving': '238289', 'Edmond': '91191', 'Manhattan': '54983', 'Pocatello': '54746', 'Hartford': '123243', 'Missouri City': '74561', 'Summerville': '49323', 'Joliet': '148262', 'Azusa': '49628', 'Alpharetta': '65338', 'Boynton Beach': '75569', 'North Las Vegas': '238702', 'Sioux Falls': '174360', 'Chattanooga': '177571', 'Port Orange': '61105', 'Federal Way': '96757', 'Lexington-Fayette': '318449', 'Riverton': '42838', 'Chesapeake': '237940', 'Compton': '97550', 'Smyrna': '56664', 'Rock Island': '38210', 'Duluth': '86293', 'Petaluma': '60530', 'Doral': '57947', 'Wheaton': '53389', 'Oak Park': '51774', 'Wheeling': '38315', 'St. Louis': '311404', 'Ceres': '48278', 'Tallahassee': '190894', 'Lakewood': '154393', 'Mount Prospect': '54171', 'Nashua': '87882', 'Rocklin': '62787', 'Woonsocket': '41406', 'Salinas': '157218', 'Janesville': '64159', 'New Rochelle': '79557', 'Euclid': '47360', 'Turlock': '72796', 'Schenectady': '64913', 'McKinney': '172298', 'Yorba Linda': '68235', 'Redondo Beach': '67867', 'Bowie': '58393', 'Grapevine': '51971', 'Arlington': '392772', 'Perth Amboy': '52499', 'Wyoming': '75567', 'Montgomery': '200022', 'Newark': '281764', 'Apple Valley': '72553', 'Wilson': '49620', 'Sandy Springs': '105703', 'Charlottesville': '46912', 'Warwick': '81579', 'Pico Rivera': '63635', 'Irvine': '266122', 'Waltham': '63002', 'Chesterfield': '47659', 'Palm Beach Gardens': '53778', 'Lodi': '64641', 'Stanton': '38644', 'Thousand Oaks': '128888', 'Harrisburg': '48904', 'Elyria': '53715', 'Marlborough': '39697', 'Menifee': '88531', 'Fort Wayne': '264488', 'Conroe': '82286', 'Medford': '81636', 'Rochester Hills': '73422', 'Jonesboro': '74889', 'Carrollton': '133351', 
'Lenexa': '52903', 'West New York': '53343', 'Vallejo': '121299', 'Cutler Bay': '44707', 'Norfolk': '245115', 'Rogers': '65021', 'Colton': '54712', 'Miami': '453579', 'Lorain': '63730', 'Kyle': '39060', 'Cerritos': '50555', 'Topeka': '126808', 'Kenosha': '99631', 'New Britain': '72558', 'Brookhaven': '52444', 'Glenview': '47475', 'Montclair': '38944', 'Tigard': '51902', 'Indio': '88488', 'East Lansing': '48870', 'North Port': '64274', 'Pittsburg': '70679', 'Savannah': '146763', 'Newton': '89045', 'Pharr': '77320', 'Fort Worth': '854113', 'McAllen': '142212', 'Palm Coast': '85109', 'North Little Rock': '66278', 'Gary': '76424', 'Orland Park': '58862', 'Gilroy': '55069', 'Coppell': '41360', 'Maricopa': '46903', 'Pearland': '113570', 'Sparks': '98345', 'Evanston': '74895', 'San Clemente': '65309', 'Hesperia': '93724', 'Port St. Lucie': '185132', 'Reading': '87575', 'Rosemead': '54500', 'Gilbert': '237133', 'Laredo': '257156', 'Folsom': '77271', 'Bossier City': '68485', 'Elizabeth': '128640', 'Westland': '81545', 'Everett': '109043', 'Hawthorne': '88031', 'Dubuque': '58531', 'Hilton Head Island': '40500', 'Palm Bay': '110104', 'Newport News': '181825', 'Hanford': '55547', 'Manchester': '110506', 'Chino': '87776', 'Euless': '54769', 'Cranston': '81034', 'Miami Beach': '91917', 'St. Charles': '69293', 'Florissant': '51776', 'Diamond Bar': '56793', 'Iowa City': '74398', 'Kentwood': '51689', 'Barnstable Town': '44254', 'Frisco': '163656', 'Tucson': '530706', 'Apopka': '49458', 'Urbandale': '43018', 'St. 
Louis Park': '48747', 'Westfield': '41552', 'Redwood City': '84950', 'Buffalo': '256902', 'Cambridge': '110651', 'Lake Oswego': '38945', 'Boston': '673184', 'Kennewick': '80454', 'Dayton': '140489', 'Delray Beach': '67371', 'Layton': '75655', 'Owensboro': '59273', 'Sunrise': '93734', 'Arlington Heights': '75525', 'Huntington': '48113', 'Erie': '98593', 'Longview': '82055', 'Norwalk': '106178', 'Sanford': '58605', 'Ontario': '173212', 'Eastvale': '61151', 'Georgetown': '67140', 'Noblesville': '60183', 'Jackson': '169148', 'Hollywood': '151998', 'New Braunfels': '73959', 'Miramar': '138449', 'West Haven': '54516', 'Prescott Valley': '43132', 'Novi': '59211', 'Rancho Palos Verdes': '42435', 'Placentia': '52228', 'Miami Gardens': '113058', 'Alhambra': '85474', 'Jeffersonville': '47124', 'Blue Springs': '54431', 'Shreveport': '194920', 'Escondido': '151613', 'Mesa': '484587', 'Cathedral City': '54056', 'Sioux City': '82872', 'Lincoln': '280364', 'Great Falls': '59178', 'Vineland': '60525', 'Rockville': '66940', 'Hagerstown': '40452', 'La Crosse': '52109', 'Fort Smith': '88133', 'Tinley Park': '56831', 'Malden': '60840', 'Beverly': '41365', 'Lawton': '94653', 'Flower Mound': '73547', 'Kalamazoo': '75984', 'Wichita Falls': '104724', 'Concord': '128726', 'Morgan Hill': '44155', 'Anderson': '55130', 'Schertz': '39453', 'Auburn': '77472', 'New York': '8537673', 'Hendersonville': '57050', 'Hurst': '39160', 'Casper': '59324', 'Ormond Beach': '42162', 'Buena Park': '83156', 'Haltom City': '44361', 'Rockwall': '43586', 'Abilene': '122225', 'Crystal Lake': '40339', 'Lubbock': '252506', 'Sumter': '40723', 'Philadelphia': '1567872', 'Westminster': '113875', 'Downey': '113267', 'Parma': '79425', 'Danville': '44631', 'DeKalb': '43194', 'Fort Myers': '77146', 'North Lauderdale': '43699', 'Torrance': '147195', 'Odessa': '117871', 'Sayreville': '44905', 'Shoreline': '55333', 'Muskogee': '38352', 'Mankato': '41720', 'Linden': '42457', 'Logan': '50676', 'Grand Junction': '61881', 'Grand 
Rapids': '196445', 'Carpentersville': '38291', 'Antioch': '110898', 'Sheboygan': '48686', 'Appleton': '74370', 'Dothan': '68468', 'Decatur': '72706', 'Pueblo': '110291', 'Greenacres': '40013', 'Hallandale Beach': '39500', 'Elk Grove': '169743', 'Cape Girardeau': '39628', 'Ogden': '86701', 'Frederick': '70060', 'Lafayette': '127626', 'Manassas': '41483', 'Prescott': '42513', 'Kannapolis': '47839', 'San Bernardino': '216239', 'Pleasant Grove': '38756', 'Romeoville': '39706', 'El Centro': '44201', 'Galveston': '50550', 'Trenton': '84056', 'Fall River': '88930', 'Chapel Hill': '59246', 'Coral Gables': '50815', 'Union City': '75322', 'Coon Rapids': '62359', 'Suffolk': '89273', 'Lauderhill': '71626', 'Encinitas': '63131', 'Bend': '91122', 'La Puente': '40377', 'Corpus Christi': '325733', 'Marion': '38480', 'Duncanville': '39457', 'Hickory': '40567', 'Sierra Vista': '43208', 'Hammond': '77134', 'Shawnee': '65194', 'South Gate': '95538', 'Burnsville': '61290', 'Ames': '66191', 'Davis': '68111', 'Clarksville': '150287', 'Kent': '127514', 'Charleston': '134385', 'Collierville': '49177', 'Lynchburg': '80212', 'Independence': '117030', 'Henderson': '292969', 'Redmond': '62458', 'Tulsa': '403090', 'Laguna Niguel': '65328', 'Sacramento': '495234', 'Johns Creek': '83873', 'Lewisville': '104659', 'York': '43859', 'Fishers': '90127', 'DeSoto': '52599', 'Biloxi': '45975', 'Whittier': '86883', 'Camarillo': '67363', 'Redlands': '71288', 'The Colony': '42408', 'Saginaw': '48984', 'Altamonte Springs': '43492', 'Dublin': '59583', 'Jefferson City': '43013', 'Cedar Rapids': '131127', 'Mentor': '46732', 'Sherman': '41567', 'Rio Rancho': '96028', 'Largo': '83065', 'St. 
Cloud': '67641', 'Niagara Falls': '48632', 'Santa Clarita': '181972', 'Olympia': '51202', 'Warner Robins': '74388', 'Moore': '61415', 'Sammamish': '63773', 'Washington': '681170', 'Covington': '40797', 'Minneapolis': '413651', 'Cuyahoga Falls': '49206', 'Visalia': '131074', 'Livonia': '94041', 'Salt Lake City': '193744', 'Casa Grande': '54534', 'Allen': '99179', 'Chula Vista': '267172', 'Pawtucket': '71427', 'White Plains': '58241', 'Camden': '74420', 'Manteca': '76908', 'Gulfport': '72076', 'Rochester': '208880', 'Fontana': '209665', 'Knoxville': '186239', 'San Leandro': '90465', 'Arvada': '117453', 'Rapid City': '74048', 'Kokomo': '57799', 'Bremerton': '40675', 'Cheyenne': '64019', 'Grand Forks': '57339', 'Surprise': '132677', 'Downers Grove': '49473', 'San Buenaventura (Ventura)': '109592', 'Evansville': '119477', 'St. Peters': '57289', 'Cicero': '82992', 'Coral Springs': '130059', 'Poway': '50077', 'Southaven': '53214', 'Taylorsville': '60436', 'Pontiac': '59698', 'Hempstead': '55555', 'Wausau': '38872', 'Aurora': '361710', 'Lehi': '61130', 'New Orleans': '391495', 'Elmhurst': '46387', 'Los Angeles': '3976322', 'Strongsville': '44631', 'San Antonio': '1492510', 'Rockford': '147651', 'Marana': '43474', 'Kenner': '67089', 'North Charleston': '109298', 'Bell Gardens': '42806', 'Bridgeport': '145936', 'Hoffman Estates': '51738', 'Blacksburg': '45038', 'Milford': '52536', 'Tustin': '80395', 'Round Rock': '120892', 'Streamwood': '40166', 'College Station': '112141', 'Sugar Land': '88177', 'Pompano Beach': '109393', 'Maplewood': '40150', 'Cupertino': '60643', 'Portsmouth': '95252', 'Daly City': '106472', 'Battle Creek': '51534', 'Santa Fe': '83875', 'Holyoke': '40280', 'Modesto': '212175', 'Mesquite': '143736', 'La Mirada': '49216', 'Eden Prairie': '63914', 'Tacoma': '211277', 'Cypress': '48906', 'Weymouth Town': '55972', 'Palm Springs': '47689', 'Santa Clara': '125948', 'Kearny': '42126', 'Delaware': '38643', 'Fullerton': '140721', 'San Gabriel': '40404', 
'Schaumburg': '74446', 'Santa Ana': '334217', 'Riverside': '324722', 'Lompoc': '43712', 'Temple': '73600', 'Peachtree Corners': '42773', 'Cary': '162320', 'Victorville': '122265', 'Johnson City': '66677', 'Chicago': '2704958', 'Cape Coral': '179804', 'Farmington': '41629', 'Apex': '47349', 'El Cajon': '103768', 'Marietta': '60941', 'Mansfield': '65631', 'Perris': '76331', 'Twin Falls': '48260', 'League City': '102010', 'Germantown': '39056', 'Rohnert Park': '42622', 'New Bedford': '95032', 'Port Arthur': '55427', 'Leominster': '41663', 'Puyallup': '40640', 'Wellington': '63900', 'West Allis': '60087', 'Anaheim': '351043', 'Richardson': '113347', 'Fayetteville': '204759', 'Davenport': '102612', 'New Brunswick': '56910', 'Sterling Heights': '132427', 'Elgin': '112123', "Lee's Summit": '96076', 'Conway': '65300', 'Deltona': '90124', 'North Miami Beach': '43891', 'Merced': '82594', 'Worcester': '184508', 'Delano': '52707', 'Martinez': '38259', 'Salem': '167419', 'Bethlehem': '75293', 'St. George': '82318', 'Cleveland': '385809', 'Campbell': '40939', 'Lynwood': '71187', 'Leesburg': '52607', 'Somerville': '81322', 'City': '2016 Population', 'Temecula': '113054', 'Inglewood': '110654', 'Bismarck': '72417', 'Murrieta': '111674', 'Pensacola': '53779', 'Hackensack': '44756', 'Plantation': '92706', 'St. 
Clair Shores': '59775', 'Coachella': '44953', 'Chandler': '247477', 'Lowell': '110558', 'Oak Lawn': '56257', 'Mishawaka': '48679', 'Norwich': '39556', 'Huntsville': '193079', 'Lake Havasu City': '53743', 'Amarillo': '199582', 'Hemet': '84281', 'Mobile': '192904', 'Fort Pierce': '45295', 'Rancho Cordova': '72326', 'San Ramon': '75639', 'Oshkosh': '66579', 'Muskegon': '38349', 'Plano': '286057', 'Beaverton': '97590', 'Waukesha': '72363', 'San Angelo': '100702', 'Centennial': '109932', 'Bakersfield': '376380', 'Gastonia': '75536', 'Chelsea': '39699', 'Anchorage': '298192', 'Athens-Clarke County': '123371', 'Freeport': '43279', 'Kansas City': '481420', 'Skokie': '64270', 'Deerfield Beach': '79764', 'Culver City': '39364', 'Raleigh': '458880', 'Bristol': '60147', 'Glendale': '245895', 'Monroe': '49297', 'Rialto': '103314', 'Carmel': '91065', 'Houston': '2303482', 'Hampton': '135410', 'Garland': '234943', 'Cleveland Heights': '44633', 'Gresham': '111523', 'Reno': '245255', 'Council Bluffs': '62524', 'Citrus Heights': '87432', 'Des Moines': '215472', 'Milwaukee': '595047', 'Portland': '639863', 'Elkhart': '52221', 'St. 
Paul': '302398', 'Roanoke': '99660', 'Albany': '98111', 'Middletown': '48813', 'Richland': '54989', 'Woodland': '59068', 'Provo': '116868', 'Titusville': '46019', 'Tamarac': '65199', 'Tyler': '104798', 'Indian Trail': '38222', 'Santa Rosa': '175155', 'San Francisco': '870887', 'Plant City': '38200', 'Yuma': '94906', 'Spokane Valley': '96340', 'Bellevue': '141400', 'Champaign': '86637', 'Shakopee': '40610', 'Grand Prairie': '190682', 'Victoria': '67670', 'Flint': '97386', 'Marysville': '67626', 'Berkeley': '121240', 'Oakley': '40622', 'Dearborn': '94444', 'Palmdale': '157356', 'Chicopee': '55991', 'Tampa': '377165', 'Greenwood': '56545', 'Palm Desert': '52231', 'East Orange': '64789', 'Pine Bluff': '43841', 'Corona': '166785', 'Winston-Salem': '242203', 'Bellingham': '87574', 'Burleson': '45016', 'Castle Rock': '57666', 'Attleboro': '44434', 'Canton': '71323', 'Walnut Creek': '69122', 'Alexandria': '155810', 'Cincinnati': '298800', 'Fresno': '522053', 'West Covina': '107847', 'Kingsport': '52806', 'Hialeah': '236387', 'Long Beach': '470130', 'Troy': '83641', 'Durham': '263016', 'Wilkes-Barre': '40569', 'San Jacinto': '47413', 'San Marcos': '95261', 'Naperville': '147122', 'Warren': '135125', 'Rock Hill': '72937', 'Milpitas': '77528', 'Santee': '57834', 'Lynn': '92697', 'Caldwell': '53149', 'Gainesville': '131591', 'Palatine': '68766', 'Napa': '80416', 'Baytown': '75992', 'Louisville/Jefferson County': '616261', 'Baldwin Park': '76464', 'National City': '61147', 'Muncie': '69010', 'Goodyear': '77258', 'Meridian': '95623', 'Eugene': '166575', 'Murfreesboro': '131947', 'Jersey City': '264152', 'Draper': '47328', 'Passaic': '70635', 'Pittsburgh': '303625', 'Yakima': '93986', 'Albuquerque': '559277', 'Nashville-Davidson': '660388', 'North Richland Hills': '69798', 'Brockton': '95630', 'Syracuse': '143378', 'Arcadia': '58523', 'Spokane': '215973', 'Killeen': '143400', 'Wichita': '389902', 'San Jose': '1025350', 'Davie': '101871', 'High Point': '111223', 'Las Cruces': 
'101759', 'Longmont': '92858', 'Margate': '57870', 'Brownsville': '183823', 'Oviedo': '39337', 'Madison': '252551', 'Taylor': '61177', 'Fort Collins': '164207', 'Westerville': '38985', 'Atlantic City': '38735', 'Peabody': '52491', 'Commerce City': '54869', 'Tuscaloosa': '99543', 'Roy': '38201', 'Blaine': '62892', 'La Habra': '61664', 'Huntington Beach': '200652', 'Keizer': '38980', 'Brooklyn Park': '79707', 'Rancho Cucamonga': '176534', 'Pasco': '70579', 'Salina': '47336', 'Midland': '134610', 'Dallas': '1317929', 'Urbana': '42014', 'Novato': '56004', 'Winter Haven': '38953', 'Greenville': '91495', 'St. Petersburg': '260999', 'Fairfield': '114756', 'Portage': '48508', 'Meriden': '59622', 'Tulare': '62779', 'Corvallis': '57110', 'Racine': '77571', 'Boulder': '108090', 'Fort Lauderdale': '178752', 'Findlay': '41422', 'Norman': '122180', 'Lawrence': '95358', 'Aliso Viejo': '51424', 'Baton Rouge': '227715', 'Asheville': '89121', 'Santa Cruz': '64465', 'Newport Beach': '86688', 'Binghamton': '45672', 'Billings': '110323', 'Olathe': '135473', 'Kissimmee': '69369', 'Vacaville': '98303', 'Calexico': '40232', 'Homestead': '67996', 'Green Bay': '105139', 'Bryan': '83260', 'Boca Raton': '96114', 'Bedford': '49528', "Coeur d'Alene": '50285', 'Hutchinson': '41310', 'Little Rock': '198541', 'Detroit': '672795', 'Lake Forest': '83240', 'Belleville': '41906', 'Franklin': '74794', 'Porterville': '58978', 'New Berlin': '39803', 'Highland': '54939', 'Austin': '947890', 'Greensboro': '287027', 'Southfield': '73100', 'Lake Elsinore': '64205', 'Stockton': '307072', 'Mission Viejo': '96396', 'Mount Vernon': '68344', 'Greeley': '103990', 'Madera': '64444', 'Murray': '49230', 'Pleasanton': '82270', 'Lombard': '43815', 'Lakeville': '61938', 'North Miami': '62139', 'Gaithersburg': '67776', 'Florence': '39959', 'Bentonville': '47093', 'Lancaster': '160106', 'Vista': '101659', 'Wylie': '47701', 'Yonkers': '200807', 'Springdale': '78557', 'Huntington Park': '58879', 'Dearborn Heights': '55761', 
'Texas City': '48262', 'Pflugerville': '59245', 'Carson': '92797', 'Fargo': '120762', 'Idaho Falls': '60211', 'Vancouver': '174826', 'Little Elm': '42504', 'Hillsboro': '105164', 'Springfield': '167319', 'Coconut Creek': '59405', 'Weston': '70015', 'Wauwatosa': '47945', 'Tempe': '182498', 'Woodbury': '68820', 'Simi Valley': '126327', 'St. Joseph': '76472', 'Farmington Hills': '81129', 'South Jordan': '69034', 'Tracy': '89274', 'Hayward': '158937', 'Jacksonville': '880619', 'Milton': '38411', 'Carson City': '54742', 'Omaha': '446970', 'Northglenn': '38982', 'Yuba City': '66845', 'Wilmington': '117525', 'Gardena': '60048', 'Melbourne': '81185', 'Plymouth': '77216', 'Palo Alto': '67024', "O'Fallon": '86274', 'Eau Claire': '68339', 'Baltimore': '614664', 'Brentwood': '60532', 'New Haven': '129934', 'Covina': '48549', 'Pittsfield': '42846', 'Roswell': '94598', 'Taunton': '56843', 'Tupelo': '38842', 'Clifton': '85845', 'Mission': '83563', 'Montebello': '63335', 'West Sacramento': '52981', 'La Mesa': '59948', 'Royal Oak': '59006', 'Stamford': '129113', 'Keller': '46646', 'Moorhead': '42492', 'Oakland': '420005', 'Peoria': '164173', 'Burien': '50997'}
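# Look up each city's 2016 population; cities missing from the table get 0.
# The isdigit() check skips the stray header entry 'City': '2016 Population'.
df['city_population'] = df['city_or_county'].apply(lambda x: int(population[x]) if population.get(x, '').isdigit() else 0)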

# Aggregate per city: total killed, total injured, population, and number of incidents
# (counting the 'state' column gives the number of incident rows per city).
tempdf = (
    df.groupby(by=['city_or_county'])
      .agg({'n_killed': 'sum', 'n_injured': 'sum', 'city_population': 'mean', 'state': 'count'})
      .reset_index()
      .rename(columns={'state': 'total_incidents', 'n_killed': 'total_killed', 'n_injured': 'total_injured'})
)
# Rates per 1,000 residents; the +1 avoids division by zero for cities without a population match.
tempdf['incidents_population_ratio'] = 1000 * tempdf['total_incidents'] / (tempdf['city_population'] + 1)
tempdf['killed_population_ratio'] = 1000 * tempdf['total_killed'] / (tempdf['city_population'] + 1)
tempdf['injured_population_ratio'] = 1000 * tempdf['total_injured'] / (tempdf['city_population'] + 1)
tempdf['loss_population_ratio'] = 1000 * (tempdf['total_killed'] + tempdf['total_injured']) / (tempdf['city_population'] + 1)
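
The ratio columns above all follow the same per-1,000-residents formula. The sketch below isolates that formula in a small helper to show how the `+1` guard behaves. The helper name `per_thousand` and the counts are hypothetical, chosen only for illustration; they are not taken from the dataset.

```python
def per_thousand(count, population):
    """Events per 1,000 residents, using the same +1 guard as the ratio columns."""
    return 1000 * count / (population + 1)

# Hypothetical counts, for illustration only.
large_city = per_thousand(count=500, population=1_000_000)
unmatched_city = per_thousand(count=5, population=0)  # city missing from the lookup table

print(round(large_city, 4))  # the +1 barely changes the result for a large city
print(unmatched_city)        # an unmatched city gets an inflated value of 1000 * count
```

Because a city with no population match ends up with a ratio of 1000 times its count, the ratios are only meaningful for cities with a known population. The `city_population > 500000` filter in the plots that follow excludes those unmatched cities.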

### 4.1 Which cities have the highest GunViolence Incidents to CityPopulation ratio ?

In [None]:
i_p = tempdf.sort_values(['incidents_population_ratio'], ascending=[False])
i_p = i_p[i_p['city_population'] > 500000][:25]
sns.set(rc={'figure.figsize':(12,8)})
ax = sns.barplot(y='city_or_county', x='incidents_population_ratio', data=i_p, color='#ed5569')
ax.set(xlabel='GunViolence Incidents to CityPopulation Ratio (* 1000)', ylabel='City');

> - Baltimore (a city in Maryland) has the highest ratio of gun violence incidents to city population, in contrast to Chicago, where the absolute number of incidents was highest. Baltimore had 3,943 total gun violence incidents and its 2016 population was 614,664.
> - Chicago had more than 10,000 gun violence incidents in recent years, but it is one of the most populous cities in the US (2016 population = 2,704,958), so its ratio of gun violence incidents to population ranks fourth, after Baltimore, Washington, and Milwaukee.
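
As a quick sanity check, the Baltimore figure can be reproduced by hand from the two numbers quoted above. This is a minimal sketch that assumes the plotted value uses the same `1000 * count / (population + 1)` formula as the ratio columns; the variable names are illustrative.

```python
baltimore_incidents = 3943       # total incidents quoted in the observation above
baltimore_population = 614_664   # Baltimore entry in the population lookup table

# Same per-1,000-residents formula as incidents_population_ratio.
baltimore_ratio = 1000 * baltimore_incidents / (baltimore_population + 1)
print(f"{baltimore_ratio:.2f} incidents per 1,000 residents")
```

That works out to roughly 6.4 recorded incidents per 1,000 residents over the period covered by the dataset.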

### 4.2 Which cities have the highest PeopleKilled to Population Ratio ?

In [None]:
i_p = tempdf.sort_values(['killed_population_ratio'], ascending=[False])
i_p = i_p[i_p['city_population'] > 500000][:25]
sns.set(rc={'figure.figsize':(12,8)})
ax = sns.barplot(y='city_or_county', x='killed_population_ratio', data=i_p, color='#85e86d')
ax.set(xlabel='Ratio of PeopleKilled and City Population (*1000)', ylabel='City');

> - Baltimore again tops the list: about 1,000 people were killed in gun violence incidents there, giving a killed-to-population ratio of 1.716382.
> - Memphis (population 652,717) and Las Vegas (population 632,912) come in 2nd and 3rd place, with 623 and 601 people killed. Their respective killed-to-population ratios are 0.95 and 0.94.
> - Chicago, where the most incidents occurred, has a killed-to-population ratio of 0.77, which puts it in 6th place.
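
The "about 1,000" figure for Baltimore can be recovered from the quoted ratio by inverting the formula. This is a rough sketch, assuming the ratio 1.716382 was produced by `1000 * total_killed / (city_population + 1)`; the variable names are illustrative.

```python
baltimore_killed_ratio = 1.716382  # killed per 1,000 residents, quoted above
baltimore_population = 614_664     # Baltimore entry in the population lookup table

# Invert ratio = 1000 * killed / (population + 1) to recover the death count.
estimated_killed = baltimore_killed_ratio * (baltimore_population + 1) / 1000
print(round(estimated_killed))
```

Under that assumption the ratio corresponds to a little over 1,000 deaths, consistent with the observation above.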

### 4.3 People Injured and CityPopulation Ratio

In [None]:
i_p = tempdf.sort_values(['injured_population_ratio'], ascending=[False])
i_p = i_p[i_p['city_population'] > 500000][:25]
sns.set(rc={'figure.figsize':(12,8)})
ax = sns.barplot(y='city_or_county', x='injured_population_ratio', data=i_p, color='#60f7f4')
ax.set(xlabel='Ratio of PeopleInjured and City Population (*1000)', ylabel='City');

> - Baltimore and Chicago are the cities with the highest ratio of people injured in gun violence incidents to city population.
> - Apart from these, Milwaukee, Memphis, and Washington are other top cities where the ratio of people injured in gun violence incidents is quite high.

Thanks for viewing the notebook. Please upvote if you liked it.