## Gun Deaths in the US

In this analysis, I use a dataset published by FiveThirtyEight on gun deaths in the U.S. to look for trends in deaths by time of year, gender, or race.

In [1]:
# Read in the data
import pandas as pd

direct_link = "https://raw.githubusercontent.com/fivethirtyeight/guns-data/master/full_data.csv"
data = pd.read_csv(direct_link, low_memory=False)

In [2]:
data.head()

Unnamed: 0.1,Unnamed: 0,year,month,intent,police,sex,age,race,hispanic,place,education
0,1,2012,1,Suicide,0,M,34.0,Asian/Pacific Islander,100,Home,BA+
1,2,2012,1,Suicide,0,F,21.0,White,100,Street,Some college
2,3,2012,1,Suicide,0,M,60.0,White,100,Other specified,BA+
3,4,2012,2,Suicide,0,M,64.0,White,100,Home,BA+
4,5,2012,2,Suicide,0,M,31.0,White,100,Other specified,HS/GED
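The `Unnamed: 0` column in the output above is the CSV's own row counter, read in as if it were data. Here is a minimal sketch, assuming a tiny inline CSV with the same layout (fewer columns, two rows adapted from the output above), of how `index_col=0` would use that first column as the index instead. The names `csv_text`, `default_read` and `indexed_read` are illustrative only.

```python
import io
import pandas as pd

# Tiny stand-in for full_data.csv: the first, unnamed column is a row counter.
csv_text = ",year,month,intent,sex\n1,2012,1,Suicide,M\n2,2012,1,Suicide,F\n"

# Default read: the unnamed column shows up as "Unnamed: 0".
default_read = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the counter becomes the index rather than a data column.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))
print(list(indexed_read.columns))
```

The rest of this notebook keeps the default read, so the shapes and column lists shown below include `Unnamed: 0`.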


In [3]:
# Quick exploration of the data
data.shape

(100798, 11)

In [4]:
data.dtypes

Unnamed: 0      int64
year            int64
month           int64
intent         object
police          int64
sex            object
age           float64
race           object
hispanic        int64
place          object
education      object
dtype: object

In [5]:
# Count how many deaths occurred each year
data['year'].value_counts()

2013    33636
2014    33599
2012    33563
Name: year, dtype: int64

The number of gun deaths barely changed from year to year between 2012 and 2014 (roughly 33,600 each year). Next, we will explore whether more deaths occurred in particular months or seasons.

In [9]:
# Combine year and month into a single "year_month" string key
data["year_month"] = data["year"].map(str) + "_" + data["month"].map(str)

In [10]:
data["year_month"].head()

0    2012_1
1    2012_1
2    2012_1
3    2012_2
4    2012_2
Name: year_month, dtype: object

In [11]:
data["year_month"].value_counts()

2013_7     3079
2012_7     3026
2012_5     2999
2014_8     2970
2012_8     2954
2014_6     2931
2013_6     2920
2014_9     2914
2014_7     2884
2014_10    2865
2013_1     2864
2014_5     2864
2013_3     2862
2014_4     2862
2013_8     2859
2014_12    2857
2012_9     2852
2012_6     2826
2013_10    2808
2013_5     2806
2013_4     2798
2012_4     2795
2012_12    2791
2013_12    2765
2013_11    2758
2012_1     2758
2014_11    2756
2012_3     2743
2013_9     2742
2012_10    2733
2012_11    2729
2014_3     2684
2014_1     2651
2013_2     2375
2014_2     2361
2012_2     2357
Name: year_month, dtype: int64
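`value_counts()` sorts by frequency, so the months above appear out of calendar order. Here is a minimal sketch, assuming a small made-up sample rather than the real dataset, of one way to get the counts in chronological order by grouping on the numeric `year` and `month` columns. The names `sample`, `monthly_counts` and `labels` are illustrative.

```python
import pandas as pd

# Made-up sample with the same `year` and `month` columns as the dataset.
sample = pd.DataFrame({
    "year":  [2013, 2012, 2012, 2013, 2012, 2012],
    "month": [1,    10,   2,    1,    10,   10],
})

# Grouping on the integer columns sorts by (year, month) numerically,
# so 2012-2 comes before 2012-10 (a plain string sort would not do that).
monthly_counts = sample.groupby(["year", "month"]).size()

# Optional: readable, zero-padded labels such as "2012-02".
labels = [f"{int(year)}-{int(month):02d}" for year, month in monthly_counts.index]
print(dict(zip(labels, monthly_counts.tolist())))
```

Zero-padding the month is what makes the string labels sort correctly; the unpadded `2012_10` style key would sort before `2012_2`.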

In [24]:
# Plot the number of deaths per month in chronological order
import matplotlib.pyplot as plt
%matplotlib inline
# Grouping on the numeric columns keeps the months in calendar order
monthly_counts = data.groupby(["year", "month"]).size()
monthly_counts.plot(kind="bar", figsize=(12, 4))
plt.ylabel("Number of gun deaths")


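The monthly counts above show a dip every February, the shortest month, and the highest counts mostly in the summer months. Next, we'll look at how gun deaths in the US vary by sex and race.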

In [25]:
data["sex"].value_counts()

M    86349
F    14449
Name: sex, dtype: int64

In [26]:
data["race"].value_counts()

White                             66237
Black                             23296
Hispanic                           9022
Asian/Pacific Islander             1326
Native American/Native Alaskan      917
Name: race, dtype: int64

These counts show that far more men than women die from guns in the U.S. (86,349 versus 14,449). The number of deaths also differs widely between races. It would be interesting to determine whether the counts are proportional to the demographic makeup of the U.S.
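To put the sex split in relative terms, the counts can be converted to shares. Here is a minimal sketch using only the two totals reported above; it rebuilds a Series from them instead of reloading the data, and the names `sex_counts` and `sex_share` are illustrative.

```python
import pandas as pd

# Totals copied from the data["sex"].value_counts() output above.
sex_counts = pd.Series({"M": 86349, "F": 14449})

# Share of all gun deaths in the dataset, by sex.
sex_share = sex_counts / sex_counts.sum()
print(sex_share.round(3))
```

On the full DataFrame, `data["sex"].value_counts(normalize=True)` should give the same shares directly.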

In [27]:
# Read in US census file
census = pd.read_csv("census.csv")
census.head()

Unnamed: 0,Id,Year,Id.1,Sex,Id.2,Hispanic Origin,Id.3,Id2,Geography,Total,Race Alone - White,Race Alone - Hispanic,Race Alone - Black or African American,Race Alone - American Indian and Alaska Native,Race Alone - Asian,Race Alone - Native Hawaiian and Other Pacific Islander,Two or More Races
0,cen42010,"April 1, 2010 Census",totsex,Both Sexes,tothisp,Total,0100000US,,United States,308745538,197318956,44618105,40250635,3739506,15159516,674625,6984195


In [42]:
# Map US census data to the race categories that we have in our gun deaths dataset.
# Asian/Pacific Islander combines the "Asian" and "Native Hawaiian and Other Pacific Islander" columns.
census_mapping = {'Asian/Pacific Islander': 15159516 + 674625, 'Black': 40250635, 'Native American/Native Alaskan': 3739506, 'Hispanic': 44618105, 'White': 197318956}
race_counts = data["race"].value_counts().rename_axis('race').reset_index(name = 'counts')
race_counts["census_count"] = race_counts["race"].map(census_mapping)
race_counts.head()

Unnamed: 0,race,counts,census_count
0,White,66237,197318956
1,Black,23296,40250635
2,Hispanic,9022,44618105
3,Asian/Pacific Islander,1326,15834141
4,Native American/Native Alaskan,917,3739506


The 2010 US census data gives the population of each race. To get the rate of gun deaths per race, we divide the number of gun deaths for each race by that race's population. The deaths cover three years (2012 to 2014), so the resulting rates are three-year totals rather than annual rates.

In [44]:
# Express the number of gun deaths as a rate per 100,000 people, as that is the typical statistic reported
race_counts["deaths_race_per_100k"] = race_counts["counts"] / race_counts["census_count"] * 100000

In [46]:
race_counts.head()

Unnamed: 0,race,counts,census_count,deaths_race_per_100k
0,White,66237,197318956,33.568493
1,Black,23296,40250635,57.877348
2,Hispanic,9022,44618105,20.220491
3,Asian/Pacific Islander,1326,15834141,8.374310
4,Native American/Native Alaskan,917,3739506,24.521956


Over the three years covered, there were about 58 gun deaths per 100,000 people in the black community compared to about 34 per 100,000 in the white community, even though the white population is roughly five times larger than the black population. Next, we will look at the intent of the deaths and filter to just gun homicides.

In [48]:
data_homicides = data[data["intent"] == "Homicide"]
data_homicides.shape

(35176, 12)

In [49]:
homicide_counts_by_race = data_homicides["race"].value_counts().rename_axis('race').reset_index(name = 'counts')
homicide_counts_by_race["census_count"] = homicide_counts_by_race["race"].map(census_mapping)
homicide_counts_by_race.head()

Unnamed: 0,race,counts,census_count
0,Black,19510,40250635
1,White,9147,197318956
2,Hispanic,5634,44618105
3,Asian/Pacific Islander,559,15834141
4,Native American/Native Alaskan,326,3739506


In [51]:
# Express the number of gun homicides as a rate per 100,000 people
homicide_counts_by_race["deaths_race_per_100k"] = homicide_counts_by_race["counts"] / homicide_counts_by_race["census_count"] * 100000
homicide_counts_by_race

Unnamed: 0,race,counts,census_count,deaths_race_per_100k
0,Black,19510,40250635,48.471285
1,White,9147,197318956,4.635642
2,Hispanic,5634,44618105,12.627161
3,Asian/Pacific Islander,559,15834141,3.530346
4,Native American/Native Alaskan,326,3739506,8.717729


After filtering to just the homicide-related gun deaths, we observe that black people have by far the highest rate of death: about 48 per 100,000 people, compared to fewer than 13 per 100,000 for every other group. This is a striking observation, as black people make up a much smaller share of the US population than white people.
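The rates above are totals over the three years in the dataset (2012 to 2014), divided by a single 2010 population figure. Here is a minimal sketch of the same arithmetic for two groups, plus an approximate per-year rate. The helper name `rate_per_100k` and the division by three are illustrative assumptions, not part of the original notebook.

```python
def rate_per_100k(deaths, population):
    """Return deaths per 100,000 people."""
    return deaths / population * 100_000

# Homicide counts and 2010 census populations copied from the tables above.
homicides = {"Black": 19510, "White": 9147}
population = {"Black": 40250635, "White": 197318956}
YEARS_COVERED = 3  # 2012, 2013, 2014

# Rate over the whole three-year period, as computed in the notebook.
three_year = {race: rate_per_100k(homicides[race], population[race]) for race in homicides}

# Rough annual rate, assuming deaths are spread evenly across the three years.
per_year = {race: rate / YEARS_COVERED for race, rate in three_year.items()}

for race in homicides:
    print(f"{race}: {three_year[race]:.2f} over three years, about {per_year[race]:.2f} per year")
```

Dividing by the number of years makes the figures easier to compare with published annual rates, although the 2010 population is only an approximation for 2012 to 2014.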

## Conclusions and Next Steps

In this analysis, we assessed how the number of gun-related deaths in the US varies by time of year, gender, and race. After normalizing homicide counts for each race by that race's population in the 2010 US census, we observed that black people have the highest rate of homicide-related gun deaths in the US.

Further investigation could include:
- Figuring out the link, if any, between month and homicide rate.
- Exploring the homicide rate by gender.
- Exploring the rates of other intents, like Accidental, by gender and race.
- Finding out if gun death rates correlate with location and education.
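As a starting point for the intent-by-gender question, here is a minimal sketch using `pd.crosstab`. It assumes a handful of made-up rows that reuse the dataset's `intent` and `sex` column names, so the numbers it prints say nothing about the real data.

```python
import pandas as pd

# Made-up rows using the dataset's `intent` and `sex` columns.
sample = pd.DataFrame({
    "intent": ["Suicide", "Homicide", "Homicide", "Accidental", "Suicide", "Homicide"],
    "sex":    ["M",       "M",        "F",        "M",          "F",       "M"],
})

# Counts of each intent by sex.
intent_by_sex = pd.crosstab(sample["intent"], sample["sex"])

# Share of each intent within each sex (each column sums to 1).
intent_share = pd.crosstab(sample["intent"], sample["sex"], normalize="columns")

print(intent_by_sex)
print(intent_share.round(2))
```

The same two calls could be applied to `data["intent"]` and `data["sex"]`, or to `data["race"]` in place of `data["sex"]`.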