# US Gun Deaths

In this guided project from my Dataquest course, I analyze data on gun deaths in the US.

The dataset comes from FiveThirtyEight on GitHub. It covers gun deaths in the US from 2012 to 2014. Each row represents a single fatality, and the columns hold demographic and other information about the victim.

In [2]:
#Read the dataset

import csv

with open("guns.csv", "r", newline="") as guns: #the with block closes the file once it has been read
    data = list(csv.reader(guns)) #Get a list of all the rows in the file

In [3]:
#Here are the first 5 rows of the data set

print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+']]


In [4]:
#Extract the first row of the data as headers
#Remove the header row from the main data, since it is not needed for the analysis

headers = data[0]

data = data[1:]

print (headers)

print (data[:5])



['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', 'HS/GED']]
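
As an aside, numeric column positions such as n[1] or n[5] can be avoided by reading each row as a dictionary. The following is a minimal sketch using csv.DictReader from the standard library. The in-memory sample (two rows copied from the output above) stands in for guns.csv, and the names sample_csv and rows are only illustrative.

```python
import csv
import io

# Two rows copied from the output above; a stand-in for the real guns.csv file.
sample_csv = io.StringIO(
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,BA+\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,Some college\n"
)

# DictReader uses the first line as field names, so no manual header split is needed.
rows = list(csv.DictReader(sample_csv))

print(rows[0]["intent"], rows[0]["race"])
print(rows[1]["sex"], rows[1]["education"])
```

The trade-off is that every row carries its own keys, which uses more memory than plain lists but makes the later counting code easier to read.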


In [5]:
#Extract the year information to work out gun deaths per year

years = [n[1] for n in data] #extract year data into a new list

year_counts = {}

for n in years: #count number of deaths each year
    if n in year_counts:
        year_counts[n] += 1
    else: 
        year_counts[n] = 1

print (year_counts)

{'2012': 33563, '2013': 33636, '2014': 33599}
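
The if/else counting loop above recurs throughout this notebook. As a possible shortcut, collections.Counter from the standard library does the same tally in one call. This is a sketch on a small illustrative list, not on the full dataset; sample_years and sample_year_counts are made-up names.

```python
from collections import Counter

# A few year values in the same string form as column 1 of data (illustrative sample).
sample_years = ["2012", "2012", "2013", "2014", "2013", "2012"]

# Counter tallies each distinct value, replacing the manual if/else loop.
sample_year_counts = Counter(sample_years)

print(dict(sample_year_counts))
```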


In [6]:
#The numbers of gun deaths do not seem to change much by year. Hence, I will investigate further by counting the death numbers by month and year

import datetime #import datetime module

dates = [datetime.datetime(year=int(n[1]), month=int(n[2]), day=1) for n in data] #create a datetime object for each row; the day is fixed to 1 because the data only records year and month

print (dates[:5])

[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0)]


In [7]:
#Count the number of deaths for each unique date

date_counts = {}

for n in dates:
    if n in date_counts:
        date_counts[n] += 1
    else:
        date_counts[n] = 1
        
print (date_counts)

{datetime.datetime(2012, 1, 1, 0, 0): 2758, datetime.datetime(2012, 2, 1, 0, 0): 2357, datetime.datetime(2012, 3, 1, 0, 0): 2743, datetime.datetime(2012, 4, 1, 0, 0): 2795, datetime.datetime(2012, 5, 1, 0, 0): 2999, datetime.datetime(2012, 6, 1, 0, 0): 2826, datetime.datetime(2012, 7, 1, 0, 0): 3026, datetime.datetime(2012, 8, 1, 0, 0): 2954, datetime.datetime(2012, 9, 1, 0, 0): 2852, datetime.datetime(2012, 10, 1, 0, 0): 2733, datetime.datetime(2012, 11, 1, 0, 0): 2729, datetime.datetime(2012, 12, 1, 0, 0): 2791, datetime.datetime(2013, 1, 1, 0, 0): 2864, datetime.datetime(2013, 2, 1, 0, 0): 2375, datetime.datetime(2013, 3, 1, 0, 0): 2862, datetime.datetime(2013, 4, 1, 0, 0): 2798, datetime.datetime(2013, 5, 1, 0, 0): 2806, datetime.datetime(2013, 6, 1, 0, 0): 2920, datetime.datetime(2013, 7, 1, 0, 0): 3079, datetime.datetime(2013, 8, 1, 0, 0): 2859, datetime.datetime(2013, 9, 1, 0, 0): 2742, datetime.datetime(2013, 10, 1, 0, 0): 2808, datetime.datetime(2013, 11, 1, 0, 0): 2758, datet

In [8]:
#Gun deaths per month look fairly stable: most months sit in the high 2,000s, and the counts shown above range from about 2,350 to about 3,100
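
To put numbers on this observation, here is a minimal sketch that finds the lowest and highest months of 2012. The twelve totals are copied from the date_counts output above; the names monthly_2012 and counts_2012 are only illustrative, and the same min/max calls would work on the full date_counts dictionary.

```python
import datetime

# 2012 monthly totals copied from the date_counts output above (January to December).
monthly_2012 = [2758, 2357, 2743, 2795, 2999, 2826, 3026, 2954, 2852, 2733, 2729, 2791]
counts_2012 = {
    datetime.datetime(2012, month, 1): total
    for month, total in enumerate(monthly_2012, start=1)
}

# min/max with key=dict.get return the key whose value is smallest/largest.
lowest = min(counts_2012, key=counts_2012.get)
highest = max(counts_2012, key=counts_2012.get)

print(lowest.strftime("%Y-%m"), counts_2012[lowest])
print(highest.strftime("%Y-%m"), counts_2012[highest])
```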

In [9]:
#Next, I explore deaths by sex by counting how many gun deaths per sex there are

sex = [n[5] for n in data] 

sex_counts = {} 

for n in sex:
    if n in sex_counts:
        sex_counts[n] += 1
    else:
        sex_counts[n] = 1  
        
print (sex_counts)

{'M': 86349, 'F': 14449}


In [10]:
#The first significant pattern emerges: far more males than females die from guns

death_ratio_by_sex = sex_counts['M']/sex_counts['F']

print (death_ratio_by_sex)

#In fact, there are nearly 6 times as many male gun deaths as female ones

5.976122915080628


In [11]:
#Then, I proceed to count gun deaths for race

race = [n[7] for n in data] 

race_counts = {} 

for n in race:
    if n in race_counts:
        race_counts[n] += 1
    else:
        race_counts[n] = 1   
        
print (race_counts)

{'Asian/Pacific Islander': 1326, 'White': 66237, 'Native American/Native Alaskan': 917, 'Black': 23296, 'Hispanic': 9022}


White victims account for the most gun deaths, followed by Black and then Hispanic victims. Native American/Native Alaskan victims account for the fewest among the races recorded. It would be interesting to investigate further whether a particular intent or education level accounts for most of these deaths.

In [13]:
#So far, we only know the total number of gun deaths in the US by race. However, without the size of each racial group in the US population, we cannot draw any meaningful conclusion
#Hence, I will calculate the rate of gun deaths per 100,000 people of each race

with open("census.csv", "r", newline="") as census_data: #population data for the US as a whole and for each racial group
    census = list(csv.reader(census_data))

print (census)

[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]


Below is a list of the race names in data and the corresponding column names in census:

Asian/Pacific Islander: Race Alone - Asian + Race Alone - Native Hawaiian and Other Pacific Islander
Black: Race Alone - Black or African American
Hispanic: Race Alone - Hispanic
Native American/Native Alaskan: Race Alone - American Indian and Alaska Native
White: Race Alone - White
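
The list above implies that the Asian/Pacific Islander population is the sum of two census columns. A minimal sketch of that lookup follows, with the header and data row copied from the census output above. The names census_header, census_row, and population are illustrative only.

```python
# Header and data row copied from the census output above.
census_header = ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races']
census_row = ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']

# Pair each column name with its value; the repeated 'Id' keys collapse, which does not matter here.
population = dict(zip(census_header, census_row))

# Asian/Pacific Islander in the gun data spans two census columns.
asian_pacific = (
    int(population["Race Alone - Asian"])
    + int(population["Race Alone - Native Hawaiian and Other Pacific Islander"])
)

print(asian_pacific)
```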

In [15]:
#Following the instructions of the project, I will manually create a dictionary that maps each race key in race_counts to that race's population count from census
#Later on, I will try to accomplish this step with regular expressions

mapping = {"Asian/Pacific Islander":15834141,
           "Black":40250635,
           "Hispanic":44618105,
           "Native American/Native Alaskan":3739506,
           "White":197318956}

In [17]:
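#Using regex to match race with population data

import re

regex_mapping = {}
for i, a in enumerate(census[0]):
    for n in race_counts: #loop over the unique race names rather than every row
        if re.search(re.escape(n), a) is not None: #escape the name so it is matched literally
            regex_mapping[n] = census[1][i]

print (regex_mapping)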

{'Hispanic': '44618105', 'White': '197318956', 'Black': '40250635'}


I ran into a complication when looking up population data for Asian/Pacific Islander and Native American/Native Alaskan: neither name appears as a substring of any column name in the census data. Hispanic, White, and Black are easier because each is a single word that appears in its census column name, so I managed to match these with their correct populations.

With what I have learned so far, this regex method works only partially. I hope to address this after I advance further in my studies.
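
One possible way around the partial match, sketched under assumptions: write one pattern per race by hand and sum every matching census column. The patterns in race_patterns are my own guesses at suitable keywords, not part of the guided project, and the header and row are copied from the census output above.

```python
import re

# Header and data row copied from the census output above.
census_header = ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races']
census_row = ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']

# Hypothetical hand-written patterns, one per race name used in the gun data.
race_patterns = {
    "Asian/Pacific Islander": r"Asian|Pacific Islander",
    "Black": r"Black",
    "Hispanic": r"Hispanic",
    "Native American/Native Alaskan": r"American Indian",
    "White": r"White",
}

regex_population = {}
for name, pattern in race_patterns.items():
    total = 0
    for column, value in zip(census_header, census_row):
        # Only "Race Alone" columns hold per-race populations; this also skips "Hispanic Origin".
        if column.startswith("Race Alone") and re.search(pattern, column):
            total += int(value)
    regex_population[name] = total

print(regex_population)
```

Summing over all matching columns is what lets Asian/Pacific Islander pick up both of its census columns. The cost is that the patterns still have to be chosen by hand.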

In [19]:
#Compute death counts per 100,000 per race

race_per_hundredk ={}
for n in race_counts:
    per_hundredk = race_counts[n]/mapping[n]*100000
    race_per_hundredk[n] = per_hundredk
    
print (race_per_hundredk)

{'Asian/Pacific Islander': 8.374309664161762, 'White': 33.56849303419181, 'Native American/Native Alaskan': 24.521955573811088, 'Black': 57.8773477735196, 'Hispanic': 20.220491210910907}


This differs from the picture given by the raw totals per race. Here, Black victims have the highest rate of gun deaths per 100,000, followed by White and then Native American/Native Alaskan. Moreover, the Black rate (about 57.9) is roughly 1.7 times the White rate (about 33.6).
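
Since the same per-100,000 calculation is needed again for homicides and police-involved deaths, it could be wrapped in a small helper. This is a sketch only; rates_per_100k is a hypothetical name, and the two dictionaries repeat the figures from race_counts and mapping above.

```python
def rates_per_100k(counts, population):
    """Return deaths per 100,000 people for every group in counts."""
    return {group: counts[group] / population[group] * 100000 for group in counts}

# Figures copied from race_counts and mapping above.
race_counts = {'Asian/Pacific Islander': 1326, 'White': 66237, 'Native American/Native Alaskan': 917, 'Black': 23296, 'Hispanic': 9022}
mapping = {"Asian/Pacific Islander": 15834141,
           "Black": 40250635,
           "Hispanic": 44618105,
           "Native American/Native Alaskan": 3739506,
           "White": 197318956}

race_rates = rates_per_100k(race_counts, mapping)
for group, rate in race_rates.items():
    print(group, round(rate, 2))
```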

In [21]:
#I next filter the results and restrict them to the Homicide intent. This will tell us what the gun-related murder rate per 100,000 people in each racial category is

intents = [n[3] for n in data]

races = [n[7] for n in data]

homicide_race_counts = {}

for i, race in enumerate(races):
    if intents[i] == "Homicide":
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else: 
            homicide_race_counts[race] = 1
            
print (homicide_race_counts)

{'White': 9147, 'Asian/Pacific Islander': 559, 'Black': 19510, 'Native American/Native Alaskan': 326, 'Hispanic': 5634}


In [22]:
#Perform the same procedure as previously using mapping on homicide_race_counts to get from raw numbers to rates per 100,000

homicide_race_per_hundredk ={}
for n in homicide_race_counts:
    per_hundredk = homicide_race_counts[n]/mapping[n]*100000
    homicide_race_per_hundredk[n] = per_hundredk
    
print (homicide_race_per_hundredk)

{'White': 4.6356417981453335, 'Asian/Pacific Islander': 3.530346230970155, 'Black': 48.471284987180944, 'Native American/Native Alaskan': 8.717729026240365, 'Hispanic': 12.627161104219914}


Here, Black victims have the highest gun homicide rate per 100,000. Hispanic victims come second, but at roughly a quarter of the Black rate (about 12.6 versus 48.5). White falls to second-to-last place, ahead of only Asian/Pacific Islander.
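
The filter-then-count pattern used for homicides can be written once and reused for other columns. Below is a minimal sketch; count_where is a hypothetical helper name, and sample_rows holds the five rows shown in the earlier output (all suicides), so it only demonstrates the mechanics rather than reproducing the homicide totals.

```python
def count_where(rows, group_col, filter_col, filter_value):
    """Count rows per value of group_col, keeping only rows where filter_col equals filter_value."""
    counts = {}
    for row in rows:
        if row[filter_col] == filter_value:
            counts[row[group_col]] = counts.get(row[group_col], 0) + 1
    return counts

# The first five data rows, copied from the output near the top of the notebook.
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', 'HS/GED'],
]

# Column 3 is intent and column 7 is race, as in the headers list.
print(count_where(sample_rows, 7, 3, "Suicide"))
print(count_where(sample_rows, 7, 3, "Homicide"))
```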

In [25]:
#Similarly, I explore the homicide rate by gender

intents = [n[3] for n in data]

genders = [n[5] for n in data]

homicide_gender_counts = {}

for i, gender in enumerate(genders):
    if intents[i] == "Homicide":
        if gender in homicide_gender_counts:
            homicide_gender_counts[gender] += 1
        else: 
            homicide_gender_counts[gender] = 1
            
print (homicide_gender_counts)
print(homicide_gender_counts["M"]/homicide_gender_counts["F"])

{'M': 29803, 'F': 5373}
5.546808114647311


We see that there are about 5.5 times as many male homicide victims as female ones. However, as with race, I would need population data for each gender to reach a more meaningful conclusion. Still, I expect the male homicide rate to remain higher even after accounting for population, since the male and female shares of the population should not differ much.
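
One comparison that needs no population data is the share of each sex's gun deaths that are homicides. The sketch below is a side calculation rather than part of the guided project; it reuses the totals printed earlier for sex_counts and homicide_gender_counts, and homicide_share is an illustrative name.

```python
# Totals copied from the sex_counts and homicide_gender_counts outputs above.
sex_counts = {'M': 86349, 'F': 14449}
homicide_gender_counts = {'M': 29803, 'F': 5373}

# Fraction of all gun deaths within each sex that were homicides.
homicide_share = {s: homicide_gender_counts[s] / sex_counts[s] for s in sex_counts}

for s, share in homicide_share.items():
    print(s, f"{share:.1%}")
```

This describes the mix of intents within each sex, not the risk faced by each sex, so it complements the population-based rate rather than replacing it.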

In [29]:
#Next, I repeat the steps above, but this time I explore police-involved death rates by race.

police = [n[4] for n in data]

races = [n[7] for n in data]

police_death_counts = {}

for i, race in enumerate(races):
    if police[i] == "1":
        if race in police_death_counts:
            police_death_counts[race] += 1
        else: 
            police_death_counts[race] = 1
            
print (police_death_counts)

police_race_per_hundredk ={}
for n in police_death_counts:
    per_hundredk = police_death_counts[n]/mapping[n]*100000
    police_race_per_hundredk[n] = per_hundredk
    
print (police_race_per_hundredk)

{'White': 709, 'Native American/Native Alaskan': 25, 'Black': 356, 'Hispanic': 282, 'Asian/Pacific Islander': 30}
{'White': 0.3593167196769478, 'Native American/Native Alaskan': 0.6685375020122979, 'Black': 0.8844580961269306, 'Hispanic': 0.6320304280067475, 'Asian/Pacific Islander': 0.18946401955117112}


Now we can see that although more White victims than victims of any other race die in police-involved shootings, once population data is incorporated the Black rate is the highest, nearing 1 death per 100,000. Native American/Native Alaskan and Hispanic follow, and White is 4th in the list.
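
The ranking described above can be produced directly by sorting the rate dictionary by value. A minimal sketch, with the rates copied from the police_race_per_hundredk output and ranking as an illustrative name:

```python
# Rates copied from the police_race_per_hundredk output above.
police_race_per_hundredk = {
    'White': 0.3593167196769478,
    'Native American/Native Alaskan': 0.6685375020122979,
    'Black': 0.8844580961269306,
    'Hispanic': 0.6320304280067475,
    'Asian/Pacific Islander': 0.18946401955117112,
}

# Sort (race, rate) pairs by rate, highest first.
ranking = sorted(police_race_per_hundredk.items(), key=lambda item: item[1], reverse=True)

for position, (race_name, rate) in enumerate(ranking, start=1):
    print(position, race_name, round(rate, 3))
```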

In [None]:
#Find out if gun death rates correlate to location and education.