# Exploring Gun Deaths in the US

This notebook explores gun deaths in the US between 2012 and 2014 (inclusive), following and inspired by the guidelines from Dataquest.io, which also provided the datasets.

### Loading the Data 

In [1]:
import csv
import datetime

In [2]:
# Open the file in a with block so it is closed once the rows are read
with open('guns.csv', 'r') as f:
    data = list(csv.reader(f))
headers = data[0]
data = data[1:]

In [3]:
print(headers)

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
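As an alternative to positional indexing such as `row[1]`, the standard library's `csv.DictReader` keys each row by column name. The following is a minimal sketch, not part of the original analysis; it reads an in-memory stand-in (`sample_csv`, a name assumed here) instead of the real `guns.csv`, using the header printed above and a single record from the dataset.

```python
import csv
import io

# In-memory stand-in for guns.csv: the header row printed above plus one
# record from the dataset. The real file would be opened with open() instead.
sample_csv = io.StringIO(
    ',year,month,intent,police,sex,age,race,hispanic,place,education\n'
    '61,2012,07,Suicide,0,M,22,White,100,Home,2\n'
)

# DictReader uses the first line as field names, so each row is a dict.
rows = list(csv.DictReader(sample_csv))
first = rows[0]

# Columns are now addressed by name rather than by position.
print(first['intent'], first['race'], first['year'])
```

Note that the unnamed first column ends up under the empty-string key `''`.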


### Looking at the Death Counts over Time

#### Yearly Counts

In [4]:
years = [row[1] for row in data]
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else :
        year_counts[year] = 1

In [5]:
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}

Gun deaths seem to hover around 33k per year for the 3 years for which we have data.
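The counting loop used above recurs throughout this notebook. As a sketch of a shorter equivalent, `collections.Counter` from the standard library does the same tallying in one expression. The rows below are hypothetical sample data, not records from `guns.csv`.

```python
from collections import Counter

# Hypothetical sample rows; only the year column (index 1) matters here.
sample_rows = [
    ['1', '2012', '01'],
    ['2', '2012', '02'],
    ['3', '2013', '01'],
    ['4', '2014', '03'],
    ['5', '2014', '03'],
]

# Counter tallies each distinct value it is given.
year_counts_sample = Counter(row[1] for row in sample_rows)
print(dict(year_counts_sample))
```

A `Counter` returns 0 for missing keys, so no `if key in counts` check is needed.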

#### Monthly Counts

In [6]:
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]
date_counts = {}

In [7]:
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else :
        date_counts[date] = 1

In [8]:
print(date_counts)

{datetime.datetime(2012, 4, 1, 0, 0): 2795, datetime.datetime(2013, 4, 1, 0, 0): 2798, datetime.datetime(2012, 1, 1, 0, 0): 2758, datetime.datetime(2013, 6, 1, 0, 0): 2920, datetime.datetime(2014, 2, 1, 0, 0): 2361, datetime.datetime(2012, 9, 1, 0, 0): 2852, datetime.datetime(2014, 8, 1, 0, 0): 2970, datetime.datetime(2014, 5, 1, 0, 0): 2864, datetime.datetime(2014, 4, 1, 0, 0): 2862, datetime.datetime(2013, 1, 1, 0, 0): 2864, datetime.datetime(2013, 8, 1, 0, 0): 2859, datetime.datetime(2014, 11, 1, 0, 0): 2756, datetime.datetime(2012, 6, 1, 0, 0): 2826, datetime.datetime(2013, 2, 1, 0, 0): 2375, datetime.datetime(2012, 11, 1, 0, 0): 2729, datetime.datetime(2013, 5, 1, 0, 0): 2806, datetime.datetime(2014, 1, 1, 0, 0): 2651, datetime.datetime(2012, 5, 1, 0, 0): 2999, datetime.datetime(2013, 12, 1, 0, 0): 2765, datetime.datetime(2012, 8, 1, 0, 0): 2954, datetime.datetime(2012, 3, 1, 0, 0): 2743, datetime.datetime(2013, 7, 1, 0, 0): 3079, datetime.datetime(2013, 10, 1, 0, 0): 2808, dateti

### Looking at the Death Counts by Gender and Race

#### By Gender

In [9]:
sex_counts = {'M' : 0, 'F' : 0}

for row in data:
    if row[5] == 'M':
        sex_counts['M'] += 1
    elif row[5] == 'F':
        sex_counts['F'] +=1

In [10]:
sex_counts

{'F': 14449, 'M': 86349}

Far more men (86k) than women (14k) die as a result of guns.

#### By Race

In [11]:
race_counts = {}

In [12]:
for row in data:
    if row[7] in race_counts:
        race_counts[row[7]] += 1
    else :
        race_counts[row[7]] = 1

In [13]:
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

66k whites died as a result of guns compared to 23k blacks. But how would these numbers look when adjusted for the size of the population? To look at this, we need to get census data:

#### Loading and Exploring Census Data

In [14]:
f = open('census.csv', 'r')

In [15]:
census = list(csv.reader(f))

In [16]:
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

In [17]:
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

#### Mapping Race to Population Size

In [18]:
mapping = {}
mapping['Asian/Pacific Islander'] = int(census[1][14]) + int(census[1][15])
mapping['Black'] = int(census[1][12])
mapping['Hispanic'] = int(census[1][11])
mapping['Native American/Native Alaskan'] = int(census[1][13])
mapping['White'] = int(census[1][10])

In [19]:
mapping

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}

In [20]:
race_per_hundredk = {}
for race, count in race_counts.items():
    # Note: despite the variable name, the factor 10000 gives deaths per 10,000 people
    race_per_hundredk[race] = 10000*count/mapping[race]

In [21]:
race_per_hundredk

{'Asian/Pacific Islander': 0.8374309664161763,
 'Black': 5.78773477735196,
 'Hispanic': 2.022049121091091,
 'Native American/Native Alaskan': 2.4521955573811085,
 'White': 3.356849303419181}
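The cell above multiplies by 10000, so its figures are deaths per 10,000 people, summed over all three years. The sketch below shows how per-100,000 and annualized rates could be derived from the same counts and populations. The helper `rate_per_100k` and the constant `YEARS_COVERED` are names assumed for illustration, and using the 2010 census population for every year is a simplification.

```python
# Counts and populations copied from the outputs above.
race_counts = {
    'Asian/Pacific Islander': 1326,
    'Black': 23296,
    'Hispanic': 9022,
    'Native American/Native Alaskan': 917,
    'White': 66237,
}
population = {
    'Asian/Pacific Islander': 15834141,
    'Black': 40250635,
    'Hispanic': 44618105,
    'Native American/Native Alaskan': 3739506,
    'White': 197318956,
}
YEARS_COVERED = 3  # 2012, 2013 and 2014


def rate_per_100k(count, pop, years=1):
    """Deaths per 100,000 people, averaged over the given number of years."""
    return 100_000 * count / pop / years


total_rate = {r: rate_per_100k(race_counts[r], population[r]) for r in race_counts}
annual_rate = {
    r: rate_per_100k(race_counts[r], population[r], YEARS_COVERED) for r in race_counts
}

for race in sorted(race_counts):
    print(f"{race}: {total_rate[race]:.1f} over 3 years, {annual_rate[race]:.1f} per year")
```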

#### Restricting by Intent

In [22]:
intents = [row[3] for row in data]

In [23]:
races = [row[7] for row in data]

In [24]:
homicide_race_counts = {}

In [25]:
for i, race in enumerate(races):
    if intents[i]=='Homicide':
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else :
            homicide_race_counts[race] = 1

In [26]:
homicide_race_counts

{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}

In [27]:
for key, value in homicide_race_counts.items():
    homicide_race_counts[key] = 10000*value/mapping[key]

In [28]:
homicide_race_counts

{'Asian/Pacific Islander': 0.3530346230970155,
 'Black': 4.847128498718095,
 'Hispanic': 1.2627161104219913,
 'Native American/Native Alaskan': 0.8717729026240364,
 'White': 0.4635641798145334}

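After adjusting for population size, African Americans in this dataset are roughly 10 times as likely as Whites to be victims of gun homicide (about 4.85 versus 0.46 per 10,000 people over the three years).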
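The "10 times" figure can be checked directly as a ratio of the two population-adjusted rates. This is a small worked example using the homicide counts and census populations shown above; the variable names are chosen here for illustration.

```python
# Homicide counts and census populations copied from the outputs above.
homicides = {'Black': 19510, 'White': 9147}
population = {'Black': 40250635, 'White': 197318956}

# Rate = deaths per person; the scaling factor cancels out in the ratio.
rate = {group: homicides[group] / population[group] for group in homicides}
ratio = rate['Black'] / rate['White']

print(f"Black/White homicide rate ratio: {ratio:.1f}")
```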

Next, I am curious about the breakdown of suicides by race.
It would also be worth looking at the age distribution of suicides and homicides, perhaps with a histogram.

In [29]:
suicide_race_counts = {}

for i, race in enumerate(races):
    if intents[i] == 'Suicide':
        if race in suicide_race_counts:
            suicide_race_counts[race] += 1
        else :
            suicide_race_counts[race] = 1

In [30]:
suicide_race_counts

{'Asian/Pacific Islander': 745,
 'Black': 3332,
 'Hispanic': 3171,
 'Native American/Native Alaskan': 555,
 'White': 55372}

In [31]:
for key, value in suicide_race_counts.items():
    suicide_race_counts[key] = 10000*value/mapping[key]

In [32]:
suicide_race_counts

{'Asian/Pacific Islander': 0.4705023152187416,
 'Black': 0.8278130270491385,
 'Hispanic': 0.710698045109715,
 'Native American/Native Alaskan': 1.4841532544673013,
 'White': 2.806217969245692}

According to this dataset, the rate of gun suicide is highest among Whites and lowest among Asian/Pacific Islanders.
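The homicide and suicide cells share the same shape: filter rows on one column, then count by another. A possible refactoring is sketched below. The helper `count_by` is hypothetical (it is not part of the original notebook), and the rows are made-up sample data in the same column order as `guns.csv`.

```python
from collections import Counter


def count_by(rows, key_index, where=None):
    """Count rows by the value in column key_index, keeping only rows
    for which where(row) is true (all rows if where is None)."""
    return Counter(
        row[key_index] for row in rows if where is None or where(row)
    )


# Hypothetical sample rows: id, year, month, intent, police, sex, age, race, ...
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'White', '100', 'Home', '4'],
    ['2', '2012', '02', 'Homicide', '0', 'M', '21', 'Black', '100', 'Street', '2'],
    ['3', '2013', '05', 'Suicide', '0', 'F', '60', 'White', '100', 'Home', '3'],
    ['4', '2014', '07', 'Suicide', '0', 'M', '45', 'Hispanic', '210', 'Home', '2'],
]

INTENT, RACE = 3, 7  # column positions taken from the headers list

suicides_by_race = count_by(sample_rows, RACE, where=lambda row: row[INTENT] == 'Suicide')
all_by_intent = count_by(sample_rows, INTENT)
print(dict(suicides_by_race))
print(dict(all_by_intent))
```

Unlike the in-place rescaling used above, this leaves the raw counts untouched, so rates can be computed into a separate dictionary.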

### What Percentage of Deaths Are Due to Police?

In [33]:
police_death_count = 0
for row in data:
    if row[4] == '1':
        police_death_count += 1

In [34]:
police_death_count

1402

In [35]:
police_death_count/len(data)

0.01390900613107403

About 1.4% of deaths seem to be police-related.

What percentage of deaths are police-related when broken down by race:

In [36]:
police_death_by_race_counts = {}
for i, race in enumerate(races):
    if data[i][4]=='1':
        if race in police_death_by_race_counts:
            police_death_by_race_counts[race] += 1
        else :
            police_death_by_race_counts[race] = 1

In [37]:
police_death_by_race_counts

{'Asian/Pacific Islander': 30,
 'Black': 356,
 'Hispanic': 282,
 'Native American/Native Alaskan': 25,
 'White': 709}

Let's look at what percentage of the deaths are police-related for each race group.

In [38]:
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

In [39]:
death_rate_due_police_by_race = {}
for race, count in race_counts.items():
    death_rate_due_police_by_race[race] = 100*police_death_by_race_counts[race]/race_counts[race]

In [40]:
death_rate_due_police_by_race

{'Asian/Pacific Islander': 2.262443438914027,
 'Black': 1.5281593406593406,
 'Hispanic': 3.1256927510529815,
 'Native American/Native Alaskan': 2.7262813522355507,
 'White': 1.070398719748781}

We do not know whether this is statistically significant, but the share of gun deaths that are police-related appears to be highest among Hispanics (about 3.1%) and lowest among Whites (about 1.1%).
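One way to probe the significance question is a two-proportion z-test, sketched here with only the standard library. This is a swapped-in textbook technique, not something the original analysis performed, and it assumes the deaths are independent observations, which may not hold for this data. The function name `two_proportion_z` is chosen for illustration.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between proportions x1/n1 and x2/n2,
    using the pooled proportion for the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


# Police-related deaths and total gun deaths, copied from the outputs above.
z = two_proportion_z(282, 9022, 709, 66237)  # Hispanic vs. White
print(f"z = {z:.1f}")
```

Under these assumptions the statistic comes out far above 1.96, the usual two-sided 5% threshold, which suggests the Hispanic versus White gap is unlikely to be sampling noise alone. It says nothing about why the gap exists.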

Let's also look at the likelihood of a police death per person (when adjusted by population size) broken down by race:

In [41]:
for key, value in police_death_by_race_counts.items():
    police_death_by_race_counts[key] = 10000*value/mapping[key]

In [42]:
police_death_by_race_counts

{'Asian/Pacific Islander': 0.018946401955117112,
 'Black': 0.08844580961269306,
 'Hispanic': 0.06320304280067475,
 'Native American/Native Alaskan': 0.06685375020122979,
 'White': 0.035931671967694784}

#### Let's See if there is a link between month and homicide rate.

First look at overall deaths by month:

In [43]:
deaths_by_month = {}
for date in dates:
    if date.month in deaths_by_month:
        deaths_by_month[date.month] += 1
    else :
        deaths_by_month[date.month] = 1

In [44]:
deaths_by_month

{1: 8273,
 2: 7093,
 3: 8289,
 4: 8455,
 5: 8669,
 6: 8677,
 7: 8989,
 8: 8783,
 9: 8508,
 10: 8406,
 11: 8243,
 12: 8413}

Deaths seem to increase in the summer months, peaking in July. February has the lowest count, in part because it is the shortest month.
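Because months differ in length, raw monthly totals are not directly comparable. The sketch below divides each total by the number of calendar days that month had across 2012 to 2014, using `calendar.monthrange` from the standard library. The names `days_in_month` and `deaths_per_day` are chosen here for illustration.

```python
import calendar

# Monthly totals copied from the output above (all three years combined).
deaths_by_month = {
    1: 8273, 2: 7093, 3: 8289, 4: 8455, 5: 8669, 6: 8677,
    7: 8989, 8: 8783, 9: 8508, 10: 8406, 11: 8243, 12: 8413,
}
years = (2012, 2013, 2014)

# calendar.monthrange(year, month) returns (weekday of day 1, days in month);
# summing the second item over the three years accounts for the 2012 leap day.
days_in_month = {
    month: sum(calendar.monthrange(year, month)[1] for year in years)
    for month in deaths_by_month
}
deaths_per_day = {
    month: deaths_by_month[month] / days_in_month[month]
    for month in deaths_by_month
}

for month in sorted(deaths_per_day):
    print(f"{calendar.month_abbr[month]}: {deaths_per_day[month]:.1f} deaths per day")
```

Even after this adjustment, July has the highest daily rate and February the lowest.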

Now let's look at only homicides:

In [45]:
homicides_by_month = {'01' : 0, '02' : 0, '03' : 0, '04' : 0, '05' : 0, '06' : 0, 
                      '07' : 0, '08' : 0, '09' : 0, '10' : 0, '11' : 0, '12' : 0}
for row in data:
    if row[3] == 'Homicide':
        homicides_by_month[row[2]] += 1

In [46]:
homicides_by_month

{'01': 2829,
 '02': 2178,
 '03': 2780,
 '04': 2845,
 '05': 2976,
 '06': 3130,
 '07': 3269,
 '08': 3125,
 '09': 2966,
 '10': 2968,
 '11': 2919,
 '12': 3191}

It is hard to say without statistical analysis, but homicides appear to increase in the summer months, with a second sharp increase in December, perhaps due to New Year's Eve?

#### Let's Explore Homicide Rate by Gender

In [47]:
homicides_by_gender = {'M' : 0, 'F' : 0}
for row in data:
    if row[3] == 'Homicide':
        homicides_by_gender[row[5]] += 1

In [48]:
homicides_by_gender

{'F': 5373, 'M': 29803}

Far more men are victims of homicide than women.

### Accidental Gun Deaths by Gender and Race

In [49]:
accidental_gun_deaths_by_gender = {'M' : 0, 'F' : 0}

for i, intent in enumerate(intents):
    if intent=='Accidental':
        accidental_gun_deaths_by_gender[data[i][5]] += 1

In [50]:
accidental_gun_deaths_by_gender

{'F': 218, 'M': 1421}

It would be interesting to analyze these numbers against gun ownership.

In [51]:
accidental_gun_deaths_by_race = {'Asian/Pacific Islander' : 0, 'Black' : 0, 'Hispanic' : 0, 'Native American/Native Alaskan' : 0, 'White' : 0 }
for i, intent in enumerate(intents):
    if intent=='Accidental':
        accidental_gun_deaths_by_race[data[i][7]] += 1

In [52]:
accidental_gun_deaths_by_race

{'Asian/Pacific Islander': 12,
 'Black': 328,
 'Hispanic': 145,
 'Native American/Native Alaskan': 22,
 'White': 1132}

In [53]:
data[60]

['61', '2012', '07', 'Suicide', '0', 'M', '22', 'White', '100', 'Home', '2']

In [54]:
headers

['',
 'year',
 'month',
 'intent',
 'police',
 'sex',
 'age',
 'race',
 'hispanic',
 'place',
 'education']

In [55]:
places = [row[9] for row in data]

In [56]:
places_set = set(places)

In [57]:
deaths_by_place = {}
for place in places_set:
    deaths_by_place[place] = 0

In [58]:
for row in data:
    deaths_by_place[row[9]] += 1

deaths_by_place_percentage = {}

for key, value in deaths_by_place.items():
    deaths_by_place_percentage[key] = value/len(data)

In [59]:
deaths_by_place_percentage

{'Farm': 0.004662790928391437,
 'Home': 0.6000714299886902,
 'Industrial/construction': 0.0024603662771086727,
 'NA': 0.0137304311593484,
 'Other specified': 0.13642135756661838,
 'Other unspecified': 0.08796801523839759,
 'Residential institution': 0.002013928847794599,
 'School/instiution': 0.006656878112660965,
 'Sports': 0.0012698664656044763,
 'Street': 0.11062719498402745,
 'Trade/service area': 0.03411774043135776}

In [60]:
homicides_count = 0
for intent in intents:
    if intent=='Homicide':
        homicides_count += 1

In [61]:
homicides_by_place = {}
for place in places_set:
    homicides_by_place[place] = 0

for i, intent in enumerate(intents):
    if intent=='Homicide':
        homicides_by_place[data[i][9]] += 1

homicides_by_place_percentage = {}
        
for key, value in homicides_by_place.items():
    homicides_by_place_percentage[key] = value/homicides_count

In [62]:
homicides_by_place_percentage

{'Farm': 0.002160564020923357,
 'Home': 0.3869968160109165,
 'Industrial/construction': 0.0024164202865590177,
 'NA': 0.03931657948601319,
 'Other specified': 0.15433818512622244,
 'Other unspecified': 0.10245622015010235,
 'Residential institution': 0.002075278599044803,
 'School/instiution': 0.010262679099385944,
 'Sports': 0.0006254264271093928,
 'Street': 0.2520752785990448,
 'Trade/service area': 0.04727655219467819}

About 25% of homicides recorded in this dataset happened on the street and close to 40% happened at home. Sports venues account for the smallest share of both overall gun deaths and homicides, although without knowing how much time people spend in each kind of place, this alone does not show that they are safer.

#### Let's See if there is a correlation between gun deaths and education

Is there a difference in the educational distribution of suicides vs. homicides?

In [63]:
education = [row[-1] for row in data]

In [64]:
set(education)

{'1', '2', '3', '4', '5', 'NA'}

In [65]:
education_counts = {'1' : 0, '2' : 0, '3' : 0, '4' : 0, '5' : 0, 'NA' : 0}

In [66]:
for row in data:
    education_counts[row[-1]] += 1

In [67]:
education_counts

{'1': 21823, '2': 42927, '3': 21680, '4': 12946, '5': 1369, 'NA': 53}

The largest group here seems to be high school graduates. Since we do not have data on the education level of the overall population, we cannot compute per-person rates for these educational groups.

Let's look at the educational breakdown of suicides:

In [68]:
suicide_by_education_count = {'1' : 0, '2' : 0, '3' : 0, '4' : 0, '5' : 0, 'NA' : 0}
for i, intent in enumerate(intents):
    if intent=='Suicide':
        suicide_by_education_count[data[i][-1]] += 1

In [69]:
suicide_by_education_count

{'1': 9292, '2': 26321, '3': 15533, '4': 11147, '5': 876, 'NA': 6}

In [70]:
homicide_by_education_count = {'1' : 0, '2' : 0, '3' : 0, '4' : 0, '5' : 0, 'NA' : 0}
for i, intent in enumerate(intents):
    if intent=='Homicide':
        homicide_by_education_count[data[i][-1]] += 1

In [71]:
suicide_percentage_of_gundeaths_by_education = {}
for education_level, count in suicide_by_education_count.items():
    suicide_percentage_of_gundeaths_by_education[education_level] = count/education_counts[education_level]

In [72]:
suicide_percentage_of_gundeaths_by_education

{'1': 0.425789304861843,
 '2': 0.6131572203974188,
 '3': 0.7164667896678967,
 '4': 0.8610381585045573,
 '5': 0.6398831263696129,
 'NA': 0.11320754716981132}

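For education levels 1 through 4, the share of gun deaths that are suicides rises steadily, from about 43% to about 86%. Level 5 breaks the pattern at about 64%. Note that the values above are fractions rather than percentages.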
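The trend across education levels can be checked programmatically rather than by eye. The sketch below recomputes the suicide share from the counts shown above and tests whether it rises from one level to the next. The names `suicide_share` and `rises` are chosen here for illustration.

```python
# Counts copied from the outputs above.
education_counts = {'1': 21823, '2': 42927, '3': 21680, '4': 12946, '5': 1369, 'NA': 53}
suicide_counts = {'1': 9292, '2': 26321, '3': 15533, '4': 11147, '5': 876, 'NA': 6}

# Share of each education group's gun deaths that were suicides (a fraction).
suicide_share = {
    level: suicide_counts[level] / education_counts[level]
    for level in education_counts
}

# Compare each numbered level with the next one.
levels = ['1', '2', '3', '4', '5']
rises = {
    f"{low}->{high}": suicide_share[high] > suicide_share[low]
    for low, high in zip(levels, levels[1:])
}

for level in levels:
    print(f"level {level}: {100 * suicide_share[level]:.1f}%")
print(rises)
```

The share rises at every step from level 1 to level 4 and falls from level 4 to level 5.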