#  Exploring U.S. Gun Deaths

The dataset contains information on gun deaths in the U.S. from 2012 to 2014; each row represents a single fatality.

The data source is FiveThirtyEight.

Column names and descriptions:
*  index - row number
*  year - year the fatality occurred
*  month - month the fatality occurred
*  intent - intent behind the death (Suicide, Accidental, NA, Homicide, Undetermined)
*  police - whether police were involved (0 = false, 1 = true)
*  sex - M or F
*  age - victim age
*  race - Asian/Pacific Islander, Native American/Native Alaskan, Black, Hispanic, White
*  hispanic - code indicating Hispanic origin
*  place - where the shooting occurred
*  education - educational status of the victim
    -  1 less than high school
    -  2 graduated from high school
    -  3 some college
    -  4 graduated from college
    -  5 not available
    
General question: Are there demographic patterns for the victims? 


In [2]:
#read in the file and view the data in list of lists format
import csv
#the with block closes the file once it has been read
with open("guns.csv","r") as f:
    data=list(csv.reader(f))
print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


In [3]:
#view header then remove header from dataset 
headers=data[0]
print(headers)
data=data[1:]
print(data[0:5])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]


In [4]:
#calculate how many gun deaths occurred each year 
years=[event[1] for event in data]

year_counts={}
for event in years:
    if event in year_counts:
        year_counts[event]=year_counts[event]+1
    else:
        year_counts[event]=1
print(year_counts)

{'2013': 33636, '2014': 33599, '2012': 33563}
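
The same tally can be written more compactly with `collections.Counter` from the standard library. The sketch below uses a few made-up rows (`sample_rows` is a hypothetical stand-in for `data`), so its counts are illustrative only.

```python
from collections import Counter

# Hypothetical stand-in for `data`: only the year column (index 1) matters here.
sample_rows = [
    ["1", "2012", "01"],
    ["2", "2012", "02"],
    ["3", "2013", "01"],
    ["4", "2014", "07"],
    ["5", "2013", "12"],
]

# Counter tallies each distinct year in a single pass over the rows.
year_counts_sketch = Counter(row[1] for row in sample_rows)
print(dict(year_counts_sketch))
```

A `Counter` behaves like a dictionary, so it can be used wherever `year_counts` is used.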


Observation: Gun deaths were fairly constant in the U.S. from 2012 to 2014, at roughly 33,600 deaths per year. The most deaths during this period occurred in 2013. Next, let's explore how gun deaths varied by month and year.

In [5]:
#create a dictionary with year & month as the key with number of events as the value
import datetime

dates=[datetime.datetime(year=int(event[1]),month=int(event[2]), day=1 ) for event in data]
dates[0:5]

date_counts={}
for an_event in dates:
    if an_event in date_counts:
        date_counts[an_event]=date_counts[an_event]+1
    else:
        date_counts[an_event]=1
            
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

In [6]:
sex_counts={}
for event in data: 
    if event[5] in sex_counts:
        sex_counts[event[5]]=sex_counts[event[5]]+1
    else:
        sex_counts[event[5]]=1

race_counts={}
for event in data: 
    if event[7] in race_counts:
        race_counts[event[7]]=race_counts[event[7]]+1
    else:
        race_counts[event[7]]=1
        
print(sex_counts)
print(race_counts)

{'M': 86349, 'F': 14449}
{'White': 66237, 'Hispanic': 9022, 'Asian/Pacific Islander': 1326, 'Black': 23296, 'Native American/Native Alaskan': 917}


Observation: Looking at gun deaths by sex, there are about 6 times as many male victims as female victims. Looking at gun deaths by race, most victims are White. Let's normalize this data by the population of each racial group in the U.S.

In [7]:
max_deaths=0
month_of_deaths=None
for ind,value in date_counts.items():
    if value > max_deaths:
        max_deaths= value
        month_of_deaths=ind

print("The most deaths took place in " + str(month_of_deaths)+ " with " + str(max_deaths) + " victims.")

The most deaths took place in 2013-07-01 00:00:00 with 3079 victims.


In [8]:
#read in second file containing the total number of people by racial group
#convert data to a list of lists 
with open("census.csv","r") as f2:
    census=list(csv.reader(f2))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

In [9]:
#map each race name used in the first file to its total population from the census data
mapping={
    "Asian/Pacific Islander":int(census[1][13])+int(census[1][14]),
    "Black":int(census[1][12]),
    "Native American/Native Alaskan":int(census[1][13]),
    "Hispanic":int(census[1][11]),
    "White":int(census[1][10])
}
print(mapping)

{'White': 197318956, 'Hispanic': 44618105, 'Asian/Pacific Islander': 18899022, 'Black': 40250635, 'Native American/Native Alaskan': 3739506}
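
Positional indices are easy to misread against a header this long. As a cross-check, here is a sketch that looks the columns up by name instead, using the header and values printed in the census output (copied into `header` and `row`; these names and the `population` helper are introduced here for illustration). By name, the Asian and the Native Hawaiian and Other Pacific Islander columns sit at indices 14 and 15, while index 13 is American Indian and Alaska Native, so the `Asian/Pacific Islander` total from this lookup differs from the value in the `mapping` printed above.

```python
# Header and data row copied from the census output shown earlier.
header = ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
          'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic',
          'Race Alone - Black or African American',
          'Race Alone - American Indian and Alaska Native',
          'Race Alone - Asian',
          'Race Alone - Native Hawaiian and Other Pacific Islander',
          'Two or More Races']
row = ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
       'Total', '0100000US', '', 'United States', '308745538', '197318956',
       '44618105', '40250635', '3739506', '15159516', '674625', '6984195']

def population(column_name):
    """Return the population in the named census column as an int."""
    return int(row[header.index(column_name)])

mapping_by_name = {
    "Asian/Pacific Islander": population('Race Alone - Asian')
    + population('Race Alone - Native Hawaiian and Other Pacific Islander'),
    "Black": population('Race Alone - Black or African American'),
    "Native American/Native Alaskan": population('Race Alone - American Indian and Alaska Native'),
    "Hispanic": population('Race Alone - Hispanic'),
    "White": population('Race Alone - White'),
}
print(mapping_by_name)
```

With this lookup the Asian/Pacific Islander population is 15,159,516 + 674,625 = 15,834,141; the other four groups match the positional version.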


In [10]:
#create dictionary that gives the number of deaths per 100,000 people by race
race_per_hundredk={}

for key,value in race_counts.items():
    race_per_hundredk[key]=round((value*100000)/mapping[key],2)

print(race_per_hundredk)

{'White': 33.57, 'Asian/Pacific Islander': 7.02, 'Hispanic': 20.22, 'Black': 57.88, 'Native American/Native Alaskan': 24.52}


Observation: Now that the data has been normalized by race population, you can see that the Black community has the highest rate of gun deaths per 100,000 people, even though there were more White deaths overall from 2012 to 2014.
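
As a worked example of the normalization, the sketch below recomputes two of the rates from the counts and populations printed in this notebook. The helper name `rate_per_100k` is introduced here for illustration. Keep in mind that the death counts cover all three years and the populations come from the 2010 census, so these are three-year totals per 100,000 people rather than annual rates.

```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 people, rounded to two decimals."""
    return round(deaths * 100000 / population, 2)

# Counts from race_counts and populations from mapping, as printed above.
black_rate = rate_per_100k(23296, 40250635)
white_rate = rate_per_100k(66237, 197318956)
print(black_rate, white_rate)
```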

In [11]:
#count homicide deaths by race, then normalize to deaths per 100,000 people
intents=[event[3] for event in data]
races=[event[7] for event in data]

homicide_race_counts={}
for i,race in enumerate(races):
    if intents[i]=="Homicide":
        if race in homicide_race_counts:
            homicide_race_counts[race]=homicide_race_counts[race]+1
        else: 
            homicide_race_counts[race]=1

for key,value in homicide_race_counts.items():
    homicide_race_counts[key]=(value*100000)/mapping[key]

print(homicide_race_counts)
  

{'White': 4.6356417981453335, 'Hispanic': 12.627161104219912, 'Asian/Pacific Islander': 2.9578250133789994, 'Black': 48.47128498718095, 'Native American/Native Alaskan': 8.717729026240365}


Considering only the gun homicides that occurred between 2012 and 2014, the Black community had the highest rate per 100,000 people.

Question 1: Is there a link between month and homicide rate? 

In [12]:
#create dictionary that counts deaths with homicide intent by month
intents=[event[3] for event in data]
month=[event[2] for event in data]

homicide_month_counts={}
for i,id_month in enumerate(month):
    if intents[i]=="Homicide":
        if id_month in homicide_month_counts:
            homicide_month_counts[id_month]=homicide_month_counts[id_month]+1
        else: 
            homicide_month_counts[id_month]=1
homicide_month_counts

{'01': 2829,
 '02': 2178,
 '03': 2780,
 '04': 2845,
 '05': 2976,
 '06': 3130,
 '07': 3269,
 '08': 3125,
 '09': 2966,
 '10': 2968,
 '11': 2919,
 '12': 3191}

Observation: Looking at homicides per month, the most homicides occur in July and the fewest in February. Homicides also tend to be higher in the summer months and around the December holidays.
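
Months differ in length, and February is the shortest, so raw monthly totals slightly exaggerate its dip. One way to check is to divide each total by the number of days that month contributed across 2012 to 2014. This is a minimal sketch using the counts printed above; `per_day` is a name introduced here.

```python
import calendar

# Homicide counts per calendar month, copied from the output above (2012-2014 combined).
homicide_month_counts = {
    '01': 2829, '02': 2178, '03': 2780, '04': 2845, '05': 2976, '06': 3130,
    '07': 3269, '08': 3125, '09': 2966, '10': 2968, '11': 2919, '12': 3191,
}

per_day = {}
for month, count in homicide_month_counts.items():
    # Days this month contributed over the three years (2012 was a leap year).
    days = sum(calendar.monthrange(year, int(month))[1] for year in (2012, 2013, 2014))
    per_day[month] = round(count / days, 2)

print(per_day)
```

On a per-day basis July is still the highest month and February is still the lowest, so the pattern is not just an artifact of month length.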

Question 2: How does the homicide rate look by gender? And by education levels? 

In [13]:
#create a general function: for rows where column input1 equals interest, count the values in column input2
def new_trends(input1, input2, interest):
    column1=[event[input1] for event in data]
    column2=[event[input2] for event in data]
    
    trend_dict={}
    for key, value in enumerate(column2):
        if column1[key]==interest:
            if value in trend_dict:
                trend_dict[value]=trend_dict[value]+1
            else:
                trend_dict[value]=1
            
    return trend_dict
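
For comparison, here is a sketch of the same filter-then-count idea that takes the rows as an argument instead of reading the global `data`, which makes it easy to try on a small sample. The name `count_where` is hypothetical, and `sample_rows` holds the first five data rows printed earlier.

```python
from collections import Counter

def count_where(rows, filter_col, count_col, interest):
    """Count values in count_col for rows whose filter_col equals interest."""
    return dict(Counter(row[count_col] for row in rows if row[filter_col] == interest))

# First five data rows from the notebook output above.
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2'],
]

sex_counts_sample = count_where(sample_rows, 3, 5, "Suicide")
print(sex_counts_sample)
```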

In [14]:
#let's test how my general function compares to my previous results
new_trends(3,2,"Homicide")

{'01': 2829,
 '02': 2178,
 '03': 2780,
 '04': 2845,
 '05': 2976,
 '06': 3130,
 '07': 3269,
 '08': 3125,
 '09': 2966,
 '10': 2968,
 '11': 2919,
 '12': 3191}

In [15]:
#new_trends function agrees with previous results
#let's explore homicide's relationship with gender
new_trends(3,5,"Homicide")

{'F': 5373, 'M': 29803}

In [16]:
new_trends(3,4,"Homicide")

{'0': 33774, '1': 1402}

In [17]:
new_trends(3,10,"Homicide")

{'1': 11839, '2': 15649, '3': 5640, '4': 1559, '5': 455, 'NA': 34}

In [18]:
new_trends(3,6,"Homicide")

{'0': 22,
 '1': 34,
 '10': 37,
 '107': 1,
 '11': 39,
 '12': 50,
 '13': 85,
 '14': 125,
 '15': 261,
 '16': 474,
 '17': 710,
 '18': 1138,
 '19': 1364,
 '2': 29,
 '20': 1467,
 '21': 1570,
 '22': 1628,
 '23': 1459,
 '24': 1442,
 '25': 1300,
 '26': 1245,
 '27': 1136,
 '28': 1119,
 '29': 1055,
 '3': 36,
 '30': 993,
 '31': 928,
 '32': 964,
 '33': 884,
 '34': 867,
 '35': 782,
 '36': 718,
 '37': 646,
 '38': 625,
 '39': 552,
 '4': 32,
 '40': 555,
 '41': 540,
 '42': 537,
 '43': 493,
 '44': 435,
 '45': 399,
 '46': 401,
 '47': 438,
 '48': 386,
 '49': 414,
 '5': 34,
 '50': 387,
 '51': 375,
 '52': 336,
 '53': 295,
 '54': 261,
 '55': 232,
 '56': 250,
 '57': 214,
 '58': 204,
 '59': 204,
 '6': 42,
 '60': 145,
 '61': 155,
 '62': 155,
 '63': 112,
 '64': 130,
 '65': 101,
 '66': 106,
 '67': 89,
 '68': 95,
 '69': 60,
 '7': 38,
 '70': 54,
 '71': 56,
 '72': 51,
 '73': 58,
 '74': 43,
 '75': 35,
 '76': 40,
 '77': 25,
 '78': 46,
 '79': 29,
 '8': 23,
 '80': 37,
 '81': 26,
 '82': 20,
 '83': 28,
 '84': 30,
 '85': 19

In [19]:
new_trends(3,9,"Homicide")

{'Farm': 76,
 'Home': 13613,
 'Industrial/construction': 85,
 'NA': 1383,
 'Other specified': 5429,
 'Other unspecified': 3604,
 'Residential institution': 73,
 'School/instiution': 361,
 'Sports': 22,
 'Street': 8867,
 'Trade/service area': 1663}

Observation: Looking at trends for gun homicides, I have a very generalized understanding of the victim demographic and situation: the victim may be a Black male (by rate per race population) between 18 and 29 with a high school education or less. The death may have taken place in the summer or around the holidays, in the victim's home, and most likely did not involve police.
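
Because ages are stored as strings, the age output is keyed and ordered as text, which makes ranges hard to read off. A possible approach is to convert the keys to integers and group them into bins. The sketch below assumes a hypothetical helper `age_bins` and uses only a handful of entries copied from the homicide age output; the `'NA'` entry is made up to show how non-numeric ages are skipped.

```python
def age_bins(age_counts, width=10):
    """Group {age_string: count} into bins of `width` years, skipping non-numeric keys."""
    bins = {}
    for age, count in age_counts.items():
        if not age.isdigit():
            continue  # skip values such as 'NA'
        start = int(age) // width * width
        label = f"{start}-{start + width - 1}"
        bins[label] = bins.get(label, 0) + count
    return bins

# A few entries from the homicide age output (not the full table); 'NA' is hypothetical.
sample_age_counts = {'17': 710, '18': 1138, '19': 1364, '20': 1467, '21': 1570, 'NA': 3}
binned = age_bins(sample_age_counts)
print(binned)
```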

In [20]:
#what are the unique values for intent
intents=[event[3] for event in data]
set(intents)

{'Accidental', 'Homicide', 'NA', 'Suicide', 'Undetermined'}

In [21]:
#what are the unique values for location
location=[event[9] for event in data]
set(location)

{'Farm',
 'Home',
 'Industrial/construction',
 'NA',
 'Other specified',
 'Other unspecified',
 'Residential institution',
 'School/instiution',
 'Sports',
 'Street',
 'Trade/service area'}

Question 3: How do gender and race vary by these other intents? 

In [22]:
new_trends(3,2,"Suicide")

{'01': 5220,
 '02': 4732,
 '03': 5309,
 '04': 5438,
 '05': 5506,
 '06': 5367,
 '07': 5514,
 '08': 5421,
 '09': 5343,
 '10': 5256,
 '11': 5086,
 '12': 4983}

In [23]:
new_trends(3,4,"Suicide")

{'0': 63175}

In [24]:
new_trends(3,5,"Suicide")

{'F': 8689, 'M': 54486}

In [25]:
new_trends(3,6,"Suicide")

{'10': 3,
 '100': 1,
 '101': 2,
 '102': 2,
 '11': 14,
 '12': 54,
 '13': 127,
 '14': 217,
 '15': 276,
 '16': 355,
 '17': 432,
 '18': 556,
 '19': 631,
 '20': 687,
 '21': 864,
 '22': 1012,
 '23': 949,
 '24': 936,
 '25': 883,
 '26': 935,
 '27': 893,
 '28': 826,
 '29': 846,
 '30': 832,
 '31': 870,
 '32': 824,
 '33': 782,
 '34': 795,
 '35': 810,
 '36': 765,
 '37': 821,
 '38': 836,
 '39': 812,
 '40': 827,
 '41': 909,
 '42': 934,
 '43': 1009,
 '44': 979,
 '45': 948,
 '46': 1007,
 '47': 1059,
 '48': 1205,
 '49': 1218,
 '50': 1248,
 '51': 1338,
 '52': 1334,
 '53': 1376,
 '54': 1390,
 '55': 1324,
 '56': 1343,
 '57': 1226,
 '58': 1274,
 '59': 1196,
 '60': 1187,
 '61': 1123,
 '62': 917,
 '63': 909,
 '64': 967,
 '65': 910,
 '66': 857,
 '67': 761,
 '68': 755,
 '69': 799,
 '70': 812,
 '71': 717,
 '72': 667,
 '73': 664,
 '74': 618,
 '75': 633,
 '76': 532,
 '77': 543,
 '78': 545,
 '79': 537,
 '80': 497,
 '81': 524,
 '82': 493,
 '83': 516,
 '84': 453,
 '85': 418,
 '86': 394,
 '87': 298,
 '88': 328,
 '89'

In [26]:
scd_by_race=new_trends(3,7,"Suicide")

for key,value in scd_by_race.items():
    scd_by_race[key]=(value*100000)/mapping[key]

scd_by_race

{'Asian/Pacific Islander': 3.942002924807432,
 'Black': 8.278130270491385,
 'Hispanic': 7.10698045109715,
 'Native American/Native Alaskan': 14.841532544673013,
 'White': 28.06217969245692}

In [27]:
new_trends(3,9,"Suicide")

{'Farm': 378,
 'Home': 45415,
 'Industrial/construction': 155,
 'Other specified': 8031,
 'Other unspecified': 4774,
 'Residential institution': 125,
 'School/instiution': 295,
 'Sports': 98,
 'Street': 2181,
 'Trade/service area': 1723}

In [28]:
new_trends(3,10,"Suicide")

{'1': 9292, '2': 26321, '3': 15533, '4': 11147, '5': 876, 'NA': 6}

Observation: Looking at trends for gun suicides, I have a very generalized understanding of the victim demographic and situation: the victim may be a White male (by rate per race population) between 40 and 60 with a high school education or some college. The death probably occurred in the victim's home and most likely did not take place around the holidays.

In [29]:
new_trends(3,4,"Accidental")

{'0': 1639}

In [30]:
new_trends(3,5,"Accidental")

{'F': 218, 'M': 1421}

In [31]:
new_trends(3,6,"Accidental")

{'0': 11,
 '1': 4,
 '10': 12,
 '11': 7,
 '12': 11,
 '13': 11,
 '14': 17,
 '15': 21,
 '16': 25,
 '17': 23,
 '18': 45,
 '19': 49,
 '2': 21,
 '20': 41,
 '21': 45,
 '22': 47,
 '23': 38,
 '24': 40,
 '25': 31,
 '26': 32,
 '27': 23,
 '28': 24,
 '29': 30,
 '3': 28,
 '30': 28,
 '31': 21,
 '32': 19,
 '33': 23,
 '34': 26,
 '35': 24,
 '36': 15,
 '37': 27,
 '38': 20,
 '39': 19,
 '4': 19,
 '40': 15,
 '41': 21,
 '42': 13,
 '43': 15,
 '44': 21,
 '45': 15,
 '46': 21,
 '47': 23,
 '48': 20,
 '49': 25,
 '5': 7,
 '50': 29,
 '51': 31,
 '52': 26,
 '53': 24,
 '54': 25,
 '55': 23,
 '56': 21,
 '57': 23,
 '58': 21,
 '59': 17,
 '6': 7,
 '60': 13,
 '61': 20,
 '62': 17,
 '63': 16,
 '64': 17,
 '65': 21,
 '66': 22,
 '67': 11,
 '68': 14,
 '69': 14,
 '7': 5,
 '70': 9,
 '71': 14,
 '72': 9,
 '73': 14,
 '74': 7,
 '75': 5,
 '76': 6,
 '77': 5,
 '78': 4,
 '79': 6,
 '8': 9,
 '80': 11,
 '81': 10,
 '82': 7,
 '83': 3,
 '84': 4,
 '85': 2,
 '86': 1,
 '87': 1,
 '88': 1,
 '89': 4,
 '9': 8,
 '91': 4,
 '92': 2,
 '94': 1,
 '96': 1,
 'N

In [32]:
acc_by_race=new_trends(3,7,"Accidental")
print(acc_by_race)
for key,value in acc_by_race.items():
    acc_by_race[key]=(value*100000)/mapping[key]

acc_by_race

{'White': 1132, 'Black': 328, 'Hispanic': 145, 'Asian/Pacific Islander': 12, 'Native American/Native Alaskan': 22}


{'Asian/Pacific Islander': 0.06349534912441501,
 'Black': 0.814893976206835,
 'Hispanic': 0.3249801846133985,
 'Native American/Native Alaskan': 0.5883130017708221,
 'White': 0.5736904466492312}

In [33]:
new_trends(3,9,"Accidental")

{'Farm': 13,
 'Home': 923,
 'Industrial/construction': 7,
 'Other specified': 212,
 'Other unspecified': 350,
 'Residential institution': 5,
 'School/instiution': 11,
 'Sports': 5,
 'Street': 78,
 'Trade/service area': 35}

In [34]:
new_trends(3,10,"Accidental")

{'1': 492, '2': 633, '3': 327, '4': 146, '5': 28, 'NA': 13}

Observation: Looking at trends for accidental gun deaths, I have a very generalized understanding of the victim demographic and situation: the victim may be a Black male (by rate per race population) between his late teens and 30s with some or a complete high school education. The death probably occurred in the victim's home and did not involve police. Looking only at total accidental deaths, there are more White victims.

Question 4: How do gun death rates correlate with location and education?

In [35]:
#create function to find a singular trend with gun deaths like above
#the parameter is named column to avoid shadowing the built-in input()
def one_trend(column):
    one_dict={}
    for event in data: 
        if event[column] in one_dict:
            one_dict[event[column]]=one_dict[event[column]]+1
        else:
            one_dict[event[column]]=1
    
    return one_dict

In [36]:
one_trend(9)

{'Farm': 470,
 'Home': 60486,
 'Industrial/construction': 248,
 'NA': 1384,
 'Other specified': 13751,
 'Other unspecified': 8867,
 'Residential institution': 203,
 'School/instiution': 671,
 'Sports': 128,
 'Street': 11151,
 'Trade/service area': 3439}

In [37]:
one_trend(10)

{'1': 21823, '2': 42927, '3': 21680, '4': 12946, '5': 1369, 'NA': 53}

Observation: Overall, most gun deaths occur at home (60,486 of 100,798), and the most common education level among victims is high school graduate (code 2).
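
Raw counts are easier to compare as percentages. The sketch below converts the two outputs above into shares of all deaths; `shares` is a helper name introduced here, and the dictionaries are copied from the `one_trend(9)` and `one_trend(10)` results.

```python
def shares(counts):
    """Convert {category: count} into {category: percent of total}, one decimal place."""
    total = sum(counts.values())
    return {key: round(100 * value / total, 1) for key, value in counts.items()}

# Copied from the one_trend(9) and one_trend(10) outputs above.
place_counts = {'Farm': 470, 'Home': 60486, 'Industrial/construction': 248, 'NA': 1384,
                'Other specified': 13751, 'Other unspecified': 8867,
                'Residential institution': 203, 'School/instiution': 671,
                'Sports': 128, 'Street': 11151, 'Trade/service area': 3439}
education_counts = {'1': 21823, '2': 42927, '3': 21680, '4': 12946, '5': 1369, 'NA': 53}

place_shares = shares(place_counts)
education_shares = shares(education_counts)
print(place_shares['Home'], education_shares['2'])
```

Home accounts for about 60% of deaths, while education code 2 is the largest group at about 43%, so it is a plurality rather than a majority.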

Question 5: What about female gun deaths? 

In [40]:
new_trends(5,3,"F")

{'Accidental': 218, 'Homicide': 5373, 'Suicide': 8689, 'Undetermined': 169}

In [41]:
new_trends(5,4,"F")

{'0': 14386, '1': 63}

In [42]:
new_trends(5,6,"F")

{'0': 14,
 '1': 21,
 '10': 19,
 '101': 1,
 '11': 21,
 '12': 31,
 '13': 52,
 '14': 65,
 '15': 78,
 '16': 114,
 '17': 128,
 '18': 173,
 '19': 201,
 '2': 16,
 '20': 233,
 '21': 273,
 '22': 292,
 '23': 240,
 '24': 306,
 '25': 250,
 '26': 248,
 '27': 249,
 '28': 260,
 '29': 254,
 '3': 18,
 '30': 265,
 '31': 262,
 '32': 256,
 '33': 235,
 '34': 246,
 '35': 250,
 '36': 206,
 '37': 212,
 '38': 231,
 '39': 223,
 '4': 20,
 '40': 260,
 '41': 278,
 '42': 251,
 '43': 281,
 '44': 270,
 '45': 256,
 '46': 271,
 '47': 297,
 '48': 340,
 '49': 328,
 '5': 24,
 '50': 325,
 '51': 333,
 '52': 303,
 '53': 324,
 '54': 289,
 '55': 286,
 '56': 290,
 '57': 260,
 '58': 238,
 '59': 232,
 '6': 31,
 '60': 187,
 '61': 172,
 '62': 170,
 '63': 141,
 '64': 192,
 '65': 158,
 '66': 151,
 '67': 116,
 '68': 120,
 '69': 132,
 '7': 12,
 '70': 93,
 '71': 109,
 '72': 80,
 '73': 73,
 '74': 59,
 '75': 66,
 '76': 61,
 '77': 50,
 '78': 62,
 '79': 50,
 '8': 13,
 '80': 59,
 '81': 55,
 '82': 44,
 '83': 53,
 '84': 38,
 '85': 36,
 '86': 2

In [43]:
new_trends(5,7,"F")

{'Asian/Pacific Islander': 244,
 'Black': 2318,
 'Hispanic': 1073,
 'Native American/Native Alaskan': 127,
 'White': 10687}

In [45]:
new_trends(5,9,"F")

{'Farm': 50,
 'Home': 10274,
 'Industrial/construction': 20,
 'NA': 63,
 'Other specified': 1442,
 'Other unspecified': 1202,
 'Residential institution': 40,
 'School/instiution': 110,
 'Sports': 12,
 'Street': 840,
 'Trade/service area': 396}

In [46]:
new_trends(5,10,"F")

{'1': 2379, '2': 5588, '3': 3930, '4': 2346, '5': 187, 'NA': 19}

Observation: Looking at all female gun deaths, most are suicides (8,689 of 14,449), and the number of deaths is fairly constant between ages 20 and 55.