# Gun Deaths in the US

In this notebook I explore demographic data on gun deaths in the US, breaking them down by gender, race, date, and intent (for example, accidental vs. homicide) to look for meaningful trends.

### Explanation of variables in guns.csv (column index is 1 less than the list number)

1. Row number
2. Year
3. Month (1-12)
4. Intent (Suicide, Accidental, NA, Homicide, Undetermined)
5. Police
6. Sex (M/F)
7. Age (to one decimal)
8. Race (Asian/Pacific Islander, Native American/Native Alaskan, Black, Hispanic, or White)
9. Hispanic
10. Place (Home, Street, Other specified, etc.)
11. Education (1: less than HS, 2: graduated HS, 3: some college, 4: graduated college, 5: not available)

In [1]:
import csv
import datetime

with open("guns.csv","r") as input_file:
    data = list(csv.reader(input_file))

In [2]:
headers = data[0]
data = data[1:]

In [3]:
def count_deaths(index, dataset):
    # Count how many rows have each distinct value in column `index`
    death_count = {}
    for each in dataset:
        var = each[index]
        if var in death_count:
            death_count[var] += 1
        else:
            death_count[var] = 1
    return death_count

# Column 1 holds the year
year_counts = count_deaths(1, data)

print(year_counts)

{'2012': 33563, '2013': 33636, '2014': 33599}
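
The same tally can also be written with `collections.Counter` from the standard library. The sketch below runs on a few made-up rows (`sample_rows` is hypothetical, not taken from guns.csv), and `count_deaths_counter` is an illustrative name rather than part of this notebook.

```python
from collections import Counter

def count_deaths_counter(index, dataset):
    """Tally how often each value appears in column `index` of `dataset`."""
    return Counter(row[index] for row in dataset)

# Hypothetical rows in the guns.csv column order (row number, year, month, intent)
sample_rows = [
    ["1", "2012", "1", "Suicide"],
    ["2", "2012", "2", "Homicide"],
    ["3", "2013", "2", "Suicide"],
]

sample_year_counts = count_deaths_counter(1, sample_rows)
print(dict(sample_year_counts))
```

A `Counter` behaves like a dictionary, so it should work as a drop-in for the hand-written version wherever the counts are only read.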


In [4]:
dates = [datetime.datetime(year=int(each[1]),month=int(each[2]),day=1) for each in data]

date_counts = {}
for each in dates:
    if each in date_counts:
        date_counts[each] += 1
    else:
        date_counts[each] = 1

        
for date in date_counts:
    print(date.strftime("%b %Y") + ": %d" % (date_counts[date]))

Jan 2012: 2758
Feb 2012: 2357
Mar 2012: 2743
Apr 2012: 2795
May 2012: 2999
Jun 2012: 2826
Jul 2012: 3026
Aug 2012: 2954
Sep 2012: 2852
Oct 2012: 2733
Nov 2012: 2729
Dec 2012: 2791
Jan 2013: 2864
Feb 2013: 2375
Mar 2013: 2862
Apr 2013: 2798
May 2013: 2806
Jun 2013: 2920
Jul 2013: 3079
Aug 2013: 2859
Sep 2013: 2742
Oct 2013: 2808
Nov 2013: 2758
Dec 2013: 2765
Jan 2014: 2651
Feb 2014: 2361
Mar 2014: 2684
Apr 2014: 2862
May 2014: 2864
Jun 2014: 2931
Jul 2014: 2884
Aug 2014: 2970
Sep 2014: 2914
Oct 2014: 2865
Nov 2014: 2756
Dec 2014: 2857


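**Monthly totals stay within a fairly narrow band (about 2,350 to 3,080 deaths), with no strong trend across the three years.** February is the lowest month in every year, at least partly because it is the shortest, and the yearly peak falls in July or August.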
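
Because months differ in length, one way to compare them more fairly is deaths per day. A minimal sketch, using three of the monthly totals printed above and `calendar.monthrange` for the number of days; `deaths_per_day` and `monthly_totals` are illustrative names, not part of this notebook.

```python
import calendar

def deaths_per_day(year, month, deaths):
    """Divide a monthly death count by the number of days in that month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return deaths / days_in_month

# Monthly totals copied from the output above
monthly_totals = {(2012, 1): 2758, (2012, 2): 2357, (2012, 3): 2743}

for (year, month), deaths in monthly_totals.items():
    print("%d-%02d: %.1f deaths per day" % (year, month, deaths_per_day(year, month, deaths)))
```

On these three months February still comes out lower per day (about 81 versus about 89), so its shorter length seems to explain only part of the dip.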

# Gun deaths by race

Below I explore gun deaths by race. First I count the absolute number of gun deaths for each race, then I convert those counts into death rates per 100k people. Finally I stratify by intent to obtain the homicide and accidental death rates per 100k.

In [5]:
sex_counts = count_deaths(5,data)
race_counts = count_deaths(7,data)
print(race_counts)
print(sex_counts)

{'Asian/Pacific Islander': 1326, 'White': 66237, 'Native American/Native Alaskan': 917, 'Black': 23296, 'Hispanic': 9022}
{'M': 86349, 'F': 14449}


Men die by gun much more frequently than women. We also have the total number of gun deaths for each race, but these totals are hard to interpret without converting them into death rates per 100k people. In the next cells we'll use demographic data from the US census to convert death counts by race into rates.

In [6]:
with open("census.csv","r") as input_file:
    census = list(csv.reader(input_file))

The census data is a table with only **2 rows** (a header row and a single data row), shown below

<table class="dataframe">
<thead>
<tr>
<th>Id</th>
<th>Year</th>
<th>Id.1</th>
<th>Sex</th>
<th>Id.2</th>
<th>Hispanic Origin</th>
<th>Id.3</th>
<th>Id2</th>
<th>Geography</th>
<th>Total</th>
<th>Race Alone - White</th>
<th>Race Alone - Hispanic</th>
<th>Race Alone - Black or African American</th>
<th>Race Alone - American Indian and Alaska Native</th>
<th>Race Alone - Asian</th>
<th>Race Alone - Native Hawaiian and Other Pacific Islander</th>
<th>Two or More Races</th>
</tr>
</thead>
<tbody>
<tr>
<td>cen42010</td>
<td>April 1, 2010 Census</td>
<td>totsex</td>
<td>Both Sexes</td>
<td>tothisp</td>
<td>Total</td>
<td>0100000US</td>
<td>NaN</td>
<td>United States</td>
<td>308745538</td>
<td>197318956</td>
<td>44618105</td>
<td>40250635</td>
<td>3739506</td>
<td>15159516</td>
<td>674625</td>
<td>6984195</td>
</tr>
</tbody>
</table>

In [7]:
num_white = int(census[1][10])
num_asian = int(census[1][14]) + int(census[1][15])
num_hispanic = int(census[1][11])
num_black = int(census[1][12])
num_indian = int(census[1][13])

mapping = {
    "Asian/Pacific Islander": num_asian,
    "Black": num_black,
    "Native American/Native Alaskan": num_indian,
    "Hispanic": num_hispanic,
    "White": num_white
}
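
Positional indices such as `census[1][10]` are easy to get wrong. As an alternative sketch, `csv.DictReader` lets columns be looked up by header name. To stay self-contained, the example reads a two-line excerpt of the table above from an in-memory string; `census_excerpt` is an illustrative stand-in for census.csv with only three of its columns.

```python
import csv
import io

# Excerpt of the census table shown above (three columns only)
census_excerpt = (
    "Total,Race Alone - White,Race Alone - Black or African American\n"
    "308745538,197318956,40250635\n"
)

reader = csv.DictReader(io.StringIO(census_excerpt))
census_row = next(reader)  # the single data row, keyed by header name

num_white_by_name = int(census_row["Race Alone - White"])
num_black_by_name = int(census_row["Race Alone - Black or African American"])
print(num_white_by_name, num_black_by_name)
```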

In [8]:
race_rates_per_hundredk = {}

for race in race_counts:
    rate = race_counts[race]/mapping[race] * 100000
    race_rates_per_hundredk[race] = rate

def display_rates(title, rates):
    print(title + ":")
    for each in rates:
        print(each + ": %.1f" % (rates[each]) + "/100,000")
        
display_rates("Rates of gun deaths by race",race_rates_per_hundredk)

Rates of gun deaths by race:
Asian/Pacific Islander: 8.4/100,000
White: 33.6/100,000
Native American/Native Alaskan: 24.5/100,000
Black: 57.9/100,000
Hispanic: 20.2/100,000


These are rates of gun deaths per 100k people, accumulated over the three years 2012-2014 and based on 2010 census populations. They cover all intents and do not isolate gun *homicides*; that is done below by filtering on intent.
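
Since the death counts span three years, an approximate annual rate is the cumulative rate divided by the number of years covered. A minimal sketch for the White figures shown above; `rate_per_100k` and `years_covered` are illustrative names, and using one 2010 population for all three years is a simplification.

```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 people for the whole period covered by `deaths`."""
    return deaths / population * 100000

years_covered = 3  # 2012, 2013 and 2014

# White death count and 2010 census population, as shown above
cumulative_rate = rate_per_100k(66237, 197318956)
annual_rate = cumulative_rate / years_covered

print("Cumulative: %.1f/100,000" % cumulative_rate)
print("Approximate annual: %.1f/100,000" % annual_rate)
```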

# Homicide gun death rates by race

* Blacks die by gun homicide at >10x the rate of whites
* Hispanics die at nearly 3x the rate of whites
* Native Americans/Native Alaskans die at about 2x the rate of whites

In [9]:
intents = [each[3] for each in data]
races = [each[7] for each in data]

def get_death_counts(intent, demographic):
    # Count deaths with the given intent, grouped by the values in `demographic`.
    # `demographic` must be row-aligned with the global `intents` list.
    counts = {}

    for i, value in enumerate(demographic):
        if intents[i] == intent:
            if value in counts:
                counts[value] += 1
            else:
                counts[value] = 1
    return counts

homicide_race_counts = get_death_counts("Homicide",races)
homicide_rates_by_race = {}

for each in homicide_race_counts:
    homicide_rates_by_race[each] = homicide_race_counts[each]/mapping[each]*100000

display_rates("Rates of gun death homicides by race",homicide_rates_by_race)

Rates of gun death homicides by race:
White: 4.6/100,000
Asian/Pacific Islander: 3.5/100,000
Black: 48.5/100,000
Native American/Native Alaskan: 8.7/100,000
Hispanic: 12.6/100,000
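
The "times the rate of whites" comparisons can be checked by dividing each rate by the White rate. A small sketch using the rounded rates printed above; `rate_ratios` and `printed_homicide_rates` are illustrative names, and working from rounded figures makes the ratios approximate.

```python
def rate_ratios(rates, reference):
    """Express each group's rate as a multiple of the reference group's rate."""
    base = rates[reference]
    return {group: rate / base for group, rate in rates.items()}

# Rounded homicide rates copied from the output above
printed_homicide_rates = {
    "White": 4.6,
    "Asian/Pacific Islander": 3.5,
    "Black": 48.5,
    "Native American/Native Alaskan": 8.7,
    "Hispanic": 12.6,
}

ratios = rate_ratios(printed_homicide_rates, "White")
for group, ratio in ratios.items():
    print("%s: %.1fx" % (group, ratio))
```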


# Accidental gun death rates by race

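* Accidental death rates are all below 1 per 100k, so absolute differences between races are small, although the rounded rates still range from 0.1 (Asian/Pacific Islander) to 0.8 (Black)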

In [10]:
accident_race_counts = get_death_counts("Accidental",races)
accident_race_rates = {}

for each in accident_race_counts:
    accident_race_rates[each] = accident_race_counts[each]/mapping[each]* 100000
    
display_rates("Rates of gun death accidents by race",accident_race_rates)

Rates of gun death accidents by race:
White: 0.6/100,000
Black: 0.8/100,000
Hispanic: 0.3/100,000
Asian/Pacific Islander: 0.1/100,000
Native American/Native Alaskan: 0.6/100,000


# Rates by gender

In [11]:
# Estimate the number of people of each gender in the US,
# assuming an even male/female split of the total population
total_people = int(census[1][9])
each_gender = total_people/2

In [12]:
#sex_counts was already obtained above by calling count_deaths(5,data)

print("Total death counts by gender:")
for each in sex_counts:
    print(each + ": %d" % (sex_counts[each]))


Total death counts by gender:
M: 86349
F: 14449


In [13]:
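def get_rates_by_gender(intent):
    # Count deaths of the given intent for each sex (column 5)
    counts = {}
    gender = [each[5] for each in data]
    for i, value in enumerate(gender):
        if intents[i] == intent:
            if value in counts:
                counts[value] += 1
            else:
                counts[value] = 1
    # Convert counts to rates per 100k; iterate over the counted sexes only,
    # not over every row, to avoid redundant work and missing-key errors
    rates = {}
    for each in counts:
        rates[each] = counts[each] / each_gender * 100000
    return rates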

In [14]:
display_rates("Rates of gun death homicides by gender",get_rates_by_gender("Homicide"))

Rates of gun death homicides by gender:
M: 19.3/100,000
F: 3.5/100,000


In [15]:
display_rates("Rates of gun death accidents by gender",get_rates_by_gender("Accidental"))

Rates of gun death accidents by gender:
M: 0.9/100,000
F: 0.1/100,000


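Men die by gun homicide at about ***5.5 times*** the rate of women (19.3 vs 3.5 per 100k), and by gun accident at roughly ***9 times*** the rate (0.9 vs 0.1 per 100k, although rounding makes this ratio imprecise). Across all intents, male deaths outnumber female deaths about 6 to 1 (86,349 vs 14,449).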

# Gun deaths by month

In [16]:
month_counts = {}

# Column 2 holds the month; convert it to int once so it can be reused below
months = [int(each[2]) for each in data]
for month in months:
    if month in month_counts:
        month_counts[month] += 1
    else:
        month_counts[month] = 1


print("Total deaths by month")
for each in month_counts:
    print ("%d: %d" % (each,month_counts[each]))

Total deaths by month
1: 8273
2: 7093
3: 8289
4: 8455
5: 8669
6: 8677
7: 8989
8: 8783
9: 8508
10: 8406
11: 8243
12: 8413


Total gun deaths vary only modestly by month. February has noticeably fewer (7,093), at least partly because it is the shortest month, and July has the most (8,989). Let's look at homicides and accidents specifically.

In [17]:
accidents_by_month = get_death_counts("Accidental",months)

print("Total accidental deaths by month:")
for each in accidents_by_month:
    print("%d: %d" % (each, accidents_by_month[each]))

Total accidental deaths by month:
2: 127
7: 149
8: 164
9: 118
12: 179
1: 152
3: 134
4: 98
5: 115
11: 160
10: 131
6: 112


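No clear seasonal pattern: counts range from 98 (April) to 179 (December), and with numbers this small the month-to-month variation may simply be noise.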
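
The months above print in the order they were first encountered in the data, which makes patterns harder to spot. A small sketch that lists a month-keyed dictionary in calendar order with month names; the counts are the January to March values copied from the output above, and `format_by_month` is an illustrative helper. Note that `calendar.month_abbr` follows the current locale, so the names assume the default English one.

```python
import calendar

def format_by_month(counts):
    """Return 'Mon: count' lines sorted by month number."""
    return ["%s: %d" % (calendar.month_abbr[month], counts[month])
            for month in sorted(counts)]

# Accidental death counts for January to March, copied from the output above
first_quarter_accidents = {2: 127, 1: 152, 3: 134}

for line in format_by_month(first_quarter_accidents):
    print(line)
```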

In [18]:
homicides_by_month = get_death_counts("Homicide",months)

print("Total homicide deaths by month:")
for each in homicides_by_month:
    print("%d: %d" % (each, homicides_by_month[each]))

Total homicide deaths by month:
3: 2780
4: 2845
6: 3130
7: 3269
8: 3125
9: 2966
10: 2968
12: 3191
1: 2829
2: 2178
5: 2976
11: 2919


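Homicides are highest in June through August (3,130, 3,269 and 3,125) and in December (3,191), which suggests a slight summer spike. February is lowest (2,178).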