# Gun Deaths

This notebook explores deaths caused by firearms in the United States between 2012 and 2014.

We compare the data by race and sex, as well as other criteria.

Charts and diagrams will come in later notebooks with Pandas/NumPy/Matplotlib.  This notebook is purely meant to show how to summarize data in plain Python code.

Throughout this document, we refer to "deaths by gun" simply as deaths to avoid repetition.

(I make no judgements about the circumstances of these deaths other than to say that any preventable, premature death is sad.)

## Loading and Counting

In [12]:
import csv

# Read the whole CSV into a list of rows; the with-block closes the file.
with open('guns.csv', 'r', newline='') as f:
    data = list(csv.reader(f))

The first order of business is to find out what data we have available.  The headers of the CSV should provide an overview of what we may analyze.

In [13]:
headers = data[0]
data = data[1:]
headers

['',
 'year',
 'month',
 'intent',
 'police',
 'sex',
 'age',
 'race',
 'hispanic',
 'place',
 'education']

The headers list the fields recorded for each death.  These are the criteria by which we can analyze the records.
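As a small aside, the header row can also be used to address fields by name rather than by position. The sketch below uses `csv.DictReader` from the standard library on a tiny in-memory sample. The two sample rows are made up for illustration and are not taken from `guns.csv`.

```python
import csv
import io

# Hypothetical two-row sample with the same header layout as guns.csv.
sample = io.StringIO(
    ',year,month,intent,police,sex,age,race,hispanic,place,education\n'
    '1,2012,01,Suicide,0,M,34,White,100,Home,4\n'
    '2,2013,02,Homicide,0,F,21,Black,100,Street,2\n'
)

# DictReader consumes the first row as field names, so each record
# becomes a dict keyed by header instead of a positional list.
records = list(csv.DictReader(sample))

first = records[0]
print(first['year'], first['intent'], first['race'])
```

The rest of this notebook keeps the positional indexing (for example `gun_death[1]` for the year), which works equally well for a file with a fixed column order.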

Let's first examine if the total deaths increased or decreased during these years.

In [14]:
years = [gun_death[1] for gun_death in data]
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}

The number of deaths by gun didn't change much per year between 2012 and 2014.

From here, we will summarize much of the data using straight Python (without pandas).
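The counting loop above is repeated for several columns below. As an alternative that still avoids pandas, the standard library's `collections.Counter` performs the same tally in one step. The helper name `count_column` and the sample rows are illustrative assumptions, not part of the original notebook.

```python
from collections import Counter

def count_column(rows, index):
    """Tally how often each value appears in the given column."""
    return Counter(row[index] for row in rows)

# Hypothetical rows shaped like the records in `data` (year is column 1).
sample_rows = [
    ['1', '2012', '01'],
    ['2', '2012', '02'],
    ['3', '2013', '01'],
]

year_counts_sample = count_column(sample_rows, 1)
print(dict(year_counts_sample))
```

A `Counter` behaves like a dict but returns 0 for missing keys, so the explicit `if ... else` branch is not needed.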

In [15]:
import datetime
dates = [datetime.datetime(year=int(gun_death[1]), month=int(gun_death[2]), day=1) for gun_death in data]

In [16]:
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1

In [17]:
sex_counts = {}
sexes = [gun_death[5] for gun_death in data]
for sex in sexes:
    if sex in sex_counts:
        sex_counts[sex] += 1
    else:
        sex_counts[sex] = 1
sex_counts

{'F': 14449, 'M': 86349}

In [18]:
race_counts = {}
races = [gun_death[7] for gun_death in data]
for race in races:
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

In [19]:
intent_counts = {}
intentions = [gun_death[3] for gun_death in data]
for intent in intentions:
    if intent in intent_counts:
        intent_counts[intent] += 1
    else:
        intent_counts[intent] = 1
intent_counts

{'Accidental': 1639,
 'Homicide': 35176,
 'NA': 1,
 'Suicide': 63175,
 'Undetermined': 807}

## Analysis

This section is where we start our real analysis.

### Deaths by Month

Is there a particular time of year when deaths are more prevalent?

Let's examine the average number of deaths per day for each month.  This should give us a good indication and account for the different lengths of each month.

In [None]:
import calendar

deaths_per_day = {}
for date in dates:
    if date in deaths_per_day:
        deaths_per_day[date] += 1
    else:
        deaths_per_day[date] = 1
# Divide each monthly total by the actual number of days in that month;
# calendar.monthrange accounts for leap years (February 2012 has 29 days).
for key in deaths_per_day:
    days_in_month = calendar.monthrange(key.year, key.month)[1]
    deaths_per_day[key] = deaths_per_day[key] / days_in_month
deaths_per_day

{datetime.datetime(2012, 1, 1, 0, 0): 88.96774193548387,
 datetime.datetime(2012, 2, 1, 0, 0): 81.27586206896552,
 datetime.datetime(2012, 3, 1, 0, 0): 88.48387096774194,
 datetime.datetime(2012, 4, 1, 0, 0): 93.16666666666667,
 datetime.datetime(2012, 5, 1, 0, 0): 96.74193548387096,
 datetime.datetime(2012, 6, 1, 0, 0): 94.2,
 datetime.datetime(2012, 7, 1, 0, 0): 97.61290322580645,
 datetime.datetime(2012, 8, 1, 0, 0): 95.29032258064517,
 datetime.datetime(2012, 9, 1, 0, 0): 95.06666666666666,
 datetime.datetime(2012, 10, 1, 0, 0): 88.16129032258064,
 datetime.datetime(2012, 11, 1, 0, 0): 90.96666666666667,
 datetime.datetime(2012, 12, 1, 0, 0): 90.03225806451613,
 datetime.datetime(2013, 1, 1, 0, 0): 92.38709677419355,
 datetime.datetime(2013, 2, 1, 0, 0): 84.82142857142857,
 datetime.datetime(2013, 3, 1, 0, 0): 92.3225806451613,
 datetime.datetime(2013, 4, 1, 0, 0): 93.26666666666667,
 datetime.datetime(2013, 5, 1, 0, 0): 90.51612903225806,
 datetime.datetime(2013, 6, 1, 0, 0): 97.33333333333333

We see from the data that February consistently has the fewest deaths per day.

The summer months (June, July, and August) have the highest rates.

My preliminary hypothesis is that this is related to weather, or more precisely temperature.  People who spend more time outside are perhaps more exposed to death by firearm.

This doesn't really make sense for suicides though, which make up a large portion of deaths.  One would expect suicides to be more prevalent in winter, when the season tends to make people sad.
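One way to check the seasonal pattern across all three years is to pool each calendar month over the years and divide by the true number of days, including leap days. The sketch below does this with `calendar.monthrange`. The function name `deaths_per_day_by_month` and the sample counts are assumptions for illustration only.

```python
import calendar
import datetime
from collections import defaultdict

def deaths_per_day_by_month(monthly_counts):
    """Pool counts keyed by first-of-month datetimes into calendar months.

    Returns {month_number: total deaths / total days} across all years.
    """
    deaths = defaultdict(int)
    days = defaultdict(int)
    for date, count in monthly_counts.items():
        deaths[date.month] += count
        # monthrange returns (weekday of first day, number of days in month)
        days[date.month] += calendar.monthrange(date.year, date.month)[1]
    return {month: deaths[month] / days[month] for month in deaths}

# Hypothetical monthly totals, not taken from guns.csv.
sample_counts = {
    datetime.datetime(2012, 2, 1): 2900,  # leap year: 29 days
    datetime.datetime(2013, 2, 1): 2800,  # 28 days
    datetime.datetime(2012, 7, 1): 3100,  # 31 days
}
pooled = deaths_per_day_by_month(sample_counts)
print(pooled)
```

In this notebook, `date_counts` could be passed in directly, since it is keyed by first-of-month `datetime` objects in the same way.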

### Deaths by Type per Race

How does each race primarily perish?

In [None]:
intent_by_race_count = {}
intentions_by_race = [(gun_death[3], gun_death[7]) for gun_death in data]
for intent in intentions_by_race:
    if intent in intent_by_race_count:
        intent_by_race_count[intent] += 1
    else:
        intent_by_race_count[intent] = 1
for kv in intent_by_race_count:
    intent_by_race_count[kv] = intent_by_race_count[kv] / race_counts[kv[1]]
intent_by_race_count

Some alarming statistics about suicides:
* Suicide accounts for 83% of deaths for Whites
* Suicide also accounts for more than half of deaths for 'Asian/Pacific Islander' and 'Native American/Native Alaskan' races

About homicides:
* Homicide accounts for 84% of deaths for Blacks
* Homicide accounts for 62% of deaths for Hispanics
* Homicide accounts for 35% of deaths for 'Native American/Native Alaskan'

No group perishes mostly by accident, nor are most deaths undetermined.  For every race, suicide and homicide together account for over 95% of deaths.
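The percentages above are each intent's share of a race's total deaths. A compact way to compute such shares, sketched here with `collections.Counter`, is to count (intent, race) pairs and divide by the per-race totals. The function name `intent_shares` and the sample pairs are hypothetical.

```python
from collections import Counter

def intent_shares(pairs):
    """Return {(intent, race): fraction of that race's deaths}."""
    pair_counts = Counter(pairs)
    race_totals = Counter(race for _, race in pairs)
    return {
        (intent, race): count / race_totals[race]
        for (intent, race), count in pair_counts.items()
    }

# Hypothetical (intent, race) pairs, not real records.
sample_pairs = [
    ('Suicide', 'White'), ('Suicide', 'White'),
    ('Suicide', 'White'), ('Homicide', 'White'),
    ('Homicide', 'Black'), ('Suicide', 'Black'),
]
shares = intent_shares(sample_pairs)
print(shares[('Suicide', 'White')])  # share of the 4 sample White records
```

Within each race the shares sum to 1, which gives a quick sanity check on the result.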

### Deaths per 100k by Race

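Which race has the most deaths per capita?  Here we measure the number of deaths per 100k people of each race.  Note that the death counts cover all three years, so these are three-year totals per 100k rather than annual rates.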

We take the total population by race from the 2010 census.

In [25]:
# Read the census file; the with-block closes it when done.
with open('census.csv', 'r', newline='') as c:
    census = list(csv.reader(c))
mapping = {
    'Asian/Pacific Islander': int(census[1][14]) + int(census[1][15]),
    'Black': int(census[1][12]),
    'Native American/Native Alaskan': int(census[1][13]),
    'Hispanic': int(census[1][11]),
    'White': int(census[1][10])
}
race_per_hundredk = {}
for race in race_counts:
    race_per_hundredk[race] = race_counts[race] / mapping[race] * 100000
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

From this analysis, we see that Blacks have by far the most deaths per capita, followed by Whites, Native Americans, Hispanics, and Asian/Pacific Islanders.

The rate for the highest group, Blacks, is about 72% higher than that of the second highest, Whites.

It might be worth exploring why there is such a discrepancy between the races.  Is it cultural or social, or is it coincidence?
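For reference, the rate calculation and the relative gap between two groups can be written as small helpers. Because the death counts span three years while the population is a single census figure, an annualized rate would also divide by the number of years. The function names and the population figure below are assumptions for illustration. The two rates passed to `relative_gap` are copied from the output above.

```python
def rate_per_100k(deaths, population, years=1):
    """Deaths per 100,000 people, optionally averaged per year."""
    return deaths / population / years * 100000

def relative_gap(higher, lower):
    """How much larger `higher` is than `lower`, as a fraction of `lower`."""
    return (higher - lower) / lower

# Hypothetical population of 1,000,000 with 300 deaths over three years.
print(rate_per_100k(300, 1_000_000))           # rate over the whole period
print(rate_per_100k(300, 1_000_000, years=3))  # annualized rate

# Rates for Black and White taken from the notebook output above.
gap = relative_gap(57.8773477735196, 33.56849303419181)
print(round(gap, 2))
```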

### Deaths per 100k by Race and Intent

Here we combine analysis of the number of deaths per 100k by race and the number of deaths by race and intent.

In [27]:
# Reuse race_per_hundredk and intent_by_race_count from the cells above
# instead of re-reading census.csv.
intent_race_100k = {}
for intent_race in intent_by_race_count:
    intent = intent_race[0]
    race = intent_race[1]
    intent_race_100k[intent_race] = intent_by_race_count[intent_race] * race_per_hundredk[race]
intent_race_100k

{('Accidental', 'Asian/Pacific Islander'): 0.07578560782046845,
 ('Accidental', 'Black'): 0.8148939762068351,
 ('Accidental', 'Hispanic'): 0.32498018461339856,
 ('Accidental', 'Native American/Native Alaskan'): 0.5883130017708221,
 ('Accidental', 'White'): 0.5736904466492312,
 ('Homicide', 'Asian/Pacific Islander'): 3.5303462309701548,
 ('Homicide', 'Black'): 48.47128498718095,
 ('Homicide', 'Hispanic'): 12.627161104219914,
 ('Homicide', 'Native American/Native Alaskan'): 8.717729026240365,
 ('Homicide', 'White'): 4.635641798145334,
 ('NA', 'White'): 0.00050679368078554,
 ('Suicide', 'Asian/Pacific Islander'): 4.705023152187415,
 ('Suicide', 'Black'): 8.278130270491385,
 ('Suicide', 'Hispanic'): 7.10698045109715,
 ('Suicide', 'Native American/Native Alaskan'): 14.841532544673013,
 ('Suicide', 'White'): 28.06217969245692,
 ('Undetermined', 'Asian/Pacific Islander'): 0.0631546731837237,
 ('Undetermined', 'Black'): 0.31303853964043055,
 ('Undetermined', 'Hispanic'): 0.16136947098044618,
 

To be honest, this analysis isn't particularly revelatory, since we already have the deaths per 100k and the share of deaths by intent for each race.  It is still instructive, because it shows that we can take existing results and recombine them into new views of the data as needed.
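The reason the two earlier results can simply be multiplied is that the race totals cancel: (intent count / race count) × (race count / population × 100000) equals intent count / population × 100000. A minimal check of that identity, using made-up numbers rather than the census figures:

```python
import math

# Hypothetical figures for a single race.
intent_count = 250      # deaths with one particular intent
race_count = 1000       # all deaths for that race
population = 2_000_000  # population of that race

share = intent_count / race_count            # fraction of deaths by intent
rate = race_count / population * 100000      # deaths per 100k for the race
combined = share * rate                      # product of the two results
direct = intent_count / population * 100000  # computed straight from counts

print(combined, direct)
assert math.isclose(combined, direct)
```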

## Conclusion

The general conclusion is that homicide and suicide are the primary causes of death by firearms.

Whites primarily perish by suicide, whereas Blacks perish primarily by homicide.  Blacks in particular suffer far more deaths per capita by homicide than any other race, nearly 4x the next highest race, Hispanics.

Data such as this can reveal much about where social efforts may be applied to maximum effect.
