# Gun Deaths

This notebook explores deaths caused by firearms in the United States between 2012 and 2014.

We examine the data as compared by race and sex, as well as other criteria.

Charts and diagrams will come in later notebooks with pandas, NumPy, and Matplotlib.  This notebook purely shows the ability to summarize data within plain Python code.

Throughout this document, we refer to "deaths by gun" simply as deaths to avoid repetition.

## Purpose

This notebook is meant to show proficiency in Python and a little bit of analysis.  No plots or diagrams are generated, as that is not yet part of the curriculum.  Libraries such as `NumPy` and `pandas` are not used for the same reason.

There are two sources of data used for this notebook:
* `guns.csv` - a collection of statistics about various deaths caused by firearms
* `census.csv` - a collection of statistics from the 2010 census

The structure is as follows:
* Loading and counting the various dimensions of data within the `guns.csv` file
* Some analyses based on the data, particularly by date and race/intent

## Loading and Counting

In [1]:
import csv

# Use a context manager so the file is closed once the rows are read.
with open('guns.csv', 'r', newline='') as f:
    data = list(csv.reader(f))

The first order of business is to find out what data we have available.  The headers of the CSV should provide an overview of what we may analyze.

In [2]:
headers = data[0]
data = data[1:]
headers

['',
 'year',
 'month',
 'intent',
 'police',
 'sex',
 'age',
 'race',
 'hispanic',
 'place',
 'education']

Here we see the criteria by which we may analyze the various deaths as headers in the individual records.
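
Indexing rows by position (for example `gun_death[1]` for the year) is easy to get wrong. As a minimal sketch, the header row can be turned into a name-to-index lookup. The names `column_index` and `sample_row` are illustrative and not part of the original notebook, and the sample record is invented.

```python
# Header row as printed above; the first column is an unnamed row id.
headers = ['', 'year', 'month', 'intent', 'police', 'sex', 'age',
           'race', 'hispanic', 'place', 'education']

# Map each column name to its position within a row.
column_index = {name: position for position, name in enumerate(headers)}

# Hypothetical record, invented for illustration only.
sample_row = ['1', '2012', '01', 'Suicide', '0', 'M', '34',
              'Asian/Pacific Islander', '100', 'Home', '4']

year = sample_row[column_index['year']]
race = sample_row[column_index['race']]
print(year, race)
```

With such a lookup, `gun_death[column_index['intent']]` reads more clearly than `gun_death[3]`, though the cells in this notebook keep the positional form.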

Let's first examine which years we have available and if the total deaths increased or decreased during these years.

### By Year
Has the number of deaths increased or decreased year over year?

In [17]:
years = [gun_death[1] for gun_death in data]
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
diffs = [year_counts['2013'] - year_counts['2012'], year_counts['2014'] - year_counts['2013']]
print(year_counts)
print('Difference between 2012 and 2013 deaths: {} ({:.2%})'.format(diffs[0], diffs[0]/year_counts['2012']))
print('Difference between 2013 and 2014 deaths: {} ({:.2%})'.format(diffs[1], diffs[1]/year_counts['2013']))

{'2012': 33563, '2013': 33636, '2014': 33599}
Difference between 2012 and 2013 deaths: 73 (0.22%)
Difference between 2013 and 2014 deaths: -37 (-0.11%)


The data covers the years 2012 through 2014.  The number of deaths by gun changed very little from year to year: each year-over-year difference is well under one percent.

From here, we will summarize much of the data using straight Python (without pandas).
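
The counting cells that follow all share the same pattern: pull one column, tally its values, and print each tally as a share of the total. One possible way to factor that out with only the standard library is sketched here. The helpers `count_by_column` and `shares` are hypothetical names, and the rows are made up rather than read from `guns.csv`.

```python
from collections import Counter


def count_by_column(rows, index):
    """Count how many rows share each value found at position `index`."""
    return Counter(row[index] for row in rows)


def shares(counts):
    """Convert raw counts into fractions of the total."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


# Tiny made-up rows (id, year, month, intent); not taken from guns.csv.
rows = [
    ['1', '2012', '01', 'Suicide'],
    ['2', '2012', '02', 'Homicide'],
    ['3', '2013', '02', 'Suicide'],
    ['4', '2014', '07', 'Suicide'],
]

sample_counts = count_by_column(rows, 3)
for intent, share in shares(sample_counts).items():
    print('Intent: {} Deaths: {} ({:.2%})'.format(intent, sample_counts[intent], share))
```

The explicit loops in the cells below do the same work; the helper form just avoids repeating the if/else tally for each column.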

In [18]:
import datetime
dates = [datetime.datetime(year=int(gun_death[1]), month=int(gun_death[2]), day=1) for gun_death in data]

### Gender
The division of deaths by gender.

In [28]:
gender_counts = {}
genders = [gun_death[5] for gun_death in data]
gender_total = 0
for gender in genders:
    if gender in gender_counts:
        gender_counts[gender] += 1
    else:
        gender_counts[gender] = 1
    gender_total += 1
for gender in gender_counts:
    print('Gender: {} Deaths: {} ({:.2%})'.format(gender, gender_counts[gender], gender_counts[gender]/gender_total))

Gender: M Deaths: 86349 (85.67%)
Gender: F Deaths: 14449 (14.33%)


The deaths are predominantly men.  Only about 14% of the deaths were women.

### Race
The division of deaths by race.

In [30]:
race_counts = {}
races = [gun_death[7] for gun_death in data]
race_total = 0
for race in races:
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
    race_total += 1
for race in race_counts:
    print('Race: {} Deaths: {} ({:.2%})'.format(race, race_counts[race], race_counts[race]/race_total))

Race: Asian/Pacific Islander Deaths: 1326 (1.32%)
Race: White Deaths: 66237 (65.71%)
Race: Native American/Native Alaskan Deaths: 917 (0.91%)
Race: Black Deaths: 23296 (23.11%)
Race: Hispanic Deaths: 9022 (8.95%)


These counts are not normalized by the population of each race.  We will do that below.

### Intent
The `intent` column records how each death was classified, primarily as suicide, homicide, or accidental.

In [33]:
intent_counts = {}
intentions = [gun_death[3] for gun_death in data]
intent_total = 0
for intent in intentions:
    if intent in intent_counts:
        intent_counts[intent] += 1
    else:
        intent_counts[intent] = 1
    intent_total += 1
for intent in intent_counts:
    print('Intent: {} Deaths: {} ({:.2%})'.format(intent, intent_counts[intent], intent_counts[intent]/intent_total))

Intent: Suicide Deaths: 63175 (62.67%)
Intent: Undetermined Deaths: 807 (0.80%)
Intent: Accidental Deaths: 1639 (1.63%)
Intent: Homicide Deaths: 35176 (34.90%)
Intent: NA Deaths: 1 (0.00%)


We see that suicide is by far the most prevalent intent, followed by homicide.  Accidental and undetermined deaths together make up only about 2.4% of the deaths.

Unfortunately, the data doesn't describe how the determinations were made.  It would be a useful exercise to see how the conclusions for intent were reached.  However, it is unlikely that such determinations were wrong often enough to change the percentages significantly.  (There is probably a statistical analysis that could test whether this is the case, which I hope to learn later in the course.)
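
Short of a formal statistical test, one simple way to gauge how much the uncertain records could matter is to ask how far each share would move if every `Undetermined` and `NA` death actually belonged to a single category. The sketch below uses the counts printed above. The helper name `share_bounds` is hypothetical, and this is a worst-case bound, not the statistical analysis alluded to in the text.

```python
# Counts copied from the output of the intent cell above.
intent_counts = {
    'Suicide': 63175,
    'Undetermined': 807,
    'Accidental': 1639,
    'Homicide': 35176,
    'NA': 1,
}
total = sum(intent_counts.values())


def share_bounds(category, uncertain=('Undetermined', 'NA')):
    """Return the (low, high) share of `category`.

    low:  none of the uncertain deaths belong to the category.
    high: all of the uncertain deaths belong to the category.
    """
    known = intent_counts[category]
    unknown = sum(intent_counts[name] for name in uncertain)
    return known / total, (known + unknown) / total


for category in ('Suicide', 'Homicide', 'Accidental'):
    low, high = share_bounds(category)
    print('{:<10} between {:.2%} and {:.2%}'.format(category, low, high))
```

Even in this extreme case no share moves by more than about 0.8 percentage points. Note that this only covers records already flagged as uncertain; it says nothing about deaths misclassified between, say, suicide and homicide.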

## Analysis

For brevity, this analysis does not include all the dimensions we counted in the section above.  We concentrate on the time of year and deaths by race.

### Deaths by Month

Is there a particular time of year when deaths are more prevalent?

Let's examine the average number of deaths per day for each month.  This should give us a good indication and account for the different lengths of each month.

In [40]:
import calendar

deaths_per_day = {}
for date in dates:
    if date in deaths_per_day:
        deaths_per_day[date] += 1
    else:
        deaths_per_day[date] = 1
for key in deaths_per_day:
    # monthrange returns (weekday of the 1st, days in month) and handles leap years
    days_in_month = calendar.monthrange(key.year, key.month)[1]
    deaths_per_day[key] = deaths_per_day[key] / days_in_month
    print('Month: {} Number of deaths per day: {:.2f}'.format(key.strftime('%b, %Y'), deaths_per_day[key]))

Month: Jan, 2012 Number of deaths per day: 88.97
Month: Feb, 2012 Number of deaths per day: 81.28
Month: Mar, 2012 Number of deaths per day: 88.48
Month: Apr, 2012 Number of deaths per day: 93.17
Month: May, 2012 Number of deaths per day: 96.74
Month: Jun, 2012 Number of deaths per day: 94.20
Month: Jul, 2012 Number of deaths per day: 97.61
Month: Aug, 2012 Number of deaths per day: 95.29
Month: Sep, 2012 Number of deaths per day: 95.07
Month: Oct, 2012 Number of deaths per day: 88.16
Month: Nov, 2012 Number of deaths per day: 90.97
Month: Dec, 2012 Number of deaths per day: 90.03
Month: Jan, 2013 Number of deaths per day: 92.39
Month: Feb, 2013 Number of deaths per day: 84.82
Month: Mar, 2013 Number of deaths per day: 92.32
Month: Apr, 2013 Number of deaths per day: 93.27
Month: May, 2013 Number of deaths per day: 90.52
Month: Jun, 2013 Number of deaths per day: 97.33
Month: Jul, 2013 Number of deaths per day: 99.32
Month: Aug, 2013 Number of deaths per day: 92.23
Month: Sep, 2013 Num

We see from the data that February consistently has the fewest deaths per day.

The summer months (June, July, and August) tend to have the highest rates.
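
To look at seasonality rather than individual months, the per-day rate can be pooled across years for each calendar month. The following is a rough sketch under stated assumptions: the function name `daily_rate_by_calendar_month` is hypothetical, the counts are made up, and month lengths come from `calendar.monthrange` so that leap-year Februaries are counted correctly.

```python
import calendar
from collections import defaultdict


def daily_rate_by_calendar_month(monthly_counts):
    """Pool counts across years and return deaths per day for each month.

    `monthly_counts` maps (year, month) to a death count.
    """
    deaths = defaultdict(int)
    days = defaultdict(int)
    for (year, month), count in monthly_counts.items():
        deaths[month] += count
        # monthrange gives (weekday of the 1st, number of days in the month)
        days[month] += calendar.monthrange(year, month)[1]
    return {month: deaths[month] / days[month] for month in sorted(deaths)}


# Made-up counts for illustration; real values would come from guns.csv.
monthly_counts = {
    (2012, 2): 2900, (2013, 2): 2800,
    (2012, 7): 3100, (2013, 7): 3100,
}

rates = daily_rate_by_calendar_month(monthly_counts)
for month, rate in rates.items():
    print('{}: {:.2f} deaths per day'.format(calendar.month_abbr[month], rate))
```

In this invented example February and July end up with the same daily rate even though February has fewer total deaths, which is exactly the effect that dividing by the number of days is meant to remove.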

My preliminary hypothesis is that this is related to weather, or more precisely temperature.  Perhaps people are more likely to be killed when they spend more time outside.

This doesn't really make sense for suicides though, which make up a large portion of deaths.  One would expect suicides to be more prevalent in winter, when the season tends to make people sad.

There are other associations that could be tested here, but I will not be concentrating on these at this time.

### Deaths by Type per Race

How does each race primarily perish?  Here we measure the percentage of deaths per race by the intent/cause of the death.

In [53]:
intent_by_race_count = {}
intentions_by_race = [(gun_death[3], gun_death[7]) for gun_death in data]
for intent in intentions_by_race:
    if intent in intent_by_race_count:
        intent_by_race_count[intent] += 1
    else:
        intent_by_race_count[intent] = 1
for kv in intent_by_race_count:
    intent_by_race_count[kv] = intent_by_race_count[kv] / race_counts[kv[1]]
# intent_by_race_count
for kv in sorted(intent_by_race_count):
    print('Intent:{:<13}  Race: {:<30}  Percentage: {:.2%}'.format(kv[0], kv[1], intent_by_race_count[kv]))

Intent:Accidental     Race: Asian/Pacific Islander          Percentage: 0.90%
Intent:Accidental     Race: Black                           Percentage: 1.41%
Intent:Accidental     Race: Hispanic                        Percentage: 1.61%
Intent:Accidental     Race: Native American/Native Alaskan  Percentage: 2.40%
Intent:Accidental     Race: White                           Percentage: 1.71%
Intent:Homicide       Race: Asian/Pacific Islander          Percentage: 42.16%
Intent:Homicide       Race: Black                           Percentage: 83.75%
Intent:Homicide       Race: Hispanic                        Percentage: 62.45%
Intent:Homicide       Race: Native American/Native Alaskan  Percentage: 35.55%
Intent:Homicide       Race: White                           Percentage: 13.81%
Intent:NA             Race: White                           Percentage: 0.00%
Intent:Suicide        Race: Asian/Pacific Islander          Percentage: 56.18%
Intent:Suicide        Race: Black                         

Some alarming statistics about suicides:
* Suicide accounts for 84% of deaths for Whites
* Suicide also accounts for more than half of deaths for the 'Asian/Pacific Islander' and 'Native American/Native Alaskan' groups

Just as alarming about homicides:
* Homicide accounts for 84% of deaths for Blacks
* Homicide accounts for 62% of deaths for Hispanics
* Homicide accounts for 36% of deaths for 'Native American/Native Alaskan'

No group perishes mostly by accident, nor are most deaths undetermined.  Together, suicide and homicide account for over 95% of deaths for each race.
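
The same breakdown can also be organized as a nested table keyed by race, which makes it easy to check that each race's shares add up to 100% and to pick out the leading intent. This is only a sketch: the `(intent, race)` pairs are invented and the names `pair_counts`, `race_totals`, and `share_by_race` are not from the notebook.

```python
from collections import Counter, defaultdict

# Made-up (intent, race) pairs for illustration; not taken from guns.csv.
pairs = [
    ('Suicide', 'A'), ('Suicide', 'A'), ('Homicide', 'A'), ('Accidental', 'A'),
    ('Homicide', 'B'), ('Homicide', 'B'), ('Homicide', 'B'), ('Suicide', 'B'),
]

pair_counts = Counter(pairs)
race_totals = Counter(race for _, race in pairs)

# Nested table: race -> intent -> share of that race's deaths.
share_by_race = defaultdict(dict)
for (intent, race), count in pair_counts.items():
    share_by_race[race][intent] = count / race_totals[race]

for race, intent_shares in sorted(share_by_race.items()):
    # Sanity check: the shares within one race should sum to 1.
    assert abs(sum(intent_shares.values()) - 1.0) < 1e-9
    leading = max(intent_shares, key=intent_shares.get)
    print('Race: {}  Leading intent: {} ({:.0%})'.format(race, leading, intent_shares[leading]))
```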

### Deaths per 100,000 by Race

How many deaths are there for each race relative to its population?  Here we measure the number of deaths per 100k people of each race over the three years covered by the data.

We take the total population by race from the 2010 census, divide the number of deaths for each race by that population, and multiply by 100,000 to get the number of deaths per 100k.

This gives us a better understanding of the proportion of deaths within each group.
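
As a small worked example of the per-100k calculation, the helper below divides deaths by population and scales by 100,000. The function `per_100k` and its `years` parameter are assumptions added for illustration, and the numbers are hypothetical. The notebook's own figures are totals across 2012 to 2014, so they correspond to `years=1` here rather than to an annual rate.

```python
def per_100k(deaths, population, years=1):
    """Deaths per 100,000 people, optionally averaged over `years`."""
    return deaths / population * 100000 / years


# Hypothetical numbers: 600 deaths over three years in a population of 2 million.
three_year_rate = per_100k(600, 2000000)
annual_rate = per_100k(600, 2000000, years=3)

print('{:.2f} per 100k over three years'.format(three_year_rate))
print('{:.2f} per 100k per year'.format(annual_rate))
```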

In [57]:
# Use a context manager so the file is closed once the rows are read.
with open('census.csv', 'r', newline='') as c:
    census = list(csv.reader(c))
mapping = {
    'Asian/Pacific Islander': int(census[1][14]) + int(census[1][15]),
    'Black': int(census[1][12]),
    'Native American/Native Alaskan': int(census[1][13]),
    'Hispanic': int(census[1][11]),
    'White': int(census[1][10])
}
race_per_hundredk = {}
for race in race_counts:
    race_per_hundredk[race] = race_counts[race] / mapping[race] * 100000
    print('Race: {:<30}  Per capita (100k): {:.2f}'.format(race, race_per_hundredk[race]))

Race: Asian/Pacific Islander          Per capita (100k): 8.37
Race: White                           Per capita (100k): 33.57
Race: Native American/Native Alaskan  Per capita (100k): 24.52
Race: Black                           Per capita (100k): 57.88
Race: Hispanic                        Per capita (100k): 20.22


From this analysis, we see that there are by far more deaths per 100k for Blacks, followed by Whites, Native Americans, Hispanics, and Asian/Pacific Islanders.

The rate for the highest group, Blacks, is about 72% higher than the rate for the second highest group, Whites.
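
The size of the gap between the two highest groups can be checked directly from the rounded rates printed above. This is a minimal sketch; the variable names `rates`, `ranked`, and `relative_gap` are illustrative.

```python
# Rates per 100k copied from the output above.
rates = {
    'Asian/Pacific Islander': 8.37,
    'White': 33.57,
    'Native American/Native Alaskan': 24.52,
    'Black': 57.88,
    'Hispanic': 20.22,
}

# Rank groups from highest to lowest rate.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
(top_race, top_rate), (second_race, second_rate) = ranked[0], ranked[1]

# Relative gap: how much higher the top rate is, as a fraction of the second.
relative_gap = (top_rate - second_rate) / second_rate
print('{} is {:.0%} higher than {}'.format(top_race, relative_gap, second_race))
```

With the rounded rates this works out to a gap of roughly 72%.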

It might be worth exploring why there is such a discrepancy between the races.  Is it cultural, social, economic, or just coincidence?

### Deaths per 100,000 by Race and Intent

Here we combine analysis of the number of deaths per 100k by race and the number of deaths by race and intent.

We calculate this by multiplying the number of deaths per 100k for each race by the fraction of that race's deaths attributed to each intent.
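
To illustrate the multiplication, the sketch below applies it to the homicide shares and overall rates printed earlier for three of the groups. The share is deaths with a given intent divided by all deaths for the race, so the product reduces to deaths with that intent divided by population, times 100,000. The dictionary names are illustrative, and the inputs are the rounded values from the earlier outputs, so results may differ slightly in the last digit from a calculation on raw counts.

```python
# Rounded values copied from the earlier outputs in this notebook.
homicide_share = {'Black': 0.8375, 'Hispanic': 0.6245, 'White': 0.1381}
deaths_per_100k = {'Black': 57.88, 'Hispanic': 20.22, 'White': 33.57}

# Share of a race's deaths with a given intent, times that race's overall
# rate, gives the per-100k rate for that intent.
homicide_per_100k = {
    race: homicide_share[race] * deaths_per_100k[race]
    for race in homicide_share
}

ordered = sorted(homicide_per_100k.items(), key=lambda item: item[1], reverse=True)
for race, rate in ordered:
    print('Race: {:<10} Homicide per 100k: {:.2f}'.format(race, rate))

# Compare the two highest homicide rates among these groups.
ratio = homicide_per_100k['Black'] / homicide_per_100k['Hispanic']
print('Black to Hispanic ratio: {:.1f}x'.format(ratio))
```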

In [60]:
# race_per_hundredk (deaths per 100k for each race) was already computed in the previous cell

# use already processed data
intent_race_100k = {}
for intent_race in intent_by_race_count:
    intent = intent_race[0]
    race = intent_race[1]
    intent_race_100k[intent_race] = intent_by_race_count[intent_race] * race_per_hundredk[race]
    print('Intent:{:<13}  Race: {:<30}  Per capita (100k): {:.2f}'.format(intent_race[0], intent_race[1], intent_race_100k[intent_race]))
# intent_race_100k

Intent:Suicide        Race: Asian/Pacific Islander          Per capita (100k): 4.71
Intent:Suicide        Race: White                           Per capita (100k): 28.06
Intent:Suicide        Race: Native American/Native Alaskan  Per capita (100k): 14.84
Intent:Undetermined   Race: White                           Per capita (100k): 0.30
Intent:Accidental     Race: White                           Per capita (100k): 0.57
Intent:Suicide        Race: Black                           Per capita (100k): 8.28
Intent:Homicide       Race: White                           Per capita (100k): 4.64
Intent:Homicide       Race: Asian/Pacific Islander          Per capita (100k): 3.53
Intent:Homicide       Race: Black                           Per capita (100k): 48.47
Intent:Homicide       Race: Native American/Native Alaskan  Per capita (100k): 8.72
Intent:Undetermined   Race: Native American/Native Alaskan  Per capita (100k): 0.37
Intent:Accidental     Race: Black                           Per capita (1

This analysis combines the deaths per 100k with the share of deaths by intent for each race.  It shows that data we have already processed can be reused and recombined to express the information in new ways as needed.

## Conclusion

Homicide and suicide are the primary causes of gun deaths.  Accidents make up only a very small percentage.

Whites primarily perish by suicide, whereas Blacks perish primarily by homicide.  Blacks in particular suffer far more homicide deaths per capita than any other race, roughly 4x the next highest group, Hispanics.

Data and analysis such as this can reveal much about where to put more resources for studying causes and where subsequent efforts may be applied to maximum effect.
