# Introduction

This project was part of the DataQuest Data Scientist specialization course. In it, we explore a dataset on U.S. gun deaths from 2012 to 2014 and look for patterns in victim demographics. The dataset was compiled by FiveThirtyEight and is available in their GitHub repo.

## The Data Set

In [7]:
import csv

with open("guns.csv", "r") as f:
    reader = csv.reader(f)
    data = list(reader)
    
print(data[0:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


## Removing the Header

By printing the first five rows of the data set, we can see that the data is a list of lists and that the first list is the header row. We will remove the header row so that only the records remain for analysis.

In [8]:
headers = data[:1]
data = data[1:]
print(headers)

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']]


In [9]:
print(data[0:5])

[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]


## Gun Deaths by Year

The year column contains information on the year in which gun deaths occurred. We can use this column to calculate how many gun deaths happened in each year.

In [11]:
years = [lst[1] for lst in data]
year_counts = {}
for year in years:
    if year not in year_counts:
        year_counts[year] = 1
    else:
        year_counts[year] += 1

In [12]:
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
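To put the yearly totals in perspective, the change from one year to the next can be expressed as a percentage. This is a minimal sketch that reuses the totals printed above; the name `pct_change` is introduced here for illustration only.

```python
# Yearly totals copied from the year_counts output above.
year_counts = {"2012": 33563, "2013": 33636, "2014": 33599}

years = sorted(year_counts)
pct_change = {}

# Compare each year with the one before it.
for prev, curr in zip(years, years[1:]):
    change = (year_counts[curr] - year_counts[prev]) / year_counts[prev] * 100
    pct_change[curr] = change
    print(f"{prev} -> {curr}: {change:+.2f}%")
```

Both year-over-year changes come out well under one percent.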

## Gun Deaths by Month and Year

From the year_counts dictionary, we can see that the number of gun deaths barely changed between 2012 and 2014. Now we'll see whether it changes by month and year.

In [13]:
import datetime

dates = [datetime.datetime(year=int(lst[1]), month=int(lst[2]), day=1) for lst in data]
dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]
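Fixing `day=1` is what makes these objects usable as monthly buckets: every death in the same month and year becomes an equal, hashable `datetime` value, so it can serve as a dictionary key. The sketch below illustrates this on three shortened sample rows (only the first four columns are kept, which is an assumption made for brevity).

```python
import datetime

# Shortened sample rows: year is at index 1 and month at index 2, as in guns.csv.
rows = [
    ["1", "2012", "01", "Suicide"],
    ["2", "2012", "01", "Suicide"],
    ["4", "2012", "02", "Suicide"],
]

dates = [datetime.datetime(year=int(r[1]), month=int(r[2]), day=1) for r in rows]

# Rows from the same month compare equal, so they share one dictionary key.
same_month = dates[0] == dates[1]

counts = {}
for d in dates:
    counts[d] = counts.get(d, 0) + 1

print(same_month, counts)
```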

In [14]:
date_counts = {}

for date in dates:
    if date not in date_counts:
        date_counts[date] = 1
    else:
        date_counts[date] += 1

date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

## Exploring Gun Deaths by Race and Sex

In [19]:
sex = [lst[5] for lst in data]

sex_counts = {}

for s in sex:
    if s not in sex_counts:
        sex_counts[s] = 1
    else:
        sex_counts[s] += 1
        
sex_counts

{'M': 86349, 'F': 14449}

In [20]:
race = [lst[7] for lst in data]

race_counts = {}

for r in race:
    if r not in race_counts:
        race_counts[r] = 1
    else:
        race_counts[r] += 1
    
race_counts

{'Asian/Pacific Islander': 1326,
 'White': 66237,
 'Native American/Native Alaskan': 917,
 'Black': 23296,
 'Hispanic': 9022}
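The same counting loop has now appeared for year, sex and race. One way to avoid repeating it is a small helper built on `collections.Counter` from the standard library. The function name `count_column` is hypothetical and not part of the original notebook; the sample rows are copied from the first records printed earlier.

```python
from collections import Counter


def count_column(rows, index):
    """Count how often each value appears in one column of a list of rows."""
    return Counter(row[index] for row in rows)


# First three records from the data set, as printed earlier.
sample = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
]

sex_sample = count_column(sample, 5)   # sex is at index 5
race_sample = count_column(sample, 7)  # race is at index 7
print(sex_sample)
print(race_sample)
```

With the full data set, `count_column(data, 5)` and `count_column(data, 7)` should give the same counts as the loops above.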

## Findings

- At this point, we can see that gun deaths per month and year are relatively stable: they generally differ by about +/- 300 deaths from month to month and peak during the summer months.
- There are almost 6x as many male gun deaths as female gun deaths (86,349 vs. 14,449).
- There are large numbers of White, Black and Hispanic gun deaths. However, without knowing the size of each group in the U.S. population, we cannot fairly compare these raw counts.
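The figures behind these findings can be checked with a few lines of arithmetic. This sketch copies the counts from the outputs above; `ratio` and `shares` are names introduced here for illustration.

```python
# Counts copied from the sex_counts and race_counts outputs above.
sex_counts = {"M": 86349, "F": 14449}
race_counts = {
    "Asian/Pacific Islander": 1326,
    "White": 66237,
    "Native American/Native Alaskan": 917,
    "Black": 23296,
    "Hispanic": 9022,
}

# Male-to-female ratio of gun deaths.
ratio = sex_counts["M"] / sex_counts["F"]
print(f"Male deaths per female death: {ratio:.2f}")

# Share of all recorded gun deaths by race, in percent.
total = sum(race_counts.values())
shares = {race: count / total * 100 for race, count in race_counts.items()}
for race, share in shares.items():
    print(f"{race}: {share:.1f}%")
```

These shares describe the deaths only; they say nothing yet about how large each group is in the population.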

## Importing Census Data Set

In [22]:
with open("census.csv", "r") as f:
    reader = csv.reader(f)
    census = list(reader)
    
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

## Rate of Gun Deaths per 100,000 people

In order to better quantify the gun deaths by race, we will transform the previous results into a rate per 100,000 people.

In [24]:
mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956
    }
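As an alternative to typing the populations by hand, the mapping could be derived from the two census rows. The sketch below is one possible approach, not the notebook's method: it pastes the header and value rows from the census output above and pairs them with `zip`. The variable names `population` and `derived_mapping` are hypothetical.

```python
# Header and data row copied from the census output above.
header = ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
          'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic',
          'Race Alone - Black or African American',
          'Race Alone - American Indian and Alaska Native',
          'Race Alone - Asian',
          'Race Alone - Native Hawaiian and Other Pacific Islander',
          'Two or More Races']
values = ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
          'Total', '0100000US', '', 'United States', '308745538', '197318956',
          '44618105', '40250635', '3739506', '15159516', '674625', '6984195']

# Keep only the "Race Alone" columns; their names are unique, unlike 'Id'.
population = {
    name: int(value)
    for name, value in zip(header, values)
    if name.startswith("Race Alone")
}

# Translate census column names into the race labels used in guns.csv.
derived_mapping = {
    "Asian/Pacific Islander": population["Race Alone - Asian"]
    + population["Race Alone - Native Hawaiian and Other Pacific Islander"],
    "Native American/Native Alaskan": population["Race Alone - American Indian and Alaska Native"],
    "Black": population["Race Alone - Black or African American"],
    "Hispanic": population["Race Alone - Hispanic"],
    "White": population["Race Alone - White"],
}
print(derived_mapping)
```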

In [26]:
race_rate = {}
for key, value in race_counts.items():
    race_rate[key] = (value/mapping[key]) * 100000

race_rate

{'Asian/Pacific Islander': 8.374309664161762,
 'White': 33.56849303419181,
 'Native American/Native Alaskan': 24.521955573811088,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907}

## Filter Results by Homicide Intent

In [27]:
intents = [lst[3] for lst in data]

homicide_race_counts = {}

for i, r in enumerate(race):
    if intents[i] == "Homicide":
        if r not in homicide_race_counts:
            homicide_race_counts[r] = 1
        else:
            homicide_race_counts[r] += 1
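The filter above could also be generalised by counting (intent, race) pairs in a single pass, so that any intent can be selected afterwards. This is only a sketch: the two parallel lists below are made-up illustrative values, not rows from the data set.

```python
from collections import Counter

# Made-up parallel lists standing in for the intents and race columns.
intents = ["Suicide", "Homicide", "Homicide", "Suicide", "Homicide"]
race = ["White", "Black", "White", "White", "Black"]

# Count every (intent, race) combination once.
pair_counts = Counter(zip(intents, race))

# Select the counts for a single intent, here "Homicide".
homicide_race_counts = {
    r: n for (intent, r), n in pair_counts.items() if intent == "Homicide"
}
print(homicide_race_counts)
```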

In [28]:
race_rate = {}

for key, value in homicide_race_counts.items():
    race_rate[key] = (value/mapping[key]) * 100000

race_rate

{'White': 4.6356417981453335,
 'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Native American/Native Alaskan': 8.717729026240365,
 'Hispanic': 12.627161104219914}

## Findings

Now that we have quantified the results better, we can see that gun homicides disproportionately affect Black and Hispanic people: their homicide rates per 100,000 are the two highest of the five groups.
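To make the gap between groups concrete, each homicide rate can be divided by a baseline rate. The sketch below uses the rates from the output above and picks the White rate as the baseline; that choice is an arbitrary assumption made for illustration.

```python
# Homicide rates per 100,000, copied from the race_rate output above.
homicide_rate = {
    "White": 4.6356417981453335,
    "Asian/Pacific Islander": 3.530346230970155,
    "Black": 48.471284987180944,
    "Native American/Native Alaskan": 8.717729026240365,
    "Hispanic": 12.627161104219914,
}

# Express every rate as a multiple of the baseline rate.
baseline = homicide_rate["White"]
relative = {race: rate / baseline for race, rate in homicide_rate.items()}

for race, multiple in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{race}: {multiple:.1f}x the White rate")
```

Note that the rates combine deaths from 2012 to 2014 with the April 1, 2010 census population, so they are three-year totals per 100,000 rather than annual rates. The ratios between groups are not affected by this.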

Further investigation:
 - The link between month and homicide rate.
 - Homicide rate by gender.
 - The rates of other intents by gender and race.
 - Gun death rates by location and education.