# Exploring Gun Deaths in the US

The dataset comes from FiveThirtyEight and can be found at https://github.com/fivethirtyeight/guns-data. It is stored in the guns.csv file and contains information on gun deaths in the US from 2012 to 2014. Each row represents a single fatality, and the columns contain demographic and other information about the victim.

In [27]:
## Open and read the dataset file "guns.csv"
import csv

# A context manager closes the file once it has been read
with open("guns.csv", "r", newline="") as f:
    gunsdata = list(csv.reader(f))
print(gunsdata[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


In [3]:
## Extract header from data
headers = gunsdata[0]
print(headers)
gunsdata = gunsdata[1:]
print(gunsdata[:5])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]


In [4]:
## Retrieve years and year counts
years = [row[1] for row in gunsdata]
year_counts = {}

for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
        
print(year_counts)

{'2014': 33599, '2012': 33563, '2013': 33636}
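
The tally loop above recurs throughout this notebook. As an alternative, the standard library's `collections.Counter` can produce the same kind of counts in one step. The following is a minimal sketch rather than the notebook's own method: `sample_rows` and `count_column` are hypothetical names, and the three rows are copied from the printed output above so that the block runs without guns.csv.

```python
from collections import Counter

# Three rows copied from the printed output above (same column layout as guns.csv)
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
]

def count_column(rows, index):
    """Count how often each value appears in the column at the given position."""
    return Counter(row[index] for row in rows)

sample_year_counts = count_column(sample_rows, 1)  # column 1 holds the year
sample_sex_counts = count_column(sample_rows, 5)   # column 5 holds the sex

print(dict(sample_year_counts))
print(dict(sample_sex_counts))
```

Passing the full `gunsdata` list instead of `sample_rows` should give the same totals as the explicit loop, since `Counter` is a `dict` subclass.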


### Count Gun Deaths Based on Month-Year

In [5]:
## Create datetime list from data so that we can aggregate by month-year
import datetime
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in gunsdata]
print(dates[:5])


[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0)]


In [6]:
## Count deaths for each month-year
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1

from pprint import pprint
pprint(date_counts)

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

### Exploring Gun Deaths Based on Gender and Race

In [7]:
# Count by sex
sexes = [row[5] for row in gunsdata]
sex_counts = {}

for sex in sexes:
    if sex in sex_counts:
        sex_counts[sex] += 1
    else:
        sex_counts[sex] = 1
        
pprint(sex_counts)

{'F': 14449, 'M': 86349}


In [8]:
# Count by race
races = [row[7] for row in gunsdata]
race_counts = {}

for race in races:
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
        
pprint(race_counts)

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}


### What Has Been Discovered So Far
1. The data show no strong seasonal pattern, although the number of victims rises slightly during the summer months.
2. By sex, males account for almost 6 times as many victims as females (86,349 versus 14,449).
3. The White category has the highest number of victims of any racial group in this data. However, raw counts do not give the complete picture, because they ignore the size of each group's population. Rates relative to population are needed for a fair comparison.
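
The first two observations can be checked with a little arithmetic on the figures already printed above. This is only a rough sketch: it uses the 2012 monthly counts and the sex counts copied from the outputs, and treating June through August as "summer" is an assumption made here, not something the dataset defines.

```python
# Monthly gun death counts for 2012 (January to December), copied from the output above
monthly_2012 = [2758, 2357, 2743, 2795, 2999, 2826, 3026, 2954, 2852, 2733, 2729, 2791]

overall_mean = sum(monthly_2012) / len(monthly_2012)

# Assumption of this sketch: "summer" means June, July and August (positions 5 to 7)
summer = monthly_2012[5:8]
summer_mean = sum(summer) / len(summer)
summer_lift = summer_mean / overall_mean - 1

# Sex counts copied from the output above
sex_counts_printed = {'F': 14449, 'M': 86349}
male_to_female = sex_counts_printed['M'] / sex_counts_printed['F']

print(f"Summer months vs. 2012 monthly average: {summer_lift:+.1%}")
print(f"Male-to-female victim ratio: {male_to_female:.2f}")
```

For 2012 the summer months come out roughly 5% above the monthly average, which fits the description of a slight rise. Months differ in length, so part of the month-to-month variation (February has the lowest count) may simply reflect the number of days.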

### Import Census Data

In [9]:
# Read the census file; the context manager closes it once it has been read
with open("census.csv", "r", newline="") as cd:
    cencus = list(csv.reader(cd))
pprint(cencus)

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]


In [12]:
## Map each race label used in guns.csv to its census population
mapping = {}
mapping['Asian/Pacific Islander'] = (int(cencus[1][14]) + int(cencus[1][15]))
mapping['Black'] = int(cencus[1][12])
mapping['Native American/Native Alaskan'] = int(cencus[1][13])
mapping['Hispanic'] = int(cencus[1][11])
mapping['White'] = int(cencus[1][10])
pprint(mapping)

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}
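
The mapping above relies on hard-coded column positions. One possible alternative is to look each column up by its header name, which makes the intent easier to read and guards against an off-by-one index. The sketch below is an illustration only: `census_header`, `census_row` and `population` are hypothetical names, and the two lists are copied from the printed census output so that the block runs without census.csv.

```python
# Header and data row copied from the printed census output above
census_header = [
    'Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total',
    'Race Alone - White', 'Race Alone - Hispanic',
    'Race Alone - Black or African American',
    'Race Alone - American Indian and Alaska Native',
    'Race Alone - Asian',
    'Race Alone - Native Hawaiian and Other Pacific Islander',
    'Two or More Races',
]
census_row = [
    'cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total',
    '0100000US', '', 'United States', '308745538', '197318956', '44618105',
    '40250635', '3739506', '15159516', '674625', '6984195',
]

def population(column_name):
    """Return the population in the named census column as an integer."""
    # list.index returns the first match; the race column names are unique
    return int(census_row[census_header.index(column_name)])

population_by_label = {
    'Asian/Pacific Islander': population('Race Alone - Asian')
    + population('Race Alone - Native Hawaiian and Other Pacific Islander'),
    'Black': population('Race Alone - Black or African American'),
    'Hispanic': population('Race Alone - Hispanic'),
    'Native American/Native Alaskan': population('Race Alone - American Indian and Alaska Native'),
    'White': population('Race Alone - White'),
}
print(population_by_label)
```

The resulting dictionary should match the values printed above; the header contains several columns named `Id`, so this lookup is only safe for names that appear once.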


In [23]:
## Gun deaths per 100,000 people for each race
race_per_hundredk = {}
for k,v in race_counts.items():
    result = 100000*(v/mapping[k])
    race_per_hundredk[k] = result
pprint(race_per_hundredk)

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}


Once population is taken into account, the Black category has the highest gun death rate per 100,000 people. At the opposite end, Asians and Pacific Islanders have the lowest rate.
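
One caveat when reading these rates: the death counts cover three years (2012 to 2014), while the population figures come from the April 1, 2010 Census, so each figure is a three-year total per 100,000 people rather than an annual rate. A minimal sketch of the same formula with an optional averaging step is shown below; `rate_per_100k` and its `years` parameter are hypothetical names introduced here, and the inputs are the Black count and population printed above.

```python
def rate_per_100k(count, population, years=1):
    """Deaths per 100,000 people, optionally averaged over a number of years."""
    return 100000 * (count / population) / years

# Figures copied from the outputs above
black_deaths = 23296
black_population = 40250635

three_year_rate = rate_per_100k(black_deaths, black_population)
# Assumption of this sketch: dividing by 3 approximates an average annual rate
annual_rate = rate_per_100k(black_deaths, black_population, years=3)

print(f"Three-year rate: {three_year_rate:.2f} per 100,000")
print(f"Approximate annual rate: {annual_rate:.2f} per 100,000")
```

The ranking between groups is unaffected by this division, since every rate is scaled by the same factor.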

### Restricting to the Homicide Intent

In [24]:
## Count gun deaths by race, keeping only rows whose intent is Homicide
intents = [row[3] for row in gunsdata]
races = [row[7] for row in gunsdata]
homicide_race_counts = {}

for i,race in enumerate(races):
    if intents[i] == 'Homicide':
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else:
            homicide_race_counts[race] = 1
        
pprint(homicide_race_counts)

{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}


In [25]:
## Gun homicides per 100,000 people for each race
homicide_race_per_hundredk = {}
for k,v in homicide_race_counts.items():
    result = 100000*(v/mapping[k])
    homicide_race_per_hundredk[k] = result
pprint(homicide_race_per_hundredk)

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}


## Findings

This last step shows that gun homicides disproportionately affect the Black and Hispanic categories.
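
One way to put a number on the disparity is to express each homicide rate as a multiple of a reference group's rate. The sketch below uses the rates printed above; choosing White as the baseline is an assumption made purely for illustration, and `homicide_rates` and `rate_ratios` are hypothetical names.

```python
# Homicide rates per 100,000, copied from the output above
homicide_rates = {
    'Asian/Pacific Islander': 3.530346230970155,
    'Black': 48.471284987180944,
    'Hispanic': 12.627161104219914,
    'Native American/Native Alaskan': 8.717729026240365,
    'White': 4.6356417981453335,
}

# Assumption of this sketch: use the White rate as the reference point
baseline = homicide_rates['White']
rate_ratios = {race: rate / baseline for race, rate in homicide_rates.items()}

for race, ratio in sorted(rate_ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{race}: {ratio:.2f}x the baseline rate")
```

Under this comparison the Black rate is roughly ten times the baseline and the Hispanic rate a little under three times.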

For further analysis, here are some potential next steps:
1. Figure out the link, if any, between month and homicide rate.
2. Explore the homicide rate by gender.
3. Explore the rates of other intents, like Accidental, by gender and race.
4. Find out if gun death rates correlate with location and education.
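
Several of these next steps amount to counting rows by two columns at once, for example intent and sex. A possible starting point is sketched below, using `collections.Counter` with tuple keys. `count_by_columns` and `sample_rows` are hypothetical names, and the five rows are copied from the printed output near the top of the notebook, so the result only demonstrates the technique and says nothing about the full dataset.

```python
from collections import Counter

# First five data rows, copied from the printed output earlier in the notebook
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2'],
]

def count_by_columns(rows, *indexes):
    """Count rows grouped by the combination of values at the given column positions."""
    return Counter(tuple(row[i] for i in indexes) for row in rows)

# Column 3 holds the intent and column 5 the sex
intent_sex_counts = count_by_columns(sample_rows, 3, 5)
for key, count in sorted(intent_sex_counts.items()):
    print(key, count)
```

Run against the full `gunsdata` list, the same call would give counts for every intent and sex combination, which could then be turned into rates once matching population figures are available.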