# Explore Gun Deaths in the US from 2012 to 2014

The dataset comes from FiveThirtyEight and is stored in the guns.csv file. It contains information on gun deaths in the US from 2012 to 2014. Each row in the dataset represents a single fatality. The columns contain demographic and other information about the victim: 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'.

In [10]:
# Import libraries and load dataset
import csv
import datetime
# Read every row (header included) and close the file when done
with open('guns.csv', 'r') as f:
    data = list(csv.reader(f))
#data[0:5]

In [3]:
# get header
headers = data[0]
print(headers)

data = data[1:]
#data[0:5]


['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']


In [11]:
# intents column
intents = [row[3] for row in data]
#intent

In [14]:
# Load the census population table and close the file when done
with open('census.csv', 'r') as f:
    census = list(csv.reader(f))
print(census)

[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]


## Explore gun deaths and homicide distribution in relation to year, month and date

#### Calculate how many gun deaths happened in each year

In [20]:
# Calculate how many gun deaths happened in each year

years = [x[1] for x in data]
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
        
print('Total Gun Deaths by Year')
year_counts

Total Gun Deaths by Year


{'2012': 33563, '2013': 33636, '2014': 33599}
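
The tally loop above can also be written with `collections.Counter` from the standard library. The following is a minimal sketch, not the notebook's own method; `sample_rows` is a hypothetical stand-in for the rows of guns.csv and only mirrors the first few columns.

```python
from collections import Counter

# Hypothetical sample rows in the same column order as guns.csv:
# ['', 'year', 'month', 'intent', ...]
sample_rows = [
    ['1', '2012', '01', 'Suicide'],
    ['2', '2012', '02', 'Homicide'],
    ['3', '2013', '01', 'Homicide'],
]

# Counter tallies each distinct value found in column 1 (the year).
sample_year_counts = Counter(row[1] for row in sample_rows)
print(dict(sample_year_counts))
```

A `Counter` behaves like a dictionary, so the rest of the analysis could use it the same way as `year_counts`.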

#### Count up gun deaths by date

In [15]:
# Count up gun deaths by date
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]
#dates[0:5]
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1
        
date_counts


{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

#### Explore if there is any link between month and homicide rate


In [33]:
# months column
months = [row[2] for row in data]
# months

month_counts = {}
for i, month in enumerate(months):
    if intents[i] == 'Homicide':
        if month in month_counts:
            month_counts[month] += 1
        else:
            month_counts[month] = 1
print('Homicide Count by month for year 2012 to 2014')
month_counts



Homicide Count by month for year 2012 to 2014


{'01': 2829,
 '02': 2178,
 '03': 2780,
 '04': 2845,
 '05': 2976,
 '06': 3130,
 '07': 3269,
 '08': 3125,
 '09': 2966,
 '10': 2968,
 '11': 2919,
 '12': 3191}

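From the above result, the homicide count is highest in July (3,269), followed by December (3,191), June (3,130) and August (3,125). The lowest count occurs in February (2,178), which is also the shortest month. High temperatures and the holiday season may contribute to more homicides, but this data alone cannot confirm that.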
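
Because months differ in length, raw monthly totals are not directly comparable. One possible check, sketched below, divides each monthly count by the number of calendar days that month contributed across 2012 to 2014. The counts are copied from the output above; the name `homicides_per_day` is introduced here for illustration only.

```python
import calendar

# Homicide counts by month, copied from the output above.
month_counts = {
    '01': 2829, '02': 2178, '03': 2780, '04': 2845,
    '05': 2976, '06': 3130, '07': 3269, '08': 3125,
    '09': 2966, '10': 2968, '11': 2919, '12': 3191,
}

homicides_per_day = {}
for month, count in month_counts.items():
    # Total calendar days this month contributed across 2012-2014
    # (2012 is a leap year, so February gets 29 + 28 + 28 = 85 days).
    days = sum(calendar.monthrange(year, int(month))[1]
               for year in (2012, 2013, 2014))
    homicides_per_day[month] = count / days

for month in sorted(homicides_per_day):
    print(month, round(homicides_per_day[month], 1))
```

Under this normalization July still has the highest daily figure and February the lowest, so the seasonal pattern is not only an effect of month length.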

## Gun deaths and Homicide by gender

In [38]:
# Gun deaths by gender
gender_counts = {}
genders = [row[5] for row in data]
#genders[0:5]
for g in genders:
    if g in gender_counts:
        gender_counts[g] += 1
    else:
        gender_counts[g] = 1
 
# Drop the header entry if it was counted (no-op when the header row is already removed)
gender_counts.pop('sex', None)
print('Gun Deaths by Gender')
gender_counts

{'F': 14449, 'M': 86349}

In [40]:
# Explore the homicide count by gender
homicide_by_gender = {}
for i, gender in enumerate(genders):
    if intents[i] == 'Homicide':
        if gender in homicide_by_gender:
            homicide_by_gender[gender] += 1
        else:
            homicide_by_gender[gender] = 1
print("Homicide Count by Gender")
homicide_by_gender


Homicide Count by Gender


{'F': 5373, 'M': 29803}

Males account for about six times as many gun deaths as females (86,349 vs. 14,449), and the ratio for homicides is similar (about 5.5 to 1). Homicides make up about one third of all gun deaths.
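
The ratios quoted above can be reproduced from the two outputs. This is a small worked check; the counts are copied from the cells above and the three result names are introduced here only for illustration.

```python
# Counts copied from the outputs above.
gender_counts = {'F': 14449, 'M': 86349}
homicide_by_gender = {'F': 5373, 'M': 29803}

# Male-to-female ratio for all gun deaths and for homicides only.
male_to_female_deaths = gender_counts['M'] / gender_counts['F']
male_to_female_homicides = homicide_by_gender['M'] / homicide_by_gender['F']

# Share of all gun deaths that were homicides.
homicide_share = sum(homicide_by_gender.values()) / sum(gender_counts.values())

print(round(male_to_female_deaths, 2))
print(round(male_to_female_homicides, 2))
print(round(homicide_share, 3))
```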

## Gun Deaths and Homicide by Race

#### Total population by Race

In [15]:
# Total Population by race

mapping = {
    'Asian/Pacific Islander': int(census[1][14]) + int(census[1][15]), 
    'Black': census[1][12], 
    'Native American/Native Alaskan': census[1][13],
    'Hispanic': census[1][11], 
    'White': census[1][10]
}
print('Total Population by Race')
mapping


Total Population by Race


{'Asian/Pacific Islander': 15834141,
 'Black': '40250635',
 'Hispanic': '44618105',
 'Native American/Native Alaskan': '3739506',
 'White': '197318956'}
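
Hard-coded column positions such as `census[1][12]` are easy to get wrong. As an alternative sketch, the positions can be looked up by header name and every value converted to `int`, so the mapping holds one consistent type. The two rows are copied from the census output above; `population` and `mapping_by_name` are hypothetical names used only for this illustration.

```python
# Header and data rows copied from the census output above.
census_header = [
    'Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
    'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic',
    'Race Alone - Black or African American',
    'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian',
    'Race Alone - Native Hawaiian and Other Pacific Islander',
    'Two or More Races',
]
census_row = [
    'cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
    'Total', '0100000US', '', 'United States', '308745538', '197318956',
    '44618105', '40250635', '3739506', '15159516', '674625', '6984195',
]


def population(column_name):
    """Return the population in the named census column as an int."""
    return int(census_row[census_header.index(column_name)])


mapping_by_name = {
    'Asian/Pacific Islander': population('Race Alone - Asian')
    + population('Race Alone - Native Hawaiian and Other Pacific Islander'),
    'Black': population('Race Alone - Black or African American'),
    'Native American/Native Alaskan':
        population('Race Alone - American Indian and Alaska Native'),
    'Hispanic': population('Race Alone - Hispanic'),
    'White': population('Race Alone - White'),
}
print(mapping_by_name)
```

The lookup works here because every race column name is unique; the repeated 'Id' columns are never requested by name.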

#### Gun deaths by race

In [8]:
# Gun deaths by race
race_counts = {}
races = [row[7] for row in data]
#races[0:5]
for r in races:
    if r in race_counts:
        race_counts[r] += 1
    else:
        race_counts[r] = 1
        
# Drop the header entry if it was counted (no-op when the header row is already removed)
race_counts.pop('race', None)

print("Total Gun Deaths by Race")
race_counts

Total Gun Deaths by Race


{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

#### Calculate rate of gun deaths per 10,000 people by race.

In [17]:
# Calculate rate of gun deaths per 10,000 people of each race.
# Note: the variable name says "hundredk", but the multiplier below is 10000.
race_per_hundredk = {}
keys = list(race_counts.keys())
#print(type(keys))

for key in keys:
    #print(type(key))
    race_per_hundredk[key] = int(race_counts[key]) / int(mapping[key]) * 10000

print('Gun Deaths per 10k by Race')
race_per_hundredk

Gun Deaths per 10k by Race


{'Asian/Pacific Islander': 0.8374309664161763,
 'Black': 5.78773477735196,
 'Hispanic': 2.022049121091091,
 'Native American/Native Alaskan': 2.452195557381109,
 'White': 3.356849303419181}
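
The figures above are three-year totals (2012 to 2014) divided by a single population count from the April 1, 2010 Census, scaled to 10,000 people. If a per-100,000 or per-year figure is wanted instead, a small helper along the following lines could be used. `rate_per` is a hypothetical function written for this sketch; the inputs are copied from the outputs above.

```python
def rate_per(counts, populations, per=100_000, years=1):
    """Return deaths per `per` people per year for each group (hypothetical helper)."""
    return {
        group: counts[group] / populations[group] * per / years
        for group in counts
    }


# Counts and populations copied from the outputs above.
race_counts = {
    'Asian/Pacific Islander': 1326,
    'Black': 23296,
    'Hispanic': 9022,
    'Native American/Native Alaskan': 917,
    'White': 66237,
}
populations = {
    'Asian/Pacific Islander': 15834141,
    'Black': 40250635,
    'Hispanic': 44618105,
    'Native American/Native Alaskan': 3739506,
    'White': 197318956,
}

# Three-year total per 100,000 people.
per_100k_total = rate_per(race_counts, populations)
# Average per year over the three years covered by the data.
per_100k_annual = rate_per(race_counts, populations, years=3)

for race in race_counts:
    print(race, round(per_100k_total[race], 1), round(per_100k_annual[race], 1))
```

Dividing by three assumes the population was roughly constant over the period, which is an approximation since only the 2010 count is available here.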

#### Calculate total homicide by race

In [28]:
# Filter gun deaths to the Homicide intent and count them
# by race. These counts feed the gun-related homicide rate
# per 10,000 people for each racial category in the next cell.

races = [row[7] for row in data]
#races

homicide_race_counts = {}
for i, race in enumerate(races):
    if intents[i] == 'Homicide':
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else:
            homicide_race_counts[race] = 1
            
print('Total Homicide by Race')
homicide_race_counts


Total Homicide by Race


{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}

#### Calculate rate of homicide per 10,000 people by race.

In [29]:
homicide_per_hundredk = {}
for key in keys:
    homicide_per_hundredk[key] = int(homicide_race_counts[key]) / int(mapping[key]) * 10000

print('Homicide per 10k by Race')
homicide_per_hundredk 

Homicide per 10k by Race


{'Asian/Pacific Islander': 0.3530346230970155,
 'Black': 4.847128498718095,
 'Hispanic': 1.2627161104219913,
 'Native American/Native Alaskan': 0.8717729026240364,
 'White': 0.46356417981453335}

#### Findings:

From the analysis of gun deaths data from 2012 to 2014, White people have the highest total number of gun deaths (about 66k), followed by Black people (about 23k). For homicides, however, Black people have the highest total at 19,510, followed by White people at 9,147 and Hispanic people at 5,634.

Relative to population, Black people have the highest homicide rate, at about 4.8 per 10k. The next highest is 1.26 for Hispanic people, followed by 0.87 for Native American/Native Alaskan people. The rate for White people (0.46) is slightly higher than for Asian/Pacific Islander people (0.35).
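
The contrast between total gun deaths and homicides can be made explicit by computing, for each race, the share of gun deaths that were homicides. This is an additional worked check, not part of the original cells; both dictionaries are copied from the outputs above and `homicide_share_by_race` is a name introduced here.

```python
# Counts copied from the outputs above.
race_counts = {
    'Asian/Pacific Islander': 1326,
    'Black': 23296,
    'Hispanic': 9022,
    'Native American/Native Alaskan': 917,
    'White': 66237,
}
homicide_race_counts = {
    'Asian/Pacific Islander': 559,
    'Black': 19510,
    'Hispanic': 5634,
    'Native American/Native Alaskan': 326,
    'White': 9147,
}

# Fraction of each group's gun deaths that were homicides.
homicide_share_by_race = {
    race: homicide_race_counts[race] / race_counts[race]
    for race in race_counts
}

for race, share in sorted(homicide_share_by_race.items(),
                          key=lambda item: item[1], reverse=True):
    print(race, round(share, 2))
```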


## Explore the rates of other intents, like Accidental, by gender and race


In [16]:
accidental_by_race = {}
for i, race in enumerate(races):
    if intents[i] == 'Accidental':
        if race in accidental_by_race:
            accidental_by_race[race] += 1
        else:
            accidental_by_race[race] = 1
print('Gun Deaths by Accidental Count by Race')
accidental_by_race
    

Gun Deaths by Accidental Count by Race


{'Asian/Pacific Islander': 12,
 'Black': 328,
 'Hispanic': 145,
 'Native American/Native Alaskan': 22,
 'White': 1132}

In [18]:
accidental_per_hundredK_race = {}
for key in keys:
    accidental_per_hundredK_race[key] = int(accidental_by_race[key]) / int(mapping[key]) * 10000
print("Gun Deaths by Accidental per 10K Population by Race")
accidental_per_hundredK_race

Gun Deaths by Accidental per 10K Population by Race


{'Asian/Pacific Islander': 0.007578560782046845,
 'Black': 0.08148939762068351,
 'Hispanic': 0.03249801846133985,
 'Native American/Native Alaskan': 0.05883130017708221,
 'White': 0.057369044664923124}

The Black group has the highest rate of accidental gun deaths relative to population.

## Find out if gun death rates correlate to location


In [5]:
# The header row was already removed above, so use every row
places = [row[9] for row in data]
#places
place_count = {}
for p in places:
    if p in place_count:
        place_count[p] += 1
    else:
        place_count[p] = 1
print('Gun Deaths Count by Locations')
place_count

Gun Deaths Count by Locations


{'Farm': 470,
 'Home': 60486,
 'Industrial/construction': 248,
 'NA': 1384,
 'Other specified': 13751,
 'Other unspecified': 8867,
 'Residential institution': 203,
 'School/instiution': 671,
 'Sports': 128,
 'Street': 11151,
 'Trade/service area': 3439}

The highest number of gun deaths occurred at home.
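
Raw counts are easier to compare as shares of the total. The sketch below converts the location counts copied from the output above into fractions; `place_share` is a name introduced for this illustration.

```python
# Location counts copied from the output above.
place_count = {
    'Farm': 470,
    'Home': 60486,
    'Industrial/construction': 248,
    'NA': 1384,
    'Other specified': 13751,
    'Other unspecified': 8867,
    'Residential institution': 203,
    'School/instiution': 671,
    'Sports': 128,
    'Street': 11151,
    'Trade/service area': 3439,
}

total_deaths = sum(place_count.values())

# Fraction of all gun deaths recorded at each location.
place_share = {place: count / total_deaths
               for place, count in place_count.items()}

# Print locations from most to least common.
for place, share in sorted(place_share.items(),
                           key=lambda item: item[1], reverse=True):
    print(place, round(share * 100, 1))
```

The counts sum to 100,798, the same total as the yearly counts, so every record is represented.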

## Find out if gun death rates correlate to education

In [6]:
# The header row was already removed above, so use every row
educations = [row[10] for row in data]
education_count = {}
for e in educations:
    if e in education_count:
        education_count[e] += 1
    else:
        education_count[e] = 1
print('Gun Deaths Count by Education Level')
education_count

Gun Deaths Count by Education Level


{'1': 21823, '2': 42927, '3': 21680, '4': 12946, '5': 1369, 'NA': 53}

Education level 2 ("Graduated from High School or equivalent") has the highest number of gun deaths.