# Gun Deaths in USA from 2012 to 2014

From the Dataquest guided project in the Intermediate Python developer course.
This project uses a guns.csv file (data from FiveThirtyEight) and a census information file (census.csv).

---

**' '** -- this is an identifier column, which contains the row number. It's common in CSV files to include a unique identifier for each row, but we can ignore it in this analysis.

**year** -- the year in which the fatality occurred.

**month** -- the month in which the fatality occurred.

**intent** -- the intent of the perpetrator of the crime. This can be Suicide, Accidental, NA, Homicide, or Undetermined.

**police** -- whether a police officer was involved with the shooting. Either 0 (false) or 1 (true).

**sex** -- the gender of the victim. Either M or F.

**age** -- the age of the victim.

**race** -- the race of the victim. Either Asian/Pacific Islander, Native American/Native Alaskan, Black, Hispanic, or White.

**hispanic** -- a code indicating the Hispanic origin of the victim.

**place** -- where the shooting occurred. Has several categories, which you're encouraged to explore on your own.

**education** -- educational status of the victim. Can be one of the following:

    1 -- Less than High School

    2 -- Graduated from High School or equivalent

    3 -- Some College

    4 -- At least graduated from College

    5 -- Not available

In [91]:
import csv

# Read the whole file into a list of rows; the with-block closes the file afterwards
with open("guns.csv", "r", newline="") as f:
    data = list(csv.reader(f))

print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]
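
The rest of the analysis addresses columns by position (for example `row[1]` for the year). As a side sketch that is not part of the original project, `csv.DictReader` from the standard library lets the same columns be read by name. The three-row sample below is a stand-in for guns.csv and the variable names are illustrative only.

```python
import csv
import io

# Hypothetical three-row sample standing in for guns.csv (same header layout).
sample = io.StringIO(
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,3\n"
    "3,2012,01,Suicide,0,M,60,White,100,Other specified,4\n"
)

# DictReader uses the first row as keys, so columns can be read by name.
rows = list(csv.DictReader(sample))
first_years = [row["year"] for row in rows]
first_races = [row["race"] for row in rows]
print(first_years, first_races)
```

The index-based approach used in this project works just as well; names mainly make later cells easier to read.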


In [92]:
headers = data[0]
data = data[1:]  # drop the header row

In [93]:
years = [row[1] for row in data]

In [94]:
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] = year_counts[year] + 1
    else:
        year_counts[year] = 1
print(year_counts) # Displays the total number of gun deaths per year

{'2012': 33563, '2013': 33636, '2014': 33599}


---
♣ Gun-related deaths appear consistent year on year (about 33'600 per year).

---
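
The manual if/else tally above can also be written with `collections.Counter` from the standard library. This is only a sketch on a made-up list of years, not the project's own code.

```python
from collections import Counter

# Hypothetical sample of the "year" column; the real list has about 100,000 entries.
sample_years = ["2012", "2013", "2012", "2014", "2013", "2012"]

# Counter tallies each distinct value, like the manual if/else loop.
sample_year_counts = Counter(sample_years)
print(dict(sample_year_counts))
```

A `Counter` behaves like a dictionary, so the later cells that read counts by key would work unchanged.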

In [95]:
import datetime
# Only year and month are recorded, so every date is pinned to the first of the month
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]

In [96]:
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] = date_counts[date] + 1
    else:
        date_counts[date] = 1

date_counts # Displays the total number of gun deaths per month, per year

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

---
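♣ Monthly gun-related deaths are fairly consistent as well. No month shows a dramatic spike, although in the 2012 and 2013 months shown above February (the shortest month) has the lowest count and July the highest.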

---
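
Because months differ in length, one way to check the month-on-month picture is to divide each monthly total by the number of days in that month. The sketch below does this for the 2012 totals copied from the output above. The variable names are illustrative, and `calendar.monthrange` comes from the standard library.

```python
import calendar

# Monthly totals for 2012, copied from the date_counts output above (January first).
counts_2012 = [2758, 2357, 2743, 2795, 2999, 2826, 3026, 2954, 2852, 2733, 2729, 2791]

# Divide each total by the number of days in that month.
# calendar.monthrange returns (weekday of the first day, days in month).
deaths_per_day = {
    month: total / calendar.monthrange(2012, month)[1]
    for month, total in enumerate(counts_2012, start=1)
}

lowest_month = min(deaths_per_day, key=deaths_per_day.get)
highest_month = max(deaths_per_day, key=deaths_per_day.get)
print(lowest_month, highest_month)
```

For 2012, February stays the lowest month and July the highest even after this adjustment.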


In [97]:
# Count how many times each value appears in a column
def count_column(input_lst, column):
    col_counts = {}
    for row in input_lst:
        value = row[column]
        col_counts[value] = col_counts.get(value, 0) + 1
    return col_counts

In [98]:
sex_counts = count_column(data,5)
race_counts = count_column(data,7)

In [99]:
sex_counts

{'F': 14449, 'M': 86349}

In [100]:
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

---
♣ Gun deaths in the US seem to occur consistently year on year and month on month, with little variation.
Far more men than women die from gun-related deaths (86'349 versus 14'449).
In raw counts, more white people die from gun-related deaths than people of any other race, but these counts do not account for population size.

The next things that should be investigated include:
- what percentage of each race is male/female
- what is the biggest intent per each race/gender combo
- what is the correlation between age, race, gender and intent
- what is the correlation between education, race, gender and intent
- which places have the most gun related deaths
- how many gun deaths were police officers involved in

---

♣ To compare races meaningfully, we need a rate of gun deaths per 100'000 people of each race. This is done by importing census data from the census.csv file, dividing the counts from our gun dataset by the population numbers from the census dataset, and multiplying the result by 100'000.

---
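
As a worked example of that formula, the sketch below uses the count of White gun deaths from `race_counts` and the White population from the 2010 census row in census.csv. The helper name `rate_per_100k` is an illustration, not part of the original notebook.

```python
def rate_per_100k(deaths, population):
    """Return the number of deaths per 100'000 people."""
    return deaths / population * 100000

# White gun deaths (2012 to 2014) and White population (2010 census),
# both taken from outputs in this notebook.
white_rate = rate_per_100k(66237, 197318956)
print(round(white_rate, 2))  # 33.57
```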

In [101]:
# The with-block closes the file once the rows have been read
with open("census.csv", "r", newline="") as f:
    census = list(csv.reader(f))
print(census)

[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]


---
♣ The census race categories need to be mapped onto the race categories used in our dataset

---
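
One way to make that mapping less dependent on column positions is to look columns up by their header text. The sketch below copies the header and data row from the census output above. The helper `population` and the name `mapping_by_name` are assumptions made for illustration. The header contains several columns called `Id`, so only the uniquely named race columns are looked up.

```python
# Header and data row copied from the census.csv output above.
census_header = ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
                 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic',
                 'Race Alone - Black or African American',
                 'Race Alone - American Indian and Alaska Native',
                 'Race Alone - Asian',
                 'Race Alone - Native Hawaiian and Other Pacific Islander',
                 'Two or More Races']
census_row = ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
              'Total', '0100000US', '', 'United States', '308745538', '197318956',
              '44618105', '40250635', '3739506', '15159516', '674625', '6984195']

def population(column_name):
    """Return the population in the named census column as an int."""
    return int(census_row[census_header.index(column_name)])

mapping_by_name = {
    "Asian/Pacific Islander": population("Race Alone - Asian")
    + population("Race Alone - Native Hawaiian and Other Pacific Islander"),
    "Black": population("Race Alone - Black or African American"),
    "Native American/Native Alaskan": population("Race Alone - American Indian and Alaska Native"),
    "Hispanic": population("Race Alone - Hispanic"),
    "White": population("Race Alone - White"),
}
print(mapping_by_name)
```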

In [102]:
# Column positions refer to the census header row printed above
mapping = {
    "Asian/Pacific Islander": int(census[1][14]) + int(census[1][15]),  # Asian + Pacific Islander
    "Black": int(census[1][12]),
    "Native American/Native Alaskan": int(census[1][13]),
    "Hispanic": int(census[1][11]),
    "White": int(census[1][10]),
}
mapping

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}

In [103]:
race_per_hundredk = {}
for key, value in race_counts.items():
    race_per_hundredk[key] = value/mapping[key]*100000
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

---
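♣ This data shows that, per 100'000 people, more black people die from guns than people of any other race in the US. Note that these rates divide three years of deaths (2012 to 2014) by the 2010 census population, so they are not annual rates.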

---

In [104]:
intents = [row[3] for row in data]
races = [row[7] for row in data]

In [105]:
homicide_race_counts = {}
for i, race in enumerate(races):
    if intents[i] == 'Homicide':
        if race in homicide_race_counts:
            homicide_race_counts[race] = homicide_race_counts[race] + 1
        else:
            homicide_race_counts[race] = 1

homicide_race_counts

{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}

In [106]:
homicide_race_per_hundredk = {}
for key, value in homicide_race_counts.items():
    homicide_race_per_hundredk[key] = value/mapping[key]*100000
homicide_race_per_hundredk

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

---
♣ Black people are more likely to die from homicide-related gun deaths in the US than other races (48 per 100'000)

---

In [107]:
suicide_race_counts = {}
for i, race in enumerate(races):
    if intents[i] == 'Suicide':
        if race in suicide_race_counts:
            suicide_race_counts[race] = suicide_race_counts[race] + 1
        else:
            suicide_race_counts[race] = 1

suicide_race_per_hundredk = {}
for key, value in suicide_race_counts.items():
    suicide_race_per_hundredk[key] = value/mapping[key]*100000
suicide_race_per_hundredk

{'Asian/Pacific Islander': 4.705023152187416,
 'Black': 8.278130270491385,
 'Hispanic': 7.106980451097149,
 'Native American/Native Alaskan': 14.841532544673013,
 'White': 28.06217969245692}

---
♣ White people are more likely to die by suicide using a gun than other races in the US (28 per 100'000).

---

In [108]:
accidental_race_counts = {}
for i, race in enumerate(races):
    if intents[i] == 'Accidental':
        if race in accidental_race_counts:
            accidental_race_counts[race] = accidental_race_counts[race] + 1
        else:
            accidental_race_counts[race] = 1

accidental_race_per_hundredk = {}
for key, value in accidental_race_counts.items():
    accidental_race_per_hundredk[key] = value/mapping[key]*100000
accidental_race_per_hundredk

{'Asian/Pacific Islander': 0.07578560782046845,
 'Black': 0.814893976206835,
 'Hispanic': 0.3249801846133985,
 'Native American/Native Alaskan': 0.5883130017708221,
 'White': 0.5736904466492313}

---
♣ Accidental deaths from guns are relatively low across all races (under 1 per 100'000 for every race) when compared to suicide and homicide

---
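
The homicide, suicide and accidental cells above repeat the same loop with a different intent each time. A possible refactoring, sketched on a made-up mini dataset, wraps that loop in one helper. The function name `count_by_intent` and the sample rows are hypothetical. The column positions (3 for intent, 7 for race, 9 for place) follow the guns.csv layout described at the top.

```python
from collections import Counter

def count_by_intent(rows, intent, group_column, intent_column=3):
    """Tally group_column values for rows whose intent column equals intent."""
    return Counter(
        row[group_column] for row in rows if row[intent_column] == intent
    )

# Hypothetical rows in the same column order as guns.csv.
sample_rows = [
    ["1", "2012", "01", "Suicide", "0", "M", "34", "Asian/Pacific Islander", "100", "Home", "4"],
    ["2", "2012", "01", "Suicide", "0", "F", "21", "White", "100", "Street", "3"],
    ["3", "2012", "02", "Homicide", "0", "M", "25", "Black", "100", "Street", "2"],
    ["4", "2012", "02", "Homicide", "0", "M", "31", "White", "100", "Home", "2"],
    ["5", "2012", "03", "Accidental", "0", "M", "17", "White", "100", "Home", "1"],
]

suicides_by_race = count_by_intent(sample_rows, "Suicide", 7)
homicides_by_race = count_by_intent(sample_rows, "Homicide", 7)
print(dict(suicides_by_race), dict(homicides_by_race))
```

The same helper could group by place (column 9) or education (column 10) by changing `group_column`.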

In [109]:
intents_counts = count_column(data,3)
intents_counts

{'Accidental': 1639,
 'Homicide': 35176,
 'NA': 1,
 'Suicide': 63175,
 'Undetermined': 807}

In [110]:
suicide_percentage = intents_counts['Suicide']/len(data)*100
homicide_percentage = intents_counts['Homicide']/len(data)*100
print('Suicide: ',round(suicide_percentage,2),'%')
print('Homicide: ',round(homicide_percentage,2),'%')

Suicide:  62.67 %
Homicide:  34.9 %


---
♣ Most gun deaths are the result of suicide (62.67%); homicides account for a further 34.9%

---
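
The same calculation can be extended to every intent at once. The sketch below starts from the `intents_counts` values printed above; the names `intent_totals` and `intent_percentages` are illustrative.

```python
# Intent totals copied from the intents_counts output above.
intent_totals = {
    "Accidental": 1639,
    "Homicide": 35176,
    "NA": 1,
    "Suicide": 63175,
    "Undetermined": 807,
}

total_deaths = sum(intent_totals.values())

# Share of all gun deaths for each intent, in percent, rounded to 2 decimals.
intent_percentages = {
    intent: round(count / total_deaths * 100, 2)
    for intent, count in intent_totals.items()
}
print(intent_percentages)
```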

In [111]:
police_count = count_column(data,4)
percentage_involving_police = (police_count['1']/(police_count['1'] + police_count['0']))*100
percentage_involving_police

1.390900613107403

---
♣ Police were involved in only 1.4% of gun-related deaths in the US

---

In [112]:
for row in data:
    if row[6] != 'NA':
        row[6] = int(row[6])
age_count = count_column(data,6)
age_count

{0: 33,
 1: 38,
 2: 50,
 3: 66,
 4: 54,
 5: 43,
 6: 50,
 7: 43,
 8: 32,
 9: 47,
 10: 53,
 11: 61,
 12: 117,
 13: 229,
 14: 364,
 15: 561,
 16: 864,
 17: 1185,
 18: 1753,
 19: 2065,
 20: 2219,
 21: 2504,
 22: 2712,
 23: 2472,
 24: 2437,
 25: 2230,
 26: 2231,
 27: 2070,
 28: 1986,
 29: 1955,
 30: 1869,
 31: 1833,
 32: 1824,
 33: 1700,
 34: 1699,
 35: 1631,
 36: 1512,
 37: 1500,
 38: 1491,
 39: 1389,
 40: 1414,
 41: 1485,
 42: 1492,
 43: 1527,
 44: 1449,
 45: 1372,
 46: 1437,
 47: 1532,
 48: 1621,
 49: 1669,
 50: 1674,
 51: 1755,
 52: 1715,
 53: 1708,
 54: 1684,
 55: 1596,
 56: 1625,
 57: 1472,
 58: 1510,
 59: 1430,
 60: 1361,
 61: 1306,
 62: 1099,
 63: 1041,
 64: 1126,
 65: 1039,
 66: 998,
 67: 865,
 68: 868,
 69: 879,
 70: 883,
 71: 791,
 72: 736,
 73: 737,
 74: 671,
 75: 676,
 76: 582,
 77: 575,
 78: 598,
 79: 573,
 80: 546,
 81: 563,
 82: 520,
 83: 552,
 84: 490,
 85: 440,
 86: 407,
 87: 312,
 88: 340,
 89: 245,
 90: 208,
 91: 176,
 92: 128,
 93: 103,
 94: 64,
 95: 55,
 96: 40,
 97: 2

In [113]:
children_deaths = 0
for row in data:
    # Skip missing ages; "< 19" counts ages 0 through 18 inclusive
    if row[6] != 'NA' and row[6] < 19:
        children_deaths += 1
children_deaths_percentage = children_deaths/len(data)*100
children_deaths_percentage

5.598325363598484

---
♣ 5.6% of gun-related deaths occurred among children and teenagers in the US (people aged 18 or younger, since the code counts ages below 19)

---
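
The result depends on where the age cut-off is placed. The sketch below recomputes the share from the `age_count` values for ages 0 to 18 printed above, with the cut-off as a parameter. The helper `deaths_below` is hypothetical, and the total of 100798 is the sum of the three yearly totals shown earlier.

```python
# Counts for ages 0 to 18, copied from the age_count output above.
young_age_counts = {
    0: 33, 1: 38, 2: 50, 3: 66, 4: 54, 5: 43, 6: 50, 7: 43, 8: 32, 9: 47,
    10: 53, 11: 61, 12: 117, 13: 229, 14: 364, 15: 561, 16: 864, 17: 1185,
    18: 1753,
}
TOTAL_DEATHS = 33563 + 33636 + 33599  # yearly totals for 2012, 2013 and 2014

def deaths_below(age_limit, counts):
    """Sum the deaths for all ages strictly below age_limit."""
    return sum(n for age, n in counts.items() if age < age_limit)

under_19 = deaths_below(19, young_age_counts)
under_18 = deaths_below(18, young_age_counts)
print(under_19, round(under_19 / TOTAL_DEATHS * 100, 2))
print(under_18, round(under_18 / TOTAL_DEATHS * 100, 2))
```

With a cut-off of 19 this reproduces the 5.6% figure; a cut-off of 18 gives a noticeably smaller share because age 18 alone accounts for 1753 deaths.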

In [114]:
place_counts = count_column(data,9)
place_counts

{'Farm': 470,
 'Home': 60486,
 'Industrial/construction': 248,
 'NA': 1384,
 'Other specified': 13751,
 'Other unspecified': 8867,
 'Residential institution': 203,
 'School/instiution': 671,
 'Sports': 128,
 'Street': 11151,
 'Trade/service area': 3439}

In [115]:
home_percentage = place_counts['Home']/len(data)*100
home_percentage

60.00714299886902

In [116]:
places = [row[9] for row in data]
suicide_places = {}
for i, place in enumerate(places):
    if intents[i] == 'Suicide':
        if place in suicide_places:
            suicide_places[place] = suicide_places[place] + 1
        else:
            suicide_places[place] = 1
suicide_places

{'Farm': 378,
 'Home': 45415,
 'Industrial/construction': 155,
 'Other specified': 8031,
 'Other unspecified': 4774,
 'Residential institution': 125,
 'School/instiution': 295,
 'Sports': 98,
 'Street': 2181,
 'Trade/service area': 1723}

In [117]:
suicide_at_home_percentage = suicide_places['Home']/place_counts['Home']*100
suicide_at_home_percentage

75.08349039447145

In [118]:
homicide_places = {}
for i, place in enumerate(places):
    if intents[i] == 'Homicide':
        if place in homicide_places:
            homicide_places[place] = homicide_places[place] + 1
        else:
            homicide_places[place] = 1
homicide_places

{'Farm': 76,
 'Home': 13613,
 'Industrial/construction': 85,
 'NA': 1383,
 'Other specified': 5429,
 'Other unspecified': 3604,
 'Residential institution': 73,
 'School/instiution': 361,
 'Sports': 22,
 'Street': 8867,
 'Trade/service area': 1663}

In [119]:
homicide_at_home_percentage = homicide_places['Home']/place_counts['Home']*100
homicide_at_home_percentage

22.506034454253875

---
♣ 60% of gun-related deaths occur in the home. Of these, 75% are suicides and 22.5% are homicides.

---
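
These shares can be reproduced directly from the counts printed above. The sketch below is only an illustration with assumed variable names: it divides the suicide and homicide counts for 'Home' by the total number of deaths at home.

```python
# Counts for place == 'Home', copied from the outputs above.
home_total = 60486
home_by_intent = {"Suicide": 45415, "Homicide": 13613}

# Percentage of deaths at home attributed to each intent.
home_shares = {
    intent: round(count / home_total * 100, 2)
    for intent, count in home_by_intent.items()
}

# Deaths at home with any other intent (accidental, undetermined, and so on).
home_other = home_total - sum(home_by_intent.values())
print(home_shares, home_other)
```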

In [120]:
education = [row[10] for row in data]
education_counts = count_column(data,10)
education_counts

{'1': 21823, '2': 42927, '3': 21680, '4': 12946, '5': 1369, 'NA': 53}

In [121]:
education_of_suicides = {}
for i, edu in enumerate(education):
    if intents[i] == 'Suicide':
        if edu in education_of_suicides:
            education_of_suicides[edu] = education_of_suicides[edu] + 1
        else:
            education_of_suicides[edu] = 1
education_of_suicides


{'1': 9292, '2': 26321, '3': 15533, '4': 11147, '5': 876, 'NA': 6}

In [122]:
education_of_homicides = {}
for i, edu in enumerate(education):
    if intents[i] == 'Homicide':
        if edu in education_of_homicides:
            education_of_homicides[edu] = education_of_homicides[edu] + 1
        else:
            education_of_homicides[edu] = 1
education_of_homicides

{'1': 11839, '2': 15649, '3': 5640, '4': 1559, '5': 455, 'NA': 34}

---

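♣ Among gun deaths, victims who didn't go to college are more likely to have died by homicide than those who did: homicides make up about 54% of deaths at education level 1 and 36% at level 2, versus 26% at level 3 and 12% at level 4.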

---
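
To compare education levels fairly, the homicide counts can be divided by the total deaths at each level. The sketch below uses the `education_counts` and `education_of_homicides` values printed above. The name `homicide_share` is illustrative, and levels '5' and 'NA' are left out because they mean the education status is not available.

```python
# Totals and homicide counts per education level, copied from the outputs above.
education_totals = {"1": 21823, "2": 42927, "3": 21680, "4": 12946}
homicides_by_education = {"1": 11839, "2": 15649, "3": 5640, "4": 1559}

# Percentage of gun deaths at each education level that were homicides.
homicide_share = {
    level: homicides_by_education[level] / education_totals[level] * 100
    for level in education_totals
}

for level, share in homicide_share.items():
    print(level, round(share, 1))
```

The share falls steadily from level 1 to level 4.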

In [123]:
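# Note: new_data is another name for the same list, so a 12th column (index 11) is added to data itself
new_data = data
for row in new_data:
    row.append(row[7] + ' ' + row[5])  # e.g. 'White M'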


In [124]:
race_gen = [row[11] for row in new_data]
race_gen_counts = count_column(new_data,11)
race_gen_counts

{'Asian/Pacific Islander F': 244,
 'Asian/Pacific Islander M': 1082,
 'Black F': 2318,
 'Black M': 20978,
 'Hispanic F': 1073,
 'Hispanic M': 7949,
 'Native American/Native Alaskan F': 127,
 'Native American/Native Alaskan M': 790,
 'White F': 10687,
 'White M': 55550}

In [125]:
suicide_race_gen_counts = {}
for i, race in enumerate(race_gen):
    if intents[i] == 'Suicide':
        if race in suicide_race_gen_counts:
            suicide_race_gen_counts[race] = suicide_race_gen_counts[race] + 1
        else:
            suicide_race_gen_counts[race] = 1

suicide_race_gen_counts

{'Asian/Pacific Islander F': 99,
 'Asian/Pacific Islander M': 646,
 'Black F': 364,
 'Black M': 2968,
 'Hispanic F': 347,
 'Hispanic M': 2824,
 'Native American/Native Alaskan F': 62,
 'Native American/Native Alaskan M': 493,
 'White F': 7817,
 'White M': 47555}

In [126]:
homicide_race_gen_counts = {}
for i, race in enumerate(race_gen):
    if intents[i] == 'Homicide':
        if race in homicide_race_gen_counts:
            homicide_race_gen_counts[race] = homicide_race_gen_counts[race] + 1
        else:
            homicide_race_gen_counts[race] = 1

homicide_race_gen_counts

{'Asian/Pacific Islander F': 140,
 'Asian/Pacific Islander M': 419,
 'Black F': 1898,
 'Black M': 17612,
 'Hispanic F': 695,
 'Hispanic M': 4939,
 'Native American/Native Alaskan F': 60,
 'Native American/Native Alaskan M': 266,
 'White F': 2580,
 'White M': 6567}

In [127]:
accidental_race_gen_counts = {}
for i, race in enumerate(race_gen):
    if intents[i] == 'Accidental':
        if race in accidental_race_gen_counts:
            accidental_race_gen_counts[race] = accidental_race_gen_counts[race] + 1
        else:
            accidental_race_gen_counts[race] = 1

accidental_race_gen_counts

{'Asian/Pacific Islander F': 3,
 'Asian/Pacific Islander M': 9,
 'Black F': 34,
 'Black M': 294,
 'Hispanic F': 15,
 'Hispanic M': 130,
 'Native American/Native Alaskan F': 2,
 'Native American/Native Alaskan M': 20,
 'White F': 164,
 'White M': 968}

---
♣ More Asian/Pacific Islander women die as a result of homicide by gun than suicide by gun (140 versus 99), whereas men in this group are more likely to die as a result of suicide by gun (646 versus 419).

Native American/Native Alaskan men are more likely to die as a result of suicide by gun than homicide by gun (493 versus 266).

More white men and women die from gun-related suicides than homicides, and far more white men than white women die by gun suicide (47'555 versus 7'817).

For both black men and black women, homicides by gun outnumber suicides by gun.

---
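
The comparison above can be made mechanical by checking, for each race and sex group, whether homicides or suicides are more numerous. The sketch below uses the counts printed in the last cells; the name `leading_intent` is an assumption for illustration.

```python
# Suicide and homicide counts per race/sex group, copied from the outputs above.
suicide_counts = {
    "Asian/Pacific Islander F": 99, "Asian/Pacific Islander M": 646,
    "Black F": 364, "Black M": 2968,
    "Hispanic F": 347, "Hispanic M": 2824,
    "Native American/Native Alaskan F": 62, "Native American/Native Alaskan M": 493,
    "White F": 7817, "White M": 47555,
}
homicide_counts = {
    "Asian/Pacific Islander F": 140, "Asian/Pacific Islander M": 419,
    "Black F": 1898, "Black M": 17612,
    "Hispanic F": 695, "Hispanic M": 4939,
    "Native American/Native Alaskan F": 60, "Native American/Native Alaskan M": 266,
    "White F": 2580, "White M": 6567,
}

# For each group, record which of the two intents has the larger count.
leading_intent = {
    group: "Homicide" if homicide_counts[group] > suicide_counts[group] else "Suicide"
    for group in suicide_counts
}

for group, intent in leading_intent.items():
    print(group, intent)
```

These are raw counts, so they show which intent dominates within a group but not how groups compare with each other; that would need population figures split by sex as well as race.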