# Exploring Gun Deaths in the US

The dataset contains information on gun deaths in the US from `2012` to `2014`. Each row in the dataset represents a single fatality. The columns are:
+ `[0] `: an identifier column, which contains the row number
+ `[1] year`: the year in which the fatality occurred
+ `[2] month`: the month in which the fatality occurred
+ `[3] intent`: the intent of the perpetrator of the crime. This can be `Suicide`, `Accidental`, `NA`, `Homicide`, or `Undetermined`
+ `[4] police`: whether a police officer was involved with the shooting. Either `0` (false) or `1` (true)
+ `[5] sex`: the gender of the victim. Either `M` or `F`
+ `[6] age`: the age of the victim
+ `[7] race`: the race of the victim. Either `Asian/Pacific Islander`, `Native American/Native Alaskan`, `Black`, `Hispanic`, or `White`
+ `[8] hispanic`: a numeric Hispanic-origin code (for example `100` in the first rows)
+ `[9] place`: where the shooting occurred. Has several categories
+ `[10] education`: educational status of the victim. Can be one of the following:
  - `1` -- Less than High School
  - `2` -- Graduated from High School or equivalent
  - `3` -- Some College
  - `4` -- At least graduated from College
  - `5` -- Not available
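
Hard-coded positions are easy to get wrong, so one option is to build a name-to-index lookup from the header row. The sketch below is illustrative only: `header` is copied from the header row that the first cell prints, and `column_index` is a hypothetical helper name that the notebook itself does not use.

```python
# Header row as printed by the first cell of the notebook.
header = ['', 'year', 'month', 'intent', 'police', 'sex',
          'age', 'race', 'hispanic', 'place', 'education']

# Hypothetical helper: map each column name to its position in a row.
column_index = {name: position for position, name in enumerate(header)}

print(column_index['race'])
print(column_index['place'])
print(column_index['education'])
```

With such a lookup, `each[column_index['place']]` reads more safely than a bare `each[9]`.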

In [1]:
# read the dataset
import csv
with open("guns.csv", "r") as f:
    data = list(csv.reader(f))
data[:5]

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]

In [2]:
# extracting headers
headers = data[:1]
data = data[1:]
headers

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education']]

In [3]:
data[:5]

[['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
 ['5',
  '2012',
  '02',
  'Suicide',
  '0',
  'M',
  '31',
  'White',
  '100',
  'Other specified',
  '2']]

In [4]:
# calculating how many gun deaths happened in each year
years = [each[1] for each in data]
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
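
The same count-by-value loop recurs throughout this notebook. As a possible shortcut, the standard library's `collections.Counter` does the tallying in one call. The sketch below uses a few made-up stand-in rows (`sample_rows`) rather than the real file, and `count_column` is a hypothetical helper name.

```python
from collections import Counter

# Made-up stand-in rows in the guns.csv layout (index 1 is the year).
sample_rows = [
    ['1', '2012', '01', 'Suicide'],
    ['2', '2012', '01', 'Suicide'],
    ['3', '2013', '02', 'Homicide'],
]

def count_column(rows, index):
    """Count how often each value appears in the column at `index`."""
    return Counter(row[index] for row in rows)

year_counts_sample = count_column(sample_rows, 1)
print(dict(year_counts_sample))
```

On the real data, `count_column(data, 1)` should give the same tallies as the explicit loop.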

In [5]:
# counting how gun deaths in the US change by month and year
import datetime
dates = [datetime.datetime(year = int(each[1]), month = int(each[2]), day = 1) for each in data]
dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [6]:
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

In [7]:
# counting how gun deaths in the US vary by gender and race
gender = [each[5] for each in data]
sex_counts = {}
for sex in gender:
    if sex in sex_counts:
        sex_counts[sex] += 1
    else:
        sex_counts[sex] = 1
sex_counts

{'F': 14449, 'M': 86349}

In [8]:
races = [each[7] for each in data]
race_counts = {}
for race in races:
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

# Rate of Gun Deaths per 100k people of each race

At first glance, more White people died from gun violence than people of any other race.

Our first analysis only gives us the total number of gun deaths by race in the US. Unless we know the proportion of each race in the US, we won't be able to meaningfully compare those numbers.

We'll need to read data about what percentage of the US population falls into each racial category. 
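
The normalization itself is a single division. As a minimal sketch, the hypothetical helper `rate_per_100k` below applies it to the `White` figures that appear in this notebook's outputs (66237 deaths and a 2010 census count of 197318956).

```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 people in the given population."""
    return deaths / population * 100000

# Figures taken from this notebook's outputs for the 'White' category.
white_rate = rate_per_100k(66237, 197318956)
print(round(white_rate, 2))
```

Note that the death counts cover all three years (2012 to 2014) while the population comes from a single census, so the result is best read as a three-year rate rather than an annual one.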

In [9]:
with open("census.csv", "r") as f:
    census = list(csv.reader(f))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

In [10]:
# map each racial category in race_counts to its census population
# census:
# [0] Id
# [1] Year
# [2] Id.1
# [3] Sex
# [4] Id.2
# [5] Hispanic Origin
# [6] Id.3
# [7] Id2
# [8] Geography
# [9] Total
# [10] Race Alone - White
# [11] Race Alone - Hispanic
# [12] Race Alone - Black or African American
# [13] Race Alone - American Indian and Alaska Native
# [14] Race Alone - Asian
# [15] Race Alone - Native Hawaiian and Other Pacific Islander
# [16] Two or More Races
mapping = {
    "Asian/Pacific Islander": int(census[1][14]) + int(census[1][15]),
    "Black": int(census[1][12]),
    "Native American/Native Alaskan": int(census[1][13]),
    "Hispanic": int(census[1][11]),
    "White": int(census[1][10])
}
mapping

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}

In [11]:
race_per_hundredk = {}
for race_key, race_value in race_counts.items():
    rate_value = race_value / mapping[race_key] * 100000
    race_per_hundredk[race_key] = rate_value
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

In [12]:
# restricting analysis to the Homicide intent
intents = [each[3] for each in data]
intents

['Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Undetermined',
 'Suicide',
 'Accidental',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Homicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Homicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Homicide',
 'Suicide',
 'Homicide',
 'Homicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Homicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Homicide',
 'Undetermined',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Homicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Homicide',
 'Homicide',
 'Homicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Undetermined',
 'Suicide',
 'Suicide',
 'Accidental',
 'Homicide',
 'Suicide',
 'Homicide',
 'Suicide',
 'Homicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Accidental',
 'Suicide',
 'Accidental',
 'Suicide',
 'Undetermined',
 'Suicide',
 'Suicide',
 'Suicide',
 'Homi

In [13]:
races = [each[7] for each in data]
races

['Asian/Pacific Islander',
 'White',
 'White',
 'White',
 'White',
 'Native American/Native Alaskan',
 'White',
 'Native American/Native Alaskan',
 'White',
 'Black',
 'White',
 'Native American/Native Alaskan',
 'White',
 'Native American/Native Alaskan',
 'White',
 'Native American/Native Alaskan',
 'White',
 'Native American/Native Alaskan',
 'Asian/Pacific Islander',
 'White',
 'Native American/Native Alaskan',
 'White',
 'Black',
 'White',
 'White',
 'White',
 'Native American/Native Alaskan',
 'White',
 'White',
 'White',
 'Native American/Native Alaskan',
 'White',
 'Native American/Native Alaskan',
 'White',
 'White',
 'White',
 'Native American/Native Alaskan',
 'Native American/Native Alaskan',
 'White',
 'White',
 'Native American/Native Alaskan',
 'Asian/Pacific Islander',
 'White',
 'Native American/Native Alaskan',
 'Native American/Native Alaskan',
 'White',
 'White',
 'White',
 'Native American/Native Alaskan',
 'Asian/Pacific Islander',
 'White',
 'White',
 'Native Ame

In [14]:
homicide_race_counts = {}
for i,race in enumerate(races):
    if intents[i] == "Homicide":
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else:
            homicide_race_counts[race] = 1
homicide_race_counts

{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}

In [15]:
homicide_per_hundredk = {}
for race_key, race_value in homicide_race_counts.items():
    rate_value = race_value / mapping[race_key] * 100000
    homicide_per_hundredk[race_key] = rate_value
homicide_per_hundredk

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

### Analyzing the link between month and homicide rate

In [18]:
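# attach a datetime (first day of the month) to each row
# NOTE: index 10 originally holds the education code, so this assignment overwrites it
for fatality in data:
    fatality_datetime = datetime.datetime(year = int(fatality[1]), month = int(fatality[2]), day = 1)
    fatality[10] = fatality_datetime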
data[:5]

[['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  datetime.datetime(2012, 1, 1, 0, 0)],
 ['2',
  '2012',
  '01',
  'Suicide',
  '0',
  'F',
  '21',
  'White',
  '100',
  'Street',
  datetime.datetime(2012, 1, 1, 0, 0)],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  datetime.datetime(2012, 1, 1, 0, 0)],
 ['4',
  '2012',
  '02',
  'Suicide',
  '0',
  'M',
  '64',
  'White',
  '100',
  'Home',
  datetime.datetime(2012, 2, 1, 0, 0)],
 ['5',
  '2012',
  '02',
  'Suicide',
  '0',
  'M',
  '31',
  'White',
  '100',
  'Other specified',
  datetime.datetime(2012, 2, 1, 0, 0)]]
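
As the output shows, the last element of each row is now the datetime, so the education code that used to sit at index 10 is gone. If education is needed later, one alternative is to append the date as a new column instead. The sketch below works on a single row copied from the earlier output; `row_with_date` is a hypothetical name.

```python
import datetime

# One row in the guns.csv layout, copied from the earlier notebook output.
row = ['1', '2012', '01', 'Suicide', '0', 'M', '34',
       'Asian/Pacific Islander', '100', 'Home', '4']

# Append the parsed date as a new column (index 11) rather than assigning to index 10.
parsed_date = datetime.datetime(year=int(row[1]), month=int(row[2]), day=1)
row_with_date = row + [parsed_date]

print(row_with_date[10])        # education code is preserved
print(row_with_date[11].month)  # month taken from the appended datetime
```

With this layout, the month would be read from `each[11].month` instead of `each[10].month`.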

In [20]:
months = [each[10].month for each in data]
months

[1,
 1,
 1,
 2,
 2,
 2,
 2,
 3,
 2,
 2,
 2,
 1,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 4,
 4,
 4,
 4,
 4,
 4,
 4,
 4,
 4,
 4,
 4,
 4,
 4,
 4,
 5,
 4,
 4,
 5,
 5,
 5,
 5,
 5,
 6,
 6,
 6,
 6,
 6,
 6,
 7,
 6,
 6,
 7,
 7,
 6,
 6,
 7,
 7,
 7,
 7,
 7,
 7,
 7,
 8,
 8,
 7,
 7,
 8,
 8,
 8,
 8,
 8,
 8,
 8,
 8,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 10,
 9,
 9,
 9,
 9,
 9,
 10,
 10,
 9,
 10,
 10,
 9,
 10,
 10,
 10,
 10,
 10,
 9,
 11,
 10,
 11,
 10,
 11,
 11,
 11,
 11,
 11,
 11,
 11,
 12,
 12,
 11,
 12,
 12,
 11,
 11,
 12,
 12,
 12,
 12,
 10,
 12,
 12,
 12,
 12,
 12,
 12,
 12,
 12,
 12,
 12,
 12,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 2,
 1,
 1,
 1,
 1,
 1,
 2,
 1,
 1,
 2,
 1,
 1,
 2,
 1,
 2,
 2,
 2,
 1,
 1,
 1,
 1,
 2,
 2,
 2,
 2,
 1,
 1,
 1,
 2,
 1,
 1,
 1,
 2,
 2,
 2,
 2,
 2,
 2,
 2,
 1,
 2,
 2,
 2,
 2,
 2,
 2,
 2,
 2,
 2,
 2,
 1,
 2,
 2,
 2,
 1,
 2,
 1,
 3,
 1,
 2,
 2,
 2,
 3,
 2,
 3,
 3,
 2,
 2,


In [21]:
homicide_monthly_counts = {}
for i,month in enumerate(months):
    if intents[i] == "Homicide":
        if month in homicide_monthly_counts:
            homicide_monthly_counts[month] += 1
        else:
            homicide_monthly_counts[month] = 1
homicide_monthly_counts

{1: 2829,
 2: 2178,
 3: 2780,
 4: 2845,
 5: 2976,
 6: 3130,
 7: 3269,
 8: 3125,
 9: 2966,
 10: 2968,
 11: 2919,
 12: 3191}
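
Months differ in length, so raw totals slightly understate short months such as February. One possible refinement, which the notebook itself does not perform, is to divide each total by the number of calendar days that month contributes. The sketch assumes the data covers all of 2012, 2013, and 2014, and copies the totals from the output above.

```python
import calendar

# Monthly homicide totals copied from the output above (2012-2014 combined).
homicide_monthly_counts = {
    1: 2829, 2: 2178, 3: 2780, 4: 2845, 5: 2976, 6: 3130,
    7: 3269, 8: 3125, 9: 2966, 10: 2968, 11: 2919, 12: 3191,
}

# Total number of calendar days each month contributes across the three years.
days_in_month = {
    month: sum(calendar.monthrange(year, month)[1] for year in (2012, 2013, 2014))
    for month in range(1, 13)
}

# Homicides per calendar day: an illustrative normalization, not part of the notebook.
homicides_per_day = {
    month: homicide_monthly_counts[month] / days_in_month[month]
    for month in homicide_monthly_counts
}

for month in sorted(homicides_per_day):
    print(month, round(homicides_per_day[month], 1))
```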

### Analyzing homicide rate by gender

In [22]:
gender

['M',
 'F',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'F',
 'F',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'F',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'F',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'F',
 'M',
 'M',
 'F',
 'M',
 'M',
 'M',
 'F',
 'M',
 'F',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'F',
 'F',
 'M',
 'F',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'F',
 'M',
 'M',
 'F',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'F',
 'M',
 'M',
 'M',
 'F',
 'M',
 'M',
 'M',
 'M',
 'M',
 'F',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'F',
 'M',
 'M',
 'M',
 'M',
 'M',
 'F',
 'M',
 'M',
 'F',
 'F',
 'F',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'F',
 'M',
 'M',
 'M',
 'F',
 'M',
 'M',
 'M',
 'M',
 'M',
 'M',
 'F'

In [23]:
homicide_sex_counts = {}
for i,sex in enumerate(gender):
    if intents[i] == "Homicide":
        if sex in homicide_sex_counts:
            homicide_sex_counts[sex] += 1
        else:
            homicide_sex_counts[sex] = 1
homicide_sex_counts

{'F': 5373, 'M': 29803}

### Analyzing Accidental intent by gender and race

In [24]:
accident_sex_counts = {}
for i,sex in enumerate(gender):
    if intents[i] == "Accidental":
        if sex in accident_sex_counts:
            accident_sex_counts[sex] += 1
        else:
            accident_sex_counts[sex] = 1
accident_sex_counts

{'F': 218, 'M': 1421}

In [25]:
accident_race_counts = {}
for i,race in enumerate(races):
    if intents[i] == "Accidental":
        if race in accident_race_counts:
            accident_race_counts[race] += 1
        else:
            accident_race_counts[race] = 1
accident_race_counts

{'Asian/Pacific Islander': 12,
 'Black': 328,
 'Hispanic': 145,
 'Native American/Native Alaskan': 22,
 'White': 1132}

In [26]:
accident_per_hundredk = {}
for race_key, race_value in accident_race_counts.items():
    rate_value = race_value / mapping[race_key] * 100000
    accident_per_hundredk[race_key] = rate_value
accident_per_hundredk

{'Asian/Pacific Islander': 0.07578560782046845,
 'Black': 0.814893976206835,
 'Hispanic': 0.3249801846133985,
 'Native American/Native Alaskan': 0.5883130017708221,
 'White': 0.5736904466492313}

### Gun deaths by location and education

In [28]:
locations = [each[9] for each in data]
locations

['Home',
 'Street',
 'Other specified',
 'Home',
 'Other specified',
 'Home',
 'Home',
 'Home',
 'Other specified',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Other specified',
 'Other specified',
 'Other specified',
 'Other specified',
 'Home',
 'Other specified',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'Other specified',
 'Home',
 'Home',
 'Street',
 'Home',
 'Home',
 'Other specified',
 'Home',
 'Home',
 'Home',
 'Home',
 'Home',
 'NA',
 'Home',
 'Home',
 'Home',
 'Other specified',
 'Home',
 'Other specified',
 'Other specified',
 'Other specified',
 'Home',
 'Home',
 'Home',
 'Home',
 'Other specified',
 'Home',
 'Home',
 'Home',
 'Home',
 'Other specified',
 'Home',
 'Home',
 'Other specified',
 'NA',
 'Other specified',
 'Other specified',
 'Home',
 'Other 

In [29]:
location_counts = {}
for location in locations:
    if location in location_counts:
        location_counts[location] += 1
    else:
        location_counts[location] = 1
location_counts

{'Farm': 470,
 'Home': 60486,
 'Industrial/construction': 248,
 'NA': 1384,
 'Other specified': 13751,
 'Other unspecified': 8867,
 'Residential institution': 203,
 'School/instiution': 671,
 'Sports': 128,
 'Street': 11151,
 'Trade/service area': 3439}

In [33]:
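education_dict = {
    "1": "Less than High School",
    "2": "Graduated from High School or equivalent",
    "3": "Some College",
    "4": "At least graduated from College",
    "5": "Not available"
}
# NOTE: index 8 is the `hispanic` column, so this list holds Hispanic-origin codes
# (e.g. '100', '998'), not education levels; education was stored at index 10
educations = [each[8] for each in data]
educations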

['100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '998',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '281',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '211',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '998',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
 '100',
