# Gun deaths

This project analyzes gun deaths in the US from 2012 to 2014 based on the dataset provided by FiveThirtyEight.

The dataset contains the following columns:

- `` (unnamed) -- an identifier column that contains the row number. CSV files commonly include a unique identifier for each row, but we can ignore it in this analysis.
- `year` -- the year in which the fatality occurred.
- `month` -- the month in which the fatality occurred.
- `intent` -- the intent of the perpetrator of the crime. This can be Suicide, Accidental, NA, Homicide, or Undetermined.
- `police` -- whether a police officer was involved with the shooting. Either 0 (false) or 1 (true).
- `sex` -- the gender of the victim. Either M or F.
- `age` -- the age of the victim.
- `race` -- the race of the victim. Either Asian/Pacific Islander, Native American/Native Alaskan, Black, Hispanic, or White.
- `hispanic` -- a code indicating the Hispanic origin of the victim.
- `place` -- where the shooting occurred.
- `education` -- educational status of the victim. Can be one of the following:
 - 1 -- Less than High School
 - 2 -- Graduated from High School or equivalent
 - 3 -- Some College
 - 4 -- At least graduated from College
 - 5 -- Not available

In [1]:
import csv

# Read every row into a list; the with-block closes the file afterwards.
with open("guns.csv", "r", newline="") as f:
    data = list(csv.reader(f))
data[:5]

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]

The first row is a header, so we store it separately and remove it from `data` before performing any computations.

In [2]:
headers = data[0]
data = data[1:]
print(headers)
print(data[:5])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]
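As an alternative to slicing off the header by hand, `csv.DictReader` consumes the header row and lets each record be read by column name instead of by position. The sketch below is only an illustration: it reads from an in-memory string (the hypothetical `sample`) holding the first two rows printed above, rather than from `guns.csv`.

```python
import csv
import io

# A two-row stand-in for guns.csv, copied from the rows printed above.
sample = io.StringIO(
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,3\n"
)

# DictReader uses the first row as field names and yields one dict per record.
rows = list(csv.DictReader(sample))

first_intent = rows[0]["intent"]
second_race = rows[1]["race"]
print(first_intent, second_race)
```

The rest of this analysis keeps the list-of-lists form, so column positions such as `row[1]` and `row[7]` are used throughout.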


In [3]:
years = [row[1] for row in data]
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
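The same tallying pattern recurs several times in this analysis. One possible shortcut, shown here as a sketch on a small illustrative list rather than the full dataset, is `collections.Counter` from the standard library, which builds the same kind of value-to-count mapping in a single call.

```python
from collections import Counter

# Illustrative year values only; the real list comes from column 1 of `data`.
years = ["2012", "2012", "2012", "2013", "2013", "2014"]

# Counter tallies how many times each distinct value appears.
year_counts = Counter(years)
print(dict(year_counts))
```

A `Counter` behaves like a dictionary, so later lookups such as `year_counts["2012"]` work unchanged.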

In [6]:
import datetime
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day = 1) for row in data]
dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [7]:
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

## Gun deaths by sex and race

Now, let's count how many gun deaths there were for each sex and each race.

In [8]:
sex_col = [row[5] for row in data]

sex_counts = {}
for sex in sex_col:
    if sex in sex_counts:
        sex_counts[sex] += 1
    else:
        sex_counts[sex] = 1
sex_counts

{'F': 14449, 'M': 86349}

In [9]:
race_col = [row[7] for row in data]

race_counts = {}
for race in race_col:
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

The results above show that far more males (86,349) than females (14,449) are killed by guns. In terms of race, whites account for the most gun deaths, followed by blacks and Hispanics. Asian/Pacific Islanders and Native Americans/Native Alaskans have far fewer gun deaths than the other groups, but they also make up much smaller percentages of the overall population. To compare groups fairly, we need to look at these counts relative to the size of each group's population.
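As a first step before normalizing by population, each group's share of all gun deaths can be computed directly from the counts. This is a minimal sketch that copies the `race_counts` values printed above; `race_share` is a name introduced here for illustration.

```python
# Counts copied from the race_counts output above.
race_counts = {
    "Asian/Pacific Islander": 1326,
    "Black": 23296,
    "Hispanic": 9022,
    "Native American/Native Alaskan": 917,
    "White": 66237,
}

total_deaths = sum(race_counts.values())

# Percentage of all recorded gun deaths that falls in each group.
race_share = {
    race: 100 * count / total_deaths
    for race, count in race_counts.items()
}

for race, share in sorted(race_share.items(), key=lambda item: item[1], reverse=True):
    print(f"{race}: {share:.1f}%")
```

These shares describe the dataset only; they say nothing about risk until they are compared with each group's population.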

## Gun deaths by race per 100,000 people 

To perform this analysis, we need to import some census data.

In [10]:
# The with-block closes the census file once it has been read.
with open("census.csv", "r", newline="") as f2:
    census = list(csv.reader(f2))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

In order to bring the data from both of these sources together, we need to first create a map that connects each race from `data` to the corresponding total population in `census`.

In [12]:
mapping = {
    'Asian/Pacific Islander': 15159516 + 674625,
    'Black': 40250635,
    'Native American/Native Alaskan': 3739506,
    'Hispanic': 44618105,
    'White': 197318956
}
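The hardcoded numbers can also be read out of the census rows by column name, which avoids copying figures by hand. The sketch below repeats the header and data row printed above so that it runs on its own; `census_header`, `census_row`, and `pop` are names introduced here for illustration.

```python
# Header and data row copied from the census output above.
census_header = [
    "Id", "Year", "Id", "Sex", "Id", "Hispanic Origin", "Id", "Id2",
    "Geography", "Total", "Race Alone - White", "Race Alone - Hispanic",
    "Race Alone - Black or African American",
    "Race Alone - American Indian and Alaska Native", "Race Alone - Asian",
    "Race Alone - Native Hawaiian and Other Pacific Islander",
    "Two or More Races",
]
census_row = [
    "cen42010", "April 1, 2010 Census", "totsex", "Both Sexes", "tothisp",
    "Total", "0100000US", "", "United States", "308745538", "197318956",
    "44618105", "40250635", "3739506", "15159516", "674625", "6984195",
]

# Pair column names with values. The repeated "Id" columns overwrite each
# other, which is harmless because only the race columns are used below.
pop = dict(zip(census_header, census_row))

mapping = {
    "Asian/Pacific Islander": int(pop["Race Alone - Asian"])
    + int(pop["Race Alone - Native Hawaiian and Other Pacific Islander"]),
    "Black": int(pop["Race Alone - Black or African American"]),
    "Native American/Native Alaskan": int(
        pop["Race Alone - American Indian and Alaska Native"]
    ),
    "Hispanic": int(pop["Race Alone - Hispanic"]),
    "White": int(pop["Race Alone - White"]),
}
print(mapping)
```

The census separates Asian from Native Hawaiian and Other Pacific Islander, so those two columns are summed to match the single `Asian/Pacific Islander` category in the gun deaths data.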

In [13]:
race_per_hundredk = {}
for race in race_counts:
    count = race_counts[race]
    pop = mapping[race]
    count_per = (count / pop) * 100000
    race_per_hundredk[race] = count_per
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

## Gun homicide rate by race

Now that we have the rate of gun deaths per 100,000 people for each race, let's determine the gun homicide rate per 100,000 people in each racial category. We can do this by counting only rows where the `intent` is `Homicide`.

First, we extract the `intent` and `race` columns. Then, we use the `enumerate()` function to loop through each index and value in the race column. If the value at the same index in the `intents` list is `Homicide`, we increment the count for that race.

In [14]:
intents = [row[3] for row in data]
races = [row[7] for row in data]

homicide_race_counts = {}
for i, race in enumerate(races):
    if intents[i] == "Homicide":
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else:
            homicide_race_counts[race] = 1
homicide_race_counts

{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}
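The index-based loop above can also be expressed by pairing the two columns with `zip()`, which removes the need to look up `intents[i]`. The sketch below uses short made-up lists in place of the real columns and combines `zip()` with `collections.Counter`; it is one possible variant, not the method used in this notebook.

```python
from collections import Counter

# Made-up sample columns; the real ones come from columns 3 and 7 of `data`.
intents = ["Suicide", "Homicide", "Homicide", "Accidental", "Homicide"]
races = ["White", "Black", "White", "Hispanic", "Black"]

# zip() walks both lists in step; only races on homicide rows are counted.
homicide_race_counts = Counter(
    race for intent, race in zip(intents, races) if intent == "Homicide"
)
print(dict(homicide_race_counts))
```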

Finally, we use the `mapping` dictionary from above to convert these raw counts into homicide rates per 100,000 people.

In [15]:
homicide_race_per_hundredk = {}
for race in homicide_race_counts:
    count = homicide_race_counts[race]
    pop = mapping[race]
    count_per = (count / pop) * 100000
    homicide_race_per_hundredk[race] = count_per
homicide_race_per_hundredk

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

Although whites have a relatively high gun-related death rate (the second highest, at 33.6 per 100,000 people), their gun homicide rate is low, at only 4.6 per 100,000. This suggests that most gun deaths among whites fall under other intents, most likely suicide.

In contrast, blacks have the highest gun-related death rate (57.9 per 100,000 people) and by far the highest gun homicide rate (48.5 per 100,000 people). These numbers indicate that the large majority of gun deaths among blacks were homicides.

Similarly, homicides account for more than half of gun deaths among Hispanics (12.6 of 20.2 per 100,000 people).
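The comparisons above can be checked by dividing each group's homicide count by its total gun death count. This sketch copies both sets of counts from the outputs earlier in the notebook; `homicide_share` is a name introduced here for illustration.

```python
# Counts copied from the race_counts and homicide_race_counts outputs above.
race_counts = {
    "Asian/Pacific Islander": 1326,
    "Black": 23296,
    "Hispanic": 9022,
    "Native American/Native Alaskan": 917,
    "White": 66237,
}
homicide_race_counts = {
    "Asian/Pacific Islander": 559,
    "Black": 19510,
    "Hispanic": 5634,
    "Native American/Native Alaskan": 326,
    "White": 9147,
}

# Fraction of each group's gun deaths that were recorded as homicides.
homicide_share = {
    race: homicide_race_counts[race] / race_counts[race]
    for race in race_counts
}

for race, share in homicide_share.items():
    print(f"{race}: {share:.1%} of gun deaths were homicides")
```

Because the population cancels out of this ratio, the result does not depend on the census figures at all.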

Potential next steps:
- determine whether there is a link between month and homicide rate
- explore the homicide rate by gender
- explore the rates of other intents, like `