# Exploring Gun Deaths In The U.S.

In this project, we're going to explore gun deaths in the U.S. using a dataset from [FiveThirtyEight](https://fivethirtyeight.com/), which can be found [here](https://github.com/fivethirtyeight/guns-data).

Each row in the dataset represents a single fatality. The columns contain demographic and other information about the victim. Here's an explanation of each column:

* ` ` (blank header): An identifier column.
* `year`: The year in which the fatality occurred.
* `month`: The month in which the fatality occurred.
* `intent`: The intent behind the shooting. This can be `Suicide`, `Accidental`, `NA`, `Homicide`, or `Undetermined`.
* `police`: Whether a police officer was involved in the shooting. Either `0` (false) or `1` (true).
* `sex`: The gender of the victim. Either `M` or `F`.
* `age`: The age of the victim.
* `race`: The race of the victim, for example `White`, `Black`, `Hispanic`, `Asian/Pacific Islander`, or `Native American/Native Alaskan`.
* `hispanic`: A code indicating the Hispanic origin of the victim.
* `place`: Where the shooting occurred.
* `education`: Educational status of the victim. Can be one of the following:
  * `1`: Less than High School
  * `2`: Graduated from High School or equivalent
  * `3`: Some College
  * `4`: At least graduated from College
  * `5`: Not available
  
We'll explore this dataset and try to find patterns in the demographics of the victims. Here are the steps:

## Reading In The Data

The first step is to read the data in and take a look at it. While we're at it, we'll separate the header row from the rest of the list of lists.

In [1]:
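import csv

# Read the whole file into a list of lists, closing the file when done
with open("guns.csv", "r", newline="") as f:
    data = list(csv.reader(f))

# Split the header row from the data rows
headers = data[0]
data = data[1:]
headers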

['',
 'year',
 'month',
 'intent',
 'police',
 'sex',
 'age',
 'race',
 'hispanic',
 'place',
 'education']

In [2]:
data[:2]

[['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3']]

## Counting Gun Deaths By Year

The `year` column contains information on the year in which gun deaths occurred. We can use this column to calculate how many gun deaths happened in each year.

We can perform this operation by creating a dictionary, then keeping count in the dictionary of how many times each element occurs in the year column.

In [3]:
year_counts = {}
for row in data:
    if row[1] not in year_counts:
        year_counts[row[1]] = 1
    else:
        year_counts[row[1]] = year_counts[row[1]] + 1
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
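
The manual tally above can also be written with `collections.Counter` from the standard library. The following is only a sketch: `sample_rows` and `sample_year_counts` are hypothetical names, and the third row is made up so that the example runs without `guns.csv`.

```python
from collections import Counter

# Hypothetical sample rows in the same column order as guns.csv
# (the first two match the rows shown earlier; the third is made up).
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2013', '02', 'Homicide', '0', 'M', '40', 'Black', '100', 'Street', '2'],
]

# Counter tallies each distinct value it receives; index 1 is the year column.
sample_year_counts = Counter(row[1] for row in sample_rows)
print(dict(sample_year_counts))
```

Replacing `sample_rows` with `data` should give the same counts as the loop above.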

## Exploring Gun Deaths By Month And Year

It looks like gun deaths didn't change much by year from `2012` to `2014`. Let's see if gun deaths in the US vary by month and year. To do this, we'll create a [datetime.datetime](https://docs.python.org/3/library/datetime.html#datetime-objects) object from the `year` and `month` columns of `data`. Because the data has no day column, we'll use a fixed day (the first of the month) for every row.

Once we have a `datetime.datetime` object for each row, we can count how many gun deaths occurred in each month of each year, using the same dictionary technique we used for the yearly counts in the previous section.

In [4]:
import datetime
dates = []
for row in data:
    date = datetime.datetime(year = int(row[1]), month = int(row[2]), day = 1)
    dates.append(date)
dates[:2]

[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0)]

In [5]:
date_counts = {}
for date in dates:
    if date not in date_counts:
        date_counts[date] = 1
    else:
        date_counts[date] = date_counts[date] + 1
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,
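
One way to read monthly counts like these is to look up the months with the highest and lowest totals. The sketch below is not part of the original analysis: it copies the 2012 counts from the output above into a hypothetical `counts_2012` dictionary and uses `max` and `min` with a key function.

```python
import datetime

# Monthly totals for 2012, copied from the date_counts output above
counts_2012 = {
    datetime.datetime(2012, 1, 1): 2758,
    datetime.datetime(2012, 2, 1): 2357,
    datetime.datetime(2012, 3, 1): 2743,
    datetime.datetime(2012, 4, 1): 2795,
    datetime.datetime(2012, 5, 1): 2999,
    datetime.datetime(2012, 6, 1): 2826,
    datetime.datetime(2012, 7, 1): 3026,
    datetime.datetime(2012, 8, 1): 2954,
    datetime.datetime(2012, 9, 1): 2852,
    datetime.datetime(2012, 10, 1): 2733,
    datetime.datetime(2012, 11, 1): 2729,
    datetime.datetime(2012, 12, 1): 2791,
}

# max/min iterate over the keys and compare them by their counts
busiest = max(counts_2012, key=counts_2012.get)
quietest = min(counts_2012, key=counts_2012.get)

print(busiest.month, counts_2012[busiest])
print(quietest.month, counts_2012[quietest])
```

On the 2012 figures this picks July as the highest month and February as the lowest. February also has the fewest days, so a per-day comparison would be a fairer test of seasonality.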

## Exploring Gun Deaths By Race And Sex

The `sex` and `race` columns contain potentially interesting information on how gun deaths in the US vary by gender and race. Exploring both of these columns can be done with a similar dictionary counting technique to what we did earlier.

In [6]:
def column_counts(col):
    counts = {}
    for row in data:
        if row[col] not in counts:
            counts[row[col]] = 1
        else:
            counts[row[col]] = counts[row[col]] + 1
    return counts

In [7]:
sex_counts = column_counts(5)
sex_counts

{'F': 14449, 'M': 86349}

In [8]:
race_counts = column_counts(7)
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

### Experiment

In [9]:
import math
sex_rate = math.ceil(sex_counts['M'] / sex_counts['F'])
sex_rate

6

In [10]:
max_race_counts = 0
for race in race_counts:
    if race_counts[race] > max_race_counts:
        max_race_counts = race_counts[race]
        max_race = race
max_race

'White'

In [11]:
max_race_counts

66237

In [12]:
total = 0
for race in race_counts:
    total = total + race_counts[race]
max_race_counts_percentage = max_race_counts / total * 100
max_race_counts_percentage

65.71261334550289
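
The same percentage can be computed for every race at once. This is a minimal sketch that re-enters the `race_counts` output shown above so it runs on its own; `race_shares` is a hypothetical name.

```python
# Counts copied from the race_counts output above
race_counts = {
    'Asian/Pacific Islander': 1326,
    'Black': 23296,
    'Hispanic': 9022,
    'Native American/Native Alaskan': 917,
    'White': 66237,
}

total = sum(race_counts.values())

# Share of all gun deaths for each race, as a percentage rounded to one decimal
race_shares = {
    race: round(count / total * 100, 1)
    for race, count in race_counts.items()
}
print(total)
print(race_shares)
```

The total of 100798 matches the sum of the male and female counts, which is a quick check that no rows were dropped along the way.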

### What we've learned so far
As we can see, most victims are male: there are roughly 6 male victims for every female victim (86349 versus 14449). White victims account for the largest share of gun deaths of any race, at about 65.7% of the total.

### Reading In A Second Dataset
We've just explored gun deaths by race. However, our analysis only gives us the total number of gun deaths by race in the U.S. Unless we know the proportion of each race in the U.S, we won't be able to meaningfully compare those numbers. 

So, we're going to read `census.csv`, which contains the total population of the U.S. broken down by race, as recorded in the April 1, 2010 Census.

In [45]:
# Read the census file; the first row holds the column names
with open("census.csv", "r", newline="") as f:
    census = list(csv.reader(f))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

## Computing Rates Of Gun Deaths Per Race

In order to get from the raw counts of gun deaths by race to a rate of gun deaths per 100000 people in each race, we'll need to divide the total number of gun deaths by the population of each race. From the census dataset, we know that the number of people in the White racial category is 197318956. We'd divide 66237 by 197318956:

```python
white_gun_death_rate = 66237 / 197318956
```

This gives us the proportion of people in the White census race category who were killed by a gun in the US from `2012` to `2014`, measured against the 2010 census population. If you do this computation, you'll see that the result is a very small number, `0.0003356849303419181`. For this reason, it's typical to express crime statistics as a "rate per 100000". This tells you the number of people in a given group out of every 100000 who were killed by guns in the US. To get this, we just multiply by 100000:

```python
rate_per_hundredk = 0.0003356849303419181 * 100000
```

This gives us 33.57, which we can interpret as "33.57 out of every 100000 people in the White census race category in the US were killed by guns between 2012 and 2014".
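
The division and the scaling can be folded into one small helper. This is a sketch only; `rate_per_100k` is a hypothetical name that does not appear in the original notebook.

```python
def rate_per_100k(deaths, population):
    """Return the number of deaths per 100000 people in the population."""
    return deaths / population * 100000


# White gun deaths (2012 to 2014) over the White census population
white_rate = rate_per_100k(66237, 197318956)
print(round(white_rate, 2))

# The deaths span three years, so dividing by 3 approximates a yearly rate
print(round(white_rate / 3, 2))
```

Note that the deaths cover three years while the population is a single 2010 count, so the figure is a three-year rate rather than an annual one.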

We'll need to calculate these same rates for each racial category. The only stumbling block is that the racial categories are named slightly differently in `census` and in `data`. We'll need to manually construct a dictionary that maps between them, and then perform the division.

Here's a list of the race names in `data` and the corresponding race names in `census`:

- Asian/Pacific Islander -- Race Alone - Asian plus Race Alone - Native Hawaiian and Other Pacific Islander
- Black -- Race Alone - Black or African American
- Hispanic -- Race Alone - Hispanic
- Native American/Native Alaskan -- Race Alone - American Indian and Alaska Native
- White -- Race Alone - White

We'll create a dictionary that has each race name from `data` as a key and the matching population count from `census` as the value.

In [49]:
mapping = {
    'Asian/Pacific Islander': 15159516 + 674625,
    'Black': 40250635,
    'Native American/Native Alaskan': 3739506,
    'Hispanic': 44618105,
    'White': 197318956
}
mapping

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}
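
The populations above were typed in by hand. As an alternative, they can be looked up by column name from the `census` rows, which avoids copying errors. The sketch below re-enters the two census rows shown earlier so it runs on its own; `population`, `census_columns` and `mapping_from_census` are hypothetical names.

```python
# Header and data row as shown in the census output above
census = [
    ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
     'Geography', 'Total',
     'Race Alone - White',
     'Race Alone - Hispanic',
     'Race Alone - Black or African American',
     'Race Alone - American Indian and Alaska Native',
     'Race Alone - Asian',
     'Race Alone - Native Hawaiian and Other Pacific Islander',
     'Two or More Races'],
    ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
     'Total', '0100000US', '', 'United States', '308745538', '197318956',
     '44618105', '40250635', '3739506', '15159516', '674625', '6984195'],
]
header, values = census

# Pair each column name with its value. The repeated 'Id' columns collapse
# into one key, which is fine because only the race columns are used.
population = dict(zip(header, values))

# Census columns that make up each race label used in guns.csv
census_columns = {
    'Asian/Pacific Islander': [
        'Race Alone - Asian',
        'Race Alone - Native Hawaiian and Other Pacific Islander',
    ],
    'Black': ['Race Alone - Black or African American'],
    'Native American/Native Alaskan': [
        'Race Alone - American Indian and Alaska Native',
    ],
    'Hispanic': ['Race Alone - Hispanic'],
    'White': ['Race Alone - White'],
}

# Sum the listed columns for each race, converting the strings to integers
mapping_from_census = {
    race: sum(int(population[col]) for col in cols)
    for race, cols in census_columns.items()
}
print(mapping_from_census)
```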

###### Using the `items` method

In [57]:
race_per_hundredk = {}
for key, value in race_counts.items():
    race_per_hundredk[key] = (value / mapping[key]) * 100000
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

In [60]:
intents = []
races = []
for row in data:
    intents.append(row[3])
    races.append(row[7])

###### Using the `enumerate` function

In [61]:
homicide_race_per_hundredk = {}
for i, race in enumerate(races):
    if intents[i] == 'Homicide':
        if race not in homicide_race_per_hundredk:
            homicide_race_per_hundredk[race] = 1
        else:
            homicide_race_per_hundredk[race] = homicide_race_per_hundredk[race] + 1

homicide_race_per_hundredk

{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}

In [63]:
# Store the rates in a new dictionary so that re-running this cell
# doesn't divide values that have already been converted to rates
homicide_rates = {}
for race, count in homicide_race_per_hundredk.items():
    homicide_rates[race] = round(count / mapping[race] * 100000, 2)
homicide_rates

{'Asian/Pacific Islander': 3.53,
 'Black': 48.47,
 'Hispanic': 12.63,
 'Native American/Native Alaskan': 8.72,
 'White': 4.64}

### Findings
It appears that gun-related homicides in the US disproportionately affect people in the _Black_ and _Hispanic_ racial categories, with roughly 48.5 and 12.6 homicides per 100000 people respectively over 2012 to 2014, compared with about 4.6 for the _White_ category.

## Next Steps
* The link between month and homicide rate.
* Homicide rate by gender.
* The rates of other intents by gender and race.
* Gun death rates by location and education.
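
As a starting point for the gender item above, the same counting idea can be restricted to homicides and grouped by the `sex` column. This is a rough sketch on made-up rows, not a result from the dataset: `sample_rows` and `homicides_by_sex` are hypothetical names.

```python
from collections import Counter

# Made-up rows in the guns.csv column order
# (intent is at index 3 and sex is at index 5).
sample_rows = [
    ['1', '2012', '01', 'Homicide', '0', 'M', '25', 'Black', '100', 'Street', '2'],
    ['2', '2012', '03', 'Suicide', '0', 'F', '51', 'White', '100', 'Home', '3'],
    ['3', '2013', '07', 'Homicide', '0', 'F', '33', 'Hispanic', '200', 'Home', '1'],
    ['4', '2014', '11', 'Homicide', '0', 'M', '19', 'White', '100', 'Street', '2'],
]

# Keep only the homicide rows, then tally them by sex
homicides_by_sex = Counter(
    row[5] for row in sample_rows if row[3] == 'Homicide'
)
print(dict(homicides_by_sex))
```

Turning these counts into rates would need male and female population totals, which the single `census.csv` row shown earlier does not break out.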