# Guided Project 03: Exploring Gun Deaths in the US

## Table of Contents

* [Core Content](#core-content)
* [Lessons Learnt So Far](#lessons-learnt-so-far)
* [Findings](#findings)
* [Next Steps](#next-steps)
* [Project Summary](#project-summary)

## Core Content <a class="anchor" id="core-content"></a>

### Read Data

In [1]:
# import libraries
import csv

In [2]:
# read dataset via the csv module; `with` closes the file once it has been read
with open('guns.csv', 'r', newline='') as f:
    data = list(csv.reader(f))
for row in data[:5]:
    print(row)

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4']
['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3']
['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4']
['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']


### Separate Header from Data

In [3]:
# separate header from data
headers = data[0]
print('Headers:', headers)

Headers: ['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']


In [4]:
# remove first row from data
data = data[1:]
print('Data:')
for row in data[:5]:
    print(row)

Data:
['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4']
['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3']
['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4']
['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']
['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']
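
The later cells pick columns by position (`row[1]`, `row[5]`, `row[7]`). One way to make those lookups easier to read, sketched here under the assumption that the header names stay as printed above, is to build a name-to-index dictionary. The name `col_index` is my own and does not appear in the original notebook.

```python
# Header and two sample rows copied from the output above;
# in the notebook these would be `headers` and `data`.
headers = ['', 'year', 'month', 'intent', 'police', 'sex', 'age',
           'race', 'hispanic', 'place', 'education']
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34',
     'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21',
     'White', '100', 'Street', '3'],
]

# Map each column name to its position (hypothetical helper name).
col_index = {name: i for i, name in enumerate(headers)}

# Look columns up by name instead of by a bare number.
sexes = [row[col_index['sex']] for row in sample_rows]
races = [row[col_index['race']] for row in sample_rows]
print(sexes)
print(races)
```

With this in place, `row[col_index['sex']]` reads the same column as `row[5]`.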


### Calculate gun deaths per year

In [5]:
# extract year column from data
years = [row[1] for row in data]
print(years[:10])

['2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012']


In [6]:
# count deaths per year
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
print('Gun Deaths Per Year:')
year_counts

Gun Deaths Per Year:


{'2012': 33563, '2013': 33636, '2014': 33599}
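
The if/else counting pattern above is repeated for every grouping in this project. As a sketch of a shorter alternative, `collections.Counter` from the standard library does the same tallying. The sample list below is made up for illustration and is not the real data.

```python
from collections import Counter

# Made-up sample of year values, standing in for the notebook's `years` list.
sample_years = ['2012', '2012', '2013', '2014', '2014', '2014']

# Counter tallies how often each value occurs, like the loop above.
year_counts = Counter(sample_years)
print(dict(year_counts))
```

A `Counter` behaves like a dictionary, so `year_counts['2014']` still works as before.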

### Calculate gun deaths by Month and Year

In [7]:
# create datetime.datetime object for each row
import datetime

# dates = ["{} {}".format(row[1], row[2]) for row in data] # test format
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) 
         for row in data]
dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [8]:
# count deaths per month (each key is the first day of that month)
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1

print('Gun Deaths Per Month:')
date_counts

Gun Deaths Per Month:


{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

In [9]:
# Extension - get max deaths and corresponding month per year
max_deaths_per_year = {}
for date, deaths in date_counts.items():
    year = date.year
    # keep the month with the highest count seen so far for this year
    if year not in max_deaths_per_year or deaths > max_deaths_per_year[year]['deaths']:
        max_deaths_per_year[year] = {'month': date.month, 'deaths': deaths}

max_deaths_per_year

{2012: {'deaths': 3026, 'month': 7},
 2013: {'deaths': 3079, 'month': 7},
 2014: {'deaths': 2970, 'month': 8}}
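
For a single year, the peak month can also be found with the built-in `max` and a `key` function. This is only a sketch: it uses the twelve 2012 counts copied from the `date_counts` output above, and the name `date_counts_2012` is my own.

```python
import datetime

# 2012 monthly counts copied from the date_counts output above.
date_counts_2012 = {
    datetime.datetime(2012, 1, 1): 2758,
    datetime.datetime(2012, 2, 1): 2357,
    datetime.datetime(2012, 3, 1): 2743,
    datetime.datetime(2012, 4, 1): 2795,
    datetime.datetime(2012, 5, 1): 2999,
    datetime.datetime(2012, 6, 1): 2826,
    datetime.datetime(2012, 7, 1): 3026,
    datetime.datetime(2012, 8, 1): 2954,
    datetime.datetime(2012, 9, 1): 2852,
    datetime.datetime(2012, 10, 1): 2733,
    datetime.datetime(2012, 11, 1): 2729,
    datetime.datetime(2012, 12, 1): 2791,
}

# max() walks the keys and compares them by their count.
peak_date = max(date_counts_2012, key=date_counts_2012.get)
print(peak_date.month, date_counts_2012[peak_date])
```

This agrees with the 2012 entry above: month 7 with 3026 deaths.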

### Calculate gun deaths by Sex and Race

In [10]:
# calculate deaths by sex
sex_count = {}
for row in data:
    sex = row[5]
    if sex in sex_count:
        sex_count[sex] += 1
    else:
        sex_count[sex] = 1
sex_count

{'F': 14449, 'M': 86349}

In [11]:
# calculate deaths by race
race_counts = {}
for row in data:
    race = row[7]
    #print(race)
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

In [12]:
# extension - calculate % diff between male and female deaths
female_deaths = sex_count['F']
male_deaths = sex_count['M']
diff = male_deaths - female_deaths
print('There were {} more males killed than females by guns'.format(diff))
diff_percent = (male_deaths - female_deaths)/female_deaths * 100
print('There were {:.2f}% more male deaths than female deaths'.format(diff_percent))

There were 71900 more males killed than females by guns
There were 497.61% more male deaths than female deaths
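
The 497.61% figure can be hard to read at a glance. As a sketch, the same two counts can also be expressed as a ratio and as a share of all deaths. The names `ratio` and `male_share` are my own and are not in the original notebook.

```python
# Counts copied from the sex_count output above.
female_deaths = 14449
male_deaths = 86349

# Male deaths for every one female death.
ratio = male_deaths / female_deaths
# Fraction of all recorded gun deaths where the victim was male.
male_share = male_deaths / (male_deaths + female_deaths)

print('Male-to-female ratio: {:.2f}'.format(ratio))
print('Male share of gun deaths: {:.2%}'.format(male_share))
```

In words: roughly six male deaths for every female death, or about 86% of all deaths in the dataset.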


---

## Lessons Learnt So Far <a class="anchor" id="lessons-learnt-so-far"></a>
- Remember to separate the header from the data if a header row is present
- It is a good idea to run some quick counts by group to look for early insights
- Gun deaths rise during the summer (roughly May to August/September) every year
- The peak month was July in 2012 and 2013, and August in 2014
- There were far more male gun deaths (86,349) than female gun deaths (14,449)
- In terms of race, there were many more White deaths (66,237) than deaths in any other group. Black deaths came next (23,296), at roughly a third of the White count

**Next Steps to examine:**
- check gender deaths per month
- check race deaths per month
- take into consideration the other data points - police, place, education level, age and look for any correlations
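
As a starting point for the first two items, a count can be keyed by a tuple such as `(month, sex)`. This is a minimal sketch on the five sample rows shown earlier, not a result from the full dataset, and `month_sex_counts` is a name I made up.

```python
from collections import Counter

# Five sample rows copied from the data preview; the notebook would use `data`.
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2'],
]

# Month is column 2 and sex is column 5; a tuple of both is the dictionary key.
month_sex_counts = Counter((row[2], row[5]) for row in sample_rows)

for (month, sex), deaths in sorted(month_sex_counts.items()):
    print(month, sex, deaths)
```

Swapping `row[5]` for `row[7]` would give the same breakdown by race.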

---

### Get rate of gun deaths per 100000 people of each race

#### Class Notes
- Added because this was new to me and I wanted to include it in my notes

In order to get from the raw counts of gun deaths by race to a rate of gun deaths per 100000 people in each race, we'll need to divide the total number of gun deaths by the population of each race. From the census dataset, we know that the number of people in the White racial category is $197318956$. We'd divide $66237$ by $197318956$:


```python
white_gun_death_rate = 66237 / 197318956
```

This gives us the proportion of people in the White census race category who were killed by a gun in the US from 2012 to 2014 (multiply by $100$ to read it as a percentage). If you do this computation, you'll see that the rate is a very small number, $0.0003356849303419181$. For this reason, crime statistics are typically expressed as a "_rate per 100000_". This tells you the number of people in a given group out of every $100000$ that were killed by guns in the US. To get this, we just multiply by $100000$:


```python
rate_per_hundredk = 0.0003356849303419181 * 100000
```

This gives us about $33.57$, which we can interpret as "_33.57 out of every 100000 people in the White census race category in the US were killed by guns between 2012 and 2014_".
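
The two steps above can be wrapped in one small function. This is a sketch only: `rate_per_100k` is a hypothetical helper name, not something from the class notes. Because the death counts cover three years (2012 to 2014) while the population comes from the 2010 census, dividing by 3 gives only a rough yearly average.

```python
def rate_per_100k(count, population):
    """Events per 100,000 people (hypothetical helper, not part of the notebook)."""
    return count / population * 100000

# White gun deaths (2012-2014) and White census population, from the text above.
white_rate = rate_per_100k(66237, 197318956)

# Rough yearly average, assuming the three years are spread evenly.
annual_white_rate = white_rate / 3

print(round(white_rate, 2))
print(round(annual_white_rate, 2))
```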

In [13]:
# read in census data; `with` closes the file once it has been read
with open('census.csv', 'r', newline='') as f:
    census = list(csv.reader(f))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

In [14]:
# extension - easily line up race with amount
# race columns start at index 10 (the 11th column)
census_header = census[0][10:]
population = census[1][10:]

for name, count in zip(census_header, population):
    print('{}: {}'.format(name, count))

Race Alone - White: 197318956
Race Alone - Hispanic: 44618105
Race Alone - Black or African American: 40250635
Race Alone - American Indian and Alaska Native: 3739506
Race Alone - Asian: 15159516
Race Alone - Native Hawaiian and Other Pacific Islander: 674625
Two or More Races: 6984195


In [15]:
# add Asian + Native Hawaiian and Other Pacific Islander
15159516 + 674625

15834141

In [16]:
# manually create map between race_counts and census
mapping = {
    'Asian/Pacific Islander': 15834141,
    'Black': 40250635,
    'Hispanic': 44618105,
    'Native American/Native Alaskan': 3739506,
    'White': 197318956
}
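
The mapping above was typed in by hand. As a sketch of a way to avoid copying numbers manually, it can be derived from the census columns. The dictionary `race_to_census` records which census columns I assume belong to each `guns.csv` race label (the same choice the manual mapping makes), and both it and `census_pop` are names of my own.

```python
# Census race columns and values copied from the output above.
census_header = [
    'Race Alone - White',
    'Race Alone - Hispanic',
    'Race Alone - Black or African American',
    'Race Alone - American Indian and Alaska Native',
    'Race Alone - Asian',
    'Race Alone - Native Hawaiian and Other Pacific Islander',
    'Two or More Races',
]
population = ['197318956', '44618105', '40250635', '3739506',
              '15159516', '674625', '6984195']

# Column name -> population as an integer.
census_pop = {name: int(value) for name, value in zip(census_header, population)}

# Assumed link between guns.csv race labels and census columns.
race_to_census = {
    'Asian/Pacific Islander': [
        'Race Alone - Asian',
        'Race Alone - Native Hawaiian and Other Pacific Islander',
    ],
    'Black': ['Race Alone - Black or African American'],
    'Hispanic': ['Race Alone - Hispanic'],
    'Native American/Native Alaskan': ['Race Alone - American Indian and Alaska Native'],
    'White': ['Race Alone - White'],
}

# Sum the census columns that make up each race label.
mapping = {race: sum(census_pop[col] for col in cols)
           for race, cols in race_to_census.items()}
print(mapping)
```

Note that 'Two or More Races' has no matching label in `guns.csv`, so it is left out here just as it is in the manual version.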

In [17]:
# calculate gun deaths per 100k people for each race
race_per_hundredk = {}

for race in race_counts:
    gun_death_rate = race_counts[race] / mapping[race]
    race_per_hundredk[race] = gun_death_rate * 100000

race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

### Calculate Homicide Race Rates

In [22]:
# extract intents
intents = [row[3] for row in data]
intents[:5]

['Suicide', 'Suicide', 'Suicide', 'Suicide', 'Suicide']

In [25]:
# extract race
races = [row[7] for row in data]
races[:5]

['Asian/Pacific Islander', 'White', 'White', 'White', 'White']

In [45]:
# count homicides per race
homicide_race_counts = {}

# races and intents are parallel lists, so zip pairs them row by row
for race, intent in zip(races, intents):
    if intent == 'Homicide':
        if race not in homicide_race_counts:
            homicide_race_counts[race] = 1
        else:
            homicide_race_counts[race] += 1

homicide_race_counts

{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}
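
The filter-then-count step can also be written as a single `Counter` over a generator expression. This is a sketch on made-up `(intent, race)` pairs, purely to show the pattern. The values are illustrative and are not taken from the dataset.

```python
from collections import Counter

# Made-up (intent, race) pairs for illustration only.
sample = [
    ('Homicide', 'Black'),
    ('Suicide', 'White'),
    ('Homicide', 'White'),
    ('Homicide', 'Black'),
    ('Accidental', 'Hispanic'),
]

# Keep only homicide rows, then count how often each race appears.
sample_homicide_counts = Counter(
    race for intent, race in sample if intent == 'Homicide'
)
print(dict(sample_homicide_counts))
```

On the real data the pairs would come from `zip(intents, races)`.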

In [48]:
# calculate homicide rate per 100k (separate name so the all-intent rates are kept)
homicide_per_hundredk = {}
for race, deaths in homicide_race_counts.items():
    homicide_per_hundredk[race] = (deaths / mapping[race]) * 100000

print('Homicides/100k:')
for race in sorted(homicide_per_hundredk):
    print('{}: {:.2f}'.format(race, homicide_per_hundredk[race]))

Homicides/100k:
Asian/Pacific Islander: 3.53
Black: 48.47
Hispanic: 12.63
Native American/Native Alaskan: 8.72
White: 4.64


---

## Findings <a class="anchor" id="findings"></a>
- When census population is taken into account, the Black category has the highest gun death rate per 100K (~58). The next highest is the White category with ~34.
- The Black category has the largest number of homicides (19,510), followed by the White category (9,147)
- Although Asian/Pacific Islanders have the lowest overall gun death rate per 100K (~8), they had more gun homicides in raw counts than Native Americans/Native Alaskans (559 vs. 326)

## Next Steps <a class="anchor" id="next-steps"></a>
- Look at education levels and locations
- Create some plots to better visualize the statistics
- Look at age, police
- More analysis around time - look for trends on annual or month to month basis
- Look into why more deaths occur during summer - is there something else in this dataset that can help explain this observation?
- Look at other intents by race and gender

## Project Summary <a class="anchor" id="project-summary"></a>
That's it for the guided steps! We recommend exploring the data more on your own.

Here are some potential next steps:

- Figure out the link, if any, between month and homicide rate.
- Explore the homicide rate by gender.
- Explore the rates of other intents, like Accidental, by gender and race.
- Find out if gun death rates correlate to location and education.

We recommend creating a GitHub repository and placing this project there. It will help other people, including employers, see your work. As you start to put multiple projects on GitHub, you'll have the beginnings of a strong portfolio. You're welcome to keep working on the project here, but we recommend downloading it to your computer using the download icon above and working on it there.