# Exploring Gun Deaths in the U.S


This project explores gun deaths in the U.S. by looking for patterns in the demographics of the people who died.

The data for the project can be found [here](https://github.com/fivethirtyeight/guns-data/blob/master/full_data.csv).

The first step is to import the file into a list of lists using the csv module as shown below:

In [2]:
import csv
with open("gun.csv") as f:
    file = csv.reader(f)
    data = list(file)

In [3]:
#Listing the first 5 rows of the data (the header row plus four records):

data[:5]

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  'BA+'],
 ['2',
  '2012',
  '01',
  'Suicide',
  '0',
  'F',
  '21',
  'White',
  '100',
  'Street',
  'Some college'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  'BA+'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+']]

The next step is to extract the header row from the data and put it into 
a separate list, keeping the records themselves in another list, so that we can 
analyze the records without the header row getting in the way.

We can do this using the code shown below:

In [4]:
#Extracting the header row

header = data[0]

#Verifying the header row
header

['',
 'year',
 'month',
 'intent',
 'police',
 'sex',
 'age',
 'race',
 'hispanic',
 'place',
 'education']

In [6]:
#Storing the rest of the data minus the header row 

data_list = data[1:]
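
Hard-coded positions such as `row[1]` are easy to get wrong once the header has been split off. As a minimal sketch, the header list can be used to look a column up by name. The `sample_csv` text (two records copied from the output above) and the `column` helper are illustrative and not part of the original notebook:

```python
import csv
import io

# Small in-memory sample with the same column layout as the data file.
sample_csv = (
    ',year,month,intent,police,sex,age,race,hispanic,place,education\n'
    '1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,BA+\n'
    '2,2012,01,Suicide,0,F,21,White,100,Street,Some college\n'
)

rows = list(csv.reader(io.StringIO(sample_csv)))
header, data_list = rows[0], rows[1:]

def column(name):
    """Return all values of the named column, using the header to find its index."""
    index = header.index(name)
    return [row[index] for row in data_list]

print(column("year"))  # ['2012', '2012']
print(column("sex"))   # ['M', 'F']
```

With a helper like this, `column("race")` would stand in for `[row[7] for row in data_list]`.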

## Exploring gun deaths by time period

The **year** column records the year in which each gun death took place. 
We can use this column to calculate:

- The number of gun deaths that took place each year.


In [7]:
#Extracting the year column using list comprehension

years = [row[1] for row in data_list]

In [8]:
#Creating an empty dictionary to store the counts of the gun deaths by year

year_counts = {}

for year in years:
    if year in year_counts:
        year_counts[year] = year_counts[year] + 1
    else:
        year_counts[year] = 1

In [9]:
#Exploring the gun deaths by year

year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
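
The counting loop can also be written with `collections.Counter` from the standard library, which builds the same value-to-count mapping in one call. A minimal sketch on a made-up list of years (the values are illustrative only, not the real data):

```python
from collections import Counter

# Illustrative values only; in the notebook this would be the `years` list.
years = ["2012", "2012", "2013", "2014", "2013", "2012"]

year_counts = Counter(years)

print(dict(year_counts))           # {'2012': 3, '2013': 2, '2014': 1}
print(year_counts.most_common(1))  # [('2012', 3)]
```

The same pattern would apply to any of the other single-column counts in this notebook.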

From the data above we can see that:

- Gun deaths were highest in 2013.
- Gun deaths were lowest in 2012.

However, the number of gun deaths changes very little from year to year (the three totals differ by fewer than 100). 
To get better insights, let's look at the gun deaths on a monthly basis for 
each year.

In [22]:
#Building one datetime per record (day fixed to 1) using the datetime package and a list comprehension

import datetime

dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data_list]

In [23]:
#Verifying the dates 

dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

The variable **dates** now holds one datetime value per record. Because the day is 
always 1, all records from the same month share the same value.

In [24]:
#Creating a dictionary to hold the counts of gun deaths by month of the year

date_counts = {}

for date in dates:
    if date in date_counts:
        date_counts[date] = date_counts[date] + 1
    else:
        date_counts[date] = 1

In [27]:
#Displaying the results

date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

From the data above:

- Gun deaths are **highest** in **July**, during the summer. This could be because 
  festivals, concerts and other public events, where the risk of a gun death may be higher, 
  are more common at that time of year.
- Gun deaths are **lowest** in **February**. This could be partly because February has fewer days
  than the other months.
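
One way to check the February explanation is to divide each monthly count by the number of days in that month. The sketch below uses the 2012 counts copied from the output above and `calendar.monthrange` from the standard library. The names `counts_2012` and `deaths_per_day` are introduced here for illustration:

```python
import calendar

# Monthly counts for 2012, copied from the date_counts output above.
counts_2012 = {1: 2758, 2: 2357, 3: 2743, 4: 2795, 5: 2999, 6: 2826,
               7: 3026, 8: 2954, 9: 2852, 10: 2733, 11: 2729, 12: 2791}

deaths_per_day = {}
for month, count in counts_2012.items():
    days_in_month = calendar.monthrange(2012, month)[1]  # 29 for February 2012
    deaths_per_day[month] = count / days_in_month

# Print the months from the lowest to the highest daily average.
for month, rate in sorted(deaths_per_day.items(), key=lambda item: item[1]):
    print(calendar.month_name[month], round(rate, 1))
```

On the 2012 figures, February stays the lowest month even per day (about 81.3 deaths per day against about 97.6 in July), so the shorter month accounts for only part of the gap.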

## Exploring Gun Deaths by Race and Gender

In this section we explore gun deaths by race and gender to see whether any 
interesting patterns emerge: 

In [10]:
#Creating a list of races

races = [row[7] for row in data_list]

In [11]:
#Creating a dictionary to count the races

race_counts = {}
for race in races:
    if race not in race_counts:
        race_counts[race] = 0
    race_counts[race] += 1
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

From the data we can observe: 

- White people account for the highest number of gun deaths in the US.
- Native Americans account for the lowest number of gun deaths in the US.
- This could be because White people are the largest demographic group in
  the US.
- Likewise, the lower numbers for Native Americans and Asian/Pacific Islanders 
  could be due to the fact that they are the smallest demographic groups in the 
  US.

In [12]:
#Creating a list of the Sexes

sexes = [row[5] for row in data_list]

In [38]:
#Creating a dictionary to count the unique sexes

sex_counts = {}
for sex in sexes:
    if sex not in sex_counts:
        sex_counts[sex] = 0
    sex_counts[sex] += 1
sex_counts

{'F': 14449, 'M': 86349}

From the data we can observe that:

- Males account for far more gun deaths than females. 
- The ratio of male to female deaths is about 6 (86,349 to 14,449).
- The ratio of males to females in the US population is roughly 1:1.
- Gun deaths are therefore disproportionately high among males. 
- One possible explanation is that more males than females carry guns
  in the US.
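
The ratio quoted above can be reproduced directly from the counts. A short sketch, with the dictionary copied from the `sex_counts` output (the names `male_to_female` and `male_share` are introduced here for illustration):

```python
# Counts copied from the sex_counts output above.
sex_counts = {"F": 14449, "M": 86349}

male_to_female = sex_counts["M"] / sex_counts["F"]
male_share = sex_counts["M"] / sum(sex_counts.values())

print(round(male_to_female, 2))    # 5.98
print(round(male_share * 100, 1))  # 85.7 (percent of all recorded deaths)
```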

## Exploring deaths by intent and education

In this section we look for patterns in gun deaths by intent and by the education level of the 
individual.

In [51]:
intents = [row[3] for row in data_list]

intent_counts = {}
for intent in intents:
    if intent not in intent_counts:
        intent_counts[intent] = 0
    intent_counts[intent] += 1
intent_counts

{'Accidental': 1639,
 'Homicide': 35176,
 'NA': 1,
 'Suicide': 63175,
 'Undetermined': 807}

From the data above we can observe:

- **Suicides** account for the most gun deaths (63,175).
- **Undetermined** cases are the smallest named category (807).
- One record has a missing intent (`NA`).
- There were 35,176 **homicides** in the US.
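
Raw counts are easier to compare as shares of the total. The sketch below converts the intent counts into percentages after dropping the single missing value. The names `known` and `intent_share` are introduced here for illustration:

```python
# Counts copied from the intent_counts output above.
intent_counts = {"Accidental": 1639, "Homicide": 35176, "NA": 1,
                 "Suicide": 63175, "Undetermined": 807}

# Drop the single missing value before computing shares.
known = {intent: count for intent, count in intent_counts.items() if intent != "NA"}
total = sum(known.values())

# Percentage of known-intent deaths in each category, rounded to one decimal.
intent_share = {intent: round(count / total * 100, 1) for intent, count in known.items()}
print(intent_share)
```

On these figures suicides make up a little under 63% of deaths with a known intent, and homicides about 35%.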

In [54]:
educations = [row[10] for row in data_list]

education_counts = {}
for education in educations:
    if education not in education_counts:
        education_counts[education] = 0
    education_counts[education] += 1
education_counts

{'BA+': 12946,
 'HS/GED': 42927,
 'Less than HS': 21823,
 'NA': 1422,
 'Some college': 21680}

From the data above we can observe: 

- The largest number of gun deaths involves people whose education is recorded as "HS/GED", 
  i.e. high school level. 
- However, we are not sure whether these are people who are currently in high school or people who 
  have already graduated from high school.
- To find out, we will look at age to check whether most of the deaths involve people 
  aged 18 or over, which would suggest high school graduates, or people under the age 
  of 18, which would suggest that they are still in high school.

## Exploring gun deaths by age



In [59]:
ages = [row[6] for row in data_list]

age_counts = {}
for age in ages:
    if age not in age_counts:
        age_counts[age] = 0
    age_counts[age] += 1
age_counts

{'0': 33,
 '1': 38,
 '10': 53,
 '100': 1,
 '101': 2,
 '102': 2,
 '107': 1,
 '11': 61,
 '12': 117,
 '13': 229,
 '14': 364,
 '15': 561,
 '16': 864,
 '17': 1185,
 '18': 1753,
 '19': 2065,
 '2': 50,
 '20': 2219,
 '21': 2504,
 '22': 2712,
 '23': 2472,
 '24': 2437,
 '25': 2230,
 '26': 2231,
 '27': 2070,
 '28': 1986,
 '29': 1955,
 '3': 66,
 '30': 1869,
 '31': 1833,
 '32': 1824,
 '33': 1700,
 '34': 1699,
 '35': 1631,
 '36': 1512,
 '37': 1500,
 '38': 1491,
 '39': 1389,
 '4': 54,
 '40': 1414,
 '41': 1485,
 '42': 1492,
 '43': 1527,
 '44': 1449,
 '45': 1372,
 '46': 1437,
 '47': 1532,
 '48': 1621,
 '49': 1669,
 '5': 43,
 '50': 1674,
 '51': 1755,
 '52': 1715,
 '53': 1708,
 '54': 1684,
 '55': 1596,
 '56': 1625,
 '57': 1472,
 '58': 1510,
 '59': 1430,
 '6': 50,
 '60': 1361,
 '61': 1306,
 '62': 1099,
 '63': 1041,
 '64': 1126,
 '65': 1039,
 '66': 998,
 '67': 865,
 '68': 868,
 '69': 879,
 '7': 43,
 '70': 883,
 '71': 791,
 '72': 736,
 '73': 737,
 '74': 671,
 '75': 676,
 '76': 582,
 '77': 575,
 '78': 598,
 

From the results above:

- We can see that most of the deaths occurred among people aged 18 or over.
- This is consistent with most gun deaths involving people who have already left high school, although age alone does not confirm graduation. 
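
Because the `age` values are read as strings, the dictionary keys above sort as text (`'10'` comes before `'2'`), and a missing age cannot be compared with 18 directly. The sketch below converts ages to integers before splitting records into adults and minors. It uses hypothetical `(age, education)` pairs and assumes that missing ages appear as the string `"NA"`, as missing values do in the intent and education columns:

```python
# Hypothetical (age, education) pairs; the real values come from row[6] and row[10].
sample = [("34", "BA+"), ("21", "Some college"), ("17", "Less than HS"),
          ("45", "HS/GED"), ("NA", "HS/GED"), ("16", "HS/GED"), ("60", "HS/GED")]

adults = minors = unknown = 0
hs_ged_adults = hs_ged_known_age = 0

for age_text, education in sample:
    if not age_text.isdigit():  # skip missing ages such as "NA" (assumed marker)
        unknown += 1
        continue
    age = int(age_text)
    if age >= 18:
        adults += 1
    else:
        minors += 1
    if education == "HS/GED":
        hs_ged_known_age += 1
        if age >= 18:
            hs_ged_adults += 1

print(adults, minors, unknown)          # 4 2 1
print(hs_ged_adults, hs_ged_known_age)  # 2 3
```

Counting adults only within the "HS/GED" group answers the question about that group more directly than the overall age distribution does.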

## Working with the Census data 

While we have the total gun deaths by race and sex in the US, these totals are not very useful unless
we know the proportion of deaths relative to the total population of the country.

In the next section we work with the census data to express the deaths 
relative to the total population of each group in the U.S.

In [13]:
import csv

with open("census.csv", "r") as f:
    reader = csv.reader(f)
    census = list(reader)

### Computing rate of gun deaths per race

In order to do this we map the total population of every race from the census data 
into a dictionary:

In [41]:
mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956
}

In [44]:
race_per_hundredk = {}

for key, value in race_counts.items():
    race_per_hundredk[key] = (value/mapping[key]) * 100000

In [45]:
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

From the dictionary above we can observe that:

- There were 8.37 deaths per 100,000 **Asian/Pacific Islander** people.
- There were 57.88 deaths per 100,000 **Black** people.
- There were 20.22 deaths per 100,000 **Hispanic** people.
- There were 24.52 deaths per 100,000 **Native American/Native Alaskan** people.
- There were 33.57 deaths per 100,000 **White** people. 

Thus the highest rate of gun deaths is among 

- Black people

while the lowest rate of gun deaths is among

- Asian/Pacific Islander people
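
The per-100,000 figures follow from dividing each count by the matching population and multiplying by 100,000. A minimal sketch that wraps this in a reusable function, with the counts and populations copied from `race_counts` and `mapping` above (the `rate_per_100k` function name is introduced here for illustration):

```python
def rate_per_100k(counts, populations):
    """Return deaths per 100,000 people for every group in `counts`."""
    return {group: count / populations[group] * 100_000
            for group, count in counts.items()}

# Counts and populations copied from race_counts and mapping above.
race_counts = {"Asian/Pacific Islander": 1326, "Black": 23296, "Hispanic": 9022,
               "Native American/Native Alaskan": 917, "White": 66237}
mapping = {"Asian/Pacific Islander": 15159516 + 674625,
           "Native American/Native Alaskan": 3739506,
           "Black": 40250635, "Hispanic": 44618105, "White": 197318956}

rates = rate_per_100k(race_counts, mapping)

# Print the groups from the highest to the lowest rate.
for group, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{group}: {rate:.2f}")
```

Because the counts cover 2012 to 2014, these are three-year totals per 100,000 people rather than annual rates.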

### Filtering deaths by intent

We can filter the deaths by the intent, or reason, behind them. To do this, let's convert the 
intent column into a list:

In [14]:
intents = [row[3] for row in data_list]

We want to determine which gun deaths were the result of homicide and break these down 
by race, so that we can compare homicide rates across groups. 

In [49]:
#Extracting the races column as a list
races = [row[7] for row in data_list]

#Creating a dictionary to store the counts for deaths by races due to homicide only
homicide_race_counts = {}

for i,race in enumerate(races):
    
    if race not in homicide_race_counts:
        homicide_race_counts[race] = 0
    if intents[i] == "Homicide":
        homicide_race_counts[race] += 1
        
#Creating a dictionary to store the counts for deaths by races due to homicide per 100,000 people

race_per_hundredk = {}
for k,v in homicide_race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000

race_per_hundredk

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

From the dictionary we can see that:

- **Black** people have the highest gun homicide rate, with 48.47 deaths per 100,000 people.
- **Asian/Pacific Islander** people have the lowest gun homicide rate, with 3.53 deaths per 100,000 people. 
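
The homicide filter generalizes to any intent. The sketch below pairs each intent with its race and counts only the matching rows, using `collections.Counter`. The `sample` pairs are hypothetical and the `counts_by_race` helper is introduced here for illustration. In the notebook the pairs could be built with `zip(intents, races)`:

```python
from collections import Counter

# Hypothetical (intent, race) pairs; the real values come from row[3] and row[7].
sample = [("Homicide", "Black"), ("Suicide", "White"), ("Homicide", "White"),
          ("Homicide", "Black"), ("Accidental", "Hispanic")]

def counts_by_race(pairs, wanted_intent):
    """Count deaths per race, keeping only rows with the given intent."""
    return Counter(race for intent, race in pairs if intent == wanted_intent)

homicides = counts_by_race(sample, "Homicide")

print(dict(homicides))        # {'Black': 2, 'White': 1}
print(homicides["Hispanic"])  # 0, because a Counter returns 0 for absent keys
```

Calling the helper with `"Suicide"` instead would give the matching breakdown for suicides without repeating the loop.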

# Conclusion

In this project we have:

- Used **dictionaries**, **list comprehensions** and the **datetime** package in Python to find patterns in 
  the gun death data by race, gender, year, month, age, intent and education.
- Used the census data along with the gun death data to compute deaths per 
  100,000 people for each race, both overall and for homicides only. 