[View in Colaboratory](https://colab.research.google.com/github/shahidsaifi92/Projects/blob/master/project2.ipynb)

# Project: Exploring Gun Deaths in the US

The dataset came from FiveThirtyEight, and can be found here. The dataset is stored in the guns.csv file. It contains information on gun deaths in the US from 2012 to 2014. Each row in the dataset represents a single fatality. The columns contain demographic and other information about the victim. Here are the first few rows of the dataset:

| | year | month | intent | police | sex | age | race | hispanic | place | education |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2012 | 1 | Suicide | 0 | M | 34.0 | Asian/Pacific Islander | 100 | Home | 4.0 |
| 2 | 2012 | 1 | Suicide | 0 | F | 21.0 | White | 100 | Street | 3.0 |
| 3 | 2012 | 1 | Suicide | 0 | M | 60.0 | White | 100 | Other specified | 4.0 |
| 4 | 2012 | 2 | Suicide | 0 | M | 64.0 | White | 100 | Home | 4.0 |
| 5 | 2012 | 2 | Suicide | 0 | M | 31.0 | White | 100 | Other specified | 2.0 |

As you can see above, the first row of the data is a header row, which tells you what kind of data is in each column of the CSV file. Each remaining row contains information about one fatality and its victim. Here's an explanation of each column:

- (unnamed first column) -- this is an identifier column, which contains the row number. It's common in CSV files to include a unique identifier for each row, but we can ignore it in this analysis.
- year -- the year in which the fatality occurred.
- month -- the month in which the fatality occurred.
- intent -- the intent of the perpetrator of the crime. This can be Suicide, Accidental, NA, Homicide, or Undetermined.
- police -- whether a police officer was involved with the shooting. Either 0 (false) or 1 (true).
- sex -- the gender of the victim. Either M or F.
- age -- the age of the victim.
- race -- the race of the victim. Either Asian/Pacific Islander, Native American/Native Alaskan, Black, Hispanic, or White.
- hispanic -- a code indicating the Hispanic origin of the victim.
- place -- where the shooting occurred. Has several categories, which you're encouraged to explore on your own.
- education -- educational status of the victim. Can be one of the following:
  - 1 -- Less than High School
  - 2 -- Graduated from High School or equivalent
  - 3 -- Some College
  - 4 -- At least graduated from College
  - 5 -- Not available

In this project, we'll explore the dataset and try to find patterns in the demographics of the victims. Our first step is to read the data in and take a look at it.


In [0]:
import csv

# Open with a context manager so the file is closed once it has been read.
with open('guns.csv', 'r', newline='') as f:
    data = list(csv.reader(f))
print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


# Removing Headers From A List Of Lists
# Instructions
- Extract the first row of data, and assign it to the variable headers.
- Remove the first row from data.
- Display headers.
- Display the first 5 rows of data to verify that you removed the header row properly.

In [0]:
headers = data[0]
data = data[1:]
print(headers)
print(data[:5])


['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]


# Counting Gun Deaths By Year
# Instructions
- Use a list comprehension to extract the year column from data.
  - Because the year column is the second column in the data, you'll need to get the element at index 1 in each row.
  - Assign the result to the variable years.
- Create an empty dictionary called year_counts.
- Loop through each element in years.
  - If the element isn't a key in year_counts, create it, and set the value to 1.
  - If the element is a key in year_counts, increment the value by one.
- Display year_counts to see how many gun deaths occur in each year.

In [0]:
years = [row[1] for row in data]

year_counts = {}
for year in years:
    if year not in year_counts:
        year_counts[year] = 1
    else:
        year_counts[year] += 1 

print(year_counts)


{'2014': 33599, '2012': 33563, '2013': 33636}
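The if/else tally above is a general pattern for counting repeated values. As an aside, the standard library's `collections.Counter` does the same bookkeeping. The sketch below uses a small made-up `sample_years` list rather than the real column, so its numbers are illustrative only.

```python
from collections import Counter

# Hypothetical stand-in for the `years` list built from data.
sample_years = ['2012', '2012', '2013', '2014', '2014', '2014']

# Counter tallies each distinct value, just like the manual loop.
sample_year_counts = Counter(sample_years)

print(dict(sample_year_counts))
```

The manual loop is still worth writing once, since the same structure is reused for dates, sex, and race later on.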


# Exploring Gun Deaths By Month And Year

It looks like gun deaths didn't change much by year from 2012 to 2014. Let's see if gun deaths in the US change by month and year. To do this, we'll create a datetime.datetime object from the year and month columns. We can then count gun deaths by date, just as we counted them by year in the last screen.

As you may recall from an earlier mission, you can create a datetime.datetime object by specifying the year, month, and day keyword arguments:

    import datetime
    date = datetime.datetime(year=2016, month=12, day=1)

We can use the month and year columns of data to create a datetime. We'll specify a fixed day because the data has no day column.

If we create a datetime.datetime object for each row, we can then count how many gun deaths occurred in each month and year, using a procedure similar to the one in the last screen.

# Instructions
- Use a list comprehension to create a datetime.datetime object for each row. Assign the result to dates.
  - The year column is the second element in each row.
  - The month column is the third element in each row.
  - Make sure to convert year and month to integers using int().
  - Pass year, month, and day=1 into the datetime.datetime() function.
- Display the first 5 rows in dates to verify everything worked.
- Count up how many times each unique date occurs in dates. Assign the result to date_counts.
  - This follows a similar procedure to what we did in the last screen with year_counts.
- Display date_counts.


In [0]:
import datetime

dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]
dates[:5]

        

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [0]:
date_counts = {}

for date in dates:
    if date not in date_counts:
        date_counts[date] = 0
    date_counts[date] += 1

date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,
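
To compare calendar months across years, the per-date counts can be folded into per-month totals. This is a minimal sketch, assuming a dictionary shaped like `date_counts`. `sample_date_counts` holds only four entries copied from the output above, and `month_totals` is a name introduced here for illustration.

```python
import datetime

# Four entries copied from the date_counts output (not the full dictionary).
sample_date_counts = {
    datetime.datetime(2012, 1, 1): 2758,
    datetime.datetime(2012, 2, 1): 2357,
    datetime.datetime(2013, 1, 1): 2864,
    datetime.datetime(2013, 2, 1): 2375,
}

# Sum the counts for each calendar month, ignoring the year.
month_totals = {}
for date, count in sample_date_counts.items():
    month_totals[date.month] = month_totals.get(date.month, 0) + count

# Print the months in calendar order.
for month in sorted(month_totals):
    print(month, month_totals[month])
```

Running the same loop over the full `date_counts` would give one total per calendar month, which makes seasonal differences easier to see than the raw per-date listing.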

# Exploring Gun Deaths By Race And Sex

- Count up how many times each item in the sex column occurs.
  - Assign the result to sex_counts.
- Count up how many times each item in the race column occurs.
  - Assign the result to race_counts.
- Display race_counts and sex_counts to verify your work, and see if you can spot any patterns.
- Write a markdown cell detailing what you've learned so far, and what you think might need further examination.

In [0]:
for idx, element in enumerate(headers):
    print(element, idx)
    

 0
year 1
month 2
intent 3
police 4
sex 5
age 6
race 7
hispanic 8
place 9
education 10


In [0]:
sex_counts = {}
race_counts = {}

sex_col = [row[5] for row in data]
race_col = [row[7] for row in data]

for sex in sex_col:
    if sex not in sex_counts:
        sex_counts[sex] = 1
    else:
        sex_counts[sex] += 1
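# Loop over the race column itself; looping over the empty race_counts
# dictionary would never execute the body and would leave it empty.
for race in race_col:
    if race not in race_counts:
        race_counts[race] = 1
    else:
        race_counts[race] += 1
print(race_counts)
print(sex_counts)

{'Asian/Pacific Islander': 1326, 'Black': 23296, 'Hispanic': 9022, 'Native American/Native Alaskan': 917, 'White': 66237}
{'F': 14449, 'M': 86349}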
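
The same counting loop now appears for years, dates, sex, and race. One way to avoid repeating it is a small helper function. `count_column` is a hypothetical name, not part of the original notebook, and `sample_rows` contains only the first five data rows displayed earlier rather than the full dataset.

```python
def count_column(rows, index):
    """Return a dict mapping each value found at row[index] to its frequency."""
    counts = {}
    for row in rows:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    return counts

# First five data rows as displayed earlier in the notebook.
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2'],
]

print(count_column(sample_rows, 5))  # sex is at index 5
print(count_column(sample_rows, 7))  # race is at index 7
```

Passing the column index as an argument also removes the chance of looping over the wrong variable, since the column is extracted and counted in one place.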


# Reading In A Second Dataset

- Read in census.csv, and convert to a list of lists. Assign the result to the census variable.
- Display census to verify your work.

In [0]:
# Use a context manager so census.csv is closed once it has been read.
with open('census.csv', 'r', newline='') as f:
    census = list(csv.reader(f))
print(census)

[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]
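
Looking up census figures by position is error-prone with this many columns. A possible convenience, not part of the original steps, is to pair the header row with the data row. `census_totals` is a name introduced here, and the two rows are copied from the output above so that the block runs without census.csv.

```python
# Header and data rows copied from the census output above.
census_header = ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
                 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic',
                 'Race Alone - Black or African American',
                 'Race Alone - American Indian and Alaska Native',
                 'Race Alone - Asian',
                 'Race Alone - Native Hawaiian and Other Pacific Islander',
                 'Two or More Races']
census_row = ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
              'Total', '0100000US', '', 'United States', '308745538', '197318956',
              '44618105', '40250635', '3739506', '15159516', '674625', '6984195']

# Pair each header with its value. The repeated 'Id' headers overwrite one
# another, which is harmless here because only the race columns are read.
census_totals = dict(zip(census_header, census_row))

# csv.reader yields strings, so convert to int before doing arithmetic.
asian = int(census_totals['Race Alone - Asian'])
pacific_islander = int(census_totals['Race Alone - Native Hawaiian and Other Pacific Islander'])
print(asian + pacific_islander)
```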


# Computing Rates Of Gun Deaths Per Race

- Manually create a dictionary, mapping, that maps each key from race_counts to the population count of that race from census.
  - The keys in the dictionary should be Asian/Pacific Islander, Black, Native American/Native Alaskan, Hispanic, and White.
  - In the case of Asian/Pacific Islander, you'll need to add the counts from census for Race Alone - Asian and Race Alone - Native Hawaiian and Other Pacific Islander.
- Create an empty dictionary, race_per_hundredk.
- Loop through each key in race_counts.
  - Divide the value associated with the key in race_counts by the value associated with the key in mapping.
  - Multiply by 100000.
  - Assign the result to the same key in race_per_hundredk.
- When you're done, race_per_hundredk should contain the rate of gun deaths per 100000 people for each racial category.
- Print race_per_hundredk to verify your work.

In [0]:
sexes = [row[5] for row in data]
sex_counts = {}
for sex in sexes:
    if sex not in sex_counts:
        sex_counts[sex] = 0
    sex_counts[sex] += 1
sex_counts
    

{'F': 14449, 'M': 86349}

In [0]:
races = [row[7] for row in data]
race_counts = {}
for race in races:
    if race not in race_counts:
        race_counts[race] = 0
    race_counts[race] += 1
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}
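
The rate computation described in the instructions for this section can be sketched as follows. The population figures in `mapping` are copied by hand from the census row displayed earlier, and `race_counts` is re-entered from the output above so that the block runs on its own. Python 3 division is assumed.

```python
# Population counts copied from the census row shown earlier.
mapping = {
    'Asian/Pacific Islander': 15159516 + 674625,  # Asian + Native Hawaiian and Other Pacific Islander
    'Black': 40250635,
    'Native American/Native Alaskan': 3739506,
    'Hispanic': 44618105,
    'White': 197318956,
}

# Gun death counts copied from the race_counts output above.
race_counts = {
    'Asian/Pacific Islander': 1326,
    'Black': 23296,
    'Hispanic': 9022,
    'Native American/Native Alaskan': 917,
    'White': 66237,
}

# Deaths per 100000 people for each racial category.
race_per_hundredk = {}
for race, count in race_counts.items():
    race_per_hundredk[race] = count / mapping[race] * 100000

print(race_per_hundredk)
```

These are rates for all intents combined over the three years of data, measured against 2010 census populations. They are not homicide rates, which would require filtering on the intent column first.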

# Findings


It appears that gun-related homicides in the US disproportionately affect people in the Black and Hispanic racial categories.

Some areas to investigate further:

- The link between month and homicide rate.
- Homicide rate by gender.
- The rates of other intents by gender and race.
- Gun death rates by location and education.
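
As a starting point for the homicide-related items, the rows can be filtered on the intent column (index 3) before counting. This is only a sketch: `homicide_counts_by` is a hypothetical helper, and `sample_rows` mixes two rows shown in the notebook with two invented Homicide rows, included purely to exercise the filter.

```python
def homicide_counts_by(rows, index):
    """Count values at row[index], using only rows whose intent is 'Homicide'."""
    counts = {}
    for row in rows:
        if row[3] == 'Homicide':  # intent is at index 3
            counts[row[index]] = counts.get(row[index], 0) + 1
    return counts

sample_rows = [
    # Two rows as displayed earlier in the notebook.
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    # Invented rows for illustration only; they are not records from guns.csv.
    ['x1', '2012', '01', 'Homicide', '0', 'M', '30', 'Black', '100', 'Street', '2'],
    ['x2', '2012', '02', 'Homicide', '0', 'F', '25', 'White', '100', 'Home', '3'],
]

print(homicide_counts_by(sample_rows, 5))  # homicides by sex
print(homicide_counts_by(sample_rows, 7))  # homicides by race
```

Applied to the full `data` list, the by-race result could then be divided by the census populations to obtain homicide rates per 100000 people.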