# 9. Exploring Gun Deaths in the US - Python Programming - Intermediate

## 1. Introducing US Gun Deaths Data

In this project, you'll work in a Jupyter notebook and analyze data on gun deaths in the US. By the end, you'll have a notebook that you can add to your portfolio or build on yourself. If you need help at any point, you can consult our solution notebook [here](https://github.com/dataquestio/solutions/blob/master/Mission218Solution.ipynb).

The dataset came from [FiveThirtyEight](http://fivethirtyeight.com/), and can be found [here](https://github.com/fivethirtyeight/guns-data/blob/master/full_data.csv). The dataset is stored in the <font color='red'>guns.csv</font> file. It contains information on gun deaths in the US from <font color='red'>2012</font> to <font color='red'>2014</font>. Each row in the dataset represents a single fatality. The columns contain demographic and other information about the victim. Here are the first few rows of the dataset:

<img src = 'table_gun_deaths.jpeg'/>

As you can see above, the first row of the data is a header row, which tells you what kind of data is in each column of the CSV file. Each row contains information about the fatality, and the victim. Here's an explanation of each column:

* (unnamed first column) -- this is an identifier column, which contains the row number. It's common in CSV files to include a unique identifier for each row, but we can ignore it in this analysis.
* <font color='red'>year</font> -- the year in which the fatality occurred.
* <font color='red'>month</font> -- the month in which the fatality occurred.
* <font color='red'>intent</font> -- the intent of the perpetrator of the crime. This can be <font color='red'>Suicide</font>, <font color='red'>Accidental</font>, <font color='red'>NA</font>, <font color='red'>Homicide</font>, or <font color='red'>Undetermined</font>.
* <font color='red'>police</font> -- whether a police officer was involved with the shooting. Either <font color='red'>0</font> (false) or <font color='red'>1</font> (true).
* <font color='red'>sex</font> -- the gender of the victim. Either <font color='red'>M</font> or <font color='red'>F</font>.
* <font color='red'>age</font> -- the age of the victim.
* <font color='red'>race</font> -- the race of the victim. Either <font color='red'>Asian/Pacific Islander</font>, <font color='red'>Native American/Native Alaskan</font>, <font color='red'>Black</font>, <font color='red'>Hispanic</font>, or <font color='red'>White</font>.
* <font color='red'>hispanic</font> -- a code indicating the Hispanic origin of the victim.
* <font color='red'>place</font> -- where the shooting occurred. Has several categories, which you're encouraged to explore on your own.
* <font color='red'>education</font> -- educational status of the victim. Can be one of the following:
    * <font color='red'>1</font> -- Less than High School
    * <font color='red'>2</font> -- Graduated from High School or equivalent
    * <font color='red'>3</font> -- Some College
    * <font color='red'>4</font> -- At least graduated from College
    * <font color='red'>5</font> -- Not available

In this project, we'll explore the dataset, and try to find patterns in the demographics of the victims. Our first step is to read the data in and take a look at it.

### Instructions

* Read the dataset in as a list using the [csv](https://docs.python.org/3/library/csv.html) module.
    * Import the <font color='red'>csv</font> module.
    * Open the file using the [open()](https://docs.python.org/3/library/functions.html#open) function.
    * Use the [csv.reader()](https://docs.python.org/3/library/csv.html#csv.reader) function to load the opened file.
* Call [list()](https://docs.python.org/3/library/functions.html#func-list) on the result to get a list of all the data in the file.
    * Assign the result to the variable <font color='red'>data</font>.
* Display the first <font color='red'>5</font> rows of <font color='red'>data</font> to verify everything.

In [1]:
import csv

# Open the file in a with block so it is closed automatically after reading
with open("data/guns.csv", "r", newline="") as f:
    csvreader = csv.reader(f)

    # Convert the result to a list
    data = list(csvreader)

# Display the first 5 rows of data to verify everything.
print(data[0:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+']]


## 2. Removing Headers From A List Of Lists

In the last screen, we read our data into the list of lists <font color='red'>data</font>. Each inner list in <font color='red'>data</font> represents a single row. Each item in the inner lists represents a single column for that row. Here's how the first <font color='red'>5</font> rows should have looked:

    [
        ['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], 
        ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], 
        ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], 
        ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], 
        ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']
    ]

You hopefully noticed that the first item in the <font color='red'>data</font> list is a header row. In order to analyze the data properly, we'll have to remove the header row, which contains the names of each column. We can remove this using list slicing. You can read more about lists [here](https://docs.python.org/3/tutorial/datastructures.html).
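
As a minimal sketch of how slicing separates a header row from the rest, the example below uses a made-up three-row list named `sample` (an assumption for illustration, not the real dataset):

```python
# Hypothetical miniature version of data: a header row followed by two data rows
sample = [['id', 'year'], ['1', '2012'], ['2', '2013']]

sample_headers = sample[0]   # the first row only
sample_rows = sample[1:]     # everything from index 1 onward

print(sample_headers)
print(sample_rows)
```

Slicing returns a new list, so `sample` itself is left unchanged.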

### Instructions

* Extract the first row of <font color='red'>data</font>, and assign it to the variable <font color='red'>headers</font>.
* Remove the first row from <font color='red'>data</font>.
* Display <font color='red'>headers</font>.
* Display the first <font color='red'>5</font> rows of <font color='red'>data</font> to verify that you removed the header row properly.

In [2]:
# Extract the first row of data, and assign it to the variable headers
headers = data[0]

# Remove the first row from data.
data = data[1:]

# Display headers.
print(headers)

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']


In [3]:
# Display the first 5 rows of data to verify that you removed the header row properly.
print(data[0:5])

[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', 'HS/GED']]


## 3. Counting Gun Deaths By Year

The <font color='red'>year</font> column contains information on the year in which gun deaths occurred. We can use this column to calculate how many gun deaths happened in each year.

We can perform this operation by creating a dictionary, then keeping count in the dictionary of how many times each element occurs in the year column.

### Instructions

* Use a list comprehension to extract the <font color='red'>year</font> column from <font color='red'>data</font>.
    * Because the <font color='red'>year</font> column is the second column in the data, you'll need to get the element at index <font color='red'>1</font> in each row.
    * Assign the result to the variable <font color='red'>years</font>.
* Create an empty dictionary called <font color='red'>year_counts</font>.
* Loop through each element in <font color='red'>years</font>.
    * If the element isn't a key in <font color='red'>year_counts</font>, create it, and set the value to <font color='red'>1</font>.
    * If the element is a key in <font color='red'>year_counts</font>, increment the value by one.
* Display <font color='red'>year_counts</font> to see how many gun deaths occur in each year.

In [4]:
# Get the year column
years = [row[1] for row in data]

# Create empty dictionary
year_counts = {}

# Loop through years and count years
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
        
print(year_counts)

{'2012': 33563, '2013': 33636, '2014': 33599}
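
As a cross-check on the counting loop above, Python's standard library provides `collections.Counter`, which tallies hashable items in the same way. The sketch below uses a small made-up list named `sample_years` (an assumption, not the real column) so that it runs on its own:

```python
from collections import Counter

# Hypothetical stand-in for the years list extracted from data
sample_years = ['2012', '2012', '2013', '2014', '2014', '2014']

# Counter builds the same {value: count} mapping as the manual loop
sample_year_counts = Counter(sample_years)

print(dict(sample_year_counts))
```

The manual loop is still worth writing once, since it shows what `Counter` does under the hood.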


## 4. Exploring Gun Deaths By Month And Year

It looks like gun deaths didn't change much by year from <font color='red'>2012</font> to <font color='red'>2014</font>. Let's see if gun deaths in the US change by month and year. In order to do this, we'll have to create a [datetime.datetime](https://docs.python.org/3/library/datetime.html#datetime-objects) object using the <font color='red'>year</font> and <font color='red'>month</font> columns. We'll then be able to count up gun deaths by date, like we did by <font color='red'>year</font> in the last screen.

As you may recall from an earlier mission, you can create a <font color='red'>datetime</font> object by specifying the <font color='red'>year</font>, <font color='red'>month</font>, and <font color='red'>day</font> keyword arguments:

    date = datetime(year=2016, month=12, day=1)

We can use the <font color='red'>month</font> and <font color='red'>year</font> columns of <font color='red'>data</font> to create a datetime. We'll specify a fixed <font color='red'>day</font> because we're missing that column in our data.

If we create a <font color='red'>datetime.datetime</font> object for each row, we can then count up how many gun deaths occurred in each month and year using a similar procedure to what we did in the last screen.
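
Note that the one-line `datetime(...)` example above assumes `from datetime import datetime`, while a plain `import datetime` requires the longer `datetime.datetime(...)` spelling. Here is a minimal sketch of building a date from string values like those in our rows; the `sample_row` list is made up for illustration:

```python
import datetime

# Hypothetical row in the same layout as data: id, year, month, intent
sample_row = ['1', '2012', '01', 'Suicide']

# The csv module returns strings, so convert to int before building the date
sample_date = datetime.datetime(
    year=int(sample_row[1]), month=int(sample_row[2]), day=1
)

print(sample_date)
```

Because `datetime.datetime` objects are hashable, they can be used as dictionary keys in the same way the year strings were.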

### Instructions

* Use a list comprehension to create a <font color='red'>datetime.datetime</font> object for each row. Assign the result to <font color='red'>dates</font>.
    * The <font color='red'>year</font> column is the second element in each row.
    * The <font color='red'>month</font> column is the third element in each row.
    * Make sure to convert <font color='red'>year</font> and <font color='red'>month</font> to integers using [int()](https://docs.python.org/3/library/functions.html#int).
    * Pass <font color='red'>year</font>, <font color='red'>month</font>, and <font color='red'>day=1</font> into the <font color='red'>datetime.datetime()</font> function.
* Display the first <font color='red'>5</font> rows in <font color='red'>dates</font> to verify everything worked.
* Count up how many times each unique date occurs in <font color='red'>dates</font>. Assign the result to <font color='red'>date_counts</font>.
    * This follows a similar procedure to what we did in the last screen with <font color='red'>year_counts</font>.
* Display <font color='red'>date_counts</font>.

In [5]:
import datetime

# Use a list comprehension to create a datetime.datetime object for each row. Assign the result to dates
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]

# Display the first 5 rows in dates to verify everything worked.
print(dates[0:5])

[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0)]


In [6]:
# Count up how many times each unique date occurs in dates. Assign the result to date_counts.
# Create empty dictionary
date_counts = {}

# Loop through dates and count each unique date
for row in dates:
    if row in date_counts:
        date_counts[row] += 1
    else:
        date_counts[row] = 1

# Display date_counts.
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

## 5. Exploring Gun Deaths By Race And Sex

The <font color='red'>sex</font> and <font color='red'>race</font> columns contain potentially interesting information on how gun deaths in the US vary by gender and race. Exploring both of these columns can be done with a similar dictionary counting technique to what we did earlier.

### Instructions

* Count up how many times each item in the <font color='red'>sex</font> column occurs.
    * Assign the result to <font color='red'>sex_counts</font>.
* Count up how many times each item in the <font color='red'>race</font> column occurs.
    * Assign the result to <font color='red'>race_counts</font>.
* Display <font color='red'>race_counts</font> and <font color='red'>sex_counts</font> to verify your work, and see if you can spot any patterns.
* Write a markdown cell detailing what you've learned so far, and what you think might need further examination.

In [7]:
# Get the sex column
sex = [row[5] for row in data]

# Get the race column
race = [row[7] for row in data]

# Create empty dictionary
sex_counts = {}
race_counts = {}

# Loop through sex and count
for row in sex:
    if row in sex_counts:
        sex_counts[row] += 1
    else:
        sex_counts[row] = 1

# Loop through race and count
for row in race:
    if row in race_counts:
        race_counts[row] += 1
    else:
        race_counts[row] = 1       
              
print(sex_counts)
print(race_counts)

{'M': 86349, 'F': 14449}
{'Asian/Pacific Islander': 1326, 'White': 66237, 'Native American/Native Alaskan': 917, 'Black': 23296, 'Hispanic': 9022}


There are about 6 times as many male gun deaths as female gun deaths.  
There are almost 3 times as many gun deaths among White victims as among Black victims.
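
Since the same counting loop has now appeared several times, it could be factored into a small helper. The function name `count_column` and the `sample_rows` list are assumptions made for this sketch; the rows follow the same column layout as `data` (sex at index 5, race at index 7):

```python
def count_column(rows, index):
    """Return a dictionary mapping each value in column `index` to its count."""
    counts = {}
    for row in rows:
        value = row[index]
        if value in counts:
            counts[value] += 1
        else:
            counts[value] = 1
    return counts

# Hypothetical rows in the same column layout as data
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White'],
]

print(count_column(sample_rows, 5))
print(count_column(sample_rows, 7))
```

With a helper like this, counting another column only requires a different index.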

## 6. Reading In A Second Dataset 

We explored gun deaths by race in the last screen. However, our analysis only gives us the total number of gun deaths by race in the US. Unless we know the proportion of each race in the US, we won't be able to meaningfully compare those numbers. What we really want is a rate of gun deaths per <font color='red'>100000</font> people of each race. In order to get this, we'll need to read in data about how much of the US population falls into each racial category. Luckily, we can import some census data to help us out.

The data contains information on the total population of the US, as well as the total population of each racial group in the US. The data is stored in the <font color='red'>census.csv</font> file, and only consists of two rows:

<img src = 'table_census.jpeg'>

As you can see, the first row is a header row, and the second row consists of population counts. We'll need to read this file in using the <font color='red'>csv.reader()</font> function.

### Instructions

* Read in <font color='red'>census.csv</font>, and convert to a list of lists. Assign the result to the <font color='red'>census</font> variable.
* Display <font color='red'>census</font> to verify your work.

In [12]:
# Read in census.csv, and convert to a list of lists. Assign the result to the census variable.
import csv

# Open the file in a with block so it is closed automatically after reading
with open("data/census.csv", "r", newline="") as f:
    csvreader = csv.reader(f)

    # Convert the result to a list
    census = list(csvreader)

# Display census
print(census)

[['', 'Id', 'Year', 'Id.1', 'Sex', 'Id.2', 'Hispanic Origin', 'Id.3', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['0', 'cen42010', 'April 1 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', 'NaN', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]
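
One way to make a two-row file like this easier to query is to pair each header with its value. The sketch below copies only a few numeric columns from the output above into a list named `sample_census` (an assumed name); the full row also contains non-numeric fields such as `'cen42010'`, which `int()` would reject:

```python
# Subset of the two census rows shown above (header row, then values)
sample_census = [
    ['Total', 'Race Alone - White', 'Race Alone - Asian',
     'Race Alone - Native Hawaiian and Other Pacific Islander'],
    ['308745538', '197318956', '15159516', '674625'],
]

# Pair each header with its value, converting the counts from strings to integers
census_lookup = {
    name: int(value)
    for name, value in zip(sample_census[0], sample_census[1])
}

print(census_lookup['Race Alone - White'])
```

A lookup like this avoids retyping population figures by hand, at the cost of a little extra setup.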


## 7. Computing Rates Of Gun Deaths Per Race

Earlier, we computed the number of gun deaths per race, and created a dictionary, <font color='red'>race_counts</font>, that looked like this:

    {

         'Asian/Pacific Islander': 1326,

         'Black': 23296,

         'Hispanic': 9022,

         'Native American/Native Alaskan': 917,

         'White': 66237

    }

In order to get from the raw counts of gun deaths by race to a rate of gun deaths per <font color='red'>100000</font> people in each race, we'll need to divide the total number of gun deaths by the population of each race. From the census dataset, we know that the number of people in the <font color='red'>White</font> racial category is <font color='red'>197318956</font>. We'd divide <font color='red'>66237</font> by <font color='red'>197318956</font>:

    white_gun_death_rate = 66237 / 197318956

This gives us the proportion of people in the <font color='red'>White</font> census race category who were killed by a gun in the US from <font color='red'>2012</font> to <font color='red'>2014</font>, which is a fraction between 0 and 1 rather than a percentage. If you do this computation, you'll see that the rate is a very small number, <font color='red'>0.0003356849303419181</font>. For this reason, it's typical to express crime statistics as a "rate per 100000". This tells you the number of people in a given group out of every <font color='red'>100000</font> that were killed by guns in the US. To get this, we just multiply by <font color='red'>100000</font>:

    rate_per_hundredk = 0.0003356849303419181 * 100000

This gives us about <font color='red'>33.57</font>, which we can interpret as "<font color='red'>33.57</font> out of every <font color='red'>100000</font> people in the <font color='red'>White</font> census race category in the US were killed by guns between <font color='red'>2012</font> and <font color='red'>2014</font>".
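
The two steps above can be combined into a single expression. This is a small sketch using the two figures quoted in the text; the variable names `white_gun_deaths` and `white_population` are chosen here for illustration:

```python
white_gun_deaths = 66237        # count for 'White' in race_counts
white_population = 197318956    # 'Race Alone - White' in the census row

# Deaths per person, then scaled to deaths per 100000 people
rate_per_hundredk = white_gun_deaths / white_population * 100000

print(rate_per_hundredk)
```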

We'll need to calculate these same rates for each racial category. The only stumbling block is that the racial categories are named slightly differently in <font color='red'>census</font> and in <font color='red'>data</font>. We'll need to manually construct a dictionary that allows us to map between them, and perform the division.

Here's a list of the race name in <font color='red'>data</font>, and the corresponding race name in <font color='red'>census</font>:

* <font color='red'>Asian/Pacific Islander</font> -- <font color='red'>Race Alone - Asian</font> plus <font color='red'>Race Alone - Native Hawaiian and Other Pacific Islander</font>.
* <font color='red'>Black</font> -- <font color='red'>Race Alone - Black or African American</font>.
* <font color='red'>Hispanic</font> -- <font color='red'>Race Alone - Hispanic</font>
* <font color='red'>Native American/Native Alaskan</font> -- <font color='red'>Race Alone - American Indian and Alaska Native</font>
* <font color='red'>White</font> -- <font color='red'>Race Alone - White</font>

We'll need to create a dictionary that has each race name from <font color='red'>data</font> as a key, and has the population count for the races from <font color='red'>census</font> as the values.

### Instructions

* Manually create a dictionary, <font color='red'>mapping</font>, that maps each key from <font color='red'>race_counts</font> to the population count of the race from <font color='red'>census</font>.
    * The keys in the dictionary should be <font color='red'>Asian/Pacific Islander</font>, <font color='red'>Black</font>, <font color='red'>Native American/Native Alaskan</font>, <font color='red'>Hispanic</font>, and <font color='red'>White</font>.
    * In the case of <font color='red'>Asian/Pacific Islander</font>, you'll need to add the counts from <font color='red'>census</font> for <font color='red'>Race Alone - Asian</font>, and <font color='red'>Race Alone - Native Hawaiian and Other Pacific Islander</font>.
* Create an empty dictionary, <font color='red'>race_per_hundredk</font>.
* Loop through each key in <font color='red'>race_counts</font>.
    * Divide the value associated with the key in <font color='red'>race_counts</font> by the value associated with the key in <font color='red'>mapping</font>.
    * Multiply by <font color='red'>100000</font>.
    * Assign the result to the same key in <font color='red'>race_per_hundredk</font>.
* When you're done, <font color='red'>race_per_hundredk</font> should contain the rate of gun deaths per <font color='red'>100000</font> people for each racial category.
* Print <font color='red'>race_per_hundredk</font> to verify your work.

In [18]:
# Manually create a dictionary, mapping that maps each key from race_counts 
# to the population count of the race from census.
mapping = {
    'Asian/Pacific Islander': 15159516+674625,
    'Black': 40250635,
    'Native American/Native Alaskan': 3739506,
    'Hispanic': 44618105,
    'White': 197318956
}

# Create an empty dictionary
race_per_hundredk = {}

# Loop through each key in race_counts
for i in race_counts:
    val = race_counts[i] / mapping[i] * 100000
    race_per_hundredk[i] = val
    
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

## 8. Filtering By Intent

We can filter our results, and restrict them to the <font color='red'>Homicide</font> intent. This will tell us what the gun-related murder rate per <font color='red'>100000</font> people in each racial category is. In order to do this, we'll need to redo our work in generating <font color='red'>race_counts</font>, but only count rows where the <font color='red'>intent</font> was <font color='red'>Homicide</font>.

We can do this by first extracting the <font color='red'>intent</font> column, then using the [enumerate()](https://docs.python.org/3/library/functions.html#enumerate) function to loop through each index and value in the race column. If the value in the same position in <font color='red'>intents</font> is <font color='red'>Homicide</font>, we'll count the value in the race column.

Finally, we'll use the <font color='red'>mapping</font> dictionary to convert from raw counts to rates.

### Instructions

* Extract the <font color='red'>intent</font> column using a list comprehension. The <font color='red'>intent</font> column is the fourth column in <font color='red'>data</font>.
    * Assign the result to <font color='red'>intents</font>.
* Extract the <font color='red'>race</font> column using a list comprehension. The <font color='red'>race</font> column is the eighth column in <font color='red'>data</font>.
    * Assign the result to <font color='red'>races</font>.
* Create an empty dictionary called <font color='red'>homicide_race_counts</font>.
* Use the <font color='red'>enumerate()</font> function to loop through each item in <font color='red'>races</font>. The position should be assigned to the loop variable <font color='red'>i</font>, and the value to the loop variable <font color='red'>race</font>.
    * Check the value at position <font color='red'>i</font> in <font color='red'>intents</font>.
        * If the value at position <font color='red'>i</font> in <font color='red'>intents</font> is <font color='red'>Homicide</font>:
            * If the key <font color='red'>race</font> doesn't exist in <font color='red'>homicide_race_counts</font>, create it.
            * Add <font color='red'>1</font> to the value associated with <font color='red'>race</font> in <font color='red'>homicide_race_counts</font>.
* When you're done, <font color='red'>homicide_race_counts</font> should have one key for each of the racial categories in <font color='red'>data</font>. The associated value should be the number of gun deaths by homicide for that race.
* Perform the same procedure we did in the last screen using <font color='red'>mapping</font> on <font color='red'>homicide_race_counts</font> to get from raw numbers to rates per <font color='red'>100000</font>.
* Display <font color='red'>homicide_race_counts</font> to verify your work.
* Write up your findings in a markdown cell.
* Write up any next steps you want to pursue with the data in a markdown cell.

In [23]:
# Extract the intent column
intents = [row[3] for row in data]

# Extract the race column
races = [row[7] for row in data]

# Create an empty dictionary
homicide_race_counts = {}

# Use the enumerate() function to loop through each item in races. 
# The position should be assigned to the loop variable i, and the value to the loop variable race.
for i, race in enumerate(races):
    if intents[i] == 'Homicide':
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else:
            homicide_race_counts[race] = 1
            
# print(homicide_race_counts)
# Perform the same procedure we did in the last screen 
# using mapping on homicide_race_counts to get from raw numbers to rates per 100000.

# Loop through each key in homicide_race_counts
for i in homicide_race_counts:
    homicide_race_counts[i] = homicide_race_counts[i] / mapping[i] * 100000
    
homicide_race_counts

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

Per 100,000 people, the gun homicide rate for Black victims is almost 10.5 times the rate for White victims.
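
As an alternative to indexing into `intents` with `enumerate()`, the two columns can be walked in lockstep with `zip()`. The sketch below is only an illustration of that variation, using made-up lists named `sample_intents` and `sample_races` rather than the real columns:

```python
# Hypothetical parallel lists standing in for the intents and races columns
sample_intents = ['Homicide', 'Suicide', 'Homicide', 'Homicide']
sample_races = ['Black', 'White', 'White', 'Black']

sample_homicide_counts = {}

# zip pairs the items at the same position, so no index lookup is needed
for intent, race in zip(sample_intents, sample_races):
    if intent == 'Homicide':
        # dict.get returns 0 the first time a race is seen
        sample_homicide_counts[race] = sample_homicide_counts.get(race, 0) + 1

print(sample_homicide_counts)
```

Both approaches produce the same counts; `zip()` simply avoids the explicit position variable.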

## 9. Next Steps

That's it for the guided steps! We recommend exploring the data more on your own.

Here are some potential next steps:

* Figure out the link, if any, between month and homicide rate.
* Explore the homicide rate by gender.
* Explore the rates of other intents, like <font color='red'>Accidental</font>, by gender and race.
* Find out if gun death rates correlate to location and education.
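
As a possible starting point for the first suggestion, the sketch below counts homicides per month. It is only an outline under assumed inputs: `sample_rows` is a made-up list using the same positions as `data` for month (index 2) and intent (index 3):

```python
# Hypothetical rows: id, year, month, intent
sample_rows = [
    ['1', '2012', '01', 'Homicide'],
    ['2', '2012', '01', 'Suicide'],
    ['3', '2012', '02', 'Homicide'],
    ['4', '2013', '01', 'Homicide'],
]

homicides_by_month = {}
for row in sample_rows:
    month, intent = row[2], row[3]
    if intent == 'Homicide':
        homicides_by_month[month] = homicides_by_month.get(month, 0) + 1

print(homicides_by_month)
```

Months have different lengths, so comparing raw monthly counts may slightly understate shorter months such as February.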

We recommend creating a [GitHub](https://github.com/) repository and placing this project there. It will help other people, including employers, see your work. As you start to put multiple projects on GitHub, you'll have the beginnings of a strong portfolio. You're welcome to keep working on the project here, but we recommend downloading it to your computer using the download icon above and working on it there.