# Analyzing data on gun deaths in the US

In [1]:
import csv

# Read every row into a list; the with block closes the file afterwards.
with open("guns.csv", "r") as f:
    data = list(csv.reader(f))
print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


Remove the header row from **data**, which is a **list of lists**.

In [2]:
headers = data[0]
data = data[1:]

Display the header

In [3]:
print(headers)

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']


Displaying the first 5 rows of data

In [4]:
print(data[:5])

[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]


## Deaths each year

The **year** column contains information on the year in which gun deaths occurred. We can use this column to calculate how many gun deaths happened in each year.

We can perform this operation by creating a dictionary, then keeping count in the dictionary of how many times each element occurs in the year column.

In [5]:
# using list comprehension

years = [row[1] for row in data]
print(years[:5])

['2012', '2012', '2012', '2012', '2012']


In [6]:
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
        
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
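The same tally can be written more compactly with `collections.Counter` from the standard library. This is only a sketch of an alternative, not the approach used in this notebook, and `sample_years` is a small made-up list standing in for the full **years** column.

```python
from collections import Counter

# Made-up sample standing in for the full `years` list built above.
sample_years = ['2012', '2012', '2013', '2014', '2014', '2014']

# Counter tallies how many times each distinct value occurs.
sample_year_counts = Counter(sample_years)

# Convert to a plain dict to get the same shape as year_counts.
print(dict(sample_year_counts))
```

A `Counter` behaves like a dictionary, so the rest of the analysis would work unchanged with it.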

Gun deaths didn't change much from year to year between 2012 and 2014. Let's see whether they vary by month and year. To do this, we'll create a **datetime.datetime** object for each row from the **year** and **month** columns. Because the data has no day column, we'll set the day to a fixed value of 1.

Every death in the same month of the same year then maps to the same date, so we can count gun deaths per date using the same procedure we used for years.

In [7]:
import datetime
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]
print(dates[:5])

[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0)]
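Fixing the day to 1 works for counting because two **datetime.datetime** objects built from the same year and month compare equal and hash to the same value, so they refer to the same dictionary key. A minimal sketch with illustrative values (the names `first`, `second` and `demo_counts` are not part of the notebook):

```python
import datetime

# Two objects built separately from the same year and month strings.
first = datetime.datetime(year=int("2012"), month=int("01"), day=1)
second = datetime.datetime(year=2012, month=1, day=1)

# Equal datetimes hash identically, so both refer to one dictionary key.
demo_counts = {first: 1}
demo_counts[second] += 1

print(first == second)
print(demo_counts[first])
```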


Count up how many times each unique date occurs in **dates**. Assign the result to **date_counts**.

In [8]:
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1
        
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

The **sex** and **race** columns contain potentially interesting information on how gun deaths in the US vary by gender and race.

In [9]:
# for deaths of each sex

sex_list = [row[5] for row in data]

sex_counts = {}
for sex in sex_list:
    if sex in sex_counts:
        sex_counts[sex] += 1
    else:
        sex_counts[sex] = 1
        
sex_counts
    

{'F': 14449, 'M': 86349}

In [10]:
# for deaths of each race

race_list = [row[7] for row in data]

race_counts = {}
for race in race_list:
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
        
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

So far, we have analyzed the deaths by:
* year
* date
* sex
* race

## Using census data

We explored gun deaths by race in the previous section. However, that analysis only gives us the total number of gun deaths by race in the US. Unless we know how many people belong to each race, we can't meaningfully compare those numbers. What we really want is a rate of gun deaths per 100000 people of each race. To compute it, we need to read in data on how many people in the US fall into each racial category. Luckily, we can import some census data to help us out.

The data contains information on the total population of the US, as well as the total population of each racial group in the US. The data is stored in the census.csv file, and only consists of two rows:

* the first row is a header row
* the second row consists of population counts


In [11]:
# The with block closes the file once the rows have been read.
with open("census.csv", "r") as f:
    census = list(csv.reader(f))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

### Rates of gun deaths per race

To get from the raw counts of gun deaths by race to a rate of gun deaths per 100000 people in each race, we first divide the number of gun deaths in each race by the population of that race.

For example, dividing the 66237 gun deaths in the White category by the White census population of 197318956 gives the proportion of people in that category who were killed by a gun in the US from 2012 to 2014. The result is a very small number, 0.0003356849303419181. For this reason, it's typical to express crime statistics as a "rate per 100000": the number of people out of every 100000 in a given group who were killed by guns. To get this, we multiply the proportion by 100000.
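As a quick check of the arithmetic, the White figures can be plugged in directly. The two input numbers are copied from the **race_counts** and **census** outputs shown in this notebook; the variable names are chosen here for illustration.

```python
# Values copied from the outputs above.
white_gun_deaths = 66237        # race_counts['White']
white_population = 197318956    # 'Race Alone - White' in census

# Proportion of the group killed by guns over 2012-2014.
white_proportion = white_gun_deaths / white_population

# Scale to deaths per 100000 people.
white_rate_per_100k = white_proportion * 100000

print(white_proportion)
print(white_rate_per_100k)
```

Note that the death counts span three years, so this is a rate for the whole 2012 to 2014 period rather than an annual rate.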


#### Problem 

The only stumbling block is that the racial categories are named slightly differently in **census** and in **data**. We'll need to manually construct a dictionary that allows us to map between them, and perform the division.

Here's a list of the race names in **data**, and the corresponding race names in **census**:

* Asian/Pacific Islander -- Race Alone - Asian **plus(+)** Race Alone - Native Hawaiian and Other Pacific Islander
* Black -- Race Alone - Black or African American
* Hispanic -- Race Alone - Hispanic
* Native American/Native Alaskan -- Race Alone - American Indian and Alaska Native
* White -- Race Alone - White

In [12]:
# mapping holds the census population of each race,
# keyed by the race names used in race_counts
mapping = {
    "Asian/Pacific Islander" : int(census[1][14]) + int(census[1][15]),
    "Black" : int(census[1][12]),
    "Native American/Native Alaskan" : int(census[1][13]),
    "Hispanic" : int(census[1][11]),
    "White" : int(census[1][10])
}
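The numeric column indices above are easy to get wrong. One alternative, sketched here under the assumption that the census file has exactly the header and data row printed earlier, is to look the populations up by column name. The names `census_header`, `census_row`, `population` and `mapping_by_name` are illustrative and not part of the notebook.

```python
# Header and data row copied from the census output shown above.
census_header = [
    'Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
    'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic',
    'Race Alone - Black or African American',
    'Race Alone - American Indian and Alaska Native',
    'Race Alone - Asian',
    'Race Alone - Native Hawaiian and Other Pacific Islander',
    'Two or More Races',
]
census_row = [
    'cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
    'Total', '0100000US', '', 'United States', '308745538', '197318956',
    '44618105', '40250635', '3739506', '15159516', '674625', '6984195',
]

# Pair each column name with its value. The repeated 'Id' columns
# overwrite one another, which is harmless because only the uniquely
# named race columns are used below.
population = dict(zip(census_header, census_row))

mapping_by_name = {
    "Asian/Pacific Islander": int(population['Race Alone - Asian'])
    + int(population['Race Alone - Native Hawaiian and Other Pacific Islander']),
    "Black": int(population['Race Alone - Black or African American']),
    "Native American/Native Alaskan": int(
        population['Race Alone - American Indian and Alaska Native']),
    "Hispanic": int(population['Race Alone - Hispanic']),
    "White": int(population['Race Alone - White']),
}
print(mapping_by_name)
```

In the notebook itself, `census[0]` and `census[1]` would take the place of the two copied lists.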

**race_per_hundredk** contains the rate of gun deaths per 100000 people for each racial category.

In [13]:
race_per_hundredk = {}
for key, value in race_counts.items():
    race_per_hundredk[key] = (value/mapping[key]) * 100000
    
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

## Homicide per race

We can filter our results, and restrict them to the **Homicide** intent. This will tell us what the gun-related murder rate per 100000 people in each racial category is. In order to do this, we'll need to redo our work in generating **race_counts**, but only count rows where the intent was Homicide.

We can do this by first extracting the **intent** column, then using the **enumerate()** function to loop through each index and value in the race column. If the value in the same position in **intents** is **Homicide**, we'll count the value in the race column.

Finally, we'll use the **mapping** dictionary to convert from raw counts to rates.

In [14]:
intents = [row[3] for row in data]
races = [row[7] for row in data]
homicide_race_counts = {}

In [15]:
for i, race in enumerate(races):
    if intents[i] == "Homicide":
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else:
            homicide_race_counts[race] = 1
            
homicide_race_counts

{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}
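The filter-then-count step can also be wrapped in a small function that reads both columns from each row, which avoids keeping two parallel lists in sync. This is a sketch only: `count_races_for_intent` is a hypothetical helper, and `sample_rows` is made-up data that follows the column layout of **data** (intent at index 3, race at index 7).

```python
def count_races_for_intent(rows, intent):
    """Count rows per race, keeping only rows with the given intent."""
    counts = {}
    for row in rows:
        if row[3] == intent:          # column 3 holds the intent
            race = row[7]             # column 7 holds the race
            counts[race] = counts.get(race, 0) + 1
    return counts

# Made-up rows in the same column layout as the gun deaths data.
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'White', '100', 'Home', '4'],
    ['2', '2012', '01', 'Homicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '02', 'Homicide', '0', 'M', '60', 'Black', '100', 'Home', '4'],
    ['4', '2012', '02', 'Homicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
]

sample_homicides = count_races_for_intent(sample_rows, "Homicide")
print(sample_homicides)
```

Passing a different intent, such as `"Suicide"`, reuses the same function without further changes.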

In [16]:
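# Store the rates in a new dictionary so the raw counts are kept
# and re-running this cell gives the same result.
homicide_race_per_hundredk = {}
for race, count in homicide_race_counts.items():
    homicide_race_per_hundredk[race] = (count / mapping[race]) * 100000

homicide_race_per_hundredk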

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

# To do...

Here are some potential next steps:

* Figure out the link, if any, between month and homicide rate.
* Explore the homicide rate by gender.
* Explore the rates of other intents, like Accidental, by gender and race.
* Find out if gun death rates correlate with location and education.