# Gun Deaths in the US

This file analyzes gun deaths in the US between 2012 and 2014, using the data in "guns.csv", a dataset provided by [FiveThirtyEight](https://www.fivethirtyeight.com/).

The objective is to practice Data Science skills.

The data is presented in the format:

|   | year | month | intent | police | sex |   age   | race | hispanic | place | education |
|:-:|:----:|:-----:|:------:|:------:|:----:|:---:|:---:|:--------:|:-----:|:------------:|
| 1 | 2012 |   1   | Suicide | 0      |  M  | 34.0| White | 100     | Street | 3.0 |

In [1]:
import csv
from IPython.display import HTML, display
 
# Obtain data from CSV

# Use a context manager so the file is closed after reading
with open("guns.csv") as f:
    data = list(csv.reader(f))

In [2]:
# Extract headers from data
headers = data.pop(0)

Let's find out how much Gun Deaths change by year.

In [3]:
# Extract all years
years = [ row[1] for row in data ]

year_counts = dict()

# Count occurrences for each year
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
        
# Display occurrences by year
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
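
The counting loop above can also be written with `collections.Counter` from the standard library. The following is a minimal sketch, not the notebook's own method; `sample_rows` and `sample_year_counts` are hypothetical names, and only the first sample row comes from the table at the top of this file (the other two are made up for illustration).

```python
from collections import Counter

# Sample rows in the same column order as guns.csv (index 1 is the year).
# The first row is the example row shown earlier; the others are made up.
sample_rows = [
    ["1", "2012", "1", "Suicide", "0", "M", "34.0", "White", "100", "Street", "3.0"],
    ["2", "2012", "1", "Suicide", "0", "F", "21.0", "White", "100", "Street", "2.0"],
    ["3", "2013", "2", "Homicide", "0", "M", "60.0", "Black", "100", "Home", "4.0"],
]

# Counter tallies each distinct year in a single pass over the rows
sample_year_counts = Counter(row[1] for row in sample_rows)

print(dict(sample_year_counts))
```

In the notebook itself, passing `data` instead of `sample_rows` should produce the same totals as the loop.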

It seems like Gun Deaths do not change much depending on the year, so let's analyze other parameters to find better insights.

Now let's take into consideration both years and months together.

In [4]:
import datetime

# Extract dates
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1)
         for row in data]

date_counts = {}

for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1
        
# Display dates
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

As seen above, the dates do not appear to be strongly related to gun death occurrences.
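
Because months differ in length, raw monthly totals are not directly comparable. As a rough check, the sketch below divides each 2012 total from the output above by the number of days in that month. The names `deaths_2012` and `per_day_2012` are hypothetical, and this is only one possible way to normalize the counts.

```python
import calendar

# Monthly totals for 2012, copied from the output above
deaths_2012 = {1: 2758, 2: 2357, 3: 2743, 4: 2795, 5: 2999, 6: 2826,
               7: 3026, 8: 2954, 9: 2852, 10: 2733, 11: 2729, 12: 2791}

# calendar.monthrange returns (weekday of first day, number of days);
# dividing by the day count compares short and long months fairly
per_day_2012 = {
    month: count / calendar.monthrange(2012, month)[1]
    for month, count in deaths_2012.items()
}

for month, rate in per_day_2012.items():
    print(calendar.month_abbr[month], round(rate, 1))
```

Normalizing this way helps separate the effect of month length from any seasonal pattern in the counts.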

The __sex__ and __race__ columns may contain potentially interesting information on how gun deaths in the US vary by gender and race.

In [5]:
# Extract genders and races
genders = [ row[5] for row in data ]
races = [ row[7] for row in data ]

sex_counts = {}
race_counts = {}

for gender in genders:
    if gender in sex_counts:
        sex_counts[gender] += 1
    else:
        sex_counts[gender] = 1

for race in races:
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
        
# Display results
print(sex_counts)
race_counts

{'M': 86349, 'F': 14449}


{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

In [6]:
total = len(data)

# Calculate percentages
gender_dist = {}
races_dist = {}

for gender in sex_counts:
    gender_dist[gender] = (sex_counts[gender]/float(total)) * 100
    
    
for race in race_counts:
    races_dist[race] = (race_counts[race]/float(total)) * 100
    
print(gender_dist)
races_dist

{'M': 85.66539018631322, 'F': 14.33460981368678}


{'Asian/Pacific Islander': 1.315502291712137,
 'Black': 23.111569674001466,
 'Hispanic': 8.95057441615905,
 'Native American/Native Alaskan': 0.9097402726244569,
 'White': 65.71261334550289}

As seen above, most deaths are of males, who account for 85.67% of the total.

Also, the race with the most deaths is White, with 65.71% of the total, followed by Black with 23.11%.

However, our analysis only gives us the total number of gun deaths by race in the US. Unless we know the proportion of each race in the US, we won't be able to meaningfully compare those numbers. 

What we really want to get is a rate of _gun deaths per 100 000 people_ of each race. In order to do this, we'll need to read in data about what percentage of the US population falls into each racial category. Luckily, we can import some census data to help us out, present in the 'census.csv' file.
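
The rate described above is a count divided by a population, scaled to 100 000 people. A minimal sketch, assuming a hypothetical helper named `rate_per_100k` (it is not part of the original notebook) and made-up example numbers:

```python
def rate_per_100k(deaths, population):
    """Return the number of deaths per 100 000 people in a group."""
    # Multiply before dividing to keep simple cases exact
    return deaths * 100000 / population


# Hypothetical example: 50 deaths in a group of 200 000 people
example_rate = rate_per_100k(50, 200000)
print(example_rate)  # 25.0
```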

In [7]:
# Read the Census file, closing it once the rows are loaded
with open('census.csv') as census_file:
    census = list(csv.reader(census_file))

display(HTML(
    '<table><tr>{}</tr></table>'.format(
        '</tr><tr>'.join(
            '<td>{}</td>'.format('</td><td>'.join(str(_) for _ in row)) for row in census)
        )
 ))

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
Id,Year,Id,Sex,Id,Hispanic Origin,Id,Id2,Geography,Total,Race Alone - White,Race Alone - Hispanic,Race Alone - Black or African American,Race Alone - American Indian and Alaska Native,Race Alone - Asian,Race Alone - Native Hawaiian and Other Pacific Islander,Two or More Races
cen42010,"April 1, 2010 Census",totsex,Both Sexes,tothisp,Total,0100000US,,United States,308745538,197318956,44618105,40250635,3739506,15159516,674625,6984195


We'll need to calculate this rate for each racial category. The only stumbling block is that the racial categories are named slightly differently in census and in data. We'll need to manually construct a dictionary that maps each race in data to its census population, and then perform the division.
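
One way to make this mapping less dependent on column positions is to look up census columns by name. The sketch below is an illustration under assumptions, not the notebook's method: the header and row are copied from the census output above, and `population_by_column`, `columns_for_race`, and `population_by_race` are hypothetical names.

```python
# Header and data row copied from the census output above
census_header = [
    "Id", "Year", "Id", "Sex", "Id", "Hispanic Origin", "Id", "Id2",
    "Geography", "Total", "Race Alone - White", "Race Alone - Hispanic",
    "Race Alone - Black or African American",
    "Race Alone - American Indian and Alaska Native",
    "Race Alone - Asian",
    "Race Alone - Native Hawaiian and Other Pacific Islander",
    "Two or More Races",
]
census_row = [
    "cen42010", "April 1, 2010 Census", "totsex", "Both Sexes", "tothisp",
    "Total", "0100000US", "", "United States", "308745538", "197318956",
    "44618105", "40250635", "3739506", "15159516", "674625", "6984195",
]

# The repeated "Id" headers collapse into one key, which is harmless here
# because only the uniquely named race columns are used
population_by_column = dict(zip(census_header, census_row))

# Each race label used in guns.csv maps to one or more census columns
columns_for_race = {
    "Asian/Pacific Islander": [
        "Race Alone - Asian",
        "Race Alone - Native Hawaiian and Other Pacific Islander",
    ],
    "Black": ["Race Alone - Black or African American"],
    "Hispanic": ["Race Alone - Hispanic"],
    "Native American/Native Alaskan": [
        "Race Alone - American Indian and Alaska Native"
    ],
    "White": ["Race Alone - White"],
}

# Sum the census populations of the columns behind each label
population_by_race = {
    race: sum(int(population_by_column[column]) for column in columns)
    for race, columns in columns_for_race.items()
}

print(population_by_race)
```

Looking columns up by name means the mapping should keep working if the column order in the census file changes, at the cost of a slightly longer definition.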

In [9]:
# Manually map each race label in data to its census population
mapping = {
    "Asian/Pacific Islander": int(census[1][14]) + int(census[1][15]),
    "Black": int(census[1][12]),
    "Hispanic": int(census[1][11]),
    "Native American/Native Alaskan": int(census[1][13]),
    "White": int(census[1][10])
}

race_per_hundredk = {}

for race in race_counts:
    race_per_hundredk[race] = (float(race_counts[race])/mapping[race]) * 100000
    
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

Now we can observe that "Black" has the most Gun Deaths per 100 000 individuals between 2012 and 2014.

We can now restrict our analysis to just the "Homicide" intent.

In [11]:
# Get intents
intents = [ row[3] for row in data ]

homicide_race_counts = {}

for i, race in enumerate(races):
    if intents[i] == "Homicide":
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else:
            homicide_race_counts[race] = 1
            
# Convert each homicide count to a rate per 100 000 (overwrites the count in place)
for race in homicide_race_counts:
    homicide_race_counts[race] = (float(homicide_race_counts[race])/mapping[race]) * 100000
    
homicide_race_counts

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

Finally, we observe that the gun homicide rate for Native American/Native Alaskan people (8.72 per 100 000) was higher than that for White people (4.64 per 100 000) in the US for our years of study, and that "Black" remains in first place with 48.47 gun homicides per 100 000 individuals.