# Guided Project #2 - U.S. Gun Deaths
By [Luis Munguia](http://www.linkedin.com/in/luis-munguia) and [Dataquest](http://www.dataquest.io)

In this guided project, I'll work with a dataset on gun deaths in the U.S. compiled by FiveThirtyEight from CDC information.

The dataset contains the following columns:

* `year`: the year in which the fatality occurred.
* `month`: the month in which the fatality occurred.
* `intent`: the intent of the perpetrator of the crime.
* `police`: whether a police officer was involved with the shooting.
* `sex`: the gender of the victim.
* `age`: the age of the victim.
* `race`: the race of the victim.
* `hispanic`: a code indicating the Hispanic origin of the victim.
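* `place`: where the shooting occurred. Has several categories, such as `Home` and `Street`.
* `education`: educational status of the victim, coded as `1` (less than high school), `2` (graduated from high school or equivalent), `3` (some college), `4` (at least graduated from college), or `5` (not available).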

## 1.- Read CSV and explore data.

Use the `csv` module to read the file into a list of rows, and display the first 5 rows to check the data.

In [1]:
import csv

# Read every row of the file into a list of lists of strings.
with open("guns.csv") as f:
    data = list(csv.reader(f))
data[:5]

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]

## 2.- Remove headers.

Remove the **header** row from the dataset.

In [2]:
headers = data[0]
data = data[1:]
data[:5]

[['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
 ['5',
  '2012',
  '02',
  'Suicide',
  '0',
  'M',
  '31',
  'White',
  '100',
  'Other specified',
  '2']]
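As an alternative sketch (not the approach this project uses), `csv.DictReader` reads the header row itself and returns each row as a dictionary keyed by column name, so a column can be read as `row["year"]` instead of `row[1]`. To keep it runnable without `guns.csv`, the snippet feeds two lines copied from the output above through `io.StringIO`; in the notebook you would pass the open file instead.

```python
import csv
import io

# Header plus two data rows copied from the output above.
sample = io.StringIO(
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,3\n"
)

# DictReader takes field names from the first line, so no manual header removal.
# The unnamed first column gets the empty string "" as its key.
rows = list(csv.DictReader(sample))

print(len(rows))  # 2: the header is not counted as a row
print(rows[0]["year"], rows[0]["intent"], rows[0]["race"])
```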

## 3.- Display yearly gun deaths.

Use a dictionary to count the number of deaths per **year**.

In [3]:
years = [row[1] for row in data]

year_counts = {}

for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1

year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}

There is almost no variation in yearly deaths: all three years fall between 33,563 and 33,636.
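The if/else counting loop above is the pattern that `collections.Counter` from the standard library implements. A minimal sketch on a short hypothetical list of years (in the notebook, the input would be the `years` list):

```python
from collections import Counter

# Hypothetical sample; in the notebook: years = [row[1] for row in data]
years = ["2012", "2013", "2012", "2014", "2012", "2013"]

# Counter is a dict subclass that maps each distinct value to its count.
year_counts = Counter(years)

print(year_counts["2012"])         # 3
print(year_counts.most_common(1))  # [('2012', 3)]
print(year_counts["2015"])         # 0: missing keys count as zero, no KeyError
```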

## 3.- Use datetime to determine gun deaths by month.

Create a `datetime.datetime` object from each row's **year and month**, with the day fixed to 1.

In [4]:
import datetime
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]
dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [5]:
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1
date_counts 

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

Looking at the data, gun deaths seem to increase during the summer break (July 2012 has 3,026 deaths, the most of any month that year), and February is the month with the fewest deaths. What does that mean?
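One partial answer to the February question is that February is the shortest month, so raw monthly totals understate its daily rate. This sketch divides a few of the monthly counts above by the number of days in each month, using the standard library's `calendar.monthrange(year, month)`, which returns the weekday of the first day and the number of days. The per-day comparison is my addition and was not part of the original analysis.

```python
import calendar
import datetime

# Four monthly totals copied from the date_counts output above.
date_counts = {
    datetime.datetime(2012, 1, 1): 2758,
    datetime.datetime(2012, 2, 1): 2357,
    datetime.datetime(2013, 1, 1): 2864,
    datetime.datetime(2013, 2, 1): 2375,
}

deaths_per_day = {}
for date, count in date_counts.items():
    # monthrange() returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(date.year, date.month)[1]
    deaths_per_day[date] = count / days_in_month

for date in sorted(deaths_per_day):
    print(date.strftime("%Y-%m"), round(deaths_per_day[date], 1))
```

With these numbers, February still averages fewer deaths per day than January (about 81 vs. 89 in 2012 and 85 vs. 92 in 2013), so month length explains only part of the dip.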

## 4.- Analyze the relationship between sex, race, and gun deaths.

Use dictionaries to count the number of deaths by sex and by race.

In [6]:
sexes = [row[5] for row in data]
sex_counts = {}
for sex in sexes:
    if sex in sex_counts:
        sex_counts[sex] += 1
    else:
        sex_counts[sex] = 1
sex_counts 

{'F': 14449, 'M': 86349}

In [7]:
total_deaths = sum(sex_counts.values())
total_deaths

100798

In [8]:
sex_counts["F"] / total_deaths

0.1433460981368678

In [9]:
sex_counts["M"] / total_deaths

0.8566539018631322

In [10]:
races = [row[7] for row in data]
race_counts = {}
for race in races:
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
race_counts 

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

According to this data, about 86% of the victims are male, and nearly 89% are White or Black. But without more data to compare against, we can't tell how these counts relate to each group's share of the total U.S. population.
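Rather than dividing each count by a hand-typed total, a small helper can turn any count dictionary into shares. In this sketch, `to_shares` is a hypothetical name, and the counts are copied from `sex_counts` and `race_counts` above:

```python
def to_shares(counts):
    """Return each key's fraction of the total of all counts."""
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}

sex_counts = {"F": 14449, "M": 86349}
race_counts = {
    "Asian/Pacific Islander": 1326,
    "Black": 23296,
    "Hispanic": 9022,
    "Native American/Native Alaskan": 917,
    "White": 66237,
}

sex_shares = to_shares(sex_counts)
race_shares = to_shares(race_counts)

print(round(sex_shares["M"], 3))                              # 0.857
print(round(race_shares["White"] + race_shares["Black"], 3))  # 0.888
```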

## 5.- Use US Census data

Read the census data to determine the rate of gun deaths per 100,000 people for each race.

In [11]:
with open("census.csv") as f:
    census = list(csv.reader(f))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

## 5.- Use Manual Dictionary to Combine Race Totals

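The census uses different race categories from the gun-deaths dataset, so I'll make a dictionary with aggregated totals (Asian and Native Hawaiian/Other Pacific Islander are combined into Asian/Pacific Islander) and use it to determine deaths per 100,000 people.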

In [12]:
mapping = {'Asian/Pacific Islander': 15159516 + 674625,
           'Black': 40250635,
           'Hispanic': 44618105,
           'Native American/Native Alaskan': 3739506,
           'White': 197318956}
mapping

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}
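The census figures can also be taken from the census rows instead of being typed in. This sketch uses a trimmed copy of the header and data row shown above (race columns only). `census_to_dataset` is a hypothetical lookup that matches census columns to the dataset's race labels. Columns missing from it, such as `Id` or `Total` in the full rows, are skipped.

```python
# Trimmed copy of the census header and data row shown above (race columns only).
header = [
    "Race Alone - White",
    "Race Alone - Hispanic",
    "Race Alone - Black or African American",
    "Race Alone - American Indian and Alaska Native",
    "Race Alone - Asian",
    "Race Alone - Native Hawaiian and Other Pacific Islander",
]
values = ["197318956", "44618105", "40250635", "3739506", "15159516", "674625"]

# Hypothetical lookup from census column name to gun-deaths race category.
census_to_dataset = {
    "Race Alone - White": "White",
    "Race Alone - Hispanic": "Hispanic",
    "Race Alone - Black or African American": "Black",
    "Race Alone - American Indian and Alaska Native": "Native American/Native Alaskan",
    "Race Alone - Asian": "Asian/Pacific Islander",
    "Race Alone - Native Hawaiian and Other Pacific Islander": "Asian/Pacific Islander",
}

mapping = {}
for column, value in zip(header, values):
    if column not in census_to_dataset:
        continue  # e.g. Id, Year, Total in the full census rows
    race = census_to_dataset[column]
    # Columns mapped to the same category (Asian + Pacific Islander) are summed.
    mapping[race] = mapping.get(race, 0) + int(value)

print(mapping["Asian/Pacific Islander"])  # 15834141
```

In the notebook, `header` and `values` would be `census[0]` and `census[1]`.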

In [13]:
race_per_hundredk = {}
for race in race_counts:
    # Deaths per 100,000 people of that race.
    race_per_hundredk[race] = race_counts[race] / mapping[race] * 100000
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

According to this, Black people have the highest rate of gun deaths: 57.9 per 100,000, about 1.7 times the next-highest rate (White, 33.6) and more than double the rate of every other group.
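To report these rates with fewer decimals, `round()` does the job. Floor division (`//`) would cut them down to whole numbers instead. Here is a sketch of a rate helper, where `per_100k` is a hypothetical name, checked against two of the values above:

```python
def per_100k(counts, population, ndigits=2):
    """Convert raw counts to rates per 100,000 people, rounded to `ndigits` decimals."""
    return {
        key: round(count / population[key] * 100000, ndigits)
        for key, count in counts.items()
    }

# Counts and populations copied from race_counts and mapping above.
race_counts = {"Black": 23296, "White": 66237}
mapping = {"Black": 40250635, "White": 197318956}

rates = per_100k(race_counts, mapping)
print(rates)  # {'Black': 57.88, 'White': 33.57}
```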

## 6.- Use `enumerate()` to filter results by intent.

I will filter gun related deaths by the `Homicide` intent and combine the results with the census data.

In [14]:
intents = [row[3] for row in data]
homicide_rate_counts = {}
for i, race in enumerate(races):
    if intents[i] == "Homicide":
        if race in homicide_rate_counts:
            homicide_rate_counts[race] += 1
        else:
            homicide_rate_counts[race] = 1
homicide_rate_per_hundredk = {}
for race in homicide_rate_counts:
    homicide_rate_per_hundredk[race] = homicide_rate_counts[race] / mapping[race] * 100000
homicide_rate_per_hundredk

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

Black people have by far the highest homicide rate: 48.5 per 100,000, nearly four times the next-highest group (Hispanic, 12.6).
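`enumerate()` works here, but `zip()` walks the race and intent lists in step and avoids indexing into the second list. Here is a sketch on short hypothetical lists, combined with `collections.Counter` for the tally:

```python
from collections import Counter

# Hypothetical sample; in the notebook these are the races and intents lists above.
races = ["Black", "White", "Black", "Hispanic", "White"]
intents = ["Homicide", "Suicide", "Homicide", "Homicide", "Homicide"]

# zip() pairs the i-th race with the i-th intent, so no index variable is needed.
homicide_counts = Counter(
    race for race, intent in zip(races, intents) if intent == "Homicide"
)

print(dict(homicide_counts))  # {'Black': 2, 'Hispanic': 1, 'White': 1}
```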

## 7.- Find link between homicide and month.

I will filter gun related deaths by the `Homicide` intent for each month to see if there's a strong relation.

In [15]:
homicide_rate_month_counts = {}
for i, month in enumerate(dates):
    if intents[i] == "Homicide":
        if month in homicide_rate_month_counts:
            homicide_rate_month_counts[month] += 1
        else:
            homicide_rate_month_counts[month] = 1
homicide_rate_month_counts

{datetime.datetime(2012, 1, 1, 0, 0): 972,
 datetime.datetime(2012, 2, 1, 0, 0): 749,
 datetime.datetime(2012, 3, 1, 0, 0): 966,
 datetime.datetime(2012, 4, 1, 0, 0): 999,
 datetime.datetime(2012, 5, 1, 0, 0): 1003,
 datetime.datetime(2012, 6, 1, 0, 0): 1044,
 datetime.datetime(2012, 7, 1, 0, 0): 1160,
 datetime.datetime(2012, 8, 1, 0, 0): 1090,
 datetime.datetime(2012, 9, 1, 0, 0): 1070,
 datetime.datetime(2012, 10, 1, 0, 0): 979,
 datetime.datetime(2012, 11, 1, 0, 0): 978,
 datetime.datetime(2012, 12, 1, 0, 0): 1083,
 datetime.datetime(2013, 1, 1, 0, 0): 986,
 datetime.datetime(2013, 2, 1, 0, 0): 721,
 datetime.datetime(2013, 3, 1, 0, 0): 923,
 datetime.datetime(2013, 4, 1, 0, 0): 916,
 datetime.datetime(2013, 5, 1, 0, 0): 955,
 datetime.datetime(2013, 6, 1, 0, 0): 1066,
 datetime.datetime(2013, 7, 1, 0, 0): 1137,
 datetime.datetime(2013, 8, 1, 0, 0): 1000,
 datetime.datetime(2013, 9, 1, 0, 0): 954,
 datetime.datetime(2013, 10, 1, 0, 0): 1009,
 datetime.datetime(2013, 11, 1, 0, 0): 9

As I suspected, homicides also increase during the summer break: July 2012 has 1,160 homicides, the most of any month that year. It's strange that February is, again, the month with the fewest homicides.
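Because `dates` keeps the year, each calendar month appears once per year. Adding each month up across years gives a single seasonal profile. Here is a sketch using four of the values above and `collections.defaultdict`:

```python
import datetime
from collections import defaultdict

# Four values copied from homicide_rate_month_counts above.
homicide_rate_month_counts = {
    datetime.datetime(2012, 2, 1): 749,
    datetime.datetime(2012, 7, 1): 1160,
    datetime.datetime(2013, 2, 1): 721,
    datetime.datetime(2013, 7, 1): 1137,
}

# Sum the counts for each calendar month (1-12), ignoring the year.
homicides_by_month = defaultdict(int)
for date, count in homicide_rate_month_counts.items():
    homicides_by_month[date.month] += count

print(dict(homicides_by_month))  # {2: 1470, 7: 2297}
```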

## 8.- Create group_by() Function

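Create a function that, for each value of one column, counts the rows where a second column equals a given value, so the same loop doesn't have to be rewritten for every analysis. For the race column, the counts are converted to rates per 100,000 people.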

In [16]:
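def group_by(column1, column2, value):
    """Count rows per value of `column1` where `column2` equals `value`.

    Every value found in `column1` is included, even if its count is 0.
    For the race column (index 7), counts are converted to rates per
    100,000 people using the census `mapping`.
    """
    list_column_1 = [row[column1] for row in data]
    list_column_2 = [row[column2] for row in data]
    return_dictionary = {}
    for i, k in enumerate(list_column_1):
        if k not in return_dictionary:
            return_dictionary[k] = 0
        if list_column_2[i] == value:
            return_dictionary[k] += 1

    if column1 == 7:
        # Iterate over the distinct races, not over every row of the dataset.
        race_per_hundredk = {}
        for race in return_dictionary:
            race_per_hundredk[race] = return_dictionary[race] / mapping[race] * 100000
        return race_per_hundredk

    return return_dictionary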

In [17]:
homicide_by_gender = group_by(5,3,"Homicide")
homicide_by_gender

{'F': 5373, 'M': 29803}

In [18]:
homicide_by_race = group_by(7,3,"Homicide")
homicide_by_race

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

In [19]:
accidental_by_gender = group_by(5,3,"Accidental")
accidental_by_gender

{'F': 218, 'M': 1421}

In [20]:
accidental_by_race = group_by(7,3,"Accidental")
accidental_by_race

{'Asian/Pacific Islander': 0.07578560782046845,
 'Black': 0.814893976206835,
 'Hispanic': 0.3249801846133985,
 'Native American/Native Alaskan': 0.5883130017708221,
 'White': 0.5736904466492313}
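`group_by()` makes a full pass over `data` for every intent it is asked about. As an alternative sketch, every `(column1, column2)` pair can be counted in a single pass with `collections.Counter`, after which any combination can be looked up. `cross_counts` and the sample rows are hypothetical; the rows only follow the same column layout as `data`:

```python
from collections import Counter

def cross_counts(rows, column1, column2):
    """Count every (row[column1], row[column2]) pair in one pass over the rows."""
    return Counter((row[column1], row[column2]) for row in rows)

# Hypothetical rows in the same layout as `data` (index 3 = intent, index 5 = sex).
rows = [
    ["1", "2012", "01", "Suicide", "0", "M", "34", "White", "100", "Home", "4"],
    ["2", "2012", "01", "Homicide", "0", "F", "21", "Black", "100", "Street", "3"],
    ["3", "2012", "02", "Homicide", "0", "M", "60", "Black", "100", "Home", "2"],
    ["4", "2012", "02", "Homicide", "0", "M", "64", "White", "100", "Home", "2"],
]

pairs = cross_counts(rows, 5, 3)
print(pairs[("M", "Homicide")])  # 2
print(pairs[("F", "Suicide")])   # 0: Counter returns 0 for pairs never seen
```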

## 9.- Create sort_by() Function

Create a function that counts how many deaths fall under each value of a column, to expedite the analysis. Despite its name, `sort_by()` does not sort; it tallies.

In [21]:
def sort_by(column):
    
    list1 = [row[column] for row in data]
    return_dictionary = {}
    for value in list1:
        if value not in return_dictionary:
            return_dictionary[value] = 0
        return_dictionary[value] += 1
        
    return return_dictionary

In [22]:
gun_death_by_education = sort_by(10)
gun_death_by_education

{'1': 21823, '2': 42927, '3': 21680, '4': 12946, '5': 1369, 'NA': 53}

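Most victims did not graduate from college: education codes 1 to 3 (no college degree) account for 86,430 deaths, against 12,946 for code 4 (at least a college degree). These are raw counts, though; without the education breakdown of the whole population, they are not rates.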
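Since `sort_by()` returns counts rather than a ranking, `sorted()` with a `key` function can order them. Here is a sketch using the education counts above:

```python
# Counts copied from gun_death_by_education above.
gun_death_by_education = {"1": 21823, "2": 42927, "3": 21680, "4": 12946, "5": 1369, "NA": 53}

# Sort the (code, count) pairs by count, largest first.
ranked = sorted(gun_death_by_education.items(), key=lambda item: item[1], reverse=True)

for code, count in ranked:
    print(code, count)
```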

In [23]:
homicide_by_education = group_by(10,3,"Homicide")
homicide_by_education

{'1': 11839, '2': 15649, '3': 5640, '4': 1559, '5': 455, 'NA': 34}

In [24]:
suicide_by_education = group_by(10,3,"Suicide")
suicide_by_education

{'1': 9292, '2': 26321, '3': 15533, '4': 11147, '5': 876, 'NA': 6}

Suicides are spread more evenly across education levels than homicides, and the largest group is people whose highest level is a high-school diploma (code 2, 26,321 suicides). What about education and race?
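One way to check the comparison between homicides and suicides is to look at the share of each intent's deaths in each education code rather than at raw counts. Here is a sketch using the two dictionaries above:

```python
# Counts copied from homicide_by_education and suicide_by_education above.
homicide_by_education = {"1": 11839, "2": 15649, "3": 5640, "4": 1559, "5": 455, "NA": 34}
suicide_by_education = {"1": 9292, "2": 26321, "3": 15533, "4": 11147, "5": 876, "NA": 6}

homicide_total = sum(homicide_by_education.values())
suicide_total = sum(suicide_by_education.values())

# Share of each intent's deaths that falls in each education code.
for code in ["1", "2", "3", "4"]:
    homicide_share = homicide_by_education[code] / homicide_total
    suicide_share = suicide_by_education[code] / suicide_total
    print(code, f"homicide {homicide_share:.1%}", f"suicide {suicide_share:.1%}")
```

About 18% of suicide victims have at least a college degree (code 4), compared with about 4% of homicide victims.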

In [25]:
race_by_education = group_by(7,10,"1")
race_by_education

{'Asian/Pacific Islander': 1.2504625290377294,
 'Black': 18.8121255726773,
 'Hispanic': 8.734122616816649,
 'Native American/Native Alaskan': 7.059756021249865,
 'White': 5.013203090330561}

In [26]:
race_by_education = group_by(7,10,"2")
race_by_education

{'Asian/Pacific Islander': 2.3809311790263834,
 'Black': 26.54864948093366,
 'Hispanic': 7.185423943934867,
 'Native American/Native Alaskan': 10.48266803155283,
 'White': 14.32503018108407}

In [27]:
race_by_education = group_by(7,10,"3")
race_by_education

{'Asian/Pacific Islander': 2.18515169215684,
 'Black': 9.806056475879199,
 'Hispanic': 3.086191132501033,
 'Native American/Native Alaskan': 5.5354905166618265,
 'White': 8.008860537453888}

As suspected, the highest rates are among Black people with at most a high-school education: 18.8 per 100,000 for code 1 and 26.5 per 100,000 for code 2.

In [28]:
gun_death_by_place = sort_by(9)
gun_death_by_place

{'Farm': 470,
 'Home': 60486,
 'Industrial/construction': 248,
 'NA': 1384,
 'Other specified': 13751,
 'Other unspecified': 8867,
 'Residential institution': 203,
 'School/instiution': 671,
 'Sports': 128,
 'Street': 11151,
 'Trade/service area': 3439}

Interestingly, the home is by far the most common location: 60,486 deaths, about 60% of the total. What about intent?

In [29]:
gun_death_by_intent = sort_by(3)
gun_death_by_intent

{'Accidental': 1639,
 'Homicide': 35176,
 'NA': 1,
 'Suicide': 63175,
 'Undetermined': 807}

Almost all deaths (about 98%) are either suicides or homicides. How many suicides occur at home? And how does suicide vary by race?

In [30]:
suicide_by_race = group_by(7,3,"Suicide")
suicide_by_race

{'Asian/Pacific Islander': 4.705023152187416,
 'Black': 8.278130270491385,
 'Hispanic': 7.106980451097149,
 'Native American/Native Alaskan': 14.841532544673013,
 'White': 28.06217969245692}

In [31]:
suicide_by_sex = group_by(5,3,"Suicide")
suicide_by_sex

{'F': 8689, 'M': 54486}

In [32]:
suicide_location = group_by(9,3,"Suicide")
suicide_location

{'Farm': 378,
 'Home': 45415,
 'Industrial/construction': 155,
 'NA': 0,
 'Other specified': 8031,
 'Other unspecified': 4774,
 'Residential institution': 125,
 'School/instiution': 295,
 'Sports': 98,
 'Street': 2181,
 'Trade/service area': 1723}

In [33]:
homicide_by_race = group_by(7,3,"Homicide")
homicide_by_race

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

In [34]:
homicide_by_sex = group_by(5,3,"Homicide")
homicide_by_sex

{'F': 5373, 'M': 29803}

In [35]:
homicide_location = group_by(9,3,"Homicide")
homicide_location

{'Farm': 76,
 'Home': 13613,
 'Industrial/construction': 85,
 'NA': 1383,
 'Other specified': 5429,
 'Other unspecified': 3604,
 'Residential institution': 73,
 'School/instiution': 361,
 'Sports': 22,
 'Street': 8867,
 'Trade/service area': 1663}

What about age?

In [36]:
gun_death_by_age = sort_by(6)
gun_death_by_age

{'0': 33,
 '1': 38,
 '10': 53,
 '100': 1,
 '101': 2,
 '102': 2,
 '107': 1,
 '11': 61,
 '12': 117,
 '13': 229,
 '14': 364,
 '15': 561,
 '16': 864,
 '17': 1185,
 '18': 1753,
 '19': 2065,
 '2': 50,
 '20': 2219,
 '21': 2504,
 '22': 2712,
 '23': 2472,
 '24': 2437,
 '25': 2230,
 '26': 2231,
 '27': 2070,
 '28': 1986,
 '29': 1955,
 '3': 66,
 '30': 1869,
 '31': 1833,
 '32': 1824,
 '33': 1700,
 '34': 1699,
 '35': 1631,
 '36': 1512,
 '37': 1500,
 '38': 1491,
 '39': 1389,
 '4': 54,
 '40': 1414,
 '41': 1485,
 '42': 1492,
 '43': 1527,
 '44': 1449,
 '45': 1372,
 '46': 1437,
 '47': 1532,
 '48': 1621,
 '49': 1669,
 '5': 43,
 '50': 1674,
 '51': 1755,
 '52': 1715,
 '53': 1708,
 '54': 1684,
 '55': 1596,
 '56': 1625,
 '57': 1472,
 '58': 1510,
 '59': 1430,
 '6': 50,
 '60': 1361,
 '61': 1306,
 '62': 1099,
 '63': 1041,
 '64': 1126,
 '65': 1039,
 '66': 998,
 '67': 865,
 '68': 868,
 '69': 879,
 '7': 43,
 '70': 883,
 '71': 791,
 '72': 736,
 '73': 737,
 '74': 671,
 '75': 676,
 '76': 582,
 '77': 575,
 '78': 598,
 

Death counts are highest between ages 19 and 32 (each of those ages has more than 1,800 deaths), peaking at age 22 with 2,712. And lastly, what about the police?
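`sort_by(6)` returns ages as strings, which is why the keys above sort as `'0'`, `'1'`, `'10'`, `'100'`. This sketch converts ages to integers and groups them into 10-year bins. The bin width is my own choice, any non-numeric entries (such as a missing-value marker) are skipped, and the counts are a few values copied from the output above:

```python
from collections import Counter

# A few age counts copied from the gun_death_by_age output above.
gun_death_by_age = {"19": 2065, "22": 2712, "25": 2230, "31": 1833, "45": 1372}

age_bins = Counter()
for age, count in gun_death_by_age.items():
    if not age.isdigit():
        continue  # skip any missing or non-numeric age entries
    lower = int(age) // 10 * 10  # e.g. 22 -> 20
    age_bins[f"{lower}-{lower + 9}"] += count

# Order bins by their numeric lower bound, not alphabetically.
for label in sorted(age_bins, key=lambda label: int(label.split("-")[0])):
    print(label, age_bins[label])
```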

In [37]:
gun_death_by_police = sort_by(4)
gun_death_by_police

{'0': 99396, '1': 1402}

In [38]:
gun_death_by_police["1"] / total_deaths

0.01390900613107403

In [39]:
homicide_by_police = group_by(4,3,"Homicide")
homicide_by_police

{'0': 33774, '1': 1402}

In [40]:
race_by_police = group_by(7,4,"1")
race_by_police

{'Asian/Pacific Islander': 0.18946401955117112,
 'Black': 0.8844580961269306,
 'Hispanic': 0.6320304280067475,
 'Native American/Native Alaskan': 0.6685375020122979,
 'White': 0.3593167196769478}

Every gun death with police involvement (1,402) is classified as a homicide. Why is that the case?

## 10.- Closing commentary

These are my findings after reviewing the CDC data with a simple analysis:

* Gun-related deaths are consistent across the years (2012 to 2014).
* Gun-related deaths increase during the summer break.
* Gun-related deaths are lowest in February.
* Most gun-related deaths are of males (about 86%), and most victims are White or Black (about 89%).
* Most victims did not graduate from college.
* Suicides are spread more evenly across education levels than homicides.
* The home is the most common location (about 60% of deaths).
* Most suicides (about 72%) occur at home.
* Most suicide victims are White, and most are male.
* Most homicide victims are Black, and most are male.
* Death counts are highest for victims aged 19 to 32.
* Police were involved in about 1.4% of gun-related deaths, all of them classified as homicides.
* The races with the lowest rates of police-involved gun deaths are Asian/Pacific Islander and White, in that order.

Takeaways:

* Increase police responsiveness during the summer break.
* Increase awareness in schools about gun-related deaths, focusing on the groups with the highest death rates.
* Increase suicide-prevention efforts at all levels of education.
* Make homes safer.