# Exploring Gun Deaths in the US

The dataset comes from FiveThirtyEight and can be found [here](https://github.com/fivethirtyeight/guns-data).

It contains information on gun deaths in the US from 2012 to 2014. Each row in the dataset represents a single fatality. The columns contain demographic and other information about the victim. Here is an explanation of each column:

* `--` - identifier column containing the row number (its header is an empty string in the file)
* `year` - the year in which the death occurred
* `month` - the month in which the death occurred
* `intent` - the intent behind the death. It can be `Suicide`, `Accidental`, `NA`, `Homicide`, or `Undetermined`
* `police` - whether a police officer was involved in the shooting. `0` for false and `1` for true
* `sex` - the gender of the victim. `M` for male and `F` for female
* `age` - the age of the victim
* `race` - the race of the victim. Either `Asian/Pacific Islander`, `Native American/Native Alaskan`, `Black`, `Hispanic`, or `White`
* `hispanic` - a code indicating the Hispanic origin of the victim
* `place` - where the shooting occurred. Either `Home`, `Street`, or `Other specified`
* `education` - educational status of the victim. Can be one of the following:
    * `1` - Less than High School
    * `2` - Graduated from High School or equivalent
    * `3` - Attended College
    * `4` - At least graduated from College
    * `5` - Not available
    
**The objective**: Find patterns in the demographics of the victims.

**Techniques used**
* Working with CSV files
* Dictionary mapping
* Working with rates to reduce bias

## Introducing the data

In [1]:
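import csv

# Read the whole file into a list of rows; the `with` block closes the file afterwards.
with open("guns.csv", "r", newline="") as f:
    data = list(csv.reader(f))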

In [2]:
print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


## Removing the header

In [3]:
headers = data[0]
data = data[1:]
print(headers)
print(data[:5])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]
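
As an alternative to slicing off the header by hand, the standard library's `csv.DictReader` uses the header line as dictionary keys, so columns can be read by name instead of by index. The following is a minimal sketch, not the approach used in the rest of this notebook. It assumes a small in-memory sample (`sample_csv`, a name introduced here) built from the first rows printed above, so that it runs without `guns.csv`.

```python
import csv
import io

# First rows of guns.csv as printed above, held in memory for illustration.
sample_csv = """,year,month,intent,police,sex,age,race,hispanic,place,education
1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4
2,2012,01,Suicide,0,F,21,White,100,Street,3
3,2012,01,Suicide,0,M,60,White,100,Other specified,4
"""

# DictReader consumes the header line and yields one dict per data row.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

first = rows[0]
print(first["year"], first["intent"], first["race"])
# prints: 2012 Suicide Asian/Pacific Islander
```

With the real file, `io.StringIO(sample_csv)` would be replaced by the open file object. The unnamed identifier column ends up under the empty-string key `""`.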


## Total gun deaths by year

In [7]:
years = [ row[1] for row in data ]

year_counts = {}

for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
        
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
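
The counting loop above can also be written with `collections.Counter` from the standard library, which builds the same kind of tally in one step. This is a sketch only: `sample_rows` is a stand-in introduced here, holding the five rows printed earlier, so the resulting counts are for the sample and not for the full dataset.

```python
from collections import Counter

# The first five data rows printed earlier (header already removed).
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2'],
]

# The year is the second column (index 1); Counter tallies each distinct value.
year_counts = Counter(row[1] for row in sample_rows)
print(dict(year_counts))
```

A `Counter` returns `0` for keys it has never seen, so no `if ... in` check is needed before incrementing or reading a count.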

## Gun deaths by month and year

Since the yearly totals barely change, the deaths are broken down by month instead. The dataset does not record the day of the month, so every date is assigned day 1.

In [8]:
import datetime

dates = [ datetime.datetime(year = int(row[1]), month = int(row[2]), day = 1) for row in data ]

dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [9]:
date_counts = {}

for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1
        
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

## Gun deaths by sex and race

In [12]:
sexes = [ row[5] for row in data ]
races = [ row[7] for row in data ]

sex_counts = {}
race_counts = {}

for sex in sexes:
    if sex in sex_counts:
        sex_counts[sex] += 1
    else:
        sex_counts[sex] = 1
        
for race in races:
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
        
print(sex_counts)
print(race_counts)

{'F': 14449, 'M': 86349}
{'Native American/Native Alaskan': 917, 'Black': 23296, 'Asian/Pacific Islander': 1326, 'Hispanic': 9022, 'White': 66237}


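As observed, most victims are male and most are White. These counts are tallied separately, so they do not by themselves show how many victims are White males. Knowing the proportion of each race in the US would make the race comparison more meaningful. Furthermore, the monthly counts tend to be higher in the summer months.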

Further analysis could be performed using the intent and educational status. It would also be interesting to analyze police involvement against the race of the victim.
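
One way to check the seasonal pattern is to average the monthly counts across years. The sketch below assumes a `date_counts` dictionary shaped like the one computed earlier and copies only the entries visible in that output (all of 2012, and 2013 through October). `by_month` and `monthly_mean` are names introduced here. Averaging, rather than summing, keeps months with fewer years of data comparable.

```python
import datetime
from collections import defaultdict

# Entries copied from the `date_counts` output above (2012 in full, 2013 through October).
date_counts = {
    datetime.datetime(2012, 1, 1): 2758, datetime.datetime(2012, 2, 1): 2357,
    datetime.datetime(2012, 3, 1): 2743, datetime.datetime(2012, 4, 1): 2795,
    datetime.datetime(2012, 5, 1): 2999, datetime.datetime(2012, 6, 1): 2826,
    datetime.datetime(2012, 7, 1): 3026, datetime.datetime(2012, 8, 1): 2954,
    datetime.datetime(2012, 9, 1): 2852, datetime.datetime(2012, 10, 1): 2733,
    datetime.datetime(2012, 11, 1): 2729, datetime.datetime(2012, 12, 1): 2791,
    datetime.datetime(2013, 1, 1): 2864, datetime.datetime(2013, 2, 1): 2375,
    datetime.datetime(2013, 3, 1): 2862, datetime.datetime(2013, 4, 1): 2798,
    datetime.datetime(2013, 5, 1): 2806, datetime.datetime(2013, 6, 1): 2920,
    datetime.datetime(2013, 7, 1): 3079, datetime.datetime(2013, 8, 1): 2859,
    datetime.datetime(2013, 9, 1): 2742, datetime.datetime(2013, 10, 1): 2808,
}

# Group the counts by calendar month (1-12), ignoring the year.
by_month = defaultdict(list)
for date, count in date_counts.items():
    by_month[date.month].append(count)

# Average each month over the years available for it.
monthly_mean = {month: sum(counts) / len(counts) for month, counts in sorted(by_month.items())}

peak_month = max(monthly_mean, key=monthly_mean.get)
low_month = min(monthly_mean, key=monthly_mean.get)
print(peak_month, low_month)
# prints: 7 2
```

On these entries, July has the highest average and February the lowest. February is also the shortest month, so a per-day rate would be a fairer comparison.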

## Importing census data from 2010

The following dataset contains information about the total US population and the total population of each racial group.

In [13]:
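import csv

# Read the census file into a list of rows; the `with` block closes the file afterwards.
with open("census.csv", "r", newline="") as f:
    census = list(csv.reader(f))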

census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

## Rates of gun deaths per race

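Since the race names differ between the two datasets, the gun-death categories are mapped to census populations by hand. `Asian/Pacific Islander` combines the census columns `Race Alone - Asian` and `Race Alone - Native Hawaiian and Other Pacific Islander` (15,159,516 + 674,625 = 15,834,141).

The rates are expressed as deaths per 100,000 people. The death counts cover all three years (2012 to 2014), so these are three-year totals rather than annual rates.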

In [14]:
mapping = {
    "Asian/Pacific Islander": 15834141,
    "Black": 40250635,
    "Native American/Native Alaskan": 3739506,
    "Hispanic": 44618105,
    "White": 197318956
}

race_per_hundredk = {}

for k,v in race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000
    
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

## Filtering by intent (homicide)

In [15]:
intents = [ row[3] for row in data]

homicide_race_counts = {}

for i,race in enumerate(races):
    if race not in homicide_race_counts:
        homicide_race_counts[race] = 0
    if intents[i] == "Homicide":
        homicide_race_counts[race] += 1
        
race_per_hundredk = {}

for k,v in homicide_race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000
    
race_per_hundredk

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}
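
The filter-then-rate steps above can be wrapped in a single function so that other intents (for example `Suicide`) can be compared the same way. This is a sketch under stated assumptions: `rate_per_100k` is a hypothetical helper, `sample_rows` holds only the five rows printed earlier, and `mapping` is copied from the cell above. Unlike the loop above, races with no matching deaths are left out of the result rather than reported as `0`.

```python
# Population per race, copied from the `mapping` cell above.
mapping = {
    "Asian/Pacific Islander": 15834141,
    "Black": 40250635,
    "Native American/Native Alaskan": 3739506,
    "Hispanic": 44618105,
    "White": 197318956,
}

# The first five data rows printed earlier (header already removed).
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2'],
]


def rate_per_100k(rows, population, intent=None):
    """Deaths per 100,000 people by race, optionally restricted to one intent."""
    counts = {}
    for row in rows:
        # Column 3 is the intent, column 7 is the race.
        if intent is not None and row[3] != intent:
            continue
        counts[row[7]] = counts.get(row[7], 0) + 1
    return {race: count / population[race] * 100000 for race, count in counts.items()}


suicide_rates = rate_per_100k(sample_rows, mapping, intent="Suicide")
homicide_rates = rate_per_100k(sample_rows, mapping, intent="Homicide")
print(suicide_rates)
print(homicide_rates)
```

On the five sample rows every death is a suicide, so `homicide_rates` is empty. Calling the function on the full `data` list with `intent="Homicide"` should give the same values as the cell above for every race that has at least one homicide.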

**Conclusion**: As observed, gun homicides disproportionately affect Black and Hispanic people, whose rates (about 48.5 and 12.6 per 100,000) are the highest of the five groups.

**Further Work**: Analyze gun deaths according to where they occurred, or according to the educational status of the victim. It would also be interesting to see whether there is a correlation between race, educational status, and the place where the death occurred.