# Analysis of Gun Deaths in the U.S.

The dataset used here comes from [FiveThirtyEight](https://www.fivethirtyeight.com) and is available [on GitHub](https://github.com/fivethirtyeight/guns-data). Each row represents a single fatality and contains the following fields:
* row number
* `year`
* `month`
* `intent`
* `police`
* `sex`
* `age`
* `race`
* `hispanic`
* `place`
* `education`

project source: [DataQuest](https://www.dataquest.io)

In [1]:
import csv

# read our data into a list of rows;
# the with-block closes the file when done
with open("guns.csv", 'r') as f:
    data = list(csv.reader(f))

# confirm dataset read in by printing first 5 rows
data[:5]

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]

In [2]:
# assign headers to a variable
# and remove row from our data
headers = data[0]
data = data[1:]

# confirm removal of header
data[:5]

[['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
 ['5',
  '2012',
  '02',
  'Suicide',
  '0',
  'M',
  '31',
  'White',
  '100',
  'Other specified',
  '2']]

In [3]:
# extract years column from data
years = [row[1] for row in data]

# tally fatalities per year
year_counts = dict()
for year in years:
    count = year_counts.get(year, 0)
    year_counts[year] = count + 1

# verify population of year_counts
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
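
The same kind of tally can also be written with `collections.Counter` from the standard library. A minimal sketch, using a short hypothetical list of year strings in place of the full `years` column:

```python
from collections import Counter

# Hypothetical stand-in for the `years` list built above.
sample_years = ["2012", "2012", "2013", "2014", "2013", "2012"]

# Counter tallies each distinct value in a single pass.
sample_year_counts = Counter(sample_years)

print(dict(sample_year_counts))
```

A `Counter` behaves like a dictionary, so lookups such as `sample_year_counts["2012"]` work the same way as with the hand-built `year_counts`.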

## Analyzing gun deaths by month
The totals above vary little from year to year. Next, we use `datetime` objects to break the counts down by month.

In [4]:
import datetime

# using a list comprehension, make a list of datetime
# objects for all fatalities; the data has no day column,
# so every date is pinned to the first of its month
dates = [datetime.datetime(year=int(row[1]),
                           month=int(row[2]),
                           day=1)
         for row in data]

# verify list of datetime objects
dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [5]:
# count the occurrences of each date in the dataset
date_counts = dict()
for date in dates:
    count = date_counts.get(date, 0)
    date_counts[date] = count + 1

# confirm population of date_counts
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

In [6]:
# count the occurrences of each sex & race in the dataset
sex_counts = dict()
race_counts = dict()

for row in data:
    sex = row[5]
    race = row[7]
    
    count = sex_counts.get(sex, 0)
    sex_counts[sex] = count + 1
    
    count = race_counts.get(race, 0)
    race_counts[race] = count + 1

In [7]:
# confirm population of sex counts
sex_counts

{'F': 14449, 'M': 86349}

In [8]:
# confirm population of race counts
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

## Thoughts
February appears to be the month with the fewest fatalities (2,357 in 2012 and 2,375 in 2013). About 86% of victims are male (86,349 of 100,798). By race, white victims account for the largest absolute number of gun deaths (66,237).
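
These figures can be checked against the tallies printed earlier. The sketch below copies the `sex_counts` values and the twelve 2012 monthly counts from the outputs above; the variable names are illustrative only.

```python
# Values copied from the sex_counts output above.
sex_counts_check = {"F": 14449, "M": 86349}
total = sum(sex_counts_check.values())
male_share = sex_counts_check["M"] / total * 100
print(round(male_share, 1))

# 2012 monthly counts copied from the date_counts output above
# (month number -> fatalities).
monthly_2012 = {1: 2758, 2: 2357, 3: 2743, 4: 2795, 5: 2999, 6: 2826,
                7: 3026, 8: 2954, 9: 2852, 10: 2733, 11: 2729, 12: 2791}

# The month whose count is smallest.
lowest_month = min(monthly_2012, key=monthly_2012.get)
print(lowest_month)
```

The twelve monthly counts sum to 33,563, which matches the 2012 total in `year_counts`.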

Cross-referencing race with census data would allow a per-capita analysis of gun deaths. Breaking the data down by `intent` could also show whether intent correlates with any of the other features in the dataset.
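
One way to look at `intent` against another column is to tally pairs of values. A minimal sketch, assuming rows in the same column layout as `data`; the rows themselves and the helper name `tally_pairs` are made up for illustration:

```python
from collections import Counter

def tally_pairs(rows, col_a, col_b):
    """Count how often each (row[col_a], row[col_b]) pair occurs."""
    return Counter((row[col_a], row[col_b]) for row in rows)

# Hypothetical rows: index 3 is intent and index 5 is sex, as in `data`.
sample_rows = [
    ["1", "2012", "01", "Suicide", "0", "M", "34", "White", "100", "Home", "4"],
    ["2", "2012", "01", "Suicide", "0", "F", "21", "White", "100", "Street", "3"],
    ["3", "2012", "02", "Homicide", "0", "M", "25", "Black", "100", "Street", "2"],
    ["4", "2012", "02", "Suicide", "0", "M", "64", "White", "100", "Home", "4"],
]

intent_by_sex = tally_pairs(sample_rows, 3, 5)
print(intent_by_sex[("Suicide", "M")])
```

Swapping the second index for 7 (`race`) or 9 (`place`) would give the same kind of breakdown for those columns.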

In [9]:
# read the census data into a list of rows
with open("census.csv", 'r') as f:
    census = list(csv.reader(f))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]
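
Every value in the census row above is a string, so `+` would join two of them end to end rather than add them. A short sketch of the difference, using the 'Race Alone - Asian' and 'Race Alone - Native Hawaiian and Other Pacific Islander' figures from that row (the variable names are illustrative):

```python
# Both figures are copied from the census row above, as strings.
asian = "15159516"
pacific_islander = "674625"

# String concatenation joins the digits into one long string.
joined = asian + pacific_islander

# Converting to int first gives the arithmetic sum.
summed = int(asian) + int(pacific_islander)

print(joined)
print(summed)
```

Only the integer sum is a meaningful population figure for the combined Asian/Pacific Islander category.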

In [10]:
# map census totals to each of the keys from race_counts
# NOTE: Asian/Pacific Islander includes the census counts
#       for 'Race Alone - Asian' and 'Race Alone - Native
#       Hawaiian and Other Pacific Islander'
# census values are strings, so convert to int before adding
mapping = dict()
mapping["Asian/Pacific Islander"] = int(census[1][14]) + int(census[1][15])
mapping["Black"] = int(census[1][12])
mapping["Native American/Native Alaskan"] = int(census[1][13])
mapping["Hispanic"] = int(census[1][11])
mapping["White"] = int(census[1][10])

mapping

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}

In [11]:
# find gun deaths per 100-thousand of each population,
# rounded to two decimal places
race_per_hundredk = dict()
for race in race_counts:
    gun_deaths = race_counts[race]
    population = mapping[race]
    race_per_hundredk[race] = round(gun_deaths / population * 100000, 2)

race_per_hundredk

{'Asian/Pacific Islander': 8.37,
 'Black': 57.88,
 'Hispanic': 20.22,
 'Native American/Native Alaskan': 24.52,
 'White': 33.57}

In [12]:
# extract the intent column and preview the first 5 values
intents = [row[3] for row in data]
intents[:5]

['Suicide', 'Suicide', 'Suicide', 'Suicide', 'Suicide']

In [13]:
# extract the race column and preview the first 5 values
races = [row[7] for row in data]
races[:5]

['Asian/Pacific Islander', 'White', 'White', 'White', 'White']

In [15]:
# tally homicides by race, pairing each race with its intent
homicide_race_counts = dict()

for race, intent in zip(races, intents):
    if intent == "Homicide":
        count = homicide_race_counts.get(race, 0)
        homicide_race_counts[race] = count + 1

homicide_race_counts

{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}

In [16]:
# find gun homicide deaths per 100-thousand of each population,
# rounded to two decimal places
homicides_per_hundredk = dict()
for race in homicide_race_counts:
    homicide_deaths = homicide_race_counts[race]
    population = mapping[race]
    homicides_per_hundredk[race] = round(homicide_deaths / population * 100000, 2)

homicides_per_hundredk

{'Asian/Pacific Islander': 3.53,
 'Black': 48.47,
 'Hispanic': 12.63,
 'Native American/Native Alaskan': 8.72,
 'White': 4.64}

## Further Exploration
Our CDC data records whether officers were involved in a gun death (the `police` column). I'd be interested in seeing how the `per_hundredk` results change once that feature is taken into account.
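
A possible starting point for the police question is sketched below with hypothetical rows, hypothetical population figures, and two hypothetical helpers, `police_involved_by_race` and `rate_per_hundredk`. It assumes that the `police` column (index 4) holds `'1'` when officers were involved, which the sample output above does not confirm.

```python
def police_involved_by_race(rows):
    """Tally deaths per race where the police column is '1' (assumed)."""
    counts = {}
    for row in rows:
        if row[4] == "1":  # assumption: '1' marks police involvement
            race = row[7]
            counts[race] = counts.get(race, 0) + 1
    return counts

def rate_per_hundredk(counts, populations):
    """Convert raw counts to a rate per 100,000 people."""
    return {race: counts[race] / populations[race] * 100000
            for race in counts}

# Hypothetical rows in the same column layout as `data`.
sample_rows = [
    ["1", "2012", "01", "Homicide", "1", "M", "30", "White", "100", "Street", "2"],
    ["2", "2012", "01", "Homicide", "1", "M", "41", "White", "100", "Home", "3"],
    ["3", "2012", "02", "Homicide", "1", "M", "27", "Black", "100", "Street", "2"],
    ["4", "2012", "02", "Suicide", "0", "F", "52", "White", "100", "Home", "4"],
]

# Hypothetical populations, for illustration only.
sample_populations = {"White": 200000, "Black": 100000}

police_counts = police_involved_by_race(sample_rows)
police_rates = rate_per_hundredk(police_counts, sample_populations)
print(police_counts)
```

With the real data, `sample_rows` would be replaced by `data` and `sample_populations` by the census-based `mapping`, provided its values are integers.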

It may also be interesting to explore which intents are most commonly associated with each of the other features.