## Exploring US Gun Deaths

In this project, I will be exploring gun deaths in the US. The data was taken from the [FiveThirtyEight GitHub](https://github.com/fivethirtyeight/guns-data) and was used for their [Gun Deaths in America project](https://fivethirtyeight.com/features/gun-deaths/). It is a cleaned version of the CDC gun-deaths dataset for 2012-2014.


### Opening the data

In [1]:
#Data will read as a list of lists
#Each inner list is a single row in the csv file
import csv
with open('guns.csv','r') as f:
    file=csv.reader(f)
    data=list(file)
print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


### Separating headers from the data

In [2]:
#headers are contained in only the first list 
headers=data[:1]
#data is contained in the remaining lists
data=data[1:]
print (headers)
print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']]
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]


### Counting gun deaths per year

In [3]:
#creating empty list to hold the year from each row in the data
years=[]
#looping through the data to append the year into the years list
for row in data:
    years.append(row[1])

#creating an empty dictionary to hold the year counts
year_counts={}
#counting the times a year appears in the years list
for year in years:
    if year in year_counts:
        year_counts[year]=year_counts[year] + 1
    else:
        year_counts[year]=1

In [4]:
print(year_counts)

{'2012': 33563, '2013': 33636, '2014': 33599}


There is very little variation in the number of deaths across 2012, 2013 and 2014.
The counts for 2013 and 2014 are slightly higher than for 2012, but in both cases the difference is fewer than 100 deaths.
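
To put "slightly higher" into numbers, the sketch below expresses each year's count as a percentage change relative to 2012. The counts are copied from the output above; the variable names `baseline` and `pct_change_vs_2012` are illustrative and not part of the original notebook.

```python
# Yearly counts copied from the printed output above
year_counts = {'2012': 33563, '2013': 33636, '2014': 33599}

# Use 2012 as the reference year (an arbitrary choice for illustration)
baseline = year_counts['2012']

# Percentage change of each year's count relative to 2012
pct_change_vs_2012 = {
    year: (count - baseline) / baseline * 100
    for year, count in year_counts.items()
}

for year, change in pct_change_vs_2012.items():
    print(year, round(change, 2))
```

Both later years come out well under one percent above the 2012 count, which is consistent with reading the yearly totals as essentially flat.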

### Gun deaths by month and year

In [5]:
import datetime
dates=[]
#looping through each row to extract the year and month 
#storing the date in the dates list, with day set as 1 for any given date
#we are only exploring month and year, so day is set to 1
for row in data:
    date= datetime.datetime(year=int(row[1]) , month= int(row[2]), day=1)
    dates.append(date)
print(dates[:5])    
                   

[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0)]


In [6]:
date_counts={}
#counting the times a month and year date appears in the dates list 
for date in dates:
    if date in date_counts:
        date_counts[date]=date_counts[date]+1
    else:
        date_counts[date]=1


In [7]:
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

February has consistently been the month with the fewest deaths, whilst July or August has been the month with the most. There does seem to be a seasonal pattern, with fewer gun deaths during winter and more during summer.
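
One way to check the lowest and highest months programmatically is to take the `min` and `max` of the dictionary keys, ordered by their counts. The sketch below does this for 2012 only, using the twelve monthly values copied from the output above; `counts_2012`, `lowest` and `highest` are illustrative names, not part of the original notebook.

```python
import datetime

# Monthly counts for 2012, copied from the printed output above (Jan..Dec)
monthly_values_2012 = [2758, 2357, 2743, 2795, 2999, 2826,
                       3026, 2954, 2852, 2733, 2729, 2791]

# Rebuild the same key format the notebook uses: first day of each month
counts_2012 = {
    datetime.datetime(2012, month, 1): count
    for month, count in enumerate(monthly_values_2012, start=1)
}

# min/max over the keys, compared by the count stored for each key
lowest = min(counts_2012, key=counts_2012.get)
highest = max(counts_2012, key=counts_2012.get)

print('lowest month:', lowest.month, counts_2012[lowest])
print('highest month:', highest.month, counts_2012[highest])
```

The same two calls could be applied to the full `date_counts` dictionary after grouping its keys by year.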

### Gun deaths by race and sex

In [8]:
sex_counts={}

for row in data:
    if row[5] in sex_counts:
        sex_counts[row[5]]=sex_counts[row[5]] + 1
    else:
        sex_counts[row[5]]= 1
print(sex_counts)

{'M': 86349, 'F': 14449}


In [9]:
race_counts={}

for row in data:
    if row[7] in race_counts:
        race_counts[row[7]]=race_counts[row[7]] + 1
    else:
        race_counts[row[7]] = 1
        
race_counts

{'Asian/Pacific Islander': 1326,
 'White': 66237,
 'Native American/Native Alaskan': 917,
 'Black': 23296,
 'Hispanic': 9022}

There are far more male gun deaths than female ones: almost 6 times as many.
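
The ratio can be checked directly from the counts printed above. This is a small worked example; `male_to_female` and `male_share` are illustrative names that do not appear in the original notebook.

```python
# Counts copied from the printed output above
sex_counts = {'M': 86349, 'F': 14449}

# How many male deaths there are for every female death
male_to_female = sex_counts['M'] / sex_counts['F']

# Male deaths as a percentage of all deaths in the dataset
total_deaths = sum(sex_counts.values())
male_share = sex_counts['M'] / total_deaths * 100

print(round(male_to_female, 2))
print(round(male_share, 1))
```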

Most gun deaths in the data fall into the White and Black categories, but these raw counts need to be compared against census data on the makeup of the US population before they can be put into perspective.

### Reading in a census dataset

In [10]:
with open('census.csv','r') as f:
    reader=csv.reader(f)
    census=list(reader)
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

### Rates of gun deaths per race

In [11]:
#creating a dictionary of race counts from census data
mapping={ 'Asian/Pacific Islander':674625+15159516, 
         'Black': 40250635, 
         'Native American/Native Alaskan':3739506, 
         'Hispanic':44618105, 
         'White':197318956}

#expressing the deaths as a rate per 100,000 people of each race in the US
races_per_hundredk={}

for key, value in race_counts.items():
    percentage= value/mapping[key]
    mapping_per100k=percentage*100000
    races_per_hundredk[key]=mapping_per100k
races_per_hundredk
    


{'Asian/Pacific Islander': 8.374309664161762,
 'White': 33.56849303419181,
 'Native American/Native Alaskan': 24.521955573811088,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907}

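When put into perspective against the demographics of the US, the gun death rate was highest for the Black racial category, at about 57.9 deaths per 100k, followed by White at about 33.6. Note that these rates count deaths over all three years (2012-2014) against the 2010 census population, so they are not annual rates.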
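
To make the arithmetic concrete, the sketch below recomputes the rate for the Black category from the numbers printed above. The helper `rate_per_100k` and the division by three years are illustrative assumptions, not part of the original notebook; the annualised figure also assumes that the 2010 census population is an acceptable denominator for each of 2012-2014.

```python
def rate_per_100k(deaths, population):
    """Return the number of deaths per 100,000 people."""
    return deaths / population * 100000

# Values copied from the outputs above
black_deaths = 23296          # gun deaths, 2012-2014 combined
black_population = 40250635   # 2010 census count

# Rate over the whole three-year window, matching the notebook's figure
black_rate = rate_per_100k(black_deaths, black_population)

# Rough annualised rate: ASSUMES a constant population over the 3 years
years_covered = 3
black_rate_annual = black_rate / years_covered

print(round(black_rate, 2))
print(round(black_rate_annual, 2))
```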

### Gun deaths by intent

In [12]:
#filtering for homicide intents
intents=[row[3] for row in data]
races=[row[7] for row in data]
homicide_race_counts={}

for i, race in enumerate(races):
    if intents[i] =='Homicide':
        if races[i] not in homicide_race_counts:
            homicide_race_counts[races[i]] = 0
        homicide_race_counts[races[i]] += 1    

homicide_race_counts_per_100k={}
for key, value in homicide_race_counts.items():
    percentage=value/mapping[key]
    homicide_100k=percentage*100000
    homicide_race_counts_per_100k[key]=homicide_100k
homicide_race_counts_per_100k


{'White': 4.6356417981453335,
 'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Native American/Native Alaskan': 8.717729026240365,
 'Hispanic': 12.627161104219914}

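When filtering for homicides only, the disparity becomes even more pronounced: the Black category has by far the highest gun homicide rate, at about 48.5 per 100k, followed by Hispanic at about 12.6, while the White rate drops to about 4.6.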
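
The filter-then-count step can also be written more compactly with `collections.Counter` from the standard library. The sketch below is only an alternative illustration: `count_races_for_intent` is a hypothetical helper, and `sample_rows` is made-up stand-in data with the same column layout as `guns.csv` (intent at index 3, race at index 7), not the real file.

```python
from collections import Counter

# Hypothetical stand-in rows laid out like guns.csv
sample_rows = [
    ['1', '2012', '01', 'Homicide', '0', 'M', '34', 'Black'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White'],
    ['3', '2012', '02', 'Homicide', '0', 'M', '60', 'Black'],
    ['4', '2012', '02', 'Homicide', '0', 'M', '64', 'Hispanic'],
]

def count_races_for_intent(rows, intent):
    """Count rows per race, keeping only rows whose intent matches."""
    return Counter(row[7] for row in rows if row[3] == intent)

sample_homicides = count_races_for_intent(sample_rows, 'Homicide')
print(dict(sample_homicides))
```

Passing a different intent string, such as `'Suicide'`, would reuse the same helper for the other categories in the `intent` column.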