## Introducing Gun Deaths Data

The dataset comes from [FiveThirtyEight](http://fivethirtyeight.com/) and can be found [here](https://github.com/fivethirtyeight/guns-data). It is stored in the guns.csv file and contains information on gun deaths in the US from 2012 to 2014. Each row represents a single fatality, and the columns contain demographic and other information about the victim. Here are the first few rows of the dataset:

In [1]:
import csv

# Read the whole file into a list of rows; the with-block closes the file afterwards.
with open("guns.csv", "r") as f:
    data = list(csv.reader(f))
print(data[0:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


In [2]:
header5= data[0]
data=data[1:]
print(header5)
print(data[0:5])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]
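
The cells above address columns by position (for example, index 1 for the year). As a possible alternative, the sketch below reads the same layout with `csv.DictReader`, which allows access by column name. It uses an in-memory sample built from the header and the first two rows shown above rather than the real file; the names `sample` and `rows` are illustrative only.

```python
import csv
import io

# In-memory stand-in for guns.csv: the header plus the first two rows shown above.
sample = io.StringIO(
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,3\n"
)

# DictReader takes the first line as field names, so no manual header split is needed.
rows = list(csv.DictReader(sample))

for row in rows:
    print(row["year"], row["intent"], row["sex"], row["race"])
```

The rest of this notebook keeps the index-based lists, so this is only a note on a more readable access pattern.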


## Counting Gun Deaths by Year

In [3]:
# Column 1 holds the year of each death.
years= [each[1] for each in data]
year_counts={}
for each in years:
    if each in year_counts:
        year_counts[each]= year_counts[each]+1
    else:
        year_counts[each]=1

print(year_counts)

{'2012': 33563, '2014': 33599, '2013': 33636}


It looks like gun deaths didn't change much by year from 2012 to 2014. Let's see if gun deaths in the US change by month and year.
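
To put a number on "didn't change much", here is a small sketch that compares the gap between the highest and lowest yearly totals with the yearly mean. The counts are copied from the output above; `relative_spread` is a name introduced here for illustration.

```python
# Yearly totals copied from the output above.
year_counts = {"2012": 33563, "2013": 33636, "2014": 33599}

mean_count = sum(year_counts.values()) / len(year_counts)
spread = max(year_counts.values()) - min(year_counts.values())

# Gap between the highest and lowest year, relative to the yearly mean.
relative_spread = spread / mean_count

print(f"spread: {spread} deaths ({relative_spread:.2%} of the yearly mean)")
```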

## Gun Deaths by Month and Year

In [6]:
import datetime
dates= [datetime.datetime(year=int(each[1]), month=int(each[2]), day=1) for each in data]
print(dates[0:5])
date_counts={}
for each in dates:
    if each in date_counts:
        date_counts[each] = date_counts[each]+1
    else:
        date_counts[each]=1

print(date_counts)

[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0)]
{datetime.datetime(2014, 7, 1, 0, 0): 2884, datetime.datetime(2013, 7, 1, 0, 0): 3079, datetime.datetime(2013, 6, 1, 0, 0): 2920, datetime.datetime(2013, 4, 1, 0, 0): 2798, datetime.datetime(2012, 11, 1, 0, 0): 2729, datetime.datetime(2012, 9, 1, 0, 0): 2852, datetime.datetime(2014, 8, 1, 0, 0): 2970, datetime.datetime(2013, 9, 1, 0, 0): 2742, datetime.datetime(2013, 12, 1, 0, 0): 2765, datetime.datetime(2012, 3, 1, 0, 0): 2743, datetime.datetime(2014, 10, 1, 0, 0): 2865, datetime.datetime(2012, 5, 1, 0, 0): 2999, datetime.datetime(2012, 7, 1, 0, 0): 3026, datetime.datetime(2012, 8, 1, 0, 0): 2954, datetime.datetime(2014, 9, 1, 0, 0): 2914, datetime.datetime(2012, 6, 1, 0, 0): 2826, datetime.datetime(2012, 10, 1, 0, 0): 2733, datetime.datetime(2014, 2, 1, 0, 0): 2361, datetime.datetime(2012, 1, 1, 0, 0)

## Exploring Death Counts by Sex and Race

In [7]:
sex_counts={}
race_counts={}
sex= [each[5] for each in data]
race= [each[7] for each in data]
for each in sex:
    if each in sex_counts:
        sex_counts[each]= sex_counts[each]+1
    else:
        sex_counts[each]=1

for each in race:
    if each in race_counts:
        race_counts[each]= race_counts[each]+1
    else:
        race_counts[each]=1
        
print(sex_counts)
print(race_counts)

{'M': 86349, 'F': 14449}
{'Hispanic': 9022, 'Black': 23296, 'White': 66237, 'Asian/Pacific Islander': 1326, 'Native American/Native Alaskan': 917}


Gun deaths in the US disproportionately affect men: about 86% of the victims in this dataset are male. They may also disproportionately affect some racial minorities, but raw counts cannot show that on their own; we need the share of each race in the overall US population to compare.
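
As a rough check on these observations, the sketch below converts the counts printed above into shares of all deaths. The helper `shares` is hypothetical and not part of the original notebook.

```python
# Counts copied from the output above.
sex_counts = {"M": 86349, "F": 14449}
race_counts = {
    "Hispanic": 9022,
    "Black": 23296,
    "White": 66237,
    "Asian/Pacific Islander": 1326,
    "Native American/Native Alaskan": 917,
}

def shares(counts):
    """Return each category's fraction of the total count."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}

sex_shares = shares(sex_counts)
race_shares = shares(race_counts)

for key, share in sex_shares.items():
    print(f"{key}: {share:.1%}")
for key, share in race_shares.items():
    print(f"{key}: {share:.1%}")
```

Shares of deaths only become meaningful for race once they are set against population shares, which is what the census data later in the notebook is for.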

There also appears to be a mild seasonal pattern, with gun deaths peaking in the summer and dipping in the winter. It might be useful to filter by intent to see whether different categories of intent vary differently with season, race, or gender.
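
One way to look at seasonality is to fold the (year, month) counts into calendar-month totals. The sketch below does this for a small subset of the `date_counts` entries printed earlier. The full dictionary should hold one entry per month of 2012 to 2014, so the totals here are partial and only illustrate the technique.

```python
import datetime
from collections import Counter

# A subset of the (year, month) counts printed earlier, not the full dictionary.
date_counts = {
    datetime.datetime(2012, 7, 1): 3026,
    datetime.datetime(2013, 7, 1): 3079,
    datetime.datetime(2014, 7, 1): 2884,
    datetime.datetime(2014, 2, 1): 2361,
}

# Sum the counts of all entries that share a calendar month.
month_totals = Counter()
for date, count in date_counts.items():
    month_totals[date.month] += count

for month in sorted(month_totals):
    print(month, month_totals[month])
```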

## Reading a Second Dataset

In [8]:
# Load the 2010 census population counts; the with-block closes the file afterwards.
with open("census.csv", "r") as f1:
    census = list(csv.reader(f1))
print(census)

[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]


## Computing Rates of Gun Deaths per Race

In [12]:
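# Population per racial category, taken from the census row above.
mapping={
    # 'Race Alone - Asian' plus 'Race Alone - Native Hawaiian and Other Pacific Islander'
    "Asian/Pacific Islander":15834141,
    "Black": 40250635,
    "Native American/Native Alaskan":3739506,
    "Hispanic":44618105,
    "White":197318956
}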

race_per_hundredk={}
for each in race_counts:
    race_per_hundredk[each]= (race_counts[each]/mapping[each])*100000

print(race_per_hundredk)

{'Hispanic': 20.220491210910907, 'Native American/Native Alaskan': 24.521955573811088, 'White': 33.56849303419181, 'Asian/Pacific Islander': 8.374309664161762, 'Black': 57.8773477735196}
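
The population figures in `mapping` are typed in by hand. As a sketch of how they could be derived from the census row instead, the code below looks up each column by name and sums the two census categories that correspond to "Asian/Pacific Islander". The header and row are copied from the census output above; the helper `population` is hypothetical.

```python
# Header and data row copied from the census.csv output above.
census_header = [
    'Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
    'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic',
    'Race Alone - Black or African American',
    'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian',
    'Race Alone - Native Hawaiian and Other Pacific Islander',
    'Two or More Races',
]
census_row = [
    'cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
    'Total', '0100000US', '', 'United States', '308745538', '197318956',
    '44618105', '40250635', '3739506', '15159516', '674625', '6984195',
]

def population(column):
    """Return the integer population stored under a (unique) column name."""
    return int(census_row[census_header.index(column)])

mapping = {
    # The gun deaths data merges two census categories into one.
    "Asian/Pacific Islander": population('Race Alone - Asian')
    + population('Race Alone - Native Hawaiian and Other Pacific Islander'),
    "Black": population('Race Alone - Black or African American'),
    "Native American/Native Alaskan": population(
        'Race Alone - American Indian and Alaska Native'
    ),
    "Hispanic": population('Race Alone - Hispanic'),
    "White": population('Race Alone - White'),
}

print(mapping)
```

Note that the death counts cover 2012 to 2014 while the populations come from the 2010 census, so the resulting figures are three-year totals per 100,000 people rather than annual rates.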


## Filtering by Intent

In [14]:
intents= [each[3] for each in data]
races= [each[7] for each in data]

homicide_race_counts={}

for i,race in enumerate(races):
    if intents[i]=="Homicide":
        if race in homicide_race_counts:
            homicide_race_counts[race]= homicide_race_counts[race]+1
        else:
            homicide_race_counts[race]=1
            

homicide_race_per_hundredk={}
for each in homicide_race_counts:
    homicide_race_per_hundredk[each]= (homicide_race_counts[each]/mapping[each])*100000

print(homicide_race_counts)

{'Hispanic': 5634, 'Native American/Native Alaskan': 326, 'White': 9147, 'Asian/Pacific Islander': 559, 'Black': 19510}
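
The cell above computes `homicide_race_per_hundredk` but prints only the raw counts. The sketch below recomputes the rates from those printed counts and the population `mapping`, then lists them from highest to lowest. As with the overall rates, these are three-year totals set against 2010 census populations.

```python
# Homicide counts copied from the output above.
homicide_race_counts = {
    "Hispanic": 5634,
    "Native American/Native Alaskan": 326,
    "White": 9147,
    "Asian/Pacific Islander": 559,
    "Black": 19510,
}

# Population per racial category, as defined earlier in the notebook.
mapping = {
    "Asian/Pacific Islander": 15834141,
    "Black": 40250635,
    "Native American/Native Alaskan": 3739506,
    "Hispanic": 44618105,
    "White": 197318956,
}

# Homicides per 100,000 people in each category.
homicide_race_per_hundredk = {
    race: count / mapping[race] * 100000
    for race, count in homicide_race_counts.items()
}

# Sort categories by rate, highest first.
ranked = sorted(homicide_race_per_hundredk.items(), key=lambda item: item[1], reverse=True)
for race, rate in ranked:
    print(f"{race}: {rate:.1f}")
```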


In [15]:
months= [each[2] for each in data]

homicide_month_counts={}
for i,month in enumerate(months):
    if intents[i]=="Homicide":
        if month in homicide_month_counts:
            homicide_month_counts[month]= homicide_month_counts[month]+1
        else:
            homicide_month_counts[month]=1
            
print(homicide_month_counts)

{'12': 3191, '10': 2968, '08': 3125, '04': 2845, '01': 2829, '06': 3130, '05': 2976, '09': 2966, '03': 2780, '02': 2178, '07': 3269, '11': 2919}
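
Months differ in length, so raw monthly totals slightly understate short months such as February. The sketch below divides each total by the number of days that month had across 2012, 2013, and 2014, assuming the data covers every month of those three years. The name `homicide_per_day` is introduced here for illustration.

```python
import calendar

# Monthly homicide counts copied from the output above.
homicide_month_counts = {
    "12": 3191, "10": 2968, "08": 3125, "04": 2845, "01": 2829, "06": 3130,
    "05": 2976, "09": 2966, "03": 2780, "02": 2178, "07": 3269, "11": 2919,
}

# Assumption: every month of these three years is represented in the data.
years = (2012, 2013, 2014)

homicide_per_day = {}
for month, count in homicide_month_counts.items():
    # monthrange returns (weekday of first day, number of days in the month).
    days = sum(calendar.monthrange(year, int(month))[1] for year in years)
    homicide_per_day[month] = count / days

for month in sorted(homicide_per_day):
    print(month, round(homicide_per_day[month], 1))

peak_month = max(homicide_per_day, key=homicide_per_day.get)
low_month = min(homicide_per_day, key=homicide_per_day.get)
print("peak:", peak_month, "low:", low_month)
```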


In [16]:
gender= [each[5] for each in data]

homicide_gender_counts={}
for i,g in enumerate(gender):
    if intents[i]=="Homicide":
        if g in homicide_gender_counts:
            homicide_gender_counts[g]= homicide_gender_counts[g]+1
        else:
            homicide_gender_counts[g]=1
            
print(homicide_gender_counts)

{'M': 29803, 'F': 5373}


In [18]:
accidental_gender_counts={}
accidental_race_counts={}

for i,g in enumerate(gender):
    if intents[i]=="Accidental":
        if g in accidental_gender_counts:
            accidental_gender_counts[g]= accidental_gender_counts[g]+1
        else:
            accidental_gender_counts[g]=1
            
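for i,race in enumerate(races):
    if intents[i]=="Accidental":
        # Test the current race, not g (left over from the gender loop above).
        # Testing g never matches, so every count is reset to 1, which is what
        # the race output recorded below shows.
        if race in accidental_race_counts:
            accidental_race_counts[race]= accidental_race_counts[race]+1
        else:
            accidental_race_counts[race]=1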
            
print(accidental_gender_counts)
print(accidental_race_counts)

{'M': 1421, 'F': 218}
{'Hispanic': 1, 'Native American/Native Alaskan': 1, 'White': 1, 'Asian/Pacific Islander': 1, 'Black': 1}
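
The counting loops in this section all follow the same pattern, and repeating them by hand makes copy-paste slips easy. A minimal sketch of a reusable helper built on `collections.Counter` follows. The function `count_by` and its parameters are hypothetical, and the sample rows are the first five data rows shown at the top of the notebook.

```python
from collections import Counter

def count_by(rows, column, intent=None, intent_column=3):
    """Count the values in `column`, optionally keeping only rows with the given intent."""
    return Counter(
        row[column]
        for row in rows
        if intent is None or row[intent_column] == intent
    )

# First five data rows from the notebook output (all of them are suicides).
sample = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2'],
]

print(count_by(sample, 5))                       # counts by sex
print(count_by(sample, 7, intent="Suicide"))     # counts by race, suicides only
print(count_by(sample, 7, intent="Accidental"))  # empty: no accidental rows in the sample
```

With the full `data` list, a call such as `count_by(data, 7, intent="Accidental")` would replace the hand-written race loop above.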


In [20]:
places= [each[9] for each in data]
education= [each[10] for each in data]

place_counts={}
education_counts={}

for each in places:
    if each in place_counts:
        place_counts[each] = place_counts[each]+1
    else:
        place_counts[each]=1
        
for each in education:
    if each in education_counts:
        education_counts[each] = education_counts[each]+1
    else:
        education_counts[each]=1

print(place_counts)
print(education_counts)

{'Industrial/construction': 248, 'Sports': 128, 'Street': 11151, 'Trade/service area': 3439, 'Other specified': 13751, 'Home': 60486, 'NA': 1384, 'Farm': 470, 'Other unspecified': 8867, 'School/instiution': 671, 'Residential institution': 203}
{'4': 12946, '3': 21680, 'NA': 53, '2': 42927, '1': 21823, '5': 1369}


## Findings

It appears that gun-related homicides in the US disproportionately affect people in the Black and Hispanic racial categories.

Some areas to investigate further:

- The link between month and homicide rate.
- Homicide rate by gender.
- The rates of other intents by gender and race.
- Gun death rates by location and education.