# US gun deaths data, 2012 to 2014
The dataset comes from __[fivethirtyeight](https://github.com/fivethirtyeight/guns-data)__ and is stored in the guns.csv file.<br>
<div class="alert alert-block alert-info">Below analysis is done using basic Python. Analysis will reveal following insights from the data:<br>
- Total death counts for a particular categorical attribute, e.g. the total number of female gun deaths
in the three years of data.
- Gun deaths per 100,000 people in each race, filtered by intent or gender. The same functions can be extended to other categories.

Note: the __guns.csv__ data is already clean, so no cleaning step is included.
 
</div>
##### The details of each field in the data set are given below
   __(unnamed)__ -- this is an identifier column, which contains the row number; its header is empty in the file. It's common in CSV files to include a unique identifier for each row, but we can ignore it in this analysis.<br>
   __year__ -- the year in which the fatality occurred.<br>
   __month__ -- the month in which the fatality occurred.<br>
   __intent__ -- the intent of the perpetrator of the crime. This can be Suicide, Accidental, NA, Homicide, or Undetermined.<br>
   __police__ -- whether a police officer was involved with the shooting. Either 0 (false) or 1 (true).<br>
   __sex__ -- the gender of the victim. Either M or F.<br>
   __age__ -- the age of the victim.<br>
   __race__ -- the race of the victim. One of Asian/Pacific Islander, Native American/Native Alaskan, Black, Hispanic, or White.<br>
   __hispanic__ -- a code indicating the Hispanic origin of the victim.<br>
   __place__ -- where the shooting occurred. Has several categories, which you're encouraged to explore on your own.<br>
   __education__ -- educational status of the victim. Can be one of the following:

    1 -- Less than High School
    2 -- Graduated from High School or equivalent
    3 -- Some College
    4 -- At least graduated from College
    5 -- Not available



### Loading US gun deaths data for 2012 to 2014

In [3]:
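import csv

# Read the whole CSV into a list of rows; the file is closed when the block ends.
with open("guns.csv", newline="") as file:
    data = list(csv.reader(file))
data[0:2]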


[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4']]

### Deleting the first column from the data, as it is just the row counter in the guns.csv file

In [4]:
# Drop the first column (the row counter) from every row, including the header row.
subdata = [row[1:] for row in data]
subdata[0:3]

[['year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3']]
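Since the first row of `subdata` is still the header, it can help to split it from the data rows before counting anything. A minimal sketch, assuming a tiny in-memory sample rather than the real guns.csv; the names `sample_csv`, `sample_data`, `header`, and `rows` are illustrative and not part of the notebook:

```python
import csv
import io

# Hypothetical three-line sample shaped like guns.csv (not the real data).
sample_csv = (
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,3\n"
)

# Parse the text the same way csv.reader parses an open file.
sample_data = list(csv.reader(io.StringIO(sample_csv)))

# Drop the row-counter column, then separate the header from the data rows.
trimmed = [row[1:] for row in sample_data]
header, rows = trimmed[0], trimmed[1:]

# Look a column up by name instead of hard-coding its position.
sex_index = header.index("sex")
print(header)
print([row[sex_index] for row in rows])
```

Looking indices up from the header avoids magic numbers such as 4 and 6 when the column order changes.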

### Analysis on total gun deaths by year / sex / race

In [14]:
# Function to get total death counts for a particular categorical attribute.

def columnWiseDeathCount(data, column):
    column_counts = {}
    for row in data:
        if row[column] in column_counts:
            column_counts[row[column]] += 1
        else:
            column_counts[row[column]] = 1
    return column_counts

# subdata[0] is the header row, so counting starts at subdata[1:].

# year wise death counts
year_death_counts = columnWiseDeathCount(subdata[1:], 0)

# gender wise death counts
sex_counts = columnWiseDeathCount(subdata[1:], 4)

# race wise death counts
race_counts = columnWiseDeathCount(subdata[1:], 6)

print(year_death_counts)
print(race_counts)
print(sex_counts)


print(sex_counts["M"] + sex_counts["F"])


{'2012': 33563, '2013': 33636, '2014': 33599}
{'Asian/Pacific Islander': 1326, 'White': 66237, 'Native American/Native Alaskan': 917, 'Black': 23296, 'Hispanic': 9022}
{'M': 86349, 'F': 14449}
100798
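The same tally can also be written with `collections.Counter` from the standard library. This is a sketch on made-up rows, not a replacement for the function above; the helper name `count_by_column` and the sample values are assumptions for illustration:

```python
from collections import Counter

def count_by_column(rows, column):
    """Count how often each value appears at the given column index."""
    return Counter(row[column] for row in rows)

# Hypothetical rows in the same column order as subdata (header excluded).
sample_rows = [
    ["2012", "01", "Suicide", "0", "M", "34", "Asian/Pacific Islander", "100", "Home", "4"],
    ["2012", "01", "Suicide", "0", "F", "21", "White", "100", "Street", "3"],
    ["2013", "02", "Homicide", "0", "M", "60", "White", "100", "Home", "4"],
]

print(count_by_column(sample_rows, 0))  # counts per year
print(count_by_column(sample_rows, 4))  # counts per sex
```

A `Counter` is a dict subclass, so the result can be indexed by key just like the dictionaries above.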


#### Findings in above results
__Year Analysis:__ Gun deaths were almost constant from 2012 to 2014, at roughly 33,600 per year.<br>

__Gender Analysis:__ Male and female gun deaths differ sharply. Male deaths (86,349) are nearly 6 times female deaths (14,449), a striking gap.<br>

__Race Analysis:__ Gun deaths also vary widely across races. "White" and "Black" victims account for the largest counts, but raw counts do not adjust for population size, so on their own they cannot show which races are disproportionately affected.<br>
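The gender ratio in these findings, and each race's share of all deaths, can be recomputed from the counts printed above. A short sketch with the numbers copied from that output; `sex_totals` and `race_totals` are illustrative names:

```python
# Counts copied from the printed output above.
sex_totals = {"M": 86349, "F": 14449}
race_totals = {
    "Asian/Pacific Islander": 1326,
    "White": 66237,
    "Native American/Native Alaskan": 917,
    "Black": 23296,
    "Hispanic": 9022,
}

total = sum(sex_totals.values())

# How many male deaths there are for each female death.
male_to_female = sex_totals["M"] / sex_totals["F"]

# Share of all gun deaths attributed to each race, in percent.
race_share = {race: 100 * n / total for race, n in race_totals.items()}

print(total, round(male_to_female, 2))
for race, share in race_share.items():
    print(race, round(share, 1))
```

These shares describe the composition of the deaths only; they say nothing about per-capita risk until they are compared with population figures.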

    The "census.csv" file has the total US population by race (the row used here is from the April 1, 2010 Census). It makes it possible to compute per-capita rates for each race, optionally filtered by another category, e.g. female deaths per 100,000 people in each race.


### Loading census.csv containing total US population across different races (April 1, 2010 Census)

In [6]:
# Read the census file; it is closed automatically when the block ends.
with open("census.csv", newline="") as f:
    census = list(csv.reader(f))
census


[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

# US gun death rates by race with respect to intent/gender

For each race, the gun death rate is calculated for a particular intent or gender.
That is, deaths are first filtered by a chosen intent or gender and then counted per race.


# Improvements
The functions are written generically, so race can be combined with any other column.


__Steps:__
1. Call __getRaceDataWithRespectToPassedColumn__ with the column index of race and the index of the other column. For example, 6 is the index of race and 4 is the index of gender (refer to subdata).
2. The step above gives a list of lists holding the race and the corresponding gender of each row.
3. Call __racesCountWithRespectToOtherCol__ with the list of lists created above and the gender value "F" or "M"
__(the list of lists above was built with respect to the gender column)__.
4. This step yields a dictionary with the count for each race for the chosen "F" or "M" value.
5. Call __per_hundredk_rates__ with the dictionary created above and the mapping data.
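The five steps can be sketched end to end on toy data. The function names below are shortened stand-ins for the notebook's functions, and the rows and populations are invented purely for illustration:

```python
def pair_with_race(rows, race_col, other_col):
    """Steps 1-2: build [race, other value] pairs, one per row."""
    return [[row[race_col], row[other_col]] for row in rows]

def count_races_where(pairs, wanted):
    """Steps 3-4: count rows per race whose second value equals `wanted`."""
    counts = {}
    for race, value in pairs:
        if value == wanted:
            counts[race] = counts.get(race, 0) + 1
    return counts

def rate_per_100k(counts, population):
    """Step 5: convert counts to deaths per 100,000 people."""
    return {race: counts[race] / population[race] * 100000 for race in counts}

# Invented toy rows with only [sex, race] columns, so the indices are 0 and 1.
toy_rows = [["F", "White"], ["M", "White"], ["F", "Black"], ["F", "White"]]
toy_population = {"White": 200000, "Black": 50000}

pairs = pair_with_race(toy_rows, race_col=1, other_col=0)
female_counts = count_races_where(pairs, "F")
female_rates = rate_per_100k(female_counts, toy_population)
print(female_counts)
print(female_rates)
```

Keeping the three stages separate means the same pair list can be reused for several filter values.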

In [8]:
# Mapping between the two data sets: race names in guns.csv are mapped to population counts in census.csv.
# The mapping dictionary holds the total US population per race.

mapping = {}

for row in race_counts:
    print(row)
    if row == "Black":
        mapping[row] = int(census[1][12])
    elif row == "Hispanic":
        mapping[row] = int(census[1][11])
    elif row == "Native American/Native Alaskan":
        mapping[row] = int(census[1][13])
    elif row == "Asian/Pacific Islander":
        mapping[row] = int(census[1][14]) + int(census[1][15])
    elif row == "White":
        mapping[row] = int(census[1][10])
mapping

Asian/Pacific Islander
White
Native American/Native Alaskan
Black
Hispanic


{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}
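The if/elif chain can also be expressed as a lookup table from race name to census column indices. A sketch assuming the column layout shown in the census header above; `census_row` is copied from that output, and `race_to_columns` and `population` are illustrative names:

```python
# Data row of census.csv as displayed above (April 1, 2010 Census).
census_row = [
    "cen42010", "April 1, 2010 Census", "totsex", "Both Sexes", "tothisp",
    "Total", "0100000US", "", "United States", "308745538", "197318956",
    "44618105", "40250635", "3739506", "15159516", "674625", "6984195",
]

# Which census columns make up each race category used in guns.csv.
race_to_columns = {
    "White": [10],
    "Hispanic": [11],
    "Black": [12],
    "Native American/Native Alaskan": [13],
    # Asian plus Native Hawaiian and Other Pacific Islander.
    "Asian/Pacific Islander": [14, 15],
}

population = {
    race: sum(int(census_row[i]) for i in columns)
    for race, columns in race_to_columns.items()
}
print(population)
```

With the table form, a changed category only needs a new entry rather than another branch.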

### Rate of deaths per 100k for each race

In [13]:
# Deaths per 100,000 people for each race, using the per-race population in mapping (2010 Census).

def per_hundredk_rates(onData, totalPopulationData):
    per_hundredk = {}
    for row in onData:
        val = (onData[row] / totalPopulationData[row]) * 100000
        per_hundredk[row] = val
    return per_hundredk
race_per_hundredk = per_hundredk_rates(onData=race_counts, totalPopulationData=mapping)
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}
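As a check, one of these rates can be recomputed by hand from the counts and populations shown earlier. Because the death counts span three years (2012 to 2014), dividing by 3 gives a rough average annual rate; that annualization is an added assumption for illustration and not part of the original analysis:

```python
black_deaths = 23296          # from race_counts above
black_population = 40250635   # from mapping above

# Deaths per 100,000 people over the full three-year window.
three_year_rate = black_deaths / black_population * 100000

# Rough average per year, assuming a constant population across the window.
annual_rate = three_year_rate / 3

print(round(three_year_rate, 2), round(annual_rate, 2))
```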

In [10]:
# Build a list of [race, other column value] pairs (intent, gender, or any other column).

def getRaceDataWithRespectToPassedColumn(raceColumn, withRespectToColumn):
    # subdata[0] is the header row, so it is skipped.
    list_races_data = [[row[raceColumn], row[withRespectToColumn]] for row in subdata[1:]]
    return list_races_data

#list_races_intent is list of lists [race, intent]
list_races_intent = getRaceDataWithRespectToPassedColumn(6,2)

#list_races_gender is the list of lists [race, gender]
list_races_gender = getRaceDataWithRespectToPassedColumn(6,4)

print(list_races_intent[0:5])
print(list_races_gender[0:5])

[['Asian/Pacific Islander', 'Suicide'], ['White', 'Suicide'], ['White', 'Suicide'], ['White', 'Suicide'], ['White', 'Suicide']]
[['Asian/Pacific Islander', 'M'], ['White', 'F'], ['White', 'M'], ['White', 'M'], ['White', 'M']]


### Filtering deaths gender wise/intent wise in a particular race. 


In [15]:
def racesCountWithRespectToOtherCol(race_col_list, withRespectToColToRefer):
    # Count rows per race whose second value matches withRespectToColToRefer.
    race_counts = {}
    for row in race_col_list:
        if row[1] == withRespectToColToRefer:
            if row[0] in race_counts:
                race_counts[row[0]] += 1
            else:
                race_counts[row[0]] = 1
    return race_counts

print(racesCountWithRespectToOtherCol(list_races_intent, "Homicide"))
print(racesCountWithRespectToOtherCol(list_races_gender, "F"))

# Similarly, the function can be used for male deaths too.
# The first param is a list of lists of [race, other column value] to count over.

{'White': 9147, 'Asian/Pacific Islander': 559, 'Black': 19510, 'Native American/Native Alaskan': 326, 'Hispanic': 5634}
{'White': 10687, 'Black': 2318, 'Native American/Native Alaskan': 127, 'Hispanic': 1073, 'Asian/Pacific Islander': 244}
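When several filter values are needed, every (race, value) combination can be counted in a single pass and looked up afterwards. A sketch using `collections.Counter` on invented pairs; `sample_pairs`, `pair_counts`, and `counts_for` are illustrative names, not part of the notebook:

```python
from collections import Counter

# Invented [race, intent] pairs in the same shape as list_races_intent.
sample_pairs = [
    ["White", "Suicide"],
    ["White", "Homicide"],
    ["Black", "Homicide"],
    ["Black", "Homicide"],
    ["Hispanic", "Suicide"],
]

# Count every (race, value) combination once.
pair_counts = Counter((race, value) for race, value in sample_pairs)

def counts_for(value):
    """Return {race: count} for one value of the second column."""
    return {race: n for (race, v), n in pair_counts.items() if v == value}

print(counts_for("Homicide"))
print(counts_for("Suicide"))
```

This trades a little memory for not rescanning the full list once per filter value.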


### Female gun deaths per 100k people, by race

In [29]:
per_hundredk_rates(racesCountWithRespectToOtherCol(list_races_gender, "F"), mapping)

{'Asian/Pacific Islander': 1.5409740256828584,
 'Black': 5.7589153562422055,
 'Hispanic': 2.404853366139149,
 'Native American/Native Alaskan': 3.3961705102224733,
 'White': 5.416104066555065}

### Findings:
__Black__ and __White__ females have the highest gun death rates (about 5.8 and 5.4 per 100k), well above the rates for the other races.

### Filtering location wise gun shot deaths across years

In [51]:
# Function to get location wise deaths across different years

def particularLocationDeathsInAYear(yearLocationList, location):
    yearLocationCounts = {}
    for row in yearLocationList:
        if row[1] == location:
            if row[0] in yearLocationCounts:
                yearLocationCounts[row[0]] += 1
            else:
                yearLocationCounts[row[0]] = 1
    return yearLocationCounts

particularLocationDeathsInAYear([[r[0], r[8]] for r in subdata], 'Home')

      
        

{'2012': 19733, '2013': 20092, '2014': 20661}

### Findings:
The data above shows gun deaths at home for each year. Home deaths increased every year, from 19,733 in 2012 to 20,661 in 2014.<br>
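The size of that increase can be put in context with a little arithmetic on the counts already printed in this notebook. A sketch with the numbers copied from the outputs above; `home_deaths`, `all_deaths`, `home_share`, and `yoy_change` are illustrative names:

```python
# Counts copied from the outputs above.
home_deaths = {"2012": 19733, "2013": 20092, "2014": 20661}
all_deaths = {"2012": 33563, "2013": 33636, "2014": 33599}

# Share of each year's gun deaths that occurred at home, in percent.
home_share = {y: 100 * home_deaths[y] / all_deaths[y] for y in home_deaths}

# Year-over-year change in home deaths, in percent.
years = sorted(home_deaths)
yoy_change = {
    cur: 100 * (home_deaths[cur] - home_deaths[prev]) / home_deaths[prev]
    for prev, cur in zip(years, years[1:])
}

for y in years:
    print(y, round(home_share[y], 1), round(yoy_change.get(y, 0.0), 1))
```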

Some areas to investigate further:<br>
- Location having highest deaths<br>
- Gun deaths by education and race<br>
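Both follow-up questions could start from a `collections.Counter`. A rough sketch on invented rows in the same column order as subdata (race at index 6, place at 8, education at 9); the sample rows and the constant names are assumptions, and no real results are implied:

```python
from collections import Counter

# Column positions in subdata, as listed in the field descriptions.
RACE, PLACE, EDUCATION = 6, 8, 9

# Invented rows (header excluded), only for demonstrating the approach.
sample_rows = [
    ["2012", "01", "Suicide", "0", "M", "34", "White", "100", "Home", "4"],
    ["2012", "02", "Homicide", "0", "M", "25", "Black", "100", "Street", "2"],
    ["2013", "05", "Suicide", "0", "F", "51", "White", "100", "Home", "4"],
    ["2014", "07", "Suicide", "0", "M", "47", "White", "100", "Home", "3"],
]

# Location with the highest number of deaths.
place_counts = Counter(row[PLACE] for row in sample_rows)
top_place, top_count = place_counts.most_common(1)[0]

# Deaths for every (race, education code) combination.
education_by_race = Counter((row[RACE], row[EDUCATION]) for row in sample_rows)

print(top_place, top_count)
print(education_by_race)
```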