We will use Python's csv module to read the contents of the CSV file into a nested list (a list of rows, where each row is a list of strings).
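To see what csv.reader returns without opening the real file, here is a minimal sketch that parses a small in-memory sample through io.StringIO. The sample text is made up for illustration and only mirrors the first few columns of guns.csv; the point is that every field, including numbers like 2012, comes back as a string.

```python
import csv
import io

# Hypothetical in-memory stand-in for guns.csv (not the real file)
sample = io.StringIO(",year,month,intent\n1,2012,01,Suicide\n2,2012,01,Suicide\n")

# csv.reader yields one list of strings per line, so list() produces a nested list
rows = list(csv.reader(sample))

print(rows[0])  # header row
print(rows[1])  # first data row; note that '2012' is a string, not an int
```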

In [19]:
import csv

In [20]:
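# Open the file in a with-block so it is closed automatically after reading
with open("guns.csv", "r", newline="") as file:
    csv_reader = csv.reader(file)
    gun_deaths = list(csv_reader)
gun_deaths[:5]  # Display the first 5 rows (the header plus 4 data rows)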

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]

In [21]:
# The first row holds the column names; store it separately as header
header = gun_deaths[0]

# Keep only the data rows
gun_deaths = gun_deaths[1:]

The year column records the year in which each gun death occurred, so we can use it to count how many gun deaths happened each year. To do this, we create a dictionary that keeps a count of how many times each value occurs in the year column.

In [22]:
# Extract the year value from every row using a list comprehension
years = [row[1] for row in gun_deaths]

# Creating an empty dictionary
year_counts = {}

for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
        
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
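Python's standard library also provides collections.Counter, which performs the same tally as the if/else loop above in a single call. A minimal sketch, using a few hypothetical rows shaped like gun_deaths (index 1 holds the year):

```python
from collections import Counter

# Hypothetical sample rows shaped like gun_deaths (index 1 is the year)
rows = [
    ["1", "2012", "01", "Suicide"],
    ["2", "2012", "01", "Suicide"],
    ["3", "2013", "02", "Homicide"],
]

# Counter tallies each distinct value, replacing the manual if/else loop
year_counts = Counter(row[1] for row in rows)
print(dict(year_counts))  # {'2012': 2, '2013': 1}
```

Counter is a dict subclass, so the result can be used anywhere the hand-built dictionary was used.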

Exploring Gun Deaths by Race, Sex, and Intent

In [23]:
# Create an empty dictionary to count gun deaths for each sex
gender_counts = {}

for row in gun_deaths:
    if row[5] in gender_counts:
        gender_counts[row[5]] += 1
    else:
        gender_counts[row[5]] = 1
        
gender_counts

{'F': 14449, 'M': 86349}

In [24]:
# Create an empty dictionary to count gun deaths for each race
race_counts = {}

for row in gun_deaths:
    if row[7] in race_counts:
        race_counts[row[7]] += 1
    else:
        race_counts[row[7]] = 1
        
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

We now write a more generic function that counts the occurrences of each unique value in any column.

In [25]:
def unique_counts(input_list, column_index):
    """Return a dictionary mapping each value in the given column to its count."""
    unique_value_counts = {}
    for row in input_list:
        value = row[column_index]
        if value in unique_value_counts:
            unique_value_counts[value] += 1
        else:
            unique_value_counts[value] = 1
    return unique_value_counts

intent_counts = unique_counts(gun_deaths, 3)
intent_counts

{'Accidental': 1639,
 'Homicide': 35176,
 'NA': 1,
 'Suicide': 63175,
 'Undetermined': 807}
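Passing positions such as 3 or 9 works, but it silently counts the wrong column if the CSV layout ever changes. A minimal sketch of a variant that looks the index up by column name in the header row; counts_by_column is a hypothetical helper, and the sample rows are made up to match the gun_deaths layout:

```python
from collections import Counter

def counts_by_column(rows, header, column_name):
    """Count occurrences of each value in the column called column_name.

    Hypothetical helper: the index comes from the header row instead of being
    hard-coded. list.index raises ValueError if the name is not present.
    """
    index = header.index(column_name)
    return dict(Counter(row[index] for row in rows))

# Hypothetical sample shaped like the gun_deaths data
header = ["", "year", "month", "intent", "police", "sex"]
rows = [
    ["1", "2012", "01", "Suicide", "0", "M"],
    ["2", "2012", "01", "Homicide", "0", "F"],
    ["3", "2012", "02", "Suicide", "0", "M"],
]

print(counts_by_column(rows, header, "intent"))  # {'Suicide': 2, 'Homicide': 1}
print(counts_by_column(rows, header, "sex"))     # {'M': 2, 'F': 1}
```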

In [27]:
# Similarly, count the occurrences of each value in the place column
place_counts = unique_counts(gun_deaths, 9)
place_counts

{'Farm': 470,
 'Home': 60486,
 'Industrial/construction': 248,
 'NA': 1384,
 'Other specified': 13751,
 'Other unspecified': 8867,
 'Residential institution': 203,
 'School/instiution': 671,
 'Sports': 128,
 'Street': 11151,
 'Trade/service area': 3439}

So far, our analysis only gives the raw number of gun deaths for each race in the US, i.e. each race's share of the total gun deaths across all races.

Unless we know how large each racial group is in the US population, we won't be able to compare these numbers meaningfully. What we really want is the rate of gun deaths per 100,000 people of each race. To compute it, we need data on how many people in the US fall into each racial category.
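Written out, the rate for a racial group r divides that group's gun-death count by its population and scales by 100,000. As a worked example, the second line plugs in the Black count from race_counts and the Black population count from the 2010 census data loaded in the next cells:

```latex
\text{rate}_{r} = \frac{\text{deaths}_{r}}{\text{population}_{r}} \times 100\,000
\qquad
\text{rate}_{\text{Black}} = \frac{23\,296}{40\,250\,635} \times 100\,000 \approx 57.88
```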

Let's read in the data from the census.csv file:

In [28]:
# Open the file in a with-block so it is closed automatically after reading
with open("census.csv", "r", newline="") as file:
    csv_reader = csv.reader(file)
    census = list(csv_reader)
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

As we can see, the racial categories in the census differ from the ones in our gun_deaths dataset. Therefore, we manually create a dictionary that maps each race value in gun_deaths to the corresponding population count from the census dataset, adding the Asian and Native Hawaiian and Other Pacific Islander columns together to get a single Asian/Pacific Islander count. The dictionary keys are identical to the racial categories in the gun_deaths dataset.

In [30]:
mapping = {
    "Asian/Pacific Islander": int(census[1][14]) + int(census[1][15]),
    "Black": int(census[1][12]),
    "Native American/Native Alaskan": int(census[1][13]),
    "Hispanic": int(census[1][11]),
    "White": int(census[1][10]),
}
mapping

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}
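The indices 10 to 15 above are tied to the current column order of census.csv. A minimal sketch that builds the same mapping by looking the race columns up by name in the census header; build_population_mapping is a hypothetical helper, and the census rows are copied from the output printed earlier. The header repeats "Id", but each race column name is unique, so list.index finds the right position:

```python
def build_population_mapping(census):
    """Map gun_deaths race labels to census population counts, by column name.

    Hypothetical helper: column names are the ones printed from census.csv.
    """
    header, totals = census[0], census[1]

    def count(column_name):
        return int(totals[header.index(column_name)])

    return {
        "Asian/Pacific Islander": count("Race Alone - Asian")
        + count("Race Alone - Native Hawaiian and Other Pacific Islander"),
        "Black": count("Race Alone - Black or African American"),
        "Native American/Native Alaskan": count("Race Alone - American Indian and Alaska Native"),
        "Hispanic": count("Race Alone - Hispanic"),
        "White": count("Race Alone - White"),
    }

# The two census rows, as printed above
census = [
    ["Id", "Year", "Id", "Sex", "Id", "Hispanic Origin", "Id", "Id2", "Geography", "Total",
     "Race Alone - White", "Race Alone - Hispanic", "Race Alone - Black or African American",
     "Race Alone - American Indian and Alaska Native", "Race Alone - Asian",
     "Race Alone - Native Hawaiian and Other Pacific Islander", "Two or More Races"],
    ["cen42010", "April 1, 2010 Census", "totsex", "Both Sexes", "tothisp", "Total",
     "0100000US", "", "United States", "308745538", "197318956", "44618105", "40250635",
     "3739506", "15159516", "674625", "6984195"],
]

mapping = build_population_mapping(census)
print(mapping)
```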

In [31]:
# To calculate the rate of gun deaths per 100,000 people of each race, based on
# each race's total population, start with an empty dictionary and loop through race_counts
race_per_hundredk = {}

for race, count in race_counts.items():
    race_per_hundredk[race] = (count * 100000) / mapping[race]

race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.877347773519595,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}
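The rates are easier to compare when ranked. A minimal sketch that sorts the groups from highest to lowest rate and prints them rounded to two decimal places; the values are copied from the race_per_hundredk output above:

```python
# Rates copied from the race_per_hundredk output above
race_per_hundredk = {
    "Asian/Pacific Islander": 8.374309664161762,
    "Black": 57.877347773519595,
    "Hispanic": 20.220491210910907,
    "Native American/Native Alaskan": 24.521955573811088,
    "White": 33.56849303419181,
}

# Sort (race, rate) pairs by rate, highest first
ranking = sorted(race_per_hundredk.items(), key=lambda item: item[1], reverse=True)

for race, rate in ranking:
    print(f"{race:<32}{rate:6.2f}")
```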

We can explore the dataset further by combining values from multiple columns as conditioning parameters, for example:
1. The number of gun deaths by race where the intent was homicide
2. The number of gun deaths by age
3. Whether there is any link between homicide rate and month
4. Homicide rate by gender
5. Rates of other intents, such as Accidental, by gender and race
6. Whether gun death rates correlate with education and location
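As a starting point for ideas 1 and 3, here is a minimal sketch that filters rows by intent before counting. rate_per_100k is a hypothetical helper that assumes the gun_deaths column layout shown earlier (month at index 2, intent at 3, race at 7) and a population dictionary keyed like mapping; the rows and population numbers below are made up for illustration only.

```python
from collections import Counter

def rate_per_100k(rows, population, intent="Homicide", intent_col=3, race_col=7):
    """Gun deaths with the given intent per 100,000 people of each race.

    Hypothetical helper: races missing from population are skipped
    rather than raising KeyError.
    """
    counts = Counter(row[race_col] for row in rows if row[intent_col] == intent)
    return {
        race: counts[race] * 100000 / population[race]
        for race in counts
        if race in population
    }

# Hypothetical rows shaped like gun_deaths
rows = [
    ["1", "2012", "01", "Homicide", "0", "M", "25", "Black", "100", "Street", "2"],
    ["2", "2012", "01", "Suicide", "0", "M", "60", "White", "100", "Home", "4"],
    ["3", "2012", "02", "Homicide", "0", "F", "31", "White", "100", "Home", "3"],
    ["4", "2012", "02", "Homicide", "0", "M", "19", "Black", "100", "Street", "2"],
]
# Made-up population counts, for illustration only
population = {"Black": 400000, "White": 100000}

homicide_rates = rate_per_100k(rows, population)
print(homicide_rates)  # {'Black': 0.5, 'White': 1.0}

# Idea 3: homicide counts by month
homicides_by_month = Counter(row[2] for row in rows if row[3] == "Homicide")
print(dict(homicides_by_month))  # {'01': 1, '02': 2}
```

With the real data, passing gun_deaths and mapping in place of the sample rows and population would give homicide rates per 100,000 for each race.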