# 00 - Optional homework

For this initial homework we will work with a [dataset](https://github.com/fivethirtyeight/guns-data/blob/master/interactive_data.csv) published as part of an analysis of [gunshot deaths in the US](http://fivethirtyeight.com/features/gun-deaths/). The goal of this optional homework is to go carefully through the interactive visualization at the top of that article, and to use an IPython Notebook to reproduce the following claims made in the visualization:
- Nearly *two-thirds* of gun deaths are *suicides*.
- More than *85 percent* of suicide victims are *male*.
- Around *a third* of all gun deaths are *homicides*.
- Around *two-thirds* of homicide victims who are *males* in the *age-group of 15--34* are *black*.
- *Women* constitute only *15 percent* of the total *homicide* victims.

It's not necessary to generate visualizations for the results -- numbers should be more than enough to convince yourself that you 
were able to reproduce the results of that article.

You can use this opportunity first of all to refresh your Python skills. If you are coming from another programming language
(especially a statically typed one like Java or C++), we recommend taking a look at this presentation:
[Code Like a Pythonista: Idiomatic Python](http://www.omahapython.org/IdiomaticPython.html) -- it will teach
you how to write nice Python code, while at the same time getting you up to speed with the syntax.
Feel free to explore more advanced libraries (like [Pandas](http://pandas.pydata.org/)) if you really want, but keep in mind that you
should be able to reproduce the results with the Python Standard Library alone.
One advantage of starting with only the standard library is that once you get to know Pandas, you will appreciate how much more concise
and readable your code becomes :)
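
As a rough illustration of the standard-library route, here is a minimal sketch that uses only `csv` to look up one row of the dataset and compute a share. The helper names (`load_rows`, `deaths`) and the sample rows are hypothetical: the numbers are made up so the example runs without the real file, and the column layout is assumed to match `interactive_data.csv`.

```python
import csv
import io

# Synthetic rows in the same column layout as interactive_data.csv.
# The numbers are made up for illustration; they are NOT the real data.
SAMPLE_CSV = """\
,Intent,Gender,Age,Race,Deaths,Population,Rate
0,None selected,None selected,None selected,None selected,100,1000000,10.0
1,Suicide,None selected,None selected,None selected,60,1000000,6.0
2,Suicide,Male,None selected,None selected,51,500000,10.2
"""

ALL = 'None selected'  # label the dataset uses for "no filter on this column"


def load_rows(file_obj):
    """Read the CSV into a list of dicts, one per row."""
    return list(csv.DictReader(file_obj))


def deaths(rows, intent=ALL, gender=ALL, age=ALL, race=ALL):
    """Sum 'Deaths' over the rows matching the four category labels."""
    return sum(
        int(row['Deaths'])
        for row in rows
        if (row['Intent'], row['Gender'], row['Age'], row['Race'])
        == (intent, gender, age, race)
    )


rows = load_rows(io.StringIO(SAMPLE_CSV))
total = deaths(rows)
suicides = deaths(rows, intent='Suicide')
print('%.1f percent of gun deaths are suicides.' % (100 * suicides / total))
```

To run this against the real data, replace `io.StringIO(SAMPLE_CSV)` with an open file handle, for example `open('./data/interactive_data.csv', newline='')`.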

Credits to [Michele Catasta](https://github.com/pirroh), on whose material this version is based.


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
import json

In [12]:
df = pd.read_csv(r'./data/interactive_data.csv')
df.columns.values.tolist()

['Unnamed: 0',
 'Intent',
 'Gender',
 'Age',
 'Race',
 'Deaths',
 'Population',
 'Rate']

In [27]:
total_death_number = df[(df['Intent'] == 'None selected') & (df['Gender'] == 'None selected')&
                        (df['Age'] == 'None selected') & (df['Race'] == 'None selected')]['Deaths'].sum()
print('More than %s people are fatally shot every year.'%total_death_number)

More than 33599 people are fatally shot every year.


In [17]:
print('......')  #don't see the special reason of death......

......


In [29]:
suicide_number = df[(df['Intent'] == 'Suicide') & (df['Gender'] == 'None selected')&
                        (df['Age'] == 'None selected') & (df['Race'] == 'None selected')]['Deaths'].sum()
ratio = 100*suicide_number/total_death_number
print('%s percent of gun deaths are suicides.'%ratio)

62.67448435965356 percent of gun deaths are suicides.


In [30]:
male_suicide_number = df[(df['Intent'] == 'Suicide') & (df['Gender'] == 'Male')&
                        (df['Age'] == 'None selected') & (df['Race'] == 'None selected')]['Deaths'].sum()
ratio = 100*male_suicide_number/suicide_number
print('%s percent of the suicide victims are male'%ratio)

86.24750688574413 percent of the suicide victims are male


In [33]:
# Gender is left at 'None selected', so this counts all suicide victims aged 35 or older, not only men.
older_suicide_number = df[(df['Intent'] == 'Suicide') & (df['Gender'] == 'None selected') &
                        df['Age'].isin(['35 - 64', '65+']) & (df['Race'] == 'None selected')]['Deaths'].sum()
ratio = 100*older_suicide_number/suicide_number
print('...and %s percent of all suicide victims are age 35 or older'%ratio)

...and 75.29679931617437 percent of all suicide victims are age 35 or older


In [41]:
def counting_sheep(intent = 'None selected', gender = 'None selected',
                   age = 'None selected', race = 'None selected'):
    # Build the row filter once and reuse it for both sums.
    mask = ((df['Intent'] == intent) & (df['Gender'] == gender) &
            (df['Age'] == age) & (df['Race'] == race))
    group_number = df[mask]['Population'].sum()
    group_death_number = df[mask]['Deaths'].sum()
    # Death rate per 100,000 people in the selected group.
    rate = 100000 * group_death_number/group_number
    print('%s gun deaths, %.1f deaths per 100000 people.'%(group_death_number, rate))
    
counting_sheep(gender = 'Male', race = 'Black')

6993 gun deaths, 37.7 deaths per 100000 people.
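
The remaining claims from the list at the top (the homicide share, the share of women among homicide victims, and the share of black victims among male homicide victims aged 15 to 34) follow the same filter-and-divide pattern as the cells above. Below is a minimal sketch of how that could look. It assumes the labels `'Homicide'`, `'Female'` and `'15 - 34'` appear in the data spelled exactly like that, which the cells above do not confirm. The names `deaths`, `share` and `toy` are hypothetical, and the `toy` numbers are invented so the example runs on its own; they are not results from the real dataset.

```python
import pandas as pd

ALL = 'None selected'  # label the dataset uses for "no filter on this column"


def deaths(frame, intent=ALL, gender=ALL, age=ALL, race=ALL):
    """Sum 'Deaths' over rows matching the four category labels."""
    mask = (
        (frame['Intent'] == intent) & (frame['Gender'] == gender)
        & (frame['Age'] == age) & (frame['Race'] == race)
    )
    return frame.loc[mask, 'Deaths'].sum()


def share(frame, part, whole):
    """Percentage of the deaths selected by `whole` that also match `part`."""
    return 100 * deaths(frame, **part) / deaths(frame, **whole)


# Synthetic stand-in for df; the numbers are made up, NOT the real data.
toy = pd.DataFrame(
    [
        (ALL, ALL, ALL, ALL, 300),
        ('Homicide', ALL, ALL, ALL, 100),
        ('Homicide', 'Female', ALL, ALL, 15),
        ('Homicide', 'Male', '15 - 34', ALL, 50),
        ('Homicide', 'Male', '15 - 34', 'Black', 33),
    ],
    columns=['Intent', 'Gender', 'Age', 'Race', 'Deaths'],
)

homicide = {'intent': 'Homicide'}
young_men = {'intent': 'Homicide', 'gender': 'Male', 'age': '15 - 34'}

print('%.1f percent of gun deaths are homicides.' % share(toy, homicide, {}))
print('%.1f percent of homicide victims are women.'
      % share(toy, dict(homicide, gender='Female'), homicide))
print('%.1f percent of male homicide victims aged 15-34 are black.'
      % share(toy, dict(young_men, race='Black'), young_men))
```

Passing `df` instead of `toy` would run the same checks on the real data, provided the assumed labels match the values in the `Intent`, `Gender` and `Age` columns.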
