# Probability Converter
This script converts the `occupational_fatalities.csv` dataset into an index of the likelihood of dying in each profession, then joins it to the `deaths_age_gender_race_mechanism_cause.csv` dataset by mechanism of death. Next, the full dataset is converted into daily probabilities of dying by age, gender, race, and occupation, using linear models that connect the values at each age in years.
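
As a rough illustration of that last step, linear interpolation between per-age values onto a daily grid might look like the sketch below. The ages and probabilities are made-up placeholders rather than values from either dataset, `daily_probs` is a hypothetical helper name, and a 365-day year is assumed; the actual modeling may differ.

```python
import numpy as np

def daily_probs(ages, probs, days_per_year=365):
    """Linearly interpolate per-age values onto a daily grid (hypothetical helper)."""
    ages = np.asarray(ages, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # One point per day from the first age to the last, inclusive.
    n_days = int(round((ages[-1] - ages[0]) * days_per_year)) + 1
    day_ages = ages[0] + np.arange(n_days) / days_per_year
    # np.interp draws a straight line between each pair of neighboring ages.
    return day_ages, np.interp(day_ages, ages, probs)

# Placeholder values at ages 30, 31, and 32 (illustrative only).
day_ages, p = daily_probs([30, 31, 32], [1.0e-6, 2.0e-6, 4.0e-6])
```

Each whole-year age keeps its original value, and the days in between fall on the straight line joining the two neighboring ages.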

In [95]:
import pandas as pd
import numpy as np

In [96]:
job_deaths = pd.read_csv("../data/occupational_fatalities.csv")

In [97]:
job_deaths.head()

Unnamed: 0,Occupation,Hierarchy Levels,variable,value
0,Total,0,Total fatal injuries (number),5250
1,Total,0,Transportation incidents,2080
2,Transportation and material moving occupations,0,Total fatal injuries (number),1443
3,Motor vehicle operators,1,Total fatal injuries (number),1044
4,Transportation and material moving occupations,0,Transportation incidents,1014


In [98]:
job_deaths.shape

(4361, 4)

The hierarchy has 4 levels, and the user experience will be poor if users have to search through too many job titles to find something close to theirs (a mentally taxing task), so selecting the right level is paramount. Comparing the number of occupations and the degree of detail at each level should help with this selection. Level 3 is the most detailed, so we'll start there and work our way up:

In [99]:
job_deaths.Occupation[job_deaths['Hierarchy Levels']==3].value_counts()

 Insurance sales agents                                                       7
 Rotary drill operators, oil and gas                                          7
 Education administrators, postsecondary                                      7
 Pump operators, except wellhead pumpers                                      7
 Pipelayers                                                                   7
 Cashiers                                                                     7
 Wellhead pumpers                                                             7
 Floor sanders and finishers                                                  7
 Reinforcing iron and rebar workers                                           7
 Gaming managers                                                              7
 Cabinetmakers and bench carpenters                                           7
 Financial analysts                                                           7
 Animal trainers                        

275 options is clearly too many. 

In [100]:
job_deaths.Occupation[job_deaths['Hierarchy Levels']==2].value_counts()

 Heavy vehicle and mobile equipment service technicians and mechanics     7
 Miscellaneous motor vehicle operators                                    7
 Secondary school teachers                                                7
 Lodging managers                                                         7
 Reinforcing iron and rebar workers                                       7
 Real estate brokers and sales agents                                     7
 Helpers, construction trades                                             7
 Property, real estate, and community association managers                7
 Writers and editors                                                      7
 Biological scientists                                                    7
 Social and community service managers                                    7
 Food service managers                                                    7
 Miscellaneous computer occupations                                       7
 Power plant

237 is still way too many.

In [101]:
job_deaths.Occupation[job_deaths['Hierarchy Levels']==1].value_counts()

 Supervisors of sales workers                                                       7
 Fire fighting and prevention workers                                               7
 Entertainers and performers, sports and related workers                            7
 Air transportation workers                                                         7
 Other transportation workers                                                       7
 Helpers, construction trades                                                       7
 Life, physical, and social science technicians                                     7
 Supervisors of construction and extraction workers                                 7
 Media and communication workers                                                    7
 Assemblers and fabricators                                                         7
 Material moving workers                                                            7
 Computer occupations                                 

87 is getting more reasonable, but still annoying.

In [102]:
job_deaths.Occupation[job_deaths['Hierarchy Levels']==0].value_counts()

 Construction and extraction occupations                        7
 Management occupations                                         7
 Transportation and material moving occupations                 7
 Office and administrative support occupations                  7
 Military specific occupations(5)                               7
 Building and grounds cleaning and maintenance occupations      7
 Computer and mathematical occupations                          7
 Protective service occupations                                 7
 Education, training, and library occupations                   7
 Production occupations                                         7
 Food preparation and serving related occupations               7
 Business and financial operations occupations                  7
 Community and social services occupations                      7
 Personal care and service occupations                          7
 Healthcare support occupations                                 7
 Healthcar

In [103]:
len(job_deaths.Occupation[job_deaths['Hierarchy Levels']==0].value_counts())

24

24 is totally reasonable. Occupation will have an extraordinarily small impact on the overall likelihood of dying; it's mainly included to make the result feel more personalized to the user.
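
The level-by-level counts above can also be compared in one step. A minimal sketch, using a toy frame with the same column names as `job_deaths` (the rows are illustrative, and `toy` and `options_per_level` are hypothetical names):

```python
import pandas as pd

# Toy stand-in for job_deaths; a handful of rows, not the real data.
toy = pd.DataFrame({
    "Occupation": ["Total", "Total", "Production occupations",
                   "Motor vehicle operators", "Computer occupations",
                   "Material moving workers", "Cashiers"],
    "Hierarchy Levels": [0, 0, 0, 1, 1, 1, 3],
    "variable": ["Total fatal injuries (number)", "Transportation incidents",
                 "Total fatal injuries (number)", "Total fatal injuries (number)",
                 "Total fatal injuries (number)", "Total fatal injuries (number)",
                 "Total fatal injuries (number)"],
})

# Number of distinct occupations a user would have to choose from at each level.
options_per_level = toy.groupby("Hierarchy Levels")["Occupation"].nunique()
print(options_per_level)
```

Run against the real `job_deaths`, the same `groupby` would give the option count for every level in a single table.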

To prep the data for joining, we'll need to remove the extraneous data and compute a probability for each occupation and mechanism: the number of deaths for that occupation and mechanism divided by the total deaths for that mechanism.

In [104]:
job_deaths = job_deaths[job_deaths['Hierarchy Levels']==0]

Next, we'll remove the hierarchy level column and clean up the column names:

In [105]:
del job_deaths['Hierarchy Levels']

In [106]:
job_deaths.columns = ['job','mechanism','deaths']

In [107]:
job_deaths

Unnamed: 0,job,mechanism,deaths
0,Total,Total fatal injuries (number),5250
1,Total,Transportation incidents,2080
2,Transportation and material moving occupations,Total fatal injuries (number),1443
4,Transportation and material moving occupations,Transportation incidents,1014
5,Construction and extraction occupations,Total fatal injuries (number),1003
8,Total,Violence and other injuries by persons or animals,828
10,Total,"Falls, slips, trips",791
11,Total,Contact with objects and equipment,786
15,Total,Exposure to harmful substances or environments,621
16,"Installation, maintenance, and repair occupat...",Total fatal injuries (number),420
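
The three cleanup cells above can also be written as one chain. A minimal sketch on a toy frame (rows copied from the `head()` output earlier; `raw` and `level0` are illustrative names). Renaming by mapping rather than by position guards against mislabeling if the column order ever changes, and the explicit `.copy()` makes the result independent of the original frame, which can avoid pandas' chained-assignment warning when a column is added later.

```python
import pandas as pd

# Toy stand-in for the raw table, using rows shown in the head() output.
raw = pd.DataFrame({
    "Occupation": ["Total", "Motor vehicle operators",
                   "Transportation and material moving occupations"],
    "Hierarchy Levels": [0, 1, 0],
    "variable": ["Transportation incidents", "Total fatal injuries (number)",
                 "Transportation incidents"],
    "value": [2080, 1044, 1014],
})

level0 = (
    raw.loc[raw["Hierarchy Levels"] == 0]   # keep only the top level
       .drop(columns="Hierarchy Levels")    # the level is constant after filtering
       .rename(columns={"Occupation": "job",
                        "variable": "mechanism",
                        "value": "deaths"})
       .copy()                              # independent frame, safe to add columns to
)
```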


In [113]:
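probs = []
for i in range(job_deaths.shape[0]):
    # Note: the filter below matches row i's own job and mechanism, so the
    # denominator is that same row and the result is a one-element Series.
    prob = job_deaths.deaths.iloc[i]/\
    job_deaths.deaths[(job_deaths.job == job_deaths.job.iloc[i]) & 
                      (job_deaths.mechanism == job_deaths.mechanism.iloc[i])]
    probs.append(prob)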

In [124]:
job_deaths.deaths.iloc[i]/\
    job_deaths.deaths[(job_deaths.job == job_deaths.job.iloc[i]) & 
                      (job_deaths.mechanism == job_deaths.mechanism.iloc[i])]

4360   NaN
Name: deaths, dtype: float64

In [109]:
job_deaths['prob'] = probs

In [110]:
job_deaths

Unnamed: 0,job,mechanism,deaths,prob
0,Total,Total fatal injuries (number),5250,"0 1.0 Name: deaths, dtype: float64"
1,Total,Transportation incidents,2080,"1 1.0 Name: deaths, dtype: float64"
2,Transportation and material moving occupations,Total fatal injuries (number),1443,"2 1.0 Name: deaths, dtype: float64"
4,Transportation and material moving occupations,Transportation incidents,1014,"4 1.0 Name: deaths, dtype: float64"
5,Construction and extraction occupations,Total fatal injuries (number),1003,"5 1.0 Name: deaths, dtype: float64"
8,Total,Violence and other injuries by persons or animals,828,"8 1.0 Name: deaths, dtype: float64"
10,Total,"Falls, slips, trips",791,"10 1.0 Name: deaths, dtype: float64"
11,Total,Contact with objects and equipment,786,"11 1.0 Name: deaths, dtype: float64"
15,Total,Exposure to harmful substances or environments,621,"15 1.0 Name: deaths, dtype: float64"
16,"Installation, maintenance, and repair occupat...",Total fatal injuries (number),420,"16 1.0 Name: deaths, dtype: float64"
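
In the output above, every `prob` entry is a one-element Series equal to 1.0, because the denominator selects the same job and mechanism as the numerator. A minimal sketch of the ratio described earlier (deaths for an occupation and mechanism divided by total deaths for that mechanism) follows. It assumes the `Total` rows hold the per-mechanism totals and that the job name matches `"Total"` exactly; the toy rows are copied from the table above.

```python
import pandas as pd

toy = pd.DataFrame({
    "job": ["Total", "Total",
            "Transportation and material moving occupations",
            "Transportation and material moving occupations"],
    "mechanism": ["Total fatal injuries (number)", "Transportation incidents",
                  "Total fatal injuries (number)", "Transportation incidents"],
    "deaths": [5250, 2080, 1443, 1014],
})

# Per-mechanism totals, taken from the rows where job == "Total".
totals = toy.loc[toy["job"] == "Total"].set_index("mechanism")["deaths"]

# Look up each row's mechanism total and divide (vectorized, no loop).
toy["prob"] = toy["deaths"] / toy["mechanism"].map(totals)
```

On these rows, transportation and material moving occupations account for 1014 of the 2080 transportation-incident deaths, a share of about 0.49, and the `prob` column holds plain floats rather than Series objects.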
