In [36]:
import pandas as pd

df = pd.read_csv('../data/fatalities.csv')
df.head()

Unnamed: 0,Type,Deaths,Population,PerMillion
0,Assault,86398,1316784825,65.612846
1,Forces of Nature,7211,1316784825,5.476217
2,Animals,744,1316784825,0.565013
3,Traffic and Transportation,209146,1316784825,158.83081
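
The `PerMillion` column appears to be `Deaths` divided by `Population`, scaled to one million people; the rows shown above are consistent with that reading. A minimal sketch that recomputes two of the rates from values copied out of the table (the helper name `per_million` is hypothetical, not part of the notebook):

```python
def per_million(deaths, population):
    """Return the number of deaths per one million people."""
    return deaths / population * 1_000_000

population = 1_316_784_825  # Population value shown in every row above

assault_rate = per_million(86_398, population)  # 'Assault' row
animal_rate = per_million(744, population)      # 'Animals' row

print(round(assault_rate, 4))
print(round(animal_rate, 4))
```

Both results agree with the `PerMillion` values in the table to the precision shown there.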


The next cell uses Plotly Express to draw a bar chart of the death rate per million people for each type of death. The data are sorted by death rate in descending order; the x-axis shows the type of death and the y-axis shows deaths per million people.

The code starts by importing the Plotly Express library, which allows us to easily create various types of plots.

Next, we sort the data by the death rate per million people in descending order using the `sort_values()` method.

Then we call `px.bar()` to create the bar chart. We pass the DataFrame, `'Type'` for the x-axis, `'PerMillion'` for both the y-axis and the bar color, `color_continuous_scale='earth'`, a `labels` mapping that displays `PerMillion` as 'Deaths Per Million', and a fixed width and height.

After that, we use the `update_layout()` method to customize the appearance of the chart. We set the x-axis title to 'Death Type', the y-axis title to 'Deaths Per Million', and the y-axis type to log. We also pass `showlegend=False`, set the font family for the entire plot to 'Futura', and add a title with the `title` parameter.

Finally, the `fig.show()` method displays the chart in the notebook.

In [37]:
import plotly.express as px # Importing the Plotly Express library

df = df.sort_values(by='PerMillion', ascending=False) # Sorting the data by death rate per million people in descending order

fig = px.bar(df, x='Type', y='PerMillion', color='PerMillion', # Creating a bar chart
             color_continuous_scale='earth', labels={'PerMillion':'Deaths Per Million'},
             width=740, height=450)

fig.update_layout(
    xaxis_title='Death Type', # Setting the x-axis title
    yaxis_title='Deaths Per Million', # Setting the y-axis title
    yaxis=dict(type='log'), # Setting the y-axis type to log
    showlegend=False, # Hiding the legend
    font=dict(family='Futura'), # Setting the font family for the entire plot
    title="Animals are one of the last things you should be afraid of." # Adding a title to the plot
)
fig.show() # Displaying the chart


In [38]:
import chart_studio
import chart_studio.plotly as py

chart_studio.tools.set_credentials_file(username='jpzamanillo129', api_key='YOUR_API_KEY')  # placeholder: keep the real key out of the notebook
chart_studio.tools.set_config_file(sharing='public')

py.plot(fig, filename='full-type-pm', auto_open=True)

'https://plotly.com/~jpzamanillo129/1/'

The next cell uses pandas to group and summarize the data, then Plotly Express to draw a bar chart of the death rate per million people for the two combined categories.

The first statement uses the `replace()` method to relabel 'Assault' and 'Traffic and Transportation' as 'Assault/Auto Accidents', and 'Animals' and 'Forces of Nature' as 'Outdoor Accidents'.

The second statement uses the `groupby()` method to group the data by 'Type' and the `sum()` method to sum the values in each group. The `reset_index()` method then turns 'Type' back into a regular column so that it can be used as a variable in the plot.

Combining the four types into two broader categories makes the difference between them easier to see. Summed within each category, the death rate per million people for Assault/Auto Accidents is about 37 times higher than the rate for Outdoor Accidents.
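
The factor of 37 can be checked directly from the `PerMillion` values in the first table. A short sketch, with the four rates copied by hand from that output; adding rates is only valid here because every row shares the same population:

```python
# PerMillion values copied from the df.head() output earlier in the notebook
rates = {
    'Assault': 65.612846,
    'Traffic and Transportation': 158.83081,
    'Forces of Nature': 5.476217,
    'Animals': 0.565013,
}

# Combined rate for each of the two broader categories
assault_auto = rates['Assault'] + rates['Traffic and Transportation']
outdoor = rates['Forces of Nature'] + rates['Animals']

ratio = assault_auto / outdoor
print(f"{assault_auto:.2f} / {outdoor:.2f} = {ratio:.1f}")
```

This works out to roughly 224.44 against 6.04 deaths per million, a ratio of about 37.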

In [47]:
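# Collapse the four original types into two broader categories
df['Type'] = df['Type'].replace({
    'Assault': 'Assault/Auto Accidents',
    'Traffic and Transportation': 'Assault/Auto Accidents',
    'Animals': 'Outdoor Accidents',
    'Forces of Nature': 'Outdoor Accidents',
})

# Sum the numeric columns within each category. Population is summed too,
# so only Deaths and PerMillion remain meaningful after this step.
df = df.groupby(['Type']).sum()
df.reset_index(inplace=True)  # Make 'Type' a regular column again for plotting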

fig = px.bar(df, x='Type', y='PerMillion', color='PerMillion',
             color_continuous_scale='earth', labels={'PerMillion':'Deaths Per Million'},
             width=740, height=450)

fig.update_layout(
    xaxis_title='Death Category',
    yaxis_title='Deaths Per Million',
    yaxis=dict(type='log'),
    showlegend=False,
    font=dict(family='Futura'),
    title="The death rate from auto accidents and assaults is 37 times higher than outdoor accidents."
)

fig.update(layout_coloraxis_showscale=False)

fig.show()

In [48]:
py.plot(fig, filename='2-type-pm', auto_open=True)

'https://plotly.com/~jpzamanillo129/4/'

----
## Part 2

In [5]:
import pandas as pd

cdc = pd.read_csv('../data/cdc_wonder.csv')
cdc.head()

Unnamed: 0,State,State Code,Month,Month Code,ICD Sub-Chapter,ICD Sub-Chapter Code,Cause of death,Cause of death Code,Ten-Year Age Groups,Ten-Year Age Groups Code,Deaths,Population,Crude Rate
0,Alabama,1,"Jan., 2018",2018/01,Assault,X85-Y09,Assault by other and unspecified firearm disch...,X95,15-24 years,15-24,14,Not Applicable,Not Applicable
1,Alabama,1,"Jan., 2018",2018/01,Assault,X85-Y09,Assault by other and unspecified firearm disch...,X95,25-34 years,25-34,14,Not Applicable,Not Applicable
2,Alabama,1,"Feb., 2018",2018/02,Assault,X85-Y09,Assault by other and unspecified firearm disch...,X95,15-24 years,15-24,10,Not Applicable,Not Applicable
3,Alabama,1,"Mar., 2018",2018/03,Transport accidents,V01-V99,Person injured in unspecified motor-vehicle ac...,V89.2,25-34 years,25-34,11,Not Applicable,Not Applicable
4,Alabama,1,"Apr., 2018",2018/04,Transport accidents,V01-V99,Person injured in unspecified motor-vehicle ac...,V89.2,15-24 years,15-24,12,Not Applicable,Not Applicable


In [6]:
cdc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3367 entries, 0 to 3366
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   State                     3367 non-null   object
 1   State Code                3367 non-null   int64 
 2   Month                     3367 non-null   object
 3   Month Code                3367 non-null   object
 4   ICD Sub-Chapter           3367 non-null   object
 5   ICD Sub-Chapter Code      3367 non-null   object
 6   Cause of death            3367 non-null   object
 7   Cause of death Code       3367 non-null   object
 8   Ten-Year Age Groups       3367 non-null   object
 9   Ten-Year Age Groups Code  3367 non-null   object
 10  Deaths                    3367 non-null   int64 
 11  Population                3367 non-null   object
 12  Crude Rate                3367 non-null   object
dtypes: int64(2), object(11)
memory usage: 342.1+ KB
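
`Population` and `Crude Rate` are stored as `object` because the rows shown contain the string 'Not Applicable'. If numeric values are needed later, one option is `pd.to_numeric` with `errors='coerce'`, which turns unparseable strings into `NaN`. A sketch on a small stand-in frame (the numeric strings in `sample` are made up for illustration and are not taken from the dataset):

```python
import pandas as pd

# Stand-in for two of the cdc columns; the numbers are illustrative only
sample = pd.DataFrame({
    'Population': ['Not Applicable', '4903185', 'Not Applicable'],
    'Crude Rate': ['Not Applicable', '0.3', 'Not Applicable'],
})

for col in ['Population', 'Crude Rate']:
    # Strings that cannot be parsed as numbers become NaN
    sample[col] = pd.to_numeric(sample[col], errors='coerce')

print(sample.dtypes)
print(sample['Population'].isna().sum())  # count of values that were not numeric
```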


In [7]:
cdc['State Code'].unique()

array([ 1,  4,  5,  6,  8, 12, 13, 17, 18, 20, 21, 22, 24, 25, 26, 28, 29,
       32, 34, 35, 36, 37, 39, 40, 41, 42, 45, 47, 48, 51, 53, 55])

In [8]:
hiking = pd.read_csv('../data/hiking_states.csv')
hiking

Unnamed: 0,State,Hiking
0,Hawaii,100
1,Utah,62
2,Colorado,57
3,Montana,54
4,Oregon,51
5,Vermont,50
6,Maine,48
7,New Hampshire,45
8,Washington,44
9,Alaska,44


In [10]:
hiking_cdc = hiking.merge(cdc, on='State')  # Inner join on the shared 'State' column
hiking_cdc.head()

Unnamed: 0,State,Hiking,State Code,Month,Month Code,ICD Sub-Chapter,ICD Sub-Chapter Code,Cause of death,Cause of death Code,Ten-Year Age Groups,Ten-Year Age Groups Code,Deaths,Population,Crude Rate
0,Colorado,57,8,"Jul., 2018",2018/07,Transport accidents,V01-V99,Person injured in unspecified motor-vehicle ac...,V89.2,15-24 years,15-24,12,Not Applicable,Not Applicable
1,Colorado,57,8,"Sep., 2018",2018/09,Assault,X85-Y09,Assault by other and unspecified firearm disch...,X95,25-34 years,25-34,10,Not Applicable,Not Applicable
2,Colorado,57,8,"May, 2020",2020/05,Transport accidents,V01-V99,Person injured in unspecified motor-vehicle ac...,V89.2,15-24 years,15-24,10,Not Applicable,Not Applicable
3,Colorado,57,8,"Aug., 2020",2020/08,Assault,X85-Y09,Assault by other and unspecified firearm disch...,X95,15-24 years,15-24,11,Not Applicable,Not Applicable
4,Colorado,57,8,"May, 2021",2021/05,Transport accidents,V01-V99,Person injured in unspecified motor-vehicle ac...,V89.2,15-24 years,15-24,10,Not Applicable,Not Applicable
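
`merge` performs an inner join by default, so any state in `hiking` without matching rows in `cdc` is dropped silently. The output above starts at Colorado even though `hiking` starts with Hawaii and Utah, which suggests those two states have no rows in `cdc`. One way to surface such cases is a left join with `indicator=True`; the sketch below uses toy frames (`hiking_toy` and `cdc_toy` are hypothetical stand-ins, not the real data):

```python
import pandas as pd

# Toy stand-ins for the two frames; only the 'State' key matters here
hiking_toy = pd.DataFrame({'State': ['Hawaii', 'Utah', 'Colorado'],
                           'Hiking': [100, 62, 57]})
cdc_toy = pd.DataFrame({'State': ['Colorado', 'Colorado', 'Alabama'],
                        'Deaths': [12, 10, 14]})

# A left join keeps every hiking row; indicator=True adds a '_merge' column
checked = hiking_toy.merge(cdc_toy, on='State', how='left', indicator=True)

# States present in hiking_toy but absent from cdc_toy
unmatched = checked.loc[checked['_merge'] == 'left_only', 'State'].tolist()
print(unmatched)
```

On the toy data this lists Hawaii and Utah, the two states with no match on the right-hand side.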


In [11]:
pops = pd.read_csv('../data/state_pops21.csv')
pops.head()

Unnamed: 0,state,pop
0,.Alabama,5049846.0
1,.Alaska,734182.0
2,.Arizona,7264877.0
3,.Arkansas,3028122.0
4,.California,39142991.0


In [12]:
pops['state'] = pops['state'].str.replace('.', '', regex=False)  # Remove the literal leading dot
pops.head()

Unnamed: 0,state,pop
0,Alabama,5049846.0
1,Alaska,734182.0
2,Arizona,7264877.0
3,Arkansas,3028122.0
4,California,39142991.0
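
The leading dot can be removed in more than one way. Because `.` is a regular-expression metacharacter, passing `regex=False` makes the literal intent explicit; `str.lstrip('.')` is an alternative that only touches leading dots. A sketch on two of the values shown above:

```python
import pandas as pd

states = pd.Series(['.Alabama', '.Alaska'])

# Treat '.' as a literal character, not a regex wildcard
literal = states.str.replace('.', '', regex=False)

# Alternative: strip only leading dots, leaving any interior dots intact
stripped = states.str.lstrip('.')

print(literal.tolist())
print(stripped.tolist())
```

The two approaches give the same result here; they would differ only for a name that contains a dot elsewhere in the string.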


In [15]:
merged_data = pops.merge(hiking_cdc, left_on='state', right_on='State')
merged_data.info()
merged_data.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3367 entries, 0 to 3366
Data columns (total 16 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   state                     3367 non-null   object 
 1   pop                       3367 non-null   float64
 2   State                     3367 non-null   object 
 3   Hiking                    3367 non-null   int64  
 4   State Code                3367 non-null   int64  
 5   Month                     3367 non-null   object 
 6   Month Code                3367 non-null   object 
 7   ICD Sub-Chapter           3367 non-null   object 
 8   ICD Sub-Chapter Code      3367 non-null   object 
 9   Cause of death            3367 non-null   object 
 10  Cause of death Code       3367 non-null   object 
 11  Ten-Year Age Groups       3367 non-null   object 
 12  Ten-Year Age Groups Code  3367 non-null   object 
 13  Deaths                    3367 non-null   int64  
 14  Populati

Unnamed: 0,state,pop,State,Hiking,State Code,Month,Month Code,ICD Sub-Chapter,ICD Sub-Chapter Code,Cause of death,Cause of death Code,Ten-Year Age Groups,Ten-Year Age Groups Code,Deaths,Population,Crude Rate
0,Alabama,5049846.0,Alabama,13,1,"Jan., 2018",2018/01,Assault,X85-Y09,Assault by other and unspecified firearm disch...,X95,15-24 years,15-24,14,Not Applicable,Not Applicable
1,Alabama,5049846.0,Alabama,13,1,"Jan., 2018",2018/01,Assault,X85-Y09,Assault by other and unspecified firearm disch...,X95,25-34 years,25-34,14,Not Applicable,Not Applicable
2,Alabama,5049846.0,Alabama,13,1,"Feb., 2018",2018/02,Assault,X85-Y09,Assault by other and unspecified firearm disch...,X95,15-24 years,15-24,10,Not Applicable,Not Applicable
3,Alabama,5049846.0,Alabama,13,1,"Mar., 2018",2018/03,Transport accidents,V01-V99,Person injured in unspecified motor-vehicle ac...,V89.2,25-34 years,25-34,11,Not Applicable,Not Applicable
4,Alabama,5049846.0,Alabama,13,1,"Apr., 2018",2018/04,Transport accidents,V01-V99,Person injured in unspecified motor-vehicle ac...,V89.2,15-24 years,15-24,12,Not Applicable,Not Applicable
