## Fatality Rates
> The purpose of this notebook is to check the relationship between estimated deaths and estimated infections in the IHME projections (i.e. the implied fatality rate). It finds that this ratio is not constant but varies both by state and over time. In other words, the IHME model calculated a different fatality rate for each state at each point in time, based on factors such as testing capacity and hospital service. Therefore, although estimated infections are dependent on estimated deaths, the two differ by more than a constant scale factor.
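
Written out, the quantity checked below is, for each state $s$ and date $t$, the ratio $r_{s,t}$ of the model's `deaths_mean` ($D_{s,t}$) to its `est_infections_mean` ($I_{s,t}$). The symbols $r$, $c$, $D$ and $I$ are introduced here only for illustration. If estimated infections were simply deaths divided by one fixed fatality rate $c$, every ratio would equal that same $c$:

$$
r_{s,t} = \frac{D_{s,t}}{I_{s,t}}, \qquad
I_{s,t} = \frac{D_{s,t}}{c} \;\text{ for all } s,t
\;\Longrightarrow\; r_{s,t} = c \;\text{ for all } s,t .
$$

Conversely, ratios that differ across states or dates mean the two series are not related by a single scale factor.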

In [2]:
```python
import pandas as pd
```

In [3]:
```python
ihme = pd.read_csv("../Raw_Data/Projected_Data/IHME_projections_05_04.csv")
print(ihme.shape)
ihme.head()
```

(29841, 38)


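|  | V1 | location_name | date | allbed_mean | allbed_lower | allbed_upper | ICUbed_mean | ICUbed_lower | ICUbed_upper | InvVen_mean | ... | icuover_lower | icuover_upper | mobility_data_type | mobility_composite | total_tests_data_type | total_tests | confirmed_infections | est_infections_mean | est_infections_lower | est_infections_upper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19692 | Abruzzo | 2020-02-04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 19693 | Abruzzo | 2020-02-05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 19694 | Abruzzo | 2020-02-06 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 19695 | Abruzzo | 2020-02-07 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 19696 | Abruzzo | 2020-02-08 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |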
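
Only four of these 38 columns are used later. A minimal sketch, assuming the same CSV layout, of loading just those columns up front with `read_csv`'s `usecols` and parsing `date` on read via `parse_dates`. The inline CSV text is a hypothetical stand-in for the real file so the snippet runs on its own; its `deaths_mean` and `est_infections_mean` values are copied from the Alabama rows shown later, while the `V1` values are placeholders.

```python
import io

import pandas as pd

# Hypothetical stand-in for IHME_projections_05_04.csv (V1 values are placeholders).
csv_text = """V1,location_name,date,deaths_mean,est_infections_mean
1,Alabama,2020-05-04,14.446,1937.894863
2,Alabama,2020-06-04,22.324,4271.268001
"""

# Read only the columns this notebook analyses and convert 'date' to datetime64.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["location_name", "date", "deaths_mean", "est_infections_mean"],
    parse_dates=["date"],
)
print(df.dtypes)
```

With `date` parsed as datetime64, the later string-based date filter would be safer written against `pd.to_datetime(relevant_dates)`.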


In [4]:
```python
# Drop all data from outside the US

# Import the us library (install first with `pip install us`)
# See documentation here: https://pypi.org/project/us/
import us

# Make a list of all US state names as strings
US_states = [s.name for s in us.states.STATES]

# Drop rows for locations outside the 50 US states.
ihme = ihme[ihme['location_name'].isin(US_states)]

# Rename the location column to 'state'
ihme = ihme.rename(columns={'location_name': 'state'})

print(ihme.shape)
ihme.head()
```

(10150, 38)


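|  | V1 | state | date | allbed_mean | allbed_lower | allbed_upper | ICUbed_mean | ICUbed_lower | ICUbed_upper | InvVen_mean | ... | icuover_lower | icuover_upper | mobility_data_type | mobility_composite | total_tests_data_type | total_tests | confirmed_infections | est_infections_mean | est_infections_lower | est_infections_upper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 203 | 6700 | Alabama | 2020-02-04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 |
| 204 | 6701 | Alabama | 2020-02-05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 |
| 205 | 6702 | Alabama | 2020-02-06 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 |
| 206 | 6703 | Alabama | 2020-02-07 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 |
| 207 | 6704 | Alabama | 2020-02-08 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | observed | -0.898962 | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 |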
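
`isin` silently drops any location whose name does not match exactly, so it can be worth listing which expected names found no rows at all. A minimal sketch on a hypothetical three-row frame; in the notebook, `expected` would be `US_states` and `df` the unfiltered `ihme`.

```python
import pandas as pd

# Hypothetical toy input standing in for the raw IHME frame.
df = pd.DataFrame({"location_name": ["Abruzzo", "Alabama", "Alaska"]})
expected = ["Alabama", "Alaska", "Arizona"]  # hypothetical subset of state names

kept = df[df["location_name"].isin(expected)]

# Expected names that matched no row in the data.
missing = sorted(set(expected) - set(kept["location_name"]))
print(len(kept), missing)
```

An empty `missing` list on the real data would confirm that all 50 state names were found.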


In [5]:
```python
# Keep only the columns needed to compare deaths with estimated infections

relevant_cols = ['state',
                 'date',
                 'deaths_mean',
                 'est_infections_mean']

ihme = ihme[relevant_cols]

print(ihme.shape)
ihme.head()
```

(10150, 4)


|  | state | date | deaths_mean | est_infections_mean |
|---|---|---|---|---|
| 203 | Alabama | 2020-02-04 | 0.0 | 0.0 |
| 204 | Alabama | 2020-02-05 | 0.0 | 0.0 |
| 205 | Alabama | 2020-02-06 | 0.0 | 0.0 |
| 206 | Alabama | 2020-02-07 | 0.0 | 0.0 |
| 207 | Alabama | 2020-02-08 | 0.0 | 0.0 |


In [6]:
```python
# Keep only two snapshot dates: the 4th of May and the 4th of June 2020.

relevant_dates = ['2020-05-04',
                  '2020-06-04']

ihme = ihme.loc[ihme['date'].isin(relevant_dates)]

print(ihme.shape)
ihme.head()
```

(100, 4)


|  | state | date | deaths_mean | est_infections_mean |
|---|---|---|---|---|
| 293 | Alabama | 2020-05-04 | 14.446 | 1937.894863 |
| 324 | Alabama | 2020-06-04 | 22.324 | 4271.268001 |
| 496 | Alaska | 2020-05-04 | 0.0 | 0.125849 |
| 527 | Alaska | 2020-06-04 | 0.0 | 0.003485 |
| 1308 | Arizona | 2020-05-04 | 13.308 | 1405.037687 |
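
The two dates above are hard-coded. If more monthly snapshots were wanted, one alternative (a sketch, not the notebook's approach) is to parse the date strings and keep every row that falls on the 4th of a month. The toy dates below are hypothetical stand-ins for `ihme['date']`.

```python
import pandas as pd

# Hypothetical toy dates standing in for ihme['date'] (stored as strings in the CSV).
df = pd.DataFrame({"date": ["2020-05-03", "2020-05-04", "2020-05-05", "2020-06-04"]})

# Parse once, then keep rows whose day of the month is 4.
dates = pd.to_datetime(df["date"])
monthly = df[dates.dt.day == 4]
print(monthly["date"].tolist())
```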


In [7]:
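```python
# Ratio of estimated deaths to estimated infections for each state and date.
# assign() returns a new frame, which avoids pandas' SettingWithCopyWarning
# that adding a column to a filtered slice can raise.
ihme = ihme.assign(fatality_ratio=ihme['deaths_mean'] / ihme['est_infections_mean'])
ihme.head(20)
```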

|  | state | date | deaths_mean | est_infections_mean | fatality_ratio |
|---|---|---|---|---|---|
| 293 | Alabama | 2020-05-04 | 14.446 | 1937.894863 | 0.007454 |
| 324 | Alabama | 2020-06-04 | 22.324 | 4271.268001 | 0.005227 |
| 496 | Alaska | 2020-05-04 | 0.0 | 0.125849 | 0.0 |
| 527 | Alaska | 2020-06-04 | 0.0 | 0.003485 | 0.0 |
| 1308 | Arizona | 2020-05-04 | 13.308 | 1405.037687 | 0.009472 |
| 1339 | Arizona | 2020-06-04 | 9.483 | 852.136541 | 0.011128 |
| 1511 | Arkansas | 2020-05-04 | 2.36 | 237.596165 | 0.009933 |
| 1542 | Arkansas | 2020-06-04 | 1.503 | 127.442911 | 0.011794 |
| 4556 | California | 2020-05-04 | 79.944 | 5871.986927 | 0.013614 |
| 4587 | California | 2020-06-04 | 29.641 | 2403.634766 | 0.012332 |
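
To see the change over time state by state, the long table can be pivoted so that each state's May and June ratios sit side by side. A minimal sketch using the Alabama and Arizona rows copied from the output above; it also shows one way (an addition, not in the original cell) to turn a zero infection estimate into NaN rather than `inf`.

```python
import pandas as pd

# Rows copied from the notebook output above (subset of states).
df = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Arizona", "Arizona"],
    "date": ["2020-05-04", "2020-06-04", "2020-05-04", "2020-06-04"],
    "deaths_mean": [14.446, 22.324, 13.308, 9.483],
    "est_infections_mean": [1937.894863, 4271.268001, 1405.037687, 852.136541],
})

# Zero infection estimates become NaN, so the ratio is NaN instead of inf.
infections = df["est_infections_mean"].where(df["est_infections_mean"] > 0)
df["fatality_ratio"] = df["deaths_mean"] / infections

# One row per state, one column per date, so the two snapshots line up.
wide = df.pivot(index="state", columns="date", values="fatality_ratio")
wide["change"] = wide["2020-06-04"] - wide["2020-05-04"]
print(wide.round(6))
```

On this subset the ratio falls for Alabama (0.007454 to 0.005227) but rises for Arizona (0.009472 to 0.011128), so the direction of change itself differs between states.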


In [8]:
```python
ihme['fatality_ratio'].describe()
```

```
count    100.000000
mean       0.016682
std        0.020422
min        0.000000
25%        0.009372
50%        0.013471
75%        0.019556
max        0.185494
Name: fatality_ratio, dtype: float64
```
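
The summary above already argues against a single constant ratio: the standard deviation (0.020422) exceeds the mean (0.016682), a coefficient of variation of roughly 0.020422 / 0.016682 ≈ 1.22, whereas a constant ratio would give 0. A minimal sketch of splitting that spread by snapshot date and locating the largest ratio, run here on the ten rows printed earlier; on the full 100-row `ihme`, the same `idxmax` line would identify the row behind the 0.185494 maximum.

```python
import pandas as pd

# The ten rows printed earlier (a subset of the 100-row frame).
df = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska", "Alaska", "Arizona",
              "Arizona", "Arkansas", "Arkansas", "California", "California"],
    "date": ["2020-05-04", "2020-06-04"] * 5,
    "fatality_ratio": [0.007454, 0.005227, 0.0, 0.0, 0.009472,
                       0.011128, 0.009933, 0.011794, 0.013614, 0.012332],
})

# Summary statistics per snapshot date.
by_date = df.groupby("date")["fatality_ratio"].describe()
print(by_date)

# Coefficient of variation: 0 would mean one constant ratio (pure rescaling).
cv = df["fatality_ratio"].std() / df["fatality_ratio"].mean()
print(round(cv, 3))

# State and date of the largest ratio.
top = df.loc[df["fatality_ratio"].idxmax(), ["state", "date"]].tolist()
print(top)
```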