# Replacing Covid Tracking Project - Simplified Workflow

>NOTE: ~~See final workflow in the final script: `data_acquisition.py`~~ **As of 08/12, development of the data acquisition code has moved back to `Exploring Options for Replacing Covid Tracking Project Data_08-2021.ipynb`.**

- 08/09/21

- This is a companion notebook to `Exploring Options for Replacing Covid Tracking Project Data_08-2021`.
- The goal is to keep only the final code required to produce the dataset, leaving out exploratory tests and informational displays.

## Summary

- Download Kaggle data for the death and case counts
- Download Socrata HHS data for hospital info
- Convert everything to daily frequency and add state abbreviations before merging

In [1]:
# !pip install -U fsds
from fsds.imports import *

import os,zipfile,json,joblib
pd.set_option('display.max_columns',0)

fsds v0.3.2 loaded.  Read the docs: https://fs-ds.readthedocs.io/en/latest/ 


Handle,Package,Description
dp,IPython.display,Display modules with helpful display and clearing commands.
fs,fsds,Custom data science bootcamp student package
mpl,matplotlib,Matplotlib's base OOP module with formatting artists
plt,matplotlib.pyplot,Matplotlib's matlab-like plotting module
np,numpy,scientific computing with Python
pd,pandas,High performance data structures and tools
sns,seaborn,High-level data visualization library based on matplotlib


In [2]:
## Appending folder with functions
%load_ext autoreload 
%autoreload 2
# import sys
# sys.path.append('.')

import functions as fn
# help(fn)

# Data

## Specifying File Destinations

In [3]:
## Specifying data storage folders
fpath_raw = r"data_raw"
fpath_clean = r"data/"
for fpath in [fpath_clean, fpath_raw]:
    os.makedirs(fpath, exist_ok=True)

## Covid-19 Data From Johns Hopkins University

- https://www.kaggle.com/antgoldbloom/covid19-data-from-john-hopkins-university
- Comes with CONVENIENT_ files and RAW_ files.

In [4]:
## Download kaggle jhu data and make zipfile object
# !kaggle datasets download -p "{fpath_raw}" -d antgoldbloom/covid19-data-from-john-hopkins-university
os.system(f'kaggle datasets download -p "{fpath_raw}" -d antgoldbloom/covid19-data-from-john-hopkins-university')


jhu_data_zip = zipfile.ZipFile(os.path.join(fpath_raw,'covid19-data-from-john-hopkins-university.zip'))
jhu_data_zip.namelist()

['CONVENIENT_global_confirmed_cases.csv',
 'CONVENIENT_global_deaths.csv',
 'CONVENIENT_global_metadata.csv',
 'CONVENIENT_us_confirmed_cases.csv',
 'CONVENIENT_us_deaths.csv',
 'CONVENIENT_us_metadata.csv',
 'RAW_global_confirmed_cases.csv',
 'RAW_global_deaths.csv',
 'RAW_us_confirmed_cases.csv',
 'RAW_us_deaths.csv']

In [5]:
## Getting State Abbrevs
state_abbrevs = pd.read_csv('Reference Data/united_states_abbreviations.csv')

## Making dicts of Name:Abbrev and Abbrev:Name
state_to_abbrevs_map = dict(zip(state_abbrevs['State'],state_abbrevs['Abbreviation']))
abbrev_to_state_map = dict(zip(state_abbrevs['Abbreviation'],state_abbrevs['State']))
# state_to_abbrevs_map

### prep `df_metadata`

In [6]:
# prep df_metadata
file = 'CONVENIENT_us_metadata.csv'
jhu_data_zip.extract(file,path=fpath_raw)
df_metadata = pd.read_csv(os.path.join(fpath_raw,file))

## Adding state abbreviations to the Kaggle metadata
df_metadata.insert(1,'State_Code',df_metadata['Province_State'].map(state_to_abbrevs_map))
print(df_metadata.isna().sum())

## Dropping US territories (rows with no State_Code match)
df_metadata.dropna(subset=['State_Code'], inplace=True)

## Saving county info
df_metadata.to_csv(os.path.join(fpath_clean,"us_metadata_counties.csv"),index=False)
df_metadata

Province_State    0
State_Code        6
Admin2            6
Population        0
Lat               0
Long              0
dtype: int64


Unnamed: 0,Province_State,State_Code,Admin2,Population,Lat,Long
0,Alabama,AL,Autauga,55869,32.539527,-86.644082
1,Alabama,AL,Baldwin,223234,30.727750,-87.722071
2,Alabama,AL,Barbour,24686,31.868263,-85.387129
3,Alabama,AL,Bibb,22394,32.996421,-87.125115
4,Alabama,AL,Blount,57826,33.982109,-86.567906
...,...,...,...,...,...,...
3337,Wyoming,WY,Teton,23464,43.935225,-110.589080
3338,Wyoming,WY,Uinta,20226,41.287818,-110.547578
3339,Wyoming,WY,Unassigned,0,0.000000,0.000000
3340,Wyoming,WY,Washakie,7805,43.904516,-107.680187
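
The mapping step above can be sketched on toy data. This is a minimal illustration, not the notebook's actual reference file: the two-row `state_abbrevs` table, the `Guam` row, and the population values are made up for the example. Names missing from the reference table map to `NaN`, which is what lets `dropna` remove them afterwards.

```python
import pandas as pd

# Hypothetical stand-in for Reference Data/united_states_abbreviations.csv
state_abbrevs = pd.DataFrame({"State": ["Alabama", "Alaska"],
                              "Abbreviation": ["AL", "AK"]})
name_to_code = dict(zip(state_abbrevs["State"], state_abbrevs["Abbreviation"]))

# Toy metadata (made-up populations); "Guam" is absent from the reference table
meta = pd.DataFrame({"Province_State": ["Alabama", "Alaska", "Guam"],
                     "Population": [100, 200, 300]})
meta.insert(1, "State_Code", meta["Province_State"].map(name_to_code))

# List what failed to map before dropping it, so nothing disappears silently
unmapped = meta.loc[meta["State_Code"].isna(), "Province_State"].unique().tolist()
meta_states = meta.dropna(subset=["State_Code"]).reset_index(drop=True)
print(unmapped)
```

Printing the unmapped names before dropping them makes it easier to confirm that only the expected rows are removed.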


In [7]:
## Saving a states-only version with aggregated populations and mean lat/long
df_state_metadata = df_metadata.groupby('Province_State',as_index=False).agg({'Population':'sum',
                                               "Lat":'mean',"Long":"mean"})
df_state_metadata.insert(1,'State_Code',df_state_metadata['Province_State'].map(state_to_abbrevs_map))
df_state_metadata.to_csv(os.path.join(fpath_clean,"us_metadata_states.csv"),index=False)
df_state_metadata

Unnamed: 0,Province_State,State_Code,Population,Lat,Long
0,Alabama,AL,4903185,31.931113,-84.196785
1,Alaska,AK,740995,56.628273,-139.57154
2,Arizona,AZ,7278717,29.714033,-98.349911
3,Arkansas,AR,3017804,34.005087,-90.033096
4,California,CA,39512223,36.582496,-116.704308
5,Colorado,CO,5758736,37.755612,-102.289687
6,Connecticut,CT,3565287,33.290944,-58.125464
7,Delaware,DE,973764,23.465566,-45.319942
8,District of Columbia,DC,705749,12.968059,-25.672187
9,Florida,FL,21477737,28.101892,-80.303621
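
Note on the mean coordinates: the county table includes placeholder rows such as `Unassigned` with `Lat` and `Long` of 0.0, and these appear to pull the per-state means toward zero (the District of Columbia row above is a visible case). A possible adjustment, sketched here on three Wyoming rows copied from the county table, is to treat (0, 0) as missing before averaging. This is a suggestion, not what the notebook currently does.

```python
import numpy as np
import pandas as pd

# Three rows taken from the county metadata table shown earlier
counties = pd.DataFrame({
    "Province_State": ["Wyoming", "Wyoming", "Wyoming"],
    "Admin2": ["Teton", "Uinta", "Unassigned"],
    "Population": [23464, 20226, 0],
    "Lat": [43.935225, 41.287818, 0.0],
    "Long": [-110.589080, -110.547578, 0.0],
})

# Assumption: a (0, 0) coordinate means "no location", so mask it as NaN
placeholder = (counties["Lat"] == 0) & (counties["Long"] == 0)
cleaned = counties.copy()
cleaned.loc[placeholder, ["Lat", "Long"]] = np.nan

# mean() skips NaN by default, so placeholder rows no longer affect the result
states = cleaned.groupby("Province_State", as_index=False).agg(
    {"Population": "sum", "Lat": "mean", "Long": "mean"})
print(states)
```

Populations are still summed over every row, so only the coordinate averages change.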


In [8]:
## Making and saving remapping dicts
import joblib

state_to_abbrevs_meta = dict(zip(df_state_metadata['Province_State'],df_state_metadata['State_Code']))
abbrev_to_state_meta = dict(zip(df_state_metadata['State_Code'],df_state_metadata['Province_State']))

joblib.dump(state_to_abbrevs_meta, os.path.join(fpath_clean,'state_names_to_codes_map.joblib'))
joblib.dump(abbrev_to_state_meta, os.path.join(fpath_clean,'state_codes_to_names_map.joblib'))

## Save the path to the state-name-to-code mapper for use by the functions
mapper_path = os.path.join(fpath_clean,'state_names_to_codes_map.joblib')
mapper_path

'data/state_names_to_codes_map.joblib'

### def `load_raw_ts_file` & `melt_df_to_ts`

In [9]:
%load_ext autoreload
%autoreload 2
import data_acquisition as da
# help(da)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [10]:
# def load_raw_ts_file(jhu_data_zip, file = 'RAW_us_confirmed_cases.csv',
#                      mapper_path='data/state_names_to_codes_map.joblib',
#                     verbose=True):
    
#     if verbose: 
#         print(f"Loading data from {file}")
#     state_to_abbrevs_meta = joblib.load(mapper_path)
    
#     ## Extract and load csv
#     jhu_data_zip.extract(file)
#     data = pd.read_csv(file)
    
#     ## Drop states not included in metadata
#     data.insert(1,'State_Code',data['Province_State'].map(state_to_abbrevs_meta))
#     data.dropna(subset=['State_Code'],inplace=True)
#     return data

help(da.load_raw_ts_file)

# def melt_df_to_ts(df_cases,value_name, var_name='Date',
#                   multi_index_cols=['State_Code','Date'],
#                   id_cols = ['Province_State',"State_Code",'Admin2'],
#                   cols_to_drop=['iso2','iso3','code3','UID','Country_Region',
#                                 'Combined_Key','Lat','Long_','FIPS']):
    
# #     value_cols = [c for c in df_cases.columns if c not in [*cols_to_drop,*id_cols]]
    
#     ## Remove any cols not in the actual dataframe
#     id_cols = [c for c in id_cols if c in df_cases.columns] 
#     cols_to_drop = [c for c in cols_to_drop if c in df_cases.columns] 
    
#     ## CHECKING FOR NON-DATE COLS TO REMOVE
#     value_cols = [c for c in df_cases.columns if c not in [*id_cols,*cols_to_drop]]
#     value_cols = list(filter(lambda x: len(x.split('/'))>1,value_cols))
    
    
#     df_cases_ts = pd.melt(df_cases, 
#                           id_vars=id_cols, value_vars=value_cols,
#                           var_name=var_name, value_name=value_name)
    
#     df_cases_ts['Date'] = pd.to_datetime(df_cases_ts['Date'])
#     df_cases_ts = df_cases_ts.set_index(multi_index_cols).sort_index()
#     return df_cases_ts

help(da.melt_df_to_ts)


Help on function load_raw_ts_file in module data_acquisition:

load_raw_ts_file(jhu_data_zip, file='RAW_us_confirmed_cases.csv', mapper_path='data/state_names_to_codes_map.joblib', verbose=True)

Help on function melt_df_to_ts in module data_acquisition:

melt_df_to_ts(df_cases, value_name, var_name='Date', multi_index_cols=['State_Code', 'Date'], id_cols=['Province_State', 'State_Code', 'Admin2'], cols_to_drop=['iso2', 'iso3', 'code3', 'UID', 'Country_Region', 'Combined_Key', 'Lat', 'Long_', 'FIPS'])
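
For reference, the wide-to-long reshaping that `melt_df_to_ts` performs can be sketched on a toy frame. This is a simplified illustration rather than the module's code: the frame below is made up, and date columns are detected simply by the presence of a `/` in the column name, mirroring the commented-out check above.

```python
import pandas as pd

# Toy wide-format frame: one row per county, one column per date (values made up)
wide = pd.DataFrame({
    "Province_State": ["Alaska", "Alaska"],
    "State_Code": ["AK", "AK"],
    "Admin2": ["Anchorage", "Bethel"],
    "FIPS": [1.0, 2.0],
    "1/22/20": [0, 0],
    "1/23/20": [1, 0],
})

id_cols = ["Province_State", "State_Code", "Admin2"]
# Keep only columns that look like dates; FIPS and similar columns are left out
date_cols = [c for c in wide.columns if c not in id_cols and "/" in c]

long = pd.melt(wide, id_vars=id_cols, value_vars=date_cols,
               var_name="Date", value_name="Cases")
long["Date"] = pd.to_datetime(long["Date"], format="%m/%d/%y")
long = long.set_index(["State_Code", "Date"]).sort_index()
print(long)
```

Each county/date pair becomes one row, which is the shape the later merge and daily aggregation expect.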



In [11]:
## Prep `df_cases_ts`
df_cases = da.load_raw_ts_file(jhu_data_zip, file = 'RAW_us_confirmed_cases.csv',)
df_cases_ts = da.melt_df_to_ts(df_cases,'Cases')
df_cases_ts

Loading data from RAW_us_confirmed_cases.csv


Unnamed: 0_level_0,Unnamed: 1_level_0,Province_State,Admin2,Cases
State_Code,Date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
AK,2020-01-22,Alaska,Aleutians East,0
AK,2020-01-22,Alaska,Aleutians West,0
AK,2020-01-22,Alaska,Anchorage,0
AK,2020-01-22,Alaska,Bethel,0
AK,2020-01-22,Alaska,Bristol Bay,0
...,...,...,...,...
WY,2021-08-14,Wyoming,Teton,4068
WY,2021-08-14,Wyoming,Uinta,2593
WY,2021-08-14,Wyoming,Unassigned,0
WY,2021-08-14,Wyoming,Washakie,967


In [12]:
## Prep df_deaths_ts
df_deaths = da.load_raw_ts_file(jhu_data_zip,file = 'RAW_us_deaths.csv')
df_deaths_ts = da.melt_df_to_ts(df_deaths,'Deaths')
df_deaths_ts

Loading data from RAW_us_deaths.csv


Unnamed: 0_level_0,Unnamed: 1_level_0,Province_State,Admin2,Deaths
State_Code,Date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
AK,2020-01-22,Alaska,Aleutians East,0
AK,2020-01-22,Alaska,Aleutians West,0
AK,2020-01-22,Alaska,Anchorage,0
AK,2020-01-22,Alaska,Bethel,0
AK,2020-01-22,Alaska,Bristol Bay,0
...,...,...,...,...
WY,2021-08-14,Wyoming,Teton,11
WY,2021-08-14,Wyoming,Uinta,14
WY,2021-08-14,Wyoming,Unassigned,0
WY,2021-08-14,Wyoming,Washakie,26


### pd.merge for `df_cases_deaths_ts`

In [13]:
## Merge df_cases_ts and df_deaths_ts
df_cases_deaths_ts = pd.merge(df_cases_ts.reset_index(), df_deaths_ts.reset_index())
df_cases_deaths_ts

Unnamed: 0,State_Code,Date,Province_State,Admin2,Cases,Deaths
0,AK,2020-01-22,Alaska,Aleutians East,0,0
1,AK,2020-01-22,Alaska,Aleutians West,0,0
2,AK,2020-01-22,Alaska,Anchorage,0,0
3,AK,2020-01-22,Alaska,Bethel,0,0
4,AK,2020-01-22,Alaska,Bristol Bay,0,0
...,...,...,...,...,...,...
1904851,WY,2021-08-14,Wyoming,Teton,4068,11
1904852,WY,2021-08-14,Wyoming,Uinta,2593,14
1904853,WY,2021-08-14,Wyoming,Unassigned,0,0
1904854,WY,2021-08-14,Wyoming,Washakie,967,26
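
The merge above relies on pandas joining on every column the two frames share. A more defensive variant, sketched here on toy frames, names the keys explicitly and asks pandas to check that each key appears once on each side. The `validate` argument is a real `pd.merge` parameter; the frames and values are illustrative only.

```python
import pandas as pd

keys = ["State_Code", "Date", "Province_State", "Admin2"]
dates = pd.to_datetime(["2020-01-22", "2020-01-22"])

# Toy county-level frames shaped like df_cases_ts / df_deaths_ts after reset_index()
cases = pd.DataFrame({"State_Code": ["AK", "AK"], "Date": dates,
                      "Province_State": ["Alaska", "Alaska"],
                      "Admin2": ["Anchorage", "Bethel"], "Cases": [0, 0]})
deaths = pd.DataFrame({"State_Code": ["AK", "AK"], "Date": dates,
                       "Province_State": ["Alaska", "Alaska"],
                       "Admin2": ["Anchorage", "Bethel"], "Deaths": [0, 0]})

# validate="one_to_one" raises MergeError if a key is duplicated on either side
merged = pd.merge(cases, deaths, on=keys, how="inner", validate="one_to_one")
print(merged)
```

Naming the keys also guards against an unexpected shared column silently becoming part of the join condition.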


#### Saving df_cases_deaths_ts `'us_states_cases_deaths.csv'`

In [14]:
df_cases_deaths_ts.to_csv(os.path.join(fpath_clean,'us_states_cases_deaths.csv'),index=True)

### Making & saving  "`df_daily_cases_deaths_ts`"

In [15]:
df_daily_cases_deaths_ts = df_cases_deaths_ts.set_index('Date')\
                                .groupby('State_Code').resample("D")\
                                    .sum().reset_index()
df_daily_cases_deaths_ts.to_csv(os.path.join(fpath_clean,'us_states_daily_cases_deaths.csv'),index=True)
df_daily_cases_deaths_ts

Unnamed: 0,State_Code,Date,Cases,Deaths
0,AK,2020-01-22,0,0
1,AK,2020-01-23,0,0
2,AK,2020-01-24,0,0
3,AK,2020-01-25,0,0
4,AK,2020-01-26,0,0
...,...,...,...,...
29687,WY,2021-08-10,67326,793
29688,WY,2021-08-11,67582,793
29689,WY,2021-08-12,67957,793
29690,WY,2021-08-13,68272,793
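
The county-to-state aggregation can be sketched on toy rows. Note that this sketch swaps in `pd.Grouper` for the `set_index`/`resample` chain used above and selects the numeric columns explicitly; that is an assumption made to avoid depending on how a given pandas version treats string columns in `sum()`. The values are made up.

```python
import pandas as pd

# Toy county-level cumulative counts for two days (values made up)
county = pd.DataFrame({
    "State_Code": ["WY", "WY", "WY", "WY"],
    "Admin2": ["Teton", "Uinta", "Teton", "Uinta"],
    "Date": pd.to_datetime(["2021-08-13", "2021-08-13",
                            "2021-08-14", "2021-08-14"]),
    "Cases": [4060, 2590, 4068, 2593],
    "Deaths": [11, 14, 11, 14],
})

# One row per state per calendar day, summing over that state's counties
daily = (county
         .groupby(["State_Code", pd.Grouper(key="Date", freq="D")])[["Cases", "Deaths"]]
         .sum()
         .reset_index())
print(daily)
```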


In [16]:
# pd.concat([df_cases_ts,df_deaths_ts],axis=1)

## New Hospital Data [Added 08/13/21]

- From "`0_Exploring Options for Replacing Covid Tracking Project Data_08-2021.ipynb`"

In [17]:
# def get_hospital_data():
#     offset = 0
#     ## Getting Hospital Capacity Data
#     base_url = 'https://healthdata.gov/resource/g62h-syeh.csv'
#     page = 0
#     results = []

#     ## setting an arbitrary nonzero page_len so the loop runs at least once
#     page_len = 1000

#     while (page_len>0):
#         try:
#             print(f"[i] Page {page} (offset = {offset})")
#             url = base_url+f"?$offset={offset}"
#             df_temp = pd.read_csv(url)
#             results.append(df_temp)

#             page_len = len(df_temp)
#             offset+=page_len
#             page+=1
#         except Exception as e:
#             print('[!] ERROR:')
#             print(e)
#             print('-- returning raw results list instead of dataframe..')
#             return results
        
#     return pd.concat(results)

# hospital_df = get_hospital_data()
# hospital_df

help(da.get_hospital_data)

Help on function get_hospital_data in module data_acquisition:

get_hospital_data()
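
The offset-based paging in the commented-out loop above can be sketched with the page fetcher passed in as a function, which makes the loop testable without network access. The names `fetch_all_pages`, `fake_fetch`, and `socrata_fetch` are hypothetical and are not part of `data_acquisition`. `$limit`, `$offset`, and `$order` are standard Socrata query parameters; ordering by `date` is an assumption about what gives this dataset stable paging.

```python
import pandas as pd

def fetch_all_pages(fetch_page, page_size=1000):
    """Call fetch_page(offset, limit) until a short or empty page comes back."""
    frames, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        if page.empty:
            break
        frames.append(page)
        offset += len(page)
        if len(page) < page_size:  # short page: nothing left to fetch
            break
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

def fake_fetch(offset, limit):
    """Offline stand-in for the API: serves slices of a 2477-row frame."""
    data = pd.DataFrame({"row": range(2477)})
    return data.iloc[offset:offset + limit]

def socrata_fetch(offset, limit):
    """Hypothetical real fetcher (not called here) for the HHS endpoint."""
    url = ("https://healthdata.gov/resource/g62h-syeh.csv"
           f"?$limit={limit}&$offset={offset}&$order=date")
    return pd.read_csv(url)

all_rows = fetch_all_pages(fake_fetch, page_size=1000)
print(len(all_rows))
```

Stopping on a short page saves the final empty request that the original loop makes before it exits.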



In [18]:
# ## Get hospital data with function
# df1 = get_hospital_data()
# df1 = df1.rename({'state':'State_Code',
#                  'date':'Date'},axis=1)
# df1['Date'] = pd.to_datetime(df1['Date'])
# df1 = df1.sort_values(['State_Code','Date'])
# df1

In [19]:
import datetime as dt
today = dt.date.today().strftime("%m-%d-%Y")
# df1.to_csv(os.path.join(fpath_raw,f'hospital_data_{today}.csv'))


### Saving Columns

In [20]:
# class ColumnDict(dict):
#     """Inherits from a normal dictionary.
    
#     Methods:
#         find_expr_cols: methods for finding columns based on expressions
#                         saves the column names under with the expression  as key
#         get_all_values: gets list of all unique values stored in dict
#     Adds 
#     Also saved keep_keys True/False dict of expressions that should be kept or dropped
#     """
#     keep_keys = {True:list(),False:list(),'id':list()} # Expressions 
#     keep_cols = {True:list(),False:list()} # column names
    
#     def __init__(self, id_cols=[],*args,**kwargs):

#         self.id_cols=id_cols
#         ## Empty list of keep keys/cols
# #         self['id'] = self.id_cols
#         self.keep_keys = {True:list(),False:list(),'id':self.id_cols} # Expressions 
#         self.keep_cols = {True:[*self.id_cols],False:list()} # column names
#     #     id_cols = list() ## id columns to be auto-kept 
#         super().__init__(*args,**kwargs)
    

    
#     def get_all_values(self,keep=None):
#         """Retrieves list of unique column names:
#         Args:
#             keep (None, True, False): determines subset of columns returned
#             # Adapter from: https://www.geeksforgeeks.org/python-concatenate-dictionary-value-lists/
#             """
#         if keep is None:
#             from itertools import chain
#             return [*self.id_cols,*set(list(chain(*self.values())))]
        
#         elif keep==True:
#             col_list = list(set(self.keep_cols[keep]))
#             return [*self.id_cols, *[c for c in col_list if c not in self.id_cols]]
# #             return list(set([*self.id_cols,*]))
#         elif keep==False:
#             return list(set(self.keep_cols[keep]))

        
        
#     def find_expr_cols(self,expressions,df,keep,exlcude_known_cols=None):
#         """Saves lists of column names as values in dict
#         Args:
#             Expresssions (str,list): patterns to find in column names 
#             df (DataFrame): dataframe to check
#             keep (bool): saves expr and cols keep_cols/keep_keys as True or False
            
#         TO DO:
#             exlcude_known_cols (NOT IMPLEMENTED YET): will check if found columns 
#                                 are already in any of the known lists of cols
                                
                                
                                
#         EXAMPLE USAGE:
#         >>> COLUMNS = ColumnDict()
#         >>> COLUMNS.find_expr_cols(['staffing','previous_day','coverage'],
#                                     df1,keep=False)
#         """
            
#         if isinstance(expressions,str):
#                 expressions = [expressions]
                
#         for expr in expressions:
#             found_cols = [c for c in df.columns if expr in c]
#             self[expr] = found_cols

#             ## Save expression and found_cols to keep_keys/keep_cols
#             self.keep_keys[keep].append(expr)
            
#             [self.keep_cols[keep].append(c) for c in found_cols if c not in self.keep_cols[keep]]

help(da.ColumnDict)



Help on class ColumnDict in module data_acquisition:

class ColumnDict(builtins.dict)
 |  ColumnDict(id_cols=[], *args, **kwargs)
 |  
 |  Inherits from a normal dictionary.
 |  
 |  Methods:
 |      find_expr_cols: methods for finding columns based on expressions
 |                      saves the column names under with the expression  as key
 |      get_all_values: gets list of all unique values stored in dict
 |  Adds 
 |  Also saved keep_keys True/False dict of expressions that should be kept or dropped
 |  
 |  Method resolution order:
 |      ColumnDict
 |      builtins.dict
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, id_cols=[], *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  find_expr_cols(self, expressions, df, keep, exlcude_known_cols=None)
 |      Saves lists of column names as values in dict
 |      Args:
 |          Expresssions (str,list): patterns to find in column names 
 |          df (

# Full Workflow

### Making `df_hospitals`

In [21]:
FULL_WORKFLOW = True

import datetime as dt
today = dt.date.today().strftime("%m-%d-%Y")
# df1.to_csv(os.path.join(fpath_raw,f'hospital_data_{today}.csv'))


if FULL_WORKFLOW:
    ## Get hospital data with function
    df1 = da.get_hospital_data()
    df1 = df1.rename({'state':'State_Code',
                     'date':'Date'},axis=1)
    df1['Date'] = pd.to_datetime(df1['Date'])
    df1 = df1.sort_values(['State_Code','Date'])

[i] Page 0 (offset = 0)
[i] Page 1 (offset = 1000)
[i] Page 2 (offset = 2000)
[i] Page 3 (offset = 3000)
[i] Page 4 (offset = 4000)
[i] Page 5 (offset = 5000)
[i] Page 6 (offset = 6000)
[i] Page 7 (offset = 7000)
[i] Page 8 (offset = 8000)
[i] Page 9 (offset = 9000)
[i] Page 10 (offset = 10000)
[i] Page 11 (offset = 11000)
[i] Page 12 (offset = 12000)
[i] Page 13 (offset = 13000)
[i] Page 14 (offset = 14000)
[i] Page 15 (offset = 15000)
[i] Page 16 (offset = 16000)
[i] Page 17 (offset = 17000)
[i] Page 18 (offset = 18000)
[i] Page 19 (offset = 19000)
[i] Page 20 (offset = 20000)
[i] Page 21 (offset = 21000)
[i] Page 22 (offset = 22000)
[i] Page 23 (offset = 23000)
[i] Page 24 (offset = 24000)
[i] Page 25 (offset = 25000)
[i] Page 26 (offset = 26000)
[i] Page 27 (offset = 27000)
[i] Page 28 (offset = 28000)
[i] Page 29 (offset = 28477)


In [22]:
COLUMNS = da.ColumnDict(id_cols=['State_Code','Date'])

## saving names to DROP to COLUMNS dict
drop_col_expressions = ['staff','previous_day','coverage','onset']
COLUMNS.find_expr_cols(drop_col_expressions,df1,keep=False)


## saving names to KEEP to COLUMNS dict
keep_col_expressions = ['inpatient_bed','adult_icu_bed','utilization',
                        'total_adult_patients','total_pediatric_patients',
                       'percent_of_inpatients_with_covid','deaths']
COLUMNS.find_expr_cols(keep_col_expressions,df1,keep=True)


## Making df_hospitals
df_hospitals = df1[COLUMNS.get_all_values(keep=True)].copy()
df_hospitals = df_hospitals.set_index(COLUMNS.id_cols).sort_index()
df_hospitals.reset_index().to_csv(os.path.join(fpath_raw,'hospital_data.csv'))

df_hospitals#.loc['MD',['inpatient_beds_utilization']].plot()

Unnamed: 0_level_0,Unnamed: 1_level_0,inpatient_beds_coverage,adult_icu_bed_utilization_denominator,total_pediatric_patients_hospitalized_confirmed_covid,inpatient_bed_covid_utilization_denominator,adult_icu_bed_utilization,staffed_adult_icu_bed_occupancy,inpatient_bed_covid_utilization_numerator,inpatient_beds_used_coverage,total_adult_patients_hospitalized_confirmed_covid_coverage,deaths_covid_coverage,adult_icu_bed_covid_utilization_numerator,adult_icu_bed_covid_utilization,adult_icu_bed_utilization_numerator,total_pediatric_patients_hospitalized_confirmed_covid_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid,inpatient_beds_used,percent_of_inpatients_with_covid_denominator,inpatient_beds_used_covid_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid,inpatient_bed_covid_utilization_coverage,inpatient_beds_used_covid,total_staffed_adult_icu_beds,inpatient_beds_utilization_denominator,adult_icu_bed_covid_utilization_denominator,percent_of_inpatients_with_covid_coverage,percent_of_inpatients_with_covid,inpatient_beds_utilization,inpatient_beds_utilization_numerator,deaths_covid,inpatient_beds,staffed_adult_icu_bed_occupancy_coverage,total_staffed_adult_icu_beds_coverage,adult_icu_bed_covid_utilization_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid_coverage,adult_icu_bed_utilization_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid_coverage,inpatient_beds_utilization_coverage,inpatient_bed_covid_utilization,percent_of_inpatients_with_covid_numerator,total_adult_patients_hospitalized_confirmed_covid
State_Code,Date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1
AK,2020-03-23,1,,,56.0,,,3.0,1,0,1,,,,0,,21.0,21.0,1,,1.0,3.0,,56.0,,1.0,0.142857,0.375000,21.0,0.0,56.0,0,0,,0,,0,1.0,0.053571,3.0,
AK,2020-03-24,1,,,56.0,,,3.0,1,0,1,,,,0,,20.0,20.0,1,,1.0,3.0,,56.0,,1.0,0.150000,0.357143,20.0,0.0,56.0,0,0,,0,,0,1.0,0.053571,3.0,
AK,2020-03-25,1,,,56.0,,,1.0,1,0,1,,,,0,,15.0,15.0,1,,1.0,1.0,,56.0,,1.0,0.066667,0.267857,15.0,0.0,56.0,0,0,,0,,0,1.0,0.017857,1.0,
AK,2020-03-26,1,,,56.0,,,2.0,1,0,1,,,,0,,16.0,16.0,1,,1.0,2.0,,56.0,,1.0,0.125000,0.285714,16.0,0.0,56.0,0,0,,0,,0,1.0,0.035714,2.0,
AK,2020-03-27,2,,,81.0,,,1.0,2,0,2,,,,0,,23.0,23.0,2,,2.0,1.0,,81.0,,2.0,0.043478,0.283951,23.0,0.0,81.0,0,0,,0,,0,2.0,0.012346,1.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
WY,2021-08-11,31,137.0,0.0,1695.0,0.459854,63.0,111.0,31,31,29,26.0,0.206349,63.0,29,111.0,837.0,822.0,29,0.0,29.0,111.0,137.0,1755.0,126.0,29.0,0.135036,0.476923,837.0,2.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.065487,111.0,97.0
WY,2021-08-12,31,136.0,0.0,1695.0,0.463235,63.0,105.0,31,31,29,25.0,0.200000,63.0,29,105.0,852.0,836.0,29,0.0,29.0,105.0,136.0,1755.0,125.0,29.0,0.125598,0.485470,852.0,1.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.061947,105.0,99.0
WY,2021-08-13,31,137.0,0.0,1695.0,0.401460,55.0,107.0,31,31,29,27.0,0.214286,55.0,29,107.0,838.0,830.0,29,0.0,29.0,107.0,137.0,1755.0,126.0,29.0,0.128916,0.477493,838.0,1.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.063127,107.0,99.0
WY,2021-08-14,31,137.0,0.0,1695.0,0.430657,59.0,116.0,31,31,29,28.0,0.222222,59.0,29,116.0,799.0,791.0,29,0.0,29.0,116.0,137.0,1755.0,126.0,29.0,0.146650,0.455271,799.0,4.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.068437,116.0,106.0
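
Note that the table above still lists columns such as `inpatient_beds_coverage` and `staffed_adult_icu_bed_occupancy`, even though `coverage` and `staff` are among the drop expressions. This suggests that `get_all_values(keep=True)` returns every column matched by a keep expression without subtracting the dropped ones. If drop expressions are meant to win, a plain-function sketch of that rule could look like the following. `select_columns` is a hypothetical helper, not part of `ColumnDict`, and the toy frame only borrows a few column names from the output.

```python
import pandas as pd

def select_columns(df, id_cols, keep_patterns, drop_patterns):
    """Keep columns matching any keep pattern unless they match a drop pattern."""
    keep = [c for c in df.columns if any(p in c for p in keep_patterns)]
    dropped = {c for c in df.columns if any(p in c for p in drop_patterns)}
    selected = [c for c in keep if c not in dropped and c not in id_cols]
    return df[[*id_cols, *selected]]

# Empty toy frame using a handful of the real column names
toy = pd.DataFrame(columns=["State_Code", "Date", "inpatient_beds",
                            "inpatient_beds_coverage",
                            "staffed_adult_icu_bed_occupancy",
                            "deaths_covid", "deaths_covid_coverage"])

result = select_columns(toy, id_cols=["State_Code", "Date"],
                        keep_patterns=["inpatient_bed", "adult_icu_bed", "deaths"],
                        drop_patterns=["staff", "coverage"])
print(list(result.columns))
```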


# **BOOKMARK 08/14/21** - combining new hospital data 

### JOIN `df_hospitals` and `df_daily_cases_deaths_ts`

In [23]:
# df_cases_deaths_ts.set_index('Date').groupby('State_Code').resample("D").sum().reset_index()
df_daily_cases_deaths_ts

Unnamed: 0,State_Code,Date,Cases,Deaths
0,AK,2020-01-22,0,0
1,AK,2020-01-23,0,0
2,AK,2020-01-24,0,0
3,AK,2020-01-25,0,0
4,AK,2020-01-26,0,0
...,...,...,...,...
29687,WY,2021-08-10,67326,793
29688,WY,2021-08-11,67582,793
29689,WY,2021-08-12,67957,793
29690,WY,2021-08-13,68272,793


In [24]:
df_hospitals.reset_index()

Unnamed: 0,State_Code,Date,inpatient_beds_coverage,adult_icu_bed_utilization_denominator,total_pediatric_patients_hospitalized_confirmed_covid,inpatient_bed_covid_utilization_denominator,adult_icu_bed_utilization,staffed_adult_icu_bed_occupancy,inpatient_bed_covid_utilization_numerator,inpatient_beds_used_coverage,total_adult_patients_hospitalized_confirmed_covid_coverage,deaths_covid_coverage,adult_icu_bed_covid_utilization_numerator,adult_icu_bed_covid_utilization,adult_icu_bed_utilization_numerator,total_pediatric_patients_hospitalized_confirmed_covid_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid,inpatient_beds_used,percent_of_inpatients_with_covid_denominator,inpatient_beds_used_covid_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid,inpatient_bed_covid_utilization_coverage,inpatient_beds_used_covid,total_staffed_adult_icu_beds,inpatient_beds_utilization_denominator,adult_icu_bed_covid_utilization_denominator,percent_of_inpatients_with_covid_coverage,percent_of_inpatients_with_covid,inpatient_beds_utilization,inpatient_beds_utilization_numerator,deaths_covid,inpatient_beds,staffed_adult_icu_bed_occupancy_coverage,total_staffed_adult_icu_beds_coverage,adult_icu_bed_covid_utilization_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid_coverage,adult_icu_bed_utilization_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid_coverage,inpatient_beds_utilization_coverage,inpatient_bed_covid_utilization,percent_of_inpatients_with_covid_numerator,total_adult_patients_hospitalized_confirmed_covid
0,AK,2020-03-23,1,,,56.0,,,3.0,1,0,1,,,,0,,21.0,21.0,1,,1.0,3.0,,56.0,,1.0,0.142857,0.375000,21.0,0.0,56.0,0,0,,0,,0,1.0,0.053571,3.0,
1,AK,2020-03-24,1,,,56.0,,,3.0,1,0,1,,,,0,,20.0,20.0,1,,1.0,3.0,,56.0,,1.0,0.150000,0.357143,20.0,0.0,56.0,0,0,,0,,0,1.0,0.053571,3.0,
2,AK,2020-03-25,1,,,56.0,,,1.0,1,0,1,,,,0,,15.0,15.0,1,,1.0,1.0,,56.0,,1.0,0.066667,0.267857,15.0,0.0,56.0,0,0,,0,,0,1.0,0.017857,1.0,
3,AK,2020-03-26,1,,,56.0,,,2.0,1,0,1,,,,0,,16.0,16.0,1,,1.0,2.0,,56.0,,1.0,0.125000,0.285714,16.0,0.0,56.0,0,0,,0,,0,1.0,0.035714,2.0,
4,AK,2020-03-27,2,,,81.0,,,1.0,2,0,2,,,,0,,23.0,23.0,2,,2.0,1.0,,81.0,,2.0,0.043478,0.283951,23.0,0.0,81.0,0,0,,0,,0,2.0,0.012346,1.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28472,WY,2021-08-11,31,137.0,0.0,1695.0,0.459854,63.0,111.0,31,31,29,26.0,0.206349,63.0,29,111.0,837.0,822.0,29,0.0,29.0,111.0,137.0,1755.0,126.0,29.0,0.135036,0.476923,837.0,2.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.065487,111.0,97.0
28473,WY,2021-08-12,31,136.0,0.0,1695.0,0.463235,63.0,105.0,31,31,29,25.0,0.200000,63.0,29,105.0,852.0,836.0,29,0.0,29.0,105.0,136.0,1755.0,125.0,29.0,0.125598,0.485470,852.0,1.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.061947,105.0,99.0
28474,WY,2021-08-13,31,137.0,0.0,1695.0,0.401460,55.0,107.0,31,31,29,27.0,0.214286,55.0,29,107.0,838.0,830.0,29,0.0,29.0,107.0,137.0,1755.0,126.0,29.0,0.128916,0.477493,838.0,1.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.063127,107.0,99.0
28475,WY,2021-08-14,31,137.0,0.0,1695.0,0.430657,59.0,116.0,31,31,29,28.0,0.222222,59.0,29,116.0,799.0,791.0,29,0.0,29.0,116.0,137.0,1755.0,126.0,29.0,0.146650,0.455271,799.0,4.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.068437,116.0,106.0


In [37]:
## combine all data
df = pd.merge(df_daily_cases_deaths_ts,df_hospitals.reset_index())
df.to_csv(os.path.join(fpath_clean,'combined_us_states_full_data.csv'),index=False)
df

Unnamed: 0,State_Code,Date,Cases,Deaths,inpatient_beds_coverage,adult_icu_bed_utilization_denominator,total_pediatric_patients_hospitalized_confirmed_covid,inpatient_bed_covid_utilization_denominator,adult_icu_bed_utilization,staffed_adult_icu_bed_occupancy,inpatient_bed_covid_utilization_numerator,inpatient_beds_used_coverage,total_adult_patients_hospitalized_confirmed_covid_coverage,deaths_covid_coverage,adult_icu_bed_covid_utilization_numerator,adult_icu_bed_covid_utilization,adult_icu_bed_utilization_numerator,total_pediatric_patients_hospitalized_confirmed_covid_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid,inpatient_beds_used,percent_of_inpatients_with_covid_denominator,inpatient_beds_used_covid_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid,inpatient_bed_covid_utilization_coverage,inpatient_beds_used_covid,total_staffed_adult_icu_beds,inpatient_beds_utilization_denominator,adult_icu_bed_covid_utilization_denominator,percent_of_inpatients_with_covid_coverage,percent_of_inpatients_with_covid,inpatient_beds_utilization,inpatient_beds_utilization_numerator,deaths_covid,inpatient_beds,staffed_adult_icu_bed_occupancy_coverage,total_staffed_adult_icu_beds_coverage,adult_icu_bed_covid_utilization_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid_coverage,adult_icu_bed_utilization_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid_coverage,inpatient_beds_utilization_coverage,inpatient_bed_covid_utilization,percent_of_inpatients_with_covid_numerator,total_adult_patients_hospitalized_confirmed_covid
0,AK,2020-03-23,39,0,1,,,56.0,,,3.0,1,0,1,,,,0,,21.0,21.0,1,,1.0,3.0,,56.0,,1.0,0.142857,0.375000,21.0,0.0,56.0,0,0,,0,,0,1.0,0.053571,3.0,
1,AK,2020-03-24,43,0,1,,,56.0,,,3.0,1,0,1,,,,0,,20.0,20.0,1,,1.0,3.0,,56.0,,1.0,0.150000,0.357143,20.0,0.0,56.0,0,0,,0,,0,1.0,0.053571,3.0,
2,AK,2020-03-25,50,1,1,,,56.0,,,1.0,1,0,1,,,,0,,15.0,15.0,1,,1.0,1.0,,56.0,,1.0,0.066667,0.267857,15.0,0.0,56.0,0,0,,0,,0,1.0,0.017857,1.0,
3,AK,2020-03-26,64,1,1,,,56.0,,,2.0,1,0,1,,,,0,,16.0,16.0,1,,1.0,2.0,,56.0,,1.0,0.125000,0.285714,16.0,0.0,56.0,0,0,,0,,0,1.0,0.035714,2.0,
4,AK,2020-03-27,75,1,2,,,81.0,,,1.0,2,0,2,,,,0,,23.0,23.0,2,,2.0,1.0,,81.0,,2.0,0.043478,0.283951,23.0,0.0,81.0,0,0,,0,,0,2.0,0.012346,1.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27769,WY,2021-08-10,67326,793,29,136.0,0.0,1550.0,0.455882,62.0,112.0,29,29,27,21.0,0.168000,62.0,27,112.0,734.0,722.0,27,0.0,27.0,112.0,136.0,1610.0,125.0,27.0,0.155125,0.455901,734.0,2.0,1610.0,29,29,27.0,27,29.0,27,29.0,0.072258,112.0,100.0
27770,WY,2021-08-11,67582,793,31,137.0,0.0,1695.0,0.459854,63.0,111.0,31,31,29,26.0,0.206349,63.0,29,111.0,837.0,822.0,29,0.0,29.0,111.0,137.0,1755.0,126.0,29.0,0.135036,0.476923,837.0,2.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.065487,111.0,97.0
27771,WY,2021-08-12,67957,793,31,136.0,0.0,1695.0,0.463235,63.0,105.0,31,31,29,25.0,0.200000,63.0,29,105.0,852.0,836.0,29,0.0,29.0,105.0,136.0,1755.0,125.0,29.0,0.125598,0.485470,852.0,1.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.061947,105.0,99.0
27772,WY,2021-08-13,68272,793,31,137.0,0.0,1695.0,0.401460,55.0,107.0,31,31,29,27.0,0.214286,55.0,29,107.0,838.0,830.0,29,0.0,29.0,107.0,137.0,1755.0,126.0,29.0,0.128916,0.477493,838.0,1.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.063127,107.0,99.0
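
Because `pd.merge` defaults to an inner join, dates that exist in only one source are dropped: the combined table starts at 2020-03-23 for AK, while the case counts start at 2020-01-22. A quick way to see what an inner join discards, sketched on toy frames with made-up values, is an outer merge with `indicator=True`.

```python
import pandas as pd

# Toy daily case counts: three days for one state
cases = pd.DataFrame({
    "State_Code": ["AK", "AK", "AK"],
    "Date": pd.to_datetime(["2020-03-22", "2020-03-23", "2020-03-24"]),
    "Cases": [30, 39, 43],
})
# Toy hospital data: starts one day later
hosp = pd.DataFrame({
    "State_Code": ["AK", "AK"],
    "Date": pd.to_datetime(["2020-03-23", "2020-03-24"]),
    "inpatient_beds": [56.0, 56.0],
})

# The _merge column records which side(s) each row came from
check = pd.merge(cases, hosp, on=["State_Code", "Date"],
                 how="outer", indicator=True)
left_only = check[check["_merge"] == "left_only"]
print(check["_merge"].value_counts())
```

Rows marked `left_only` are the case-count dates with no hospital record, which an inner merge would remove.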


## Make `STATES` dict



In [26]:
df['State_Code'].unique()

array(['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA',
       'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME',
       'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM',
       'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'PR', 'RI', 'SC', 'SD', 'TN',
       'TX', 'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY'], dtype=object)

In [27]:
unique_states = df['State_Code'].unique()
len(unique_states)

52
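
One way a `STATES` dict like the one named in the heading could be built is to split the combined frame by state code, giving one date-indexed frame per state. This is a sketch under that assumption, using a toy frame with values taken from the table above; it is not necessarily how the notebook builds the dict.

```python
import pandas as pd

# Toy combined frame with a (State_Code, Date) MultiIndex
df = pd.DataFrame({
    "State_Code": ["AK", "AK", "WY"],
    "Date": pd.to_datetime(["2020-03-23", "2020-03-24", "2021-08-10"]),
    "Cases": [39, 43, 67326],
})
df_states = df.set_index(["State_Code", "Date"]).sort_index()

# One DataFrame per state, indexed by Date only
STATES = {code: group.droplevel("State_Code")
          for code, group in df_states.groupby(level="State_Code")}
print(STATES["AK"])
```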

In [28]:
df_states = df.set_index(['State_Code','Date']).sort_index()
df_states

State_Code,Date,Cases,Deaths,inpatient_beds_coverage,adult_icu_bed_utilization_denominator,total_pediatric_patients_hospitalized_confirmed_covid,inpatient_bed_covid_utilization_denominator,adult_icu_bed_utilization,staffed_adult_icu_bed_occupancy,inpatient_bed_covid_utilization_numerator,inpatient_beds_used_coverage,total_adult_patients_hospitalized_confirmed_covid_coverage,deaths_covid_coverage,adult_icu_bed_covid_utilization_numerator,adult_icu_bed_covid_utilization,adult_icu_bed_utilization_numerator,total_pediatric_patients_hospitalized_confirmed_covid_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid,inpatient_beds_used,percent_of_inpatients_with_covid_denominator,inpatient_beds_used_covid_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid,inpatient_bed_covid_utilization_coverage,inpatient_beds_used_covid,total_staffed_adult_icu_beds,inpatient_beds_utilization_denominator,adult_icu_bed_covid_utilization_denominator,percent_of_inpatients_with_covid_coverage,percent_of_inpatients_with_covid,inpatient_beds_utilization,inpatient_beds_utilization_numerator,deaths_covid,inpatient_beds,staffed_adult_icu_bed_occupancy_coverage,total_staffed_adult_icu_beds_coverage,adult_icu_bed_covid_utilization_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid_coverage,adult_icu_bed_utilization_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid_coverage,inpatient_beds_utilization_coverage,inpatient_bed_covid_utilization,percent_of_inpatients_with_covid_numerator,total_adult_patients_hospitalized_confirmed_covid
AK,2020-03-23,39,0,1,,,56.0,,,3.0,1,0,1,,,,0,,21.0,21.0,1,,1.0,3.0,,56.0,,1.0,0.142857,0.375000,21.0,0.0,56.0,0,0,,0,,0,1.0,0.053571,3.0,
AK,2020-03-24,43,0,1,,,56.0,,,3.0,1,0,1,,,,0,,20.0,20.0,1,,1.0,3.0,,56.0,,1.0,0.150000,0.357143,20.0,0.0,56.0,0,0,,0,,0,1.0,0.053571,3.0,
AK,2020-03-25,50,1,1,,,56.0,,,1.0,1,0,1,,,,0,,15.0,15.0,1,,1.0,1.0,,56.0,,1.0,0.066667,0.267857,15.0,0.0,56.0,0,0,,0,,0,1.0,0.017857,1.0,
AK,2020-03-26,64,1,1,,,56.0,,,2.0,1,0,1,,,,0,,16.0,16.0,1,,1.0,2.0,,56.0,,1.0,0.125000,0.285714,16.0,0.0,56.0,0,0,,0,,0,1.0,0.035714,2.0,
AK,2020-03-27,75,1,2,,,81.0,,,1.0,2,0,2,,,,0,,23.0,23.0,2,,2.0,1.0,,81.0,,2.0,0.043478,0.283951,23.0,0.0,81.0,0,0,,0,,0,2.0,0.012346,1.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
WY,2021-08-10,67326,793,29,136.0,0.0,1550.0,0.455882,62.0,112.0,29,29,27,21.0,0.168000,62.0,27,112.0,734.0,722.0,27,0.0,27.0,112.0,136.0,1610.0,125.0,27.0,0.155125,0.455901,734.0,2.0,1610.0,29,29,27.0,27,29.0,27,29.0,0.072258,112.0,100.0
WY,2021-08-11,67582,793,31,137.0,0.0,1695.0,0.459854,63.0,111.0,31,31,29,26.0,0.206349,63.0,29,111.0,837.0,822.0,29,0.0,29.0,111.0,137.0,1755.0,126.0,29.0,0.135036,0.476923,837.0,2.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.065487,111.0,97.0
WY,2021-08-12,67957,793,31,136.0,0.0,1695.0,0.463235,63.0,105.0,31,31,29,25.0,0.200000,63.0,29,105.0,852.0,836.0,29,0.0,29.0,105.0,136.0,1755.0,125.0,29.0,0.125598,0.485470,852.0,1.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.061947,105.0,99.0
WY,2021-08-13,68272,793,31,137.0,0.0,1695.0,0.401460,55.0,107.0,31,31,29,27.0,0.214286,55.0,29,107.0,838.0,830.0,29,0.0,29.0,107.0,137.0,1755.0,126.0,29.0,0.128916,0.477493,838.0,1.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.063127,107.0,99.0


In [29]:
STATES = {}

for state in unique_states:
    ## Select this state's rows; .loc on the outer index level
    ## leaves a dataframe indexed by Date only
    df_merged_ts = df_states.loc[state]

    ## Store an independent copy for each state
    STATES[state] = df_merged_ts.copy()

STATES.keys()

dict_keys(['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA', 'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME', 'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'PR', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY'])
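
The loop above builds one dataframe per state by slicing the `(State_Code, Date)` MultiIndex. The following is a minimal sketch of that pattern on a toy frame, not the notebook's actual data: the values are made up, and the names `toy`, `toy_states`, `toy_dict`, and `toy_dict_gb` are hypothetical. It shows that `.loc[state]` drops the outer index level, and that a `groupby` on that level is one equivalent way to build the same dictionary.

```python
import pandas as pd

# Toy stand-in for the combined dataframe (made-up values, illustration only)
toy = pd.DataFrame({
    "State_Code": ["AK", "AK", "WY", "WY"],
    "Date": pd.to_datetime(
        ["2020-03-23", "2020-03-24", "2020-03-23", "2020-03-24"]
    ),
    "Cases": [39, 43, 10, 12],
    "Deaths": [0, 0, 0, 1],
})
toy_states = toy.set_index(["State_Code", "Date"]).sort_index()

# Same pattern as the loop above: one Date-indexed frame per state
toy_dict = {
    state: toy_states.loc[state].copy()
    for state in toy["State_Code"].unique()
}

# Equivalent construction using groupby on the outer index level
toy_dict_gb = {
    state: grp.droplevel("State_Code")
    for state, grp in toy_states.groupby(level="State_Code")
}
```

Either form yields frames whose only index is `Date`, which is what the later CSV export relies on.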

In [30]:
STATES['NY']

Date,Cases,Deaths,inpatient_beds_coverage,adult_icu_bed_utilization_denominator,total_pediatric_patients_hospitalized_confirmed_covid,inpatient_bed_covid_utilization_denominator,adult_icu_bed_utilization,staffed_adult_icu_bed_occupancy,inpatient_bed_covid_utilization_numerator,inpatient_beds_used_coverage,total_adult_patients_hospitalized_confirmed_covid_coverage,deaths_covid_coverage,adult_icu_bed_covid_utilization_numerator,adult_icu_bed_covid_utilization,adult_icu_bed_utilization_numerator,total_pediatric_patients_hospitalized_confirmed_covid_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid,inpatient_beds_used,percent_of_inpatients_with_covid_denominator,inpatient_beds_used_covid_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid,inpatient_bed_covid_utilization_coverage,inpatient_beds_used_covid,total_staffed_adult_icu_beds,inpatient_beds_utilization_denominator,adult_icu_bed_covid_utilization_denominator,percent_of_inpatients_with_covid_coverage,percent_of_inpatients_with_covid,inpatient_beds_utilization,inpatient_beds_utilization_numerator,deaths_covid,inpatient_beds,staffed_adult_icu_bed_occupancy_coverage,total_staffed_adult_icu_beds_coverage,adult_icu_bed_covid_utilization_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid_coverage,adult_icu_bed_utilization_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid_coverage,inpatient_beds_utilization_coverage,inpatient_bed_covid_utilization,percent_of_inpatients_with_covid_numerator,total_adult_patients_hospitalized_confirmed_covid
2020-03-14,557,5,2,,,2145.0,,,6.0,1,0,1,,,,0,,68.0,68.0,2,,2.0,6.0,,75.0,,1.0,0.000000,0.906667,68.0,0.0,2145.0,0,0,,0,,0,1.0,0.002797,0.0,
2020-03-15,633,10,3,,,1960.0,,,88.0,3,0,3,,,,0,,1599.0,1599.0,3,,3.0,88.0,,1960.0,,3.0,0.055034,0.815816,1599.0,0.0,1960.0,0,0,,0,,0,3.0,0.044898,88.0,
2020-03-16,961,21,4,,,2161.0,,,97.0,4,0,4,,,,0,,1676.0,1676.0,4,,4.0,97.0,,2161.0,,4.0,0.057876,0.775567,1676.0,2.0,2161.0,0,0,,0,,0,4.0,0.044887,97.0,
2020-03-17,1407,35,4,,,2161.0,,,128.0,4,0,4,,,,0,,1682.0,1682.0,4,,4.0,128.0,,2161.0,,4.0,0.076100,0.778343,1682.0,1.0,2161.0,0,0,,0,,0,4.0,0.059232,128.0,
2020-03-18,2507,60,4,,,2161.0,,,174.0,4,0,4,,,,0,,1599.0,1599.0,4,,4.0,174.0,,2161.0,,4.0,0.108818,0.739935,1599.0,4.0,2161.0,0,0,,0,,0,4.0,0.080518,174.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-08-10,2180696,53758,181,5229.0,24.0,46807.0,0.660164,3452.0,1892.0,181,181,172,348.0,0.067902,3452.0,172,1830.0,36516.0,36185.0,172,62.0,172.0,1892.0,5229.0,47389.0,5125.0,172.0,0.052287,0.770559,36516.0,15.0,47389.0,181,181,172.0,172,181.0,172,181.0,0.040421,1892.0,1385.0
2021-08-11,2187349,53797,209,5233.0,29.0,50765.0,0.659851,3453.0,1971.0,209,209,200,352.0,0.068629,3453.0,200,1906.0,40434.0,40096.0,200,65.0,200.0,1971.0,5233.0,51347.0,5129.0,200.0,0.049157,0.787466,40434.0,17.0,51347.0,209,209,200.0,200,209.0,200,209.0,0.038826,1971.0,1475.0
2021-08-12,2192224,53828,209,5229.0,29.0,50577.0,0.663224,3468.0,2073.0,209,209,200,367.0,0.071610,3468.0,200,2013.0,40392.0,40064.0,200,60.0,200.0,2073.0,5229.0,51160.0,5125.0,200.0,0.051742,0.789523,40392.0,17.0,51160.0,209,209,200.0,200,209.0,200,209.0,0.040987,2073.0,1563.0
2021-08-13,2196866,53840,206,5219.0,32.0,50452.0,0.657597,3432.0,2170.0,206,206,197,396.0,0.077419,3432.0,197,2104.0,39819.0,39536.0,197,66.0,197.0,2170.0,5219.0,51034.0,5115.0,197.0,0.054887,0.780245,39819.0,14.0,51034.0,206,206,197.0,197,206.0,197,206.0,0.043011,2170.0,1651.0


### Test individual state before making loop

In [31]:
## Saving CSVs
DATA_FOLDER = os.path.join(fpath_clean,'state_data/')
os.makedirs(DATA_FOLDER,exist_ok=True)
os.listdir(DATA_FOLDER)

['combined_data_PR.csv.gz',
 'combined_data_FL.csv.gz',
 'combined_data_NV.csv.gz',
 'combined_data_MD.csv.gz',
 'combined_data_KS.csv.gz',
 'combined_data_IA.csv.gz',
 'combined_data_WI.csv.gz',
 'combined_data_ND.csv.gz',
 'combined_data_KY.csv.gz',
 'combined_data_NH.csv.gz',
 'combined_data_MN.csv.gz',
 'combined_data_OR.csv.gz',
 'combined_data_AK.csv.gz',
 'combined_data_WY.csv.gz',
 'combined_data_CO.csv.gz',
 'combined_data_WA.csv.gz',
 'combined_data_SD.csv.gz',
 'combined_data_CA.csv.gz',
 'combined_data_VA.csv.gz',
 'combined_data_NJ.csv.gz',
 'combined_data_MT.csv.gz',
 'combined_data_HI.csv.gz',
 'combined_data_OH.csv.gz',
 'combined_data_NC.csv.gz',
 'combined_data_IN.csv.gz',
 'combined_data_MI.csv.gz',
 'combined_data_ME.csv.gz',
 'combined_data_AL.csv.gz',
 'combined_data_DE.csv.gz',
 'combined_data_TX.csv.gz',
 'combined_data_TN.csv.gz',
 'combined_data_GA.csv.gz',
 'combined_data_SC.csv.gz',
 'combined_data_UT.csv.gz',
 'combined_data_AZ.csv.gz',
 'combined_data_NM.c

In [32]:
df_states

State_Code,Date,Cases,Deaths,inpatient_beds_coverage,adult_icu_bed_utilization_denominator,total_pediatric_patients_hospitalized_confirmed_covid,inpatient_bed_covid_utilization_denominator,adult_icu_bed_utilization,staffed_adult_icu_bed_occupancy,inpatient_bed_covid_utilization_numerator,inpatient_beds_used_coverage,total_adult_patients_hospitalized_confirmed_covid_coverage,deaths_covid_coverage,adult_icu_bed_covid_utilization_numerator,adult_icu_bed_covid_utilization,adult_icu_bed_utilization_numerator,total_pediatric_patients_hospitalized_confirmed_covid_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid,inpatient_beds_used,percent_of_inpatients_with_covid_denominator,inpatient_beds_used_covid_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid,inpatient_bed_covid_utilization_coverage,inpatient_beds_used_covid,total_staffed_adult_icu_beds,inpatient_beds_utilization_denominator,adult_icu_bed_covid_utilization_denominator,percent_of_inpatients_with_covid_coverage,percent_of_inpatients_with_covid,inpatient_beds_utilization,inpatient_beds_utilization_numerator,deaths_covid,inpatient_beds,staffed_adult_icu_bed_occupancy_coverage,total_staffed_adult_icu_beds_coverage,adult_icu_bed_covid_utilization_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid_coverage,adult_icu_bed_utilization_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid_coverage,inpatient_beds_utilization_coverage,inpatient_bed_covid_utilization,percent_of_inpatients_with_covid_numerator,total_adult_patients_hospitalized_confirmed_covid
AK,2020-03-23,39,0,1,,,56.0,,,3.0,1,0,1,,,,0,,21.0,21.0,1,,1.0,3.0,,56.0,,1.0,0.142857,0.375000,21.0,0.0,56.0,0,0,,0,,0,1.0,0.053571,3.0,
AK,2020-03-24,43,0,1,,,56.0,,,3.0,1,0,1,,,,0,,20.0,20.0,1,,1.0,3.0,,56.0,,1.0,0.150000,0.357143,20.0,0.0,56.0,0,0,,0,,0,1.0,0.053571,3.0,
AK,2020-03-25,50,1,1,,,56.0,,,1.0,1,0,1,,,,0,,15.0,15.0,1,,1.0,1.0,,56.0,,1.0,0.066667,0.267857,15.0,0.0,56.0,0,0,,0,,0,1.0,0.017857,1.0,
AK,2020-03-26,64,1,1,,,56.0,,,2.0,1,0,1,,,,0,,16.0,16.0,1,,1.0,2.0,,56.0,,1.0,0.125000,0.285714,16.0,0.0,56.0,0,0,,0,,0,1.0,0.035714,2.0,
AK,2020-03-27,75,1,2,,,81.0,,,1.0,2,0,2,,,,0,,23.0,23.0,2,,2.0,1.0,,81.0,,2.0,0.043478,0.283951,23.0,0.0,81.0,0,0,,0,,0,2.0,0.012346,1.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
WY,2021-08-10,67326,793,29,136.0,0.0,1550.0,0.455882,62.0,112.0,29,29,27,21.0,0.168000,62.0,27,112.0,734.0,722.0,27,0.0,27.0,112.0,136.0,1610.0,125.0,27.0,0.155125,0.455901,734.0,2.0,1610.0,29,29,27.0,27,29.0,27,29.0,0.072258,112.0,100.0
WY,2021-08-11,67582,793,31,137.0,0.0,1695.0,0.459854,63.0,111.0,31,31,29,26.0,0.206349,63.0,29,111.0,837.0,822.0,29,0.0,29.0,111.0,137.0,1755.0,126.0,29.0,0.135036,0.476923,837.0,2.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.065487,111.0,97.0
WY,2021-08-12,67957,793,31,136.0,0.0,1695.0,0.463235,63.0,105.0,31,31,29,25.0,0.200000,63.0,29,105.0,852.0,836.0,29,0.0,29.0,105.0,136.0,1755.0,125.0,29.0,0.125598,0.485470,852.0,1.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.061947,105.0,99.0
WY,2021-08-13,68272,793,31,137.0,0.0,1695.0,0.401460,55.0,107.0,31,31,29,27.0,0.214286,55.0,29,107.0,838.0,830.0,29,0.0,29.0,107.0,137.0,1755.0,126.0,29.0,0.128916,0.477493,838.0,1.0,1755.0,31,31,29.0,29,31.0,29,31.0,0.063127,107.0,99.0


In [33]:
STATES = {}

for state in unique_states:
    ## Select this state's rows (indexed by Date only)
    df_state = df_states.loc[state].copy()

    ## Save each state as its own gzip-compressed csv
    fpath_state = os.path.join(DATA_FOLDER, f"combined_data_{state}.csv.gz")
    df_state.to_csv(fpath_state, compression='gzip')

    STATES[state] = df_state.copy()

STATES.keys()

dict_keys(['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA', 'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME', 'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'PR', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY'])
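
Each state is written to its own gzip-compressed CSV with `Date` as the index. As a minimal sketch of reading one of these files back, the example below round-trips a toy frame through a temporary folder. The values are made up, and `toy_state`, `fpath`, and the `XX` state code are hypothetical placeholders rather than names from the notebook.

```python
import os
import tempfile

import pandas as pd

# Toy Date-indexed frame standing in for one state's data (made-up values)
toy_state = pd.DataFrame(
    {"Cases": [39, 43], "Deaths": [0, 0]},
    index=pd.to_datetime(["2020-03-23", "2020-03-24"]).rename("Date"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    fpath = os.path.join(tmp_dir, "combined_data_XX.csv.gz")
    toy_state.to_csv(fpath, compression="gzip")

    # On read, pandas infers gzip compression from the .gz extension;
    # index_col/parse_dates restore the datetime index
    reloaded = pd.read_csv(fpath, index_col="Date", parse_dates=["Date"])
```

Passing `index_col` and `parse_dates` matters here: without them, `Date` comes back as an ordinary string column rather than a datetime index.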

In [34]:
STATES['MD']

Date,Cases,Deaths,inpatient_beds_coverage,adult_icu_bed_utilization_denominator,total_pediatric_patients_hospitalized_confirmed_covid,inpatient_bed_covid_utilization_denominator,adult_icu_bed_utilization,staffed_adult_icu_bed_occupancy,inpatient_bed_covid_utilization_numerator,inpatient_beds_used_coverage,total_adult_patients_hospitalized_confirmed_covid_coverage,deaths_covid_coverage,adult_icu_bed_covid_utilization_numerator,adult_icu_bed_covid_utilization,adult_icu_bed_utilization_numerator,total_pediatric_patients_hospitalized_confirmed_covid_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid,inpatient_beds_used,percent_of_inpatients_with_covid_denominator,inpatient_beds_used_covid_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid,inpatient_bed_covid_utilization_coverage,inpatient_beds_used_covid,total_staffed_adult_icu_beds,inpatient_beds_utilization_denominator,adult_icu_bed_covid_utilization_denominator,percent_of_inpatients_with_covid_coverage,percent_of_inpatients_with_covid,inpatient_beds_utilization,inpatient_beds_utilization_numerator,deaths_covid,inpatient_beds,staffed_adult_icu_bed_occupancy_coverage,total_staffed_adult_icu_beds_coverage,adult_icu_bed_covid_utilization_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid_coverage,adult_icu_bed_utilization_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid_coverage,inpatient_beds_utilization_coverage,inpatient_bed_covid_utilization,percent_of_inpatients_with_covid_numerator,total_adult_patients_hospitalized_confirmed_covid
2020-03-01,0,0,1,,,6.0,,,0.0,1,0,1,,,,0,,4.0,4.0,1,,1.0,0.0,,6.0,,1.0,0.000000,0.666667,4.0,0.0,6.0,0,0,,0,,0,1.0,0.000000,0.0,
2020-03-02,0,0,1,,,6.0,,,0.0,1,0,1,,,,0,,4.0,4.0,1,,1.0,0.0,,6.0,,1.0,0.000000,0.666667,4.0,0.0,6.0,0,0,,0,,0,1.0,0.000000,0.0,
2020-03-03,0,0,1,,,6.0,,,0.0,1,0,1,,,,0,,4.0,4.0,1,,1.0,0.0,,6.0,,1.0,0.000000,0.666667,4.0,6.0,6.0,0,0,,0,,0,1.0,0.000000,0.0,
2020-03-03,0,0,1,,,6.0,,,0.0,1,0,1,,,,0,,4.0,4.0,1,,1.0,0.0,,6.0,,1.0,0.000000,0.666667,4.0,6.0,6.0,0,0,,0,,0,1.0,0.000000,0.0,
2020-03-04,0,0,1,,,6.0,,,0.0,1,0,1,,,,0,,4.0,4.0,1,,1.0,0.0,,6.0,,1.0,0.000000,0.666667,4.0,0.0,6.0,0,0,,0,,0,1.0,0.000000,0.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-08-10,475184,9869,53,1307.0,5.0,10298.0,0.683244,893.0,645.0,53,53,52,139.0,0.109535,893.0,52,630.0,8520.0,8460.0,52,15.0,52.0,645.0,1307.0,10402.0,1269.0,52.0,0.076241,0.819073,8520.0,0.0,10402.0,53,53,52.0,52,53.0,52,53.0,0.062634,645.0,444.0
2021-08-11,476070,9870,60,1287.0,6.0,11395.0,0.707848,911.0,689.0,60,60,59,144.0,0.115292,911.0,59,673.0,9649.0,9589.0,59,16.0,59.0,689.0,1287.0,11499.0,1249.0,59.0,0.071853,0.839116,9649.0,3.0,11499.0,60,60,59.0,59,60.0,59,60.0,0.060465,689.0,475.0
2021-08-12,477117,9873,60,1293.0,6.0,11435.0,0.694509,898.0,688.0,60,60,59,149.0,0.118725,898.0,59,674.0,9603.0,9543.0,59,14.0,59.0,688.0,1293.0,11539.0,1255.0,59.0,0.072095,0.832221,9603.0,5.0,11539.0,60,60,59.0,59,60.0,59,60.0,0.060166,688.0,482.0
2021-08-13,478067,9878,60,1310.0,6.0,11513.0,0.698473,915.0,718.0,60,60,59,144.0,0.113208,915.0,59,708.0,9701.0,9648.0,59,10.0,59.0,718.0,1310.0,11617.0,1272.0,59.0,0.074420,0.835069,9701.0,3.0,11617.0,60,60,59.0,59,60.0,59,60.0,0.062364,718.0,506.0
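
The `STATES['MD']` preview above lists 2020-03-03 twice, which suggests the `Date` index may contain repeated entries. Below is a minimal sketch of how one might check for this, using a toy frame with made-up values. The names `toy_state`, `dup_mask`, and `deduped` are hypothetical, and keeping the first row per date is an assumption for illustration, not a step taken in this notebook.

```python
import pandas as pd

# Toy frame with one repeated date (made-up values, illustration only)
toy_state = pd.DataFrame(
    {"Cases": [0, 0, 0, 0]},
    index=pd.to_datetime(
        ["2020-03-02", "2020-03-03", "2020-03-03", "2020-03-04"]
    ).rename("Date"),
)

# True for every row whose Date already appeared earlier in the index
dup_mask = toy_state.index.duplicated(keep="first")
n_dups = int(dup_mask.sum())

# One possible fix (an assumption): keep only the first row per date
deduped = toy_state[~dup_mask]
```

Whether dropping, summing, or otherwise reconciling repeated dates is appropriate depends on how the duplicates arose upstream, so it is worth inspecting the repeated rows before choosing.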


## Saving Data

In [35]:
import joblib
joblib.dump(STATES,os.path.join(fpath_clean,'STATE_DICT.joblib'))

['data/STATE_DICT.joblib']

In [36]:
STATES_LOADED = joblib.load(os.path.join(fpath_clean,'STATE_DICT.joblib'))
STATES_LOADED['TX']

Date,Cases,Deaths,inpatient_beds_coverage,adult_icu_bed_utilization_denominator,total_pediatric_patients_hospitalized_confirmed_covid,inpatient_bed_covid_utilization_denominator,adult_icu_bed_utilization,staffed_adult_icu_bed_occupancy,inpatient_bed_covid_utilization_numerator,inpatient_beds_used_coverage,total_adult_patients_hospitalized_confirmed_covid_coverage,deaths_covid_coverage,adult_icu_bed_covid_utilization_numerator,adult_icu_bed_covid_utilization,adult_icu_bed_utilization_numerator,total_pediatric_patients_hospitalized_confirmed_covid_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid,inpatient_beds_used,percent_of_inpatients_with_covid_denominator,inpatient_beds_used_covid_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid,inpatient_bed_covid_utilization_coverage,inpatient_beds_used_covid,total_staffed_adult_icu_beds,inpatient_beds_utilization_denominator,adult_icu_bed_covid_utilization_denominator,percent_of_inpatients_with_covid_coverage,percent_of_inpatients_with_covid,inpatient_beds_utilization,inpatient_beds_utilization_numerator,deaths_covid,inpatient_beds,staffed_adult_icu_bed_occupancy_coverage,total_staffed_adult_icu_beds_coverage,adult_icu_bed_covid_utilization_coverage,total_adult_patients_hospitalized_confirmed_and_suspected_covid_coverage,adult_icu_bed_utilization_coverage,total_pediatric_patients_hospitalized_confirmed_and_suspected_covid_coverage,inpatient_beds_utilization_coverage,inpatient_bed_covid_utilization,percent_of_inpatients_with_covid_numerator,total_adult_patients_hospitalized_confirmed_covid
2020-01-22,0,0,3,,,82.0,,,0.0,2,0,1,,,,0,,51.0,51.0,2,,2.0,0.0,,82.0,,2.0,0.000000,0.621951,51.0,0.0,808.0,0,0,,0,,0,2.0,0.000000,0.0,
2020-01-23,0,0,3,,,88.0,,,0.0,2,0,1,,,,0,,58.0,58.0,2,,2.0,0.0,,88.0,,2.0,0.000000,0.659091,58.0,0.0,814.0,0,0,,0,,0,2.0,0.000000,0.0,
2020-01-24,0,0,3,,,75.0,,,0.0,2,0,1,,,,0,,49.0,49.0,2,,2.0,0.0,,75.0,,2.0,0.000000,0.653333,49.0,0.0,801.0,0,0,,0,,0,2.0,0.000000,0.0,
2020-01-25,0,0,3,,,62.0,,,0.0,2,0,1,,,,0,,36.0,36.0,2,,2.0,0.0,,62.0,,2.0,0.000000,0.580645,36.0,0.0,788.0,0,0,,0,,0,2.0,0.000000,0.0,
2020-01-25,0,0,3,,,62.0,,,0.0,2,0,1,,,,0,,36.0,36.0,2,,2.0,0.0,,62.0,,2.0,0.000000,0.580645,36.0,0.0,788.0,0,0,,0,,0,2.0,0.000000,0.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-08-10,3272462,53831,508,6832.0,235.0,61363.0,0.927547,6337.0,11145.0,508,508,503,2763.0,0.415551,6337.0,503,10841.0,49765.0,49125.0,503,304.0,503.0,11145.0,6832.0,62133.0,6649.0,503.0,0.226870,0.800943,49765.0,105.0,62133.0,508,508,503.0,503,508.0,503,508.0,0.181624,11145.0,10378.0
2021-08-11,3293869,53939,596,6749.0,245.0,66973.0,0.943103,6365.0,11351.0,596,596,591,2876.0,0.438415,6365.0,591,11038.0,53673.0,53027.0,591,313.0,591.0,11351.0,6749.0,67746.0,6560.0,591.0,0.214061,0.792268,53673.0,143.0,67746.0,596,596,591.0,591,596.0,591,596.0,0.169486,11351.0,10628.0
2021-08-12,3306267,54057,596,6844.0,260.0,67362.0,0.947545,6485.0,12016.0,596,596,591,2928.0,0.440499,6485.0,591,11691.0,54321.0,53665.0,591,325.0,591.0,12016.0,6844.0,68140.0,6647.0,591.0,0.223908,0.797197,54321.0,130.0,68140.0,596,596,591.0,591,596.0,591,596.0,0.178380,12016.0,11157.0
2021-08-13,3323844,54202,596,7127.0,244.0,68635.0,0.911604,6497.0,11961.0,596,596,591,2939.0,0.424221,6497.0,591,11652.0,54410.0,53797.0,591,309.0,591.0,11961.0,7127.0,69411.0,6928.0,591.0,0.222336,0.783882,54410.0,131.0,69411.0,596,596,591.0,591,596.0,591,596.0,0.174270,11961.0,11056.0
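
The cells above dump the dictionary of state dataframes with `joblib` and load it back. As a minimal sketch of confirming that such a round trip preserves both the keys and the data, the example below uses a toy dictionary with made-up values. The names `toy_dict`, `toy_loaded`, `same_keys`, and `same_frames` are hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd

# Toy dictionary of Date-indexed frames (made-up values, illustration only)
dates = pd.to_datetime(["2020-03-23", "2020-03-24"]).rename("Date")
toy_dict = {
    "AK": pd.DataFrame({"Cases": [39, 43], "Deaths": [0, 0]}, index=dates),
    "WY": pd.DataFrame({"Cases": [10, 12], "Deaths": [0, 1]}, index=dates),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    fpath = os.path.join(tmp_dir, "STATE_DICT.joblib")
    joblib.dump(toy_dict, fpath)
    toy_loaded = joblib.load(fpath)

# Compare keys, then compare each frame's values and index
same_keys = sorted(toy_loaded) == sorted(toy_dict)
same_frames = all(toy_loaded[k].equals(toy_dict[k]) for k in toy_dict)
```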


# Summary

- By the end of the workflow, the following files should be created:
    - data_raw:
    - `'us_states_cases_deaths.csv'`
    - `'combined_us_states_full_data.csv'`
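
A quick way to confirm the expected outputs exist is to test each filename against its folder. The sketch below assumes a hypothetical helper, `missing_files`, and takes the folder as an argument because the list above does not fully specify where each file is written. It is demonstrated on a temporary folder rather than the real data directories.

```python
import os
import tempfile


def missing_files(folder, filenames):
    """Return the names from `filenames` that are not files inside `folder`."""
    return [
        name for name in filenames
        if not os.path.isfile(os.path.join(folder, name))
    ]


expected = ["us_states_cases_deaths.csv", "combined_us_states_full_data.csv"]

# Demonstration on a temporary folder containing only the first file
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, expected[0]), "w").close()
    still_missing = missing_files(tmp_dir, expected)
```

In the notebook, the same helper could be called with `fpath_raw` or `fpath_clean` as the folder, depending on where each file is actually saved.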