CSV dataset source: https://gis.cdc.gov/grasp/COVIDNet/COVID19_5.html

Citation: "COVID-NET: COVID-19-Associated Hospitalization Surveillance Network, Centers for Disease Control and Prevention. WEBSITE. Accessed on November 16, 2020."

The dataset is updated weekly.

# COVID-NET Hospitalization Dataset EDA

This dataset covers confirmed or probable COVID-19-associated hospitalizations in select areas.

The description below is taken from: https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covid-net/purpose-methods.html

## How COVID-NET Hospitalization Data Is Different from Hospitalizations Reported in National and State Case Counts
COVID-NET differs from hospitalizations reported in national and state case counts in two ways. First, state and national COVID-19 case reporting are based on all people who test positive for COVID-19 in the United States. COVID-NET is limited to COVID-19-associated hospitalizations captured in the COVID-NET surveillance area. Second, COVID-NET reports rates and not just counts. These rates show how many people are hospitalized with COVID-19 in the surveillance area, compared to the entire number of people residing in that area.
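
To make the count-versus-rate distinction concrete, here is a minimal sketch of the arithmetic. The numbers are hypothetical and are not COVID-NET figures. The helper name `rate_per_100k` is made up for illustration, and scaling to 100,000 residents is an assumption about the reporting convention rather than something stated in the description above.

```python
def rate_per_100k(hospitalizations: int, population: int) -> float:
    """Hospitalizations per 100,000 residents of a surveillance area."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact for these inputs.
    return hospitalizations * 100_000 / population

# Hypothetical areas: same hospitalization count, different populations.
area_a = rate_per_100k(500, 1_000_000)
area_b = rate_per_100k(500, 250_000)

print(area_a, area_b)
```

The two areas share a count of 500, yet the smaller area has four times the rate, which is why a rate is easier to compare across areas than a raw count.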

### Loading in the dataset

In [2]:
#import libraries

import pandas as pd
import matplotlib.pyplot as plt

In [3]:
FILENAME = 'Covid19Phase5Data/Characteristics.csv'

# read the file into a dataframe
# encoding: the default UTF-8 decoding raises an error on this file
# skiprows: the first 2 rows are just titles
# skipfooter: the last 4 rows are just footnotes
# engine: skipfooter is only supported by the python parser,
#         so request it explicitly to avoid a ParserWarning
df = pd.read_csv(FILENAME, skiprows=2, skipfooter=4,
                 encoding="ISO-8859-1", engine="python")

#display overview of dataset
df

| | Primary Strata | Primary Strata Name | Secondary Strata | Secondary Strata Name | Count | Percent |
|---|---|---|---|---|---|---|
| 0 | Age | 0-4 yr | Sex | Male | 241.0 | 55.8 |
| 1 | Age | 0-4 yr | Sex | Female | 191.0 | 44.2 |
| 2 | Age | 0-4 yr | Race/Ethnicity | White | 64.0 | 15.5 |
| 3 | Age | 0-4 yr | Race/Ethnicity | Black | 125.0 | 30.3 |
| 4 | Age | 0-4 yr | Race/Ethnicity | Hispanic/Latino | 172.0 | 41.6 |
| ... | ... | ... | ... | ... | ... | ... |
| 1053 | Mechanical ventilation | No | Race/Ethnicity | Black | NaN | 33.2 |
| 1054 | Mechanical ventilation | No | Race/Ethnicity | Hispanic/Latino | NaN | 22.1 |
| 1055 | Mechanical ventilation | No | Race/Ethnicity | Asian/Pacific Islander | NaN | 4.3 |
| 1056 | Mechanical ventilation | No | Race/Ethnicity | American Indian/Alaska Native | NaN | 1.3 |
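
The loading step can be tried out without the real file. The sketch below builds a small synthetic CSV in memory with the same layout the comments describe (2 title rows, a header, data rows, 4 footnote rows). The contents of `raw` are invented for illustration, and the `encoding` argument is left out because the text is already decoded.

```python
import io

import pandas as pd

# Synthetic stand-in for the real file; every line here is made up.
raw = "\n".join([
    "Title line 1",
    "Title line 2",
    "Primary Strata,Primary Strata Name,Count,Percent",
    "Age,0-4 yr,241,55.8",
    "Age,5-17 yr,,44.2",
    "Footnote 1",
    "Footnote 2",
    "Footnote 3",
    "Footnote 4",
])

# skipfooter requires the python parsing engine.
demo = pd.read_csv(io.StringIO(raw), skiprows=2, skipfooter=4, engine="python")

print(demo.shape)
print(demo)
```

Only the two data rows survive, and the empty `Count` field is read as NaN, which matches how missing counts appear in the real table.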


### Looking at the categorical variables "Primary Strata" and "Secondary Strata"

In [4]:
#determine what the categorical variables are and what values they take on

#primary_strata is a pandas.Series
#index is the categorical variable
#value is an array of the unique values that categorical variable takes on

primary_strata = df.groupby('Primary Strata')['Primary Strata Name'].unique()
primary_strata

Primary Strata
Age                       [0-4 yr, 5-17 yr, 18-49 yr, 50-64 yr, 65+ yr, ...
In-hospital death                                  [Yes       , No        ]
Intensive care unit                                [Yes       , No        ]
Mechanical ventilation                             [Yes       , No        ]
Race/Ethnicity            [White, Black, Hispanic/Latino, Asian/Pacific ...
Sex                                                          [Male, Female]
Name: Primary Strata Name, dtype: object
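
The output above shows that the Yes/No labels are padded with trailing spaces (`Yes       `). A padded label will not compare equal to `'Yes'`, so it may be worth stripping whitespace before filtering. A minimal sketch on a made-up miniature of the column (the `demo` frame is hypothetical, not the real data):

```python
import pandas as pd

# Hypothetical miniature of the 'Primary Strata Name' column with padded labels.
demo = pd.DataFrame({
    "Primary Strata": ["Intensive care unit", "Intensive care unit", "Sex"],
    "Primary Strata Name": ["Yes       ", "No        ", "Male"],
})

# With padding in place, an exact comparison matches nothing.
naive_matches = int((demo["Primary Strata Name"] == "Yes").sum())

# Strip leading and trailing whitespace from the string column.
demo["Primary Strata Name"] = demo["Primary Strata Name"].str.strip()
clean_matches = int((demo["Primary Strata Name"] == "Yes").sum())

print(naive_matches, clean_matches)
```

On the real dataframe the same `.str.strip()` call would presumably apply to both `Primary Strata Name` and `Secondary Strata Name`.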

In [5]:
#looking at a specific primary strata
#let's look at all values that race/ethnicity can be
print(primary_strata['Race/Ethnicity'])

['White' 'Black' 'Hispanic/Latino' 'Asian/Pacific Islander'
 'American Indian/Alaska Native' 'Other']


In [6]:
#secondary_strata is a pandas.Series
#index is the categorical variable
#value is an array of the unique values that categorical variable takes on

secondary_strata = df.groupby('Secondary Strata')['Secondary Strata Name'].unique()
secondary_strata

Secondary Strata
Abdominal Pain                                                      [Yes       , No        ]
Acute renal failure/acute kidney injury                             [Yes       , No        ]
Acute respiratory distress syndrome                                 [Yes       , No        ]
Acute respiratory failure                                           [Yes       , No        ]
Age                                                             [18-49 yr, 50-64 yr, 65+ yr]
Altered mental status/confusion                                     [Yes       , No        ]
Anosmia/decreased smell                                             [Yes       , No        ]
Asthma                                                              [Yes       , No        ]
COPD/emphysema                                                      [Yes       , No        ]
Chest pain                                                          [Yes       , No        ]
Chronic kidney disease                               

In [7]:
#Now see what combos of "Primary Strata" and "Secondary Strata" are possible

strata_combos = df.groupby('Primary Strata')['Secondary Strata'].unique()
strata_combos

Primary Strata
Age                       [Sex, Race/Ethnicity, In-hospital death, Inten...
In-hospital death                                [Age, Sex, Race/Ethnicity]
Intensive care unit                              [Age, Sex, Race/Ethnicity]
Mechanical ventilation                           [Age, Sex, Race/Ethnicity]
Race/Ethnicity            [Age, Sex, In-hospital death, Intensive care u...
Sex                       [Age, Race/Ethnicity, In-hospital death, Inten...
Name: Secondary Strata, dtype: object

In [12]:
#take a peek at all secondary strata that can be paired with the Primary Strata "Sex"
print(strata_combos['Sex'])

['Age' 'Race/Ethnicity' 'In-hospital death' 'Intensive care unit'
 'Mechanical ventilation' 'Asthma' 'COPD/emphysema' 'Diabetes'
 'Coronary artery disease' 'Heart failure' 'Hypertension' 'Obesity'
 'Chronic kidney disease' 'Abdominal Pain'
 'Altered mental status/confusion' 'Anosmia/decreased smell' 'Chest pain'
 'Congested/runny nose' 'Cough' 'Diarrhea' 'Dysgeusia/decreased taste'
 'Fever/chills' 'Headache' 'Hemoptysis/bloody sputum'
 'Muscle aches/myalgias' 'Nausea/vomiting' 'Shortness of breath'
 'Sore throat' 'Wheezing' 'Acute renal failure/acute kidney injury'
 'Acute respiratory distress syndrome' 'Acute respiratory failure'
 'Pneumonia' 'Sepsis']
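
Once the valid pairings are known, the rows for one pairing can be pulled out with a boolean mask. This is a sketch on a made-up frame: the helper `select_pair` is hypothetical, and the `Percent` values are illustrative rather than taken from the dataset.

```python
import pandas as pd

# Hypothetical rows shaped like the real table.
demo = pd.DataFrame({
    "Primary Strata": ["Sex", "Sex", "Sex", "Age"],
    "Primary Strata Name": ["Male", "Female", "Male", "0-4 yr"],
    "Secondary Strata": ["Age", "Age", "Asthma", "Sex"],
    "Secondary Strata Name": ["18-49 yr", "18-49 yr", "Yes", "Male"],
    "Percent": [25.0, 30.0, 10.0, 50.0],
})

def select_pair(frame: pd.DataFrame, primary: str, secondary: str) -> pd.DataFrame:
    """Return the rows for one (Primary Strata, Secondary Strata) pairing."""
    mask = (frame["Primary Strata"] == primary) & (frame["Secondary Strata"] == secondary)
    return frame.loc[mask]

sex_by_age = select_pair(demo, "Sex", "Age")
print(sex_by_age)
```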


In [9]:
#count the number of rows for each (Primary Strata, Secondary Strata) pair

#each row is one combination of values within that pair
#ie (Age = 0-4 yr, Abdominal Pain = Yes), (Age = 0-17 yr, Abdominal Pain = No) ...
#count() skips NaN, so take the max across columns to get the full row count
num_combos = df.groupby(['Primary Strata','Secondary Strata']).count().max(axis = 1)
num_combos

Primary Strata  Secondary Strata                       
Age             Abdominal Pain                             12
                Acute renal failure/acute kidney injury    12
                Acute respiratory distress syndrome        12
                Acute respiratory failure                  12
                Altered mental status/confusion            12
                                                           ..
Sex             Race/Ethnicity                             12
                Sepsis                                      4
                Shortness of breath                         4
                Sore throat                                 4
                Wheezing                                    4
Length: 111, dtype: int64
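
Because `count()` tallies only non-null values, the row count per group depends on which column is inspected. The sketch below compares three ways of counting on a made-up frame (values are hypothetical): `count().max(axis=1)` as used above, `size()`, and `count()` on the `Count` column alone.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the Sepsis pairing has no Count values at all.
demo = pd.DataFrame({
    "Primary Strata": ["Sex", "Sex", "Sex", "Sex"],
    "Secondary Strata": ["Age", "Age", "Sepsis", "Sepsis"],
    "Count": [10.0, 12.0, np.nan, np.nan],
    "Percent": [45.0, 55.0, 40.0, 60.0],
})
keys = ["Primary Strata", "Secondary Strata"]

# Max non-null count across columns: equals the row count here
# because Percent has no missing values in either group.
via_count = demo.groupby(keys).count().max(axis=1)

# size() counts rows directly, regardless of NaN.
via_size = demo.groupby(keys).size()

# Counting a single column undercounts groups where it is missing.
count_only = demo.groupby(keys)["Count"].count()

print(via_count.tolist(), via_size.tolist(), count_only.tolist())
```

`size()` gives the row count in one step and does not rely on at least one column being fully populated, so it may be the simpler choice.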

### Look for missing values

In [10]:
#there are some missing values NaN in the dataset
#check number of missing values in each column

df.isna().sum()

Primary Strata             0
Primary Strata Name        0
Secondary Strata           0
Secondary Strata Name      0
Count                    801
Percent                   13
dtype: int64
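
Raw totals are easier to judge as a share of all rows. In the output above, `Count` is missing in 801 rows; with the index running from 0 to 1056 that is 801 of 1,057 rows, roughly three quarters. A sketch of the same calculation on a made-up frame (the `demo` values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature with missing values in both numeric columns.
demo = pd.DataFrame({
    "Count": [241.0, np.nan, np.nan, 64.0],
    "Percent": [55.8, 44.2, np.nan, 15.5],
})

# Number of missing values per column.
missing_counts = demo.isna().sum()

# Fraction of rows missing per column: the mean of a boolean mask.
missing_share = demo.isna().mean()

print(missing_counts.to_dict())
print(missing_share.to_dict())
```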

In [11]:
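#determine where the missing values are

#keep only the rows that have at least one NaN
missing = df[df.isna().any(axis=1)]

#count() tallies non-null values, so a 0 under Count means
#every row in that group is missing its Count
missing_by_strata = missing.groupby(['Primary Strata', 'Secondary Strata']).count()

missing_by_strata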

| Primary Strata | Secondary Strata | Primary Strata Name | Secondary Strata Name | Count | Percent |
|---|---|---|---|---|---|
| Age | Abdominal Pain | 8 | 8 | 0 | 8 |
| Age | Acute renal failure/acute kidney injury | 8 | 8 | 0 | 8 |
| Age | Acute respiratory distress syndrome | 8 | 8 | 0 | 8 |
| Age | Acute respiratory failure | 8 | 8 | 0 | 8 |
| Age | Altered mental status/confusion | 8 | 8 | 0 | 8 |
| ... | ... | ... | ... | ... | ... |
| Sex | Pneumonia | 4 | 4 | 0 | 4 |
| Sex | Sepsis | 4 | 4 | 0 | 4 |
| Sex | Shortness of breath | 4 | 4 | 0 | 4 |
| Sex | Sore throat | 4 | 4 | 0 | 4 |
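
The table above reports non-null counts within the rows that have a NaN, so missing values have to be read off indirectly. An alternative sketch counts the missing values per pairing directly. The frame and the column names `count_missing` and `percent_missing` are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: Sepsis lacks Count entirely and one Percent value.
demo = pd.DataFrame({
    "Primary Strata": ["Sex", "Sex", "Sex", "Sex"],
    "Secondary Strata": ["Age", "Age", "Sepsis", "Sepsis"],
    "Count": [10.0, 12.0, np.nan, np.nan],
    "Percent": [45.0, 55.0, 40.0, np.nan],
})
keys = ["Primary Strata", "Secondary Strata"]

# Flag missing cells, then sum the flags within each pairing.
flags = demo.assign(
    count_missing=demo["Count"].isna(),
    percent_missing=demo["Percent"].isna(),
)
missing_by_pair = flags.groupby(keys)[["count_missing", "percent_missing"]].sum()

print(missing_by_pair)
```

Pairings with no missing values stay in the result with zeros, which makes it easier to see which strata are fully populated.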
