![COVID 19](./images/CDC_Covid19.jpg)

# Covid-19 Cases Visualization


I have been actively following Covid-19 case counts and death rates since January 2020: from the first images that came out of China, when the world (especially the US) was taking the virus casually, to now, when hospitalizations are down.
Along the way I used a few epidemic trackers and followed a few authors:
1. Metabiota's [epidemic tracker](https://www.epidemictracker.com/): This company tracks diseases from Dengue to Covid-19 and was one of the few companies that raised a red flag as early as December 25th, 2019 (saying that something was terribly wrong in China).
2. Tomas Pueyo's articles on Medium: The most famous is [The Hammer and the Dance](https://tomaspueyo.medium.com/coronavirus-the-hammer-and-the-dance-be9337092b56). This was the first article that discussed the scenarios at the time and what countries could do to control the spread of the virus. Unfortunately the virus did spread, and the lockdowns helped only to a limited extent. He followed this up with [Coronavirus: Learning How to Dance](https://tomaspueyo.medium.com/coronavirus-learning-how-to-dance-b8420170203e), [Coronavirus: The Basic Dance Steps Everybody Can Follow](https://tomaspueyo.medium.com/coronavirus-the-basic-dance-steps-everybody-can-follow-b3d216daa343) and [Coronavirus: How to Do Testing and Contact Tracing](https://tomaspueyo.medium.com/coronavirus-how-to-do-testing-and-contact-tracing-bde85b64072e).


## Here's how I obtained the datasets
1. I wanted to compare US-wide cases and deaths to those in California (the state I live in). The US and California data were obtained from the [CDC Covid tracker website](https://covid.cdc.gov/covid-data-tracker/#trends_dailycases_currenthospitaladmissions).
2. County-wide data was procured from the [California Health and Human Services Open Data Portal](https://data.chhs.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state/resource/046cdd2b-31e5-4d34-9ed3-b48cdbc4be7a). I searched for "Santa Clara" county to download my county's data.



In [36]:
import pandas as pd
import numpy as np
import pandas_bokeh
pandas_bokeh.output_notebook()
import matplotlib.dates as dates
import matplotlib.ticker as ticker
pd.set_option('plotting.backend', 'pandas_bokeh')
# Create Bokeh-Table with DataFrame:
from bokeh.models.widgets import DataTable, TableColumn
from bokeh.models import SingleIntervalTicker, LinearAxis
from bokeh.models import Range1d, HoverTool, ColumnDataSource
print("Setup Complete")


Setup Complete


### Pandas-Bokeh Library
There are several visualization options available, the most straightforward being pandas itself. While Matplotlib and Seaborn offer a great variety of plots, it takes a lot of code to make those plots interactive.
[Pandas-Bokeh](https://github.com/PatrikHlobil/Pandas-Bokeh) offers zooming and panning right out of the box and is intuitive to use!

In [37]:
# Obtain Country wide data for daily Covid cases.
# Use the 7-day moving average to plot, because the daily data is too noisy
us_cases = pd.read_csv('data/US-Daily-Cases.csv' , usecols=['Date', '7-Day Moving Avg'], index_col='Date', parse_dates=True)
us_cases = us_cases.sort_index()
us_cases = us_cases.rename(columns = {'7-Day Moving Avg': 'US-cases'})
# Optional: add a column of cases in thousands so cases and deaths can share one plot; the deaths curve looks flat next to unscaled case counts
# us_cases['US-cases/1K'] = us_cases['US-cases']/1000
# us_cases['US-cases/1K'] = us_cases['US-cases/1K'].astype(int)
# us_cases.drop('US-cases', axis=1, inplace=True) 
print(us_cases.shape)
us_cases




(816, 1)


Date,US-cases
2020-01-23,0
2020-01-24,0
2020-01-25,0
2020-01-26,0
2020-01-27,0
...,...
2022-04-13,31380
2022-04-14,35562
2022-04-15,35359
2022-04-16,34972


In [221]:
# I wanted to experiment with 7-day averages, because the county data I obtained does not have a 7-day-moving average column.
''' 
# Below is my experiment

us_cases = pd.read_csv('data/US-Daily-Cases.csv' , usecols=['Date', 'New Cases', '7-Day Moving Avg'], index_col='Date', parse_dates=True)
us_cases = us_cases.sort_index()
us_cases['Cases-Avg'] = us_cases['New Cases'].rolling(window=7).mean()
us_cases['Cases-Avg'] = us_cases['Cases-Avg'].fillna(0).astype(int)
us_cases.to_csv('data/us_daily_try1.csv')
#print(us_cases.head())
print(us_cases.shape)
us_cases
'''

In [38]:
# Obtain country wide Covid deaths data. Use the '7-Day Moving Avg'.
us_deaths = pd.read_csv('data/US-Daily-Deaths.csv' , usecols=['Date', '7-Day Moving Avg'], index_col='Date', parse_dates=True)
us_deaths = us_deaths.rename(columns = {'7-Day Moving Avg': 'US-deaths'}) 
us_deaths = us_deaths.sort_index()
print(us_deaths.head())
print(us_deaths.shape)

            US-deaths
Date                 
2020-01-23          0
2020-01-24          0
2020-01-25          0
2020-01-26          0
2020-01-27          0
(816, 1)


In [5]:

us_deaths.plot_bokeh(kind='line',
                    figsize =(1000,800), 
                    xlabel = 'Date',
                    rangetool = True,
                    legend = "top_left")

In [39]:
# Merge the 2 us_cases and us_deaths tables
us_df = us_cases.join(us_deaths)
# Start a column for Omicron variant
us_df['Omicron Variant'] = 0
us_df.loc['2021-11-22', 'Omicron Variant'] = 1000000

# Start a column for Delta variant
us_df['Delta Variant'] = 0
us_df.loc['2021-4-15', 'Delta Variant'] = 1000000

#us_df.to_csv("data/try1.csv")
us_df

Date,US-cases,US-deaths,Omicron Variant,Delta Variant
2020-01-23,0,0,0,0
2020-01-24,0,0,0,0
2020-01-25,0,0,0,0
2020-01-26,0,0,0,0
2020-01-27,0,0,0,0
...,...,...,...,...
2022-04-13,31380,416,0,0
2022-04-14,35562,443,0,0
2022-04-15,35359,397,0,0
2022-04-16,34972,379,0,0
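
The variant columns are a plotting trick: each one is 0 everywhere except for a single tall value on the chosen date, which shows up as a vertical line in the bar plot. Here is a minimal sketch of that idea on made-up data. `add_marker` is a hypothetical helper of my own (it is not part of pandas or Pandas-Bokeh), and the dates and heights are placeholders.

```python
import pandas as pd


def add_marker(df, name, date, height):
    """Return a copy of df with column `name`: 0 everywhere, `height` on `date`."""
    ts = pd.Timestamp(date)
    if ts not in df.index:
        # Assigning to a missing label would silently append a new row
        raise KeyError(f"{date} is not in the index")
    out = df.copy()
    out[name] = 0
    out.loc[ts, name] = height
    return out


# Made-up data, only to show the shape of the result
idx = pd.date_range("2021-11-20", periods=5, freq="D")
demo = pd.DataFrame({"cases": [10, 20, 30, 40, 50]}, index=idx)
demo = add_marker(demo, "Omicron Variant", "2021-11-22", 100)
print(demo)
```

The guard matters because `.loc` assignment with a date that is not in the index adds a new row instead of raising an error.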


In [7]:
us_df.columns

Index(['US-cases', 'US-deaths', 'Omicron Variant', 'Delta Variant'], dtype='object')

In [40]:
us_df["DateString"] = us_df.index.strftime("%Y-%m-%d")
us_df

Date,US-cases,US-deaths,Omicron Variant,Delta Variant,DateString
2020-01-23,0,0,0,0,2020-01-23
2020-01-24,0,0,0,0,2020-01-24
2020-01-25,0,0,0,0,2020-01-25
2020-01-26,0,0,0,0,2020-01-26
2020-01-27,0,0,0,0,2020-01-27
...,...,...,...,...,...
2022-04-13,31380,416,0,0,2022-04-13
2022-04-14,35562,443,0,0,2022-04-14
2022-04-15,35359,397,0,0,2022-04-15
2022-04-16,34972,379,0,0,2022-04-16
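
The `DateString` column exists so that the hover tooltip can reference a ready-made, readable date via `@{DateString}`. A small sketch of the conversion on its own, using three rows copied from the table above (the variable names are just for illustration):

```python
import pandas as pd

idx = pd.date_range("2022-04-13", periods=3, freq="D")
df = pd.DataFrame({"US-cases": [31380, 35562, 35359]}, index=idx)

# DatetimeIndex.strftime returns an Index of plain strings
df["DateString"] = df.index.strftime("%Y-%m-%d")
print(df)
```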


In [41]:
### US Wide Covid-19 cases plot with Delta and Omicron variant vertical lines to better visualize the effects on Covid cases and deaths ###
us_df.plot_bokeh(kind='bar',
                    figsize =(800,600),
                    xticks = np.arange(0, len(us_df.index), 100),
                    disable_scientific_axes="y",
                    xlabel = "Date",
                    ylabel = "Covid-19 cases",
                    yticks = np.arange(0, max(us_df['Delta Variant']), 100000),
                    ylim = [0, max(us_df['Delta Variant'])],
                    title = "US Countrywide Covid-19 Plot",
                    fontsize_title = 30,
                    fontsize_label = 20,
                    zooming=False,
                    panning=False,
                    hovertool_string="""<h5> Date: @{DateString} </h5> 
                        <h5> US Cases: @{US-cases} </h5>
                        <h5> US Deaths: @{US-deaths} </h5>""",
                    legend = "top_left")
                                       

In [42]:
# California wide Covid cases
ca_cases = pd.read_csv('data/CA-Daily-Cases.csv' , usecols=['Date', '7-Day Moving Avg'], index_col='Date', parse_dates=True)
ca_cases = ca_cases.rename(columns = {'7-Day Moving Avg': 'CA-cases'}) 
ca_cases.head()


Date,CA-cases
2022-04-17,3648
2022-04-16,3648
2022-04-15,3648
2022-04-14,4116
2022-04-13,1405


In [137]:
ca_cases.plot_bokeh()

In [43]:
# Merge the US wide and state wide cases data
merged_cases = us_cases.join(ca_cases)
merged_cases

Date,US-cases,CA-cases
2020-01-23,0,0
2020-01-24,0,0
2020-01-25,0,0
2020-01-26,0,0
2020-01-27,0,0
...,...,...
2022-04-13,31380,1405
2022-04-14,35562,4116
2022-04-15,35359,3648
2022-04-16,34972,3648
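
Note that `join` aligns rows by their index labels, not by position. That is why the descending date order of the CA file does not matter here, and why the CA row for 2022-04-17, which has no counterpart in the US table, is absent from the result (the default is a left join). A small sketch with made-up numbers:

```python
import pandas as pd

us = pd.DataFrame(
    {"US-cases": [100, 200, 300]},
    index=pd.to_datetime(["2022-04-14", "2022-04-15", "2022-04-16"]),
)
# Descending order and one extra date, like the CA file
ca = pd.DataFrame(
    {"CA-cases": [40, 30, 20, 10]},
    index=pd.to_datetime(["2022-04-17", "2022-04-16", "2022-04-15", "2022-04-14"]),
)

# Left join: keeps the rows and order of `us`, matches `ca` by date
joined = us.join(ca)
print(joined)
```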


In [44]:
# California wide Covid deaths
ca_deaths = pd.read_csv('data/CA-Daily-Deaths.csv' , usecols=['Date', '7-Day Moving Avg'], index_col='Date', parse_dates=True)
ca_deaths = ca_deaths.rename(columns = {'7-Day Moving Avg': 'CA-deaths'}) 
ca_deaths.head()


Date,CA-deaths
2022-04-17,50
2022-04-16,50
2022-04-15,50
2022-04-14,56
2022-04-13,28


In [45]:
# Merge the 2 ca_cases and ca_deaths tables
ca_df = ca_cases.join(ca_deaths)
ca_df = ca_df.sort_index()
# Start a column for Omicron variant
ca_df['Omicron Variant'] = 0
ca_df.loc['2021-11-22', 'Omicron Variant'] = 130000

# Start a column for Delta variant
ca_df['Delta Variant'] = 0
ca_df.loc['2021-4-15', 'Delta Variant'] = 130000
ca_df

Date,CA-cases,CA-deaths,Omicron Variant,Delta Variant
2020-01-23,0,0,0,0
2020-01-24,0,0,0,0
2020-01-25,0,0,0,0
2020-01-26,0,0,0,0
2020-01-27,0,0,0,0
...,...,...,...,...
2022-04-13,1405,28,0,0
2022-04-14,4116,56,0,0
2022-04-15,3648,50,0,0
2022-04-16,3648,50,0,0


In [46]:
ca_df["DateString"] = ca_df.index.strftime("%Y-%m-%d")
ca_df.plot_bokeh(kind='bar',
                    figsize =(800,600),
                    xticks = np.arange(0, len(ca_df.index), 100),
                    disable_scientific_axes="y",
                    xlabel = "Date",
                    ylabel = "Covid-19 cases",
                    yticks = np.arange(0, max(ca_df['CA-cases']), 10000),
                    title = "California State Covid-19 Plot",
                    fontsize_title = 30,
                    fontsize_label = 20,
                    zooming=False,
                    panning=False,
                    hovertool_string="""<h5> Date: @{DateString} </h5> 
                        <h5> CA Cases: @{CA-cases} </h5>
                        <h5> CA Deaths: @{CA-deaths} </h5>""",
                    legend = "top_left")

In [47]:
# Santa Clara county (Saratoga, CA belongs to this county) data
sc_df = pd.read_csv('data/Santa-Clara-County-Data.csv', usecols=['date', 'cases', 'deaths'], index_col='date', parse_dates=True)


sc_df = sc_df.sort_index()
# The county file has no moving-average column, so compute a trailing 7-day mean
sc_df['Cases-Avg'] = sc_df['cases'].rolling(window=7).mean()
sc_df['Deaths-Avg'] = sc_df['deaths'].rolling(window=7).mean()
# The first 6 rows have no full window (NaN): fill them with 0, then truncate to whole numbers
sc_df = sc_df.fillna(0).astype(int)
sc_df = sc_df.rename(columns = {'Cases-Avg': 'County-cases', 'Deaths-Avg': 'County-deaths'})
sc_df = sc_df[['County-cases', 'County-deaths']]
# Drop the last (most recent) row
sc_df = sc_df.drop(sc_df.index[-1])
sc_cases = sc_df[['County-cases']]
sc_deaths = sc_df[['County-deaths']]
#sc_df.to_csv('data/sc_df.csv')
print(sc_df.shape)
sc_df

(804, 2)


date,County-cases,County-deaths
2020-02-01,0,0
2020-02-02,0,0
2020-02-03,0,0
2020-02-04,0,0
2020-02-05,0,0
...,...,...
2022-04-10,202,0
2022-04-11,193,0
2022-04-12,191,0
2022-04-13,159,0
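
The county averages come from a trailing 7-day window. Two details are easy to miss: the first six rows have no full window and come out as NaN (which `fillna(0)` turns into 0), and `astype(int)` truncates rather than rounds. A minimal sketch on made-up daily counts:

```python
import pandas as pd

idx = pd.date_range("2022-01-01", periods=10, freq="D")
daily = pd.Series([0, 0, 0, 0, 0, 0, 10, 0, 0, 5], index=idx, name="cases")

# Trailing window: each value is the mean of that day and the 6 days before it
avg = daily.rolling(window=7).mean()

# NaN for the first 6 days, then 10/7 = 1.43 and 15/7 = 2.14, truncated to 1 and 2
smoothed = avg.fillna(0).astype(int)
print(pd.DataFrame({"daily": daily, "avg": avg, "smoothed": smoothed}))
```

If rounding is preferred over truncation, calling `.round()` before `.astype(int)` would be one option.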


In [48]:
sc_deaths

date,County-deaths
2020-02-01,0
2020-02-02,0
2020-02-03,0
2020-02-04,0
2020-02-05,0
...,...
2022-04-10,0
2022-04-11,0
2022-04-12,0
2022-04-13,0


In [49]:
sc_df["DateString"] = sc_df.index.strftime("%Y-%m-%d")
sc_df.plot_bokeh(kind='bar',
                    figsize =(800,600),
                    xticks = np.arange(0, len(sc_df.index), 100),
                    disable_scientific_axes="y",
                    xlabel = "Date",
                    ylabel = "Covid-19 cases",
                    yticks = np.arange(0, max(sc_df['County-cases']), 1000),
                    title = "Santa Clara County Covid-19 Plot",
                    fontsize_title = 30,
                    fontsize_label = 20,
                    zooming=False,
                    panning=False,
                    hovertool_string="""<h4> @{DateString} </h4> 
                        <h5> County Cases: @{County-cases} </h5>
                        <h5> County Deaths: @{County-deaths} </h5>""",
                    legend = "top_left")

I want to analyze Covid cases and Covid deaths separately, to see whether each followed the same curve or differed at the country, state and county levels.

In [50]:
# Merge US-cases, CA-cases and SC-cases
merged_cases = pd.merge(us_cases, ca_cases, how='outer', left_index=True, right_index=True)
merged_cases = pd.merge(merged_cases, sc_cases, how='outer', left_index=True, right_index=True)
merged_cases = merged_cases.fillna(0).astype(int)
# Start a column for Omicron variant
merged_cases['Omicron Variant'] = 0
merged_cases.loc['2021-11-22', 'Omicron Variant'] = 1000000

# Start a column for Delta variant
merged_cases['Delta Variant'] = 0
merged_cases.loc['2021-4-15', 'Delta Variant'] = 1000000
merged_cases.head(20)

Unnamed: 0,US-cases,CA-cases,County-cases,Omicron Variant,Delta Variant
2020-01-23,0,0,0,0,0
2020-01-24,0,0,0,0,0
2020-01-25,0,0,0,0,0
2020-01-26,0,0,0,0,0
2020-01-27,0,0,0,0,0
2020-01-28,0,0,0,0,0
2020-01-29,0,0,0,0,0
2020-01-30,0,0,0,0,0
2020-01-31,0,0,0,0,0
2020-02-01,0,0,0,0,0
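
The three sources do not cover exactly the same dates (the county data starts on 2020-02-01, the US data on 2020-01-23), which is why an outer merge is used: it keeps every date that appears in any table. A minimal sketch with made-up numbers shows what happens at the edges:

```python
import pandas as pd

us = pd.DataFrame(
    {"US-cases": [5, 6, 7]},
    index=pd.date_range("2020-01-30", periods=3, freq="D"),
)
county = pd.DataFrame(
    {"County-cases": [1, 2]},
    index=pd.date_range("2020-02-01", periods=2, freq="D"),
)

# Outer merge on the index keeps the union of dates; gaps become NaN
merged = pd.merge(us, county, how="outer", left_index=True, right_index=True)
merged = merged.sort_index()

# Filling with 0 makes the columns plottable as integers, but a 0 now means
# either "no cases" or "no data for that date"
merged = merged.fillna(0).astype(int)
print(merged)
```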


In [51]:
merged_cases["DateString"] = merged_cases.index.strftime("%Y-%m-%d")
merged_cases.plot_bokeh(kind='bar',
                    figsize =(800,600),
                    xticks = np.arange(0, len(merged_cases.index), 100),
                    disable_scientific_axes="y",
                    xlabel = "Date",
                    ylabel = "Covid-19 cases",
                    yticks = np.arange(0, max(merged_cases['US-cases']), 100000),
                    title = "Country/State/County Covid-19 Cases",
                    fontsize_title = 25,
                    fontsize_label = 18,
                    zooming=False,
                    panning=False,
                    hovertool_string="""
                        <h5> Date        : @{DateString} </h5> 
                        <h5> US Cases    : @{US-cases} </h5>
                        <h5> CA Cases    : @{CA-cases} </h5>
                        <h5> County Cases: @{County-cases} </h5>""",
                    legend = "top_left")

In [52]:
# Merge US-deaths, CA-deaths and SC-deaths
merged_deaths = pd.merge(us_deaths, ca_deaths, how='outer', left_index=True, right_index=True)
merged_deaths = pd.merge(merged_deaths, sc_deaths, how='outer', left_index=True, right_index=True)
merged_deaths = merged_deaths.fillna(0).astype(int)
# Start a column for Omicron variant
merged_deaths['Omicron Variant'] = 0
merged_deaths.loc['2021-11-22', 'Omicron Variant'] = 3500

# Start a column for Delta variant
merged_deaths['Delta Variant'] = 0
merged_deaths.loc['2021-4-15', 'Delta Variant'] = 3500
merged_deaths.head(20)

Unnamed: 0,US-deaths,CA-deaths,County-deaths,Omicron Variant,Delta Variant
2020-01-23,0,0,0,0,0
2020-01-24,0,0,0,0,0
2020-01-25,0,0,0,0,0
2020-01-26,0,0,0,0,0
2020-01-27,0,0,0,0,0
2020-01-28,0,0,0,0,0
2020-01-29,0,0,0,0,0
2020-01-30,0,0,0,0,0
2020-01-31,0,0,0,0,0
2020-02-01,0,0,0,0,0


In [53]:
merged_deaths["DateString"] = merged_deaths.index.strftime("%Y-%m-%d")
merged_deaths.plot_bokeh(kind='bar',
                    figsize =(800,600),
                    xticks = np.arange(0, len(merged_deaths.index), 100),
                    disable_scientific_axes="y",
                    xlabel = "Date",
                    ylabel = "Covid-19 deaths",
                    yticks = np.arange(0, max(merged_deaths['US-deaths']), 100),
                    title = "Country/State/County Covid-19 Deaths",
                    fontsize_title = 25,
                    fontsize_label = 18,
                    zooming=False,
                    panning=False,
                    hovertool_string="""
                        <h5> Date        : @{DateString} </h5> 
                        <h5> US Deaths    : @{US-deaths} </h5>
                        <h5> CA Deaths    : @{CA-deaths} </h5>
                        <h5> County Deaths: @{County-deaths} </h5>""",
                    legend = "top_left")

Let us now find out how vaccinations affected overall Covid cases across the US.

In [54]:
# I am interested in these 2 data points - 7-Day Avg Daily Count of People Fully Vaccinated, 7-Day Average Daily Count First Booster
vaccine_us = pd.read_csv('data/Covid19_vaccinations_in_the_US.csv' , 
usecols=['Date', '7-Day Avg Daily Count of People Fully Vaccinated', '7-Day Average Daily Count First Booster'], 
index_col='Date', parse_dates=True)
vaccine_us = vaccine_us.sort_index()
vaccine_us = vaccine_us.rename(columns = {'7-Day Avg Daily Count of People Fully Vaccinated': 'Fully Vaccinated',
                                            '7-Day Average Daily Count First Booster': 'First Booster'})
print(vaccine_us.shape)
vaccine_us


(490, 2)


Date,Fully Vaccinated,First Booster
2020-12-14,3272,0
2020-12-15,2269,0
2020-12-16,1814,0
2020-12-17,1602,0
2020-12-18,1535,0
...,...,...
2022-04-13,51185,90702
2022-04-14,49593,88340
2022-04-15,45422,81256
2022-04-16,41816,75608


In [55]:
vaccine_us.plot_bokeh()

In [58]:
vaccine_us_df = pd.merge(us_cases, vaccine_us, how='outer', left_index=True, right_index=True)
vaccine_us_df = vaccine_us_df.fillna(0).astype(int)
# Start a column for Omicron variant
vaccine_us_df['Omicron Variant'] = 0
vaccine_us_df.loc['2021-11-22', 'Omicron Variant'] = max(vaccine_us_df['Fully Vaccinated']) + 1000

# Start a column for Delta variant
vaccine_us_df['Delta Variant'] = 0
vaccine_us_df.loc['2021-4-15', 'Delta Variant'] = max(vaccine_us_df['Fully Vaccinated']) + 1000
vaccine_us_df

Date,US-cases,Fully Vaccinated,First Booster,Omicron Variant,Delta Variant
2020-01-23,0,0,0,0,0
2020-01-24,0,0,0,0,0
2020-01-25,0,0,0,0,0
2020-01-26,0,0,0,0,0
2020-01-27,0,0,0,0,0
...,...,...,...,...,...
2022-04-13,31380,51185,90702,0,0
2022-04-14,35562,49593,88340,0,0
2022-04-15,35359,45422,81256,0,0
2022-04-16,34972,41816,75608,0,0


In [63]:
vaccine_us_df["DateString"] = vaccine_us_df.index.strftime("%Y-%m-%d")
vaccine_us_df.plot_bokeh(kind='line',
                    figsize =(800,600),
                    disable_scientific_axes="y",
                    xlabel = "Date",
                    ylabel = "Cases/Vaccinated ",
                    yticks = np.arange(0, max(vaccine_us_df['Fully Vaccinated']), 100000),
                    title = "Covid Cases and Vaccinations",
                    fontsize_title = 25,
                    fontsize_label = 18,
                    zooming=False,
                    panning=False,
                    hovertool_string="""
                        <h5> Date        : @{DateString} </h5> 
                        <h5> US Cases    : @{US-cases} </h5>
                        <h5> Fully Vaxd  : @{Fully Vaccinated} </h5>
                        """,
                    legend = "top_left")

We can conclude that vaccinations were successful!
After the first vaccine doses were administered, Covid cases began to subside, and that helped the country get through the Delta variant.
The Omicron variant is highly transmissible, and the plot shows cases rising rapidly after it arrived.
The good news is that as the booster doses took effect, total Covid cases began a downward trend.
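
These observations come from reading the plots by eye. One way to put rough numbers on them would be to look up the peak after a variant's marker date and compare it to the level on that date. The sketch below uses made-up values, not the CDC data, and the variable names are only for illustration.

```python
import pandas as pd

idx = pd.date_range("2021-11-20", periods=8, freq="D")
cases = pd.Series([70, 80, 90, 120, 300, 800, 600, 400], index=idx, name="US-cases")

marker = pd.Timestamp("2021-11-22")   # placeholder marker date
after = cases.loc[marker:]            # label slice, includes the marker date

peak_date = after.idxmax()            # date of the highest value after the marker
baseline = cases.loc[marker]          # level on the marker date
growth = after.max() / baseline       # how many times higher the peak is

print(peak_date.date(), round(growth, 2))
```

A number like this describes timing only; it does not by itself show what caused the rise or the fall.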