### This project features interactive maps and HTML markup. Since GitHub does not render dynamic notebook displays, it is necessary to download the project to render these elements and to interact with the data visualizations.

# Data Visualization: Part C 
## *A Closer Look at the Philippines & On a More Scientific Note*

This is the <b>last in a series of five Jupyter notebooks</b> on the Novel Corona Virus 2019 Dataset. This activity is in partial fulfillment of the Tidy Tuesdays deliverables for probationary Lyrids of the <b>Center for Complexity and Emerging Technologies, College of Computer Studies, De La Salle University</b>.

<b>The COVID-19 pandemic is a serious and unprecedented global health crisis.</b> Through this activity, the author of this series of Jupyter notebooks hopes to raise awareness of the importance of data-driven policy directions and to contribute to the present discourse on global and national trends related to the spread of the coronavirus. In this regard, this attempt to explore and visualize the said dataset seeks neither to reduce the gravity of the pandemic to mere numerical figures nor to present a professional, rigorous statistical analysis.

<hr/>

The required dataset for this Tidy Tuesdays activity is the Novel Corona Virus 2019 Dataset from Kaggle: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset.

To enrich the analysis and visualization, the following datasets were integrated:
- Population by Country - 2020: https://www.kaggle.com/tanuprabhu/population-by-country-2020
- Coordinates of US States: https://developers.google.com/public-data/docs/canonical/states_csv
- Coordinates of Countries: https://developers.google.com/public-data/docs/canonical/countries_csv

These datasets are stored in the folder <code>data</code> of the repository.

# TABLE OF CONTENTS
I. [Preliminaries](#Preliminaries) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; A. [More Data Preparation](#DataPrep) <br/>
II. [A Closer Look at the Philippines](#PH) <br/>
III. [On a More Scientific Note](#Science) <br/>
     &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; A. [Flattening the Curve?](#Curve) <br/>
     &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; B. [Case Fatality Ratio](#Ratio) <br/>
IV. [References](#References) <br/>

<hr/>

# <a name = "Preliminaries" style = "color:black;">PRELIMINARIES</a> (<i>Lifted from Data Preparation & Analysis</i>)

<b>Due to restrictions on the size of files in GitHub repositories, the project had to be divided into separate notebooks. In this regard, this section simply repeats pertinent code excerpts from the data preparation and analysis phases.</b>

For the complete documentation, please refer to the aforementioned notebooks: <code>1. Data Preparation.ipynb</code> and <code>2. Data Analysis.ipynb</code>.

In [1]:
import pandas as pd
import numpy as np

import re

import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots

from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected = True)

pd.options.mode.chained_assignment = None 

NUM_ROWS = 10

pd.set_option('display.max_rows', NUM_ROWS)
pd.set_option('display.min_rows', NUM_ROWS)

In [2]:
# http://country.io/names.json

standardized_names = [("('St. Martin',)", "Saint Martin"), 
                      ("St. Martin", "Saint Martin"),
                      ("The Bahamas", "Bahamas"),
                      ("Bahamas, The", "Bahamas"),
                      ("The Gambia", "Gambia"),
                      ("Gambia, The", "Gambia"),
                      ("Congo (Kinshasa)", "Democratic Republic of the Congo"),
                      ("Congo (Brazzaville)", "Republic of the Congo"),
                      ("Cabo Verde", "Cape Verde"),
                      ("Timor-Leste", "East Timor"),
                      ("occupied Palestinian territory", "West Bank and Gaza"),
                      ("US", "United States"),
                      ("UK", "United Kingdom"),
                      ("Holy See", "Vatican"),
                      ("Vatican City", "Vatican"),
                      ("Cote d'Ivoire", "Ivory Coast"),
                      ("Taiwan*", "Taiwan"),
                      ("Czechia", "Czech Republic"),
                      (" Azerbaijan", "Azerbaijan")]

cruise_ships = [("Diamond Princess", "Cruise Ships"),
                ("MS Zaandam", "Cruise Ships"),
                ("Others", "Cruise Ships")]

In [3]:
data_raw = pd.read_csv('data/covid_19_data.csv')
data = data_raw.copy(deep = True)

data.drop(columns = ['SNo', 'Province/State', 'Last Update'], axis = 1, inplace = True)

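# The patterns are escaped and matched as regular expressions, so regex = True is
# passed explicitly (newer versions of pandas default to literal matching)
for name in standardized_names:
    data['Country/Region'] = data['Country/Region'].str.replace(re.escape(name[0]), name[1], regex = True)
    
for ship in cruise_ships:
    data['Country/Region'] = data['Country/Region'].str.replace(re.escape(ship[0]), ship[1], regex = True)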

data['ObservationDate'] = pd.to_datetime(data['ObservationDate'], format = '%m/%d/%Y')
data['Confirmed'] = data['Confirmed'].astype('int64')
data['Deaths'] = data['Deaths'].astype('int64')
data['Recovered'] = data['Recovered'].astype('int64')

In [4]:
confirmed_raw = pd.read_csv('data/time_series_covid_19_confirmed.csv')

In [5]:
population_raw = pd.read_csv('data/population_by_country_2020.csv')
population = population_raw.copy(deep = True)

population.drop(columns = ['Yearly Change', 'Net Change', 'Density (P/Km2)', 'Land Area (Km2)', 'Migrants (net)', 
                          'Fert. Rate', 'Med. Age', 'Urban Pop %', 'World Share'], \
                axis = 1, inplace = True)

In [6]:
more_standardized_names = [("Czech Republic (Czechia)", "Czech Republic"),
                           ("China", "Mainland China"),
                           ("Congo", "Republic of the Congo"),
                           ("DR Congo", "Democratic Republic of the Congo"),
                           ("DR Republic of the Congo", "Democratic Republic of the Congo"),
                           ("Sao Tome & Principe", "Sao Tome and Principe"),
                           ("St. Vincent & Grenadines", "Saint Vincent and the Grenadines"),
                           ("Holy See", "Vatican"),
                           ("Côte d'Ivoire", "Ivory Coast")]

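# regex = True is passed explicitly since the patterns are escaped regular expressions
for name in more_standardized_names:
    population['Country (or dependency)'] = population['Country (or dependency)'].str.replace(re.escape(name[0]), name[1], regex = True)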

In [7]:
agg_data_date = data.groupby(['Country/Region', 'ObservationDate'], as_index = False).sum()
agg_data_date['Active'] = agg_data_date['Confirmed'] - agg_data_date['Deaths'] - agg_data_date['Recovered']

for i in range(65373, 65449):
    agg_data_date.at[i, 'Recovered'] = 6399531
    
for i in range(6094, 6203):
    agg_data_date.at[i, 'Recovered'] = 31130
    
for i in range(59682, 59954):
    agg_data_date.at[i, 'Recovered'] = 4971
    
for i in range(54456, 54680):
    agg_data_date.at[i, 'Recovered'] = 15564

agg_confirmed_date = agg_data_date.groupby('ObservationDate', as_index = False).sum(numeric_only = True)

In [8]:
daily_agg_confirmed_date = agg_confirmed_date.copy(deep = True)
daily_agg_confirmed_date['Daily Confirmed'] = daily_agg_confirmed_date['Confirmed'].diff()
daily_agg_confirmed_date['Daily Deaths'] = daily_agg_confirmed_date['Deaths'].diff()
daily_agg_confirmed_date['Daily Recovered'] = daily_agg_confirmed_date['Recovered'].diff()
daily_agg_confirmed_date['Daily Active'] = daily_agg_confirmed_date['Active'].diff()

daily_agg_confirmed_date = daily_agg_confirmed_date.iloc[1:]
daily_agg_confirmed_date['Daily Confirmed'] = daily_agg_confirmed_date['Daily Confirmed'].astype('int64')
daily_agg_confirmed_date['Daily Deaths'] = daily_agg_confirmed_date['Daily Deaths'].astype('int64')
daily_agg_confirmed_date['Daily Recovered'] = daily_agg_confirmed_date['Daily Recovered'].astype('int64')
daily_agg_confirmed_date['Daily Active'] = daily_agg_confirmed_date['Daily Active'].astype('int64')

daily_agg_confirmed_date['Daily Confirmed'] = daily_agg_confirmed_date['Daily Confirmed'].apply(lambda num: 0 if num < 0 else num)
daily_agg_confirmed_date['Daily Deaths'] = daily_agg_confirmed_date['Daily Deaths'].apply(lambda num: 0 if num < 0 else num)
daily_agg_confirmed_date['Daily Recovered'] = daily_agg_confirmed_date['Daily Recovered'].apply(lambda num: 0 if num < 0 else num)

daily_agg_confirmed_date['Daily Active'] = daily_agg_confirmed_date['Daily Confirmed'] - daily_agg_confirmed_date['Daily Deaths'] \
                                            - daily_agg_confirmed_date['Daily Recovered']

In [9]:
WINDOW = 7

moving_agg_confirmed_date = daily_agg_confirmed_date.copy(deep = True)
moving_agg_confirmed_date['Daily Confirmed (7-Day Rolling Average)'] = daily_agg_confirmed_date['Daily Confirmed'].rolling(WINDOW).mean()
moving_agg_confirmed_date['Daily Deaths (7-Day Rolling Average)'] = daily_agg_confirmed_date['Daily Deaths'].rolling(WINDOW).mean()
moving_agg_confirmed_date['Daily Recovered (7-Day Rolling Average)'] = daily_agg_confirmed_date['Daily Recovered'].rolling(WINDOW).mean()
moving_agg_confirmed_date['Daily Active (7-Day Rolling Average)'] = daily_agg_confirmed_date['Daily Active'].rolling(WINDOW).mean()

In [10]:
ph_agg_data_date = agg_data_date.loc[agg_data_date['Country/Region'] == 'Philippines']
ph_agg_confirmed_date = ph_agg_data_date.groupby('ObservationDate', as_index = False).sum(numeric_only = True)
ph_agg_confirmed_date

Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active
0,2020-01-23,0,0,0,0
1,2020-01-30,1,0,0,1
2,2020-01-31,1,0,0,1
3,2020-02-01,1,0,0,1
4,2020-02-02,2,1,0,1
...,...,...,...,...,...
391,2021-02-23,564865,12107,522941,29817
392,2021-02-24,566420,12129,523321,30970
393,2021-02-25,568680,12201,524042,32437
394,2021-02-26,571327,12247,524582,34498


In [11]:
ph_daily_agg_confirmed_date = ph_agg_confirmed_date.copy(deep = True)
ph_daily_agg_confirmed_date['Daily Confirmed'] = ph_daily_agg_confirmed_date['Confirmed'].diff()
ph_daily_agg_confirmed_date['Daily Deaths'] = ph_daily_agg_confirmed_date['Deaths'].diff()
ph_daily_agg_confirmed_date['Daily Recovered'] = ph_daily_agg_confirmed_date['Recovered'].diff()
ph_daily_agg_confirmed_date['Daily Active'] = ph_daily_agg_confirmed_date['Active'].diff()

ph_daily_agg_confirmed_date = ph_daily_agg_confirmed_date.iloc[1:]
ph_daily_agg_confirmed_date['Daily Confirmed'] = ph_daily_agg_confirmed_date['Daily Confirmed'].astype('int64')
ph_daily_agg_confirmed_date['Daily Deaths'] = ph_daily_agg_confirmed_date['Daily Deaths'].astype('int64')
ph_daily_agg_confirmed_date['Daily Recovered'] = ph_daily_agg_confirmed_date['Daily Recovered'].astype('int64')
ph_daily_agg_confirmed_date['Daily Active'] = ph_daily_agg_confirmed_date['Daily Active'].astype('int64')

ph_daily_agg_confirmed_date['Daily Confirmed'] = ph_daily_agg_confirmed_date['Daily Confirmed'].apply(lambda num: 0 if num < 0 else num)
ph_daily_agg_confirmed_date['Daily Deaths'] = ph_daily_agg_confirmed_date['Daily Deaths'].apply(lambda num: 0 if num < 0 else num)
ph_daily_agg_confirmed_date['Daily Recovered'] = ph_daily_agg_confirmed_date['Daily Recovered'].apply(lambda num: 0 if num < 0 else num)

ph_daily_agg_confirmed_date['Daily Active'] = ph_daily_agg_confirmed_date['Daily Confirmed'] - ph_daily_agg_confirmed_date['Daily Deaths'] \
                                            - ph_daily_agg_confirmed_date['Daily Recovered']

In [12]:
WINDOW = 7

ph_moving_agg_confirmed_date = ph_daily_agg_confirmed_date.copy(deep = True)
ph_moving_agg_confirmed_date['Daily Confirmed (7-Day Rolling Average)'] = ph_moving_agg_confirmed_date['Daily Confirmed'].rolling(WINDOW).mean()
ph_moving_agg_confirmed_date['Daily Deaths (7-Day Rolling Average)'] = ph_moving_agg_confirmed_date['Daily Deaths'].rolling(WINDOW).mean()
ph_moving_agg_confirmed_date['Daily Recovered (7-Day Rolling Average)'] = ph_moving_agg_confirmed_date['Daily Recovered'].rolling(WINDOW).mean()
ph_moving_agg_confirmed_date['Daily Active (7-Day Rolling Average)'] = ph_moving_agg_confirmed_date['Daily Active'].rolling(WINDOW).mean()

## <a name = "DataPrep" style = "color:black;">***More Data Preparation***</a>

In order to prepare our data for visualization, we group the entries according to their observation dates and incorporate the data pertaining to population, as we did during the analysis of the relative values.

<i>Each observation date was converted to a string since this is the data type accepted by the <code>Plotly</code> library.</i>

In [13]:
moving_agg_data_date = data.groupby(['ObservationDate', 'Country/Region'], as_index = False).sum()
moving_agg_data_date = moving_agg_data_date.sort_values(by = ['ObservationDate'])
moving_agg_data_date['ObservationDate'] = moving_agg_data_date['ObservationDate'].dt.strftime('%m/%d/%Y')

moving_agg_data_date = pd.merge(moving_agg_data_date, population, left_on = ['Country/Region'], right_on = ['Country (or dependency)'])
moving_agg_data_date['Active'] = moving_agg_data_date['Confirmed'] - moving_agg_data_date['Deaths'] - moving_agg_data_date['Recovered']

moving_agg_data_date

Unnamed: 0,ObservationDate,Country/Region,Confirmed,Deaths,Recovered,Country (or dependency),Population (2020),Active
0,01/22/2020,Hong Kong,0,0,0,Hong Kong,7507523,0
1,01/23/2020,Hong Kong,2,0,0,Hong Kong,7507523,2
2,01/24/2020,Hong Kong,2,0,0,Hong Kong,7507523,2
3,01/25/2020,Hong Kong,5,0,0,Hong Kong,7507523,5
4,01/26/2020,Hong Kong,8,0,0,Hong Kong,7507523,8
...,...,...,...,...,...,...,...,...
65838,02/23/2021,Micronesia,1,0,1,Micronesia,115231,0
65839,02/24/2021,Micronesia,1,0,1,Micronesia,115231,0
65840,02/25/2021,Micronesia,1,0,1,Micronesia,115231,0
65841,02/26/2021,Micronesia,1,0,1,Micronesia,115231,0


We compute the (relative) number of cases per 100,000 people since our visualization involves comparing values across countries. This is in consonance with the data preparation recommended by Adams, Li, Zhang, and Chen (2020):
<blockquote> The basic principles for their [choropleth maps'] creation require normalized data rather than absolute data. Cartographers base this criterion on the fact that the difference in enumeration unit size can alter how a spatial distribution appears. Thus, an underlying principle in choropleth mapping is to normalize data before any attempt to symbolize it.</blockquote>

Although the visualizations here employ heat maps rather than choropleth maps, the same principles hold.
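To see why normalization matters, consider a minimal sketch on made-up figures (the two regions and their numbers are hypothetical and are not taken from the dataset; only the column names mirror those used in this notebook). Two regions with the same absolute count can differ greatly once population is taken into account:

```python
import pandas as pd

# Made-up figures for two hypothetical regions; not taken from the dataset.
toy = pd.DataFrame({
    'Country/Region': ['Region A', 'Region B'],
    'Confirmed': [500, 500],
    'Population (2020)': [1000000, 50000],
})

NORMALIZATION = 100000

# Same absolute count, very different burden relative to population size.
toy['Confirmed per 100k'] = toy['Confirmed'] * NORMALIZATION / toy['Population (2020)']

print(toy[['Country/Region', 'Confirmed per 100k']])
```

On a map colored by absolute counts, both regions would look identical; the normalized column separates them by a factor of 20.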

In [14]:
NORMALIZATION = 100000

moving_agg_data_date['Confirmed per 100k'] = moving_agg_data_date['Confirmed'] * NORMALIZATION / moving_agg_data_date['Population (2020)']
moving_agg_data_date['Deaths per 100k'] = moving_agg_data_date['Deaths'] * NORMALIZATION / moving_agg_data_date['Population (2020)']
moving_agg_data_date['Recovered per 100k'] = moving_agg_data_date['Recovered'] * NORMALIZATION / moving_agg_data_date['Population (2020)']
moving_agg_data_date['Active per 100k'] = moving_agg_data_date['Active'] * NORMALIZATION / moving_agg_data_date['Population (2020)']

moving_agg_data_date.drop(columns = ['Country (or dependency)', 'Population (2020)', 'Confirmed', 'Deaths', 'Recovered', 'Active'],
                          inplace = True)

moving_agg_data_date

Unnamed: 0,ObservationDate,Country/Region,Confirmed per 100k,Deaths per 100k,Recovered per 100k,Active per 100k
0,01/22/2020,Hong Kong,0.000000,0.0,0.000000,0.00000
1,01/23/2020,Hong Kong,0.026640,0.0,0.000000,0.02664
2,01/24/2020,Hong Kong,0.026640,0.0,0.000000,0.02664
3,01/25/2020,Hong Kong,0.066600,0.0,0.000000,0.06660
4,01/26/2020,Hong Kong,0.106560,0.0,0.000000,0.10656
...,...,...,...,...,...,...
65838,02/23/2021,Micronesia,0.867822,0.0,0.867822,0.00000
65839,02/24/2021,Micronesia,0.867822,0.0,0.867822,0.00000
65840,02/25/2021,Micronesia,0.867822,0.0,0.867822,0.00000
65841,02/26/2021,Micronesia,0.867822,0.0,0.867822,0.00000


We compute the 7-day moving averages to dampen the effect of backlogs, misencoded data, and misclassified cases on the daily tally of COVID-19-related cases.
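The dampening effect can be sketched on a small, made-up series (the two countries and their tallies are hypothetical). The spike in country A mimics a reporting backlog, and the rolling mean is computed within each country so that one country's values never enter another's window:

```python
import pandas as pd

# Made-up daily tallies for two hypothetical countries; the spike on the
# fourth day of country A mimics a reporting backlog.
toy = pd.DataFrame({
    'Country/Region': ['A'] * 8 + ['B'] * 8,
    'Daily Confirmed': [10, 10, 10, 80, 10, 10, 10, 10,
                        5, 5, 5, 5, 5, 5, 5, 5],
})

WINDOW = 7

# Roll within each country; the first WINDOW - 1 rows of each group have no
# complete window and are therefore NaN.
toy['Rolling'] = (toy.groupby('Country/Region')['Daily Confirmed']
                     .transform(lambda x: x.rolling(WINDOW).mean()))

print(toy)
```

The one-day spike of 80 is spread across its window, so the smoothed value for country A is 20 rather than 80. The leading `NaN` rows are the reason a `dropna` call follows the rolling step.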

In [15]:
moving_agg_data_date['Daily Confirmed per 100k'] = \
    moving_agg_data_date.groupby('Country/Region')['Confirmed per 100k'].transform(lambda x: x.diff())
moving_agg_data_date['Daily Deaths per 100k'] = \
    moving_agg_data_date.groupby('Country/Region')['Deaths per 100k'].transform(lambda x: x.diff())
moving_agg_data_date['Daily Recovered per 100k'] = \
    moving_agg_data_date.groupby('Country/Region')['Recovered per 100k'].transform(lambda x: x.diff())
moving_agg_data_date['Daily Active per 100k'] = \
    moving_agg_data_date.groupby('Country/Region')['Active per 100k'].transform(lambda x: x.diff())

moving_agg_data_date.dropna(inplace = True)
moving_agg_data_date = moving_agg_data_date.reset_index(drop = True)

moving_agg_data_date['Daily Confirmed per 100k'] = moving_agg_data_date['Daily Confirmed per 100k'].apply(lambda num: 0 if num < 0 else num)
moving_agg_data_date['Daily Deaths per 100k'] = moving_agg_data_date['Daily Deaths per 100k'].apply(lambda num: 0 if num < 0 else num)
moving_agg_data_date['Daily Recovered per 100k'] = moving_agg_data_date['Daily Recovered per 100k'].apply(lambda num: 0 if num < 0 else num)

moving_agg_data_date['Daily Active per 100k'] = moving_agg_data_date['Daily Confirmed per 100k'] - moving_agg_data_date['Daily Deaths per 100k'] \
                                            - moving_agg_data_date['Daily Recovered per 100k']

WINDOW = 7

moving_agg_data_date['Daily Confirmed per 100k (7-Day Rolling Average)'] = \
    moving_agg_data_date.groupby('Country/Region')['Daily Confirmed per 100k'].transform(lambda x: x.rolling(WINDOW).mean())
moving_agg_data_date['Daily Deaths per 100k (7-Day Rolling Average)'] = \
    moving_agg_data_date.groupby('Country/Region')['Daily Deaths per 100k'].transform(lambda x: x.rolling(WINDOW).mean())
moving_agg_data_date['Daily Recovered per 100k (7-Day Rolling Average)'] = \
    moving_agg_data_date.groupby('Country/Region')['Daily Recovered per 100k'].transform(lambda x: x.rolling(WINDOW).mean())
moving_agg_data_date['Daily Active per 100k (7-Day Rolling Average)'] = \
    moving_agg_data_date.groupby('Country/Region')['Daily Active per 100k'].transform(lambda x: x.rolling(WINDOW).mean())

moving_agg_data_date.dropna(inplace = True)
moving_agg_data_date = moving_agg_data_date.reset_index(drop = True)
moving_agg_data_date2 = moving_agg_data_date.copy(deep = True)

moving_agg_data_date

Unnamed: 0,ObservationDate,Country/Region,Confirmed per 100k,Deaths per 100k,Recovered per 100k,Active per 100k,Daily Confirmed per 100k,Daily Deaths per 100k,Daily Recovered per 100k,Daily Active per 100k,Daily Confirmed per 100k (7-Day Rolling Average),Daily Deaths per 100k (7-Day Rolling Average),Daily Recovered per 100k (7-Day Rolling Average),Daily Active per 100k (7-Day Rolling Average)
0,01/29/2020,Hong Kong,0.133200,0.0,0.000000,0.13320,0.02664,0.0,0.0,0.02664,0.019029,0.0,0.0,0.019029
1,01/30/2020,Hong Kong,0.133200,0.0,0.000000,0.13320,0.00000,0.0,0.0,0.00000,0.015223,0.0,0.0,0.015223
2,01/31/2020,Hong Kong,0.159840,0.0,0.000000,0.15984,0.02664,0.0,0.0,0.02664,0.019029,0.0,0.0,0.019029
3,02/01/2020,Hong Kong,0.173160,0.0,0.000000,0.17316,0.01332,0.0,0.0,0.01332,0.015223,0.0,0.0,0.015223
4,02/02/2020,Hong Kong,0.199800,0.0,0.000000,0.19980,0.02664,0.0,0.0,0.02664,0.013320,0.0,0.0,0.013320
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64471,02/23/2021,Micronesia,0.867822,0.0,0.867822,0.00000,0.00000,0.0,0.0,0.00000,0.000000,0.0,0.0,0.000000
64472,02/24/2021,Micronesia,0.867822,0.0,0.867822,0.00000,0.00000,0.0,0.0,0.00000,0.000000,0.0,0.0,0.000000
64473,02/25/2021,Micronesia,0.867822,0.0,0.867822,0.00000,0.00000,0.0,0.0,0.00000,0.000000,0.0,0.0,0.000000
64474,02/26/2021,Micronesia,0.867822,0.0,0.867822,0.00000,0.00000,0.0,0.0,0.00000,0.000000,0.0,0.0,0.000000


Finally, we select the pertinent entries related to the Philippines and neighboring Southeast Asian countries.

In [16]:
philippines = moving_agg_data_date2.loc[moving_agg_data_date2['Country/Region'] == "Philippines"]
philippines = philippines.reset_index(drop = True)

brunei = moving_agg_data_date2.loc[moving_agg_data_date2['Country/Region'] == "Brunei"]
brunei = brunei.reset_index(drop = True)

cambodia = moving_agg_data_date2.loc[moving_agg_data_date2['Country/Region'] == "Cambodia"]
cambodia = cambodia.reset_index(drop = True)

indonesia = moving_agg_data_date2.loc[moving_agg_data_date2['Country/Region'] == "Indonesia"]
indonesia = indonesia.reset_index(drop = True)

laos = moving_agg_data_date2.loc[moving_agg_data_date2['Country/Region'] == "Laos"]
laos = laos.reset_index(drop = True)

singapore = moving_agg_data_date2.loc[moving_agg_data_date2['Country/Region'] == "Singapore"]
singapore = singapore.reset_index(drop = True)

thailand = moving_agg_data_date2.loc[moving_agg_data_date2['Country/Region'] == "Thailand"]
thailand = thailand.reset_index(drop = True)

vietnam = moving_agg_data_date2.loc[moving_agg_data_date2['Country/Region'] == "Vietnam"]
vietnam = vietnam.reset_index(drop = True)

malaysia = moving_agg_data_date2.loc[moving_agg_data_date2['Country/Region'] == "Malaysia"]
malaysia = malaysia.reset_index(drop = True)

se_asia = pd.concat([laos, cambodia, vietnam, thailand, brunei, indonesia, philippines, malaysia, singapore],
                   ignore_index = True)
se_asia['ObservationDate'] = pd.to_datetime(se_asia['ObservationDate'], format = '%m/%d/%Y')

se_asia

Unnamed: 0,ObservationDate,Country/Region,Confirmed per 100k,Deaths per 100k,Recovered per 100k,Active per 100k,Daily Confirmed per 100k,Daily Deaths per 100k,Daily Recovered per 100k,Daily Active per 100k,Daily Confirmed per 100k (7-Day Rolling Average),Daily Deaths per 100k (7-Day Rolling Average),Daily Recovered per 100k (7-Day Rolling Average),Daily Active per 100k (7-Day Rolling Average)
0,2020-03-31,Laos,0.123397,0.000000,0.000000,0.123397,0.013711,0.0,0.000000,0.013711,0.013711,0.0,0.000000,0.013711
1,2020-04-01,Laos,0.137108,0.000000,0.000000,0.137108,0.013711,0.0,0.000000,0.013711,0.013711,0.0,0.000000,0.013711
2,2020-04-02,Laos,0.137108,0.000000,0.000000,0.137108,0.000000,0.0,0.000000,0.000000,0.007835,0.0,0.000000,0.007835
3,2020-04-03,Laos,0.137108,0.000000,0.000000,0.137108,0.000000,0.0,0.000000,0.000000,0.007835,0.0,0.000000,0.007835
4,2020-04-04,Laos,0.137108,0.000000,0.000000,0.137108,0.000000,0.0,0.000000,0.000000,0.003917,0.0,0.000000,0.003917
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3394,2021-02-23,Singapore,1022.186899,0.495022,1019.967834,1.724043,0.068279,0.0,0.119488,-0.051209,0.178013,0.0,0.224345,-0.046332
3395,2021-02-24,Singapore,1022.306387,0.495022,1020.104392,1.706973,0.119488,0.0,0.136558,-0.017070,0.168259,0.0,0.207275,-0.039017
3396,2021-02-25,Singapore,1022.477085,0.495022,1020.514065,1.467997,0.170697,0.0,0.409674,-0.238976,0.165820,0.0,0.258485,-0.092664
3397,2021-02-26,Singapore,1022.698991,0.495022,1020.821321,1.382648,0.221907,0.0,0.307255,-0.085349,0.163382,0.0,0.258485,-0.095103


# <a name = "PH" style = "color:black;">A CLOSER LOOK AT THE PHILIPPINES</a>

For this set of visualizations, we are going to take a closer look at our country, the Philippines, vis-a-vis neighboring countries in Southeast Asia. We lift selected plots from our data analysis. In particular, it is important to communicate the progression in the 7-day rolling average of COVID-19-related cases.

In [17]:
fig = make_subplots(rows = 1, cols = 2)

fig.add_trace(go.Scatter(x = ph_moving_agg_confirmed_date['ObservationDate'],
                    y = ph_moving_agg_confirmed_date['Daily Confirmed (7-Day Rolling Average)'],
                    name = 'Daily Confirmed Cases (7-Day Rolling Average)',
                    marker_color = 'cornflowerblue'),
             row = 1, col = 1)

fig.add_trace(go.Scatter(x = ph_moving_agg_confirmed_date['ObservationDate'],
                    y = ph_moving_agg_confirmed_date['Daily Active (7-Day Rolling Average)'],
                    name = 'Daily Active Cases (7-Day Rolling Average)',
                    marker_color = 'cornflowerblue'),
             row = 1, col = 2)

fig.add_trace(go.Scatter(x = ph_moving_agg_confirmed_date['ObservationDate'],
                    y = ph_moving_agg_confirmed_date['Daily Recovered (7-Day Rolling Average)'],
                    name = 'Daily Recoveries (7-Day Rolling Average)',
                    marker_color = 'darkseagreen'),
             row = 1, col = 2)

fig.add_trace(go.Scatter(x = ph_moving_agg_confirmed_date['ObservationDate'],
                    y = ph_moving_agg_confirmed_date['Daily Deaths (7-Day Rolling Average)'],
                    name = 'Daily Deaths (7-Day Rolling Average)',
                    marker_color = 'indianred'),
             row = 1, col = 2)

fig.update_layout(xaxis1_rangeslider_visible = True, xaxis2_rangeslider_visible = True,
                 title = "7-Day Rolling Average of COVID-19-Related Cases in the Philippines",
                 xaxis1_title = "Observation Dates",
                 xaxis2_title = "Observation Dates",
                 yaxis_title = "7-Day Rolling Average of COVID-19-Related Cases")

fig.show()

The trend in the 7-day rolling average of COVID-19-related confirmed cases in our country is less monotonic than the global pattern. Even with the smoothing of data via moving averages, jitters remain noticeable to a certain degree. Nevertheless, the overall trend is still discernible: increasing from March to August 2020 and decreasing from mid-September to the end of December 2020, which is in consonance with the analyses from the previous graphs. Again, the tail end of the graph seems to be indicative of a resurgence in the rate of transmission.

Analyzing the 7-day rolling average of active cases, recoveries, and deaths in the Philippines is also not as straightforward due to fluctuations in the trend. Generally, the rolling average of recoveries outweighs both that of active cases and that of deaths. The largest gap between the rolling average of recoveries and that of active cases was recorded in the first and third weeks of August 2020.

In the third week of January 2021, however, the margin between these two became noticeably smaller; in light of the previous analyses, this might be due to a post-holiday surge in the number of confirmed cases in our country. In relation to this, it may be relevant to home in on the tail end of the graph, which is indicative of an increase in the number of active cases. In hindsight, this may have been one of the early indicators of the resurgence in transmission that our country is currently facing.

First, we construct a dynamic heat map for the 7-day rolling average of daily confirmed cases per 100,000.

<i>Note that this is an interactive map. Hovering on regions or adjusting the range slider allows for a more granular look at the data.</i>

In [18]:
fig = go.Figure(data = go.Heatmap(z = se_asia['Daily Confirmed per 100k (7-Day Rolling Average)'],
                                  y = se_asia['Country/Region'],
                                  x = se_asia['ObservationDate'],
                                  colorscale = "YlOrRd"))

fig.update_layout({'plot_bgcolor': 'rgba(0, 0, 0, 0)'},
                 title = "7-Day Rolling Average of COVID-19-Related Confirmed Cases in Southeast Asia",
                 xaxis_title = "Observation Dates",
                 yaxis_title = "Country")

fig.show()

From the map, we can see the periods of surges in the rolling average of confirmed cases: April to May 2020 for Singapore, January to February 2021 for Malaysia, and August 2020 for the Philippines. In particular, Singapore was the most heavily affected, peaking at 17.13 cases per 100,000 people on April 24, 2020. It can also be noted that the gradient for our country becomes slightly more intense toward the tail end, which may be considered an early "warning sign" of the present resurgence in the number of COVID-19-related cases.
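Peaks such as the one cited above can also be read off programmatically instead of by hovering over the map. The following is a minimal sketch on made-up values (the countries, dates, and the shortened column name `Rolling` are hypothetical), assuming a long-format table similar to `se_asia`:

```python
import pandas as pd

# Made-up rolling averages for two hypothetical countries.
toy = pd.DataFrame({
    'ObservationDate': pd.to_datetime(['2020-04-23', '2020-04-24', '2020-04-25'] * 2),
    'Country/Region': ['A'] * 3 + ['B'] * 3,
    'Rolling': [12.0, 17.0, 15.0, 0.2, 0.1, 0.4],
})

# idxmax returns, for each country, the row label of its largest value;
# loc then retrieves the full rows, including the date of the peak.
peak_rows = toy.loc[toy.groupby('Country/Region')['Rolling'].idxmax()]

print(peak_rows)
```

Each row of `peak_rows` holds one country's maximum together with the date on which it occurred.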

Second, we construct a dynamic heat map for the 7-day rolling average of daily active cases per 100,000.

<i>Note that this is an interactive map. Hovering on regions or adjusting the range slider allows for a more granular look at the data.</i>

In [19]:
fig = go.Figure(data = go.Heatmap(z = se_asia['Daily Active per 100k (7-Day Rolling Average)'],
                                  y = se_asia['Country/Region'],
                                  x = se_asia['ObservationDate'],
                                  colorscale = "RdYlBu_r",
                                  zmid = 0))

fig.update_layout({'plot_bgcolor': 'rgba(0, 0, 0, 0)'},
                 title = "7-Day Rolling Average of COVID-19-Related Active Cases in Southeast Asia",
                 xaxis_title = "Observation Dates",
                 yaxis_title = "Country")

fig.show()

The peak of the daily active cases in Singapore was in mid- to late April 2020, but the country was able to follow it up with subsequent monthlong periods of decrease in May, June, and August. On the other hand, in the Philippines, the highest number of daily active cases was in mid-August 2020. While our country was also able to punctuate this with weeklong periods of decreased active cases, our trend fluctuates noticeably more.

Third, we construct a dynamic heat map for the 7-day rolling average of daily recoveries per 100,000.

<i>Note that this is an interactive map. Hovering on regions or adjusting the range slider allows for a more granular look at the data.</i>

In [20]:
fig = go.Figure(data = go.Heatmap(z = se_asia['Daily Recovered per 100k (7-Day Rolling Average)'],
                                  y = se_asia['Country/Region'],
                                  x = se_asia['ObservationDate'],
                                  colorscale = "PuBuGn"))

fig.update_layout({'plot_bgcolor': 'rgba(0, 0, 0, 0)'},
                 title = "7-Day Rolling Average of COVID-19-Related Recoveries in Southeast Asia",
                 xaxis_title = "Observation Dates",
                 yaxis_title = "Country")

fig.show()

Among the Southeast Asian countries, Singapore and Malaysia have periods of prominently high recovery rates; usually, these occur weeks after a surge in the number of confirmed cases. The rolling average of recoveries in the Philippines peaks in the first and third weeks of August 2020, although it does seem to decrease as the months progress. This is the opposite of the trend in Indonesia, where the daily tally of recoveries exhibits an increase in the first quarter of 2021.

Finally, we construct a dynamic heat map for the 7-day rolling average of daily deaths per 100,000.

<i>Note that this is an interactive map. Hovering on regions or adjusting the range slider allows for a more granular look at the data.</i>

In [21]:
fig = go.Figure(data = go.Heatmap(z = se_asia['Daily Deaths per 100k (7-Day Rolling Average)'],
                                  y = se_asia['Country/Region'],
                                  x = se_asia['ObservationDate'],
                                  colorscale = "YlOrRd"))

fig.update_layout({'plot_bgcolor': 'rgba(0, 0, 0, 0)'},
                 title = "7-Day Rolling Average of COVID-19-Related Deaths in Southeast Asia",
                 xaxis_title = "Observation Dates",
                 yaxis_title = "Country")

fig.show()

This is, perhaps, one of the most alarming indicators of the gravity of the pandemic here in Southeast Asia. In particular, our country, Malaysia, and Indonesia tally the highest daily number of deaths related to COVID-19. In Indonesia, the daily death toll climbed to 0.11 per 100,000 people on January 31, 2021. Meanwhile, in the Philippines, the trend seems to be noticeably protracted, as evidenced by the rather extended "band" spanning July 2020 until March 2021. Although there were moderate successes near the end of 2020, the number of deaths is rising again in the first quarter of 2021.

# <a name = "Science" style = "color:black;">ON A MORE SCIENTIFIC NOTE</a>

## <a name = "Curve" style = "color:black;">***Flattening the Curve?***</a>

A phrase frequently heard in relation to the COVID-19 pandemic is "flattening the curve." Generally, this refers to a plateauing in the number of cases recorded. Royal Melbourne Institute of Technology professors Sanderson, Hudson, and Osborn (2020) explain that, technically, this can be associated with the "percentage growth of the total," that is, the "daily number of cases as a percentage of the total so far."

Using this formula, we can now prepare the table for the penultimate visualization.
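
Before applying the formula to the full table, a single-row check may make it concrete. The sketch below uses a hypothetical helper, `percentage_growth`, which is not part of the notebook, and plugs in the Philippine figures for 2021-02-26 from this notebook's table (2,647 new confirmed cases out of a cumulative 571,327).

```python
def percentage_growth(daily_new, cumulative_total):
    """Daily new cases as a percentage of the cumulative total so far."""
    if cumulative_total == 0:
        return float("nan")  # undefined before the first recorded case
    return daily_new / cumulative_total * 100


# Figures for 2021-02-26: Daily Confirmed = 2647, Confirmed = 571327
growth = percentage_growth(daily_new=2647, cumulative_total=571327)
print(f"{growth:.4f}%")  # 0.4633%
```

The result agrees with the `Percentage Growth` value of 0.463307 shown for that date.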

In [22]:
ph_moving_agg_confirmed_date['Percentage Growth'] = ph_daily_agg_confirmed_date['Daily Confirmed'] / ph_daily_agg_confirmed_date['Confirmed'] * 100
ph_moving_agg_confirmed_date

Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active,Daily Confirmed (7-Day Rolling Average),Daily Deaths (7-Day Rolling Average),Daily Recovered (7-Day Rolling Average),Daily Active (7-Day Rolling Average),Percentage Growth
1,2020-01-30,1,0,0,1,1,0,0,1,,,,,100.000000
2,2020-01-31,1,0,0,1,0,0,0,0,,,,,0.000000
3,2020-02-01,1,0,0,1,0,0,0,0,,,,,0.000000
4,2020-02-02,2,1,0,1,1,1,0,0,,,,,50.000000
5,2020-02-03,2,1,0,1,0,0,0,0,,,,,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
391,2021-02-23,564865,12107,522941,29817,1409,13,67,1329,1802.714286,83.285714,1592.142857,127.285714,0.249440
392,2021-02-24,566420,12129,523321,30970,1555,22,380,1153,1856.571429,78.857143,1612.571429,165.142857,0.274531
393,2021-02-25,568680,12201,524042,32437,2260,72,721,1467,1931.000000,75.428571,1697.000000,158.571429,0.397412
394,2021-02-26,571327,12247,524582,34498,2647,46,540,2061,2038.428571,59.714286,1684.714286,294.000000,0.463307


We then proceed to plot the resulting line graph.

In [23]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = ph_moving_agg_confirmed_date['ObservationDate'],
                    y = ph_moving_agg_confirmed_date['Percentage Growth'],
                    name = 'Percentage Growth',
                    marker_color = 'cornflowerblue'))

# The plotted column is Daily Confirmed / Confirmed, so the labels refer to daily confirmed cases
fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "Daily Confirmed Cases as Percentage of Total Confirmed Cases",
                 xaxis_title = "Observation Date",
                 yaxis_title = "Daily Confirmed Cases as Percentage of Total Confirmed Cases")

fig.show()

From the graph above, it does seem that the country is succeeding in flattening the curve, particularly towards the last quarter of 2020; since September 22, 2020, the percentage growth of the total has been under 1%. This may be ascribed to quarantine measures imposed by the government, heightened caution among Filipinos, and a more informed understanding of disease dynamics within the healthcare sector. A consonant finding was voiced by Dr. Guido David of the University of the Philippines, as reported by Hallare (2020).

However, it must be emphasized that this does not preclude the possibility of a resurgence, which is actually happening right now in our country. In hindsight, if we zero in on the tail end of the curve, there are indeed early "warning indicators" at the start of February 2021. Although it is difficult to pinpoint the exact cause of this resurgence, it may be ascribed to the relaxation of certain policies and a possibly lax attitude among the general public during the holiday season.
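
A statement such as "the percentage growth has been under 1% since a given date" can also be checked programmatically rather than read off the graph. The following is a minimal sketch, assuming a hypothetical helper `below_threshold_since` and made-up values; none of these numbers come from the dataset.

```python
def below_threshold_since(dates, values, threshold=1.0):
    """Return the first date of the final unbroken run of values below
    `threshold`, or None if the most recent value is not below it."""
    start = None
    for date, value in zip(dates, values):
        if value < threshold:
            if start is None:
                start = date  # a new run below the threshold begins
        else:
            start = None  # run broken; NaN values also land here
    return start


# Made-up percentage growth values (illustration only)
dates = ["day 1", "day 2", "day 3", "day 4", "day 5", "day 6"]
growth_values = [1.3, 0.9, 1.1, 0.8, 0.7, 0.6]

print(below_threshold_since(dates, growth_values))  # day 4
```

With the notebook's data, the `ObservationDate` and `Percentage Growth` columns could be passed in as lists to confirm the date read from the chart.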

## <a name = "Ratio" style = "color:black;">***Case Fatality Ratio***</a>

Although the intuitive formula for the case fatality ratio divides the number of deaths by the number of confirmed cases, this approach is susceptible to underestimating the actual value. It requires two strict assumptions that are practically impossible to satisfy given the live data we have at hand: (1) "the likelihood of detecting cases and deaths is consistent over the course of the outbreak" and (2) "all detected cases have been resolved (that is, reported cases have either recovered or died)" (World Health Organization, 2020).

Therefore, we follow a less bias-susceptible formula suggested by the World Health Organization: <br/> <br/>

<center>$ \text{Case Fatality Ratio} = \dfrac{\text{Number of Deaths}}{\text{Number of Deaths} + \text{Number of Recoveries}} $ </center> <br/>

Using this formula, we can now prepare the table for the final visualization.
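
To see how the two formulas differ in practice, the sketch below evaluates both on the Philippine figures for 2021-02-26 from this notebook's table (12,247 deaths, 524,582 recoveries, 571,327 confirmed cases). The function names are hypothetical and are not part of the notebook; the result is multiplied by 100 to match the percentage scale used in the table.

```python
def case_fatality_ratio(deaths, recovered):
    """Deaths as a percentage of resolved cases (deaths + recoveries)."""
    resolved = deaths + recovered
    if resolved == 0:
        return float("nan")  # undefined until at least one case resolves
    return deaths / resolved * 100


def naive_case_fatality_ratio(deaths, confirmed):
    """Deaths as a percentage of all confirmed cases, resolved or not."""
    if confirmed == 0:
        return float("nan")
    return deaths / confirmed * 100


# Figures for 2021-02-26 from the table in this notebook
deaths, recovered, confirmed = 12247, 524582, 571327

cfr_resolved = case_fatality_ratio(deaths, recovered)
cfr_naive = naive_case_fatality_ratio(deaths, confirmed)

print(f"Resolved-case CFR: {cfr_resolved:.2f}%")  # 2.28%
print(f"Naive CFR:         {cfr_naive:.2f}%")     # 2.14%
```

The naive figure is lower because its denominator still includes active cases whose outcome is not yet known, which illustrates the underestimation described above. The undefined result for zero resolved cases corresponds to the empty `Case Fatality Ratio` cells in the first rows of the table.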

In [24]:
# https://www.who.int/news-room/commentaries/detail/estimating-mortality-from-covid-19

ph_moving_agg_confirmed_date['Case Fatality Ratio'] = ph_daily_agg_confirmed_date['Deaths'] / \
                                    (ph_daily_agg_confirmed_date['Deaths'] + ph_daily_agg_confirmed_date['Recovered']) * 100
ph_moving_agg_confirmed_date

Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active,Daily Confirmed (7-Day Rolling Average),Daily Deaths (7-Day Rolling Average),Daily Recovered (7-Day Rolling Average),Daily Active (7-Day Rolling Average),Percentage Growth,Case Fatality Ratio
1,2020-01-30,1,0,0,1,1,0,0,1,,,,,100.000000,
2,2020-01-31,1,0,0,1,0,0,0,0,,,,,0.000000,
3,2020-02-01,1,0,0,1,0,0,0,0,,,,,0.000000,
4,2020-02-02,2,1,0,1,1,1,0,0,,,,,50.000000,100.000000
5,2020-02-03,2,1,0,1,0,0,0,0,,,,,0.000000,100.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
391,2021-02-23,564865,12107,522941,29817,1409,13,67,1329,1802.714286,83.285714,1592.142857,127.285714,0.249440,2.262788
392,2021-02-24,566420,12129,523321,30970,1555,22,380,1153,1856.571429,78.857143,1612.571429,165.142857,0.274531,2.265197
393,2021-02-25,568680,12201,524042,32437,2260,72,721,1467,1931.000000,75.428571,1697.000000,158.571429,0.397412,2.275274
394,2021-02-26,571327,12247,524582,34498,2647,46,540,2061,2038.428571,59.714286,1684.714286,294.000000,0.463307,2.281360


We then proceed to plot the resulting line graph.

In [25]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = ph_moving_agg_confirmed_date['ObservationDate'],
                    y = ph_moving_agg_confirmed_date['Case Fatality Ratio'],
                    name = 'Case Fatality Ratio',
                    marker_color = 'indianred'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "Case Fatality Ratio in the Philippines",
                 xaxis_title = "Observation Date",
                 yaxis_title = "Case Fatality Ratio")

fig.show()

It can be observed that, after dropping below 3% on August 16, 2020, the case fatality ratio has stayed below that level for roughly half a year. However, this should not be interpreted as a cause for complacency; homing in on the case fatality ratio in February 2021, a gradual increase can be observed, with the ratio climbing to 2.29% on February 27. Although this is admittedly negligible when the overall trend is considered, it may have been one of the early indicators of the current resurgence that our country is experiencing.

# <a name = "References" style = "color:black;">REFERENCES</a>

- Adams, A., Li, W., Zhang, C., & Chen, X. (2020). The disguised pandemic: The importance of data normalization in COVID-19 web mapping. <i>Public Health, 183</i>, 36-37. doi:10.1016/j.puhe.2020.04.034
- Hallare, K. (2020, September 6). UP expert: PH’s virus curve already flattened, but no cause for excitement yet. <i>Philippine Daily Inquirer.</i> https://newsinfo.inquirer.net/1331933/up-expert-phs-virus-curve-already-flattened-but-no-cause-for-excitement-yet#ixzz6q9VP5MyI
- Nau, R. (2020). <i>Stationarity and differencing.</i> Duke University. http://people.duke.edu/~rnau/411diff.htm
- Sanderson, M., Hudson, I.L., & Osborn, M. (2020, April 7). <i>The bar necessities: 5 ways to understand coronavirus graphs.</i> The Conversation. https://theconversation.com/the-bar-necessities-5-ways-to-understand-coronavirus-graphs-135537
- World Health Organization. (2020). <i>Estimating mortality from COVID-19.</i> https://www.who.int/news-room/commentaries/detail/estimating-mortality-from-covid-19