### This project features interactive maps and HTML markup. Since Github does not support dynamic displays for notebooks, it is necessary to download the project to render the elements and to manipulate the data visualizations.

# Data Analysis

This is the <b>second in a series of five Jupyter notebooks</b> on the Novel Corona Virus 2019 Dataset. This activity is in partial fulfillment of the Tidy Tuesdays deliverables for probationary Lyrids of the <b>Center for Complexity and Emerging Technologies, College of Computer Studies, De La Salle University</b>.

<b>The COVID-19 pandemic is a serious and unprecedented global health crisis.</b> Through this activity, the author of this series of Jupyter notebooks hopes to increase awareness on the importance of data-driven policy directions and to hopefully contribute to the present discourse on global and national trends related to the spread of the coronavirus. In this regard, this attempt to explore and visualize the said dataset does not seek to reduce the gravity of the pandemic to mere numerical figures nor to present a professional, rigorous statistical analysis.

<hr/>

The required dataset for this Tidy Tuesdays activity is the Novel Corona Virus 2019 Dataset from Kaggle: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset.

To enrich the analysis and visualization, the following datasets were integrated:
- Population by Country - 2020: https://www.kaggle.com/tanuprabhu/population-by-country-2020
- Coordinates of US States: https://developers.google.com/public-data/docs/canonical/states_csv
- Coordinates of Countries: https://developers.google.com/public-data/docs/canonical/countries_csv

These datasets are stored in the folder <code>data</code> of the repository.

# TABLE OF CONTENTS
I. [Preliminaries](#Preliminaries) <br/>
II. [Data Analysis](#DataAnalysis) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; A. [On Descriptive Statistics](#DescStat) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; B. [Cumulative Cases by Country](#CumulCountry) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1. [Absolute Values](#AbsolVal) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a. [Anomaly #1: Varying Latest Observation Dates](#Anomaly1) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; b. [Anomaly #2: Inconsistent Recovery Encoding for US, Belgium, Sweden & Serbia](#Anomaly2) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; c. [Incorporating Active Cases](#Active) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; d. [The Philippines](#PHabs) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2. [Relative Values](#RelVal) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a. [Incorporating the Population](#Popul) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; b. [Data Cleaning](#DataCleaning) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; c. [Normalized Data (Cases per 100,000](#Normalized) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; d. [The Philippines](#PHrel) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; C. [Cumulative Cases by Date](#CumulDate) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1. [Global Scale](#GlobalScale) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a. [Anomaly Correction: Zero Recoveries in US, Belgium, Sweden & Serbia](#AnomalyCorrect) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; b. [Anomaly #2: Inconsistent Recovery Encoding for US, Belgium, Sweden & Serbia](#Anomaly2) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; c. [Incorporating Active Cases](#Active3) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; d. [Global Trend](#GlobalTrend) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2. [Philippine Context](#PHContext) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; D. [Daily Cases](#DailyCases) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1. [Global Scale](#GlobalScale2) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a. [Anomaly #3: Negative Number of Daily Confirmed Cases, Deaths, and Recoveries](#Anomaly3) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; b. [Graph of Daily Cases (Jittery)](#Jittery1) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; c. [Moving Averages (More Statistically Representative)](#Move1) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2. [Philippine Context](#PHContext2) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a. [Anomaly #3: Negative Number of Daily Confirmed Cases, Deaths, and Recoveries](#Anomaly4) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; b. [Graph of Daily Cases (Jittery)](#Jittery2) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; c. [Moving Averages (More Statistically Representative)](#Move2) <br/>
III. [References](#References)
    

<hr/>

# <a name = "Preliminaries" style = "color:black;">PRELIMINARIES</a> (<i>Lifted from Data Preparation</i>)

<b>Due to restrictions related to the size of files in Github repositories, the project had to be divided into separate notebooks. In this regard, this section is just a repeat of pertinent code excerpts from the data preparation phase. </b> 

For the complete documentation, please refer to the aforementioned notebook: <code>1. Data Preparation.ipynb</code>.

In [1]:
import pandas as pd
import numpy as np

import re

import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots

from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected = True)

pd.options.mode.chained_assignment = None  # default='warn'

NUM_ROWS = 10

pd.set_option('display.max_rows', NUM_ROWS)
pd.set_option('display.min_rows', NUM_ROWS)

In [2]:
standardized_names = [("('St. Martin',)", "Saint Martin"), 
                      ("St. Martin", "Saint Martin"),
                      ("The Bahamas", "Bahamas"),
                      ("Bahamas, The", "Bahamas"),
                      ("The Gambia", "Gambia"),
                      ("Gambia, The", "Gambia"),
                      ("Congo (Kinshasa)", "Republic of the Congo"),
                      ("Congo (Brazzaville)", "Democratic Republic of the Congo"),
                      ("Cabo Verde", "Cape Verde"),
                      ("Timor-Leste", "East Timor"),
                      ("occupied Palestinian territory", "West Bank and Gaza"),
                      ("US", "United States"),
                      ("UK", "United Kingdom"),
                      ("Holy See", "Vatican"),
                      ("Vatican City", "Vatican"),
                      ("Cote d'Ivoire", "Ivory Coast"),
                      ("Taiwan*", "Taiwan"),
                      ("Czechia", "Czech Republic"),
                      (" Azerbaijan", "Azerbaijan")]

cruise_ships = [("Diamond Princess", "Cruise Ships"),
                ("MS Zaandam", "Cruise Ships"),
                ("Others", "Cruise Ships")]

In [3]:
data_raw = pd.read_csv('data/covid_19_data.csv')
data = data_raw.copy(deep = True)

data.drop(columns = ['SNo', 'Province/State', 'Last Update'], axis = 1, inplace = True)

for name in standardized_names:
    data['Country/Region'] = data['Country/Region'].str.replace(re.escape(name[0]), name[1])
    
for ship in cruise_ships:
    data['Country/Region'] = data['Country/Region'].str.replace(re.escape(ship[0]), ship[1])

data['ObservationDate'] = pd.to_datetime(data['ObservationDate'], format = '%m/%d/%Y')
data['Confirmed'] = data['Confirmed'].astype('int64')
data['Deaths'] = data['Deaths'].astype('int64')
data['Recovered'] = data['Recovered'].astype('int64')

<hr/>

# <a name = "DataAnalysis" style = "color:black;">DATA ANALYSIS</a>

Essentially, the data form a time series of the number of confirmed cases, recoveries, and deaths related to the COVID-19 pandemic per country/region for the period <b>January 22, 2020, to February 27, 2021</b> (which spans over a year). This nature of the dataset, thus, lends itself to three themes of analysis &mdash; which, whenever applicable, we present from both global and national contexts:
- Cumulative cases by country
- Cumulative cases by date
- Daily cases

The goal of this step is twofold: (1) detecting anomalies that may have to be addressed before proceeding to visualization and further analysis (or, possibly, forecasting) and (2) recognizing notable trends and patterns that may hopefully enrich our understanding of the data at hand. 

<i>For simplicity, the terms "countries" and "regions" are used in these notebooks interchangeably, unless otherwise inferred from the context.</i>

## A. <a name = "DescStat" style = "color:black;">On Descriptive Statistics</a>

As can be seen in the graphs found on later sections of this notebook, the time series data related to the COVID-19 pandemic are <b>nonstationary</b>, that is, they exhibit a trend. Ergo, it would be inconsequential to get the mean and the standard deviation across the entire set of entries &mdash; the two most common measures of central tendency and variability, respectively &mdash; since they are expected to vary over time depending on the trend. 

Robert Nau (2020) of Duke University explains this as follows:

<blockquote> For example, if the series is consistently increasing over time, the sample mean and variance will grow with the size of the sample, and they will always underestimate the mean and variance in future periods. And if the mean and variance of a series are not well-defined, then neither are its correlations with other variables.  </blockquote>

## B. <a name = "CumulCountry" style = "color:black;">Cumulative Cases by Country</a>

As part of exploring the dataset, we present both absolute and relative values for the analysis of cumulative cases by country, and justify why the latter provides a more accurate picture when comparing the number of cases across geographical areas.

### 1. <a name = "AbsolVal" style = "color:black;">Absolute Values</a>

First, we check if the most recent observation dates for all the countries in the dataset are uniform.

Under the hood, this is done by sorting the entries in ascending order according to the observation date, then grouping them based on the country. Finally, the observation date of the last entry (<code>.tail(1)</code>) per group pertains to the most recent observation date for that specific location.

In [4]:
most_recent_date = data.sort_values('ObservationDate').groupby('Country/Region').tail(1)

print("-- MOST RECENT OBSERVATION DATE PER COUNTRY/REGION --")
most_recent_date

-- MOST RECENT OBSERVATION DATE PER COUNTRY/REGION --


Unnamed: 0,ObservationDate,Country/Region,Confirmed,Deaths,Recovered
2684,2020-02-28,North Ireland,1,0,0
4066,2020-03-08,Republic of Ireland,21,0,0
4321,2020-03-09,Palestine,22,0,0
4697,2020-03-10,Saint Barthelemy,1,0,0
4671,2020-03-10,Faroe Islands,2,0,0
...,...,...,...,...,...
235504,2021-02-27,Mexico,8474,1059,0
235509,2021-02-27,Colombia,12321,269,11683
235511,2021-02-27,Spain,205587,6320,8716
235502,2021-02-27,Peru,57532,2439,0


### <a name = "Anomaly1" style = "color:black;">***Anomaly #1: Varying Latest Observation Dates***</a>

The first anomaly surfaces: countries have varying latest observation dates. For instance, the last entry for North Ireland is on February 28, 2020, whereas the last entries for most other countries are on February 27, 2021 &mdash; which is a year later. To have a more complete idea of this anomaly, we display the 21 countries whose most recent observation dates are prior to February 27, 2021.

In [5]:
print("-- COUNTRIES/REGIONS WITH MOST RECENT OBSERVATION DATE BEFORE FEBRUARY 27, 2021 --")
most_recent_date.loc[most_recent_date['ObservationDate'] < "2021-02-27"]

-- COUNTRIES/REGIONS WITH MOST RECENT OBSERVATION DATE BEFORE FEBRUARY 27, 2021 --


Unnamed: 0,ObservationDate,Country/Region,Confirmed,Deaths,Recovered
2684,2020-02-28,North Ireland,1,0,0
4066,2020-03-08,Republic of Ireland,21,0,0
4321,2020-03-09,Palestine,22,0,0
4697,2020-03-10,Saint Barthelemy,1,0,0
4671,2020-03-10,Faroe Islands,2,0,0
...,...,...,...,...,...
7613,2020-03-21,Puerto Rico,0,0,0
7612,2020-03-21,Jersey,0,0,0
7611,2020-03-21,Guernsey,0,0,0
7610,2020-03-21,Guam,0,0,0


It can be observed that almost all these countries are small island developing states or territories, where the damage to the healthcare and economic sectors are more pronounced. This may possibly hinder the intensity of testing efforts and, consequently, the periodicity of submission of national data to global public health bodies. 

In order to uniformize the data, we filter our dataset to countries whose latest observation date is February 27, 2021. 

In [6]:
agg_data = data.loc[data['ObservationDate'] == "2021-02-27"]
agg_data = agg_data.groupby('Country/Region', as_index = False).sum()
agg_data = agg_data.reset_index(drop = True)

print("-- NUMBER OF CONFIRMED CASES, DEATHS, AND RECOVERIES PER COUNTRY/REGION (RAW) AS OF FEBRUARY 27, 2021 --")
agg_data

-- NUMBER OF CONFIRMED CASES, DEATHS, AND RECOVERIES PER COUNTRY/REGION (RAW) AS OF FEBRUARY 27, 2021 --


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered
0,Afghanistan,55707,2443,49288
1,Albania,106215,1775,68969
2,Algeria,112960,2979,77976
3,Andorra,10849,110,10429
4,Angola,20782,506,19315
...,...,...,...,...
188,Vietnam,2432,35,1844
189,West Bank and Gaza,181909,2025,166119
190,Yemen,2269,631,1435
191,Zambia,78202,1081,73609


### <a name = "Anomaly2" style = "color:black;">***Anomaly #2: Inconsistent Recovery Encoding for US, Belgium, Sweden & Serbia***</a>

A cursory inspection of the table above shows that the number of recoveries in the United States is equal to 0, which is clearly incorrect. 

Going through the dataset, we see that the encoding of recoveries for entries related to the United States is different: they are classified under the <code>Province/State</code> called <code>Recovered</code>. However, the pertinent entry for February 27, 2021, is 0; as a matter of fact, all <code>Recovered</code> entries since December 15, 2020, have been encoded as 0 despite the fact that the dataset presents <i>cumulative</i> tallies.

In [7]:
us_recovered = data_raw.loc[data_raw['Province/State'] == "Recovered"]
us_recovered = us_recovered[us_recovered['Country/Region'] == "US"]
us_recovered = us_recovered[us_recovered['Recovered'] > 0].tail(1)

print("-- INVESTIGATING THE ANOMALY CONCERNING THE ENCODING OF RECOVERIES FOR US")
us_recovered

-- INVESTIGATING THE ANOMALY CONCERNING THE ENCODING OF RECOVERIES FOR US


Unnamed: 0,SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
178673,178674,12/14/2020,Recovered,US,2020-12-15 05:26:38,0.0,0.0,6399531.0


To fix this anomaly, we set the number of recovered cases in the United States to the latest nonzero value: 6399531, taken from the December 15, 2020, entry.

In [8]:
agg_data.at[182, 'Recovered'] = 6399531

print("-- CORRECTED ENTRY FOR US --")
agg_data.loc[[182]]

-- CORRECTED ENTRY FOR US --


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered
182,United States,28554465,511994,6399531


We check if any other country exhibits the same anomaly.

In [9]:
print("-- COUNTRIES WITH ANOMALY ON THE NUMBER OF RECOVERIES --")
agg_data.loc[agg_data['Recovered'] == 0]

-- COUNTRIES WITH ANOMALY ON THE NUMBER OF RECOVERIES --


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered
16,Belgium,769414,22052,0
152,Serbia,456450,4429,0
167,Sweden,657309,12826,0


In [10]:
belgium_recovered = data_raw.loc[data_raw['Country/Region'] == "Belgium"]
belgium_recovered = belgium_recovered[belgium_recovered['Recovered'] > 0].tail(1)

print("-- INVESTIGATING THE ANOMALY CONCERNING THE ENCODING OF RECOVERIES FOR BELGIUM --")
belgium_recovered

-- INVESTIGATING THE ANOMALY CONCERNING THE ENCODING OF RECOVERIES FOR BELGIUM --


Unnamed: 0,SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
153005,153006,11/11/2020,,Belgium,2020-11-12 05:25:55,515391.0,13758.0,31130.0


In [11]:
sweden_recovered = data_raw.loc[data_raw['Country/Region'] == "Sweden"]
sweden_recovered = sweden_recovered[sweden_recovered['Recovered'] > 0].tail(1)

print("-- INVESTIGATING THE ANOMALY CONCERNING THE ENCODING OF RECOVERIES FOR SWEDEN --")
sweden_recovered

-- INVESTIGATING THE ANOMALY CONCERNING THE ENCODING OF RECOVERIES FOR SWEDEN --


Unnamed: 0,SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
32847,32848,06/01/2020,,Sweden,2020-06-02 02:33:08,37814.0,4403.0,4971.0


In [12]:
serbia_recovered = data_raw.loc[data_raw['Country/Region'] == "Serbia"]
serbia_recovered = serbia_recovered[serbia_recovered['Recovered'] > 0].tail(1)

print("-- INVESTIGATING THE ANOMALY CONCERNING THE ENCODING OF RECOVERIES FOR SERBIA --")
serbia_recovered

-- INVESTIGATING THE ANOMALY CONCERNING THE ENCODING OF RECOVERIES FOR SERBIA --


Unnamed: 0,SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
67454,67455,07/19/2020,,Serbia,2020-07-20 05:34:40,20894.0,472.0,15564.0


In [13]:
agg_data.at[16, 'Recovered'] = 31130
agg_data.at[152, 'Recovered'] = 4971
agg_data.at[167, 'Recovered'] = 15564

The corrected table for the number of confirmed cases, deaths, and recoveries per country/region is as follows:

In [14]:
print("-- NUMBER OF CONFIRMED CASES, DEATHS, AND RECOVERIES PER COUNTRY/REGION (CORRECTED) AS OF FEBRUARY 27, 2021 --")
agg_data

-- NUMBER OF CONFIRMED CASES, DEATHS, AND RECOVERIES PER COUNTRY/REGION (CORRECTED) AS OF FEBRUARY 27, 2021 --


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered
0,Afghanistan,55707,2443,49288
1,Albania,106215,1775,68969
2,Algeria,112960,2979,77976
3,Andorra,10849,110,10429
4,Angola,20782,506,19315
...,...,...,...,...
188,Vietnam,2432,35,1844
189,West Bank and Gaza,181909,2025,166119
190,Yemen,2269,631,1435
191,Zambia,78202,1081,73609


### <a name = "Active" style = "color:black;">***Incorporating Active Cases***</a>

One of the main points of interest in the statistical analysis of COVID-19-related data is the number of active cases, which is computed following this formula: <br/> <br/>


<center>$\text{number of active cases = number of confirmed cases} - \text{number of deaths} - \text{number of recoveries}$</center> 

In this light, we add another column to our table to reflect the number of active cases as of February 27, 2021.

In [15]:
agg_data['Active'] = agg_data['Confirmed'] - agg_data['Deaths'] - agg_data['Recovered']

print("-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER COUNTRY/REGION (CORRECTED) AS OF FEBRUARY 27, 2021 --")
agg_data

-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER COUNTRY/REGION (CORRECTED) AS OF FEBRUARY 27, 2021 --


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active
0,Afghanistan,55707,2443,49288,3976
1,Albania,106215,1775,68969,35471
2,Algeria,112960,2979,77976,32005
3,Andorra,10849,110,10429,310
4,Angola,20782,506,19315,961
...,...,...,...,...,...
188,Vietnam,2432,35,1844,553
189,West Bank and Gaza,181909,2025,166119,13765
190,Yemen,2269,631,1435,203
191,Zambia,78202,1081,73609,3512


We arrange the countries based on the number of <b>active cases</b>. As a preliminary visualization, we employ red gradients to correspond to the intensity of the absolute number of confirmed cases, deaths, recoveries, and active cases.

In [16]:
sorted_agg_data = agg_data.sort_values(by = 'Active', ascending = False)
sorted_agg_data = sorted_agg_data.reset_index(drop = True)

print("-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER COUNTRY/REGION AS OF FEBRUARY 27, 2021 --")
print("-- Arranged based on the number of active cases --")
sorted_agg_data.style.background_gradient(cmap = 'Reds')

-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER COUNTRY/REGION AS OF FEBRUARY 27, 2021 --
-- Arranged based on the number of active cases --


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active
0,United States,28554465,511994,6399531,21642940
1,United Kingdom,4182772,122939,11602,4048231
2,France,3747263,85741,261649,3399873
3,Spain,3188553,69142,150376,2969035
4,Netherlands,1098875,15668,14378,1068829
5,Brazil,10517232,254221,9371448,891563
6,Belgium,769414,22052,31130,716232
7,Sweden,657309,12826,15564,628919
8,Serbia,456450,4429,4971,447050
9,Italy,2907825,97507,2398352,411966


Below are some observations that can be noted from the table above:
- The United States has the highest number of confirmed cases (\~28.6 million), deaths (\~512 thousand), and active cases (\~21.6 million). It also has the third-highest number of recoveries (\~6.4 million).
- India has the highest number of recoveries (\~10.8 million).
- With the exception of the United States and Brazil (which belongs to the South American continent), the 10 countries that reported the highest number of active cases are all located in Europe.

To reiterate, these observations are based on the <b>absolute values</b>.

### <a name = "PHabs" style = "color:black;">***The Philippines***</a>

It may also be insightful to see how our country is responding to the pandemic vis-a-vis the global statistics.

Note that it is important to reset the index (via <code>reset_index(drop = True)</code>) for every sort key in order to determine the position of the entry related to the Philippines correctly. 

In [17]:
sorted_agg_data = sorted_agg_data.sort_values(by = 'Confirmed', ascending = False)
sorted_agg_data = sorted_agg_data.reset_index(drop = True)

print("-- Arranged based on number of confirmed cases --")
sorted_agg_data.loc[sorted_agg_data['Country/Region'] == 'Philippines']

-- Arranged based on number of confirmed cases --


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active
30,Philippines,574247,12289,524865,37093


In [18]:
sorted_agg_data = sorted_agg_data.sort_values(by = 'Deaths', ascending = False)
sorted_agg_data = sorted_agg_data.reset_index(drop = True)

print("-- Arranged based on number of deaths --")
sorted_agg_data.loc[sorted_agg_data['Country/Region'] == 'Philippines']

-- Arranged based on number of deaths --


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active
31,Philippines,574247,12289,524865,37093


In [19]:
sorted_agg_data = sorted_agg_data.sort_values(by = 'Recovered', ascending = False)
sorted_agg_data = sorted_agg_data.reset_index(drop = True)

print("-- Arranged based on number of recoveries --")
sorted_agg_data.loc[sorted_agg_data['Country/Region'] == 'Philippines']

-- Arranged based on number of recoveries --


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active
24,Philippines,574247,12289,524865,37093


In [20]:
sorted_agg_data = sorted_agg_data.sort_values(by = 'Active', ascending = False)
sorted_agg_data = sorted_agg_data.reset_index(drop = True)

print("-- Arranged based on number of active cases --")
sorted_agg_data.loc[sorted_agg_data['Country/Region'] == 'Philippines']

-- Arranged based on number of active cases --


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active
38,Philippines,574247,12289,524865,37093


In summary, the Philippines has the:
- 31<sup>st</sup> highest number of confirmed cases
- 32<sup>nd</sup> highest number of deaths
- 25<sup>th</sup> highest number of recoveries
- 39<sup>th</sup> highest number of active cases

<i>We add 1 to the index since <code>Pandas</code> follows zero-based indexing.</i>

<hr/>

### 2. <a name = "RelVal" style = "color:black;">Relative Values</a>

Although the absolute values give emphasis on the gravity of the pandemic across the globe, they cannot be employed as reliable metrics in <i>comparing</i> the response of each country to the current pandemic. For one, they do not take the population of countries into account, thus leading to both overestimations and underestimations. For instance, suppose Region A has a population of 100, and there are 10 confirmed cases. On the other hand, suppose Region B has a population of 1000, and there are 75 confirmed cases. Looking at percentages, 10% of Region A and 7.5% of Region B are affected by the pandemic. 

Therefore, statistically speaking, Region A is more gravely affected. However, if we only rely on the absolute number of cases (10 versus 75), then the gravity of the situation in Region may come under the risk of underestimation. This possibly misleading presentation and treatment of data has led Adams, Li, Zhang, and Chen (2020) to refer to the COVID-19 crisis as <b>"the disguised pandemic."</b>

### <a name = "Popul" style = "color:black;">***Incorporating the Population***</a>

Since our goal is to translate the figures into number of cases per one hundred thousand (100000) people, the first step is identifying the population of each country. Since this is outside the scope of the Novel Corona Virus 2019 Dataset, we have to look for a dataset that provides information on the 2020 population per country.

One such dataset can also be obtained from Kaggle: https://www.kaggle.com/tanuprabhu/population-by-country-2020. We load it, create a deep copy (to avoid accidental modification of the original), and display its contents.

In [21]:
population_raw = pd.read_csv('data/population_by_country_2020.csv')
population = population_raw.copy(deep = True)

print("-- 2020 Population per Country/Dependency --")
population

-- 2020 Population per Country/Dependency --


Unnamed: 0,Country (or dependency),Population (2020),Yearly Change,Net Change,Density (P/Km2),Land Area (Km2),Migrants (net),Fert. Rate,Med. Age,Urban Pop %,World Share
0,China,1440297825,0.39%,5540090,153,9388211,-348399.0,1.7,38,61%,18.47%
1,India,1382345085,0.99%,13586631,464,2973190,-532687.0,2.2,28,35%,17.70%
2,United States,331341050,0.59%,1937734,36,9147420,954806.0,1.8,38,83%,4.25%
3,Indonesia,274021604,1.07%,2898047,151,1811570,-98955.0,2.3,30,56%,3.51%
4,Pakistan,221612785,2.00%,4327022,287,770880,-233379.0,3.6,23,35%,2.83%
...,...,...,...,...,...,...,...,...,...,...,...
230,Montserrat,4993,0.06%,3,50,100,,N.A.,N.A.,10%,0.00%
231,Falkland Islands,3497,3.05%,103,0,12170,,N.A.,N.A.,66%,0.00%
232,Niue,1628,0.68%,11,6,260,,N.A.,N.A.,46%,0.00%
233,Tokelau,1360,1.27%,17,136,10,,N.A.,N.A.,0%,0.00%


We only keep the pertinent columns, namely, the names of the countries and the figures pertaining to their populations.

In [22]:
population.drop(columns = ['Yearly Change', 'Net Change', 'Density (P/Km2)', 'Land Area (Km2)', 'Migrants (net)', 
                          'Fert. Rate', 'Med. Age', 'Urban Pop %', 'World Share'], \
                axis = 1, inplace = True)

print("-- 2020 Population per Country/Dependency --")
population

-- 2020 Population per Country/Dependency --


Unnamed: 0,Country (or dependency),Population (2020)
0,China,1440297825
1,India,1382345085
2,United States,331341050
3,Indonesia,274021604
4,Pakistan,221612785
...,...,...
230,Montserrat,4993
231,Falkland Islands,3497
232,Niue,1628
233,Tokelau,1360


### <a name = "DataCleaning" style = "color:black;">***Data Cleaning***</a>

Since this is a new dataset, we have to subject it to data cleaning in order to make its encoding and formatting consistent with our current datasets. As with the previous iteration of data preparation, a cursory inspection reveals that the foremost issue is again in relation to the standardization of the names of the countries. 

In this light, a list, <code>more_standardized_names</code>, consisting of tuples was constructed. The first element in each tuple refers to the non-standardized name found in the datasets while the second element refers to the standardized one. To reiterate the documentation in the previous notebook (on data preparation), the following JSON file was consulted as reference in standardizing the names: http://country.io/names.json. 

A special case is the entry occupied Palestinian territory, which was renamed to West Bank and Gaza, since most visualization libraries are reliant on geographical names. Moreover, the distinction made in the datasets among Mainland China, Hong Kong, and Macau was also preserved since this project employs map-based (geographical) visualizations extensively.

In [23]:
more_standardized_names = [("Czech Republic (Czechia)", "Czech Republic"),
                           ("China", "Mainland China"),
                           ("Congo", "Republic of the Congo"),
                           ("DR Congo", "Democratic Republic of the Congo"),
                           ("DR Republic of the Congo", "Democratic Republic of the Congo"),
                           ("Sao Tome & Principe", "Sao Tome and Principe"),
                           ("St. Vincent & Grenadines", "Saint Vincent and the Grenadines"),
                           ("Holy See", "Vatican"),
                           ("Côte d'Ivoire", "Ivory Coast")]

Using this list that we constructed at the start of our data preparation, we standardize the names of the countries in the <code>populations</code> table. Since this standardization involves the removal of certain stray characters (like single quotes, asterisks, and parentheses), we proceed using regular expressions, thus explaining the <code>re.escape</code> found in the argument of <code>str.replace</code>.

To verify that the names have been standardized, we print the (updated) unique countries.

In [24]:
for name in more_standardized_names:
    population['Country (or dependency)'] = population['Country (or dependency)'].str.replace(re.escape(name[0]), name[1])

print(population['Country (or dependency)'].unique())

['Mainland China' 'India' 'United States' 'Indonesia' 'Pakistan' 'Brazil'
 'Nigeria' 'Bangladesh' 'Russia' 'Mexico' 'Japan' 'Ethiopia' 'Philippines'
 'Egypt' 'Vietnam' 'Democratic Republic of the Congo' 'Turkey' 'Iran'
 'Germany' 'Thailand' 'United Kingdom' 'France' 'Italy' 'Tanzania'
 'South Africa' 'Myanmar' 'Kenya' 'South Korea' 'Colombia' 'Spain'
 'Uganda' 'Argentina' 'Algeria' 'Sudan' 'Ukraine' 'Iraq' 'Afghanistan'
 'Poland' 'Canada' 'Morocco' 'Saudi Arabia' 'Uzbekistan' 'Peru' 'Angola'
 'Malaysia' 'Mozambique' 'Ghana' 'Yemen' 'Nepal' 'Venezuela' 'Madagascar'
 'Cameroon' 'Ivory Coast' 'North Korea' 'Australia' 'Niger' 'Taiwan'
 'Sri Lanka' 'Burkina Faso' 'Mali' 'Romania' 'Malawi' 'Chile' 'Kazakhstan'
 'Zambia' 'Guatemala' 'Ecuador' 'Syria' 'Netherlands' 'Senegal' 'Cambodia'
 'Chad' 'Somalia' 'Zimbabwe' 'Guinea' 'Rwanda' 'Benin' 'Burundi' 'Tunisia'
 'Bolivia' 'Belgium' 'Haiti' 'Cuba' 'South Sudan' 'Dominican Republic'
 'Czech Republic' 'Greece' 'Jordan' 'Portugal' 'Azerbaijan' 'Swe

Finally, we can merge the population values into the dataset which we are working, giving the table shown below. Note that, since the title of the column in our COVID-19 dataset is <code>Country/Region</code> and the title in our population dataset is <code>Country (or dependency)</code>, we have to drop the latter for nonredundancy.

In [25]:
relative_data = pd.merge(agg_data, population, left_on = ['Country/Region'], right_on = ['Country (or dependency)'])
relative_data.drop(columns = ['Country (or dependency)'], axis = 1, inplace = True)

print("-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER COUNTRY/REGION AS OF FEBRUARY 27, 2021 --")
print("-- With 2020 Population Data --")
relative_data

-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER COUNTRY/REGION AS OF FEBRUARY 27, 2021 --
-- With 2020 Population Data --


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,Population (2020)
0,Afghanistan,55707,2443,49288,3976,39074280
1,Albania,106215,1775,68969,35471,2877239
2,Algeria,112960,2979,77976,32005,43984569
3,Andorra,10849,110,10429,310,77287
4,Angola,20782,506,19315,961,33032075
...,...,...,...,...,...,...
180,Venezuela,138739,1341,130834,6564,28421581
181,Vietnam,2432,35,1844,553,97490013
182,Yemen,2269,631,1435,203,29935468
183,Zambia,78202,1081,73609,3512,18468257


As a final sanity check, we verify that there are no null values, that is, each entry in our dataset has a country name and a population. 

Since not all countries/regions (as well as the cruise ships) have their populations recorded in the population dataset that we used, some countries have to be omitted from the analysis of relative values, thus explaining the decrease in the number of rows by 16.

In [26]:
print("Checking for null values:\n", relative_data.isnull().sum(), "\n")

Checking for null values:
 Country/Region       0
Confirmed            0
Deaths               0
Recovered            0
Active               0
Population (2020)    0
dtype: int64 



### <a name = "Normalized" style = "color:black;">***Normalized Data (Cases per 100,000)***</a>

Since our target is to express the number of cases relative to the population &mdash; more specifically, the number of cases for every one hundred thousand (100000) people in that country/region &mdash; we use the following formula for the data normalization per se: <br/> <br/>

<center>$ \text{cases per 100000} = \dfrac{\text{absolute number of cases} * 100000}{\text{population}} $</center>

We apply this formula per country, yielding four additional columns corresponding to the number of confirmed cases, deaths, recoveries, and active cases per 100000 people.

In [27]:
NORMALIZATION = 100000

relative_data['Confirmed per 100k'] = relative_data['Confirmed'] * NORMALIZATION / relative_data['Population (2020)']
relative_data['Deaths per 100k'] = relative_data['Deaths'] * NORMALIZATION / relative_data['Population (2020)']
relative_data['Recovered per 100k'] = relative_data['Recovered'] * NORMALIZATION / relative_data['Population (2020)']
relative_data['Active per 100k'] = relative_data['Active'] * NORMALIZATION / relative_data['Population (2020)']

print("-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER 100K AS OF FEBRUARY 27, 2021 --")
relative_data

-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER 100K AS OF FEBRUARY 27, 2021 --


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,Population (2020),Confirmed per 100k,Deaths per 100k,Recovered per 100k,Active per 100k
0,Afghanistan,55707,2443,49288,3976,39074280,142.566926,6.252195,126.139240,10.175491
1,Albania,106215,1775,68969,35471,2877239,3691.559860,61.691086,2397.054954,1232.813819
2,Algeria,112960,2979,77976,32005,43984569,256.817340,6.772830,177.280355,72.764155
3,Andorra,10849,110,10429,310,77287,14037.289583,142.326653,13493.860546,401.102385
4,Angola,20782,506,19315,961,33032075,62.914606,1.531844,58.473469,2.909293
...,...,...,...,...,...,...,...,...,...,...
180,Venezuela,138739,1341,130834,6564,28421581,488.146666,4.718246,460.333294,23.095126
181,Vietnam,2432,35,1844,553,97490013,2.494614,0.035901,1.891476,0.567238
182,Yemen,2269,631,1435,203,29935468,7.579638,2.107867,4.793645,0.678125
183,Zambia,78202,1081,73609,3512,18468257,423.440068,5.853287,398.570369,19.016413


Since we already have the relative values, we can now drop the columns pertaining to the absolute values.

In [28]:
relative_data.drop(columns = ['Confirmed', 'Deaths', 'Recovered', 'Active', 'Population (2020)'], axis = 1, inplace = True)

We arrange the countries based on the number of <b>active cases</b> per 100000. As a preliminary visualization, we employ red gradients to correspond to the intensity of the relative number of confirmed cases, deaths, recoveries, and active cases.

In [29]:
sorted_relative_data = relative_data.sort_values(by = 'Active per 100k', ascending = False)
sorted_relative_data = sorted_relative_data.reset_index(drop = True)

print("-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER 100K AS OF FEBRUARY 27, 2021 --")
print("-- Arranged based on the number of active cases --")

sorted_relative_data.style.background_gradient(cmap = 'Reds')

-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER 100K AS OF FEBRUARY 27, 2021 --
-- Arranged based on the number of active cases --


Unnamed: 0,Country/Region,Confirmed per 100k,Deaths per 100k,Recovered per 100k,Active per 100k
0,United States,8617.847079,154.521753,1931.403006,6531.92232
1,Spain,6819.270208,147.872085,321.604997,6349.793126
2,Netherlands,6410.595218,91.403668,83.878092,6235.313458
3,Sweden,6501.422865,126.861567,153.94304,6220.618259
4,Belgium,6633.765147,190.128837,268.397909,6175.2384
5,United Kingdom,6155.817155,180.930255,17.074751,5957.812149
6,France,5738.628489,131.305367,400.694161,5206.62896
7,Serbia,5227.874991,50.726823,56.934531,5120.213637
8,Ireland,4426.282228,87.179447,472.260745,3866.842036
9,Cyprus,2847.580334,19.108502,170.156657,2658.315176


Below are some observations that can be noted above:
- Eight of the 10 countries that recorded the highest number of active cases per 100,000 are also among the 10 countries that recorded the highest number of absolute cases: United States, United Kingdom, France, Spain, Netherlands, Belgium, Sweden, and Serbia. This is indicative of the alarming gravity of the pandemic in these areas.
- The other two countries that are among the 10 countries which recorded the highest number of active cases per 100,000 are Ireland and Cyprus (occupying the 9<sup>th</sup> and 10<sup>th</sup> positions, respectively). Note that, with the exception of the United States, all these 10 countries are found in Europe anew.
- The United States has the highest number of active cases per 100,000.
- The microstate of San Marino, which is geographically close to Italy, has the highest number of deaths per 100,000.
- The microstate of Andorra, which is geographically close to France and Spain, has the highest numbr of confirmed cases and deaths per 100,000.

To reiterate, these observations are based on the <b>relative values</b>.

### <a name = "PHrel" style = "color:black;">***The Philippines***</a>

It may also be insightful to see how our country is responding to the pandemic vis-a-vis the global statistics.

Note that it is important to reset the index (via <code>reset_index(drop = True)</code>) for every sort key in order to determine the position of the entry related to the Philippines correctly. 

In [30]:
sorted_relative_data = sorted_relative_data.sort_values(by = 'Confirmed per 100k', ascending = False)
sorted_relative_data = sorted_relative_data.reset_index(drop = True)

print("-- Arranged based on number of confirmed cases --")
sorted_relative_data.loc[sorted_relative_data['Country/Region'] == 'Philippines']

-- Arranged based on number of confirmed cases --


Unnamed: 0,Country/Region,Confirmed per 100k,Deaths per 100k,Recovered per 100k,Active per 100k
106,Philippines,522.849227,11.189077,477.887145,33.773004


In [31]:
sorted_relative_data = sorted_relative_data.sort_values(by = 'Deaths per 100k', ascending = False)
sorted_relative_data = sorted_relative_data.reset_index(drop = True)

print("-- Arranged based on number of deaths --")
sorted_relative_data.loc[sorted_relative_data['Country/Region'] == 'Philippines']

-- Arranged based on number of deaths --


Unnamed: 0,Country/Region,Confirmed per 100k,Deaths per 100k,Recovered per 100k,Active per 100k
99,Philippines,522.849227,11.189077,477.887145,33.773004


In [32]:
sorted_relative_data = sorted_relative_data.sort_values(by = 'Recovered per 100k', ascending = False)
sorted_relative_data = sorted_relative_data.reset_index(drop = True)

print("-- Arranged based on number of recoveries --")
sorted_relative_data.loc[sorted_relative_data['Country/Region'] == 'Philippines']

-- Arranged based on number of recoveries --


Unnamed: 0,Country/Region,Confirmed per 100k,Deaths per 100k,Recovered per 100k,Active per 100k
94,Philippines,522.849227,11.189077,477.887145,33.773004


In [33]:
sorted_relative_data = sorted_relative_data.sort_values(by = 'Active per 100k', ascending = False)
sorted_relative_data = sorted_relative_data.reset_index(drop = True)

print("-- Arranged based on number of active cases --")
sorted_relative_data.loc[sorted_relative_data['Country/Region'] == 'Philippines']

-- Arranged based on number of active cases --


Unnamed: 0,Country/Region,Confirmed per 100k,Deaths per 100k,Recovered per 100k,Active per 100k
104,Philippines,522.849227,11.189077,477.887145,33.773004


As a final remark, although the Philippines belongs to the 90-110 bracket when the relative number of cases is taken into account, the number of cases in our country is alarmingly high, as evidenced by our position in the 20-40 bracket in terms of the absolute number of cases.

<hr/>

## C. <a name = "CumulDate" style = "color:black;">Cumulative Cases by Date</a>

In order to further understand the impact of the pandemic since the earliest observation date in the dataset in terms of the number of confirmed cases, we also attempt to analyze the cumulative cases by date and identify some trends or patterns from both global and local foci.

### 1. <a name = "GlobalScale" style = "color:black;">Global Scale</a>

We first group the entries according to the country and the observation date.

In [34]:
agg_data_date = data.groupby(['Country/Region', 'ObservationDate'], as_index = False).sum()

print("-- NUMBER OF CONFIRMED CASES, DEATHS, AND RECOVERIES PER COUNTRY/REGION (RAW) --")
print("-- Grouped per country and arranged based on the observation date")
agg_data_date

-- NUMBER OF CONFIRMED CASES, DEATHS, AND RECOVERIES PER COUNTRY/REGION (RAW) --
-- Grouped per country and arranged based on the observation date


Unnamed: 0,Country/Region,ObservationDate,Confirmed,Deaths,Recovered
0,Afghanistan,2020-02-24,1,0,0
1,Afghanistan,2020-02-25,1,0,0
2,Afghanistan,2020-02-26,1,0,0
3,Afghanistan,2020-02-27,1,0,0
4,Afghanistan,2020-02-28,1,0,0
...,...,...,...,...,...
68729,Zimbabwe,2021-02-23,35910,1448,32288
68730,Zimbabwe,2021-02-24,35960,1456,32410
68731,Zimbabwe,2021-02-25,35994,1458,32455
68732,Zimbabwe,2021-02-26,36044,1463,32539


### <a name = "AnomalyCorrect" style = "color:black;">***Anomaly Correction (Zero Recoveries in US, Belgium, Sweden & Serbia)***</a>

In reference to the anomaly mentioned in the previous section, we have to correct the data pertaining to the number of recoveries in the United States, Belgium, Sweden, and Serbia.

We display the entries related to the United States to identify their starting and ending indices.

In [35]:
us = agg_data_date.loc[agg_data_date['Country/Region'] == "United States"]

print("-- ANOMALY CORRECTION (UNITED STATES) --")
us

-- ANOMALY CORRECTION (UNITED STATES) --


Unnamed: 0,Country/Region,ObservationDate,Confirmed,Deaths,Recovered
65046,United States,2020-01-22,1,0,0
65047,United States,2020-01-23,1,0,0
65048,United States,2020-01-24,2,0,0
65049,United States,2020-01-25,2,0,0
65050,United States,2020-01-26,5,0,0
...,...,...,...,...,...
65444,United States,2021-02-23,28261595,502660,0
65445,United States,2021-02-24,28336097,505890,0
65446,United States,2021-02-25,28413388,508307,0
65447,United States,2021-02-26,28486394,510458,0


Since the last nonzero entry related to the United States is on December 14, 2020, we identify its index.

In [36]:
print("-- ANOMALY CORRECTION (UNITED STATES) --")

us.loc[us['ObservationDate'] == "2020-12-14"]

-- ANOMALY CORRECTION (UNITED STATES) --


Unnamed: 0,Country/Region,ObservationDate,Confirmed,Deaths,Recovered
65373,United States,2020-12-14,16601499,305960,6399531


Ergo, we attempt to remedy the anomaly by setting the number of recovered cases to the value found in this last nonzero entry.

In [37]:
for i in range(65373, 65449):
    agg_data_date.at[i, 'Recovered'] = 6399531

We display the entries related to Belgium to identify their starting and ending indices.

In [38]:
belgium = agg_data_date.loc[agg_data_date['Country/Region'] == "Belgium"]

print("-- ANOMALY CORRECTION (BELGIUM) --")
belgium

-- ANOMALY CORRECTION (BELGIUM) --


Unnamed: 0,Country/Region,ObservationDate,Confirmed,Deaths,Recovered
5813,Belgium,2020-02-04,1,0,0
5814,Belgium,2020-02-05,1,0,0
5815,Belgium,2020-02-06,1,0,0
5816,Belgium,2020-02-07,1,0,0
5817,Belgium,2020-02-08,1,0,0
...,...,...,...,...,...
6198,Belgium,2021-02-23,757696,21956,0
6199,Belgium,2021-02-24,760809,21988,0
6200,Belgium,2021-02-25,763885,22006,0
6201,Belgium,2021-02-26,766654,22034,0


Since the last nonzero entry related to Belgium is on November 11, 2020, we identify its index.

In [39]:
print("-- ANOMALY CORRECTION (BELGIUM) --")

belgium.loc[belgium['ObservationDate'] == "2020-11-11"]

-- ANOMALY CORRECTION (BELGIUM) --


Unnamed: 0,Country/Region,ObservationDate,Confirmed,Deaths,Recovered
6094,Belgium,2020-11-11,515391,13758,31130


Ergo, we attempt to remedy the anomaly by setting the number of recovered cases to the value found in this last nonzero entry.

In [40]:
for i in range(6094, 6203):
    agg_data_date.at[i, 'Recovered'] = 31130

We display the entries related to Sweden to identify their starting and ending indices.

In [41]:
sweden = agg_data_date.loc[agg_data_date['Country/Region'] == "Sweden"]

print("-- ANOMALY CORRECTION (SWEDEN) --")
sweden

-- ANOMALY CORRECTION (SWEDEN) --


Unnamed: 0,Country/Region,ObservationDate,Confirmed,Deaths,Recovered
59560,Sweden,2020-01-31,1,0,0
59561,Sweden,2020-02-01,1,0,0
59562,Sweden,2020-02-02,1,0,0
59563,Sweden,2020-02-03,1,0,0
59564,Sweden,2020-02-04,1,0,0
...,...,...,...,...,...
59949,Sweden,2021-02-23,642099,12713,0
59950,Sweden,2021-02-24,647470,12793,0
59951,Sweden,2021-02-25,652465,12798,0
59952,Sweden,2021-02-26,657309,12826,0


Since the last nonzero entry related to the United States is on June 1, 2020, we identify its index.

In [42]:
print("-- ANOMALY CORRECTION (SWEDEN) --")

sweden.loc[sweden['ObservationDate'] == "2020-06-01"]

-- ANOMALY CORRECTION (SWEDEN) --


Unnamed: 0,Country/Region,ObservationDate,Confirmed,Deaths,Recovered
59682,Sweden,2020-06-01,37814,4403,4971


Ergo, we attempt to remedy the anomaly by setting the number of recovered cases to the value found in this last nonzero entry.

In [43]:
for i in range(59682, 59954):
    agg_data_date.at[i, 'Recovered'] = 4971

We display the entries related to Serbia to identify their starting and ending indices.

In [44]:
serbia = agg_data_date.loc[agg_data_date['Country/Region'] == "Serbia"]

print("-- ANOMALY CORRECTION (SERBIA) --")
serbia

-- ANOMALY CORRECTION (SERBIA) --


Unnamed: 0,Country/Region,ObservationDate,Confirmed,Deaths,Recovered
54321,Serbia,2020-03-06,1,0,0
54322,Serbia,2020-03-07,1,0,0
54323,Serbia,2020-03-08,1,0,0
54324,Serbia,2020-03-09,1,0,0
54325,Serbia,2020-03-10,5,0,0
...,...,...,...,...,...
54675,Serbia,2021-02-23,442853,4366,0
54676,Serbia,2021-02-24,446313,4383,0
54677,Serbia,2021-02-25,449901,4398,0
54678,Serbia,2021-02-26,453240,4414,0


Since the last nonzero entry related to Serbia is on July 19, 2020, we identify its index.

In [45]:
print("-- ANOMALY CORRECTION (SERBIA) --")

serbia.loc[serbia['ObservationDate'] == "2020-07-19"]

-- ANOMALY CORRECTION (SERBIA) --


Unnamed: 0,Country/Region,ObservationDate,Confirmed,Deaths,Recovered
54456,Serbia,2020-07-19,20894,472,15564


Ergo, we attempt to remedy the anomaly by setting the number of recovered cases to the value found in this last nonzero entry.

In [46]:
for i in range(54456, 54680):
    agg_data_date.at[i, 'Recovered'] = 15564

### <a name = "Active3" style = "color:black;">***Incorporating Active Cases***</a>

After having fixed the said anomalies, we are now ready to add a column pertaining to the number of active cases.

In [47]:
agg_data_date['Active'] = agg_data_date['Confirmed'] - agg_data_date['Deaths'] - agg_data_date['Recovered']

print("-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER COUNTRY/REGION (CORRECTED) --")
print("-- Grouped per country and arranged based on the observation date")
agg_data_date

-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER COUNTRY/REGION (CORRECTED) --
-- Grouped per country and arranged based on the observation date


Unnamed: 0,Country/Region,ObservationDate,Confirmed,Deaths,Recovered,Active
0,Afghanistan,2020-02-24,1,0,0,1
1,Afghanistan,2020-02-25,1,0,0,1
2,Afghanistan,2020-02-26,1,0,0,1
3,Afghanistan,2020-02-27,1,0,0,1
4,Afghanistan,2020-02-28,1,0,0,1
...,...,...,...,...,...,...
68729,Zimbabwe,2021-02-23,35910,1448,32288,2174
68730,Zimbabwe,2021-02-24,35960,1456,32410,2094
68731,Zimbabwe,2021-02-25,35994,1458,32455,2081
68732,Zimbabwe,2021-02-26,36044,1463,32539,2042


### <a name = "GlobalTrend" style = "color:black;">***Global Trend***</a>

Since our goal is to analyze the cumulative cases by date, we take the sum of the number of cases recorded for each observation date, yielding the table shown below.

In [48]:
agg_confirmed_date = agg_data_date.groupby('ObservationDate', as_index = False).sum()

print("-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER OBSERVATION DATE (CORRECTED) --")
agg_confirmed_date

-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES PER OBSERVATION DATE (CORRECTED) --


Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active
0,2020-01-22,557,17,30,510
1,2020-01-23,1097,34,60,1003
2,2020-01-24,941,26,39,876
3,2020-01-25,1437,42,42,1353
4,2020-01-26,2118,56,56,2006
...,...,...,...,...,...
398,2021-02-23,112109754,2485434,69746340,39877980
399,2021-02-24,112554301,2497488,69955799,40101014
400,2021-02-25,113001412,2507624,70183491,40310297
401,2021-02-26,113415604,2517422,70443759,40454423


Since the number of confirmed cases is the sum of the number of deaths, recoveries, and active cases, it warrants its own separate line graph.

<i>Note that this is an interactive line graph. Hovering on the curve or adjusting the range slider allows for a more granular look at the data.</i>

In [49]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = agg_confirmed_date['ObservationDate'],
                    y = agg_confirmed_date['Confirmed'],
                    name = 'Confirmed Cases',
                    marker_color = 'cornflowerblue'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "Number of COVID-19-Related Confirmed Cases Worldwide",
                 xaxis_title = "Observation Date",
                 yaxis_title = "Number of Confirmed Cases")

fig.show()

From this graph, we can see the following number of confirmed cases at the start of the following two-month periods:
- March 2020: 88.37 thousand
- May 2020: 3.35 million
- July 2020: 10.71 million
- September 2020: 25.79 million
- November 2020: 46.63 million
- January 2021: 84.05 million

These figures do not point out to any strong indicators of plateauing (or "flattening") with respect to the rate at which the cumulative number of confirmed cases increases. On the contrary, from May 2020 to July 2020, the cumulative total jumped by around 3 times; from July 2020 to September 2020, around 2.5 times; and from September 2020 to November 2020 and from November 2020 to January 2021, around 2 times.

We now delve into the "composition" of these confirmed cases: active cases, recoveries, and deaths.

<i>Note that this is an interactive line graph. Hovering on the curve or adjusting the range slider allows for a more granular look at the data.</i>

In [50]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = agg_confirmed_date['ObservationDate'],
                    y = agg_confirmed_date['Active'],
                    name = 'Active Cases',
                    marker_color = 'cornflowerblue'))

fig.add_trace(go.Scatter(x = agg_confirmed_date['ObservationDate'],
                    y = agg_confirmed_date['Recovered'],
                    name = 'Recoveries',
                    marker_color = 'darkseagreen'))

fig.add_trace(go.Scatter(x = agg_confirmed_date['ObservationDate'],
                    y = agg_confirmed_date['Deaths'],
                    name = 'Deaths',
                    marker_color = 'indianred'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "Number of COVID-19-Related Cases Worldwide (Active Cases, Recoveries & Deaths)",
                 xaxis_title = "Observation Date",
                 yaxis_title = "Number of COVID-19-Related Cases")

fig.show()

From this graph, we can see the following number of active cases, recoveries, and deaths (in this specified order) at the start of the following two-month periods:
- March 2020: 42.66 thousand, 42.72 thousand, 3 thousand
- May 2020: 2.05 million, 1.05 million, 239.85 thousand
- July 2020: 4.61 million, 5.58 million, 516.60 thousand
- September 2020: 7.84 million, 17.10 million, 857.87 thousand
- November 2020: 14.31 million, 31.12 million, 1.20 million
- January 2021: 28.46 million, 53.76 million, 1.84 million

Similar to the observed trend on the number of confirmed cases, these figures are not indicative of any strong indicators of plateauing (or "flattening") with respect to the rate at which these cumulative total of active cases are increasing; in fact, the cumulative total seems to double every 2-month period. On the other hand, the cumulative total of deaths shows an increase by around 50% for every 2-month period. 

The increase in the cumulative number of recoveries does not seem to follow a clear trend although it may be important to note that that, from September 2020 to November 2020 and from November 2020 to January 2021, it increased by around 81.99% and 72.75%, respectively. Notwithstanding these trends, some promising observations can still be drawn: since mid-June, there have been more recoveries than active cases and there are also significantly more recoveries than deaths, with the margins continuing to widen as well.

Some additional impressions from the graph are as follows:
- It was in late September that the total number of COVID-19-related deaths started to exceed 1 million. By late February, over 2.5 million deaths have been recorded worldwide.
- It was in mid-June where the number of recoveries started the trend of surpassing the number of active cases. Prior to it, the figures were roughly equal &mdash; and, from mid-March to mid-June (the transition from the first to the second quarter of the year 2020), the number of active cases noticeably outweighed the number of recoveries. By late February, however, the number of recoveries is around 75% higher than the number of active cases.

Although there are noticeable jitters on October 31, 2020, and December 11, 2020, it may be misleading to attempt to infer something or speculate an anomaly from these since time series data have to be viewed from a "macroscopic" lens. Moreover, given the severity of the ongoing pandemic and its toll on healthcare and data management systems, backlogs and subsequent corrections are inevitable, explaining these rather unusual one-day jitters. 

### 2. <a name = "PHContext" style = "color:black;">Philippine Context</a>

We now zero in one the cumulative cases by date in our country.

In [51]:
ph_agg_data_date = agg_data_date.loc[agg_data_date['Country/Region'] == 'Philippines']

print("-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES IN THE PHILIPPINES --")
ph_agg_data_date

-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES IN THE PHILIPPINES --


Unnamed: 0,Country/Region,ObservationDate,Confirmed,Deaths,Recovered,Active
48536,Philippines,2020-01-23,0,0,0,0
48537,Philippines,2020-01-30,1,0,0,1
48538,Philippines,2020-01-31,1,0,0,1
48539,Philippines,2020-02-01,1,0,0,1
48540,Philippines,2020-02-02,2,1,0,1
...,...,...,...,...,...,...
48927,Philippines,2021-02-23,564865,12107,522941,29817
48928,Philippines,2021-02-24,566420,12129,523321,30970
48929,Philippines,2021-02-25,568680,12201,524042,32437
48930,Philippines,2021-02-26,571327,12247,524582,34498


As a sanity test, we group the entries by observation date (with the added side effect of dropping the <code>Country/Region</code> column). Since we are looking at country-level data, it is expected that the number of entries in the resulting table and in the previous table are the same.

In [52]:
ph_agg_confirmed_date = ph_agg_data_date.groupby('ObservationDate', as_index = False).sum()

print("-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES IN THE PHILIPPINES --")
ph_agg_confirmed_date

-- NUMBER OF CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES IN THE PHILIPPINES --


Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active
0,2020-01-23,0,0,0,0
1,2020-01-30,1,0,0,1
2,2020-01-31,1,0,0,1
3,2020-02-01,1,0,0,1
4,2020-02-02,2,1,0,1
...,...,...,...,...,...
391,2021-02-23,564865,12107,522941,29817
392,2021-02-24,566420,12129,523321,30970
393,2021-02-25,568680,12201,524042,32437
394,2021-02-26,571327,12247,524582,34498


Since the number of confirmed cases is the sum of the number of deaths, recoveries, and active cases, it warrants its own separate line graph.

<i>Note that this is an interactive line graph. Hovering on the curve or adjusting the range slider allows for a more granular look at the data.</i>

In [53]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = ph_agg_confirmed_date['ObservationDate'],
                    y = ph_agg_confirmed_date['Confirmed'],
                    name = 'Confirmed Cases',
                    marker_color = 'cornflowerblue'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "Number of COVID-19-Related Confirmed Cases in the Philippines",
                 xaxis_title = "Observation Date",
                 yaxis_title = "Number of Confirmed Cases")

fig.show()

From this graph, we can see the following number of confirmed cases at the start of the following two-month periods:
- March 2020: 3
- May 2020: 8.77 thousand
- July 2020: 38.51 thousand
- September 2020: 224.26 thousand
- November 2020: 383.11 thousand
- January 2021: 475.82 thousand

The transmission of coronavirus in the Philippines has been rapid and unprecedented. From only 3 confirmed cases at the start of March 2020, it soared to magnitudes of thousands (8.77 thousand) after only a 2-month period. The cumulative total quadrupled by July 2020; after which, it increased by an even higher factor of 6 bringing the total to 383.11 thousand at the start of November 2020. 

Although it can be observed that the rate of transmission became less steeper on the period leading to the opening month of 2021, it should not be construed as an indicator of plateauing. On January 17, 2021, the total number of confirmed cases crossed the 500,000-boundary and, alarmingly, the curve is becoming steeper anew, with 574.25 thousand cases recorded on February 27, 2021. 

In hindsight, this might have been an early "warning sign"; as of the time of writing (March 2021), the country is experiencing a dramatic surge in the number of cases. This is further highlighted when we focus our analysis on the <i>daily</i> number of cases.

<i>Note that this is an interactive line graph. Hovering on the curve or adjusting the range slider allows for a more granular look at the data.</i>

In [54]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = ph_agg_confirmed_date['ObservationDate'],
                    y = ph_agg_confirmed_date['Active'],
                    name = 'Active Cases',
                    marker_color = 'cornflowerblue'))

fig.add_trace(go.Scatter(x = ph_agg_confirmed_date['ObservationDate'],
                    y = ph_agg_confirmed_date['Recovered'],
                    name = 'Recoveries',
                    marker_color = 'darkseagreen'))

fig.add_trace(go.Scatter(x = ph_agg_confirmed_date['ObservationDate'],
                    y = ph_agg_confirmed_date['Deaths'],
                    name = 'Deaths',
                    marker_color = 'indianred'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "Number of COVID-19-Related Cases in the Philippines (Active Cases, Recoveries & Deaths)",
                 xaxis_title = "Observation Date",
                 yaxis_title = "Number of COVID-19-Related Cases")

fig.show()

At first glance, a striking feature in the graph shown above is the rather stair-like or "saw-toothed" shape of the curves. Closer inspection reveals that this pattern recurs regularly every seven days. This may be related to the regularity of either our submission of data to the repository from which the dataset used in this project is derived or the processing of the data submitted by our country. Nonetheless, the overall trend remains discernible.

A preliminary observation is in relation to the period at which the number of recoveries began surpassing the number of active cases, which was around mid-August for our country (for comparison, the <i>worldwide</i> number of recoveries overtook the number of active cases two months earlier, in mid-June). In a way, this evinces that our populace and our healthcare system have started to adapt to the current situation with greater caution and medical response, respectively.

From this graph, we can see the following number of active cases, recoveries, and deaths (in this specified order) at the start of the following two-month periods:

- March 2020: 1, 1, 1 <br/>
- May 2020: 7.11 thousand, 1.09 thousand, 579 <br/>
- July 2020: 26.80 thousand, 10.44 thousand, 1.27 thousand <br/>
- September 2020: 62.66 thousand, 158.01 thousand, 3.60 thousand <br/>
- November 2020: 27.12 thousand, 348.76, 7.24 thousand <br/>
- January 2021: 26.68 thousand, 439.90 million, 9.25 thousand <br/>

The trend in the Philippines with respect to the cumulative total of active cases is less monotonic compared to the global pattern identified in the previous graphs. From the onset of transmission in our country to August, there was an unprecedented increase, peaking at 83.11 thousand cases on August 15. After this, the trend has begun to move downwards until the end of 2020, which may be indicative of a degree of success in "flattening the curve" (this is explained in more detail on the notebook concerning data visualization: <code>3. Data Visualization - C.ipynb</code>).

However, focusing on the tail end of the graph, we see some alarming indicators that may be indicative of a resurgence. The number of active cases is showing an increase since the opening month of 2021, and the slope of the curve pertaining to the number of deaths is becoming steeper anew. For comparison, from the two-month period November 2020 to January 2021, there were around 2000 deaths recovered, but for the one-month period January 2021 to February 2021, over 3000 cases have already been tallied.

<hr/>

## D. <a name = "DailyCases" style = "color:black;">Daily Cases</a>

Although cumulative analyses may be useful in having a gist of the progression of the effects of the pandemic over time, they give little information on day-to-day trends. In this regard, it may also be helpful to understand our data from a more "granular" perspective by looking at the daily number of confirmed cases, deaths, recoveries, and active cases from both global and national lenses.

### 1. <a name = "GlobalScale2" style = "color:black;">Global Scale</a>

We prepare the dataset and perform the necessary data type conversion before displaying a tabular representation of the daily number of confirmed cases, deaths, recoveries, and active cases worldwide.

In [55]:
daily_agg_confirmed_date = agg_confirmed_date.copy(deep = True)
daily_agg_confirmed_date['Daily Confirmed'] = daily_agg_confirmed_date['Confirmed'].diff()
daily_agg_confirmed_date['Daily Deaths'] = daily_agg_confirmed_date['Deaths'].diff()
daily_agg_confirmed_date['Daily Recovered'] = daily_agg_confirmed_date['Recovered'].diff()
daily_agg_confirmed_date['Daily Active'] = daily_agg_confirmed_date['Active'].diff()

daily_agg_confirmed_date = daily_agg_confirmed_date.iloc[1:]
daily_agg_confirmed_date['Daily Confirmed'] = daily_agg_confirmed_date['Daily Confirmed'].astype('int64')
daily_agg_confirmed_date['Daily Deaths'] = daily_agg_confirmed_date['Daily Deaths'].astype('int64')
daily_agg_confirmed_date['Daily Recovered'] = daily_agg_confirmed_date['Daily Recovered'].astype('int64')
daily_agg_confirmed_date['Daily Active'] = daily_agg_confirmed_date['Daily Active'].astype('int64')

print("-- NUMBER OF DAILY CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES WORLDWIDE (RAW) --")
daily_agg_confirmed_date

-- NUMBER OF DAILY CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES WORLDWIDE (RAW) --


Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active
1,2020-01-23,1097,34,60,1003,540,17,30,493
2,2020-01-24,941,26,39,876,-156,-8,-21,-127
3,2020-01-25,1437,42,42,1353,496,16,3,477
4,2020-01-26,2118,56,56,2006,681,14,14,653
5,2020-01-27,2927,82,65,2780,809,26,9,774
...,...,...,...,...,...,...,...,...,...
398,2021-02-23,112109754,2485434,69746340,39877980,387865,11256,278630,97979
399,2021-02-24,112554301,2497488,69955799,40101014,444547,12054,209459,223034
400,2021-02-25,113001412,2507624,70183491,40310297,447111,10136,227692,209283
401,2021-02-26,113415604,2517422,70443759,40454423,414192,9798,260268,144126


### <a name = "Anomaly3" style = "color:black;">***Anomaly #3: Negative Number of Daily Confirmed Cases, Deaths, and Recoveries***</a>

The preceding table points out to an anomaly right at the second entry: the figures pertaining to the daily confirmed cases, deaths, and recoveries are less than 0, which is impossible since they pertain to final statuses. Although this type of anomaly is inevitable due to the problem of backlogs, misencoding of records, and misclassified cases, some form of correction must be performed before performing further analysis, especially since we are looking at the figures from a more granular, day-to-day level. Otherwise, these anomalous values may lead to unexplainably sharp increases and decreases on certain days. 

Before this, however, we print the entries where the daily number of confimed cases is negative.

In [56]:
print("-- Entries with Negative Daily Number of Confirmed Cases --")
daily_agg_confirmed_date.loc[daily_agg_confirmed_date['Daily Confirmed'] < 0]

-- Entries with Negative Daily Number of Confirmed Cases --


Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active
2,2020-01-24,941,26,39,876,-156,-8,-21,-127


Second, we print the entries where the daily number of deaths is negative.

In [57]:
print("-- Entries with Negative Daily Number of Deaths --")
daily_agg_confirmed_date.loc[daily_agg_confirmed_date['Daily Deaths'] < 0]

-- Entries with Negative Daily Number of Deaths --


Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active
2,2020-01-24,941,26,39,876,-156,-8,-21,-127
208,2020-08-17,21915768,774946,13907670,7233152,207677,-1161,211012,-2174


Third, we print the entries where the daily number of recoveries is negative.

In [58]:
print("-- Entries with Negative Daily Number of Recoveries --")
daily_agg_confirmed_date.loc[daily_agg_confirmed_date['Daily Recovered'] < 0]

-- Entries with Negative Daily Number of Recoveries --


Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active
2,2020-01-24,941,26,39,876,-156,-8,-21,-127
282,2020-10-30,45675758,1190731,29736239,14748788,566008,7876,-607789,1165921


A possible fix to this anomaly is to set the number of daily confirmed cases, deaths, and recoveries to 0 if the original values are negative. Under the hood, this is achieved with the application of lambda functions.

The rationale for this is to avoid an overinflation in the daily number of active cases. Since the number of active cases is computed as the difference between the number of confirmed cases and the sum of the number of deaths and recoveries, having a negative number for either of the last two parameters will inflate the number of active cases.

In [59]:
daily_agg_confirmed_date['Daily Confirmed'] = daily_agg_confirmed_date['Daily Confirmed'].apply(lambda num: 0 if num < 0 else num)
daily_agg_confirmed_date['Daily Deaths'] = daily_agg_confirmed_date['Daily Deaths'].apply(lambda num: 0 if num < 0 else num)
daily_agg_confirmed_date['Daily Recovered'] = daily_agg_confirmed_date['Daily Recovered'].apply(lambda num: 0 if num < 0 else num)

daily_agg_confirmed_date['Daily Active'] = daily_agg_confirmed_date['Daily Confirmed'] - daily_agg_confirmed_date['Daily Deaths'] \
                                            - daily_agg_confirmed_date['Daily Recovered']

print("-- NUMBER OF DAILY CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES WORLDWIDE (CORRECTED) --")
daily_agg_confirmed_date

-- NUMBER OF DAILY CONFIRMED CASES, DEATHS, RECOVERIES, AND ACTIVE CASES WORLDWIDE (CORRECTED) --


Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active
1,2020-01-23,1097,34,60,1003,540,17,30,493
2,2020-01-24,941,26,39,876,0,0,0,0
3,2020-01-25,1437,42,42,1353,496,16,3,477
4,2020-01-26,2118,56,56,2006,681,14,14,653
5,2020-01-27,2927,82,65,2780,809,26,9,774
...,...,...,...,...,...,...,...,...,...
398,2021-02-23,112109754,2485434,69746340,39877980,387865,11256,278630,97979
399,2021-02-24,112554301,2497488,69955799,40101014,444547,12054,209459,223034
400,2021-02-25,113001412,2507624,70183491,40310297,447111,10136,227692,209283
401,2021-02-26,113415604,2517422,70443759,40454423,414192,9798,260268,144126


### <a name = "Jittery1" style = "color:black;">***Graph of Daily Cases (Jittery)***</a>

We attempt to visualize the daily number of confirmed cases using a line graph.

In [60]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = daily_agg_confirmed_date['ObservationDate'],
                    y = daily_agg_confirmed_date['Daily Confirmed'],
                    name = 'Daily Confirmed Cases',
                    marker_color = 'cornflowerblue'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "Daily Number of COVID-19-Related Confirmed Cases Worlwide",
                 xaxis_title = "Observation Date",
                 yaxis_title = "Daily Number of COVID-19-Related Confirmed Cases")

fig.show()

Similarly, we attempt to visualize the daily number of active cases, recoveries, and deaths via a line graph.

In [61]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = daily_agg_confirmed_date['ObservationDate'],
                    y = daily_agg_confirmed_date['Daily Active'],
                    name = 'Active Cases',
                    marker_color = 'cornflowerblue'))

fig.add_trace(go.Scatter(x = daily_agg_confirmed_date['ObservationDate'],
                    y = daily_agg_confirmed_date['Daily Recovered'],
                    name = 'Recoveries',
                    marker_color = 'darkseagreen'))

fig.add_trace(go.Scatter(x = daily_agg_confirmed_date['ObservationDate'],
                    y = daily_agg_confirmed_date['Daily Deaths'],
                    name = 'Deaths',
                    marker_color = 'indianred'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "Daily Number of COVID-19-Related Cases Worlwide (Active Cases, Recoveries & Deaths)",
                 xaxis_title = "Observation Date",
                 yaxis_title = "Daily Number of COVID-19-Related Cases")

fig.show()

Although it may possible to distill some trends from these two graphs, the occurrence of numerous jitters prevents us from making concrete analyses. In more concrete terms, there are sharp peaks and troughs that almost seem anomalous. For instance, on December 10, the number of confirmed cases peaked at 1.50 million &mdash; a statistic that is sandwiched between 669.02 thousand cases the previous day and 702.83 thousand cases the succeeding day. 

Similar observations can be made in reference to the sudden increase in the number of recoveries on October 31 and December 12 (which are isolated instances of crossing the 1,000,000-mark) and the corresponding sharp decrease in the number of active cases. These evince that the daily cases per se present themselves as "noisy data," thus prompting a more sophisticated data processing using <b>moving averages</b>.

### <a name = "Move1" style = "color:black;">***Moving Averages (More Statistically Representative)***</a>

A more statistically representative reporting of the daily number of cases is expressed through 7-day moving averages (which is also referred to <i>rolling averages</i> due to the manner of computation). The 7-day window corresponds to a week in order to dampen the effect of backlogs, misencoded data, and misclassified cases. 

In [62]:
WINDOW = 7

moving_agg_confirmed_date = daily_agg_confirmed_date.copy(deep = True)
moving_agg_confirmed_date['Daily Confirmed (7-Day Rolling Average)'] = daily_agg_confirmed_date['Daily Confirmed'].rolling(WINDOW).mean()
moving_agg_confirmed_date['Daily Deaths (7-Day Rolling Average)'] = daily_agg_confirmed_date['Daily Deaths'].rolling(WINDOW).mean()
moving_agg_confirmed_date['Daily Recovered (7-Day Rolling Average)'] = daily_agg_confirmed_date['Daily Recovered'].rolling(WINDOW).mean()
moving_agg_confirmed_date['Daily Active (7-Day Rolling Average)'] = daily_agg_confirmed_date['Daily Active'].rolling(WINDOW).mean()

moving_agg_confirmed_date

Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active,Daily Confirmed (7-Day Rolling Average),Daily Deaths (7-Day Rolling Average),Daily Recovered (7-Day Rolling Average),Daily Active (7-Day Rolling Average)
1,2020-01-23,1097,34,60,1003,540,17,30,493,,,,
2,2020-01-24,941,26,39,876,0,0,0,0,,,,
3,2020-01-25,1437,42,42,1353,496,16,3,477,,,,
4,2020-01-26,2118,56,56,2006,681,14,14,653,,,,
5,2020-01-27,2927,82,65,2780,809,26,9,774,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
398,2021-02-23,112109754,2485434,69746340,39877980,387865,11256,278630,97979,371186.571429,9467.571429,244695.428571,117023.571429
399,2021-02-24,112554301,2497488,69955799,40101014,444547,12054,209459,223034,378251.571429,9573.428571,232626.571429,136051.571429
400,2021-02-25,113001412,2507624,70183491,40310297,447111,10136,227692,209283,384523.285714,9426.571429,231351.428571,143745.285714
401,2021-02-26,113415604,2517422,70443759,40454423,414192,9798,260268,144126,380922.857143,9239.714286,235013.142857,136670.000000


We visualize the 7-day moving averages with the use of a line graph.

In [63]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = moving_agg_confirmed_date['ObservationDate'],
                    y = moving_agg_confirmed_date['Daily Confirmed (7-Day Rolling Average)'],
                    name = 'Daily Confirmed Cases (7-Day Rolling Average)',
                    marker_color = 'cornflowerblue'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "7-Day Rolling Average of COVID-19-Related Confirmed Cases Worlwide",
                 xaxis_title = "Observation Date",
                 yaxis_title = "7-Day Rolling Average of COVID-19-Related Confirmed Cases")

fig.show()

Below are some observations that can be noted from the graph above:
- There were noticeable peaks in the rolling average of confirmed cases on December 11, 2020, and January 11, 2021 &mdash; which coincide with the weeks before and after the Christmas holiday season. 
- The two-week period from December 16, 2020, to December 28, 2020 and the five-week period from January 11, 2021, to February 18, 2021, saw a decrease in the 7-day rolling averages of confirmed cases.
- Looking at the tail end of the graph, there seems to be an increase in the daily number of confirmed cases worldwide. 

In [64]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = moving_agg_confirmed_date['ObservationDate'],
                    y = moving_agg_confirmed_date['Daily Active (7-Day Rolling Average)'],
                    name = 'Active Cases',
                    marker_color = 'cornflowerblue'))

fig.add_trace(go.Scatter(x = moving_agg_confirmed_date['ObservationDate'],
                    y = moving_agg_confirmed_date['Daily Recovered (7-Day Rolling Average)'],
                    name = 'Recoveries',
                    marker_color = 'darkseagreen'))

fig.add_trace(go.Scatter(x = moving_agg_confirmed_date['ObservationDate'],
                    y = moving_agg_confirmed_date['Daily Deaths (7-Day Rolling Average)'],
                    name = 'Deaths',
                    marker_color = 'indianred'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "7-Day Rolling Average of COVID-19-Related Cases Worlwide (Active Cases, Recoveries & Deaths)",
                 xaxis_title = "Observation Date",
                 yaxis_title = "7-Day Rolling Average of COVID-19-Related Cases")

fig.show()

Below are some observations that can be noted from the graph above:
- Up until late April 2020, the daily number of active cases had consistently been higher compared to the daily number of recoveries. 
- Post-April 2020, the daily number of active cases is generally lower than the daily number of recoveries. However, there are some brief episodes when the reverse happened; the most pronounced of which was noted on the first two weeks of January 2021.
- Notwithstanding these trends, some promising observations can still be drawn. For instance, the number of deaths continue to be significantly lower than the number of active cases and recoveries.

Although the data points are insufficient for deriving a strong statistical conclusion, looking at the tail end of the graph, it can be seen that the daily number of active cases seems to be on the rise anew. 

### 2. <a name = "PHContext2" style = "color:black;">Philippine Context</a>

We now zero in on the daily number of cases in the Philippines. We distill the pertinent portion of the global dataset and perform the necessary data type conversion before displaying a tabular representation of the daily number of confirmed cases, deaths, recoveries, and active cases in our country.

In [65]:
ph_daily_agg_confirmed_date = ph_agg_confirmed_date.copy(deep = True)
ph_daily_agg_confirmed_date['Daily Confirmed'] = ph_daily_agg_confirmed_date['Confirmed'].diff()
ph_daily_agg_confirmed_date['Daily Deaths'] = ph_daily_agg_confirmed_date['Deaths'].diff()
ph_daily_agg_confirmed_date['Daily Recovered'] = ph_daily_agg_confirmed_date['Recovered'].diff()
ph_daily_agg_confirmed_date['Daily Active'] = ph_daily_agg_confirmed_date['Active'].diff()

ph_daily_agg_confirmed_date = ph_daily_agg_confirmed_date.iloc[1:]
ph_daily_agg_confirmed_date['Daily Confirmed'] = ph_daily_agg_confirmed_date['Daily Confirmed'].astype('int64')
ph_daily_agg_confirmed_date['Daily Deaths'] = ph_daily_agg_confirmed_date['Daily Deaths'].astype('int64')
ph_daily_agg_confirmed_date['Daily Recovered'] = ph_daily_agg_confirmed_date['Daily Recovered'].astype('int64')
ph_daily_agg_confirmed_date['Daily Active'] = ph_daily_agg_confirmed_date['Daily Active'].astype('int64')

ph_daily_agg_confirmed_date

Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active
1,2020-01-30,1,0,0,1,1,0,0,1
2,2020-01-31,1,0,0,1,0,0,0,0
3,2020-02-01,1,0,0,1,0,0,0,0
4,2020-02-02,2,1,0,1,1,1,0,0
5,2020-02-03,2,1,0,1,0,0,0,0
...,...,...,...,...,...,...,...,...,...
391,2021-02-23,564865,12107,522941,29817,1409,13,67,1329
392,2021-02-24,566420,12129,523321,30970,1555,22,380,1153
393,2021-02-25,568680,12201,524042,32437,2260,72,721,1467
394,2021-02-26,571327,12247,524582,34498,2647,46,540,2061


### <a name = "Anomaly4" style = "color:black;">***Anomaly #3: Negative Number of Daily Confirmed Cases, Deaths, and Recoveries***</a>

We check if the Philippine data exhibit any anomalies related to a negative tally of daily confirmed cases.

In [66]:
ph_daily_agg_confirmed_date.loc[ph_daily_agg_confirmed_date['Daily Confirmed'] < 0]

Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active


We then proceed to check for anomalies related to a negative tally of daily deaths.

In [67]:
ph_daily_agg_confirmed_date.loc[ph_daily_agg_confirmed_date['Daily Deaths'] < 0]

Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active
50,2020-03-19,217,17,8,192,15,-2,3,14


We then proceed to check for anomalies related to a negative tally of daily recoveries.

In [68]:
ph_daily_agg_confirmed_date.loc[ph_daily_agg_confirmed_date['Daily Recovered'] < 0]

Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active
229,2020-09-14,265888,4630,207504,53754,4672,259,-64,4477
230,2020-09-15,269407,4663,207352,57392,3519,33,-152,3638
349,2021-01-12,491258,9554,458172,23532,1522,138,-34,1418
355,2021-01-18,502736,9909,465988,26839,2159,14,-3,2148
362,2021-01-25,514996,10292,475422,29282,1377,50,-190,1517
376,2021-02-08,538995,11231,499772,27992,1685,52,-26,1659
377,2021-02-09,540227,11296,499764,29167,1232,65,-8,1175


A possible fix to this anomaly is to set the number of daily confirmed cases, deaths, and recoveries to 0 if the original values are negative. Under the hood, this is achieved with the application of lambda functions.

To reiterate, the rationale for this is to avoid an overinflation in the daily number of active cases. Since the number of active cases is computed as the difference between the number of confirmed cases and the sum of the number of deaths and recoveries, having a negative number for either of the last two parameters will inflate the number of active cases.

In [69]:
ph_daily_agg_confirmed_date['Daily Confirmed'] = ph_daily_agg_confirmed_date['Daily Confirmed'].apply(lambda num: 0 if num < 0 else num)
ph_daily_agg_confirmed_date['Daily Deaths'] = ph_daily_agg_confirmed_date['Daily Deaths'].apply(lambda num: 0 if num < 0 else num)
ph_daily_agg_confirmed_date['Daily Recovered'] = ph_daily_agg_confirmed_date['Daily Recovered'].apply(lambda num: 0 if num < 0 else num)

ph_daily_agg_confirmed_date['Daily Active'] = ph_daily_agg_confirmed_date['Daily Confirmed'] - ph_daily_agg_confirmed_date['Daily Deaths'] \
                                            - ph_daily_agg_confirmed_date['Daily Recovered']
ph_daily_agg_confirmed_date

Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active
1,2020-01-30,1,0,0,1,1,0,0,1
2,2020-01-31,1,0,0,1,0,0,0,0
3,2020-02-01,1,0,0,1,0,0,0,0
4,2020-02-02,2,1,0,1,1,1,0,0
5,2020-02-03,2,1,0,1,0,0,0,0
...,...,...,...,...,...,...,...,...,...
391,2021-02-23,564865,12107,522941,29817,1409,13,67,1329
392,2021-02-24,566420,12129,523321,30970,1555,22,380,1153
393,2021-02-25,568680,12201,524042,32437,2260,72,721,1467
394,2021-02-26,571327,12247,524582,34498,2647,46,540,2061


### <a name = "Jittery2" style = "color:black;">***Graph of Daily Cases (Jittery)***</a>

We attempt to visualize the daily number of confirmed cases using a line graph.

In [70]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = ph_daily_agg_confirmed_date['ObservationDate'],
                    y = ph_daily_agg_confirmed_date['Daily Confirmed'],
                    name = 'Daily Confirmed Cases',
                    marker_color = 'cornflowerblue'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "Daily Number of COVID-19-Related Confirmed Cases in the Philippines",
                 xaxis_title = "Observation Date",
                 yaxis_title = "Daily Number of COVID-19-Related Confirmed Cases")

fig.show()

Similarly, we attempt to visualize the daily number of active cases, recoveries, and deaths via a line graph.

In [71]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = ph_daily_agg_confirmed_date['ObservationDate'],
                    y = ph_daily_agg_confirmed_date['Daily Active'],
                    name = 'Active Cases',
                    marker_color = 'cornflowerblue'))

fig.add_trace(go.Scatter(x = ph_daily_agg_confirmed_date['ObservationDate'],
                    y = ph_daily_agg_confirmed_date['Daily Recovered'],
                    name = 'Recoveries',
                    marker_color = 'darkseagreen'))

fig.add_trace(go.Scatter(x = ph_daily_agg_confirmed_date['ObservationDate'],
                    y = ph_daily_agg_confirmed_date['Daily Deaths'],
                    name = 'Deaths',
                    marker_color = 'indianred'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "Daily Number of COVID-19-Related Cases in the Philippines (Active Cases, Recoveries & Deaths)",
                 xaxis_title = "Observation Date",
                 yaxis_title = "Daily Number of COVID-19-Related Cases")


fig.show()

Again, we only showed the (raw) graphs of daily cases for the completeness of this exploratory data analysis. Since they are essentially "noisy data," we are prompted to conduct a more sophisticated data processing using <b>moving averages</b>.

### <a name = "Move2" style = "color:black;">***Moving Averages (More Statistically Representative)***</a>

A more statistically representative reporting of the daily number of cases is expressed through 7-day moving averages (which is also referred to rolling averages due to the manner of computation). The 7-day window corresponds to a week in order to dampen the effect of backlogs, misencoded data, and misclassified cases.

In [72]:
WINDOW = 7

ph_moving_agg_confirmed_date = ph_daily_agg_confirmed_date.copy(deep = True)
ph_moving_agg_confirmed_date['Daily Confirmed (7-Day Rolling Average)'] = ph_moving_agg_confirmed_date['Daily Confirmed'].rolling(WINDOW).mean()
ph_moving_agg_confirmed_date['Daily Deaths (7-Day Rolling Average)'] = ph_moving_agg_confirmed_date['Daily Deaths'].rolling(WINDOW).mean()
ph_moving_agg_confirmed_date['Daily Recovered (7-Day Rolling Average)'] = ph_moving_agg_confirmed_date['Daily Recovered'].rolling(WINDOW).mean()
ph_moving_agg_confirmed_date['Daily Active (7-Day Rolling Average)'] = ph_moving_agg_confirmed_date['Daily Active'].rolling(WINDOW).mean()

ph_moving_agg_confirmed_date

Unnamed: 0,ObservationDate,Confirmed,Deaths,Recovered,Active,Daily Confirmed,Daily Deaths,Daily Recovered,Daily Active,Daily Confirmed (7-Day Rolling Average),Daily Deaths (7-Day Rolling Average),Daily Recovered (7-Day Rolling Average),Daily Active (7-Day Rolling Average)
1,2020-01-30,1,0,0,1,1,0,0,1,,,,
2,2020-01-31,1,0,0,1,0,0,0,0,,,,
3,2020-02-01,1,0,0,1,0,0,0,0,,,,
4,2020-02-02,2,1,0,1,1,1,0,0,,,,
5,2020-02-03,2,1,0,1,0,0,0,0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
391,2021-02-23,564865,12107,522941,29817,1409,13,67,1329,1802.714286,83.285714,1592.142857,127.285714
392,2021-02-24,566420,12129,523321,30970,1555,22,380,1153,1856.571429,78.857143,1612.571429,165.142857
393,2021-02-25,568680,12201,524042,32437,2260,72,721,1467,1931.000000,75.428571,1697.000000,158.571429
394,2021-02-26,571327,12247,524582,34498,2647,46,540,2061,2038.428571,59.714286,1684.714286,294.000000


We visualize the 7-day moving averages with the use of a line graph.

In [73]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = ph_moving_agg_confirmed_date['ObservationDate'],
                    y = ph_moving_agg_confirmed_date['Daily Confirmed (7-Day Rolling Average)'],
                    name = 'Daily Confirmed Cases (7-Day Rolling Average)',
                    marker_color = 'cornflowerblue'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "7-Day Rolling Average of COVID-19-Related Confirmed Cases in the Philippines",
                 xaxis_title = "Observation Date",
                 yaxis_title = "7-Day Rolling Average of COVID-19-Related Confirmed Cases")

fig.show()

The trend with respect to the 7-day rolling average of COVID-19-related confirmed cases in our country is less monotonic than the global pattern. Even with the demonstrated smoothening of data via moving averages, jitters continue to be noticeable to a certain degree. Nevertheless, the overall trend continues to be discernible: increasing from March to August 2020 and decreasing from mid-September to the end of December 2020 &mdash; which is in consonance with the analyses from the previous graphs. Again, the tail end of the graph seems to be indicative of a resurgence in the rate of transmission.

In [74]:
fig = go.Figure()

fig.add_trace(go.Scatter(x = ph_moving_agg_confirmed_date['ObservationDate'],
                    y = ph_moving_agg_confirmed_date['Daily Active (7-Day Rolling Average)'],
                    name = 'Active Cases',
                    marker_color = 'cornflowerblue'))

fig.add_trace(go.Scatter(x = ph_moving_agg_confirmed_date['ObservationDate'],
                    y = ph_moving_agg_confirmed_date['Daily Recovered (7-Day Rolling Average)'],
                    name = 'Recoveries',
                    marker_color = 'darkseagreen'))

fig.add_trace(go.Scatter(x = ph_moving_agg_confirmed_date['ObservationDate'],
                    y = ph_moving_agg_confirmed_date['Daily Deaths (7-Day Rolling Average)'],
                    name = 'Deaths',
                    marker_color = 'indianred'))

fig.update_layout(xaxis_rangeslider_visible = True,
                 title = "7-Day Rolling Average of COVID-19-Related Cases in the Philippines (Active Cases, Recoveries & Deaths)",
                 xaxis_title = "Observation Date",
                 yaxis_title = "7-Day Rolling Average of COVID-19-Related Cases")

fig.show()

Giving an analysis of the 7-day rolling average of active cases, recoveries, and deaths in the Philippines is not as straightfoward due to fluctuations in the trend Generally, the rolling average of recoveries outweigh both that of active cases and deaths. The largest gap between the rolling average of recoveries and that of active cases was recorded on the first and third weeks of August. 

On the third week of January, however, the margin between these two noticeably became smaller; in light of the previous analyses, this might be due to a surge in the number of confirmed cases in our country post-holiday season. In relation to this, it may be relevant to hone in on the tail-end of the graph, which is indicative of an increase in the number of active cases. In hindsight, this may have been one of the early indicators of the resurgence in the virality of transmission that our country is currently facing.

# <a name = "References" style = "color:black;">REFERENCES</a>

- Adams, A., Li, W., Zhang, C., & Chen, X. (2020). The disguised pandemic: the importance of data normalization in COVID-19 web mapping. <i>Public Health, 183</i>, 36-37. doi:10.1016/j.puhe.2020.04.034 <br/>
- Hallare, K. (2020, September 6). UP expert: PH’s virus curve already flattened, but no cause for excitement yet. <i>Philippine Daily Inquirer.</i> https://newsinfo.inquirer.net/1331933/up-expert-phs-virus-curve-already-flattened-but-no-cause-for-excitement-yet#ixzz6q9VP5MyI
- Sanderson, M., Hudson, I.L., & Osborn, M. (2020, April 7). <i>The bar necessities: 5 ways to understand coronavirus graphs.</i> The Conversation. https://theconversation.com/the-bar-necessities-5-ways-to-understand-coronavirus-graphs-135537 <br/>
- Nau, R. (2020). <i>Stationarity and differincing</i>. Duke University. http://people.duke.edu/~rnau/411diff.htm
- World Health Organization. (2020). <i>Estimating mortality from COVID-19.</i> https://www.who.int/news-room/commentaries/detail/estimating-mortality-from-covid-19