### This project features interactive maps and HTML markup. Since GitHub does not render dynamic notebook content, it is necessary to download the project to render these elements and to manipulate the data visualizations.

# Data Preparation

This is the <b>first in a series of five Jupyter notebooks</b> on the Novel Corona Virus 2019 Dataset. This activity is in partial fulfillment of the Tidy Tuesdays deliverables for probationary Lyrids of the <b>Center for Complexity and Emerging Technologies, College of Computer Studies, De La Salle University</b>.

<b>The COVID-19 pandemic is a serious and unprecedented global health crisis.</b> Through this activity, the author of this series of Jupyter notebooks hopes to increase awareness of the importance of data-driven policy directions and to contribute to the present discourse on global and national trends related to the spread of the coronavirus. In this regard, this attempt to explore and visualize the said dataset seeks neither to reduce the gravity of the pandemic to mere numerical figures nor to present a professional, rigorous statistical analysis.

<hr/>

The required dataset for this Tidy Tuesdays activity is the Novel Corona Virus 2019 Dataset from Kaggle: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset.

To enrich the analysis and visualization, the following datasets were integrated:
- Population by Country - 2020: https://www.kaggle.com/tanuprabhu/population-by-country-2020
- Coordinates of US States: https://developers.google.com/public-data/docs/canonical/states_csv
- Coordinates of Countries: https://developers.google.com/public-data/docs/canonical/countries_csv

These datasets are stored in the folder <code>data</code> of the repository.

# TABLE OF CONTENTS
I. [Preliminaries](#Preliminaries) <br/>
II. [Data Preparation](#DataPreparation) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; A. [Load the Datasets](#LoadDatasets) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1. [COVID-19 Data (Master Dataset)](#Covid1) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2. [Time Series of Confirmed Cases](#Confirmed1) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3. [Time Series of Recoveries](#Recov1) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4. [Time Series of Deaths](#Deaths1) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; B. [Clean the Datasets](#CleanDatasets) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0. [Standardizing Country/Region Names](#Country) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1. [COVID-19 Data (Master Dataset)](#Covid2) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2. [Time Series of Confirmed Cases](#Confirmed2) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3. [Time Series of Recoveries](#Recov2) <br/>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4. [Time Series of Deaths](#Deaths2) <br/>
III. [References](#References) <br/>

<hr/>

# <a name = "Preliminaries" style = "color:black;">PRELIMINARIES</a>

We will be using the following modules and libraries in our program:
- <code>pandas</code> - for performing some data manipulation (https://pandas.pydata.org/)
- <code>numpy</code> - for performing efficient numerical analysis (https://numpy.org/)
- <code>re</code> - for performing operations involving regular expressions (https://docs.python.org/3/library/re.html)
- <code>plotly</code> - for generating data visualizations (https://plotly.com/)

Note that <code>plotly</code> is not automatically bundled with an Anaconda installation. To install this package with conda (https://anaconda.org/plotly/plotly), either of these commands has to be run:

<code>conda install -c plotly plotly</code> <br/>
<code>conda install -c plotly/label/test plotly</code>

In [1]:
import re

import pandas as pd
import numpy as np

import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected = True)

The last two lines of code are for displaying the figures (generated via <code>plotly</code>) in the Jupyter notebook itself.

Moreover, we also suppress the warning generated by Pandas for chained assignments.

In [2]:
pd.options.mode.chained_assignment = None  

We set the number of rows to be displayed when the data are presented in tabular form via <code>pandas</code>. Note that, during actual data preparation and analysis, this is usually set to a high value (or the entire datasets themselves are printed) in order to view more rows, preliminarily identify issues related to formatting or encoding, and understand anomalous results. 

In [3]:
NUM_ROWS = 10

pd.set_option('display.max_rows', NUM_ROWS)
pd.set_option('display.min_rows', NUM_ROWS)
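
As a side note, the row limit can also be raised temporarily instead of globally. The sketch below is only an illustration (the frame <code>toy</code> is a hypothetical stand-in for our datasets); it uses <code>pd.option_context</code>, which restores the previous settings once the <code>with</code> block ends.

```python
import pandas as pd

# Hypothetical toy frame standing in for one of the notebook's datasets.
toy = pd.DataFrame({"value": range(100)})

limit_before = pd.get_option("display.max_rows")

# Temporarily raise the row limit; the previous settings are restored on exit.
with pd.option_context("display.max_rows", 50, "display.min_rows", 50):
    limit_inside = pd.get_option("display.max_rows")
    preview = repr(toy)  # rendered using the temporary limits

limit_after = pd.get_option("display.max_rows")
```

This keeps the compact default for most cells while still allowing a closer inspection where anomalies are suspected.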

<hr/>

# <a name = "DataPreparation" style = "color:black;">DATA PREPARATION</a>

Before we can perform any form of data analysis, we first have to load the datasets and perform some preliminary data cleaning. This is necessary in order to have an initial understanding of the nature and structure of the datasets, standardize the formatting and encoding of entries, and improve the quality of the data before exploratory analysis is conducted. 

This is especially important given the nature of our datasets, which focus on an ongoing pandemic of a global magnitude. Most countries are consolidating, releasing, and submitting statistics on a day-to-day basis, and the sheer volume of incoming data daily heightens the need to systematically correct inconsistencies, handle missing values, remove duplicates, and the like.

## A. <a name = "LoadDatasets" style = "color:black;">Load the datasets</a>

The Novel Corona Virus 2019 Dataset from Kaggle consists of six CSV files. However, since our analysis is grounded on global and national (Philippine) perspectives, we can opt not to load <code>time_series_covid_19_confirmed_US.csv</code> and <code>time_series_covid_19_deaths_US.csv</code>, which are specific to the United States of America.

### 1. <a name = "Covid1" style = "color:black;">COVID-19 Data (Master Dataset)</a>

We load the file containing the consolidated time series data on the number of confirmed cases, deaths, and recoveries related to the COVID-19 pandemic.

In [4]:
data_raw = pd.read_csv('data/covid_19_data.csv')

It is a good idea to manipulate a copy of the dataset to avoid accidental modification of the contents of the original.

In [5]:
data = data_raw.copy(deep = True)
data

Unnamed: 0,SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
0,1,01/22/2020,Anhui,Mainland China,1/22/2020 17:00,1.0,0.0,0.0
1,2,01/22/2020,Beijing,Mainland China,1/22/2020 17:00,14.0,0.0,0.0
2,3,01/22/2020,Chongqing,Mainland China,1/22/2020 17:00,6.0,0.0,0.0
3,4,01/22/2020,Fujian,Mainland China,1/22/2020 17:00,1.0,0.0,0.0
4,5,01/22/2020,Gansu,Mainland China,1/22/2020 17:00,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...
236012,236013,02/27/2021,Zaporizhia Oblast,Ukraine,2021-02-28 05:22:20,69504.0,1132.0,65049.0
236013,236014,02/27/2021,Zeeland,Netherlands,2021-02-28 05:22:20,16480.0,178.0,0.0
236014,236015,02/27/2021,Zhejiang,Mainland China,2021-02-28 05:22:20,1321.0,1.0,1314.0
236015,236016,02/27/2021,Zhytomyr Oblast,Ukraine,2021-02-28 05:22:20,50582.0,834.0,44309.0
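
To illustrate why we work on a deep copy, the following minimal sketch (using a hypothetical three-row frame rather than the actual dataset) shows that a change made to the copy does not propagate to the original.

```python
import pandas as pd

# Hypothetical stand-in for data_raw; the values are illustrative only.
raw = pd.DataFrame({"Confirmed": [1.0, 14.0, 6.0]})

# A deep copy owns its own data, so edits to it leave the original untouched.
working = raw.copy(deep = True)
working.loc[0, "Confirmed"] = 999.0

original_value = raw.loc[0, "Confirmed"]
modified_value = working.loc[0, "Confirmed"]
```

Here, <code>original_value</code> is still <code>1.0</code> while <code>modified_value</code> is <code>999.0</code>.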


### 2. <a name = "Confirmed1" style = "color:black;">Time Series of Confirmed Cases</a>

We load the file containing the daily cumulative tally of confirmed cases related to the COVID-19 pandemic. Again, it is a good idea to manipulate a copy of the dataset to avoid accidental modification of the contents of the original.

In [6]:
confirmed_raw = pd.read_csv('data/time_series_covid_19_confirmed.csv')
confirmed = confirmed_raw.copy(deep = True)

confirmed

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,2/18/21,2/19/21,2/20/21,2/21/21,2/22/21,2/23/21,2/24/21,2/25/21,2/26/21,2/27/21
0,,Afghanistan,33.939110,67.709953,0,0,0,0,0,0,...,55557,55575,55580,55604,55617,55646,55664,55680,55696,55707
1,,Albania,41.153300,20.168300,0,0,0,0,0,0,...,96838,97909,99062,100246,101285,102306,103327,104313,105229,106215
2,,Algeria,28.033900,1.659600,0,0,0,0,0,0,...,111418,111600,111764,111917,112094,112279,112461,112622,112805,112960
3,,Andorra,42.506300,1.521800,0,0,0,0,0,0,...,10610,10645,10672,10699,10712,10739,10775,10799,10822,10849
4,,Angola,-11.202700,17.873900,0,0,0,0,0,0,...,20452,20478,20499,20519,20548,20584,20640,20695,20759,20782
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
269,,Vietnam,14.058324,108.277199,0,2,2,2,2,2,...,2347,2362,2368,2383,2392,2403,2412,2421,2426,2432
270,,West Bank and Gaza,31.952200,35.233200,0,0,0,0,0,0,...,171154,171717,172315,173635,174969,176377,177768,179293,180848,181909
271,,Yemen,15.552727,48.516388,0,0,0,0,0,0,...,2154,2157,2157,2165,2176,2187,2221,2255,2267,2269
272,,Zambia,-13.133897,27.849332,0,0,0,0,0,0,...,72467,73203,73894,74503,75027,75582,76484,77171,77639,78202


### 3. <a name = "Recov1" style = "color:black;">Time Series of Recoveries</a>

We load the file containing the daily cumulative tally of recoveries related to the COVID-19 pandemic. Again, it is a good idea to manipulate a copy of the dataset to avoid accidental modification of the contents of the original.

In [7]:
recovered_raw = pd.read_csv('data/time_series_covid_19_recovered.csv')
recovered = recovered_raw.copy(deep = True)

recovered

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,2/18/21,2/19/21,2/20/21,2/21/21,2/22/21,2/23/21,2/24/21,2/25/21,2/26/21,2/27/21
0,,Afghanistan,33.939110,67.709953,0,0,0,0,0,0,...,48798,48803,48820,48834,48895,48967,49086,49281,49285,49288
1,,Albania,41.153300,20.168300,0,0,0,0,0,0,...,60675,61605,62533,63329,64318,65403,66309,67158,68007,68969
2,,Algeria,28.033900,1.659600,0,0,0,0,0,0,...,76640,76797,76940,77076,77225,77382,77537,77683,77842,77976
3,,Andorra,42.506300,1.521800,0,0,0,0,0,0,...,10101,10146,10170,10206,10245,10285,10319,10356,10394,10429
4,,Angola,-11.202700,17.873900,0,0,0,0,0,0,...,18972,18991,19005,19013,19190,19207,19221,19238,19307,19315
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
254,,Vietnam,14.058324,108.277199,0,0,0,0,0,0,...,1605,1627,1627,1702,1717,1760,1804,1804,1839,1844
255,,West Bank and Gaza,31.952200,35.233200,0,0,0,0,0,0,...,159369,160172,160763,161410,162025,162757,163795,164557,165205,166119
256,,Yemen,15.552727,48.516388,0,0,0,0,0,0,...,1432,1432,1432,1432,1432,1432,1433,1434,1434,1435
257,,Zambia,-13.133897,27.849332,0,0,0,0,0,0,...,65051,66013,66943,67944,68928,69436,69803,70800,72635,73609


### 4. <a name = "Deaths1" style = "color:black;">Time Series of Deaths</a>

We load the file containing the daily cumulative tally of deaths related to the COVID-19 pandemic. Again, it is a good idea to manipulate a copy of the dataset to avoid accidental modification of the contents of the original.

In [8]:
deaths_raw = pd.read_csv('data/time_series_covid_19_deaths.csv')
deaths = deaths_raw.copy(deep = True)

deaths

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,2/18/21,2/19/21,2/20/21,2/21/21,2/22/21,2/23/21,2/24/21,2/25/21,2/26/21,2/27/21
0,,Afghanistan,33.939110,67.709953,0,0,0,0,0,0,...,2430,2430,2430,2432,2433,2435,2436,2438,2442,2443
1,,Albania,41.153300,20.168300,0,0,0,0,0,0,...,1617,1636,1653,1666,1681,1696,1715,1736,1756,1775
2,,Algeria,28.033900,1.659600,0,0,0,0,0,0,...,2950,2954,2958,2961,2964,2967,2970,2973,2977,2979
3,,Andorra,42.506300,1.521800,0,0,0,0,0,0,...,107,107,107,107,109,110,110,110,110,110
4,,Angola,-11.202700,17.873900,0,0,0,0,0,0,...,498,498,498,499,499,500,501,502,504,506
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
269,,Vietnam,14.058324,108.277199,0,0,0,0,0,0,...,35,35,35,35,35,35,35,35,35,35
270,,West Bank and Gaza,31.952200,35.233200,0,0,0,0,0,0,...,1956,1961,1971,1976,1986,1994,1999,2008,2019,2025
271,,Yemen,15.552727,48.516388,0,0,0,0,0,0,...,618,618,618,619,619,620,624,625,627,631
272,,Zambia,-13.133897,27.849332,0,0,0,0,0,0,...,991,1002,1016,1020,1031,1040,1051,1059,1066,1081


<hr/>

## B. <a name = "CleanDatasets" style = "color:black;">Clean the Datasets</a>

### 0. <a name = "Country" style = "color:black;">Standardizing Country/Region Names</a>

From a cursory inspection of the datasets, it is evident there are some problems related to the names of the countries/regions:
- Some of the names are inconsistent, or they contain some stray characters (probably due to misencoding). For example, <code>('St. Martin',)</code> and <code>Saint Martin</code> pertain to the same country/region.
- Some of the names do not conform to those found in standard lists. For example, <code>Congo (Kinshasa)</code> should be officially referred to as the <code>Democratic Republic of the Congo</code>, whose capital is Kinshasa.
- There are some abbreviations that may not be entirely compatible when combined with other datasets or when used with libraries. For example, it is better to spell out <code>US</code> and <code>UK</code> as <code>United States</code> and <code>United Kingdom</code>, respectively.
- There are some official local names (especially those with diacritics) that may not be entirely compatible when combined with English-language datasets or when used with English-language libraries. For example, <code>Côte d'Ivoire</code> and <code>Timor-Leste</code> are translated in English as <code>Ivory Coast</code> and <code>East Timor</code>, respectively.

The following JSON file was consulted as a reference in standardizing the names of countries: http://country.io/names.json. A special case is the entry <code>occupied Palestinian territory</code>, which was renamed to <code>West Bank and Gaza</code>, since most visualization libraries are reliant on geographical names. Moreover, the distinction made in the datasets among <code>Mainland China</code>, <code>Hong Kong</code>, and <code>Macau</code> was preserved since this project employs map-based (geographical) visualizations extensively.

In this light, a list, <code>standardized_names</code>, consisting of tuples was constructed. The first element in each tuple refers to the non-standardized name found in the datasets while the second element refers to the standardized one.

In [9]:
standardized_names = [("('St. Martin',)", "Saint Martin"),
                      ("St. Martin", "Saint Martin"),
                      ("The Bahamas", "Bahamas"),
                      ("Bahamas, The", "Bahamas"),
                      ("The Gambia", "Gambia"),
                      ("Gambia, The", "Gambia"),
                      ("Congo (Kinshasa)", "Democratic Republic of the Congo"),
                      ("Congo (Brazzaville)", "Republic of the Congo"),
                      ("Cabo Verde", "Cape Verde"),
                      ("Timor-Leste", "East Timor"),
                      ("occupied Palestinian territory", "West Bank and Gaza"),
                      ("US", "United States"),
                      ("UK", "United Kingdom"),
                      ("Holy See", "Vatican"),
                      ("Vatican City", "Vatican"),
                      ("Cote d'Ivoire", "Ivory Coast"),
                      ("Taiwan*", "Taiwan"),
                      ("Czechia", "Czech Republic"),
                      (" Azerbaijan", "Azerbaijan")]

Another problem is in relation to cruise ships that are classified in the datasets as countries/regions, such as <code>Diamond Princess</code> and <code>MS Zaandam</code>. In the interest of uniformity, we rename them (along with those tagged as <code>Others</code>) under the umbrella term <code>Cruise Ships</code>.

In [10]:
cruise_ships = [("Diamond Princess", "Cruise Ships"),
                ("MS Zaandam", "Cruise Ships"),
                ("Others", "Cruise Ships")]
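
Lists of (old, new) pairs like the ones above can be applied in more than one way. As a small illustration only (this is an alternative, not the approach taken in this notebook, which relies on <code>str.replace</code>), the sketch below converts a few hypothetical pairs into a dictionary and passes it to <code>Series.replace</code>, which replaces only entries that match a key exactly.

```python
import pandas as pd

# A few illustrative pairs in the same (old, new) layout as the lists above.
pairs = [("US", "United States"),
         ("UK", "United Kingdom"),
         ("Others", "Cruise Ships")]

# Hypothetical sample of country/region entries.
countries = pd.Series(["US", "Russia", "UK", "Others", "Ukraine"])

# Series.replace with a dictionary swaps whole values only,
# so names that merely contain similar letters are left alone.
renamed = countries.replace(dict(pairs))
```

One trade-off of exact matching is that entries with stray surrounding characters would not be caught unless they are listed explicitly.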

### 1. <a name = "Covid2" style = "color:black;">COVID-19 Data (Master Dataset)</a>

Looking at the shape of the dataset, we find that the consolidated time series dataset has 236017 rows and 8 columns corresponding to the following details:
- Entry number
- Date of observation
- Province/state
- Country/region
- Date of the entry's latest update
- (Cumulative) number of confirmed cases
- (Cumulative) number of deaths
- (Cumulative) number of recoveries

We also print the number of null values per column and the data types in order to identify areas that need to be addressed during data preparation.

In [11]:
print("Size/Shape: ", data.shape, "\n") 
print("Checking for null values:\n", data.isnull().sum(), "\n")
print("Data types:\n", data.dtypes)

Size/Shape:  (236017, 8) 

Checking for null values:
 SNo                    0
ObservationDate        0
Province/State     62045
Country/Region         0
Last Update            0
Confirmed              0
Deaths                 0
Recovered              0
dtype: int64 

Data types:
 SNo                  int64
ObservationDate     object
Province/State      object
Country/Region      object
Last Update         object
Confirmed          float64
Deaths             float64
Recovered          float64
dtype: object


### ***Removal of Duplicates***

Each entry is supposed to be uniquely identified by a number (<code>SNo</code>), so any rows that are completely identical are most likely instances of misencoding; thus, we discard them.

In [12]:
data = data.drop_duplicates()

print("AFTER DROPPING DUPLICATES\n")
print("Size/Shape: ", data.shape, "\n") 
print("Checking for null values:\n", data.isnull().sum(), "\n")
print("Data types:\n", data.dtypes)

AFTER DROPPING DUPLICATES

Size/Shape:  (236017, 8) 

Checking for null values:
 SNo                    0
ObservationDate        0
Province/State     62045
Country/Region         0
Last Update            0
Confirmed              0
Deaths                 0
Recovered              0
dtype: int64 

Data types:
 SNo                  int64
ObservationDate     object
Province/State      object
Country/Region      object
Last Update         object
Confirmed          float64
Deaths             float64
Recovered          float64
dtype: object


Notice that the shape of the dataset did not change, indicating that the dataset does not contain any duplicates.
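
One caveat is that <code>drop_duplicates</code> compares entire rows by default, so two entries that differ only in their <code>SNo</code> are not treated as duplicates. The following sketch, which uses a hypothetical three-row frame, contrasts the full-row check with a check that ignores <code>SNo</code> via the <code>subset</code> parameter of <code>duplicated</code>.

```python
import pandas as pd

# Hypothetical rows: the first two differ only in their entry number.
toy = pd.DataFrame({
    "SNo": [1, 2, 3],
    "ObservationDate": ["01/22/2020", "01/22/2020", "01/22/2020"],
    "Country/Region": ["Japan", "Japan", "Thailand"],
    "Confirmed": [2.0, 2.0, 4.0],
})

# Full-row comparison: the differing SNo hides the repeated content.
full_row_duplicates = int(toy.duplicated().sum())

# Comparison on every column except SNo reveals the repeated entry.
other_columns = toy.columns.drop("SNo").tolist()
content_duplicates = int(toy.duplicated(subset = other_columns).sum())
```

In this toy example, the full-row check finds no duplicates while the check that ignores <code>SNo</code> finds one. Whether such rows should be dropped from the actual dataset is a judgment call, since distinct provinces/states may legitimately report identical figures.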

### ***Removal of Unneeded Columns & Handling of Null Values***

Since the <code>SNo</code> and <code>Last Update</code> columns only refer to the unique entry number and the date of the entry's last update, respectively, they are immaterial to our analysis; thus, we can drop these columns.

We also decide to drop the column <code>Province/State</code>, which is the source of all the null values. There are two primary reasons supporting this manner of handling the null values:
- It is impossible to supply the missing data since they refer to exact geographic locations, thus barring any form of interpolation.
- Over a quarter (26.29%) of the entries do not have a specified province/state.
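
The percentage quoted above follows directly from the null count (62045) and the number of rows (236017) printed earlier. As a sketch, the share of missing values per column can be obtained by taking the mean of the Boolean mask returned by <code>isnull</code>; the frame <code>toy</code> below is a hypothetical example.

```python
import pandas as pd

# Hypothetical frame in which half of the provinces/states are unspecified.
toy = pd.DataFrame({
    "Province/State": ["Anhui", None, "Beijing", None],
    "Country/Region": ["Mainland China", "Japan", "Mainland China", "Thailand"],
})

# The mean of a Boolean mask is the fraction of True values per column.
null_share = toy.isnull().mean() * 100

# The same arithmetic applied to the counts reported for the master dataset.
reported_share = round(62045 / 236017 * 100, 2)
```

Here, <code>reported_share</code> evaluates to <code>26.29</code>, matching the figure stated above.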

In [13]:
data.drop(columns = ['SNo', 'Province/State', 'Last Update'], inplace = True)

### ***Standardization of Country/Region Names***

Using the lists that we constructed at the start of our data preparation, we standardize the names of the countries/regions and replace the names of cruise ships with a catch-all term. Since some of the names contain characters that have a special meaning in regular expressions (like periods, asterisks, and parentheses), we pass each name through <code>re.escape</code> so that <code>str.replace</code> matches it literally.

To verify that the names have been standardized, we print the (updated) unique countries/regions.

In [14]:
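# regex = True is passed explicitly since newer versions of pandas default to literal matching
for name in standardized_names:
    data['Country/Region'] = data['Country/Region'].str.replace(re.escape(name[0]), name[1], regex = True)
    
for ship in cruise_ships:
    data['Country/Region'] = data['Country/Region'].str.replace(re.escape(ship[0]), ship[1], regex = True)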

print(data['Country/Region'].unique())

['Mainland China' 'Hong Kong' 'Macau' 'Taiwan' 'United States' 'Japan'
 'Thailand' 'South Korea' 'Singapore' 'Philippines' 'Malaysia' 'Vietnam'
 'Australia' 'Mexico' 'Brazil' 'Colombia' 'France' 'Nepal' 'Canada'
 'Cambodia' 'Sri Lanka' 'Ivory Coast' 'Germany' 'Finland'
 'United Arab Emirates' 'India' 'Italy' 'United Kingdom' 'Russia' 'Sweden'
 'Spain' 'Belgium' 'Cruise Ships' 'Egypt' 'Iran' 'Israel' 'Lebanon' 'Iraq'
 'Oman' 'Afghanistan' 'Bahrain' 'Kuwait' 'Austria' 'Algeria' 'Croatia'
 'Switzerland' 'Pakistan' 'Georgia' 'Greece' 'North Macedonia' 'Norway'
 'Romania' 'Denmark' 'Estonia' 'Netherlands' 'San Marino' 'Azerbaijan'
 'Belarus' 'Iceland' 'Lithuania' 'New Zealand' 'Nigeria' 'North Ireland'
 'Ireland' 'Luxembourg' 'Monaco' 'Qatar' 'Ecuador' 'Czech Republic'
 'Armenia' 'Dominican Republic' 'Indonesia' 'Portugal' 'Andorra' 'Latvia'
 'Morocco' 'Saudi Arabia' 'Senegal' 'Argentina' 'Chile' 'Jordan' 'Ukraine'
 'Saint Barthelemy' 'Hungary' 'Faroe Islands' 'Gibraltar' 'Liechtenstein'
 '
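
To see why escaping matters, consider the entry <code>Taiwan*</code>. The sketch below (on a hypothetical two-element series) compares an unescaped pattern with an escaped one; in a regular expression, the asterisk means "zero or more of the preceding character," so the unescaped pattern never matches the literal asterisk.

```python
import re

import pandas as pd

# Hypothetical sample containing names with regex metacharacters.
names = pd.Series(["Taiwan*", "Congo (Kinshasa)"])

# Unescaped: "n*" is read as "zero or more n", so only "Taiwan" is matched
# and the literal asterisk is left behind.
unescaped = names.str.replace("Taiwan*", "Taiwan", regex = True)

# Escaped: re.escape turns the asterisk into a literal character to match.
escaped = names.str.replace(re.escape("Taiwan*"), "Taiwan", regex = True)
```

The unescaped version leaves the first entry as <code>Taiwan*</code>, whereas the escaped version yields <code>Taiwan</code>.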

### ***Conversion of Data Types***

Notice that the numbers of confirmed cases, deaths, and recoveries in the dataset are stored as double-precision floating-point values. Since these are counts, they should be converted to the integer data type; to accommodate large tallies, we specify the type to be the 64-bit integer (<code>int64</code>).

Moreover, we also convert the observation dates into <code>datetime</code> objects in order to manipulate them as actual date entries (for example, during sorting or time series analysis). Note that the argument for <code>format</code> is optional; however, explicitly specifying it spares <code>pandas</code> from having to infer the format, which typically reduces the execution time and is thus advantageous for larger datasets.

To verify that the conversion is successful, we print the data types of the pertinent columns.

In [15]:
data['ObservationDate'] = pd.to_datetime(data['ObservationDate'], format = '%m/%d/%Y')
data['Confirmed'] = data['Confirmed'].astype('int64')
data['Deaths'] = data['Deaths'].astype('int64')
data['Recovered'] = data['Recovered'].astype('int64')

print("Data types:\n", data.dtypes)

Data types:
 ObservationDate    datetime64[ns]
Country/Region             object
Confirmed                   int64
Deaths                      int64
Recovered                   int64
dtype: object
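
Before casting floating-point tallies to integers, it may be worth confirming that no fractional values would be truncated. The following is a minimal sketch on a hypothetical two-row frame; the check itself is an addition for illustration and is not part of the cell above.

```python
import pandas as pd

# Hypothetical rows mirroring the layout of the master dataset.
toy = pd.DataFrame({
    "ObservationDate": ["01/22/2020", "02/27/2021"],
    "Confirmed": [14.0, 1321.0],
})

# True only if every tally is a whole number, so the cast loses nothing.
is_whole = bool((toy["Confirmed"] % 1 == 0).all())

# Parse the dates with an explicit format and cast the tallies to int64.
toy["ObservationDate"] = pd.to_datetime(toy["ObservationDate"], format = "%m/%d/%Y")
toy["Confirmed"] = toy["Confirmed"].astype("int64")
```

If <code>is_whole</code> were <code>False</code>, the offending rows would have to be inspected first, since <code>astype('int64')</code> silently discards the fractional part.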


### ***One Final Check***

As a final sanity check, we print the cleaned dataset. Notice the change in the format of the entries in the <code>ObservationDate</code> column as a result of the data type conversion.

In [16]:
data

Unnamed: 0,ObservationDate,Country/Region,Confirmed,Deaths,Recovered
0,2020-01-22,Mainland China,1,0,0
1,2020-01-22,Mainland China,14,0,0
2,2020-01-22,Mainland China,6,0,0
3,2020-01-22,Mainland China,1,0,0
4,2020-01-22,Mainland China,0,0,0
...,...,...,...,...,...
236012,2021-02-27,Ukraine,69504,1132,65049
236013,2021-02-27,Netherlands,16480,178,0
236014,2021-02-27,Mainland China,1321,1,1314
236015,2021-02-27,Ukraine,50582,834,44309


### 2. <a name = "Confirmed2" style = "color:black;">Time Series of Confirmed Cases</a>

Looking at the shape of the dataset, we find that the daily cumulative tally of confirmed cases related to COVID-19 has 274 rows and 407 columns corresponding to the following details:
- Province/state
- Country/region
- Latitude of the province/state (or the country/region, if the province/state is not specified)
- Longitude of the province/state (or the country/region, if the province/state is not specified)
- (Cumulative) number of confirmed cases per observation date (remaining columns)

We also print the number of null values per column and the data types in order to identify areas that need to be addressed during data preparation.

In [17]:
print("Size/Shape: ", confirmed.shape, "\n") 
print("Checking for null values:\n", confirmed.isnull().sum(), "\n")
print("Data types:\n", confirmed.dtypes)

Size/Shape:  (274, 407) 

Checking for null values:
 Province/State    189
Country/Region      0
Lat                 1
Long                1
1/22/20             0
                 ... 
2/23/21             0
2/24/21             0
2/25/21             0
2/26/21             0
2/27/21             0
Length: 407, dtype: int64 

Data types:
 Province/State     object
Country/Region     object
Lat               float64
Long              float64
1/22/20             int64
                   ...   
2/23/21             int64
2/24/21             int64
2/25/21             int64
2/26/21             int64
2/27/21             int64
Length: 407, dtype: object


### ***Removal of Duplicates***

We discard duplicates since they are most likely instances of misencoding.

In [18]:
confirmed = confirmed.drop_duplicates()

print("AFTER DROPPING DUPLICATES\n")
print("Size/Shape: ", confirmed.shape, "\n") 
print("Checking for null values:\n", confirmed.isnull().sum(), "\n")
print("Data types:\n", confirmed.dtypes)

AFTER DROPPING DUPLICATES

Size/Shape:  (274, 407) 

Checking for null values:
 Province/State    189
Country/Region      0
Lat                 1
Long                1
1/22/20             0
                 ... 
2/23/21             0
2/24/21             0
2/25/21             0
2/26/21             0
2/27/21             0
Length: 407, dtype: int64 

Data types:
 Province/State     object
Country/Region     object
Lat               float64
Long              float64
1/22/20             int64
                   ...   
2/23/21             int64
2/24/21             int64
2/25/21             int64
2/26/21             int64
2/27/21             int64
Length: 407, dtype: object


Notice that the shape of the dataset did not change, indicating that the dataset does not contain any duplicates.

### ***Removal of Unneeded Columns & Handling of Null Values***

We decide to drop the column <code>Province/State</code>, which is the source of most of the null values. There are two primary reasons supporting this manner of handling the null values:
- It is impossible to supply the missing data since they refer to exact geographic locations, thus barring any form of interpolation.
- The majority (68.98%) of the entries do not have a specified province/state.

<i>Note that the null values related to the coordinates will be handled separately.</i>

In [19]:
confirmed.drop(columns = ['Province/State'], inplace = True)

### ***Standardization of Country/Region Names***

Using the lists that we constructed at the start of our data preparation, we standardize the names of the countries/regions and replace the names of cruise ships with a catch-all term. Since some of the names contain characters that have a special meaning in regular expressions (like periods, asterisks, and parentheses), we pass each name through <code>re.escape</code> so that <code>str.replace</code> matches it literally.

To verify that the names have been standardized, we print the (updated) unique countries/regions.

In [20]:
# regex = True is passed explicitly since newer versions of pandas default to literal matching
for name in standardized_names:
    confirmed['Country/Region'] = confirmed['Country/Region'].str.replace(re.escape(name[0]), name[1], regex = True)
    
for ship in cruise_ships:
    confirmed['Country/Region'] = confirmed['Country/Region'].str.replace(re.escape(ship[0]), ship[1], regex = True)

print(confirmed['Country/Region'].unique())

['Afghanistan' 'Albania' 'Algeria' 'Andorra' 'Angola'
 'Antigua and Barbuda' 'Argentina' 'Armenia' 'Australia' 'Austria'
 'Azerbaijan' 'Bahamas' 'Bahrain' 'Bangladesh' 'Barbados' 'Belarus'
 'Belgium' 'Belize' 'Benin' 'Bhutan' 'Bolivia' 'Bosnia and Herzegovina'
 'Botswana' 'Brazil' 'Brunei' 'Bulgaria' 'Burkina Faso' 'Burma' 'Burundi'
 'Cape Verde' 'Cambodia' 'Cameroon' 'Canada' 'Central African Republic'
 'Chad' 'Chile' 'China' 'Colombia' 'Comoros'
 'Democratic Republic of the Congo' 'Republic of the Congo' 'Costa Rica'
 'Ivory Coast' 'Croatia' 'Cuba' 'Cyprus' 'Czech Republic' 'Denmark'
 'Cruise Ships' 'Djibouti' 'Dominica' 'Dominican Republic' 'Ecuador'
 'Egypt' 'El Salvador' 'Equatorial Guinea' 'Eritrea' 'Estonia' 'Eswatini'
 'Ethiopia' 'Fiji' 'Finland' 'France' 'Gabon' 'Gambia' 'Georgia' 'Germany'
 'Ghana' 'Greece' 'Grenada' 'Guatemala' 'Guinea' 'Guinea-Bissau' 'Guyana'
 'Haiti' 'Vatican' 'Honduras' 'Hungary' 'Iceland' 'India' 'Indonesia'
 'Iran' 'Iraq' 'Ireland' 'Israel' 'Italy' 'Ja

### ***Handling of Null Values Related to Coordinates***

The entry at index 52 of the dataset, which pertains to Canada, has null values for its latitude and longitude; it is the only entry with missing coordinates. Since the province/state is given as Diamond Princess (a cruise ship), it is impossible to pinpoint its exact coordinates. Therefore, as a placeholder, we use the coordinates of the centroid of Canada: 56.130366 (latitude) and -106.346771 (longitude).

The coordinates were taken from the following dataset maintained by Google: https://developers.google.com/public-data/docs/canonical/countries_csv, which will also be used later in data visualization.

In [21]:
confirmed['Lat'] = confirmed['Lat'].replace(np.nan, 56.130366)
confirmed['Long'] = confirmed['Long'].replace(np.nan, -106.346771)

To verify that the null values related to this entry's coordinates are handled properly, we print it.

In [22]:
confirmed.loc[[52]]

Unnamed: 0,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,2/18/21,2/19/21,2/20/21,2/21/21,2/22/21,2/23/21,2/24/21,2/25/21,2/26/21,2/27/21
52,Canada,56.130366,-106.346771,0,0,0,0,0,0,0,...,13,13,13,13,13,13,13,13,13,13
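
Replacing every null value in a column works here because only one entry lacks coordinates. Should more entries with missing coordinates appear in a later version of the dataset, a more targeted assignment may be safer. The sketch below, on a hypothetical three-row frame with illustrative coordinates, fills only the rows that both lack a latitude and belong to Canada.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the coordinates of the other rows are illustrative.
toy = pd.DataFrame({
    "Country/Region": ["Brunei", "Canada", "Chile"],
    "Lat": [4.5353, np.nan, -35.6751],
    "Long": [114.7277, np.nan, -71.5430],
})

# Select only the Canadian rows whose latitude is missing.
missing_canada = toy["Lat"].isnull() & (toy["Country/Region"] == "Canada")

# Assign the centroid of Canada to both coordinate columns of those rows.
toy.loc[missing_canada, ["Lat", "Long"]] = [56.130366, -106.346771]
```

With this approach, a missing coordinate for any other country would remain null and could be handled with its own centroid.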


### ***Conversion of Data Types***

Since all the data types are correct (the tallies of confirmed cases are integers), there is no need for any data type conversion.

### ***One Final Check***

As a final sanity check, we print the cleaned dataset.

In [23]:
confirmed

Unnamed: 0,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,2/18/21,2/19/21,2/20/21,2/21/21,2/22/21,2/23/21,2/24/21,2/25/21,2/26/21,2/27/21
0,Afghanistan,33.939110,67.709953,0,0,0,0,0,0,0,...,55557,55575,55580,55604,55617,55646,55664,55680,55696,55707
1,Albania,41.153300,20.168300,0,0,0,0,0,0,0,...,96838,97909,99062,100246,101285,102306,103327,104313,105229,106215
2,Algeria,28.033900,1.659600,0,0,0,0,0,0,0,...,111418,111600,111764,111917,112094,112279,112461,112622,112805,112960
3,Andorra,42.506300,1.521800,0,0,0,0,0,0,0,...,10610,10645,10672,10699,10712,10739,10775,10799,10822,10849
4,Angola,-11.202700,17.873900,0,0,0,0,0,0,0,...,20452,20478,20499,20519,20548,20584,20640,20695,20759,20782
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
269,Vietnam,14.058324,108.277199,0,2,2,2,2,2,2,...,2347,2362,2368,2383,2392,2403,2412,2421,2426,2432
270,West Bank and Gaza,31.952200,35.233200,0,0,0,0,0,0,0,...,171154,171717,172315,173635,174969,176377,177768,179293,180848,181909
271,Yemen,15.552727,48.516388,0,0,0,0,0,0,0,...,2154,2157,2157,2165,2176,2187,2221,2255,2267,2269
272,Zambia,-13.133897,27.849332,0,0,0,0,0,0,0,...,72467,73203,73894,74503,75027,75582,76484,77171,77639,78202


### 3. <a name = "Recov2" style = "color:black;">Time Series of Recoveries</a>

Looking at the shape of the dataset, we find that the daily cumulative tally of recoveries related to COVID-19 has 259 rows and 407 columns corresponding to the following details:
- Province/state
- Country/region
- Latitude of the province/state (or the country/region, if the province/state is not specified)
- Longitude of the province/state (or the country/region, if the province/state is not specified)
- (Cumulative) number of recoveries per observation date (remaining columns)

We also print the number of null values per column and the data types in order to identify areas that need to be addressed during data preparation.

In [24]:
print("Size/Shape: ", recovered.shape, "\n") 
print("Checking for null values:\n", recovered.isnull().sum(), "\n")
print("Data types:\n", recovered.dtypes)

Size/Shape:  (259, 407) 

Checking for null values:
 Province/State    190
Country/Region      0
Lat                 0
Long                0
1/22/20             0
                 ... 
2/23/21             0
2/24/21             0
2/25/21             0
2/26/21             0
2/27/21             0
Length: 407, dtype: int64 

Data types:
 Province/State     object
Country/Region     object
Lat               float64
Long              float64
1/22/20             int64
                   ...   
2/23/21             int64
2/24/21             int64
2/25/21             int64
2/26/21             int64
2/27/21             int64
Length: 407, dtype: object


### ***Removal of Duplicates***

We discard duplicates since they are most likely instances of misencoding.

In [25]:
recovered = recovered.drop_duplicates()

print("AFTER DROPPING DUPLICATES\n")
print("Size/Shape: ", recovered.shape, "\n") 
print("Checking for null values:\n", recovered.isnull().sum(), "\n")
print("Data types:\n", recovered.dtypes)

AFTER DROPPING DUPLICATES

Size/Shape:  (259, 407) 

Checking for null values:
 Province/State    190
Country/Region      0
Lat                 0
Long                0
1/22/20             0
                 ... 
2/23/21             0
2/24/21             0
2/25/21             0
2/26/21             0
2/27/21             0
Length: 407, dtype: int64 

Data types:
 Province/State     object
Country/Region     object
Lat               float64
Long              float64
1/22/20             int64
                   ...   
2/23/21             int64
2/24/21             int64
2/25/21             int64
2/26/21             int64
2/27/21             int64
Length: 407, dtype: object


Notice that the shape of the dataset did not change, which confirms that the dataset does not contain any duplicates.
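
The same conclusion can be reached without comparing shapes by counting the duplicated rows directly. The following is a minimal sketch on a toy table; the frame <code>toy</code> and its values are made up for illustration and are not part of the actual dataset.

```python
import pandas as pd

# Toy frame standing in for the recoveries table (values are made up).
toy = pd.DataFrame({
    "Country/Region": ["Albania", "Albania", "Algeria"],
    "Lat": [41.1533, 41.1533, 28.0339],
    "1/22/20": [0, 0, 0],
})

# duplicated() flags every row that exactly repeats an earlier row.
n_duplicates = int(toy.duplicated().sum())
print("Exact duplicate rows:", n_duplicates)

# Dropping duplicates removes exactly the flagged rows.
deduplicated = toy.drop_duplicates()
print("Shape before:", toy.shape, "after:", deduplicated.shape)
```

A count of zero on the real dataset would correspond to the unchanged shape observed above.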

### ***Removal of Unneeded Columns & Handling of Null Values***

We decide to drop the column <code>Province/State</code>, which accounts for all the null values in this dataset. There are two primary reasons supporting this manner of handling the null values:
- It is impossible to supply the missing data since they refer to exact geographic locations, thus barring any form of interpolation.
- Close to three-fourths (73.36%) of the entries do not have a specified province/state.

<i>Note that, unlike the previous dataset, there are no null values related to coordinates.</i>
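
As a rough sketch of how the quoted percentage can be obtained, the share of missing entries in a column is the mean of its null mask. The toy frame below is hypothetical; the last two lines simply redo the arithmetic with the counts printed earlier (190 null values out of 259 rows).

```python
import pandas as pd

# Toy frame with a partially missing Province/State column (made-up values).
toy = pd.DataFrame({
    "Province/State": [None, "Hubei", None, None],
    "Country/Region": ["Albania", "China", "Algeria", "Andorra"],
})

# isnull() gives a boolean mask; its mean is the fraction of missing entries.
missing_share = toy["Province/State"].isnull().mean() * 100
print(f"{missing_share:.2f}% of the toy rows lack a province/state")

# Same arithmetic with the counts reported for the recoveries dataset.
reported_share = round(190 / 259 * 100, 2)
print(reported_share)
```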

In [26]:
# columns= already identifies the axis, so axis = 1 is not needed
recovered = recovered.drop(columns = ['Province/State'])

### ***Standardization of Country/Region Names***

Using the list that we constructed at the start of our data preparation, we standardize the names of the countries/regions and replace the names of cruise ships with a catch-all term. The replacement is done with regular expressions. Since some of the original names contain stray characters that are special in regular expressions (like asterisks and parentheses), we pass each name through <code>re.escape</code> before giving it to <code>str.replace</code>, so that it is matched literally.

To verify that the names have been standardized, we print the (updated) unique countries/regions.

In [27]:
# regex = True is stated explicitly since newer pandas versions default to literal matching
for name in standardized_names:
    recovered['Country/Region'] = recovered['Country/Region'].str.replace(re.escape(name[0]), name[1], regex = True)

for ship in cruise_ships:
    recovered['Country/Region'] = recovered['Country/Region'].str.replace(re.escape(ship[0]), ship[1], regex = True)

print(recovered['Country/Region'].unique())

['Afghanistan' 'Albania' 'Algeria' 'Andorra' 'Angola'
 'Antigua and Barbuda' 'Argentina' 'Armenia' 'Australia' 'Austria'
 'Azerbaijan' 'Bahamas' 'Bahrain' 'Bangladesh' 'Barbados' 'Belarus'
 'Belgium' 'Belize' 'Benin' 'Bhutan' 'Bolivia' 'Bosnia and Herzegovina'
 'Botswana' 'Brazil' 'Brunei' 'Bulgaria' 'Burkina Faso' 'Burma' 'Burundi'
 'Cape Verde' 'Cambodia' 'Cameroon' 'Canada' 'Central African Republic'
 'Chad' 'Chile' 'China' 'Colombia' 'Comoros'
 'Democratic Republic of the Congo' 'Republic of the Congo' 'Costa Rica'
 'Ivory Coast' 'Croatia' 'Cuba' 'Cyprus' 'Czech Republic' 'Denmark'
 'Cruise Ships' 'Djibouti' 'Dominica' 'Dominican Republic' 'Ecuador'
 'Egypt' 'El Salvador' 'Equatorial Guinea' 'Eritrea' 'Estonia' 'Eswatini'
 'Ethiopia' 'Fiji' 'Finland' 'France' 'Gabon' 'Gambia' 'Georgia' 'Germany'
 'Ghana' 'Greece' 'Grenada' 'Guatemala' 'Guinea' 'Guinea-Bissau' 'Guyana'
 'Haiti' 'Vatican' 'Honduras' 'Hungary' 'Iceland' 'India' 'Indonesia'
 'Iran' 'Iraq' 'Ireland' 'Israel' 'Italy' 'Ja
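
To illustrate why <code>re.escape</code> matters in this step, here is a minimal sketch on a toy series. The names in <code>toy_names</code> and the pairs in <code>toy_mapping</code> are hypothetical stand-ins and are not the actual contents of <code>standardized_names</code>.

```python
import re
import pandas as pd

# Hypothetical raw names and replacements, for illustration only.
toy_names = pd.Series(["Taiwan*", "Congo (Kinshasa)", "Italy"])
toy_mapping = [
    ("Taiwan*", "Taiwan"),
    ("Congo (Kinshasa)", "Democratic Republic of the Congo"),
]

cleaned = toy_names.copy()
for raw, standard in toy_mapping:
    # re.escape makes "*", "(" and ")" literal, so the pattern matches the
    # raw name itself instead of being interpreted as regex syntax.
    cleaned = cleaned.str.replace(re.escape(raw), standard, regex=True)

print(cleaned.tolist())
```

Without the escaping, a pattern such as <code>Congo (Kinshasa)</code> would treat the parentheses as a capture group and would fail to match the literal name.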

### ***Conversion of Data Types***

Since all the data types are correct (the tallies of recoveries are integers), there is no need for any data type conversion.
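
This observation could also be checked programmatically rather than by reading the printed data types. The sketch below assumes that every column other than the identifying ones holds a tally; the toy frame and the list <code>id_columns</code> are illustrative assumptions.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Toy frame mirroring the layout of the cleaned table (made-up tallies).
toy = pd.DataFrame({
    "Country/Region": ["Albania", "Algeria"],
    "Lat": [41.1533, 28.0339],
    "Long": [20.1683, 1.6596],
    "1/22/20": [0, 0],
    "1/23/20": [0, 1],
})

# Everything that is not an identifying column is treated as a date column.
id_columns = ["Country/Region", "Lat", "Long"]
date_columns = [col for col in toy.columns if col not in id_columns]

# True only if every date column is stored with an integer dtype.
all_integer = all(is_integer_dtype(toy[col]) for col in date_columns)
print("All tallies stored as integers:", all_integer)
```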

### ***One Final Check***

As a final sanity check, we print the cleaned dataset.

In [28]:
recovered

Unnamed: 0,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,2/18/21,2/19/21,2/20/21,2/21/21,2/22/21,2/23/21,2/24/21,2/25/21,2/26/21,2/27/21
0,Afghanistan,33.939110,67.709953,0,0,0,0,0,0,0,...,48798,48803,48820,48834,48895,48967,49086,49281,49285,49288
1,Albania,41.153300,20.168300,0,0,0,0,0,0,0,...,60675,61605,62533,63329,64318,65403,66309,67158,68007,68969
2,Algeria,28.033900,1.659600,0,0,0,0,0,0,0,...,76640,76797,76940,77076,77225,77382,77537,77683,77842,77976
3,Andorra,42.506300,1.521800,0,0,0,0,0,0,0,...,10101,10146,10170,10206,10245,10285,10319,10356,10394,10429
4,Angola,-11.202700,17.873900,0,0,0,0,0,0,0,...,18972,18991,19005,19013,19190,19207,19221,19238,19307,19315
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
254,Vietnam,14.058324,108.277199,0,0,0,0,0,0,0,...,1605,1627,1627,1702,1717,1760,1804,1804,1839,1844
255,West Bank and Gaza,31.952200,35.233200,0,0,0,0,0,0,0,...,159369,160172,160763,161410,162025,162757,163795,164557,165205,166119
256,Yemen,15.552727,48.516388,0,0,0,0,0,0,0,...,1432,1432,1432,1432,1432,1432,1433,1434,1434,1435
257,Zambia,-13.133897,27.849332,0,0,0,0,0,0,0,...,65051,66013,66943,67944,68928,69436,69803,70800,72635,73609


### 4. <a name = "Deaths2" style = "color:black;">Time Series of Deaths</a>

Looking at the shape of the dataset, we find that the daily cumulative tally of deaths related to COVID-19 has 274 rows and 407 columns corresponding to the following details:
- Province/state
- Country/region
- Latitude of the province/state (or the country/region, if the province/state is not specified)
- Longitude of the province/state (or the country/region, if the province/state is not specified)
- (Cumulative) number of deaths per observation date (remaining columns)

We also print the number of null values per column and the data types in order to identify areas that need to be addressed during data preparation.

In [29]:
print("Size/Shape: ", deaths.shape, "\n") 
print("Checking for null values:\n", deaths.isnull().sum(), "\n")
print("Data types:\n", deaths.dtypes)

Size/Shape:  (274, 407) 

Checking for null values:
 Province/State    189
Country/Region      0
Lat                 1
Long                1
1/22/20             0
                 ... 
2/23/21             0
2/24/21             0
2/25/21             0
2/26/21             0
2/27/21             0
Length: 407, dtype: int64 

Data types:
 Province/State     object
Country/Region     object
Lat               float64
Long              float64
1/22/20             int64
                   ...   
2/23/21             int64
2/24/21             int64
2/25/21             int64
2/26/21             int64
2/27/21             int64
Length: 407, dtype: object
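
Since the same inspection is repeated for every time series, it could be wrapped in a small helper. The function <code>summarize</code> below is a hypothetical convenience and is not defined anywhere in this notebook; it is shown on a toy frame with made-up values.

```python
import pandas as pd

def summarize(frame: pd.DataFrame, label: str) -> dict:
    """Print and return the shape, null counts, and data types of a frame."""
    summary = {
        "shape": frame.shape,
        "nulls": frame.isnull().sum(),
        "dtypes": frame.dtypes,
    }
    print(f"{label}\n")
    print("Size/Shape: ", summary["shape"], "\n")
    print("Checking for null values:\n", summary["nulls"], "\n")
    print("Data types:\n", summary["dtypes"])
    return summary

# Toy frame (made-up values) standing in for one of the time series.
toy = pd.DataFrame({
    "Province/State": [None, "Hubei"],
    "Country/Region": ["Albania", "China"],
    "1/22/20": [0, 5],
})
result = summarize(toy, "TOY DATASET")
```

Returning the summary as a dictionary also makes it possible to compare the values before and after a cleaning step.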


### ***Removal of Duplicates***

We discard duplicates since they are most likely instances of misencoding.

In [30]:
deaths = deaths.drop_duplicates()

print("AFTER DROPPING DUPLICATES\n")
print("Size/Shape: ", deaths.shape, "\n") 
print("Checking for null values:\n", deaths.isnull().sum(), "\n")
print("Data types:\n", deaths.dtypes)

AFTER DROPPING DUPLICATES

Size/Shape:  (274, 407) 

Checking for null values:
 Province/State    189
Country/Region      0
Lat                 1
Long                1
1/22/20             0
                 ... 
2/23/21             0
2/24/21             0
2/25/21             0
2/26/21             0
2/27/21             0
Length: 407, dtype: int64 

Data types:
 Province/State     object
Country/Region     object
Lat               float64
Long              float64
1/22/20             int64
                   ...   
2/23/21             int64
2/24/21             int64
2/25/21             int64
2/26/21             int64
2/27/21             int64
Length: 407, dtype: object


Notice that the shape of the dataset did not change, which confirms that the dataset does not contain any duplicates.

### ***Removal of Unneeded Columns & Handling of Null Values***

We decide to drop the column <code>Province/State</code>, which accounts for most of the null values. There are two primary reasons supporting this manner of handling the null values:
- It is impossible to supply the missing data since they refer to exact geographic locations, thus barring any form of interpolation.
- The majority (68.98%) of the entries do not have a specified province/state.

<i>Note that the null values related to the coordinates will be handled separately.</i>

In [31]:
# columns= already identifies the axis, so axis = 1 is not needed
deaths = deaths.drop(columns = ['Province/State'])

### ***Standardization of Country/Region Names***

Using the list that we constructed at the start of our data preparation, we standardize the names of the countries/regions and replace the names of cruise ships with a catch-all term. The replacement is done with regular expressions. Since some of the original names contain stray characters that are special in regular expressions (like asterisks and parentheses), we pass each name through <code>re.escape</code> before giving it to <code>str.replace</code>, so that it is matched literally.

To verify that the names have been standardized, we print the (updated) unique countries/regions.

In [32]:
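# Apply the standardization to the deaths dataset
# regex = True is stated explicitly since newer pandas versions default to literal matching
for name in standardized_names:
    deaths['Country/Region'] = deaths['Country/Region'].str.replace(re.escape(name[0]), name[1], regex = True)

for ship in cruise_ships:
    deaths['Country/Region'] = deaths['Country/Region'].str.replace(re.escape(ship[0]), ship[1], regex = True)

print(deaths['Country/Region'].unique())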

['Afghanistan' 'Albania' 'Algeria' 'Andorra' 'Angola'
 'Antigua and Barbuda' 'Argentina' 'Armenia' 'Australia' 'Austria'
 'Azerbaijan' 'Bahamas' 'Bahrain' 'Bangladesh' 'Barbados' 'Belarus'
 'Belgium' 'Belize' 'Benin' 'Bhutan' 'Bolivia' 'Bosnia and Herzegovina'
 'Botswana' 'Brazil' 'Brunei' 'Bulgaria' 'Burkina Faso' 'Burma' 'Burundi'
 'Cape Verde' 'Cambodia' 'Cameroon' 'Canada' 'Central African Republic'
 'Chad' 'Chile' 'China' 'Colombia' 'Comoros'
 'Democratic Republic of the Congo' 'Republic of the Congo' 'Costa Rica'
 'Ivory Coast' 'Croatia' 'Cuba' 'Cyprus' 'Czech Republic' 'Denmark'
 'Cruise Ships' 'Djibouti' 'Dominica' 'Dominican Republic' 'Ecuador'
 'Egypt' 'El Salvador' 'Equatorial Guinea' 'Eritrea' 'Estonia' 'Eswatini'
 'Ethiopia' 'Fiji' 'Finland' 'France' 'Gabon' 'Gambia' 'Georgia' 'Germany'
 'Ghana' 'Greece' 'Grenada' 'Guatemala' 'Guinea' 'Guinea-Bissau' 'Guyana'
 'Haiti' 'Vatican' 'Honduras' 'Hungary' 'Iceland' 'India' 'Indonesia'
 'Iran' 'Iraq' 'Ireland' 'Israel' 'Italy' 'Ja

### ***Handling of Null Values Related to Coordinates***

The entry at index 52 of the dataset, which belongs to Canada, has null values for its latitude and longitude. Since its province/state was given as Diamond Princess (a cruise ship), it is impossible to pinpoint its exact coordinates. Therefore, as a fill-in, we use the coordinates of the centroid of Canada: 56.130366 (latitude) and -106.346771 (longitude).

The coordinates were taken from the following dataset maintained by Google: https://developers.google.com/public-data/docs/canonical/countries_csv, which will also be used later in data visualization.

In [33]:
# https://developers.google.com/public-data/docs/canonical/countries_csv
deaths['Lat'] = deaths['Lat'].replace(np.nan, 56.130366)
deaths['Long'] = deaths['Long'].replace(np.nan, -106.346771)

To verify that the null values related to this entry's coordinates are handled properly, we print it.

In [34]:
deaths.loc[[52]]

Unnamed: 0,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,2/18/21,2/19/21,2/20/21,2/21/21,2/22/21,2/23/21,2/24/21,2/25/21,2/26/21,2/27/21
52,Canada,56.130366,-106.346771,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
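
The replacement above fills every missing coordinate in the column, which is safe here because only one entry is affected. A more targeted variant, sketched below on a toy frame with made-up rows, restricts the fill to Canadian rows. This is an alternative under stated assumptions and not the approach used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame: one Canadian row has missing coordinates (rows are made up).
toy = pd.DataFrame({
    "Country/Region": ["Canada", "Canada", "Yemen"],
    "Lat": [53.9333, np.nan, 15.552727],
    "Long": [-116.5765, np.nan, 48.516388],
})

# Centroid of Canada as quoted in the text.
canada_lat, canada_long = 56.130366, -106.346771

# Restrict the fill to Canadian rows with a missing latitude, so that a
# missing coordinate for any other country would not receive this centroid.
mask = (toy["Country/Region"] == "Canada") & toy["Lat"].isnull()
toy.loc[mask, ["Lat", "Long"]] = [canada_lat, canada_long]

print(toy)
```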


### ***Conversion of Data Types***

Since all the data types are correct (the tallies of deaths are integers), there is no need for any data type conversion.

### ***One Final Check***

As a final sanity check, we print the cleaned dataset.

In [35]:
deaths

Unnamed: 0,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,2/18/21,2/19/21,2/20/21,2/21/21,2/22/21,2/23/21,2/24/21,2/25/21,2/26/21,2/27/21
0,Afghanistan,33.939110,67.709953,0,0,0,0,0,0,0,...,2430,2430,2430,2432,2433,2435,2436,2438,2442,2443
1,Albania,41.153300,20.168300,0,0,0,0,0,0,0,...,1617,1636,1653,1666,1681,1696,1715,1736,1756,1775
2,Algeria,28.033900,1.659600,0,0,0,0,0,0,0,...,2950,2954,2958,2961,2964,2967,2970,2973,2977,2979
3,Andorra,42.506300,1.521800,0,0,0,0,0,0,0,...,107,107,107,107,109,110,110,110,110,110
4,Angola,-11.202700,17.873900,0,0,0,0,0,0,0,...,498,498,498,499,499,500,501,502,504,506
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
269,Vietnam,14.058324,108.277199,0,0,0,0,0,0,0,...,35,35,35,35,35,35,35,35,35,35
270,West Bank and Gaza,31.952200,35.233200,0,0,0,0,0,0,0,...,1956,1961,1971,1976,1986,1994,1999,2008,2019,2025
271,Yemen,15.552727,48.516388,0,0,0,0,0,0,0,...,618,618,618,619,619,620,624,625,627,631
272,Zambia,-13.133897,27.849332,0,0,0,0,0,0,0,...,991,1002,1016,1020,1031,1040,1051,1059,1066,1081
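
Beyond inspecting the printed table, the expectations for a cleaned dataset can be expressed as explicit checks. The helper <code>check_cleaned</code> below is a hypothetical sketch, demonstrated on toy frames with made-up values.

```python
import numpy as np
import pandas as pd

def check_cleaned(frame: pd.DataFrame) -> bool:
    """Return True if the frame meets the post-cleaning expectations."""
    no_province_column = "Province/State" not in frame.columns
    no_nulls = not frame.isnull().any().any()
    return no_province_column and no_nulls

# A toy frame that satisfies both expectations.
toy_cleaned = pd.DataFrame({
    "Country/Region": ["Albania", "Canada"],
    "Lat": [41.1533, 56.130366],
    "Long": [20.1683, -106.346771],
    "1/22/20": [0, 0],
})

# A toy frame that still has a missing coordinate.
toy_dirty = toy_cleaned.copy()
toy_dirty.loc[1, "Lat"] = np.nan

print(check_cleaned(toy_cleaned), check_cleaned(toy_dirty))
```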


# <a name = "References" style = "color:black;">REFERENCES</a>

- Adams, A., Li, W., Zhang, C., & Chen, X. (2020). The disguised pandemic: the importance of data normalization in COVID-19 web mapping. <i>Public Health, 183</i>, 36-37. doi:10.1016/j.puhe.2020.04.034
- Hallare, K. (2020, September 6). UP expert: PH’s virus curve already flattened, but no cause for excitement yet. <i>Philippine Daily Inquirer.</i> https://newsinfo.inquirer.net/1331933/up-expert-phs-virus-curve-already-flattened-but-no-cause-for-excitement-yet#ixzz6q9VP5MyI
- Nau, R. (2020). <i>Stationarity and differencing.</i> Duke University. http://people.duke.edu/~rnau/411diff.htm
- Sanderson, M., Hudson, I.L., & Osborn, M. (2020, April 7). <i>The bar necessities: 5 ways to understand coronavirus graphs.</i> The Conversation. https://theconversation.com/the-bar-necessities-5-ways-to-understand-coronavirus-graphs-135537
- World Health Organization. (2020). <i>Estimating mortality from COVID-19.</i> https://www.who.int/news-room/commentaries/detail/estimating-mortality-from-covid-19