# COVID-19, Public Safety Measures, and Public and Fiscal Health in G20 Countries

DATA 604 (L02): Working with Data at Scale

Authors: 
- Paul Croome (30198881)
- Rodrigo Rosales Alvarez (30148393)
- Ann Siddiqui (30043081)
- Josh Olea (30082147)
- Kane Smith (30179486)

Professor: Dr. Katie Ovens

University of Calgary

December 13th, 2022

# Table of Contents
1. [Introduction](#introduction)
2. [Guiding Questions](#guiding-questions)
3. [Individual Datasets](#ind-data)
4. [Packages](#packages)
5. [Data Load](#data-load)
6. [Data Cleaning](#data-cleaning)
    1. [Check NAs](#check-na)
    2. [Check Duplicates](#check-dup)
    3. [Convert Times to DateTime](#date-time)
    4. [Data Info](#data-info)
7. [Create Columns](#create-columns) 
8. [Analysis](#analysis)
    1. [Question 1: How did COVID-19 impact the performance of the financial markets in G20 countries?](#q-1)
    2. [Question 2: Did government and societal healthcare actions influence the prevalence and severity of COVID-19 in G20 countries?](#q-2)
    3. [Question 3: Is the relationship, if any, between financial market performance and the prevalence and severity of COVID-19 moderated by the healthcare measures taken by a G20 country?](#q-3)
9. [Conclusion](#conclusion)
10. [References](#refrences)

### Introduction <a name="introduction"></a>
The domain of our project covers both economic and healthcare-related indicators of the wellbeing of G20 countries during the coronavirus disease 2019 (COVID-19) pandemic. In particular, we will be examining data related to the prevalence and severity of the COVID-19 pandemic, the governmental and societal measures taken to reduce the spread of the disease, and the performance of key stock market indices during the pandemic. All of these data were reported daily between January 2020 and October 2022.

This is an interesting and important topic of study because, in our increasingly interconnected world, contagious diseases can be transmitted over vast distances remarkably easily. Even small, remote outbreaks of diseases anywhere in the world can swiftly turn into a global pandemic, which can then cause devastation on personal, societal, and worldwide scales. 

### Guiding Questions <a name="guiding-questions"></a>
1. How did COVID-19 impact the performance of the financial markets in G20 countries? 
    - What is the relationship between the prevalence and severity of COVID-19 (i.e., new COVID-19 cases, hospitalizations, ICU admissions, and deaths) in a country and the performance of that country’s strongest stock exchange index?
<br> <br>
2. Did government and societal healthcare actions influence the prevalence and severity of COVID-19 in G20 countries? 
    - Is there a correlation between vaccination, booster-shot, and policy response rates and the prevalence and severity of COVID-19 in the G20 countries?
<br> <br>  
3. Is the relationship, if any, between financial market performance and the prevalence and severity of COVID-19 moderated by the healthcare measures taken by a G20 country?

### Individual Datasets <a name="ind-data"></a>
The first dataset we will use consists of diverse information related to the COVID-19 pandemic, including a country’s daily rates of COVID-19 diagnoses, hospitalizations, deaths, vaccinations, and booster shots. We will use features of these data to determine the prevalence and severity of the COVID-19 pandemic for each of the G20 countries on each day between January 1, 2020 and October 26, 2022. We chose to examine the G20 countries for this project because we believe this subset of countries will have reliably reported data. In addition, by focusing our investigative scope, we will be able to more deeply explore the data. This dataset is contained in a CSV file and is licensed for open access under the Creative Commons BY license.

Because the COVID-19 data come in a CSV file, we created an empty table in MySQL with the same structure as the CSV file. The table is named *l02-3.covid_data*.
~~~mysql
CREATE TABLE IF NOT EXISTS `l02-3`.covid_data (
  iso_code VARCHAR(255) NOT NULL,
  continent VARCHAR(255) NOT NULL,
  location VARCHAR(255) NOT NULL,
  `date` DATETIME NOT NULL,
  `total_cases` DOUBLE DEFAULT NULL,
  `new_cases` DOUBLE DEFAULT NULL,
  `new_cases_smoothed` DOUBLE DEFAULT NULL,
  `total_deaths` DOUBLE DEFAULT NULL,
  `new_deaths` DOUBLE DEFAULT NULL,
  `new_deaths_smoothed` DOUBLE DEFAULT NULL,
  `total_cases_per_million` DOUBLE DEFAULT NULL,
  `new_cases_per_million` DOUBLE DEFAULT NULL,
  `new_cases_smoothed_per_million` DOUBLE DEFAULT NULL,
  `total_deaths_per_million` DOUBLE DEFAULT NULL,
  `new_deaths_per_million` DOUBLE DEFAULT NULL,
  `new_deaths_smoothed_per_million` DOUBLE DEFAULT NULL,
  `reproduction_rate` DOUBLE DEFAULT NULL,
  `icu_patients` DOUBLE DEFAULT NULL,
  `icu_patients_per_million` DOUBLE DEFAULT NULL,
  `hosp_patients` DOUBLE DEFAULT NULL,
  `hosp_patients_per_million` DOUBLE DEFAULT NULL,
  `weekly_icu_admissions` DOUBLE DEFAULT NULL,
  `weekly_icu_admissions_per_million` DOUBLE DEFAULT NULL,
  `weekly_hosp_admissions` DOUBLE DEFAULT NULL,
  `weekly_hosp_admissions_per_million` DOUBLE DEFAULT NULL,
  `total_tests` DOUBLE DEFAULT NULL,
  `new_tests` DOUBLE DEFAULT NULL,
  `total_tests_per_thousand` DOUBLE DEFAULT NULL,
  `new_tests_per_thousand` DOUBLE DEFAULT NULL,
  `new_tests_smoothed` DOUBLE DEFAULT NULL,
  `new_tests_smoothed_per_thousand` DOUBLE DEFAULT NULL,
  `positive_rate` DOUBLE DEFAULT NULL,
  `tests_per_case` DOUBLE DEFAULT NULL,
  `tests_units` VARCHAR(255) DEFAULT NULL, -- text label (e.g. the unit of testing), not a number
  `total_vaccinations` DOUBLE DEFAULT NULL,
  `people_vaccinated` DOUBLE DEFAULT NULL,
  `people_fully_vaccinated` DOUBLE DEFAULT NULL,
  `total_boosters` DOUBLE DEFAULT NULL,
  `new_vaccinations` DOUBLE DEFAULT NULL,
  `new_vaccinations_smoothed` DOUBLE DEFAULT NULL,
  `total_vaccinations_per_hundred` DOUBLE DEFAULT NULL,
  `people_vaccinated_per_hundred` DOUBLE DEFAULT NULL,
  `people_fully_vaccinated_per_hundred` DOUBLE DEFAULT NULL,
  `total_boosters_per_hundred` DOUBLE DEFAULT NULL,
  `new_vaccinations_smoothed_per_million` DOUBLE DEFAULT NULL,
  `new_people_vaccinated_smoothed` DOUBLE DEFAULT NULL,
  `new_people_vaccinated_smoothed_per_hundred` DOUBLE DEFAULT NULL,
  `stringency_index` DOUBLE DEFAULT NULL,
  `population_density` DOUBLE DEFAULT NULL,
  `median_age` DOUBLE DEFAULT NULL,
  `aged_65_older` DOUBLE DEFAULT NULL,
  `aged_70_older` DOUBLE DEFAULT NULL,
  `gdp_per_capita` DOUBLE DEFAULT NULL,
  `extreme_poverty` DOUBLE DEFAULT NULL,
  `cardiovasc_death_rate` DOUBLE DEFAULT NULL,
  `diabetes_prevalence` DOUBLE DEFAULT NULL,
  `female_smokers` DOUBLE DEFAULT NULL,
  `male_smokers` DOUBLE DEFAULT NULL,
  `handwashing_facilities` DOUBLE DEFAULT NULL,
  `hospital_beds_per_thousand` DOUBLE DEFAULT NULL,
  `life_expectancy` DOUBLE DEFAULT NULL,
  `human_development_index` DOUBLE DEFAULT NULL,
  `population` DOUBLE DEFAULT NULL,
  `excess_mortality_cumulative_absolute` DOUBLE DEFAULT NULL,
  `excess_mortality_cumulative` DOUBLE DEFAULT NULL,
  `excess_mortality` DOUBLE DEFAULT NULL,
  `excess_mortality_cumulative_per_million` DOUBLE DEFAULT NULL
  ) ENGINE=InnoDB;
~~~

After creating the empty table, we populated it from the CSV file with the following code:

~~~mysql
load data local infile "path_to_the_file/owid-covid-data.csv"
into table `l02-3`.`covid_data`
fields terminated by ',' optionally enclosed by '"'
lines terminated by '\n'
ignore 1 rows;
~~~

Lastly, we created a new table named *g20_covid* with the same columns as the first table and populated it using a **WHERE** clause that selects only the countries we are interested in, in this case the G20 members.

~~~mysql
INSERT INTO `l02-3`.g20_covid
SELECT * FROM `l02-3`.covid_data
WHERE location IN 
    ('Argentina', 'Australia', 'Brazil', 'Canada', 
    'China', 'European Union', 'France', 'Germany', 
    'Indonesia', 'India', 'Italy', 'Japan', 
    'Mexico', 'Russia', 'Saudi Arabia', 'South Africa', 
    'South Korea', 'Turkey', 'United Kingdom', 'United States');
~~~
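The same selection can also be prototyped outside the database. The following is a minimal pandas sketch, not the procedure used for this project: the small DataFrame is invented illustration data, and the names `G20_LOCATIONS`, `sample`, and `g20_sample` are hypothetical. Only the `location` column and the list of locations come from the query above.

```python
import pandas as pd

# Locations listed in the WHERE ... IN clause above
G20_LOCATIONS = [
    "Argentina", "Australia", "Brazil", "Canada",
    "China", "European Union", "France", "Germany",
    "Indonesia", "India", "Italy", "Japan",
    "Mexico", "Russia", "Saudi Arabia", "South Africa",
    "South Korea", "Turkey", "United Kingdom", "United States",
]

# Invented rows standing in for covid_data (illustration only)
sample = pd.DataFrame({
    "location": ["Canada", "Norway", "Mexico", "Chile"],
    "new_cases": [10.0, 5.0, 7.0, 3.0],
})

# Series.isin plays the role of SQL's IN operator
g20_sample = sample[sample["location"].isin(G20_LOCATIONS)]
print(g20_sample)
```

Keeping the list in one place makes it easy to reuse the same set of locations for the index and currency tables.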

The second series of datasets contains the market indices for the G20 countries, stored as CSV files. These open-source datasets were obtained from, and are licensed by, multiple sources, including Yahoo Finance, Tradingeconomics.com, Investing.com, and S&P Global Inc. The key information in these datasets is the overall market performance of each index, calculated daily. We plan to use these data to quantify the strength of each G20 country’s financial market and to determine how the market performed with respect to the severity of the pandemic in each country.

We followed the same procedure as for the COVID-19 dataset to upload all 20 market indices, first creating an empty table for each country.

~~~mysql
CREATE TABLE IF NOT EXISTS `l02-3`.`index_mexico` (
  `Date` DATE,
  Adj_Close DOUBLE
);
~~~

After creating the table, we populated it with the following code:

~~~mysql
load data local infile "path_to_the_file/currency/Index_Mexico.csv"
into table `l02-3`.index_mexico
fields terminated by ',' optionally enclosed by '"'
lines terminated by '\r\n'
ignore 1 rows;
~~~

We followed the same procedure for each of the G20 countries.
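Because the two statements differ between countries only in the table and file names, they could be generated rather than typed out. The sketch below is one hedged way to do that in Python. `INDEX_FILES`, `CREATE_TEMPLATE`, `LOAD_TEMPLATE`, and `build_index_statements` are hypothetical names, the Canada file name is an assumed placeholder (only the Mexico file name appears above), and the `path_to_the_file` placeholder is kept as in the original statements.

```python
# Table-name suffix -> CSV file name.
# Only the Mexico entry comes from the text; the Canada entry is an assumption.
INDEX_FILES = {
    "mexico": "Index_Mexico.csv",
    "canada": "Index_Canada.csv",  # assumed file name, for illustration
}

CREATE_TEMPLATE = (
    "CREATE TABLE IF NOT EXISTS `l02-3`.`index_{suffix}` (\n"
    "  `Date` DATE,\n"
    "  Adj_Close DOUBLE\n"
    ");"
)

LOAD_TEMPLATE = (
    'load data local infile "path_to_the_file/currency/{file_name}"\n'
    "into table `l02-3`.index_{suffix}\n"
    "fields terminated by ',' optionally enclosed by '\"'\n"
    "lines terminated by '\\r\\n'\n"
    "ignore 1 rows;"
)


def build_index_statements(index_files):
    """Return one (create_sql, load_sql) pair per country."""
    return [
        (
            CREATE_TEMPLATE.format(suffix=suffix),
            LOAD_TEMPLATE.format(suffix=suffix, file_name=file_name),
        )
        for suffix, file_name in index_files.items()
    ]


for create_sql, load_sql in build_index_statements(INDEX_FILES):
    print(create_sql)
    print(load_sql)
    print()
```

The printed statements can be pasted into a MySQL client; the script itself does not connect to any database.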

Lastly, the third series of datasets consists of the daily exchange rates for each country’s currency, which we need in order to convert each index from its local currency into USD. This lets us analyze the performance of each country’s market index in a uniform manner. These open-source datasets were also obtained from, and are licensed by, sources such as Yahoo Finance, Tradingeconomics.com, Investing.com, and S&P Global Inc. These datasets are contained in CSV files.

Using the following code, we converted each country's index dataset into USD and then left-joined the result onto a list of days from 2017-01-01 to 2022-10-31. The left join ensures that every date appears in our table, since some dates are missing from the index and currency datasets because markets are closed on weekends and holidays. We then unioned the tables for all countries so that our final index table (g20_index_usd) is long instead of wide. This mainly makes the data easier to visualize, and it also lets subsequent queries run slightly more efficiently.

~~~mysql
CREATE TABLE g20_index_usd (
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"Argentina" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_argentina AS t1
LEFT JOIN index_argentina AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"Australia" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_australia AS t1
LEFT JOIN index_australia AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"Brazil" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_brazil AS t1
LEFT JOIN index_brazil AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"Canada" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_canada AS t1
LEFT JOIN index_canada AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"China" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_china AS t1
LEFT JOIN index_china AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"European Union" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_eu AS t1
LEFT JOIN index_eu AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"France" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_france AS t1
LEFT JOIN index_france AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"Germany" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_germany AS t1
LEFT JOIN index_germany AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION  
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"India" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_india AS t1
LEFT JOIN index_india AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"Indonesia" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_indonesia AS t1
LEFT JOIN index_indonesia AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"Italy" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_italy AS t1
LEFT JOIN index_italy AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"Japan" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_japan AS t1
LEFT JOIN index_japan AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"Mexico" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_mexico AS t1
LEFT JOIN index_mexico AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"Russia" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_russia AS t1
LEFT JOIN index_russia AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"Saudi Arabia" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_saudi_arabia AS t1
LEFT JOIN index_saudi_arabia AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"South Africa" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_south_africa AS t1
LEFT JOIN index_south_africa AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"South Korea" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_south_korea AS t1
LEFT JOIN index_south_korea AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"Turkey" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_turkey AS t1
LEFT JOIN index_turkey AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT t1.date,"United Kingdom" AS Country, t2.adj_close/case when t1.Adj_Close = 0 then null ELSE t1.Adj_Close END AS close
FROM currency_uk AS t1
LEFT JOIN index_uk AS t2
ON t1.Date = t2.date) AS j
ON d.date = j.date)
UNION 
(SELECT d.date, j.Country, j.close AS USD
FROM dates as d
LEFT JOIN
(SELECT date, "United States" AS Country, adj_close AS close
FROM index_us) AS j
ON j.date = d.date)
)
~~~
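To make the logic of the query above easier to check on a small scale, here is a minimal pandas sketch of the same three steps for two countries. All values are invented, and the helper `to_usd_long` and the variable names are hypothetical; the column names (`date`, `Country`, `USD`, `Adj_Close`) mirror the query. It assumes, as the division in the query implies, that the currency tables hold units of local currency per USD.

```python
import numpy as np
import pandas as pd


def to_usd_long(calendar, index_df, currency_df, country):
    """Convert one country's index to USD and align it to the calendar."""
    # Mirror the inner query: currency LEFT JOIN index on date
    merged = currency_df.merge(
        index_df, on="date", how="left", suffixes=("_fx", "_idx")
    )
    # A zero exchange rate becomes NaN, like the CASE ... THEN NULL guard
    rate = merged["Adj_Close_fx"].replace(0, np.nan)
    merged["USD"] = merged["Adj_Close_idx"] / rate
    merged["Country"] = country
    # Mirror the outer query: dates LEFT JOIN converted values
    return calendar.merge(
        merged[["date", "Country", "USD"]], on="date", how="left"
    )


# Invented data: three calendar days, the middle one has no quotes
calendar = pd.DataFrame(
    {"date": pd.to_datetime(["2020-01-03", "2020-01-04", "2020-01-06"])}
)
trading_days = pd.to_datetime(["2020-01-03", "2020-01-06"])
index_mx = pd.DataFrame({"date": trading_days, "Adj_Close": [4000.0, 4200.0]})
fx_mx = pd.DataFrame({"date": trading_days, "Adj_Close": [20.0, 0.0]})
index_ca = pd.DataFrame({"date": trading_days, "Adj_Close": [130.0, 260.0]})
fx_ca = pd.DataFrame({"date": trading_days, "Adj_Close": [2.0, 2.0]})

# Stack the per-country results to get the long format
g20_index_usd = pd.concat(
    [
        to_usd_long(calendar, index_mx, fx_mx, "Mexico"),
        to_usd_long(calendar, index_ca, fx_ca, "Canada"),
    ],
    ignore_index=True,
)
print(g20_index_usd)
```

One difference from the SQL version is worth noting: `pd.concat` keeps duplicate rows, whereas SQL `UNION` removes them, so calendar days without any quote appear once per country here.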


### Packages <a name="packages"></a>

~~~python
# Importing the libraries
import numpy as np # linear algebra
import pandas as pd # data processing
import matplotlib.pyplot as plt # visualizations
import seaborn as sns # visualizations
import math # use math operators
import sqlalchemy as sq # use of sql commands
import plotly.express as px # interactive graphs
import plotly.graph_objects as go # interactive graphs
from plotly.subplots import make_subplots # interactive graphs

# Control the general style of the plots
sns.set_style('whitegrid')
~~~

### Data Load <a name="data-load"></a>

~~~python
# Reading the CSV exports as DataFrames
g20_index_data = pd.read_csv("UNION.csv")
g20_covid = pd.read_csv("g20_covid.csv")
~~~

### Data Cleaning <a name="data-cleaning"></a>

~~~python
# Cleaning Dataset

# Parse the date columns and drop any time-of-day component
# (no explicit format, so both '2020-01-01' and '2020/01/01' style strings parse)
g20_covid['date'] = pd.to_datetime(g20_covid['date']).dt.normalize()
g20_index_data['date'] = pd.to_datetime(g20_index_data['date']).dt.normalize()

# Treat zero index values as missing, then replace missing values with the last seen value
# (ffill works in row order and fills every column of the DataFrame)
g20_index_data['USD'] = g20_index_data['USD'].replace(0, np.nan)
g20_index_data.ffill(inplace=True)

# As we only have COVID data from February 2020, we cut the index data to start in February 2018
g20_index_data = g20_index_data[g20_index_data['date'] >= pd.Timestamp('2018-02-01')]

# Feature Engineering
# Creating a difference column for Index Data
# g20_index_data['difference'] = g20_index_data['USD'].diff()

# Creating a percentage of change for Index Data
# g20_index_data['pct_change'] = g20_index_data['USD'].pct_change()

# Creating a specific column from each part of the Date column
# ('day' holds the day of the year, not the day of the month)
g20_covid['year'] = g20_covid['date'].dt.year
g20_covid['month'] = g20_covid['date'].dt.month
g20_covid['day'] = g20_covid['date'].dt.day_of_year
g20_covid['year_month'] = g20_covid['date'].dt.to_period('M')
g20_index_data['year'] = g20_index_data['date'].dt.year
g20_index_data['month'] = g20_index_data['date'].dt.month
g20_index_data['day'] = g20_index_data['date'].dt.day_of_year
g20_index_data['year_month'] = g20_index_data['date'].dt.to_period('M')
~~~

### Creating Columns <a name="create-columns"></a>

## Analysis <a name="analysis"></a>

### Question 1: How did COVID-19 impact the performance of the financial markets in G20 countries?  <a name="q-1"></a>

#### a) What is the relationship between the prevalence and severity of COVID-19 (i.e., new COVID-19 cases, hospitalizations, ICU admissions, and deaths) in a country and the performance of that country’s strongest stock exchange index?

### Question 2: Did government and societal healthcare actions influence the prevalence and severity of COVID-19 in G20 countries? <a name="q-2"></a>

#### a) Is there a correlation between vaccination, booster-shot, and policy response rates and the prevalence and severity of COVID-19 in the G20 countries?

### Question 3: Is the relationship, if any, between financial market performance and the prevalence and severity of COVID-19 moderated by the healthcare measures taken by a G20 country? <a name="q-3"></a>

### Conclusion <a name="conclusion"></a>

### References <a name="refrences"></a>