# Worldwide Effort During the Coronavirus Pandemic

Kaitlyn Won <br> CMSC320 Summer 2020 <br> Published July 21, 2020  

## 1. Introduction

During the past four months of quarantine, our news feeds have been flooded with daily media coverage of the coronavirus pandemic and its statistics. As we progress through the summer, there has been increasing interest in what the road to recovery looks like in the months ahead and in the measures that have been adopted thus far. We hear of new policies being implemented every day during these "unprecedented times," as many have described the crisis. In some countries, there have been significant reductions in cases due to well-adopted measures. On the other hand, as cases in the US continue to rise, the nation is on track to be the epicenter of the pandemic, and many news outlets are discussing the relatively minimal efforts and slow progress made by the US government.
<br> <br>
Although there are plenty of public data sets for the pandemic, I wanted to create a self-defined measure of <b> effort </b> because the available data sets mostly concern the effects of the coronavirus rather than the efforts made against it. For instance, the Bureau of Labor Statistics measured effects using the Current Population <a href="https://www.bls.gov/covid19/measuring-the-effects-of-the-coronavirus-covid-19-pandemic-using-the-current-population-survey.htm">Survey</a>. I will use this self-defined measure to visualize, and thereby gain insight into, the relative efforts made by countries through a quantitative lens. I also hope to inform the public of the United States' efforts in comparison to the rest of the world.

### 1.1 Tools

In [24]:
import pandas as pd
import numpy as np
from datetime import datetime
import warnings
import seaborn as sns
import matplotlib.pyplot as plt

## 2. Data Collection

I started by searching the Internet for official data sets and settled on two. The first is a <a href="https://ourworldindata.org/coronavirus-source-data">data set</a> from a scientific online publication called Our World in Data (OWID), and the second is a <a href="https://www.acaps.org/covid19-government-measures-dataset">data set</a> from ACAPS. The OWID data consists of a variety of coronavirus statistics, while the ACAPS data holds a combination of qualitative and quantitative information on governmental measures, of which I am interested in the initial policy implementation dates. For both data sets, extracting the data and converting it into data frames was fairly easy because they were stored in well-structured CSV and Excel form, respectively.

In [2]:
owid = pd.read_csv('owid-covid-data.csv')
acaps = pd.read_excel('acaps_covid19_government_measures_dataset_0.xlsx', sheet_name='Database')

### 2.1 OWID Data

In [3]:
owid.head(5)

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,total_deaths,new_deaths,total_cases_per_million,new_cases_per_million,...,aged_70_older,gdp_per_capita,extreme_poverty,cvd_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy
0,AFG,Asia,Afghanistan,2019-12-31,0.0,0.0,0.0,0.0,0.0,0.0,...,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83
1,AFG,Asia,Afghanistan,2020-01-01,0.0,0.0,0.0,0.0,0.0,0.0,...,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83
2,AFG,Asia,Afghanistan,2020-01-02,0.0,0.0,0.0,0.0,0.0,0.0,...,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83
3,AFG,Asia,Afghanistan,2020-01-03,0.0,0.0,0.0,0.0,0.0,0.0,...,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83
4,AFG,Asia,Afghanistan,2020-01-04,0.0,0.0,0.0,0.0,0.0,0.0,...,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83


The <b>stringency index</b> is one of the columns in the OWID data above (not printed). It is an existing measure that quantifies the strictness of a country's coronavirus policies as a score from 0 to 100. Note that the stringency index does not serve to evaluate the effort of a country's response to the pandemic. The measure is calculated from several policy indicators, which the codebook groups as follows; further details can be found in the OxCGRT GitHub repository <a href="https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md">codebook</a>:
<ul>
    <li>C – containment and closure policies</li>
    <li>E – economic policies</li>
    <li>H – health system policies</li>
    <li>M – miscellaneous policies</li>
</ul>


### 2.2 ACAPS Data

In [4]:
acaps.head(5)

Unnamed: 0,ID,COUNTRY,ISO,ADMIN_LEVEL_NAME,PCODE,REGION,LOG_TYPE,CATEGORY,MEASURE,TARGETED_POP_GROUP,COMMENTS,NON_COMPLIANCE,DATE_IMPLEMENTED,SOURCE,SOURCE_TYPE,LINK,ENTRY_DATE,Alternative source
0,1,Afghanistan,AFG,,,Asia,Introduction / extension of measures,Public health measures,Health screenings in airports and border cross...,No,,,2020-02-12,Ministry of Health,Government,https://moph.gov.af/en/moph-held-emergency-mee...,2020-03-14,
1,2,Afghanistan,AFG,Kabul,,Asia,Introduction / extension of measures,Public health measures,Isolation and quarantine policies,No,,,2020-02-12,Ministry of Health,Government,https://moph.gov.af/en/moph-held-emergency-mee...,2020-03-14,
2,3,Afghanistan,AFG,,,Asia,Introduction / extension of measures,Public health measures,Awareness campaigns,No,,,2020-02-12,Ministry of Health,Government,https://moph.gov.af/en/moph-held-emergency-mee...,2020-03-14,
3,4,Afghanistan,AFG,,,Asia,Introduction / extension of measures,Governance and socio-economic measures,Emergency administrative structures activated ...,No,,,2020-02-12,Ministry of Health,Government,https://moph.gov.af/en/moph-held-emergency-mee...,2020-03-14,
4,5,Afghanistan,AFG,,,Asia,Introduction / extension of measures,Social distancing,Limit public gatherings,No,Nevruz festival cancelled,,2020-03-12,AA,Media,https://www.aa.com.tr/en/asia-pacific/coronavi...,2020-03-14,


## 3. Data Processing

Now that I have converted the data sets into individual data frames, it is time to perform some processing before combining the data sets into a single data frame to analyze.

### 3.1 Implementation Date of First Policy

From the ACAPS data, I am interested in extracting the implementation date of the first policy of each country, which indicates when a country started taking measures against the coronavirus. The data set has information on all of the major policies implemented per country, and these are listed in chronological order. Hence, filtering the first policies is simple. 

In [5]:
# Group by ISO and insert first row of each into a new data frame. 
acaps_first_dates = acaps.groupby('ISO').first()
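
The `first()` call above returns the earliest policy only if each country's rows really are in chronological order. As a minimal sketch of a variant that does not depend on row order, the frame can be sorted by date before grouping. The frame `toy_acaps` and its values are made up for illustration and only mimic the ACAPS column names used in this notebook.

```python
import pandas as pd

# Hypothetical toy rows (not real ACAPS data), deliberately out of date order.
toy_acaps = pd.DataFrame({
    'ISO': ['AFG', 'AFG', 'ZWE', 'ZWE'],
    'COUNTRY': ['Afghanistan', 'Afghanistan', 'Zimbabwe', 'Zimbabwe'],
    'DATE_IMPLEMENTED': pd.to_datetime(
        ['2020-03-12', '2020-02-12', '2020-03-20', '2020-03-12']),
})

# Sorting by date first means first() picks the earliest row per country,
# whatever order the rows had in the source file.
toy_first_dates = (
    toy_acaps.sort_values('DATE_IMPLEMENTED').groupby('ISO').first()
)
print(toy_first_dates)
```

Applied to the real data, the same pattern would read `acaps.sort_values('DATE_IMPLEMENTED').groupby('ISO').first()`.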

### 3.2 Filtering Data 

Now, I will drop some of the columns of the OWID data frame because they are repetitive aggregations of other columns or are not directly relevant to measuring effort. Similarly, in the ACAPS data frame, I am only interested in the country and the implementation date of each first policy because I am focusing on "when" a country began to take action rather than "what" it did, which the stringency index already covers.

In [6]:
owid.drop(owid.columns[np.r_[8:12,14:18,22:25,26,28:34]], axis=1, inplace=True)
acaps_first_dates = acaps_first_dates[['COUNTRY','DATE_IMPLEMENTED']]
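
Dropping columns by position works for this particular download, but it would silently remove the wrong columns if the source file ever added or reordered fields. A possible alternative, sketched here on a made-up one-row frame, is to list the columns to keep by name. The frame `toy_owid` and the `keep` list are illustrative assumptions covering only a subset of the real columns.

```python
import pandas as pd

# Hypothetical one-row frame; 'aged_70_older' stands in for a dropped column.
toy_owid = pd.DataFrame({
    'iso_code': ['AFG'],
    'location': ['Afghanistan'],
    'date': ['2020-02-25'],
    'total_cases': [1.0],
    'aged_70_older': [1.337],
    'stringency_index': [8.33],
})

# Columns to keep, selected by name rather than by position.
keep = ['iso_code', 'location', 'date', 'total_cases', 'stringency_index']
toy_owid = toy_owid[keep]
print(list(toy_owid.columns))
```

A misspelled or missing name raises a `KeyError` here, which makes a change in the source file visible instead of hiding it.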

Because records preceding the first coronavirus case are not relevant to the analysis, I will remove these rows from the OWID data frame as well. 

In [7]:
owid = owid[owid.total_cases > 0]

### 3.3 Data Column Types

Since dates are sometimes stored as strings, I will check that the ACAPS first policy implementation date and OWID entry dates are Timestamp objects and convert as necessary.

In [8]:
# Random row value chosen to output type
print(type(acaps_first_dates.at['AFG', 'DATE_IMPLEMENTED']))
print(type(owid.at[56, 'date']))

<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'str'>


In [9]:
# Convert each date string to a pandas Timestamp
for i, v in owid.iterrows():
    s = owid.at[i, 'date']
    owid.at[i, 'date'] = pd.to_datetime(s, format='%Y-%m-%d')

In [10]:
for i, v in acaps_first_dates.iterrows():
    s = acaps_first_dates.at[i, 'DATE_IMPLEMENTED']
    acaps_first_dates.at[i, 'DATE_IMPLEMENTED'] = pd.to_datetime(s, format='%Y-%m-%d')
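
The row-by-row loops above get the job done, but `pd.to_datetime` also accepts a whole column at once. The sketch below shows that form on a made-up frame named `toy`. One difference to keep in mind: the result is a native datetime column rather than a column of individual Timestamp objects, so later comparisons against plain `date` objects may need adjusting.

```python
import pandas as pd

# Hypothetical frame with dates stored as strings, as in the OWID file.
toy = pd.DataFrame({'date': ['2020-02-25', '2020-02-26']})

# Convert the entire column in one vectorized call.
toy['date'] = pd.to_datetime(toy['date'], format='%Y-%m-%d')
print(toy['date'].dtype)
```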

### 3.4 Combining Data 

Let's combine the data columns into a single data frame. First, the indices and overlapping columns (ISO code and country) should be formatted in a uniform way.

In [11]:
# Set acaps_first_dates index as a column 
acaps_first_dates['ISO'] = acaps_first_dates.index
acaps_first_dates.reset_index(inplace = True, drop = True) 

# Rename acaps_first_dates columns to follow naming conventions of OWID data frame
acaps_first_dates.columns = ['location', 'date_implemented', 'iso_code']

Since the two data sets are from different sources, I will check if they cover the same countries. As shown directly below, the two data frames do not account for the same number of countries in the world. Hence, there will be missing data for first policy implementation dates for some countries. 

In [12]:
print(len(owid.iso_code.value_counts()))
print(len(acaps_first_dates.iso_code.value_counts()))

210
193
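
The counts show that the coverage differs but not which codes are affected. A set difference would list them, as in this small sketch. The two Series and the code `'XXA'` are placeholders, not values taken from either data set.

```python
import pandas as pd

# Hypothetical ISO code columns standing in for owid.iso_code and
# acaps_first_dates.iso_code.
toy_owid_codes = pd.Series(['AFG', 'AFG', 'ALB', 'XXA'])
toy_acaps_codes = pd.Series(['AFG', 'ALB'])

# Codes that appear in the first source but have no entry in the second.
missing_from_acaps = sorted(set(toy_owid_codes) - set(toy_acaps_codes))
print(missing_from_acaps)
print(toy_owid_codes.nunique())  # same count as len(value_counts())
```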


In [13]:
# Combined data frame created as a copy of the OWID data, so that owid itself is left unchanged
df = owid.copy()
df['first_policy_date'] = ' '

# ACAPS first policy implementation dates added to corresponding columns of combined dataframe
for i, r in df.iterrows():
    if r.iso_code in acaps_first_dates.iso_code.unique():
        df.at[i, 'first_policy_date'] = acaps_first_dates[acaps_first_dates.iso_code == r.iso_code].date_implemented.array[0].date()
    else:
        df.at[i, 'first_policy_date'] = np.nan
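
The loop above performs a lookup for every row, which can be slow on a frame of this size. A common alternative, sketched here on made-up frames, is to build a Series keyed by ISO code and use `map`. The names `toy_owid`, `toy_first`, and `lookup` are illustrative. Unlike the loop, this version stores Timestamps rather than `date` objects, and countries without a match receive a missing value automatically.

```python
import pandas as pd

# Hypothetical stand-ins for the OWID frame and the first-policy-date frame.
toy_owid = pd.DataFrame({
    'iso_code': ['AFG', 'AFG', 'XXA'],
    'total_cases': [1.0, 2.0, 5.0],
})
toy_first = pd.DataFrame({
    'iso_code': ['AFG'],
    'date_implemented': pd.to_datetime(['2020-02-12']),
})

# Series indexed by ISO code, used as a lookup table.
lookup = toy_first.set_index('iso_code')['date_implemented']

toy_df = toy_owid.copy()
toy_df['first_policy_date'] = toy_df['iso_code'].map(lookup)
print(toy_df)
```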

Since the OWID data provides global information in addition to the information pertaining to individual countries, I filter the global rows out of the data frame. This final step of data processing keeps the worldwide totals from being treated as a country in later comparisons.

In [14]:
df = df[((df.location != 'International') & (df.location != 'World'))]

# Reset index (reset_index returns a new data frame, so assign it back) and display the result
df = df.reset_index(drop=True)
df

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,total_deaths,new_deaths,total_tests,new_tests,tests_units,stringency_index,population,population_density,gdp_per_capita,cvd_death_rate,first_policy_date
0,AFG,Asia,Afghanistan,2020-02-25 00:00:00,1.0,1.0,0.0,0.0,,,,8.33,38928341.0,54.422,1803.987,597.029,2020-02-12
1,AFG,Asia,Afghanistan,2020-02-26 00:00:00,1.0,0.0,0.0,0.0,,,,8.33,38928341.0,54.422,1803.987,597.029,2020-02-12
2,AFG,Asia,Afghanistan,2020-02-27 00:00:00,1.0,0.0,0.0,0.0,,,,8.33,38928341.0,54.422,1803.987,597.029,2020-02-12
3,AFG,Asia,Afghanistan,2020-02-28 00:00:00,1.0,0.0,0.0,0.0,,,,8.33,38928341.0,54.422,1803.987,597.029,2020-02-12
4,AFG,Asia,Afghanistan,2020-02-29 00:00:00,1.0,0.0,0.0,0.0,,,,8.33,38928341.0,54.422,1803.987,597.029,2020-02-12
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26669,ZWE,Africa,Zimbabwe,2020-07-10 00:00:00,926.0,41.0,12.0,3.0,36005.0,659.0,tests performed,,14862927.0,42.729,1899.775,307.846,2020-03-12
26670,ZWE,Africa,Zimbabwe,2020-07-11 00:00:00,942.0,16.0,13.0,1.0,,,,,14862927.0,42.729,1899.775,307.846,2020-03-12
26671,ZWE,Africa,Zimbabwe,2020-07-12 00:00:00,982.0,40.0,18.0,5.0,,,,,14862927.0,42.729,1899.775,307.846,2020-03-12
26672,ZWE,Africa,Zimbabwe,2020-07-13 00:00:00,985.0,3.0,18.0,0.0,,,,,14862927.0,42.729,1899.775,307.846,2020-03-12


## 4. Exploratory Data Analysis 

### 4.1 Calculating Effort

After all the data filtering and combining, I now have a data frame to use for analysis. With these raw numbers and dates, I want to measure governmental effort during the pandemic, which will be evaluated as the average of two rankings: one for how effective a country's policies were, and one for how well the country made use of the resources at hand.

#### 4.1.1 Policy Effectiveness

To rank countries by policy effectiveness, I fit a linear trend to the daily increase in the percentage of each country's population counted as cases or deaths, using only the days on or after the date of its first policy implementation. We cannot rank solely by case and death counts because raw numbers lack context: larger populations tend to produce larger counts. Expressing the counts as a percentage of the population makes countries comparable.

In [22]:
# Filter out SettingWithCopyWarning that appears when summing data frame columns
warnings.filterwarnings('ignore')

df['cases_and_deaths_pct'] = ((df['total_cases'] + df['total_deaths']) / df['population']) * 100
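# Keep only rows with a known first policy date that fall on or after that date
first_policy = pd.to_datetime(df['first_policy_date'])
post_data = df[first_policy.notna() & (pd.to_datetime(df['date']) >= first_policy)]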

policy_effectiveness = pd.DataFrame(post_data.iso_code.unique(), columns=['iso_code'])
policy_effectiveness['trends'] = np.float64(0) # placeholder

# Dates are converted to epoch time (seconds) because polyfit needs numeric x values
for c, v in policy_effectiveness.iterrows():
    c_df = post_data[post_data.iso_code == v.iso_code].copy()
    c_df['date_epoch'] = c_df['date'].apply(lambda x: datetime.timestamp(x))
    m, b = np.polyfit(c_df.date_epoch, c_df.cases_and_deaths_pct, 1)
    policy_effectiveness.at[c, 'trends'] = m

policy_effectiveness['rank'] = policy_effectiveness.trends.rank()  

policy_effectiveness.head(10)

Unnamed: 0,iso_code,trends,rank
0,AFG,8.932889e-09,101.0
1,ALB,9.342012e-09,105.0
2,DZA,3.988554e-09,80.0
3,AGO,1.147948e-10,7.0
4,ATG,5.868778e-09,91.0
5,ARG,1.659428e-08,127.0
6,ARM,9.98973e-08,174.0
7,AUS,2.784082e-09,63.0
8,AUT,1.398343e-08,122.0
9,AZE,1.854493e-08,129.0
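
The trend values above are very small because the x-axis is epoch time in seconds, so each slope is a change in percentage points per second. Multiplying by 86,400 gives the change per day, and since every slope is scaled by the same positive factor, the ranking does not change. As a sketch under that reading, the fit can also be done directly in days since the first row, which keeps the x values small. The frame `toy` and its numbers are made up so that the expected slope is easy to see.

```python
import numpy as np
import pandas as pd

# Hypothetical series: the percentage grows by 0.01 points per day.
toy = pd.DataFrame({
    'date': pd.date_range('2020-03-01', periods=5, freq='D'),
    'cases_and_deaths_pct': [0.00, 0.01, 0.02, 0.03, 0.04],
})

# Days elapsed since the first row, as plain floats.
days = (toy['date'] - toy['date'].iloc[0]).dt.days.to_numpy(dtype=float)

# Degree-1 fit: slope is percentage points per day.
slope_per_day, intercept = np.polyfit(days, toy['cases_and_deaths_pct'], 1)

# Equivalent slope in the per-second units used in the table above.
slope_per_second = slope_per_day / 86400
print(slope_per_day, slope_per_second)
```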


#### 4.1.2 Use of Resources

While policy effectiveness plays a significant role in evaluating effort, it does not capture the whole context needed to produce a fair value. Countries have differing economic standings, and these critically affect the options and limitations they face in taking measures against the coronavirus. To rank countries by how well they made use of the resources at hand, I am going to use the stringency index and GDP per capita.