# Question #
1. Pick a country and evaluate the Covid19 situation in the country.
2. Evaluate the potential hidden cases (e.g., case positivity rate) and deaths (e.g., estimated infection fatality rate, excess death). 
3. Explore the relationship between the country's Covid19 cases and deaths and government health intervention policies (e.g., vaccination rate, closure), as well as Google community mobility reports. 
4. Brainstorm on an investment strategy for the country. Lab 5 is part II of this strategy note.

In [None]:
# math operations
from numpy import inf

# time operations
from datetime import timedelta

# for numerical analysis
import numpy as np

# to store and process data in dataframe
import pandas as pd

# basic visualization package
import matplotlib.pyplot as plt

# advanced plotting
import seaborn as sns

# interactive visualization
import plotly.express as px
import plotly.graph_objs as go


# import plotly.figure_factory as ff
# from plotly.subplots import make_subplots

# for offline plotting
from plotly.offline import plot, iplot, init_notebook_mode

init_notebook_mode(connected=True)

# hide warnings
import warnings

warnings.filterwarnings("ignore")

# color palette
cnf, dth, rec, act = "#393e46", "#ff2e63", "#21bf73", "#fe9801"

!ls ../input/corona-virus-report


# Set up Data #

In [None]:
country_wise = pd.read_csv('../input/corona-virus-report/country_wise_latest.csv')
# Replace missing values '' with NAN and then 0
country_wise = country_wise.replace('', np.nan).fillna(0)

# Deep dive into the DataFrame
country_wise.info()
country_wise.head(10)

In [None]:
# Grouped by day, country
# =======================

full_grouped = pd.read_csv('../input/corona-virus-report/full_grouped.csv')
# full_grouped = pd.read_csv("data/full_grouped.csv")
full_grouped.info()
full_grouped.head(10)

# Convert Date from Dtype 'Object' (or String) to Dtype 'Datetime'
full_grouped["Date"] = pd.to_datetime(full_grouped["Date"])

#Convert US to United States
full_grouped.replace(to_replace='US',value='United States',inplace=True)

full_grouped.info()

In [None]:
# Day wise
# ========

day_wise = pd.read_csv('../input/corona-virus-report/day_wise.csv')
# day_wise = pd.read_csv("data/day_wise.csv")
day_wise["Date"] = pd.to_datetime(day_wise["Date"])
day_wise.info()
day_wise.head(10)

In [None]:
# Worldometer data
# ================

worldometer_data = pd.read_csv('../input/corona-virus-report/worldometer_data.csv')
# worldometer_data = pd.read_csv('data/worldometer_data.csv')

# Replace missing values '' with NAN and then 0
# What are the alternatives? Drop or impute. Do they make sense in this context?
worldometer_data = worldometer_data.replace('', np.nan).fillna(0)
worldometer_data['Case Positivity'] = round(worldometer_data['TotalCases']/worldometer_data['TotalTests'],5)
worldometer_data['Case Fatality'] = round(worldometer_data['TotalDeaths']/worldometer_data['TotalCases'],5)

# Case Positivity is infinity when TotalTests is zero (division by zero).
# Note: this assignment zeroes the entire row, including 'Country/Region';
# those rows are dropped further below by the 'Country/Region' != 0 filter.
worldometer_data[worldometer_data['Case Positivity'] == inf] = 0

# qcut is a quantile cut. Here we specify three equally sized bins and label them low, medium, and high, respectively.
worldometer_data['Case Positivity Bin'] = pd.qcut(worldometer_data['Case Positivity'], q=3, labels=['low', 'medium', 'high'])

# Population Structure
worldometer_pop_struc = pd.read_csv('../input/covid19-worldometer-snapshots-since-april-18/population_structure_by_age_per_contry.csv')
# worldometer_pop_struc = pd.read_csv('data/population_structure_by_age_per_contry.csv')

# Replace missing values with zeros
worldometer_pop_struc = worldometer_pop_struc.fillna(0)
#worldometer_pop_struc.info()

# Merge worldometer_data with worldometer_pop_struc
# Inner means keep only common key values in both datasets
worldometer_data = worldometer_data.merge(worldometer_pop_struc,how='inner',left_on='Country/Region', right_on='Country')

# Keep observations where column 'Country/Region' is not 0
worldometer_data = worldometer_data[worldometer_data['Country/Region'] != 0]

# Change UK to United Kingdom and USA to United States
worldometer_data.replace(to_replace='UK', value='United Kingdom', inplace=True)
worldometer_data.replace(to_replace='USA', value='United States', inplace=True)

# Inspect worldometer_data's metadata
worldometer_data.info()

# Inspect Data
# worldometer_data.info()
# worldometer_data.tail(20)
# worldometer_data['Case Positivity'].describe()
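The cell above handles division by zero by overwriting every column of the affected rows with 0 and filtering those rows out later. As a possible alternative, the sketch below keeps the rows and marks only the undefined ratio as missing. The helper name `safe_ratio` and the toy numbers are hypothetical and are not part of the original notebook.

```python
import numpy as np
import pandas as pd


def safe_ratio(numerator: pd.Series, denominator: pd.Series, ndigits: int = 5) -> pd.Series:
    """Divide two columns, returning NaN (not inf) where the denominator is 0."""
    # Replacing 0 with NaN in the denominator makes the division yield NaN
    # instead of inf, so no later clean-up of infinite values is needed.
    ratio = numerator / denominator.replace(0, np.nan)
    return ratio.round(ndigits)


# Hypothetical toy frame; the numbers are illustrative only.
toy = pd.DataFrame(
    {
        "Country/Region": ["A", "B", "C"],
        "TotalCases": [100, 50, 0],
        "TotalTests": [1000, 0, 0],
    }
)
toy["Case Positivity"] = safe_ratio(toy["TotalCases"], toy["TotalTests"])
print(toy)
```

Countries with no reported tests then carry a missing positivity rate, which `pd.qcut` and `mean()` skip by default, while their other columns stay intact.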


# Question 1 #

In [None]:
# Plot the active, death and recovered counts over time for a country
def plot_case(country):
    df = full_grouped[full_grouped['Country/Region'] == country]
    # Select several columns with a list; tuple-style selection fails on recent pandas
    temp = df.groupby("Date")[["Recovered", "Deaths", "Active"]].sum().reset_index()
    temp = temp.melt(
        id_vars="Date",
        value_vars=["Recovered", "Deaths", "Active"],
        var_name="Case",
        value_name="Count",
    )

    fig = px.line(
        temp,
        x="Date",
        y="Count",
        color="Case",
        height=600,
        width=700,
        title="Cases over time in " + str(country) ,
        color_discrete_sequence=[rec, dth, act],
    )
    fig.update_layout(xaxis_rangeslider_visible=True)
    fig.show()
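`plot_case` relies on `melt` to turn the wide table (one column per case type) into the long table that `px.line` expects for its `color` argument. The following minimal sketch shows that reshaping step on a hypothetical two-day frame; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per date, one column per case type.
wide = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-03-01", "2020-03-02"]),
        "Recovered": [1, 2],
        "Deaths": [0, 1],
        "Active": [10, 12],
    }
)

# Long table: one row per (date, case type) pair.
long = wide.melt(
    id_vars="Date",
    value_vars=["Recovered", "Deaths", "Active"],
    var_name="Case",
    value_name="Count",
)
print(long)
```

Each original row becomes three rows, so the long table has the columns `Date`, `Case` and `Count`, and Plotly can draw one line per value of `Case`.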

In [None]:
# Evaluate the country of interest and plot its active, death and recovered counts
country = "United States"
compar = [country, "Brazil", "Singapore"] #comparison

for i in compar:
    plot_case(i)


From January to July 2020, the graphs above show how the United States' Covid trend compared with Brazil's and Singapore's. The number of cases in the United States kept rising throughout the period, Brazil started to plateau in July 2020, and Singapore reached its peak in May 2020. Both Brazil and Singapore appear to have managed their Covid cases better than the United States, as shown by the trend in their Recovered counts.
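Statements about when a country peaked can be checked numerically rather than read off the chart. The sketch below is one way to do that, assuming a frame with the same `Country/Region`, `Date` and `Active` columns as `full_grouped`. The helper name `peak_active` and the toy data are hypothetical.

```python
import pandas as pd


def peak_active(df: pd.DataFrame, country: str):
    """Return (date, count) of the highest daily Active total for one country."""
    one = df[df["Country/Region"] == country]
    daily = one.groupby("Date")["Active"].sum()
    # idxmax returns the first date on which the maximum occurs.
    return daily.idxmax(), int(daily.max())


# Hypothetical toy data, not real case counts.
toy = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-05-01", "2020-05-02", "2020-05-03"] * 2),
        "Country/Region": ["X"] * 3 + ["Y"] * 3,
        "Active": [5, 9, 7, 1, 2, 3],
    }
)
peak_date, peak_count = peak_active(toy, "X")
print(peak_date.date(), peak_count)
```

In the notebook, the same call would be `peak_active(full_grouped, "Singapore")`. A peak that falls on the last available date means the series was still rising when the data ends.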

# Question 2

In [None]:
#Set up Mortality Rate Data
#Full_grouped data only until 27 July
mortal = pd.read_csv('../input/excess-mortality-during-the-covid19-pandemic/excess_mortality.csv')
mortal = mortal.filter(['location', 'date','deaths_2021_all_ages','deaths_2020_all_ages', 'average_deaths_2015_2019_all_ages'])
mortal = mortal[mortal['date'] <= '2020-07-31'] #To see the trend for the same time period
#print(mortal.date.max())
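# numeric_only=True keeps the string 'date' column out of the sum on recent pandas
mortal = mortal.groupby('location').sum(numeric_only=True).reset_index()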
#mortal['delta20'] = mortal['deaths_2020_all_ages'] - mortal['average_deaths_2015_2019_all_ages']
mortal.head()

In [None]:
#look at total data, find the positivity rate, match it with population
#country is already defined in question 1
compar = [country,'France','Italy']

#Set the Mortality Rate
mortal_compar = mortal[mortal['location'].isin(compar)]

#Estimate Hidden Cases and Deaths
tot = worldometer_data[worldometer_data['Country/Region'].isin(compar)].reset_index()
tot = tot.filter(['Country/Region','Case Positivity','Population','Case Fatality','TotalCases','TotalDeaths','Fraction age 65+ years'])
tot = tot.merge(mortal_compar, how='inner', left_on='Country/Region', right_on='location')
tot.drop(labels='location', axis=1, inplace=True)


avg_fatal_rate = worldometer_data['Case Fatality'].mean()
avg_pos_rate = worldometer_data['Case Positivity'].mean()


#Estimate hidden case
tot['Cases_Recorded'] = tot['TotalCases']
tot['Deaths_Recorded'] = tot['TotalDeaths']
tot['exp_tot_case'] = tot['Case Positivity']*tot['Population']
tot['hidden_case'] = tot['exp_tot_case']-tot['TotalCases']

fig_case = px.histogram(tot, x='Country/Region', y=['Cases_Recorded','hidden_case'], barmode = 'group', title= 'Total Recorded Cases and Estimated Hidden Cases')
fig_case.update_layout(yaxis_title_text = 'Number of Case')
fig_case.show()

'''
#METHOD 1
#predict expected case from current positivity rate
tot['exp_tot_death'] = tot['Case Fatality']*tot['exp_tot_case']*tot['Fraction age 65+ years']
tot['hidden_death'] = tot['exp_tot_death']-tot['TotalDeaths']

fig_death = px.histogram(tot, x='Country/Region', y=['TotalDeaths','hidden_death'], barmode = 'group', title= 'Total Recorded Deaths and Estimated Hidden Deaths Method 1')
fig_death.update_layout(yaxis_title_text = 'Number of Death')
fig_death.show()
'''

#METHOD 2 Hidden Death

tot['hidden_death'] = tot['Case Positivity']*(tot['deaths_2020_all_ages']-tot['TotalDeaths'])

fig_death2 = px.histogram(tot, x='Country/Region', y=['Deaths_Recorded','hidden_death'], barmode = 'group', title= 'Total Recorded Deaths and Estimated Hidden Deaths')
fig_death2.update_layout(yaxis_title_text = 'Number of Death')
fig_death2.show()


print(f"Average Fatality Rate per Case Worldwide is {avg_fatal_rate:.0%}")
print(f"Average Positivity Rate per Test Worldwide is {avg_pos_rate:.0%}")

tot_ = tot.filter(['Country/Region','Population','Cases_Recorded','hidden_case','Deaths_Recorded','hidden_death'])
tot_.set_index('Country/Region').astype('int').style


#print(tot['exp_tot_case'].head())
#print(tot['exp_tot_death'].head())
#print(tot['Case Positivity'].head())
#print(tot['Case Fatality'].head())
#print(tot['diff_exp_case'])
#print(tot['diff_exp_death'])
#tot.round(2)

To estimate the hidden cases, the positivity rate is extrapolated to the whole population. Assuming the entire population were tested and the positivity rate stayed the same, the hidden cases in the US are estimated at 21,364,311.

To estimate the hidden deaths, the positivity rate per test is extrapolated to the total deaths recorded (excluding deaths attributed to Covid). This assumes the remaining deaths would show the same positivity rate had they been tested. For the total deaths recorded, only the proportion aged 65+ is considered, as this group is the most likely to die from complications that may involve Covid. With that, the hidden deaths in the US are estimated at 137,014.

Both the hidden cases and the hidden deaths are estimated for the period from January 2020 to July 2020.
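The arithmetic behind these estimates can be sketched as a small function that mirrors the `exp_tot_case`, `hidden_case` and Method 2 `hidden_death` columns computed in the cell above. The function name `estimate_hidden` and all input numbers are hypothetical and chosen only to make the calculation easy to follow. They are not real data for any country.

```python
def estimate_hidden(population, total_cases, total_tests, total_deaths, all_cause_deaths):
    """Extrapolate the test positivity rate to untested people and to non-Covid deaths."""
    positivity = total_cases / total_tests
    # Expected cases if the whole population were tested at the same positivity rate.
    expected_cases = positivity * population
    hidden_cases = expected_cases - total_cases
    # Apply the same rate to deaths that were not recorded as Covid deaths.
    hidden_deaths = positivity * (all_cause_deaths - total_deaths)
    return {
        "positivity": positivity,
        "hidden_cases": round(hidden_cases),
        "hidden_deaths": round(hidden_deaths),
    }


# Hypothetical inputs for illustration only.
result = estimate_hidden(
    population=1_000_000,
    total_cases=10_000,
    total_tests=100_000,
    total_deaths=200,
    all_cause_deaths=5_200,
)
print(result)
```

Because tests are usually directed at people with symptoms or known exposure, the positivity rate among tested people is likely higher than in the general population, so figures produced this way are better read as an upper-end estimate.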

# Question 3 #

In [None]:
# Declare Public Health emergency: Feb 3
# Declare National Emergency March 13
# California Lockdown March 19
pub_emer = '2020-02-03'
cal_ld = '2020-03-19'


df = full_grouped[full_grouped['Country/Region']== 'United States']
temp0 = df.groupby("Date")[["Recovered", "Deaths", "Active"]].sum().reset_index()
temp = temp0.melt(
    id_vars="Date",
    value_vars=["Recovered", "Deaths", "Active"],
    var_name="Case",
    value_name="Count",
)

fig = px.line(
    temp,
    x="Date",
    y="Count",
    color="Case",
    height=600,
    width=700,
    title="Cases over time in United States" ,
    color_discrete_sequence=[rec, dth, act],
)
#fig.update_layout(xaxis_rangeslider_visible=True)
fig.add_vline(x=str(pub_emer), line_width = 2, line_dash='dash', line_color='blue')
fig.add_vline(x=str(cal_ld), line_width = 2, line_dash='dash', line_color='green')
#fig.show()

#Google Mobility Index

mob = pd.read_csv('../input/mobilityreport/2020_US_Region_Mobility_Report.csv')
mob = mob[(mob['date'] <= '2020-07-27') & (mob['sub_region_1'].isna()) ]
mob = mob.iloc[:,8:]
mob['date'] = pd.to_datetime(mob['date'])


us_trend = temp0.merge(mob,left_on="Date",right_on='date').drop(labels=['date','Recovered','Deaths'],axis=1)
us_trend["Active"] = us_trend["Active"].pct_change()*100
us_trend["Workplace_index"] = us_trend['workplaces_percent_change_from_baseline']
us_trend['Residential_index'] = us_trend['residential_percent_change_from_baseline']

fig2 = px.scatter(us_trend, x='Date', y=['Active','Workplace_index','Residential_index'],trendline="rolling", trendline_options=dict(window=7))
fig2.update_traces(marker_size=0.01)
fig2.update_layout(legend= {'itemsizing': 'constant'})
fig2.update_layout(title= 'United States Covid 19 Active Cases and Mobility Index 7-Day MA',yaxis_title_text='Percent Change (%)')
#fig2.add_vline(x=str(pub_emer), line_width = 2, line_dash='dash', line_color='blue')
fig2.add_vline(x=str(cal_ld), line_width = 2, line_dash='dash', line_color='green')
#fig2.add_vrect(x0=start,x1=end,fillcolor='blue',opacity=0.1,annotation_text='Circuit Breaker',annotation_position="top left")
#fig2.show()

In [None]:
#List when different country started Lockdown
# Singapore CB 7 Apr-1Jun
start = '2020-04-07'
end = '2020-06-01'
#sg_cb = pd.DataFrame()
#sg_cb['dates'] = pd.date_range(start,end)

df = full_grouped[full_grouped['Country/Region']== 'Singapore']
temp0 = df.groupby("Date")[["Recovered", "Deaths", "Active"]].sum().reset_index()
temp = temp0.melt(
    id_vars="Date",
    value_vars=["Recovered", "Deaths", "Active"],
    var_name="Case",
    value_name="Count",
)

fig3 = px.line(
    temp,
    x="Date",
    y="Count",
    color="Case",
    height=600,
    width=700,
    title="Cases over time in Singapore" ,
    color_discrete_sequence=[rec, dth, act],
)
#fig.update_layout(xaxis_rangeslider_visible=True)
fig3.add_vrect(x0=start,x1=end,fillcolor='blue',opacity=0.1,annotation_text='Circuit Breaker',annotation_position="top left")
#fig3.show()

#Google Mobility Index
mob = pd.read_csv('../input/mobilityreport/2020_SG_Region_Mobility_Report.csv')
mob = mob.iloc[:,8:]
mob = mob[mob['date'] <= '2020-07-27']
mob['date'] = pd.to_datetime(mob['date'])


sg_trend = temp0.merge(mob,left_on="Date",right_on='date').drop(labels=['date','Recovered','Deaths'],axis=1)
sg_trend["Active"] = sg_trend["Active"].pct_change()*100
sg_trend["Workplace_index"] = sg_trend['workplaces_percent_change_from_baseline']
sg_trend['Residential_index'] = sg_trend['residential_percent_change_from_baseline']

fig4 = px.scatter(sg_trend, x='Date', y=['Active','Workplace_index','Residential_index'],trendline="rolling", trendline_options=dict(window=7))
fig4.update_traces(marker_size=0.01)
fig4.update_layout(legend= {'itemsizing': 'constant'})
fig4.update_layout(title= 'Singapore Covid 19 Active Cases and Mobility Index 7-Day MA',yaxis_title_text='Percent Change (%)')
fig4.add_vrect(x0=start,x1=end,fillcolor='blue',opacity=0.1,annotation_text='Circuit Breaker',annotation_position="top left")

fig.show()
fig3.show()
fig2.show()
fig4.show()


In the United States, the government did not impose any drastic country-wide measures. The US government declared a public health emergency in February 2020 (blue dashed line), and California was the first state to impose a lockdown, in March 2020 (green dashed line), but the number of active cases kept rising. However, the mobility index shows that workplace mobility decreased significantly once California announced its lockdown. This trend suggests public awareness of the pandemic, with companies and individuals taking the initiative to reduce workplace activity and start working from home. Despite the government's efforts and these self-initiatives, Covid cases in the United States were still rising throughout the period studied.

On the other side of the world, the Singapore government's action successfully reduced the number of active cases. In the plot of cases over time in Singapore, the Circuit Breaker period is shaded in blue. Cases peaked during the Circuit Breaker and decreased steadily even beyond June 2020, which suggests the government timed its intervention well to halt the alarming rate of increase in Covid cases. This trend is also reflected in the 7-day moving averages of active cases and the mobility indices. When the Circuit Breaker took effect, workplace mobility dropped significantly and residential mobility increased, reflecting how people started to work from home because of the pandemic and the government's action. After the Circuit Breaker, as active cases were decreasing, people started to go back to the workplace, as shown by the increase in the workplace index and the decrease in the residential index.
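The visual comparison of the 7-day moving averages can be complemented with a number. The sketch below smooths the series with a plain pandas rolling mean and then correlates the smoothed active-case growth with the smoothed workplace index. It assumes a frame shaped like `us_trend` or `sg_trend`, with `Active` and `Workplace_index` columns. The helper name `weekly_smooth` and the toy values are hypothetical, and a correlation on its own does not establish that mobility changes caused the change in cases.

```python
import numpy as np
import pandas as pd


def weekly_smooth(trend: pd.DataFrame, cols, window: int = 7) -> pd.DataFrame:
    """Rolling mean over `window` days; inf values (from pct_change on 0) become NaN."""
    cleaned = trend[cols].replace([np.inf, -np.inf], np.nan)
    # min_periods=window: only report a mean once a full window of valid days exists.
    return cleaned.rolling(window=window, min_periods=window).mean()


# Hypothetical 14-day toy series; the first Active value imitates a division by zero.
toy = pd.DataFrame(
    {
        "Date": pd.date_range("2020-04-01", periods=14),
        "Active": [np.inf] + [float(v) for v in range(1, 14)],
        "Workplace_index": [-float(v) for v in range(14)],
    }
)
smoothed = weekly_smooth(toy, ["Active", "Workplace_index"])
corr = smoothed["Active"].corr(smoothed["Workplace_index"])
print(smoothed.tail(3))
print("correlation:", corr)
```

In the toy data the two series move in exactly opposite directions, so the correlation is close to -1. Real data will be noisier, and shifting one series by a few days with `Series.shift` may be worth trying because changes in mobility would affect case counts only after a delay.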

# Question 4 Investment Strategy #

The country we have chosen to invest in is the United States of America (USA). The American economy has been opening up gradually even as the number of cases grows, so reopening stocks are expected to provide good returns in the coming year. However, a rise in the number of cases could force governments back into lockdown regimes, so stay-home stocks provide a good hedge component for the portfolio. In addition, medical communities around the globe are suggesting a third dose for most vaccines, so we include Pfizer and Moderna in the analysis to benefit in that scenario.

We have chosen a group of stocks that includes both reopening plays and stocks that we believe will perform better if the Covid situation worsens. This choice was made to maintain a neutral position, so that the effect of the Covid situation on portfolio returns is minimized.

Reopening stocks usually come from industries that were severely impacted by Covid-19 and have not yet recovered to their pre-Covid levels. Examples of such industries are airlines and tourism. To benefit from the reopening of the economy, we have included American Airlines (AAL) and Marriott International, Inc. (MAR).

In addition, this group of stocks includes tech giants such as Google and Apple, whose stock prices have recovered since the market crash in 2020 but which are still expected to provide decent returns in the coming year because of their stable cash flows and operating margins.