# Bootcamp Problem Set 3

Group 7 
- Dharshen Mahalingam - A0220515W
- Fanyanqing Sun - A0220382R
- Simon Kleppe - A0220490R
- Sun Shaofei - A0220536N

#### This problem set investigates how COVID affects the financial market in Germany and people's daily lives.


**For financial indicators, we adopted the following ETFs and stock indices:**

**EQUITY RELATED**

1) SDEU.L (iShares Germany Govt Bond UCITS ETF EUR (Dist))
    -> Tracks the performance of the Barclays Germany Treasury Bond Index, whose underlying bonds are rated AAA, the highest possible creditworthiness.
    -> EUR 292M of assets under management

2) EWG -> Invests 80% of its assets in the securities of large- and mid-capitalisation companies listed on the Frankfurt Stock Exchange

3) EWGS -> The small-cap counterpart of EWG; measures the performance of equity securities of small-capitalisation companies

**BOND RELATED**

4) IBGM.MI (Govt Bond 7-10yr UCITS ETF EUR (Dist))
    -> Investment-grade euro government bonds with more than 7 years of calculated life and a minimum asset value of EUR 2M
    
5) SDEU.L
    -> The ETF invests physically in the index securities: euro-denominated German government bonds with credit ratings equal to the country rating, a minimum remaining time to maturity of 1 year and a minimum of EUR 300M in assets

**For the non-financial indicator, we adopted Germany's power consumption during the COVID period.**

In [None]:
#Install the yfinance package and import other packages

!pip install yfinance
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime as dt
import yfinance as yf
import matplotlib as mpl

import seaborn as sns
import statsmodels.api as sm
from statsmodels import regression

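#Download daily prices of the 5 tickers from Yahoo Finance (the end date is exclusive)
symbols_list = ["IBGM.MI","DAX","EWG","EWGS","SDEU.L"]
start = dt.datetime(2019,9,1)
end = dt.datetime(2020,8,31)
# auto_adjust=False keeps the 'Adj Close' column, which newer yfinance versions drop by default
data = yf.download(symbols_list, start=start, end=end, auto_adjust=False)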
data.head()

# Keep only the adjusted close prices (one column per ticker)
df = data['Adj Close']
df.head()

In [None]:
#Daily adjusted closing prices of the 5 assets
plt.style.use('fivethirtyeight')
plt.figure(figsize=(30,10))

df['DAX'].plot()
df['EWG'].plot()
df['EWGS'].plot()
df['IBGM.MI'].plot()
df['SDEU.L'].plot()
plt.ylabel("Daily Adjusted Closing Price")
plt.legend()

plt.show()

In [None]:
# Daily returns (percentage change in adjusted close) of the 5 assets

# Drop the first row, which is NaN after pct_change
plt.style.use('fivethirtyeight')
df_pct =df.pct_change()[1:]
df_pct.head()

plt.figure(figsize=(30,10))
df_pct['DAX'].plot()
df_pct['EWG'].plot()
df_pct['EWGS'].plot()
df_pct['IBGM.MI'].plot()
df_pct['SDEU.L'].plot()
plt.ylabel("Daily Return")
plt.legend()
plt.show()
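The returns plotted above are simple daily returns, r_t = P_t / P_{t-1} - 1. The sketch below uses made-up prices rather than the downloaded data. It shows that `pct_change` computes exactly this, and that compounding the daily returns recovers the total return over the window.

```python
import numpy as np
import pandas as pd

# Hypothetical adjusted closing prices for one asset over four trading days
prices = pd.Series([100.0, 102.0, 99.96, 101.9592],
                   index=pd.to_datetime(["2020-03-02", "2020-03-03", "2020-03-04", "2020-03-05"]),
                   name="Adj Close")

# Simple daily return r_t = P_t / P_{t-1} - 1; the first value is NaN, so drop it
returns = prices.pct_change().dropna()

# The same calculation written out explicitly
manual = (prices / prices.shift(1) - 1).dropna()

# Compounding the daily returns gives the total return over the whole window
total_return = (1 + returns).prod() - 1
print(returns.round(4).tolist(), round(total_return, 6))
```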


In [None]:
#Import COVID Germany Data (Daily from Jan to Sep)
covid = pd.read_csv('../input/germany-covid19-janseptember/Germany COVID-19.csv')
covid = covid[['Date','Confirmed']]
covid["Date"] = pd.to_datetime(covid["Date"])
covid.tail()
_ = covid.plot(x = 'Date', y = 'Confirmed', kind = 'scatter', color = 'r')

In [None]:
#Import Power Consumption Data (Non-Fin Indicator) and parse the data
pw_consumption = pd.read_csv('../input/western-europe-power-consumption/de.csv')
pw_consumption['Date'] = pd.to_datetime(pw_consumption['end'])
pw_consumption['Date'] = pd.to_datetime(pw_consumption['Date']).dt.date
pw_consumption['Date'] = pd.to_datetime(pw_consumption['Date'])
pw_consumption = pw_consumption[['Date', 'load']]


#Resample to daily frequency, summing all readings within each day
pw_consumption = pw_consumption.resample('D', on = 'Date').sum()

pw_consumption = pw_consumption.sort_values('Date', ascending = False)

#Keep only the most recent 366 days (about one year)
pw_consumption = pw_consumption[:366]

pw_consumption.plot()
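The cell above turns the sub-daily load readings into daily totals. Here is a minimal sketch with hypothetical hourly readings. The values and the two-day window are assumptions, not the Kaggle file. It shows what `resample('D', on='Date').sum()` does, and what the descending sort followed by slicing does.

```python
import pandas as pd

# Hypothetical hourly load readings (MW) covering two days, 24 readings per day
hours = pd.date_range("2020-03-01 00:00", periods=48, freq="60min")
hourly = pd.DataFrame({"Date": hours, "load": [50_000.0] * 24 + [40_000.0] * 24})

# Sum all readings that fall on the same calendar day; 'Date' becomes the index
daily = hourly.resample("D", on="Date").sum()

# Most recent day first, then keep the latest N days (N = 1 here instead of 366)
latest = daily.sort_values("Date", ascending=False)[:1]
print(daily)
print(latest)
```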


In [None]:
#Create a baseline DataFrame of the assets' daily returns, left-merged with the daily power consumption on Date (dropping the first row)

baseline=df_pct.merge(pw_consumption,how="left",on="Date")
baseline =baseline[1:]
baseline.head(20)
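`df_pct` only has rows for trading days, so the left merge above keeps power consumption on those days only. Non-trading days such as weekends are dropped. A small sketch with invented numbers illustrates this. Like the notebook, it merges on the index level named `Date`.

```python
import pandas as pd

# Hypothetical daily returns on trading days only (Friday 6 and Monday 9 March 2020)
returns = pd.DataFrame({"DAX": [0.01, -0.03]},
                       index=pd.DatetimeIndex(["2020-03-06", "2020-03-09"], name="Date"))

# Hypothetical daily power load on every calendar day, including the weekend
load = pd.DataFrame({"load": [1.30e6, 1.10e6, 1.05e6, 1.28e6]},
                    index=pd.date_range("2020-03-06", periods=4, freq="D", name="Date"))

# A left merge on the shared 'Date' level keeps only the rows (trading days) of `returns`
merged = returns.merge(load, how="left", on="Date")
print(merged)
```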

In [None]:
#Import country-level info (adapted from Bootcamp 3A)
cty_info = pd.read_csv('../input/countryinfo/covid19countryinfo.csv').rename(columns={'country':'Country'})

# Filter observations with aggregate country-level information
# The column region for region-level observations is populated
cty_info = cty_info[cty_info.region.isnull()]

# Convert string data type to floating data type
# Remove comma from the fields
cty_info['healthexp'] = cty_info[~cty_info['healthexp'].isnull()]['healthexp'].str.replace(',','').astype('float')
cty_info['gdp2019'] = cty_info[~cty_info['gdp2019'].isnull()]['gdp2019'].str.replace(',','').astype('float')

# Convert to date objects with to_datetime method
gov_actions = ['quarantine', 'schools', 'gathering', 'nonessential', 'publicplace']

for gov_action in gov_actions:
    cty_info[gov_action] = pd.to_datetime(cty_info[gov_action], format = '%m/%d/%Y')
    
# Filter columns of interest
# Note: feel free to explore other variables or datasets
cty_info = cty_info[['Country','quarantine', 'schools', 'publicplace', 'gatheringlimit', 'gathering', 'nonessential']]

# cty_info.describe()
cty_info.info()
#cty_info.head(20)


In [None]:
#Import COVID Full Table Info

full_table = pd.read_csv('../input/corona-virus-report/covid_19_clean_complete.csv')
full_table['Date'] = pd.to_datetime(full_table['Date'])

# Examine DataFrame (object type, shape, columns, dtypes)
full_table.info()

#type(full_table)
#full_table.shape
#full_table.columns
#full_table.dtypes

# Deep dive into the DataFrame
full_table.head()

In [None]:
#Merge the country-level COVID data with the government intervention dates and keep Germany only

full_grouped = pd.read_csv('../input/corona-virus-report/full_grouped.csv')
full_grouped['Date'] = pd.to_datetime(full_grouped['Date'])
#full_grouped.loc[full_grouped['Country/Region'] == 'US', 'Country/Region'] = 'USA'


# Attach the country-level intervention dates to each daily observation
full_grouped = full_grouped.merge(cty_info, how = 'left', left_on = 'Country/Region', right_on = 'Country')

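# Forward-fill missing values within each country, so values never leak from one country to another
fill_cols = full_grouped.columns.drop('Country/Region')
full_grouped[fill_cols] = full_grouped.groupby('Country/Region')[fill_cols].ffill()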

# Create post-intervention indicators and the number of days relative to each intervention
gov_actions = ['quarantine', 'schools', 'gathering', 'nonessential', 'publicplace']

for gov_action in gov_actions:
    full_grouped['post_'+gov_action] = full_grouped['Date'] >= full_grouped[gov_action]
    full_grouped['day_rel_to_' + gov_action] = (full_grouped['Date'] - full_grouped[gov_action]).dt.days

# Create percent changes in covid19 outcomes (growth rates of the cumulative counts)
covid_outcomes = ['Confirmed', 'Deaths', 'Recovered', 'Active']

for covid_outcome in covid_outcomes:
    col = 'pct_change_' + covid_outcome
    full_grouped[col] = full_grouped.groupby(['Country/Region'])[covid_outcome].pct_change()
    # A previous count of zero gives inf; reset only this column, not the whole row
    full_grouped.loc[full_grouped[col] == np.inf, col] = 0

# Replace space in variable names with '_'
full_grouped.columns = full_grouped.columns.str.replace(' ', '_')
    
full_grouped.info()
#full_grouped.tail(20)
#print(full_grouped.iloc[0,0])
# full_grouped[full_grouped["quarantine"] != None]["Country/Region"].unique()
full_grouped=full_grouped[full_grouped['Country/Region'] == 'Germany']
full_grouped=full_grouped.drop(columns=["Country/Region"])

full_grouped.tail()

# full_grouped.describe()
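The cell above builds two event-study variables for each intervention and a growth rate for each COVID outcome. The sketch below reproduces that logic on a made-up single-country table. The dates, the counts and the single `quarantine` date are assumptions. It also shows why a jump from zero cases produces `inf`, which is then reset in that column only.

```python
import numpy as np
import pandas as pd

# Hypothetical cumulative confirmed cases; the quarantine date is invented for illustration
g = pd.DataFrame({
    "Date": pd.date_range("2020-03-19", periods=5, freq="D"),
    "Confirmed": [0, 10, 15, 30, 30],
    "quarantine": pd.Timestamp("2020-03-21"),
})

# Post-intervention flag and event time (days relative to the intervention date)
g["post_quarantine"] = g["Date"] >= g["quarantine"]
g["day_rel_to_quarantine"] = (g["Date"] - g["quarantine"]).dt.days

# Growth rate of cumulative cases; the jump from 0 to 10 gives +inf
col = "pct_change_Confirmed"
g[col] = g["Confirmed"].pct_change()

# Replace only the infinite values in that column, leaving the rest of the row intact
g.loc[np.isinf(g[col]), col] = 0
print(g)
```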

In [None]:
full_grouped["Date"] = pd.to_datetime(full_grouped["Date"])

baseline_merged=baseline.merge(full_grouped, how="inner",on="Date")

baseline_merged.head()



Preparation complete.

First, we investigate how government interventions affected the growth of newly confirmed COVID cases in Germany.

In [None]:
#Define a function to plot a COVID outcome against the number of days relative to a government intervention

def plot_gov_action (covid_outcome, gov_action):
    # Keep only rows with a recorded intervention date (comparing with None does not filter anything)
    fig = px.scatter(baseline_merged[baseline_merged[gov_action].notna()], x = 'day_rel_to_' + gov_action,
                     y = covid_outcome,
                     title = 'N days from ' + gov_action, height = 300)
    fig.update_layout(yaxis=dict(range=[-0.2,1]))
    fig.show()

In [None]:
#Investigate how the growth rate of confirmed cases changes after each government intervention
import plotly as py
import plotly.io as pio
import plotly.express as px
pio.templates.default = "ggplot2"
plot_gov_action('pct_change_Confirmed', 'quarantine')
plot_gov_action('pct_change_Confirmed', 'schools')
plot_gov_action('pct_change_Confirmed', 'gathering')

**Regression Analysis**

Secondly, we use regression models to investigate how COVID has affected:
- The financial market in Germany (stock and bond markets)
- The daily lives of people in Germany (power consumption)

In [None]:
#Restrict the regression sample to the COVID period (from 1 January 2020)
baseline_merged['Date'] = pd.to_datetime(baseline_merged['Date'])
baseline_covid = baseline_merged[baseline_merged['Date'] >= '2020-01-01']

#Regression Scatter Plot of Covid Cases Against Short Term Bond
reg_covid_load = sns.regplot(x = 'pct_change_Confirmed', y = 'SDEU.L', data = baseline_merged)

#OLS
X = baseline_covid[["pct_change_Recovered","pct_change_Active","pct_change_Deaths","pct_change_Confirmed"]]
y = baseline_covid["SDEU.L"]

# Add an intercept column; note that sm.OLS takes y first, then X
X = sm.add_constant(X)
model = sm.OLS(y.astype(float), X.astype(float), missing='drop').fit()
predictions = model.predict(X.astype(float)) # make the predictions by the model

# Print out the statistics
print(model.summary())

In [None]:
#Regression Scatter Plot of Covid Cases Against Long Term Bond
reg_covid_load = sns.regplot(x = 'pct_change_Confirmed', y = 'IBGM.MI', data = baseline_merged)

#OLS
X = baseline_covid[["pct_change_Recovered","pct_change_Active","pct_change_Deaths","pct_change_Confirmed"]]
y = baseline_covid["IBGM.MI"]

# Note the difference in argument order
X = sm.add_constant(X)
model = sm.OLS(y.astype(float), X.astype(float), missing='drop').fit()
predictions = model.predict(X.astype(float)) # make the predictions by the model

# Print out the statistics
print(model.summary())

In [None]:
#Regression Scatter Plot of Covid Cases Against DAX
reg_covid_load = sns.regplot(x = 'pct_change_Confirmed', y = 'DAX', data = baseline_merged)

#OLS
X = baseline_covid[["pct_change_Recovered","pct_change_Active","pct_change_Deaths","pct_change_Confirmed"]]
y = baseline_covid["DAX"]

# Note the difference in argument order
X = sm.add_constant(X)
model = sm.OLS(y.astype(float), X.astype(float), missing='drop').fit()
predictions = model.predict(X.astype(float)) # make the predictions by the model

# Print out the statistics
print(model.summary())

In [None]:
#Regression Scatter Plot of Covid Cases Against EWG (Large and Mid Capitalisation German Companies)
reg_covid_load = sns.regplot(x = 'pct_change_Confirmed', y = 'EWG', data = baseline_merged, color = 'g')

#OLS
X = baseline_covid[["pct_change_Recovered","pct_change_Active","pct_change_Deaths","pct_change_Confirmed"]]
y = baseline_covid["EWG"]

# Note the difference in argument order
X = sm.add_constant(X)
model = sm.OLS(y.astype(float), X.astype(float), missing='drop').fit()
predictions = model.predict(X.astype(float)) # make the predictions by the model

# Print out the statistics
print(model.summary())

In [None]:
#Regression Scatter Plot of Covid Cases Against EWGS (Small Capitalisation German Companies)
reg_covid_load = sns.regplot(x = 'pct_change_Confirmed', y = 'EWGS', data = baseline_merged, color = 'b')

#OLS
X = baseline_covid[["pct_change_Recovered","pct_change_Active","pct_change_Deaths","pct_change_Confirmed"]]
y = baseline_covid["EWGS"]

# Note the difference in argument order
X = sm.add_constant(X)
model = sm.OLS(y.astype(float), X.astype(float), missing='drop').fit()
predictions = model.predict(X.astype(float)) # make the predictions by the model

# Print out the statistics
print(model.summary())

In [None]:
#OLS of DAX returns on COVID case growth and short-term bond returns (SDEU.L price as an indicator of government fiscal policy)

X = baseline_covid[["pct_change_Confirmed","SDEU.L"]]
y = baseline_covid["DAX"]

# Note the difference in argument order
X = sm.add_constant(X)
model = sm.OLS(y.astype(float), X.astype(float), missing='drop').fit()
predictions = model.predict(X.astype(float)) # make the predictions by the model

# Print out the statistics
print(model.summary())

In [None]:
#Regression Scatter Plot of Covid Cases Against Power Consumption 


baseline_merged['Date'] = pd.to_datetime(baseline_merged['Date'])
baseline_merged.head()
baseline_covid = baseline_merged[baseline_merged['Date'] >= '2020-01-01']
reg_covid_load = sns.regplot(x = 'Confirmed', y = 'load', data = baseline_covid, color = 'b')

#Regression OLS Analysis


X = baseline_covid["Confirmed"]
y = baseline_covid["load"]

X = sm.add_constant(X)
model = sm.OLS(y.astype(float), X.astype(float), missing='drop').fit()
predictions = model.predict(X.astype(float)) # make the predictions by the model

print(model.summary())

**Inferential Statistics**

After analysing the regression plots and OLS results, we conduct a hypothesis test on whether the government intervention against COVID influenced power consumption in Germany:

> H0: The average daily power consumption after the government intervention is the same as before.

> H1: The average daily power consumption after the government intervention is different from before.
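The test below uses `stests.ztest`. Its default `usevar='pooled'` assumes both periods share a common variance. As a sketch with hypothetical daily loads (not the notebook's data), the same statistic can be computed by hand. With pooled variance it equals the two-sample t statistic from `scipy.stats.ttest_ind(equal_var=True)`. Only the p-value differs, since it comes from a normal rather than a t distribution.

```python
import numpy as np
from scipy import stats
from statsmodels.stats import weightstats as stests

# Hypothetical daily loads (MW) before and after an intervention
before = np.array([1.40, 1.35, 1.42, 1.38, 1.41, 1.37]) * 1e6
after = np.array([1.25, 1.22, 1.28, 1.20, 1.26, 1.24, 1.23]) * 1e6

n1, n2 = len(before), len(after)
# Pooled variance: combined squared deviations divided by n1 + n2 - 2
sp2 = (((before - before.mean()) ** 2).sum() + ((after - after.mean()) ** 2).sum()) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
z = (before.mean() - after.mean()) / se
# Two-sided p-value from the standard normal distribution
p = 2 * stats.norm.sf(abs(z))

# Library versions for comparison
z_sm, p_sm = stests.ztest(before, x2=after, value=0, alternative="two-sided")
t_scipy = stats.ttest_ind(before, after, equal_var=True).statistic
print(z, z_sm, t_scipy, p)
```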

In [None]:
#Split the power consumption data into the periods before and after the government intervention (21 March 2020)
pw_consump1=baseline_covid[baseline_covid["Date"]<"2020-03-21"]
pw_consump2=baseline_covid[baseline_covid["Date"]>="2020-03-21"]

In [None]:
#Hypothesis testing at Alpha level of 5%

from scipy import stats
from statsmodels.stats import weightstats as stests
zstat, pval1 = stests.ztest(pw_consump1["load"].dropna(), x2=pw_consump2['load'].dropna(), value=0, alternative='two-sided')
print(float(pval1))
if pval1 < 0.05:
    print("Reject null hypothesis H0: 'The average daily power consumption after the government intervention is the same as before'")
else:
    print("Fail to reject null hypothesis H0: 'The average daily power consumption after the government intervention is the same as before'")

In [None]:
#Conduct a Chi-Square Test of independence to analyse whether the level of power consumption depends on the period (before vs after the intervention)

#Label each day with its period, split the pooled daily loads into 3 equal-frequency bins, and count the days in each (period, bin) cell
pw_consume = pd.concat([pw_consump1[["load"]].assign(period = 'before_quarantine'),
                        pw_consump2[["load"]].assign(period = 'after_quarantine')])
bin_labels_3 = ['Bin 1', 'Bin 2', 'Bin 3']
pw_consume['load_bin'] = pd.qcut(pw_consume["load"], q=3, labels=bin_labels_3)

# Rows: period, columns: load bin, cells: number of days (the chi-square test needs counts, not means)
contingency_table = pd.crosstab(pw_consume['period'], pw_consume['load_bin'])
contingency_table

In [None]:
#Conduct Chi Square Test

from scipy.stats import chi2_contingency 
stat, p, dof, expected = chi2_contingency(contingency_table) 
  
# interpret p-value 
alpha = 0.05
print("p value is " + str(p)) 
if p <= alpha: 
    print('Reject H0: the level of power consumption depends on whether the day falls before or after the intervention') 
else: 
    print('Fail to reject H0: no evidence that the level of power consumption depends on whether the day falls before or after the intervention') 
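A chi-square test of independence compares observed counts with the counts expected if the row and column variables were independent. Here is a minimal sketch with a hypothetical 2x3 table: rows are before/after, columns are three load bins, and the counts are invented. It computes the statistic by hand and checks it against `chi2_contingency`. The library applies no continuity correction here because the table has more than one degree of freedom.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical 2x3 table of day counts: rows = before/after, columns = load bins 1..3
observed = np.array([[10, 20, 30],
                     [40, 30, 20]])

# Expected counts under independence: E_ij = row_total_i * col_total_j / N
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson chi-square statistic with (r - 1)(c - 1) degrees of freedom
stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p = chi2.sf(stat, dof)

# Library version for comparison
stat_sp, p_sp, dof_sp, expected_sp = chi2_contingency(observed)
print(stat, dof, p)
```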

**End Note**

All of the data in this analysis are time series, so each series is likely to be autocorrelated: its value on one day depends on its own past values.
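Autocorrelation can be checked directly. This sketch uses simulated data; the AR(1) coefficient 0.8 and the series length are assumptions. `pandas.Series.autocorr` gives a high lag-1 autocorrelation for the AR(1) process and one near zero for white noise. The same call could be applied to columns such as `baseline_covid['load']`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000
eps = rng.normal(size=n)

# AR(1) process: x_t = 0.8 * x_{t-1} + eps_t (phi = 0.8 chosen for illustration)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + eps[t]

ar1 = pd.Series(x)
noise = pd.Series(eps)

# Lag-1 autocorrelation: correlation between the series and itself shifted by one step
rho_ar1 = ar1.autocorr(lag=1)
rho_noise = noise.autocorr(lag=1)
print(round(rho_ar1, 3), round(rho_noise, 3))
```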

While a regression model can reveal relationships between the series analysed here, OLS relies on several assumptions that need to be fulfilled:
* Linear relationship
* Multivariate normality
* No or little multicollinearity
* No auto-correlation
* Homoscedasticity

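Given these assumptions, and especially the likely violation of the no-auto-correlation assumption, a regression model may not be the best fit for the analysis above, even though it shows some relationships.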

We will investigate further how COVID affected daily life in Germany and will try to adopt time series analysis in the final project.
