# 2. To what extent does the aggregate stock market respond to COVID-related developments (e.g., a surge in the number of news items/searches mentioning COVID) and key events (e.g., initial virus outbreak, vaccine progress)?

In [1]:
import pandas as pd
import numpy as np
import pickle


# Load related raw data and prepare dataset for correlation calculation

spy_ret = pd.read_csv('plotly/dense_return2020.csv', index_col=0)
spy_ret = spy_ret[['SPY']]
spy_ret.index = pd.to_datetime(spy_ret.index)

covid_search = pd.read_csv('plotly/covid_search_trend.csv', index_col=0)
covid_search.index = pd.to_datetime(covid_search.index)
wkly_return = spy_ret.resample('W').sum()
wkly_search = covid_search.resample('W').sum()
spy_covid = pd.concat([wkly_return, wkly_search], axis=1)
spy_covid.columns = ['spy_return%', 'COVID_search']
spy_covid['spy_return%'] = spy_covid['spy_return%']*100
spy_covid = spy_covid.dropna()
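The cell above builds weekly returns by summing daily returns within each week. As a minimal sketch of what that aggregation does, the example below compares summing with compounding on made-up daily values. It assumes the daily figures are simple returns, which the source does not state; if they are log returns, summing is exact. The dates and numbers are hypothetical and not taken from `dense_return2020.csv`.

```python
import pandas as pd

# Hypothetical daily simple returns for two business weeks (illustrative values only)
idx = pd.bdate_range("2020-03-02", periods=10)
daily = pd.Series(
    [-0.03, 0.04, -0.05, 0.02, -0.01, 0.03, -0.06, 0.05, 0.01, -0.02],
    index=idx,
)

# Approach used in the notebook: add up the daily returns within each week
weekly_sum = daily.resample("W").sum()

# Alternative: compound the daily returns within each week
weekly_compound = (1 + daily).resample("W").prod() - 1

comparison = pd.DataFrame({"sum": weekly_sum, "compound": weekly_compound})
print(comparison)
```

For small daily moves the two columns are close; the gap widens in volatile weeks such as those of March 2020.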

### Calculate correlation for lead-lag effect

In [50]:
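# Week-over-week percentage change in search interest
spy_covid['COVID_search_change'] = spy_covid['COVID_search'].pct_change()
# pct_change yields inf when the previous week's value is 0; treat those as no change
spy_covid = spy_covid.replace(np.inf, 0.0)
# Lagged copies (1 to 3 weeks) for the lead-lag correlations
for i in range(1, 4):
    spy_covid[f'COVID_search_change_lag{i}'] = spy_covid['COVID_search_change'].shift(i)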

In [52]:
spy_covid.corr()['spy_return%']

spy_return%                 1.000000
COVID_search                0.015647
COVID_search_change        -0.176430
COVID_search_change_lag1   -0.478648
COVID_search_change_lag2   -0.307344
COVID_search_change_lag3   -0.044758
Name: spy_return%, dtype: float64
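To illustrate how lagged columns expose a lead-lag relationship, here is a small sketch on synthetic data in which the "return" series is constructed to react negatively to the previous period's "search change". The series, coefficients, and column names are assumptions for illustration only and are unrelated to the real SPY or search data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic "search change" series (not real data)
x = pd.Series(rng.normal(size=n))

# Synthetic "return" that responds negatively to last period's x, plus noise
y = -0.8 * x.shift(1) + 0.5 * pd.Series(rng.normal(size=n))

demo = pd.DataFrame({"y": y, "x": x})
for i in range(1, 4):
    # shift(i) moves values down i rows, so row t holds x from period t - i
    demo[f"x_lag{i}"] = demo["x"].shift(i)

corr_with_y = demo.corr()["y"]
print(corr_with_y)
```

In this constructed case only `x_lag1` shows a strong negative correlation with `y`, which is the same pattern the real correlations above suggest at lag 1.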

In [63]:
spy_covid['2020-03':'2020-10']

| data_date | spy_return% | COVID_search | COVID_search_change | COVID_search_change_lag1 | COVID_search_change_lag2 | COVID_search_change_lag3 |
|---|---|---|---|---|---|---|
| 2020-03-01 | -6.446397 | 8.0 | 1.0 | 3.0 | 0.0 | 0.0 |
| 2020-03-08 | 3.084463 | 32.0 | 3.0 | 1.0 | 3.0 | 0.0 |
| 2020-03-15 | -8.227173 | 75.0 | 1.34375 | 3.0 | 1.0 | 3.0 |
| 2020-03-22 | -20.3153 | 100.0 | 0.333333 | 1.34375 | 3.0 | 1.0 |
| 2020-03-29 | 11.821819 | 98.0 | -0.02 | 0.333333 | 1.34375 | 3.0 |
| 2020-04-05 | 2.804299 | 87.0 | -0.112245 | -0.02 | 0.333333 | 1.34375 |
| 2020-04-12 | 12.17378 | 80.0 | -0.08046 | -0.112245 | -0.02 | 0.333333 |
| 2020-04-19 | 1.98612 | 70.0 | -0.125 | -0.08046 | -0.112245 | -0.02 |
| 2020-04-26 | -3.657767 | 70.0 | 0.0 | -0.125 | -0.08046 | -0.112245 |
| 2020-05-03 | 1.421883 | 65.0 | -0.071429 | 0.0 | -0.125 | -0.08046 |


In [53]:
spy_covid['2020-03':'2020-10'].corr()['spy_return%']

spy_return%                 1.000000
COVID_search               -0.007853
COVID_search_change        -0.203605
COVID_search_change_lag1   -0.504357
COVID_search_change_lag2   -0.341984
COVID_search_change_lag3   -0.043277
Name: spy_return%, dtype: float64

# Granger causality test: does a COVID surge cause a stock market sell-off?

Using the Granger causality test, one can test the null hypothesis that past changes in COVID searches have no additional explanatory power in forecasting future stock market returns (proxied by SPY), once we control for the stock market's own past returns. In mathematical terms, the test uses the following augmented regression.

$$ret_t = \alpha + \{ a_1 \cdot ret_{t-1} + a_2 \cdot ret_{t-2} + \cdots + a_p \cdot ret_{t-p} \} + \{ b_1 \cdot \Delta COVID_{t-1} + b_2 \cdot \Delta COVID_{t-2} + \cdots + b_p \cdot \Delta COVID_{t-p} \}$$

With a given maximum lag $p$, we test the null hypothesis that $\{b_1, b_2, ..., b_p\}$ jointly add no statistically significant explanatory power to a regression that includes only past values of stock returns. This is done with an F-test, whose null hypothesis is that the added regressors jointly have no explanatory power. If we can reject this null hypothesis, as indicated by a large F-statistic (or equivalently, a small p-value), then one can claim that changes in COVID severity (as reflected in changes in COVID searches) Granger-cause future stock market changes.
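The F-test described above can be sketched by hand: fit the restricted regression (lags of returns only) and the unrestricted one (lags of returns plus lags of the COVID variable), then compare their sums of squared residuals. The function below is a minimal NumPy sketch under that description; `granger_f_stat` is a hypothetical helper name, the data are synthetic, and the formula is the standard nested-model F-statistic, which appears to correspond to the "ssr based F test" line in the statsmodels output further down.

```python
import numpy as np

def granger_f_stat(y, x, p):
    """F-statistic for H0: p lags of x add nothing beyond p lags of y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    target = y[p:]  # y_t for t = p, ..., n-1
    # Column k holds the series lagged by k periods, aligned with target
    y_lags = np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])
    x_lags = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])
    const = np.ones((n - p, 1))

    def ssr(design):
        # Ordinary least squares fit, returning the sum of squared residuals
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    ssr_restricted = ssr(np.hstack([const, y_lags]))
    ssr_unrestricted = ssr(np.hstack([const, y_lags, x_lags]))
    # Residual degrees of freedom of the unrestricted model
    df_denom = (n - p) - (2 * p + 1)
    f_stat = ((ssr_restricted - ssr_unrestricted) / p) / (ssr_unrestricted / df_denom)
    return f_stat, df_denom

# Synthetic example: y depends on its own lag and on lagged x (not real data)
rng = np.random.default_rng(42)
n, p = 120, 1
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] - 0.8 * x[t - 1] + 0.5 * rng.normal()

f_xy, df_denom = granger_f_stat(y, x, p)  # does x help predict y?
f_yx, _ = granger_f_stat(x, y, p)         # does y help predict x?
print(f"x -> y: F={f_xy:.2f} (df_denom={df_denom}), y -> x: F={f_yx:.2f}")
```

Running the test in both directions, as in the last lines, is a simple check that the predictive relationship is one-way in the constructed data.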

To facilitate our test, we use **statsmodels.tsa.stattools.grangercausalitytests**. We include at most 4 lags of weekly observations in our model (which represents 1 month of maximum lag); the cells shown below report the `maxlag=1` case. The response variable is the SPY weekly percentage change, and the additional explanatory variable is the percentage change in COVID searches.


In [8]:
from statsmodels.tsa.stattools import grangercausalitytests

In [40]:
before_mar = grangercausalitytests(spy_covid['2020-01-01':'2020-03-11'][['spy_return%', 'COVID_search_change']].dropna(), maxlag=1)
mar_oct = grangercausalitytests(spy_covid['2020-03-11':'2020-10-31'][['spy_return%', 'COVID_search_change']].dropna(), maxlag=1)
after_oct = grangercausalitytests(spy_covid['2020-10-31':][['spy_return%', 'COVID_search_change']].dropna(), maxlag=1)


Granger Causality
number of lags (no zero) 1
ssr based F test:         F=4.4155  , p=0.2828  , df_denom=1, df_num=1
ssr based chi2 test:   chi2=17.6622 , p=0.0000  , df=1
likelihood ratio test: chi2=6.7571  , p=0.0093  , df=1
parameter F test:         F=4.4155  , p=0.2828  , df_denom=1, df_num=1

Granger Causality
number of lags (no zero) 1
ssr based F test:         F=35.2336 , p=0.0000  , df_denom=29, df_num=1
ssr based chi2 test:   chi2=38.8785 , p=0.0000  , df=1
likelihood ratio test: chi2=25.4474 , p=0.0000  , df=1
parameter F test:         F=35.2336 , p=0.0000  , df_denom=29, df_num=1

Granger Causality
number of lags (no zero) 1
ssr based F test:         F=0.0048  , p=0.9470  , df_denom=7, df_num=1
ssr based chi2 test:   chi2=0.0068  , p=0.9343  , df=1
likelihood ratio test: chi2=0.0068  , p=0.9343  , df=1
parameter F test:         F=0.0048  , p=0.9470  , df_denom=7, df_num=1


In [64]:
display(mar_oct[1][0])

{'ssr_ftest': (35.23362837897905, 1.8998711253296256e-06, 29.0, 1),
 'ssr_chi2test': (38.878486487149296, 4.510239062428552e-10, 1),
 'lrtest': (25.447393603099016, 4.546114185322828e-07, 1),
 'params_ftest': (35.23362837897904, 1.8998711253296324e-06, 29.0, 1.0)}

From March to October, the test statistics are large and the p-values are very small, so we reject the null hypothesis and conclude that changes in COVID searches Granger-cause future stock market changes during this period. For instance, the cell above shows the Granger test with a maximum lag of 1: under the chi-square test (`ssr_chi2test`), the test statistic is large (38.88) and the p-value is small (4.51e-10). The before-March and after-October samples are small (`df_denom` of 1 and 7, respectively), so their results should be read with caution.

In [62]:
print(
    "Before March:\n",
    before_mar[1][1][1].summary(),
    "\n\n\nMarch to October:\n",
    mar_oct[1][1][1].summary(),
    "\n\n\nAfter October:\n",
    after_oct[1][1][1].summary())

Before March:
                             OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.816
Model:                            OLS   Adj. R-squared:                  0.447
Method:                 Least Squares   F-statistic:                     2.214
Date:                Sun, 31 Jan 2021   Prob (F-statistic):              0.429
Time:                        18:16:17   Log-Likelihood:                -7.9274
No. Observations:                   4   AIC:                             21.85
Df Residuals:                       1   BIC:                             20.01
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -0.5397      0.482     


We can also inspect the t-statistics of the coefficients in the underlying augmented regression. For **March to October**, after controlling for past values of SPY changes ($x1$: -3.2), the loading on past COVID search change is statistically significant ($x2$: -5.9).