# Pre-modeling tests for the Global Vector Autoregression (GVAR) framework

## 1. Granger causality
Granger causality in time series rests on two assumptions:
- An effect can only occur after its cause.
- Knowledge of the cause improves the prediction of the effect.

These assumptions can be tested by comparing the forecast error variance in (a) the case where all time series are included in the prediction and (b) the case where the supposed causal time series is left out. If leaving out the supposed cause increases the variance of the forecast error, the left-out series is said to Granger-cause the other one. The statistical significance of the comparison can be measured with an F-statistic (see [Stokes and Purdon, 2017][1]).
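
The comparison described above can be sketched by hand in base R. This is only an illustration under stated assumptions: the data are simulated (not the mobility data), the lag order `p = 2` is arbitrary, and the in-sample residual variance of each regression is used as a stand-in for the forecast error variance.

```r
# Simulated data (an assumption for illustration): x drives y with a one-step lag
set.seed(42)
n <- 300
x <- rnorm(n)
y <- numeric(n)
for (t in 2:n) {
  y[t] <- 0.5 * y[t - 1] + 0.8 * x[t - 1] + rnorm(1)
}

p <- 2  # lag order, chosen arbitrarily for this sketch

# embed() returns the current value in column 1 and lags 1..p in the remaining columns
ey <- embed(y, p + 1)
ex <- embed(x, p + 1)
y_now  <- ey[, 1]
y_lags <- ey[, -1, drop = FALSE]
x_lags <- ex[, -1, drop = FALSE]

# (b) restricted model: y predicted from its own past only
restricted <- lm(y_now ~ y_lags)
# (a) unrestricted model: the past of x is added as a predictor
unrestricted <- lm(y_now ~ y_lags + x_lags)

# Residual variance of each model (proxy for the forecast error variance)
var_restricted   <- summary(restricted)$sigma^2
var_unrestricted <- summary(unrestricted)$sigma^2

# F-test of the nested models; H0: the lags of x add no predictive power
ftest <- anova(restricted, unrestricted)
p_value <- ftest[2, "Pr(>F)"]

print(c(restricted = var_restricted, unrestricted = var_unrestricted))
print(p_value)
```

Because `x` is built into `y` by construction, the restricted model should show the larger residual variance and the F-test a small p-value. On real data, the same comparison is what a Granger causality test reports.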

The causality test also needs to be run in concert with stationarity tests. Time series that have not been appropriately differenced can give unreliable results: for example, they may not provide enough evidence to reject the null hypothesis (that there is no Granger causality).

One approach is to use the `causality` function in the `vars` package for Granger causality testing; its documentation is available [here][2]. Another is to use the [`grangers` package][3].

Given the above definition, a prediction has to be made before the Granger causality test can be run, so any approach requires estimating a VAR model first. `causality` takes an already fitted `VAR()` object as its input, as in the code below; other functions may perform the estimation internally. It makes sense, though, to test for cointegration and stationarity (and adjust or transform the time series accordingly) before conducting the Granger causality test.
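
As a rough sketch of the "test, then transform" step, the snippet below checks a series for a unit root, differences it, and checks again. It rests on several assumptions: the series is a simulated random walk rather than the mobility data, and the Phillips-Perron test from base R (`PP.test`) is used here only because it needs no extra packages. An augmented Dickey-Fuller test such as `ur.df` from `urca` would be a natural alternative.

```r
# Simulated random walk (an assumption for illustration, not the mobility data)
set.seed(123)
level <- cumsum(rnorm(200))

# Phillips-Perron unit-root test from the stats package;
# H0: the series has a unit root (i.e. it is non-stationary)
p_level <- PP.test(level)$p.value

# First-difference the series and test again
differenced <- diff(level)
p_diff <- PP.test(differenced)$p.value

print(c(level = p_level, differenced = p_diff))
```

A small p-value for the differenced series suggests that one round of differencing is enough; a series that still fails the test may need further transformation before it enters the VAR.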

[1]: https://www.pnas.org/content/114/34/E7063
[2]: https://rdrr.io/cran/vars/man/causality.html
[3]: https://github.com/MatFar88/grangers

In [2]:
library(vars)
library(reshape2)

Loading required package: MASS
Loading required package: strucchange
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: sandwich
Loading required package: urca
Loading required package: lmtest

In [4]:
df <- read.csv('../../data/tidy/cases_mobility_activity.csv') #note that file path may change

## PREPROCESSING STEPS IN PREVIOUS FORMAT
df <- df[, c(2, 4, 8:191)] #note that this will be irrelevant once we are working with a clean dataframe

# Reshape to one row per (region, date) and one column per series
melted.df <- melt(df, id.vars = c('region', 'transportation_type'))
m.data <- dcast(melted.df, region + variable ~ transportation_type)
colnames(m.data) <- c('Country', 'Date', 'cov', 'car', 'groc', 'parks', 'home', 'reta', 'tran', 'tstop', 'walk', 'work')

#m.data$Country <- as.factor(m.data$Country)
head(m.data)

# Convert the series columns (3 onward) from character to numeric;
# non-numeric entries become NA, hence the coercion warning in the output
for (i in seq(3, ncol(m.data))) {
    m.data[, i] <- as.numeric(m.data[, i])
}

# Correct the Google series (add 100 to the baseline)
google.vars <- c('groc', 'parks', 'home', 'reta', 'tstop', 'work')
m.data[, google.vars] <- m.data[, google.vars] + 100

# Keep only the country identifier and the endogenous variables
endovars <- c('cov', 'home', 'tstop', 'work')
m.data <- subset(m.data, select = c('Country', endovars))
#m.data <- na.omit(m.data)

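| | Country | Date | cov | car | groc | parks | home | reta | tran | tstop | walk | work |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<fct>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | Albania | X1.13.2020 | 0 | 100.0 | | | | | | | 100.0 | |
| 2 | Albania | X1.14.2020 | 0 | 95.3 | | | | | | | 100.68 | |
| 3 | Albania | X1.15.2020 | 0 | 101.43 | | | | | | | 98.93 | |
| 4 | Albania | X1.16.2020 | 0 | 97.2 | | | | | | | 98.46 | |
| 5 | Albania | X1.17.2020 | 0 | 103.55 | | | | | | | 100.85 | |
| 6 | Albania | X1.18.2020 | 0 | 112.67 | | | | | | | 100.13 | |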


“NAs introduced by coercion”


In [5]:
head(m.data)

| | Country | cov | home | tstop | work |
|---|---|---|---|---|---|
| | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | Albania | 0 | | | |
| 2 | Albania | 0 | | | |
| 3 | Albania | 0 | | | |
| 4 | Albania | 0 | | | |
| 5 | Albania | 0 | | | |
| 6 | Albania | 0 | | | |


In [6]:
df.Arg <- m.data[m.data$Country == 'Argentina', ]
rownames(df.Arg) <- 1:nrow(df.Arg) #renumber rows
df.Arg <- subset(df.Arg, select = -Country) #remove Country column
df.Arg[2:184, "cov"] <- diff(df.Arg[1:184, "cov"]) # first difference the covid cases (since these are cumulative)
# df.Arg[rowSums(is.na(df.Arg)) > 0,] # find where the NAs are (remove for this test)
# you can see here where smoothing helps; simply removing NAs will skip time rows
# we see that there are no NAs between rows 34 and 181

df.Arg <- df.Arg[35:100, ] # keep a 66-row window without NAs

# Difference all series (a second difference for cov), perhaps for stationarity.
# NOTE: row 1 of the window keeps its undifferenced value in every column;
# a cleaned-up version should drop it (e.g. df.Arg[-1, ]) before fitting the VAR.
for (v in c("cov", "home", "tstop", "work")) {
    df.Arg[2:66, v] <- diff(df.Arg[1:66, v])
}

#df.Arg = log(df.Arg) #take logs

In [7]:
diff(df.Arg[1:30,'work'])

In [130]:
df.Arg[rowSums(is.na(df.Arg)) > 0,] # Check again for NAs

| cov | home | tstop | work |
|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |


In [12]:
var.2c <- VAR(df.Arg, p = 10, type = "const")
res <- causality(var.2c, cause = "cov")

#use a robust HC variance-covariance matrix for the Granger test:
causality(var.2c, cause = "cov", vcov.=vcovHC(var.2c))

#use a wild-bootstrap procedure for the Granger test
## Not run: causality(var.2c, cause = "cov", boot=TRUE, boot.runs=1000)

$Granger

	Granger causality H0: cov do not Granger-cause home tstop work

data:  VAR object var.2c
F-Test = 0.39345, df1 = 30, df2 = 60, p-value = 0.9967


$Instant

	H0: No instantaneous causality between: cov and home tstop work

data:  VAR object var.2c
Chi-squared = 8.7271, df = 3, p-value = 0.03315



In [30]:
res$Granger[1]

0
0.9106098


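A high p-value means the null hypothesis of no Granger causality cannot be rejected. Here the test with the robust covariance matrix gives p = 0.9967, so there is no evidence that `cov` Granger-causes `home`, `tstop` and `work`. The instantaneous causality test, by contrast, rejects its null hypothesis at the 5% level (p = 0.03315).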
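
To read the test results programmatically rather than from the printed output, the elements of the returned list can be accessed by name. The sketch below is illustrative only: it uses the `Canada` example dataset that ships with `vars` instead of the mobility data, and the 5% significance level is an assumption.

```r
library(vars)

# Canada is an example dataset shipped with vars (not the mobility data)
data(Canada)
fit <- VAR(Canada, p = 2, type = "const")
res_example <- causality(fit, cause = "e")

# Both list elements are "htest" objects, so their fields can be read by name
f_stat    <- as.numeric(res_example$Granger$statistic)
p_granger <- as.numeric(res_example$Granger$p.value)
p_instant <- as.numeric(res_example$Instant$p.value)

alpha <- 0.05  # significance level, an assumption for this sketch
cat("Granger F =", f_stat, " p =", p_granger, "\n")
cat("Reject H0 of no Granger causality:", p_granger < alpha, "\n")
cat("Reject H0 of no instantaneous causality:", p_instant < alpha, "\n")
```

Accessing `$p.value` by name avoids ambiguity about which field a positional index such as `[1]` returns.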

## 2. Cointegration

## 3. Stationarity

## 4. Smoothing (kernel transformations)
The purpose of smoothing is to remove extreme points, zeros and other shifts that might introduce unnecessary noise in the model.
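
One way to illustrate this is with a kernel regression smoother from base R. The following is a minimal sketch, not the smoothing procedure of this project: the series is simulated, a zero and a spike are injected by hand, and the Gaussian kernel with `bandwidth = 7` is an arbitrary choice.

```r
# Simulated mobility-like index around 100 (an assumption for illustration)
set.seed(1)
t_idx  <- 1:100
signal <- 100 + 10 * sin(t_idx / 10)
noisy  <- signal + rnorm(100, sd = 1)

# Inject a spurious zero and an extreme point
noisy[20] <- 0
noisy[60] <- 250

# Nadaraya-Watson kernel smoother with a Gaussian kernel,
# evaluated at the original time points
smoothed <- ksmooth(t_idx, noisy, kernel = "normal",
                    bandwidth = 7, x.points = t_idx)$y

# Deviation from the underlying signal at the injected outliers, before and after
print(abs(noisy[c(20, 60)] - signal[c(20, 60)]))
print(abs(smoothed[c(20, 60)] - signal[c(20, 60)]))
```

A wider bandwidth suppresses outliers more strongly but also flattens genuine shifts in the series, so the value would need to be tuned against the data.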