# Granger Causality Test

(Sourced from: https://www.machinelearningplus.com/time-series/time-series-analysis-python/)

The Granger causality test is used to determine whether one time series is useful for forecasting another.

- It is based on the idea that if X causes Y, then the forecast of Y based on previous values of Y AND the previous values of X should outperform the forecast of Y based on previous values of Y alone.

- So Granger causality is not meant for testing whether a lag of Y causes Y, because the lags of Y already appear in both forecasts being compared. Instead, it is generally applied to other (exogenous) candidate predictors X.

- Its mathematical formulation is based on linear regression modeling of stochastic processes (Granger 1969). More complex extensions to nonlinear cases exist; however, these extensions are often more difficult to apply in practice.
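For a chosen lag order $p$, the linear formulation compares two regressions fitted on the same $T = n - p$ usable observations. This is a sketch of the standard setup; the symbols $\gamma_i$, $\alpha_i$, $\beta_i$, $T$ and $\mathrm{RSS}$ are introduced here for illustration. The restricted model uses only past Y, and the unrestricted model adds past X:

$$
\text{restricted:}\quad y_t = \gamma_0 + \sum_{i=1}^{p} \gamma_i\, y_{t-i} + u_t
\qquad
\text{unrestricted:}\quad y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i\, y_{t-i} + \sum_{i=1}^{p} \beta_i\, x_{t-i} + \varepsilon_t
$$

The null hypothesis "X does not Granger-cause Y" is $\beta_1 = \dots = \beta_p = 0$. It is tested with the residual sums of squares of the two fits:

$$
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/p}{\mathrm{RSS}_u/(T - 2p - 1)}
$$

Under the null and the usual regression assumptions, this is approximately $F(p,\ T - 2p - 1)$ distributed. The two degrees of freedom correspond to `df_num` and `df_denom` in the statsmodels output further down.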

It is nicely implemented in the `statsmodels` package as `statsmodels.tsa.stattools.grangercausalitytests`.

- It accepts a 2D array with 2 columns as the main argument. The series to be forecast (Y) goes in the first column, and the predictor (X) goes in the second column.

- **The null hypothesis is: the series in the second column does not Granger-cause the series in the first.** If the p-values are less than a chosen significance level (e.g. 0.05), you reject the null hypothesis. You then conclude that the lags of X, up to that lag order, are indeed useful.

- The second argument, `maxlag`, sets the largest lag order to test. The test is run for every lag order from 1 up to `maxlag`, and each run includes that many lags of both Y and X.
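To make the column convention and the per-lag-order runs concrete, here is a minimal NumPy sketch of the "ssr based F test" for one lag order `p`. The names `granger_f_test`, `lagged_columns` and `residual_ss` are hypothetical helpers written for illustration; they are not part of statsmodels.

The sketch makes several assumptions:
- The input is a 2-column array with Y first and X second.
- There are no missing values.
- A constant term is included (statsmodels' default is `addconst=True`).
- The data are synthetic, not the a10 dataset.

It should agree with the statsmodels F statistic up to floating-point error, but that is an expectation, not a tested guarantee. It returns only the F statistic and degrees of freedom. A p-value would come from the F distribution's survival function, e.g. `scipy.stats.f.sf`.

```python
import numpy as np


def lagged_columns(series, p):
    """Stack lags 1..p of `series`, aligned with the targets series[p:]."""
    n = len(series)
    # Column k holds series[t - (k + 1)] for t = p, ..., n - 1.
    return np.column_stack([series[p - k - 1 : n - k - 1] for k in range(p)])


def residual_ss(design, target):
    """Residual sum of squares of an OLS fit of `target` on `design`."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f_test(data, p):
    """Hypothetical helper: F test of 'column 1 does not Granger-cause column 0'.

    Returns (F, df_num, df_denom) for lag order p.
    """
    data = np.asarray(data, dtype=float)
    y, x = data[:, 0], data[:, 1]  # Y first, X second
    target = y[p:]                 # the first p rows lack a full lag history
    nobs = len(target)
    const = np.ones((nobs, 1))
    restricted = np.hstack([const, lagged_columns(y, p)])         # own lags only
    unrestricted = np.hstack([restricted, lagged_columns(x, p)])  # plus lags of X
    rss_r = residual_ss(restricted, target)
    rss_u = residual_ss(unrestricted, target)
    df_num, df_denom = p, nobs - 2 * p - 1
    f_stat = ((rss_r - rss_u) / df_num) / (rss_u / df_denom)
    return f_stat, df_num, df_denom


# Synthetic data (an assumption for illustration): y depends on the previous x.
rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

# Like maxlag=2: one test per lag order, 1 and 2.
for lag_order in (1, 2):
    print(lag_order, granger_f_test(np.column_stack([y, x]), lag_order))
```

Swapping the columns (`np.column_stack([x, y])`) tests the opposite direction. Here that should give a small F, because `x` is generated independently of past `y`.

With `n` rows, `df_denom` works out to `n - 3p - 1`. For `n = 204`, that gives 200 at lag 1 and 197 at lag 2, which matches `df_denom` in the notebook output.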

```python
# import libraries
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests
```



```python
# load data
url = "https://raw.githubusercontent.com/selva86/datasets/master/a10.csv"
df = pd.read_csv(url, parse_dates=["date"])
# add the calendar month (1-12) as the candidate predictor X
df["month"] = df["date"].dt.month
df
```

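|     | date       | value     | month |
|-----|------------|-----------|-------|
| 0   | 1991-07-01 | 3.526591  | 7     |
| 1   | 1991-08-01 | 3.180891  | 8     |
| 2   | 1991-09-01 | 3.252221  | 9     |
| 3   | 1991-10-01 | 3.611003  | 10    |
| 4   | 1991-11-01 | 3.565869  | 11    |
| ... | ...        | ...       | ...   |
| 199 | 2008-02-01 | 21.654285 | 2     |
| 200 | 2008-03-01 | 18.264945 | 3     |
| 201 | 2008-04-01 | 23.107677 | 4     |
| 202 | 2008-05-01 | 22.912510 | 5     |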


```python
# Granger Causality Test
grangercausalitytests(df.loc[:, ["value", "month"]], maxlag=2)
```


```
Granger Causality
number of lags (no zero) 1
ssr based F test:         F=54.7797 , p=0.0000  , df_denom=200, df_num=1
ssr based chi2 test:   chi2=55.6014 , p=0.0000  , df=1
likelihood ratio test: chi2=49.1426 , p=0.0000  , df=1
parameter F test:         F=54.7797 , p=0.0000  , df_denom=200, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=162.6989, p=0.0000  , df_denom=197, df_num=2
ssr based chi2 test:   chi2=333.6567, p=0.0000  , df=2
likelihood ratio test: chi2=196.9956, p=0.0000  , df=2
parameter F test:         F=162.6989, p=0.0000  , df_denom=197, df_num=2
```


```
{1: ({'lrtest': (49.14260233004984, 2.38014300604565e-12, 1),
   'params_ftest': (54.77967483557343, 3.661425871353366e-12, 200.0, 1.0),
   'ssr_chi2test': (55.60136995810727, 8.876175235021185e-14, 1),
   'ssr_ftest': (54.779674835573665, 3.661425871352945e-12, 200.0, 1)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7fefc53a76d0>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7fefc1997710>,
   array([[0., 1., 0.]])]),
 2: ({'lrtest': (196.9955927718221, 1.670900349911483e-43, 2),
   'params_ftest': (162.6989179987324, 1.9133235086856426e-42, 197.0, 2.0),
   'ssr_chi2test': (333.65666432227357, 3.5267600881278635e-73, 2),
   'ssr_ftest': (162.6989179987324, 1.9133235086856426e-42, 197.0, 2)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7fefb0bf0d10>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7fefc1997950>,
   array([[0., 0., 1., 0., 0.],
          [0., 0., 0., 1., 0.]])])}
```

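In the above case, the p-values of all four tests are far below 0.05 at both lag orders. They print as 0.0000 only because of rounding; the largest exact value in the returned dictionary is about 3.7e-12. So we reject the null hypothesis and conclude that `month` can indeed be used to forecast `value`, the monthly anti-diabetic drug sales series in this dataset.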
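Rather than reading rounded values off the printout, you can pull the exact p-values from the returned dictionary and apply the 0.05 rule in code. The sketch below assumes the return structure shown in the output above: a dict keyed by lag order, with values of the form `(tests_dict, [restricted_fit, unrestricted_fit, restriction_matrix])`, where each test entry is a tuple whose second element is the p-value. `granger_pvalues` and `lags_rejecting_null` are hypothetical helper names, not statsmodels functions.

```python
def granger_pvalues(results, test="ssr_ftest"):
    """Map each lag order to the p-value of one test in grangercausalitytests' output.

    Assumes results == {lag: (tests_dict, [restricted, unrestricted, R])}
    and that tests_dict[test][1] is the p-value.
    """
    return {lag: tests[test][1] for lag, (tests, _) in results.items()}


def lags_rejecting_null(results, alpha=0.05, test="ssr_ftest"):
    """Lag orders at which 'X does not Granger-cause Y' is rejected at level alpha."""
    return sorted(lag for lag, p in granger_pvalues(results, test).items() if p < alpha)


# Usage with the notebook's data (needs statsmodels and the df loaded earlier):
# results = grangercausalitytests(df.loc[:, ["value", "month"]], maxlag=2)
# granger_pvalues(results)      # exact p-values instead of the rounded printout
# lags_rejecting_null(results)  # lag orders whose p-value is below 0.05
```

Passing `test="lrtest"`, `"ssr_chi2test"` or `"params_ftest"` selects one of the other keys shown in the output.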