### Time series analyses IIb (2014)
Testing for Granger causality.

**References**: 
- https://stats.stackexchange.com/questions/160278/testing-for-granger-causality
- https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.grangercausalitytests.html
- https://www.machinelearningplus.com/time-series/time-series-analysis-python/
- https://www.int-res.com/abstracts/meps/v318/p187-201/

I'm testing whether lags of each Group B variable are significant predictors of each variable in Group A:
1. nsmz (small zooplankton)
2. nmdz (medium zooplankton)
3. nlgz (large zooplankton)

...and Group B: 
1. no3 (nitrate)
2. po4 (phosphate)
3. sio4 (silicate)
4. nsm (small phytoplankton)
5. nlg (large phytoplankton)

The data loaded here come from the semi-daily output of COBALT. Lags of up to 10 time steps (5 days) are tested for each pair. The data are stored in 2-D array format.
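
Because the output is semi-daily, each lag step corresponds to half a day, so lag 10 spans 5 days. A minimal sketch of that conversion follows; the names `STEP_DAYS` and `lag_to_days` are illustrative and do not appear elsewhere in this notebook.

```python
# Illustrative helper: convert a lag index into days for semi-daily data.
STEP_DAYS = 0.5  # one time step = half a day (semi-daily output)

def lag_to_days(lag):
    """Return the time span, in days, covered by `lag` time steps."""
    return lag * STEP_DAYS

# Lags 1..10 map onto 0.5..5.0 days.
lag_days = {lag: lag_to_days(lag) for lag in range(1, 11)}
print(lag_days)
```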

#### Import modules

In [1]:
from statsmodels.tsa.stattools import grangercausalitytests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from dateutil import rrule, parser
from statsmodels.tsa.stattools import adfuller
from random import sample 
%matplotlib inline

#### Import CSVs

In [2]:
%cd /home/lindsay/hioekg-compare-years/

nsmz_df_2014=pd.read_csv('nsmz_semidaily_df_2014.csv')
nmdz_df_2014=pd.read_csv('nmdz_semidaily_df_2014.csv')
nlgz_df_2014=pd.read_csv('nlgz_semidaily_df_2014.csv')
nsm_df_2014=pd.read_csv('nsm_semidaily_df_2014.csv')
nlg_df_2014=pd.read_csv('nlg_semidaily_df_2014.csv')
no3_df_2014=pd.read_csv('no3_semidaily_df_2014.csv')
po4_df_2014=pd.read_csv('po4_semidaily_df_2014.csv')
sio4_df_2014=pd.read_csv('sio4_semidaily_df_2014.csv')

/home/lindsay/hioekg-compare-years


All series are stationary based on the tests conducted in the IIa notebook.

#### Extract the concentration series for each predictor

In [3]:
no32014 = no3_df_2014.concentration
po42014 = po4_df_2014.concentration
sio42014 = sio4_df_2014.concentration
nsm2014 = nsm_df_2014.concentration
nlg2014 = nlg_df_2014.concentration

### nsmz GC tests: 2014

In [4]:
df = nsmz_df_2014.concentration

# Collect each predictor column (positional index 3) alongside the nsmz series
no3concat2014 = [no3_df_2014.iloc[:,3],df]
po4concat2014 = [po4_df_2014.iloc[:,3],df]
sio4concat2014 = [sio4_df_2014.iloc[:,3],df]
nsmconcat2014 = [nsm_df_2014.iloc[:,3],df]
nlgconcat2014 = [nlg_df_2014.iloc[:,3],df]

# Join each pair column-wise into a two-column DataFrame
no32014 = pd.concat(no3concat2014, axis=1)
po42014 = pd.concat(po4concat2014, axis=1)
sio42014 = pd.concat(sio4concat2014, axis=1)
nsm2014 = pd.concat(nsmconcat2014, axis=1)
nlg2014 = pd.concat(nlgconcat2014, axis=1)

# Replace all negative values with NaNs
no32014=no32014.assign(concentration = no32014.concentration.where(no32014.concentration.ge(0)))
po42014=po42014.assign(concentration = po42014.concentration.where(po42014.concentration.ge(0)))
sio42014=sio42014.assign(concentration = sio42014.concentration.where(sio42014.concentration.ge(0)))
nsm2014=nsm2014.assign(concentration = nsm2014.concentration.where(nsm2014.concentration.ge(0)))
nlg2014=nlg2014.assign(concentration = nlg2014.concentration.where(nlg2014.concentration.ge(0)))

# Rename cols
no32014.columns = ['no3','nsmz']
po42014.columns = ['po4','nsmz']
sio42014.columns = ['sio4','nsmz']
nsm2014.columns = ['nsm','nsmz']
nlg2014.columns = ['nlg','nsmz']

#### Group into half-day increments
Each half-day frame contains 10076 rows, so consecutive blocks of 10076 rows are averaged to give one mean per half day.

In [5]:
# Group by half day mean increments
no32014=no32014.groupby(np.arange(len(no32014.index))//10076).mean()
po42014=po42014.groupby(np.arange(len(po42014.index))//10076).mean()
sio42014=sio42014.groupby(np.arange(len(sio42014.index))//10076).mean()
nsm2014=nsm2014.groupby(np.arange(len(nsm2014.index))//10076).mean()
nlg2014=nlg2014.groupby(np.arange(len(nlg2014.index))//10076).mean()
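
The grouping above relies on integer division of the row position to label blocks. Here is a toy sketch of the same idea, using a block length of 3 instead of 10076; the frame `toy` and the constant `BLOCK` are made up for illustration.

```python
import numpy as np
import pandas as pd

BLOCK = 3  # toy block length; the notebook uses 10076 rows per half day

# Two "half days" of three rows each, with one missing value.
toy = pd.DataFrame({"no3": [1.0, 2.0, 3.0, 10.0, np.nan, 30.0],
                    "nsmz": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})

# Integer division turns row positions 0..5 into block labels 0,0,0,1,1,1.
block_id = np.arange(len(toy)) // BLOCK

# mean() skips NaNs by default, so block 1 of no3 averages only 10 and 30.
block_means = toy.groupby(block_id).mean()
print(block_means)
```

Since `mean()` ignores NaNs within a block, a block mean is NaN only when every value in that block is missing.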

In [6]:
# Drop any remaining NaNs
no32014=no32014.dropna()
po42014=po42014.dropna()
sio42014=sio42014.dropna()
nsm2014=nsm2014.dropna()
nlg2014=nlg2014.dropna()

In [7]:
grangercausalitytests(no32014[['nsmz', 'no3']], maxlag=10)


Granger Causality
number of lags (no zero) 1
ssr based F test:         F=0.9214  , p=0.3374  , df_denom=716, df_num=1
ssr based chi2 test:   chi2=0.9252  , p=0.3361  , df=1
likelihood ratio test: chi2=0.9246  , p=0.3363  , df=1
parameter F test:         F=0.9214  , p=0.3374  , df_denom=716, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=0.2545  , p=0.7753  , df_denom=713, df_num=2
ssr based chi2 test:   chi2=0.5127  , p=0.7739  , df=2
likelihood ratio test: chi2=0.5125  , p=0.7740  , df=2
parameter F test:         F=0.2545  , p=0.7753  , df_denom=713, df_num=2

Granger Causality
number of lags (no zero) 3
ssr based F test:         F=1.5002  , p=0.2132  , df_denom=710, df_num=3
ssr based chi2 test:   chi2=4.5450  , p=0.2083  , df=3
likelihood ratio test: chi2=4.5306  , p=0.2096  , df=3
parameter F test:         F=1.5002  , p=0.2132  , df_denom=710, df_num=3

Granger Causality
number of lags (no zero) 4
ssr based F test:         F=0.7641  , p=0.5488  

{1: ({'ssr_ftest': (0.9213666246689326, 0.3374404184817288, 716.0, 1),
   'ssr_chi2test': (0.9252270993532995, 0.3361062392385677, 1),
   'lrtest': (0.9246323069273785, 0.33626161211633565, 1),
   'params_ftest': (0.9213666246689387, 0.3374404184817288, 716.0, 1.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2abcb00>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2abcf60>,
   array([[0., 1., 0.]])]),
 2: ({'ssr_ftest': (0.2545423458740267, 0.7753416430619682, 713.0, 2),
   'ssr_chi2test': (0.512654710624267, 0.7738885907810003, 2),
   'lrtest': (0.51247177899495, 0.7739593783686839, 2),
   'params_ftest': (0.2545423451959132, 0.7753416435873771, 713.0, 2.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbe037eac8>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2a2a358>,
   array([[0., 0., 1., 0., 0.],
          [0., 0., 0., 1., 0.]])]),
 3: ({'ssr_ftest': (1.5002033647
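
The returned dictionary is hard to read at a glance. One possible way to pull out a single p-value per lag is sketched below. `pvalues_by_lag` is a hypothetical helper, and `example` is a hand-built stand-in that copies the first two lags printed above (rounded), with the fitted-model list replaced by an empty placeholder.

```python
def pvalues_by_lag(results, test="ssr_ftest"):
    """Map each lag to the p-value of one test from a grangercausalitytests result."""
    # results[lag] is (tests_dict, fitted_models);
    # tests_dict[test] is (statistic, p-value, ...).
    return {lag: tests[test][1] for lag, (tests, _models) in results.items()}

# Stand-in for the no3 -> nsmz result, lags 1 and 2 only.
example = {
    1: ({"ssr_ftest": (0.9214, 0.3374, 716.0, 1),
         "ssr_chi2test": (0.9252, 0.3361, 1)}, []),
    2: ({"ssr_ftest": (0.2545, 0.7753, 713.0, 2),
         "ssr_chi2test": (0.5127, 0.7739, 2)}, []),
}

pvals = pvalues_by_lag(example)
print(pvals)
```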

In [8]:
grangercausalitytests(po42014[['nsmz', 'po4']], maxlag=10)


Granger Causality
number of lags (no zero) 1
ssr based F test:         F=4.1833  , p=0.0412  , df_denom=716, df_num=1
ssr based chi2 test:   chi2=4.2009  , p=0.0404  , df=1
likelihood ratio test: chi2=4.1886  , p=0.0407  , df=1
parameter F test:         F=4.1833  , p=0.0412  , df_denom=716, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=2.9122  , p=0.0550  , df_denom=713, df_num=2
ssr based chi2 test:   chi2=5.8652  , p=0.0533  , df=2
likelihood ratio test: chi2=5.8413  , p=0.0539  , df=2
parameter F test:         F=2.9122  , p=0.0550  , df_denom=713, df_num=2

Granger Causality
number of lags (no zero) 3
ssr based F test:         F=4.4186  , p=0.0043  , df_denom=710, df_num=3
ssr based chi2 test:   chi2=13.3865 , p=0.0039  , df=3
likelihood ratio test: chi2=13.2630 , p=0.0041  , df=3
parameter F test:         F=4.4186  , p=0.0043  , df_denom=710, df_num=3

Granger Causality
number of lags (no zero) 4
ssr based F test:         F=3.0377  , p=0.0169  

{1: ({'ssr_ftest': (4.183327018306233, 0.04118848678550117, 716.0, 1),
   'ssr_chi2test': (4.200854924807516, 0.04040360516286974, 1),
   'lrtest': (4.188630484932219, 0.0406959602665086, 1),
   'params_ftest': (4.1833270183062075, 0.041188486785500786, 716.0, 1.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2ab6438>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2a015f8>,
   array([[0., 1., 0.]])]),
 2: ({'ssr_ftest': (2.9121601178681185, 0.055005057618120086, 713.0, 2),
   'ssr_chi2test': (5.865163996155145, 0.05325934493365864, 2),
   'lrtest': (5.841338124660979, 0.0538976143707038, 2),
   'params_ftest': (2.9121601121644796, 0.0550050579293044, 713.0, 2.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2a13048>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2a13128>,
   array([[0., 0., 1., 0., 0.],
          [0., 0., 0., 1., 0.]])]),
 3: ({'ssr_ftest': (4.418593

In [9]:
grangercausalitytests(sio42014[['nsmz', 'sio4']], maxlag=10)


Granger Causality
number of lags (no zero) 1
ssr based F test:         F=3.7014  , p=0.0548  , df_denom=716, df_num=1
ssr based chi2 test:   chi2=3.7169  , p=0.0539  , df=1
likelihood ratio test: chi2=3.7073  , p=0.0542  , df=1
parameter F test:         F=3.7014  , p=0.0548  , df_denom=716, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=2.7880  , p=0.0622  , df_denom=713, df_num=2
ssr based chi2 test:   chi2=5.6151  , p=0.0604  , df=2
likelihood ratio test: chi2=5.5933  , p=0.0610  , df=2
parameter F test:         F=2.7880  , p=0.0622  , df_denom=713, df_num=2

Granger Causality
number of lags (no zero) 3
ssr based F test:         F=3.6065  , p=0.0132  , df_denom=710, df_num=3
ssr based chi2 test:   chi2=10.9261 , p=0.0121  , df=3
likelihood ratio test: chi2=10.8437 , p=0.0126  , df=3
parameter F test:         F=3.6065  , p=0.0132  , df_denom=710, df_num=3

Granger Causality
number of lags (no zero) 4
ssr based F test:         F=2.8196  , p=0.0244  

{1: ({'ssr_ftest': (3.7013854439044938, 0.05476341629953282, 716.0, 1),
   'ssr_chi2test': (3.716894042133144, 0.053864480267573636, 1),
   'lrtest': (3.7073197217396228, 0.05417431831500656, 1),
   'params_ftest': (3.701385443904464, 0.054763416299538965, 716.0, 1.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2a17128>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2a1f320>,
   array([[0., 1., 0.]])]),
 2: ({'ssr_ftest': (2.7879965764376533, 0.062215484401581804, 713.0, 2),
   'ssr_chi2test': (5.61509548915073, 0.06035281158268692, 2),
   'lrtest': (5.593252959231904, 0.06101555302918831, 2),
   'params_ftest': (2.7879965727427756, 0.062215484629676895, 713.0, 2.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb29a4518>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb29a4940>,
   array([[0., 0., 1., 0., 0.],
          [0., 0., 0., 1., 0.]])]),
 3: ({'ssr_ftest': (3.6

In [10]:
grangercausalitytests(nsm2014[['nsmz', 'nsm']], maxlag=10)


Granger Causality
number of lags (no zero) 1
ssr based F test:         F=28.5502 , p=0.0000  , df_denom=716, df_num=1
ssr based chi2 test:   chi2=28.6698 , p=0.0000  , df=1
likelihood ratio test: chi2=28.1130 , p=0.0000  , df=1
parameter F test:         F=28.5502 , p=0.0000  , df_denom=716, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=14.4967 , p=0.0000  , df_denom=713, df_num=2
ssr based chi2 test:   chi2=29.1967 , p=0.0000  , df=2
likelihood ratio test: chi2=28.6186 , p=0.0000  , df=2
parameter F test:         F=14.4967 , p=0.0000  , df_denom=713, df_num=2

Granger Causality
number of lags (no zero) 3
ssr based F test:         F=6.2137  , p=0.0004  , df_denom=710, df_num=3
ssr based chi2 test:   chi2=18.8249 , p=0.0003  , df=3
likelihood ratio test: chi2=18.5820 , p=0.0003  , df=3
parameter F test:         F=6.2137  , p=0.0004  , df_denom=710, df_num=3

Granger Causality
number of lags (no zero) 4
ssr based F test:         F=4.8792  , p=0.0007  

{1: ({'ssr_ftest': (28.550183991100614, 1.2281773351248224e-07, 716.0, 1),
   'ssr_chi2test': (28.669807667040978, 8.58315445045704e-08, 1),
   'lrtest': (28.112964129566535, 1.1443671851130588e-07, 1),
   'params_ftest': (28.550183991100464, 1.2281773351249685e-07, 716.0, 1.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb29a41d0>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb29b36a0>,
   array([[0., 1., 0.]])]),
 2: ({'ssr_ftest': (14.49666704086768, 6.742424302843667e-07, 713.0, 2),
   'ssr_chi2test': (29.19665339507151, 4.5711689201886073e-07, 2),
   'lrtest': (28.618646398692363, 6.1029511642932e-07, 2),
   'params_ftest': (14.496667036970132, 6.742424328095763e-07, 713.0, 2.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb29b9898>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb29b9cc0>,
   array([[0., 0., 1., 0., 0.],
          [0., 0., 0., 1., 0.]])]),
 3: ({'

In [11]:
grangercausalitytests(nlg2014[['nsmz', 'nlg']], maxlag=10)


Granger Causality
number of lags (no zero) 1
ssr based F test:         F=0.7051  , p=0.4014  , df_denom=716, df_num=1
ssr based chi2 test:   chi2=0.7080  , p=0.4001  , df=1
likelihood ratio test: chi2=0.7077  , p=0.4002  , df=1
parameter F test:         F=0.7051  , p=0.4014  , df_denom=716, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=1.3475  , p=0.2606  , df_denom=713, df_num=2
ssr based chi2 test:   chi2=2.7139  , p=0.2574  , df=2
likelihood ratio test: chi2=2.7088  , p=0.2581  , df=2
parameter F test:         F=1.3475  , p=0.2606  , df_denom=713, df_num=2

Granger Causality
number of lags (no zero) 3
ssr based F test:         F=2.4295  , p=0.0642  , df_denom=710, df_num=3
ssr based chi2 test:   chi2=7.3603  , p=0.0613  , df=3
likelihood ratio test: chi2=7.3227  , p=0.0623  , df=3
parameter F test:         F=2.4295  , p=0.0642  , df_denom=710, df_num=3

Granger Causality
number of lags (no zero) 4
ssr based F test:         F=1.2109  , p=0.3048  

{1: ({'ssr_ftest': (0.7050871194276964, 0.4013598912415166, 716.0, 1),
   'ssr_chi2test': (0.7080413950677565, 0.400094789030334, 1),
   'lrtest': (0.707692998865241, 0.40021074581952687, 1),
   'params_ftest': (0.705087119427691, 0.4013598912415166, 716.0, 1.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb29c0828>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb29cb860>,
   array([[0., 1., 0.]])]),
 2: ({'ssr_ftest': (1.3474913324601796, 0.26055244297160257, 713.0, 2),
   'ssr_chi2test': (2.7138815615888046, 0.2574471607786841, 2),
   'lrtest': (2.708765512503305, 0.25810655995449056, 2),
   'params_ftest': (1.3474913332448135, 0.2605524427679356, 713.0, 2.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb29cbc18>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb29cbcf8>,
   array([[0., 0., 1., 0., 0.],
          [0., 0., 0., 1., 0.]])]),
 3: ({'ssr_ftest': (2.429468727

### nmdz GC tests: 2014

In [12]:
df = nmdz_df_2014.concentration

# Collect each predictor column (positional index 3) alongside the nmdz series
no3concat2014 = [no3_df_2014.iloc[:,3],df]
po4concat2014 = [po4_df_2014.iloc[:,3],df]
sio4concat2014 = [sio4_df_2014.iloc[:,3],df]
nsmconcat2014 = [nsm_df_2014.iloc[:,3],df]
nlgconcat2014 = [nlg_df_2014.iloc[:,3],df]

# Join each pair column-wise into a two-column DataFrame
no32014 = pd.concat(no3concat2014, axis=1)
po42014 = pd.concat(po4concat2014, axis=1)
sio42014 = pd.concat(sio4concat2014, axis=1)
nsm2014 = pd.concat(nsmconcat2014, axis=1)
nlg2014 = pd.concat(nlgconcat2014, axis=1)

# Replace all negative values with NaNs
no32014=no32014.assign(concentration = no32014.concentration.where(no32014.concentration.ge(0)))
po42014=po42014.assign(concentration = po42014.concentration.where(po42014.concentration.ge(0)))
sio42014=sio42014.assign(concentration = sio42014.concentration.where(sio42014.concentration.ge(0)))
nsm2014=nsm2014.assign(concentration = nsm2014.concentration.where(nsm2014.concentration.ge(0)))
nlg2014=nlg2014.assign(concentration = nlg2014.concentration.where(nlg2014.concentration.ge(0)))

# Rename cols
no32014.columns = ['no3','nmdz']
po42014.columns = ['po4','nmdz']
sio42014.columns = ['sio4','nmdz']
nsm2014.columns = ['nsm','nmdz']
nlg2014.columns = ['nlg','nmdz']

# Group by half day mean increments
no32014=no32014.groupby(np.arange(len(no32014.index))//10076).mean()
po42014=po42014.groupby(np.arange(len(po42014.index))//10076).mean()
sio42014=sio42014.groupby(np.arange(len(sio42014.index))//10076).mean()
nsm2014=nsm2014.groupby(np.arange(len(nsm2014.index))//10076).mean()
nlg2014=nlg2014.groupby(np.arange(len(nlg2014.index))//10076).mean()

# Drop any remaining NaNs
no32014=no32014.dropna()
po42014=po42014.dropna()
sio42014=sio42014.dropna()
nsm2014=nsm2014.dropna()
nlg2014=nlg2014.dropna()
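
The preparation steps above are repeated almost verbatim for each zooplankton group. As a sketch only, they could be wrapped in one function. `prepare_pair` is a hypothetical helper, not part of this notebook, and it makes two assumptions: negative values are masked in both columns, and both inputs are re-indexed from 0 before pairing.

```python
import numpy as np
import pandas as pd

def prepare_pair(predictor, response, names, block=10076):
    """Pair two concentration series, mask negatives, and average per block."""
    # Assumption: align the two series by position, not by their original index.
    pair = pd.concat([predictor.reset_index(drop=True),
                      response.reset_index(drop=True)], axis=1)
    pair.columns = names              # e.g. ['no3', 'nmdz']
    # Assumption: negative values become NaN in BOTH columns.
    pair = pair.where(pair.ge(0))
    block_id = np.arange(len(pair)) // block
    # Block means, then drop blocks that are still NaN in either column.
    return pair.groupby(block_id).mean().dropna()

# Toy call with a block length of 2 instead of 10076.
pred = pd.Series([1.0, 3.0, -1.0, 5.0])
resp = pd.Series([0.2, 0.4, 0.6, 0.8])
out = prepare_pair(pred, resp, ["no3", "nmdz"], block=2)
print(out)
```

With such a helper, each pair could be built in one line, for example `prepare_pair(no3_df_2014.iloc[:, 3], nmdz_df_2014.concentration, ['no3', 'nmdz'])`.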

In [17]:
#grangercausalitytests(no32014[['nmdz', 'no3']], maxlag=10)
#grangercausalitytests(po42014[['nmdz', 'po4']], maxlag=10)
#grangercausalitytests(sio42014[['nmdz', 'sio4']], maxlag=10)
#grangercausalitytests(nsm2014[['nmdz', 'nsm']], maxlag=10)
grangercausalitytests(nlg2014[['nmdz', 'nlg']], maxlag=10)


Granger Causality
number of lags (no zero) 1
ssr based F test:         F=25.1682 , p=0.0000  , df_denom=716, df_num=1
ssr based chi2 test:   chi2=25.2736 , p=0.0000  , df=1
likelihood ratio test: chi2=24.8396 , p=0.0000  , df=1
parameter F test:         F=25.1682 , p=0.0000  , df_denom=716, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=29.5361 , p=0.0000  , df_denom=713, df_num=2
ssr based chi2 test:   chi2=59.4864 , p=0.0000  , df=2
likelihood ratio test: chi2=57.1504 , p=0.0000  , df=2
parameter F test:         F=29.5361 , p=0.0000  , df_denom=713, df_num=2

Granger Causality
number of lags (no zero) 3
ssr based F test:         F=13.0452 , p=0.0000  , df_denom=710, df_num=3
ssr based chi2 test:   chi2=39.5214 , p=0.0000  , df=3
likelihood ratio test: chi2=38.4706 , p=0.0000  , df=3
parameter F test:         F=13.0452 , p=0.0000  , df_denom=710, df_num=3

Granger Causality
number of lags (no zero) 4
ssr based F test:         F=7.2720  , p=0.0000  

{1: ({'ssr_ftest': (25.16817888385056, 6.633682352264831e-07, 716.0, 1),
   'ssr_chi2test': (25.27363214733038, 4.974630228074723e-07, 1),
   'lrtest': (24.839576746104285, 6.230502984590504e-07, 1),
   'params_ftest': (25.16817888385064, 6.633682352264831e-07, 716.0, 1.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2932d68>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2949a90>,
   array([[0., 1., 0.]])]),
 2: ({'ssr_ftest': (29.536092263560157, 4.746503036679404e-13, 713.0, 2),
   'ssr_chi2test': (59.486435470508255, 1.2097194505413717e-13, 2),
   'lrtest': (57.15037927218145, 3.890028845240831e-13, 2),
   'params_ftest': (29.536092187660074, 4.74650336937542e-13, 713.0, 2.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2949e80>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb2949da0>,
   array([[0., 0., 1., 0., 0.],
          [0., 0., 0., 1., 0.]])]),
 3: ({'ssr

### nlgz GC tests: 2014

In [18]:
df = nlgz_df_2014.concentration

# Collect each predictor column (positional index 3) alongside the nlgz series
no3concat2014 = [no3_df_2014.iloc[:,3],df]
po4concat2014 = [po4_df_2014.iloc[:,3],df]
sio4concat2014 = [sio4_df_2014.iloc[:,3],df]
nsmconcat2014 = [nsm_df_2014.iloc[:,3],df]
nlgconcat2014 = [nlg_df_2014.iloc[:,3],df]

# Join each pair column-wise into a two-column DataFrame
no32014 = pd.concat(no3concat2014, axis=1)
po42014 = pd.concat(po4concat2014, axis=1)
sio42014 = pd.concat(sio4concat2014, axis=1)
nsm2014 = pd.concat(nsmconcat2014, axis=1)
nlg2014 = pd.concat(nlgconcat2014, axis=1)

# Replace all negative values with NaNs
no32014=no32014.assign(concentration = no32014.concentration.where(no32014.concentration.ge(0)))
po42014=po42014.assign(concentration = po42014.concentration.where(po42014.concentration.ge(0)))
sio42014=sio42014.assign(concentration = sio42014.concentration.where(sio42014.concentration.ge(0)))
nsm2014=nsm2014.assign(concentration = nsm2014.concentration.where(nsm2014.concentration.ge(0)))
nlg2014=nlg2014.assign(concentration = nlg2014.concentration.where(nlg2014.concentration.ge(0)))

# Rename cols
no32014.columns = ['no3','nlgz']
po42014.columns = ['po4','nlgz']
sio42014.columns = ['sio4','nlgz']
nsm2014.columns = ['nsm','nlgz']
nlg2014.columns = ['nlg','nlgz']

# Group by half day mean increments
no32014=no32014.groupby(np.arange(len(no32014.index))//10076).mean()
po42014=po42014.groupby(np.arange(len(po42014.index))//10076).mean()
sio42014=sio42014.groupby(np.arange(len(sio42014.index))//10076).mean()
nsm2014=nsm2014.groupby(np.arange(len(nsm2014.index))//10076).mean()
nlg2014=nlg2014.groupby(np.arange(len(nlg2014.index))//10076).mean()

# Drop any remaining NaNs
no32014=no32014.dropna()
po42014=po42014.dropna()
sio42014=sio42014.dropna()
nsm2014=nsm2014.dropna()
nlg2014=nlg2014.dropna()

In [23]:
#grangercausalitytests(no32014[['nlgz', 'no3']], maxlag=10)
#grangercausalitytests(po42014[['nlgz', 'po4']], maxlag=10)
#grangercausalitytests(sio42014[['nlgz', 'sio4']], maxlag=10)
#grangercausalitytests(nsm2014[['nlgz', 'nsm']], maxlag=10)
grangercausalitytests(nlg2014[['nlgz', 'nlg']], maxlag=10)


Granger Causality
number of lags (no zero) 1
ssr based F test:         F=14.4349 , p=0.0002  , df_denom=716, df_num=1
ssr based chi2 test:   chi2=14.4953 , p=0.0001  , df=1
likelihood ratio test: chi2=14.3512 , p=0.0002  , df=1
parameter F test:         F=14.4349 , p=0.0002  , df_denom=716, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=13.1909 , p=0.0000  , df_denom=713, df_num=2
ssr based chi2 test:   chi2=26.5668 , p=0.0000  , df=2
likelihood ratio test: chi2=26.0871 , p=0.0000  , df=2
parameter F test:         F=13.1909 , p=0.0000  , df_denom=713, df_num=2

Granger Causality
number of lags (no zero) 3
ssr based F test:         F=2.2645  , p=0.0797  , df_denom=710, df_num=3
ssr based chi2 test:   chi2=6.8605  , p=0.0765  , df=3
likelihood ratio test: chi2=6.8279  , p=0.0776  , df=3
parameter F test:         F=2.2645  , p=0.0797  , df_denom=710, df_num=3

Granger Causality
number of lags (no zero) 4
ssr based F test:         F=1.8936  , p=0.1097  

{1: ({'ssr_ftest': (14.434852660853533, 0.00015741993719896834, 716.0, 1),
   'ssr_chi2test': (14.495333887086161, 0.0001405071492081263, 1),
   'lrtest': (14.351152574123262, 0.0001516868077912555, 1),
   'params_ftest': (14.434852660852929, 0.00015741993719901477, 716.0, 1.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb28ad860>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb28bacc0>,
   array([[0., 1., 0.]])]),
 2: ({'ssr_ftest': (13.190883612230545, 2.369805434090028e-06, 713.0, 2),
   'ssr_chi2test': (26.56677260471678, 1.702545110998454e-06, 2),
   'lrtest': (26.087070237539592, 2.164036957562551e-06, 2),
   'params_ftest': (13.190882665789314, 2.369807596944561e-06, 713.0, 2.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb28bacf8>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ffbb28c3080>,
   array([[0., 0., 1., 0., 0.],
          [0., 0., 0., 1., 0.]])]),
 3: (

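### Conclusions (for 2014):
Here "significant" means that lags of the listed variable Granger-cause the zooplankton group at p < 0.05. Lags are reported in days (1 lag = 0.5 days).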
1. **nsmz**:
    - **no3** is not significant
    - **po4** is significant at 0.5 days (p = 0.041) and between 1.5-3 days
    - **sio4** is significant between 1.5-2.5 days
    - **nsm** is significant up to 5 days
    - **nlg** is not significant
    
2. **nmdz**:
    - **no3** is significant at 1 day 
    - **po4** is significant at 1 day 
    - **sio4** is significant at 1 day 
    - **nsm** is significant up to 1.5 days
    - **nlg** is significant up to 5 days
3. **nlgz**: 
    - **no3** is not significant
    - **po4** is significant at lags longer than 1 day
    - **sio4** is significant at lags longer than 1 day
    - **nsm** is significant up to 5 days (though at 3 days the p-values slightly exceed 0.05)
    - **nlg** is significant up to 1 day and at 4.5-5 days
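
The day ranges in this list come from reading off which lags have p < 0.05 and converting lags to days. A small sketch of that bookkeeping follows. `significant_days` is a hypothetical helper, and the p-values are the ssr-based F-test values for sio4 and nsmz at lags 1-4 from the printed output earlier in this notebook; lags 5-10 are cut off there and so are left out.

```python
STEP_DAYS = 0.5   # semi-daily data: one lag = half a day
ALPHA = 0.05      # threshold used in the conclusions

def significant_days(pvalues, alpha=ALPHA, step=STEP_DAYS):
    """Return the lags (in days) whose p-value is below alpha, in lag order."""
    return [lag * step for lag, p in sorted(pvalues.items()) if p < alpha]

# sio4 -> nsmz, ssr-based F test, lags 1-4 only.
sio4_nsmz = {1: 0.0548, 2: 0.0622, 3: 0.0132, 4: 0.0244}

print(significant_days(sio4_nsmz))  # -> [1.5, 2.0]
```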

### Conclusions (for 2013):
1. **nsmz**:
    - **no3** is significant up to 3.5 days
    - **po4** is significant up to 5 days
    - **sio4** is significant up to 4.5 days
    - **nsm** is significant up to 5 days
    - **nlg** is significant between 1.5-5 days
    
2. **nmdz**:
    - **no3** is significant up to 0.5 days
    - **po4** is significant up to 1 day
    - **sio4** is significant up to 0.5 days
    - **nsm** is significant up to 3 days
    - **nlg** is significant up to 5 days
3. **nlgz**: 
    - **no3** is significant up to 1 day
    - **po4** is significant at lags longer than 1 day (though at 2.5 and 4 days the p-values exceed the 0.05 threshold)
    - **sio4** is not significant
    - **nsm** is significant up to 5 days (though at 4 days the p-values exceed the 0.05 threshold)
    - **nlg** is significant up to 5 days