.. module:: statsmodels.tsa.vector_ar.var_model
:synopsis: Vector autoregressions


.. currentmodule:: statsmodels.tsa.vector_ar.var_model



## Vector Autoregressions :mod:`tsa.vector_ar`

:mod:`statsmodels.tsa.vector_ar` contains methods that are useful for simultaneously modeling and analyzing multiple time series using :ref:`Vector Autoregressions (VAR) <var>` and :ref:`Vector Error Correction Models (VECM) <vecm>`.

### VAR(p) processes

We are interested in modeling a :math:`T \times K` multivariate time series :math:`Y`, where :math:`T` denotes the number of observations and :math:`K` the number of variables. One way of estimating relationships between the time series and their lagged values is the vector autoregression process:

.. math::

   Y_t = A_1 Y_{t-1} + \ldots + A_p Y_{t-p} + u_t

   u_t \sim {\sf Normal}(0, \Sigma_u)

where :math:`A_i` is a :math:`K \times K` coefficient matrix.
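To make the definition concrete, such a process can be simulated directly from the recursion above. This is only an illustrative numpy sketch; the coefficient matrices and error covariance are hypothetical values chosen to give a stable VAR(2):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coefficients for a stable VAR(2) with K = 2 variables
A1 = np.array([[0.5, 0.1],
               [0.4, 0.5]])
A2 = np.array([[0.0, 0.0],
               [0.25, 0.0]])
sigma_u = np.array([[0.09, 0.0],
                    [0.0, 0.04]])   # error covariance Sigma_u

T, K, p = 500, 2, 2
Y = np.zeros((T + p, K))            # first p rows serve as zero initial values
for t in range(p, T + p):
    u_t = rng.multivariate_normal(np.zeros(K), sigma_u)
    Y[t] = A1 @ Y[t - 1] + A2 @ Y[t - 2] + u_t
Y = Y[p:]                           # discard the initial values
```

The resulting array has the :math:`T \times K` shape used throughout this section.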

We follow in large part the methods and notation of Lütkepohl (2005), which we will not develop here.

#### Model fitting

.. note::

   The classes referenced below are accessible via the :mod:`statsmodels.tsa.api` module.

To estimate a VAR model, one must first create the model using an ndarray of homogeneous or structured dtype. When using a structured or record array, the class will use the passed variable names. Otherwise they can be passed explicitly:

.. ipython:: python
   :suppress:

   import pandas as pd
   pd.options.display.max_rows = 10
   import matplotlib
   import matplotlib.pyplot as plt
   matplotlib.style.use('ggplot')


.. ipython:: python
   :okwarning:

   # some example data
   import numpy as np
   import pandas
   import statsmodels.api as sm
   from statsmodels.tsa.api import VAR, DynamicVAR

   mdata = sm.datasets.macrodata.load_pandas().data

   # prepare the dates index
   dates = mdata[['year', 'quarter']].astype(int).astype(str)
   quarterly = dates["year"] + "Q" + dates["quarter"]
   from statsmodels.tsa.base.datetools import dates_from_str
   quarterly = dates_from_str(quarterly)

   mdata = mdata[['realgdp','realcons','realinv']]
   mdata.index = pandas.DatetimeIndex(quarterly)
   data = np.log(mdata).diff().dropna()

   # make a VAR model
   model = VAR(data)



.. note::

   The :class:`VAR` class assumes that the passed time series are stationary. Non-stationary or trending data can often be transformed to be stationary by first-differencing or some other method. For direct analysis of non-stationary time series, a standard stable VAR(p) model is not appropriate.

To actually do the estimation, call the fit method with the desired lag order. Or you can have the model select a lag order based on a standard information criterion (see below):

.. ipython:: python

   results = model.fit(2)
   results.summary()



Several ways to visualize the data using matplotlib are available.

Plotting input time series:

.. ipython:: python

   @savefig var_plot_input.png
   results.plot()



Plotting time series autocorrelation function:

.. ipython:: python

   @savefig var_plot_acorr.png
   results.plot_acorr()



#### Lag order selection

Choice of lag order can be a difficult problem. Standard analysis employs likelihood ratio tests or information criteria-based order selection. We have implemented the latter, accessible through the :class:`VAR` class:

.. ipython:: python

   model.select_order(15)



When calling the fit function, one can pass a maximum number of lags and the order criterion to use for order selection:

.. ipython:: python

   results = model.fit(maxlags=15, ic='aic')



#### Forecasting

The linear predictor is the optimal h-step ahead forecast in terms of mean-squared error:

.. math::

   y_t(h) = \nu + A_1 y_t(h - 1) + \cdots + A_p y_t(h - p)
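As a sketch of what this recursion does (with hypothetical, not estimated, coefficients), the forecasts can be produced in a few lines of numpy; forecasts beyond one step feed earlier forecasts back into the recursion:

```python
import numpy as np

def var_forecast(y, coefs, intercept, steps):
    """h-step forecasts via y_t(h) = nu + A_1 y_t(h-1) + ... + A_p y_t(h-p)."""
    p = len(coefs)
    hist = [row for row in y[-p:]]       # last p observations, oldest first
    forecasts = []
    for _ in range(steps):
        f = intercept + sum(A @ hist[-i] for i, A in enumerate(coefs, start=1))
        forecasts.append(f)
        hist.append(f)                   # forecasts re-enter the recursion
    return np.array(forecasts)

# hypothetical stable VAR(1)
A1 = np.array([[0.5, 0.1],
               [0.4, 0.5]])
nu = np.array([0.2, 0.3])
fc = var_forecast(np.zeros((1, 2)), [A1], nu, steps=50)
```

For a stable process these forecasts converge to the unconditional mean :math:`(I - A_1)^{-1} \nu` as the horizon grows.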


We can use the forecast function to produce this forecast. Note that we have to specify the "initial value" for the forecast:

.. ipython:: python

   lag_order = results.k_ar
   results.forecast(data.values[-lag_order:], 5)



The forecast_interval function will produce the above forecast along with asymptotic standard errors. These can be visualized using the plot_forecast function:

.. ipython:: python

   @savefig var_forecast.png
   results.plot_forecast(10)



#### Class Reference

.. module:: statsmodels.tsa.vector_ar
   :synopsis: Vector autoregressions and related tools

.. currentmodule:: statsmodels.tsa.vector_ar

.. autosummary::
   :toctree: generated/

   var_model.VAR
   var_model.VARProcess
   var_model.VARResults



#### Post-estimation Analysis

Several process properties and additional results after estimation are available for vector autoregressive processes.

.. autosummary::
   :toctree: generated/

   var_model.LagOrderResults
   hypothesis_test_results.HypothesisTestResults
   hypothesis_test_results.NormalityTestResults
   hypothesis_test_results.WhitenessTestResults



### Impulse Response Analysis

Impulse responses are of interest in econometric studies: they are the estimated responses to a unit impulse in one of the variables. They are computed in practice using the :math:`MA(\infty)` representation of the VAR(p) process:

.. math::

   Y_t = \mu + \sum_{i=0}^\infty \Phi_i u_{t-i}
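The MA coefficient matrices satisfy the standard recursion :math:`\Phi_0 = I_K`, :math:`\Phi_i = \sum_{j=1}^{\min(i,p)} \Phi_{i-j} A_j`. A minimal numpy sketch of that recursion (statsmodels computes these matrices for you; the `ma_rep` method on the results object exposes them):

```python
import numpy as np

def ma_rep(coefs, maxn):
    """MA(infinity) matrices: Phi_0 = I, Phi_i = sum_j Phi_{i-j} A_j."""
    K = coefs[0].shape[0]
    p = len(coefs)
    phis = [np.eye(K)]
    for i in range(1, maxn + 1):
        phi = sum(phis[i - j] @ coefs[j - 1] for j in range(1, min(i, p) + 1))
        phis.append(phi)
    return phis

# for a VAR(1) the recursion collapses to Phi_i = A^i (coefficients hypothetical)
A1 = np.array([[0.5, 0.1],
               [0.4, 0.5]])
phis = ma_rep([A1], maxn=5)
```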


We can perform an impulse response analysis by calling the irf function on a VARResults object:

.. ipython:: python

   irf = results.irf(10)



These can be visualized using the plot function, in either orthogonalized or non-orthogonalized form. Asymptotic standard errors are plotted by default at the 95% confidence level, which can be modified by the user.

.. note::

   Orthogonalization is done using the Cholesky decomposition of the estimated error covariance matrix :math:`\hat \Sigma_u` and hence interpretations may change depending on variable ordering.
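The mechanics of the orthogonalization are easy to sketch in numpy: with :math:`P` the lower-triangular Cholesky factor of :math:`\Sigma_u`, the transformed shocks :math:`P^{-1} u_t` have identity covariance, and the orthogonalized responses are :math:`\Theta_i = \Phi_i P`. The covariance matrix below is hypothetical:

```python
import numpy as np

sigma_u = np.array([[4.0, 1.0],
                    [1.0, 2.0]])      # hypothetical residual covariance
P = np.linalg.cholesky(sigma_u)       # lower triangular, sigma_u = P @ P.T

# shocks w_t = P^{-1} u_t are uncorrelated with unit variance
P_inv = np.linalg.inv(P)
cov_w = P_inv @ sigma_u @ P_inv.T     # equals the identity matrix

# reordering the variables changes the Cholesky factor, hence the interpretation
perm = np.array([[0.0, 1.0], [1.0, 0.0]])
P_reordered = np.linalg.cholesky(perm @ sigma_u @ perm.T)
```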

.. ipython:: python

   @savefig var_irf.png
   irf.plot(orth=False)



Note the plot function is flexible and can plot only variables of interest if so desired:

.. ipython:: python

   @savefig var_realgdp.png
   irf.plot(impulse='realgdp')



The cumulative effects :math:`\Psi_n = \sum_{i=0}^n \Phi_i` can be plotted with the long run effects as follows:

.. ipython:: python

   @savefig var_irf_cum.png
   irf.plot_cum_effects(orth=False)



#### Reference

.. autosummary::
   :toctree: generated/

   irf.IRAnalysis



### Forecast Error Variance Decomposition (FEVD)

Forecast errors of component :math:`j` on :math:`k` in an :math:`h`-step ahead forecast can be decomposed using the orthogonalized impulse responses :math:`\Theta_i`:

.. math::

   \omega_{jk, h} = \sum_{i=0}^{h-1} (e_j^\prime \Theta_i e_k)^2 / \mathrm{MSE}_j(h)

   \mathrm{MSE}_j(h) = \sum_{i=0}^{h-1} e_j^\prime \Phi_i \Sigma_u \Phi_i^\prime e_j
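A numpy sketch of these formulas for a hypothetical VAR(1) (where :math:`\Phi_i = A^i`); by construction each row of the resulting decomposition sums to one:

```python
import numpy as np

A1 = np.array([[0.5, 0.1],
               [0.4, 0.5]])          # hypothetical VAR(1) coefficients
sigma_u = np.array([[0.09, 0.02],
                    [0.02, 0.04]])
P = np.linalg.cholesky(sigma_u)

h = 5
phis = [np.linalg.matrix_power(A1, i) for i in range(h)]  # Phi_i = A^i here
thetas = [phi @ P for phi in phis]                        # orthogonalized responses

# MSE_j(h) sits on the diagonal; omega[j, k] is the share of variable k's
# shocks in the h-step forecast-error variance of variable j
mse = sum(phi @ sigma_u @ phi.T for phi in phis)
omega = sum(theta ** 2 for theta in thetas) / np.diag(mse)[:, None]
```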


These are computed via the fevd function up through a total number of steps ahead:

.. ipython:: python

   fevd = results.fevd(5)
   fevd.summary()



They can also be visualized through the returned :class:`FEVD` object:

.. ipython:: python

   @savefig var_fevd.png
   results.fevd(20).plot()



#### Reference

.. autosummary::
   :toctree: generated/

   var_model.FEVD



### Statistical tests

A number of different methods are provided to carry out hypothesis tests about the model results and also the validity of the model assumptions (normality, whiteness / "iid-ness" of errors, etc.).

#### Granger causality

One is often interested in whether a variable or group of variables is "causal" for another variable, for some definition of "causal". In the context of VAR models, one can say that a set of variables is Granger-causal within one of the VAR equations. We refer the reader to the literature for the mathematics and formal definition of Granger causality. The :class:`VARResults` object has the test_causality method for performing either a Wald (:math:`\chi^2`) test or an F-test.

.. ipython:: python

   results.test_causality('realgdp', ['realinv', 'realcons'], kind='f')



### Dynamic Vector Autoregressions

.. note::

   To use this functionality, pandas must be installed. See the pandas documentation for more information on the below data structures.

One is often interested in estimating a moving-window regression on time series data for the purposes of making forecasts throughout the data sample. For example, we may wish to produce the series of 2-step-ahead forecasts produced by a VAR(p) model estimated at each point in time.

.. ipython:: python

   np.random.seed(1)
   import pandas.util.testing as ptest
   ptest.N = 500
   data = ptest.makeTimeDataFrame().cumsum(0)
   data

   var = DynamicVAR(data, lag_order=2, window_type='expanding')



The estimated coefficients for the dynamic model are returned as a :class:`pandas.Panel` object, which allows you to easily examine, for example, all of the model coefficients by equation or by date:

.. ipython:: python
   :okwarning:

   import datetime as dt

   var.coefs

   # all estimated coefficients for equation A
   var.coefs.minor_xs('A').info()

   # coefficients on 11/30/2001
   var.coefs.major_xs(dt.datetime(2001, 11, 30)).T



Dynamic forecasts for a given number of steps ahead can be produced using the forecast function; the result is returned as a :class:`pandas.DataMatrix` object:

.. ipython:: python

   var.forecast(2)



The forecasts can be visualized using plot_forecast:

.. ipython:: python

   @savefig dvar_forecast.png
   var.plot_forecast(2)



#### Reference

.. autosummary::
   :toctree: generated/

   hypothesis_test_results.HypothesisTestResults
   hypothesis_test_results.CausalityTestResults
   hypothesis_test_results.NormalityTestResults
   hypothesis_test_results.WhitenessTestResults



### Vector Error Correction Models (VECM)

Vector Error Correction Models are used to study short-run deviations from one or more permanent stochastic trends (unit roots). A VECM models the difference of a vector of time series by imposing structure that is implied by the assumed number of stochastic trends. :class:`VECM` is used to specify and estimate these models.

A VECM(:math:`k_{ar}-1`) has the following form

.. math::

   \Delta y_t = \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \ldots + \Gamma_{k_{ar}-1} \Delta y_{t-k_{ar}+1} + u_t

where

.. math::

   \Pi = \alpha \beta'

as described in chapter 7 of [1].
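The essential feature is that :math:`\Pi` has reduced rank: with :math:`r` cointegration relations, :math:`\alpha` and :math:`\beta` are :math:`K \times r` matrices. A toy numpy illustration with hypothetical values:

```python
import numpy as np

K, r = 3, 1                                   # hypothetical: 3 variables, 1 relation
alpha = np.array([[-0.4], [0.1], [0.2]])      # loading matrix, K x r
beta = np.array([[1.0], [-0.8], [-0.3]])      # cointegration vector, K x r

Pi = alpha @ beta.T                           # K x K, but only rank r
```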

A VECM(:math:`k_{ar} - 1`) with deterministic terms has the form

.. math::

   \Delta y_t = \alpha \begin{pmatrix}\beta' & \eta'\end{pmatrix} \begin{pmatrix}y_{t-1} \\ D^{co}_{t-1}\end{pmatrix} + \Gamma_1 \Delta y_{t-1} + \dots + \Gamma_{k_{ar}-1} \Delta y_{t-k_{ar}+1} + C D_t + u_t.


In :math:`D^{co}_{t-1}` we have the deterministic terms which are inside the cointegration relation (or restricted to the cointegration relation); :math:`\eta` is the corresponding estimator. To pass a deterministic term inside the cointegration relation, we can use the ``exog_coint`` argument. For the two special cases of an intercept and a linear trend there is a simpler way to declare these terms: we can pass ``"ci"`` and ``"li"`` respectively to the ``deterministic`` argument. So for an intercept inside the cointegration relation we can either pass ``"ci"`` as ``deterministic`` or ``np.ones(len(data))`` as ``exog_coint`` if ``data`` is passed as the ``endog`` argument. This ensures that :math:`D_{t-1}^{co} = 1` for all :math:`t`.

We can also use deterministic terms outside the cointegration relation. These are defined in :math:`D_t` in the formula above, with the corresponding estimators in the matrix :math:`C`. We specify such terms by passing them to the ``exog`` argument. For an intercept and/or linear trend we can again use ``deterministic`` instead: for an intercept we pass ``"co"`` and for a linear trend we pass ``"lo"``, where the ``o`` stands for outside.

The following table shows the five cases considered in [2]. The last column indicates which string to pass to the ``deterministic`` argument for each of these cases.

==== ============================== ================================ =================
Case Intercept                      Slope of the linear trend        ``deterministic``
==== ============================== ================================ =================
I    :math:`0`                      :math:`0`                        ``"nc"``
II   :math:`- \alpha \beta^T \mu`   :math:`0`                        ``"ci"``
III  :math:`\neq 0`                 :math:`0`                        ``"co"``
IV   :math:`\neq 0`                 :math:`- \alpha \beta^T \gamma`  ``"coli"``
V    :math:`\neq 0`                 :math:`\neq 0`                   ``"colo"``
==== ============================== ================================ =================

#### Reference

.. autosummary::
   :toctree: generated/

   vecm.VECM
   vecm.coint_johansen
   vecm.select_order
   vecm.select_coint_rank
   vecm.VECMResults
   vecm.CointRankResults


##### References

[1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.

[2] Johansen, S. 1995. *Likelihood-Based Inference in Cointegrated Vector Autoregressive Models*. Oxford University Press.