:orphan:
.. module:: statsmodels.tsa.vector_ar.var_model
   :synopsis: Vector autoregressions

.. currentmodule:: statsmodels.tsa.vector_ar.var_model

Vector Autoregressions :mod:`tsa.vector_ar`
===========================================

:mod:`statsmodels.tsa.vector_ar` contains methods that are useful for simultaneously modeling and analyzing multiple time series using :ref:`Vector Autoregressions (VAR) <var>` and :ref:`Vector Error Correction Models (VECM) <vecm>`.

VAR(p) processes
----------------

We are interested in modeling a :math:`T \times K` multivariate time series :math:`Y`, where :math:`T` denotes the number of observations and :math:`K` the number of variables. One way of estimating relationships between the time series and their lagged values is the vector autoregression process:

.. math::

   Y_t = A_1 Y_{t-1} + \ldots + A_p Y_{t-p} + u_t

   u_t \sim {\sf Normal}(0, \Sigma_u)

where :math:`A_i` is a :math:`K \times K` coefficient matrix.
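As a quick illustration of this process (a sketch that does not use statsmodels; the coefficient matrix and innovation covariance below are arbitrary choices), a stable VAR(1) can be simulated directly with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# arbitrary K=2 example: A_1 chosen so all eigenvalues lie inside the unit circle
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
sigma_u = np.array([[1.0, 0.3],
                    [0.3, 1.0]])

T, K = 500, 2
u = rng.multivariate_normal(np.zeros(K), sigma_u, size=T)
Y = np.zeros((T, K))
for t in range(1, T):
    # Y_t = A_1 Y_{t-1} + u_t
    Y[t] = A1 @ Y[t - 1] + u[t]
```

Such a simulated array can be passed to :class:`VAR` in place of the macroeconomic data used in the example below.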

We follow in large part the methods and notation of Lütkepohl (2005) [1], which we will not develop here.

Model fitting
-------------

.. note::

   The classes referenced below are accessible via the :mod:`statsmodels.tsa.api` module.

To estimate a VAR model, one must first create the model using an ndarray of homogeneous or structured dtype. When using a structured or record array, the class will use the passed variable names. Otherwise they can be passed explicitly:

.. ipython:: python
    :suppress:

    import pandas as pd
    pd.options.display.max_rows = 10
    import matplotlib
    import matplotlib.pyplot as plt
    matplotlib.style.use('ggplot')

.. ipython:: python
   :okwarning:

    # some example data
    import numpy as np
    import pandas
    import statsmodels.api as sm
    from statsmodels.tsa.api import VAR, DynamicVAR
    mdata = sm.datasets.macrodata.load_pandas().data

    # prepare the dates index
    dates = mdata[['year', 'quarter']].astype(int).astype(str)
    quarterly = dates["year"] + "Q" + dates["quarter"]
    from statsmodels.tsa.base.datetools import dates_from_str
    quarterly = dates_from_str(quarterly)

    mdata = mdata[['realgdp','realcons','realinv']]
    mdata.index = pandas.DatetimeIndex(quarterly)
    data = np.log(mdata).diff().dropna()

    # make a VAR model
    model = VAR(data)

.. note::

   The :class:`VAR` class assumes that the passed time series are stationary. Non-stationary or trending data can often be transformed to be stationary by first-differencing or some other method. For direct analysis of non-stationary time series, a standard stable VAR(p) model is not appropriate.

To actually do the estimation, call the fit method with the desired lag order. Or you can have the model select a lag order based on a standard information criterion (see below):

.. ipython:: python

    results = model.fit(2)

    results.summary()


Several ways to visualize the data using matplotlib are available.

Plotting input time series:

.. ipython:: python

    @savefig var_plot_input.png
    results.plot()


Plotting time series autocorrelation function:

.. ipython:: python

    @savefig var_plot_acorr.png
    results.plot_acorr()


Lag order selection
-------------------

Choice of lag order can be a difficult problem. Standard analysis employs likelihood ratio tests or information criteria-based order selection. We have implemented the latter, accessible through the :class:`VAR` class:

.. ipython:: python

    model.select_order(15)

When calling the fit function, one can pass a maximum number of lags and the order criterion to use for order selection:

.. ipython:: python

    results = model.fit(maxlags=15, ic='aic')

Forecasting
-----------

The linear predictor is the optimal :math:`h`-step ahead forecast in terms of mean-squared error:

.. math::

   y_t(h) = \nu + A_1 y_t(h - 1) + \cdots + A_p y_t(h - p)

We can use the forecast function to produce this forecast. Note that we have to specify the "initial value" for the forecast:

.. ipython:: python

    lag_order = results.k_ar
    results.forecast(data.values[-lag_order:], 5)

The forecast_interval function will produce the above forecast along with asymptotic standard errors. These can be visualized using the plot_forecast function:

.. ipython:: python

   @savefig var_forecast.png
   results.plot_forecast(10)

Class Reference
---------------

.. module:: statsmodels.tsa.vector_ar
   :synopsis: Vector autoregressions and related tools

.. currentmodule:: statsmodels.tsa.vector_ar

.. autosummary::
   :toctree: generated/

   var_model.VAR
   var_model.VARProcess
   var_model.VARResults


Post-estimation Analysis
------------------------

Several process properties and additional results after estimation are available for vector autoregressive processes.

.. autosummary::
   :toctree: generated/

   var_model.LagOrderResults
   hypothesis_test_results.HypothesisTestResults
   hypothesis_test_results.NormalityTestResults
   hypothesis_test_results.WhitenessTestResults


Impulse Response Analysis
-------------------------

Impulse responses are of interest in econometric studies: they are the estimated responses to a unit impulse in one of the variables. They are computed in practice using the :math:`\textrm{MA}(\infty)` representation of the VAR(p) process:

.. math::

   Y_t = \mu + \sum_{i=0}^\infty \Phi_i u_{t-i}

We can perform an impulse response analysis by calling the irf function on a VARResults object:

.. ipython:: python

    irf = results.irf(10)

These can be visualized using the plot function, in either orthogonalized or non-orthogonalized form. Asymptotic standard errors are plotted by default at the 95% confidence level, which can be modified by the user.

.. note::

   Orthogonalization is done using the Cholesky decomposition of the estimated error covariance matrix :math:`\hat \Sigma_u` and hence interpretations may change depending on variable ordering.
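The ordering dependence can be seen in a small NumPy sketch (the covariance and MA coefficient below are made-up values, not the estimates from the example): an orthogonalized response is :math:`\Theta_i = \Phi_i P`, where :math:`P` is the lower Cholesky factor of :math:`\Sigma_u`, and permuting the variables produces a different factor:

```python
import numpy as np

# hypothetical estimated error covariance (any symmetric positive-definite matrix)
sigma_u = np.array([[1.0, 0.5],
                    [0.5, 2.0]])

# lower-triangular Cholesky factor P with sigma_u = P @ P.T
P = np.linalg.cholesky(sigma_u)

# an orthogonalized response Theta_i = Phi_i @ P, for a hypothetical MA coefficient
phi_1 = np.array([[0.5, 0.1],
                  [0.2, 0.3]])
theta_1 = phi_1 @ P

# reordering the variables changes the Cholesky factor, hence the interpretation
perm = np.array([[0, 1], [1, 0]])
P_reordered = np.linalg.cholesky(perm @ sigma_u @ perm.T)
```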

.. ipython:: python

    @savefig var_irf.png
    irf.plot(orth=False)


Note the plot function is flexible and can plot only variables of interest if so desired:

.. ipython:: python

    @savefig var_realgdp.png
    irf.plot(impulse='realgdp')

The cumulative effects :math:`\Psi_n = \sum_{i=0}^n \Phi_i` can be plotted with the long run effects as follows:

.. ipython:: python

    @savefig var_irf_cum.png
    irf.plot_cum_effects(orth=False)


Reference
~~~~~~~~~

.. autosummary::
   :toctree: generated/

   irf.IRAnalysis

Forecast Error Variance Decomposition (FEVD)
--------------------------------------------

Forecast errors of component :math:`j` on :math:`k` in an :math:`h`-step ahead forecast can be decomposed using the orthogonalized impulse responses :math:`\Theta_i`:

.. math::

   \omega_{jk, h} = \sum_{i=0}^{h-1} (e_j^\prime \Theta_i e_k)^2 / \mathrm{MSE}_j(h)

   \mathrm{MSE}_j(h) = \sum_{i=0}^{h-1} e_j^\prime \Phi_i \Sigma_u \Phi_i^\prime e_j
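As a sanity check on these formulas (a sketch with made-up :math:`\Phi_i` and :math:`\Sigma_u`, not the estimates from the example), the decomposition can be computed directly with NumPy; each row of :math:`\omega` then sums to one:

```python
import numpy as np

# hypothetical MA coefficients Phi_0 .. Phi_{h-1} (Phi_0 = I_K) and error covariance
phis = [np.eye(2), np.array([[0.5, 0.1],
                             [0.2, 0.3]])]
sigma_u = np.array([[1.0, 0.3],
                    [0.3, 1.0]])

P = np.linalg.cholesky(sigma_u)          # Sigma_u = P P'
thetas = [phi @ P for phi in phis]       # orthogonalized responses

h, K = len(phis), 2
mse = np.zeros(K)
omega = np.zeros((K, K))
for i in range(h):
    # (e_j' Theta_i e_k)^2, accumulated elementwise
    omega += thetas[i] ** 2
    # e_j' Phi_i Sigma_u Phi_i' e_j, i.e. the diagonal of Phi_i Sigma_u Phi_i'
    mse += np.einsum('jk,kl,jl->j', phis[i], sigma_u, phis[i])
omega /= mse[:, None]
```

Each row of ``omega`` decomposes one component's forecast-error variance across the orthogonalized shocks.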

These are computed via the fevd function up through a total number of steps ahead:

.. ipython:: python

    fevd = results.fevd(5)

    fevd.summary()

They can also be visualized through the returned :class:`FEVD` object:

.. ipython:: python

    @savefig var_fevd.png
    results.fevd(20).plot()


Reference
~~~~~~~~~

.. autosummary::
   :toctree: generated/

   var_model.FEVD

Statistical tests
-----------------

A number of different methods are provided to carry out hypothesis tests about the model results and also the validity of the model assumptions (normality, whiteness / "iid-ness" of errors, etc.).

Granger causality
~~~~~~~~~~~~~~~~~

One is often interested in whether a variable or group of variables is "causal" for another variable, for some definition of "causal". In the context of VAR models, one can say that a set of variables is Granger-causal within one of the VAR equations. We will not detail the mathematics or definition of Granger causality here; see [1] for a formal treatment. The :class:`VARResults` object has the test_causality method for performing either a Wald (:math:`\chi^2`) test or an F-test.

.. ipython:: python

    results.test_causality('realgdp', ['realinv', 'realcons'], kind='f')

Normality
~~~~~~~~~

Whiteness of residuals
~~~~~~~~~~~~~~~~~~~~~~

Dynamic Vector Autoregressions
------------------------------

.. note::

   To use this functionality, pandas must be installed. See the pandas documentation for more information on the below data structures.

One is often interested in estimating a moving-window regression on time series data for the purposes of making forecasts throughout the data sample. For example, we may wish to produce the series of 2-step-ahead forecasts produced by a VAR(p) model estimated at each point in time.

.. ipython:: python

    np.random.seed(1)
    import pandas.util.testing as ptest
    ptest.N = 500
    data = ptest.makeTimeDataFrame().cumsum(0)
    data

    var = DynamicVAR(data, lag_order=2, window_type='expanding')

The estimated coefficients for the dynamic model are returned as a :class:`pandas.Panel` object, which can allow you to easily examine, for example, all of the model coefficients by equation or by date:

.. ipython:: python
   :okwarning:

    import datetime as dt

    var.coefs

    # all estimated coefficients for equation A
    var.coefs.minor_xs('A').info()

    # coefficients on 11/30/2001
    var.coefs.major_xs(dt.datetime(2001, 11, 30)).T

Dynamic forecasts for a given number of steps ahead can be produced using the forecast function and return a :class:`pandas.DataFrame` object:

.. ipython:: python

    var.forecast(2)

The forecasts can be visualized using plot_forecast:

.. ipython:: python

    @savefig dvar_forecast.png
    var.plot_forecast(2)

Reference
~~~~~~~~~

.. autosummary::
   :toctree: generated/

   hypothesis_test_results.HypothesisTestResults
   hypothesis_test_results.CausalityTestResults
   hypothesis_test_results.NormalityTestResults
   hypothesis_test_results.WhitenessTestResults

Vector Error Correction Models (VECM)
-------------------------------------

Vector Error Correction Models are used to study short-run deviations from one or more permanent stochastic trends (unit roots). A VECM models the difference of a vector of time series by imposing structure that is implied by the assumed number of stochastic trends. :class:`VECM` is used to specify and estimate these models.

A VECM(:math:`k_{ar}-1`) has the following form

.. math::

   \Delta y_t = \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \ldots
                + \Gamma_{k_{ar}-1} \Delta y_{t-k_{ar}+1} + u_t

where

.. math::

   \Pi = \alpha \beta'

as described in chapter 7 of [1].

A VECM(:math:`k_{ar} - 1`) with deterministic terms has the form

.. math::

   \Delta y_t = \alpha \begin{pmatrix} \beta' & \eta' \end{pmatrix}
                \begin{pmatrix} y_{t-1} \\ D^{co}_{t-1} \end{pmatrix}
                + \Gamma_1 \Delta y_{t-1} + \dots
                + \Gamma_{k_{ar}-1} \Delta y_{t-k_{ar}+1} + C D_t + u_t.

In :math:`D^{co}_{t-1}` we have the deterministic terms which are inside the cointegration relation (or restricted to the cointegration relation). :math:`\eta` is the corresponding estimator. To pass a deterministic term inside the cointegration relation, we can use the ``exog_coint`` argument. For the two special cases of an intercept and a linear trend there exists a simpler way to declare these terms: we can pass ``"ci"`` and ``"li"`` respectively to the ``deterministic`` argument. So for an intercept inside the cointegration relation we can either pass ``"ci"`` as ``deterministic`` or ``np.ones(len(data))`` as ``exog_coint`` if ``data`` is passed as the ``endog`` argument. This ensures that :math:`D_{t-1}^{co} = 1` for all :math:`t`.

We can also use deterministic terms outside the cointegration relation. These are defined in :math:`D_t` in the formula above with the corresponding estimators in the matrix :math:`C`. We specify such terms by passing them to the ``exog`` argument. For an intercept and/or linear trend we again have the possibility to use ``deterministic`` alternatively. For an intercept we pass ``"co"`` and for a linear trend we pass ``"lo"``, where the ``o`` stands for outside.

The following table shows the five cases considered in [2]. The last column indicates which string to pass to the ``deterministic`` argument for each of these cases.

.. list-table::
   :header-rows: 1

   * - Case
     - Intercept
     - Slope of the linear trend
     - ``deterministic``
   * - I
     - 0
     - 0
     - ``"nc"``
   * - II
     - :math:`- \alpha \beta^T \mu`
     - 0
     - ``"ci"``
   * - III
     - :math:`\neq 0`
     - 0
     - ``"co"``
   * - IV
     - :math:`\neq 0`
     - :math:`- \alpha \beta^T \gamma`
     - ``"coli"``
   * - V
     - :math:`\neq 0`
     - :math:`\neq 0`
     - ``"colo"``

Reference
~~~~~~~~~

.. autosummary::
   :toctree: generated/

   vecm.VECM
   vecm.coint_johansen
   vecm.select_order
   vecm.select_coint_rank
   vecm.VECMResults
   vecm.CointRankResults


References
----------

[1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.

[2] Johansen, S. 1995. *Likelihood-Based Inference in Cointegrated Vector Autoregressive Models*. Oxford University Press.