# Multivariate TS Forecasting: Vector Auto-Regression (VAR)

Vector Auto-Regression (VAR) is a generalization of the univariate autoregression (AR) model for forecasting several parallel stationary time series. The system has one equation per variable. The right-hand side of each equation contains a constant plus lags of every variable in the system. When using a VAR to forecast, two decisions have to be made:

- how many variables (K), and
- how many lags (p) should be included in the system.

A VAR with K variables and p lags has K + pK² coefficients to estimate. That is 1 + pK per equation: a constant plus p lags of each of the K variables. For example, with K = 5 variables and p = 2 lags there are 11 coefficients per equation, or 55 in total. The more coefficients that have to be estimated, the larger the estimation error that feeds into the forecasts.
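The count follows directly from the structure of each equation. It can be checked with a few lines of Python. `var_coefficient_count` is a hypothetical helper written only for this illustration.

```python
def var_coefficient_count(k, p):
    """Coefficients in a VAR(p) with k variables and a constant in every equation."""
    per_equation = 1 + p * k      # constant + p lags of each of the k variables
    total = k * per_equation      # k equations, i.e. k + p * k**2
    return per_equation, total

print(var_coefficient_count(5, 2))   # (11, 55), the example in the text
print(var_coefficient_count(2, 1))   # (3, 6), the VAR(1) of Example 1
print(var_coefficient_count(3, 2))   # (7, 21), the VAR(2) of Example 2
```

The total grows with the square of K, which is why adding variables is more costly than adding lags.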

It is therefore advisable to keep K small. Include only variables that are correlated with each other and hence useful for forecasting each other. The number of lags p is commonly chosen with an information criterion such as AIC or BIC.
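As an illustration of lag selection, the sketch below computes Akaike's information criterion, $\mathrm{AIC}(p) = \ln|\hat\Sigma(p)| + 2pK^2/T$, for several candidate lag orders and keeps the p with the smallest value. Here $\hat\Sigma(p)$ is the maximum-likelihood residual covariance matrix. This is a from-scratch NumPy version written for this example, under stated assumptions:

* The data are simulated from a made-up stationary VAR(2).
* Each candidate is fitted by OLS.
* All candidates are compared on the same T observations.

The helper names `lagged_design` and `var_aic` are hypothetical. In practice, statsmodels' `VAR.select_order` reports AIC, BIC, HQIC and FPE for each lag order.

```python
import numpy as np

def lagged_design(y, p, max_p):
    """Regressors [1, lag 1, ..., lag p] and targets for t = max_p, ..., T-1."""
    n_obs = len(y)
    cols = [np.ones((n_obs - max_p, 1))]
    for lag in range(1, p + 1):
        cols.append(y[max_p - lag:n_obs - lag])   # the row for target t holds Y_{t-lag}
    return np.hstack(cols), y[max_p:]

def var_aic(y, p, max_p):
    """AIC(p) = ln|Sigma_hat| + 2 p K^2 / T for a VAR(p) with constant fitted by OLS."""
    X, Y = lagged_design(y, p, max_p)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    T, K = Y.shape
    sigma = resid.T @ resid / T               # ML estimate of the residual covariance
    _, logdet = np.linalg.slogdet(sigma)
    return logdet + 2.0 * p * K * K / T

# simulate a stationary bivariate VAR(2); the coefficients are made up
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [0.2, 0.3]])
A2 = np.array([[-0.3, 0.0], [0.1, -0.2]])
y = np.zeros((500, 2))
for t in range(2, 500):
    y[t] = A1 @ y[t - 1] + A2 @ y[t - 2] + rng.normal(size=2)

max_p = 6   # every candidate uses the same 500 - max_p observations
aics = {p: var_aic(y, p, max_p) for p in range(1, max_p + 1)}
best_p = min(aics, key=aics.get)
print(best_p)
```

Dropping the first `max_p` rows for every candidate keeps the criteria comparable. Otherwise, models with fewer lags would be scored on more observations.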

### Example 1: Y1 & Y2 (Lag 1)

For example, the system of equations for a VAR(1) (i.e. lag 1) model with two time series (variables `Y1` and `Y2`) is as follows:

$$Y_{1,t} = \alpha_{1} + \beta_{11,1} Y_{1,t-1} + \beta_{12,1} Y_{2,t-1} + \epsilon_{1,t}$$

$$Y_{2,t} = \alpha_{2} + \beta_{21,1} Y_{1,t-1} + \beta_{22,1} Y_{2,t-1} + \epsilon_{2,t}$$


- $Y_{1,t-1}$ and $Y_{2,t-1}$: the first lags of the time series $Y_1$ and $Y_2$. $\beta_{ij,l}$ is the coefficient on the $l$-th lag of $Y_j$ in the equation for $Y_i$, and $\epsilon_{i,t}$ is the error term of that equation.
- This system is called a VAR(1) model because each equation is of order 1. That is, it contains only the first lag of each variable (Y1 and Y2).
- The variables feed into each other's equations. Because of this, both Y1 and Y2 are treated as endogenous variables rather than as exogenous predictors.
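The two equations can also be written as one matrix equation, $Y_t = \alpha + B_1 Y_{t-1} + \epsilon_t$. In this form, row $i$ of $B_1$ (a matrix name introduced here) holds the coefficients $\beta_{i1,1}, \beta_{i2,1}$ of the equation for $Y_i$. The NumPy sketch below uses made-up coefficient values. It computes a one-step forecast, with the error terms set to zero, both equation by equation and in matrix form to show that the two agree.

```python
import numpy as np

# made-up VAR(1) coefficients, for illustration only
alpha = np.array([0.5, -0.2])            # alpha_1, alpha_2
B1 = np.array([[0.6, 0.2],               # beta_{11,1}, beta_{12,1}
               [0.1, 0.7]])              # beta_{21,1}, beta_{22,1}
y_prev = np.array([1.0, 2.0])            # Y_{1,t-1}, Y_{2,t-1}

# equation by equation, as written above, with the error terms set to zero
y1_hat = alpha[0] + B1[0, 0] * y_prev[0] + B1[0, 1] * y_prev[1]
y2_hat = alpha[1] + B1[1, 0] * y_prev[0] + B1[1, 1] * y_prev[1]

# the same system in matrix form: Y_t = alpha + B1 @ Y_{t-1}
y_hat = alpha + B1 @ y_prev
print(y_hat, [y1_hat, y2_hat])
```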

### Example 2: Y1, Y2, & Y3 (Lag 2)

$$Y_{1,t} = \alpha_{1} + \beta_{11,1} Y_{1,t-1} + \beta_{12,1} Y_{2,t-1} + \beta_{13,1} Y_{3,t-1} + \beta_{11,2} Y_{1,t-2} + \beta_{12,2} Y_{2,t-2} + \beta_{13,2} Y_{3,t-2} + \epsilon_{1,t}$$

$$Y_{2,t} = \alpha_{2} + \beta_{21,1} Y_{1,t-1} + \beta_{22,1} Y_{2,t-1} + \beta_{23,1} Y_{3,t-1} + \beta_{21,2} Y_{1,t-2} + \beta_{22,2} Y_{2,t-2} + \beta_{23,2} Y_{3,t-2} + \epsilon_{2,t}$$

$$Y_{3,t} = \alpha_{3} + \beta_{31,1} Y_{1,t-1} + \beta_{32,1} Y_{2,t-1} + \beta_{33,1} Y_{3,t-1} + \beta_{31,2} Y_{1,t-2} + \beta_{32,2} Y_{2,t-2} + \beta_{33,2} Y_{3,t-2} + \epsilon_{3,t}$$
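Every equation has the same regressors: a constant and the same lags of all variables. A VAR can therefore be estimated by running ordinary least squares on each equation, and OLS is also the method reported in the statsmodels summary later in this notebook. The sketch below simulates a three-variable VAR(2) with made-up coefficients. It then recovers them with a single `np.linalg.lstsq` call. The column order of the design matrix mirrors the term order in the equations above.

```python
import numpy as np

rng = np.random.default_rng(1)
K, p, T = 3, 2, 5000

# made-up true parameters; A1[i, j] = beta_{ij,1} and A2[i, j] = beta_{ij,2}
alpha = np.array([0.2, -0.1, 0.3])
A1 = np.array([[0.4, 0.1, 0.0],
               [0.0, 0.3, 0.2],
               [0.1, 0.0, 0.2]])
A2 = np.array([[-0.2, 0.0, 0.1],
               [0.0, -0.1, 0.0],
               [0.0, 0.1, -0.1]])

# simulate the three equations above with small Gaussian errors
y = np.zeros((T, K))
for t in range(p, T):
    y[t] = alpha + A1 @ y[t - 1] + A2 @ y[t - 2] + rng.normal(scale=0.1, size=K)

# design matrix, columns in the same order as the terms in each equation:
# [1, Y1_{t-1}, Y2_{t-1}, Y3_{t-1}, Y1_{t-2}, Y2_{t-2}, Y3_{t-2}]
X = np.hstack([np.ones((T - p, 1)), y[p - 1:T - 1], y[p - 2:T - 2]])
Y = y[p:]

# one lstsq call solves all K equations; column i of B is equation i
B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # shape (1 + p*K, K) = (7, 3)
alpha_hat = B[0]
A1_hat = B[1:1 + K].T
A2_hat = B[1 + K:].T
print(np.round(A1_hat, 2))
print(np.round(A2_hat, 2))
```

Each column of `B` holds 1 + pK = 7 coefficients, matching the count given earlier for K = 3 and p = 2.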

A VAR can also be fitted with the `VARMAX` class in statsmodels (`statsmodels.tsa.statespace.varmax.VARMAX`). Its `order=(p, q)` argument selects a VAR, VMA or VARMA model, and passing exogenous regressors through `exog` turns it into a VARMAX model.

## Import libraries

```python
import numpy as np
from statsmodels.tsa.vector_ar.var_model import VAR
```

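## Generate data

```python
# generate two correlated random series: v1 is a linear trend plus noise and
# v2 is v1 plus more noise (no seed is set, so the values differ on every run).
# Because of the trend the series are not stationary; they serve only to
# demonstrate the API
v1 = np.arange(10) + np.random.normal(loc=1, scale=0.5, size=10)
v2 = v1 + np.random.normal(loc=1, scale=0.5, size=10)
data = np.column_stack((v1, v2))  # shape (10, 2): one column per series
data
```

```
array([[0.71351521, 1.96163036],
       [1.86483075, 2.21992456],
       [3.2339081 , 4.14695737],
       [4.04250609, 5.60739667],
       [4.97257699, 5.93527076],
       [5.33924381, 6.32092579],
       [6.44514184, 6.86437087],
       [8.00074845, 9.00373449],
       [8.25558248, 8.80281284],
       [9.67540189, 9.96447416]])
```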

## VAR Model Implementation

```python
# for demonstration only (the summary values are not important here)
model = VAR(data)
# with no arguments, fit() estimates a VAR(1) by OLS, as the L1-only terms
# and Nobs = 9 in the output show
model_fit = model.fit()
print(model_fit.summary())
```

```
  Summary of Regression Results   
Model:                         VAR
Method:                        OLS
Date:           Fri, 20, Aug, 2021
Time:                     14:16:04
--------------------------------------------------------------------
No. of Equations:         2.00000    BIC:                   -2.74096
Nobs:                     9.00000    HQIC:                  -3.15619
Log likelihood:          -6.61488    FPE:                  0.0596366
AIC:                     -2.87245    Det(Omega_mle):       0.0335456
--------------------------------------------------------------------
Results for equation y1
           coefficient       std. error           t-stat            prob
------------------------------------------------------------------------
const         1.830064         0.526112            3.478           0.001
L1.y1         1.613919         0.386988            4.170           0.000
L1.y2        -0.665045         0.397198           -1.674           0.094

Results for equation 
```

## Make Prediction

```python
# one-step-ahead forecast; forecast() takes the last k_ar observations
# (here k_ar = 1) as starting values
yhat = model_fit.forecast(data[-model_fit.k_ar:], steps=1)
print(yhat)
```

```
[[10.81855508 11.44606767]]
```
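The forecast is the fitted equation evaluated at the last observation. Plugging the `y1` coefficients printed in the summary and the last row of `data` into the first VAR(1) equation reproduces the first forecast value, up to the rounding of the printed coefficients. The numbers below are copied from this particular run. Because the data are random, they will differ on a re-run. The `y2` part of the summary is cut off in the output, so only the `y1` forecast is checked here.

```python
# values copied from the run shown above (they change whenever the data change)
const, b11, b12 = 1.830064, 1.613919, -0.665045   # equation y1: const, L1.y1, L1.y2
y1_last, y2_last = 9.67540189, 9.96447416          # last row of `data`

# Y_{1,T+1} = alpha_1 + beta_{11,1} * Y_{1,T} + beta_{12,1} * Y_{2,T}
y1_next = const + b11 * y1_last + b12 * y2_last
print(round(y1_next, 4))   # 10.8186, matching the first element of yhat
```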

