So far, we have studied time series analysis from a univariate perspective, where the past realizations of a series explain its current value at time `t`. The `Vector Autoregression` model instead captures the cross-variable dynamics of N series through a system of N equations. To motivate it, we start from the classical homoscedastic linear regression, which serves as the template for the new time-series model.


A simple linear regression is specified as follows:

$$y_t = \beta_0 + \beta_1 x_t + \epsilon_t, \quad \epsilon_t \sim WN(0, \sigma^2), $$

where $x_t$ is the explanatory variable and $y_t$ is the outcome. OLS measures correlation, not causation. With that in mind, it is time to introduce `Vector Autoregressions`.
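Before moving on, the regression above can be sketched numerically. The snippet below is a minimal illustration on simulated data, not part of the original analysis: the values chosen for $\beta_0$, $\beta_1$, $\sigma$ and the sample size are assumptions made only for the example.

```python
import numpy as np

# Simulated data: all parameter values here are illustrative assumptions.
rng = np.random.default_rng(0)
T = 500
beta0, beta1, sigma = 1.0, 2.0, 0.5

x = rng.normal(size=T)
eps = rng.normal(scale=sigma, size=T)  # white-noise disturbance with variance sigma^2
y = beta0 + beta1 * x + eps

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(T), x])

# OLS estimate: the coefficients that minimise the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Unbiased estimate of the disturbance variance.
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (T - X.shape[1])

print("beta_hat:", beta_hat)
print("sigma2_hat:", sigma2_hat)
```

With a sample of this size the estimates should land close to the values used in the simulation, which is all the sketch is meant to show.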


------------------
`VAR`

In a VAR(p) model, we regress each endogenous variable on p lags of itself and p lags of every other variable; trends and seasonal terms can be included as well. Assuming a two-variable VAR(1), we have:

$$ y_{1,t} = \phi_{11} y_{1,t-1} + \phi_{12} y_{2,t-1} + \epsilon_{1,t}$$
$$ y_{2,t} = \phi_{21} y_{1,t-1} + \phi_{22} y_{2,t-1} + \epsilon_{2,t}.$$
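Stacking the two equations gives the same system in matrix form. This is only a compact restatement of the equations above; the symbols $\mathbf{y}_t$, $\Phi$ and $\boldsymbol{\epsilon}_t$ are notation introduced here for convenience.

$$ \begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix}, \quad \text{i.e.} \quad \mathbf{y}_t = \Phi \, \mathbf{y}_{t-1} + \boldsymbol{\epsilon}_t. $$

The off-diagonal entries $\phi_{12}$ and $\phi_{21}$ carry the cross-variable dynamics: if both are zero, the system collapses into two separate univariate AR(1) models.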


One notable feature of `VAR` is that the shocks can be correlated across equations. The disturbance variances and covariance are:

$$ \epsilon_{1,t} \sim WN(0, \sigma_1^2) $$
$$ \epsilon_{2,t} \sim WN(0, \sigma_2^2) $$
$$ cov(\epsilon_{1,t}, \epsilon_{2,t}) = \sigma_{12}. $$
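To make the two-variable VAR(1) and its correlated shocks concrete, here is a minimal simulation sketch. The coefficient matrix `Phi` and the shock covariance `Sigma` are made-up values chosen only for illustration, and drawing Gaussian shocks is an extra assumption (white noise does not require normality). The coefficients are then recovered with equation-by-equation OLS.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 2000

# Illustrative coefficients [[phi_11, phi_12], [phi_21, phi_22]].
# Their eigenvalues lie inside the unit circle, so the process is stable.
Phi = np.array([[0.5, 0.1],
                [0.2, 0.3]])

# Shock covariance: sigma_1^2 = 1.0, sigma_2^2 = 0.5, sigma_12 = 0.4.
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.5]])

# Draw contemporaneously correlated shocks (Gaussian by assumption).
eps = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=T)

# Simulate y_t = Phi @ y_{t-1} + eps_t, starting from zero.
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = Phi @ y[t - 1] + eps[t]

# Equation-by-equation OLS: regress y_t on y_{t-1} (no constant, as in the model).
Y, X = y[1:], y[:-1]
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
Phi_hat = B.T  # lstsq solves Y = X @ B, so B is the transpose of Phi

# Residual covariance estimates sigma_1^2, sigma_2^2 and sigma_12.
resid = Y - X @ B
Sigma_hat = resid.T @ resid / (len(Y) - X.shape[1])

print("Phi_hat:\n", Phi_hat)
print("Sigma_hat:\n", Sigma_hat)
```

The off-diagonal entry of `Sigma_hat` is the sample counterpart of $\sigma_{12}$; a value away from zero indicates that the two shocks move together within the same period.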



`statsmodels`, a widely used Python library for statistical modelling, provides an API for fitting VAR models. Let's check it out.

In [2]:
import pandas as pd

from statsmodels.tsa.api import VAR

# Load the daily Delhi climate training set and parse the date column.
data = pd.read_csv('DailyDelhiClimateData/DailyDelhiClimateTrain.csv')
data['date'] = pd.to_datetime(data['date'])

In [5]:
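# Treat the four climate series as jointly endogenous variables.
model = VAR(data[['meantemp', 'humidity', 'wind_speed', 'meanpressure']])

# Fit a VAR(2): each equation uses two lags of every variable.
results = model.fit(2)

results.summary()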

  Summary of Regression Results   
Model:                         VAR
Method:                        OLS
Date:           Mon, 28, Nov, 2022
Time:                     20:45:37
--------------------------------------------------------------------
No. of Equations:         4.00000    BIC:                    17.8062
Nobs:                     1460.00    HQIC:                   17.7244
Log likelihood:          -21153.9    FPE:                4.74797e+07
AIC:                      17.6758    Det(Omega_mle):     4.63268e+07
--------------------------------------------------------------------
Results for equation meantemp
                     coefficient       std. error           t-stat            prob
----------------------------------------------------------------------------------
const                  -0.513812         0.508413           -1.011           0.312
L1.meantemp             0.858908         0.034791           24.688           0.000
L1.humidity             0.009779         0.007393

A VAR(p) model is an important tool for assessing the association between variables, a.k.a. "$x_i$ contains information for predicting $y_j$". The `statsmodels` summary reports a `t-stat` for each coefficient, but an `F-test` is commonly used to test hypotheses on several parameters jointly.
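The idea behind such an F-test can be sketched by hand: fit the equation for one variable with and without the lags of the other, and compare the residual sums of squares. The code below is an illustrative sketch on simulated data, not the `statsmodels` implementation; `rss` and `granger_f_stat` are hypothetical helper names, and the simulated coefficients are assumptions chosen so that `y1` helps predict `y2` but not the other way round.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def granger_f_stat(target, other, p):
    """F statistic for H0: the p lags of `other` all have zero coefficients
    in the equation for `target` (hypothetical helper for illustration)."""
    T = len(target)
    n = T - p  # usable observations after losing p initial values
    const = np.ones((n, 1))
    # Column k-1 holds the series lagged by k periods, k = 1..p.
    own_lags = np.column_stack([target[p - k:T - k] for k in range(1, p + 1)])
    other_lags = np.column_stack([other[p - k:T - k] for k in range(1, p + 1)])
    y = target[p:]

    X_restricted = np.hstack([const, own_lags])                # H0 imposed
    X_unrestricted = np.hstack([const, own_lags, other_lags])  # full equation

    rss_r, rss_u = rss(X_restricted, y), rss(X_unrestricted, y)
    df_num = p                             # number of restrictions
    df_den = n - X_unrestricted.shape[1]   # residual degrees of freedom
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)


# Simulated system (assumed values): y1 feeds into y2, but not vice versa.
rng = np.random.default_rng(7)
T = 1000
y1, y2 = np.zeros(T), np.zeros(T)
for t in range(1, T):
    y1[t] = 0.5 * y1[t - 1] + rng.normal()
    y2[t] = 0.5 * y1[t - 1] + 0.3 * y2[t - 1] + rng.normal()

f_1_to_2 = granger_f_stat(y2, y1, p=1)  # do lags of y1 help predict y2?
f_2_to_1 = granger_f_stat(y1, y2, p=1)  # do lags of y2 help predict y1?
print("F (y1 -> y2):", f_1_to_2)
print("F (y2 -> y1):", f_2_to_1)
```

Under the null hypothesis the statistic is approximately F-distributed with `df_num` and `df_den` degrees of freedom, so with one restriction and a large sample, values well above roughly 3.84 reject the null at the 5% level. For fitted models, the `statsmodels` VAR results object offers a `test_causality` method that packages a test of this kind.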

