This notebook showcases the `RegularizedVAR` model from `sktime`.
With regularization disabled, it should reproduce the `VAR` model from `statsmodels`, so we compare the two to check that the implementation is correct.

In [6]:
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.regularized_var import RegularizedVAR

# Data Generation

First, we generate some synthetic multivariate time series data.

In [7]:
np.random.seed(42)
n_obs = 100
time_series_1 = np.random.randn(n_obs) + 1000
time_series_2 = np.random.randn(n_obs) + 100
time_series_3 = np.random.randn(n_obs) + 10

y = pd.DataFrame({'ts1': time_series_1, 'ts2': time_series_2, 'ts3': time_series_3,})

# Split the data into training and testing sets
y_train, y_test = temporal_train_test_split(y, test_size=5)

# Model Instantiation, Fitting and Forecasting

We then instantiate both the `RegularizedVAR` model (from `sktime`) and the `VAR` model (from `statsmodels`).
We fit both models to the training data and use them to forecast over the test period.

In [8]:
# Define the forecasting horizon and lags
fh = ForecastingHorizon(y_test.index, is_relative=False)
LAGS = 2

# Instantiate and fit the custom RegularizedVAR model, then use it for forecasting
custom_var = RegularizedVAR(lags=LAGS, L1_penalty=0.0, L2_penalty=0.0)  # both penalties zero: no regularization
custom_var.fit(y_train)
y_pred_custom = custom_var.predict(fh=fh)

# Fit the VAR model using statsmodels for comparison
model = VAR(y_train)
results = model.fit(LAGS)
y_pred_statsmodels = results.forecast(y_train.values[-LAGS:], steps=len(fh))
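
The two fits are expected to coincide because an unpenalized VAR(p) with a constant term is ordinary least squares on the lagged values, which is what `statsmodels` computes by default. The sketch below illustrates that idea with NumPy only. The helpers `fit_var_ols` and `forecast_var` are hypothetical names introduced here for illustration; they are not part of `sktime` or `statsmodels`, and this is not necessarily how `RegularizedVAR` is implemented internally.

```python
import numpy as np


def fit_var_ols(data, lags):
    """Fit a VAR(lags) model with intercept by OLS (hypothetical helper)."""
    n_obs, k = data.shape
    # Regressor row for time t is [1, y_{t-1}, ..., y_{t-lags}], t = lags..n_obs-1.
    lagged = [data[lags - i : n_obs - i] for i in range(1, lags + 1)]
    X = np.hstack([np.ones((n_obs - lags, 1))] + lagged)
    Y = data[lags:]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    intercept = beta[0]
    # coefs[i] is the (k x k) matrix that multiplies y_{t-i-1}.
    coefs = beta[1:].reshape(lags, k, k).transpose(0, 2, 1)
    return intercept, coefs


def forecast_var(intercept, coefs, last_values, steps):
    """Iterate the fitted recursion forward (hypothetical helper)."""
    lags = coefs.shape[0]
    history = list(last_values[-lags:])
    out = []
    for _ in range(steps):
        nxt = intercept + sum(coefs[i] @ history[-(i + 1)] for i in range(lags))
        history.append(nxt)
        out.append(nxt)
    return np.array(out)


# Synthetic data similar in spirit to the notebook's (not the same draws).
rng = np.random.default_rng(0)
demo = rng.standard_normal((100, 3)) + np.array([1000.0, 100.0, 10.0])
intercept, coefs = fit_var_ols(demo[:95], lags=2)
forecast = forecast_var(intercept, coefs, demo[:95], steps=5)
print(coefs.shape, forecast.shape)
```

The coefficient array uses the same `(lags, k, k)` layout as `results.coefs`, so the output of such a helper could be compared against either model entry by entry.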

# Model Comparison

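We now compare the two models in terms of their fitted coefficients, intercepts and forecasts.
The coefficients agree to within about 1e-8, the intercepts to within about 1e-6 in absolute terms, and the forecasts match to every printed digit.
The only visible difference in the forecasts is the row index: `statsmodels` returns a plain array, so its rows are numbered from 0.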

In [9]:
# Compare coefficients 
print("Coefficients from custom VAR model:")
print(custom_var.coefficients)
print("\nCoefficients from statsmodels VAR:")
print(results.coefs)

Coefficients from custom VAR model:
[[[-0.04379616 -0.14362626 -0.02868124]
  [-0.02691002 -0.12388731 -0.09702357]
  [-0.02884349 -0.12771343 -0.1133134 ]]

 [[-0.0488787  -0.0536462  -0.01244942]
  [ 0.11919954 -0.05735081 -0.13221751]
  [ 0.0930699   0.09744015 -0.0391535 ]]]

Coefficients from statsmodels VAR:
[[[-0.04379616 -0.14362626 -0.02868124]
  [-0.02691002 -0.12388731 -0.09702357]
  [-0.02884349 -0.12771343 -0.11331339]]

 [[-0.0488787  -0.0536462  -0.01244942]
  [ 0.11919954 -0.05735081 -0.13221751]
  [ 0.0930699   0.09744015 -0.0391535 ]]]


In [10]:
# Compare intercepts
print("\nIntercept from custom VAR model:")
print(custom_var.intercept)
print("\nIntercept from statsmodels VAR:")
print(results.intercept)


Intercept from custom VAR model:
[1112.71314061   28.21219049  -49.62208191]

Intercept from statsmodels VAR:
[1112.71313927   28.21219144  -49.62208114]
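
Rather than comparing the printed arrays by eye, the agreement can be quantified. The sketch below does this for the intercepts, using values copied from the output above so that it runs on its own; in the notebook one would pass `custom_var.intercept` and `results.intercept` directly. The tolerance `rtol=1e-6` is an illustrative choice, not a threshold taken from either library.

```python
import numpy as np

# Intercepts copied from the printed output above.
intercept_custom = np.array([1112.71314061, 28.21219049, -49.62208191])
intercept_statsmodels = np.array([1112.71313927, 28.21219144, -49.62208114])

# Largest absolute gap between the two estimates.
max_abs_diff = np.max(np.abs(intercept_custom - intercept_statsmodels))
print(f"max abs difference: {max_abs_diff:.2e}")

# Relative check: each entry must agree to about one part in a million.
agree = np.allclose(intercept_custom, intercept_statsmodels, rtol=1e-6, atol=0.0)
print("intercepts agree:", agree)
```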


In [11]:
# Compare forecasts
print("\nForecast from custom VAR model:")
print(pd.DataFrame(y_pred_custom, columns=y.columns))
print("\nForecast from statsmodels:")
print(pd.DataFrame(y_pred_statsmodels, columns=y.columns))


Forecast from custom VAR model:
           ts1         ts2        ts3
95  999.954060   99.959997   9.815920
96  999.918200  100.005662  10.046023
97  999.907865  100.103676  10.047782
98  999.890628  100.054323  10.027466
99  999.894279  100.055787  10.045088

Forecast from statsmodels:
          ts1         ts2        ts3
0  999.954060   99.959997   9.815920
1  999.918200  100.005662  10.046023
2  999.907865  100.103676  10.047782
3  999.890628  100.054323  10.027466
4  999.894279  100.055787  10.045088
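
The two forecast tables differ only in their row labels, because `results.forecast` returns a NumPy array without an index. A small sketch of how the labels could be aligned before a label-wise comparison is shown below. It uses the values printed above so that it is self-contained; `test_index` stands in for `y_test.index`, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

columns = ["ts1", "ts2", "ts3"]
test_index = pd.RangeIndex(95, 100)  # stands in for y_test.index

# Forecast values copied from the printed output above.
values = np.array([
    [999.954060, 99.959997, 9.815920],
    [999.918200, 100.005662, 10.046023],
    [999.907865, 100.103676, 10.047782],
    [999.890628, 100.054323, 10.027466],
    [999.894279, 100.055787, 10.045088],
])

custom_df = pd.DataFrame(values, index=test_index, columns=columns)
statsmodels_df = pd.DataFrame(values.copy(), columns=columns)  # default index 0..4

# Reuse the test index so both frames can be compared label by label.
statsmodels_df.index = test_index
pd.testing.assert_frame_equal(custom_df, statsmodels_df)
print(statsmodels_df.index.tolist())
```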
