 # Multivariate HAR Model (MHAR-ReVar) with LASSO Regularization



 Corsi (2009) introduced the Heterogeneous Autoregressive (HAR) model, which extends traditional autoregressive (AR) models by incorporating realized volatility measures over multiple time horizons (e.g., daily, weekly, monthly).


 Here, we focus on a **Multivariate HAR model using realized variances (MHAR-ReVar)** to jointly model the dynamics of realized variances across electricity markets of France, Spain and Portugal.


 However, MHAR models are heavily parameterized: every equation includes each market's averages at every horizon, so the number of coefficients grows quickly with the number of markets. To keep the model tractable, we employ the Least Absolute Shrinkage and Selection Operator (LASSO) introduced by Tibshirani (1996), which performs variable selection and regularization simultaneously. According to Audrino and Knaus (2016), LASSO is particularly effective when the data-generating process resembles a HAR model.


This notebook will load the PIT-transformed realized variances and prepare the data for MHAR-LASSO modeling.

 ## Step 1: Load PIT-transformed realized variances



 The data contains daily PIT-transformed realized variances for electricity markets of France, Spain and Portugal.

 Each row corresponds to a date, and columns represent markets.

In [None]:
# Import libraries

import pandas as pd
import numpy as np

pit_vars = pd.read_parquet("parquet_files/pit_transformed_variances.parquet")
pit_vars.index = pd.to_datetime(pit_vars.index)  # Ensure index is datetime
pit_vars.head(5)


Area,BZN|ES,BZN|FR,BZN|PT
2021-05-21,-1.228412,-0.469873,-1.221048
2021-05-22,-1.37437,-0.681604,-1.35234
2021-05-23,-1.322559,-0.425643,-1.2899
2021-05-24,-1.239586,-0.54104,-1.210123
2021-05-25,-1.747794,-1.202919,-1.747794


 ## Step 2: Construct HAR features



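 The HAR model uses realized variance averaged over different time horizons. Each horizon is a rolling window measured in days (the code calls the window length `lag`):

 - Daily (1-day window)

 - Weekly (7-day window)

 - Monthly (30-day window)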



 For each market's PIT-transformed variance series, we construct these lagged averages to use as predictors.
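
 In symbols, and assuming \(y_{i,t}\) denotes the PIT-transformed realized variance of market \(i\) on day \(t\) (notation introduced here for illustration only), the feature for a window of \(w\) days is the average of the \(w\) most recent values available before day \(t\):

\[
x^{(w)}_{i,t} = \frac{1}{w} \sum_{k=1}^{w} y_{i,t-k}, \qquad w \in \{1, 7, 30\}.
\]

 Because the sum starts at \(k = 1\), the feature for day \(t\) never includes \(y_{i,t}\) itself.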

In [3]:
import numpy as np

def har_features(series, lags=(1, 7, 30)):
    """
    Construct HAR features as lagged rolling averages of the series.
    Parameters:
      - series: pd.Series of PIT-transformed realized variances
      - lags: iterable of rolling-window lengths (in days)
    Returns:
      - pd.DataFrame with one column 'HAR_lag{w}' per window length w
    """
    df = pd.DataFrame(index=series.index)
    for lag in lags:
        # Average over the last `lag` days, then shift by one day so the
        # feature at day t only uses data up to day t-1 (no lookahead bias)
        df[f'HAR_lag{lag}'] = series.rolling(window=lag).mean().shift(1)
    return df

# Apply HAR feature construction for each market
har_features_all = pd.concat(
    [har_features(pit_vars[col]).add_prefix(col + '_') for col in pit_vars.columns],
    axis=1
)

har_features_all.head(10)


Unnamed: 0,BZN|ES_HAR_lag1,BZN|ES_HAR_lag7,BZN|ES_HAR_lag30,BZN|FR_HAR_lag1,BZN|FR_HAR_lag7,BZN|FR_HAR_lag30,BZN|PT_HAR_lag1,BZN|PT_HAR_lag7,BZN|PT_HAR_lag30
2021-05-21,,,,,,,,,
2021-05-22,-1.228412,,,-0.469873,,,-1.221048,,
2021-05-23,-1.37437,,,-0.681604,,,-1.35234,,
2021-05-24,-1.322559,,,-0.425643,,,-1.2899,,
2021-05-25,-1.239586,,,-0.54104,,,-1.210123,,
2021-05-26,-1.747794,,,-1.202919,,,-1.747794,,
2021-05-27,-1.960261,,,-1.330948,,,-2.111864,,
2021-05-28,-2.425978,-1.614137,,-1.449779,-0.871687,,-2.425978,-1.622721,
2021-05-29,-2.1815,-1.750293,,-1.65196,-1.040556,,-2.145383,-1.754769,
2021-05-30,-1.480303,-1.765426,,-1.270167,-1.124637,,-1.475119,-1.772309,


## Step 3: Prepare target variables and features



* Our targets are the PIT-transformed realized variances for each market on day \(t\).

* Features are the HAR lagged averages constructed from data up to day \(t-1\).

* We drop rows with NaNs produced by the rolling windows. The longest window (30 days) determines how many initial rows are lost.
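
The alignment described in these bullets can be checked on a toy series. The sketch below is illustrative only: the series `toy` and the 3-day window are made up for the check and are not part of the notebook's data. It applies the same `rolling(...).mean().shift(1)` pattern and confirms that the feature at day \(t\) uses only the preceding days, and that the number of missing rows equals the window length.

```python
import numpy as np
import pandas as pd

# Toy series with known values 0, 1, ..., 9 on consecutive days (hypothetical data).
toy = pd.Series(
    np.arange(10, dtype=float),
    index=pd.date_range("2021-01-01", periods=10, freq="D"),
)

window = 3  # illustrative window, shorter than the notebook's 7 and 30
feature = toy.rolling(window=window).mean().shift(1)

# The feature at position t should equal the mean of positions t-3, t-2, t-1.
t = 5
expected = toy.iloc[t - window:t].mean()
assert np.isclose(feature.iloc[t], expected)

# rolling() leaves window-1 NaNs and shift(1) adds one more.
n_missing = int(feature.isna().sum())
print(f"Missing rows for a {window}-day window: {n_missing}")
```

By the same logic, a 30-day window leaves 30 unusable rows at the start of the sample, which matches the difference between the 1441 target rows and the 1411 rows retained in the cell below.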

In [4]:
# Align targets and features
targets = pit_vars.loc[har_features_all.index]
print(targets)

# Drop rows with missing values in features or targets
valid_idx = har_features_all.dropna().index.intersection(targets.dropna().index)
X = har_features_all.loc[valid_idx]
y = targets.loc[valid_idx]

print(f"Feature matrix shape: {X.shape}")
print(f"Target matrix shape: {y.shape}")


Area          BZN|ES    BZN|FR    BZN|PT
2021-05-21 -1.228412 -0.469873 -1.221048
2021-05-22 -1.374370 -0.681604 -1.352340
2021-05-23 -1.322559 -0.425643 -1.289900
2021-05-24 -1.239586 -0.541040 -1.210123
2021-05-25 -1.747794 -1.202919 -1.747794
...              ...       ...       ...
2025-04-26 -1.111202  0.257660 -1.082629
2025-04-27  0.859605  0.573548  0.935061
2025-04-28  0.579711  0.854582  0.528997
2025-04-29 -0.608767  0.273879 -0.565366
2025-04-30  1.972282  1.221048 -1.293907

[1441 rows x 3 columns]
Feature matrix shape: (1411, 9)
Target matrix shape: (1411, 3)


 ## Step 4: Fit MHAR model with LASSO regularization



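 We estimate the multivariate system equation by equation: one LASSO regression per market, each regressing that market's target on all nine HAR features (its own and those of the other two markets).

 The L1 penalty can shrink coefficients exactly to zero, so each equation keeps only the predictors that help explain its target.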



 We use scikit-learn's `LassoCV` for automatic cross-validated hyperparameter selection.
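
 One point worth checking for time-series data: when `cv` is an integer, `LassoCV` uses unshuffled K-fold splits, so some validation folds come before part of the data the model was trained on. A minimal sketch of an alternative, assuming forward-chaining validation is preferred, passes a `TimeSeriesSplit` splitter as `cv`. The data below is synthetic, and the names `X_demo`, `y_demo` and `model_demo` are placeholders rather than objects from this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)

# Synthetic stand-in for the notebook's X and y (hypothetical data).
n_obs, n_features = 300, 9
X_demo = pd.DataFrame(
    rng.normal(size=(n_obs, n_features)),
    columns=[f"f{i}" for i in range(n_features)],
)
true_coef = np.array([0.5, 0.0, 0.3, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0])
y_demo = X_demo.to_numpy() @ true_coef + 0.1 * rng.normal(size=n_obs)

# Each split trains on an initial segment and validates on the block after it.
tscv = TimeSeriesSplit(n_splits=5)
model_demo = LassoCV(cv=tscv, max_iter=10000)
model_demo.fit(X_demo, y_demo)

print(f"Selected alpha: {model_demo.alpha_:.5f}")
print(f"Non-zero coefficients: {(model_demo.coef_ != 0).sum()}")
```

 Whether this changes the selected `alpha` on the real data is an empirical question; the sketch only shows how the splitter is wired in.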

In [5]:
from sklearn.linear_model import LassoCV
import numpy as np

lasso_models = {}
lasso_coefs = {}

for market in y.columns:
    print(f"Training LASSO model for {market}...")
    model = LassoCV(cv=5, max_iter=10000, random_state=42)
    model.fit(X, y[market])
    lasso_models[market] = model
    lasso_coefs[market] = pd.Series(model.coef_, index=X.columns)
    print(f"Best alpha for {market}: {model.alpha_:.5f}")
    print(f"Number of selected features: {(model.coef_ != 0).sum()}")


Training LASSO model for BZN|ES...
Best alpha for BZN|ES: 0.00168
Number of selected features: 7
Training LASSO model for BZN|FR...
Best alpha for BZN|FR: 0.00160
Number of selected features: 6
Training LASSO model for BZN|PT...
Best alpha for BZN|PT: 0.00148
Number of selected features: 7


 ## Step 5: Review selected features and coefficients for one market

In [6]:
market_to_inspect = y.columns[0]
print(f"LASSO coefficients for {market_to_inspect}:")
print(lasso_coefs[market_to_inspect][lasso_coefs[market_to_inspect] != 0])


LASSO coefficients for BZN|ES:
BZN|ES_HAR_lag1     0.245011
BZN|ES_HAR_lag7     0.091762
BZN|FR_HAR_lag1    -0.055596
BZN|FR_HAR_lag7     0.130531
BZN|PT_HAR_lag1     0.197276
BZN|PT_HAR_lag7     0.108132
BZN|PT_HAR_lag30    0.193342
dtype: float64


 ## Step 6: Save fitted model coefficients for future use

In [None]:
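import pickle

# Persist the per-market coefficient Series (intercepts are not included)
with open('pickle_files/lasso_mhar_coefficients.pkl', 'wb') as f:
    pickle.dump(lasso_coefs, f)

print("Saved LASSO MHAR model coefficients.")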




Saved LASSO MHAR model coefficients.
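
The dictionary saved in this step holds only `coef_`. `LassoCV` fits an intercept by default (`fit_intercept=True`), so predictions rebuilt from the coefficients alone would be off by that constant. A minimal sketch of one way to keep both, assuming a small dictionary payload per model, is shown below. The payload layout, the file name `demo_coefficients.pkl` and the synthetic data are hypothetical and are not the notebook's own format.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)

# Synthetic regression problem with a non-zero intercept (hypothetical data).
X_demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
y_demo = 2.0 + X_demo["a"] - 0.5 * X_demo["c"] + 0.05 * rng.normal(size=100)

model_demo = Lasso(alpha=0.01).fit(X_demo, y_demo)

# Hypothetical payload layout: coefficients and intercept stored together.
payload = {
    "coef": pd.Series(model_demo.coef_, index=X_demo.columns),
    "intercept": float(model_demo.intercept_),
}

# Round-trip through a temporary file so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "demo_coefficients.pkl"
    with open(path, "wb") as f:
        pickle.dump(payload, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# Rebuild predictions from the stored pieces and compare with the fitted model.
manual_pred = X_demo @ restored["coef"] + restored["intercept"]
assert np.allclose(manual_pred, model_demo.predict(X_demo))
```

The same pattern would apply per market by storing one such payload under each market key.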
