# LEAR one-shot replication (PJM)

Minimal, clean notebook to replicate the published LEAR forecasts for the first test day (2016-12-27) and compare to the provided benchmark CSV.

Note: we were also able to self-replicate, obtaining the same point estimates as those produced in `Notebooks/02_forecasting_models.ipynb` (written to `forecasts_local`).

## LEAR 1092 (3 years)

In [1]:
import importlib
import pandas as pd
import numpy as np

from epftoolbox.data import read_data
import epftoolbox.models._lear as _lear

# Ensure latest LEAR logic is loaded
importlib.reload(_lear)
from epftoolbox.models._lear import LEAR

# Official train/test split for PJM (2 test years as in the benchmark)
df_train, df_test = read_data(dataset='PJM', years_test=2, path='datasets')

target_date = pd.to_datetime('2016-12-27 00:00:00')
calibration_window = 1092  # 3-year window (LEAR 1092)

# Data available up to target day; hide target day prices for daily recalibration
data_available = pd.concat([df_train, df_test.loc[:target_date + pd.Timedelta(hours=23)]])
data_available.loc[target_date:target_date + pd.Timedelta(hours=23), 'Price'] = np.nan

model = LEAR(calibration_window=calibration_window)
prediction = model.recalibrate_and_forecast_next_day(
    df=data_available,
    next_day_date=target_date,
    calibration_window=calibration_window,
)

print(f"Prediction for {target_date.date()} (LEAR {calibration_window} days):")
print(list(prediction[0]))



Prediction for 2016-12-27 (LEAR 1092 days):
[19.10084436280493, 18.264595052624454, 17.53075027173726, 17.434241164226606, 18.22410953667942, 19.85321601109398, 23.788848056333673, 25.228665817697696, 26.251532835112556, 27.419878156879065, 28.5286133462322, 28.063830938555824, 27.211335219793543, 26.745630089348808, 26.208890106713888, 26.338957625953665, 28.022927743149186, 35.886230846829605, 32.565618443229326, 31.739756707231756, 31.986765822896174, 28.9021473579299, 24.97601551255972, 22.516413026453357]


In [2]:
# Compare with published benchmark row
published = pd.read_csv('forecasts/Forecasts_PJM_DNN_LEAR_ensembles.csv')
# .loc slicing is label-inclusive, so rows 0..23 are the 24 hours of the first test day
published_row = published.loc[:23, 'LEAR 1092'].to_list()

print("Published LEAR 1092:")
print(published_row)



Published LEAR 1092:
[19.106, 18.2766, 17.4325, 17.44, 18.175, 19.8149, 23.511, 25.8432, 26.5383, 27.4225, 28.4752, 28.3429, 27.3322, 26.7224, 25.6884, 25.7518, 27.7477, 36.0018, 32.6576, 31.8707, 31.986, 28.6272, 24.8992, 22.5305]
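
Printing the two lists side by side makes the gap hard to judge. As a minimal sketch (not part of the original notebook), the comparison can be summarized with the mean and maximum absolute difference over the 24 hours. The helper name `compare_forecasts` is hypothetical, and the values are copied from the cell outputs above, with the predictions rounded to four decimals and only the first 24 published values kept.

```python
def compare_forecasts(predicted, published):
    """Return (MAE, largest abs diff, hour index of the largest diff)."""
    if len(predicted) != len(published):
        raise ValueError("forecasts must cover the same hours")
    diffs = [abs(p - q) for p, q in zip(predicted, published)]
    mae = sum(diffs) / len(diffs)
    worst_hour = max(range(len(diffs)), key=diffs.__getitem__)
    return mae, diffs[worst_hour], worst_hour

# Local LEAR 1092 prediction for 2016-12-27 (copied from the output above, rounded).
pred_1092 = [19.1008, 18.2646, 17.5308, 17.4342, 18.2241, 19.8532,
             23.7888, 25.2287, 26.2515, 27.4199, 28.5286, 28.0638,
             27.2113, 26.7456, 26.2089, 26.3390, 28.0229, 35.8862,
             32.5656, 31.7398, 31.9868, 28.9021, 24.9760, 22.5164]

# Published LEAR 1092 values for the same 24 hours (copied from the output above).
pub_1092 = [19.106, 18.2766, 17.4325, 17.44, 18.175, 19.8149,
            23.511, 25.8432, 26.5383, 27.4225, 28.4752, 28.3429,
            27.3322, 26.7224, 25.6884, 25.7518, 27.7477, 36.0018,
            32.6576, 31.8707, 31.986, 28.6272, 24.8992, 22.5305]

mae, max_diff, worst_hour = compare_forecasts(pred_1092, pub_1092)
print(f"MAE={mae:.4f}, max |diff|={max_diff:.4f} at hour index {worst_hour}")
```

The same helper can be applied to the other calibration windows by passing their prediction and published lists.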


## LEAR 56 (8 weeks)

Replicate the 56-day window variant for 2016-12-27.

Note on AIC/LARS: the built-in LEAR selects the regularization strength per hour with `LassoLarsIC` (AIC). That estimator needs `n_samples > n_features` (plus an intercept) after the first week is dropped to build the lags. With a 56-day window, `n_features` ≈ 247 and usable days ≈ 49, so AIC can fail with “samples < features”. The code below first tries the native AIC fit on the 56-day window; if that raises a `ValueError`, it retries with a plain `Lasso` at a fixed `alpha` (0.01) on the same window, using a small subclass that leaves the core LEAR implementation untouched. A fixed `alpha` is not the AIC-selected one, so the fallback forecast need not match the published column exactly.
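
The sample/feature mismatch can be checked with simple arithmetic. The sketch below assumes the LEAR feature layout as we understand it from epftoolbox (96 lagged-price inputs, 72 inputs per exogenous series, 7 day-of-week dummies) and two exogenous series for PJM. That breakdown is an assumption, although it does reproduce the 247 features quoted above. The helper `lear_design_shape` is hypothetical.

```python
def lear_design_shape(calibration_window_days, n_exogenous=2):
    """Approximate (n_samples, n_features) of the per-hour LEAR regression."""
    # One training sample per day; the first week is lost to the 7-day lag.
    n_samples = calibration_window_days - 7
    # Assumed layout: 96 price lags + 72 per exogenous input + 7 weekday dummies.
    n_features = 96 + 72 * n_exogenous + 7
    return n_samples, n_features

for window in (56, 84, 1092):
    n_samples, n_features = lear_design_shape(window)
    aic_feasible = n_samples > n_features
    print(f"window={window}: n_samples={n_samples}, "
          f"n_features={n_features}, AIC feasible={aic_feasible}")
```

Under these assumptions only the 1092-day window has more samples than features, which is consistent with the 56-day and 84-day cells both reporting `mode=fixed_alpha`.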

In [3]:
calibration_window_56 = 56

data_available_56 = pd.concat([df_train, df_test.loc[:target_date + pd.Timedelta(hours=23)]])
data_available_56.loc[target_date:target_date + pd.Timedelta(hours=23), 'Price'] = np.nan

import warnings
from sklearn.exceptions import ConvergenceWarning
from epftoolbox.data import scaling
from sklearn.linear_model import Lasso

# Helper: LEAR variant that uses a fixed alpha instead of LARS/AIC (no code changes to _lear.py)
class LEARFixedAlpha(LEAR):
    def __init__(self, calibration_window, alpha_fixed):
        super().__init__(calibration_window=calibration_window)
        self._alpha_fixed = alpha_fixed

    def recalibrate(self, Xtrain, Ytrain):
        # Copied from LEAR.recalibrate, but using fixed alpha instead of LassoLarsIC
        [Ytrain], self.scalerY = scaling([Ytrain], 'Invariant')
        [Xtrain_no_dummies], self.scalerX = scaling([Xtrain[:, :-7]], 'Invariant')
        Xtrain[:, :-7] = Xtrain_no_dummies
        self.models = {}
        for h in range(24):
            model = Lasso(max_iter=2500, alpha=self._alpha_fixed)
            model.fit(Xtrain, Ytrain[:, h])
            self.models[h] = model

# Run with native AIC; if it fails (n_samples < n_features), retry with fixed alpha
try:
    model_56 = LEAR(calibration_window=calibration_window_56)
    prediction_56 = model_56.recalibrate_and_forecast_next_day(
        df=data_available_56,
        next_day_date=target_date,
        calibration_window=calibration_window_56,
    )
    alphas_56 = [model_56.models[h].alpha for h in range(24)]
    used_mode = "AIC"
    used_alpha = None
except ValueError:
    fixed_alpha = 1e-2  # slightly stronger shrinkage to help convergence
    warnings.filterwarnings("ignore", category=ConvergenceWarning)
    model_56 = LEARFixedAlpha(calibration_window=calibration_window_56, alpha_fixed=fixed_alpha)
    prediction_56 = model_56.recalibrate_and_forecast_next_day(
        df=data_available_56,
        next_day_date=target_date,
        calibration_window=calibration_window_56,
    )
    alphas_56 = [fixed_alpha] * 24
    used_mode = "fixed_alpha"
    used_alpha = fixed_alpha

print(f"Prediction for {target_date.date()} (LEAR {calibration_window_56} days, mode={used_mode}, alpha={used_alpha}):")
print([round(x, 4) for x in prediction_56[0]])

print("Alphas (per hour):")
print(alphas_56)

# .loc slicing is label-inclusive, so rows 0..23 are the 24 hours of the first test day
published_56 = published.loc[:23, 'LEAR 56'].to_list()
print("Published LEAR 56:")
print(published_56)



Prediction for 2016-12-27 (LEAR 56 days, mode=fixed_alpha, alpha=0.01):
[19.4149, 18.7822, 18.0082, 18.4203, 18.8302, 20.9372, 30.5142, 30.914, 29.8121, 30.1394, 29.7398, 28.61, 26.8479, 26.5137, 25.8102, 25.6235, 27.932, 37.8666, 34.6457, 34.8369, 33.6476, 28.759, 24.2697, 22.0278]
Alphas (per hour):
[0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
Published LEAR 56:
[20.1291, 19.5229, 19.0085, 18.9757, 19.3679, 21.2564, 30.5556, 30.859, 29.8895, 30.2034, 30.031, 28.574, 26.8186, 26.2332, 25.5586, 25.5853, 27.9074, 38.0152, 35.327, 34.2065, 32.623000000000005, 28.5701, 24.3618, 21.9919]


## LEAR 84 (12 weeks)

In [4]:
calibration_window_84 = 84

data_available_84 = pd.concat([df_train, df_test.loc[:target_date + pd.Timedelta(hours=23)]])
data_available_84.loc[target_date:target_date + pd.Timedelta(hours=23), 'Price'] = np.nan

# warnings, ConvergenceWarning and the LEARFixedAlpha helper are reused from the LEAR 56 cell above

# Run with native AIC; if it fails (n_samples < n_features), retry with fixed alpha
try:
    model_84 = LEAR(calibration_window=calibration_window_84)
    prediction_84 = model_84.recalibrate_and_forecast_next_day(
        df=data_available_84,
        next_day_date=target_date,
        calibration_window=calibration_window_84,
    )
    alphas_84 = [model_84.models[h].alpha for h in range(24)]
    used_mode = "AIC"
    used_alpha = None
except ValueError:
    fixed_alpha = 1e-2  # slightly stronger shrinkage to help convergence
    warnings.filterwarnings("ignore", category=ConvergenceWarning)
    model_84 = LEARFixedAlpha(calibration_window=calibration_window_84, alpha_fixed=fixed_alpha)
    prediction_84 = model_84.recalibrate_and_forecast_next_day(
        df=data_available_84,
        next_day_date=target_date,
        calibration_window=calibration_window_84,
    )
    alphas_84 = [fixed_alpha] * 24
    used_mode = "fixed_alpha"
    used_alpha = fixed_alpha

print(f"Prediction for {target_date.date()} (LEAR {calibration_window_84} days, mode={used_mode}, alpha={used_alpha}):")
print([round(x, 4) for x in prediction_84[0]])

print("Alphas (per hour):")
print(alphas_84)

# .loc slicing is label-inclusive, so rows 0..23 are the 24 hours of the first test day
published_84 = published.loc[:23, 'LEAR 84'].to_list()
print("Published LEAR 84:")
print(published_84)



Prediction for 2016-12-27 (LEAR 84 days, mode=fixed_alpha, alpha=0.01):
[20.373, 19.1589, 18.5919, 18.911, 19.3572, 20.0788, 28.957, 31.821, 29.8934, 30.1757, 30.0602, 28.6828, 26.9199, 26.3157, 25.4317, 25.7163, 27.6039, 36.2432, 34.3367, 33.4044, 32.5971, 27.549, 24.6472, 22.7236]
Alphas (per hour):
[0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
Published LEAR 84:
[20.7366, 19.5459, 18.9345, 19.2347, 19.877, 20.584, 30.0478, 33.2934, 30.1701, 30.5597, 30.4527, 29.0341, 27.233, 26.6065, 25.8365, 25.7167, 27.3243, 37.0358, 34.7939, 33.6156, 32.6022, 28.5273, 24.7556, 22.7229]


# DNN one-shot replication (PJM)