### Advanced Models for ETTh1 Forecasting Benchmark

This example applies advanced forecasting models to the ETTh1 (Electricity Transformer Temperature, Hourly) benchmark, a widely adopted standard in the forecasting community. The dataset consists of hourly electricity transformer measurements whose temporal dynamics combine daily and seasonal cycles with irregular fluctuations, which makes it a demanding test for state-of-the-art forecasting models.

To ensure computational efficiency in this notebook, we use a subset of 1,000 rows and reduced `forecasting_horizon` and `observation_length` parameters compared to standard benchmark configurations.
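
To make the role of these two parameters concrete, here is a minimal sketch of how an observation window and a forecasting horizon slice a series into input/target pairs. It uses synthetic data and the values set later in this notebook (24 and 12). The helper `sliding_windows` is a hypothetical name introduced for illustration; it shows the general windowing idea and is not a description of how the Inait service works internally.

```python
import numpy as np


def sliding_windows(values, observation_length, forecasting_horizon):
    """Split a (n_rows, n_features) array into (input, target) window pairs."""
    n_windows = len(values) - observation_length - forecasting_horizon + 1
    inputs = np.stack(
        [values[i : i + observation_length] for i in range(n_windows)]
    )
    targets = np.stack(
        [
            values[i + observation_length : i + observation_length + forecasting_horizon]
            for i in range(n_windows)
        ]
    )
    return inputs, targets


# Synthetic stand-in for a 1,000-row, 7-variable subset (not the real ETTh1 data)
values = np.arange(1000 * 7, dtype=float).reshape(1000, 7)

inputs, targets = sliding_windows(values, observation_length=24, forecasting_horizon=12)
print(inputs.shape, targets.shape)  # (965, 24, 7) (965, 12, 7)
```

With 1,000 rows, a 24-step window and a 12-step horizon leave 1000 - 24 - 12 + 1 = 965 usable pairs. Longer windows or horizons shrink that count and increase the size of each example.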

### Prerequisites

In [None]:
import sys
import pathlib

import time

import pandas as pd

sys.path.append(pathlib.Path().resolve().parent.as_posix())

from inait import predict_test, score_test, plot, load_credentials

base_url, auth_key = load_credentials("../credentials.txt")

### Load the dataset


In [None]:
data_path = "../data/etth1_small.csv"
data = pd.read_csv(data_path, index_col=0)

# Configure prediction parameters
target_columns = data.columns.tolist()  # Use all columns as targets

forecasting_horizon = 12
observation_length = 24

test_size = 5  # evaluate model performance on the last 5 steps

### Split the data into train/test 

The ETTh1 dataset contains hourly electricity transformer measurements across 7 variables, including electricity loads and temperatures. Our goal is to predict all 7 variables simultaneously for the next 12 hours using three progressively more sophisticated models: `Inait-basic`, `Inait-advanced`, and `Inait-best`. Note that the higher-performing models take longer to compute.

Model evaluation follows standard machine learning practices: we reserve a portion of the dataset as a test set for performance assessment, while using the remaining data for training. The test set remains unseen during training to provide an unbiased evaluation. We measure performance using Mean Absolute Error (MAE); lower MAE indicates better accuracy.
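
As a reference for the metric, the sketch below computes MAE by hand on a tiny made-up example with two variables and three time steps. The function `mean_absolute_error` is defined here for illustration only. We assume a plain average over all steps and variables; how `score_test` aggregates errors internally is not shown in this notebook.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference over every step and every variable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Made-up ground truth and forecast: 3 time steps, 2 variables
y_true = np.array([[10.0, 2.0], [12.0, 3.0], [11.0, 5.0]])
y_pred = np.array([[9.0, 2.0], [14.0, 2.0], [11.0, 7.0]])

mae = mean_absolute_error(y_true, y_pred)
print(mae)  # 1.0
```

The six absolute errors are 1, 0, 2, 1, 0, and 2, so their mean is 6 / 6 = 1.0. Because MAE is expressed in the units of the data, variables with larger magnitudes weigh more heavily in an unscaled average.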

In [None]:
scores = []
predictions = []
for model in ["Inait-basic", "Inait-advanced", "Inait-best"]:
    prediction = predict_test(
        base_url=base_url,
        auth_key=auth_key,
        data=data,
        target_columns=target_columns,
        forecasting_horizon=forecasting_horizon,
        observation_length=observation_length,
        model=model,
        test_size=test_size,
    )["predictions"]
    predictions.append(prediction)
    scores.append(score_test(predictions=prediction, ground_truth=data, metric="mae"))
    time.sleep(1)

pd.DataFrame(
    scores, columns=["MAE"], index=["Inait basic", "Inait advanced", "Inait best"]
).round(4)

In [None]:
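# Keep the last prediction of each model for plotting. A separate name avoids
# overwriting the `predictions` list, so this cell can be re-run safely.
last_predictions = {
    "Inait-basic": predictions[0][-1],
    "Inait-advanced": predictions[1][-1],
    "Inait-best": predictions[2][-1],
}

plot(
    historical_data=data,
    predicted_data=last_predictions,
    # legend_title_predicted="Inait-best prediction",
)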

### Results Analysis

The models capture the ground truth patterns for most variables, demonstrating strong forecasting performance. Some variables show larger prediction errors; keep in mind that this is a simplified configuration optimized for notebook execution time.

Running the full benchmark configuration yields the more comprehensive results shown in the following plot, which compares the Inait models against state-of-the-art pretrained models.


<div align="center">
<img src="../assets/benchmark_comparison_etth1_inait.png" alt="Benchmark comparison results" style="width: 60%;">
</div>

### Comparison Against Open-Source Baseline Models

To provide context for our results, we compare the Inait models against traditional forecasting baselines implemented with open-source libraries. We evaluate two common baselines: a naive model, which simply repeats the last observed values, and a linear regression model. Although these tools are freely available, using them effectively still requires forecasting and data science expertise, as the amount of code in the next cell shows.
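
Before the full implementation, here is a stripped-down sketch of the two baseline ideas on toy data. The helpers `naive_forecast` and `fit_linear_one_step` are hypothetical names introduced for illustration. The linear part is a single-variable, one-step-ahead fit using NumPy least squares, which is a simplification of the windowed scikit-learn regression used in this notebook.

```python
import numpy as np


def naive_forecast(history, forecasting_horizon):
    """Repeat the last observed row for every future step."""
    history = np.asarray(history, dtype=float)
    return np.tile(history[-1], (forecasting_horizon, 1))


def fit_linear_one_step(series):
    """Least-squares fit of x[t + 1] = slope * x[t] + intercept."""
    series = np.asarray(series, dtype=float)
    design = np.column_stack([series[:-1], np.ones(len(series) - 1)])
    coef, *_ = np.linalg.lstsq(design, series[1:], rcond=None)
    return coef  # array([slope, intercept])


# Naive baseline: 3 observed steps, 2 variables, forecast 2 steps ahead
history = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
forecast = naive_forecast(history, forecasting_horizon=2)
print(forecast)  # both rows equal the last observation [3., 30.]

# Linear baseline: a series that grows by 2 at every step
slope, intercept = fit_linear_one_step([1.0, 3.0, 5.0, 7.0, 9.0])
next_value = slope * 9.0 + intercept
print(round(float(next_value), 6))  # 11.0
```

The naive forecast ignores any trend, while the linear fit recovers the constant increment in this noise-free series. On real data, neither result is this clean.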

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.base import BaseEstimator, RegressorMixin


class NaiveBaseline(BaseEstimator, RegressorMixin):
    """Naive baseline that predicts the last observed value"""

    def __init__(self, strategy="last"):
        self.strategy = strategy

    def fit(self, X, y):
        # For naive baseline, we don't need to fit anything
        return self

    def predict(self, X):
        if self.strategy == "last":
            # Use the last observation_length values to predict
            # X shape: (n_samples, obs_len * n_features)
            # Extract last value for each feature and repeat for horizon
            n_samples, n_features_flat = X.shape
            n_features = len(target_columns)  # Assuming target_columns is available
            obs_len = n_features_flat // n_features

            # Reshape X to (n_samples, obs_len, n_features)
            X_reshaped = X.reshape(n_samples, obs_len, n_features)

            # Take last observation and repeat for forecasting horizon
            last_obs = X_reshaped[:, -1, :]  # Shape: (n_samples, n_features)

            # Repeat for forecasting horizon and flatten
            predictions = np.tile(last_obs, (1, forecasting_horizon))

            return predictions

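        # Fallback for unknown strategies: zeros with the same shape as a real forecast
        return np.zeros((X.shape[0], forecasting_horizon * len(target_columns)))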


def predict_sklearn(
    data,
    target_columns,
    forecasting_horizon,
    observation_length,
    estimator=None,
    train_size=None,
    test_size=None,
):
    """
    Forecast using sklearn estimators with column-wise window standardization
    """
    if estimator == "naive":
        estimator = NaiveBaseline()
    elif estimator is None:
        # Default to linear regression when no estimator is supplied
        estimator = LinearRegression()

    if train_size is not None and test_size is not None:
        raise ValueError(
            "Both train_size and test_size cannot be specified at the same time. Please specify only one of them."
        )
    if train_size is not None:
        split_idx = int(len(data) * train_size)
    elif test_size is not None:
        split_idx = len(data) - test_size - forecasting_horizon
    else:
        split_idx = int(len(data) * 0.8)

    train_data = data.iloc[:split_idx]

    # Create sequences from training data with column-wise standardization
    def create_sequences(data, obs_len, horizon):
        X, y = [], []
        eps = 1e-8  # Small value to avoid division by zero

        for i in range(len(data) - obs_len - horizon + 1):
            # Get observation window
            window = data.iloc[i : i + obs_len].values

            # Column-wise standardization for this window
            mu = window.mean(axis=0)
            sigma = window.std(axis=0)
            sigma[sigma < eps] = 1.0  # Avoid division by zero

            # Standardize the window
            norm_window = (window - mu) / sigma

            X.append(norm_window.flatten())
            y.append(data.iloc[i + obs_len : i + obs_len + horizon].values)
        return np.array(X), np.array(y)

    # Train model on training sequences only
    X_train, y_train = create_sequences(
        train_data[target_columns], observation_length, forecasting_horizon
    )
    y_train_flat = y_train.reshape(y_train.shape[0], -1)

    # Fit model
    if isinstance(estimator, NaiveBaseline):
        model = estimator
        model.fit(X_train, y_train_flat)
    else:
        model = MultiOutputRegressor(estimator)
        model.fit(X_train, y_train_flat)

    # Generate predictions for test period
    predictions = []
    start_test_idx = split_idx
    eps = 1e-8

    for t in range(start_test_idx, len(data) - forecasting_horizon):
        # Get test window and standardize it independently
        test_window = data[target_columns].iloc[t - observation_length : t].values

        # Column-wise standardization for this test window
        mu = test_window.mean(axis=0)
        sigma = test_window.std(axis=0)
        sigma[sigma < eps] = 1.0  # Avoid division by zero

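        # Standardize the test window; the naive baseline keeps the raw window so
        # that repeating the last observation stays on the original scale
        norm_test_window = (test_window - mu) / sigma
        is_naive = isinstance(estimator, NaiveBaseline)
        X_test = (test_window if is_naive else norm_test_window).flatten().reshape(1, -1)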

        y_pred_flat = model.predict(X_test)
        y_pred = y_pred_flat.reshape(forecasting_horizon, len(target_columns))

        # Create prediction DataFrame
        pred_df = pd.DataFrame(
            y_pred,
            columns=target_columns,
            index=data.index[t : t + forecasting_horizon],
        )
        predictions.append(pred_df)

    estimator_name = (
        "naive" if isinstance(estimator, NaiveBaseline) else type(estimator).__name__
    )
    session_ids = [f"{estimator_name}_session_{i}" for i in range(len(predictions))]

    return predictions, session_ids


# Naive Baseline
naive_predictions, naive_sessions = predict_sklearn(
    data=data,
    target_columns=target_columns,
    forecasting_horizon=forecasting_horizon,
    observation_length=observation_length,
    estimator="naive",
    test_size=test_size,
)


# Linear Regression
linear_predictions, linear_sessions = predict_sklearn(
    data=data,
    target_columns=target_columns,
    forecasting_horizon=forecasting_horizon,
    observation_length=observation_length,
    estimator=LinearRegression(),
    test_size=test_size,
)

In [None]:
scores_df = pd.DataFrame(
    scores, columns=["MAE"], index=["Inait basic", "Inait advanced", "Inait best"]
).round(4)

scores_df = pd.concat(
    [
        pd.DataFrame(
            score_test(predictions=naive_predictions, ground_truth=data, metric="mae"),
            columns=["MAE"],
            index=["Naive from scratch"],
        ),
        pd.DataFrame(
            score_test(predictions=linear_predictions, ground_truth=data, metric="mae"),
            columns=["MAE"],
            index=["Linear from scratch"],
        ),
        scores_df,
    ],
    axis=0,
).round(4)
scores_df.columns = ["Mean Absolute Error (lower is better)"]
print(scores_df)

### Performance Comparison Visualization

The following chart compares our Inait models against the open-source baseline implementations. For clarity of visualization, we exclude the Naive model from this comparison due to its significantly higher error rate.

In [None]:
import plotly.express as px

scores_df = scores_df.drop(index=["Naive from scratch"])

# Create vertical bar plot with green color scale based on values
fig = px.bar(
    scores_df,
    x=scores_df.index,
    y="Mean Absolute Error (lower is better)",
    color="Mean Absolute Error (lower is better)",
    color_continuous_scale="Greens_r",  # Reversed greens (darker for lower values)
    title="Model Performance Comparison",
    labels={"x": "Models", "y": "Mean Absolute Error (lower is better)"},
    text="Mean Absolute Error (lower is better)",  # Add values on bars
)

# Update layout for better readability
fig.update_layout(
    xaxis_title="Models",
    yaxis_title="Mean Absolute Error (lower is better)",
    showlegend=False,
    yaxis=dict(
        range=[
            scores_df["Mean Absolute Error (lower is better)"].min() * 0.9,
            scores_df["Mean Absolute Error (lower is better)"].max() * 1.1,
        ]
    ),
    coloraxis_showscale=False,
)

fig.update_traces(texttemplate="%{text:.4f}", textposition="outside")

fig.show()