## Understanding What Drives the Forecast
In this notebook we focus on interpretability, i.e., understanding why the model makes its predictions. We introduce a way to measure how much each input factor contributes to the forecast. This helps identify the key drivers behind the predictions and supports informed decision-making.

This example uses an hourly German day-ahead electricity price dataset with two main features:
- Consumption: the actual electricity demand.
- Residual load: the demand not covered by renewable generation.

Several public datasets exist for the day-ahead electricity forecasting problem, but the dataset used here was built internally to capture more recent trends, including the major regime shifts and outliers observed in the energy market in recent years. This allows us to benchmark forecasting models under realistic and challenging conditions.

By examining the relative importance of these features, we can see which factors most influence the forecast at different times, enabling better operational and strategic planning.

### Prerequisites

Let's import the relevant packages.

In [None]:
import sys
import pathlib


sys.path.append(pathlib.Path().resolve().parent.as_posix())

from inait import explain, predict, plot, read_file

### Load the dataset



In [None]:
# Load the data
data_path = "../data/power_day_ahead.csv"
data = read_file(
    data_path, index_col=0
)  # dataset must have a valid datetime index with fixed frequency
plot(historical_data=data, observation_length=int(data.shape[0] * 0.5))
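
The comment above requires a datetime index with a fixed frequency. How `read_file` validates this is not shown here, so the following is only a minimal standard-library sketch of what such a check could look like. The helper name `has_fixed_frequency` and the timestamps are hypothetical and are not part of `inait` or of the dataset.

```python
from datetime import datetime, timedelta


def has_fixed_frequency(timestamps):
    """Return True if timestamps are strictly increasing and evenly spaced."""
    if len(timestamps) < 2:
        return True  # zero or one point: nothing to compare
    step = timestamps[1] - timestamps[0]
    if step <= timedelta(0):
        return False  # duplicated or decreasing timestamps
    return all(b - a == step for a, b in zip(timestamps, timestamps[1:]))


# Illustrative hourly index (made-up timestamps, not taken from the dataset)
start = datetime(2024, 1, 1)
hourly = [start + timedelta(hours=h) for h in range(48)]
with_gap = hourly[:10] + hourly[11:]  # drop one hour to create a gap

print(has_fixed_frequency(hourly))    # True
print(has_fixed_frequency(with_gap))  # False
```

If the data is already in a pandas DataFrame, the same idea can be applied to the values of its index.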

### Forecasting setup and run
We set the target variable to the price, `DE_Spot_EPEX_1H_A`. In addition, we use two features to train the model: `DE_Residual_Load_15M_A_AVG` and `DE_Consumption_15M_A_AVG`.

In [None]:
# Configure prediction parameters
target_columns = [
    "DE_Spot_EPEX_1H_A"
]  # List of target columns to predict in the dataset
feature_columns = [
    "DE_Residual_Load_15M_A_AVG",
    "DE_Consumption_15M_A_AVG",
]  # Optional: List of feature columns to use for prediction
forecasting_horizon = 24  # Predict 24 hours ahead
observation_length = 24  # Use last 24 hours as historical context
model = "inait-advanced"

In [None]:
results, session_ids = {}, {}
for features in ["Price-only model", "Price + other drivers model"]:
    if features == "Price + other drivers model":
        _feature_columns = feature_columns
    else:
        _feature_columns = None

    result = predict(
        data=data.iloc[
            :-forecasting_horizon, :
        ],  # we keep the last 24 observations as ground truth
        target_columns=target_columns,
        feature_columns=_feature_columns,
        forecasting_horizon=forecasting_horizon,
        observation_length=observation_length,
        model=model,
    )

    results[features] = result["prediction"]
    session_ids[features] = result["session_id"]

results["Price + other drivers model"]

In [None]:
plot(
    historical_data=data.loc[:, target_columns],
    predicted_data=results,
    observation_length=24 * 5,
)
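
Because the last 24 observations were held out as ground truth, the two forecasts can also be compared numerically rather than only by eye. The sketch below computes the mean absolute error (MAE) for each forecast. It is an illustration under assumptions: the values are made-up placeholders, the labels `forecast_a` and `forecast_b` are hypothetical, and how to extract plain numbers from `result["prediction"]` depends on its structure, which is not shown in this notebook.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("sequences must not be empty")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Placeholder values only (not results from the models above)
actual = [50.0, 60.0, 80.0, 70.0]
forecasts = {
    "forecast_a": [55.0, 50.0, 70.0, 75.0],
    "forecast_b": [52.0, 58.0, 76.0, 71.0],
}

# One score per forecast; a lower MAE means a closer fit to the held-out data
scores = {name: mean_absolute_error(actual, pred) for name, pred in forecasts.items()}
for name, score in scores.items():
    print(f"{name}: MAE = {score:.2f}")
```

In the notebook, `actual` would correspond to the target column over the last `forecasting_horizon` rows of `data`, and each entry of `forecasts` to one entry of `results`.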

Now, let's run inait's `explain` feature to identify the main drivers behind the predictions. Each step within the forecasting horizon can have its own set of key drivers. By default, `explain` shows results for the first forecasted step (`forecasted_step=0`), but you can select any other step through the `forecasted_step` argument.

In this example, we focus on the prediction for 19:00, a classic peak in the energy market that is notoriously difficult to forecast accurately.

The bars in the plot are ordered by importance. Each bar is labeled with a variable name (e.g., price, consumption, or residual load) and a time reference in the format `t-1`, `t-2`, …, `t-observation_length`. The time reference indicates which historical time step of that variable is influential. For example, if `DE_Spot_EPEX_1H_A(t-1)` appears in the top 10, the most recent observation of the spot price is one of the most important predictors.
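
To work with these labels programmatically, for instance to group importance by variable or by lag, a label can be split into its variable name and lag. The sketch below assumes that labels follow the `name(t-k)` pattern of the example above. The helper `parse_driver_label` is hypothetical and not part of `inait`.

```python
import re

# Assumed label format: "<variable>(t-<lag>)", e.g. "DE_Spot_EPEX_1H_A(t-1)"
LABEL_PATTERN = re.compile(r"^(?P<variable>.+)\(t-(?P<lag>\d+)\)$")


def parse_driver_label(label):
    """Split a driver label into (variable name, lag in time steps)."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Unexpected driver label format: {label!r}")
    return match.group("variable"), int(match.group("lag"))


variable, lag = parse_driver_label("DE_Spot_EPEX_1H_A(t-1)")
print(variable, lag)  # DE_Spot_EPEX_1H_A 1
```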

The last bar in the plot shows the summed importance of all remaining drivers. In this example, we have three input variables (the price plus the two features), each considered over an `observation_length` of 24 hours, giving a total of 72 lagged features. Even though the plot highlights only the top 10, the combined importance of the remaining 62 features is still far from negligible.
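
The counting above, and the way a "remaining drivers" bar can be formed, can be sketched as follows. This does not reproduce how `explain` computes importance, which is not described here. The importance values are made-up placeholders and `top_k_with_rest` is a hypothetical helper.

```python
columns = [
    "DE_Spot_EPEX_1H_A",
    "DE_Residual_Load_15M_A_AVG",
    "DE_Consumption_15M_A_AVG",
]
observation_length = 24

# One lagged feature per variable and per historical step: 3 * 24 = 72
feature_names = [
    f"{col}(t-{lag})" for col in columns for lag in range(1, observation_length + 1)
]


def top_k_with_rest(importances, k):
    """Return the k largest items, the summed importance of the rest, and their count."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    top, rest = ranked[:k], ranked[k:]
    return top, sum(value for _, value in rest), len(rest)


# Placeholder importances (made up, only to exercise the helper)
importances = {name: 1.0 / (i + 1) for i, name in enumerate(feature_names)}

top, rest_total, rest_count = top_k_with_rest(importances, k=10)
print(len(feature_names), len(top), rest_count)  # 72 10 62
```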

In [None]:
explain(
    session_id=session_ids["Price + other drivers model"],
    historical_data=data.iloc[
        :-forecasting_horizon, :
    ],  # same history as in predict: the last 24 observations are held out as ground truth
    target_column=target_columns[0],
    max_drivers_displayed=10,  # you can increase this number to see more drivers
    forecasted_step=18,  # 19:00 (the 19th hour, index 18 since 0-indexed) is a classic peak time in the energy market
)

### Comments on results

We ran two models: one that uses only the price history and one that also uses consumption and residual load as features. The model with features demonstrates higher accuracy. The explanation plot shows that consumption and residual load, the two external drivers, are important contributors to the prediction. These variables complement the price input, which remains a key driver on its own. While domain knowledge is valuable for the initial selection of candidate features, the `explain` feature then shows which variables and time lags the model relies on most. This combination of expert intuition and data-driven explanation helps improve both model performance and interpretability.