# Baseline Time Series Forecasting for Electricity Load Data

This notebook demonstrates and evaluates several baseline time series forecasting models for electricity load data. Each section introduces a different model, explains its assumptions and suitability, and provides diagnostics to assess its performance. Baseline models are essential for benchmarking more complex forecasting approaches.

## 1. Import Required Libraries

We begin by importing the libraries used for data manipulation, modeling, plotting, and evaluation: `polars` for efficient data handling, forecasting models from `statsforecast`, loss functions and evaluation helpers from `utilsforecast`, the Ljung-Box test from `statsmodels`, and custom modules for interactive plotting and residual summaries.

In [None]:
# Data manipulation
import polars as pl

# Forecasting models and utilities
from statsforecast import StatsForecast
from statsforecast.models import (
    Naive,
    HistoricAverage,
    SeasonalNaive,
    RandomWalkWithDrift,
)
# Import the loss functions explicitly instead of using a wildcard import
from utilsforecast.losses import mae, mse, rmse, mape, smape, mase
from utilsforecast.evaluation import evaluate

# Feature engineering and diagnostics
from statsmodels.stats.diagnostic import acorr_ljungbox

# Custom plotting and summary utilities
from plotting_utils import (
    plotly_series as plot_series,
    plot_residuals_diagnostic,
)
from summary_utils import get_fitted_residuals

from functools import partial

## 2. Load and Prepare Data

We load the preprocessed electricity load dataset using `polars`. The dataset contains half-hourly energy consumption readings from London smart meters, along with weather and calendar features. We generate a timestamp column for each series and rename columns for modeling.
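
The timestamp logic can be illustrated with a small standard-library sketch. It assumes, as the `polars` code in the next cell does, that each series is described by a start timestamp and a length; the helper name `half_hourly_range` and the example date are made up for illustration. The last timestamp is the start plus `series_length - 1` half-hour steps, which is why the cell subtracts 1 before multiplying by 30 minutes.

```python
from datetime import datetime, timedelta


def half_hourly_range(start: datetime, series_length: int) -> list[datetime]:
    """Return `series_length` timestamps spaced 30 minutes apart, beginning at `start`."""
    return [start + timedelta(minutes=30 * i) for i in range(series_length)]


# Hypothetical series: 4 readings starting at midnight
stamps = half_hourly_range(datetime(2012, 1, 1), series_length=4)
print(stamps[0])   # 2012-01-01 00:00:00
print(stamps[-1])  # 2012-01-01 01:30:00
```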

In [None]:
# Load preprocessed data
data = pl.read_parquet(
    "data/london_smart_meters/preprocessed/london_smart_meters_merged_block_0-7.parquet"
)

# Generate timestamp column for each smart meter
timestamp = data.group_by("LCLid").agg(
    pl.datetime_range(
        start=pl.col("start_timestamp"),
        end=pl.col("start_timestamp").dt.offset_by(
            pl.format("{}m", pl.col("series_length").sub(1).mul(30))
        ),
        interval="30m",
    ).alias("ds"),
)

# Join timestamps and rename columns for modeling
data = timestamp.join(data, on="LCLid", how="inner").rename(
    {"LCLid": "unique_id", "energy_consumption": "y"}
)

data.head(5)

## 3. Select a Single Time Series

For demonstration, we focus on a single smart meter (e.g., `MAC000193`). This allows for clear visualization and comparison of model forecasts on a single, interpretable series.

In [None]:
# Define column names for modeling
id_ = "unique_id"
time_ = "ds"
target_ = "y"
id_col = pl.col(id_)
time_col = pl.col(time_)
target_col = pl.col(target_)

# Filter for a single block and select relevant columns
data = (
    data.filter(pl.col("file").eq("block_7"))
    .select(
        [
            time_,
            id_,
            target_,
            "Acorn",
            "Acorn_grouped",
            "holidays",
            "visibility",
            "windBearing",
            "temperature",
            "dewPoint",
            "pressure",
            "apparentTemperature",
            "windSpeed",
            "precipType",
            "icon",
            "humidity",
            "summary",
        ]
    )
    .explode(
        [
            time_,
            target_,
            "holidays",
            "visibility",
            "windBearing",
            "temperature",
            "dewPoint",
            "pressure",
            "apparentTemperature",
            "windSpeed",
            "precipType",
            "icon",
            "humidity",
            "summary",
        ]
    )
)

# Select a single smart meter for demonstration
selected_id = "MAC000193"
data = data.filter(pl.col(id_).eq(selected_id))
data.head()

## 4. Naive Forecast Model

The **Naive forecast** predicts the next value as the last observed value in the series:

$$
\hat{y}_{t+1} = y_t
$$

This model is effective for random walk or highly persistent series, but it does not account for seasonality or trend. For electricity load data, which typically exhibits strong seasonality, the naive model often underperforms.
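
As a minimal sketch of this rule (plain NumPy, independent of `statsforecast`; the function name `naive_forecast` and the toy values are illustrative assumptions), the forecast is flat at the last observation for every step of the horizon:

```python
import numpy as np


def naive_forecast(y: np.ndarray, h: int) -> np.ndarray:
    """Repeat the last observed value for each of the h forecast steps."""
    return np.full(h, y[-1], dtype=float)


# Toy half-hourly load values (made up for illustration)
y = np.array([0.21, 0.35, 0.30, 0.42])
forecast = naive_forecast(y, h=3)
print(forecast)  # [0.42 0.42 0.42]
```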

In [None]:
# Set up StatsForecast with Naive model only
fcst_naive = StatsForecast(
    models=[Naive()],
    freq="30m",
)

# Forecast using the Naive model
y_hat_naive = fcst_naive.cross_validation(
    df=data.select([id_, time_, target_col.forward_fill()]),
    fitted=True,
    h=48,
    n_windows=1,
    step_size=48,
).drop("cutoff")

In [None]:
# Plot actual values vs. the Naive forecast
plot_series(
    data,
    y_hat_naive,
    max_insample_length=200,
    width=1400,
    title="Actual vs. Baseline Forecast for Electricity Load",
)

## 5. Historic Average Forecast Model

The **Historic Average** model predicts the next value as the mean of all observed values up to the current time:

$$
\hat{y}_{t+1} = \frac{1}{t} \sum_{i=1}^{t} y_i
$$

This approach is suitable for stationary series without strong trends or seasonality. However, electricity load data often has pronounced daily and weekly cycles, which this model cannot capture.
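
A minimal NumPy sketch of the same idea follows. The helper name `historic_average_forecast` and the toy values are assumptions for illustration, not the `statsforecast` implementation. Every step of the horizon receives the mean of the training observations:

```python
import numpy as np


def historic_average_forecast(y: np.ndarray, h: int) -> np.ndarray:
    """Forecast every step of the horizon with the mean of the observed values."""
    return np.full(h, np.mean(y), dtype=float)


# Toy series (made up for illustration); its mean is 3.0
y = np.array([1.0, 2.0, 3.0, 6.0])
forecast = historic_average_forecast(y, h=2)
print(forecast)  # [3. 3.]
```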

In [None]:
# Set up StatsForecast with Historic Average model only
fcst_mean = StatsForecast(
    models=[HistoricAverage()],
    freq="30m",
)

# Forecast using the Historic Average model
y_hat_mean = fcst_mean.cross_validation(
    df=data.select([id_, time_, target_col.forward_fill()]),
    fitted=True,
    h=48,
    n_windows=1,
    step_size=48,
).drop("cutoff")

In [None]:
# Plot actual values vs. the Historic Average forecast
plot_series(
    data,
    y_hat_mean,
    max_insample_length=200,
    width=1400,
    title="Actual vs. Baseline Forecast for Electricity Load",
)

## 6. Seasonal Naive Forecast Models (Daily & Weekly)

The **Seasonal Naive** model repeats the value from the same season in the previous cycle:

$$
\hat{y}_{t+1} = y_{t+1-s}
$$

where $s$ is the season length (e.g., $s=48$ for daily, $s=336$ for weekly seasonality with half-hourly data).

These models are well-suited for electricity load forecasting, as load patterns often repeat daily and weekly due to human activity and routines.
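
The seasonal naive rule can be sketched in a few lines of NumPy. This is an illustrative helper (the name `seasonal_naive_forecast` and the toy series are assumptions), and it presumes the series contains at least one full season. For horizons longer than one season, the last observed season is simply repeated:

```python
import numpy as np


def seasonal_naive_forecast(y: np.ndarray, h: int, season_length: int) -> np.ndarray:
    """Repeat the last observed season until the horizon h is covered."""
    last_season = y[-season_length:]
    # Tile the last full season as many times as needed, then trim to h steps
    reps = int(np.ceil(h / season_length))
    return np.tile(last_season, reps)[:h]


# Toy series with a season of length 3 (made up for illustration)
y = np.array([1.0, 2.0, 3.0, 1.5, 2.5, 3.5])
forecast = seasonal_naive_forecast(y, h=5, season_length=3)
print(forecast)  # [1.5 2.5 3.5 1.5 2.5]
```

With half-hourly data, `season_length=48` would give the daily variant and `season_length=336` the weekly one.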

In [None]:
# Set up StatsForecast with daily and weekly Seasonal Naive models
fcst_seasonal = StatsForecast(
    models=[
        SeasonalNaive(season_length=48, alias="DailySeasonalNaive"),
        SeasonalNaive(season_length=48 * 7, alias="WeeklySeasonalNaive"),
    ],
    freq="30m",
)

# Forecast using the Seasonal Naive models
y_hat_seasonal = fcst_seasonal.cross_validation(
    df=data.select([id_, time_, target_col.forward_fill()]),
    fitted=True,
    h=48,
    n_windows=1,
    step_size=48,
).drop("cutoff")

In [None]:
# Plot actual values vs. the daily and weekly Seasonal Naive forecasts
plot_series(
    data,
    y_hat_seasonal,
    max_insample_length=200,
    width=1400,
    title="Actual vs. Baseline Forecast for Electricity Load",
)

## 7. Drift Forecast Model

The **Drift** model extends the naive forecast by adding a linear trend (drift) estimated from the historical average change:

$$
\hat{y}_{t+1} = y_t + \hat{d}
$$

where

$$
\hat{d} = \frac{y_t - y_1}{t-1}
$$

### Derivation

Suppose we have a time series $\{y_1, y_2, \ldots, y_t\}$.

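The random walk with drift model is:
$$
y_{k+1} = y_k + d + \varepsilon_{k+1}
$$
where $d$ is the drift (constant increment), and $\varepsilon_{k+1}$ is a zero-mean noise term.

If we recursively expand this from $y_1$:
$$
y_2 = y_1 + d + \varepsilon_2 \\
y_3 = y_2 + d + \varepsilon_3 = y_1 + 2d + \varepsilon_2 + \varepsilon_3 \\
\vdots \\
y_t = y_1 + (t-1)d + \sum_{i=2}^{t} \varepsilon_i
$$

Replacing the noise sum with its expected value of zero and solving for $d$ gives the drift estimate:
$$
\hat{d} = \frac{y_t - y_1}{t-1}
$$

Ignoring the noise, the trend line implied by this estimate is:
$$
\hat{y}_k = y_1 + (k-1)\frac{y_t - y_1}{t-1}
$$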

This is the equation of a straight line passing through $(1, y_1)$ and $(t, y_t)$: the drift forecast simply extends the line through the first and last observations, ignoring any intermediate fluctuations or seasonality. It is a simple way to capture a linear trend in the data.

Hence, this model is appropriate for series with a linear trend. However, electricity load data is typically dominated by seasonality rather than trend, so the drift model may not perform well.
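
The derivation can be checked numerically with a short NumPy sketch. The helper name `drift_forecast` and the toy series are illustrative assumptions. The sketch confirms that the endpoint formula for $\hat{d}$ equals the mean of the first differences, and it uses the multi-step form $\hat{y}_{t+h} = y_t + h\,\hat{d}$:

```python
import numpy as np


def drift_forecast(y: np.ndarray, h: int) -> np.ndarray:
    """Extend the line through the first and last observations h steps ahead."""
    t = len(y)
    d_hat = (y[-1] - y[0]) / (t - 1)  # drift estimate from the two endpoints
    steps = np.arange(1, h + 1)
    return y[-1] + steps * d_hat


# Toy series (made up for illustration)
y = np.array([2.0, 3.5, 3.0, 5.0, 6.0])

# The endpoint formula equals the average one-step change
d_hat = (y[-1] - y[0]) / (len(y) - 1)
assert np.isclose(d_hat, np.diff(y).mean())

forecast = drift_forecast(y, h=3)
print(forecast)  # [7. 8. 9.]
```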

In [None]:
# Set up StatsForecast with Random Walk with Drift model only
fcst_drift = StatsForecast(
    models=[RandomWalkWithDrift()],
    freq="30m",
)

# Forecast using the Drift model
y_hat_drift = fcst_drift.cross_validation(
    df=data.select([id_, time_, target_col.forward_fill()]),
    fitted=True,
    h=48,
    n_windows=1,
    step_size=48,
).drop("cutoff")

In [None]:
# Plot actual values vs. the Drift forecast
plot_series(
    data,
    y_hat_drift,
    max_insample_length=200,
    width=1400,
    title="Actual vs. Baseline Forecast for Electricity Load",
)

Electricity load demand data typically exhibits strong seasonality with no trend. As a result, simple baseline methods such as the naive forecast, historic average, and drift are not well-suited for this type of data.

In contrast, models that explicitly account for seasonality, such as the daily and weekly seasonal naive methods, perform significantly better. Among these, the weekly seasonal naive model tends to track the actual time series more closely, as it leverages the repeated weekly consumption patterns inherent in electricity demand data. This highlights the importance of incorporating seasonality into forecasting models for load demand.

## 8. Evaluate Forecast Accuracy with Metrics

To quantitatively compare model performance, we compute standard forecast accuracy metrics:

- **MAE** (Mean Absolute Error):  
    $$
    \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^n |y_t - \hat{y}_t|
    $$
    *Pros*: Simple to interpret, less sensitive to outliers than MSE or RMSE.  
    *Cons*: Does not penalize large errors more than small ones.  
    *Suitability*: Good for electricity load data, especially when all errors are equally important.

- **MSE** (Mean Squared Error):  
    $$
    \mathrm{MSE} = \frac{1}{n} \sum_{t=1}^n (y_t - \hat{y}_t)^2
    $$
    *Pros*: Penalizes larger errors more heavily, useful for highlighting large deviations.  
    *Cons*: Sensitive to outliers, units are squared.  
    *Suitability*: Useful for electricity load forecasting when large errors are particularly undesirable.

- **RMSE** (Root Mean Squared Error):  
    $$
    \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^n (y_t - \hat{y}_t)^2}
    $$
    *Pros*: Same units as the original data, interpretable.  
    *Cons*: Still sensitive to outliers.  
    *Suitability*: Commonly used in load forecasting, especially when large errors are costly.

- **MAPE** (Mean Absolute Percentage Error):  
    $$
    \mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^n \left| \frac{y_t - \hat{y}_t}{y_t} \right|
    $$
    *Pros*: Scale-independent, easy to interpret as a percentage.  
    *Cons*: Undefined or infinite when $y_t = 0$, can be biased when actual values are near zero.  
    *Suitability*: Useful for electricity load data with consistently positive values, but caution needed if zeros are present.

- **sMAPE** (Symmetric Mean Absolute Percentage Error):  
    $$
    \mathrm{sMAPE} = \frac{100\%}{n} \sum_{t=1}^n \frac{|y_t - \hat{y}_t|}{(|y_t| + |\hat{y}_t|)/2}
    $$
    *Pros*: Bounded between 0% and 200%, less sensitive to scale and zeros than MAPE.  
    *Cons*: Can still be unstable when both $y_t$ and $\hat{y}_t$ are near zero.  
    *Suitability*: Preferred over MAPE for electricity load forecasting, especially with low or zero values.

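- **MASE** (Mean Absolute Scaled Error, with seasonality $s=48$):  
    $$
    \mathrm{MASE} = \frac{\frac{1}{n} \sum_{t=1}^n |y_t - \hat{y}_t|}{\frac{1}{T-s} \sum_{i=s+1}^{T} |y_i - y_{i-s}|}
    $$
    Here the numerator averages over the $n$ forecast points, and the denominator is the in-sample MAE of the seasonal naive forecast over the $T$ training observations.  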
    *Pros*: Scale-free, interpretable relative to a seasonal naive forecast, robust to scale and seasonality.  
    *Cons*: Requires enough data to compute seasonal differences.  
    *Suitability*: Highly recommended for electricity load data, as it benchmarks models against a simple seasonal naive baseline (e.g., daily seasonality for half-hourly data).

These metrics provide different perspectives on forecast accuracy. For electricity load data, MASE (with daily seasonality) is especially informative, as it benchmarks models against a simple seasonal naive forecast.
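
To make the definitions concrete, here is a small NumPy sketch of three of these metrics. The `_np` suffixed names are hypothetical helpers chosen to avoid clashing with the `utilsforecast` functions used in the next cell, and the toy arrays are made up. In particular, MASE scales the forecast MAE by the in-sample MAE of a seasonal naive forecast on the training data:

```python
import numpy as np


def mae_np(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))


def rmse_np(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mase_np(y: np.ndarray, y_hat: np.ndarray, y_train: np.ndarray, s: int) -> float:
    """Forecast MAE divided by the in-sample seasonal naive MAE (season length s)."""
    scale = float(np.mean(np.abs(y_train[s:] - y_train[:-s])))
    return mae_np(y, y_hat) / scale


# Toy data (made up): training series with season length 2, then two forecast points
y_train = np.array([1.0, 3.0, 2.0, 4.0, 3.0, 5.0])
y_true = np.array([4.0, 6.0])
y_pred = np.array([3.0, 5.0])

print(mae_np(y_true, y_pred))                 # 1.0
print(rmse_np(y_true, y_pred))                # 1.0
print(mase_np(y_true, y_pred, y_train, s=2))  # 1.0
```

A MASE of 1.0 means the forecast is exactly as accurate as the in-sample seasonal naive benchmark; values below 1 indicate an improvement over it.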

In [None]:
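# Combine the forecasts from all baseline models into one frame
y_hat_all = pl.concat(
    [y_hat_mean, y_hat_naive, y_hat_seasonal, y_hat_drift], how="align"
)

# Define metrics (MASE uses daily seasonality: 48 half-hour periods)
metrics = [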
    mae,
    mse,
    rmse,
    mape,
    smape,
    partial(mase, seasonality=48),
]

# Evaluate all baseline forecasts
evaluate(
    y_hat_all,
    metrics=metrics,
    train_df=data.select([id_, time_, target_]),
)

The evaluation metrics confirm the visual findings from the plots: the naive, mean (historic average), and drift models yield relatively high error values, indicating poor performance on this dataset. This is expected, as these models do not account for the strong seasonality present in electricity load demand.

In contrast, the weekly seasonal naive model achieves significantly lower errors across all metrics.

## 10. Residual Analysis for Baseline Models

Residual analysis is a crucial step in time series modeling, as it helps assess whether the chosen model adequately captures the underlying patterns in the data. By examining the residuals—the differences between observed values and model predictions—we can check for remaining structure, autocorrelation, or non-randomness.

Ideally, residuals should resemble white noise: they should be randomly distributed with constant variance and no discernible patterns over time. If residuals display autocorrelation, seasonality, or changing variance, this indicates that the model has not fully captured important aspects of the data, and further refinement or alternative modeling approaches may be necessary.
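
To make the white-noise criterion concrete, here is a small sketch using NumPy and synthetic data only. `sample_acf` is a hypothetical helper, not part of the notebook's utilities. It computes sample autocorrelations and compares them with the approximate 95% band $\pm 1.96/\sqrt{n}$ expected for white noise:

```python
import numpy as np


def sample_acf(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelations at lags 1..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, max_lag + 1)])


n = 2000
rng = np.random.default_rng(0)
white_noise = rng.normal(size=n)
# Synthetic "residuals" with leftover structure: a period-48 cycle plus a little noise
seasonal = np.sin(2 * np.pi * np.arange(n) / 48) + 0.1 * white_noise

bound = 1.96 / np.sqrt(n)  # approximate 95% band under the white-noise hypothesis
acf_wn = sample_acf(white_noise, max_lag=48)
acf_seasonal = sample_acf(seasonal, max_lag=48)

# Share of lags outside the band: small for white noise, large for the seasonal series
print("white noise:", (np.abs(acf_wn) > bound).mean())
print("seasonal:   ", (np.abs(acf_seasonal) > bound).mean())
print("lag-48 autocorrelation of the seasonal series:", acf_seasonal[47])
```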

Here, we analyze the residuals (forecast errors) of the best-performing baseline model (typically the Weekly Seasonal Naive for electricity load).

In [None]:
# Refit on the full series; fitted=True keeps the in-sample fitted values
y_hat = fcst_seasonal.forecast(
    df=data.select([id_, time_, target_col.forward_fill()]),
    h=48,
    fitted=True,
)

In [None]:
# Get fitted residuals for the Weekly Seasonal Naive model
fitted_residuals = get_fitted_residuals(fcst_seasonal).drop_nans()
model = "WeeklySeasonalNaive"

residuals = fitted_residuals.get_column(model).drop_nulls().to_numpy()
time = fitted_residuals.get_column(time_).drop_nulls().to_numpy()

# Plot residual diagnostics
plot_residuals_diagnostic(time=time, residuals=residuals)

The residual plot reveals significant autocorrelation, indicating that the residuals are not white noise. This suggests that the model has not fully captured all the underlying patterns or dependencies in the data.

## 11. Ljung-Box Test for Residual Autocorrelation

The Ljung-Box test is a statistical test used to check whether any group of autocorrelations of a time series are significantly different from zero. In other words, it tests whether the residuals (errors) from a time series model are independently distributed (i.e., exhibit no autocorrelation).

### Purpose

- **Null hypothesis ($H_0$):** The data are independently distributed (no autocorrelation up to lag $h$).
- **Alternative hypothesis ($H_1$):** The data are not independently distributed (there is autocorrelation at one or more lags up to $h$).

### Test Statistic

The Ljung-Box test statistic is calculated as:

$$
Q = n(n+2) \sum_{k=1}^h \frac{\hat{\rho}_k^2}{n-k}
$$

where:

- $n$ = number of observations
- $h$ = number of lags being tested
- $\hat{\rho}_k$ = sample autocorrelation at lag $k$

### Distribution

- Under the null hypothesis, $Q$ approximately follows a chi-squared distribution with $h$ degrees of freedom:
    $$
    Q \sim \chi^2_h
    $$

### Steps

1. **Compute residuals** from your time series model.
2. **Calculate sample autocorrelations** $\hat{\rho}_k$ for lags $k = 1, 2, ..., h$.
3. **Compute $Q$** using the formula above.
4. **Compare $Q$** to the critical value from the chi-squared distribution with $h$ degrees of freedom, or compute the p-value.
5. **Decision:**  
     - If the p-value is small (e.g., $< 0.05$), reject $H_0$ (there is significant autocorrelation).
     - If the p-value is large, do not reject $H_0$ (no evidence of autocorrelation).

### Interpretation

- **Low p-value:** Residuals are autocorrelated; the model may be inadequate.
- **High p-value:** No evidence of autocorrelation; residuals resemble white noise.

### Summary Table

| Step | Description |
|------|-------------|
| 1    | Compute residuals from model |
| 2    | Calculate autocorrelations up to lag $h$ |
| 3    | Compute $Q$ statistic |
| 4    | Compare $Q$ to $\chi^2_h$ distribution |
| 5    | Interpret p-value |

The Ljung-Box test is widely used for model diagnostics in time series analysis to ensure that the model has captured all temporal dependencies.
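
The steps above can be sketched directly from the formula, using NumPy and `scipy.stats.chi2` for the p-value. The helper `ljung_box_q` is a hypothetical name, and the white-noise and AR(1) series are synthetic stand-ins for model residuals. This is meant only to illustrate the computation, not to replace `acorr_ljungbox`:

```python
import numpy as np
from scipy.stats import chi2


def ljung_box_q(x: np.ndarray, h: int) -> tuple[float, float]:
    """Return the Ljung-Box Q statistic and its chi-squared p-value for lags 1..h."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    lags = np.arange(1, h + 1)
    # Sample autocorrelation at each lag k = 1..h
    rho = np.array([np.sum(x[k:] * x[:-k]) / denom for k in lags])
    q = n * (n + 2) * np.sum(rho ** 2 / (n - lags))
    p_value = chi2.sf(q, df=h)  # upper-tail probability of chi-squared with h d.o.f.
    return float(q), float(p_value)


rng = np.random.default_rng(42)
noise = rng.normal(size=500)

# AR(1) series with coefficient 0.8: strongly autocorrelated by construction
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]

q_noise, p_noise = ljung_box_q(noise, h=10)
q_ar1, p_ar1 = ljung_box_q(ar1, h=10)
print(f"white noise: Q={q_noise:.2f}, p={p_noise:.3f}")
print(f"AR(1):       Q={q_ar1:.2f}, p={p_ar1:.3g}")
```

For the same input and lag, the `Q` value should agree with the `lb_stat` column returned by `acorr_ljungbox(x, lags=[h])`, assuming a `statsmodels` version that returns a DataFrame.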

In [None]:
# Apply Ljung-Box test to residuals (boxpierce=True also reports the Box-Pierce statistic)
resid_test = acorr_ljungbox(residuals, boxpierce=True)
resid_test

The Ljung-Box test result shows a p-value of 0 (to display precision), so we reject the null hypothesis of no autocorrelation in the residuals. This indicates that significant autocorrelation remains, which is in line with what we observe in the ACF plot above.