In [1]:
%load_ext autoreload
%autoreload 2

# Exponential Smoothing Models for Electricity Forecasting

Welcome to this hands-on tutorial on exponential smoothing models for time series forecasting! In this notebook, we'll explore several classic models, fit each one separately, and discuss their suitability for forecasting electricity consumption. Each section is self-contained, with explanations, code, and interpretation.

This notebook is written for beginners, with the goal of making each concept clear and accessible. Let's get started!

## Data Preparation

We'll use a single household's half-hourly electricity consumption data. This keeps things simple and lets us focus on model behavior.

Feel free to swap in your own data or try with other households!

In [None]:
import polars as pl

data = pl.read_parquet(
    "data/london_smart_meters/preprocessed/london_smart_meters_merged_block_0-7.parquet"
)
# Build the half-hourly timestamps for each household from its start time
# and series length (series_length - 1 steps of 30 minutes).
timestamp = data.group_by("LCLid").agg(
    pl.datetime_range(
        start=pl.col("start_timestamp"),
        end=pl.col("start_timestamp").dt.offset_by(
            pl.format("{}m", pl.col("series_length").sub(1).mul(30))
        ),
        interval="30m",
    ).alias("ds"),
)
# Attach the timestamps and rename the columns to unique_id / ds / y.
data = timestamp.join(data, on="LCLid", how="inner").rename(
    {"LCLid": "unique_id", "energy_consumption": "y"}
)
data = data.filter(pl.col("file").eq("block_7"))
id_ = "unique_id"
time_ = "ds"
target_ = "y"
selected_id = "MAC000193"
# Keep one household and fill missing readings: forward first, then backward
# to cover any gap at the very start of the series.
data = data.filter(pl.col(id_).eq(selected_id)).with_columns(
    pl.col(target_).forward_fill().backward_fill()
)
data = data.select([time_, id_, target_])
data.head()

# 1. Simple Exponential Smoothing (SES)

## What is SES?

**Simple Exponential Smoothing** is the most basic exponential smoothing model. It is designed for time series data with **no clear trend or seasonality**. SES produces forecasts by taking a weighted average of past observations, where the weights decrease exponentially for older data.

The forecast equation is:

$$
\hat{y}_{t+1} = \alpha y_t + (1-\alpha) \hat{y}_t
$$

- $y_t$ is the actual value at time $t$
- $\hat{y}_t$ is the forecast at time $t$
- $\alpha$ is the smoothing parameter $(0 < \alpha < 1)$
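
To make the recursion concrete, here is a minimal pure-Python sketch of the SES update. The function name `ses_forecasts` and the choice to seed the recursion with the first observation are assumptions for illustration; fitted ETS models typically estimate the initial state and $\alpha$ from the data instead.

```python
def ses_forecasts(y, alpha, init=None):
    """One-step-ahead SES forecasts; the last element is the forecast for the next unseen point."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Assumption: seed the recursion with the first observation unless told otherwise.
    forecast = y[0] if init is None else init
    forecasts = [forecast]
    for obs in y:
        # y_hat_{t+1} = alpha * y_t + (1 - alpha) * y_hat_t
        forecast = alpha * obs + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


toy = [10.0, 12.0, 11.0, 13.0]
ses_path = ses_forecasts(toy, alpha=0.5)
next_value = ses_path[-1]  # SES repeats this value for every future horizon
```

Because the final forecast is reused for every horizon, the multi-step SES forecast is a flat line.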

## Is SES suitable for electricity data?

Electricity consumption data almost always has **strong seasonality** (daily, weekly) and sometimes a trend. SES is **not suitable** for such data, but it's a good baseline and helps us see why more complex models are needed.

In [None]:
from statsforecast import StatsForecast
from statsforecast.models import AutoETS

fcst_ses = StatsForecast(
    models=[AutoETS(season_length=48 * 7, model="ANN", alias="SES")],
    freq="30m",
)
ses_fit = fcst_ses.fit(df=data)
ses_forecast = fcst_ses.predict(h=48)
ses_forecast.head()

## Interpretation

- SES will produce a flat forecast (no trend, no seasonality).
- For electricity data, this will **miss all daily and weekly patterns**.
- Use SES only as a baseline to see how much improvement you get from more advanced models.

# 2. Holt’s Linear Trend Method

## What is Holt’s Method?

**Holt's Linear Trend** method extends SES by adding a trend component. It is suitable for data with a trend but **no seasonality**.

The equations are:

$$
\begin{align*}
\ell_t &= \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta (\ell_t - \ell_{t-1}) + (1-\beta) b_{t-1} \\
\hat{y}_{t+h} &= \ell_t + h b_t
\end{align*}
$$

- $\ell_t$ is the level at time $t$
- $b_t$ is the trend (slope) at time $t$
- $\alpha, \beta$ are smoothing parameters
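
The three equations above can be sketched in a few lines of plain Python. The function name `holt_forecast` and the initialisation (level from the first observation, trend from the first difference) are assumptions made for this illustration, not how a fitted ETS model initialises its states.

```python
def holt_forecast(y, alpha, beta, h):
    """Run Holt's level/trend recursions over y and return the next h forecasts."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialisation: level = first value, trend = first difference.
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # y_hat_{t+h} = level_t + h * trend_t
    return [level + step * trend for step in range(1, h + 1)]


straight_line = [2.0 * t + 1.0 for t in range(10)]  # 1, 3, 5, ..., 19
holt_path = holt_forecast(straight_line, alpha=0.5, beta=0.5, h=3)
```

On a perfectly linear series the recursions track the line exactly, so the forecasts simply continue it.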

## Is Holt’s method suitable for electricity data?

It can capture a trend, but **cannot model seasonality**. For electricity data, it will still miss the daily/weekly cycles.

In [None]:
fcst_holt = StatsForecast(
    models=[AutoETS(season_length=48 * 7, model="AAN", alias="Holt")],
    freq="30m",
)
holt_fit = fcst_holt.fit(df=data)
holt_forecast = fcst_holt.predict(h=48)
holt_forecast.head()

## Interpretation

- Holt’s method will forecast a straight line (with trend), but **no repeating cycles**.
- For electricity, this is still not enough, but it’s a step up from SES.

# 3. Damped Holt’s Method

## What is Damped Trend?

The **damped trend** model is a modification of Holt’s method. It adds a damping parameter $\phi$ $(0 < \phi < 1)$ that shrinks the trend as the forecast horizon increases, preventing unrealistic long-term growth or decline.

The forecast equation is:

$$
\hat{y}_{t+h} = \ell_t + \left(\sum_{j=1}^h \phi^j\right) b_t
$$
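
A short sketch shows how the damping sum behaves as the horizon grows. The function name `damped_trend_forecast` and the example values for the level, trend, and $\phi$ are made up for illustration. Since $\sum_{j=1}^{h} \phi^j$ is a geometric series, it approaches $\phi / (1 - \phi)$, so the forecast levels off instead of growing without bound.

```python
def damped_trend_forecast(level, trend, phi, h):
    """h-step forecast: level + (phi + phi^2 + ... + phi^h) * trend."""
    multiplier = sum(phi ** j for j in range(1, h + 1))
    return level + multiplier * trend


level, trend, phi = 10.0, 1.0, 0.9  # illustrative values only
horizons = (1, 2, 48)
damped = [damped_trend_forecast(level, trend, phi, h) for h in horizons]
undamped = [level + h * trend for h in horizons]
# Limit of the damped forecast as h grows: level + phi / (1 - phi) * trend
ceiling = level + phi / (1 - phi) * trend
```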

## Is Damped Holt’s method suitable for electricity data?

Still **no seasonality**, so it’s not ideal for electricity, but it’s more realistic for long-term forecasts than plain Holt.

In [None]:
fcst_damped = StatsForecast(
    models=[
        AutoETS(
            season_length=48 * 7, model="AAN", damped=True, phi=0.9, alias="Damped Holt"
        )
    ],
    freq="30m",
)
damped_fit = fcst_damped.fit(df=data)
damped_forecast = fcst_damped.predict(h=48)
damped_forecast.head()

## Interpretation

- The trend will flatten out as you forecast further into the future.
- Still, **no seasonality** is captured, so daily/weekly cycles are missed.

# 4. Holt-Winters Additive Seasonality (ETS(A,A,A))

## What is Holt-Winters Additive?

The **Holt-Winters** method adds a seasonal component to Holt’s trend model; with additive errors, its state-space form is ETS(A,A,A). This is the classic model for data with both trend and **additive seasonality** (where seasonal effects are roughly constant in size).

The equations are:

$$
\begin{align*}
\ell_t &= \alpha (y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta (\ell_t - \ell_{t-1}) + (1-\beta) b_{t-1} \\
s_t &= \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m} \\
\hat{y}_{t+h} &= \ell_t + h b_t + s_{t-m+h^*}
\end{align*}
$$

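- $s_t$ is the seasonal component
- $m$ is the season length. The model takes a single value of $m$: at half-hourly resolution, $m=48$ for a daily cycle or $m=336$ for a weekly cycle. The code below uses $m=336$, so the weekly profile also carries the daily shape.
- $h^* = ((h-1) \bmod m) + 1$ selects the most recent estimate of the matching seasonal index
- $\gamma$ is the seasonal smoothing parameter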
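
The recursions above can be sketched in plain Python. The function name `holt_winters_additive` and the simple initialisation from the first two seasons are assumptions for illustration; a fitted ETS model estimates its initial states instead. The forecast step uses $h^* = ((h-1) \bmod m) + 1$ to pick the latest estimate of the matching seasonal index.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, h):
    """Additive Holt-Winters recursions over y; returns the next h forecasts."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons")
    first, second = y[:m], y[m:2 * m]
    # Assumed initialisation: mean of season one, average per-step change
    # between the first two seasons, and deviations from the initial level.
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / m ** 2
    seasonals = [value - level for value in first]  # seasonals[t] holds s_t

    for t in range(m, len(y)):
        s_old = seasonals[t - m]
        prev_level, prev_trend = level, trend
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonals.append(gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_old)

    last = len(y) - 1  # index of the final observation
    forecasts = []
    for step in range(1, h + 1):
        h_star = (step - 1) % m + 1
        forecasts.append(level + step * trend + seasonals[last - m + h_star])
    return forecasts


pattern = [1.0, -1.0, 0.0, 0.0]
toy_series = [10.0 + pattern[t % 4] for t in range(12)]  # three identical seasons
hw_path = holt_winters_additive(toy_series, m=4, alpha=0.5, beta=0.5, gamma=0.5, h=6)
```

On this toy series with no trend and a perfectly repeating pattern, the forecasts reproduce the pattern and wrap around after one full season.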

## Is Holt-Winters suitable for electricity data?

**Yes!** This model can capture both the trend and the strong daily/weekly cycles in electricity consumption.

In [None]:
fcst_hw = StatsForecast(
    models=[AutoETS(season_length=48 * 7, model="AAA", alias="AdditiveHW")],
    freq="30m",
)
hw_fit = fcst_hw.fit(df=data)
hw_forecast = fcst_hw.predict(h=48)
hw_forecast.head()

## Interpretation

- This model will capture both the overall trend and the repeating daily/weekly cycles.
- For most electricity data, this is a **strong baseline**.

# 5. Damped Holt-Winters Additive Seasonality

## What is Damped Holt-Winters?

This model adds a damping parameter to the trend in the Holt-Winters model. It is useful when you expect the trend to eventually flatten out, which is often realistic for long-term electricity forecasts.
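
For reference, the damped recursions can be written in the same notation as the Holt-Winters equations from the previous section. This is a sketch following the standard textbook form, with $h^* = ((h-1) \bmod m) + 1$ picking the matching seasonal index:

$$
\begin{align*}
\ell_t &= \alpha (y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + \phi b_{t-1}) \\
b_t &= \beta (\ell_t - \ell_{t-1}) + (1-\beta) \phi b_{t-1} \\
s_t &= \gamma (y_t - \ell_{t-1} - \phi b_{t-1}) + (1-\gamma) s_{t-m} \\
\hat{y}_{t+h} &= \ell_t + \left(\sum_{j=1}^h \phi^j\right) b_t + s_{t-m+h^*}
\end{align*}
$$

Setting $\phi = 1$ recovers the undamped Holt-Winters equations.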

## Is Damped Holt-Winters suitable for electricity data?

**Yes!** A damped trend keeps long-horizon forecasts from drifting toward unrealistic growth or decline, which often makes this the safest choice among the models covered here.

In [None]:
fcst_hw_damped = StatsForecast(
    models=[
        AutoETS(
            season_length=48 * 7, model="AAA", damped=True, alias="DampedAdditiveHW"
        )
    ],
    freq="30m",
)
hw_damped_fit = fcst_hw_damped.fit(df=data)
hw_damped_forecast = fcst_hw_damped.predict(h=48)
hw_damped_forecast.head()

## Interpretation

- This model captures trend and seasonality, and damps the trend over long forecast horizons.
- For electricity data, this is often the **most robust exponential smoothing model**.

# Summary Table: Model Suitability for Electricity Forecasting

| Model                        | Trend | Seasonality | Damped | Suitable for Electricity? |
|------------------------------|-------|-------------|--------|--------------------------|
| Simple Exponential Smoothing |   ✗   |      ✗      |   ✗    |            ✗             |
| Holt’s Linear Trend          |   ✓   |      ✗      |   ✗    |            ✗             |
| Damped Holt                  |   ✓   |      ✗      |   ✓    |            ✗             |
| Holt-Winters Additive        |   ✓   |      ✓      |   ✗    |            ✓             |
| Damped Holt-Winters Additive |   ✓   |      ✓      |   ✓    |            ✓             |

For electricity consumption, always use a model with **seasonality** (and usually with a damped trend for long-term forecasts).

# Understanding ETS (Error-Trend-Seasonal) State-Space Models

So far, we have explored models that capture trend and seasonality using various smoothing techniques. All of these approaches belong to the broader family of ETS (Error-Trend-Seasonal) models. In this section, we will dive deeper into the structure and components of ETS models to better understand how they work and when to use each type.

## What is ETS?

ETS stands for **Error, Trend, and Seasonal**. ETS models are a family of exponential smoothing models that can capture different combinations of:
- **Error**: How the random noise enters the model (Additive or Multiplicative)
- **Trend**: The long-term direction of the series (None, Additive, Additive Damped, Multiplicative, Multiplicative Damped)
- **Seasonality**: Repeating patterns (None, Additive, Multiplicative)

Each ETS model is specified by a three-part code, e.g. `ETS(A,A,N)`:
- First position: Error type (`A` for Additive, `M` for Multiplicative)
- Second position: Trend type (`N` for None, `A` for Additive, `Ad` for Additive Damped, `M` for Multiplicative, `Md` for Multiplicative Damped)
- Third position: Seasonality type (`N` for None, `A` for Additive, `M` for Multiplicative)
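
As a quick illustration of the naming scheme, the sketch below enumerates every code the taxonomy allows. The dictionary names (`ERROR`, `TREND`, `SEASON`) are invented for this example; a given library may support only a subset of these combinations.

```python
from itertools import product

# Options for each position of the ETS code, as listed above.
ERROR = {"A": "additive", "M": "multiplicative"}
TREND = {
    "N": "none",
    "A": "additive",
    "Ad": "additive damped",
    "M": "multiplicative",
    "Md": "multiplicative damped",
}
SEASON = {"N": "none", "A": "additive", "M": "multiplicative"}

# Iterating over a dict yields its keys, so product() combines the codes.
ets_codes = [f"ETS({e},{t},{s})" for e, t, s in product(ERROR, TREND, SEASON)]
seasonal_codes = [code for code in ets_codes if not code.endswith(",N)")]

print(len(ets_codes), "codes in total;", len(seasonal_codes), "include seasonality")
```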

## The Components

### 1. Error Type

- **Additive (A):** The error is added to the model. Suitable when the variability of the series does not depend on the level.
- **Multiplicative (M):** The error is multiplied by the level. Suitable when the variability increases as the level increases.

### 2. Trend Type

- **None (N):** No trend.
- **Additive (A):** Linear trend, added to the level.
- **Additive Damped (Ad):** Linear trend whose contribution shrinks as the forecast horizon grows (damped), so forecasts level off.
- **Multiplicative (M):** Trend is proportional to the level.
- **Multiplicative Damped (Md):** Like multiplicative, but damped.

### 3. Seasonality Type

- **None (N):** No seasonality.
- **Additive (A):** Seasonal effect is added to the level.
- **Multiplicative (M):** Seasonal effect is multiplied by the level.

## Mathematical Formulation

Let’s use the notation:
- $y_t$: observed value at time $t$
- $\ell_t$: level at time $t$
- $b_t$: trend at time $t$
- $s_t$: seasonal component at time $t$
- $m$: season length
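- $\alpha, \beta, \gamma$: smoothing parameters (in the state-space equations below, $\beta$ stands for $\alpha\beta^*$, where $\beta^*$ is the trend parameter written as $\beta$ in the Holt and Holt-Winters recursions earlier)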
- $\phi$: damping parameter (if used)
- $\varepsilon_t$: error at time $t$

### Example: ETS(A,A,A) — Additive Error, Additive Trend, Additive Seasonality

$$
\begin{align*}
y_t &= \ell_{t-1} + b_{t-1} + s_{t-m} + \varepsilon_t \\
\ell_t &= \ell_{t-1} + b_{t-1} + \alpha \varepsilon_t \\
b_t &= b_{t-1} + \beta \varepsilon_t \\
s_t &= s_{t-m} + \gamma \varepsilon_t
\end{align*}
$$
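
The state-space equations are the Holt-Winters recursions rewritten in terms of the one-step error $\varepsilon_t$. The sketch below checks this numerically for a single update. Both function names and all the numbers are illustrative assumptions. Note that the trend parameter in the error-correction form equals $\alpha$ times the trend parameter of the component form.

```python
def component_step(y_t, level, trend, season, alpha, beta_star, gamma):
    """One update using the weighted-average (component) form."""
    new_level = alpha * (y_t - season) + (1 - alpha) * (level + trend)
    new_trend = beta_star * (new_level - level) + (1 - beta_star) * trend
    new_season = gamma * (y_t - level - trend) + (1 - gamma) * season
    return new_level, new_trend, new_season


def error_correction_step(y_t, level, trend, season, alpha, beta, gamma):
    """One update using the ETS(A,A,A) error-correction form."""
    error = y_t - (level + trend + season)  # one-step forecast error
    return level + trend + alpha * error, trend + beta * error, season + gamma * error


# `season` plays the role of s_{t-m}; the values are made up for the check.
y_t, level, trend, season = 12.0, 10.0, 0.5, 1.0
alpha, beta_star, gamma = 0.4, 0.25, 0.3

via_components = component_step(y_t, level, trend, season, alpha, beta_star, gamma)
via_errors = error_correction_step(y_t, level, trend, season, alpha, alpha * beta_star, gamma)
```

Both calls produce the same updated level, trend, and seasonal state, up to floating-point rounding.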

### Example: ETS(M,Ad,M) — Multiplicative Error, Additive Damped Trend, Multiplicative Seasonality

$$
\begin{align*}
y_t &= (\ell_{t-1} + \phi b_{t-1}) s_{t-m} (1 + \varepsilon_t) \\
\ell_t &= \ell_{t-1} + \phi b_{t-1} + \alpha (\ell_{t-1} + \phi b_{t-1}) \varepsilon_t \\
b_t &= \phi b_{t-1} + \beta (\ell_{t-1} + \phi b_{t-1}) \varepsilon_t \\
s_t &= s_{t-m} + \gamma s_{t-m} \varepsilon_t
\end{align*}
$$
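
A single update of these equations can be sketched as follows. The function name `ets_madm_step` and the example numbers are assumptions for illustration. With multiplicative errors, $\varepsilon_t$ is a relative error: the observation divided by the one-step fitted value, minus one.

```python
def ets_madm_step(y_t, level, trend, season, alpha, beta, gamma, phi):
    """One ETS(M,Ad,M) update; `season` plays the role of s_{t-m}."""
    base = level + phi * trend       # damped level-plus-trend term
    fitted = base * season           # one-step-ahead fitted value
    error = y_t / fitted - 1         # relative (multiplicative) error
    new_level = base + alpha * base * error
    new_trend = phi * trend + beta * base * error
    new_season = season + gamma * season * error
    return fitted, error, (new_level, new_trend, new_season)


# Illustrative state and parameters (not estimated from any data).
state = dict(level=100.0, trend=2.0, season=1.1)
params = dict(alpha=0.2, beta=0.1, gamma=0.1, phi=0.9)

fitted, error, updated = ets_madm_step(y_t=117.579, **state, **params)

# If the observation equals the fitted value, the error is zero and only
# the damping of the trend changes the state.
_, zero_error, unchanged = ets_madm_step(y_t=fitted, **state, **params)
```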

## All Possible ETS Models

The options listed above give $2 \times 5 \times 3 = 30$ possible combinations, but the most common are:

| Model Code  | Trend           | Seasonality    | Error          | Suitable for Electricity? |
|-------------|-----------------|----------------|----------------|---------------------------|
| ETS(A,N,N)  | None            | None           | Additive       | ✗                         |
| ETS(A,A,N)  | Additive        | None           | Additive       | ✗                         |
| ETS(A,Ad,N) | Additive Damped | None           | Additive       | ✗                         |
| ETS(A,N,A)  | None            | Additive       | Additive       | ✓ (if no trend)           |
| ETS(A,A,A)  | Additive        | Additive       | Additive       | ✓                         |
| ETS(A,Ad,A) | Additive Damped | Additive       | Additive       | ✓                         |
| ETS(M,N,N)  | None            | None           | Multiplicative | ✗                         |
| ETS(M,A,N)  | Additive        | None           | Multiplicative | ✗                         |
| ETS(M,Ad,N) | Additive Damped | None           | Multiplicative | ✗                         |
| ETS(M,N,M)  | None            | Multiplicative | Multiplicative | ✓ (if no trend)           |
| ETS(M,A,M)  | Additive        | Multiplicative | Multiplicative | ✓                         |
| ETS(M,Ad,M) | Additive Damped | Multiplicative | Multiplicative | ✓                         |

- **Additive models** are preferred when seasonal/trend effects are roughly constant in size.
- **Multiplicative models** are preferred when effects grow/shrink with the level (e.g., higher consumption means bigger seasonal swings).
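
One rough way to check which case applies is to compare the size of the seasonal swing with the average level, season by season. The helper below is a heuristic sketch, not a formal test, and the name `seasonal_spread_by_level` and the synthetic series are assumptions for illustration. A roughly constant spread points to additive seasonality, while a roughly constant spread-to-level ratio points to multiplicative seasonality.

```python
def seasonal_spread_by_level(y, m):
    """Return (mean, max - min) for each complete season of length m."""
    summary = []
    for start in range(0, len(y) - m + 1, m):
        block = y[start:start + m]
        summary.append((sum(block) / m, max(block) - min(block)))
    return summary


shape = [-1.0, 0.0, 1.0, 0.0]   # one synthetic season
levels = (10.0, 20.0, 40.0)     # the level doubles from season to season

additive = [level + s for level in levels for s in shape]
multiplicative = [level * (1 + 0.1 * s) for level in levels for s in shape]

additive_spreads = [spread for _, spread in seasonal_spread_by_level(additive, m=4)]
multiplicative_ratios = [
    spread / mean for mean, spread in seasonal_spread_by_level(multiplicative, m=4)
]
```

On real data the picture is noisier, so treat this as a first look before comparing fitted models.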

## Damped Trend

- **Damping** (parameter $\phi$) is used to prevent the trend from growing indefinitely. This is often realistic for electricity data, where long-term growth is limited.

## Choosing the Right Model

- **No trend, no seasonality:** ETS(A,N,N) (rare for electricity)
- **Trend, no seasonality:** ETS(A,A,N) or ETS(A,Ad,N)
- **Seasonality, no trend:** ETS(A,N,A) or ETS(M,N,M)
- **Trend and seasonality:** ETS(A,A,A), ETS(A,Ad,A), ETS(M,A,M), ETS(M,Ad,M)
- **Damped trend:** Use Ad or Md for the trend component

**For electricity consumption:**  
- Use a model with **seasonality** (A or M in the third position).
- Use a **damped trend** for long-term forecasts.
- Additive models are usually sufficient, but check if the seasonal effect grows with the level (then use multiplicative).

## Summary Table

| Model Code | Trend         | Seasonality   | Damped | Suitable for Electricity? |
|------------|--------------|--------------|--------|--------------------------|
| ETS(A,N,N) | None         | None         | No     | ✗                        |
| ETS(A,A,N) | Additive     | None         | No     | ✗                        |
| ETS(A,Ad,N)| Additive     | None         | Yes    | ✗                        |
| ETS(A,N,A) | None         | Additive     | No     | ✓ (if no trend)          |
| ETS(A,A,A) | Additive     | Additive     | No     | ✓                        |
| ETS(A,Ad,A)| Additive     | Additive     | Yes    | ✓                        |
| ETS(M,A,M) | Additive     | Multiplicative| No    | ✓                        |
| ETS(M,Ad,M)| Additive     | Multiplicative| Yes   | ✓                        |

---

**References:**
- [Forecasting: Principles and Practice, Hyndman & Athanasopoulos, Chapter 8](https://otexts.com/fpp3/ets.html)
- [ETS Model Wikipedia](https://en.wikipedia.org/wiki/Exponential_smoothing#State_space_model)
