# Exponential Smoothing Model (ETS)

The **ETS** (Error, Trend, Seasonality) framework is a powerful and flexible approach to exponential smoothing for time series forecasting. ETS models are characterized by three key components: the type of error (additive or multiplicative), the type of trend (none, additive, multiplicative, or damped), and the type of seasonality (none, additive, or multiplicative). This systematic framework provides a unified approach to exponential smoothing methods, encompassing classic techniques like simple exponential smoothing, Holt's linear method, and Holt-Winters seasonal methods.

Unlike ARIMA models, which rely on differencing to remove trend and seasonality, ETS uses a state-space formulation in which the level, trend, and seasonal components are updated recursively at each time step. Each component has a direct interpretation, and the choice between additive and multiplicative structure gives the flexibility to handle a wide range of patterns.

**The ETS Framework**

ETS models maintain internal state variables that evolve over time through smoothing equations:

* **Level ($\ell_t$)**: The baseline value of the series at time $t$.
* **Trend ($b_t$)**: The rate of change or growth pattern.
* **Seasonal ($s_t$)**: Repeating patterns with a fixed period $m$.

Each component can be modeled as additive or multiplicative, resulting in different model behaviors.

**Error, Trend, and Seasonality Components**

The model specification uses three-letter notation (e.g., "AAN", "MAM"):

**First Letter - Error Type:**
* **A (Additive)**: Errors are independent of the series level
* **M (Multiplicative)**: Errors scale proportionally with the series level

**Second Letter - Trend Type:**
* **N (None)**: No trend component
* **A (Additive)**: Linear trend
* **M (Multiplicative)**: Exponential growth trend
* **Damped trend (Ad / Md)**: A damped version of the additive or multiplicative trend that flattens out over the forecast horizon; enable it with `damped=True`

**Third Letter - Seasonal Type:**
* **N (None)**: No seasonal pattern
* **A (Additive)**: Constant seasonal effect
* **M (Multiplicative)**: Seasonal effect proportional to level
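To make the notation concrete, the sketch below decodes a specification string into its components. `parse_ets_spec` is a hypothetical helper written for illustration, not part of the skforecast API. It assumes the common textual form in which a lowercase `d` after the trend letter marks a damped trend (e.g. "AAdN") and `Z` means the component is selected automatically.

```python
# Hypothetical helper (not part of skforecast): decode an ETS code such as
# "ANN", "MAdM" or "ZZZ". A lowercase "d" after the trend letter marks damping.
ERROR_TYPES = {"A": "additive", "M": "multiplicative", "Z": "automatic"}
TREND_TYPES = {"N": "none", "A": "additive", "M": "multiplicative", "Z": "automatic"}
SEASONAL_TYPES = {"N": "none", "A": "additive", "M": "multiplicative", "Z": "automatic"}


def parse_ets_spec(spec: str) -> dict:
    """Split an ETS code into its error, trend, damping and seasonal parts."""
    if len(spec) not in (3, 4):
        raise ValueError(f"ETS code must have 3 or 4 characters, got {spec!r}")
    damped = len(spec) == 4
    if damped and spec[2] != "d":
        raise ValueError(f"expected 'd' as the damping marker in {spec!r}")
    error, trend, seasonal = spec[0], spec[1], spec[-1]
    if error not in ERROR_TYPES or trend not in TREND_TYPES or seasonal not in SEASONAL_TYPES:
        raise ValueError(f"unknown component letter in {spec!r}")
    if damped and trend == "N":
        raise ValueError("a damped trend requires a trend component")
    return {
        "error": ERROR_TYPES[error],
        "trend": TREND_TYPES[trend],
        "damped": damped,
        "seasonal": SEASONAL_TYPES[seasonal],
    }


print(parse_ets_spec("MAdM"))
```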


**ETS vs. ARIMA**

While both methods aim to predict future values based on history, they approach the problem from fundamentally different angles.

| Feature | ARIMA (Auto-Regressive Integrated Moving Average) | ETS (Error, Trend, Seasonality) |
| :--- | :--- | :--- |
| **Approach** | **Differencing + ARMA.** Achieves stationarity through differencing, then fits AR and MA terms. | **State-Space Smoothing.** Recursively updates level, trend, and seasonal states with exponential smoothing. |
| **Model Form** | **Linear combination of past values and errors** (after differencing). | **Explicit state equations** for level, trend, and seasonality with additive or multiplicative structure. |
| **Automation** | **Semi-Manual.** Requires choosing the orders ($p,d,q$), although automatic procedures such as `auto.arima` exist. | **Fully Automated.** Model selection ('ZZZ') systematically searches over the valid ETS models and keeps the one with the best information criterion. |


## ETS model theory

ETS models use a state-space framework with two core equations: an observation equation relating observations to states, and state transition equations describing how states evolve.

### Additive Error State-Space Form

For additive error models, the state-space representation is:

**Observation equation:**
$$Y_t = H x_{t-1} + \varepsilon_t$$

**State equation:**
$$x_t = F x_{t-1} + G \varepsilon_t$$

where $\varepsilon_t \sim WN(0, \sigma^2)$ is white noise, $x_t$ is the state vector containing level, trend, and seasonal components, and $H$, $F$, $G$ are system matrices that depend on the specific ETS model.

**Forecast mean and variance at horizon $h$:**

$$\mu_n(h) = H F^{h-1} x_n$$

$$v_n(h) = \sigma^2 \left(1 + \sum_{j=1}^{h-1} (H F^{j-1} G)^2\right)$$
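As a sketch of these two formulas, the function below evaluates $\mu_n(h)$ and $v_n(h)$ with NumPy. The matrices in the example assume Holt's linear trend in state-space form, with state $x_t = (\ell_t, b_t)$, level update $\ell_{t-1} + b_{t-1} + \alpha\varepsilon_t$ and trend update $b_{t-1} + \beta\varepsilon_t$; the parameter values are arbitrary.

```python
import numpy as np


def forecast_mean_var(H, F, G, x_n, sigma2, h):
    """h-step forecast mean and variance of a linear additive-error model.

    Model: Y_t = H x_{t-1} + e_t,  x_t = F x_{t-1} + G e_t,  e_t ~ WN(0, sigma2).
    """
    H, F, G, x_n = (np.asarray(a, dtype=float) for a in (H, F, G, x_n))
    mean = H @ np.linalg.matrix_power(F, h - 1) @ x_n
    # Variance: sigma2 * (1 + sum_{j=1}^{h-1} c_j^2), with c_j = H F^{j-1} G
    c = [H @ np.linalg.matrix_power(F, j - 1) @ G for j in range(1, h)]
    var = sigma2 * (1.0 + sum(c_j ** 2 for c_j in c))
    return float(mean), float(var)


# Holt's linear trend (AAN): state x = (level, trend)
alpha, beta = 0.5, 0.1
H = [1.0, 1.0]
F = [[1.0, 1.0], [0.0, 1.0]]
G = [alpha, beta]
mean, var = forecast_mean_var(H, F, G, x_n=[100.0, 2.0], sigma2=4.0, h=3)
print(mean, var)
```

For this model $\mu_n(h)$ reduces to $\ell_n + h b_n$, and each $c_j$ equals $\alpha + \beta j$.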

### Simple Exponential Smoothing (ANN)

For series with no trend or seasonality:

**Additive Error Form:**
$$Y_t = \ell_{t-1} + \varepsilon_t$$
$$\ell_t = \ell_{t-1} + \alpha \varepsilon_t$$

where $\ell_t$ is the level at time $t$, $\alpha \in (0,1)$ is the smoothing parameter, and $\varepsilon_t \sim WN(0, \sigma^2)$.

**Component form:**
$$\ell_t = \alpha Y_t + (1-\alpha) \ell_{t-1}$$
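The error-correction form and the component form describe the same recursion, because $\varepsilon_t = Y_t - \ell_{t-1}$. A minimal numerical check, assuming an arbitrary initial level $\ell_0$ and a short toy series:

```python
import numpy as np


def ses_levels(y, alpha, level0):
    """Run simple exponential smoothing in both equivalent forms."""
    levels_err, levels_comp = [], []
    l_err = l_comp = level0
    for obs in y:
        eps = obs - l_err                    # one-step error: Y_t - l_{t-1}
        l_err = l_err + alpha * eps          # error-correction form
        l_comp = alpha * obs + (1 - alpha) * l_comp  # component form
        levels_err.append(l_err)
        levels_comp.append(l_comp)
    return np.array(levels_err), np.array(levels_comp)


y_toy = np.array([10.0, 12.0, 11.0, 13.0, 12.5])
levels_err, levels_comp = ses_levels(y_toy, alpha=0.3, level0=10.0)
print(np.allclose(levels_err, levels_comp), levels_err[-1])
```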

**Forecast function:**
$$\hat{Y}_{n+h|n} = \ell_n \text{ for all } h \geq 1$$

**Prediction variance:**
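$$\text{Var}[Y_{n+h} \mid Y_1, \ldots, Y_n] = \sigma^2\left[1 + \alpha^2 (h-1)\right]$$

which reduces to $\sigma^2 h$ only in the random-walk case $\alpha = 1$.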
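Setting $H = 1$, $F = 1$ and $G = \alpha$ in the general variance formula gives $c_j = \alpha$, hence $v_n(h) = \sigma^2[1 + \alpha^2(h-1)]$, which equals $\sigma^2 h$ only when $\alpha = 1$. The sketch below turns this into a prediction interval, assuming Gaussian errors (an assumption not stated above) and illustrative values for $\ell_n$, $\alpha$ and $\sigma^2$.

```python
import math
from statistics import NormalDist


def ann_interval(level, alpha, sigma2, h, coverage=0.95):
    """Gaussian prediction interval for Y_{n+h} under an ANN model."""
    # H = 1, F = 1, G = alpha gives c_j = alpha for every j
    var = sigma2 * (1.0 + alpha ** 2 * (h - 1))
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    half_width = z * math.sqrt(var)
    return level - half_width, level + half_width


for h in (1, 6, 12):
    print(h, ann_interval(level=50.0, alpha=0.4, sigma2=4.0, h=h))
```

The point forecast stays flat at $\ell_n$, while the interval widens linearly in variance with the horizon.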

**Multiplicative Error Form (MNN):**
$$Y_t = \ell_{t-1}(1 + \varepsilon_t)$$
$$\ell_t = \ell_{t-1}(1 + \alpha \varepsilon_t)$$

For the same $\alpha$ and initial level, point forecasts are identical to those of the additive form, but prediction intervals scale with the level.

### Holt's Linear Trend Method (AAN)

For series with additive trend:

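**State-space form:**
$$Y_t = \ell_{t-1} + b_{t-1} + \varepsilon_t$$
$$\ell_t = \ell_{t-1} + b_{t-1} + \alpha \varepsilon_t$$
$$b_t = b_{t-1} + \beta \varepsilon_t$$

where $\ell_t$ is the level, $b_t$ is the trend, $\alpha$ and $\beta$ are smoothing parameters, and $\varepsilon_t \sim WN(0, \sigma^2)$.

**Component form:**
* Level: $\ell_t = \alpha Y_t + (1-\alpha)(\ell_{t-1} + b_{t-1})$
* Trend: $b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) b_{t-1}$

Substituting $\ell_t - \ell_{t-1} = b_{t-1} + \alpha\varepsilon_t$ into the trend equation shows that the two forms are equivalent when $\beta = \alpha\beta^*$, so the conventional restriction $\alpha, \beta^* \in (0,1)$ corresponds to $0 < \alpha < 1$ and $0 < \beta < \alpha$.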

**Forecast function:**
$$\hat{Y}_{n+h|n} = \ell_n + h \cdot b_n$$
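The component-form recursions and the forecast function can be sketched in a few lines. Here `beta` denotes the smoothing parameter of the component-form trend update, and the initial level and trend are assumed to be given rather than estimated. On a noise-free linear series with correct initial states, the forecasts continue the line exactly.

```python
import numpy as np


def holt_forecast(y, alpha, beta, level0, trend0, steps):
    """Run Holt's component-form recursions, then forecast l_n + h * b_n."""
    level, trend = level0, trend0
    for obs in y:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    h = np.arange(1, steps + 1)
    return level + h * trend


# Noise-free line y_t = 5 + 2t, t = 1..10, starting from l_0 = 5, b_0 = 2
y_lin = 5.0 + 2.0 * np.arange(1, 11)
print(holt_forecast(y_lin, alpha=0.3, beta=0.2, level0=5.0, trend0=2.0, steps=3))
```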

### Damped Trend

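**State-space form:**
$$Y_t = \ell_{t-1} + \phi b_{t-1} + \varepsilon_t$$
$$\ell_t = \ell_{t-1} + \phi b_{t-1} + \alpha \varepsilon_t$$
$$b_t = \phi b_{t-1} + \beta \varepsilon_t$$

where $\phi \in (0,1]$ is the damping parameter. In component form, the updates are $\ell_t = \alpha Y_t + (1-\alpha)(\ell_{t-1} + \phi b_{t-1})$ and $b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) \phi b_{t-1}$, with $\beta = \alpha\beta^*$.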

**Forecast function:**
$$\hat{Y}_{n+h|n} = \ell_n + (\phi + \phi^2 + \cdots + \phi^h) b_n = \ell_n + \phi \frac{1-\phi^h}{1-\phi} b_n$$

The damping parameter controls how quickly the trend dampens:
* $\phi = 1$: Standard Holt (no damping)
* $\phi < 1$: Damped trend (trend flattens out in forecasts)

Advantages of damped trend:
* More realistic long-term forecasts
* Prevents unbounded linear extrapolation
* Often improves forecast accuracy at longer horizons
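The closed form of the damping multiplier can be checked numerically. The sketch below, with arbitrary values of $\ell_n$, $b_n$ and $\phi$, also shows that for $\phi < 1$ the forecasts approach the finite limit $\ell_n + \frac{\phi}{1-\phi} b_n$ as $h$ grows.

```python
def damped_multiplier(phi, h):
    """Return phi + phi^2 + ... + phi^h, the coefficient of b_n at horizon h."""
    if phi == 1.0:
        return float(h)  # undamped Holt: the sum is simply h
    return phi * (1 - phi ** h) / (1 - phi)


level_n, trend_n, phi = 100.0, 2.0, 0.9
for h in (1, 5, 20, 100):
    print(h, level_n + damped_multiplier(phi, h) * trend_n)
# Long-run limit of the damped forecast
print(level_n + phi / (1 - phi) * trend_n)
```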

### Holt-Winters Seasonal Methods

**Additive Seasonality (AAA):**

$$Y_t = (\ell_{t-1} + b_{t-1}) + s_{t-m} + \varepsilon_t$$
$$\ell_t = \alpha(Y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1})$$
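$$b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) b_{t-1}$$
$$s_t = \gamma(Y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m}$$

where $m$ is the seasonal period and $\beta^*$ is the component-form trend smoothing parameter, related to the state-space trend parameter by $\beta = \alpha\beta^*$.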

**Forecast function:**
$$\hat{Y}_{n+h|n} = \ell_n + h b_n + s_{n+h-m(k+1)}$$

where $k = \lfloor (h-1)/m \rfloor$.
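The index $n+h-m(k+1)$ always points to one of the last $m$ estimated seasonal states. If those states are stored oldest first as $s_{n-m+1}, \ldots, s_n$, the position is $(h-1) \bmod m$. A minimal sketch, assuming the final level, trend and seasonal states are already known:

```python
import numpy as np


def additive_hw_forecast(level, trend, season, steps):
    """Point forecasts l_n + h*b_n + s_{n+h-m(k+1)} for h = 1..steps.

    `season` holds the last m seasonal states, oldest first: s_{n-m+1}, ..., s_n.
    """
    m = len(season)
    forecasts = []
    for h in range(1, steps + 1):
        k = (h - 1) // m
        idx = h - 1 - m * k  # position of s_{n+h-m(k+1)}; equals (h - 1) % m
        forecasts.append(level + h * trend + season[idx])
    return np.array(forecasts)


print(additive_hw_forecast(level=100.0, trend=1.0, season=[-5.0, 0.0, 5.0, 0.0], steps=6))
```

For the multiplicative version, the seasonal term multiplies $(\ell_n + h b_n)$ instead of being added, with the same index.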

**Multiplicative Seasonality (MAM):**

$$Y_t = (\ell_{t-1} + b_{t-1}) s_{t-m} (1 + \varepsilon_t)$$
$$\ell_t = (\ell_{t-1} + b_{t-1})(1 + \alpha \varepsilon_t)$$
$$b_t = b_{t-1} + \beta(\ell_{t-1} + b_{t-1}) \varepsilon_t$$
$$s_t = s_{t-m}(1 + \gamma \varepsilon_t)$$

**Forecast function:**
$$\hat{Y}_{n+h|n} = (\ell_n + h b_n) s_{n+h-m(k+1)}$$
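One step of the MAM recursion can be sketched as follows: compute the one-step forecast $\mu_t = (\ell_{t-1} + b_{t-1}) s_{t-m}$, recover the relative error $\varepsilon_t = (Y_t - \mu_t)/\mu_t$ implied by the observation equation, and apply the three state updates. Storing the last $m$ seasonal states in a list, oldest first, is an assumption of this sketch.

```python
def mam_step(level, trend, season, y_t, alpha, beta, gamma):
    """One MAM update. `season` lists the last m seasonal states, oldest first,
    so season[0] is s_{t-m}, the state used to forecast time t."""
    mu = (level + trend) * season[0]            # one-step forecast
    eps = (y_t - mu) / mu                       # relative error: Y_t = mu (1 + eps)
    new_level = (level + trend) * (1 + alpha * eps)
    new_trend = trend + beta * (level + trend) * eps
    new_s = season[0] * (1 + gamma * eps)       # s_t
    return new_level, new_trend, season[1:] + [new_s], eps


level, trend, season, eps = mam_step(
    level=100.0, trend=0.0, season=[1.0, 1.0], y_t=110.0,
    alpha=0.5, beta=0.1, gamma=0.2,
)
print(level, trend, season, eps)
```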

### Multiplicative Error Form

For multiplicative error models:

**Observation:**
$$Y_t = \hat{Y}_t(1 + \varepsilon_t)$$

where $\varepsilon_t \sim WN(0, \sigma^2)$.

**Key property:** For the same parameter values, point forecasts equal those of the corresponding additive-error model, but prediction intervals scale with the level.
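For MNN, the one-step-ahead observation is $Y_{n+1} = \ell_n(1 + \varepsilon_{n+1})$, so its standard deviation is $\sigma \ell_n$, whereas for ANN it is $\sigma$ regardless of the level. The sketch below compares both at two levels, assuming Gaussian errors; note that $\sigma$ is in the units of the series for ANN but is a relative error for MNN.

```python
from statistics import NormalDist


def one_step_interval(level, sigma, error="A", coverage=0.95):
    """One-step interval for ANN (error="A") or MNN (error="M").

    For ANN, sigma is in the units of the series; for MNN it is a relative
    (proportional) error, so the standard deviation is sigma * level.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    sd = sigma if error == "A" else sigma * level
    return level - z * sd, level + z * sd


for lvl in (100.0, 200.0):
    print(lvl, one_step_interval(lvl, 5.0, "A"), one_step_interval(lvl, 0.05, "M"))
```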

**Examples:**

* **MNN** (no trend, no seasonality):
$$Y_t = \ell_{t-1}(1 + \varepsilon_t)$$
$$\ell_t = \ell_{t-1}(1 + \alpha \varepsilon_t)$$

* **MAN** (additive trend):
$$Y_t = (\ell_{t-1} + b_{t-1})(1 + \varepsilon_t)$$
$$\ell_t = (\ell_{t-1} + b_{t-1})(1 + \alpha \varepsilon_t)$$
$$b_t = b_{t-1} + \beta(\ell_{t-1} + b_{t-1}) \varepsilon_t$$

### Admissible Parameter Space

For stability and forecastability, ETS models have admissible parameter regions:

**ANN / MNN:**
$$0 < \alpha < 2$$

**AAN / MAN:**
$$0 < \alpha < 2, \quad 0 < \beta < 4 - 2\alpha$$

**ADN (damped additive trend):**
$$0 < \phi \leq 1, \quad 1 - \frac{1}{\phi} < \alpha < 1 + \frac{1}{\phi}$$
$$\alpha(\phi - 1) < \beta < (1 + \phi)(2 - \alpha)$$

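In practice, estimation usually restricts the parameters to the narrower "usual" region $0 < \alpha < 1$, $0 < \beta < \alpha$, $0 < \gamma < 1 - \alpha$ and $0 < \phi \leq 1$, which gives conventional exponential smoothing behavior.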

Admissible regions do not depend on whether errors are additive or multiplicative.
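A small helper can test whether a parameter set lies inside the regions listed above. `is_admissible` is a hypothetical function for illustration; $\beta$ here is the state-space trend parameter used in those inequalities, and only the ANN/MNN, AAN/MAN and ADN cases are covered.

```python
def is_admissible(alpha, beta=None, phi=None):
    """Hypothetical check of the admissible regions for ANN/MNN, AAN/MAN and ADN."""
    if beta is None:                       # ANN / MNN
        return 0 < alpha < 2
    if phi is None:                        # AAN / MAN
        return 0 < alpha < 2 and 0 < beta < 4 - 2 * alpha
    # ADN (damped additive trend); `and` short-circuits before 1 / phi if phi <= 0
    return (
        0 < phi <= 1
        and 1 - 1 / phi < alpha < 1 + 1 / phi
        and alpha * (phi - 1) < beta < (1 + phi) * (2 - alpha)
    )


print(is_admissible(1.5), is_admissible(1.5, 1.5), is_admissible(0.5, 0.1, phi=0.9))
```

Note that $\alpha = 1.5$ is admissible for ANN even though it lies outside the conventional $(0,1)$ range.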

### Model Selection

ETS models are typically estimated by maximizing the likelihood function. For model selection, information criteria are used:

* **AIC** (Akaike Information Criterion): $\text{AIC} = -2\log L + 2k$
* **AICc** (Corrected AIC): $\text{AICc} = \text{AIC} + \frac{2k(k+1)}{n-k-1}$ (recommended for small samples)
* **BIC** (Bayesian Information Criterion): $\text{BIC} = -2\log L + k\log n$

where $k$ is the number of parameters and $n$ is the number of observations.
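The three criteria follow directly from the log-likelihood. A sketch with made-up values of $\log L$, $k$ and $n$:

```python
import math


def information_criteria(loglik, k, n):
    """AIC, AICc and BIC from a log-likelihood, k parameters and n observations."""
    aic = -2 * loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = -2 * loglik + k * math.log(n)
    return {"aic": aic, "aicc": aicc, "bic": bic}


print(information_criteria(loglik=-100.0, k=3, n=50))
```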

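The log-likelihood (with $\sigma^2$ concentrated out and additive constants dropped) depends on the error type, where $\hat{Y}_t$ denotes the one-step-ahead fitted value:

**Additive errors**, with $e_t = Y_t - \hat{Y}_t$:
$$\log L = -\frac{n}{2}\log\left(\frac{1}{n}\sum_{t=1}^{n}e_t^2\right)$$

**Multiplicative errors**, with relative errors $\varepsilon_t = (Y_t - \hat{Y}_t)/\hat{Y}_t$:
$$\log L = -\frac{n}{2}\log\left(\frac{1}{n}\sum_{t=1}^{n}\varepsilon_t^2\right) - \sum_{t=1}^{n}\log|\hat{Y}_t|$$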
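Both likelihoods can be evaluated from the observations and the one-step fitted values. This sketch assumes that, in the multiplicative case, the errors are the relative errors $(Y_t - \hat{Y}_t)/\hat{Y}_t$; the $\log|\hat{Y}_t|$ term keeps the likelihood on the scale of the original observations, so rescaling the series by $c$ shifts it by exactly $-n\log c$.

```python
import numpy as np


def ets_loglik(y, fitted, error="A"):
    """Concentrated ETS log-likelihood, up to an additive constant."""
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    n = len(y)
    if error == "A":
        e = y - fitted                      # additive errors
        return -n / 2 * np.log(np.mean(e ** 2))
    e = (y - fitted) / fitted               # relative errors
    return -n / 2 * np.log(np.mean(e ** 2)) - np.sum(np.log(np.abs(fitted)))


y_obs = np.array([1.0, 2.0, 3.0])
y_fit = np.array([1.5, 2.5, 2.5])
print(ets_loglik(y_obs, y_fit, "A"), ets_loglik(y_obs, y_fit, "M"))
```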

Ref: Hyndman, R.J., Koehler, A.B., Ord, J.K., Snyder, R.D. (2008) *Forecasting with exponential smoothing: the state space approach*, Springer-Verlag: New York. [exponentialsmoothing.net](http://www.exponentialsmoothing.net)

<div role="note"
     style="background: rgba(16,142,233,0.1); border-left: 4px solid #1890ff;
            border-radius: 4px; padding: 10px 14px; margin: 1em 0;">
  <p style="display:flex; align-items:center; font-size:1rem; color:#1890ff;
            margin:0 0 6px 0; font-weight:500;">
    <span style="margin-right:6px; font-size:18px;">ℹ️</span> Note
  </p>

  <p style="margin:0; color:inherit;">
  
  The Python implementation of the ETS algorithm in skforecast follows the state-space framework described in Hyndman et al. (2008) and is based on the Julia package <a href="https://taf-society.github.io/Durbyn.jl/dev/">Durbyn.jl</a> developed by Resul Akay.
  </p>

</div>

## Libraries and data

```python
# Libraries
# ==============================================================================
import matplotlib.pyplot as plt
from skforecast.stats import Ets
from skforecast.recursive import ForecasterStats
from skforecast.model_selection import TimeSeriesFold, backtesting_stats
from skforecast.datasets import fetch_dataset
from skforecast.plot import set_dark_theme
```

```python
# Download data
# ==============================================================================
data = fetch_dataset(name='fuel_consumption', raw=False)
data = data.loc[:'1990-01-01 00:00:00']
y = data['Gasolinas'].rename('y').rename_axis('date')
y
```

## ETS

**Skforecast** provides the class [`Ets`](https://skforecast.org/latest/api/stats#ets) for fitting ETS models and forecasting with them in Python.

```python
# ETS model
# ==============================================================================
model = Ets()
model.fit(y)
```

Once the model is fitted, future observations can be forecast with the `predict` and `predict_interval` methods.

```python
# Prediction
# ==============================================================================
model.predict(steps=10)
```

```python
# Prediction interval
# ==============================================================================
model.predict_interval(steps=10, level=[95])
```

## ForecasterStats

The previous section fitted a standalone ETS model. To use it with the rest of **skforecast**'s functionality, such as backtesting, wrap the skforecast [`Ets`](https://skforecast.org/latest/api/stats#ets) model in a [`ForecasterStats`](https://skforecast.org/latest/api/forecasterstats) object, which makes it compatible with tools such as `backtesting_stats`.

```python
# Create and fit ForecasterStats
# ==============================================================================
forecaster = ForecasterStats(estimator=Ets())
forecaster.fit(y=y)
forecaster
```

```python
# Feature importances
# ==============================================================================
forecaster.get_feature_importances()
```

## Prediction

```python
# Predict
# ==============================================================================
predictions = forecaster.predict(steps=10)
predictions.head(3)
```

```python
# Predict intervals
# ==============================================================================
predictions = forecaster.predict_interval(steps=36, alpha=0.05)
predictions.head(3)
```

## Backtesting

ETS and other statistical models, once integrated in a [`ForecasterStats`](https://skforecast.org/latest/api/forecasterstats) object, can be evaluated using any of the [backtesting strategies](../introduction-forecasting/introduction-forecasting.html#backtesting-forecasting-models) implemented in skforecast.

```python
# Backtesting
# ==============================================================================
cv = TimeSeriesFold(
    initial_train_size = 150,
    steps              = 12,
    refit              = True,
)

metric, predictions = backtesting_stats(
    y               = y,
    forecaster      = forecaster,
    cv              = cv,
    interval        = [2.5, 97.5],
    metric          = 'mean_absolute_error',
    verbose         = False
)
```

```python
# Backtest predictions
# ==============================================================================
predictions.head(4)
```

```python
# Plot predictions
# ==============================================================================
set_dark_theme()
fig, ax = plt.subplots(figsize=(7, 4))
y.loc[predictions.index].plot(ax=ax, label='y')
predictions['pred'].plot(ax=ax, label='predictions')
ax.fill_between(
    predictions.index,
    predictions['lower_bound'],
    predictions['upper_bound'],
    label='prediction interval',
    color='gray',
    alpha=0.6,
    zorder=1
)
ax.legend()
plt.show()
```
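The backtested intervals can also be checked empirically by computing the fraction of observations that fall inside them. A minimal sketch: with `interval = [2.5, 97.5]` the nominal coverage is 95%, and the commented call assumes the `y` and `predictions` objects from the cells above, with the `lower_bound` and `upper_bound` columns used in the plot.

```python
import numpy as np


def interval_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    y_true = np.asarray(y_true, dtype=float)
    inside = (y_true >= np.asarray(lower, dtype=float)) & (
        y_true <= np.asarray(upper, dtype=float)
    )
    return float(inside.mean())


# With the backtesting output above (assumed to be in memory):
# interval_coverage(
#     y.loc[predictions.index],
#     predictions['lower_bound'],
#     predictions['upper_bound'],
# )
print(interval_coverage([1, 2, 3, 4], [0, 0, 0, 5], [2, 2, 2, 6]))
```

An empirical coverage well below 95% would suggest that the intervals are too narrow for this series.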