# M4 Forecasting Competition with `sktime` Catalogues

This notebook demonstrates how to reproduce the classical benchmarks from the M4 forecasting competition using `sktime`’s catalogue and benchmarking framework.

Instead of manually wiring datasets, forecasters, and metrics, `sktime` provides catalogues: reusable, declarative collections of benchmarking components.

For the M4 competition, each temporal granularity (hourly, daily, weekly, monthly, quarterly, yearly) is represented by its own catalogue.
In this notebook, we show the full end-to-end workflow using one granularity. The same pattern applies to all other M4 catalogues.

## Imports

In [12]:
from sktime.benchmarking.forecasting import ForecastingBenchmark
from sktime.catalogues import M4CompetitionCatalogueMonthly

## What is an M4 catalogue?

An M4 catalogue is a self-contained, declarative specification of a benchmark setup.

Each M4 catalogue:

* Binds a single M4 dataset (one temporal granularity)

* Defines a shared set of classical forecasters

* Parameterizes forecasters that take dataset-specific arguments, such as the seasonal period `sp`

* Uses the Overall Weighted Average (OWA) metric, alongside sMAPE and MASE

* Is fully compatible with sktime’s benchmarking framework

## Creating the benchmark

We first create a `ForecastingBenchmark` object. This object collects datasets, forecasters, metrics, and evaluation logic, and coordinates execution.

In [13]:
benchmark = ForecastingBenchmark()

## Adding the M4 monthly catalogue

We first instantiate the monthly M4 catalogue. It is registered with the benchmark later, together with the splitter.

In [14]:
catalogue = M4CompetitionCatalogueMonthly()

A catalogue can be queried for the items it contains. Passing `as_object=True` returns the instantiated objects.

In [15]:
forecasters = catalogue.get(object_type="forecaster", as_object=True)
forecasters

[('Naive_1', NaiveForecaster()),
 ('SES', ExponentialSmoothing()),
 ('Holt', ExponentialSmoothing(trend='add')),
 ('Damped', ExponentialSmoothing(damped_trend=True, trend='add')),
 ('Theta', ThetaForecaster()),
 ('AutoARIMA', AutoARIMA()),
 ('AutoETS', AutoETS()),
 ('Comb',
  EnsembleForecaster(forecasters=[('ses', ExponentialSmoothing()),
                                  ('holt', ExponentialSmoothing(trend='add')),
                                  ('damped',
                                   ExponentialSmoothing(damped_trend=True,
                                                        trend='add'))])),
 ('Naive_S', NaiveForecaster(sp=12))]
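The two naive baselines in the list differ only in the seasonal period `sp`. The following is a minimal plain-Python sketch of the idea, not sktime's `NaiveForecaster` implementation; the function names and the toy series are hypothetical and chosen only for illustration.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, sp):
    """Repeat the last full season of length `sp` across the horizon."""
    last_season = history[-sp:]
    return [last_season[step % sp] for step in range(horizon)]


# Toy monthly series (made up): two years with the same within-year pattern.
history = [10, 12, 15, 14, 13, 18, 20, 19, 16, 14, 12, 11] * 2

print(naive_forecast(history, 3))
print(seasonal_naive_forecast(history, 14, sp=12))
```

With `sp=12`, the seasonal variant reproduces the value from the same month one year earlier, which is why the catalogue has to supply a frequency-specific `sp`.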

In [16]:
dataset_loaders = catalogue.get(object_type="dataset", as_object=True)
dataset_loaders

[ForecastingData(name='m4_monthly_dataset')]

In [17]:
scorers = catalogue.get(object_type="metric", as_object=True)
scorers

[OverallWeightedAverage(sp=12),
 MeanAbsolutePercentageError(symmetric=True),
 MeanAbsoluteScaledError()]
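To make the relationship between the three metrics concrete, here is a single-series sketch in plain Python. It assumes the M4 definition of OWA as the average of sMAPE and MASE, each taken relative to a reference forecast (Naive 2 in M4). The function names and toy numbers are hypothetical, and the sketch does not reproduce sktime's `OverallWeightedAverage` class. In the competition itself, sMAPE and MASE were averaged over all series before the ratios were taken.

```python
def smape(actual, forecast):
    """Symmetric MAPE as a fraction (multiply by 100 for a percentage)."""
    terms = [2 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)]
    return sum(terms) / len(terms)


def mase(actual, forecast, train, sp):
    """Forecast MAE scaled by the in-sample MAE of a seasonal naive forecast."""
    scale_terms = [abs(train[i] - train[i - sp]) for i in range(sp, len(train))]
    scale = sum(scale_terms) / len(scale_terms)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


def owa(actual, forecast, reference, train, sp):
    """Average of sMAPE and MASE, each relative to the reference forecast."""
    rel_smape = smape(actual, forecast) / smape(actual, reference)
    rel_mase = mase(actual, forecast, train, sp) / mase(actual, reference, train, sp)
    return 0.5 * (rel_smape + rel_mase)


# Toy data (made up) with a seasonal period of 2 to keep the example short.
train = [10, 20, 12, 22, 14, 24]
actual = [16, 26]
reference = [14, 24]   # seasonal naive: repeat the last season
forecast = [16, 25]    # a hypothetical competing forecast

print(owa(actual, reference, reference, train, sp=2))  # the reference scores 1.0 against itself
print(owa(actual, forecast, reference, train, sp=2))
```

An OWA below 1 means the forecast beat the reference on the combined measure; a value above 1 means it did worse.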

M4 used a fixed-origin holdout evaluation (a single train/test split per series) with frequency-dependent forecast horizons, so the catalogue intentionally contains no splitting strategy.
For this example, we use `ExpandingWindowSplitter` with a forecast horizon of 18 steps, the horizon M4 used for monthly series. Note that an expanding window evaluates several folds per series, so it is not identical to M4's single holdout split.

In [18]:
from sktime.split import ExpandingWindowSplitter

cv_splitter = ExpandingWindowSplitter(
    initial_window=24,
    step_length=24,
    fh=list(range(1, 18 + 1)),
)
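The difference between a fixed-origin holdout and an expanding window can be sketched with plain index arithmetic. This is an illustration of the two ideas only. The helper names and the series length are hypothetical, and the exact fold boundaries sktime produces may differ, so inspect the real splitter on real data before relying on this layout.

```python
def holdout_split(n_obs, horizon):
    """Single fixed-origin split: hold out the last `horizon` observations."""
    cutoff = n_obs - horizon
    return list(range(cutoff)), list(range(cutoff, n_obs))


def expanding_window_folds(n_obs, initial_window, step_length, horizon):
    """Yield (train, test) index lists; the training window grows each fold."""
    train_end = initial_window
    while train_end + horizon <= n_obs:
        yield list(range(train_end)), list(range(train_end, train_end + horizon))
        train_end += step_length


n_obs = 96  # hypothetical series length: eight years of monthly data

holdout_train, holdout_test = holdout_split(n_obs, horizon=18)
print(f"holdout: {len(holdout_train)} train points, {len(holdout_test)} test points")

folds = list(expanding_window_folds(n_obs, initial_window=24, step_length=24, horizon=18))
for fold_train, fold_test in folds:
    print(f"train 0..{fold_train[-1]}, test {fold_test[0]}..{fold_test[-1]}")
```

Under these assumptions, the holdout yields one evaluation per series, while the expanding window yields one per fold, each trained on a longer history than the previous one.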

`ForecastingBenchmark` evaluates the Cartesian product of all added components.
By adding a single catalogue, we ensure that datasets, forecasters, and metrics remain correctly aligned.

In [19]:
benchmark.add(catalogue)
benchmark.add(cv_splitter)
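As a rough illustration of what the Cartesian product means for this setup, the sketch below enumerates the combinations by name using only the standard library. The names are copied from the outputs above. How `ForecastingBenchmark` enumerates and labels runs internally is not shown in this notebook, so treat the tuple layout as an assumption.

```python
from itertools import product

# Names taken from the catalogue outputs shown earlier in this notebook.
forecaster_names = [
    "Naive_1", "SES", "Holt", "Damped", "Theta",
    "AutoARIMA", "AutoETS", "Comb", "Naive_S",
]
dataset_names = ["m4_monthly_dataset"]
splitter_names = ["ExpandingWindowSplitter"]

# Every forecaster is paired with every dataset and every splitter.
runs = list(product(forecaster_names, dataset_names, splitter_names))

print(len(runs))
for forecaster, dataset, splitter in runs[:3]:
    print(forecaster, dataset, splitter)
```

With one dataset and one splitter, the number of runs equals the number of forecasters; each additional dataset or splitter multiplies it.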

## Running the benchmark

Once the catalogue and the splitter are added, the benchmark can be executed end-to-end.

In [None]:
results = benchmark.run()

In [None]:
results.T

The returned results hold the scores of every evaluated forecaster in a standardized, analysis-ready format. Transposing with `.T` makes them easier to read.

## Extending the benchmark with a new forecaster

Catalogues define baseline benchmark configurations, but you can extend an existing benchmark by adding new forecasters on top of a catalogue-defined setup.

This is useful when you want to compare your method against the classical M4 baselines while keeping datasets, metrics, and evaluation protocol unchanged.

### Adding a custom forecaster

Below, we add a new forecaster to the benchmark after the M4 monthly catalogue has been registered.

In [None]:
from sktime.forecasting.moirai_forecaster import MOIRAIForecaster

moirai_forecaster = MOIRAIForecaster(checkpoint_path="sktime/moirai-1.0-R-small")
benchmark.add(moirai_forecaster)
benchmark.run()

The benchmark will now evaluate:

* All forecasters defined in the M4 monthly catalogue

* Plus the newly added `MOIRAIForecaster`

* On the same dataset, metrics, and evaluation protocol

Just like forecasters, other components (datasets, metrics, CV splitters) can be added to the benchmark object to extend it.

## Using a different M4 frequency

To benchmark a different temporal granularity, simply replace `M4CompetitionCatalogueMonthly` with another M4 catalogue, for example:

* `M4CompetitionCatalogueHourly`

* `M4CompetitionCatalogueDaily`

* `M4CompetitionCatalogueWeekly`

* `M4CompetitionCatalogueQuarterly`

* `M4CompetitionCatalogueYearly`

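The catalogue itself needs no other changes: each one encapsulates the correct dataset, seasonal period, and evaluation metrics for its frequency. Because the splitter is defined outside the catalogue, its forecast horizon should be adjusted to match the new frequency.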
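The splitter in this notebook is configured outside the catalogue, so its horizon is one setting that may need adjusting by hand when switching frequency. The sketch below records the forecast horizons used in the M4 competition and builds the matching `fh` list. The dictionary and helper names are hypothetical and not part of sktime.

```python
# Forecast horizons used in the M4 competition, by frequency.
M4_HORIZONS = {
    "hourly": 48,
    "daily": 14,
    "weekly": 13,
    "monthly": 18,
    "quarterly": 8,
    "yearly": 6,
}


def forecasting_horizon(frequency):
    """Return the list of forecast steps 1..h for an M4 frequency."""
    horizon = M4_HORIZONS[frequency]
    return list(range(1, horizon + 1))


for frequency in M4_HORIZONS:
    fh = forecasting_horizon(frequency)
    print(f"{frequency}: {len(fh)} steps ahead")
```

The resulting list can be passed as the `fh` argument of the splitter in place of `list(range(1, 18 + 1))`.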