# Module - Benchmarking
Ontime provides a `Benchmark` class that can be used to run a number of prediction models on a number of datasets.

In [1]:
from ontime.module.benchmarking import BenchmarkMode, BenchmarkDataset, BenchmarkMetric, Benchmark
import ontime as on

## Initialization
A `Benchmark` instance can be initialized with lists of datasets, models and metrics. Calling `run()` trains (if needed) and tests every model on every dataset, then computes every metric on the predictions.


### Preparing datasets

Datasets submitted to a `Benchmark` must be of type `TimeSeries`, wrapped in a `BenchmarkDataset`. `BenchmarkDataset` lets you name a dataset, provide training and test splits, and define how the data is split into windows for a rolling evaluation.

In [2]:
from ontime.module.datasets.dataset import Dataset
from darts.utils.missing_values import fill_missing_values # for filling missing values in the time series (for models that don't handle missing values)

datasets = [
    BenchmarkDataset(on.TimeSeries.from_darts(fill_missing_values(Dataset.TemperatureDataset.load())), input_length=96, gap=0, stride=96, horizon=24, name="Daily temperature"),
]
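
The windowing parameters passed to `BenchmarkDataset` can be pictured with a small stand-alone sketch. It assumes one common rolling-evaluation layout: `input_length` points are fed to the model, `gap` points are skipped, `horizon` points are predicted, and the window then advances by `stride`. The helper `rolling_windows` is hypothetical and is not part of Ontime; the actual splitting logic in `BenchmarkDataset` may differ.

```python
from typing import Iterator, Tuple


def rolling_windows(
    n_points: int, input_length: int, gap: int, stride: int, horizon: int
) -> Iterator[Tuple[range, range]]:
    """Yield (input_indices, target_indices) pairs for a rolling evaluation.

    Hypothetical helper for illustration only; not part of Ontime.
    """
    window = input_length + gap + horizon  # points needed for one evaluation
    start = 0
    while start + window <= n_points:
        inputs = range(start, start + input_length)
        target_start = start + input_length + gap
        targets = range(target_start, target_start + horizon)
        yield inputs, targets
        start += stride  # move the window forward


# Same parameters as the "Daily temperature" dataset, on a toy series of 300 points
windows = list(rolling_windows(300, input_length=96, gap=0, stride=96, horizon=24))
for inputs, targets in windows:
    print(f"input [{inputs.start}, {inputs.stop}) -> target [{targets.start}, {targets.stop})")
```

With `stride` equal to `input_length`, consecutive input windows do not overlap; a smaller `stride` would produce more, overlapping evaluation windows.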

### Preparing models  

Models are wrapped according to the `AbstractBenchmarkModel` interface. Wrappers implementing this interface instantiate the model for each dataset.  
Ontime provides some wrappers around darts models. For models without a provided wrapper, a custom wrapper can be written by implementing the `AbstractBenchmarkModel` interface.  
A `mode` should be passed to the wrapper constructor to define how the model is evaluated. It can be either:
- `ZERO_SHOT`: the model is not trained, and the evaluation is done on the test set. It is used for models that already have trained weights, available through checkpoints, or for some darts models whose predictions are made directly from the data they are fitted on, given as input.  
  For some univariate models, `multivariate=True` can be set so that they perform multi-univariate predictions and can therefore be compared against true multivariate models.
- `FULL_SHOT`: the model is trained on the entire given training set. Once trained, the model is evaluated using the learnt weights.

In [None]:
from ontime.core.time_series.time_series import TimeSeries
from ontime.module.benchmarking.darts_models import LocalDartsBenchmarkModel, GlobalDartsBenchmarkModel
from typing import Any
from darts.models import AutoARIMA, ExponentialSmoothing, TCNModel

models = [
    LocalDartsBenchmarkModel("AutoARIMA", model=AutoARIMA(start_p=8, max_p=12, start_q=1), mode=BenchmarkMode.ZERO_SHOT),
    LocalDartsBenchmarkModel("ExponentialSmoothingUnivariate", model=ExponentialSmoothing(), mode=BenchmarkMode.ZERO_SHOT),
    LocalDartsBenchmarkModel("ExponentialSmoothingMultivariate", model=ExponentialSmoothing(), mode=BenchmarkMode.ZERO_SHOT, multivariate=True),
    GlobalDartsBenchmarkModel("Temporal Convolutional Network", model=TCNModel(input_chunk_length=24, output_chunk_length=10), mode=BenchmarkMode.FULL_SHOT)
]
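
The multi-univariate idea behind `multivariate=True` can be pictured as fitting one independent univariate model per component and collecting the forecasts. The sketch below illustrates this with a deliberately simple stand-in forecaster; `NaiveLastValue` and `multi_univariate_predict` are hypothetical names, and Ontime's wrappers may implement this differently.

```python
from typing import Dict, List


class NaiveLastValue:
    """Toy univariate model that repeats the last observed value (stand-in, not a darts model)."""

    def fit(self, series: List[float]) -> "NaiveLastValue":
        self.last_ = series[-1]
        return self

    def predict(self, horizon: int) -> List[float]:
        return [self.last_] * horizon


def multi_univariate_predict(
    components: Dict[str, List[float]], horizon: int
) -> Dict[str, List[float]]:
    """Fit a fresh univariate model on each component and collect the forecasts."""
    return {
        name: NaiveLastValue().fit(values).predict(horizon)
        for name, values in components.items()
    }


# Two components of a toy multivariate series
series = {"HUFL": [5.8, 5.7, 5.2], "OT": [30.5, 27.8, 27.7]}
forecast = multi_univariate_predict(series, horizon=2)
print(forecast)
```

Because each component is forecast on its own, dependencies between components are not modelled, which is the main difference from a true multivariate model.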

### Preparing metrics
Each metric function must be wrapped in a `BenchmarkMetric`. If the function cannot be called as is by `BenchmarkMetric`'s implementation, a child class can be written and submitted instead.

In [4]:
import darts

metrics = [
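   # NOTE: this entry is labelled "RMSE" but computes darts' coefficient of variation (RMSE relative to the mean, in %)
   BenchmarkMetric(name="RMSE", metric_function=darts.metrics.metrics.coefficient_of_variation),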
   BenchmarkMetric(name="MAE", metric_function=darts.metrics.metrics.mae),
   BenchmarkMetric(name="sMAPE", metric_function=darts.metrics.metrics.smape),
]
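
For reference, the quantities behind these metric functions can be written in a few lines of plain Python. This is a sketch using the usual definitions (the coefficient of variation as 100 × RMSE / mean of the actual values, and sMAPE on a 0 to 200 scale); the function names are local to this example and the code is not the darts implementation.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mae(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def smape(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, between 0 and 200."""
    terms = (abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, pred))
    return 200.0 * sum(terms) / len(actual)


def coefficient_of_variation(actual: Sequence[float], pred: Sequence[float]) -> float:
    """RMSE expressed as a percentage of the mean of the actual values."""
    return 100.0 * rmse(actual, pred) / (sum(actual) / len(actual))


actual = [10.0, 12.0, 14.0]
pred = [11.0, 12.0, 12.0]
for fn in (rmse, mae, smape, coefficient_of_variation):
    print(f"{fn.__name__}: {fn(actual, pred):.4f}")
```

Unlike RMSE and MAE, the coefficient of variation and sMAPE are scale-free percentages, which makes them easier to compare across datasets with different units.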

## Creating and running a Benchmark

In [5]:
benchmark = Benchmark(datasets=datasets,
                      models=models, 
                      metrics=metrics)

Datasets, models and metrics can also be added after instantiation. Here, a second named dataset is added with `add_dataset()`.

In [7]:
benchmark.add_dataset(BenchmarkDataset(Dataset.ETTh1Dataset.load(), input_length=500, gap=0, stride=96, horizon=96, name="ETTh1"))

Once the models and datasets have been added, the `run()` method trains an instance of every model on every dataset individually and computes the metrics. The `verbose` parameter prints the status and results of the process as it progresses, and the `debug` parameter prints error messages (warnings are always printed).

In [None]:
benchmark.run(verbose=False, debug=False)
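
Conceptually, a run iterates over every (model, dataset) pair, trains a fresh model instance, predicts, and scores the predictions with every metric. The sketch below mimics that control flow with plain Python stand-ins; `run_benchmark`, `MeanModel`, and the layout of the result dictionary are hypothetical and do not reflect Ontime's internals.

```python
import time
from typing import Callable, Dict, List, Tuple

Series = List[float]


class MeanModel:
    """Toy model that predicts the mean of its training values (stand-in)."""

    def fit(self, train: Series) -> None:
        self.mean_ = sum(train) / len(train)

    def predict(self, horizon: int) -> Series:
        return [self.mean_] * horizon


def run_benchmark(
    models: Dict[str, Callable[[], MeanModel]],
    datasets: Dict[str, Tuple[Series, Series]],
    metrics: Dict[str, Callable[[Series, Series], float]],
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Train and score every model on every dataset (hypothetical helper)."""
    results = {}
    for model_name, make_model in models.items():
        for dataset_name, (train, test) in datasets.items():
            model = make_model()  # fresh instance for each dataset
            start = time.perf_counter()
            model.fit(train)
            training_time = time.perf_counter() - start
            pred = model.predict(len(test))
            scores = {name: fn(test, pred) for name, fn in metrics.items()}
            results[(model_name, dataset_name)] = {"training time": training_time, **scores}
    return results


toy_datasets = {"toy": ([1.0, 2.0, 3.0], [2.0, 4.0])}
toy_metrics = {"MAE": lambda actual, pred: sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)}
results = run_benchmark({"mean": MeanModel}, toy_datasets, toy_metrics)
print(results[("mean", "toy")]["MAE"])
```

Creating a new model instance inside the inner loop keeps results for one dataset from leaking into the next.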

## Visualizing results

The benchmark automatically stores measures and metrics computed during the run, available through class attributes.

### Measures and metrics
To view the results, call `get_report()` and print the returned value.

In [15]:
print(benchmark.get_report())

Model Temporal Convolutional Network:
Supported univariate datasets: ✓
Supported multivariate datasets: ✓
Dataset Daily temperature:
nb features: 1
target column: ['Daily minimum temperatures']
training set size: 2335
validation set size: 585
training time: 19.18831992149353
test set size: 732
testing time: 2.322927713394165
metrics: {'RMSE': 28.238215577600233, 'MAE': 2.175213281180056, 'sMAPE': 22.00669241283975}
Dataset ETTh1:
nb features: 7
target column: ['HUFL', 'HULL', 'MUFL', 'MULL', 'LUFL', 'LULL', 'OT']
training set size: 11147
validation set size: 2788
training time: 81.35324382781982
test set size: 3485
testing time: 11.666565179824829
metrics: {'RMSE': 354.4952793598406, 'MAE': 2.8482374344394916, 'sMAPE': 79.5762334672784}

You can also get the results as dataframes by calling `get_report_df()`. It returns two dataframes: the first lists which kinds of datasets each model supports, and the second holds the measures, with model names as columns, dataset names as main rows, and measures as sub-rows.

In [16]:
df_1, df_2 = benchmark.get_report_df()
df_1

| Statistic | Temporal Convolutional Network |
|---|---|
| supports univariate | ✓ |
| supports multivariate | ✓ |


In [17]:
df_2

| Dataset | Metric | Temporal Convolutional Network |
|---|---|---|
| Daily temperature | training time | 19.18832 |
| Daily temperature | testing time | 2.322928 |
| Daily temperature | RMSE | 28.238216 |
| Daily temperature | MAE | 2.175213 |
| Daily temperature | sMAPE | 22.006692 |
| ETTh1 | training time | 81.353244 |
| ETTh1 | testing time | 11.666565 |
| ETTh1 | RMSE | 354.495279 |
| ETTh1 | MAE | 2.848237 |
| ETTh1 | sMAPE | 79.576233 |


### Plotting

By default, the benchmark generates a prediction for one random input sample of each dataset with each model; this is governed by the `nb_predictions` argument of `benchmark.run()`. The predictions, along with the input and target series, are stored in a dictionary and can be retrieved by calling `benchmark.get_predictions()`. The predictions can be plotted using the Ontime plotting module.

In [18]:
predictions = benchmark.get_predictions()

In [19]:
# currently, Ontime plotting module needs the time index to be named 'time'
def rename_index(ts, name='time'):
    df = ts.pd_dataframe()
    df.rename_axis(name, inplace=True)
    return TimeSeries.from_dataframe(df)

In [21]:
input = rename_index(predictions['inputs']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'input'})
target = rename_index(predictions['targets']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'target'})
prediction = rename_index(predictions['predictions']['Temporal Convolutional Network']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'prediction'})

In [22]:
(on.Plot()
    .add(on.marks.line, input)
    .add(on.marks.line, target)
    .add(on.marks.line, prediction, type='dashed')
    .properties(width=600, height=200)
    .show()
)

In [None]:
prediction = rename_index(predictions['predictions']['ExponentialSmoothingUnivariate']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'prediction'})

In [None]:
(on.Plot()
    .add(on.marks.line, input)
    .add(on.marks.line, target)
    .add(on.marks.line, prediction, type='dashed')
    .properties(width=600, height=200)
    .show()
)

In [None]:
input = rename_index(predictions['inputs']['ETTh1'][0][500:].univariate_component(0)).rename({'HUFL': 'input'})
target = rename_index(predictions['targets']['ETTh1'][0].univariate_component(0)).rename({'HUFL': 'target'})
prediction = rename_index(predictions['predictions']['ExponentialSmoothingMultivariate']['ETTh1'][0].univariate_component(0)).rename({'HUFL': 'prediction'})

In [None]:
(on.Plot()
    .add(on.marks.line, input)
    .add(on.marks.line, target)
    .add(on.marks.line, prediction, type='dashed')
    .properties(width=600, height=200)
    .show()
)