# Module - Benchmarking
ontime provides a `Benchmark` class that runs several prediction models on several datasets and scores them with a common set of metrics.

In [1]:
from ontime.module.benchmarking import (
    Benchmark,
    BenchmarkDataset,
    BenchmarkMetric,
    BenchmarkMode,
    BenchmarkModelConfig,
)
import ontime as on

## Initialization
A `Benchmark` instance can be initialized with a list of datasets, models and metrics to run through. When `run()` is invoked, it trains (if needed) and tests every model on every dataset, then computes every metric on the predicted data.
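
The control flow described above can be pictured with the sketch below. It is an illustration only, not ontime's implementation: `run_benchmark`, `MeanModel`, the dictionary layout and the `needs_training` flag (which stands in for the benchmark mode) are all hypothetical.

```python
from statistics import mean


class MeanModel:
    """Hypothetical toy model: predicts the mean of its training values."""

    def __init__(self):
        self.value = 0.0  # prediction used if the model is never trained

    def fit(self, train):
        self.value = mean(train)

    def predict(self, horizon):
        return [self.value] * horizon


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def run_benchmark(datasets, models, metrics):
    """Evaluate every model on every dataset and compute every metric."""
    results = {}
    for ds_name, (train, test) in datasets.items():
        for model_name, (factory, needs_training) in models.items():
            model = factory()  # fresh model instance for each dataset
            if needs_training:
                model.fit(train)
            predicted = model.predict(len(test))
            results[(ds_name, model_name)] = {
                name: fn(test, predicted) for name, fn in metrics.items()
            }
    return results


datasets = {"toy": ([1.0, 2.0, 3.0], [4.0, 4.0])}
models = {
    "trained mean": (MeanModel, True),
    "untrained mean": (MeanModel, False),
}
results = run_benchmark(datasets, models, {"MAE": mae})
print(results)
```

Each (dataset, model) pair gets its own entry, so a model that fails or scores poorly on one dataset does not affect the others.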


### Preparing datasets

Datasets submitted to a `Benchmark` must be of type `TimeSeries`, wrapped in a `BenchmarkDataset`. A `BenchmarkDataset` gives the dataset a name, defines its training and test splits, and defines how the data is split to perform a rolling evaluation.

In [2]:
from ontime.module.datasets.dataset import Dataset
# fill_missing_values fills gaps in the series, for models that don't handle missing values
from darts.utils.missing_values import fill_missing_values

temperature = on.TimeSeries.from_darts(fill_missing_values(Dataset.TemperatureDataset.load()))

datasets = [
    BenchmarkDataset(temperature, input_length=96, target_length=24, gap=0, stride=96,
                     name="Daily temperature"),
]
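
To make the rolling-evaluation parameters more concrete, here is a minimal sketch of one plausible reading of them: each window uses `input_length` points as context, skips `gap` points, predicts the next `target_length` points, and consecutive windows start `stride` points apart. The helper `rolling_windows` is hypothetical and these semantics are an assumption; ontime's actual splitting may differ.

```python
def rolling_windows(n_points, input_length, target_length, gap=0, stride=1):
    """Return (input_range, target_range) index pairs for a rolling evaluation.

    Assumed semantics, not taken from ontime's source: windows that would
    run past the end of the series are dropped.
    """
    window_size = input_length + gap + target_length
    windows = []
    start = 0
    while start + window_size <= n_points:
        input_end = start + input_length
        target_start = input_end + gap
        windows.append(
            (range(start, input_end), range(target_start, target_start + target_length))
        )
        start += stride
    return windows


# Same settings as the "Daily temperature" dataset, applied to a 732-point series
windows = rolling_windows(732, input_length=96, target_length=24, gap=0, stride=96)
print(len(windows))  # 7
print(windows[0])    # (range(0, 96), range(96, 120))
```

With `stride` equal to `input_length`, as here, the input windows do not overlap; a smaller stride would yield more, overlapping windows.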

### Preparing models  

Benchmark models are described by `BenchmarkModelConfig` objects, and their class must implement ontime's `AbstractModel` interface. A `BenchmarkModelConfig` holds the attributes needed to instantiate the model later with the desired configuration.
It requires the following parameters:
- `model_name`: the name of the model, used when logging results,
- `model_class`: the class of the model, which implements the `AbstractModel` interface,
- `benchmark_mode`: the mode in which the model must be evaluated:
    - if `ZERO_SHOT`, the model is not trained, and the evaluation is done on the test set. This mode is used for models that already have trained weights, available through checkpoints, and for some darts models whose predictions are made directly from the data given as input (such as ARIMA),
    - if `FULL_SHOT`, the model is trained on the entire given training set. Once trained, the model is evaluated using the learnt weights.
- `static_model_params`: the static parameters passed to the model class when instantiating it. These parameters can be defined when instantiating the `BenchmarkModelConfig` object.
- `dynamic_model_params`: the dynamic parameters passed to the model class when instantiating it. These parameters are only known once the dataset on which the model is trained is known. Each one must therefore be given as a callable that takes a `BenchmarkDataset`.

In [3]:
from ontime.core.time_series.time_series import TimeSeries
from ontime import Model
from darts.models import ExponentialSmoothing, TCNModel

# torch related parameters
# "gpu" requires a GPU supported by Lightning; use "cpu" or "auto" otherwise
pl_trainer_kwargs = {
    "accelerator": "gpu",
    "enable_progress_bar": False,
}

# dynamic parameters callback
input_length_param = lambda ds: ds.input_length
target_length_param = lambda ds: ds.target_length


model_configs = [
    BenchmarkModelConfig("ExponentialSmoothing", model_class=Model, benchmark_mode=BenchmarkMode.ZERO_SHOT, static_model_params={"model" : ExponentialSmoothing()}),
    BenchmarkModelConfig("Temporal Convolutional Network", model_class=Model, benchmark_mode=BenchmarkMode.FULL_SHOT, 
                         static_model_params={"model":TCNModel, "n_epochs":2, "pl_trainer_kwargs":pl_trainer_kwargs}, 
                         dynamic_model_params={"input_chunk_length":input_length_param, "output_chunk_length":target_length_param})
]
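
The split between static and dynamic parameters can be illustrated with the following sketch. It only shows the general idea of evaluating each callback on a dataset and merging the results with the static parameters; `ToyDataset` and `resolve_params` are hypothetical names, and ontime's internal logic may differ.

```python
from dataclasses import dataclass


@dataclass
class ToyDataset:
    """Hypothetical stand-in exposing the two attributes used by the callbacks."""

    input_length: int
    target_length: int


def resolve_params(static_params, dynamic_params, dataset):
    """Merge static parameters with dynamic ones evaluated on `dataset`."""
    params = dict(static_params)  # copy, so the config itself is not mutated
    for key, callback in dynamic_params.items():
        params[key] = callback(dataset)
    return params


static_params = {"n_epochs": 2}
dynamic_params = {
    "input_chunk_length": lambda ds: ds.input_length,
    "output_chunk_length": lambda ds: ds.target_length,
}

params = resolve_params(static_params, dynamic_params, ToyDataset(96, 24))
print(params)
# {'n_epochs': 2, 'input_chunk_length': 96, 'output_chunk_length': 24}
```

Because the callbacks are evaluated per dataset, the same configuration can produce differently sized models for datasets with different input and target lengths.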

### Preparing metrics
Each metric is wrapped in a `BenchmarkMetric`, built from a name and a metric function. If `BenchmarkMetric`'s implementation cannot invoke the function as is, a child class can be written and an instance of it submitted instead.

In [4]:
from darts.metrics import mae, mase, smape

metrics = [
    BenchmarkMetric(name="MAE", metric_function=mae),
    BenchmarkMetric(name="sMAPE", metric_function=smape),
    BenchmarkMetric(name="MASE", metric_function=mase),
]
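
For readers unfamiliar with the three metrics, the sketch below computes them on plain Python lists using their usual textbook definitions. It is not darts' code, which works on `TimeSeries` objects and may handle edge cases differently. Note that MASE takes a third argument, the in-sample (training) series, in addition to the actual and predicted values.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE in percent: 200 * mean(|a - p| / (|a| + |p|)).

    Undefined when an actual value and its prediction are both zero.
    """
    terms = [abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)]
    return 200.0 * sum(terms) / len(terms)


def mase(actual, predicted, insample, m=1):
    """MAE scaled by the in-sample MAE of a naive forecast with lag `m`."""
    naive_errors = [abs(insample[i] - insample[i - m]) for i in range(m, len(insample))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, predicted) / scale


actual = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 2.0]
insample = [1.0, 2.0, 4.0, 7.0]

print(mae(actual, predicted))
print(smape(actual, predicted))
print(mase(actual, predicted, insample))
```

A MASE below 1 means the forecast beats the naive "repeat the value from `m` steps ago" baseline measured on the training data; a value above 1 means it does worse.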

## Creating and running a Benchmark

In [5]:
benchmark = Benchmark(datasets=datasets,
                      model_configs=model_configs, 
                      metrics=metrics)

Datasets, models and metrics can also be added after instantiation. A dataset added this way is still wrapped in a `BenchmarkDataset`, which carries its name.

In [6]:
benchmark.add_dataset(BenchmarkDataset(Dataset.ETTh1Dataset.load(), input_length=336, target_length=72,
                                       gap=0, stride=72, name="ETTh1"))

Once the models and datasets have been added, the `run()` method will train instances of all the models on all the datasets individually and compute metrics. The logging level can be chosen to show less or more information about the benchmark execution in the console.

In [7]:
benchmark.run(logging_level="debug")

                                                                                                                                                                                                        

on 0: 11:44:37 - [INFO] - BenchmarkLogger - On Daily temperature dataset...


                                                                                                                                                                                                        

on 1: 11:44:37 - [INFO] - BenchmarkLogger - ExponentialSmoothing model...


                                                                                                                                                                                                        

on 1: 11:44:37 - [INFO] - BenchmarkLogger - Evaluating...


on 1: False                                                                                                                                                                                             
on 1: MAE                                                                                                                                                                                               
on 1: sMAPE                                                                                                                                                                                             
on 1: MASE                                                                                                                                                                                              
                                                                                                                                                                                                    

on 1: 11:44:38 - [INFO] - BenchmarkLogger - Evaluation done, took 1.0009534358978271


                                                                                                                                                                                                        

on 1: 11:44:38 - [INFO] - BenchmarkLogger - getting predictions...


                                                                                                                                                                                                        

on 1: 11:44:38 - [INFO] - BenchmarkLogger - Computed metrics: 
       {'MAE': 2.400040758215776, 'sMAPE': 24.148347520589876, 'MASE': 1.259049717016714}


                                                                                                                                                                                                        

on 2: 11:44:38 - [INFO] - BenchmarkLogger - Temporal Convolutional Network model...


                                                                                                                                                                                                        

on 2: root       INFO  Training ...


on 2: True                                                                                                                                                                                              
                                                                                                                                                                                                        

on 2: darts.models.forecasting.torch_forecasting_model INFO  Train dataset contains 2216 samples.


                                                                                                                                                                                                        

on 2: darts.models.forecasting.tcn_model INFO  Number of layers chosen: 5


                                                                                                                                                                                                        

on 2: darts.models.forecasting.torch_forecasting_model INFO  Time series values are 64-bits; casting model to float64.


                                                                                                                                                                                                        

on 2: INFO: GPU available: True (cuda), used: True


                                                                                                                                                                                                        

on 2: lightning.pytorch.utilities.rank_zero INFO  GPU available: True (cuda), used: True


                                                                                                                                                                                                        

on 2: INFO: TPU available: False, using: 0 TPU cores


                                                                                                                                                                                                        

on 2: lightning.pytorch.utilities.rank_zero INFO  TPU available: False, using: 0 TPU cores


                                                                                                                                                                                                        

on 2: INFO: HPU available: False, using: 0 HPUs


                                                                                                                                                                                                        

on 2: lightning.pytorch.utilities.rank_zero INFO  HPU available: False, using: 0 HPUs


                                                                                                                                                                                                        

on 2: INFO: You are using a CUDA device ('NVIDIA GeForce RTX 3070 Ti Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision


                                                                                                                                                                                                        

on 2: lightning.pytorch.utilities.rank_zero INFO  You are using a CUDA device ('NVIDIA GeForce RTX 3070 Ti Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision


                                                                                                                                                                                                        

on 2: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


                                                                                                                                                                                                        

on 2: 
        | Name            | Type             | Params | Mode 
      -------------------------------------------------------------
      0 | criterion       | MSELoss          | 0      | train
      1 | train_criterion | MSELoss          | 0      | train
      2 | val_criterion   | MSELoss          | 0      | train
      3 | train_metrics   | MetricCollection | 0      | train
      4 | val_metrics     | MetricCollection | 0      | train
      5 | res_blocks      | ModuleList       | 272    | train
      -------------------------------------------------------------
      272       Trainable params
      0         Non-trainable params
      272       Total params
      0.001     Total estimated model params size (MB)
      33        Modules in train mode
      0         Modules in eval mode


                                                                                                                                                                                                        

on 2: INFO: `Trainer.fit` stopped: `max_epochs=2` reached.


                                                                                                                                                                                                        

on 2: lightning.pytorch.utilities.rank_zero INFO  `Trainer.fit` stopped: `max_epochs=2` reached.


                                                                                                                                                                                                        

on 2: 11:44:46 - [INFO] - BenchmarkLogger - Training done, it took 7.525358438491821 seconds


                                                                                                                                                                                                        

on 2: 11:44:46 - [INFO] - BenchmarkLogger - Evaluating...


                                                                                                                                                                                                        

on 2: INFO: GPU available: True (cuda), used: True


                                                                                                                                                                                                        

on 2: lightning.pytorch.utilities.rank_zero INFO  GPU available: True (cuda), used: True


                                                                                                                                                                                                        

on 2: INFO: TPU available: False, using: 0 TPU cores


                                                                                                                                                                                                        

on 2: lightning.pytorch.utilities.rank_zero INFO  TPU available: False, using: 0 TPU cores


                                                                                                                                                                                                        

on 2: INFO: HPU available: False, using: 0 HPUs


                                                                                                                                                                                                        

on 2: lightning.pytorch.utilities.rank_zero INFO  HPU available: False, using: 0 HPUs


                                                                                                                                                                                                        

on 2: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


on 2: MAE                                                                                                                                                                                               
on 2: sMAPE                                                                                                                                                                                             
on 2: MASE                                                                                                                                                                                              
                                                                                                                                                                                                        

on 2: 11:44:46 - [INFO] - BenchmarkLogger - Evaluation done, took 0.5738580226898193


                                                                                                                                                                                                        

on 2: 11:44:46 - [INFO] - BenchmarkLogger - getting predictions...


                                                                                                                                                                                                        

on 2: INFO: GPU available: True (cuda), used: True


                                                                                                                                                                                                        

on 2: lightning.pytorch.utilities.rank_zero INFO  GPU available: True (cuda), used: True


                                                                                                                                                                                                        

on 2: INFO: TPU available: False, using: 0 TPU cores


                                                                                                                                                                                                        

on 2: lightning.pytorch.utilities.rank_zero INFO  TPU available: False, using: 0 TPU cores


                                                                                                                                                                                                        

on 2: INFO: HPU available: False, using: 0 HPUs


                                                                                                                                                                                                        

on 2: lightning.pytorch.utilities.rank_zero INFO  HPU available: False, using: 0 HPUs


                                                                                                                                                                                                        

on 2: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


                                                                                                                                                                                                        

on 2: 11:44:47 - [INFO] - BenchmarkLogger - Computed metrics: 
       {'MAE': 2.348732890574183, 'sMAPE': 23.846048763201882, 'MASE': 1.200195414642612}


                                                                                                                                                                                                        

on 2: 11:44:47 - [INFO] - BenchmarkLogger - On ETTh1 dataset...


                                                                                                                                                                                                        

on 3: 11:44:47 - [INFO] - BenchmarkLogger - ExponentialSmoothing model...


                                                                                                                                                                                                        

on 3: 11:44:47 - [INFO] - BenchmarkLogger - Evaluating...


on 3: False                                                                                                                                                                                             
on 3: MAE                                                                                                                                                                                               
on 3: sMAPE                                                                                                                                                                                             
on 3: MASE                                                                                                                                                                                              
                                                                                                                                                                                                    

on 3: 11:45:48 - [INFO] - BenchmarkLogger - Evaluation done, took 61.609753370285034


                                                                                                                                                                                                        

on 3: 11:45:48 - [INFO] - BenchmarkLogger - getting predictions...


                                                                                                                                                                                                        

on 3: 11:45:50 - [INFO] - BenchmarkLogger - Computed metrics: 
       {'MAE': 3.518272782469046, 'sMAPE': 62.05659914168031, 'MASE': 2.9669407195966673}


                                                                                                                                                                                                        

on 4: 11:45:50 - [INFO] - BenchmarkLogger - Temporal Convolutional Network model...


                                                                                                                                                                                                        

on 4: root       INFO  Training ...


on 4: True                                                                                                                                                                                              
                                                                                                                                                                                                        

on 4: darts.models.forecasting.torch_forecasting_model INFO  Train dataset contains 10740 samples.


                                                                                                                                                                                                        

on 4: darts.models.forecasting.tcn_model INFO  Number of layers chosen: 7


                                                                                                                                                                                                        

on 4: darts.models.forecasting.torch_forecasting_model INFO  Time series values are 64-bits; casting model to float64.


                                                                                                                                                                                                        

on 4: INFO: GPU available: True (cuda), used: True


                                                                                                                                                                                                        

on 4: lightning.pytorch.utilities.rank_zero INFO  GPU available: True (cuda), used: True


                                                                                                                                                                                                        

on 4: INFO: TPU available: False, using: 0 TPU cores


                                                                                                                                                                                                        

on 4: lightning.pytorch.utilities.rank_zero INFO  TPU available: False, using: 0 TPU cores


                                                                                                                                                                                                        

on 4: INFO: HPU available: False, using: 0 HPUs


                                                                                                                                                                                                        

on 4: lightning.pytorch.utilities.rank_zero INFO  HPU available: False, using: 0 HPUs


                                                                                                                                                                                                        

on 4: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


                                                                                                                                                                                                        

on 4: 
        | Name            | Type             | Params | Mode 
      -------------------------------------------------------------
      0 | criterion       | MSELoss          | 0      | train
      1 | train_criterion | MSELoss          | 0      | train
      2 | val_criterion   | MSELoss          | 0      | train
      3 | train_metrics   | MetricCollection | 0      | train
      4 | val_metrics     | MetricCollection | 0      | train
      5 | res_blocks      | ModuleList       | 548    | train
      -------------------------------------------------------------
      548       Trainable params
      0         Non-trainable params
      548       Total params
      0.002     Total estimated model params size (MB)
      43        Modules in train mode
      0         Modules in eval mode


                                                                                                                                                                                                        

on 4: INFO: `Trainer.fit` stopped: `max_epochs=2` reached.


                                                                                                                                                                                                        

on 4: lightning.pytorch.utilities.rank_zero INFO  `Trainer.fit` stopped: `max_epochs=2` reached.


                                                                                                                                                                                                        

on 4: 11:46:13 - [INFO] - BenchmarkLogger - Training done, it took 23.201418161392212 seconds


                                                                                                                                                                                                        

on 4: 11:46:13 - [INFO] - BenchmarkLogger - Evaluating...


                                                                                                                                                                                                        

on 4: INFO: GPU available: True (cuda), used: True


                                                                                                                                                                                                        

on 4: lightning.pytorch.utilities.rank_zero INFO  GPU available: True (cuda), used: True


                                                                                                                                                                                                        

on 4: INFO: TPU available: False, using: 0 TPU cores


                                                                                                                                                                                                        

on 4: lightning.pytorch.utilities.rank_zero INFO  TPU available: False, using: 0 TPU cores


                                                                                                                                                                                                        

on 4: INFO: HPU available: False, using: 0 HPUs


                                                                                                                                                                                                        

on 4: lightning.pytorch.utilities.rank_zero INFO  HPU available: False, using: 0 HPUs


                                                                                                                                                                                                        

on 4: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


on 4: MAE                                                                                                                                                                                               
on 4: sMAPE                                                                                                                                                                                             
on 4: MASE

on 4: 11:46:14 - [INFO] - BenchmarkLogger - Evaluation done, took 1.4504737854003906
on 4: 11:46:14 - [INFO] - BenchmarkLogger - getting predictions...
on 4: INFO: GPU available: True (cuda), used: True
on 4: INFO: TPU available: False, using: 0 TPU cores
on 4: INFO: HPU available: False, using: 0 HPUs
on 4: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
on 4: 11:46:15 - [INFO] - BenchmarkLogger - Computed metrics:
       {'MAE': 2.867906589673361, 'sMAPE': 84.92323548841348, 'MASE': 4.189970881462188}

Benchmarking |████████████████████| 4/4 [100%] in 1:38.5 (0.04/s)


## Visualizing results

The benchmark automatically stores measures and metrics computed during the run, available through class attributes.

### Measures and metrics
To view the results, call `get_report()` and print the returned value.

In [8]:
print(benchmark.get_report())

Daily temperature dataset:
nb features: 1
target column: ['Daily minimum temperatures']
training set size: 2335
validation set size: 585
test set size: 732

ExponentialSmoothing model:
suceeded: ✓
training time: 0
evaluation time: 1.0009534358978271
inference time: 0.12173032760620117
metrics: {'MAE': 2.400040758215776, 'sMAPE': 24.148347520589876, 'MASE': 1.259049717016714}

Temporal Convolutional Network model:
suceeded: ✓
training time: 7.525358438491821
evaluation time: 0.5738580226898193
inference time: 0.4079856872558594
metrics: {'MAE': 2.348732890574183, 'sMAPE': 23.846048763201882, 'MASE': 1.200195414642612}


ETTh1 dataset:
nb features: 7
target column: ['HUFL', 'HULL', 'MUFL', 'MULL', 'LUFL', 'LULL', 'OT']
training set size: 11147
validation set size: 2788
test set size: 3485

ExponentialSmoothing model:
suceeded: ✓
training time: 0
evaluation time: 61.609753370285034
inference time: 1.4529154300689697
metrics: {'MAE': 3.518272782469046, 'sMAPE': 62.05659914168031, 'MASE': 2

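You can also get the results as dataframes by calling `get_report_dfs()`, which returns two dataframes. The first describes the datasets, with dataset names as columns and characteristics as rows. The second holds the results, with model names as columns, dataset names as main rows, and measures and metrics as sub-rows.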

In [10]:
df_1, df_2 = benchmark.get_report_dfs()
df_1

| Characteristic | Daily temperature | ETTh1 |
|---|---|---|
| nb features | 1 | 7 |
| target column | [Daily minimum temperatures] | [HUFL, HULL, MUFL, MULL, LUFL, LULL, OT] |
| training set size | 2335 | 11147 |
| validation set size | 585 | 2788 |
| test set size | 732 | 3485 |


In [11]:
df_2

| Dataset | Metric | ExponentialSmoothing | Temporal Convolutional Network |
|---|---|---|---|
| Daily temperature | training time | 0.0 | 7.525358 |
| Daily temperature | evaluation time | 1.000953 | 0.573858 |
| Daily temperature | inference time | 0.12173 | 0.407986 |
| Daily temperature | MAE | 2.400041 | 2.348733 |
| Daily temperature | sMAPE | 24.148348 | 23.846049 |
| Daily temperature | MASE | 1.25905 | 1.200195 |
| ETTh1 | training time | 0.0 | 23.201418 |
| ETTh1 | evaluation time | 61.609753 | 1.450474 |
| ETTh1 | inference time | 1.452915 | 0.425124 |
| ETTh1 | MAE | 3.518273 | 2.867907 |
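
Because the results dataframe is indexed by dataset and metric, standard pandas operations are enough to compare models. The sketch below is a minimal, self-contained illustration: it rebuilds a small frame by hand from a few of the values printed above instead of calling the benchmark, and it assumes a `(Dataset, Metric)` `MultiIndex` with model names as columns, as the printed output suggests. The helper `best_model_per_metric` is hypothetical and not part of Ontime.

```python
import pandas as pd

# Hand-built stand-in for the results dataframe (values copied from the output above).
# Assumption: rows are indexed by (Dataset, Metric) and columns are model names.
index = pd.MultiIndex.from_tuples(
    [
        ("Daily temperature", "MAE"),
        ("Daily temperature", "sMAPE"),
        ("Daily temperature", "MASE"),
        ("ETTh1", "MAE"),
    ],
    names=["Dataset", "Metric"],
)
metrics_df = pd.DataFrame(
    {
        "ExponentialSmoothing": [2.400041, 24.148348, 1.25905, 3.518273],
        "Temporal Convolutional Network": [2.348733, 23.846049, 1.200195, 2.867907],
    },
    index=index,
)


def best_model_per_metric(df: pd.DataFrame) -> pd.Series:
    """Return, for each (Dataset, Metric) row, the model with the lowest value.

    Only meaningful for error metrics and timings, where lower is better.
    """
    return df.idxmin(axis=1)


best = best_model_per_metric(metrics_df)
print(best)

# Select a single metric across datasets with a cross-section on the 'Metric' level.
mae_only = metrics_df.xs("MAE", level="Metric")
print(mae_only)
```

The same pattern applies to the timing rows, for example `xs("training time", level="Metric")`, provided the real dataframe uses the level names assumed here.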


### Plotting

By default (see the `nb_predictions` argument of `benchmark.run()`), the benchmark generates a prediction for one random input sample of each dataset with each model. The predictions, along with the corresponding input and target series, are stored in a dictionary that can be retrieved by calling `benchmark.get_predictions()`. They can then be plotted using the Ontime plotting module.

In [12]:
predictions = benchmark.get_predictions()
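
The indexing used in the following cells suggests that the returned dictionary is nested as `predictions['inputs'][dataset][i]`, `predictions['targets'][dataset][i]` and `predictions['predictions'][dataset][model][i]`. Under that assumption, the sketch below walks such a structure with plain Python. Strings stand in for the time series so that it runs without Ontime, and `iter_prediction_samples` is a hypothetical helper, not part of the library.

```python
from typing import Any, Dict, Iterator, Tuple

# Stand-in with the assumed layout; real entries would be time series, not strings.
fake_predictions: Dict[str, Any] = {
    "inputs": {"Daily temperature": ["input_0"]},
    "targets": {"Daily temperature": ["target_0"]},
    "predictions": {
        "Daily temperature": {
            "ExponentialSmoothing": ["es_pred_0"],
            "Temporal Convolutional Network": ["tcn_pred_0"],
        }
    },
}


def iter_prediction_samples(
    predictions: Dict[str, Any],
) -> Iterator[Tuple[str, str, int, Any, Any, Any]]:
    """Yield (dataset, model, sample index, input, target, prediction)."""
    for dataset, per_model in predictions["predictions"].items():
        inputs = predictions["inputs"][dataset]
        targets = predictions["targets"][dataset]
        for model, preds in per_model.items():
            for i, pred in enumerate(preds):
                # Inputs and targets are shared by all models of a dataset.
                yield dataset, model, i, inputs[i], targets[i], pred


samples = list(iter_prediction_samples(fake_predictions))
for dataset, model, i, inp, tgt, pred in samples:
    print(f"{dataset} / {model} / sample {i}: {inp}, {tgt}, {pred}")
```

Iterating this way avoids repeating the dataset and model names by hand when more than one prediction per dataset is requested.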

In [13]:
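# Currently, the Ontime plotting module needs the time index to be named 'time'.
def rename_index(ts, name='time'):
    """Return a copy of `ts` whose time index is named `name`."""
    df = ts.pd_dataframe()
    df.rename_axis(name, inplace=True)
    # Use the TimeSeries class exposed by the `on` namespace imported at the top.
    return on.TimeSeries.from_dataframe(df)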

In [15]:
input = rename_index(predictions['inputs']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'input'})
target = rename_index(predictions['targets']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'target'})
prediction = rename_index(predictions['predictions']['Daily temperature']['Temporal Convolutional Network'][0]).rename({'Daily minimum temperatures': 'prediction'})

In [16]:
(on.Plot()
    .add(on.marks.line, input)
    .add(on.marks.line, target)
    .add(on.marks.line, prediction, type='dashed')
    .properties(width=600, height=200)
    .show()
)

In [21]:
prediction = rename_index(predictions['predictions']['Daily temperature']['ExponentialSmoothing'][0]).rename({'Daily minimum temperatures': 'prediction'})

In [22]:
(on.Plot()
    .add(on.marks.line, input)
    .add(on.marks.line, target)
    .add(on.marks.line, prediction, type='dashed')
    .properties(width=600, height=200)
    .show()
)

In [23]:
input = rename_index(predictions['inputs']['ETTh1'][0][500:].univariate_component(0)).rename({'HUFL': 'input'})
target = rename_index(predictions['targets']['ETTh1'][0].univariate_component(0)).rename({'HUFL': 'target'})
prediction = rename_index(predictions['predictions']['ETTh1']['ExponentialSmoothing'][0].univariate_component(0)).rename({'HUFL': 'prediction'})

In [24]:
(on.Plot()
    .add(on.marks.line, input)
    .add(on.marks.line, target)
    .add(on.marks.line, prediction, type='dashed')
    .properties(width=600, height=200)
    .show()
)