# Module - Benchmarking
Ontime provides a `Benchmark` class that can be used to run a number of prediction models on a number of datasets.

In [9]:
from ontime.module.benchmarking import BenchmarkMode, BenchmarkDataset, BenchmarkMetric, Benchmark
import ontime as on

## Initialization
A `Benchmark` instance can be initialized with a list of datasets, models and metrics to run through. When `run()` is invoked, it trains (if needed) and tests every model on every dataset, and computes every metric on the predicted data.


### Preparing datasets

Datasets submitted to a `Benchmark` must be of type `TimeSeries`, wrapped in a `BenchmarkDataset`. `BenchmarkDataset` lets you name a dataset, provide training and test splits, and define how the data is split for rolling evaluation.

In [10]:
from ontime.module.datasets.dataset import Dataset
from darts.utils.missing_values import fill_missing_values # for filling missing values in the time series (for models that don't handle missing values)

datasets = [
    BenchmarkDataset(on.TimeSeries.from_darts(fill_missing_values(Dataset.TemperatureDataset.load())), input_length=96, gap=0, stride=96, horizon=24, name="Daily temperature"),
]
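
One way to picture how `input_length`, `gap`, `stride` and `horizon` carve a series into evaluation samples is the sliding-window sketch below. It illustrates assumed semantics (an input window, then `gap` skipped points, then `horizon` target points, with the window start advancing by `stride`) and is not Ontime's actual implementation; `rolling_windows` is a hypothetical helper.

```python
def rolling_windows(n_points, input_length, gap, stride, horizon):
    """Return (input_start, input_end, target_start, target_end) tuples.

    End indices are exclusive. Assumed semantics, for illustration only.
    """
    windows = []
    start = 0
    # Keep sliding while a full input + gap + target span still fits.
    while start + input_length + gap + horizon <= n_points:
        input_end = start + input_length
        target_start = input_end + gap
        windows.append((start, input_end, target_start, target_start + horizon))
        start += stride
    return windows


# Same parameters as the "Daily temperature" dataset above, on a 732-point series.
windows = rolling_windows(n_points=732, input_length=96, gap=0, stride=96, horizon=24)
```

Under these assumptions, a 732-point series yields 7 samples; the first uses points 0 to 95 as input and points 96 to 119 as target.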

### Preparing models  

Models are wrapped according to the `AbstractBenchmarkModel` interface. Wrappers implementing this interface instantiate the model for each dataset.  
Ontime provides some wrappers around darts models. For models without a provided wrapper, you can write a custom wrapper that implements the `AbstractBenchmarkModel` interface.  
A `mode` should be passed to the wrapper constructor to define how the model is evaluated. It can be either:
- `ZERO_SHOT`: the model is not trained, and the evaluation is done on the test set. It is used for models that already have trained weights, available through checkpoints, or for some darts models where predictions are made directly using the fitted data as input.
- `FULL_SHOT`: the model is trained on the entire given training set. Once trained, the model is evaluated using the learnt weights.

For some univariate models, a `multivariate` flag can be set so that they perform multi-univariate predictions and can therefore be compared against true multivariate models.
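
The difference between the two modes can be pictured with a toy evaluation routine. Everything in this sketch (`Mode`, `LastValueModel`, `evaluate`) is a hypothetical stand-in written for illustration; it does not reproduce `BenchmarkMode` or the `AbstractBenchmarkModel` interface.

```python
from enum import Enum, auto


class Mode(Enum):
    """Hypothetical stand-in for the two evaluation modes."""
    ZERO_SHOT = auto()
    FULL_SHOT = auto()


class LastValueModel:
    """Toy forecaster: repeats the last value of its input window."""

    def __init__(self):
        self.fit_calls = 0

    def fit(self, train):
        # A real model would learn weights here; we only count the calls.
        self.fit_calls += 1

    def predict(self, inputs, horizon):
        return [inputs[-1]] * horizon


def evaluate(model, mode, train, test_input, horizon):
    """Train only in FULL_SHOT mode, then predict from the test input."""
    if mode is Mode.FULL_SHOT:
        model.fit(train)
    return model.predict(test_input, horizon)


zero_shot_model = LastValueModel()
zero_shot_forecast = evaluate(
    zero_shot_model, Mode.ZERO_SHOT, train=[1.0, 2.0, 3.0], test_input=[4.0, 5.0], horizon=2
)

full_shot_model = LastValueModel()
full_shot_forecast = evaluate(
    full_shot_model, Mode.FULL_SHOT, train=[1.0, 2.0, 3.0], test_input=[4.0, 5.0], horizon=2
)
```

In the zero-shot call the training step is skipped entirely, so `fit_calls` stays at 0.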

In [11]:
from ontime.core.time_series.time_series import TimeSeries
from ontime.module.benchmarking.darts_models import LocalDartsBenchmarkModel, GlobalDartsBenchmarkModel
from darts.models import ExponentialSmoothing, TCNModel
from pytorch_lightning.callbacks import EarlyStopping
from torchmetrics import MeanAbsolutePercentageError

torch_metrics = MeanAbsolutePercentageError()
early_stopper = EarlyStopping(
    monitor="val_MeanAbsolutePercentageError",
    patience=5,
    min_delta=0.05,
    mode='min'
)
pl_trainer_kwargs = {"callbacks": [early_stopper],
                     "accelerator": "gpu",
                     "enable_progress_bar": False}

models = [
    LocalDartsBenchmarkModel("ExponentialSmoothingUnivariate", model=ExponentialSmoothing(), mode=BenchmarkMode.ZERO_SHOT),
    LocalDartsBenchmarkModel("ExponentialSmoothingMultivariate", model=ExponentialSmoothing(), mode=BenchmarkMode.ZERO_SHOT, multivariate=True),
    GlobalDartsBenchmarkModel("Temporal Convolutional Network", model=TCNModel(input_chunk_length=24, output_chunk_length=10, n_epochs=2, pl_trainer_kwargs=pl_trainer_kwargs, torch_metrics=torch_metrics), mode=BenchmarkMode.FULL_SHOT)
]
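
The multi-univariate idea behind the `multivariate` option can be sketched as applying one univariate model to each component independently. The code below is a conceptual illustration with a toy forecaster and made-up component names; how `LocalDartsBenchmarkModel` does this internally is not shown in this document.

```python
from statistics import mean


def naive_mean_forecast(history, horizon):
    """Toy univariate model: repeat the mean of the last three observations."""
    level = mean(history[-3:])
    return [level] * horizon


def multi_univariate_forecast(series_by_component, horizon, univariate_model):
    """Forecast each component on its own with the same univariate model."""
    return {
        name: univariate_model(values, horizon)
        for name, values in series_by_component.items()
    }


# Two illustrative components of a multivariate series (names are made up).
history = {
    "component_a": [1.0, 2.0, 3.0, 4.0],
    "component_b": [10.0, 10.0, 13.0, 16.0],
}
forecast = multi_univariate_forecast(history, horizon=2, univariate_model=naive_mean_forecast)
```

Because each component is forecast in isolation, this approach ignores any correlation between components, which is what separates it from a true multivariate model.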

### Preparing metrics
Each metric function is wrapped in a `BenchmarkMetric`. If a function cannot be called as is by `BenchmarkMetric`'s implementation, you can write a subclass and submit it instead.

In [12]:
import darts.metrics

metrics = [
   BenchmarkMetric(name="COV", metric_function=darts.metrics.metrics.coefficient_of_variation),
   BenchmarkMetric(name="MAE", metric_function=darts.metrics.metrics.mae),
   BenchmarkMetric(name="sMAPE", metric_function=darts.metrics.metrics.smape),
   BenchmarkMetric(name="MASE", metric_function=darts.metrics.metrics.mase)
]
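
To make the reported numbers easier to interpret, here is a plain-Python sketch of two of the metrics, using their usual definitions (mean absolute error, and symmetric MAPE scaled to a 0 to 200 range). It operates on lists of floats for illustration; the darts functions used above work on time series objects and accept further options.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Undefined when an actual value and its prediction are both zero.
    """
    terms = [abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)]
    return 200.0 * sum(terms) / len(terms)


actual = [2.0, 4.0, 6.0]
predicted = [1.0, 4.0, 8.0]
mae_value = mae(actual, predicted)
smape_value = smape(actual, predicted)
```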

## Creating and running a Benchmark

In [13]:
benchmark = Benchmark(datasets=datasets,
                      models=models, 
                      metrics=metrics)

Datasets, models and metrics can also be added after instantiation. As in the constructor, a dataset added this way is named through its `BenchmarkDataset` wrapper.

In [14]:
benchmark.add_dataset(BenchmarkDataset(Dataset.ETTh1Dataset.load(), input_length=336, gap=0, stride=72, horizon=72, name = "ETTh1"))

Once the models and datasets have been added, the `run()` method trains instances of all the models on all the datasets individually and computes the metrics. The `verbose` parameter prints the status and results of the process as it progresses, and the `debug` parameter prints error messages (warnings are printed anyway).

In [15]:
benchmark.run(verbose=False, debug=False)

                                                                                                                                                                                                        

on 6: GPU available: True (cuda), used: True
on 6: TPU available: False, using: 0 TPU cores
on 6: HPU available: False, using: 0 HPUs
on 6: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Benchmarking |████████████████████| 6/6 [100%] in 1:31.5 (0.07/s)
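
The `6/6` in the progress bar corresponds to 3 models times 2 datasets. Conceptually, the run is a loop over every (model, dataset) pair in which a failure on one pair does not stop the others. The sketch below shows that pattern with hypothetical names (`run_all`, `toy_evaluate`); it is not the code of `Benchmark.run()`.

```python
from itertools import product


def run_all(model_names, dataset_names, evaluate):
    """Evaluate every (model, dataset) pair, recording failures instead of aborting."""
    results = {}
    for model_name, dataset_name in product(model_names, dataset_names):
        try:
            results[(model_name, dataset_name)] = evaluate(model_name, dataset_name)
        except ValueError as error:
            results[(model_name, dataset_name)] = f"failed: {error}"
    return results


def toy_evaluate(model_name, dataset_name):
    """Stand-in evaluation: a univariate-only model fails on multivariate data."""
    if model_name == "univariate-only" and dataset_name == "multivariate-data":
        raise ValueError("model does not support multivariate series")
    return "ok"


results = run_all(
    ["univariate-only", "multi-univariate", "global"],
    ["univariate-data", "multivariate-data"],
    toy_evaluate,
)
```

Recording the failure per pair is what lets a report list a message such as "couldn't complete training on ETTh1" for one model while still showing results for the others.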


## Visualizing results

The benchmark automatically stores measures and metrics computed during the run, available through class attributes.

### Measures and metrics
To view the results, call `get_report()` and print the returned value.

In [16]:
print(benchmark.get_report())

Model ExponentialSmoothingUnivariate:
Supported univariate datasets: ✓
Supported multivariate datasets: X
Dataset Daily temperature:
nb features: 1
target column: ['Daily minimum temperatures']
training set size: 2335
validation set size: 585
test set size: 732
training time: 0
evaluation time: 0.3661947250366211
inference time: 0.04873967170715332
metrics: {'COV': 29.264706049554235, 'MAE': 2.400040758215776, 'sMAPE': 24.148347520589876, 'MASE': 1.259049717016714}
Dataset ETTh1:
couldn't complete training on ETTh1


Model ExponentialSmoothingMultivariate:
Supported univariate datasets: ✓
Supported multivariate datasets: ✓
Dataset Daily temperature:
nb features: 1
target column: ['Daily minimum temperatures']
training set size: 2335
validation set size: 585
test set size: 732
training time: 0
evaluation time: 0.3557884693145752
inference time: 0.046477556228637695
metrics: {'COV': 29.264706049554235, 'MAE': 2.400040758215776, 'sMAPE': 24.148347520589876, 'MASE': 1.259049717016714}
Data

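You can also get the results as dataframes by calling `get_report_df()`. It returns two dataframes: the first lists, for each model, whether univariate and multivariate datasets are supported; the second has model names as columns, dataset names as main rows, and measures and metrics as sub-rows.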

In [17]:
df_1, df_2 = benchmark.get_report_df()
df_1

| Statistic | ExponentialSmoothingUnivariate | ExponentialSmoothingMultivariate | Temporal Convolutional Network |
|---|---|---|---|
| supports univariate | ✓ | ✓ | ✓ |
| supports multivariate | X | ✓ | ✓ |


In [18]:
df_2

| Dataset | Metric | ExponentialSmoothingUnivariate | ExponentialSmoothingMultivariate | Temporal Convolutional Network |
|---|---|---|---|---|
| Daily temperature | training time | 0.0 | 0.0 | 3.184156 |
| Daily temperature | evaluation time | 0.366195 | 0.355788 | 2.705085 |
| Daily temperature | inference time | 0.04874 | 0.046478 | 0.331808 |
| Daily temperature | COV | 29.264706 | 29.264706 | 29.051706 |
| Daily temperature | MAE | 2.400041 | 2.400041 | 2.351355 |
| Daily temperature | sMAPE | 24.148348 | 24.148348 | 23.888942 |
| Daily temperature | MASE | 1.25905 | 1.25905 | 1.200879 |
| ETTh1 | training time | | 0.0 | 20.49744 |
| ETTh1 | evaluation time | | 47.565958 | 14.546187 |
| ETTh1 | inference time | | 1.007837 | 0.321019 |


### Plotting

By default (argument `nb_predictions` of the `benchmark.run()` method), the benchmark generates a prediction for one random input sample of each dataset with each model. The predictions, along with the input and target series, are stored in a dictionary and can be retrieved by calling `benchmark.get_predictions()`. They can be plotted using the Ontime plotting module.

In [19]:
predictions = benchmark.get_predictions()
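
Judging from how it is indexed in the cells of this section, the returned dictionary has `'inputs'` and `'targets'` keyed by dataset name, and `'predictions'` keyed by model name and then dataset name, each ending in a list of samples. The helper below is a hypothetical convenience built on that inferred layout, demonstrated on a toy dictionary with placeholder strings rather than real time series.

```python
def get_sample(predictions, model_name, dataset_name, index=0):
    """Return the (input, target, prediction) triple for one sample.

    Assumes the nested layout inferred from this section's cells.
    """
    return (
        predictions["inputs"][dataset_name][index],
        predictions["targets"][dataset_name][index],
        predictions["predictions"][model_name][dataset_name][index],
    )


# Toy dictionary mimicking the assumed layout; values are placeholders.
toy_predictions = {
    "inputs": {"Daily temperature": ["input-series"]},
    "targets": {"Daily temperature": ["target-series"]},
    "predictions": {
        "Temporal Convolutional Network": {"Daily temperature": ["predicted-series"]},
    },
}
sample = get_sample(toy_predictions, "Temporal Convolutional Network", "Daily temperature")
```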

In [20]:
# currently, Ontime plotting module needs the time index to be named 'time'
def rename_index(ts, name='time'):
    df = ts.pd_dataframe()
    df.rename_axis(name, inplace=True)
    return TimeSeries.from_dataframe(df)

In [21]:
input = rename_index(predictions['inputs']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'input'})
target = rename_index(predictions['targets']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'target'})
prediction = rename_index(predictions['predictions']['Temporal Convolutional Network']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'prediction'})

In [22]:
(on.Plot()
    .add(on.marks.line, input)
    .add(on.marks.line, target)
    .add(on.marks.line, prediction, type='dashed')
    .properties(width=600, height=200)
    .show()
)

In [23]:
prediction = rename_index(predictions['predictions']['ExponentialSmoothingUnivariate']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'prediction'})

In [24]:
(on.Plot()
    .add(on.marks.line, input)
    .add(on.marks.line, target)
    .add(on.marks.line, prediction, type='dashed')
    .properties(width=600, height=200)
    .show()
)

In [25]:
input = rename_index(predictions['inputs']['ETTh1'][0][500:].univariate_component(0)).rename({'HUFL': 'input'})
target = rename_index(predictions['targets']['ETTh1'][0].univariate_component(0)).rename({'HUFL': 'target'})
prediction = rename_index(predictions['predictions']['ExponentialSmoothingMultivariate']['ETTh1'][0].univariate_component(0)).rename({'HUFL': 'prediction'})

In [26]:
(on.Plot()
    .add(on.marks.line, input)
    .add(on.marks.line, target)
    .add(on.marks.line, prediction, type='dashed')
    .properties(width=600, height=200)
    .show()
)