# Module - Benchmarking
Ontime provides a `Benchmark` class that can be used to run a number of prediction models on a number of datasets.

In [1]:
from ontime.module.benchmarking import BenchmarkMode, BenchmarkDataset, BenchmarkMetric, Benchmark
import ontime as on

## Initialization
A `Benchmark` instance can be initialized with a list of datasets, models and metrics to run through. When `run()` is invoked, it trains (if needed) and tests every model on every dataset, and computes every metric on the predicted data.


### Preparing datasets

Datasets submitted to a `Benchmark` must be of type `TimeSeries`, wrapped in a `BenchmarkDataset`. `BenchmarkDataset` lets you name a dataset, provide training and test splits, and define how the data is split into windows for a rolling evaluation (`input_length`, `gap`, `stride` and `horizon`).

In [2]:
from ontime.module.datasets.dataset import Dataset
from darts.utils.missing_values import fill_missing_values # for filling missing values in the time series (for models that don't handle missing values)

datasets = [
    BenchmarkDataset(on.TimeSeries.from_darts(fill_missing_values(Dataset.TemperatureDataset.load())), input_length=96, gap=0, stride=96, horizon=24, name="Daily temperature"),
]
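
The rolling-evaluation parameters can be pictured with a small index calculation. The sketch below is a plain-Python illustration, not Ontime's implementation: the helper `rolling_windows` is hypothetical, and it assumes that `input_length` is the number of points fed to the model, `gap` the number of points skipped before the forecast starts, `horizon` the number of points to predict, and `stride` the shift between two consecutive windows.

```python
def rolling_windows(n_points, input_length, gap, stride, horizon):
    """Return (input_range, target_range) index pairs for a rolling evaluation.

    Hypothetical helper: the parameter semantics are assumed, not taken from Ontime.
    Ranges are half-open, so (start, end) covers indices start .. end - 1.
    """
    windows = []
    start = 0
    # A window is kept only if input, gap and horizon all fit in the series.
    while start + input_length + gap + horizon <= n_points:
        input_end = start + input_length
        target_start = input_end + gap
        windows.append(((start, input_end), (target_start, target_start + horizon)))
        start += stride
    return windows


# Same settings as the "Daily temperature" dataset, applied to a 732-point series.
windows = rolling_windows(732, input_length=96, gap=0, stride=96, horizon=24)
print(len(windows))  # number of evaluation windows
print(windows[0])    # index ranges of the first input and target
```

Under these assumptions, a `stride` equal to `input_length` produces non-overlapping inputs, while a smaller `stride` would produce more, overlapping windows.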

### Preparing models  

Models are wrapped according to the `AbstractBenchmarkModel` interface. Wrappers implementing this interface will instantiate the model for each dataset.  
Ontime provides wrappers around some darts models. For models without a provided wrapper, a custom wrapper implementing the `AbstractBenchmarkModel` interface can be written.  
A `mode` should be passed to the wrapper constructor to define how the model is evaluated. It can be either:
- `ZERO_SHOT`: the model is not trained, and the evaluation is done on the test set. It is used for models that already have trained weights, available through checkpoints, or for some darts models, where predictions are made directly using the fitted data as input.
- `FULL_SHOT`: the model is trained on the entire given training set. Once trained, the model is evaluated using the learnt weights.

For some univariate models, `multivariate=True` can be set so that they perform multi-univariate predictions and can therefore be compared against true multivariate models.
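
The difference between the two modes can be illustrated with a toy example. This is a minimal sketch, not Ontime code: `Mode`, `MeanForecaster` and `evaluate` are hypothetical names, and the zero-shot branch assumes a local model that is fitted on each input window right before predicting.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for the two benchmark modes."""
    ZERO_SHOT = "zero_shot"
    FULL_SHOT = "full_shot"


class MeanForecaster:
    """Toy model: predicts the mean of the series it was last fitted on."""

    def __init__(self):
        self.mean = None

    def fit(self, series):
        self.mean = sum(series) / len(series)

    def predict(self, horizon):
        return [self.mean] * horizon


def evaluate(mode, train, windows, horizon):
    """Return one forecast per input window, following the given mode."""
    model = MeanForecaster()
    if mode is Mode.FULL_SHOT:
        # Trained once on the whole training set, then reused for every window.
        model.fit(train)
    forecasts = []
    for window in windows:
        if mode is Mode.ZERO_SHOT:
            # No training phase: the input window itself is the fitted data.
            model = MeanForecaster()
            model.fit(window)
        forecasts.append(model.predict(horizon))
    return forecasts


train = [0.0, 0.0, 0.0, 0.0]
windows = [[2.0, 4.0], [6.0, 8.0]]
zero_shot = evaluate(Mode.ZERO_SHOT, train, windows, horizon=2)
full_shot = evaluate(Mode.FULL_SHOT, train, windows, horizon=2)
print(zero_shot)
print(full_shot)
```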

In [None]:
from ontime.core.time_series.time_series import TimeSeries
from ontime.module.benchmarking.darts_models import LocalDartsBenchmarkModel, GlobalDartsBenchmarkModel
from darts.models import ExponentialSmoothing, TCNModel

models = [
    LocalDartsBenchmarkModel("ExponentialSmoothingUnivariate", model=ExponentialSmoothing(), mode=BenchmarkMode.ZERO_SHOT),
    LocalDartsBenchmarkModel("ExponentialSmoothingMultivariate", model=ExponentialSmoothing(), mode=BenchmarkMode.ZERO_SHOT, multivariate=True),
    GlobalDartsBenchmarkModel("Temporal Convolutional Network", model=TCNModel(input_chunk_length=24, output_chunk_length=10), mode=BenchmarkMode.FULL_SHOT, training_epochs=5)
]
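
The idea behind multi-univariate prediction can be sketched in plain Python: a univariate forecaster is applied to each component independently and the per-component forecasts are collected. The function names and the toy values below are illustrative assumptions and do not reflect how `LocalDartsBenchmarkModel` is implemented.

```python
def naive_univariate_forecast(series, horizon):
    """Toy univariate model: repeat the last observed value."""
    return [series[-1]] * horizon


def multi_univariate_forecast(components, horizon, forecast_one=naive_univariate_forecast):
    """Forecast every component independently with the same univariate model.

    `components` maps a component name to its list of past values.
    """
    return {name: forecast_one(values, horizon) for name, values in components.items()}


# Two components with made-up values, named after ETTh1 columns.
history = {"HUFL": [1.0, 2.0, 3.0], "HULL": [4.0, 5.0, 6.0]}
forecast = multi_univariate_forecast(history, horizon=2)
print(forecast)
```

Because each component is forecast on its own, this approach ignores any relationship between components, which is what a true multivariate model can exploit.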

### Preparing metrics
Metric functions are passed to the `BenchmarkMetric` constructor together with a name. If a function can't be invoked as is by `BenchmarkMetric`'s implementation, a child class can be written and submitted instead.

In [4]:
import darts.metrics

metrics = [
    BenchmarkMetric(name="COV", metric_function=darts.metrics.metrics.coefficient_of_variation),
    BenchmarkMetric(name="MAE", metric_function=darts.metrics.metrics.mae),
    BenchmarkMetric(name="sMAPE", metric_function=darts.metrics.metrics.smape),
    BenchmarkMetric(name="MASE", metric_function=darts.metrics.metrics.mase),
]
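
A child class is typically needed when a metric takes more than the target and the prediction. A scaled error such as MASE, for instance, also needs the training series to compute its scale. The sketch below shows the pattern with plain lists; `NamedMetric`, `InsampleMetric` and their `compute` method are hypothetical names and do not describe `BenchmarkMetric`'s actual interface.

```python
def mae(target, prediction):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(target, prediction)) / len(target)


def mase(target, prediction, insample):
    """MAE scaled by the in-sample MAE of a naive one-step forecast."""
    naive_errors = [abs(b - a) for a, b in zip(insample, insample[1:])]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(target, prediction) / scale


class NamedMetric:
    """Hypothetical wrapper: a name plus a function called as f(target, prediction)."""

    def __init__(self, name, metric_function):
        self.name = name
        self.metric_function = metric_function

    def compute(self, target, prediction, **context):
        return self.metric_function(target, prediction)


class InsampleMetric(NamedMetric):
    """Child class adapting the call for metrics that also need the training series."""

    def compute(self, target, prediction, **context):
        return self.metric_function(target, prediction, context["insample"])


toy_metrics = [NamedMetric("MAE", mae), InsampleMetric("MASE", mase)]
scores = {
    m.name: m.compute([3.0, 5.0], [2.0, 7.0], insample=[1.0, 2.0, 4.0])
    for m in toy_metrics
}
print(scores)
```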

## Creating and running a Benchmark

In [5]:
benchmark = Benchmark(datasets=datasets,
                      models=models, 
                      metrics=metrics)

Datasets, models and metrics can also be added after instantiation. A dataset added this way is named through the `name` argument of `BenchmarkDataset`.

In [6]:
benchmark.add_dataset(BenchmarkDataset(Dataset.ETTh1Dataset.load(), input_length=500, gap=0, stride=96, horizon=96, name="ETTh1"))

Once the models and datasets have been added, the `run()` method trains instances of all the models on each dataset individually and computes the metrics. The `verbose` parameter prints the status and results of the process as it progresses, and the `debug` parameter prints error messages (warnings are printed anyway).

In [None]:
benchmark.run(verbose=True, debug=True)
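
Conceptually, such a run is a nested loop over models and datasets in which a failure on one pair is recorded without stopping the others. The following is a simplified sketch with toy models and plain lists; `run_benchmark` and the helper functions are hypothetical and do not mirror the internals of `Benchmark.run()`.

```python
def run_benchmark(datasets, models, metrics):
    """Toy loop: every model on every dataset, every metric on every forecast."""
    results = {}
    for model_name, forecast in models.items():
        for dataset_name, (history, target) in datasets.items():
            try:
                prediction = forecast(history, len(target))
            except ValueError as error:
                # A failure on one dataset is recorded and the run continues.
                results[(model_name, dataset_name)] = f"failed: {error}"
                continue
            results[(model_name, dataset_name)] = {
                name: metric(target, prediction) for name, metric in metrics.items()
            }
    return results


def last_value(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive(history, horizon, season=3):
    """Repeat the last full season; needs at least one season of history."""
    if len(history) < season:
        raise ValueError("history shorter than one season")
    return [history[-season + (step % season)] for step in range(horizon)]


def mean_absolute_error(target, prediction):
    return sum(abs(t - p) for t, p in zip(target, prediction)) / len(target)


toy_datasets = {
    "short": ([1.0, 2.0], [3.0]),
    "long": ([1.0, 2.0, 3.0, 4.0], [5.0, 6.0]),
}
toy_models = {"last value": last_value, "seasonal naive": seasonal_naive}
results = run_benchmark(toy_datasets, toy_models, {"MAE": mean_absolute_error})
for key, value in results.items():
    print(key, value)
```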

## Visualizing results

The benchmark automatically stores measures and metrics computed during the run, available through class attributes.

### Measures and metrics
To view the results, call `get_report()` and print the returned value.

In [19]:
print(benchmark.get_report())

Model ExponentialSmoothingUnivariate:
Supported univariate datasets: ✓
Supported multivariate datasets: X
Dataset Daily temperature:
nb features: 1
target column: ['Daily minimum temperatures']
training set size: 2335
validation set size: 585
training time: 0
test set size: 732
testing time: 0.5555312633514404
metrics: {'COV': 29.264706049554235, 'MAE': 2.400040758215776, 'sMAPE': 24.148347520589876, 'MASE': 1.259049717016714}
Dataset ETTh1:
couldn't complete training on ETTh1


Model ExponentialSmoothingMultivariate:
Supported univariate datasets: ✓
Supported multivariate datasets: ✓
Dataset Daily temperature:
nb features: 1
target column: ['Daily minimum temperatures']
training set size: 2335
validation set size: 585
training time: 0
test set size: 732
testing time: 0.5387301445007324
metrics: {'COV': 29.264706049554235, 'MAE': 2.400040758215776, 'sMAPE': 24.148347520589876, 'MASE': 1.259049717016714}
Dataset ETTh1:
nb features: 7
target column: ['HUFL', 'HULL', 'MUFL', 'MULL', 'LUFL

You can also get the results as dataframes by calling `get_report_df()`. It returns two dataframes: the first lists, for each model, whether univariate and multivariate datasets are supported; the second has model names as columns, dataset names as main rows, and measures as sub-rows.

In [21]:
df_1, df_2 = benchmark.get_report_df()
df_1

| Statistic | ExponentialSmoothingUnivariate | ExponentialSmoothingMultivariate | Temporal Convolutional Network |
|---|---|---|---|
| supports univariate | ✓ | ✓ | ✓ |
| supports multivariate | X | ✓ | ✓ |


In [22]:
df_2

| Dataset | Metric | ExponentialSmoothingUnivariate | ExponentialSmoothingMultivariate | Temporal Convolutional Network |
|---|---|---|---|---|
| Daily temperature | training time | 0.0 | 0.0 | 7.102958 |
| Daily temperature | testing time | 0.555531 | 0.53873 | 1.902054 |
| Daily temperature | COV | 29.264706 | 29.264706 | 27.426134 |
| Daily temperature | MAE | 2.400041 | 2.400041 | 2.307088 |
| Daily temperature | sMAPE | 24.148348 | 24.148348 | 23.384631 |
| Daily temperature | MASE | 1.25905 | 1.25905 | 1.189637 |
| ETTh1 | training time | | 0.0 | 34.660468 |
| ETTh1 | testing time | | 30.634999 | 11.778613 |
| ETTh1 | COV | | 124.741419 | 135.067327 |
| ETTh1 | MAE | | 3.018218 | 2.959528 |
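
The values reported for the Daily temperature dataset can be compared programmatically. The sketch below copies the MAE, sMAPE and MASE rows into a plain dictionary (it does not query the dataframe itself) and picks the model with the lowest score for each metric, since lower is better for all three.

```python
# Scores copied from the Daily temperature rows of the second dataframe.
daily_temperature = {
    "MAE": {
        "ExponentialSmoothingUnivariate": 2.400041,
        "ExponentialSmoothingMultivariate": 2.400041,
        "Temporal Convolutional Network": 2.307088,
    },
    "sMAPE": {
        "ExponentialSmoothingUnivariate": 24.148348,
        "ExponentialSmoothingMultivariate": 24.148348,
        "Temporal Convolutional Network": 23.384631,
    },
    "MASE": {
        "ExponentialSmoothingUnivariate": 1.25905,
        "ExponentialSmoothingMultivariate": 1.25905,
        "Temporal Convolutional Network": 1.189637,
    },
}

# For error metrics, the best model is the one with the smallest value.
best_model = {
    metric: min(scores, key=scores.get) for metric, scores in daily_temperature.items()
}
for metric, model in best_model.items():
    print(f"{metric}: {model}")
```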


### Plotting

By default (see the `nb_predictions` argument of `benchmark.run()`), the benchmark generates a prediction for one random input sample of each dataset with each model. The predictions, along with the input and target series, are stored in a dictionary and can be retrieved by calling `benchmark.get_predictions()`. They can be plotted using the Ontime plotting module.

In [11]:
predictions = benchmark.get_predictions()
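
The layout of the returned dictionary can be inferred from the way it is indexed in the following cells: inputs and targets are keyed by dataset name, while predictions are keyed by model name and then by dataset name, each entry being a list of samples. The sketch below reproduces that layout with plain lists instead of `TimeSeries` objects; the values are made up and `get_sample` is a hypothetical helper.

```python
# Toy dictionary with the same nesting as the one indexed in this notebook.
toy_predictions = {
    "inputs": {"Daily temperature": [[11.0, 12.5, 13.0]]},
    "targets": {"Daily temperature": [[12.0, 12.2]]},
    "predictions": {
        "Temporal Convolutional Network": {"Daily temperature": [[12.4, 12.1]]},
    },
}


def get_sample(predictions, model_name, dataset_name, index=0):
    """Return the (input, target, prediction) triplet for one stored sample."""
    return (
        predictions["inputs"][dataset_name][index],
        predictions["targets"][dataset_name][index],
        predictions["predictions"][model_name][dataset_name][index],
    )


sample = get_sample(toy_predictions, "Temporal Convolutional Network", "Daily temperature")
print(sample)
```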

In [12]:
# currently, Ontime plotting module needs the time index to be named 'time'
def rename_index(ts, name='time'):
    df = ts.pd_dataframe()
    df.rename_axis(name, inplace=True)
    return TimeSeries.from_dataframe(df)

In [13]:
input = rename_index(predictions['inputs']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'input'})
target = rename_index(predictions['targets']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'target'})
prediction = rename_index(predictions['predictions']['Temporal Convolutional Network']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'prediction'})

In [14]:
(on.Plot()
    .add(on.marks.line, input)
    .add(on.marks.line, target)
    .add(on.marks.line, prediction, type='dashed')
    .properties(width=600, height=200)
    .show()
)

In [15]:
prediction = rename_index(predictions['predictions']['ExponentialSmoothingUnivariate']['Daily temperature'][0]).rename({'Daily minimum temperatures': 'prediction'})

In [16]:
(on.Plot()
    .add(on.marks.line, input)
    .add(on.marks.line, target)
    .add(on.marks.line, prediction, type='dashed')
    .properties(width=600, height=200)
    .show()
)

In [17]:
input = rename_index(predictions['inputs']['ETTh1'][0][500:].univariate_component(0)).rename({'HUFL': 'input'})
target = rename_index(predictions['targets']['ETTh1'][0].univariate_component(0)).rename({'HUFL': 'target'})
prediction = rename_index(predictions['predictions']['ExponentialSmoothingMultivariate']['ETTh1'][0].univariate_component(0)).rename({'HUFL': 'prediction'})

In [18]:
(on.Plot()
    .add(on.marks.line, input)
    .add(on.marks.line, target)
    .add(on.marks.line, prediction, type='dashed')
    .properties(width=600, height=200)
    .show()
)