# Module - Benchmarking
Ontime provides a Benchmark class that runs several prediction models on several datasets and reports metrics for each model/dataset combination.

In [4]:
# Add the local src directory to the path so that the ontime package can be imported
import sys
sys.path.insert(0, '../../../../src')

from ontime.module.benchmarking.benchmark import Benchmark

Datasets submitted to a Benchmark must be of type TimeSeries, and models must implement AbstractModel. To satisfy the model requirement, you may need to wrap your models in a simple class, as shown below.

In [5]:
from ontime.core.model.abstract_model import AbstractModel

# A simple wrapper ensuring that the wrapped model exposes fit() and predict() methods
class ModelWrapper(AbstractModel):
    def __init__(self, model):
        self.model = model

    def fit(self, dataset):
        self.model.fit(dataset)

    def predict(self, horizon):
        return self.model.predict(horizon)
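
The delegation pattern used by ModelWrapper can be tried without Ontime or Darts installed. The sketch below is only an illustration: `AbstractModelSketch` is a hypothetical stand-in for AbstractModel (its real definition is not shown here, so the two abstract methods are an assumption based on the wrapper above), and `LastValueModel` is a toy model invented for the example.

```python
from abc import ABC, abstractmethod


class AbstractModelSketch(ABC):
    """Hypothetical stand-in for ontime's AbstractModel (assumed interface)."""

    @abstractmethod
    def fit(self, dataset):
        ...

    @abstractmethod
    def predict(self, horizon):
        ...


class LastValueModel:
    """Toy model invented for this example: repeats the last observed value."""

    def fit(self, series):
        self.last_ = series[-1]

    def predict(self, n):
        return [self.last_] * n


class SketchWrapper(AbstractModelSketch):
    """Same shape as ModelWrapper: delegate fit() and predict() to the wrapped model."""

    def __init__(self, model):
        self.model = model

    def fit(self, dataset):
        self.model.fit(dataset)

    def predict(self, horizon):
        return self.model.predict(horizon)


wrapped = SketchWrapper(LastValueModel())
wrapped.fit([112, 118, 132])
forecast = wrapped.predict(3)
print(forecast)
```

Because the base class declares both methods as abstract, a wrapper that forgets to define one of them fails at instantiation rather than halfway through a benchmark run.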

In [8]:
from darts.models import ARIMA
from darts.models import BATS
from ontime.module.data.dataset import Dataset

m1 = ModelWrapper(ARIMA(p=12, d=1, q=2))
m2 = ModelWrapper(BATS(use_trend=True))

d1 = Dataset.AirPassengersDataset.load()
d2 = Dataset.AusBeerDataset.load()

Models and datasets can be added to a Benchmark upon its instantiation or later with add_model() and add_dataset(). The add_model() and add_dataset() methods also let you name each model and dataset; these names are then used when generating the report.

In [9]:
# adding directly
b1 = Benchmark([d1, d2], [m1, m2])

# adding one by one, with names
b2 = Benchmark()
b2.add_dataset(d1, "Air Passengers")
b2.add_dataset(d2, "Aus Beer")
b2.add_model(m1, "ARIMA")
b2.add_model(m2, "BATS")

Once the models and datasets have been added, the run() method trains an instance of each model on each dataset individually and generates metrics. Setting the verbose parameter to True prints the status and results of the process as it progresses.

In [10]:
b1.run(verbose=True)
print("------------------------------------------------")
b2.run(verbose=False)

Starting evaluation...
Evaluation for model 1
on dataset 1 
train 

  warn('Non-stationary starting autoregressive parameters'
  warn('Non-stationary starting autoregressive parameters'


infer done, took 0.010730266571044922
on dataset 2 
train infer done, took 0.01378774642944336
Evaluation for model 2
on dataset 1 
train 



infer done, took 0.01328420639038086
on dataset 2 
train infer done, took 0.005892276763916016
------------------------------------------------


  warn('Non-stationary starting autoregressive parameters'
  warn('Non-stationary starting autoregressive parameters'
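
Conceptually, the run described above is a nested loop over models and datasets. The following is a minimal sketch of that idea in plain Python, not Ontime's implementation: `run_benchmark_sketch`, `MeanModel`, the `mape` helper and the 70/30 split are all assumptions made for illustration (the split Ontime actually uses is not stated here).

```python
import time


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


class MeanModel:
    """Toy model with a fit()/predict() interface: forecasts the training mean."""

    def fit(self, train):
        self.mean_ = sum(train) / len(train)

    def predict(self, horizon):
        return [self.mean_] * horizon


def run_benchmark_sketch(datasets, model_factories, train_fraction=0.7):
    """Fit a fresh model for every (model, dataset) pair and collect metrics."""
    results = {}
    for model_name, make_model in model_factories.items():
        for dataset_name, series in datasets.items():
            # Assumed split: first part for training, remainder for testing
            split = int(len(series) * train_fraction)
            train, test = series[:split], series[split:]

            model = make_model()  # new instance so runs do not share state
            start = time.perf_counter()
            model.fit(train)
            train_time = time.perf_counter() - start

            start = time.perf_counter()
            predicted = model.predict(len(test))
            test_time = time.perf_counter() - start

            results[(model_name, dataset_name)] = {
                "train size": len(train),
                "train time": train_time,
                "test size": len(test),
                "test time": test_time,
                "mape": mape(test, predicted),
            }
    return results


datasets = {
    "flat": [10.0] * 10,
    "ramp": [float(i) for i in range(1, 11)],
}
results = run_benchmark_sketch(datasets, {"mean": MeanModel})
for key, metrics in results.items():
    print(key, metrics["train size"], metrics["test size"], round(metrics["mape"], 2))
```

Passing a factory (here the class itself) rather than a fitted object keeps each model/dataset pair independent, which mirrors the "instance of each model on each dataset" behaviour described above.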


To view the results, call get_report() and print the returned value.

In [11]:
print(b1.get_report())

Model 1:
Supported univariate datasets: ✓
Supported multivariate datasets: unknown
                      1           2
nb features    1.000000    1.000000
train size    42.000000   63.000000
train time     0.626309    0.719149
test size    102.000000  148.000000
test time      0.010730    0.013788
mape          21.009775   37.099098

Model 2:
Supported univariate datasets: ✓
Supported multivariate datasets: unknown
                      1           2
nb features    1.000000    1.000000
train size    42.000000   63.000000
train time     4.467343    3.031891
test size    102.000000  148.000000
test time      0.013284    0.005892
mape          16.256853   66.605265



As mentioned above, when datasets and models have been added with names, those names are used in the report.

In [12]:
print(b2.get_report())

Model ARIMA:
Supported univariate datasets: ✓
Supported multivariate datasets: unknown
             Air Passengers    Aus Beer
nb features        1.000000    1.000000
train size        42.000000   63.000000
train time         0.603575    0.704960
test size        102.000000  148.000000
test time          0.010121    0.013602
mape              21.009775   37.099098

Model BATS:
Supported univariate datasets: ✓
Supported multivariate datasets: unknown
             Air Passengers    Aus Beer
nb features        1.000000    1.000000
train size        42.000000   63.000000
train time         3.988902    2.904714
test size        102.000000  148.000000
test time          0.004589    0.006986
mape              16.256853   66.605265
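
To compare models per dataset, the MAPE rows of the report can be copied into a small dictionary and inspected. The sketch below re-enters the values printed above by hand; it does not call any Ontime API, and `mape_by_model` and `best` are names chosen for this example.

```python
# MAPE values copied from the report printed above (lower is better)
mape_by_model = {
    "ARIMA": {"Air Passengers": 21.009775, "Aus Beer": 37.099098},
    "BATS": {"Air Passengers": 16.256853, "Aus Beer": 66.605265},
}

# For each dataset, keep the model with the smallest MAPE
dataset_names = next(iter(mape_by_model.values())).keys()
best = {
    name: min(mape_by_model, key=lambda model: mape_by_model[model][name])
    for name in dataset_names
}

for name, model in best.items():
    print(f"{name}: {model} (MAPE {mape_by_model[model][name]:.2f})")
```

On these two datasets neither model wins everywhere, which is the kind of comparison the benchmark report is meant to support.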

