# Add a custom forecasting model

This notebook provides a minimal example of how to add a custom forecasting
model to Test-of-time.
The example further illustrates how to use it in the Test-of-time ecosystem
for data loading and model evaluation.
We choose to implement the **seasonal naive forecasting model** available in the **darts** library.
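
To make the behaviour of a seasonal naive forecast concrete, here is a minimal pure-Python sketch. It is not the darts implementation; the function name `seasonal_naive_forecast` is hypothetical and only illustrates the idea of repeating the last `K` observed values over the forecast horizon.

```python
from typing import List, Sequence


def seasonal_naive_forecast(history: Sequence[float], k: int, n: int) -> List[float]:
    """Repeat the last `k` observed values cyclically for `n` future steps."""
    if k < 1:
        raise ValueError("k must be >= 1")
    if len(history) < k:
        raise ValueError("history must contain at least k observations")
    last_season = list(history[-k:])
    # step i of the forecast reuses the value observed at the same position in the last season
    return [last_season[i % k] for i in range(n)]


history = [10.0, 20.0, 30.0, 11.0, 21.0, 31.0]
print(seasonal_naive_forecast(history, k=3, n=4))  # [11.0, 21.0, 31.0, 11.0]
print(seasonal_naive_forecast(history, k=1, n=3))  # [31.0, 31.0, 31.0]
```

With `k=1` the sketch simply repeats the last observation, which is the setting used in the benchmark at the end of this notebook.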

**TL;DR:** To implement a new model, you need to follow these steps:

* **Step 1:** Implement a new model class that inherits from the abstract class
`test_of_time.tot.models.Model`.
* **Step 2:** Implement `__post_init__()` to initialize the required class attributes
* **Step 3:** Implement the abstract `fit()` method
* **Step 4:** Implement the abstract `predict()` method, which returns the model's forecast
* **Step 5:** Implement the parent class methods `maybe_extend_df()` and `maybe_drop_added_values_from_df()` for
model-specific pre-/post-processing
* **Step 6:** Use your new model in a simple benchmark


In [28]:
# imports
import pandas as pd
from tot.models import Model
from tot.utils import convert_df_to_TimeSeries, _predict_darts_model
from tot.df_utils import _check_min_df_len, prep_or_copy_df, add_first_inputs_to_df, drop_first_inputs_from_df
from dataclasses import dataclass
from typing import Type

from copy import deepcopy
try:
    from darts.models import NaiveSeasonal

    _darts_installed = True
except ImportError:
    # keep the names defined before re-raising with an installation hint
    NaiveSeasonal = None
    _darts_installed = False
    raise ImportError(
        "The NaiveSeasonal could not be imported. "
        "Check for proper installation of darts: https://github.com/unit8co/darts/blob/master/INSTALL.md"
    )

## Step 1 - Implement new model class
* We import the abstract base class `tot.models.Model` and define a new class named `CustomSeasonalNaiveModel`
that inherits from `Model`. We want this class to be a dataclass, so we decorate it with `@dataclass` imported from
`dataclasses`. We can see this class as a wrapper around the actual model we are adding: it is the interface between
the test-of-time library and the model's conventions. Hence, we will call it the **model wrapper** in the following.
* We set the non-optional attribute `model_name` to `CustomSeasonalNaive`.
* Optional: If we import our custom model from another package, we also assign the attribute `model_class`.
In this case we set it to `NaiveSeasonal`, which we import from the **darts** library.

In [12]:
>>> @dataclass
>>> class CustomSeasonalNaiveModel(Model):
>>>    model_name: str = "CustomSeasonalNaive"
>>>    model_class: Type = NaiveSeasonal

## Step 2 - Implement the `__post_init__()`
We implement `__post_init__()` to initialize the required model wrapper attributes and local parameters.
This includes the sub-steps: (1) installation check, (2) assign `model_params`, (3) re-assign `data_params`, and
(4) assign and verify the model wrapper attributes. Let's go through them step by step:

* First, we check whether the darts library is installed so that we can access the `NaiveSeasonal` model. Remark: If we
implement the custom model from scratch, we do not need to check for installation.
* Next, we want to make sure that all model parameters that are relevant for fitting and predicting are assigned.
Therefore, we extract only the parameters that the custom model needs. `NaiveSeasonal` needs one input parameter to
make predictions, `K`, which is the seasonal period in number of time steps. To instantiate the actual
model, we pass the `model_params` to `model_class`.
* Re-assign data parameters: Parameters like the seasonalities or the seasonality_mode are defined by the dataset
and hence provided in `_data_params`. In this case, we assign the frequency to the model wrapper attribute `self.freq`.
* Last, we assign the model wrapper attributes `season_length` (taken from `K`) and `n_forecasts`, which is the
forecast horizon. For both attributes we verify valid inputs.

Remark: This is the minimum required initialization; further attributes could be added.
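
The parameter handling described above can be tried in isolation with plain dictionaries. The sketch below assumes a parameter dictionary shaped like the one used in the benchmark at the end of this notebook (`K`, `n_forecasts`, and a nested `_data_params` with a `freq` entry); the concrete values are only illustrative.

```python
from copy import deepcopy

# Illustrative parameter dict, shaped like the one the model wrapper receives as `self.params`
params = {"K": 1, "n_forecasts": 3, "_data_params": {"freq": "MS"}}

# Work on a copy so the original parameters stay untouched
model_params = deepcopy(params)
data_params = model_params.pop("_data_params")
n_forecasts = model_params.pop("n_forecasts")

# Only the arguments meant for the actual model remain
print(model_params)  # {'K': 1}
print(data_params["freq"], n_forecasts)  # MS 3
print("_data_params" in params)  # True
```

Copying before popping matters because the wrapper still reads `_data_params` and `n_forecasts` from `self.params` afterwards.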

In [14]:
>>> def __post_init__(self):
>>>         # check if installed
>>>
>>>         if not (_darts_installed):
>>>             raise RuntimeError(
>>>                 "Requires darts and sklearn to be installed: "
>>>                 "https://scikit-learn.org/stable/install.html "
>>>                 "https://github.com/unit8co/darts/blob/master/INSTALL.md"
>>>             )
>>>         # extract model parameters and instantiate actual model
>>>         model_params = deepcopy(self.params)
>>>         model_params.pop("_data_params")
>>>         model_params.pop("n_forecasts")
>>>         self.model = self.model_class(**model_params)
>>>         # re-assign the data params
>>>         data_params = self.params["_data_params"]
>>>         self.freq = data_params["freq"]
>>>
>>>         # Set forecast horizon as model wrapper attribute horizon and verify
>>>         self.n_forecasts = self.params["n_forecasts"]
>>>         assert self.n_forecasts >= 1, "Model parameter n_forecasts must be >=1. "
>>>         # Set season length as model wrapper attribute and verify
>>>         self.season_length = model_params["K"]
>>>         assert self.season_length is not None, (
>>>             "Dataset does not provide a seasonality. Assign a seasonality to each of the datasets "
>>>             "OR input desired season_length as model parameter to be used for all datasets "
>>>             "without specified seasonality."
>>>         )

## Step 3 - Fit() method
The `fit()` method of the model wrapper can be considered as an interface to the `fit()` method of the actual model.
It includes the model-specific pre-processing of the data.
* The model-specific pre-processing in this case consists of checking whether the dataframe contains enough samples for
 fitting by calling `_check_min_df_len()` and converting the dataframe to the `TimeSeries` format from darts. Both
 functions are available as helper functions in test-of-time.
* We pass the series of type `TimeSeries` to the `fit()` method of the instantiated model.
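
Two details of the `fit()` wrapper can be illustrated without any library: which columns the slice `df.columns.values[1:-1]` selects, and the minimum length that is requested. The sketch below uses plain lists and two hypothetical helpers (`select_value_cols`, `has_min_len`). It assumes the column order `ds`, value column(s), `ID`, and that the length check passes when the number of rows is at least `min_len`; the exact behaviour of `_check_min_df_len()` may differ.

```python
from typing import List


def select_value_cols(columns: List[str]) -> List[str]:
    """Mimic the slice `[1:-1]`: drop the first and the last column name."""
    return columns[1:-1]


def has_min_len(n_rows: int, n_forecasts: int, season_length: int) -> bool:
    """Hypothetical stand-in for the requirement min_len = n_forecasts + season_length."""
    return n_rows >= n_forecasts + season_length


print(select_value_cols(["ds", "y", "ID"]))  # ['y']
print(select_value_cols(["ds", "y"]))  # []
print(has_min_len(n_rows=3, n_forecasts=3, season_length=1))  # False
print(has_min_len(n_rows=4, n_forecasts=3, season_length=1))  # True
```

Note that the slice yields an empty list when the dataframe has only two columns, so it relies on a trailing column such as `ID` being present.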

In [16]:
>>> def fit(self, df: pd.DataFrame, freq: str):
>>>     _check_min_df_len(df=df, min_len=self.n_forecasts + self.season_length)
>>>     self.freq = freq
>>>     series = convert_df_to_TimeSeries(df, value_cols=df.columns.values[1:-1].tolist(), freq=self.freq)
>>>     self.model = self.model.fit(series)

## Step 4 - Predict() method
The `predict()` method of the model wrapper can be considered as an interface to the `predict()` method of the actual
model. It includes the model-specific pre- and post-processing of the data.
* The model-specific pre-processing in this case consists of checking whether the dataframe contains enough samples for
 predicting by calling `_check_min_df_len()`. Optionally, `maybe_extend_df()` can be implemented and called to extend the
 test data with `season_length` samples of the train data, with the intent to increase the amount of test data. The
 overall benchmarking pipeline ensures that `df_historic` is only passed when predicting on the test data. Further,
 `prep_or_copy_df()` ensures that the dataframe has an `ID` column. Last, we set `n_req_past_obs`, which for
 `NaiveSeasonal` has to be at least 3, and increase it by 1 to be consistent with the prediction range of darts models
 that have retraining activated.
* Next, we compute the forecast by calling `_predict_darts_model()`. This helper function is provided for predicting with
models from the darts library.
* Last, we implement the model-specific post-processing, which consists of dropping the previously added samples via
`maybe_drop_added_values_from_df()`.

Remarks
* Data format: The input and output of the test-of-time environment are of type `pd.DataFrame`. If we work with any
other data format in between, in this case `TimeSeries`, we need to convert this data format from/to a
`pd.DataFrame`. For darts models we offer the helper function `_predict_darts_model()`, which performs this conversion
for the returned forecast.
* Backtesting: Test-of-time is a framework that executes backtesting by default. That means it forecasts the selected
forecast horizon in a rolling manner on the complete available data. Some libraries offer that capability along with
their models. For other libraries, this rolling historical procedure needs to be implemented in the `predict()` wrapper.
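
For libraries without built-in backtesting, the rolling procedure could look roughly like the following pure-Python sketch. It is not what `_predict_darts_model()` does internally; the function `rolling_seasonal_naive` is hypothetical and only shows the idea of moving the forecast origin forward one step at a time, using the same rule for `n_req_past_obs` as the wrapper (at least 3 past observations, plus 1).

```python
from typing import List, Sequence, Tuple


def rolling_seasonal_naive(
    series: Sequence[float], k: int, n_forecasts: int
) -> List[Tuple[int, List[float]]]:
    """Return (origin index, forecast) pairs for every origin with enough past and future data."""
    n_req_past_obs = max(k, 3) + 1
    results = []
    # the last valid origin still leaves n_forecasts observed values to compare against
    for origin in range(n_req_past_obs, len(series) - n_forecasts + 1):
        history = list(series[:origin])
        last_season = history[-k:]
        forecast = [last_season[i % k] for i in range(n_forecasts)]
        results.append((origin, forecast))
    return results


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(rolling_seasonal_naive(series, k=1, n_forecasts=3))
# [(4, [4.0, 4.0, 4.0]), (5, [5.0, 5.0, 5.0])]
```

Each forecast only sees values before its origin, which mirrors re-fitting the model at every step (`retrain=True`).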

In [17]:
>>> def predict(self, df: pd.DataFrame, df_historic: pd.DataFrame = None):
>>>     _check_min_df_len(df=df, min_len=1)
>>>     if df_historic is not None:
>>>         df = self.maybe_extend_df(df_historic, df)
>>>     df, received_ID_col, received_single_time_series, _ = prep_or_copy_df(df)
>>>     # at least 3 past observations are required; add 1 because retrain=True
>>>     n_req_past_obs = 3 if self.season_length < 3 else self.season_length
>>>     n_req_past_obs += 1
>>>
>>>     fcst_df = _predict_darts_model(df=df, model=self, n_req_past_obs=n_req_past_obs, n_req_future_obs=self.n_forecasts, retrain=True)
>>>
>>>     if df_historic is not None:
>>>         fcst_df, df = self.maybe_drop_added_values_from_df(fcst_df, df)
>>>     return fcst_df

## Step 5 - Implement the parent class methods
The abstract parent class `Model` has two methods, `maybe_extend_df()` and `maybe_drop_added_values_from_df()`, that
must be reimplemented if they should be active. Since we want them to be active for our custom model, we implement
them.
In `maybe_extend_df()` we add `season_length` samples of the train dataframe to the test dataframe. In
`maybe_drop_added_values_from_df()` we drop them again.

In [18]:
>>> def maybe_extend_df(self, df_train, df_test):
>>>     samples = self.season_length
>>>     df_test = add_first_inputs_to_df(samples=samples, df_train=df_train, df_test=df_test)
>>>     return df_test

>>> def maybe_drop_added_values_from_df(self, predicted, df):
>>>     samples = self.season_length
>>>     predicted, df = drop_first_inputs_from_df(samples=samples, predicted=predicted, df=df)
>>>     return predicted, df
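
The effect of the two methods can be illustrated with plain lists. This is a sketch under the assumption that extending means prepending the last `samples` training values to the test data, and dropping means removing the first `samples` values again; the helpers `extend_test` and `drop_added` are hypothetical and not part of test-of-time.

```python
from typing import List, Sequence


def extend_test(train: Sequence[float], test: Sequence[float], samples: int) -> List[float]:
    """Prepend the last `samples` training values to the test values."""
    if samples < 1:
        # train[-0:] would return the whole training data, so handle this case explicitly
        return list(test)
    return list(train[-samples:]) + list(test)


def drop_added(values: Sequence[float], samples: int) -> List[float]:
    """Remove the first `samples` values that were added by `extend_test`."""
    return list(values[samples:])


train, test = [1.0, 2.0, 3.0, 4.0], [5.0, 6.0]
extended = extend_test(train, test, samples=2)
print(extended)  # [3.0, 4.0, 5.0, 6.0]
print(drop_added(extended, samples=2))  # [5.0, 6.0]
```

Dropping the same number of values that were added leaves exactly the original test range.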

## Step 6 - Run your model in a benchmark

Let's put together the whole new model class:

In [20]:
@dataclass
class CustomSeasonalNaiveModel(Model):
    model_name: str = "CustomSeasonalNaive"
    model_class: Type = NaiveSeasonal

    def __post_init__(self):
         # check if installed

         if not (_darts_installed):
             raise RuntimeError(
                 "Requires darts and sklearn to be installed: "
                 "https://scikit-learn.org/stable/install.html "
                 "https://github.com/unit8co/darts/blob/master/INSTALL.md"
             )
         # extract model parameters and instantiate actual model
         model_params = deepcopy(self.params)
         model_params.pop("_data_params")
         model_params.pop("n_forecasts")
         self.model = self.model_class(**model_params)
         # re-assign the data params
         data_params = self.params["_data_params"]
         self.freq = data_params["freq"]

         # Set forecast horizon as model wrapper attribute horizon and verify
         self.n_forecasts = self.params["n_forecasts"]
         assert self.n_forecasts >= 1, "Model parameter n_forecasts must be >=1. "
         # Set season length as model wrapper attribute and verify
         self.season_length = model_params["K"]
         assert self.season_length is not None, (
             "Dataset does not provide a seasonality. Assign a seasonality to each of the datasets "
             "OR input desired season_length as model parameter to be used for all datasets "
             "without specified seasonality."
         )

    def fit(self, df: pd.DataFrame, freq: str):
        _check_min_df_len(df=df, min_len=self.n_forecasts + self.season_length)
        self.freq = freq
        series = convert_df_to_TimeSeries(df, value_cols=df.columns.values[1:-1].tolist(), freq=self.freq)
        self.model = self.model.fit(series)

    def predict(self, df: pd.DataFrame, df_historic: pd.DataFrame = None):
        _check_min_df_len(df=df, min_len=1)
        if df_historic is not None:
            df = self.maybe_extend_df(df_historic, df)
        df, received_ID_col, received_single_time_series, _ = prep_or_copy_df(df)
        # at least 3 past observations are required; add 1 because retrain=True
        n_req_past_obs = 3 if self.season_length < 3 else self.season_length
        n_req_past_obs += 1

        fcst_df = _predict_darts_model(df=df, model=self, n_req_past_obs=n_req_past_obs, n_req_future_obs=self.n_forecasts, retrain=True)

        if df_historic is not None:
            fcst_df, df = self.maybe_drop_added_values_from_df(fcst_df, df)
        return fcst_df

    def maybe_extend_df(self, df_train, df_test):
        """
        If model depends on historic values, extend beginning of df_test with last
        df_train values.
        """
        samples = self.season_length
        df_test = add_first_inputs_to_df(samples=samples, df_train=df_train, df_test=df_test)

        return df_test

    def maybe_drop_added_values_from_df(self, predicted, df):
        """
        If model depends on historic values, drop first values of predicted and df_test.
        """
        samples = self.season_length
        predicted, df = drop_first_inputs_from_df(samples=samples, predicted=predicted, df=df)
        return predicted, df

To run our new model in a benchmark, we load some sample datasets:

In [21]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"

air_passengers_df = pd.read_csv(data_location + 'air_passengers.csv')
peyton_manning_df = pd.read_csv(data_location + 'wp_log_peyton_manning.csv')
yosemite_temps_df = pd.read_csv(data_location + 'yosemite_temps.csv')
ercot_load_df = pd.read_csv(data_location + 'multivariate/load_ercot_regions.csv')

Let's set up the `SimpleBenchmark` template with our CustomSeasonalNaiveModel and run the benchmark.

In [24]:
from tot.dataset import Dataset
from tot.benchmark import SimpleBenchmark
dataset_list = [
    Dataset(df = air_passengers_df, name = "air_passengers", freq = "MS"),
    Dataset(df = peyton_manning_df, name = "peyton_manning", freq = "D"),
    Dataset(df = yosemite_temps_df, name = "yosemite_temps", freq = "5min"),
    # Dataset(df = ercot_load_df, name = "ercot_load", freq = "H"),
]
model_classes_and_params = [
    (CustomSeasonalNaiveModel, {"K": 1, "n_forecasts":3}),
]
benchmark = SimpleBenchmark(
    model_classes_and_params=model_classes_and_params, # iterate over this list of tuples
    datasets=dataset_list, # iterate over this list
    metrics=["MAE", "MSE", "MASE", "RMSE"],
    test_percentage=25,
)

In [25]:
results_train, results_test = benchmark.run()

In [27]:
results_test

| | data | model | params | experiment | MAE | MSE | MASE | RMSE |
|---|---|---|---|---|---|---|---|---|
| 0 | air_passengers | CustomSeasonalNaive | {'K': 1, 'n_forecasts': 3, '_data_params': {'f... | air_passengers_CustomSeasonalNaive_K_1_n_forec... | 73.016663 | 8050.25 | 3.301137 | 86.131256 |
| 1 | peyton_manning | CustomSeasonalNaive | {'K': 1, 'n_forecasts': 3, '_data_params': {'f... | peyton_manning_CustomSeasonalNaive_K_1_n_forec... | 0.573357 | 0.648822 | 1.832438 | 0.791932 |
| 2 | yosemite_temps | CustomSeasonalNaive | {'K': 1, 'n_forecasts': 3, '_data_params': {'f... | yosemite_temps_CustomSeasonalNaive_K_1_n_forec... | 0.631667 | 0.588167 | 1.627998 | 0.735115 |
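
For orientation, the reported metrics can be written down with their textbook definitions, as in the sketch below. This is not the benchmark's implementation: the function names are hypothetical, and the way test-of-time aggregates errors across forecast steps and windows may differ. In the results above, for instance, RMSE is not the square root of MSE, which suggests a different aggregation.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error over a single set of predictions."""
    return math.sqrt(mse(y_true, y_pred))


def mase(y_true: Sequence[float], y_pred: Sequence[float], y_train: Sequence[float], m: int = 1) -> float:
    """MAE scaled by the in-sample MAE of a naive forecast with lag `m`."""
    scale = sum(abs(y_train[i] - y_train[i - m]) for i in range(m, len(y_train))) / (len(y_train) - m)
    return mae(y_true, y_pred) / scale


y_train = [1.0, 2.0, 4.0, 7.0]
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
print(mae(y_true, y_pred))  # 1.0
print(mase(y_true, y_pred, y_train))  # 0.5
print(mse(y_true, y_pred), rmse(y_true, y_pred))
```

A MASE above 1, as in all three rows of the results, means the forecast errors are larger than the one-step in-sample errors of the naive reference under this definition.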
