[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourownstory/neural_prophet/blob/master/tutorials/UnderstandeTheBenchmarkingPipeline.ipynb)

# Understand the benchmarking pipeline
This tutorial takes you behind the scenes of the benchmark template and explains the processing steps
in the benchmarking pipeline.


In [4]:
if 'google.colab' in str(get_ipython()):
    !pip install git+https://github.com/ourownstory/test-of-time.git # may take a while
    #!pip install test-of-time # much faster, but may not have the latest upgrades/bugfixes

import pandas as pd
from neuralprophet import NeuralProphet, set_log_level
from tot.df_utils import (
    _check_min_df_len,
    prep_or_copy_df,
    check_dataframe,
    handle_missing_data,
    split_df,
    return_df_in_original_format,
    maybe_drop_added_dates,
)
from tot.models_neuralprophet import NeuralProphetModel
from tot.plot_forecast_plotly import plot

from tot.exp_utils import evaluate_forecast
from plotly_resampler import register_plotly_resampler

In [5]:
set_log_level("ERROR")
register_plotly_resampler('figure')


The benchmark templates offered by test-of-time are a quick and simple way to compare multiple models on multiple datasets.
Defining and running a benchmark triggers a pipeline that returns the benchmark results. A benchmark is subdivided
into multiple experiments, which the pipeline executes consecutively. Every experiment run follows the same steps.
Let's have a closer look at each of them.
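As a rough mental model, a benchmark can be pictured as every combination of model and dataset, with each combination forming one experiment that runs after the previous one. The sketch below assumes exactly that cartesian-product structure; `build_experiments`, `run_benchmark`, `NaiveModel` and `dataset_b` are hypothetical names used for illustration, not part of the test-of-time API.

```python
from itertools import product


def build_experiments(model_configs, dataset_names):
    """Pair every model configuration with every dataset: one experiment per pair."""
    return [
        {"model": name, "params": params, "dataset": dataset}
        for (name, params), dataset in product(model_configs.items(), dataset_names)
    ]


def run_benchmark(experiments, run_experiment):
    """Run the experiments one after another and collect the results in order."""
    return [run_experiment(experiment) for experiment in experiments]


# Hypothetical model configurations and dataset names, for illustration only.
model_configs = {
    "NeuralProphetModel": {"n_forecasts": 3, "n_lags": 7},
    "NaiveModel": {},
}
dataset_names = ["air_passengers", "dataset_b"]

experiments = build_experiments(model_configs, dataset_names)
# A stand-in for the real experiment pipeline: just describe each experiment.
results = run_benchmark(experiments, lambda exp: f"{exp['model']} on {exp['dataset']}")
print(len(experiments))  # 4
print(results)
```

Running the experiments strictly one after another keeps the sketch simple; whether test-of-time can also parallelize them is not covered here.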

### Processing steps of an experiment

1. Data-specific pre-processing:
    - Sets a random seed to ensure reproducibility
    - Prepares the dataframe so that it has an ID column
    - Performs a basic sanity check
    - Handles missing data
    - Splits the data into train and test sets

2. Model-specific pre-processing:
    - Adjusts the data to the model configuration so that each model can be fitted and can predict correctly

3. Fit model:
    - Calls the fit method on the instantiated model object

4. Predict model:
    - Calls the predict method to create the forecast

5. Model-specific post-processing:
    - Adjusts the data according to the model configuration so that results are returned consistently

6. Data-specific post-processing:
    - Drops any added dates

7. Evaluation:
    - Evaluates the forecasts based on selected error metrics
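The seven steps can be condensed into a single function. The following is a minimal, library-free sketch that assumes a naive last-value model in place of NeuralProphet; `run_experiment`, `naive_fit` and `naive_predict` are hypothetical helpers. Steps 2, 5 and 6 are no-ops here because the naive model neither needs extra history nor adds rows.

```python
import random


def naive_fit(train_y):
    """'Fit' a naive model by remembering the last observed value."""
    return {"last": train_y[-1]}


def naive_predict(model, horizon):
    """Repeat the last observed value for every step of the horizon."""
    return [model["last"]] * horizon


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def run_experiment(y, test_percentage=0.4, seed=0):
    # 1. Data-specific pre-processing: fix the seed, then split chronologically.
    #    (The naive model is deterministic; stochastic models rely on the seed.)
    random.seed(seed)
    n_test = int(len(y) * test_percentage)
    train_y, test_y = y[: len(y) - n_test], y[len(y) - n_test :]
    # 2. Model-specific pre-processing: nothing to do for the naive model.
    # 3. Fit model.
    model = naive_fit(train_y)
    # 4. Predict model: forecast the whole test period.
    forecast = naive_predict(model, len(test_y))
    # 5./6. Post-processing: no rows were added, so none have to be dropped.
    # 7. Evaluation.
    return {"MAE": mae(test_y, forecast)}


print(run_experiment([10, 12, 11, 13, 15, 14, 16, 18, 17, 20]))  # {'MAE': 3.75}
```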

### Load data
Let's load the Air Passengers dataset as an example to walk through the pipeline step by step.

In [6]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
df_air = pd.read_csv(data_location + 'air_passengers.csv')
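Step 1 runs a basic sanity check on the loaded data via `check_dataframe`. To make the idea concrete, here is a stdlib-only sketch of a few checks of that kind, assuming the CSV follows NeuralProphet's convention of a `ds` column with timestamps and a `y` column with values. `load_and_check` is a hypothetical helper, and the checks `check_dataframe` actually runs may differ.

```python
import csv
import io
from datetime import date


def load_and_check(csv_text):
    """Parse a ds/y CSV string and run a few basic sanity checks."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows or not {"ds", "y"} <= set(rows[0]):
        raise ValueError("expected a non-empty table with the columns 'ds' and 'y'")
    series = [(date.fromisoformat(row["ds"]), float(row["y"])) for row in rows]
    dates = [d for d, _ in series]
    # Timestamps must be sorted and unique for the later steps to make sense.
    if any(later <= earlier for earlier, later in zip(dates, dates[1:])):
        raise ValueError("'ds' must be strictly increasing, without duplicates")
    return series


# A few illustrative rows in the ds/y layout.
sample = "ds,y\n1949-01-01,112\n1949-02-01,118\n1949-03-01,132\n"
print(load_and_check(sample)[0])  # (datetime.date(1949, 1, 1), 112.0)
```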

### 1. Data-specific pre-processing:

In [7]:
df_air, received_ID_col, received_single_time_series, _ = prep_or_copy_df(df_air)
df_air = check_dataframe(df_air, check_y=True)
# TODO: infer the frequency instead of hard-coding it ('MS' = month start)
df_air = handle_missing_data(df_air, freq='MS')
df_air_train, df_air_test = split_df(
    df=df_air,
    test_percentage=0.40,
)
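`handle_missing_data` and `split_df` are test-of-time internals. The sketch below shows one simple way to do the same two jobs without pandas, assuming gaps are filled by linear interpolation on a month-start grid and that the test set is the last `test_percentage` of the rows; the real functions may impute and round differently. `month_starts`, `fill_monthly_gaps` and `split_chronologically` are hypothetical helpers.

```python
from datetime import date


def month_starts(start, end):
    """All first-of-month dates from start to end, inclusive (similar to freq='MS')."""
    out, year, month = [], start.year, start.month
    while (year, month) <= (end.year, end.month):
        out.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return out


def fill_monthly_gaps(series):
    """Reindex (date, value) pairs to a complete monthly grid; interpolate the gaps linearly."""
    observed = dict(series)
    grid = month_starts(series[0][0], series[-1][0])
    values = [observed.get(d) for d in grid]
    known = [i for i, v in enumerate(values) if v is not None]
    for i, v in enumerate(values):
        if v is None:
            left = max(k for k in known if k < i)
            right = min(k for k in known if k > i)
            values[i] = values[left] + (values[right] - values[left]) * (i - left) / (right - left)
    return list(zip(grid, values))


def split_chronologically(series, test_percentage):
    """Keep the time order: the last test_percentage of the rows becomes the test set."""
    n_test = round(len(series) * test_percentage)
    return series[: len(series) - n_test], series[len(series) - n_test :]


# Monthly toy data with March 1949 missing.
series = [
    (date(1949, 1, 1), 112.0),
    (date(1949, 2, 1), 118.0),
    (date(1949, 4, 1), 130.0),
    (date(1949, 5, 1), 121.0),
    (date(1949, 6, 1), 135.0),
]
filled = fill_monthly_gaps(series)
train, test = split_chronologically(filled, test_percentage=0.40)
print(filled[2])  # (datetime.date(1949, 3, 1), 124.0)
print(len(train), len(test))  # 4 2
```

Splitting by position rather than at random matters for time series: shuffling would let the model see the future during training.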

### 2. Model-specific pre-processing:

In [8]:
model_class = NeuralProphetModel
params = {
    "n_forecasts": 3,
    "n_lags": 7,
    "seasonality_mode": "multiplicative",
    "learning_rate": 0.03,
    "_data_params": {},
}
model = model_class(params=params)

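# The training set must hold at least one sample: n_lags inputs plus n_forecasts targets.
_check_min_df_len(df=df_air_train, min_len=model.n_forecasts + model.n_lags)
# The test set must cover at least one full forecast horizon.
_check_min_df_len(df=df_air_test, min_len=model.n_forecasts)
# Add recent training history to the test set if the model needs it (e.g. for its lags).
df_air_test = model.maybe_extend_df(df_air_train, df_air_test)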
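The two minimum-length checks follow from how an autoregressive model with `n_lags > 0` builds its samples: each sample needs `n_lags` past values as inputs and `n_forecasts` future values as targets. The sketch assumes a plain sliding window and that `maybe_extend_df` prepends the most recent training values to the test set; `make_windows` and `extend_test` are hypothetical helpers, not test-of-time functions.

```python
def make_windows(y, n_lags, n_forecasts):
    """Slice a series into samples of n_lags inputs followed by n_forecasts targets."""
    n_samples = len(y) - n_lags - n_forecasts + 1
    if n_samples < 1:
        raise ValueError(f"need at least {n_lags + n_forecasts} rows, got {len(y)}")
    return [
        (y[i : i + n_lags], y[i + n_lags : i + n_lags + n_forecasts])
        for i in range(n_samples)
    ]


def extend_test(train_y, test_y, n_lags):
    """Prepend the last n_lags training values so the first test date has a full lag window."""
    # Guard n_lags == 0: train_y[-0:] would return the whole training set.
    history = train_y[-n_lags:] if n_lags > 0 else []
    return history + test_y


y = list(range(1, 13))  # 12 toy observations
windows = make_windows(y, n_lags=7, n_forecasts=3)
print(len(windows))  # 3
print(windows[0])  # ([1, 2, 3, 4, 5, 6, 7], [8, 9, 10])
print(extend_test(y[:10], y[10:], n_lags=7))  # [4, 5, 6, 7, 8, 9, 10, 11, 12]
```

This also explains why the test set only needs `n_forecasts` rows of its own: after the extension, its lag inputs come from the end of the training data.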

### 3. Fit model:

In [9]:
model.model.fit(df=df_air_train, freq='MS', progress="none", minimal=True)

### 4. Predict model:

In [10]:
fcst_train = model.model.predict(df=df_air_train)
fcst_test = model.model.predict(df=df_air_test)
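With `n_forecasts=3`, NeuralProphet's `predict` returns one row per date with the columns `yhat1`, `yhat2` and `yhat3`, where `yhatN` is the forecast for that date made N steps earlier. The dictionary-based sketch below mimics only that layout on integer time steps; `to_yhat_columns` is a hypothetical helper, not part of NeuralProphet.

```python
def to_yhat_columns(forecasts, n_forecasts):
    """
    Arrange multi-step forecasts by target date.

    forecasts maps a forecast origin o to its predictions for o + 1, ..., o + n_forecasts.
    Each returned row holds yhat1 ... yhatN, where yhatN is the forecast for that
    target made N steps earlier (None if no such forecast exists).
    """
    targets = sorted({o + h for o in forecasts for h in range(1, n_forecasts + 1)})
    return {
        t: {
            f"yhat{h}": forecasts[t - h][h - 1] if t - h in forecasts else None
            for h in range(1, n_forecasts + 1)
        }
        for t in targets
    }


# Two forecast origins (time steps 0 and 1), each predicting 3 steps ahead.
table = to_yhat_columns({0: [10, 11, 12], 1: [20, 21, 22]}, n_forecasts=3)
for target, row in table.items():
    print(target, row)
```

Rows near the edges have empty columns, which is why forecast tables of this kind typically contain missing values in the higher `yhatN` columns.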

### 5. Model-specific post-processing:
Note that `maybe_drop_added_values_from_df` is a method of the model class, so this post-processing step is tied to the specific model implementation.

In [11]:
fcst_test, df_air_test = model.maybe_drop_added_values_from_df(fcst_test, df_air_test)
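Step 5 undoes the extension from step 2: rows that only served as lag history should not be scored as test forecasts. Assuming `maybe_drop_added_values_from_df` removes exactly those prepended rows from both the forecast and the test data, the idea reduces to a date filter. `drop_rows_before` and the toy rows are hypothetical.

```python
from datetime import date


def drop_rows_before(rows, first_test_date):
    """Keep only the rows dated on or after the first date of the original test set."""
    return [row for row in rows if row["ds"] >= first_test_date]


# Toy test set that was extended with two months of training history before predicting.
first_test_date = date(1960, 3, 1)
extended_test = [
    {"ds": date(1960, 1, 1), "y": 100.0},
    {"ds": date(1960, 2, 1), "y": 110.0},
    {"ds": date(1960, 3, 1), "y": 120.0},
    {"ds": date(1960, 4, 1), "y": 130.0},
]
# Toy forecast: one row per input row with a made-up yhat1 column.
forecast = [dict(row, yhat1=row["y"] + 5.0) for row in extended_test]

forecast_test = drop_rows_before(forecast, first_test_date)
df_test = drop_rows_before(extended_test, first_test_date)
print([row["ds"].month for row in forecast_test])  # [3, 4]
```

Filtering both tables with the same rule keeps forecasts and actual values aligned for the evaluation step.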

### 6. Data-specific post-processing:

In [12]:
fcst_train_df = return_df_in_original_format(fcst_train, received_ID_col, received_single_time_series)
fcst_test_df = return_df_in_original_format(fcst_test, received_ID_col, received_single_time_series)
fcst_train_df, df_air_train = maybe_drop_added_dates(fcst_train_df, df_air_train)
fcst_test_df, df_air_test = maybe_drop_added_dates(fcst_test_df, df_air_test)
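Step 6 removes dates that appear in a forecast but not in the reference data, so that only timestamps present in the data are evaluated. A minimal sketch, assuming "added dates" means any forecast date missing from the reference data frame; `drop_added_dates` is a hypothetical helper, and `maybe_drop_added_dates` may identify such rows differently.

```python
from datetime import date


def drop_added_dates(forecast_rows, reference_dates):
    """Remove forecast rows whose date does not appear in the reference data."""
    reference = set(reference_dates)
    return [row for row in forecast_rows if row["ds"] in reference]


# March appears in the forecast but not in the reference data.
reference_dates = [date(1949, 1, 1), date(1949, 2, 1), date(1949, 4, 1)]
forecast_rows = [
    {"ds": date(1949, 1, 1), "yhat1": 111.0},
    {"ds": date(1949, 2, 1), "yhat1": 117.0},
    {"ds": date(1949, 3, 1), "yhat1": 123.0},
    {"ds": date(1949, 4, 1), "yhat1": 129.0},
]
kept = drop_added_dates(forecast_rows, reference_dates)
print([row["ds"].month for row in kept])  # [1, 2, 4]
```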

### 7. Evaluation:

In [13]:
result_train, result_test = evaluate_forecast(fcst_train_df, fcst_test_df, metrics=['MAPE','MAE','RMSE'], metadata=None)
print(result_test)


{'MAPE': 7.274437, 'MAE': 30.610687, 'RMSE': 38.161297}
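The three metrics requested from `evaluate_forecast` have standard definitions. Below is a stdlib sketch, assuming MAPE is expressed in percent and that no actual value is zero; how `evaluate_forecast` aggregates over the `yhat1` to `yhat3` columns is not reproduced here, and the input values are toy numbers.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more strongly than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print({"MAPE": mape(actual, predicted), "MAE": mae(actual, predicted), "RMSE": rmse(actual, predicted)})
```

MAE and RMSE are in the units of the series (here, passengers), while MAPE is scale-free, which makes it easier to compare across datasets.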
