# Kats 206 - Backtesting with Kats

We begin by loading the `air_passengers` data set into a `TimeSeriesData` object.  This code is essentially the same as the code in our introduction to the `TimeSeriesData` object in the Kats 101 Tutorial.

In [18]:
import pandas as pd
from kats.consts import TimeSeriesData
from kats.models.prophet import ProphetModel, ProphetParams

air_passengers_df = pd.read_csv("https://raw.githubusercontent.com/facebookresearch/Kats/main/kats/data/air_passengers.csv")

# Note: If the column holding the time values is not called time, you will want to specify the name of this column.
air_passengers_df.columns = ["time", "value"]
air_passengers_ts = TimeSeriesData(air_passengers_df)

# 1. Simple Backtesting

Kats provides a backtesting module that makes it easy to compare and evaluate different forecasting models.  While our hyperparameter tuning module allows you to compare different sets of parameters for a single base forecasting model, backtesting allows you to compare different types of base models (with pre-specified parameters).

Our backtesting module allows you to look at multiple error metrics in a single function call.  Here are the error metrics that are currently supported:
* Mean Absolute Error (MAE)
* Mean Absolute Percentage Error (MAPE)
* Symmetric Mean Absolute Percentage Error (SMAPE)
* Mean Squared Error (MSE)
* Mean Absolute Scaled Error (MASE)
* Root Mean Squared Error (RMSE)
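
For reference, the metrics listed above can be sketched in plain Python using their common textbook definitions. This is an illustrative sketch, not the Kats implementation: the function names are hypothetical, SMAPE has several variants in the literature (the one below divides by the mean of the absolute actual and predicted values), and the MASE scaling shown here (the in-sample error of a one-step naive forecast) is an assumption that may differ from what Kats computes.

```python
import math


def mae(actual, predicted):
    # Mean of absolute errors.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    # Mean of absolute errors relative to the actual values (undefined if any actual is 0).
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    # One common variant: absolute error divided by the mean of |actual| and |predicted|.
    return sum(
        2 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)
    ) / len(actual)


def mse(actual, predicted):
    # Mean of squared errors.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    # Square root of the MSE, expressed in the units of the data.
    return math.sqrt(mse(actual, predicted))


def mase(actual, predicted, train):
    # Assumption: scale the test MAE by the in-sample MAE of a one-step naive forecast.
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return mae(actual, predicted) / naive_mae


# Toy values, chosen only to exercise the functions.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
train = [1.0, 2.0, 4.0]
print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
print(mase(actual, predicted, train))
```

MAE, MSE, and RMSE depend on the scale of the data, while MAPE, SMAPE, and MASE are scale-free, which is why the values reported for a single model can differ by orders of magnitude.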

Our example below shows how you can use the `BackTesterSimple` class to compare errors between an additive and multiplicative Prophet model using the `air_passengers` data set.

In [2]:
from kats.utils.backtesters import BackTesterSimple

Here, we define a backtester to look at each of the supported error metrics.  We specify in the `BackTesterSimple` definition that we are using a 75/25 training-test split: the model is trained on the first 75% of the series and the metrics are evaluated on the remaining 25%.

In [None]:
params = ProphetParams()
ALL_ERRORS = ['mape', 'smape', 'mae', 'mase', 'mse', 'rmse']

backtester = BackTesterSimple(
    error_methods=ALL_ERRORS,
    data=air_passengers_ts,
    params=params,
    train_percentage=75,
    test_percentage=25, 
    model_class=ProphetModel)


backtester.run_backtest()

After we run the backtester, the `errors` attribute will be a dictionary mapping each error type name to its corresponding value.

In [4]:
backtester.errors

{'mape': 0.0987220885319523,
 'smape': 0.09425996205740349,
 'mae': 39.98767197068843,
 'mase': 1.9681144898176919,
 'mse': 2021.2020419795579,
 'rmse': 44.95778066118876}
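
To make the 75/25 split concrete, here is a minimal sketch of how a percentage split can be turned into index ranges for a series of 144 points (the `air_passengers` data set has 144 monthly observations). The helper `percentage_split` is hypothetical, and rounding the sizes down is an assumption; Kats may determine the exact boundaries differently.

```python
import math


def percentage_split(n_points, train_percentage, test_percentage):
    """Return (train_range, test_range) as half-open index ranges.

    Assumption: sizes are rounded down, and the test window starts
    immediately after the training window.
    """
    if train_percentage + test_percentage > 100:
        raise ValueError("train and test percentages must not exceed 100 in total")
    train_size = math.floor(n_points * train_percentage / 100)
    test_size = math.floor(n_points * test_percentage / 100)
    return range(0, train_size), range(train_size, train_size + test_size)


train_idx, test_idx = percentage_split(144, 75, 25)
print(len(train_idx), len(test_idx))
```

The key property is that the test window always comes after the training window in time, so the model is never evaluated on points that precede its training data.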

## 1.1 Comparing two Simple Backtests

First, we store the previous backtest results in a nested dict.

In [5]:
backtester_errors = {}
backtester_errors['prophet_additive'] = {}
for error, value in backtester.errors.items():
    backtester_errors['prophet_additive'][error] = value

Now we run another backtester to calculate the same error metrics for a multiplicative Prophet model.

In [None]:
params_prophet = ProphetParams(seasonality_mode='multiplicative') # additive mode gives worse results

backtester_prophet = BackTesterSimple(
    error_methods=ALL_ERRORS,
    data=air_passengers_ts,
    params=params_prophet,
    train_percentage=75,
    test_percentage=25, 
    model_class=ProphetModel)

backtester_prophet.run_backtest()

backtester_errors['prophet_multiplicative'] = {}
for error, value in backtester_prophet.errors.items():
    backtester_errors['prophet_multiplicative'][error] = value

Here we can compare the error metrics for the two models.

In [7]:
pd.DataFrame.from_dict(backtester_errors)

| metric | prophet_additive | prophet_multiplicative |
| --- | --- | --- |
| mape | 0.098722 | 0.074719 |
| smape | 0.09426 | 0.071171 |
| mae | 39.987672 | 29.818648 |
| mase | 1.968114 | 1.467615 |
| mse | 2021.202042 | 1142.139138 |
| rmse | 44.957781 | 33.795549 |
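
Since lower is better for every metric here, the comparison can also be summarized programmatically. The sketch below copies the rounded values from the output above into a nested dict with the same layout as `backtester_errors` and picks the model with the smallest error per metric; the names `errors` and `best_model` are illustrative only.

```python
# Rounded error values copied from the comparison above.
errors = {
    "prophet_additive": {
        "mape": 0.098722, "smape": 0.09426, "mae": 39.987672,
        "mase": 1.968114, "mse": 2021.202042, "rmse": 44.957781,
    },
    "prophet_multiplicative": {
        "mape": 0.074719, "smape": 0.071171, "mae": 29.818648,
        "mase": 1.467615, "mse": 1142.139138, "rmse": 33.795549,
    },
}

# For each metric, select the model with the lowest error.
best_model = {}
for metric in errors["prophet_additive"]:
    best_model[metric] = min(errors, key=lambda name: errors[name][metric])

print(best_model)
```

On this data set the multiplicative model has the lower error on all six metrics.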


# 2. Time Series CrossValidation 
`CrossValidation` is a utility class that makes the non-simple backtesters easier to use.
It essentially wraps the rolling origin backtester to make it more accessible to data scientists new to time series.

In [8]:
from kats.utils.backtesters import CrossValidation

In [None]:
ts = TimeSeriesData(df=air_passengers_df)
params = ProphetParams(seasonality_mode="multiplicative")
all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
cv = CrossValidation(
    error_methods=all_errors,
    data=ts,
    params=params,
    train_percentage=50,
    test_percentage=25,
    num_folds=3,
    model_class=ProphetModel,
    constant_train_size=False,
)
cv.run_cv()

In [10]:
cv.get_error_value("mase")

1.8466151287639943

# 3. Rolling Origin Backtester
For more information on rolling origin evaluation, see:
https://openforecast.org/adam/rollingOrigin.html
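
As a rough illustration of the idea, the sketch below generates train/test index windows for a rolling origin evaluation, in both the expanding and the constant train size variants used in the subsections that follow. The helper `rolling_origin_folds` is hypothetical, and the way the origins are spaced (evenly between the initial training size and the last point that still leaves room for a full test window) is an assumption; the exact fold boundaries used by Kats may differ.

```python
import math


def rolling_origin_folds(n_points, start_train_percentage, test_percentage,
                         expanding_steps, constant_train_size):
    """Return a list of ((train_start, train_end), (test_start, test_end)) half-open windows."""
    train_size = math.floor(n_points * start_train_percentage / 100)
    test_size = math.floor(n_points * test_percentage / 100)
    last_origin = n_points - test_size  # latest origin that leaves a full test window
    if expanding_steps < 1 or train_size < 1 or test_size < 1 or last_origin < train_size:
        raise ValueError("invalid split configuration")

    folds = []
    for k in range(expanding_steps):
        if expanding_steps == 1:
            origin = train_size
        else:
            # Assumption: origins are spaced evenly from train_size to last_origin.
            origin = train_size + round(k * (last_origin - train_size) / (expanding_steps - 1))
        # Expanding: always train from the start. Constant: slide a fixed-size window.
        train_start = origin - train_size if constant_train_size else 0
        folds.append(((train_start, origin), (origin, origin + test_size)))
    return folds


expanding = rolling_origin_folds(144, 50, 20, 3, constant_train_size=False)
constant = rolling_origin_folds(144, 50, 20, 3, constant_train_size=True)
for fold in expanding:
    print("expanding", fold)
for fold in constant:
    print("constant ", fold)
```

In both variants the test windows are identical; the only difference is whether earlier observations stay in the training window as the origin moves forward.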

In [11]:
from kats.utils.backtesters import BackTesterRollingOrigin

## 3.1 Rolling Origin with expanding train size

In [None]:
params = ProphetParams(seasonality_mode="multiplicative")
all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]

backtester = BackTesterRollingOrigin(
    error_methods=all_errors,
    data=air_passengers_ts,
    params=params,
    start_train_percentage=50,
    test_percentage=20,
    expanding_steps=3,
    model_class=ProphetModel,
    constant_train_size=False,    
)
backtester.run_backtest()

In [13]:
backtester_errors = {}
backtester_errors['prophet_expanding'] = {}
for error, value in backtester.errors.items():
    backtester_errors['prophet_expanding'][error] = value

## 3.2 Rolling Origin with constant train size

In [None]:
params = ProphetParams(seasonality_mode="multiplicative")
all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]

backtester = BackTesterRollingOrigin(
    error_methods=all_errors,
    data=air_passengers_ts,
    params=params,
    start_train_percentage=50,
    test_percentage=20,
    expanding_steps=3,
    model_class=ProphetModel,
    constant_train_size=True,    
)
backtester.run_backtest()

In [16]:
backtester_errors['prophet_constant'] = {}
for error, value in backtester.errors.items():
    backtester_errors['prophet_constant'][error] = value

In [17]:
pd.DataFrame.from_dict(backtester_errors)

| metric | prophet_expanding | prophet_constant |
| --- | --- | --- |
| mape | 0.074277 | 0.081833 |
| smape | 0.075449 | 0.08407 |
| mae | 26.144465 | 30.195053 |
| mase | 1.494274 | 1.553073 |
| mse | 1043.690635 | 1305.484984 |
| rmse | 31.217452 | 35.951676 |
