**Not A Demo Notebook**
# Benchmarking workflows

- Univariate forecasting with exogenous data
- Hierarchical forecasting
- Multivariate forecasting

### Robust Model Performance Evaluation

In this notebook, we will demonstrate how to evaluate the performance of a model using time series cross-validation.

1. Data preparation
2. Model evaluation
    - use a splitter to split the data into multiple windows
    - evaluate prediction performance for each window and across windows
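
The windowing idea can be sketched without any library. The snippet below is an illustration only: `window_splits` is a hypothetical helper, not an sktime function, and it assumes a single series of 12 time points, a training window of 8, a forecasting horizon of 1 and 2 steps, and a step length of 2.

```python
def window_splits(n_timepoints, window_length, fh, step_length, expanding):
    """Return (train_positions, test_positions) pairs for one series."""
    splits = []
    cutoff = window_length - 1  # position of the last training point
    while cutoff + max(fh) < n_timepoints:
        # expanding windows always start at 0, sliding windows keep a fixed length
        start = 0 if expanding else cutoff - window_length + 1
        train = list(range(start, cutoff + 1))
        test = [cutoff + h for h in fh]
        splits.append((train, test))
        cutoff += step_length
    return splits


expanding_splits = window_splits(12, 8, [1, 2], 2, expanding=True)
sliding_splits = window_splits(12, 8, [1, 2], 2, expanding=False)

for name, splits in [("expanding", expanding_splits), ("sliding", sliding_splits)]:
    for train, test in splits:
        print(name, "train:", train, "test:", test)
```

Both variants produce two windows here. They share the first window and differ in the second, where the expanding variant keeps all earlier points and the sliding variant drops the two oldest.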

In [11]:
import pandas as pd

from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import (
    ExpandingWindowSplitter,
    SingleWindowSplitter,
    SlidingWindowSplitter,
)
from sktime.forecasting.naive import NaiveForecaster
from sktime.performance_metrics.forecasting import MeanSquaredError
from sktime.utils._testing.hierarchical import _make_hierarchical
from sktime.utils._testing.series import _make_series

In [12]:
# hierarchical data: 2 groups at level h0, each with 3 series at level h1 (6 series)
y_hierarchical = _make_hierarchical(hierarchy_levels=(2, 3), random_state=0)
# one multivariate series with 4 columns
y_multivariate = _make_series(n_timepoints=12, n_columns=4, random_state=0)
# one univariate series
y = _make_series(n_timepoints=12, random_state=0)

In [59]:
fh = [1, 2]
step_length = 2
window_length = 8
# the same horizon expressed as timedeltas: 1 day and 2 days ahead
fh_timedelta = pd.timedelta_range(start="1 day", end="2 days", freq="D")

# each unique series (lowest hierarchical level) should have two splits
cv = ExpandingWindowSplitter(
    initial_window=window_length, fh=fh, step_length=step_length
)
sliding = SlidingWindowSplitter(
    fh=fh, window_length=window_length, step_length=step_length, start_with_window=True
)
single = SingleWindowSplitter(fh=fh_timedelta, window_length=pd.offsets.Day(8))

In [5]:
for train, test in single.split(y_hierarchical):
    print(train)
    print(test)
    print("------------------")
    print(y_hierarchical.iloc[test])
    print()

[ 2  3  4  5  6  7  8  9 14 15 16 17 18 19 20 21 26 27 28 29 30 31 32 33
 38 39 40 41 42 43 44 45 50 51 52 53 54 55 56 57 62 63 64 65 66 67 68 69]
[10 11 22 23 34 35 46 47 58 59 70 71]
------------------
                            c0
h0   h1   time                
h0_0 h1_0 2000-01-11  3.697033
          2000-01-12  5.007263
     h1_1 2000-01-11  4.417426
          2000-01-12  2.810825
     h1_2 2000-01-11  3.205078
          2000-01-12  3.709339
h0_1 h1_0 2000-01-11  2.300194
          2000-01-12  4.330480
     h1_1 2000-01-11  2.918668
          2000-01-12  3.190249
     h1_2 2000-01-11  4.282080
          2000-01-12  3.681973
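
The integer arrays printed above are flat row positions into the stacked hierarchical frame. A minimal sketch of that mapping follows, assuming all 6 series have 12 time points and are stored one after another, as the generated data here appears to be. `flat_positions` is a hypothetical helper used for illustration, not part of sktime.

```python
def flat_positions(per_series_positions, n_series, series_length):
    """Map positions within one series to row positions in the stacked frame."""
    return [
        series * series_length + position
        for series in range(n_series)
        for position in per_series_positions
    ]


# the single window trains on positions 2..9 and tests on 10 and 11 in each series
train_positions = flat_positions(range(2, 10), n_series=6, series_length=12)
test_positions = flat_positions([10, 11], n_series=6, series_length=12)

print(train_positions)
print(test_positions)
```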



In [6]:
list(cv.split(y_hierarchical))

[(array([ 0,  1,  2,  3,  4,  5,  6,  7, 12, 13, 14, 15, 16, 17, 18, 19, 24,
         25, 26, 27, 28, 29, 30, 31, 36, 37, 38, 39, 40, 41, 42, 43, 48, 49,
         50, 51, 52, 53, 54, 55, 60, 61, 62, 63, 64, 65, 66, 67],
        dtype=int64),
  array([ 8,  9, 20, 21, 32, 33, 44, 45, 56, 57, 68, 69])),
 (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 12, 13, 14, 15, 16, 17, 18,
         19, 20, 21, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 36, 37, 38, 39,
         40, 41, 42, 43, 44, 45, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 60,
         61, 62, 63, 64, 65, 66, 67, 68, 69], dtype=int64),
  array([10, 11, 22, 23, 34, 35, 46, 47, 58, 59, 70, 71]))]

In [7]:
list(sliding.split(y_hierarchical))

[(array([ 0,  1,  2,  3,  4,  5,  6,  7, 12, 13, 14, 15, 16, 17, 18, 19, 24,
         25, 26, 27, 28, 29, 30, 31, 36, 37, 38, 39, 40, 41, 42, 43, 48, 49,
         50, 51, 52, 53, 54, 55, 60, 61, 62, 63, 64, 65, 66, 67],
        dtype=int64),
  array([ 8,  9, 20, 21, 32, 33, 44, 45, 56, 57, 68, 69])),
 (array([ 2,  3,  4,  5,  6,  7,  8,  9, 14, 15, 16, 17, 18, 19, 20, 21, 26,
         27, 28, 29, 30, 31, 32, 33, 38, 39, 40, 41, 42, 43, 44, 45, 50, 51,
         52, 53, 54, 55, 56, 57, 62, 63, 64, 65, 66, 67, 68, 69],
        dtype=int64),
  array([10, 11, 22, 23, 34, 35, 46, 47, 58, 59, 70, 71]))]

In [30]:
# note: the docstring does not clearly explain how the different multilevel options aggregate
hierachical_scorer = MeanSquaredError(multilevel="uniform_average_time")
forecaster = NaiveForecaster(strategy="last")


def manual_evaluate(cv, y, fh, forecaster, scores):
    errors = []
    for train, test in cv.split_series(y):
        forecaster.fit(train)
        y_pred = forecaster.predict(fh)
        error = scores(test, y_pred)
        errors.append(error)

    for i, error in enumerate(errors):
        print(f"window/fold {i}: {error}")
    return errors


cv_errors = manual_evaluate(cv, y_hierarchical, fh, forecaster, hierachical_scorer)

window/fold 0: 2.216486065005522
window/fold 1: 1.1927155356395638
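
To summarise performance across windows, one common choice is the unweighted mean of the per-window errors. The sketch below uses the two values printed above. Treating the plain mean as the summary is an assumption made for illustration, not a statement about what sktime computes internally.

```python
from statistics import mean

# per-window MSE values copied from the output above
fold_errors = [2.216486065005522, 1.1927155356395638]

# unweighted average across the two windows
overall_error = mean(fold_errors)
print(f"mean MSE across windows: {overall_error:.6f}")
```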


In [9]:
# alternatively, use the evaluate function, which runs the same loop
backtest = evaluate(
    forecaster=forecaster,
    y=y_hierarchical,
    cv=cv,
    scoring=hierachical_scorer,
    return_data=True,
    error_score="raise",
)
backtest

Unnamed: 0,test_MeanSquaredError,fit_time,pred_time,len_train_window,cutoff,y_train,y_test,y_pred
0,2.216486,0.077827,0.144058,48,2000-01-08 00:00:00,c0 h0 h1 time ...,c0 h0 h1 time ...,c0 h0 h1 time ...
1,1.192716,0.052101,0.136132,60,2000-01-10 00:00:00,c0 h0 h1 time ...,c0 h0 h1 time ...,c0 h0 h1 time ...


In [11]:
# checking train data
backtest["y_train"][0]

                            c0
h0   h1   time                
h0_0 h1_0 2000-01-01  5.317042
          2000-01-02  3.953147
          2000-01-03  4.531728
          2000-01-04  5.793883
          2000-01-05  5.420548
          2000-01-06  2.575712
          2000-01-07  4.503078
          2000-01-08  3.401633
     h1_1 2000-01-01  4.314028
          2000-01-02  3.674665


## Multivariate

In [7]:
forecaster = NaiveForecaster(strategy="last")
forecaster.fit(y_multivariate, fh=[1, 2, 3])

In [8]:
forecaster.forecasters_

                             0                  1                  2                  3
forecasters  NaiveForecaster()  NaiveForecaster()  NaiveForecaster()  NaiveForecaster()


In [18]:
for train, test in cv.split_series(y_multivariate):
    print(f"TRAIN: \n {train}\n TEST: \n {test}")
    print("=====================================")

TRAIN: 
                    0         1         2         3
2000-01-01  5.317042  3.380954  3.685008  4.094989
2000-01-02  5.420548  2.003519  3.656359  1.702739
2000-01-03  3.449771  3.391395  2.850314  3.308369
2000-01-04  4.314028  3.102471  3.150133  2.187770
2000-01-05  5.047069  2.775638  3.019338  1.000000
2000-01-06  1.000000  3.634415  3.570706  1.111931
2000-01-07  5.822744  1.526431  2.752029  1.666912
2000-01-08  5.085769  4.450155  2.861218  2.232258
 TEST: 
                    0         1         2         3
2000-01-09  2.665204  1.000000  2.358358  2.010445
2000-01-10  4.783280  4.183176  2.318943  1.551793
TRAIN: 
                    0         1         2         3
2000-01-01  5.317042  3.380954  3.685008  4.094989
2000-01-02  5.420548  2.003519  3.656359  1.702739
2000-01-03  3.449771  3.391395  2.850314  3.308369
2000-01-04  4.314028  3.102471  3.150133  2.187770
2000-01-05  5.047069  2.775638  3.019338  1.000000
2000-01-06  1.000000  3.634415  3.570706  1.111931
2000

In [31]:
multivariate_scorer = MeanSquaredError(multioutput="raw_values")
cv_errors = manual_evaluate(
    single, y_multivariate, fh_timedelta, forecaster, multivariate_scorer
)

window/fold 0: [4.11026476 4.78402998 1.24432371 3.12115714]
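
The first entry of this result can be checked by hand. The sketch below recomputes the MSE for column 0 only, using values copied from the printed outputs in this notebook. The two test values are taken from the univariate `y` printout, which matches column 0 of `y_multivariate` in the rows shown. The "last" strategy is written out manually here rather than calling `NaiveForecaster`.

```python
# column 0, training window 2000-01-03 to 2000-01-10 (copied from printed output)
train_col0 = [3.449771, 4.314028, 5.047069, 1.000000,
              5.822744, 5.085769, 2.665204, 4.783280]
# column 0, test window 2000-01-11 and 2000-01-12 (copied from printed output)
test_col0 = [2.504437, 3.043338]

# "last" strategy: repeat the final training value for every step of the horizon
y_pred = [train_col0[-1]] * len(test_col0)

# mean squared error over the two forecast steps
mse_col0 = sum((t - p) ** 2 for t, p in zip(test_col0, y_pred)) / len(test_col0)
print(round(mse_col0, 4))  # about 4.1103, in line with the first entry above
```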


In [25]:
backtest = evaluate(
    forecaster=forecaster,
    y=y_multivariate,
    cv=single,
    scoring=multivariate_scorer,
    return_data=True,
    error_score="raise",
)
backtest

Unnamed: 0,test_MeanSquaredError,fit_time,pred_time,len_train_window,cutoff,y_train,y_test,y_pred
0,"[4.110264763336158, 4.7840299836879066, 1.2443...",0.050207,0.154773,8,2000-01-10,0 1 2 ...,0 1 2 ...,0 1 2 ...


In [33]:
backtest["y_train"][0]

                   0         1         2         3
2000-01-03  3.449771  3.391395  2.850314  3.308369
2000-01-04  4.314028  3.102471  3.150133  2.187770
2000-01-05  5.047069  2.775638  3.019338  1.000000
2000-01-06  1.000000  3.634415  3.570706  1.111931
2000-01-07  5.822744  1.526431  2.752029  1.666912
2000-01-08  5.085769  4.450155  2.861218  2.232258
2000-01-09  2.665204  1.000000  2.358358  2.010445
2000-01-10  4.783280  4.183176  2.318943  1.551793


## Univariate with and without Exogenous Data

This is tricky because most splitters accept only a single argument, y. To split X at the same time points, wrap an existing splitter in SameLocSplitter, which reuses the index labels of the splits made on y.
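
The idea of reusing index labels can be sketched in plain Python. Everything below is toy data and hypothetical helper code for illustration: `sliding_label_splits` is not an sktime function, and the values in `y_toy` and `X_toy` are made up. Only the window settings (window of 8, horizon of 1 and 2 steps, step length of 2) follow this notebook.

```python
# toy data: y and X share the same daily index labels
dates = [f"2000-01-{day:02d}" for day in range(1, 13)]
y_toy = {d: float(i) for i, d in enumerate(dates)}
X_toy = {d: (i * 10.0, i * 100.0) for i, d in enumerate(dates)}  # two made-up columns


def sliding_label_splits(labels, window_length, fh, step_length):
    """Yield (train_labels, test_labels) for a sliding window over labels."""
    start = 0
    while start + window_length + max(fh) <= len(labels):
        cutoff = start + window_length - 1  # position of the last training point
        train = labels[start : cutoff + 1]
        test = [labels[cutoff + h] for h in fh]
        yield train, test
        start += step_length


# split y by position, but keep the resulting index labels
label_splits = list(
    sliding_label_splits(list(y_toy), window_length=8, fh=[1, 2], step_length=2)
)

# select the rows of X with the same labels, so y and X stay aligned
for train_labels, test_labels in label_splits:
    X_train = [X_toy[d] for d in train_labels]
    X_future = [X_toy[d] for d in test_labels]
    print(train_labels[0], "to", train_labels[-1], "->", test_labels)
```

Looking rows up by label rather than by position keeps the split correct even if X has a different number of rows than y, provided the labels exist in both.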

In [46]:
X = y_multivariate.iloc[:, 1:]

from sktime.forecasting.model_selection import SameLocSplitter

cv_X = SameLocSplitter(sliding, y)

for train, test in cv_X.split_series(X):
    print(f"TRAIN: \n {train}\n TEST: \n {test}")
    print("=====================================")

TRAIN: 
                    1         2         3
2000-01-01  3.380954  3.685008  4.094989
2000-01-02  2.003519  3.656359  1.702739
2000-01-03  3.391395  2.850314  3.308369
2000-01-04  3.102471  3.150133  2.187770
2000-01-05  2.775638  3.019338  1.000000
2000-01-06  3.634415  3.570706  1.111931
2000-01-07  1.526431  2.752029  1.666912
2000-01-08  4.450155  2.861218  2.232258
 TEST: 
                    1         2         3
2000-01-09  1.000000  2.358358  2.010445
2000-01-10  4.183176  2.318943  1.551793
TRAIN: 
                    1         2         3
2000-01-03  3.391395  2.850314  3.308369
2000-01-04  3.102471  3.150133  2.187770
2000-01-05  2.775638  3.019338  1.000000
2000-01-06  3.634415  3.570706  1.111931
2000-01-07  1.526431  2.752029  1.666912
2000-01-08  4.450155  2.861218  2.232258
2000-01-09  1.000000  2.358358  2.010445
2000-01-10  4.183176  2.318943  1.551793
 TEST: 
                    1         2         3
2000-01-11  1.560779  1.000000  3.804871
2000-01-12  2.542722 

In [47]:
for train, test in sliding.split_series(y):
    print(f"TRAIN: \n {train}\n TEST: \n {test}")
    print("=====================================")

TRAIN: 
 2000-01-01    5.317042
2000-01-02    5.420548
2000-01-03    3.449771
2000-01-04    4.314028
2000-01-05    5.047069
2000-01-06    1.000000
2000-01-07    5.822744
2000-01-08    5.085769
Freq: D, Name: 0, dtype: float64
 TEST: 
 2000-01-09    2.665204
2000-01-10    4.783280
Freq: D, Name: 0, dtype: float64
TRAIN: 
 2000-01-03    3.449771
2000-01-04    4.314028
2000-01-05    5.047069
2000-01-06    1.000000
2000-01-07    5.822744
2000-01-08    5.085769
2000-01-09    2.665204
2000-01-10    4.783280
Freq: D, Name: 0, dtype: float64
 TEST: 
 2000-01-11    2.504437
2000-01-12    3.043338
Freq: D, Name: 0, dtype: float64


In [57]:
errors = []
mse = MeanSquaredError()
for (train_y, test_y), (train_x, future_x) in zip(
    sliding.split_series(y), cv_X.split_series(X)
):
    forecaster.fit(train_y, train_x)
    y_pred = forecaster.predict(fh, future_x)
    error = mse(test_y, y_pred)
    errors.append(error)

for i, error in enumerate(errors):
    print(f"window/fold {i}: {error}")

window/fold 0: 0.15905576081692285
window/fold 1: 0.5801545236174498


In [58]:
backtest = evaluate(
    forecaster=forecaster,
    y=y,
    X=X,
    cv=sliding,
    scoring=mse,
    return_data=True,
    error_score="raise",
)

# open question: should evaluate also return X_train and X_future when X is passed?
backtest

Unnamed: 0,test_MeanSquaredError,fit_time,pred_time,len_train_window,cutoff,y_train,y_test,y_pred
0,0.159056,0.006003,0.029476,8,2000-01-08,2000-01-01 3.741330 2000-01-02 2.377435 ...,2000-01-09 1.874059 2000-01-10 2.387876 ...,2000-01-09 1.825921 2000-01-10 1.825921 ...
1,0.580155,0.007115,0.040105,8,2000-01-10,2000-01-03 2.956016 2000-01-04 4.218171 ...,2000-01-11 2.121321 2000-01-12 3.431551 ...,2000-01-11 2.387876 2000-01-12 2.387876 ...


In [57]:
backtest["y_train"][0]

2000-01-01    5.317042
2000-01-02    5.420548
2000-01-03    3.449771
2000-01-04    4.314028
2000-01-05    5.047069
2000-01-06    1.000000
2000-01-07    5.822744
2000-01-08    5.085769
2000-01-09    2.665204
2000-01-10    4.783280
Freq: D, Name: 0, dtype: float64

## Datatypes

In [70]:
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.utils._testing.hierarchical import _make_hierarchical

# 6 unique (lowest level) time series, each of length 6
df = _make_hierarchical(
    hierarchy_levels=(2, 3), max_timepoints=6, min_timepoints=6, random_state=0
)

# split each unique time series into 4+2
df_tr_size, df_ts_size = temporal_train_test_split(df, test_size=2)
df

                            c0
h0   h1   time                
h0_0 h1_0 2000-01-01  5.317042
          2000-01-02  3.953147
          2000-01-03  4.531728
          2000-01-04  5.793883
          2000-01-05  5.420548
          2000-01-06  2.575712
     h1_1 2000-01-01  4.503078
          2000-01-02  3.401633
          2000-01-03  3.449771
          2000-01-04  3.963588


In [6]:
from sktime.datatypes import (
    MTYPE_LIST_HIERARCHICAL,
    MTYPE_REGISTER,
    check_is_mtype,
    check_is_scitype,
)

MTYPE_LIST_HIERARCHICAL

array(['pd_multiindex_hier', 'dask_hierarchical'], dtype=object)

In [7]:
MTYPE_REGISTER

[('pd.Series', 'Series', 'pd.Series representation of a univariate series'),
 ('pd.DataFrame',
  'Series',
  'pd.DataFrame representation of a uni- or multivariate series'),
 ('np.ndarray',
  'Series',
  '2D numpy.ndarray with rows=samples, cols=variables, index=integers'),
 ('xr.DataArray',
  'Series',
  'xr.DataArray representation of a uni- or multivariate series'),
 ('dask_series',
  'Series',
  'xdas representation of a uni- or multivariate series'),
 ('nested_univ',
  'Panel',
  'pd.DataFrame with one column per variable, pd.Series in cells'),
 ('numpy3D',
  'Panel',
  '3D np.array of format (n_instances, n_columns, n_timepoints)'),
 ('numpyflat',
  'Panel',
 ('pd-multiindex',
  'Panel',
  'pd.DataFrame with multi-index (instances, timepoints)'),
 ('pd-wide',
  'Panel',
  'pd.DataFrame in wide format, cols = (instance*timepoints)'),
 ('pd-long',
  'Panel',
  'pd.DataFrame in long format, cols = (index, time_index, column)'),
 ('df-list', 'Panel', 'list of pd.DataFrame'),
 ('dask_

In [13]:
print(check_is_scitype(y_hierarchical, scitype="Hierarchical"))
print(check_is_mtype(y_hierarchical, mtype="pd_multiindex_hier"))

True
True


## Tags and Estimator lookup

In [32]:
from sktime.registry import all_estimators, all_tags

all_tags("forecaster", as_dataframe=True)

Unnamed: 0,name,scitype,type,description
0,X-y-must-have-same-index,"[forecaster, regressor]",bool,do X/y in fit/update and X/fh in predict have ...
1,X_inner_mtype,"[clusterer, forecaster, transformer, transform...","(list, [pd.Series, pd.DataFrame, np.array, nes...",which machine type(s) is the internal _fit/_pr...
2,capability:insample,forecaster,bool,can the forecaster make in-sample predictions?
3,capability:pred_int,forecaster,bool,does the forecaster implement predict_interval...
4,capability:pred_int:insample,forecaster,bool,can the forecaster make in-sample predictions ...
5,capability:pred_var,forecaster,bool,does the forecaster implement predict_variance?
6,enforce_index_type,"[forecaster, regressor]",type,"passed to input checks, input conversion index..."
7,ignores-exogeneous-X,forecaster,bool,does forecaster ignore exogeneous data (X)?
8,remember_data,"[forecaster, transformer]",bool,whether estimator remembers all data seen as s...
9,requires-fh-in-fit,forecaster,bool,does forecaster require fh passed already in f...


In [37]:
all_estimators(
    "forecaster", as_dataframe=True, filter_tags={"scitype:y": "multivariate"}
)

            name                                             object
0  DynamicFactor  <class 'sktime.forecasting.dynamic_factor.Dyna...
1            VAR               <class 'sktime.forecasting.var.VAR'>
2         VARMAX         <class 'sktime.forecasting.varmax.VARMAX'>
3           VECM             <class 'sktime.forecasting.vecm.VECM'>
