[BUG] `temporal_train_test_split` with `fh` does not split per unique time series #4968

yarnabrina · 2023-07-27T18:45:38Z

This can be a bug, or may be an expected behaviour as of now. In the latter case, I would want to submit this as a feature request that split by forecast horizon should work per series instead of on the entire dataset.

Reproducible code

from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.utils._testing.hierarchical import _make_hierarchical

# 6 unique (lowest level) time series, each of length 6
df = _make_hierarchical(
    hierarchy_levels=(2, 3), max_timepoints=6, min_timepoints=6, random_state=0
)

# split each unique time series into 4+2
df_tr_size, df_ts_size = temporal_train_test_split(df, test_size=2)

# split only entire dataset into 34+2
df_tr_fh, df_ts_fh = temporal_train_test_split(df, fh=[1, 2])

# show shapes
print(
    f"""
    {df.shape=}
    {df_tr_size.shape=}, {df_ts_size.shape=}
    {df_tr_fh.shape=}, {df_ts_fh.shape=}
    """
)

Results

    
    df.shape=(36, 1)
    df_tr_size.shape=(24, 1), df_ts_size.shape=(12, 1)
    df_tr_fh.shape=(34, 1), df_ts_fh.shape=(2, 1)

Version

Ubuntu WSL on Windows 11, Python 3.10.12, main at c48883e

The text was updated successfully, but these errors were encountered:

fkiraly · 2023-07-27T20:24:27Z

There are some known issues with temporal_train_test_split.

I think we should have it dispatch to splitters, rather than having its own logic - that would solve the occasional odd behaviour, and the need to update it separately from the splitters.

Should we make this part of a splitter refactor?

hazrulakmal · 2023-08-04T19:20:52Z

I think we should have it dispatch to splitters, rather than having its own logic - that would solve the occasional odd behaviour, and the need to update it separately from the splitters.

are you saying that you are suggesting to refactor temporal_train_test_split to be a splitter class? like inherit the BaseSplitter?

fkiraly · 2023-08-04T21:06:33Z

almost - I think temporal_train_test_split should continue to exist, but under the hood call some_splitter.split (or some_splitter.split ... [0]), to ensure we have only one place for boilerplate logic.

hazrulakmal · 2023-08-04T21:26:05Z

Ok got it, that's a good idea! I was about to disagree if the refactor meant converting temporal_train_test_split into a splitter class. For now, I think the closest splitter that does what temporal_train_test_split does is SingleWindowSplitter. However, we may need to add a few more logics underneath to make it behave as per the current implementation of temporal_train_test_split.

fkiraly · 2023-08-04T22:04:29Z

Hm, perhaps the easiest is to write a new splitter, which has test_size, train_size, and fh? Perhaps called TrainTestSplitter?

Then it's called under the hood of temporal_train_test_split, applied to y

To get the split for X, we can ensure that TrainTestSplitter takes a y_template as an arg, to determine the cutoff. If this param is passed, it is used to determine the split-point, instead of the argument of split.

This refactor would also make it easy to allow an arbitrary number of args similar to sklearn's train_test_split, by virtue of applying the TrainTestSplitter with a template.

yarnabrina added the module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting label Jul 27, 2023

fkiraly changed the title ~~temporal_train_test_split with fh does not split per unique time series~~ [BUG] temporal_train_test_split with fh does not split per unique time series Aug 4, 2023

hazrulakmal mentioned this issue Aug 15, 2023

[BUG] temporal_train_test_split does not work on panel datatypes with unequal length series. #5107

Closed

yarnabrina mentioned this issue Oct 2, 2023

[BUG] fix temporal_train_test_split for hierarchical and panel data in case where fh is not passed #5330

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] `temporal_train_test_split` with `fh` does not split per unique time series #4968

[BUG] `temporal_train_test_split` with `fh` does not split per unique time series #4968

yarnabrina commented Jul 27, 2023

fkiraly commented Jul 27, 2023

hazrulakmal commented Aug 4, 2023 •

edited

fkiraly commented Aug 4, 2023 •

edited

hazrulakmal commented Aug 4, 2023

fkiraly commented Aug 4, 2023

[BUG] temporal_train_test_split with fh does not split per unique time series #4968

[BUG] temporal_train_test_split with fh does not split per unique time series #4968

Comments

yarnabrina commented Jul 27, 2023

fkiraly commented Jul 27, 2023

hazrulakmal commented Aug 4, 2023 • edited

fkiraly commented Aug 4, 2023 • edited

hazrulakmal commented Aug 4, 2023

fkiraly commented Aug 4, 2023

[BUG] `temporal_train_test_split` with `fh` does not split per unique time series #4968

[BUG] `temporal_train_test_split` with `fh` does not split per unique time series #4968

hazrulakmal commented Aug 4, 2023 •

edited

fkiraly commented Aug 4, 2023 •

edited