Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] temporal_train_test_split with fh does not split per unique time series #4968

Open
yarnabrina opened this issue Jul 27, 2023 · 5 comments
Labels
module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting

Comments

@yarnabrina
Copy link
Collaborator

This can be a bug, or may be an expected behaviour as of now. In the latter case, I would want to submit this as a feature request that split by forecast horizon should work per series instead of on the entire dataset.

Reproducible code

from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.utils._testing.hierarchical import _make_hierarchical

# 6 unique (lowest level) time series, each of length 6
df = _make_hierarchical(
    hierarchy_levels=(2, 3), max_timepoints=6, min_timepoints=6, random_state=0
)

# split each unique time series into 4+2
df_tr_size, df_ts_size = temporal_train_test_split(df, test_size=2)

# split only entire dataset into 34+2
df_tr_fh, df_ts_fh = temporal_train_test_split(df, fh=[1, 2])

# show shapes
print(
    f"""
    {df.shape=}
    {df_tr_size.shape=}, {df_ts_size.shape=}
    {df_tr_fh.shape=}, {df_ts_fh.shape=}
    """
)

Results

    
    df.shape=(36, 1)
    df_tr_size.shape=(24, 1), df_ts_size.shape=(12, 1)
    df_tr_fh.shape=(34, 1), df_ts_fh.shape=(2, 1)
    

Version

Ubuntu WSL on Windows 11, Python 3.10.12, main at c48883e

@yarnabrina yarnabrina added the module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting label Jul 27, 2023
@fkiraly
Copy link
Collaborator

fkiraly commented Jul 27, 2023

There are some known issues with temporal_train_test_split.

I think we should have it dispatch to splitters, rather than having its own logic - that would solve the occasional odd behaviour, and the need to update it separately from the splitters.

Should we make this part of a splitter refactor?

@hazrulakmal
Copy link
Collaborator

hazrulakmal commented Aug 4, 2023

I think we should have it dispatch to splitters, rather than having its own logic - that would solve the occasional odd behaviour, and the need to update it separately from the splitters.

are you saying that you are suggesting to refactor temporal_train_test_split to be a splitter class? like inherit the BaseSplitter?

@fkiraly fkiraly changed the title temporal_train_test_split with fh does not split per unique time series [BUG] temporal_train_test_split with fh does not split per unique time series Aug 4, 2023
@fkiraly
Copy link
Collaborator

fkiraly commented Aug 4, 2023

almost - I think temporal_train_test_split should continue to exist, but under the hood call some_splitter.split (or some_splitter.split ... [0]), to ensure we have only one place for boilerplate logic.

@hazrulakmal
Copy link
Collaborator

Ok got it, that's a good idea! I was about to disagree if the refactor meant converting temporal_train_test_split into a splitter class. For now, I think the closest splitter that does what temporal_train_test_split does is SingleWindowSplitter. However, we may need to add a few more logics underneath to make it behave as per the current implementation of temporal_train_test_split.

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 4, 2023

Hm, perhaps the easiest is to write a new splitter, which has test_size, train_size, and fh? Perhaps called TrainTestSplitter?

Then it's called under the hood of temporal_train_test_split, applied to y

To get the split for X, we can ensure that TrainTestSplitter takes a y_template as an arg, to determine the cutoff. If this param is passed, it is used to determine the split-point, instead of the argument of split.

This refactor would also make it easy to allow an arbitrary number of args similar to sklearn's train_test_split, by virtue of applying the TrainTestSplitter with a template.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting
Projects
None yet
Development

No branches or pull requests

3 participants