[ENH] splitter that replicates `loc` of another splitter #4851
This is weird: the equality tests fail only on Windows, as the array type seems to be …
Only one comment. I am also fine if it is answered on #4862, since it is exactly the same comment.
Review comments addressed, kindly re-review.
Thank you for addressing the comments :)
…not equal length (#4861)

This refactors the `evaluate` forecasting benchmark tool and adds features:

* refactors the internal `_split` to use the `BaseSplitter` interface instead of separate logic.
* to handle `X`, we use the `SameLocSplitter` from #4851 and the `TestPlusTrainSplitter` from #4862.
* as #4851 is more general and allows `y` and `X` of different length, this fixes #4842.
* new argument to `evaluate`, which allows the user to pass the splitter for `X` explicitly as `cv_X`. If not passed, it defaults to the `SameLocSplitter`, i.e., `X` split indices are the same as those from `y`. (This requires no deprecation, as it is added as the last argument and the default is the existing behaviour.)
* proper docstring.

Note: this changes behaviour in cases where `y` and `X` had the same length but a different index set. However, I would contend (is this correct?) that in these cases behaviour was generally unexpected or buggy, so no deprecation is needed.

Depends on: #4851, #4862

Related to, and indirectly fixes, #4842. Includes the MRE from #4842 as an integration test.
This adds a splitter that takes a splitter and some data, and always replicates the same `loc` references for splits of another splitter.

Related issue and discussion: #4842

More general form of `temporal_train_test_split`, particularly useful for the case where we want to split `X` and `y`, and primarily `y`, and `X` can have different indices from `y`.

This ability partially addresses #4842, FYI @felipeangelimvieira
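
To make the idea concrete, here is a minimal, library-free sketch of what "replicating `loc` references" could look like. It is not the implementation from this PR: `expanding_window_iloc` and `same_loc_split` are hypothetical helpers, plain lists stand in for pandas indices, and skipping labels that are missing from the target index is an assumption rather than confirmed behaviour.

```python
def expanding_window_iloc(n, initial, horizon):
    """Hypothetical base splitter: yields (train, test) integer positions."""
    for end in range(initial, n - horizon + 1):
        yield list(range(end)), list(range(end, end + horizon))


def same_loc_split(base_splits, template_index, target_index):
    """Replay splits made on `template_index` onto `target_index` by label.

    Positions in the template are translated to labels (the "loc" view),
    then to positions in the target.
    """
    position = {label: i for i, label in enumerate(target_index)}
    for train, test in base_splits:
        train_labels = [template_index[i] for i in train]
        test_labels = [template_index[i] for i in test]
        # Assumption: labels absent from the target are silently skipped.
        yield (
            [position[lab] for lab in train_labels if lab in position],
            [position[lab] for lab in test_labels if lab in position],
        )


# y and X have different lengths and therefore different integer positions.
y_index = [2001, 2002, 2003, 2004]
X_index = [1999, 2000, 2001, 2002, 2003, 2004]

base = expanding_window_iloc(len(y_index), initial=2, horizon=1)
splits = list(same_loc_split(base, y_index, X_index))

for train, test in splits:
    print("X train positions:", train, "X test positions:", test)
```

The positions returned for `X` differ from those of `y`, but they point at the same labels, which is what allows `X` and `y` to have different indices.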