[MRG] Cross-validation for time series (inserting gaps between the training set and the test set) #13761
Time series exhibit temporal dependence, which can cause information leakage during cross-validation.
As for my implementation, I refactored the whole structure while keeping the same public API.
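To illustrate the gap idea, here is a minimal sketch of an expanding-window splitter that leaves `gap` unused samples between each training set and its test set, so the model is never evaluated on observations immediately adjacent to the ones it trained on. This is a hypothetical helper for illustration only, not the PR's actual implementation:

```python
import numpy as np

def split_with_gap(n_samples, n_splits, gap):
    """Yield (train, test) index pairs for an expanding-window split,
    dropping `gap` samples between each training set and its test set.
    Hypothetical sketch; not the implementation proposed in this PR."""
    indices = np.arange(n_samples)
    test_size = (n_samples - gap) // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = i * test_size          # training window grows each fold
        test_start = train_end + gap       # skip `gap` samples before testing
        yield indices[:train_end], indices[test_start:test_start + test_size]

for train, test in split_with_gap(10, n_splits=3, gap=2):
    print("TRAIN:", train, "TEST:", test)
```

With 10 samples, 3 splits, and a gap of 2, each fold's test window starts two samples after the training window ends, so no test observation borders the training data.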
Classes and functions added:
Related issues and PRs
In your examples in the docstrings, why do the training sets sometimes have larger values than the testing sets? That would mean training a model on the future and predicting data from the past.
Notice how for TimeSeriesSplit all the training indices precede the test indices:
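A quick check of that property with the existing `TimeSeriesSplit` (assuming scikit-learn is installed):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(6).reshape(-1, 1)  # six dummy samples in time order
for train, test in TimeSeriesSplit(n_splits=3).split(X):
    # every training index comes strictly before every test index,
    # so the model never trains on the future to predict the past
    print("TRAIN:", train, "TEST:", test)
    assert train.max() < test.min()
```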
Sorry I've not had time to look at this yet. Have you checked the build logs? https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=2928
I finally found the cause, which is the different interpretations of
Linux pylatest_conda interprets it as
Linux py35_conda_openblas and Linux py35_np_atlas interpret it as
According to the NumPy documentation, the first one is the correct interpretation, even for numpy