Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add TimeSeriesCV and HomogeneousTimeSeriesCV #6351

Conversation

yenchenlin
Copy link
Contributor

This PR is an implementation of time series cross validation.
Based on #6322 , there are basically two cases:

  1. Homogeneous time series
  2. Heterogeneous time series

For brevity,
I'll use HomoTSCV to represent homogeneous time series and HeteroTSCV to represent heterogeneous time series in the following checklist.

Check List:

  • HomoTSCV implementation
  • Test HomoTSCV
  • HeteroTSCV implementation
  • Test HeteroTSCV
  • Documentation

Reference:
Using k-fold cross-validation for time-series model selection

@yenchenlin yenchenlin force-pushed the add-homogeneous-time-series-cv branch 2 times, most recently from 75e7401 to 04b9e79 Compare February 13, 2016 06:18
@@ -637,6 +637,121 @@ def split(self, X, y, labels=None):
"""
return super(StratifiedKFold, self).split(X, y, labels)

class HomogeneousTimeSeriesCV(_BaseKFold):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is convenient to make HomogeneousTimeSeriesCV subclass _BaseKFold since __init__ in _BaseKFold can help HomogeneousTimeSeriesCV to check whether input parameter n_folds is valid.

However, in order to make HomogeneousTimeSeriesCV work, HomogeneousTimeSeriesCV needs to override function split which is defined in its superclass _BaseKFold and its super-superclass BaseCrossValidator due to their implementation detail of split.

I don't think override split in HomogeneousTimeSeriesCV is a good solution since other subclasses of _BaseKFold didn't override split but override _iter_test_indices and _iter_test_masks instead, can @rvraghav93 provide some suggestions on this?

@yenchenlin yenchenlin force-pushed the add-homogeneous-time-series-cv branch 3 times, most recently from 73f5661 to 62bafb4 Compare February 14, 2016 12:26
@MechCoder
Copy link
Member

Please split the PR into two for now. Will be easier to review.


Notes
-----
The first ``n_samples % n_folds`` folds have size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This note is confusing, to say the least.

You should mention that for the first n_samples % n_folds, the number of samples in each fold are "incremented" by n_samples // n_folds + 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry maybe I'm too dumb.

the number of samples in each fold are "incremented" by n_samples // n_folds + 1

Why is "incremented" used here?
I think the number of samples in the first n_samples % n_folds folds is exactly n_samples // n_folds + 1,
which is "incremented" by 1 compared to other folds?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is related to the prev comment. Sorry I meant in "each split the number of samples are incremented by"

You can move this to the n_folds documentation to avoid such confusion

@MechCoder
Copy link
Member

Just gave a first pass. I also agree that this would be highly useful!!

@yenchenlin
Copy link
Contributor Author

@MechCoder Thanks!
I will wait for your reply and then separate this.

@yenchenlin
Copy link
Contributor Author

Close this PR since I plan to separate this into two PRs:

  1. Homogeneous time series
  2. Heterogeneous time series

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants