[Feature-Request] Add a flag to StratifiedKFold to force classes with only 1 sample in training #10767
Comments
I can submit a PR for the same if needed.
I've been tempted to phase out the current StratifiedKFold implementation, as I noted at #10274 (comment). I think it should be implemented as a stable sort on y followed by a round-robin, because:
Whether this is offered through a separate class, or through a
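For readers unfamiliar with the idea, here is a minimal sketch of one reading of "a stable sort on y followed by a round-robin". The helper name `round_robin_folds` is hypothetical and this is not scikit-learn's implementation; it only illustrates how such an assignment spreads every class across folds with per-fold counts that differ by at most one.

```python
import numpy as np


def round_robin_folds(y, n_splits):
    """Return a test-fold id for every sample (hypothetical helper)."""
    y = np.asarray(y)
    # A stable sort groups samples by class and keeps their original order
    # within each class.
    order = np.argsort(y, kind="stable")
    fold_of = np.empty(len(y), dtype=int)
    # Deal the sorted samples out to the folds one at a time, like cards.
    fold_of[order] = np.arange(len(y)) % n_splits
    return fold_of


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 0, 2]
    print(round_robin_folds(labels, n_splits=2))
```

Note that under this scheme a class with a single sample still lands in the test set of exactly one fold, so it does not by itself provide the behaviour requested in this issue.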
Issue stale. I do not know if this is fixed or not fixed, but I no longer need this feature.
Hey @akhilkedia would you mind re-opening this issue? I just ran into the same problem and would like to not open an identical issue. As a workaround I created a custom solution at https://github.com/automl/auto-sklearn/pull/1244/files but would rather see this in scikit-learn itself.
Description
Add a flag to StratifiedKFold which ensures each class is present in the training set.
For StratifiedKFold.split, if some class has only 1 sample, currently this sample might be included in the test split rather than the training split. (sklearn does give a warning.)
While for some applications this can be acceptable, a flag which forces classes with a single sample to always be in training can be helpful.
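Until such a flag exists, the requested behaviour can be approximated outside the splitter. The sketch below is only an illustration under stated assumptions: the function name `split_singletons_in_train` is hypothetical, it is neither a scikit-learn API nor the workaround linked in the comments, and it simply stratifies the samples whose class has at least two members and appends the single-sample classes to every training set.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def split_singletons_in_train(y, n_splits=2, shuffle=False, random_state=None):
    """Yield (train, test) index arrays; single-sample classes stay in train.

    Hypothetical helper, not part of scikit-learn.
    """
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    singleton_mask = np.isin(y, classes[counts == 1])
    singleton_idx = np.flatnonzero(singleton_mask)
    rest_idx = np.flatnonzero(~singleton_mask)

    skf = StratifiedKFold(
        n_splits=n_splits, shuffle=shuffle, random_state=random_state
    )
    # Stratify only the samples whose class has at least two members;
    # split() needs X only for its length, so a placeholder is enough.
    placeholder_X = np.zeros(len(rest_idx))
    for train, test in skf.split(placeholder_X, y[rest_idx]):
        # Map positions back to original indices and add the singletons
        # to the training side of every fold.
        train_idx = np.sort(np.concatenate([rest_idx[train], singleton_idx]))
        yield train_idx, rest_idx[test]


if __name__ == "__main__":
    # Class 1 has a single sample, at index 2.
    labels = [0, 0, 1, 0, 0, 2, 2, 2, 2]
    for train_idx, test_idx in split_singletons_in_train(labels, n_splits=2):
        print("train:", train_idx, "test:", test_idx)
```

One consequence of this design is that single-sample classes are never evaluated, since they appear in no test fold.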
Steps/Code to Reproduce
Expected Results
There should be some flag in StratifiedKFold, so that at least 1 element of class 1 is always present in train (in this case, the element at index 2).
Actual Results
Versions