StratifiedShuffleSplit generates overlapping train and test indices #6121
Comments
Could you put together a standalone example so that we can try to reproduce the problem?
Please check the reproducible example. Thanks.
I can reproduce it. Here is a stand-alone snippet that reproduces the problem:

    from sklearn.cross_validation import StratifiedShuffleSplit
    from numpy.testing import assert_array_equal
    import numpy as np

    rng = np.random.RandomState(0)
    labels = rng.randint(low=0, high=10, size=100)
    sss = StratifiedShuffleSplit(labels, n_iter=1,
                                 test_size=0.5, random_state=0)
    train, test = next(iter(sss))
    assert_array_equal(np.intersect1d(train, test), [])

The output:
I also confirmed that this happens on master, 0.17, and 0.16. I did not check older versions.
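The disjointness check used in the snippet above can also be written without NumPy. The following is a minimal sketch; the helper name `overlapping_indices` is hypothetical and not part of scikit-learn, and the index lists are made up for illustration. An empty result means the two index sets do not overlap.

```python
def overlapping_indices(train, test):
    """Return the sorted indices that appear in both train and test."""
    return sorted(set(train) & set(test))


# Hypothetical index lists, for illustration only.
disjoint = overlapping_indices([0, 2, 4], [1, 3, 5])
shared = overlapping_indices([0, 2, 4], [4, 5, 0])
print(disjoint)  # []
print(shared)    # [0, 4]
```

Returning the shared indices, rather than a boolean, makes it easier to see which samples ended up on both sides of a split.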
@MagicYoung can you tweak the title so that it is self-explanatory, e.g. something like "StratifiedShuffleSplit generates overlapping train and test indices"?
I have not looked at the implementation of StratifiedShuffleSplit, but I suspect the issue is related to the sample distribution of the array. @lesteve The title has been changed.
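For comparison, here is a minimal sketch of a stratified shuffle split whose train and test indices cannot overlap, whatever the label distribution, because each class's shuffled indices are cut at a single point. This is not scikit-learn's implementation, and it makes no claim about the cause of the bug; the function name `stratified_split_sketch` and its per-class rounding rule are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split_sketch(labels, test_size=0.5, seed=0):
    """Split indices class by class so that train and test are disjoint."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Assumed rounding rule: round the per-class test count to nearest.
        n_test = int(round(len(indices) * test_size))
        # One cut point per class: an index lands in exactly one side.
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical labels: 100 samples drawn from 10 classes.
label_rng = random.Random(0)
labels = [label_rng.randrange(10) for _ in range(100)]
train, test = stratified_split_sketch(labels, test_size=0.5, seed=0)
assert set(train).isdisjoint(test)
assert len(train) + len(test) == len(labels)
```

Because the test count is rounded separately for each class, the total test set size in this sketch may differ slightly from `test_size` times the number of samples.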
Why is there overlap between dev_idx and t_idx in the following code? There should be no overlap. The second assert_equal() test raised an error as follows: