-
-
Notifications
You must be signed in to change notification settings - Fork 25.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Raise ValueError when trainset is empty in CVSplitters #12861
[MRG] Raise ValueError when trainset is empty in CVSplitters #12861
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cosmetics
def test_shuffle_split_empty_trainset(CVSplitter): | ||
cv = CVSplitter(test_size=.99) | ||
X, y = [[1]], [0] # 1 sample | ||
assert_raises_regexp( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to be moving to using pytest.raises
for testing exceptions.
'p={} must be strictly less than the number of ' | ||
'samples={}'.format(self.p, n_samples) | ||
) | ||
for combination in combinations(range(n_samples), self.p): | ||
yield np.array(combination) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are python >= 3 now, maybe:
yield from map(np.array, combinations(range(n_samples), self.p))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not, since this is not related to the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is +2, @thomasjpfan can you merge this? |
ValueError, | ||
match='With n_samples=1, test_size=0.99 and train_size=None, ' | ||
'the resulting train set will be empty'): | ||
train_test_split(X, test_size=.99) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that train_test_split
is a commonly used function, a non-trival test such as X=[[1],[1],[1]]
and test_size=0.67
would be useful.
…cikit-learn#12861)" This reverts commit 9314ff7.
…cikit-learn#12861)" This reverts commit 9314ff7.
Reference Issues/PRs
Closes #11028, following #11028 (comment)
What does this implement/fix? Explain your changes.
ValueError is raised in the CVSplitters if the resulting train set is empty. As far as I can tell, the test sets are never empty so no need to check.
No need to check either for the KFolds splitters or some
*Group*
splitters, which have internal checks preventing the train set to be empty.Any other comments?