Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Raise ValueError when trainset is empty in CVSplitters #12861

Merged
merged 5 commits into from Mar 23, 2019

Conversation

NicolasHug
Copy link
Member

Reference Issues/PRs

Closes #11028, following #11028 (comment)

What does this implement/fix? Explain your changes.

ValueError is raised in the CVSplitters if the resulting train set is empty. As far as I can tell, the test sets are never empty so no need to check.

No need to check either for the KFolds splitters or some *Group* splitters, which have internal checks preventing the train set to be empty.

Any other comments?

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cosmetics

def test_shuffle_split_empty_trainset(CVSplitter):
cv = CVSplitter(test_size=.99)
X, y = [[1]], [0] # 1 sample
assert_raises_regexp(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be moving to using pytest.raises for testing exceptions.

'p={} must be strictly less than the number of '
'samples={}'.format(self.p, n_samples)
)
for combination in combinations(range(n_samples), self.p):
yield np.array(combination)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are python >= 3 now, maybe:

yield from map(np.array, combinations(range(n_samples), self.p))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not, since this is not related to the PR

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@NicolasHug
Copy link
Member Author

This is +2, @thomasjpfan can you merge this?

ValueError,
match='With n_samples=1, test_size=0.99 and train_size=None, '
'the resulting train set will be empty'):
train_test_split(X, test_size=.99)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that train_test_split is a commonly used function, a non-trival test such as X=[[1],[1],[1]] and test_size=0.67 would be useful.

@thomasjpfan thomasjpfan merged commit 2718d62 into scikit-learn:master Mar 23, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should train_test_split warn or error out on single sample array?
3 participants