Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Raises warning when getting non-finite score in SearchCV #18266

Merged
merged 21 commits into from Aug 29, 2020

Conversation

subrat93
Copy link
Contributor

@subrat93 subrat93 commented Aug 26, 2020

Reference Issues/PRs

Fixes #10529
Supersedes and closes #10546
Supersedes and closes #15469

What does this implement/fix? Explain your changes.

The fix checks for the presence of any inf/-inf values in the mean score calculated after GridSearchCV.
If yes, it raises a warning - "One or more of the test scores are infinite"

Any other comments?

Copy link
Contributor

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
In addition to yourself, please add @Nirvan101 @ArthurBook as contributors to this PR.

sklearn/model_selection/_search.py Outdated Show resolved Hide resolved
sklearn/model_selection/tests/test_search.py Outdated Show resolved Hide resolved
sklearn/model_selection/tests/test_search.py Outdated Show resolved Hide resolved
sklearn/model_selection/tests/test_search.py Outdated Show resolved Hide resolved
sklearn/model_selection/tests/test_search.py Show resolved Hide resolved
@glemaitre glemaitre self-requested a review August 28, 2020 09:10
@glemaitre glemaitre changed the title Fixed -> inf or -inf values in CV ruin the mean and std ENH raise warning when getting non-finite value during scoring in GridSearchCV Aug 28, 2020
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre
Copy link
Contributor

Can you solve the linting issue: https://app.circleci.com/pipelines/github/scikit-learn/scikit-learn/7266/workflows/6e341ac6-b7dc-498a-9133-59690fcef076/jobs/118045

sklearn/model_selection/tests/test_search.py:28:1: F401 'scipy.stats.distributions.norm' imported but unused
from scipy.stats.distributions import norm
^
sklearn/model_selection/tests/test_search.py:1790:1: W293 blank line contains whitespace
    
^
sklearn/model_selection/tests/test_search.py:1791:1: W293 blank line contains whitespace
    
^

@subrat93
Copy link
Contributor Author

@glemaitre, thanks for the review. I've resolved the linting issues.

doc/whats_new/v0.24.rst Outdated Show resolved Hide resolved
@@ -26,6 +26,7 @@

from scipy.stats import bernoulli, expon, uniform


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this extra return line

Copy link
Contributor

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 2 small fixes otherwise LGTM.

subrat93 and others added 2 commits August 28, 2020 19:38
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre glemaitre removed their request for review August 28, 2020 14:29
@glemaitre
Copy link
Contributor

@thomasjpfan @adrinjalali Do you want to have a look?

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @subrat93 !

doc/whats_new/v0.24.rst Outdated Show resolved Hide resolved
@@ -1750,6 +1750,43 @@ def get_n_splits(self, *args, **kw):
ridge.fit(X[:train_size], y[:train_size])


@pytest.mark.parametrize("return_train_score", [False, True])
def test_gridsearchcv_raise_warning_with_non_finite_score(return_train_score):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also check for RandomSearchCV here as well:

@pytest.mark.parametrize("return_train_score", [False, True])
@pytest.mark.parametrize(
    "SearchCV, specialized_params",
    [(GridSearchCV, {"param_grid": {"max_depth": [2, 3]}}),
     (RandomizedSearchCV,
      {"param_distributions": {"max_depth": [2, 3]}, "n_iter": 2})])
def test_searchcv_raise_warning_with_non_finite_score(
        SearchCV, specialized_params, return_train_score):
	...
    grid = SearchCV(
        DecisionTreeClassifier(),
        scoring=FailingScorer(),
        cv=3,
        return_train_score=return_train_score,
        **specialized_params
    )

sklearn/model_selection/tests/test_search.py Outdated Show resolved Hide resolved
subrat93 and others added 4 commits August 29, 2020 12:09
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thank you @subrat93 !

@thomasjpfan thomasjpfan changed the title ENH raise warning when getting non-finite value during scoring in GridSearchCV ENH Raises warning when getting non-finite score in SearchCV Aug 29, 2020
@thomasjpfan thomasjpfan merged commit 192c233 into scikit-learn:master Aug 29, 2020
7 checks passed
@glemaitre
Copy link
Contributor

Thanks @subrat93

@subrat93
Copy link
Contributor Author

Thanks a lot @glemaitre and @thomasjpfan for your support!! I look forward to contributing more.

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
…learn#18266)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

inf or -inf values in CV ruin the mean and std
3 participants