ENH Raises warning when getting non-finite score in SearchCV #18266

Merged 21 commits on Aug 29, 2020
Changes from 13 commits
11 changes: 8 additions & 3 deletions doc/whats_new/v0.24.rst
@@ -353,6 +353,11 @@ Changelog
all distributions are lists and `n_iter` is more than the number of unique
parameter combinations. :pr:`18222` by `Nicolas Hug`_.

- |Fix| A warning is now raised when one or more CV splits of
:class:`GridSearchCV` result in a non-finite score (``inf`` or ``-inf``).
:pr:`18266` by :user:`Subrat Sahu <subrat93>`,
:user:`Nirvan <Nirvan101>` and :user:`Arthur Book <ArthurBook>`.

:mod:`sklearn.multiclass`
.........................

@@ -364,7 +369,7 @@ Changelog
- |Enhancement| :class:`multiclass.OneVsOneClassifier` now accepts
inputs with missing values. Hence, estimators which can handle
missing values (e.g. a pipeline with an imputation step) can be used as
an estimator for multiclass wrappers.
:pr:`17987` by :user:`Venkatachalam N <venkyyuvy>`.

:mod:`sklearn.multioutput`
@@ -375,11 +380,11 @@
:pr:`18124` by :user:`Gus Brocchini <boldloop>` and
:user:`Amanda Dsouza <amy12xx>`.

- |Enhancement| :class:`multioutput.MultiOutputClassifier` and
:class:`multioutput.MultiOutputRegressor` now accept inputs
with missing values. Hence, estimators which can handle missing
values (e.g. a pipeline with an imputation step, HistGradientBoosting
estimators) can be used as an estimator for multioutput wrappers.
:pr:`17987` by :user:`Venkatachalam N <venkyyuvy>`.

:mod:`sklearn.naive_bayes`
9 changes: 9 additions & 0 deletions sklearn/model_selection/_search.py
@@ -863,6 +863,15 @@ def _store(key_name, array, weights=None, splits=False, rank=False):

array_means = np.average(array, axis=1, weights=weights)
results['mean_%s' % key_name] = array_means

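# Warn if any aggregated train/test mean score is NaN or +/-inf.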
if (key_name.startswith(("train_", "test_")) and
np.any(~np.isfinite(array_means))):
warnings.warn(
f"One or more of the {key_name.split('_')[0]} scores "
f"are non-finite: {array_means}",
category=UserWarning
)

# Weighted std is not directly available in numpy
array_stds = np.sqrt(np.average((array -
array_means[:, np.newaxis]) ** 2,
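For context, a minimal sketch (not part of this diff) of how the new warning surfaces to users, assuming a custom callable scorer that returns NaN on every split so the aggregated test scores become non-finite; the scorer name, toy data and refit=False choice below are illustrative only:

import warnings
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(random_state=0)

def nan_scorer(estimator, X, y):
    # Stand-in for a metric that is undefined on some data splits.
    return np.nan

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3]},
    scoring=nan_scorer,
    cv=3,
    refit=False,  # avoid refitting on a "best" candidate picked from NaN scores
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    grid.fit(X, y)

print([str(w.message) for w in caught if issubclass(w.category, UserWarning)])
# With this patch applied, the output should contain something like:
# "One or more of the test scores are non-finite: [nan nan]"
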
38 changes: 38 additions & 0 deletions sklearn/model_selection/tests/test_search.py
@@ -25,6 +25,8 @@
from sklearn.utils._mocking import CheckingClassifier, MockDataFrame

from scipy.stats import bernoulli, expon, uniform
from scipy.stats.distributions import norm


Review comment (Member): You don't need this extra return line.
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.base import clone
@@ -1750,6 +1752,42 @@ def get_n_splits(self, *args, **kw):
ridge.fit(X[:train_size], y[:train_size])


@pytest.mark.parametrize(
"return_train_score, expected_msgs",
[(False, ("One or more of the test scores are non-finite",)),
(True, ("One or more of the test scores are non-finite",
"One or more of the train scores are non-finite"))]
)
def test_gridsearchcv_raise_warning_with_non_finite_score(
return_train_score, expected_msgs):
# Non-regression test for:
# https://github.com/scikit-learn/scikit-learn/issues/10529
# Check that we raise a UserWarning when a non-finite score is
# computed in the GridSearchCV
X = norm(-1, 0.5).rvs(100, random_state=np.random.RandomState(28))
kernel = 'epanechnikov'
steps = 10
lower = 0.0194867441113
upper = 0.0974337205567
bandwidth_range = np.linspace(lower, upper, steps)
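# The epanechnikov kernel has compact support, so with bandwidths this small
# held-out points can fall outside every training kernel, giving a density of
# 0 and hence a -inf log-likelihood test score for that split.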
grid = GridSearchCV(
KernelDensity(kernel=kernel),
param_grid={'bandwidth': bandwidth_range},
cv=20,
return_train_score=return_train_score
)

with pytest.warns(UserWarning) as warnings:
grid.fit(X[:, np.newaxis])

warnings = ",".join(str(warning.message) for warning in warnings)
assert expected_msgs[0] in warnings

if return_train_score:
assert expected_msgs[1] in warnings


def test_callable_multimetric_confusion_matrix():
# Test callable with many metrics inserts the correct names and metrics
# into the search cv object