Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Return nan in RadiusNeighborsRegressor for empty neighbor set #9655

Merged
merged 25 commits into from Nov 21, 2017
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
32 changes: 26 additions & 6 deletions sklearn/neighbors/regression.py
Expand Up @@ -5,8 +5,12 @@
# Alexandre Gramfort <alexandre.gramfort@inria.fr>
# Sparseness support by Lars Buitinck
# Multi-output support by Arnaud Joly <a.joly@ulg.ac.be>
# Empty radius support by Andreas Bjerre-Nielsen
#
# License: BSD 3 clause (C) INRIA, University of Amsterdam
# License: BSD 3 clause (C) INRIA, University of Amsterdam,
# University of Copenhagen

import warnings

import numpy as np

Expand Down Expand Up @@ -274,7 +278,7 @@ def predict(self, X):

Returns
-------
y : array of int, shape = [n_samples] or [n_samples, n_outputs]
y : array of float, shape = [n_samples] or [n_samples, n_outputs]
Target values
"""
X = check_array(X, accept_sparse='csr')
Expand All @@ -288,13 +292,29 @@ def predict(self, X):
_y = _y.reshape((-1, 1))

if weights is None:
y_pred = np.array([np.mean(_y[ind, :], axis=0)
for ind in neigh_ind])

with warnings.catch_warnings():
warnings.filterwarnings('ignore', "Mean of empty slice")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. It seems errstate won't cover this.

warnings.filterwarnings('ignore', ( "invalid value "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

                warnings.filterwarnings('ignore',
                                        "invalid value encountered in divide")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this before you fixed it.

"encountered "
"in true_divide"))

y_pred = np.array([np.mean(_y[ind, :], axis=0)
for ind in neigh_ind])
else:
y_pred = np.array([(np.average(_y[ind, :], axis=0,
weights=weights[i]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just add if len(ind) else np.nan here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, great idea - I've changed this now.

Copy link
Contributor Author

@abjer abjer Aug 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, on second thought I guess that it should match the second axis of _y? I have made a commit where it is inserting np.full(_y.shape[1], np.nan) under the else clause.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right. Is it tested?

Copy link
Contributor Author

@abjer abjer Aug 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've tested it - works for one or more _y variables. Should I document these tests?

empty_obs = np.full_like(_y[0], np.nan)
y_pred = np.array([np.average(_y[ind, :], axis=0,
weights=weights[i])
if len(ind) else empty_obs
for (i, ind) in enumerate(neigh_ind)])

if np.max(np.isnan(y_pred)):
empty_warning_msg = ("One or more samples have no neighbors "
"within specified radius; predicting NaN.")

warnings.warn(empty_warning_msg)


if self._y.ndim == 1:
y_pred = y_pred.ravel()

Expand Down
12 changes: 12 additions & 0 deletions sklearn/neighbors/tests/test_neighbors.py
Expand Up @@ -20,6 +20,7 @@
from sklearn.utils.testing import assert_raises
from sklearn.utils.testing import assert_true
from sklearn.utils.testing import assert_warns
from sklearn.utils.testing import assert_warns_message
from sklearn.utils.testing import ignore_warnings
from sklearn.utils.validation import check_random_state

Expand Down Expand Up @@ -658,6 +659,17 @@ def test_radius_neighbors_regressor(n_samples=40,
y_pred = neigh.predict(X[:n_test_pts] + epsilon)
assert_true(np.all(abs(y_pred - y_target) < radius / 2))

# test fix for issue #9654
# test that nan is returned when no nearby observations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you put the test here, similar test will execute 12 times(4 algorithm X 3 weight), but seems that the change is actually unrelated with algorithm? I think a seperate test(e.g., test_radius_neighbors_regressor_nan_output) with 2 cases (uniform weight and distance weight) might be enough.

X_test_nan = np.ones((1, n_features))*-1

empty_warning_msg = ("One or more samples have no neighbors "
"within specified radius; predicting NaN.")
pred = assert_warns_message(UserWarning, empty_warning_msg,
neigh.predict, X_test_nan)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abjer Please only leave two blank lines between functions. This flake8 error will not be detected for some reason but we generally don't want to introduce more flake8 errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have now recommitted.

assert_true(np.all(np.isnan(pred)))


def test_RadiusNeighborsRegressor_multioutput_with_uniform_weight():
# Test radius neighbors in multi-output regression (uniform weight)
Expand Down