New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Return nan in RadiusNeighborsRegressor for empty neighbor set #9655
Changes from 9 commits
191b4f7
15d492e
83a6d52
6221c8c
f8febd5
dc15b84
891e38b
a2b49aa
af791f0
d4547bf
780db9d
8decafa
1c619a4
74c9574
0cd5ef3
a1a000e
e4db7d5
bc2eae9
2781b25
3dc86b8
770db3f
266c9a5
a66b531
7c35e8c
b47f5bf
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,8 +5,12 @@ | |
# Alexandre Gramfort <alexandre.gramfort@inria.fr> | ||
# Sparseness support by Lars Buitinck | ||
# Multi-output support by Arnaud Joly <a.joly@ulg.ac.be> | ||
# Empty radius support by Andreas Bjerre-Nielsen | ||
# | ||
# License: BSD 3 clause (C) INRIA, University of Amsterdam | ||
# License: BSD 3 clause (C) INRIA, University of Amsterdam, | ||
# University of Copenhagen | ||
|
||
import warnings | ||
|
||
import numpy as np | ||
|
||
|
@@ -274,7 +278,7 @@ def predict(self, X): | |
|
||
Returns | ||
------- | ||
y : array of int, shape = [n_samples] or [n_samples, n_outputs] | ||
y : array of float, shape = [n_samples] or [n_samples, n_outputs] | ||
Target values | ||
""" | ||
X = check_array(X, accept_sparse='csr') | ||
|
@@ -288,13 +292,29 @@ def predict(self, X): | |
_y = _y.reshape((-1, 1)) | ||
|
||
if weights is None: | ||
y_pred = np.array([np.mean(_y[ind, :], axis=0) | ||
for ind in neigh_ind]) | ||
|
||
with warnings.catch_warnings(): | ||
warnings.filterwarnings('ignore', "Mean of empty slice") | ||
warnings.filterwarnings('ignore', ( "invalid value " | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. how about
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wrote this before you fixed it. |
||
"encountered " | ||
"in true_divide")) | ||
|
||
y_pred = np.array([np.mean(_y[ind, :], axis=0) | ||
for ind in neigh_ind]) | ||
else: | ||
y_pred = np.array([(np.average(_y[ind, :], axis=0, | ||
weights=weights[i])) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can't we just add There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, great idea - I've changed this now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, on second thought I guess that it should match the second axis of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that's right. Is it tested? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I've tested it - works for one or more _y variables. Should I document these tests? |
||
empty_obs = np.full_like(_y[0], np.nan) | ||
y_pred = np.array([np.average(_y[ind, :], axis=0, | ||
weights=weights[i]) | ||
if len(ind) else empty_obs | ||
for (i, ind) in enumerate(neigh_ind)]) | ||
|
||
if np.max(np.isnan(y_pred)): | ||
empty_warning_msg = ("One or more samples have no neighbors " | ||
"within specified radius; predicting NaN.") | ||
|
||
warnings.warn(empty_warning_msg) | ||
|
||
|
||
if self._y.ndim == 1: | ||
y_pred = y_pred.ravel() | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,6 +20,7 @@ | |
from sklearn.utils.testing import assert_raises | ||
from sklearn.utils.testing import assert_true | ||
from sklearn.utils.testing import assert_warns | ||
from sklearn.utils.testing import assert_warns_message | ||
from sklearn.utils.testing import ignore_warnings | ||
from sklearn.utils.validation import check_random_state | ||
|
||
|
@@ -658,6 +659,17 @@ def test_radius_neighbors_regressor(n_samples=40, | |
y_pred = neigh.predict(X[:n_test_pts] + epsilon) | ||
assert_true(np.all(abs(y_pred - y_target) < radius / 2)) | ||
|
||
# test fix for issue #9654 | ||
# test that nan is returned when no nearby observations | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you put the test here, similar test will execute 12 times(4 algorithm X 3 weight), but seems that the change is actually unrelated with algorithm? I think a seperate test(e.g., test_radius_neighbors_regressor_nan_output) with 2 cases (uniform weight and distance weight) might be enough. |
||
X_test_nan = np.ones((1, n_features))*-1 | ||
|
||
empty_warning_msg = ("One or more samples have no neighbors " | ||
"within specified radius; predicting NaN.") | ||
pred = assert_warns_message(UserWarning, empty_warning_msg, | ||
neigh.predict, X_test_nan) | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @abjer Please only leave two blank lines between functions. This flake8 error will not be detected for some reason but we generally don't want to introduce more flake8 errors. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, I have now recommitted. |
||
assert_true(np.all(np.isnan(pred))) | ||
|
||
|
||
def test_RadiusNeighborsRegressor_multioutput_with_uniform_weight(): | ||
# Test radius neighbors in multi-output regression (uniform weight) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. It seems errstate won't cover this.