Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Return nan in RadiusNeighborsRegressor for empty neighbor set #9655

Merged
merged 25 commits into from Nov 21, 2017

Conversation

abjer
Copy link
Contributor

@abjer abjer commented Aug 30, 2017

Reference Issue

Fixes #9654

What does this implement/fix? Explain your changes.

RadiusNeighborsRegressor is behaving differently when there are no neighbors for a sample between when weights are or aren't used. This PR fixes this inconsistency. This PR also fixes raised error when no available data points for RadiusNeighborRegression using non-uniform weights.

Fixes problem with raise error when no available data points for RadiusNeighborRegression using non-uniform weights.
@abjer abjer changed the title [WIP] Fix to issue 9654 [WIP] Return nan for RadiusNeighborsRegressor with non-uniform weights Aug 30, 2017
@@ -291,9 +291,28 @@ def predict(self, X):
y_pred = np.array([np.mean(_y[ind, :], axis=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this to be issuing a warning in this case? I think the tests should assert either that a warning is raised, or that none is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made a test now that asserts no warning is raised.

@@ -291,9 +291,28 @@ def predict(self, X):
y_pred = np.array([np.mean(_y[ind, :], axis=0)
for ind in neigh_ind])
else:
y_pred = np.array([(np.average(_y[ind, :], axis=0,
weights=weights[i]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just add if len(ind) else np.nan here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, great idea - I've changed this now.

Copy link
Contributor Author

@abjer abjer Aug 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, on second thought I guess that it should match the second axis of _y? I have made a commit where it is inserting np.full(_y.shape[1], np.nan) under the else clause.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right. Is it tested?

Copy link
Contributor Author

@abjer abjer Aug 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've tested it - works for one or more _y variables. Should I document these tests?

y_pred_alt_isnan = np.all(np.isnan(y_pred_alt))
raise_zero_div_error = False

except ZeroDivisionError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary. The test will fail if there is an error raised.

I was asking about handling the warning which numpy issues when performing np.mean([]). I'm not sure whether it's good or bad for this warning to be issued, but we should be consistent in the weighted case. warnings are tested with our assert_warns_message and assert_no_warnings helpers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, good point.

I looked into raising the warning - apparently warnings only arise when there are uniform weights. What do you think of the following solution?

# test fix for issue #9654
# test that nan is returned when no nearby observations        
X_test_nan = np.ones((1,n_features))*-1
if weights=='uniform':        
    assert_warns_message(RuntimeWarning, "Mean of empty slice.", 
                         neigh.predict, X_test_nan)

assert_true(np.all(np.isnan(neigh.predict(X_test_nan))))

@@ -291,9 +291,28 @@ def predict(self, X):
y_pred = np.array([np.mean(_y[ind, :], axis=0)
for ind in neigh_ind])
else:
y_pred = np.array([(np.average(_y[ind, :], axis=0,
weights=weights[i]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right. Is it tested?

@abjer
Copy link
Contributor Author

abjer commented Sep 8, 2017

@jnothman I think it is ready now for merge - do you agree?

@jnothman
Copy link
Member

jnothman commented Sep 9, 2017 via email

@abjer
Copy link
Contributor Author

abjer commented Sep 9, 2017

Alright, cheers!

@jnothman
Copy link
Member

@jnothman I think it is ready now for merge - do you agree?

You should change the title from WIP to MRG

@jnothman jnothman changed the title [WIP] Return nan for RadiusNeighborsRegressor with non-uniform weights [MRG] Return nan for RadiusNeighborsRegressor with non-uniform weights Sep 10, 2017
# test that nan is returned when no nearby observations
X_test_nan = np.ones((1, n_features))*-1
if weights == 'uniform':
assert_warns_message(RuntimeWarning, "Mean of empty slice.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we should issue a warning in all casess, not dependent on weights.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also get the output, i.e. pred = assert_warns_message(...) so that you can then do the isnan assertion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi again, thanks for the feedback! Just to be clear on what you want me to do: raise a warning for RadiusNeighborRegressor when it returns one or more nan values irrespective of using weights or not. I guess this should be consistent with and without weights. So should we copy the warning from np.mean (without weights) and issue it when y_pred contains nan rows (and there are weights)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose so. Either that, or we can be kinder to the user: hide the numpy error, and raise our own which is more explicit, with a message like "Some samples have no neighbors; predicting NaN."

Added new warning when prediction of RadiusNeighborRegression returns empty values. Also suppressed warning for numpy warnings.

Moreover, added copyright and fixed erroneous description of output.
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM, thanks!

Please add an entry to whats_new

Fixed more PEP8 and used @jnothman idea of using full_like
for ind in neigh_ind])

with warnings.catch_warnings():
warnings.filterwarnings('ignore', "Mean of empty slice")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. It seems errstate won't cover this.


with warnings.catch_warnings():
warnings.filterwarnings('ignore', "Mean of empty slice")
warnings.filterwarnings('ignore', ( "invalid value "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

                warnings.filterwarnings('ignore',
                                        "invalid value encountered in divide")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this before you fixed it.

@jnothman jnothman changed the title [MRG] Return nan for RadiusNeighborsRegressor with non-uniform weights [MRG+1] Return nan for RadiusNeighborsRegressor with non-uniform weights Sep 27, 2017
@jnothman
Copy link
Member

LGTM

@abjer
Copy link
Contributor Author

abjer commented Sep 27, 2017

Great :)

@jnothman
Copy link
Member

Please still add an entry in whats_new

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again!

:class:`neighbors.RadiusNeighborsRegressor` can handle empty neighbor set
when using non uniform weights. Also raises a new warning, irrespective of
weighting, when no neighbors are found for one or more samples. Moreover,
all numpy errors related to missing neighbors are suppressed. :issue:`9654`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually prefer the pull request number here, #9655.

- Fix bug so that the method ``predict`` for
:class:`neighbors.RadiusNeighborsRegressor` can handle empty neighbor set
when using non uniform weights. Also raises a new warning, irrespective of
weighting, when no neighbors are found for one or more samples. Moreover,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just "... samples, in place of unclear numpy errors."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, thanks

@abjer
Copy link
Contributor Author

abjer commented Oct 7, 2017

@jnothman do you need me to do anything more or is this ready now as it is?

@abjer
Copy link
Contributor Author

abjer commented Oct 18, 2017

@jnothman: good suggestion - I've pushed an update

@abjer
Copy link
Contributor Author

abjer commented Nov 16, 2017

@jnothman hi again joel, is there anything I can do to get a 2nd review? do you have some candidates for whom I should tag?

@jnothman
Copy link
Member

This is pretty straightforward, though a more thorough PR description might help a reviewer come on board. Key idea: RadiusNeighborsRegressor is behaving differently when there are no neighbors for a sample between when weights are or aren't used. Perhaps @qinhanmin2014, @lesteve...

@abjer
Copy link
Contributor Author

abjer commented Nov 16, 2017

Thanks a lot Joel - will follow those suggestions

@abjer
Copy link
Contributor Author

abjer commented Nov 16, 2017

Hi @qinhanmin2014 and @lesteve would you mind taking a look at this PR?

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still some concerns before my personal LGTM. Kindly forgive if some of them are naive.
@jnothman would be grateful if you can check these comments.

with warnings.catch_warnings():
warnings.filterwarnings('ignore', "Mean of empty slice")
warn_div = "invalid value encountered in true_divide"
warnings.filterwarnings('ignore', warn_div)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1)I don't think our code should change the default behaviour of other libraries. In this case, after user run RadiusNeighborsRegressor, he will not get any warning when he execute np.mean([]), it might be better to at least reset the warnings.
(2) invalid value encountered in true_divide seems a bit magical, could you please add some comments? Also, is there any specific reason you are blocking warnings here? It seems that you can use the same way as in the else clause. i.e.

empty_obs = np.full_like(_y[0], np.nan)
y_pred = np.array([np.mean(_y[ind, :], axis=0, weights=weights[i]) if len(ind) else empty_obs
                             for (i, ind) in enumerate(neigh_ind)])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The with catch_warnings context will reset when exited. But is there a reason we are not using numpy's own error manager?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The with catch_warnings context will reset when exited. But is there a reason we are not using numpy's own error manager?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason. I can implement the numpy warnings if you wish it to be changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jnothman Thanks a lot for the confirm. If you think blocking warnings here is the right way, I won't oppose. My main concern is the redundant tests for this edge case
@abjer If you don't have specific reason, please consider to address @jnothman's comment here. Also, please consider whether we should cut off some redundant tests.
Another thing @abjer could you please update the document to tell users what we are doing in this special case? I think it might not be so straightforward for ordinary users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abjer You need to solve the conflict before this PR can be merged (e.g., merge current master into this branch). Thanks a lot for your issue and PR and also for your great patience :)

@@ -658,6 +659,17 @@ def test_radius_neighbors_regressor(n_samples=40,
y_pred = neigh.predict(X[:n_test_pts] + epsilon)
assert_true(np.all(abs(y_pred - y_target) < radius / 2))

# test fix for issue #9654
# test that nan is returned when no nearby observations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you put the test here, similar test will execute 12 times(4 algorithm X 3 weight), but seems that the change is actually unrelated with algorithm? I think a seperate test(e.g., test_radius_neighbors_regressor_nan_output) with 2 cases (uniform weight and distance weight) might be enough.

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please fix the flake8 errors according to Travis log (See the bottom of https://travis-ci.org/scikit-learn/scikit-learn/jobs/304461869)? I think my LGTM is very close :)

y_pred = np.array([np.mean(_y[ind, :], axis=0)
for ind in neigh_ind])
y_pred = np.array([np.mean(_y[ind, :],
axis=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please consider to fill the current line before starting a new line, other places the same.


empty_warning_msg = ("One or more samples have no neighbors "
"within specified radius; predicting NaN.")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please consider to remove some unnecessary blank lines both in the code and in the test.

@@ -179,6 +179,14 @@ Metrics
:issue:`10093` by :user:`alexryndin <alexryndin>`
and :user:`Hanmin Qin <qinhanmin2014>`.

Neighbors

- Fix bug so that the method ``predict`` for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed a bug seems more consistent?

:class:`neighbors.RadiusNeighborsRegressor` can handle empty neighbor set
when using non uniform weights. Also raises a new warning, irrespective of
weighting, when no neighbors are found for samples, in place of unclear
numpy errors.. :issue:`9655`. By :user:`Andreas Bjerre-Nielsen <abjer>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate . (numpy errors..)

@@ -179,6 +179,14 @@ Metrics
:issue:`10093` by :user:`alexryndin <alexryndin>`
and :user:`Hanmin Qin <qinhanmin2014>`.

Neighbors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc does not look fine, please follow what others are doing (not sure but maybe remove some extra whitespace at the beginning?).

X_test_nan)
assert_true(np.all(np.isnan(pred)))


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abjer Please only leave two blank lines between functions. This flake8 error will not be detected for some reason but we generally don't want to introduce more flake8 errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have now recommitted.

@qinhanmin2014 qinhanmin2014 changed the title [MRG+1] Return nan for RadiusNeighborsRegressor with non-uniform weights [MRG+1] Return nan in RadiusNeighborsRegressor for empty neighbor set Nov 20, 2017
Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ping @jnothman
Latest change since your +1
(1) not block warnings, use similar way as another case instead.
(2)remove redundant tests, only keep two tests for the two cases.

@abjer
Copy link
Contributor Author

abjer commented Nov 20, 2017

@qinhanmin2014 thank you so much for the review and very constructive suggestions

@jnothman
Copy link
Member

Great!

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from what's new being unclear, this is good to merge


- Fixed a bug so ``predict`` in :class:`neighbors.RadiusNeighborsRegressor` can
handle empty neighbor set when using non uniform weights. Also raises a new
warning, irrespective of, when no neighbors are found for samples, in place of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Irrespective of" can be removed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what in place of duplicate means

- Fixed a bug so ``predict`` in :class:`neighbors.RadiusNeighborsRegressor` can
handle empty neighbor set when using non uniform weights. Also raises a new
warning, irrespective of, when no neighbors are found for samples, in place of
duplicate. (numpy errors..) :issue:`9655`. By :user:`Andreas Bjerre-Nielsen <abjer>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "numpy errors" here. Drop it

@qinhanmin2014
Copy link
Member

Apart from what's new being unclear, this is good to merge

Sorry that I somehow miss the what's new in the final check

@abjer I believe current what's new is just a misoperation? The previous what's new seems fine (except for the format I've pointed out). So please correct current what's new accordingly. I'll merge when CIs are green.

@abjer
Copy link
Contributor Author

abjer commented Nov 21, 2017

@jnothman thanks for pointing that out - I have submitted a new version. I removed the mention of duplicate errors as @qinhanmin2014 idea of computation fixed the issue. Thanks both of you for the big support in this PR.

@jnothman jnothman merged commit 8e1efb0 into scikit-learn:master Nov 21, 2017
@qinhanmin2014
Copy link
Member

Thanks @abjer for your contributions and your patience :)

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RadiusNeighborRegression error
3 participants