Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Describe signal.find_peaks and related functions behavior for NaNs #8705

Merged
merged 2 commits into from
Apr 12, 2018

Conversation

lagru
Copy link
Contributor

@lagru lagru commented Apr 10, 2018

as well as a typo I noticed in THANKS.txt.

@lagru
Copy link
Contributor Author

lagru commented Apr 10, 2018

@larsoner These are only small changes to the docstring and I it would be nice to have this included in 1.1.0.

@lagru
Copy link
Contributor Author

lagru commented Apr 11, 2018

The second commit addresses the behavior of the functions if NaNs are present in the signal. This came up in #8708. It might be worth discussing whether this needs to be addressed further. It might be worth it to raise a warning message if any NaNs are encountered (a candidate for 1.2?).

@lagru lagru changed the title DOC: MAINT: Correct formatting & a few typos in signal.find_peaks DOC: Describe signal.find_peaks, .peak_prominences behavior for NaNs Apr 11, 2018
@larsoner
Copy link
Member

larsoner commented Apr 11, 2018 via email

@lagru
Copy link
Contributor Author

lagru commented Apr 11, 2018

I don't think we need to warn so long as it's documented clearly in the docstring. Maybe put it first in the notes, and somewhere in an example? -- @larsoner

Yeah, I was debating that with myself as well. I'll change it. For the future, is there a general rule of thumb when to use a warning or a simple note? .. note:: could be used here too.

@lagru lagru force-pushed the find-peaks branch 2 times, most recently from f97dbe6 to ade2b61 Compare April 11, 2018 12:50
@lagru lagru changed the title DOC: Describe signal.find_peaks, .peak_prominences behavior for NaNs DOC: Describe signal.find_peaks and related functions behavior for NaNs Apr 11, 2018
@larsoner
Copy link
Member

+1 for .. note (or if you want, .. warning even).

I don't know of any rule of thumb. You don't want to warn too much or too little. To me this would fall into the "too much" category but this is just my subjective judgment based on expected use cases.

@lagru
Copy link
Contributor Author

lagru commented Apr 11, 2018

I decided not to use the note-directive here... it doesn't seem to be a standard in SciPy and this felt more natural.

I'll have a look and try to come up with an elegant way to deal with NaNs. Otherwise I'll add warning messages if NaNs are found in the input. In the meantime I'm hesitant to introduce any in-code changes here as I don't want to block this from being potentially merged before 1.1 branches. So that would be another PR.

@larsoner
Copy link
Member

I'll have a look and try to come up with an elegant way to deal with NaNs. Otherwise I'll add warning messages if NaNs are found in the input.

I am arguing against such a warning so long as the behavior of NaN is properly documented. Are you saying we should have a warning if NaN is in the input?

@larsoner
Copy link
Member

... and actually the tests should probably be updated to test that NaN is handled how we say it will be handled.

@lagru
Copy link
Contributor Author

lagru commented Apr 11, 2018

Are you saying we should have a warning if NaN is in the input? -- @larsoner

Yes, unless I think of another way to deal with NaNs without completely rewriting the algorithms.

... and actually the tests should probably be updated to test that NaN is handled how we say it will be handled. -- @larsoner

I agree although I don't see it as a priority because in most cases the documentation states to "expect the unexpected" if NaNs are contained in the input. I hope nobody tries to use that as a feature. 😉
If its okay I'll think of meaningful tests in another PR in the next few days.

@larsoner
Copy link
Member

Since the comparisons always evaluate to False, isn't nan behavior then the same as (assuming all other points are finite) setting the nan points to inf but never giving those indices as peak locations?

@lagru
Copy link
Contributor Author

lagru commented Apr 11, 2018

Not really. If we compare a finite number to nan it is both "not smaller", "not larger" and "not equal" at the same time. That means depending on the comparator chosen at different points in the algorithm nan is treated as +inf or -inf. Replacing with either one of those values would change the behavior.

In some cases this might actually be useful but your proposed case we don't gain anything. If we replace with +inf, we find new peaks at those positions. If never return those inf-peaks we don't gain anything compared to just leaving them nan because the neighbouring samples still can't be peaks.

(Or did I misunderstand your suggested solution?)

@larsoner
Copy link
Member

I'm not proposing a solution -- I'm trying to describe the behavior of the current algorithm for the user when their input has NaNs. If it can be distilled to something equivalent that is simple to explain to the user then we can just clearly document this behavior rather than emitting a warning. So something like "Locations with the value NaN will be treated like np.inf when evaluating whether immediate neighbors are peaks, but will never be returned as peak locations themselves" would work, assuming it's correct.

It might not be possible to distill the other functions like prominence the same way, though.

@lagru
Copy link
Contributor Author

lagru commented Apr 11, 2018

Well in case you didn't see it, I think the documentation for find_peaks already states almost what you want:

Samples bordering NaNs will never be considered peaks. This is true as
well for flat peaks. To avoid this, NaNs should either be removed from `x`
or replaced with (interpolated) values.

@lagru
Copy link
Contributor Author

lagru commented Apr 11, 2018

It might not be possible to distill the other functions like prominence the same way, though.

Correct, those are more complicated to judge. That's why my warning in those cases was a bit more general.

@larsoner
Copy link
Member

Ahh okay. In that case, can you make a PR to add warnings for those other functions? I agree it makes sense to have them. It would be good to have for 1.1.0 if possible. Adding them, then using pytest.warns context manager in the tests should be enough for now. This should still be forward-compatible with a future nan_policy-type argument that could try to do something smart(er).

@lagru
Copy link
Contributor Author

lagru commented Apr 11, 2018

Ahh okay. In that case, can you make a PR to add warnings for those other functions?

I'll try. But if today is the deadline I have a feeling there won't be enough time left to properly review & iterate...

@lagru
Copy link
Contributor Author

lagru commented Apr 11, 2018

@larsoner On another note... do you consider this ready to be merged today?

@larsoner
Copy link
Member

I can rapidly iterate today if you can :)

But more seriously 1) I don't think it should be too difficult or require too much iteration since it's warning emission, and 2) I'd also consider it a form bug fix which means that tomorrow isn't a hard deadline (can backport).

@WarrenWeckesser
Copy link
Member

WarrenWeckesser commented Apr 11, 2018

The updated docstring says

The algorithm treats NaNs the same way as a window border and stops
searching for an intersection possibly leading to unwanted results.

Is this the API that you designed, or is it an accident of the implementation (i.e. the code was not written with NaNs in mind, but if there are NaNs in the data, that is how it happens to be treated)? If the latter, it might be safer to simply state in the docstring that "the behavior of the function with data containing NaNs is undefined". That way there is no commitment to an API. Then if explicit handling of NaN is added later, there is no need to worry about backwards compatbility.

@larsoner
Copy link
Member

If the latter, it might be safer to simply state in the docstring that "the behavior of the function with data containing NaNs is undefined". That way there is no commitment to an API. Then if explicit handling of NaN is added later, there is no need to worry about backwards compatbility.

This makes sense to me. I was probably over-engineering a bit before. I think we can just say this in all the functions for now.

@WarrenWeckesser any opinion on .. note:: vs .. warning:: vs no-directive for these statements?

@lagru
Copy link
Contributor Author

lagru commented Apr 11, 2018

If the latter, [...]

Yes, I didn't have NaNs in mind when writing those functions so its just behavior resulting from the implementation.

[...] it might be safer to simply state in the docstring that "the behavior of the function with data containing NaNs is undefined".

I agree, here.

@WarrenWeckesser
Copy link
Member

WarrenWeckesser commented Apr 11, 2018

No directive is probably fine, but that's not a strong opinion.

@lagru
Copy link
Contributor Author

lagru commented Apr 11, 2018

To avoid having to wait for CI to build the doc, here the images with .. warning::
find_peaks, peak_prominences, peak_widths.

@larsoner
Copy link
Member

Looks good to me

The behavior of these functions is undefined for data containing NaN.
This should at least be clearly documented with a warning in the
docstrings. In the future support for NaNs might be added.
@pv pv merged commit 209c33a into scipy:master Apr 12, 2018
@lagru lagru deleted the find-peaks branch April 12, 2018 18:39
@pv pv added this to the 1.1.0 milestone Apr 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants