DOC: Describe signal.find_peaks and related functions behavior for NaNs #8705

lagru · 2018-04-10T21:01:47Z

as well as a typo I noticed in THANKS.txt.

lagru · 2018-04-10T21:08:06Z

@larsoner These are only small changes to the docstring and I it would be nice to have this included in 1.1.0.

lagru · 2018-04-11T10:47:00Z

The second commit addresses the behavior of the functions if NaNs are present in the signal. This came up in #8708. It might be worth discussing whether this needs to be addressed further. It might be worth it to raise a warning message if any NaNs are encountered (a candidate for 1.2?).

larsoner · 2018-04-11T12:16:21Z

I don't think we need to warn so long as it's documented clearly in the docstring. Maybe put it first in the notes, and somewhere in an example?

…

lagru · 2018-04-11T12:22:30Z

I don't think we need to warn so long as it's documented clearly in the docstring. Maybe put it first in the notes, and somewhere in an example? -- @larsoner

Yeah, I was debating that with myself as well. I'll change it. For the future, is there a general rule of thumb when to use a warning or a simple note? .. note:: could be used here too.

larsoner · 2018-04-11T13:12:28Z

+1 for .. note (or if you want, .. warning even).

I don't know of any rule of thumb. You don't want to warn too much or too little. To me this would fall into the "too much" category but this is just my subjective judgment based on expected use cases.

lagru · 2018-04-11T15:12:26Z

I decided not to use the note-directive here... it doesn't seem to be a standard in SciPy and this felt more natural.

I'll have a look and try to come up with an elegant way to deal with NaNs. Otherwise I'll add warning messages if NaNs are found in the input. In the meantime I'm hesitant to introduce any in-code changes here as I don't want to block this from being potentially merged before 1.1 branches. So that would be another PR.

larsoner · 2018-04-11T15:33:22Z

I'll have a look and try to come up with an elegant way to deal with NaNs. Otherwise I'll add warning messages if NaNs are found in the input.

I am arguing against such a warning so long as the behavior of NaN is properly documented. Are you saying we should have a warning if NaN is in the input?

larsoner · 2018-04-11T15:33:55Z

... and actually the tests should probably be updated to test that NaN is handled how we say it will be handled.

lagru · 2018-04-11T15:56:38Z

Are you saying we should have a warning if NaN is in the input? -- @larsoner

Yes, unless I think of another way to deal with NaNs without completely rewriting the algorithms.

... and actually the tests should probably be updated to test that NaN is handled how we say it will be handled. -- @larsoner

I agree although I don't see it as a priority because in most cases the documentation states to "expect the unexpected" if NaNs are contained in the input. I hope nobody tries to use that as a feature. 😉
If its okay I'll think of meaningful tests in another PR in the next few days.

larsoner · 2018-04-11T16:02:00Z

Since the comparisons always evaluate to False, isn't nan behavior then the same as (assuming all other points are finite) setting the nan points to inf but never giving those indices as peak locations?

lagru · 2018-04-11T16:12:08Z

Not really. If we compare a finite number to nan it is both "not smaller", "not larger" and "not equal" at the same time. That means depending on the comparator chosen at different points in the algorithm nan is treated as +inf or -inf. Replacing with either one of those values would change the behavior.

In some cases this might actually be useful but your proposed case we don't gain anything. If we replace with +inf, we find new peaks at those positions. If never return those inf-peaks we don't gain anything compared to just leaving them nan because the neighbouring samples still can't be peaks.

(Or did I misunderstand your suggested solution?)

larsoner · 2018-04-11T16:18:27Z

I'm not proposing a solution -- I'm trying to describe the behavior of the current algorithm for the user when their input has NaNs. If it can be distilled to something equivalent that is simple to explain to the user then we can just clearly document this behavior rather than emitting a warning. So something like "Locations with the value NaN will be treated like np.inf when evaluating whether immediate neighbors are peaks, but will never be returned as peak locations themselves" would work, assuming it's correct.

It might not be possible to distill the other functions like prominence the same way, though.

lagru · 2018-04-11T16:22:34Z

Well in case you didn't see it, I think the documentation for find_peaks already states almost what you want:

Samples bordering NaNs will never be considered peaks. This is true as
well for flat peaks. To avoid this, NaNs should either be removed from `x`
or replaced with (interpolated) values.

lagru · 2018-04-11T16:25:09Z

It might not be possible to distill the other functions like prominence the same way, though.

Correct, those are more complicated to judge. That's why my warning in those cases was a bit more general.

larsoner · 2018-04-11T16:27:45Z

Ahh okay. In that case, can you make a PR to add warnings for those other functions? I agree it makes sense to have them. It would be good to have for 1.1.0 if possible. Adding them, then using pytest.warns context manager in the tests should be enough for now. This should still be forward-compatible with a future nan_policy-type argument that could try to do something smart(er).

lagru · 2018-04-11T18:59:09Z

Ahh okay. In that case, can you make a PR to add warnings for those other functions?

I'll try. But if today is the deadline I have a feeling there won't be enough time left to properly review & iterate...

lagru · 2018-04-11T19:00:24Z

@larsoner On another note... do you consider this ready to be merged today?

larsoner · 2018-04-11T19:02:58Z

I can rapidly iterate today if you can :)

But more seriously 1) I don't think it should be too difficult or require too much iteration since it's warning emission, and 2) I'd also consider it a form bug fix which means that tomorrow isn't a hard deadline (can backport).

WarrenWeckesser · 2018-04-11T20:31:42Z

The updated docstring says

The algorithm treats NaNs the same way as a window border and stops
searching for an intersection possibly leading to unwanted results.

Is this the API that you designed, or is it an accident of the implementation (i.e. the code was not written with NaNs in mind, but if there are NaNs in the data, that is how it happens to be treated)? If the latter, it might be safer to simply state in the docstring that "the behavior of the function with data containing NaNs is undefined". That way there is no commitment to an API. Then if explicit handling of NaN is added later, there is no need to worry about backwards compatbility.

larsoner · 2018-04-11T20:38:20Z

If the latter, it might be safer to simply state in the docstring that "the behavior of the function with data containing NaNs is undefined". That way there is no commitment to an API. Then if explicit handling of NaN is added later, there is no need to worry about backwards compatbility.

This makes sense to me. I was probably over-engineering a bit before. I think we can just say this in all the functions for now.

@WarrenWeckesser any opinion on .. note:: vs .. warning:: vs no-directive for these statements?

lagru · 2018-04-11T20:47:36Z

If the latter, [...]

Yes, I didn't have NaNs in mind when writing those functions so its just behavior resulting from the implementation.

[...] it might be safer to simply state in the docstring that "the behavior of the function with data containing NaNs is undefined".

I agree, here.

WarrenWeckesser · 2018-04-11T20:48:12Z

No directive is probably fine, but that's not a strong opinion.

lagru · 2018-04-11T20:55:46Z

To avoid having to wait for CI to build the doc, here the images with .. warning::
find_peaks, peak_prominences, peak_widths.

larsoner · 2018-04-11T20:59:31Z

Looks good to me

The behavior of these functions is undefined for data containing NaN. This should at least be clearly documented with a warning in the docstrings. In the future support for NaNs might be added.

DOC: MAINT: Correct formatting & a few typos in signal.find_peaks

2ba3b63

lagru changed the title ~~DOC: MAINT: Correct formatting & a few typos in signal.find_peaks~~ DOC: Describe signal.find_peaks, .peak_prominences behavior for NaNs Apr 11, 2018

lagru force-pushed the find-peaks branch 2 times, most recently from f97dbe6 to ade2b61 Compare April 11, 2018 12:50

lagru changed the title ~~DOC: Describe signal.find_peaks, .peak_prominences behavior for NaNs~~ DOC: Describe signal.find_peaks and related functions behavior for NaNs Apr 11, 2018

lagru mentioned this pull request Apr 11, 2018

MAINT: WIP: Add checks for NaN to peak finding functions #8709

Closed

lagru force-pushed the find-peaks branch from ade2b61 to e6c2c05 Compare April 11, 2018 21:02

DOC: Add note about behavior for NaNs

3a8f6c6

The behavior of these functions is undefined for data containing NaN. This should at least be clearly documented with a warning in the docstrings. In the future support for NaNs might be added.

lagru force-pushed the find-peaks branch from e6c2c05 to 3a8f6c6 Compare April 11, 2018 21:09

pv merged commit 209c33a into scipy:master Apr 12, 2018

lagru deleted the find-peaks branch April 12, 2018 18:39

pv added this to the 1.1.0 milestone Apr 12, 2018

lagru mentioned this pull request Aug 22, 2018

MAINT: Use warnings for unexpected peak properties #9126

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DOC: Describe signal.find_peaks and related functions behavior for NaNs #8705

DOC: Describe signal.find_peaks and related functions behavior for NaNs #8705

lagru commented Apr 10, 2018

lagru commented Apr 10, 2018 •

edited

Loading

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018 via email

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

larsoner commented Apr 11, 2018

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

lagru commented Apr 11, 2018

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

lagru commented Apr 11, 2018

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

WarrenWeckesser commented Apr 11, 2018 •

edited

Loading

larsoner commented Apr 11, 2018

lagru commented Apr 11, 2018 •

edited

Loading

WarrenWeckesser commented Apr 11, 2018 •

edited

Loading

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

DOC: Describe signal.find_peaks and related functions behavior for NaNs #8705

DOC: Describe signal.find_peaks and related functions behavior for NaNs #8705

Conversation

lagru commented Apr 10, 2018

lagru commented Apr 10, 2018 • edited Loading

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018 via email

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

larsoner commented Apr 11, 2018

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

lagru commented Apr 11, 2018

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

lagru commented Apr 11, 2018

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

WarrenWeckesser commented Apr 11, 2018 • edited Loading

larsoner commented Apr 11, 2018

lagru commented Apr 11, 2018 • edited Loading

WarrenWeckesser commented Apr 11, 2018 • edited Loading

lagru commented Apr 11, 2018

larsoner commented Apr 11, 2018

lagru commented Apr 10, 2018 •

edited

Loading

WarrenWeckesser commented Apr 11, 2018 •

edited

Loading

lagru commented Apr 11, 2018 •

edited

Loading

WarrenWeckesser commented Apr 11, 2018 •

edited

Loading