New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Add developer guidelines for implementing the nan_policy parameter. #12069
Conversation
Following Ralf's suggestion on the mailing list, I added a commit that moves the page about ufuncs in |
8d87e01
to
348692d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WarrenWeckesser I left a couple comments/questions for spots where (I think) things could be clarified.
Let me know if unclear. Thanks for writing, super helpful to have these quidelines.
I'd also be keen to know the topics of the other, future, guidelines.
* ``nan_policy='raise'``: | ||
Raise a ``ValueError``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worthwhile to recommend here in guidelines to use the _contains_nan()
:
Line 215 in 2f32958
def _contains_nan(a, nan_policy='propagate'): |
and allow that method to handle the error or should the public function be returning the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Getting back to this after a long delay...)
I see that as an implementation detail, and not really part of the API specification. We might rewrite, rename, or relocate that function in the future, and that should not change the API spec. And for some functions, we might not even use that function. Also, I think of this document as useful for both developers and users, and we don't need to subject the users to those implementation details here.
Having said that, it might be useful to add a section (in this doc, or perhaps a separate one) about the current recommended best practices for implementing the API, with the understanding that the implementation details are subject to change. To minimize the additional review needed for this PR, I think we can defer that to a later PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think of this document as useful for both developers and users, and we don't need to subject the users to those implementation details here.
This makes more sense, I was reading this as guidance for those of us maintaining/implementing functions that support nan_policy
and not necessarily for users.
Having said that, it might be useful to add a section (in this doc, or perhaps a separate one) about the current recommended best practices for implementing the API, with the understanding that the implementation details are subject to change. To minimize the additional review needed for this PR, I think we can defer that to a later PR.
Agreed, for several reasons, let's avoid implementation details here.
straightforward specification, but occasionally there are subtleties | ||
to be considered--see below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe
there are subtleties to be considered when using `nan_policy='propagate'` in conjunction with `axis` argument--see below
or are there other subleties as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are others as well. The issue reported in #7818 is not related to handling an axis
argument. But see my longer comment below related to this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment, this seems fairly complicated and likely to require some case by case judgment calls.
* It may result in warnings that, for the user, are unexpected. E.g. "Why | ||
are you warning me about invalid values in double scalars when I have | ||
already told you to propagate the ``nan`` values?" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guideline written here doesn't seem like it gives clear guidance.
It would be nice to have a clear stance on handling warnings in this case, e.g. should they generally be ignored, allowed to be raised, or determined on a case-by-case basis?
Later in the paragraph the sentence:
Generally, whether ``'propagate'`` needs to be handled as a special case will be decided on a case-by-case basis.
So we reserve the right to consider as special case on a case-by-case basis but isn't this always the case?
I'd think that this section would be more helpful by specifying in general what one should do, and treat the special cases as such, exceptions to the general case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this should provide clearer guidance. Or maybe no guidance--perhaps my attempt to discuss the subtleties in this document is premature and can be removed.
The motivating example--indeed, the only example at the moment--is the issue reported in #7818. The uniform filter is just a sliding windowed mean of the data. There are at least three ways such a calculation could be implemented, and each one propagates nan
differently. In the "naive" implementation, the nan
s in the output are limited to a short section with length equal to the window length. For the implementation in ndimage.uniform_filter
, the nan
in the output propagates from the first occurrence of the input nan
to the end of the array. If the filter was calculated using the FFT, all the output values would be nan
. Mathematically (and in the absence of nan
), all three implementations are equivalent--and a user shouldn't have to care about how that calculation is actually implemented. But when we consider propagating a nan
, the implementation details can become part of the API. For the case of the uniform filter, do we want to promise that nan
will propagate according to the naive implementation (as the user in that issue expected)? This might mean checking for nan
in advance, or checking for it during the computation, and switching to the naive implementation if a nan
is encountered. Or do we not bother trying to promise exactly how nan
s will propagate, and instead state that the behavior should not be relied on because it is implementation-dependent and might change if the underlying implementation changes? I'm not sure at the moment, which is why I think maybe I should take this out of the document.
Regarding warnings: I've been trying to cook up an example or two where I can say "Here's a naive implementation of the 'propagate' behavior for a function that generates an undesirable warning", but I haven't had much luck. All my simple examples fail to generate a warning, and I don't recall if I originally had an example that motivated me to write this. So yet another reason to perhaps just take this section out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this should provide clearer guidance. Or maybe no guidance--perhaps my attempt to discuss the subtleties in this document is premature and can be removed.
The motivating example
... [eliding full quoting of a long section]
I'm not sure at the moment, which is why I think maybe I should take this out of the document.
I'm in agreement on taking this part out of the document, if examples come up later we can always add this language, or some updated version of it, into the document.
Regarding warnings: I've been trying to cook up an example or two where I can say "Here's a naive implementation of the 'propagate' behavior for a function that generates an undesirable warning", but I haven't had much luck. All my simple examples fail to generate a warning, and I don't recall if I originally had an example that motivated me to write this. So yet another reason to perhaps just take this section out.
Hmm, I can't really think of any obvious example either so maybe is best to just take it out for now.
@WarrenWeckesser a ping on this. Would be nice to get this in, it's almost done and very useful documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change @WarrenWeckesser I'm marking resolved.
I'll leave this one open until Sunday November 15th to allows others to comment.
If no comments by around 10am EST, I'll merge.
EDIT: the failures in travis + ci are unrelated
* How a ``nan`` propagates will be implementation dependent, and in some | ||
cases may surprise a user. A concrete example is reported in GitHub | ||
issue https://github.com/scipy/scipy/issues/7818, where a user was | ||
surprised that a ``nan`` propagated all the way to the end of an array, | ||
rather than being limited to outputs for which a naive implementation | ||
would only be affected by inputs within the specified window. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This second part of the section seems clear though.
execute the code in the function without checking for ``nan``, and let the | ||
chips fall where they may. There are two potential problems with this: | ||
|
||
* It may result in warnings that, for the user, are unexpected. E.g. "Why | ||
are you warning me about invalid values in double scalars when I have | ||
already told you to propagate the ``nan`` values?" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
execute the code in the function without checking for ``nan``, and let the | |
chips fall where they may. There are two potential problems with this: | |
* It may result in warnings that, for the user, are unexpected. E.g. "Why | |
are you warning me about invalid values in double scalars when I have | |
already told you to propagate the ``nan`` values?" | |
execute the code in the function without checking for ``nan``. | |
There are potential problems with this approach. | |
For example: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WarrenWeckesser I left a suggestion here to cut this part of the section given we both seem to agree it's not very clear and we can always clarify and add later.
I'll leave this open for another few days to give @WarrenWeckesser time to commit the suggestion and/or any other updates, I'll aim to merge Tuesday Nov 17th in the afternoon. |
@rlucas7, thanks for the review. I'll update the PR this evening. |
…ate'. Also fix a couple typos and copyedit a bit.
…ate'. Also fix a couple typos and copyedit a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These look good to me thanks @WarrenWeckesser
I'll leave these up for another day or so and Merge Sunday the 22nd November unless further comments.
This is a doc only change PR and all the travis builds are green so merging. https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.6.0#other-changes ? |
This is the first of several developer guidelines that I am preparing. In this one, I have tried to document how
nan_policy
is intended to be used. It will be helpful to have an explicit design specification when reviewing PRs that add thenan_policy
parameter to new functions.