Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Add developer guidelines for implementing the nan_policy parameter. #12069

Merged
merged 6 commits into from Nov 25, 2020

Conversation

WarrenWeckesser
Copy link
Member

This is the first of several developer guidelines that I am preparing. In this one, I have tried to document how nan_policy is intended to be used. It will be helpful to have an explicit design specification when reviewing PRs that add the nan_policy parameter to new functions.

@WarrenWeckesser WarrenWeckesser added the Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org label May 9, 2020
@WarrenWeckesser
Copy link
Member Author

Following Ralf's suggestion on the mailing list, I added a commit that moves the page about ufuncs in special to the API Dev. Guide section.

Copy link
Member

@rlucas7 rlucas7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WarrenWeckesser I left a couple comments/questions for spots where (I think) things could be clarified.

Let me know if unclear. Thanks for writing, super helpful to have these quidelines.
I'd also be keen to know the topics of the other, future, guidelines.

Comment on lines +56 to +57
* ``nan_policy='raise'``:
Raise a ``ValueError``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worthwhile to recommend here in guidelines to use the _contains_nan():

def _contains_nan(a, nan_policy='propagate'):

and allow that method to handle the error or should the public function be returning the error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Getting back to this after a long delay...)

I see that as an implementation detail, and not really part of the API specification. We might rewrite, rename, or relocate that function in the future, and that should not change the API spec. And for some functions, we might not even use that function. Also, I think of this document as useful for both developers and users, and we don't need to subject the users to those implementation details here.

Having said that, it might be useful to add a section (in this doc, or perhaps a separate one) about the current recommended best practices for implementing the API, with the understanding that the implementation details are subject to change. To minimize the additional review needed for this PR, I think we can defer that to a later PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think of this document as useful for both developers and users, and we don't need to subject the users to those implementation details here.

This makes more sense, I was reading this as guidance for those of us maintaining/implementing functions that support nan_policy and not necessarily for users.

Having said that, it might be useful to add a section (in this doc, or perhaps a separate one) about the current recommended best practices for implementing the API, with the understanding that the implementation details are subject to change. To minimize the additional review needed for this PR, I think we can defer that to a later PR.

Agreed, for several reasons, let's avoid implementation details here.

Comment on lines 60 to 61
straightforward specification, but occasionally there are subtleties
to be considered--see below.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe

there are subtleties to be considered when using `nan_policy='propagate'` in conjunction with `axis` argument--see below

or are there other subleties as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are others as well. The issue reported in #7818 is not related to handling an axis argument. But see my longer comment below related to this...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment, this seems fairly complicated and likely to require some case by case judgment calls.

doc/source/dev/api-dev/nan_policy.rst Outdated Show resolved Hide resolved
Comment on lines 146 to 148
* It may result in warnings that, for the user, are unexpected. E.g. "Why
are you warning me about invalid values in double scalars when I have
already told you to propagate the ``nan`` values?"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guideline written here doesn't seem like it gives clear guidance.
It would be nice to have a clear stance on handling warnings in this case, e.g. should they generally be ignored, allowed to be raised, or determined on a case-by-case basis?

Later in the paragraph the sentence:

Generally, whether ``'propagate'`` needs to be handled as a special case will be decided on a case-by-case basis.

So we reserve the right to consider as special case on a case-by-case basis but isn't this always the case?
I'd think that this section would be more helpful by specifying in general what one should do, and treat the special cases as such, exceptions to the general case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this should provide clearer guidance. Or maybe no guidance--perhaps my attempt to discuss the subtleties in this document is premature and can be removed.

The motivating example--indeed, the only example at the moment--is the issue reported in #7818. The uniform filter is just a sliding windowed mean of the data. There are at least three ways such a calculation could be implemented, and each one propagates nan differently. In the "naive" implementation, the nans in the output are limited to a short section with length equal to the window length. For the implementation in ndimage.uniform_filter, the nan in the output propagates from the first occurrence of the input nan to the end of the array. If the filter was calculated using the FFT, all the output values would be nan. Mathematically (and in the absence of nan), all three implementations are equivalent--and a user shouldn't have to care about how that calculation is actually implemented. But when we consider propagating a nan, the implementation details can become part of the API. For the case of the uniform filter, do we want to promise that nan will propagate according to the naive implementation (as the user in that issue expected)? This might mean checking for nan in advance, or checking for it during the computation, and switching to the naive implementation if a nan is encountered. Or do we not bother trying to promise exactly how nans will propagate, and instead state that the behavior should not be relied on because it is implementation-dependent and might change if the underlying implementation changes? I'm not sure at the moment, which is why I think maybe I should take this out of the document.

Regarding warnings: I've been trying to cook up an example or two where I can say "Here's a naive implementation of the 'propagate' behavior for a function that generates an undesirable warning", but I haven't had much luck. All my simple examples fail to generate a warning, and I don't recall if I originally had an example that motivated me to write this. So yet another reason to perhaps just take this section out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this should provide clearer guidance. Or maybe no guidance--perhaps my attempt to discuss the subtleties in this document is premature and can be removed.

The motivating example
... [eliding full quoting of a long section]
I'm not sure at the moment, which is why I think maybe I should take this out of the document.

I'm in agreement on taking this part out of the document, if examples come up later we can always add this language, or some updated version of it, into the document.

Regarding warnings: I've been trying to cook up an example or two where I can say "Here's a naive implementation of the 'propagate' behavior for a function that generates an undesirable warning", but I haven't had much luck. All my simple examples fail to generate a warning, and I don't recall if I originally had an example that motivated me to write this. So yet another reason to perhaps just take this section out.

Hmm, I can't really think of any obvious example either so maybe is best to just take it out for now.

doc/source/dev/api-dev/nan_policy.rst Show resolved Hide resolved
@rgommers
Copy link
Member

rgommers commented Nov 8, 2020

@WarrenWeckesser a ping on this. Would be nice to get this in, it's almost done and very useful documentation.

Copy link
Member

@rlucas7 rlucas7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change @WarrenWeckesser I'm marking resolved.

I'll leave this one open until Sunday November 15th to allows others to comment.

If no comments by around 10am EST, I'll merge.

EDIT: the failures in travis + ci are unrelated

Comment on lines 149 to 154
* How a ``nan`` propagates will be implementation dependent, and in some
cases may surprise a user. A concrete example is reported in GitHub
issue https://github.com/scipy/scipy/issues/7818, where a user was
surprised that a ``nan`` propagated all the way to the end of an array,
rather than being limited to outputs for which a naive implementation
would only be affected by inputs within the specified window.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second part of the section seems clear though.

Comment on lines 143 to 148
execute the code in the function without checking for ``nan``, and let the
chips fall where they may. There are two potential problems with this:

* It may result in warnings that, for the user, are unexpected. E.g. "Why
are you warning me about invalid values in double scalars when I have
already told you to propagate the ``nan`` values?"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
execute the code in the function without checking for ``nan``, and let the
chips fall where they may. There are two potential problems with this:
* It may result in warnings that, for the user, are unexpected. E.g. "Why
are you warning me about invalid values in double scalars when I have
already told you to propagate the ``nan`` values?"
execute the code in the function without checking for ``nan``.
There are potential problems with this approach.
For example:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WarrenWeckesser I left a suggestion here to cut this part of the section given we both seem to agree it's not very clear and we can always clarify and add later.

@rlucas7
Copy link
Member

rlucas7 commented Nov 15, 2020

I'll leave this one open until Sunday November 15th to allows others to comment.
If no comments by around 10am EST, I'll merge.

I'll leave this open for another few days to give @WarrenWeckesser time to commit the suggestion and/or any other updates, I'll aim to merge Tuesday Nov 17th in the afternoon.

@WarrenWeckesser
Copy link
Member Author

WarrenWeckesser commented Nov 17, 2020

@rlucas7, thanks for the review. I'll update the PR this evening.

Copy link
Member

@rlucas7 rlucas7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look good to me thanks @WarrenWeckesser
I'll leave these up for another day or so and Merge Sunday the 22nd November unless further comments.

@rlucas7
Copy link
Member

rlucas7 commented Nov 25, 2020

This is a doc only change PR and all the travis builds are green so merging.
@WarrenWeckesser note sure if this makes sense as a mention in the release notes under other changes:

https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.6.0#other-changes

?

@rlucas7 rlucas7 merged commit d756d73 into scipy:master Nov 25, 2020
@tylerjereddy tylerjereddy added this to the 1.6.0 milestone Nov 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants