Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Updated documentation for nan_policy=propagate as per #9947 #9948

Closed
wants to merge 6 commits into from
Closed

DOC: Updated documentation for nan_policy=propagate as per #9947 #9948

wants to merge 6 commits into from

Conversation

Sedosa
Copy link

@Sedosa Sedosa commented Mar 16, 2019

Updated documentation for nan_policy='propagate' as per #9947 and added an example for stats.mode to make behaviour clear.

scipy/stats/stats.py Outdated Show resolved Hide resolved
@pvanmulbregt pvanmulbregt added the Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org label Mar 18, 2019
@Kai-Striega
Copy link
Member

Kai-Striega commented Mar 18, 2019

@Sedosa thanks for taking the time to make a contribution. There are only a few nitpicks which stand out to me:

  1. "'propagate' performs calculations keeping nan values" -> "'propagate' attempts to perform the calculations keeping nan values". The function may encounter error(s) due to nan values, and is therefore not guaranteed to actually perform the calculations.
  2. "Defines how to handle when input contains nan" -> "Defines how to handle when the input contains nan". I know this wasn't part of the initial issue, but it would be nice to be grammatically correct.

@rgommers would you mind taking a look at this? The issue occurs that nan_policy='propagate' does not behave as documented in stats.mode() (and maybe some other functions). See #9815. Where the proposed solution was:

I think the behaviour of the function is fine: for me, propagate means to keep NaN values, then perform the calculation and see what happens. Often this results in NaN (e.g. if the calculation involves adding the elements), but in the case of mode, every NaN is simply counted as a single occurence (of course, this needs to be clearly stated).

This PR changes the existing documentation of propagate from "returns nan" to follow the above definition and adds an example for mode when nan_policy='propagate' is used.

@rgommers
Copy link
Member

This PR changes the existing documentation of propagate from "returns nan" to follow the above definition and adds an example for mode when nan_policy='propagate' is used.

sounds good to me

[ 1., 2., 2., nan],
[ 1., 1., 1., nan]])
>>> stats.mode(a, nan_policy='propagate')
ModeResult(mode=array([[1., 1., 2., 3.]]), count=array([[3, 2, 2, 1]]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this seems wrong? given the discussion, should be [1., 1., 2., nan]?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood the discussion to imply the absence of nan in the returned array was what warranted the example..?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still ongoing in gh-9815 it seems, let's continue there

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an example with a 1d vector is easier to understand to illustrate the behaviour.

scipy/stats/stats.py Outdated Show resolved Hide resolved
@Kai-Striega
Copy link
Member

@rgommers should we consider merging this? The other discussions seem to have stalled and this documents the current behaviour better than the existing documentation. The only real improvement I can see would be to change the example to be 1d.

@chrisb83
Copy link
Member

this documents the current behaviour better than the existing documentation.

I agree

@rgommers
Copy link
Member

@rgommers should we consider merging this? The other discussions seem to have stalled and this documents the current behaviour better than the existing documentation.

I'd rather finish the discussion, because if we merge this and leave it that effectively decides the discussion by explicitly documenting behavior (so we shouldn't change it later without deprecation because it's a bug).

Could both of you indicate on the issue what your preferred behavior is?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants