Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Added nan_policy to circ functions in stats #10710

Merged
merged 18 commits into from Oct 20, 2019

Conversation

aburrell
Copy link
Contributor

@aburrell aburrell commented Aug 23, 2019

Adds nan_policy as an optional keyword argument into the circular stat functions: stats.circmean, stats.circstd, and stats.circvar. NaN handling for all of these routines happens within stats.morestats._circfuncs_common.

What does this implement/fix?

This makes the circular statistical functions NaN tolerant, using the same method (user specified NaN policy of propagation, omission, or raising an error) as other functions within stats.

Additional information

Here is an example:

import numpy as np
from scipy import stats

stats.circmean([0.1, 2*np.pi+0.2, 6*np.pi+0.3])
stats.circmean([0.1, 2*np.pi+0.2, 6*np.pi+0.3, np.nan], nan_policy='omit')

Both return 0.2

- Added nan_policy as an optional keyword
   argument to circvar, circstd, circmean, and
   _circfuncs_common.
- Updated docstrings for altered functions.
- Added comments to _circfuncs_common.
- Following the example of median_test, added NaN
   handling to samples in _circfuncs_common.
Added unit tests for the new optional keyword
argument in the circular functions.
Added name and description of contribution to THANKS.txt
@tylerjereddy tylerjereddy added scipy.stats enhancement A new feature or improvement labels Aug 23, 2019
Copy link
Contributor

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See brief associated mailing list discussion with @chrisb83 here: https://mail.python.org/pipermail/scipy-dev/2019-August/023684.html

scipy/stats/tests/test_morestats.py Outdated Show resolved Hide resolved
Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aburrell, thanks for this. Adding nan_policy is a good idea and this PR largely looks good. There's a couple of changes needed though.

scipy/stats/morestats.py Outdated Show resolved Hide resolved
scipy/stats/morestats.py Outdated Show resolved Hide resolved
scipy/stats/morestats.py Outdated Show resolved Hide resolved
aburrell and others added 7 commits August 26, 2019 07:27
At the request of a reviewer, changed the general optional keyword arguements to a specific nan_policy keyword arguement.
Since the nan_policy keyword is now a parameter, we don't need to test for unknown keyword arguement input.
The nan_policy option 'omit' would not work when the axis
keyword was not None.  Fixed this by:
- having circfuncs_common return separate angle arrays for the
  sine and cosine sums.
- masking NaN the sine and cosine arrays with angles whose sine
  or cosine, respectively, equal zero.
Added the `pytest.mark.parametrize` decorator to appropriate unit tests in the TEstCircFuncs class.
Added unit tests for each of the different nan_policy flags with
arrays of data that contain NaN.  Tests behavior when axis flag is
None, 0, and 1.
Fixed bugs that occured when using arrays containing np.nan and
outputting the circular statistics along a specified axis.
- Changed _circfuncs_common to also output the NaN mask.
- Added check to see if all the data along a specified axis are NaN.
- Changed method of calculating mean in circstd and circvar if
   the data is masked.
@rgommers
Copy link
Member

The failures are all because of warnings:

RuntimeWarning: invalid value encountered in less

In CI we change the default behavior for warnings to result in test errors, because we want to prevent introducing new warnings when running the SciPy test suite. You should also be seeing warnings locally. If you silence the warnings by putting the relevant lines within a with np.errstate(invalid="ignore"): context manager, then CI should pass. There's a lot of examples of that in the stats tests already.

Changed processing order to prevent RuntimeWarnings in circmean, circvar, and
circstd.
- Changed masking in circmean to not try and evaluate inequality with NaN
- Fixed logic in circstd and circvar masking to avoid division by zero
Fixed several instances of inappropriate whitespace:
- No whitespace around an operator.
- Too much whitespace around an operator.
- Wrong number of spaces causing bad indentation.
@aburrell
Copy link
Contributor Author

It wasn't particularly difficult to fix the code to avoid introducing new RuntimeWarnings. I think it's ready for another review now.

@rgommers rgommers added this to the 1.4.0 milestone Aug 28, 2019
scipy/stats/morestats.py Show resolved Hide resolved
scipy/stats/morestats.py Outdated Show resolved Hide resolved
scipy/stats/morestats.py Outdated Show resolved Hide resolved
scipy/stats/morestats.py Outdated Show resolved Hide resolved
scipy/stats/morestats.py Outdated Show resolved Hide resolved
Implemented reviewer suggestions to improve the clarity of the code:
- Used np.asarray to reduce instances of dimension checking.
- Replaced np.copyto with masking reassignment.
- Added comments to clarify the axis-checking portion of the circmean routine
Simplified the NaN axis evaluation in circmean by using shape instead of sum.
Implemented reviewer simplification suggestions and fixed a bug introduced in the last commit:
- Fixed bug in circmean that was triggered when axis=None.
- Moved sin/cos calculation to _circfuncs_common.
- Replaced copyto with a simple mask in _circfuncs_common.
Fixed bug that output a single NaN when an array was expected in circmean.
Added a unit test for a multi-dimensional all NaN input when the 'omit' flag is
used, ensuring that the output is also an array of the expected dimensions.
Removed the backslash within brackets that was causing pyflakes to
fail in for python 3.6.
@rgommers
Copy link
Member

rgommers commented Oct 2, 2019

All my comments were addressed and this now looks good to me. @pv's comments also seem addressed, but I don't have the bandwidth to re-review everything in detail right now, so I'll hold off on doing the actual merge (will do so before 1.4.0 if no one else beats me to it).

@rgommers rgommers merged commit 2c7c004 into scipy:master Oct 20, 2019
@rgommers
Copy link
Member

All looks good, time to merge this. Thanks @aburrell!

@aburrell
Copy link
Contributor Author

Thank you @rgommers, @tylerjereddy, and @pv! This was a good experience contributing code, and I appreciate the time you all put in to improve my contribution to scipy.

@rgommers
Copy link
Member

Thanks @aburrell, glad it was a good experience! Will try to get the next PR merged faster:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants