New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Added nan_policy to circ functions in stats
#10710
Conversation
- Added nan_policy as an optional keyword argument to circvar, circstd, circmean, and _circfuncs_common. - Updated docstrings for altered functions. - Added comments to _circfuncs_common. - Following the example of median_test, added NaN handling to samples in _circfuncs_common.
Added unit tests for the new optional keyword argument in the circular functions.
Added name and description of contribution to THANKS.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See brief associated mailing list discussion with @chrisb83 here: https://mail.python.org/pipermail/scipy-dev/2019-August/023684.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @aburrell, thanks for this. Adding nan_policy
is a good idea and this PR largely looks good. There's a couple of changes needed though.
At the request of a reviewer, changed the general optional keyword arguements to a specific nan_policy keyword arguement.
Since the nan_policy keyword is now a parameter, we don't need to test for unknown keyword arguement input.
The nan_policy option 'omit' would not work when the axis keyword was not None. Fixed this by: - having circfuncs_common return separate angle arrays for the sine and cosine sums. - masking NaN the sine and cosine arrays with angles whose sine or cosine, respectively, equal zero.
Added the `pytest.mark.parametrize` decorator to appropriate unit tests in the TEstCircFuncs class.
Added unit tests for each of the different nan_policy flags with arrays of data that contain NaN. Tests behavior when axis flag is None, 0, and 1.
Fixed bugs that occured when using arrays containing np.nan and outputting the circular statistics along a specified axis. - Changed _circfuncs_common to also output the NaN mask. - Added check to see if all the data along a specified axis are NaN. - Changed method of calculating mean in circstd and circvar if the data is masked.
The failures are all because of warnings:
In CI we change the default behavior for warnings to result in test errors, because we want to prevent introducing new warnings when running the SciPy test suite. You should also be seeing warnings locally. If you silence the warnings by putting the relevant lines within a |
Changed processing order to prevent RuntimeWarnings in circmean, circvar, and circstd. - Changed masking in circmean to not try and evaluate inequality with NaN - Fixed logic in circstd and circvar masking to avoid division by zero
Fixed several instances of inappropriate whitespace: - No whitespace around an operator. - Too much whitespace around an operator. - Wrong number of spaces causing bad indentation.
It wasn't particularly difficult to fix the code to avoid introducing new RuntimeWarnings. I think it's ready for another review now. |
Implemented reviewer suggestions to improve the clarity of the code: - Used np.asarray to reduce instances of dimension checking. - Replaced np.copyto with masking reassignment. - Added comments to clarify the axis-checking portion of the circmean routine
Simplified the NaN axis evaluation in circmean by using shape instead of sum.
Implemented reviewer simplification suggestions and fixed a bug introduced in the last commit: - Fixed bug in circmean that was triggered when axis=None. - Moved sin/cos calculation to _circfuncs_common. - Replaced copyto with a simple mask in _circfuncs_common.
Fixed bug that output a single NaN when an array was expected in circmean.
Added a unit test for a multi-dimensional all NaN input when the 'omit' flag is used, ensuring that the output is also an array of the expected dimensions.
Removed the backslash within brackets that was causing pyflakes to fail in for python 3.6.
All my comments were addressed and this now looks good to me. @pv's comments also seem addressed, but I don't have the bandwidth to re-review everything in detail right now, so I'll hold off on doing the actual merge (will do so before 1.4.0 if no one else beats me to it). |
All looks good, time to merge this. Thanks @aburrell! |
Thank you @rgommers, @tylerjereddy, and @pv! This was a good experience contributing code, and I appreciate the time you all put in to improve my contribution to scipy. |
Thanks @aburrell, glad it was a good experience! Will try to get the next PR merged faster:) |
Adds nan_policy as an optional keyword argument into the circular stat functions:
stats.circmean
,stats.circstd
, andstats.circvar
. NaN handling for all of these routines happens withinstats.morestats._circfuncs_common
.What does this implement/fix?
This makes the circular statistical functions NaN tolerant, using the same method (user specified NaN policy of propagation, omission, or raising an error) as other functions within
stats
.Additional information
Here is an example:
Both return 0.2