
BUG: adds warning to scipy.stats.brunnermunzel #15847

Merged: 5 commits merged into scipy:main on Mar 22, 2022
Conversation

@sg-s (Contributor) commented Mar 22, 2022

Reference issue

Closes #15843.

What does this implement/fix?

Emits a warning when scipy.stats.brunnermunzel cannot compute the p-value because both the numerator and denominator are 0, and the p-value is therefore nan. It also suggests an alternative (using distribution="normal").

This change does not alter any output; no computational code is changed. The only change is that a warning is emitted in some cases where the output p-value is nan.

It also adds two tests:

  1. test_brunnermunzel_return_nan tests that the correct warning is raised when using the t-distribution
  2. test_brunnermunzel_normal_dist tests that p=0 when using the normal distribution with the problematic dataset
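For illustration, a minimal sketch of the behavior being fixed, assuming that a pair of non-overlapping samples triggers the 0/0 case (the original dataset from gh-15843 is not reproduced in this thread):

```python
from scipy import stats

# Hypothetical non-overlapping samples: every value in y exceeds every value
# in x, so the variance terms vanish and the t degrees of freedom are 0/0.
x = [1, 2, 3]
y = [5, 6, 7, 8, 9]

# With the default t-distribution the p-value is nan; after this PR a
# RuntimeWarning is also emitted suggesting distribution="normal".
stat_t, p_t = stats.brunnermunzel(x, y, distribution="t")
print(p_t)  # nan

# The suggested alternative: the normal approximation gives p = 0 here,
# consistent with the large effect size (a numpy divide-by-zero warning
# may also appear with this data).
stat_n, p_n = stats.brunnermunzel(x, y, distribution="normal")
print(p_n)  # 0.0
```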

scipy.stats.brunnermunzel returns a p-value of nan
in some cases because the numerator and denominator
of the t-distribution's degrees-of-freedom calculation
are 0. It is not clear why nan is returned, especially
when it appears that p should be 0 (when effect sizes
are large). Now a warning is emitted when p is nan,
and an alternative is suggested.

Fixes scipy#15843
@mdhaber (Contributor) commented Mar 22, 2022

Thanks. If the suggestions above look OK to you, I'll push them, but there's one more thing needed before merge. Can you add your example as a test in scipy/stats/tests/test_stats.py class TestBrunnerMunzel?
Please add a comment about the purpose of the test, mention the original issue (gh-15843), and use with pytest.warns(RuntimeWarning, match='p-value cannot be estimated') to check that the warning is emitted as intended.
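A hedged sketch of what the requested test might look like; the arrays here are illustrative stand-ins (the actual example comes from gh-15843):

```python
import pytest
from scipy import stats


class TestBrunnerMunzel:
    def test_brunnermunzel_return_nan(self):
        # Check that a RuntimeWarning is emitted when the p-value cannot be
        # estimated with the t-distribution (see gh-15843).
        x = [1, 2, 3]  # hypothetical non-overlapping samples
        y = [5, 6, 7, 8, 9]
        with pytest.warns(RuntimeWarning, match='p-value cannot be estimated'):
            stats.brunnermunzel(x, y, distribution='t')
```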

mdhaber added the scipy.stats and maintenance labels on Mar 22, 2022
@sg-s (Contributor, Author) commented Mar 22, 2022

@mdhaber will do, thanks for the review and changes!

sg-s and others added 3 commits on March 22, 2022 (two co-authored by Matt Haberland <mhaberla@calpoly.edu>); the new tests check that the warning is emitted, and that p=0 when the normal distribution is used.
@sg-s (Contributor, Author) commented Mar 22, 2022

@mdhaber thanks for the suggestions. I added your changes, and also added two tests which check for the warning and for the correct p when using a normal distribution. I verified that both tests pass locally.

@mdhaber (Contributor) commented Mar 22, 2022

Thanks! I'd suggest finding an option in your IDE that removes trailing spaces when it saves. Helps avoid PEP8 issues.
I fixed those and added "(see gh-15843)" to the test descriptions. Fingers crossed that CI comes back green! If so, I'll merge.

@sg-s (Contributor, Author) commented Mar 22, 2022

Thank you! FYI, I had to disable my autoformatter (black) because it changed everything else in the file.

Also, the imports are in a non-PEP8 order in some of the modules, so I had to disable isort too.

@mdhaber (Contributor) commented Mar 22, 2022

> thank you! fyi i had to disable my autoformatter

Thank you for doing that!

@mdhaber (Contributor) left a review comment

I'll merge when tests finish.

@mdhaber (Contributor) commented Mar 22, 2022

Thanks for the bug report and fix @sg-s! The Main Windows Python310 CI job seems to have hung, as it frequently does, so I'll merge now.
Just curious: what are you using brunnermunzel for, and how did you run across the issue?

mdhaber merged commit 78ff9f1 into scipy:main on Mar 22, 2022
@sg-s (Contributor, Author) commented Mar 22, 2022

@mdhaber I have a dataset with relatively small n (3-10) where in some cases there was clearly a difference by eye, but t-tests were saying there wasn't one. It turns out my data isn't normal and is also heteroskedastic, and what I should have done is a permutation test (which is what I used eventually). But brunnermunzel was one of the things I tried because it seemed to have weaker assumptions.
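For context, a minimal sketch of the kind of permutation test described here, using scipy.stats.permutation_test; the data and the choice of statistic are illustrative, not the actual dataset:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)

# Illustrative small samples (n in the 3-10 range mentioned above).
x = np.array([2.1, 2.6, 2.4, 3.0])
y = np.array([5.2, 4.8, 6.1])

def statistic(x, y, axis):
    # Difference of sample means; the statistic of interest may differ.
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)

# Independent-sample permutation test: pools x and y and reassigns labels.
# With samples this small, scipy enumerates all distinct partitions, so the
# test is exact.
res = stats.permutation_test((x, y), statistic, vectorized=True,
                             permutation_type='independent',
                             alternative='two-sided',
                             random_state=rng)
print(res.statistic, res.pvalue)
```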

@mdhaber (Contributor) commented Mar 22, 2022

Great! Did you use permutation_test to do the permutation test? (I'll fix that issue with nans shortly.) Always good to know how the code is being used. And if you think another test would be good to add, let us know. Thanks!

@sg-s (Contributor, Author) commented Mar 23, 2022

yup, ended up using permutation_test

mdhaber added this to the 1.9.0 milestone on Jun 6, 2022
Labels: maintenance (Items related to regular maintenance tasks), scipy.stats

Successfully merging this pull request may close these issues:

BUG: scipy.stats.brunnermunzel incorrectly returns nan for undocumented reason