Skip to content

MAINT: stats.wilcoxon: improve documentation and tests #21867

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 28, 2024

Conversation

mdhaber
Copy link
Contributor

@mdhaber mdhaber commented Nov 11, 2024

Reference issue

gh-21567

What does this implement/fix?

gh-21567 made the following improvements to scipy.stats.wilcoxon:

  • Respect the method parameter if it is explicitly passed by the user. In particular, if method='exact' is specified, use it regardless of whether there are ties/zeros.
  • Improve 'auto' method by performing a permutation test if the sample size is small and there are ties or zeros.
  • Rename method='approx' to method='asymptotic' for consistency with closely related tests mannwhitneyu, kendalltau, and page_trend_test.

During the review process, I noticed some other items deserving of maintenance. This PR makes those adjustments; rationale is noted inline.

@github-actions github-actions bot added the maintenance Items related to regular maintenance tasks label Nov 11, 2024
Copy link
Contributor Author

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisb83 Could you take a look at these follow-up suggestions?

Comment on lines 3580 to 3582
- The default, ``method='auto'``, selects between the two: when
``len(d) <= 50`` and there are no zeros and no ties, the exact method
is used; if the sample size is small and there are zeros or ties, the
p-value is computed using `permutation_test`;
otherwise, the approximate method is used. The p-value computed by
the permutation test is deterministic since it is only used if the
sample size is small enough to iterate over all possible outcomes.
- The default, ``method='auto'``, selects between the two:
``method='exact'`` is used when ``len(d) <= 50``, and
``method='asymptotic'`` is used otherwise.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have not yet defined "ties" or "zeros". Above, it specifies

Assume that all elements of d are independent and identically distributed observations, and all are distinct and nonzero.

So let's document what happens under this assumption, then define ties and zeros and document how those affect the behavior.

Comment on lines -114 to -118
if n_zero > 0 and method == "exact":
warnings.warn("Exact p-value calculation does not work if there are "
"zeros. Consider using method `asymptotic` or "
"`stats.PermutationMethod`",
stacklevel=2)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted in #21567 (comment):

  • 'exact' is also not exact if there are ties, but there is no similar warning in that case.
  • Users have expressed confusion about what a "zero" is in the past.
  • The recommended normal approximation / method='asymptotic' is not exact, either, so this doesn't motivate why one would be better than the other. (A permuation test would also not be exact unless the sample size is small.)

It's tough to address all the subtleties of accurate hypothesis testing in a warning; that's why we've elaborated on this stuff in the documentation. And I'm concerned that incomplete warnings like this can do more harm than good: it may give users the impression that the code will help them avoid shooting themselves in the foot, but scipy.stats doesn't do that consistently. If we think a more consistent policy across the board is important, that is something we can consider, but for now, let's not give a false impression.

Comment on lines -120 to -121
if 0 < d.shape[-1] < 10 and method == "asymptotic":
warnings.warn("Sample size too small for normal approximation.", stacklevel=2)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here: does this imply that a sample size of 10 is large enough for a normal approximation?

Note that there wasn't actually a test for this warning before. It was always just filtered out, so presumably it was not deemed to be critical.

@@ -1471,7 +1471,7 @@ def test_wilcoxon_bad_arg(self):
assert_raises(ValueError, stats.wilcoxon, [1, 2], [1, 2], "dummy")
assert_raises(ValueError, stats.wilcoxon, [1, 2], [1, 2],
alternative="dummy")
assert_raises(ValueError, stats.wilcoxon, [1]*10, mode="xyz")
assert_raises(ValueError, stats.wilcoxon, [1]*10, method="xyz")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method is the new name for mode. I didn't change the tests when the change was made, but now seems like a good time to do it. In test_approx_mode below, I test that mode is still accepted.

w, p = stats.wilcoxon(d)
assert_equal((w, p), stats.wilcoxon(d, method="asymptotic"))

d[d == 0] = 1 # tie
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We test with a zero; we should test with a tie.

# n <= 50: if there are zeros in d = x-y, use PermutationMethod
# this is a slower test to show that results are deterministic if n=13
pm = stats.PermutationMethod()
d = np.arange(0, 13)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method='exact' gives the same result in this case. Let's test a case that allows us to distinguish between the two methods.

# `method='auto'`, there are zeros or ties in `d = x-y`, and `len(d) <= 13`.
d = np.arange(-5, 8) # zero
res = stats.wilcoxon(d)
ref = (27.5, 0.3955078125) # stats.wilcoxon(d, method=PermutationMethod())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Save some time. It's OK to hard-code values given by SciPy code given that the intent of the test is to check that the result is deterministic.

ref = (27.5, 0.3955078125) # stats.wilcoxon(d, method=PermutationMethod())
assert_equal(res, ref)

d[d == 0] = 1 # tie
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we need to check for ties, too.


# n <= 50: if there are zeros in d = x-y, use PermutationMethod
pm = stats.PermutationMethod()
d = np.arange(0, 5)
d = np.arange(-2, 5)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method='exact' gives the same result in this case. Let's test a case that allows us to distinguish between the two methods.

@mdhaber mdhaber requested a review from chrisb83 November 11, 2024 19:31
@mdhaber
Copy link
Contributor Author

mdhaber commented Nov 11, 2024

If the other tests pass, I'll fix the lint error and commit with [lint only].

@mdhaber
Copy link
Contributor Author

mdhaber commented Nov 11, 2024

Remaining doc failure seems unrelated, since it's about interpolate._interpnd.CloughTocher2DInterpolator.

@mdhaber
Copy link
Contributor Author

mdhaber commented Nov 18, 2024

@chrisb83 Would you be willing to take a look here before we branch? The adjustments are pretty straightforward, I think.

@chrisb83
Copy link
Member

The majority of the changes is straight-forward (improved documentation, simplification of some tests).

I am not sure how we deal with the removal of warnings.. For me, it is ok, though I could imagine that other people prefer to keep them. A warning is more visible to the user than the documentation of the limitations. So instead of removing the warning, one could also try to improve them.

@mdhaber
Copy link
Contributor Author

mdhaber commented Nov 19, 2024

So instead of removing the warning, one could also try to improve them.

Certainly one could. This is part of what I meant by "If we think a more consistent policy across the board is important, that is something we can consider". That is not the direction I chose to go with this PR because it is a much bigger project. Instead, I chose to make an incremental improvement by removing the incomplete and potentially misleading warnings. However, I would welcome someone work on better warnings throughout stats. I'd suggest that this person open an issue about it, recommending that we audit functions for pitfalls and warn according to a unified policy. In the meantime, removing these incomplete warnings makes it clearer to users and developers that we do not currently protect against inaccurate results due to poor choice of method.

@chrisb83
Copy link
Member

OK, let's remove the warnings:

  • if the user run the test with the default method="auto", then we aim to run the test in a way that avoids those warnings anyways
  • if the user choses a specific method, we can assume that this choice is made on purpose, so the existing incomplete / misleading warnings probably do not add much value (or they are just not needed, e.g., if someone wants to check how accurate the asymptotic method is for small samples)

@mdhaber mdhaber merged commit 1d57f31 into scipy:main Nov 28, 2024
37 checks passed
@j-bowhay j-bowhay added this to the 1.15.0 milestone Nov 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Items related to regular maintenance tasks scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants