
detect when ties in ranks exist and appropriately warn users and fall back to approx #13690

Closed · wants to merge 1 commit

Conversation

jreynolds01 commented:

Reference issue

Closes #13689

What does this implement/fix?

Adds detection of ties in the ranks, warns the user, and falls back to approx mode.

Additional information

jreynolds01 changed the title from "detect when ties in ranks exist and appropriately warn users." to "detect when ties in ranks exist and appropriately warn users and fall back to approx" on Mar 15, 2021
warnings.warn("Exact p-value calculation does not work if there are "
              "zeros. Switching to normal approximation.")

if np.unique(abs(d)).size < d.size and mode == "exact":
A contributor commented on this line:
Maybe obvious, but just a quick note that these two conditions are never true together in our test suite at the moment (see below from codecov). That doesn't mean changes are needed, but maybe useful to know for a stats reviewer when they take a look.

[codecov coverage screenshot]
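For reference, a minimal sketch (my own, not from the PR or its test suite) of input that would make both conditions hold at once, i.e. tied |d| values together with mode == "exact":

import numpy as np

d = np.array([1, 2, -2, 3, 5])   # hypothetical paired differences; |d| repeats the value 2
mode = "exact"

has_ties = np.unique(np.abs(d)).size < d.size
print(has_ties and mode == "exact")   # True: both conditions discussed above hold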

mdhaber commented Mar 20, 2021:

This is something about our API that needs to be discussed more generally. @jreynolds01, are you on the SciPy-dev mailing list? A message recently went out about gh-13650. It would be great if you would express your opinion on the mailing list.

I'm conflicted here. The default method is 'auto', and it switches to the asymptotic method when there are ties. That behavior is appropriate. If the user overrides the default by choosing 'exact', ignoring the documentation (which we should make more explicit) that the exact method does not adjust for ties, there is something to be said for allowing that - mostly because in some functions this will eliminate the need to check for ties when the exact method is selected.

mdhaber left a comment:

Thanks for the contribution @jreynolds01. I took a closer look, though, and I'm afraid there is either a misunderstanding about the test or the variable d.

wilcoxon is used to perform a paired-sample test. When only x is provided, x is the difference between a pair of samples, and d = x. When two samples x and y are provided (where x[i] is paired with y[i]), d = x - y. In either case, d==0 means that a pair of observations is tied.
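To make the pairing concrete, a small illustration (made-up numbers, not the scipy source) of how d is formed and what d==0 means:

import numpy as np

x = np.array([1.2, 2.5, 3.1, 4.0])
y = np.array([1.2, 2.0, 3.5, 3.0])

d = x - y               # with only x provided, d would simply be x
print(d)                # approximately [ 0.   0.5 -0.4  1. ]
print(np.any(d == 0))   # True: the first pair (1.2, 1.2) is tied, even though neither x nor y contains a zero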

So it is always accurate to detect when d==0 and give the error message "Exact p-value calculation does not work if there are ties", as in master. This PR would change that message to "Exact p-value calculation does not work if there are zeros". This would be true - although perhaps a little less clear about what the problem really is - when only x is provided. But it would be confusing when x and y are provided, because it is not about either x or y having a zero element.

I don't think the new check if np.unique(abs(d)).size < d.size is appropriate, because the test has no problem when the difference d between a pair of samples is not unique. It does change the null distribution of the test statistic.
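As a quick contrast between the two situations (illustrative numbers, not from the PR):

import numpy as np

d = np.array([1, 2, -2, 4])                  # no zeros, but |d| repeats the value 2
print(np.any(d == 0))                        # False: the d==0 check in master would not fire
print(np.unique(np.abs(d)).size < d.size)    # True:  the new check would switch to approx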

Also, there is no guarantee that switching to the normal approximation in case of ties will be any more accurate when the sample size is small. For this test (and others), Hollander and Wolfe's "Nonparametric Statistical Methods" recommends "If there are ties among the N observations, use average ranks to compute W, and carry on as in [the case without ties]" (unless the sample is large, in which case it suggests the asymptotic approximation with tie correction). So for small samples there is precedent to use the same null distribution (computed without considering ties) even in the presence of ties. I believe this is acceptable because ties tend to reduce the variance of the null distribution, so the returned p-values are conservative (more chance of type II error, less chance of type I error).
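For concreteness, a sketch of the average-rank (midrank) computation the quote suggests, using scipy.stats.rankdata on made-up differences:

import numpy as np
from scipy import stats

d = np.array([2, -1, 2, 3, -2, 2])     # paired differences with ties among |d|
r = stats.rankdata(np.abs(d))          # method='average' is the default, giving midranks
W = np.sum(r[d > 0])                   # W+: sum of the ranks of the positive differences
print(r)   # [3.5 1.  3.5 6.  3.5 3.5]
print(W)   # 16.5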

@jreynolds01 perhaps the best thing to do here would be to actually consider ties in the calculation of the null distribution. Would you be interested in extending the method I implemented here to handle the case of ties?

mdhaber commented Feb 20, 2022:

This has merge conflicts since the move from morestats.py to _morestats.py. To avoid additional conflicts with gh-13438, let's close this and consider this conversation over there. (It's a slight increase to the scope of gh-13438, but probably still easier that way than dealing with merge conflicts.)
Note that scipy.stats.permutation_test can calculate the exact distribution of the test statistic even when there are ties, but if someone is interested in implementing a more efficient algorithm for wilcoxon in the presence of ties, that would be the best solution. (E.g., see references linked from here).
Thanks @jreynolds01!
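As a rough sketch of the permutation_test route mentioned above (the data and the signed-rank statistic here are my own illustration, not part of scipy.stats.wilcoxon):

import numpy as np
from scipy import stats

# Made-up paired samples whose differences contain ties (and one zero).
x = np.array([3, 5, 8, 9, 12, 15])
y = np.array([1, 4, 8, 7, 14, 13])

def statistic(x, y):
    d = x - y
    r = stats.rankdata(np.abs(d))    # average ranks handle the tied |d| values
    return np.sum(np.sign(d) * r)    # signed-rank statistic

# permutation_type='samples' exchanges x[i] and y[i] within each pair;
# n_resamples=np.inf enumerates all 2**6 assignments, giving the exact null distribution.
res = stats.permutation_test((x, y), statistic, permutation_type='samples',
                             n_resamples=np.inf)
print(res.statistic, res.pvalue)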

Successfully merging this pull request may close these issues.

Wilcoxon does not appropriately detect ties when mode=exact.