-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
detect when ties in ranks exist and appropriately warn users and fall back to approx #13690
Conversation
warnings.warn("Exact p-value calculation does not work if there are " | ||
"zeros. Switching to normal approximation.") | ||
|
||
if np.unique(abs(d)).size < d.size and mode == "exact": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something about our API that needs to be discussed more generally. @jreynolds01, are you on the SciPy-dev mailing list? A message recently went out about gh-13650. It would be great if you would express your opinion on the mailing list. I'm conflicted here. The default method is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @jreynolds01. I took a closer look, though, and I'm afraid there is either a misunderstanding about the test or the variable d
.
wilcoxon
is used to perform a paired-sample test. When only x
is provided, x
is the difference between a pair of samples, and d = x
. When two samples x
and y
are provided (where x[i]
is paired with y[i]
), d = x - y
. In either case, d==0
means that a pair of observations is tied.
So it is always accurate to detect when d==0
and give the error message "Exact p-value calculation does not work if there are ties", as in master. This PR would change that message to "Exact p-value calculation does not work if there are zeros". This would be true - although perhaps a little less clear about what the problem really is - when only x
is provided. But it would be confusing when x
and y
are provided, because it is not about either x
or y
having a zero element.
I don't think the new check It does change the null distribution of the test statistic.if np.unique(abs(d)).size < d.size
is appropriate, because the test has no problem when the difference d
between a pair of samples is not unique.
Also, there is no guarantee that switching to the normal approximation in case of ties will be any more accurate when the sample size is small. For this test (and others) Hollander and Wolfe "Nonparametric Statistical Methods" recommends "If there are ties among the N observations, use average ranks to compute W, and carry on as in [the case without ties]" (unless the sample is large, in which case it suggests the asymptotic approximation with tie correction). So for small samples there is precedent to use the same null distribution (computed without considering ties) even in the presence of ties. I believe this acceptable because ties tend to reduce the variance of the null distribution, so the returned p-values are conservative (more chance of type-II error, less chance of type I error).
@jreynolds01 perhaps the best thing to do here would be to actually consider ties in the calculation of the null distribution. Would you be interested in extending the method I implemented here to handle the case of ties?
This has merge conflicts since the move from |
Reference issue
Closes #13689
What does this implement/fix?
Fixes appropriate detection of ties and fall back to approx mode.
Additional information