Update the Mann-Whitney-Wilcoxon test #4933

Open
wants to merge 18 commits into
from

Projects

None yet

8 participants

@ahwagner
ahwagner commented Jun 2, 2015

Includes an exact test for small sample sizes, updated documentation, and user-configurable alternative hypotheses beyond the reported p-value is the one-sided hypothesis of the existing test. Default behavior is now a two-sided test, breaking backward compatibility.

@argriffing
Contributor

The backwards compatibility is a problem right? If someone using this function upgrades scipy then the interpretation of their statistical tests will silently change?

@josef-pkt
Member

I think the best way would be to give the new function a new name and deprecate the old one.
Then I would also return a Bunch or something like that so we can return more attributes.

@argriffing
Contributor

The implementation has some R comments... R has more of a GPL culture and if any of this was translated from GPL code it's probably not allowed into scipy.

Edit: for some reason I thought zprob looked like the name of an R function

@phobson phobson and 2 others commented on an outdated diff Jun 3, 2015
scipy/stats/stats.py
@@ -4218,7 +4218,8 @@ def ks_2samp(data1, data2):
return Ks_2sampResult(d, prob)
-def mannwhitneyu(x, y, use_continuity=True):
+def mannwhitneyu(x, y, use_continuity=True, use_exact=None,
@phobson
phobson Jun 3, 2015

Should use_exact default to False instead of None?

@argriffing
argriffing Jun 3, 2015 Contributor

It looks like None acts like 'auto' -- the exactness will be determined by data size.

@ahwagner
ahwagner Jun 3, 2015

None is supposed to act like 'auto'... renaming for clarity is not a problem. False has a distinct meaning, as in override the default behavior to use the approximation.

@ahwagner
ahwagner commented Jun 3, 2015

@argriffing Backwards compatibility is definitely a problem, but default behavior really should be the two-sided test. I like the idea @josef-pkt suggested--we could rename to mannwhitneywilcoxon instead and deprecate mannwhitneyu. This is my first contribution to scipy--are there any guidelines/procedures/examples I should follow for deprecating a function?

All code added to this function was designed fresh, based on the procedures outlined here and here, not translated from existing sources.

@argriffing
Contributor

This is my first contribution to scipy--are there any guidelines/procedures/examples I should follow for deprecating a function?

It's a decorator, like this:

import numpy as np
...
@np.deprecate(new_name="mannwhitneywilcoxon")
def mannwhitneyu(...):
    ...

I don't see it in the online docs, but that decorator is defined here.

@argriffing
Contributor

The CI is not liking super with no args on Python 2, and it is nitpicking the inline comment formatting https://travis-ci.org/scipy/scipy/jobs/65170719#L137.

@josef-pkt
Member

One more thought, if we use this opportunity for "house cleaning" also with respect to the related GSOC:

5, 6 years ago when I got started (I had no idea about these tests) there were several discussions and tests how good these related hypothesis tests are and the proposal to merge those into a new function. There should be several related tickets/issues.
I don't remember the names of the 3 functions that are almost the same.

BTW: It's good to see both the small sample distribution and the alternative option for this.

@argriffing
Contributor

Also the factorial check is slow for large n.

In [1]: from math import factorial

In [2]: import numpy as np

In [3]: from scipy.special import betaln

In [4]: n1 = 10

In [5]: n2 = 100000

In [6]: np.log(float(factorial(n1 + n2) / factorial(n1) / factorial(n2)))
Out[6]: 100.02539205737777

In [7]: -np.log(n1 + n2 + 1) - betaln(n1 + 1, n2 + 1)
Out[7]: 100.02539205743172

In [8]: %timeit factorial(n1 + n2) / factorial(n1) / factorial(n2)
1 loops, best of 3: 5.85 s per loop

In [8]: %timeit ( -np.log(n1 + n2 + 1) - betaln(n1 + 1, n2 + 1) )
100000 loops, best of 3: 3.56 µs per loop
@ahwagner ahwagner and 1 other commented on an outdated diff Jun 3, 2015
scipy/stats/stats.py
"""
x = asarray(x)
y = asarray(y)
n1 = len(x)
n2 = len(y)
- ranked = rankdata(np.concatenate((x, y)))
- rankx = ranked[0:n1] # get the x-ranks
- u1 = n1*n2 + (n1*(n1+1))/2.0 - np.sum(rankx, axis=0) # calc U for x
- u2 = n1*n2 - u1 # remainder is U for y
- bigu = max(u1, u2)
- smallu = min(u1, u2)
+ Stats = namedtuple('Stats', ['u', 'p'])
+ if use_exact == 'auto':
+ use_exact = (n1 < 10 or n2 < 10) and n1 + n2 < 100000 \
@ahwagner
ahwagner Jun 3, 2015

This line limits the factorial calculation to a worst-case scenario of 100000! / (99999! * 1!), which can take up to a few seconds.

@argriffing
argriffing Jun 3, 2015 Contributor

You can check it a million times faster with tricks like #4933 (comment).

@ahwagner
ahwagner Jun 3, 2015

This is a pretty neat trick, thanks for pointing it out. I'll add it in with the other changes.

@ahwagner
ahwagner commented Jun 3, 2015

@josef-pkt I found the comment you were referring to and rather like the idea of merging the ranksums and mannwhitneyu tests, as I believe they are synonymous. The third test, kruskal, is different enough that it should be separate IMO. Also, it looks like the related wilcoxon signed-rank test used to be a function, and indeed is still on the intro comments of stats.py. The signed-rank test would be good to merge into the current function, and used when paired=True.

What do you guys think?

@rgommers
Member
rgommers commented Jun 3, 2015

Note that this overlaps with gh-4899. That also has a few more tests. And maybe the comments at the bottom of gh-2118 still apply (I didn't check in detail)?

@rgommers
Member
rgommers commented Jun 3, 2015

Is the deprecation and new function really needed? Is there something really wrong with the current code, or is the only reason wanting to change the default to alternative='two-sided'?

@ahwagner
ahwagner commented Jun 3, 2015

@rgommers the current code is actually really wrong. It currently does a two-sided test, and then reports a halved p-value as a "one-sided" test result. This is (obviously) not the same as a one-sided p-value.

@ahwagner
ahwagner commented Jun 3, 2015

@rgommers thanks for noting that this addresses the issue in #4899 . The comments in #2118 discuss which U value to report, and the behavior of my code is consistent with the reporting behavior of the wilcoxon method of the R stats package (v 3.2.0). I like this method of reporting the U statistic for its consistency with R, and u1 and u2 can be directly inferred from the input vectors and p-value.

@ahwagner ahwagner Restore previous mannwhitneyu test with deprecate warning.
Deprecate ranksums (subsumed by wilcoxon).
Rename revised mannwhitneyu to wilcoxon.
Change namedtuple to bunch output.
Revise wilcoxon tests to match new output.
5d26f78
@ahwagner
ahwagner commented Jun 3, 2015

Not wanting to revise my commit message, the following changes were also included in this update:

  • logbeta trick for quick check of possible combinations
  • fixed super for py2 compatibility
@argriffing
Contributor

logbeta trick

If you want to keep the same behavior, the check should be like < np.log(100000) instead of < 100.

@ahwagner
ahwagner commented Jun 4, 2015

I think that the current code contains the improvements and fixes suggested above, and doesn't break any backwards compatibility. Thanks @argriffing @josef-pkt @rgommers for the feedback.

@aeklant
Contributor
aeklant commented Jun 4, 2015

Hi @ahwagner,

I got a little late to the party, but it seems a concensus has been reached to keep this new function and deprecate mannwhitneyu and ranksums in which case a note should be added to doc/release/0.17.0-notes.rst (see gh-4914 for an example) and also inform it to the mailing list, so that anyone who doesn't happen to be following this PR can get a chance to speak their mind if they have something to say before merging.

@ahwagner
ahwagner commented Jun 4, 2015

Thanks, @aeklant. I've added in a note in doc/release/0.17.0-notes.rst. I also just posted this notice to the mailing list.

@ahwagner

@argriffing @josef-pkt @rgommers I believe that the current commit is ready for review and merge--all issues mentioned here and in the mailing list have been addressed.

@argriffing
Contributor

I'll defer to the stats GSoC project about merging this.

@aeklant
Contributor
aeklant commented Jun 13, 2015

There is another open PR, gh-4882 that was meant to deal with this. I know @rgommers had some doubts about the deprecation of mannwhitneyu and ranksums in favor of a new function but I didn't see any objections being raised in the mailing list.

Personally, I think this should be merged because:

  • It fixes a so far faulty implementation.
  • It returns bunches which has been discussed for a while now as the way to go with new functions (see gh-3665) and this looks like a good way to make the transition.
  • We can still merge a last update to mannwhitneyu as was planned. In the future it can even be changed to delegate to mww if the users find the name change confusing.

I would like to hear what the rest of you think. @rgommers @josef-pkt @argriffing

@ev-br ev-br and 2 others commented on an outdated diff Jun 13, 2015
scipy/stats/stats.py
+ """
+ Computes two-sample unpaired Mann-Whitney-Wilcoxon tests.
+
+ Parameters
+ ----------
+ x : array_like
+ The first set of measurements.
+ y : array_like
+ The second set of measurements.
+ correction : bool, optional
+ If True, apply continuity correction by adjusting the Wilcoxon rank
+ statistic by 0.5 towards the mean value when computing the z-statistic.
+ Default is False.
+ exact : bool, optional
+ Whether an exact test should be performed on the data. See notes for
+ default behavior.
@ev-br
ev-br Jun 13, 2015 Member

Bool type seems inconsistent with the default value 'auto'

@ahwagner
ahwagner Jun 15, 2015

I agree. The user arguments input here should be boolean--but the default value was changed to 'auto' instead of None for intuitiveness of function (see above). Perhaps this is a good time to have a broader discussion about SciPy convention and style regarding this issue? @rgommers @phobson

@rgommers
rgommers Jun 15, 2015 Member

Just accepting one of True, False, 'auto' seems OK to me. Then the type definition can be tweaked a little to address @ev-br's comment: exact : {True, False, 'auto'}, optional

@ev-br ev-br and 1 other commented on an outdated diff Jun 13, 2015
scipy/stats/stats.py
+ pvalue : float
+ The pvalue of the test.
+ exact : bool
+ Indicates if an exact pvalue was calculated.
+ alternative : string
+ Describes the alternative hypothesis.
+ u1 : float
+ The U-value corresponding to the set of measurements in x
+ u2 : float
+ The U-value corresponding to the set of measurements in y
+
+ Notes
+ -----
+ Exact tests should be used for smaller sample sizes. Concretely, as
+ len(x) and len(y) increase to and beyond 8, the distribution of U differs
+ negligibly from the normal distribution[1]. The default behavior of this
@ev-br
ev-br Jun 13, 2015 Member

ReST syntax: Reference needs a trailing underscore, [1]_

@ev-br ev-br commented on the diff Jun 13, 2015
scipy/stats/stats.py
+
+ Returns
+ -------
+ Bunch object with following attributes:
+ statistic : float
+ The test statistic.
+ pvalue : float
+ The pvalue of the test.
+ exact : bool
+ Indicates if an exact pvalue was calculated.
+ alternative : string
+ Describes the alternative hypothesis.
+ u1 : float
+ The U-value corresponding to the set of measurements in x
+ u2 : float
+ The U-value corresponding to the set of measurements in y
@ev-br
ev-br Jun 13, 2015 Member

Not sure if this is going to render correctly in the online docs.

@ahwagner
ahwagner Jun 15, 2015

Glancing around, I had selected a width of 80 characters as the max line length, e.g. ks_2samp, ttest_ind_from_stats. Or perhaps you're referring to something else wrong with this? Would you clarify what's missing/incorrect?

@rgommers
rgommers Jun 15, 2015 Member

Probably @ev-br meant the formatting of this whole list. I suspect that the definition list is OK but it needs a blank below Bunch object with following attributes: ...

I'll try to build the docs now to see how it looks.

@rgommers
rgommers Jun 15, 2015 Member

The doc build revealed another issue, the function name needs to be added to the listing in stats/__init__.py for it to show up in the html/pdf docs.

@rgommers
rgommers Jun 15, 2015 Member

It actually looks fine.

@ev-br
ev-br Jun 15, 2015 Member

Yes, precisely, my worry was the list formatting (mysterious are the ways of numpydoc)

@ev-br ev-br commented on the diff Jun 13, 2015
scipy/stats/stats.py
+ i += 1
+ a[i] -= 1
+ j = i - 1
+ while j >= 0:
+ a[j] = a[i] - (i - j)
+ j -= 1
+ # count(a < a1) = U2
+ u1 = 0
+ for i, x in enumerate(a):
+ u1 += n1 - x + i
+ u2 = n1*n2 - u1
+ # store min U value to array
+ if alternative == 'two-sided':
+ u.append(min(u1, u2))
+ else:
+ u.append(u1)
@ev-br
ev-br Jun 13, 2015 Member

I'm pretty sure this while loop can be simplified by vectorizing inner loops. Unless this was a conscious decision to use python-level looping, was it?

@ahwagner
ahwagner Jun 15, 2015

I'm open to suggestions. The above code was pretty fast and the intuitive way (for me) to implement the permutations--but I'm happy to improve this if you could provide an example.

@ev-br
ev-br Jun 15, 2015 Member

It's clarity, not speed. For instance,

a = np.arange(n1, n1+n2)
u1 = n1*n2 + n2*(n2-1)/2 - a.sum()

seems to be simpler than this u1=0; for i, x in enumerate(a) loop. Note that I'm not criticizing your code, I'm just asking to sum the arithmetic series.

Likewise for the while j>0 loop above.

@argriffing
argriffing Jun 16, 2015 Contributor
a = np.arange(n1, n1+n2)
u1 = n1*n2 + n2*(n2-1)/2 - a.sum()

I don't understand that code. Does it have a typo? u1 would be 0, right?

@ev-br
ev-br Jun 16, 2015 Member

Ah, right. I should have made it clearer:

Using a numpy array for a in line 4370, for example, a = np.arange(n1, n1+n1), the while loop in lines 4384-4385 can be replaced by u1 = n1*n2 + n2*(n2-1)/2 - a.sum().

@ev-br ev-br and 1 other commented on an outdated diff Jun 13, 2015
scipy/stats/stats.py
+ a[j] = a[i] - (i - j)
+ j -= 1
+ # count(a < a1) = U2
+ u1 = 0
+ for i, x in enumerate(a):
+ u1 += n1 - x + i
+ u2 = n1*n2 - u1
+ # store min U value to array
+ if alternative == 'two-sided':
+ u.append(min(u1, u2))
+ else:
+ u.append(u1)
+ u1 = 0
+ u = np.array(u)
+ for i, x in enumerate(sorted(rankx)):
+ u1 += x - 1 - i
@ev-br
ev-br Jun 13, 2015 Member

Seems to be equivalent to

u1 = rankx.sum() - n1*(n1+1)/2
@ev-br ev-br and 1 other commented on an outdated diff Jun 13, 2015
scipy/stats/stats.py
+ u.append(u1)
+ u1 = 0
+ u = np.array(u)
+ for i, x in enumerate(sorted(rankx)):
+ u1 += x - 1 - i
+ u2 = n1 * n2 - u1
+ if alternative == 'two-sided':
+ smallu = min(u1, u2)
+ elif alternative == 'greater':
+ smallu = u1
+ else:
+ smallu = u2
+ p = sum(u <= smallu) / len(u)
+ u = smallu
+ else:
+ u1 = n1*n2 + (n1*(n1+1))/2.0 - np.sum(rankx, axis=0) # calc U for x
@ev-br
ev-br Jun 13, 2015 Member

How is this different from a calculation of u2 in the exact branch --- u1 and u2 seem to be flipped around unless I'm mistaken.

@ev-br
ev-br Jun 13, 2015 Member

And just a nitpick: extra parenthesis are best avoided unless you want to force a specific order of operations.

@ahwagner
ahwagner Jun 15, 2015

u1 and u2 appear flipped because the approximation branch uses the survival function, which is 1-cdf (this is legacy code, from the original mannwhitneyu). The exact test (new code) calculates directly from the empirical cumulative distribution.

@ev-br
ev-br Jun 15, 2015 Member

Can you please add a comment to this effect then. Saves confusion down the line.

@ahwagner
ahwagner Jun 15, 2015

I was wrong. After adding tests to verify that u1 and u2 are as expected, they are, in fact, reversed in the exact code (and this reversal was corrected in the subsequent testing). This has been fixed.

@ev-br ev-br and 1 other commented on an outdated diff Jun 13, 2015
scipy/stats/stats.py
+ alternative : string
+ Describes the alternative hypothesis.
+ u1 : float
+ The U-value corresponding to the set of measurements in x
+ u2 : float
+ The U-value corresponding to the set of measurements in y
+
+ Notes
+ -----
+ Exact tests should be used for smaller sample sizes. Concretely, as
+ len(x) and len(y) increase to and beyond 8, the distribution of U differs
+ negligibly from the normal distribution[1]. The default behavior of this
+ test is to calculate the number of possible sequences for inputs of
+ length(x) and length(y), and to do an exact calculation if the number of
+ possible combinations is <100000. The default behavior may be overridden
+ with the use_exact flag.
@ev-br
ev-br Jun 13, 2015 Member

It might be just my ignorance, but from this explanation I still do not understand what the exact flag does. Is it exact versus what, approximate? a normal approximation?

@ahwagner
ahwagner Jun 15, 2015

Yes, versus a normal approximation. I'll be sure to update the description accordingly.

@ev-br ev-br and 1 other commented on an outdated diff Jun 13, 2015
scipy/stats/stats.py
+ For Mann-Whitney-Wilcoxon tests, the reported U statistic is the U used to
+ test the hypothesis. The u1 and u2 statistics are returned as well,
+ corresponding to x and y, respectively.
+
+
+ .. versionadded:: 0.17.0
+
+ References
+ ----------
+ .. [1] HB Mann and DR Whitney (1947). "On a Test of Whether one of Two
+ Random Variables is Stochastically Larger than the Other".
+ Annals of Mathematical Statistics. doi:10.1214/aoms/1177730491
+ .. [2] http://en.wikipedia.org/wiki/Mann-Whitney_U_test
+ """
+ x = asarray(x)
+ y = asarray(y)
@ev-br
ev-br Jun 13, 2015 Member

I guess it might make sense to specify the behavior for higher-dimensional input. Is the calculation below doing the right thing for eg x.ndim > 1? (Then it should be tested). Or if it doesn't, it's better to raise an error early on IMO.

@ahwagner
ahwagner Jun 15, 2015

Implemented error catch.

@ev-br
Member
ev-br commented Jun 13, 2015

I've left several inline comments, mostly dealing with implementation details. As far as merging is concerned, I also defer to the stats GSoC.

@ev-br
Member
ev-br commented Jun 13, 2015

And while you're at it, it might make sense to also address #4067. (This is not strictly necessary for this PR though).

@rgommers rgommers and 1 other commented on an outdated diff Jun 14, 2015
scipy/stats/stats.py
'f_value_wilks_lambda', 'f_value', 'f_value_multivariate',
'ss', 'square_of_sums', 'fastsort', 'rankdata', 'nanmean',
'nanstd', 'nanmedian', 'combine_pvalues', ]
+# This class will be used by place_poles to return its results
@rgommers
rgommers Jun 14, 2015 Member

This comment line can be removed; place_poles is an unrelated function in scipy.signal. Keep the link to the activestate recipe though.

@rgommers rgommers and 1 other commented on an outdated diff Jun 14, 2015
scipy/stats/stats.py
+ Computes two-sample unpaired Mann-Whitney-Wilcoxon tests.
+
+ Parameters
+ ----------
+ x : array_like
+ The first set of measurements.
+ y : array_like
+ The second set of measurements.
+ correction : bool, optional
+ If True, apply continuity correction by adjusting the Wilcoxon rank
+ statistic by 0.5 towards the mean value when computing the z-statistic.
+ Default is False.
+ exact : bool, optional
+ Whether an exact test should be performed on the data. See notes for
+ default behavior.
+ alternative : string, optional
@rgommers
rgommers Jun 14, 2015 Member

style nitpick: for type specifiers, use str instead of string. Same for alternative : string below.

@rgommers
Member

Copy of my comment on the mailing list: "the name mww is very nondescriptive, we usually try to avoid such names. mann_whitney_wilcoxon is long, but already better.

Another option is to merge your mww with stats.wilcoxon, could be done by adding a paired=True keyword to that function. This is the choice that R has made (except defaults to Mann-Whitney): http://stat.ethz.ch/R-manual/R-patched/library/stats/html/wilcox.test.html"

@rgommers rgommers and 1 other commented on an outdated diff Jun 14, 2015
scipy/stats/stats.py
@@ -4275,6 +4277,158 @@ def mannwhitneyu(x, y, use_continuity=True):
return MannwhitneyuResult(smallu, distributions.norm.sf(z))
+# TODO: add paired=False, do a signed-rank test if paired=True or y is not provided. See morestats.wilcoxon
+def mww(x, y, correction=True, exact='auto',
+ alternative='two-sided'):
@rgommers
rgommers Jun 14, 2015 Member

The comment above can be split and the signature un-split, to make line length look a bit more regular.

@rgommers rgommers and 1 other commented on an outdated diff Jun 14, 2015
scipy/stats/stats.py
+# TODO: add paired=False, do a signed-rank test if paired=True or y is not provided. See morestats.wilcoxon
+def mww(x, y, correction=True, exact='auto',
+ alternative='two-sided'):
+ """
+ Computes two-sample unpaired Mann-Whitney-Wilcoxon tests.
+
+ Parameters
+ ----------
+ x : array_like
+ The first set of measurements.
+ y : array_like
+ The second set of measurements.
+ correction : bool, optional
+ If True, apply continuity correction by adjusting the Wilcoxon rank
+ statistic by 0.5 towards the mean value when computing the z-statistic.
+ Default is False.
@rgommers
rgommers Jun 14, 2015 Member

Do you want the default to be False? R has True as default. My impression is that usually you do want this correction.

@ahwagner
ahwagner Jun 15, 2015

This was a typo. Matching description to function behavior.

@rgommers rgommers and 2 others commented on an outdated diff Jun 14, 2015
scipy/stats/stats.py
+
+ The reported p-value is for a two-sided hypothesis. To get the one-sided
+ p-value set alternative to 'greater' or 'less' (default is 'two-sided').
+
+ For Mann-Whitney-Wilcoxon tests, the reported U statistic is the U used to
+ test the hypothesis. The u1 and u2 statistics are returned as well,
+ corresponding to x and y, respectively.
+
+
+ .. versionadded:: 0.17.0
+
+ References
+ ----------
+ .. [1] HB Mann and DR Whitney (1947). "On a Test of Whether one of Two
+ Random Variables is Stochastically Larger than the Other".
+ Annals of Mathematical Statistics. doi:10.1214/aoms/1177730491
@rgommers
rgommers Jun 14, 2015 Member

Can you format this like so:

H.B. Mann and D.R. Whitney, "On a test of whether one of two random variables is
stochastically larger than the other", The Annals of Mathematical Statistics, 
Vol. 18, pp. 50-60, 1947.

Note that this is the standard formatting for all citations (see https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt)

@ev-br
ev-br Jun 14, 2015 Member

I'd suggest to keep the DOI link as well.

@ahwagner
ahwagner Jun 15, 2015

Reformatted to match style guide.

@rgommers rgommers and 1 other commented on an outdated diff Jun 14, 2015
scipy/stats/stats.py
+ Random Variables is Stochastically Larger than the Other".
+ Annals of Mathematical Statistics. doi:10.1214/aoms/1177730491
+ .. [2] http://en.wikipedia.org/wiki/Mann-Whitney_U_test
+ """
+ x = asarray(x)
+ y = asarray(y)
+ n1 = len(x)
+ n2 = len(y)
+ if exact == 'auto':
+ exact = (n1 < 10 or n2 < 10) and n1 + n2 < 100000 \
+ and -np.log(n1 + n2 + 1) - special.betaln(n1 + 1, n2 + 1) < np.log(100000)
+ ranked = rankdata(np.concatenate((x,y)))
+ rankx = ranked[0:n1] # get the x-ranks
+ T = tiecorrect(ranked)
+ if T == 0:
+ raise ValueError('All numbers are identical in mannwhitneyu')
@rgommers
rgommers Jun 14, 2015 Member

This still references the old function name. It can simply be left out I think.

@rgommers rgommers and 1 other commented on an outdated diff Jun 14, 2015
scipy/stats/stats.py
+
+ .. versionadded:: 0.17.0
+
+ References
+ ----------
+ .. [1] HB Mann and DR Whitney (1947). "On a Test of Whether one of Two
+ Random Variables is Stochastically Larger than the Other".
+ Annals of Mathematical Statistics. doi:10.1214/aoms/1177730491
+ .. [2] http://en.wikipedia.org/wiki/Mann-Whitney_U_test
+ """
+ x = asarray(x)
+ y = asarray(y)
+ n1 = len(x)
+ n2 = len(y)
+ if exact == 'auto':
+ exact = (n1 < 10 or n2 < 10) and n1 + n2 < 100000 \
@rgommers
rgommers Jun 14, 2015 Member

Style nitpick: can you avoid the backslash by using brackets around the whole expression?

@rgommers
Member

After looking into this some more, I agree that it makes sense to deprecate both mannwhitneyu and ranksums.

@rgommers
Member

@ahwagner for future commits it might be nice to use the standard prefixes when writing your commit messages: http://docs.scipy.org/doc/numpy-dev/dev/gitwash/development_workflow.html#writing-the-commit-message

@rgommers
Member

This PR closes issue gh-1428, regression test for that ranksums issue is included in the tests.

@ahwagner

@rgommers, I like the idea of merging with stats.wilcoxon, but this would move us away from returning results as Bunch objects (without breaking backward compatibility). With this in mind, my preference would be to rename this function to be more in line with the SciPy naming convention. mann_whitney_u is slightly shorter than mann_whitney_wilcoxon, but when spoken conflicts with the now deprecated mannwhitneyu. I'm okay with renaming this to whatever--what is your preference?

@rgommers
Member

mann_whitney_u sounds fine to me as well.

Any other takers?

@ev-br
Member
ev-br commented Jun 15, 2015

As one more nitpick, it's not strictly necessary to list specific comments in the commit messages. No harm in doing that, just don't bother next time.

@ev-br ev-br commented on the diff Jun 15, 2015
THANKS.txt
@@ -147,6 +147,7 @@ Irvin Probst (ENSTA Bretagne) for pole placement.
Ian Henriksen for Cython wrappers for BLAS and LAPACK
Fukumu Tsutsumi for bug fixes.
J.J. Green for interpolation bug fixes.
+Alex Wagner for rewriting and improving the Mann-Whitney-Wilcoxon test (stats.mww).
@ev-br
ev-br Jun 15, 2015 Member

Once the name of the function settles, this will need to be changed, here and in the release notes.

@ev-br ev-br commented on the diff Jun 15, 2015
scipy/stats/stats.py
+ test the hypothesis. The u1 and u2 statistics are returned as well,
+ corresponding to x and y, respectively.
+
+
+ .. versionadded:: 0.17.0
+
+ References
+ ----------
+ .. [1] H.B. Mann and D.R. Whitney, "On a test of whether one of two random
+ variables is stochastically larger than the other", The Annals of
+ Mathematical Statistics, Vol. 18, pp. 50-60, 1947.
+ .. [2] http://en.wikipedia.org/wiki/Mann-Whitney_U_test
+ """
+ x = asarray(x)
+ y = asarray(y)
+ if len(np.shape(x)) > 1 or len(np.shape(y)) > 1:
@ev-br
ev-br Jun 15, 2015 Member

x.ndim > 1

@ev-br ev-br commented on the diff Jun 15, 2015
scipy/stats/stats.py
+ """
+ x = asarray(x)
+ y = asarray(y)
+ if len(np.shape(x)) > 1 or len(np.shape(y)) > 1:
+ raise ValueError('Expected 1-d arrays.')
+ n1 = len(x)
+ n2 = len(y)
+ if exact == 'auto':
+ exact = ((n1 < 10 or n2 < 10) and n1 + n2 < 100000
+ and -np.log(n1 + n2 + 1) - special.betaln(n1 + 1, n2 + 1) < np.log(100000))
+ ranked = rankdata(np.concatenate((x,y)))
+ rankx = ranked[0:n1] # get the x-ranks
+ T = tiecorrect(ranked)
+ if T == 0:
+ raise ValueError('All numbers are identical')
+ if alternative not in ('two-sided', 'less', 'greater'):
@ev-br
ev-br Jun 15, 2015 Member

Style nitpick: input validation best done before any other calculations (rationale: why ranking x and y if it's failing anyway)

@ev-br

Can you keep the DOI please --- one click is better than a google search :-).

@rgommers
Member

@phobson was in favor of merging into wilcoxon on the ML. The argument that that doesn't allow Bunches carries some weight though. So maybe the new function plus merging wilcoxon in as an option with a paired=False kw is the best way to go.

Some more options for names: wilcox or wilcox_test.

@ev-br ev-br commented on the diff Jun 16, 2015
scipy/stats/stats.py
+ u.append(min(u1, u2))
+ else:
+ u.append(u1)
+ u1 = 0
+ u = np.array(u)
+ # for i, x in enumerate(sorted(rankx)):
+ # u1 += x - 1 - i
+ u1 = rankx.sum() - n1*(n1+1)/2
+ u2 = n1 * n2 - u1
+ if alternative == 'two-sided':
+ smallu = min(u1, u2)
+ elif alternative == 'greater':
+ smallu = u1
+ else:
+ smallu = u2
+ p = sum(u <= smallu) / len(u)
@ev-br
ev-br Jun 16, 2015 Member

Nitpick: u is an array, so can do u[u <= smallu].sum() / u.size

@ev-br ev-br commented on the diff Jun 16, 2015
scipy/stats/stats.py
+ else:
+ smallu = u2
+ p = sum(u <= smallu) / len(u)
+ u = smallu
+ else:
+ u1 = n1*n2 + n1*(n1+1)/2.0 - np.sum(rankx, axis=0) # calc U for x
+ u2 = n1*n2 - u1 # remainder is U for y
+ if alternative == 'two-sided':
+ bigu = max(u1, u2)
+ smallu = min(u1, u2)
+ elif alternative == 'greater':
+ bigu = u2
+ smallu = u1
+ else:
+ bigu = u1
+ smallu = u2
@ev-br
ev-br Jun 16, 2015 Member

This bit, starting from line 4407 (u1 = ...) is duplicated between the two if exact branches, up to what seems to be swapping around u1 and u2. I suggest deduplicating.

@josef-pkt
Member

Related to merging paired and independent:
We have paired and independent t_tests as separate functions.
Do we have the other options for paired, for examples is there an exact p-value for it?

(In statsmodels I have paired t_test only as special case of one sample t_test, because it's the same code. I didn't check if that's the same for rank tests wilcoxon)

@argriffing
Contributor

I have a question about the speed. Is the complexity of this test dominated by rankdata? If so, does the implementation reflect this? Or does it have unnecessarily N^2 parts?

@toobaz
toobaz commented Jul 17, 2015

@ahwagner : I like your "auto" behavior (a short note of mine may interest you, as long as its content is not entirely obvious to you)

But do you have any pointer for the algorithm you use for the exact calculation of the MWW significance? I never saw it, and although I think I understand it, it is not clear to me how it compares to the standard recursive one in terms of computational cost (warning: I reimplemented the recursive one in a file which can be downloaded from the page above, but don't look at such file, since the caching strategy is taken from R!).

@ev-br
Member
ev-br commented Aug 26, 2015

This starts to look abandoned (@ahwagner?).
And this is not mergeable anymore, because of the changes to mannwhitneyu over the summer's GSoC work by @aeklant.

To help move it forward, here's a rebased branch: https://github.com/ev-br/scipy/tree/pr/4933
It should apply cleanly to the current master (I squashed the commits to simplify rebasing).
I also tweaked the implementation details of the new mann_whitney_u to address my comments above (I did not address @argriffing's #4933 (comment) though).

IMO, this is very close. The only things which are needed are

  • respond to #4933 (comment)
  • a review of the refactored small sample code path would be useful. I only did a mechanical refactoring.
  • A test comparing the exact=True and exact=False branches is a nice to have.
  • There are two sets of tests added for mannwhitneyu and mann_whitney_u independently. We likely want to keep the test coverage of "old" mannwhitneyu, but also it'd be nice to port the useful ones to the "new" mann_whitney_u.
@ev-br ev-br added the needs-work label Aug 26, 2015
@ahwagner

I agree that it’s close—but since around mid-June I’ve been swamped with grants/papers/ongoing projects. I’m happy to continue contributing as I can (or collaborate with others that want to pick up where I left off), but I don’t plan to address this PR again for a few weeks yet.

On Aug 26, 2015, at 4:25 PM, Evgeni Burovski notifications@github.com wrote:

This starts to look abandoned (@ahwagner https://github.com/ahwagner?).
And this is not mergeable anymore, because of the changes to mannwhitneyu over the summer's GSoC work by @aeklant https://github.com/aeklant.

To help move it forward, here's a rebased branch: https://github.com/ev-br/scipy/tree/pr/4933 https://github.com/ev-br/scipy/tree/pr/4933
It should apply cleanly to the current master (I squashed the commits to simplify rebasing).
I also tweaked the implementation details of the new mann_whitney_u to address my comments above (I did not address @argriffing https://github.com/argriffing's #4933 (comment) #4933 (comment) though).

IMO, this is very close. The only things which are needed are

respond to #4933 (comment) #4933 (comment)
a review of the refactored small sample code path would be useful. I only did a mechanical refactoring.
A test comparing the exact=True and exact=False branches is a nice to have.
There are two sets of tests added for mannwhitneyu and mann_whitney_u independently. We likely want to keep the test coverage of "old" mannwhitneyu, but also it'd be nice to port the useful ones to the "new" mann_whitney_u.

Reply to this email directly or view it on GitHub #4933 (comment).


This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you.

@josef-pkt
Member

as reference Julia has exact also, but partially delegates to rmath for p-values in the no-tie case
https://github.com/JuliaStats/HypothesisTests.jl/blob/master/src/mann_whitney.jl
(MIT licensed)

@rgommers
Member

@ahwagner are you still interested in this? If not, we'll close this PR and try to find a volunteer to continue based on @ev-br's update of this PR.

@ahwagner
ahwagner commented Jan 3, 2017

@rgommers I haven't looked at it in a while, that's for sure. I'm happy to continue contributing towards this PR, though. I'll review @ev-br's updates and the comment history, and try to put some additional work into it soon.

@rgommers
Member
rgommers commented Jan 3, 2017

@ahwagner great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment