mstats_basic.py - mannwhitneyu [scipy/scipy/stats/mstats_basic.py] #5933

mikoff · 2016-03-06T19:37:03Z

I think there is strange behavior in the Mann-Whitney U test implementation.
According to the wiki and other sources, the smaller of the two U-values should be used to consult significance tables, and the procedure should return that same value.
However, in the current implementation the returned U-value depends on the order of the samples in the call: the function returns the test statistic (U) for the first sample (x).
Maybe it would be better to return a tuple of both U-values, or the maximum U-value, which does not depend on the argument order (the two U-values always sum to len(x)*len(y)).

ev-br · 2016-03-07T13:09:03Z

There is an effort on cleaning up/enhancing Mann-Whitney U test, #4933
Unfortunately it stalled and is up for grabs for someone to finish it up.

ev-br · 2016-03-07T13:11:17Z

Also, I think the stats version should be worked on first, and the mstats version should follow. Is this something you'd be interested in working on, @mikoff?

mikoff · 2016-03-13T17:44:39Z

@ev-br
Yep, I would be happy to participate. Should I try to take #4993?

ev-br · 2016-03-14T03:38:52Z

Both gh-4933 and gh-4993 (now that you mention it) look stalled, so working on either of them would be welcome I think. We generally do not formally assign issues to people, so you're welcome to pick either of them.

I suggest grabbing a branch from github, rebasing on master, and then working off the rebased one. For gh-4933 I suppose it's easier to start with https://github.com/ev-br/scipy/tree/pr/4933. The only thing we require is that the original author information is preserved (IOW, if you squash the commits, make sure the original authors are still listed). If you get stuck, ping us. It's OK to submit a WIP pull request, too.

mdhaber · 2021-05-22T02:12:04Z

@ev-br @chrisb83 @WarrenWeckesser
Regarding the test statistic returned, gh-4933 maintained the behavior of stats.mannwhitneyu: it is always the statistic associated with the first sample. This is now documented:

        statistic : float
            The Mann-Whitney U statistic corresponding with sample `x`. See
            Notes for the test statistic corresponding with sample `y`.

The notes explain how to calculate the other statistic.

This makes perfect sense to me. It matches the behavior of R's wilcox.test, and there is plenty of precedent in stats. For instance, stats.ttest_ind returns the statistic associated with x, not the absolute value of the t-statistic. Just as

        stats.ttest_ind(x, y).statistic == -stats.ttest_ind(y, x).statistic

we have

        stats.mannwhitneyu(x, y).statistic == len(x)*len(y) - stats.mannwhitneyu(y, x).statistic

From that perspective, the real issue is the unusual behavior of mstats.mannwhitneyu, which always returns the minimum statistic. This is addressed in gh-14106.
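
To make the relationship between the two statistics concrete, here is a minimal pure-Python sketch. It is not SciPy's implementation: `u_statistic` is a hypothetical helper that counts pairs directly (ties count as 0.5), and the sample values are made up for illustration. It shows that the two U-values sum to len(x)*len(y), so either one can be recovered from the other, and that the minimum is the same whichever sample comes first.

```python
from itertools import product


def u_statistic(x, y):
    """U statistic associated with sample x (hypothetical helper).

    Counts the pairs (xi, yj) with xi > yj; tied pairs contribute 0.5.
    """
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi, yj in product(x, y)
    )


# Made-up data, for illustration only.
x = [1.2, 3.4, 5.6, 7.8]
y = [2.1, 2.9, 4.0, 6.5, 9.9]

u_x = u_statistic(x, y)  # statistic associated with x
u_y = u_statistic(y, x)  # statistic associated with y

# The two statistics are complementary.
assert u_x + u_y == len(x) * len(y)

# The minimum does not depend on the argument order.
u_min = min(u_x, u_y)
print(u_x, u_y, u_min)
```

Returning the statistic for the first sample loses no information: a caller who wants the minimum for a table lookup can compute min(U, len(x)*len(y) - U) from the returned value.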

ev-br added the scipy.stats label Mar 7, 2016

mdhaber mentioned this issue Dec 29, 2020

ENH: Update the Mann-Whitney-Wilcoxon test #4933

Merged

mdhaber mentioned this issue May 22, 2021

DOC: stats.mstats: mannwhitneyu: the returned statistic is the minimum of the two statistics #14106

Merged

chrisb83 closed this as completed in #14106 May 26, 2021

tylerjereddy added this to the 1.7.0 milestone May 27, 2021
