mstats_basic.py - mannwhitneyu [scipy/scipy/stats/mstats_basic.py] #5933

Closed
mikoff opened this issue Mar 6, 2016 · 5 comments · Fixed by #14106

mikoff commented Mar 6, 2016

I think there is some strange behavior in the Mann-Whitney U test implementation.
According to Wikipedia and other sources, the smaller U value should be used when consulting significance tables, so that is the value the procedure should return.
However, in the current implementation the returned U value depends on the order of the samples in the call: it is the test statistic U for the first sample (x).
Maybe it would be better to return a tuple of both U values, or the larger U value, which also does not depend on argument order (the two values always sum to len(x)*len(y)).
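A minimal pure-Python sketch of the behavior described above, assuming the rank-sum definition U = R - n(n + 1)/2 with midranks for ties and no continuity correction. `midranks` and `u_statistic` are hypothetical helpers written for illustration, not SciPy functions. Swapping the arguments changes the returned U, while the two values always sum to len(x)*len(y) and their minimum is order-independent:

```python
def midranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def u_statistic(x, y):
    """Hypothetical helper: U for the first sample, R_x - n_x(n_x + 1)/2."""
    ranks = midranks(list(x) + list(y))
    r_x = sum(ranks[: len(x)])  # rank sum of the first sample
    return r_x - len(x) * (len(x) + 1) / 2


x = [1.2, 3.4, 2.2, 5.0]
y = [0.7, 2.2, 4.1]
u_x = u_statistic(x, y)
u_y = u_statistic(y, x)
print(u_x, u_y)                      # 7.5 4.5
print(u_x + u_y == len(x) * len(y))  # True
print(min(u_x, u_y))                 # 4.5
```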

Member

ev-br commented Mar 7, 2016

There is an effort to clean up and enhance the Mann-Whitney U test in #4933.
Unfortunately, it has stalled and is up for grabs for anyone who wants to finish it.

Member

ev-br commented Mar 7, 2016

Also, I think the stats version should be worked on first, and the mstats version should follow. Is this something you'd be interested in working on, @mikoff?

Author

mikoff commented Mar 13, 2016

@ev-br
Yep, I would be happy to participate. Should I try to take #4993?

Member

ev-br commented Mar 14, 2016

Both gh-4933 and gh-4993 (now that you mention it) look stalled, so I think working on either of them would be welcome. We generally do not formally assign issues to people, so you're welcome to pick either one.

I suggest grabbing a branch from GitHub, rebasing it on master, and then working off the rebased branch. For gh-4933 I suppose it's easier to start with https://github.com/ev-br/scipy/tree/pr/4933. The only thing we require is that the original author information is preserved (in other words, if you squash the commits, make sure the original authors are still listed). If you get stuck, ping us. It's OK to submit a WIP pull request, too.

Contributor

mdhaber commented May 22, 2021

@ev-br @chrisb83 @WarrenWeckesser
Regarding the test statistic returned, gh-4933 maintained the behavior of stats.mannwhitneyu: it is always the statistic associated with the first sample. This is now documented:

        statistic : float
            The Mann-Whitney U statistic corresponding with sample `x`. See
            Notes for the test statistic corresponding with sample `y`.

The notes explain how to calculate the other statistic.
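For reference, the relation between the two statistics follows directly from the rank-sum definition of U. A short derivation, assuming samples of sizes n_x and n_y with rank sums R_x and R_y (with midranks for ties the total of all ranks is unchanged, so the identity still holds):

```latex
\begin{aligned}
U_x &= R_x - \frac{n_x(n_x+1)}{2}, \qquad U_y = R_y - \frac{n_y(n_y+1)}{2}, \\
R_x + R_y &= \frac{N(N+1)}{2}, \qquad N = n_x + n_y, \\
U_x + U_y &= \frac{(n_x+n_y)(n_x+n_y+1) - n_x(n_x+1) - n_y(n_y+1)}{2} = n_x n_y .
\end{aligned}
```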

This makes perfect sense to me. It matches the behavior of R's wilcox.test, and there is plenty of precedent in stats.
For instance, stats.ttest_ind returns the statistic associated with x, not the absolute value of the t statistic. Just as

stats.ttest_ind(x, y).statistic == -stats.ttest_ind(y, x).statistic

so

stats.mannwhitneyu(x, y).statistic == len(x)*len(y)-stats.mannwhitneyu(y, x).statistic
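The identity can also be seen from the pair-counting definition of U: each of the len(x)*len(y) cross-sample pairs contributes 1 to exactly one of U(x, y) and U(y, x), or 1/2 to each if the pair is tied. A brute-force sketch for illustration only (O(len(x)*len(y))); `u_by_pairs` is a hypothetical name, not SciPy API:

```python
from itertools import product


def u_by_pairs(x, y):
    """Hypothetical helper: count pairs (xi, yj) with xi > yj; ties count 1/2."""
    total = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            total += 1.0
        elif xi == yj:
            total += 0.5
    return total


x = [1.2, 3.4, 2.2, 5.0]
y = [0.7, 2.2, 4.1]
print(u_by_pairs(x, y), u_by_pairs(y, x))                       # 7.5 4.5
print(u_by_pairs(x, y) == len(x) * len(y) - u_by_pairs(y, x))  # True
```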

From that perspective, the real issue is the unusual behavior of mstats.mannwhitneyu, which always returns the minimum statistic. This is addressed in gh-14106.
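To compare results across the two conventions, the minimum statistic that, per this comment, mstats.mannwhitneyu returns can be recovered from the statistic for x and the two sample sizes. A small sketch; `min_statistic` is a hypothetical helper. The reverse conversion is not possible, since the minimum alone does not say which sample it belongs to:

```python
def min_statistic(u_x, n_x, n_y):
    """Hypothetical helper: order-independent min of U_x and U_y = n_x*n_y - U_x."""
    u_y = n_x * n_y - u_x
    return min(u_x, u_y)


# U for the first sample with sample sizes 4 and 3, then with the samples swapped.
print(min_statistic(7.5, 4, 3))  # 4.5
print(min_statistic(4.5, 3, 4))  # 4.5
```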

tylerjereddy added this to the 1.7.0 milestone May 27, 2021
4 participants