mstats_basic.py - mannwhitneyu [scipy/scipy/stats/mstats_basic.py] #5933
Comments
There is an ongoing effort to clean up and enhance the Mann-Whitney U test, #4933.
Also, I think the stats version should be worked on first, and the mstats version should follow. Is that something you'd be interested in working on, @mikoff?
Both gh-4933 and gh-4993 (now that you mention it) look stalled, so working on either of them would be welcome, I think. We generally do not formally assign issues to people, so you're welcome to pick either of them. I suggest grabbing a branch from GitHub, rebasing it on master, and then working off the rebased one. For gh-4933 I suppose it's easier to start with https://github.com/ev-br/scipy/tree/pr/4933. The only thing we require is that the original author information is preserved (in other words, if you squash the commits, make sure the original authors are still listed). If you get stuck, ping us. It's OK to submit a WIP pull request, too.
@ev-br @chrisb83 @WarrenWeckesser
The notes explain how to calculate the other statistic. This makes perfect sense to me. This matches the behavior of R's
stats.ttest_ind(x, y).statistic == -stats.ttest_ind(y, x).statistic
so
stats.mannwhitneyu(x, y).statistic == len(x)*len(y) - stats.mannwhitneyu(y, x).statistic
From that perspective, the real issue is the unusual behavior of
I think there is strange behavior in the Mann-Whitney U test implementation.
According to the wiki and other sources, the smaller of the two U-values should be used when consulting significance tables, and that is the value the procedure should return.
However, in the current implementation the returned U-value depends on the order of the samples in the call: the procedure returns the test statistic (U) for the first sample (x).
Maybe it would be better to return both U-values as a tuple, or the larger of the two, which does not depend on the argument order (the two values always sum to len(x)*len(y)).
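To make the order dependence concrete, here is a minimal sketch in plain Python. It uses the textbook pair-counting definition of U, not SciPy's actual implementation, and the helper name `u_statistic` and the sample data are made up for illustration.

```python
def u_statistic(x, y):
    """U statistic of x relative to y (hypothetical helper, not a SciPy API).

    Counts the pairs (x_i, y_j) with x_i > y_j; a tie counts as 0.5.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Made-up sample data, for illustration only.
x = [1.1, 2.5, 3.7, 4.2]
y = [0.9, 2.0, 5.0]

u_x = u_statistic(x, y)  # statistic for the first sample
u_y = u_statistic(y, x)  # statistic with the arguments swapped

# The two statistics are complementary: they sum to len(x) * len(y).
assert u_x + u_y == len(x) * len(y)

# The smaller value, the one looked up in significance tables,
# does not depend on the argument order.
u_min = min(u_x, u_y)
print(u_x, u_y, u_min)  # prints: 7.0 5.0 5.0
```

Swapping the arguments changes the returned statistic from 7.0 to 5.0, while `min(u_x, u_y)` and `max(u_x, u_y)` stay the same either way.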