BUG: Fix mannwhitneyu to be backward compatible #6100

Merged
merged 1 commit into from May 5, 2016

Projects

None yet

4 participants

@tavinathanson
Contributor
tavinathanson commented Apr 25, 2016 edited

Per #6062, mannwhitneyu is no longer backward compatible as of 0.17.0. This PR is meant as a 0.17.1 fix before mannwhitneyu eventually gets deprecated in 0.18.

In this PR, all changes were meant to revert to pre-0.17.0 defaults:

  • Add alternative = None which reverts mannwhitneyu's behavior to pre-0.17.0. Default to None.
  • Fix the returned MannwhitneyuResult to always return the appropriate U statistic; it was returning u2 regardless of less or greater. (Edit: I'm told it should indeed return u2, and I updated it accordingly.)
  • Update tests accordingly.
  • Minor spelling fixes.
@ev-br ev-br added this to the 0.17.1 milestone Apr 25, 2016
@ev-br
Member
ev-br commented Apr 26, 2016

OK, I eyeballed the alternative=None code path and it does look to be in agreement with mannwhitneyu from version 0.16.1: https://github.com/scipy/scipy/blob/maintenance/0.16.x/scipy/stats/stats.py#L4265

It'd be nice to have one other pair of eyeballs on this, since this is the last issue blocking the release of 0.17.1.

@josef-pkt josef-pkt commented on an outdated diff Apr 26, 2016
scipy/stats/tests/test_stats.py
assert_equal(p1, p2)
assert_equal(u1, 498)
- assert_equal(u2, 102)
+ assert_equal(u2, 498)
+ assert_equal(u3, 102)
assert_approx_equal(p1, 0.999955905990004, significant=self.significant)
def test_mannwhitneyu_no_correct_greater(self):
@josef-pkt
josef-pkt Apr 26, 2016 Member

It looks to me this test and the previous one could be merged because they overlap

(streamlining, minor point)

@josef-pkt
Member

reading briefly through the changes.
pvalue looks good to me

For the statistic: returning u2 would be better because it is independent of the alternative. It's a statistic for the difference which should compare the sample but not flip "sign" depending on the alternative. AFAICS
I don't remember whether there were any discussions on this, or whether anyone would rely on the statistic.

@argriffing
Contributor

It'd be nice to have one other pair of eyeballs on this

I tend to not use the functions defined in the stats.py file, but I just github-code-searched 'mannwhitneyu' and found BlueBrain/NeuroM#220 (comment). Is that somehow relevant? Also biocore/scikit-bio#1258 (comment).

@josef-pkt josef-pkt referenced this pull request in biocore/scikit-bio Apr 27, 2016
Closed

ANCOM unit test failures with scipy 0.17.0 #1259

@josef-pkt
Member

BlueBrain wasn't reading the docstring. It's not a bug if it's weird but documented how to interpret it.
I added a comment to the scikit-bio issue to point them to this PR and the related issue.

@tavinathanson
Contributor

I've updated the code to return u2, per @josef-pkt's comment. Note that this is not compatible with pre-0.17.0, so we'll have the following behavior:

Pre-0.17.0: always returns smallu statistic and p-value = two-sided / 2 by default.
0.17.0: always returns u2 statistic and p-value = two-sided by default.
0.17.1: always returns u2 statistic and p-value = two-sided / 2 by default.

Does that sound appropriate?

Note that #4933 currently returns smallu rather than u2: https://github.com/scipy/scipy/pull/4933/files#diff-cc307f50f4eb0f9917b55fdf1dbb9b8cR4426

I also merged the less and greater tests.

I'll rebase once we're settled on this.

@ev-br
Member
ev-br commented Apr 28, 2016

Naively, I'd suggest to keep the invariant of "Nove means whatever scipy 0.16.1 is doing, down to all the minute detail."

I've been proven wrong on this one though, so I won't press the point.

@ev-br
Member
ev-br commented Apr 28, 2016

And then make #4933 return the most useful output.

@tavinathanson
Contributor

We can return u = u2 if alternative is not None else smallu. I'm also open to that; it's certainly getting confusing, but confusion seems inevitable here. Anyone think that's the wrong approach? I'll make that change soon unless I hear otherwise.

@josef-pkt
Member

I have both opinions:

  • it's better to stick with complete backwards compatibility
  • smallu is pretty useless, and I strongly doubt that anyone made use of it

just pick one :)

(I'm low in making-up-my-mind capacity when I'm 50-50. I'm slow in several PRs in statsmodels because of it. Should we return one large or two small arrays? I have no idea which is better for future enhancments.)

@tavinathanson
Contributor

Whoops, cross-posted with @josef-pkt. I'll go ahead and make it fully backward compatible.

@josef-pkt
Member

stick with backwards compatibility

I think, smallu can be used as a statistic for permutation or bootstrap tests. I think it's monotonic with respect to the two sided alternative.

I just see @tavinathanson comment. Backwards compatibility in the None case sounds good, and return u2 if user chooses a "sensible" alternative

@josef-pkt
Member

actually that sounds good
we get complete backwards compatibility with None, and proper results with alternative.

In this case, we could even just deprecate None and its use as default.
If users specify an alternative, then a future switch will not affect them.

@tavinathanson
Contributor

@josef-pkt Are you suggesting that we add a deprecation warning for the None argument in this fix, or later?

I'm in favor of deprecating as soon as possible.

@josef-pkt
Member

Are you suggesting that we add a deprecation warning for the None argument in this fix?

I thought of mentioning it in the docstring as it already is, should be enough for now, given the uncertain future with #4933

However with the None, we have now a deprecation path just for None and a fix for the function instead of deprecating the entire function.
Adding a deprecation warning before #4933 in decided and implemented sounds too early, and signalling that we deprecate None when we actually will remove the function is misleading.
However, if we just deprecate None, i.e. recommend users to explicitly specify an alternative, then a deprecation warning would be already useful now, and signal to users that something "strange" is going on with mannwhitneyu.

@tavinathanson
Contributor

@ev-br What are your thoughts on deprecating the None argument? Otherwise, I believe I've addressed all other comments.

@ev-br ev-br and 1 other commented on an outdated diff Apr 30, 2016
scipy/stats/stats.py
Whether to get the p-value for the one-sided hypothesis ('less'
- or 'greater'), or for the two-sided hypothesis ('two-sided', is
- the default)
+ or 'greater'), or for the two-sided hypothesis ('two-sided').
+ Defaults to None, which results in a p-value half the size of
+ the 'two-sided' p-value and a different U statistic. The default
+ behavior is not the same as using 'less' or 'greater', is not
+ recommended, and will soon be deprecated.
@ev-br
ev-br Apr 30, 2016 Member

I'd recommend to avoid making definite statements about the future like this :-). Maybe better state that it only exists for backwards compatibility and is not recommended for use in any new code.

@tavinathanson
tavinathanson May 1, 2016 Contributor

Ah, yes, that makes sense. Given that I just included a DeprecationWarning, I needed to change this wording anyway.

@ev-br ev-br commented on an outdated diff Apr 30, 2016
scipy/stats/stats.py
Returns
-------
statistic : float
- The Mann-Whitney statistics.
+ The Mann-Whitney U statistic, equal to min(U for x, U for y) if
+ `alternative` is equal to None (not recommended; to be deprecated),
+ and U for y otherwise.
@ev-br
Member
ev-br commented Apr 30, 2016

I'm more or less on the fence on whether alternative=None should or should not emit DeprecationWarnings, as long as it's deprecated in the docs. I am still optimistic that gh-4933 will get finished (BTW, @tavinathanson , maybe it could be you?), in which case we likely want to ask users to adjust only once.

@tavinathanson
Contributor

@ev-br I agree that it's non-ideal for users to adjust multiple times. However, given the changes in this function between pre-0.17.0, 0.17.0 and 0.17.1, a DeprecationWarning for the None argument would be very helpful in alerting people to the fact that their analysis results might have changed under their noses. Many people probably use mannwhitneyu deep in their analyses, and would never look at the documentation otherwise. I think the annoyance of adjusting twice is less important than notifying users that issues exist.

I updated the code with DeprecationWarning for now; let me know what you think.

And yeah, I'd be happy to help out with #4933.

@ev-br
Member
ev-br commented May 1, 2016 edited

LGTM.
I'll wait for a couple of days in case there are more comments. Otherwise, I'm going to merge this and wrap up the release.

@ev-br
Member
ev-br commented May 1, 2016

And yeah, I'd be happy to help out with #4933.

That'd be superb!

@tavinathanson tavinathanson commented on an outdated diff May 1, 2016
scipy/stats/stats.py
- return MannwhitneyuResult(u2, distributions.norm.sf(z) * fact2)
+ u = u2
+ # This behavior will soon be deprecated.
@tavinathanson
tavinathanson May 1, 2016 Contributor

Whoops, this line still needs to be updated.

@tavinathanson tavinathanson BUG: Fix mannwhitneyu to be backward compatible
b9c60fd
@josef-pkt
Member

Looks fine to me, no more comments from my side.

@ev-br
Member
ev-br commented May 5, 2016

OK, let's put this in. Thank you @tavinathanson, Josef, Alex.

@ev-br ev-br merged commit 2cc3eb3 into scipy:master May 5, 2016

2 checks passed

codecov/project 78.24% (target 1.00%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment