Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix mannwhitneyu to be backward compatible #6100

Merged
merged 1 commit into from May 5, 2016

Conversation

tavinathanson
Copy link
Contributor

@tavinathanson tavinathanson commented Apr 25, 2016

Per #6062, mannwhitneyu is no longer backward compatible as of 0.17.0. This PR is meant as a 0.17.1 fix before mannwhitneyu eventually gets deprecated in 0.18.

In this PR, all changes were meant to revert to pre-0.17.0 defaults:

  • Add alternative = None which reverts mannwhitneyu's behavior to pre-0.17.0. Default to None.
  • Fix the returned MannwhitneyuResult to always return the appropriate U statistic; it was returning u2 regardless of less or greater. (Edit: I'm told it should indeed return u2, and I updated it accordingly.)
  • Update tests accordingly.
  • Minor spelling fixes.

@tavinathanson tavinathanson force-pushed the mannwhitneyu_bugfix branch 2 times, most recently from 01ba5cd to 8018324 Compare April 25, 2016 15:55
@ev-br ev-br added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats maintenance Items related to regular maintenance tasks backport-candidate This fix should be ported by a maintainer to previous SciPy versions. and removed maintenance Items related to regular maintenance tasks labels Apr 25, 2016
@ev-br ev-br added this to the 0.17.1 milestone Apr 25, 2016
@ev-br
Copy link
Member

ev-br commented Apr 26, 2016

OK, I eyeballed the alternative=None code path and it does look to be in agreement with mannwhitneyu from version 0.16.1: https://github.com/scipy/scipy/blob/maintenance/0.16.x/scipy/stats/stats.py#L4265

It'd be nice to have one other pair of eyeballs on this, since this is the last issue blocking the release of 0.17.1.


assert_equal(p1, p2)
assert_equal(u1, 498)
assert_equal(u2, 102)
assert_equal(u2, 498)
assert_equal(u3, 102)
assert_approx_equal(p1, 0.999955905990004, significant=self.significant)

def test_mannwhitneyu_no_correct_greater(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me this test and the previous one could be merged because they overlap

(streamlining, minor point)

@josef-pkt
Copy link
Member

reading briefly through the changes.
pvalue looks good to me

For the statistic: returning u2 would be better because it is independent of the alternative. It's a statistic for the difference which should compare the sample but not flip "sign" depending on the alternative. AFAICS
I don't remember whether there were any discussions on this, or whether anyone would rely on the statistic.

@argriffing
Copy link
Contributor

It'd be nice to have one other pair of eyeballs on this

I tend to not use the functions defined in the stats.py file, but I just github-code-searched 'mannwhitneyu' and found BlueBrain/NeuroM#220 (comment). Is that somehow relevant? Also scikit-bio/scikit-bio#1258 (comment).

@josef-pkt
Copy link
Member

BlueBrain wasn't reading the docstring. It's not a bug if it's weird but documented how to interpret it.
I added a comment to the scikit-bio issue to point them to this PR and the related issue.

@tavinathanson
Copy link
Contributor Author

I've updated the code to return u2, per @josef-pkt's comment. Note that this is not compatible with pre-0.17.0, so we'll have the following behavior:

Pre-0.17.0: always returns smallu statistic and p-value = two-sided / 2 by default.
0.17.0: always returns u2 statistic and p-value = two-sided by default.
0.17.1: always returns u2 statistic and p-value = two-sided / 2 by default.

Does that sound appropriate?

Note that #4933 currently returns smallu rather than u2: https://github.com/scipy/scipy/pull/4933/files#diff-cc307f50f4eb0f9917b55fdf1dbb9b8cR4426

I also merged the less and greater tests.

I'll rebase once we're settled on this.

@ev-br
Copy link
Member

ev-br commented Apr 28, 2016

Naively, I'd suggest to keep the invariant of "Nove means whatever scipy 0.16.1 is doing, down to all the minute detail."

I've been proven wrong on this one though, so I won't press the point.

@ev-br
Copy link
Member

ev-br commented Apr 28, 2016

And then make #4933 return the most useful output.

@tavinathanson
Copy link
Contributor Author

We can return u = u2 if alternative is not None else smallu. I'm also open to that; it's certainly getting confusing, but confusion seems inevitable here. Anyone think that's the wrong approach? I'll make that change soon unless I hear otherwise.

@josef-pkt
Copy link
Member

I have both opinions:

  • it's better to stick with complete backwards compatibility
  • smallu is pretty useless, and I strongly doubt that anyone made use of it

just pick one :)

(I'm low in making-up-my-mind capacity when I'm 50-50. I'm slow in several PRs in statsmodels because of it. Should we return one large or two small arrays? I have no idea which is better for future enhancments.)

@tavinathanson
Copy link
Contributor Author

Whoops, cross-posted with @josef-pkt. I'll go ahead and make it fully backward compatible.

@josef-pkt
Copy link
Member

stick with backwards compatibility

I think, smallu can be used as a statistic for permutation or bootstrap tests. I think it's monotonic with respect to the two sided alternative.

I just see @tavinathanson comment. Backwards compatibility in the None case sounds good, and return u2 if user chooses a "sensible" alternative

@josef-pkt
Copy link
Member

actually that sounds good
we get complete backwards compatibility with None, and proper results with alternative.

In this case, we could even just deprecate None and its use as default.
If users specify an alternative, then a future switch will not affect them.

@tavinathanson
Copy link
Contributor Author

@josef-pkt Are you suggesting that we add a deprecation warning for the None argument in this fix, or later?

I'm in favor of deprecating as soon as possible.

@josef-pkt
Copy link
Member

Are you suggesting that we add a deprecation warning for the None argument in this fix?

I thought of mentioning it in the docstring as it already is, should be enough for now, given the uncertain future with #4933

However with the None, we have now a deprecation path just for None and a fix for the function instead of deprecating the entire function.
Adding a deprecation warning before #4933 in decided and implemented sounds too early, and signalling that we deprecate None when we actually will remove the function is misleading.
However, if we just deprecate None, i.e. recommend users to explicitly specify an alternative, then a deprecation warning would be already useful now, and signal to users that something "strange" is going on with mannwhitneyu.

@tavinathanson
Copy link
Contributor Author

@ev-br What are your thoughts on deprecating the None argument? Otherwise, I believe I've addressed all other comments.

Defaults to None, which results in a p-value half the size of
the 'two-sided' p-value and a different U statistic. The default
behavior is not the same as using 'less' or 'greater', is not
recommended, and will soon be deprecated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend to avoid making definite statements about the future like this :-). Maybe better state that it only exists for backwards compatibility and is not recommended for use in any new code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that makes sense. Given that I just included a DeprecationWarning, I needed to change this wording anyway.

@ev-br
Copy link
Member

ev-br commented Apr 30, 2016

I'm more or less on the fence on whether alternative=None should or should not emit DeprecationWarnings, as long as it's deprecated in the docs. I am still optimistic that gh-4933 will get finished (BTW, @tavinathanson , maybe it could be you?), in which case we likely want to ask users to adjust only once.

@tavinathanson
Copy link
Contributor Author

@ev-br I agree that it's non-ideal for users to adjust multiple times. However, given the changes in this function between pre-0.17.0, 0.17.0 and 0.17.1, a DeprecationWarning for the None argument would be very helpful in alerting people to the fact that their analysis results might have changed under their noses. Many people probably use mannwhitneyu deep in their analyses, and would never look at the documentation otherwise. I think the annoyance of adjusting twice is less important than notifying users that issues exist.

I updated the code with DeprecationWarning for now; let me know what you think.

And yeah, I'd be happy to help out with #4933.

@ev-br
Copy link
Member

ev-br commented May 1, 2016

LGTM.
I'll wait for a couple of days in case there are more comments. Otherwise, I'm going to merge this and wrap up the release.

@ev-br
Copy link
Member

ev-br commented May 1, 2016

And yeah, I'd be happy to help out with #4933.

That'd be superb!


return MannwhitneyuResult(u2, distributions.norm.sf(z) * fact2)
u = u2
# This behavior will soon be deprecated.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, this line still needs to be updated.

@josef-pkt
Copy link
Member

Looks fine to me, no more comments from my side.

@ev-br
Copy link
Member

ev-br commented May 5, 2016

OK, let's put this in. Thank you @tavinathanson, Josef, Alex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants