Add Mann-Whitney U test methods (bug 1395571) by robhudson · Pull Request #174 · mozilla/python_moztelemetry

robhudson · 2017-09-07T22:24:48Z

TODO:

Also return p-values, similar to scipy

coveralls · 2017-09-07T22:59:41Z

Coverage increased (+0.4%) to 78.84% when pulling 9d87447 on add-mann-whitney into 414f6f3 on master.

fbertsch

Great work, Rob!

fbertsch · 2017-09-08T03:49:29Z

+    SAMPLE2 = {1: 1, 2: 5, 3: 2, 4: 4, 5: 3}
+
+    # Basic test.
+    assert stats.mann_whitney_u(SAMPLE1, SAMPLE2) == 94.5


I think you should import scipy's mannwhitneyu test and compare its test statistic to yours. Could even have comparisons on some generated histograms.

This project doesn't currently depend on scipy. Is that an ok requirement to add?

I tried simply adding it to setup.py, as we do for numpy, but it requires a Fortran compiler and the install failed. It seems adding it will be a bit more complicated?

robhudson · 2017-09-26T16:57:22Z

It was noted on IRC that I can borrow the method for calculating p-values from the Lua code here:
https://github.com/mozilla-services/lua_sandbox_extensions/blob/d5d65adbe4e075825aa99f5abee01ffff93bcced/lsb/modules/lsb/stats.lua#L311-L319

coveralls · 2017-10-04T22:08:22Z

Coverage increased (+0.4%) to 79.157% when pulling 7876bcb on add-mann-whitney into bf624ef on master.

coveralls · 2017-10-10T16:56:39Z

Coverage increased (+1.01%) to 81.378% when pulling 9389509 on add-mann-whitney into 26e5deb on master.

fbertsch · 2017-10-12T18:04:08Z

+from collections import namedtuple
+
+
+def rank(sample):


This function will use the floor if the input counts are integers:

rank({1: 2, 2: 2, 3: 2, 4: 2}) == {1: 1, 2: 3, 3: 5, 4: 7}
rank({1: 2.0, 2: 2.0, 3: 2.0, 4: 2.0}) == {1: 1.5, 2: 3.5, 3: 5.5, 4: 7.5}

Please convert them to doubles.

Please write a brief description of what this function is doing, and specify exactly what the rank will be for buckets with more than one count (i.e. rank is the median rank of all ranks in that bucket).

It will return doubles since I'm importing division from __future__, which matches Python 3 behavior if this project ever moves to it. Without that import you're correct (and living in the past).

heh, good point! I've gotta switch my local default to 3.6...
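To make the ranking behavior discussed above concrete, here is a minimal sketch. It assumes the histogram is a dict of {bucket: count} and that every bucket gets the median (midpoint) of the ranks its observations occupy. The name `rank_histogram` is hypothetical, and this is not necessarily how the PR's `rank` is written. Dividing by `2.0` sidesteps the integer-division question raised in the review.

```python
def rank_histogram(sample):
    """Map each bucket to the midpoint of the ranks its observations occupy.

    `sample` is assumed to be a dict of {bucket: count}.
    """
    ranks = {}
    seen = 0  # observations counted in lower buckets so far
    for bucket in sorted(sample):
        count = sample[bucket]
        # This bucket's observations occupy ranks seen+1 .. seen+count,
        # so their median rank is seen + (count + 1) / 2.
        ranks[bucket] = seen + (count + 1) / 2.0
        seen += count
    return ranks
```

With the reviewer's input, `rank_histogram({1: 2, 2: 2, 3: 2, 4: 2})` gives `{1: 1.5, 2: 3.5, 3: 5.5, 4: 7.5}` whether the counts are ints or floats.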

fbertsch · 2017-10-12T18:37:18Z

+    return ranks
+
+
+def tie_correct(sample):


Add a link to tie correction for MWU on wikipedia.
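For reference, the usual tie correction factor for the Mann-Whitney U test is 1 - sum(t^3 - t) / (n^3 - n), where t is the size of each group of tied values and n is the total number of observations. The sketch below assumes `sample` is the pooled histogram of both samples as a {bucket: count} dict, so each bucket is one tie group. The function name is hypothetical and the PR's `tie_correct` may differ in detail.

```python
def tie_correction_factor(sample):
    """Tie correction factor for a pooled {bucket: count} histogram.

    Returns 1.0 when there are no ties (every count is 1).
    """
    n = sum(sample.values())
    if n < 2:
        # The formula's denominator is zero for n in (0, 1); no correction.
        return 1.0
    ties = sum(t ** 3 - t for t in sample.values())
    return 1.0 - ties / float(n ** 3 - n)
```

The factor multiplies the variance term, as in the `sd_u` line of the diff further down in this review.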

fbertsch · 2017-10-12T19:00:13Z

+    tie_correction = tie_correct(sample)
+    sd_u = math.sqrt(tie_correction * n1 * n2 * (n1 + n2 + 1) / 12.0)
+    mean_rank = n1 * n2 / 2.0 + 0.5 * use_continuity
+    z = abs((max(u1, u2) - mean_rank) / sd_u)


This should be the min(u1, u2), see here

In step 10 they're using U=32 to compute z, which is the max(Ua, Ub) in the example.
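The choice between min and max only matters when the continuity correction is enabled. Because U1 + U2 = n1 * n2, both values are equally far from n1 * n2 / 2, so without the correction |z| is the same either way. The sketch below mirrors the formula from the diff above; the helper name `z_from_u` and the sample numbers are made up for illustration.

```python
import math


def z_from_u(u, n1, n2, use_continuity=True, tie_correction=1.0):
    """|z| for a given U, following the formula in the diff above."""
    sd_u = math.sqrt(tie_correction * n1 * n2 * (n1 + n2 + 1) / 12.0)
    mean_rank = n1 * n2 / 2.0 + 0.5 * use_continuity
    return abs((u - mean_rank) / sd_u)


n1, n2 = 5, 7
u1 = 9.0
u2 = n1 * n2 - u1  # U1 + U2 always equals n1 * n2

# No continuity correction: max(U) and min(U) are equidistant from the mean.
z_max_plain = z_from_u(max(u1, u2), n1, n2, use_continuity=False)
z_min_plain = z_from_u(min(u1, u2), n1, n2, use_continuity=False)

# With the +0.5 shift in mean_rank, max(U) moves toward the mean (smaller |z|)
# while min(U) moves away from it (larger |z|).
z_max_cc = z_from_u(max(u1, u2), n1, n2, use_continuity=True)
z_min_cc = z_from_u(min(u1, u2), n1, n2, use_continuity=True)
```

So with `mean_rank` shifted upward by 0.5, pairing it with max(u1, u2) is what makes the correction shrink |z|; pairing it with min(u1, u2) would need the shift to go the other way.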

fbertsch · 2017-10-12T19:16:08Z

+    return tc
+
+
+def ndtr(v):


ndtr takes the integral from -infinity to v over the gaussian distribution. This is why the input below is -abs(z). Can we do two things here:

1. Add tests that this function returns the same value as scipy ndtr

2. Document what this function is doing, with a reference to the above link

I do like calculating this separately, rather than importing scipy.
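As an illustration of what such a function computes, the standard normal CDF can be written with the standard library's `math.erfc`. This is a sketch under that assumption, not the PR's implementation, which may use a different approximation; `two_sided_p` is a hypothetical helper showing why the input is -abs(z).

```python
import math


def ndtr(v):
    """Standard normal CDF: area under the Gaussian density from -inf to v."""
    return 0.5 * math.erfc(-v / math.sqrt(2.0))


def two_sided_p(z):
    """Two-sided p-value for a z score: twice the lower-tail area at -|z|."""
    return 2.0 * ndtr(-abs(z))
```

For example, `ndtr(0.0)` is 0.5, and a z score of about 1.96 gives a two-sided p-value of about 0.05.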

fbertsch · 2017-10-12T19:18:39Z

+"""
+This module implements test coverage for the stats functions in stats.py.
+"""
+import itertools


Can we add more tests, with some different distributions? We could add, for example:

- Normally distributed histograms

- Skewed distributions

- Uniformly distributed histograms

And combinations therein. We can create them randomly, and check that our result is within scipy's by some value.
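One way to set up such randomized checks is sketched below. Since scipy is not a dependency of this project, the reference here is a brute-force pairwise count rather than scipy's `mannwhitneyu`; swapping scipy in would be the stronger test. All function names, the seed, and the distribution parameters are assumptions made for the example.

```python
import random
from collections import Counter


def make_histogram(draws):
    """Bucket numeric draws into an integer-keyed {bucket: count} dict."""
    return dict(Counter(int(round(x)) for x in draws))


def u_from_histograms(h1, h2):
    """U for h1 via its rank sum in the pooled histogram (midranks for ties)."""
    pooled = Counter(h1)
    pooled.update(h2)  # adds counts bucket by bucket
    seen, rank_sum = 0, 0.0
    for bucket in sorted(pooled):
        count = pooled[bucket]
        midrank = seen + (count + 1) / 2.0
        rank_sum += midrank * h1.get(bucket, 0)
        seen += count
    n1 = sum(h1.values())
    return rank_sum - n1 * (n1 + 1) / 2.0


def u_brute_force(h1, h2):
    """Reference U for h1: each pair with x > y counts 1, each tie counts 0.5."""
    total = 0.0
    for x, cx in h1.items():
        for y, cy in h2.items():
            if x > y:
                total += cx * cy
            elif x == y:
                total += 0.5 * cx * cy
    return total


rng = random.Random(42)  # fixed seed so the check is reproducible
generators = {
    "normal": lambda: rng.gauss(50, 10),
    "uniform": lambda: rng.uniform(0, 100),
    "skewed": lambda: rng.expovariate(1 / 20.0),
}
histograms = {
    name: make_histogram(draw() for _ in range(500))
    for name, draw in generators.items()
}
```

A test can then loop over every pair of entries in `histograms` and assert that the histogram-based statistic stays within a small tolerance of the reference.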

robhudson · 2017-10-12T20:45:57Z

+        {1: 5, 2: 20, 3: 12, ...}
+
+    Returns the U statistic, equal to min(U for sample1, U for sample2).
+


Update comments to match code.

coveralls · 2017-10-13T22:38:15Z

Coverage increased (+1.05%) to 81.418% when pulling 5955828 on add-mann-whitney into 26e5deb on master.

robhudson · 2017-10-13T22:44:41Z

Updated with requested changes.

coveralls · 2017-10-13T22:50:45Z

Coverage increased (+1.05%) to 81.418% when pulling 9f1760a on add-mann-whitney into 26e5deb on master.

fbertsch · 2017-10-16T13:58:17Z

+
+    """
+    try:
+        a = float(a)


It is probably more Pythonic to let this exception be raised. Perhaps line 72 should just be float(a) * sqrth. Alternatively, we could remove the validation entirely.

coveralls · 2017-10-16T15:25:16Z

Coverage increased (+1.0%) to 81.338% when pulling 44cd983 on add-mann-whitney into 26e5deb on master.

fbertsch · 2017-10-16T15:29:31Z

+
+
+def test_mann_whitney_u():
+    for sample in ('normalized', 'uniform', 'skewed'):


These tests are close to what I was thinking, but all of these are comparing the same distribution. I would change one of the distributions (for each of: normalized, uniform, and skewed) to be different (different mean, different std dev, different number of samples, etc.), and in addition add tests that compare e.g. normalized to uniform, uniform to skewed, etc.

coveralls · 2017-10-16T17:58:15Z

Coverage increased (+1.0%) to 81.338% when pulling b46e40a on add-mann-whitney into 26e5deb on master.

fbertsch

Frack yeah!

coveralls · 2017-10-16T18:28:00Z

Coverage increased (+1.0%) to 81.338% when pulling 7b704bb on add-mann-whitney into 26e5deb on master.

coveralls · 2017-10-16T19:41:34Z

Coverage increased (+1.0%) to 81.338% when pulling d426a26 on add-mann-whitney into 26e5deb on master.

fbertsch reviewed Sep 8, 2017

robhudson force-pushed the add-mann-whitney branch from 9d87447 to 7876bcb on October 4, 2017 22:01

mozilla deleted a comment from fbertsch Oct 4, 2017

robhudson force-pushed the add-mann-whitney branch 3 times, most recently from 73dccce to 9389509 on October 10, 2017 16:47

fbertsch suggested changes Oct 12, 2017

robhudson commented Oct 12, 2017

robhudson force-pushed the add-mann-whitney branch from 9389509 to 5955828 on October 13, 2017 22:30

robhudson force-pushed the add-mann-whitney branch from 5955828 to 9f1760a on October 13, 2017 22:44

fbertsch reviewed Oct 16, 2017

robhudson force-pushed the add-mann-whitney branch from 9f1760a to 44cd983 on October 16, 2017 15:17

fbertsch reviewed Oct 16, 2017

robhudson force-pushed the add-mann-whitney branch from 44cd983 to b46e40a on October 16, 2017 17:51

robhudson force-pushed the add-mann-whitney branch from b46e40a to 7b704bb on October 16, 2017 18:20

fbertsch approved these changes Oct 16, 2017

Add Mann-Whitney U test for comparing histograms (bug 1395571)

d426a26

robhudson force-pushed the add-mann-whitney branch from 7b704bb to d426a26 on October 16, 2017 19:32

robhudson merged commit 55c1646 into master Oct 16, 2017
