ENH: Addition of Poisson Means Test (E-test). #14840

Merged: 37 commits merged on Aug 2, 2022

Conversation

@swallan (Contributor) commented Oct 11, 2021

This PR builds on gh-10561 and implements several remaining comments.

There have been numerous requests around the internet for an easily accessible and fast implementation of the "E-test", or the Poisson means test. The paper introducing it is fairly popular, but there are few, if any, implementations other than the original authors' Fortran code, which has some noted faults that this implementation mends.

The name poisson_means_test is somewhat vague, but the idea is that while the current implementation is the E-test, a method parameter could later be added to let the user choose between the E-test and the C-test, with the E-test as the default to maintain backwards compatibility.
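
To make the idea concrete, here is a rough sketch of what such a signature could look like (hypothetical only; this PR does not add a method parameter):

# Hypothetical sketch -- the PR does not add a `method` parameter; this just
# shows how one could be added later without breaking existing callers.
def poisson_means_test(k1, n1, k2, n2, *, diff=0, alternative='two-sided',
                       method='etest'):
    if method == 'etest':
        ...  # the E-test implemented in this PR
    elif method == 'ctest':
        ...  # a possible future C-test
    else:
        raise ValueError("method must be 'etest' or 'ctest'")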

It's a bit more complex to review since this builds off of a relatively stale PR, but I've done my best to address the common concerns raised there. I have not removed the original contributor's commits since I don't want to take the credit away, but for merging I think this should be squashed into a single commit.

#10561 (comment)

avoid the use of np.divide(..., where)/np.multiply(..., where) in favor of indexing as needed and using the usual operators

This has been done with indexing.
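
As a rough illustration of the pattern (toy arrays and names, not the PR's actual variables):

import numpy as np

num = np.array([1.0, 2.0, 3.0, 4.0])
den = np.array([2.0, 0.0, 4.0, 0.0])
mask = den != 0

# instead of np.divide(num, den, out=np.zeros_like(num), where=mask),
# index only the valid entries and use the ordinary `/` operator
result = np.zeros_like(num)
result[mask] = num[mask] / den[mask]   # zero-denominator entries stay 0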

We should also only compute the PMFs where the indicator is actually true

I agree that this would be ideal, but the likelihood that any one of the PMFs goes unused seems fairly low. If the two arrays that result from a PMF calculation are

x1 = array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x2 = array([[0],
            [1],
            [2],
            [3],
            [4],
            [5],
            [6],
            [7],
            [8],
            [9]])

and then they are broadcast together, the indicator does not usually eliminate entire rows or columns (from my brief testing, not a proof), so I don't think it's worth trying to check whether certain rows or columns can be skipped.
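
As a rough illustration with toy numbers (not the PR's actual variables), the two PMF vectors broadcast into a full grid, and the indicator typically removes a scattered subset of entries rather than whole rows or columns:

import numpy as np
from scipy import stats

x1 = np.arange(10)                 # shape (10,)
x2 = np.arange(10)[:, None]        # shape (10, 1)

prob_x1 = stats.poisson.pmf(x1, mu=3)     # shape (10,)
prob_x2 = stats.poisson.pmf(x2, mu=5)     # shape (10, 1)

joint = prob_x1 * prob_x2                 # broadcasts to shape (10, 10)
indicator = np.abs(x1 - x2) >= 2          # toy indicator, also shape (10, 10)
print(joint[indicator].sum())             # no whole row/column is excluded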

Switching the arguments has the downside of changing the test statistic

I agree. Instead of switching the arguments, I limited the change to swapping out the indicator.

rename them to n1 and n2. If we were to change count1 and count2 to k1 and k2,

Being able to compare to the paper is important, so I think this is a good choice.

n1 and n2 be floats?

I think it is reasonable for these to be floats. #10561 (comment) They are a "count" in some sense, but as long as the unit of measure is maintained across both "counts", it is fine. However, there seem to be some interesting results as the "counts" get smaller:

stats.poisson_means_test(0, 125.0, 3, 125.0)
PoissonMeansTestResult(statistic=-1.7320508075688772, pvalue=0.08837900945483518)

stats.poisson_means_test(0, 12.5, 3, 12.5)
PoissonMeansTestResult(statistic=-1.7320508075688772, pvalue=0.08837900945483518)

stats.poisson_means_test(0, 1.25, 3, 1.25)
PoissonMeansTestResult(statistic=-1.7320508075688772, pvalue=0.08837900945483518)

stats.poisson_means_test(0, 0.125, 3, 0.125)
PoissonMeansTestResult(statistic=-1.7320508075688774, pvalue=0.08837900945483518)

stats.poisson_means_test(0, 0.0125, 3, 0.0125)
PoissonMeansTestResult(statistic=-1.7320508075688774, pvalue=0.032362623816529226)

stats.poisson_means_test(0, 0.00125, 3, 0.00125)
PoissonMeansTestResult(statistic=-1.7320508075688774, pvalue=0.08837900945483518)

stats.poisson_means_test(0, 0.000125, 3, 0.000125)
PoissonMeansTestResult(statistic=-1.7320508075688774, pvalue=0.08837900945483518)

For some reason the p-value is different when the sample size is 0.0125. That is kind of perplexing, so it could be safer just to leave it as an int. If the argument that the unit doesn't matter is valid, the user can convert their data to an integer for the test to work. The authors use sample "size", which is ambiguous, but to be conservative we could leave it as an integer input in case floats are indeed not allowed. I definitely think that k1 and k2 (the observed values) are intended to be integers, because they are described as counts.

Additional information

I expect there to be one test failure relating to a division error.

It was previously suggested that

Other, unrelated notes that I would put in the code, but they'd disappear once mdmahendri#1 merges: we need to compute the statistic and p-value even if lmbd_hat2 <= 0.

I suspect the reason the authors indicated that you do not need to calculate the p-value in this scenario is that doing so leads to ZeroDivisionErrors down the road: often (but not always) the variance is 0, and it appears in the denominator of several divisions. I'm still thinking about the best way to handle this.
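
Here is a toy sketch (illustrative names, not the PR's code) of one option: replace zero variances with infinity before dividing, so the corresponding statistic becomes 0 and the term simply drops out of the indicator instead of raising an error or producing NaN.

import numpy as np

lmbds_diff = np.array([0.5, 0.0, -0.3])
var = np.array([0.25, 0.0, 0.09])

safe_var = np.where(var > 0, var, np.inf)
t = lmbds_diff / np.sqrt(safe_var)    # entries with zero variance become 0.0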

Also -- I see the lint errors. My local pep8 checker apparently isn't working. Will fix on the next push.

mdmahendri and others added 26 commits October 10, 2019 12:41
- add example details and comparison with t-test
- change function name to be more expressive
- make argument names similar to those of the t-test
- return a namedtuple similar to other tests in scipy
- add clarity on how the hypotheses appear in the test
- change how the statistic value is returned
add tests to
- check for type and value errors
- check against examples from the publishing paper
Co-authored-by: Matt Haberland <mdhaber@mit.edu>
Merged swallan/poisson-e-test into mdmahendri/master
@tupui (Member) left a comment


Thanks @swallan. Very clear PR with nice comments in the code! I have a few suggestions.

@tupui added the "enhancement" label on Oct 11, 2021
- make args type float
@swallan (Contributor, Author) commented Apr 5, 2022

To sum up what is going on with this PR:

  • There are no external implementations of this method other than a Fortran routine that has noted faults. If other implementations exist, they are well hidden.
  • There is some ambiguity about what the input types should allow. I think it is appropriate that the inputs n1 and n2 accept any real value, and that k1 and k2 should be integers. See comment ENH: add poisson two sample test (E-test) to scipy.stats #10561 (comment) for discussion. To sum it up, the ns are "observations" in the sense that they are an observed amount; it could be 100 grams or 0.1 kg. The ks, however, should be integers, since the Poisson distribution handles discrete values. In the post describing the PR I show that changing the units leaves the results unchanged. A rough sketch of that validation is below.
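
A rough sketch of that validation rule (illustrative only, not the exact code in this PR):

# counts k1/k2 must be non-negative integers; sample measures n1/n2 may be any
# positive real number
def _validate(k1, n1, k2, n2):
    if k1 != int(k1) or k2 != int(k2) or k1 < 0 or k2 < 0:
        raise ValueError("k1 and k2 must be non-negative integers")
    if n1 <= 0 or n2 <= 0:
        raise ValueError("n1 and n2 must be positive")
    return int(k1), float(n1), int(k2), float(n2)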

@mdhaber (Contributor) commented Apr 5, 2022

Thanks @swallan. LMK when you've summarized the agreement between this implementation and the Fortran code and are ready for me to take a look at this PR again.

@swallan (Contributor, Author) commented Apr 9, 2022

I took a look at the results in test_fortran_authors and see that the maximum relative difference for the p-values was 5.84296954e-06.

It was for this value:

0.008053640402974236  # fortran
0.008053687460149835  # scipy

so a difference in the 8th decimal place. Even for a value this small, I think that is pretty good.
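
As a quick check of the arithmetic:

fortran = 0.008053640402974236
scipy_ = 0.008053687460149835
rel_diff = abs(scipy_ - fortran) / fortran
print(rel_diff)   # ~5.84e-06, i.e. an absolute difference of ~4.7e-08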

The Poisson PMF is relatively simple to evaluate, so the differences from the Fortran results are unlikely to be caused by inaccuracies in SciPy.
SciPy's Poisson PMF is evaluated via the log-PMF, which I assume was done for better numerical stability. That change was made a long time ago, though, so I can't immediately see the original motivation.
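
As a rough illustration (not the SciPy source) of why going through the log-PMF helps: the textbook formula mu**k * exp(-mu) / k! overflows its intermediate terms for large k, while exp(k*log(mu) - mu - log(k!)) stays finite.

import numpy as np
from scipy import special, stats

mu, k = np.float64(500.0), 600

naive = mu**k * np.exp(-mu) / special.factorial(k)             # inf/inf -> nan
via_log = np.exp(k * np.log(mu) - mu - special.gammaln(k + 1))
print(naive, via_log, stats.poisson.pmf(k, mu))                # nan vs a small finite value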

- also maint on names in error tests
@mdhaber (Contributor) commented Apr 9, 2022

Great. Are you happy with this then? Shall I review more carefully?

@swallan (Contributor, Author) commented Apr 10, 2022

Yes please!

@mdhaber (Contributor) commented May 2, 2022

@swallan would you send an email to the mailing list asking whether we should add this test even though there's already a version in statsmodels?

@chrisb83 @josef-pkt what do you think? @josef-pkt I ask you from the statsmodels perspective, and @chrisb83 I ask you because you were involved in the original PR #10561.

@josef-pkt (Member) commented

Update from statsmodels:

statsmodels currently has only the test for the ratio of two Poisson rates, not for the difference.

I spent the last month getting almost everything in place for one- and two-sample Poisson tests
statsmodels/statsmodels#8166

which now includes an etest for both ratio and difference, with either a score or Wald test statistic.
I skipped the original Krishnamoorthy and Thomson moment estimate (at least for now) since there is no advantage to it compared to the score statistic, AFAICS. (I might still add it for completeness.)
The score-constrained MLE and p-value also have explicit, analytical expressions and don't need numerical root-finding, as the score confidence interval does.

@chrisb83 (Member) commented May 3, 2022

I think the test would fit well into scipy.stats. I personally do not mind the duplication.

@mdhaber (Contributor) commented May 3, 2022

Thanks @josef-pkt @chrisb83.
@swallan after addressing the review, please send an email to the mailing list inviting feedback. Please mention that statsmodels may soon offer enhanced functionality for this test, but that we are considering adding this (the basics) in SciPy nonetheless.

- capitalize Poisson
- move latex render to notes
- clarify that diff has a default, not optional
- write out alternatives in english
- clarify dodder (weed) seed
- add * to make the following arguments keyword-only
@mdhaber (Contributor) commented Jun 1, 2022

Email sent 5/3.

@tupui added this to the 1.10.0 milestone on Jun 1, 2022
@mdhaber requested a review from chrisb83 on Jun 3, 2022
Comment on lines 314 to 324
# this is the 'pivot statistic' for use in the indicator of the summation
# (left side of "I[.]"). Before dividing, mask zero-elements in the
# denominator with infinity so that they are `false` in the indicator.
if alternative == 'two-sided':
    mask_out_invalid = np.abs(lmbd_x1 - lmbd_x2) > diff
elif alternative == 'less':
    mask_out_invalid = lmbds_diff < 0
else:
    mask_out_invalid = lmbds_diff > 0

var_x1x2[~mask_out_invalid] = np.inf
@mdhaber (Contributor) commented Aug 1, 2022


@swallan I am working on getting this merged, but I need a little bit of your time to finish it. I think I only have one question - where did this come from? I don't see this in the paper, and without it, we get the symmetry property I was expecting. I see that removing it makes one of the test_fortran_authors tests fail, but my guess is that the test (perhaps the authors' code) is wrong about this. Also, it looks like statsmodels really has this test now (here), and I'm not seeing this sort of condition there.

Contributor:

I'm pretty confident that this mask_out_invalid stuff was not intended.

If we perform the test under the null hypothesis, the p-value of the test should be uniformly distributed on the interval [0, 1] (i.e. we should get p-value < 0.01 about 1% of the time, < 0.05 about 5% of the time, etc.). We can test this with an experiment that repeatedly samples from two Poisson distributions with $\lambda_1 = \lambda_2 + d$, where $d$ is the hypothesized difference. Here, I do that and plot the empirical CDF of the observed p-values, which we expect to be a straight line with slope 1 if the p-values follow a uniform distribution.

[plot: empirical CDF of the observed p-values, with and without mask_out_invalid, against the uniform reference line]

For instance, with mask_out_invalid, 1130 of the 10000 p-values were <0.05 when the null hypothesis was true, whereas without the mask_out_invalid code, 476 of the 10000 p-values were <0.05 when the null hypothesis was true. So without the mask_out_invalid code, the test controls the Type I error rate, but with the code, the test does not. I'll go ahead and remove it.

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

N = 10000      # number of simulated tests

# sample sizes
n1 = 1000
n2 = 1000

# rates chosen so the null hypothesis holds: lam1 - lam2 equals the hypothesized diff
diff = 0.01
lam2 = 0.01
lam1 = lam2 + diff

dist1 = stats.poisson(n1*lam1)
dist2 = stats.poisson(n2*lam2)

# repeatedly draw counts and record the p-value of the test
pvals = []
for i in range(N):
    k1 = dist1.rvs()
    k2 = dist2.rvs()
    res = stats.poisson_means_test(k1, n1, k2, n2, diff=diff)
    pvals.append(res.pvalue)

# compare the empirical CDF of the p-values against the uniform reference line
pvals = np.sort(pvals)
ecdf = np.linspace(0, 1, N+1)
plt.plot(ecdf, ecdf)
plt.plot(pvals, ecdf[1:])
plt.axis('square')

@tupui @Micky774 thought you'd like this based on our conversation at SciPy : )

Contributor:

Suggested change
-# this is the 'pivot statistic' for use in the indicator of the summation
-# (left side of "I[.]"). Before dividing, mask zero-elements in the
-# denominator with infinity so that they are `false` in the indicator.
-if alternative == 'two-sided':
-    mask_out_invalid = np.abs(lmbd_x1 - lmbd_x2) > diff
-elif alternative == 'less':
-    mask_out_invalid = lmbds_diff < 0
-else:
-    mask_out_invalid = lmbds_diff > 0
-var_x1x2[~mask_out_invalid] = np.inf
+# this is the 'pivot statistic' for use in the indicator of the summation
+# (left side of "I[.]").

@Micky774 (Contributor) commented Aug 1, 2022

@tupui @Micky774 thought you'd like this based on our conversation at SciPy : )

Oh this is fantastic, thanks for pinging me on this :). I'm excited to show my students an "in the wild" use case of this concept.

@swallan (Contributor, Author) commented Aug 1, 2022

In the initial submission, there was this same calculation using np.divide with a where:

swallan@a3f3a14#diff-6ba7fcf5a871c6742dd082e5fc1801ef3ce2126456b052e9d0a126e70aab5f99L351-L354 (line 352)

with the description

    # exclude the same cases that will be excluded in the indicator in the
    # following `np.multiply` call to avoid invalid divisions. In these
    # for 'skipped' indices, the result is zero, as seen in the `out=`

So that's where it came from. As to whether it is necessary, I don't know yet. I'm looking at the paper and the implementation to be sure.

Contributor:

I see. OK, well, take a look and see if you can find it. If you're also not sure where it's coming from, go ahead and commit the suggestions. If you're happy with the other changes, I'll go ahead and merge!


# multiply all combinations of the products together, exclude terms
# based on the `indicator` and then sum. (3.5)
pvalue = np.sum((prob_x1 * prob_x2)[indicator])
@mdhaber (Contributor) commented Aug 1, 2022

Enhancement idea to study at some point: we might be able to do a little better here by moving the calculation to log-space, i.e.

Suggested change
-pvalue = np.sum((prob_x1 * prob_x2)[indicator])
+pvalue = np.exp(special.logsumexp((logprob_x1 + logprob_x2)[indicator]))

@mdhaber (Contributor) left a comment

With these changes, this LGTM.

Comment on lines 314 to 324
# this is the 'pivot statistic' for use in the indicator of the summation
# (left side of "I[.]"). Before dividing, mask zero-elements in the
# denominator with infinity so that they are `false` in the indicator.
if alternative == 'two-sided':
    mask_out_invalid = np.abs(lmbd_x1 - lmbd_x2) > diff
elif alternative == 'less':
    mask_out_invalid = lmbds_diff < 0
else:
    mask_out_invalid = lmbds_diff > 0

var_x1x2[~mask_out_invalid] = np.inf
Contributor:

Suggested change
-# this is the 'pivot statistic' for use in the indicator of the summation
-# (left side of "I[.]"). Before dividing, mask zero-elements in the
-# denominator with infinity so that they are `false` in the indicator.
-if alternative == 'two-sided':
-    mask_out_invalid = np.abs(lmbd_x1 - lmbd_x2) > diff
-elif alternative == 'less':
-    mask_out_invalid = lmbds_diff < 0
-else:
-    mask_out_invalid = lmbds_diff > 0
-var_x1x2[~mask_out_invalid] = np.inf
+# this is the 'pivot statistic' for use in the indicator of the summation
+# (left side of "I[.]").

# `[indicator]` implements the "I[.] ... the indicator function" per
# the paragraph following equation (3.5).
if alternative == 'two-sided':
    indicator = np.abs(t_x1x2) >= np.abs(t_k1k2)
@mdhaber (Contributor) commented Aug 1, 2022

I noticed when I checked out the statsmodels code that there is a tolerance in this comparison, e.g.

Suggested change
-indicator = np.abs(t_x1x2) >= np.abs(t_k1k2)
+indicator = np.abs(t_x1x2) >= np.abs(t_k1k2) - eps

because numerical error could make the comparison False even though it should be True. We've put that sort of thing in a few places (e.g. permutation test). Without it, the p-values can be a little too low; with it, the worst it will do is make the test a little conservative.
We can leave this as a follow-up, since it's unclear what the tolerance should be.
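
As a toy illustration of the failure mode (made-up numbers, not data from this PR): two quantities that are mathematically equal can differ by one ulp after floating-point arithmetic, so a strict comparison can silently drop a term that belongs in the sum, while subtracting a small tolerance keeps it.

t_k1k2 = 0.8
t_x1x2 = 0.1 + 0.7        # 0.7999999999999999 in binary floating point

print(t_x1x2 >= t_k1k2)          # False, though mathematically equal
eps = 1e-12
print(t_x1x2 >= t_k1k2 - eps)    # True; at worst the test becomes slightly conservative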

@tupui (Member) left a comment

I went through the recent changes and my open discussions again and LGTM yes.

@mdhaber mdhaber merged commit 06f7854 into scipy:main Aug 2, 2022
Labels: enhancement (A new feature or improvement), scipy.stats