ENH: Addition of Poisson Means Test (E-test). #14840

Merged: 37 commits merged on Aug 2, 2022

Conversation

@swallan (Contributor) commented Oct 11, 2021

This PR builds on gh-10561 and implements several remaining comments.

There have been numerous requests around the internet for an easily accessible and fast implementation of the "E-test", or the Poisson means test. The paper introducing it is fairly popular, but there are few, if any, implementations other than the original authors' Fortran code, which has some noted faults that this implementation mends.

The name poisson_means_test is somewhat vague, but the idea is that while the current implementation is the E-test, a method parameter could later be added to let the user choose between the E-test and the C-test, with the E-test as the default to maintain backwards compatibility.
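
To make the idea concrete, here is a rough sketch of what such a signature could look like (hypothetical only; this PR does not add a method parameter):

# Hypothetical sketch -- the PR does not add a `method` parameter; this just
# shows how one could be added later without breaking existing callers.
def poisson_means_test(k1, n1, k2, n2, *, diff=0, alternative='two-sided',
                       method='etest'):
    if method == 'etest':
        ...  # the E-test implemented in this PR
    elif method == 'ctest':
        ...  # a possible future C-test
    else:
        raise ValueError("method must be 'etest' or 'ctest'")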

It's a bit more complex to review since this builds off of a relatively stale PR, but I've done my best to address the common concerns raised there. I have not removed the original contributor's commits since I don't want to take the credit away, but for merging I think this should be squashed into a single commit.

#10561 (comment)

avoid the use of np.divide(..., where)/np.multiply(..., where) in favor of indexing as needed and using the usual operators

This has been done with indexing.
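
As a rough illustration of the pattern (toy arrays and names, not the PR's actual variables):

import numpy as np

num = np.array([1.0, 2.0, 3.0, 4.0])
den = np.array([2.0, 0.0, 4.0, 0.0])
mask = den != 0

# instead of np.divide(num, den, out=np.zeros_like(num), where=mask),
# index only the valid entries and use the ordinary `/` operator
result = np.zeros_like(num)
result[mask] = num[mask] / den[mask]   # zero-denominator entries stay 0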

We should also only compute the PMFs where the indicator is actually true

I agree that this would be ideal, but the likelihood that any one of the PMFs goes unused seems fairly low. If the two arrays that result from a PMF calculation are

x1 = array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x2 = array([[0],
            [1],
            [2],
            [3],
            [4],
            [5],
            [6],
            [7],
            [8],
            [9]])

and then they are broadcast together, the indicator does not usually eliminate entire rows or columns (from my brief testing, not a proof), so I don't think it's worth trying to check whether certain rows or columns can be skipped.
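
As a rough illustration with toy numbers (not the PR's actual variables), the two PMF vectors broadcast into a full grid, and the indicator typically removes a scattered subset of entries rather than whole rows or columns:

import numpy as np
from scipy import stats

x1 = np.arange(10)                 # shape (10,)
x2 = np.arange(10)[:, None]        # shape (10, 1)

prob_x1 = stats.poisson.pmf(x1, mu=3)     # shape (10,)
prob_x2 = stats.poisson.pmf(x2, mu=5)     # shape (10, 1)

joint = prob_x1 * prob_x2                 # broadcasts to shape (10, 10)
indicator = np.abs(x1 - x2) >= 2          # toy indicator, also shape (10, 10)
print(joint[indicator].sum())             # no whole row/column is excluded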

Switching the arguments has the downside of changing the test statistic

I agree. Instead of switching the arguments, I limited the change to swapping out the indicator.

rename them to n1 and n2. If we were to change count1 and count2 to k1 and k2,

Being able to compare to the paper is important, so I think this is a good choice.

n1 and n2 be floats?

I think it is reasonable for these to be floats. #10561 (comment) They are a "count" in some sense, but as long as the unit of measure is maintained across both "counts", it is fine. However, there seem to be some interesting results as the "counts" get smaller:

stats.poisson_means_test(0, 125.0, 3, 125.0)
PoissonMeansTestResult(statistic=-1.7320508075688772, pvalue=0.08837900945483518)

stats.poisson_means_test(0, 12.5, 3, 12.5)
PoissonMeansTestResult(statistic=-1.7320508075688772, pvalue=0.08837900945483518)

stats.poisson_means_test(0, 1.25, 3, 1.25)
PoissonMeansTestResult(statistic=-1.7320508075688772, pvalue=0.08837900945483518)

stats.poisson_means_test(0, 0.125, 3, 0.125)
PoissonMeansTestResult(statistic=-1.7320508075688774, pvalue=0.08837900945483518)

stats.poisson_means_test(0, 0.0125, 3, 0.0125)
PoissonMeansTestResult(statistic=-1.7320508075688774, pvalue=0.032362623816529226)

stats.poisson_means_test(0, 0.00125, 3, 0.00125)
PoissonMeansTestResult(statistic=-1.7320508075688774, pvalue=0.08837900945483518)

stats.poisson_means_test(0, 0.000125, 3, 0.000125)
PoissonMeansTestResult(statistic=-1.7320508075688774, pvalue=0.08837900945483518)

For some reason the p-value is different when the sample size is 0.0125. That is kind of perplexing, so it could be safer just to leave it as an int. If the argument that the unit doesn't matter is valid, the user can convert their data to an integer for the test to work. The authors use sample "size", which is ambiguous, but to be conservative we could leave it as an integer input in case floats are indeed not allowed. I definitely think that k1 and k2 (the observed values) are intended to be integers, because they are described as counts.

Additional information

I expect there to be one test failure relating to a division error.

It was previously suggested that

Other, unrelated notes that I would put in the code, but they'd disappear once mdmahendri#1 merges: we need to compute the statistic and p-value even if lmbd_hat2 <= 0.

I suspect the reason the authors indicated that you do not need to calculate the p-value in this scenario is that doing so leads to ZeroDivisionErrors down the road: often (but not always) the variance is 0, and it appears in the denominator of several divisions. I'm still thinking about the best way to handle this.
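
Here is a toy sketch (illustrative names, not the PR's code) of one option: replace zero variances with infinity before dividing, so the corresponding statistic becomes 0 and the term simply drops out of the indicator instead of raising an error or producing NaN.

import numpy as np

lmbds_diff = np.array([0.5, 0.0, -0.3])
var = np.array([0.25, 0.0, 0.09])

safe_var = np.where(var > 0, var, np.inf)
t = lmbds_diff / np.sqrt(safe_var)    # entries with zero variance become 0.0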

Also -- I see the lint errors. My local pep8 checker apparently isn't working. Will fix on the next push.

mdmahendri and others added 26 commits October 10, 2019 12:41
- add example details and comparison with t-test
- change function name to be more expressive
- make argument names similar to those of the t-test
- return a namedtuple similar to other tests in scipy
- add clarity on how the hypotheses appear in the test
- change how the statistic value is returned
add tests to
- check for type and value errors
- check against examples from the publishing paper
Co-authored-by: Matt Haberland <mdhaber@mit.edu>
Merged swallan/poisson-e-test into mdmahendri/master
@tupui (Member) left a comment


Thanks @swallan. Very clear PR with nice comments in the code! I have a few suggestions.

@tupui added the "enhancement" label on Oct 11, 2021
- make args type float
@swallan (Contributor, Author) commented Apr 5, 2022

To sum up what is going on with this PR:

  • There are no external implementations of this method other than a Fortran routine that has noted faults. If other implementations exist, they are well hidden.
  • There is some ambiguity about what the input types should allow. I think it is appropriate that the inputs n1 and n2 accept any real value, and that k1 and k2 should be integers. See comment ENH: add poisson two sample test (E-test) to scipy.stats #10561 (comment) for discussion. To sum it up, the ns are "observations" in the sense that they are an observed amount; it could be 100 grams or 0.1 kg. The ks, however, should be integers, since the Poisson distribution handles discrete values. In the post describing the PR I show that changing the units leaves the results unchanged. A rough sketch of that validation is below.
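
A rough sketch of that validation rule (illustrative only, not the exact code in this PR):

# counts k1/k2 must be non-negative integers; sample measures n1/n2 may be any
# positive real number
def _validate(k1, n1, k2, n2):
    if k1 != int(k1) or k2 != int(k2) or k1 < 0 or k2 < 0:
        raise ValueError("k1 and k2 must be non-negative integers")
    if n1 <= 0 or n2 <= 0:
        raise ValueError("n1 and n2 must be positive")
    return int(k1), float(n1), int(k2), float(n2)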

@mdhaber (Contributor) commented Apr 5, 2022

Thanks @swallan. LMK when you've summarized the agreement between this implementation and the Fortran code and are ready for me to take a look at this PR again.

@swallan (Contributor, Author) commented Apr 9, 2022

I took a look at the results in test_fortran_authors and see that the maximum relative difference for the p-values was 5.84296954e-06.

It was for this value:

0.008053640402974236  # fortran
0.008053687460149835  # scipy

so a difference in the 8th decimal place. Even for a value this small, I think that is pretty good.
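
As a quick check of the arithmetic:

fortran = 0.008053640402974236
scipy_ = 0.008053687460149835
rel_diff = abs(scipy_ - fortran) / fortran
print(rel_diff)   # ~5.84e-06, i.e. an absolute difference of ~4.7e-08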

The Poisson PMF is relatively simple to evaluate, so the differences from the Fortran results are unlikely to be caused by inaccuracies in SciPy.
SciPy's Poisson PMF is evaluated via the log-PMF, which I assume was done for better numerical stability. That change was made a long time ago, though, so I can't immediately see the original motivation.
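
As a rough illustration (not the SciPy source) of why going through the log-PMF helps: the textbook formula mu**k * exp(-mu) / k! overflows its intermediate terms for large k, while exp(k*log(mu) - mu - log(k!)) stays finite.

import numpy as np
from scipy import special, stats

mu, k = np.float64(500.0), 600

naive = mu**k * np.exp(-mu) / special.factorial(k)             # inf/inf -> nan
via_log = np.exp(k * np.log(mu) - mu - special.gammaln(k + 1))
print(naive, via_log, stats.poisson.pmf(k, mu))                # nan vs a small finite value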

- also maint on names in error tests
@mdhaber (Contributor) commented Apr 9, 2022

Great. Are you happy with this then? Shall I review more carefully?

@swallan (Contributor, Author) commented Apr 10, 2022

Yes please!

@mdhaber (Contributor) commented May 2, 2022

@swallan would you send an email to the mailing list asking whether we should add this test even though there's already a version in statsmodels?

@chrisb83 @josef-pkt what do you think? @josef-pkt I ask you from the statsmodels perspective, and @chrisb83 I ask you because you were involved in the original PR #10561.

@josef-pkt (Member) commented

Update from statsmodels:

statsmodels currently has only the test for the ratio of two Poisson rates, not for the difference.

I spent the last month getting almost everything in place for one- and two-sample Poisson tests
statsmodels/statsmodels#8166

which now includes an etest for both ratio and difference, with either a score or Wald test statistic.
I skipped the original Krishnamoorthy and Thomson moment estimate (at least for now) since there is no advantage to it compared to the score statistic, AFAICS. (I might still add it for completeness.)
The score-constrained MLE and p-value also have explicit, analytical expressions and don't need numerical root-finding, as the score confidence interval does.

@chrisb83 (Member) commented May 3, 2022

I think the test would fit well into scipy.stats. I personally do not mind the duplication.

@mdhaber (Contributor) commented May 3, 2022

Thanks @josef-pkt @chrisb83.
@swallan after addressing the review, please send an email to the mailing list inviting feedback. Please mention that statsmodels may soon offer enhanced functionality for this test, but that we are considering adding this (the basics) in SciPy nonetheless.

- capitalize Poisson
- move latex render to notes
- clarify that diff has a default, not optional
- write out alternatives in english
- clarify dodder (weed) seed
- add * to make the following arguments keyword-only
@mdhaber (Contributor) commented Jun 1, 2022

Email sent 5/3.

@tupui added this to the 1.10.0 milestone on Jun 1, 2022
@mdhaber requested a review from chrisb83 on Jun 3, 2022
Comment on lines 314 to 324
# this is the 'pivot statistic' for use in the indicator of the summation
# (left side of "I[.]"). Before dividing, mask zero-elements in the
# denominator with infinity so that they are `false` in the indicator.
if alternative == 'two-sided':
    mask_out_invalid = np.abs(lmbd_x1 - lmbd_x2) > diff
elif alternative == 'less':
    mask_out_invalid = lmbds_diff < 0
else:
    mask_out_invalid = lmbds_diff > 0

var_x1x2[~mask_out_invalid] = np.inf
@mdhaber (Contributor) commented Aug 1, 2022


@swallan I am working on getting this merged, but I need a little bit of your time to finish it. I think I only have one question - where did this come from? I don't see this in the paper, and without it, we get the symmetry property I was expecting. I see that removing it makes one of the test_fortran_authors tests fail, but my guess is that the test (perhaps the authors' code) is wrong about this. Also, it looks like statsmodels really has this test now (here), and I'm not seeing this sort of condition there.

Contributor:

I'm pretty confident that this mask_out_invalid stuff was not intended.

If we perform the test under the null hypothesis, the p-value of the test should be uniformly distributed on the interval [0, 1] (i.e. we should get p-value < 0.01 about 1% of the time, < 0.05 about 5% of the time, etc.). We can test this with an experiment that repeatedly samples from two Poisson distributions with $\lambda_1 = \lambda_2 + d$, where $d$ is the hypothesized difference. Here, I do that and plot the empirical CDF of the observed p-values, which we expect to be a straight line with slope 1 if the p-values follow a uniform distribution.

[plot: empirical CDF of the observed p-values, with and without mask_out_invalid, against the uniform reference line]

For instance, with mask_out_invalid, 1130 of the 10000 p-values were <0.05 when the null hypothesis was true, whereas without the mask_out_invalid code, 476 of the 10000 p-values were <0.05 when the null hypothesis was true. So without the mask_out_invalid code, the test controls the Type I error rate, but with the code, the test does not. I'll go ahead and remove it.

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

N = 10000      # number of simulated tests

# sample sizes
n1 = 1000
n2 = 1000

# rates chosen so the null hypothesis holds: lam1 - lam2 equals the hypothesized diff
diff = 0.01
lam2 = 0.01
lam1 = lam2 + diff

dist1 = stats.poisson(n1*lam1)
dist2 = stats.poisson(n2*lam2)

# repeatedly draw counts and record the p-value of the test
pvals = []
for i in range(N):
    k1 = dist1.rvs()
    k2 = dist2.rvs()
    res = stats.poisson_means_test(k1, n1, k2, n2, diff=diff)
    pvals.append(res.pvalue)

# compare the empirical CDF of the p-values against the uniform reference line
pvals = np.sort(pvals)
ecdf = np.linspace(0, 1, N+1)
plt.plot(ecdf, ecdf)
plt.plot(pvals, ecdf[1:])
plt.axis('square')

@tupui @Micky774 thought you'd like this based on our conversation at SciPy : )

Contributor:

Suggested change
-# this is the 'pivot statistic' for use in the indicator of the summation
-# (left side of "I[.]"). Before dividing, mask zero-elements in the
-# denominator with infinity so that they are `false` in the indicator.
-if alternative == 'two-sided':
-    mask_out_invalid = np.abs(lmbd_x1 - lmbd_x2) > diff
-elif alternative == 'less':
-    mask_out_invalid = lmbds_diff < 0
-else:
-    mask_out_invalid = lmbds_diff > 0
-var_x1x2[~mask_out_invalid] = np.inf
+# this is the 'pivot statistic' for use in the indicator of the summation
+# (left side of "I[.]").

@Micky774 (Contributor) commented Aug 1, 2022

@tupui @Micky774 thought you'd like this based on our conversation at SciPy : )

Oh this is fantastic, thanks for pinging me on this :). I'm excited to show my students an "in the wild" use case of this concept.

@swallan (Contributor, Author) commented Aug 1, 2022

In the initial submission, there was this same calculation using np.divide with a where:

swallan@a3f3a14#diff-6ba7fcf5a871c6742dd082e5fc1801ef3ce2126456b052e9d0a126e70aab5f99L351-L354 (line 352)

with the description

    # exclude the same cases that will be excluded in the indicator in the
    # following `np.multiply` call to avoid invalid divisions. In these
    # for 'skipped' indices, the result is zero, as seen in the `out=`

So that's where it came from. As to whether it is necessary, I don't know yet. I'm looking at the paper and the implementation to be sure.

Contributor:

I see. OK, well, take a look and see if you can find it. If you're also not sure where it's coming from, go ahead and commit the suggestions. If you're happy with the other changes, I'll go ahead and merge!


# multiply all combinations of the products together, exclude terms
# based on the `indicator` and then sum. (3.5)
pvalue = np.sum((prob_x1 * prob_x2)[indicator])
@mdhaber (Contributor) commented Aug 1, 2022

Enhancement idea to study at some point: we might be able to do a little better here by moving the calculation to log-space, i.e.

Suggested change
-pvalue = np.sum((prob_x1 * prob_x2)[indicator])
+pvalue = np.exp(special.logsumexp((logprob_x1 + logprob_x2)[indicator]))

@mdhaber (Contributor) left a comment

With these changes, this LGTM.

Comment on lines 314 to 324
# this is the 'pivot statistic' for use in the indicator of the summation
# (left side of "I[.]"). Before dividing, mask zero-elements in the
# denominator with infinity so that they are `false` in the indicator.
if alternative == 'two-sided':
    mask_out_invalid = np.abs(lmbd_x1 - lmbd_x2) > diff
elif alternative == 'less':
    mask_out_invalid = lmbds_diff < 0
else:
    mask_out_invalid = lmbds_diff > 0

var_x1x2[~mask_out_invalid] = np.inf
Contributor:

Suggested change
-# this is the 'pivot statistic' for use in the indicator of the summation
-# (left side of "I[.]"). Before dividing, mask zero-elements in the
-# denominator with infinity so that they are `false` in the indicator.
-if alternative == 'two-sided':
-    mask_out_invalid = np.abs(lmbd_x1 - lmbd_x2) > diff
-elif alternative == 'less':
-    mask_out_invalid = lmbds_diff < 0
-else:
-    mask_out_invalid = lmbds_diff > 0
-var_x1x2[~mask_out_invalid] = np.inf
+# this is the 'pivot statistic' for use in the indicator of the summation
+# (left side of "I[.]").

# `[indicator]` implements the "I[.] ... the indicator function" per
# the paragraph following equation (3.5).
if alternative == 'two-sided':
    indicator = np.abs(t_x1x2) >= np.abs(t_k1k2)
@mdhaber (Contributor) commented Aug 1, 2022

I noticed when I checked out the statsmodels code that there is a tolerance in this comparison, e.g.

Suggested change
-indicator = np.abs(t_x1x2) >= np.abs(t_k1k2)
+indicator = np.abs(t_x1x2) >= np.abs(t_k1k2) - eps

because numerical error could make the comparison False even though it should be True. We've put that sort of thing in a few places (e.g. permutation test). Without it, the p-values can be a little too low; with it, the worst it will do is make the test a little conservative.
We can leave this as a follow-up, since it's unclear what the tolerance should be.
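
As a toy illustration of the failure mode (made-up numbers, not data from this PR): two quantities that are mathematically equal can differ by one ulp after floating-point arithmetic, so a strict comparison can silently drop a term that belongs in the sum, while subtracting a small tolerance keeps it.

t_k1k2 = 0.8
t_x1x2 = 0.1 + 0.7        # 0.7999999999999999 in binary floating point

print(t_x1x2 >= t_k1k2)          # False, though mathematically equal
eps = 1e-12
print(t_x1x2 >= t_k1k2 - eps)    # True; at worst the test becomes slightly conservative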

@tupui (Member) left a comment

I went through the recent changes and my open discussions again and LGTM yes.

@mdhaber mdhaber merged commit 06f7854 into scipy:main Aug 2, 2022
Labels: enhancement (A new feature or improvement), scipy.stats