ENH: Addition of Poisson Means Test (E-test). #14840
Conversation
- add example details and comparison with t-test
- change function name to be more expressive
- make argument names similar to the t-test
- return a namedtuple similar to other tests in scipy
- clarify how the hypotheses appear in the test
- change how the statistic value is returned

tests:
- check for type and value errors
- check against examples from the publishing paper
Co-authored-by: Matt Haberland <mdhaber@mit.edu>
Merged swallan/poisson-e-test into mdmahendri/master
Thanks @swallan. Very clear PR with nice comments in the code! I have a few suggestions.
- make args type float
To sum up what is going on with this PR:

Thanks @swallan. LMK when you've summarized the agreement between this implementation and the Fortran code and are ready for me to take a look at this PR again.
- also maintenance on names in error tests
Great. Are you happy with this then? Shall I review more carefully?

Yes please!
@swallan would you send an email to the mailing list asking whether we should add this test even though there's already a version in statsmodels? @chrisb83 @josef-pkt what do you think? @josef-pkt I ask you from the statsmodels perspective, and @chrisb83 I ask you because you were involved in the original PR #10561.

Update from statsmodels: statsmodels currently has only the test for the ratio of Poisson rates, not for the difference of two Poisson rates. I spent the last month getting almost everything for one- and two-sample Poisson, which now includes the etest for ratio and diff, with either score or Wald test statistic.

I think the test would fit well into scipy.stats. I personally do not mind the duplication.

Thanks @josef-pkt @chrisb83.
- capitalize Poisson
- move LaTeX render to notes
- clarify that diff has a default, not optional
- write out alternatives in English
- clarify dodder (weed) seed
- `*` to protect keyword-only args after positional args
Email sent 5/3. |
scipy/stats/_hypotests.py (outdated)

```python
# this is the 'pivot statistic' for use in the indicator of the summation
# (left side of "I[.]"). Before dividing, mask zero-elements in the
# denominator with infinity so that they are `false` in the indicator.
if alternative == 'two-sided':
    mask_out_invalid = np.abs(lmbd_x1 - lmbd_x2) > diff
elif alternative == 'less':
    mask_out_invalid = lmbds_diff < 0
else:
    mask_out_invalid = lmbds_diff > 0

var_x1x2[~mask_out_invalid] = np.inf
```
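As a toy sketch of the masking idea in the quoted block (the variable names below are illustrative, not the actual scipy internals): positions where the denominator would be zero are set to infinity, so the statistic there divides to 0.0 and a later indicator comparison evaluates to `False`.

```python
import numpy as np

# Toy variance/numerator arrays; zero-variance positions would otherwise
# cause invalid divisions.
var = np.array([0.0, 4.0, 0.0, 9.0])
num = np.array([1.0, 2.0, 3.0, 4.0])

# Mask zero denominators with infinity.
masked_var = var.copy()
masked_var[masked_var == 0] = np.inf

t = num / np.sqrt(masked_var)  # zero-variance entries become 0.0
indicator = np.abs(t) >= 1.0   # ...so they evaluate to False here
print(indicator)
```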
@swallan I am working on getting this merged, but I need a little bit of your time to finish it. I think I only have one question - where did this come from? I don't see this in the paper, and without it, we get the symmetry property I was expecting. I see that removing it makes one of the `test_fortran_authors` tests fail, but my guess is that the test (perhaps the authors' code) is wrong about this. Also, it looks like statsmodels really has this test now (here), and I'm not seeing this sort of condition there.
I'm pretty confident that this `mask_out_invalid` stuff was not intended.

If we perform the test under the null hypothesis, the p-value of the test should be uniformly distributed on the interval [0, 1] (i.e. we should get p-value < 0.01 about 1% of the time, < 0.05 about 5% of the time, etc.). We can test this with an experiment that repeatedly samples from two Poisson distributions whose rates differ by exactly `diff`, so that the null hypothesis is true.

For instance, with the `mask_out_invalid` code, 1130 of the 10000 p-values were < 0.05 when the null hypothesis was true, whereas without the `mask_out_invalid` code, 476 of the 10000 p-values were < 0.05 when the null hypothesis was true. So without the `mask_out_invalid` code, the test controls the Type I error rate, but with the code, it does not. I'll go ahead and remove it.
```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

N = 10000
n1 = 1000
n2 = 1000
diff = 0.01
lam2 = 0.01
lam1 = lam2 + diff
dist1 = stats.poisson(n1*lam1)
dist2 = stats.poisson(n2*lam2)

pvals = []
for i in range(N):
    k1 = dist1.rvs()
    k2 = dist2.rvs()
    res = stats.poisson_means_test(k1, n1, k2, n2, diff=diff)
    pvals.append(res.pvalue)

# compare the empirical CDF of the p-values against the uniform CDF
pvals = np.sort(pvals)
ecdf = np.linspace(0, 1, N+1)
plt.plot(ecdf, ecdf)
plt.plot(pvals, ecdf[1:])
plt.axis('square')
```
@tupui @Micky774 thought you'd like this based on our conversation at SciPy : )
Suggested change:

```diff
-# this is the 'pivot statistic' for use in the indicator of the summation
-# (left side of "I[.]"). Before dividing, mask zero-elements in the
-# denominator with infinity so that they are `false` in the indicator.
-if alternative == 'two-sided':
-    mask_out_invalid = np.abs(lmbd_x1 - lmbd_x2) > diff
-elif alternative == 'less':
-    mask_out_invalid = lmbds_diff < 0
-else:
-    mask_out_invalid = lmbds_diff > 0
-var_x1x2[~mask_out_invalid] = np.inf
+# this is the 'pivot statistic' for use in the indicator of the summation
+# (left side of "I[.]").
```
In the initial submission, there was this same calculation using `np.divide` with a `where`:

swallan@a3f3a14#diff-6ba7fcf5a871c6742dd082e5fc1801ef3ce2126456b052e9d0a126e70aab5f99L351-L354 (line 352)

with the description

```python
# exclude the same cases that will be excluded in the indicator in the
# following `np.multiply` call to avoid invalid divisions. For these
# 'skipped' indices, the result is zero, as seen in the `out=`
```

So that's where it came from. As to whether it is necessary, I don't know yet. I'm looking at the paper and the implementation to be sure.
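A minimal sketch of the `np.divide(..., where=..., out=...)` pattern described above (toy arrays, not the actual scipy code): divisions are skipped wherever the denominator is zero, and those positions keep the pre-filled `out` value of 0.0.

```python
import numpy as np

num = np.array([1.0, 2.0, 3.0])
den = np.array([2.0, 0.0, 4.0])

# Divide only where the denominator is nonzero; skipped positions keep
# the 0.0 pre-filled into `out`.
ratio = np.divide(num, den, out=np.zeros_like(num), where=(den != 0))
print(ratio)  # the zero-denominator position stays 0.0
```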
I see. OK, well take a look to see if you can find it. If you're also not sure where it's coming from, go ahead and commit the suggestions. If you're happy with the other changes, I'll go ahead and merge!
```python
# multiply all combinations of the products together, exclude terms
# based on the `indicator` and then sum. (3.5)
pvalue = np.sum((prob_x1 * prob_x2)[indicator])
```
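For context, the term being summed is an outer product of two Poisson PMFs restricted by the indicator. An illustrative sketch (not the scipy internals; the rates and grid size are arbitrary) with a trivial all-True indicator, where the sum approaches the total probability of 1:

```python
import numpy as np
from scipy import stats

# The p-value in (3.5) sums P(X1 = x1) * P(X2 = x2) over the (x1, x2)
# pairs selected by the indicator.
x = np.arange(40)
prob_x1 = stats.poisson.pmf(x, 3.0)[:, np.newaxis]  # column vector
prob_x2 = stats.poisson.pmf(x, 2.0)[np.newaxis, :]  # row vector

indicator = np.ones((x.size, x.size), dtype=bool)   # toy: keep all terms
pvalue = np.sum((prob_x1 * prob_x2)[indicator])
print(round(pvalue, 6))  # ~1.0 since (almost) all probability is on the grid
```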
Enhancement idea to study at some point: we might be able to do a little better here by moving the calculation to the log-space, i.e.
```diff
-pvalue = np.sum((prob_x1 * prob_x2)[indicator])
+pvalue = np.exp(special.logsumexp((logprob_x1 + logprob_x2)[indicator]))
```
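A sketch of this log-space idea (toy rates and grid, not the scipy internals): summing log-probabilities with `logsumexp` avoids underflow when individual PMF terms are tiny, and for moderate inputs it matches the linear-space sum.

```python
import numpy as np
from scipy import special, stats

x = np.arange(25)
logp1 = stats.poisson.logpmf(x, 3.0)[:, np.newaxis]
logp2 = stats.poisson.logpmf(x, 2.0)[np.newaxis, :]
indicator = np.ones((x.size, x.size), dtype=bool)  # toy indicator

# log-space: exp(logsumexp(log p1 + log p2)) == sum(p1 * p2)
pvalue_log = np.exp(special.logsumexp((logp1 + logp2)[indicator]))

# linear-space computation for comparison
p1 = stats.poisson.pmf(x, 3.0)[:, np.newaxis]
p2 = stats.poisson.pmf(x, 2.0)[np.newaxis, :]
pvalue_lin = np.sum((p1 * p2)[indicator])
print(np.isclose(pvalue_log, pvalue_lin))
```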
With these changes, this LGTM.
scipy/stats/_hypotests.py (outdated)

```python
# this is the 'pivot statistic' for use in the indicator of the summation
# (left side of "I[.]"). Before dividing, mask zero-elements in the
# denominator with infinity so that they are `false` in the indicator.
if alternative == 'two-sided':
    mask_out_invalid = np.abs(lmbd_x1 - lmbd_x2) > diff
elif alternative == 'less':
    mask_out_invalid = lmbds_diff < 0
else:
    mask_out_invalid = lmbds_diff > 0

var_x1x2[~mask_out_invalid] = np.inf
```
Suggested change:

```diff
-# this is the 'pivot statistic' for use in the indicator of the summation
-# (left side of "I[.]"). Before dividing, mask zero-elements in the
-# denominator with infinity so that they are `false` in the indicator.
-if alternative == 'two-sided':
-    mask_out_invalid = np.abs(lmbd_x1 - lmbd_x2) > diff
-elif alternative == 'less':
-    mask_out_invalid = lmbds_diff < 0
-else:
-    mask_out_invalid = lmbds_diff > 0
-var_x1x2[~mask_out_invalid] = np.inf
+# this is the 'pivot statistic' for use in the indicator of the summation
+# (left side of "I[.]").
```
```python
# `[indicator]` implements the "I[.] ... the indicator function" per
# the paragraph following equation (3.5).
if alternative == 'two-sided':
    indicator = np.abs(t_x1x2) >= np.abs(t_k1k2)
```
I noticed when I checked out the statsmodels code that there is a tolerance in this comparison, e.g.

```diff
-indicator = np.abs(t_x1x2) >= np.abs(t_k1k2)
+indicator = np.abs(t_x1x2) >= np.abs(t_k1k2) - eps
```
because numerical error could make the comparison False even though it should be True. We've put that sort of thing in a few places (e.g. permutation test). Without it, the p-values can be a little too low; with it, the worst it will do is make the test a little conservative.
We can leave this as a follow-up, since it's unclear what the tolerance should be.
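A toy demonstration of the motivation for the tolerance (the values and `eps` below are arbitrary illustrations): two quantities that are equal in exact arithmetic can differ by floating-point rounding, so a strict `>=` can wrongly drop a term that the `- eps` version keeps.

```python
import numpy as np

t_obs = 0.1 + 0.2           # 0.30000000000000004 in binary floating point
t_grid = np.array([0.3])    # mathematically equal to t_obs

strict = np.abs(t_grid) >= np.abs(t_obs)           # False: rounding error
eps = 1e-12
with_tol = np.abs(t_grid) >= np.abs(t_obs) - eps   # True: tolerance absorbs it
print(strict[0], with_tol[0])
```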
I went through the recent changes and my open discussions again and LGTM yes.
This PR builds on gh-10561 and implements several remaining comments.
There have been numerous requests on the broader internet for an easily accessible and fast implementation of the "E-test", or the Poisson means test. The publishing paper is somewhat popular, but there are few, if any, implementations other than the original authors' Fortran. That Fortran has some noted faults, which this implementation mends.
The name `poisson_means_test` is somewhat vague, but the idea is that while the current implementation is for the E-test, a `method` parameter could be added that allows the user to specify whether they want the E-test or the C-test, with the E-test as the default so it maintains backwards compatibility. It's a bit more complex to review since this builds off of a relatively stale PR, but I've done my best to address the common concerns in that PR. I have not removed the original contributor's commits since I don't want to abstract the credit away, but for merging I think this will be a squash into one merge.
#10561 (comment)
This has been done with indexing.
I agree that this would be ideal, but the likelihood that any one of the PMFs is not used seems somewhat low. If the two arrays that result from a `pmf` calculation are broadcast together, the indicator does not usually take out entire rows or columns (from my brief testing, not proof based), so I don't think it's worth trying to check whether certain rows or columns don't need to be calculated.
I agree. Instead of switching the arguments I limited it to swapping out the indicator.
Being able to compare to the paper is important, so I think this is a good choice.
I think it is reasonable that these can be floats. #10561 (comment) They are a "count" in some sense, but as long as the unit of measure is maintained across both "counts" then it is fine. However, there seem to be some interesting results when you continue to make the "counts" smaller.

For some reason the p-value is different when the sample size is 0.0125. Kind of perplexing; it could be safer just to leave it as an int. If the argument is valid that the unit doesn't matter, the user should convert their data to an integer in order for the test to work. The author uses sample "size", which is ambiguous, but I think to be conservative we could leave it as an integer input in case floats are indeed not allowed. I definitely think that `k1` and `k2` (the observed values) are intended to be integers because they are described as counts.

Additional information
I expect there to be one test failure relating to a division error.
It was previously suggested that
I suspect the reason that the authors indicated that you do not need to calculate the p-value in this scenario is that it leads to ZeroDivisionErrors down the road. Often (but not always) the variance can be 0, which is a common denominator in divisions. I'm still thinking about the best way to put this into action.
Also -- I see the lint errors. My pep8 checker locally just doesn't work I guess. Will mend on next push.
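For readers landing on this thread later, a minimal usage sketch of the merged function (available as `scipy.stats.poisson_means_test` since SciPy 1.10; the counts and sample sizes below are arbitrary illustrations):

```python
from scipy import stats

# k1 events observed over sample size n1 vs. k2 events over n2;
# tests H0: lambda1 - lambda2 = diff (diff defaults to 0).
res = stats.poisson_means_test(10, 20, 15, 20)
print(0.0 <= res.pvalue <= 1.0)
```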