-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Added Fisher's method and Stouffer's Z-score method #3109
Conversation
|
||
Parameters | ||
---------- | ||
x : array_like of p-values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should probably say p not x
Coverage remained the same when pulling 3454d735e9c9134a676a33d02d7362870cade4c5 on cancan101:fisher_test into c4314b0 on scipy:master. |
Okay. Issues should be resolved. |
bump? |
I realize it's been way too long, but nonetheless --- how about a bit more descriptive names? @josef-pkt |
I'll look at it today. |
These look like multiple comparison or multiple test correction methods http://en.wikipedia.org/wiki/Multiple_comparisons maybe these could be somehow categorized or named in a way that would reflect this? Right now they are in the docs section called "Inferential Stats" but maybe a new docs section should be added for these tests? Also bonferroni multiple test correction could be added to that hypothetical new section. |
@argriffing no, it's different Here we are combining several tests into a joint hypothesis, so we end up with one test. Wikipedia mentions some extension of Fisher's method to the correlated case. If we want to be open to these kind of enhancement, then we should choose more generic names
based on a quick look at the Wikipedia page and at https://en.wikipedia.org/wiki/Extensions_of_Fisher%27s_method the second name
|
return (Xsq, pval) | ||
|
||
|
||
def stouffers_method(p, w=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would spell out argument names for more informative signatures
pvalues, weights
Wittlock looks like a good reference http://onlinelibrary.wiley.com/doi/10.1111/j.1420-9101.2005.00917.x/abstract combining p_values cannot use the information that we might have unbalanced tests (tests with unequal sample sizes and noise variance, power) Brown mentions one-sided http://www.jstor.org/stable/2529826 (I didn't look at the paper) http://compute1.lsrc.duke.edu/softwares/MetaP/metap.php has two versions each (with/without trend, whatever that is) |
Looks good to me, overall cosmetic changes, names suggested change of function names for forward compatibility extensions look interesting, but not needed in this PR |
@josef-pkt So for the time being, what do you think about:
|
Yes that sounds good. I was misreading before the arguments for stouffer. I didn't look carefully enough and thought it takes z_values as argument instead of p_values. you still need the weights argument, that only applies for 'stouffer', and is ignored if 'fisher' |
at line 4194, shouldn't: |
@jancr Definitely. Good call. |
bump Would love to see this PR go through, and I'm happy to help out. |
@thenovices that would be useful. There's only a few minor issues to address it looks like. Would be good to hear from @cancan101 if he can finalize this PR, otherwise maybe you could do it. Rebasing the PR on latests master and addressing all comments from @josef-pkt and the one above on cdf/sf from @jancr should be enough to get this in a mergeable state. |
Does someone else want to take this PR over? I have been pretty swamped with some other stuff. |
Sure, I'm more than happy to take over. I'm traveling right now so I won't get to it for a few days. What's the best way to take over the PR? |
@thenovices I set it so that you can push to my repos (ie cancan101/scipy) so you should be able to push to this branch which will update the PR. |
Great, thanks for that. |
|
||
Notes | ||
----- | ||
Fisher's method (also known as Fisher's combined probability test) [1] uses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure about the references: I think ReST syntax needs a trailing underscore, like so [1]_. Best try building the docs locally..
The CI error is caused by Travis's impatience with tests running over 50 minutes. |
raise ValueError("pvalues and weights must be of the same size.") | ||
|
||
Zi = distributions.norm.isf(pvalues) | ||
Z = np.sum(weights * Zi) / ((np.sum(weights**2))**0.5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
structural optimization
in this case I would put the Z
calculation inside the if weights is None
and else to avoid handling weights when there are None.
(It's not a complicated piece of code where we loose the overview if we have two code paths.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@josef-pkt I'm not sure exactly what change you had in mind, but if the weights
arg is None
, and therefore treated as discrete uniform, then Z = np.sum(weights * Zi) / ((np.sum(weights**2))**0.5)
is not the same as np.mean(Zi)
. It would be something like np.sum(Zi) / np.sqrt(len(pvalues))
which is starting to be complicated to the point of 'losing overview' with two code paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you already have len(pvalues)
4 times, you could give it an name like k
or n
z = (weights * zi).sum() / np.sqrt(np.sum(weights**2))
else:
z = zi.sum() / np.sqrt(k)
looks readable to me
asarray
is missing for this
needs raise
if not 1-D, because numbers would be wrong for 2d, or maybe it would raise at a uninformative point
could make it vectorized along axis=0, I guess
Thanks for all your comments and suggestions, everyone. I've incorporated them in my latest commits here. |
bump |
5fbb1a9
to
f5ce988
Compare
Bump -- just rebased. |
@josef-pkt or someone else -- could we get this merged in soon so this PR gets wrapped up? |
looks good to me (assuming the reference numbers are correct) the doc string formatting for enumerating 'fisher' and 'stouffer' in a list might not be correct |
@ev-br @argriffing @rgommers bump Also, any idea on the proper docstring formatting for a list of items? |
This is the proper docstring format for a list. https://github.com/statsmodels/statsmodels/blob/master/statsmodels/graphics/gofplots.py#L204 |
Thank you @jseabold. Fixed in the latest commit. |
Could you squash your commits please? I think it could/should be just two: one for each of you two co-authors of this feature. If you don't feel confident with |
Refactored Fisher's method and Stouffer's method into a single method called stats.combine_pvalues. Added documentation for this method, and added tests for weighted Stouffer's method. Minor naming and style fixes. Update THANKS.txt Update autosummary Use sf instead of 1 - cdf Fix reST reference syntax Coerce to ndarray, use more numpy functions Add ref for test examples; test using python lists Remove leftover text from git conflict Add proper docstring format for list items
Sure thing, squashed! |
ENH Added Fisher's method and Stouffer's Z-score method
OK, merged. Thanks @thenovices, @cancan101, all. Sorry it tool so long. |
Thanks!! |
Closes #3092