ENH Added Fisher's method and Stouffer's Z-score method #3109

cancan101 · 2013-12-03T06:23:28Z

argriffing · 2013-12-03T06:42:15Z

scipy/stats/stats.py

+
+    Parameters
+    ----------
+    x : array_like of p-values


this should probably say p not x

coveralls · 2013-12-03T06:56:56Z

Coverage remained the same when pulling 3454d735e9c9134a676a33d02d7362870cade4c5 on cancan101:fisher_test into c4314b0 on scipy:master.

coveralls · 2013-12-04T03:27:47Z

Coverage remained the same when pulling a98cb69 on cancan101:fisher_test into 10d88c4 on scipy:master.

cancan101 · 2013-12-04T03:43:52Z

Okay. Issues should be resolved.

cancan101 · 2014-01-09T03:46:56Z

bump?

ev-br · 2014-01-09T11:02:46Z

I realize it's been way too long, but nonetheless --- how about a bit more descriptive names? @josef-pkt

josef-pkt · 2014-01-09T13:31:23Z

I'll look at it today.
Since I never heard of either, I need to read up a bit.

argriffing · 2014-01-09T14:07:05Z

These look like multiple comparison or multiple test correction methods (http://en.wikipedia.org/wiki/Multiple_comparisons). Maybe they could be categorized or named in a way that reflects this? Right now they are in the docs section called "Inferential Stats", but maybe a new docs section should be added for these tests? Bonferroni multiple test correction could also be added to that hypothetical new section.

josef-pkt · 2014-01-09T14:19:27Z

@argriffing no, it's different.
Multiple comparison / multiple testing (available in statsmodels) adjusts the p-values in parallel for many tests.

Here we are combining several tests into a joint hypothesis, so we end up with one test.

Wikipedia mentions some extensions of Fisher's method to the correlated case. If we want to be open to this kind of enhancement, then we should choose more generic names:

combine_pvalues(pvalues, method='fisher')
where method can be fisher, brown, or kost.

based on a quick look at the Wikipedia page and at https://en.wikipedia.org/wiki/Extensions_of_Fisher%27s_method

For the second name:

combine_ztests? combine_zscore doesn't refer to hypothesis testing.

josef-pkt · 2014-01-09T14:41:21Z

scipy/stats/stats.py

+    return (Xsq, pval)
+
+
+def stouffers_method(p, w=None):


I would spell out argument names for more informative signatures

pvalues, weights

josef-pkt · 2014-01-09T15:00:38Z

Whitlock looks like a good reference http://onlinelibrary.wiley.com/doi/10.1111/j.1420-9101.2005.00917.x/abstract
and has a discussion of why weighted Stouffer's can have more power than combining p_values

combining p_values cannot use the information that we might have unbalanced tests (tests with unequal sample sizes and noise variance, power)

Brown mentions one-sided http://www.jstor.org/stable/2529826 (I didn't look at the paper)

http://compute1.lsrc.duke.edu/softwares/MetaP/metap.php has two versions each (with/without trend, whatever that is)

josef-pkt · 2014-01-09T15:02:58Z

Looks good to me, overall

cosmetic changes, names

suggested change of function names for forward compatibility

extensions look interesting, but not needed in this PR

cancan101 · 2014-01-09T22:49:33Z

@josef-pkt So for the time being, what do you think about:

combine_pvalues(pvalues, method='fisher')
where method can be fisher or stouffer

josef-pkt · 2014-01-09T23:30:40Z

Yes that sounds good.

Earlier I misread the arguments for stouffer: I didn't look carefully enough and thought it takes z_values as its argument instead of p_values.

You still need the weights argument, which only applies for 'stouffer' and is ignored for 'fisher'.
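
For readers following along, here is a minimal, dependency-free sketch of the signature discussed above. It is not the code from this PR: `combine_pvalues_sketch` and `chi2_sf_even` are hypothetical names, the standard library stands in for the `scipy.stats` distributions, and the inputs are assumed to be independent p-values strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-squared variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i<k} (x/2)**i / i!.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def combine_pvalues_sketch(pvalues, method="fisher", weights=None):
    """Combine independent p-values into one (statistic, pvalue) pair."""
    pvalues = list(pvalues)
    k = len(pvalues)
    if method == "fisher":
        # weights are accepted but ignored for Fisher's method
        statistic = -2.0 * sum(math.log(p) for p in pvalues)
        return statistic, chi2_sf_even(statistic, 2 * k)
    if method == "stouffer":
        if weights is None:
            weights = [1.0] * k
        elif len(weights) != k:
            raise ValueError("pvalues and weights must be of the same size.")
        std_normal = NormalDist()
        # inverse survival function: isf(p) == -inv_cdf(p) by symmetry
        zi = [-std_normal.inv_cdf(p) for p in pvalues]
        numerator = sum(w * z for w, z in zip(weights, zi))
        statistic = numerator / math.sqrt(sum(w * w for w in weights))
        # one-sided p-value: upper tail of the standard normal
        return statistic, std_normal.cdf(-statistic)
    raise ValueError("method must be 'fisher' or 'stouffer'")


fisher_stat, fisher_p = combine_pvalues_sketch([0.01, 0.2, 0.3])
stouffer_stat, stouffer_p = combine_pvalues_sketch(
    [0.01, 0.2, 0.3], method="stouffer", weights=[1.0, 1.0, 1.0]
)
print(fisher_stat, fisher_p)
print(stouffer_stat, stouffer_p)
```

Either method returns a single statistic and a single p-value for the joint hypothesis, which is the difference from multiple-testing corrections mentioned earlier in the thread.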

jancr · 2014-03-13T16:07:50Z

at line 4194, shouldn't:
pval = 1 - distributions.chi2.cdf(Xsq, 2 * len(p))
be:
pval = distributions.chi2.sf(Xsq, 2 * len(p))
to prevent float underflow issues
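
A small illustration of why this matters, written as a sketch rather than the PR's code: for 4 degrees of freedom the chi-squared survival function has the closed form exp(-x/2) * (1 + x/2), which stands in here for `distributions.chi2.sf`. The helper name `chi2_sf_df4` is hypothetical. Strictly speaking the effect is loss of precision near 1.0 rather than underflow, but the consequence is the same: the tail probability collapses to zero.

```python
import math


def chi2_sf_df4(x):
    """Survival function of a chi-squared variable with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


x = 100.0
sf_direct = chi2_sf_df4(x)   # tiny but representable tail probability
cdf = 1.0 - sf_direct        # the nearest float to the true cdf is exactly 1.0
sf_via_cdf = 1.0 - cdf       # the tail information is gone

print(sf_direct)    # a very small positive number
print(sf_via_cdf)   # 0.0
```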

cancan101 · 2014-03-25T03:04:27Z

@jancr Definitely. Good call.

branliu0 · 2014-08-06T21:49:52Z

bump

Would love to see this PR go through, and I'm happy to help out.

rgommers · 2014-08-12T19:54:27Z

@thenovices that would be useful. There are only a few minor issues to address, it looks like. It would be good to hear from @cancan101 whether he can finalize this PR; otherwise maybe you could do it. Rebasing the PR on the latest master and addressing all comments from @josef-pkt, plus the one above on cdf/sf from @jancr, should be enough to get this into a mergeable state.

cancan101 · 2014-08-12T19:57:56Z

Does someone else want to take this PR over? I have been pretty swamped with some other stuff.

branliu0 · 2014-08-14T00:59:47Z

Sure, I'm more than happy to take over. I'm traveling right now so I won't get to it for a few days. What's the best way to take over the PR?

cancan101 · 2014-08-14T01:06:14Z

@thenovices I set it so that you can push to my repo (i.e. cancan101/scipy), so you should be able to push to this branch, which will update the PR.

branliu0 · 2014-08-14T01:08:30Z

Great, thanks for that.

coveralls · 2014-08-22T16:35:28Z

Coverage increased (+0.0%) when pulling b4ba453 on cancan101:fisher_test into e942cbe on scipy:master.

ev-br · 2014-08-22T17:09:32Z

scipy/stats/stats.py

+
+    Notes
+    -----
+    Fisher's method (also known as Fisher's combined probability test) [1] uses


I'm not entirely sure about the references: I think reST syntax needs a trailing underscore, like so: [1]_. Best to try building the docs locally.
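
As an illustration of that citation syntax, here is a hypothetical numpydoc-style docstring. The function `fisher_statistic` and its wording are invented for this sketch and are not the docstring from the PR; only the `[1]_` citation and the matching `.. [1]` target are the point.

```python
import math


def fisher_statistic(pvalues):
    """Return Fisher's combined test statistic.

    Notes
    -----
    Fisher's method (also known as Fisher's combined probability test) [1]_
    sums the logarithms of the p-values and scales the result by -2.

    References
    ----------
    .. [1] https://en.wikipedia.org/wiki/Fisher%27s_method
    """
    return -2.0 * sum(math.log(p) for p in pvalues)
```

The trailing underscore in `[1]_` turns the bracketed number into a link to the `.. [1]` entry under References when the docs are built.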

argriffing · 2014-08-22T19:12:52Z

The CI error is caused by Travis's impatience with tests running over 50 minutes.

coveralls · 2014-08-23T10:28:44Z

Coverage increased (+0.0%) when pulling b4ba453 on cancan101:fisher_test into e942cbe on scipy:master.

josef-pkt · 2014-08-23T15:16:29Z

scipy/stats/stats.py

+            raise ValueError("pvalues and weights must be of the same size.")
+
+        Zi = distributions.norm.isf(pvalues)
+        Z = np.sum(weights * Zi) / ((np.sum(weights**2))**0.5)


structural optimization
In this case I would put the Z calculation inside the if weights is None and else branches, to avoid handling weights when they are None.
(It's not a complicated piece of code where we lose the overview if we have two code paths.)

argriffing · 2014-08-25

@josef-pkt I'm not sure exactly what change you had in mind, but if the weights arg is None, and therefore treated as discrete uniform, then Z = np.sum(weights * Zi) / ((np.sum(weights**2))**0.5) is not the same as np.mean(Zi). It would be something like np.sum(Zi) / np.sqrt(len(pvalues)), which is starting to be complicated to the point of 'losing overview' with two code paths.

josef-pkt · 2014-08-25

You already have len(pvalues) 4 times; you could give it a name like k or n.

    z = (weights * zi).sum() / np.sqrt(np.sum(weights**2))
else:
    z = zi.sum() / np.sqrt(k)

looks readable to me

asarray is missing for this

It needs to raise if the input is not 1-D, because the numbers would be wrong for 2-D, or it might raise at an uninformative point.
It could be made vectorized along axis=0, I guess.
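
The two code paths discussed in this review thread can be sketched as follows. This is an assumption-laden illustration, not the merged code: `stouffer_z` is a hypothetical name, plain lists and `statistics.NormalDist` replace `np.asarray` and `distributions.norm`, and the 1-D check is left out because lists of floats are assumed.

```python
import math
from statistics import NormalDist


def stouffer_z(pvalues, weights=None):
    """Stouffer's Z statistic with separate weighted and unweighted paths."""
    pvalues = list(pvalues)
    k = len(pvalues)
    std_normal = NormalDist()
    # inverse survival function: isf(p) == -inv_cdf(p) by symmetry
    zi = [-std_normal.inv_cdf(p) for p in pvalues]
    if weights is not None:
        weights = list(weights)
        if len(weights) != k:
            raise ValueError("pvalues and weights must be of the same size.")
        weighted_sum = sum(w * z for w, z in zip(weights, zi))
        return weighted_sum / math.sqrt(sum(w * w for w in weights))
    # unweighted path: no weights array is ever created
    return sum(zi) / math.sqrt(k)


p = [0.01, 0.2, 0.3]
print(stouffer_z(p))
print(stouffer_z(p, weights=[1.0, 1.0, 1.0]))
```

With unit weights the weighted formula reduces to sum(zi) / sqrt(k), so both calls give the same value up to rounding; multiplying all weights by a positive constant leaves the statistic unchanged.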

branliu0 · 2014-08-31T15:00:37Z

Thanks for all your comments and suggestions, everyone. I've incorporated them in my latest commits here.

coveralls · 2014-08-31T22:11:15Z

Coverage increased (+0.01%) when pulling 5fbb1a9 on cancan101:fisher_test into db20a63 on scipy:master.

coveralls · 2014-08-31T22:24:32Z

Coverage increased (+0.01%) when pulling 5fbb1a9 on cancan101:fisher_test into db20a63 on scipy:master.

branliu0 · 2014-09-05T16:31:01Z

bump

branliu0 · 2014-09-21T19:06:24Z

Bump -- just rebased.

coveralls · 2014-09-21T19:50:18Z

Coverage increased (+0.0%) when pulling f5ce988 on cancan101:fisher_test into c532eb8 on scipy:master.

branliu0 · 2014-09-25T12:59:56Z

@josef-pkt or someone else -- could we get this merged in soon so this PR gets wrapped up?

josef-pkt · 2014-09-26T13:10:32Z

looks good to me (assuming the reference numbers are correct)

the doc string formatting for enumerating 'fisher' and 'stouffer' in a list might not be correct
@ev-br @argriffing @rgommers merge ?

branliu0 · 2014-10-16T19:30:36Z

@ev-br @argriffing @rgommers bump

Also, any idea on the proper docstring formatting for a list of items?

jseabold · 2014-10-16T19:40:29Z

This is the proper docstring format for a list.

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/graphics/gofplots.py#L204

branliu0 · 2014-10-16T19:48:48Z

Thank you @jseabold. Fixed in the latest commit.

ev-br · 2014-10-16T21:16:42Z

Could you squash your commits please? I think it could/should be just two: one for each of you two co-authors of this feature. If you don't feel confident with git rebase -i, we'll do it.

…y#3092

Refactored Fisher's method and Stouffer's method into a single method called stats.combine_pvalues. Added documentation for this method, and added tests for weighted Stouffer's method. Minor naming and style fixes.
Update THANKS.txt
Update autosummary
Use sf instead of 1 - cdf
Fix reST reference syntax
Coerce to ndarray, use more numpy functions
Add ref for test examples; test using python lists
Remove leftover text from git conflict
Add proper docstring format for list items

branliu0 · 2014-10-16T21:31:39Z

Sure thing, squashed!

coveralls · 2014-10-16T22:22:23Z

Coverage increased (+0.0%) when pulling 25cb793 on cancan101:fisher_test into 6b36d00 on scipy:master.

ENH Added Fisher's method and Stouffer's Z-score method

ev-br · 2014-10-17T10:19:22Z

OK, merged. Thanks @thenovices, @cancan101, all. Sorry it took so long.

branliu0 · 2014-10-17T10:20:22Z

Thanks!!


josef-pkt reviewed Jan 9, 2014

josef-pkt mentioned this pull request Jan 9, 2014

Fisher's method for combining independent test results #3092

Closed

pv added the PR label Feb 19, 2014

pv removed the PR label Aug 13, 2014

ev-br reviewed Aug 22, 2014

josef-pkt reviewed Aug 23, 2014

branliu0 force-pushed the fisher_test branch from b4ba453 to e1324be Compare August 31, 2014 14:58

branliu0 force-pushed the fisher_test branch from 5fbb1a9 to f5ce988 Compare September 21, 2014 19:06

cancan101 and others added 2 commits October 16, 2014 22:30

ENH Addeded Fisher's method and Stouffer's Z-score method closes scip…

91bf644

…y#3092

branliu0 force-pushed the fisher_test branch from 4d342f7 to 25cb793 Compare October 16, 2014 21:31

ev-br added a commit that referenced this pull request Oct 17, 2014

Merge pull request #3109 from cancan101/fisher_test

5f93f7b

ENH Added Fisher's method and Stouffer's Z-score method

ev-br merged commit 5f93f7b into scipy:master Oct 17, 2014

ev-br added this to the 0.15.0 milestone Oct 17, 2014
