ENH: stats: Add the function odds_ratio. #13340
The new function `scipy.stats.contingency.odds_ratio` computes the odds ratio and p-value for a 2x2 contingency table. An option allows the user to select either the sample odds ratio or the conditional maximum likelihood estimate of the odds ratio. The returned object provides a method for computing the confidence interval of the odds ratio. Closes gh-11131.
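For readers following along, here is a minimal sketch of the sample odds ratio and a normal-approximation confidence interval for a 2x2 table. It is the textbook log-odds (Woolf) interval written in plain Python, not the code from this PR; the helper names `sample_odds_ratio` and `sample_odds_ratio_ci` are hypothetical, and the sketch assumes all four cells are positive.

```python
import math
from statistics import NormalDist


def sample_odds_ratio(table):
    """Sample odds ratio (a*d)/(b*c) of the 2x2 table [[a, b], [c, d]].

    Hypothetical helper; assumes b and c are nonzero.
    """
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def sample_odds_ratio_ci(table, confidence_level=0.95):
    """Two-sided interval from the normal approximation on log(odds ratio).

    Hypothetical helper; assumes every cell is strictly positive.
    """
    (a, b), (c, d) = table
    log_or = math.log(sample_odds_ratio(table))
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Standard normal quantile for the requested two-sided level.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


table = [[10, 20], [30, 40]]
print(sample_odds_ratio(table))
print(sample_odds_ratio_ci(table, confidence_level=0.95))
```

Because the interval is symmetric on the log scale, the product of its endpoints equals the square of the point estimate, which makes a convenient sanity check.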
This PR replaces gh-12288. This PR creates a new function in the There is code in the file |
Add odds_ratio and OddsRatioResult to the contingency module docstring. This should make the refguide-check happy.
Small comments, I hope to dive into `_odds_ratio.py` tomorrow.
@WarrenWeckesser you proposed the addition of the two noncentral hypergeometric distributions. I implemented them in gh-13330 as we discussed. Rather than asking others to review the private code, then re-review when you swap things out, would it be more efficient for you to review gh-13330 first?
@mdhaber, yes, that makes sense. I'll mark this as draft for now.
* Use the new nchypergeom_fisher distribution instead of the private implementation.
* Don't put `OddsRatioResult` into the public `scipy.stats` namespace.
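As background for why Fisher's noncentral hypergeometric distribution appears here: with the table margins held fixed, the conditional maximum likelihood estimate is the odds ratio at which the mean of that distribution equals the observed upper-left cell. The plain-Python sketch below illustrates the idea with bisection on the log odds ratio. It does not call SciPy and is not the PR's implementation; the function name `conditional_odds_ratio` and the handling of edge cases are assumptions made for illustration.

```python
import math


def conditional_odds_ratio(table, iterations=200):
    """Conditional MLE of the odds ratio for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper: solves E[X; psi] = a, where X follows Fisher's
    noncentral hypergeometric distribution given the table margins.
    """
    (a, b), (c, d) = table
    n1, n2, m1 = a + b, c + d, a + c
    x_min, x_max = max(0, m1 - n2), min(n1, m1)
    if x_min == x_max:
        return math.nan  # the margins determine the table; nothing to estimate
    if a == x_min:
        return 0.0
    if a == x_max:
        return math.inf

    support = range(x_min, x_max + 1)
    # log of C(n1, x) * C(n2, m1 - x), the psi-independent part of the pmf
    log_coef = [math.log(math.comb(n1, x)) + math.log(math.comb(n2, m1 - x))
                for x in support]

    def mean(log_psi):
        # Shift by the largest log-weight before exponentiating to avoid overflow.
        log_w = [lc + x * log_psi for lc, x in zip(log_coef, support)]
        top = max(log_w)
        w = [math.exp(v - top) for v in log_w]
        return sum(x * wi for x, wi in zip(support, w)) / sum(w)

    # The mean increases with psi, so expand a bracket and then bisect.
    lo, hi = -1.0, 1.0
    while mean(lo) > a:
        lo *= 2
    while mean(hi) < a:
        hi *= 2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean(mid) < a:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


print(conditional_odds_ratio([[3, 1], [1, 3]]))
```

For this table the sample odds ratio is 9, while the conditional estimate is smaller, which shows that the two kinds of estimate generally differ.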
I have updated this PR to use the new |
* Tweak whitespace in the fisher_exact docstring.
* Add a comment in the R script generate_fisher_exact_results_from_r.R that shows how to run the script.
So far I've reviewed documentation, tests, and sample odds ratio calculations (odds ratio, p value, and CI). Calculations look good; biggest comments are that sample odds ratio calculations and input validation need tests.
I see the reference for the conditional odds ratio CI. Is there one particular reference that you'd recommend I follow for the conditional odds ratio itself and for the fact that the `fisher_exact` p-value should be the same as this p-value?
Here are some notes about relevant references, for anyone looking to learn more about inference with the odds ratio.
This was really close. Shall we finish it? I think the remaining request was just documenting these alternatives and for more specific comments to be added to the code to point readers to the appropriate reference + equation.
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
I'm re-running CircleCI. Looked like a temporary glitch.
Re-ran CircleCI; same failure. Not seeing it in other PRs. But I can't see how the latest two commits would have caused this. Aside: something I thought about when we were discussing the alternatives - from a UI perspective, how should we think about the relationship between this function and Update 7/30/2022: the function now only provides a point estimate and confidence interval of the statistic rather than duplicating the p-value of
@@ -427,6 +427,7 @@
contingency.margins
contingency.relative_risk
contingency.association
contingency.odds_ratio
I wouldn't object to this being in `scipy.stats`, but here is fine, too. Ultimately, it was probably added here because of #13048 (review).
table : array_like of ints
    A 2x2 contingency table. Elements must be non-negative integers.
If this stays in `stats.contingency`, we could consider calling this `observed` to match `chi2_contingency`, `expected_freq`, and `association`.
The documentation explains how the table is laid out, so I don't see a need for the four separate arguments of `relative_risk`.
If this goes in `stats`, I would keep `table` to match `fisher_exact`, `barnard_exact`, and `boschloo_exact`.
On the other hand, I can see leaving it as `table` either way and using `_rename_parameter` to make the `stats.contingency` functions consistent with `stats`.
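To make the renaming idea concrete, here is a rough sketch of a decorator that accepts an old keyword name as an alias for a new one. It is a hypothetical stand-in written for illustration, not SciPy's private `_rename_parameter`, whose actual behavior (for example, deprecation warnings) is not shown in this thread; the names `rename_parameter` and `table_total` are invented.

```python
import functools


def rename_parameter(old_name, new_name):
    """Let callers pass `old_name` as a keyword alias for `new_name`.

    Hypothetical illustration only; not SciPy's private helper.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                if new_name in kwargs:
                    raise TypeError(
                        f"got both {old_name!r} and {new_name!r}; pass only one"
                    )
                # Forward the value under the new keyword name.
                kwargs[new_name] = kwargs.pop(old_name)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@rename_parameter("table", "observed")
def table_total(observed):
    """Toy function: total count in a contingency table."""
    return sum(sum(row) for row in observed)


print(table_total(observed=[[1, 2], [3, 4]]))
print(table_total(table=[[1, 2], [3, 4]]))
```

With this approach a function keeps a single canonical parameter name while older or sibling spellings continue to work for keyword callers; positional calls are unaffected.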
----------
table : array_like of ints
    A 2x2 contingency table. Elements must be non-negative integers.
kind : str, optional
There is not much precedent for `kind` in `scipy.stats`, but `method` is not appropriate IMO since it's more a matter of choice/definition rather than algorithm, and I'd avoid the keyword `type`.
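Since the review also asked for input validation tests, here is a sketch of the kind of checks the documented parameters imply: a 2x2 `table` of non-negative integers and a `kind` string selecting the estimate. The function name `validate_odds_ratio_inputs`, the accepted strings `'conditional'` and `'sample'`, and the default are assumptions for illustration, not taken from the PR's code.

```python
import operator


def validate_odds_ratio_inputs(table, kind="conditional"):
    """Check `table` and `kind`, returning the table as nested tuples of ints.

    Hypothetical helper; the accepted `kind` values are assumed.
    """
    if kind not in ("conditional", "sample"):
        raise ValueError("`kind` must be 'conditional' or 'sample'.")

    rows = [list(row) for row in table]
    if len(rows) != 2 or any(len(row) != 2 for row in rows):
        raise ValueError("`table` must have shape (2, 2).")

    checked = []
    for row in rows:
        checked_row = []
        for value in row:
            try:
                # operator.index rejects floats and other non-integer types.
                count = operator.index(value)
            except TypeError:
                raise ValueError("`table` must contain integers.") from None
            if count < 0:
                raise ValueError("`table` entries must be non-negative.")
            checked_row.append(count)
        checked.append(tuple(checked_row))
    return tuple(checked)


print(validate_odds_ratio_inputs([[3, 1], [1, 3]], kind="sample"))
```

Each failure mode raises `ValueError` with a distinct message, so a test suite can target shape, type, sign, and `kind` errors separately.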
No responses to email, but this seems fundamental, was requested by a user, and at least two maintainers have wanted to have this function, so let's merge it. For now, this has been trimmed down to the bare minimum ( |
* ENH: stats: Add the function odds_ratio. The new function `scipy.stats.contingency.odds_ratio` computes the odds ratio for a 2x2 contingency table. An option allows the user to select either the sample odds ratio or the conditional maximum likelihood estimate of the odds ratio. The returned object provides a method for computing the confidence interval of the odds ratio. Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
The new function `scipy.stats.contingency.odds_ratio` computes the odds ratio and p-value for a 2x2 contingency table. An option allows the user to select either the sample odds ratio or the conditional maximum likelihood estimate of the odds ratio. The returned object provides a method for computing the confidence interval of the odds ratio.
Closes gh-11131.