ENH: stats: Add the function odds_ratio. #13340

WarrenWeckesser · 2021-01-04T10:22:02Z

The new function scipy.stats.contingency.odds_ratio computes
the odds ratio and p-value for a 2x2 contingency table. An option
allows the user to select either the sample odds ratio or the
conditional maximum likelihood estimate of the odds ratio. The
returned object provides a method for computing the confidence
interval of the odds ratio.

Closes gh-11131.

The new function `scipy.stats.contingency.odds_ratio` computes the odds ratio and p-value for a 2x2 contingency table. An option allows the user to select either the sample odds ratio or the conditional maximum likelihood estimate of the odds ratio. The returned object provides a method for computing the confidence interval of the odds ratio. Closes scipygh-11131.

WarrenWeckesser · 2021-01-04T10:34:37Z

This PR replaces gh-12288. This PR creates a new function in the scipy.stats.contingency namespace specifically for the odds ratio statistic, similar to how the proposed function relative_risk in gh-13048 creates a function for another widely used statistic for 2x2 contingency tables, the relative risk.

There is code in the file _odds_ratio.py that implements a private version of Fisher's noncentral hypergeometric distribution. Some of that code will be able to use the implementation in gh-13330 instead, but we don't necessarily have to block this PR waiting for gh-13330. If it turns out that gh-13330 is merged first, I can update this PR to use it, but if this PR goes in first, the private implementation can be updated after gh-13330 is merged.

Add odds_ratio and OddsRatioResult to the contingency module docstring. This should make the refguide-check happy.

sethtroisi

Small comments, I hope to dive into _odds_ratio.py tomorrow.

scipy/stats/stats.py

scipy/stats/tests/generate_fisher_exact_results_from_r.R

mdhaber · 2021-01-04T18:32:01Z

Some of that code will be able to use the implementation in gh-13330 instead, but we don't necessarily have to block this PR waiting for gh-13330.

@WarrenWeckesser you proposed the addition of the two noncentral hypergeometric distributions. I implemented them in gh-13330 as we discussed. Rather than asking others to review the private code, then re-review when you swap things out, would it be more efficient for you to review gh-13330 first?

WarrenWeckesser · 2021-01-04T22:47:14Z

@mdhaber, yes, that makes sense. I'll mark this as draft for now.

* Use the new nchypergeom_fisher distribution instead of the private implementation. * Don't put `OddsRatioResult` into the public `scipy.stats` namespace.

WarrenWeckesser · 2021-02-23T00:00:21Z

I have updated this PR to use the new nchypergeom_fisher distribution.

scipy/stats/__init__.py

scipy/stats/_odds_ratio.py

* Tweak whitespace in the fisher_exact docstring. * Add a comment in the R script generate_fisher_exact_results_from_r.R that shows how to run the script.

scipy/stats/_odds_ratio.py

scipy/stats/tests/test_odds_ratio.py

scipy/stats/stats.py

mdhaber

So far I've reviewed documentation, tests, and sample odds ratio calculations (odds ratio, p value, and CI). Calculations look good; biggest comments are that sample odds ratio calculations and input validation need tests.

I see the reference for the conditional odds ratio CI. Is there one particular reference that you'd recommend I follow for the conditional odds ratio itself and for the fact that the fisher_exact p-value should be the same as this p-value?

scipy/stats/stats.py

scipy/stats/_odds_ratio.py

WarrenWeckesser · 2021-03-04T16:11:38Z

Here are some notes about relevant references, for anyone looking to learn more about inference with the odds ratio.

The following text covers the calculation in detail, in section 4.1.2. Specifically, equation (4.7) is the equation that must be solved for the CMLE.
- H. Sahai and A. Khurshid (1996), Statistics in Epidemiology: Methods, Techniques, and Applications, CRC Press LLC, Boca Raton, Florida.
This publication is often cited when the CMLE is disussed:
- Breslow NE, Day NE. Statistical methods in cancer research. Volume I - The analysis of case-control studies. IARC Sci Publ. 1980;(32):5-338. PMID: 7216345.
The PDF is available at

https://publications.iarc.fr/Book-And-Report-Series/Iarc-Scientific-Publications/Statistical-Methods-In-Cancer-Research-Volume-I-The-Analysis-Of-Case-Control-Studies-1980

The odds ratio and the CMLE are discussed in section 4.2, beginning on page 124. The defining equation for the CMLE is (4.5).
Inference for the odds ratio is discussed in section 10.3 of the text
- M. Hollander, D. A. Wolfe and E. Chicken (2014), Nonparametric Statistical Methods (third edition), John Wiley & Sons, Inc., Hoboken, NJ, USA.
The CMLE is covered in comments 14, 15 and 16 in the "Comments" subsection on pages 519-520. Note that there is a typographical error in comment 16: the reference to equation (10.38) should be (10.58).

scipy/stats/_odds_ratio.py

* Use nchypergeom_fisher.support(M, n, N, 1) to compute the support (instead of hypergeom.support(M, n, N)). * Add more tests of parameter validation for odds_ratio() and for the odds_ratio_ci() method of the result.

mdhaber

This was really close. Shall we finish it? I think the remaining request was just documenting these alternatives and for more specific comments to be added to the code to point readers to the appropriate reference + equation.

scipy/stats/_odds_ratio.py

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>

mdhaber · 2021-05-06T19:11:13Z

I'm re-running CircleCI. Looked like a temporary glitch.

mdhaber · 2021-05-07T16:56:18Z

Re-ran CircleCI; same failure. Not seeing it in other PRs. But I can't see how the latest two commits would have cause this.

Aside: something I thought about when we were discussion the alternatives - from a UI perspective, how should we think about the relationship between this function and fisher_exact? Does this function essentially make fisher_exact obsolete but retain it for backwards compatibility (and its commonly-used name for the test)?

Update 7/30/2022: the function now only provides a point estimate and confidence interval of the statistic rather than duplicating the p-value of fisher_exact. I made this and other changes to bring the interfaces more in line with typical stats functions, avoiding exposure of attributes unless necessary. Future PRs can build on this if desired.

scipy/stats/_odds_ratio.py

… from new location

scipy/stats/_odds_ratio.py

mdhaber · 2022-07-31T14:52:18Z

scipy/stats/__init__.py

@@ -427,6 +427,7 @@
   contingency.margins
   contingency.relative_risk
   contingency.association
+   contingency.odds_ratio


I wouldn't object to this being in scipy.stats, but here is fine, too. Ultimately, it was probably added here because of #13048 (review).

mdhaber · 2022-07-31T15:00:59Z

scipy/stats/_odds_ratio.py

+    table : array_like of ints
+        A 2x2 contingency table.  Elements must be non-negative integers.


If this stays in stats.contingency, we could consider calling this observed to match chi2_contingency, expected_freq, and association.

The documentation explains how the table is laid out, so I don't see a need for the four separate arguments of relative_risk.

If this goes in stats, I would keep table to match fisher_exact, barnard_exact, and boschloo_exact.

On the other hand, I can see leaving it as table either way and using _rename_parameter to make the stats.contingency functions consistent with stats.

mdhaber · 2022-07-31T15:03:40Z

scipy/stats/_odds_ratio.py

+    ----------
+    table : array_like of ints
+        A 2x2 contingency table.  Elements must be non-negative integers.
+    kind : str, optional


There is not much precedent for kind in scipy.stats, but method is not appropriate IMO since it's more a matter of choice/definition rather than algorithm, and I'd avoid the keyword type.

mdhaber · 2022-08-08T03:16:54Z

No responses to email, but this seems fundamental, was requested by a user, and at least two maintainers have wanted to have this function, so let's merge it. For now, this has been trimmed down to the bare minimum (statistic and confidence interval), but if we want to change parameter names, add output, etc., that can all be done in a backward compatible way.

* ENH: stats: Add the function odds_ratio. The new function `scipy.stats.contingency.odds_ratio` computes the odds ratio for a 2x2 contingency table. An option allows the user to select either the sample odds ratio or the conditional maximum likelihood estimate of the odds ratio. The returned object provides a method for computing the confidence interval of the odds ratio. Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>

WarrenWeckesser added scipy.stats enhancement A new feature or improvement labels Jan 4, 2021

DOC: stats: update the contingency module docstring.

c5644eb

Add odds_ratio and OddsRatioResult to the contingency module docstring. This should make the refguide-check happy.

WarrenWeckesser mentioned this pull request Jan 4, 2021

ENH: stats: Add the conditional odds ratio and CI to fisher_exact. #12288

Closed

sethtroisi reviewed Jan 4, 2021

View reviewed changes

scipy/stats/stats.py Outdated Show resolved Hide resolved

scipy/stats/tests/generate_fisher_exact_results_from_r.R Outdated Show resolved Hide resolved

scipy/stats/tests/generate_fisher_exact_results_from_r.R Outdated Show resolved Hide resolved

mdhaber mentioned this pull request Jan 4, 2021

ENH: stats: add noncentral hypergeometric distributions (Fisher's and Wallenius') #13330

Merged

WarrenWeckesser marked this pull request as draft January 4, 2021 22:47

mdhaber mentioned this pull request Feb 10, 2021

A Solid Foundation for Statistics in Python with SciPy mdhaber/scipy#26

Closed

WarrenWeckesser added 2 commits February 22, 2021 12:47

Merge branch 'master' into oddsratio

d43c463

MAINT: stats: Update odds_ratio:

90627d1

* Use the new nchypergeom_fisher distribution instead of the private implementation. * Don't put `OddsRatioResult` into the public `scipy.stats` namespace.

WarrenWeckesser marked this pull request as ready for review February 22, 2021 20:40

DOC: stats: Update numerical output in the odds_ratio examples.

68f410f

mdhaber reviewed Feb 23, 2021

View reviewed changes

scipy/stats/__init__.py Outdated Show resolved Hide resolved

scipy/stats/_odds_ratio.py Outdated Show resolved Hide resolved

scipy/stats/_odds_ratio.py Outdated Show resolved Hide resolved

WarrenWeckesser added 2 commits February 23, 2021 15:23

MAINT: stats:

aca847f

* Tweak whitespace in the fisher_exact docstring. * Add a comment in the R script generate_fisher_exact_results_from_r.R that shows how to run the script.

DOC: stats: Update 'Returns' section of odds_ratio

0f39362

mdhaber reviewed Feb 26, 2021

View reviewed changes

scipy/stats/stats.py Outdated Show resolved Hide resolved

mdhaber requested changes Feb 26, 2021

View reviewed changes

MAINT: stats: Fix a comment; add tests for the odds ratio conf. int.

c591646