ENH: stats: Add the conditional odds ratio and CI to fisher_exact. #12288
Test failures are not related to this PR.
I just pushed an update (rebased). The PR now includes both the conditional odds ratio and the conditional confidence interval. The new computations match the values in the R function
With the updated
(Note: The code was updated to add the parentheses to The numerical values do not match R's values exactly, and in many cases, the values agree to only a few decimal places. Each time I have checked such a discrepancy, I have found it is because R often gives only 3 to 5 digits of precision, and the result from
I have switched this from "draft" to "ready for review". There is a lot here, so let me know if there is anything I can do to make it easier to review.
# [1,] 100 2
# [2,] 1000 5
#
table_data = list(
Where do these values come from?
(Getting back to this PR--you might have to jog your memory a bit.)
These are arrays that were tested in
scipy/scipy/stats/tests/test_stats.py
Line 407 in d4789ba
class TestFisherExact(object):
def test_less_greater(self):
tables = (
# Some tables to compare with R:
[[2, 7], [8, 2]],
It looks like these tables are no longer tested, or were they moved somewhere other than test_less_greater?
These tables are included in the file fisher_exact_results_from_r.py, which includes all three alternatives for each table.
[1] 1.701815e-09
"""

def test_basic(self):
Can you explain where these and test_precise went?
The test class TestFisherExact should cover all these matrices. Note that the results from R are stored in the generated file fisher_exact_results_from_r.py. This data is used in the test method TestFisherExact.test_results_from_r in test_fisher_exact.py.
scipy/stats/_fisher_exact.py
Outdated
the noncentrality parameter of Fisher's noncentral
hypergeometric distribution with the same hypergeometric
parameters as ``table`` whose mean is ``table[0, 0]``.
(FIXME: I know this need work!)
Does it still?
It could use a little copy-editing. I'll tweak the docstring in my next update.
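For readers trying to parse the docstring excerpt above, here is a minimal pure-Python sketch of that definition: find the noncentrality parameter of Fisher's noncentral hypergeometric distribution whose mean equals `table[0][0]`. The names `nchg_mean` and `conditional_odds_ratio` are hypothetical, and the bisection in log space is just one simple way to solve the equation; it is not claimed to be the method used in this PR.

```python
from math import comb, exp


def nchg_mean(psi, row1, col1, total):
    """Mean of Fisher's noncentral hypergeometric distribution (hypothetical helper)."""
    lo = max(0, row1 + col1 - total)
    hi = min(row1, col1)
    support = range(lo, hi + 1)
    # Unnormalized pmf: central hypergeometric weights times psi**x
    weights = [comb(col1, x) * comb(total - col1, row1 - x) * psi ** x
               for x in support]
    return sum(x * w for x, w in zip(support, weights)) / sum(weights)


def conditional_odds_ratio(table, tol=1e-12):
    """Solve nchg_mean(psi) == table[0][0] for psi (illustrative sketch only)."""
    (a, b), (c, d) = table
    row1, col1, total = a + b, a + c, a + b + c + d
    lo = max(0, row1 + col1 - total)
    hi = min(row1, col1)
    if lo == hi:
        return float("nan")   # degenerate support: psi is not identifiable
    if a == lo:
        return 0.0            # observed value at the lower end of the support
    if a == hi:
        return float("inf")   # observed value at the upper end of the support
    # The mean is increasing in psi, so bracket the root in log(psi), then bisect.
    log_lo, log_hi = -1.0, 1.0
    while nchg_mean(exp(log_lo), row1, col1, total) > a:
        log_lo *= 2
    while nchg_mean(exp(log_hi), row1, col1, total) < a:
        log_hi *= 2
    while log_hi - log_lo > tol:
        mid = 0.5 * (log_lo + log_hi)
        if nchg_mean(exp(mid), row1, col1, total) < a:
            log_lo = mid
        else:
            log_hi = mid
    return exp(0.5 * (log_lo + log_hi))


psi = conditional_odds_ratio([[8, 2], [1, 5]])
print(psi, nchg_mean(psi, 10, 9, 16))
```

The table `[[8, 2], [1, 5]]` is only an illustrative input. Its sample odds ratio is 20, and the conditional estimate falls between 1 and 20, which shows that the two quantities generally differ. Using `psi ** x` directly can overflow for large tables, so a production version would need to work with logarithms.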
scipy/stats/_fisher_exact.py
Outdated
>>> result.pvalue
0.03496503496503495

The probability that we would observe this or an even more imbalanced
"The probability under the null hypothesis that ..." would be helpful.
Sounds good, I'll add that.
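To make the "under the null hypothesis" wording concrete, the sketch below computes the exact p-value from the hypergeometric distribution that applies when the odds ratio is 1 and the table margins are fixed. It uses only the standard library; the function name `fisher_exact_pvalue` and the small relative tolerance used to break floating-point ties are assumptions for illustration, not code from this PR.

```python
from math import comb


def fisher_exact_pvalue(table, alternative="two-sided"):
    """Exact p-value for a 2x2 table under the null (odds ratio = 1). Sketch only."""
    (a, b), (c, d) = table
    row1, col1, total = a + b, a + c, a + b + c + d
    lo = max(0, row1 + col1 - total)
    hi = min(row1, col1)
    denom = comb(total, row1)
    # Null distribution of the top-left cell: hypergeometric with fixed margins
    pmf = {x: comb(col1, x) * comb(total - col1, row1 - x) / denom
           for x in range(lo, hi + 1)}
    if alternative == "less":
        return sum(p for x, p in pmf.items() if x <= a)
    if alternative == "greater":
        return sum(p for x, p in pmf.items() if x >= a)
    # Two-sided: add up every table that is no more probable than the observed one
    cutoff = pmf[a] * (1 + 1e-7)
    return sum(p for p in pmf.values() if p <= cutoff)


print(fisher_exact_pvalue([[8, 2], [1, 5]]))
```

For the table `[[8, 2], [1, 5]]`, the two-sided sum is 280/8008, or about 0.034965. That agrees with the p-value in the docstring excerpt above, assuming the excerpt uses this same table.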
scipy/stats/_fisher_exact.py
Outdated
The sample and conditional odds ratios for this example are

>>> result.sample_odds_ratio
as would interpretation of these
In fisher_exact, compute the conditional odds ratio and its confidence interval. The return value of `fisher_exact` is changed to an instance of a class created by _make_tuple_bunch(). The new return values related to the conditional odds ratio are included as attributes of this object. The code for `fisher_exact` is now in its own file. The tests for `fisher_exact` are also moved to a separate file. Some of the test values have been given more precise "known" values (computed using code written with mpmath or in C with GMP). An R script is included that generates the Python file "fisher_exact_results_from_r.py" that is used in the test suite. The generated Python file has the input parameters and R results written in a form that can be loaded into Python with just an import statement.
…terval Instead of always computing the confidence interval, the object returned by fisher_exact now has a method that is called to compute the confidence interval.
@WarrenWeckesser this should probably go under the Is there any reason it should be here in
@rlucas7 @WarrenWeckesser @mdhaber No activity in last 16 days--will there be consensus here for a merge very soon? Otherwise, maybe bump the milestone and if it does get merged before I branch just switch back the milestone.
@rlucas7 wrote
For reference, a bit of history (also relevant to #13153 and #13048): the These days I try to be more conservative about what we make public, and if I went back in time, I'd argue against making Having said that, I'm OK with using the
After looking into more instances of the use of the odds ratio in texts and papers, I think it is important that we also include the confidence interval for the sample odds ratio. In a new commit, I have added the method
For example, with the new method, we can reproduce the results at https://sphweb.bumc.bu.edu/otlt/MPH-Modules/PH717-QuantCore/PH717_ComparingFrequencies/PH717_ComparingFrequencies8.html:
|
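The original output for that example did not survive on this page, and it is not reconstructed here. As a separate illustration, the sketch below shows the normal-approximation (Wald) interval on the log odds ratio that introductory texts commonly use for the sample odds ratio. The function name `sample_odds_ratio_ci`, its signature, and the example table are hypothetical; this is not the method added in the PR, and the table is not the data from the linked page.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def sample_odds_ratio_ci(table, confidence_level=0.95):
    """Sample odds ratio with a Wald interval on the log scale (sketch only).

    Assumes all four cells are nonzero.
    """
    (a, b), (c, d) = table
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds_ratio) under the usual large-sample approximation
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    low = exp(log(odds_ratio) - z * se)
    high = exp(log(odds_ratio) + z * se)
    return odds_ratio, low, high


print(sample_odds_ratio_ci([[8, 2], [1, 5]]))
```

Because the interval is symmetric on the log scale, the product of its endpoints equals the square of the sample odds ratio. With counts this small the interval is very wide, which is one reason an exact conditional interval is useful alongside it.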
@WarrenWeckesser If I understand correctly, a lot of this will be implementation of Fisher's NCHG distribution. Before I review all that, we should make some decisions about mdhaber#31. If FNCHG is going to be added as a distribution, shouldn't this rely on that? If so, I think we should add FNCHG first, then add this. Also, I suppose I can use a separate diff tool to review the changes against the existing
If this will close gh-11131, maybe add that up top so it will be closed when this merges.
I decided it makes more sense to create a new function for the odds ratio statistic. The new PR gh-13340 is a replacement for this PR. In the new PR, I tweak the docstring of
Replaced by gh-13340, so closing.