scipy.stats.binom_test / binom.sf return incorrect values for large x and n #13079
Labels: defect, scipy.stats
Comments
chrisb83 added the defect label on Nov 15, 2020
Thanks for reporting it. This is a bug which also impacts the distribution stats.binom:
stats.binom.cdf(1e9, 2e9, 0.5) # 0.885...
Sounds like a potential job for boostinator!
@mdhaber Done. The Boost implementation of the binomial survival function appears to do the right thing here.
I think this is the same issue as this one: #5503
…-Lucas Roberts
On Nov 15, 2020, at 2:32 PM, Christoph Baumgarten wrote:
thanks for reporting it, this is bug which also impacts the distribution stats.binom
stats.binom.cdf(1e9, 2e9, 0.5) # 0.885...
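For reference, the value that the call quoted above should return can be estimated without SciPy. The following is a minimal sketch, not part of the thread: it assumes p = 0.5 and an even n, where symmetry gives P(X <= n/2) = (1 + P(X = n/2)) / 2. The helper name `binom_cdf_at_midpoint` is hypothetical.

```python
import math


def binom_cdf_at_midpoint(n: int) -> float:
    """P(X <= n/2) for X ~ Binomial(n, 0.5) with even n, using symmetry."""
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even integer")
    half = n // 2
    # log of C(n, n/2) * 0.5**n, via log-gamma so nothing overflows
    log_pmf_mid = (
        math.lgamma(n + 1) - 2.0 * math.lgamma(half + 1) - n * math.log(2.0)
    )
    pmf_mid = math.exp(log_pmf_mid)
    # Symmetry: P(X < n/2) == P(X > n/2), so P(X <= n/2) = (1 + pmf_mid) / 2
    return 0.5 * (1.0 + pmf_mid)


# Same n as in the quoted call; the result should sit just above 0.5.
print(binom_cdf_at_midpoint(2_000_000_000))
```

Under these assumptions the result is only marginally above 0.5, which is consistent with the issue's expectation of values near 0.5 and far from the reported 0.885.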
scipy.stats.binom.sf and scipy.stats.binom_test return incorrect values for large inputs. The bounds on the inputs are not documented, and similar functions in other libraries (e.g. R's binom.test) do not have this problem.
Reproducing code example:
Error message:
None.
Expected behaviour:
All calls in the repro example should return a value near 0.5. For example, in R:
Notes:
Internally, for the "greater" alternative, scipy calls binom.sf(x - 1, n, p), which then calls scipy.special.bdtrc(floor(x), n, p). This forwards to Cephes' bdtrc, according to scipy's docs.
We can verify that the bug lies within bdtrc:
Non-broken implementations of the regularized incomplete beta function (such as TensorFlow's tf.math.betainc) will return the expected value (0.5).
This also affects anything else that calls bdtrc, including binom.sf. There may be a case that bdtrc is Working as Intended (because the documentation is explicit that it's just a wrapper for Cephes' buggy implementation), but I think for binom.sf and binom_test it's clear that there is at least a documentation bug if not an opportunity to make things better.
I think there are 4 viable solutions, in what I think is most-preferred to least-preferred:
binom.sf when "k" and "n" are large (perhaps checking that "p" isn't too biased). Fortunately the circumstances when bdtrc breaks usually coincide with when this approximation is good. :)
I'm happy to send Pull Requests given some direction about which of the choices above to go with.
Scipy/Numpy/Python version information: