scipy.stats.binom.ppf Incorrect for p=0 #5122
Comments
I guess you mean |
I did. Sorry for the error.
|
Btw, I got Most of what's happening is coming from here: https://github.com/scipy/scipy/blob/master/scipy/stats/_discrete_distns.py#L64 The As in
special.bdtrik(0, n, p) = 0 # for non-zero n and p
special.bdtrik(q, n, 0) = n # for non-zero n and q<0.5
special.bdtrik(q, n, 0) = 0 # for non-zero n and q>0.5
It appears So I have no idea what's happening. I suspect this has to do with the implementation, which chooses its method of evaluation depending on whether q>0.5 or not. |
Hi,
For version 0.13.0b1 I get the following:
ppf(q,n,0) == n-1 for 0<q<=.5
I'm not sure what is wrong with the code, however hopefully the following
I suspect the issue is due to the fact that a binomial random variable
To ensure that its inverse (the ppf function) is defined everywhere in
ppf = lambda x, n, p : (float(i) for i in xrange(n+1) if
Here I find the first value "i" such that the cdf is >= x. (keep n small if you want to test this out)
I suspect that the section of code that does this is the likely cause of
On Sun, Aug 9, 2015 at 11:31 AM, Varun Nayyar notifications@github.com
|
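The "first value i such that the cdf is >= x" idea from the comment above can be sketched in plain Python. This is only an illustration, not scipy's implementation: the names `binom_cdf` and `binom_ppf_bruteforce` are hypothetical, the cdf is summed directly with `math.comb` (Python 3.8+), and the q=0 result of -1 simply follows the value reported in the original post.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    # Python evaluates 0**0 as 1, so p=0 and p=1 need no special case here.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binom_ppf_bruteforce(q, n, p):
    """Smallest k in 0..n with cdf(k) >= q (hypothetical helper)."""
    if q == 0:
        # Assumption: mirror the -1 that the original post reports for x=0.
        return -1
    for k in range(n + 1):
        if binom_cdf(k, n, p) >= q:
            return k
    # Rounding can leave cdf(n) marginally below 1, so fall back to n.
    return n
```

With p=0 all of the probability mass sits at 0, so this search stops at k=0 for every 0<q<=1. Keep n small, as the comment suggests, because each step recomputes the whole sum.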
Is this related to #1603? |
Going to the limits is often difficult numerically as in #1603. The current ppf looks correct for small (but not tiny) p, but jumps to wrong values at p=0. (ignoring the segment of p in #1603) |
@josef-pkt I vaguely recall that earlier you said the scipy.stats functions are not in general implemented correctly at limits. Is this still true? Are problems at these limits considered bugs or out of scope? If limits are allowed (e.g. p=0 or p=1 in a binomial distribution, sigma=0 in a normal distribution, etc.) then should some kind of meta-issue be opened for this? |
First, specific to this issue: AFAICS, ppf returns wrong numbers and not
edit: The difference to the case sigma=0 in the normal distribution is that here we already have a discrete distribution with probabilities and a well-defined finite limit. When scale->0 in a continuous distribution, we don't have a standard continuous distribution anymore. (end edit)
In general about edge cases: I would treat them case by case and improve (include boundaries) if possible and if it doesn't produce slow spaghetti code. scipy.special has improved a lot over the years, and corner cases in stats are now much better handled than they were before. However, I wouldn't make any assumptions that any of the functions are well defined at or close to singular limits. For example, in statsmodels (e.g. GLM and discrete) we clip values most of the time to stay away from the boundaries, or impose transformations that force values away from the boundaries. (#1603 doesn't worry me longer than half a minute)
A case that is now well handled by scipy but not yet used much in statsmodels is the 0log0 case or many similar cases. (or currently box-cox transformation where I didn't pay attention) |
I'm not sure if it helps debugging, but this happens:
>>> bdtrik(0, 4, 1e-16)
0.0
>>> bdtrik(0, 4, 1e-20)
4.0
>>> bdtrik(0, 4, 0)
4.0
I guess this is related to that other issue. |
I think it's hitting edge case bugs in https://github.com/scipy/scipy/blob/master/scipy/special/cdflib/cdfbin.f or https://github.com/scipy/scipy/blob/master/scipy/special/cdflib/dinvr.f, if anyone would like to follow GOTOs through unmaintained Fortran code. |
Yes this is how scipy documents ppf of discrete distributions in http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#specific-points-for-discrete-distributions: |
It is almost the second anniversary of this issue and it still haunts scipy (v. 0.19.0). Maybe some kind of a monkey-patch like |
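The patch proposed in the comment above did not survive in this thread, so the following is not the commenter's code. It is a minimal sketch of one possible workaround, assuming scalar inputs: a hypothetical wrapper `ppf_with_degenerate_p` that short-circuits p=0 and p=1 and defers to a caller-supplied `fallback_ppf(q, n, p)` otherwise. The values returned for the degenerate cases follow what the original post expects (0 for 0<q<=1 at p=0, and -1 at q=0); the p=1 branch is an assumption made by symmetry.

```python
import math


def ppf_with_degenerate_p(q, n, p, fallback_ppf):
    """Binomial ppf that special-cases p=0 and p=1 (hypothetical helper).

    fallback_ppf is any callable taking (q, n, p); it is only used
    when p lies strictly between 0 and 1.
    """
    if p not in (0, 1):
        return fallback_ppf(q, n, p)
    if not 0 <= q <= 1:
        # Outside [0, 1] there is no quantile to report.
        return math.nan
    if q == 0:
        # Assumption: keep the -1 reported in the original post for x=0.
        return -1
    # All mass is at 0 when p=0 and at n when p=1.
    return 0 if p == 0 else n
```

In practice one would presumably pass `scipy.stats.binom.ppf` as `fallback_ppf`; keeping it as a parameter leaves the sketch free of any scipy dependency and avoids patching the library in place.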
In the original post, the desired answer for x=1 is:
Note that Boost and R (e.g. |
Hi,
When calling the function scipy.stats.binom.ppf(x,n,0), it returns the following values:
x=0 : scipy.stats.binom.ppf(x,n,0) == -1
0<x<1 : scipy.stats.binom.ppf(x,n,0) == n-1
x=1 : scipy.stats.binom.ppf(x,n,0) == n
However, I believe that it should return the values:
x=0 : scipy.stats.binom.ppf(x,n,0) == -1
0<x<1 : scipy.stats.binom.ppf(x,n,0) == 0
x=1 : scipy.stats.binom.ppf(x,n,0) == 0
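For reference, the expected values for 0<x<=1 can be derived from the usual "smallest k whose cdf reaches x" definition of a discrete ppf. The post does not state which definition it uses, so treat that definition as an assumption here; the symbols F and k are introduced only for this derivation.

```latex
\operatorname{ppf}(x) = \min\{\, k \in \{0,\dots,n\} : F(k) \ge x \,\},
\qquad
F(k) = \sum_{i=0}^{k} \binom{n}{i}\, p^{i} (1-p)^{n-i}.

% For p = 0 only the i = 0 term survives, so
F(k) = 1 \quad \text{for all } k \ge 0
\;\Longrightarrow\;
\operatorname{ppf}(x) = 0 \quad \text{for } 0 < x \le 1.
```

The value -1 at x=0 does not come out of this formula; it is a convention for the lower end of the support.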