stats binom at non-integer n (Trac #1895) #2414
trac user Sytse wrote on 2013-04-23: I don't know of an extension of the binomial distribution for non-integer n and couldn't find one in the literature. BTW, how should a check that a parameter has an integer value be coded in scipy? `abs(par - round(par)) < eps`?
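As a sketch of the two options raised in the question above (an epsilon tolerance versus an exact comparison); the helper names are made up for illustration and assume nothing about SciPy's internals:

```python
import math

def is_integer_eps(par, eps=1e-8):
    """Epsilon-based check, as suggested in the comment above."""
    return abs(par - round(par)) < eps

def is_integer_exact(par):
    """Exact check: a float holds an integer value iff floor(par) == par."""
    return math.floor(par) == par

print(is_integer_exact(5.0))        # True
print(is_integer_exact(5.5))        # False
print(is_integer_eps(5.0 + 1e-12))  # True: tolerates float noise
```

The exact check is stricter; the epsilon check forgives values that picked up floating-point noise on the way in, which is the trade-off Sytse is asking about.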
@josef-pkt wrote on 2013-04-23: See #198 and gh-2172 for a related discussion; "practicality beats purity". I'm using binom right now for binomial tests and for power and sample size calculations. It's good to know the limitation, but I don't know the best way to enforce it yet.
trac user Sytse wrote on 2013-04-23: The discussion you refer to is about something different: the Erlang distribution appears to be identical to the gamma distribution, apart from the fact that one of the Erlang parameters is restricted to integer values. If the integer restriction on the binomial n is dropped, you obviously don't get such a 'continuous' extension: the resulting 'distribution' either loses the finite support of the binomial case, or its total probability is not 1, and the latter is in my opinion an essential property of a probability distribution. The slogan "practicality beats purity" sounds appealing, but in my opinion it is not a fair summary of the problem.
@josef-pkt wrote on 2013-04-23: I completely agree, but what should be done?
Nobody has ever complained about this, and I only found it through a random check of the distribution. Maybe the case where we run over a large number of n values doesn't show up very often (only in sample size calculations, and maybe estimation). So I guess the check would cost much. Regarding floating point errors and your BTW: what's the correct check? I think I can convince myself to return nans, as in R.
@josef-pkt wrote on 2013-04-23: typo in the previous comment; it should read "So I guess the check would NOT cost much".
I think this issue just needs a decision. I would suggest just putting an integer equality check in `_argcheck`:

```python
def _argcheck(self, n, p):
    return (n >= 0) & (n == np.asarray(n, dtype=int)) & (p >= 0) & (p <= 1)
```

For reference, gh-13204 performs this sort of integer check for the new distribution it adds.
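A self-contained sketch of that suggested check, assuming a standalone helper rather than SciPy's actual `_argcheck` method, and using `np.floor(n) == n` instead of the int-cast (the cast can overflow for very large n):

```python
import numpy as np

def argcheck_binom(n, p):
    """Hypothetical version of the integer check suggested above.
    Returns an elementwise boolean mask of valid (n, p) pairs."""
    n = np.asarray(n, dtype=float)
    p = np.asarray(p, dtype=float)
    # n must be a non-negative integer value; p must be a probability.
    return (n >= 0) & (np.floor(n) == n) & (p >= 0) & (p <= 1)

print(argcheck_binom([5, 5.5, -1], 0.3))  # [ True False False]
```

Invalid parameters would then make the distribution's methods return nan, which matches josef-pkt's earlier suggestion.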
@mdhaber Josh and I had some related discussions here: #11045. This wiki section describes the relationship: basically, repeated integration by parts of the incomplete beta function gives the distribution function of the binomial.
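The relationship mentioned here can be checked numerically. This is an illustrative sketch, not SciPy's implementation; it assumes the standard identity P(X <= k) = I_{1-p}(n - k, k + 1), with `scipy.special.betainc` as the regularized incomplete beta function:

```python
import math
from scipy.special import betainc

n, k, p = 10, 3, 0.3

# Direct sum of the binomial pmf up to k.
cdf_sum = sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# Same value via the regularized incomplete beta function:
# P(X <= k) = I_{1-p}(n - k, k + 1).
cdf_beta = betainc(n - k, k + 1, 1 - p)

print(abs(cdf_sum - cdf_beta) < 1e-9)  # True
```

The beta-function form is also defined for non-integer n, which is why the cdf "seems to work" there even though the pmf does not sum to one.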
100% @mckib2! BTW, it's ready from my perspective if you have a minute to check it out!
I'll take a look, thanks. Does that mean that most of the methods work fine with non-integer n, and it's just the pmf that's wrong?
> at least until boostinator goes in (love the name BTW).
That’s the TL;DR for current implementation.
The rvs method might also be impacted; not certain on that one though, it depends.
Should there be a separate argument check in pmf, or should we just mention it in documentation?
Looked over the discussion again in the bdtr PR; there we said we'll stop supporting the n != floor(n) cases with the 1.7.0 release, which is the next release. That change will impact the cdf, sf, and ppf.
The cdf and sf can support a double k argument and a double n, but IIRC those need to be floored so that things still behave according to the axioms of probability.
Given that the plan is to move to boost, how does boost handle these cases (n and/or k double)?
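A minimal stdlib sketch of the flooring behavior described above (the helper is hypothetical, not bdtr's actual code): with n floored to an integer, the cdf reaches exactly 1 at the top of the support, consistent with the axioms of probability:

```python
import math

def binom_cdf_floored(k, n, p):
    """Hypothetical helper: floor possibly non-integer k and n before
    evaluating the binomial cdf, as discussed for bdtr above."""
    n = math.floor(n)
    k = math.floor(k)
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(min(k, n) + 1))

# With n floored to 5, the cdf is exactly 1 at k >= 5.
print(binom_cdf_floored(5, 5.7, 0.5))  # 1.0
```

p = 0.5 is chosen here so every term is an exact binary fraction and the total is exactly 1.0 in floating point.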
I'll reply in gh-13328.
My vote would be just to do the integer check. Here's some background for context: the natural generalization of the binomial distribution to non-integral n gives a quasiprobability distribution rather than a true probability distribution. Some special cases show up in a paper by Gábor J. Székely entitled "Half of a coin: negative probabilities". The main idea is that formally extending a coin flip in this way (for example, taking a square root of its probability generating function) yields signed 'probabilities'. I think these quasiprobability distributions are interesting, but outside the scope of scipy.
For gh-13632, my vote would also be to do the integer check, unless there really is a meaningful extension to non-integral parameter values that gives an actual probability distribution.
Thanks @steppi. @tupui Since you also replied in gh-13632, I thought I'd bring this one up. Based on @steppi's comments above, what do you think about deprecating calls to SciPy's built-in discrete distributions with non-integer values for arguments that are supposed to be integers (unless a clearly correct extension to non-integer values is already implemented in SciPy)?
Crossref gh-3758. Not the same, but related (which arguments can be non-integral).
If only accepting integers simplifies things on our side, I am fine with it. I have used discrete distributions with non-integer values a lot for modelling, but I could have used a mapping if SciPy only dealt with integers.
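A sketch of the mapping workaround @tupui describes, under the assumption that the model's outcomes are a fixed set of non-integer values; the support and probabilities here are made up for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical model whose outcomes are the non-integer values 0.0, 0.5, 1.5.
support = np.array([0.0, 0.5, 1.5])
probs = [0.25, 0.5, 0.25]

# Let scipy work on integer indices 0, 1, 2 ...
dist = stats.rv_discrete(values=(list(range(len(support))), probs))

# ... and map sampled indices back to the real-valued support.
samples = support[dist.rvs(size=1000, random_state=42)]

print(set(np.unique(samples)) <= set(support))  # True
```

The integer restriction on scipy's side costs nothing here; the non-integer support lives entirely in the user's lookup table.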
@tupui can you give an example? I think @mdhaber is talking about discrete distributions with parameters that only make sense for integer values, but I think you may be talking about discrete distributions that take on non-integer values, for example a distribution that takes the value 0 with probability 1/2 and the value 0.5 with probability 1/2. Do I understand correctly? I'd hope that we only deprecate cases where there genuinely isn't a valid probability distribution when a parameter is extended to be non-integral, like this issue's example of the binomial distribution with non-integral n.
You're correct 👍 then all good. |
I think we're on the same page. I'll open a PR. |
Original ticket http://projects.scipy.org/scipy/ticket/1895 on 2013-04-19 by @josef-pkt, assigned to @rgommers.
stats.binom does not impose that n is an integer.
The cdf seems to work correctly (?), but the pmf doesn't sum to one.
The cdf from scipy.special might just floor the argument n (cast to int).
I don't know what to make of the case, when n is not an integer.
Is there an extension of the distribution for all real n>=0?
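To illustrate the report's point that the pmf doesn't sum to one, here is a stdlib sketch that extends the binomial coefficient with the gamma function for non-integer n. This mirrors the kind of gamma-function formula a generic implementation uses, though it is not SciPy's exact code:

```python
import math

def gen_binom_pmf(k, n, p):
    """Binomial 'pmf' using the generalized binomial coefficient
    Gamma(n+1) / (Gamma(k+1) * Gamma(n-k+1)), defined for non-integer n."""
    coeff = math.gamma(n + 1) / (math.gamma(k + 1) * math.gamma(n - k + 1))
    return coeff * p**k * (1 - p)**(n - k)

n, p = 5.5, 0.5
total = sum(gen_binom_pmf(k, n, p) for k in range(math.floor(n) + 1))
print(total)  # ~0.995: less than 1, so this is not a probability distribution
```

For integer n the same sum is exactly 1; the deficit appears only when n is non-integral, which is the core of the bug report.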