Poisson Binomial Distribution Support #6000
Comments
In many cases if a distribution is missing, then it's because nobody implemented it. Based on a brief look at the Wikipedia page, I think the main question is whether this will be fast enough in general to get a usable implementation. The sums and products in the pmf and cdf look computation intensive. |
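For a sense of the cost involved, here is a minimal sketch (not code from this thread) that builds the pmf one trial at a time instead of summing over all subsets. The function name `poisson_binomial_pmf` is hypothetical. Under this approach the work grows roughly quadratically in the number of probabilities.

```python
def poisson_binomial_pmf(probs):
    """Return [P(K=0), ..., P(K=n)] for independent trials with success probabilities `probs`.

    Hypothetical helper for illustration; plain Python, no dependencies.
    """
    pmf = [1.0]  # with zero trials, K = 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails: count stays at k
            nxt[k + 1] += mass * p      # this trial succeeds: count moves to k + 1
        pmf = nxt
    return pmf


example = poisson_binomial_pmf([0.1, 0.7, 0.3])
```

Each pass over the table adds one trial, so `n` probabilities need about `n**2 / 2` multiply-add steps. The cdf is then a running sum over the returned list.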
Yeah, I assume it will be fairly computationally intensive but I guess as you said it wouldn't be the only thing. Are there any standards we try to adhere to for performance? I'm going to be implementing this anyways so I'm happy to test out performance if there are any standards that need to be met. If not, we can at least document it to be clear that it's expected to be computationally intensive. |
I haven't seen any performance standards for distributions. Some of them are marked "slow" in the unit test because some methods that use generic calculations can be very slow (rvs needs ppf needs cdf needs pmf for each point). |
Using Fourier transforms, this is actually pretty fast. One thing I am seeing in my pmf implementation using Fourier transforms is that the sum of my probabilities is off from 1 by around 1e-14, so there's some precision loss. I'm currently looking through my implementation to see if I can avoid precision loss, but are there any standards regarding precision that I should be meeting? |
Re accuracy: hard to tell in general, depends on the quantity. Offhand, 1e-14 sounds ok. |
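As a rough illustration of the Fourier approach (a sketch only, not the implementation discussed in this comment), the pmf can be recovered by evaluating the characteristic function at the `n + 1` roots of unity and applying a discrete Fourier transform. The function name `poisson_binomial_pmf_fft` is hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def poisson_binomial_pmf_fft(probs):
    """Hypothetical helper: pmf over 0..n from the DFT of the characteristic function."""
    probs = np.asarray(probs, dtype=float)
    n = probs.size
    # Points on the unit circle: exp(2*pi*i*m / (n + 1)) for m = 0..n.
    freq_index = np.arange(n + 1)
    z = np.exp(2j * np.pi * freq_index / (n + 1))
    # Characteristic function at each point: prod_j (1 - p_j + p_j * z).
    char_fn = np.prod(1.0 - probs[:, None] + probs[:, None] * z[None, :], axis=0)
    # numpy's forward FFT uses exp(-2*pi*i*m*k / N), which inverts the sum above
    # up to the factor N = n + 1.
    return np.fft.fft(char_fn).real / (n + 1)


pmf = poisson_binomial_pmf_fft([0.1, 0.7, 0.3])
```

Because the result comes out of complex floating-point arithmetic, individual entries can be off by rounding error (including tiny negative values), and the total can differ from 1 by a small amount of the kind mentioned above.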
I've got a fairly working implementation of this outside of scipy, but when trying to get it working within scipy I'm running into some issues regarding the way shape parameters are handled. Because Poisson Binomial requires a bunch of individual probabilities I'm trying to take in an array of probabilities as the shape argument of my distribution. As far as I can tell, the rv_generic/rv_discrete infrastructure that exists currently doesn't really work with array parameters for shape variables. Is there any way to take in an array of probabilities as a shape parameter with the current infrastructure? I could probably also do this with varargs but at a cursory glance it looks like that's prohibited and if it's allowed it's not really clear how. |
No, the setting/framework for distributions doesn't allow for multivariate shape parameters, they have to be individually provided as arguments. |
Gotcha. So if that is the case do you see any way to be able to specify the If there's documentation about what's supported, I'm happy to look there.
|
BTW: In statsmodels we are just struggling with the Tweedie distribution which is a compound Poisson-Gamma distribution with an infinite sum in the pdf. I hope to avoid having to calculate it for the |
There is also the discussion about the histogram distribution on the mailing list, which would have the same problem. see https://github.com/scipy/scipy/pull/5672/files |
@ev-br using |
ISTM (superficially, without looking at any code) that this pbinom would better fit among multivariate distributions. They are in _multivariate.py, a simple example is e.g. in #5410 |
The distribution itself is univariate. |
@josef-pkt Indeed. Then it might be easier to register it as having no shape parameters and set the probabilities in the constructor. |
Something like this should work:
Then a user would do |
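A minimal sketch of the constructor-based idea (not the commenter's original snippet), assuming NumPy and SciPy are installed. The class name `PoissonBinomial` and the attribute `_table` are hypothetical, and the pmf table is built with a plain convolution for brevity.

```python
import numpy as np
from scipy import stats


class PoissonBinomial(stats.rv_discrete):
    # Hypothetical subclass: no shape parameters are registered; the success
    # probabilities are fixed when the instance is created.

    def __init__(self, probs):
        probs = np.asarray(probs, dtype=float)
        table = np.array([1.0])
        for p in probs:
            # Adding one Bernoulli(p) trial convolves the pmf with [1 - p, p].
            table = np.convolve(table, [1.0 - p, p])
        self._table = table
        # Support is 0..n; _pmf below takes only k, so no shapes are inferred.
        super().__init__(a=0, b=len(probs), name="pbinom")

    def _pmf(self, k):
        # k arrives as an array of integer-valued floats inside the support.
        return self._table[np.asarray(k).astype(int)]


dist = PoissonBinomial([0.1, 0.7, 0.3])
print(dist.pmf([0, 1, 2, 3]))
print(dist.cdf(1))
```

With this layout the generic `rv_discrete` machinery supplies `cdf`, `ppf`, and the other public methods on top of `_pmf`. The trade-off is that the probabilities cannot be passed per call the way ordinary shape parameters are.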
I recognize this is a pretty old issue but I was never able to get it cleanly integrated into scipy. Someone just asked me about the code so I'm including it below and if anyone wants to use it, or carry it over the finish line that would be great:
|
Can you comment as to why it was never integrated? |
There is another implementation using FFT here. The author says that his implementation is based on this paper, which is newer (2013) than the paper cited by Wikipedia (2010). The newer paper is available as full text here. Note that the author of the paper also published an R package implementing his computation methods (source), which may be used for verifying the results. |
The problem is that this distribution doesn't fit well into the existing distribution infrastructure because it has a many-valued shape parameter (or a variable number of scalar shape parameters). We'll make sure that there is support for this sort of thing in the new infrastructure (gh-15928). |
I'd like to add support for the Poisson Binomial Distribution: https://en.wikipedia.org/wiki/Poisson_binomial_distribution
into the scipy.stats module. It could be placed under "scipy.stats.pbinom". Is there a reason why support for this doesn't already exist and if not, would it be a welcome addition to the scipy repo?
If so, I'll go ahead and implement it and submit a PR.