[GSoC] Added Mixture distribution #19886
✅ Hi, I am the SymPy bot (v160). I'm here to help you write a release notes entry. Please read the guide on how to write release notes. Your release notes are in good order. Here is what the release notes will look like:
This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.7. Note: This comment will be updated with the latest check if you edit the pull request. You need to reload the page to see it. Click here to see the pull request description that was parsed.
|
🟠Hi, I am the SymPy bot (v160). I've noticed that some of your commits add or delete files. Since this is sometimes done unintentionally, I wanted to alert you about it. This is an experimental feature of SymPy Bot. If you have any feedback on it, please comment at sympy/sympy-bot#75. The following commits add new files:
If these files were added/deleted on purpose, you can ignore this message. |
Codecov Report
@@ Coverage Diff @@
## master #19886 +/- ##
==============================================
+ Coverage 64.256% 75.809% +11.552%
==============================================
Files 666 669 +3
Lines 172568 173342 +774
Branches 40689 40857 +168
==============================================
+ Hits 110887 131410 +20523
+ Misses 55277 36195 -19082
+ Partials 6404 5737 -667 |
Maybe there is a possibility for a different kind of approach: you designed an API asking for
Point 1. is just a categorical distribution. Point 2. can be considered as a joint distribution of independent random variables. I was wondering, could we instead approach this design by defining:
At this point, for example, you would define the mixture of IndependentRandomSymbolProduct(X, Y)[CategoricalRV(probs=[1/3, 2/3])]. This is a completely different approach. I was curious about making the class construction more similar to the way the compound distribution is defined.
Okay, I understand your approach, but by the definition of IID RVs (per Wikipedia), all the RVs should have the same distribution, so I think an IID random symbol should work in the following way:
1. X = IIDRandomSymbol('X', [X1, X2, ..., Xn]) # where all Xi have the same distribution
2. density(X)([x1, x2, ..., xn]) # this will give density(X1)(x1) * density(X2)(x2) * ... * density(Xn)(xn)
3. # Similarly for probability and expectations methods |
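The product rule in step 2 can be checked with tools that already exist in sympy.stats. This is a minimal sketch, assuming two independent standard normal RVs; IIDRandomSymbol is only a proposal in this thread, so the joint density is built by hand and the names `joint` and `expected` are illustrative.

```python
from sympy import exp, pi, symbols
from sympy.stats import Normal, density

x1, x2 = symbols("x1 x2", real=True)

# Two RVs with the same distribution, treated as independent.
X1 = Normal("X1", 0, 1)
X2 = Normal("X2", 0, 1)

# Joint density of independent RVs: product of the marginal densities.
joint = density(X1)(x1) * density(X2)(x2)

# Closed form of the bivariate standard normal density, for comparison.
expected = exp(-(x1**2 + x2**2) / 2) / (2 * pi)

point = {x1: 1, x2: 2}
print(float(joint.subs(point)), float(expected.subs(point)))
```

The two printed values should agree, since the joint density of independent components factorizes into the marginals.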
I think rather than taking RVs, distributions should be taken. |
sympy/stats/tests/test_mixture_rv.py
N = Normal("N", 0, 1)
M = Normal('M', 1, 2)
Z = Laplace('L', 3, 1)
D = Mixture('D', [S(2)/10, S(5)/10, S(3)/10], [N, M, Z])
Instead of creating a new Mixture RV, a user should be able to define a new distribution using MixtureDistribution and then pass it on to the custom RV API. Mixture can be removed.
But for accessing P, E, etc. we need an RV. Once the MixtureDistribution is created, we need an RV that takes the MixtureDistribution as an argument and can then change the pspace to any of the Finite, Continuous or Discrete PSpaces when compute_cdf, compute_density, etc. are called, so we need the Mixture RV IMO. The change can be:
>>> M = Mixture('M', [wt1, wt2, wt3, ...,], [dist1, dist2, dist3, ...,])
I think Mixture can be a function rather than a class. It can return a generic RV/ContinuousRV/DiscreteRV/FiniteRV depending on the type of PDF.
P.S. I saw that ContinuousRV generates only RVs with continuous PDF but what about PDFs which are neither continuous nor discrete? What will be returned in that case?
> I think, Mixture can be a function rather than a class
Do you mean to remove the MixtureDistribution/MixturePSpace class?
> I saw that ContinuousRV generates only RVs with continuous PDF but what about PDFs which are neither continuous nor discrete?
Presently, I have considered the 3 cases of Continuous, Discrete, and Finite. Should an error be raised if the distribution is not one of these 3?
> Should an error be raised if the distribution is not one of these 3?
No, a generic random symbol having the custom PDF given by the user can be returned.
> Do you mean to remove the MixtureDistribution/MixturePSpace class?
Mixture should be a constructor function returning generic RandomSymbol objects with the given mixture probability distribution.
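The constructor-function idea can be sketched with the existing ContinuousRV API. This is only an illustration under stated assumptions: `mixture_rv` is a hypothetical helper name, not something in sympy.stats or in this PR, and it handles only continuous components with exact weights.

```python
from sympy import S, Symbol
from sympy.stats import ContinuousRV, Normal, density


def mixture_rv(name, weights, components):
    """Hypothetical helper (not part of sympy.stats): build a generic
    continuous RV whose PDF is the weighted sum of the component PDFs."""
    if sum(weights) != 1:
        raise ValueError("weights must sum to 1")
    x = Symbol(name)
    pdf = sum(w * density(c)(x) for w, c in zip(weights, components))
    # ContinuousRV wraps a user-supplied PDF in a generic random symbol.
    return ContinuousRV(x, pdf)


N = Normal("N", 0, 1)
M = Normal("M", 1, 2)
D = mixture_rv("D", [S(3) / 10, S(7) / 10], [N, M])

# Evaluate the mixture density at a point.
print(float(density(D)(0)))
```

A full implementation would also need to dispatch on discrete and finite components, which is the open question raised in the comments above.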
I simply meant independent RVs, not identical, in the example above. |
@Upabjojr @czgdp1807 Should the independent random symbol be implemented in the following way? Continuing from #19886 (comment):
>>> X = IndependentRandomSymbols('X', [X1, X2, X3, ...]) # where X1, X2, etc. are independent random variables
>>> density(X)([x1, x2, x3, ...]) # product of respective pdfs
...
>>> E(X) # product of respective expectations
...
Yes, that is a nice notation. Better in another PR, this one is already long enough. Maybe
On Wikipedia, the expected value of a multivariate random variable is the vector of the expected values of the components:
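For reference, the standard componentwise definition for a random vector can be written as follows. The notation $X = (X_1, \dots, X_n)^{\mathsf T}$ is assumed here and is not taken from the thread.

```latex
\operatorname{E}[X]
  = \operatorname{E}\begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}
  = \begin{pmatrix} \operatorname{E}[X_1] \\ \vdots \\ \operatorname{E}[X_n] \end{pmatrix}
```

Under this definition the expectation of a vector of RVs is a vector, not the product of the component expectations.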
I thought a bit over it... we are creating too many classes this way. We already have the |
I mean, this looks like it's already working:
In [4]: N = Normal("N", 0, 1)
In [5]: P = Poisson("P", 3)
In [6]: m = Matrix([N, P])
In [7]: m
Out[7]:
Matrix([
[N],
[P]])
In [8]: E(m)
Out[8]:
Matrix([
[0],
[3]])
This is the correct behaviour of the expectation if
OTOH, there is a bug:
In [10]: a = Array([N, P])
In [11]: a
Out[11]: [N, P]
In [12]: E(a)
---------------------------------------------------------------------------
AttributeError: 'ImmutableDenseNDimArray' object has no attribute 'expand'
In order to create a mixture distribution this way, one should:
In [17]: B = Bernoulli("B", 0.3)
In [18]: m[B, 1]
---------------------------------------------------------------------------
ValueError: index out of boundary
This is likely a bug in the class
In case of multivariate RVs, create a
@Smit-create maybe with this procedure we can avoid creating new classes. What do you think?
Strangely enough, the same bug does not appear if we pass the Bernoulli RV as index to an array:
In [22]: a[B]
Out[22]: [N, P][B]
Expectation apparently works with
In [23]: E(a[B])
Out[23]: 0.900000000000000
In [24]: 7/10*E(N) + 3/10*E(P)
Out[24]: 0.900000000000000
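The check in In [24] is the law of total expectation for a two-component mixture. A minimal sketch with exact rationals, reusing the RVs from the session above (the names `p` and `mixture_mean` are illustrative):

```python
from sympy import Rational
from sympy.stats import Bernoulli, E, Normal, Poisson

N = Normal("N", 0, 1)
P = Poisson("P", 3)
B = Bernoulli("B", Rational(3, 10))

# Law of total expectation for a two-component mixture:
# E[mixture] = P(B = 0) * E[N] + P(B = 1) * E[P]
p = E(B)  # for a Bernoulli RV, E(B) equals P(B = 1)
mixture_mean = (1 - p) * E(N) + p * E(P)
print(mixture_mean)
```

Using Rational(3, 10) instead of the float 0.3 keeps the result exact, which makes it easier to compare against E(a[B]).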
Yes, I understood the procedure with the Array, but I think density and E are working fine but not cdf:
>>> N = Normal("N", 0, 1)
>>> M = Normal('M', 1, 2)
>>> Z = Laplace('L', 3, 1)
>>> wt = Binomial('B', 2, S(2)/10)
>>> E(a[wt])
11/25
>>> density(a[wt])(x)
2*sqrt(2)*exp(-1/8)*exp(x/4)*exp(-x**2/8)/(25*sqrt(pi)) + exp(-Abs(x - 3))/50 + 8*sqrt(2)*exp(-x**2/2)/(25*sqrt(pi))
>>> cdf(a[wt])(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/smit/Smitlunagariya/sympy/sympy/stats/rv.py", line 950, in cdf
result = pspace(expr).compute_cdf(expr, **kwargs)
File "/home/smit/Smitlunagariya/sympy/sympy/stats/rv.py", line 489, in compute_cdf
raise ValueError("CDF not well defined on multivariate expressions")
ValueError: CDF not well defined on multivariate expressions
Also, this works for random variables, but not when we pass distribution instances. I think we couldn't allow custom weights to be passed directly, as we would need to create a custom RV for indexing.
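Until cdf supports the indexed form, the mixture CDF can be assembled by hand as the weighted sum of the component CDFs. This is a sketch of a workaround, not the PR's implementation; it assumes the components are ordered [N, M, Z], which is consistent with E(a[wt]) = 11/25 above.

```python
from sympy import S, Symbol
from sympy.stats import Binomial, Laplace, Normal, cdf, density

x = Symbol("x", real=True)
N = Normal("N", 0, 1)
M = Normal("M", 1, 2)
Z = Laplace("L", 3, 1)
wt = Binomial("B", 2, S(2) / 10)

# P(B = k) for k = 0, 1, 2 is the weight of the k-th component.
weights = density(wt).dict
components = [N, M, Z]  # assumed ordering, see the note above

# Mixture CDF: weighted sum of the component CDFs.
mixture_cdf = sum(weights[k] * cdf(c)(x) for k, c in enumerate(components))

print(float(mixture_cdf.subs(x, 0)))
```

The same weights dictionary could feed a hand-built density or expectation, which is one way to compare against the a[wt] results.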
Maybe |
It should be. But this will work if we have Rvs as an argument and not when Distributions as |
It seems like there are a bunch of approaches presented here but a consensus isn't reached. This should be there in ideas list of next |
References to other Issues or PRs
Discussions in #18730
Brief description of what is fixed or changed
Other comments
Release Notes
ping @czgdp1807 @Upabjojr @jmig5776