Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addition of Dirichlet, Ewens Distributions #16576

Merged
merged 14 commits into from May 15, 2019

Conversation

Projects
None yet
9 participants
@czgdp1807
Copy link
Member

commented Apr 6, 2019

References

[1] GSoC Proposal by @czgdp1807
[2] GSoC Proposal by @ritesh99rakesh

Brief description of what is fixed or changed

MultivariateBetaDistribution has been added to sympy.stats.joint_rv_types. More distributions will be added after reviewing the previous one added.

Other comments

I want to use Beta in sympy.stats.crv_types for making random symbols following MultivariateBetaDistribution as is Normal being used for MultivariateNormalDistribution. However, the reason restricting me for using Beta is the difference in the way parameters are passed to the Beta function and MultivariateBeta function. The former one accepts alpha, beta as arguments and the latter one accepts, a single vector or pythonically, list, alpha. One solution in my mind is to use, Beta(name, *alpha), for both univariate and multivariate Beta distributions. The length of and type of *alpha can be used as the deciding factor for returning either return multivariate_rv or rv.

Edit - As said in this comment below, I think using Beta for MultivariateBeta or Dirichlet distribution is not a good idea because, most people use Dirichlet to denote MultivariateBetaDistribution, so mixing up the two distributions is not valid. However, both, MultivariateBeta and Dirichlet can be used to denote MultivariateBetaDistributions .

Release Notes

  • stats
    • multivariate distributions added to sympy.stats.joint_rv_types

czgdp1807 added some commits Apr 6, 2019

Added MultivariateBetaDistribution
MultivariateBetaDistribution has been added
to sympy.stats.joint_rv_types. Tests and
Documentation will be added in the next commit.
@sympy-bot

This comment has been minimized.

Copy link

commented Apr 6, 2019

Hi, I am the SymPy bot (v147). I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

  • stats
    • multivariate distributions added to sympy.stats.joint_rv_types (#16576 by @czgdp1807)

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.5.

Note: This comment will be updated with the latest check if you edit the pull request. You need to reload the page to see it.

Click here to see the pull request description that was parsed.

<!-- Your title above should be a short description of what
was changed. Do not include the issue number in the title. -->

#### References
<!-- If this pull request fixes an issue, write "Fixes #NNNN" in that exact
format, e.g. "Fixes #1234". See
https://github.com/blog/1506-closing-issues-via-pull-requests . Please also
write a comment on that issue linking back to this pull request once it is
open. -->
[1] [GSoC Proposal](https://docs.google.com/document/d/1oIeaROiJyglpbris7X1uZPRE5ZeO0pD1ygCFhBIeATI/edit#heading=h.wb8kk2gus2ng) by @czgdp1807 
[2] [GSoC Proposal](https://github.com/ritesh99rakesh/gsocProposal/blob/master/SymPy.md) by @ritesh99rakesh 

#### Brief description of what is fixed or changed
MultivariateBetaDistribution has been added to sympy.stats.joint_rv_types. More distributions will be added after reviewing the previous one added. 

#### Other comments
I want to use `Beta` in `sympy.stats.crv_types` for making random symbols following `MultivariateBetaDistribution` as is `Normal` being used for `MultivariateNormalDistribution`. However, the reason restricting me for using `Beta` is the difference in the way parameters are passed to the `Beta` function and `MultivariateBeta` function. The former one accepts `alpha`, `beta` as arguments and the latter one accepts, a single vector or pythonically, list, `alpha`. One solution in my mind is to use, `Beta(name, *alpha)`, for both univariate and multivariate Beta distributions. The length of and type of `*alpha` can be used as the deciding factor for returning either return `multivariate_rv` or `rv`. 

Edit - As said in [this comment](https://github.com/sympy/sympy/pull/16576#issuecomment-491218778) below, I think using `Beta` for `MultivariateBeta` or `Dirichlet` distribution is not a good idea because, most people use `Dirichlet` to denote `MultivariateBetaDistribution`, so mixing up the two distributions is not valid. However, both, `MultivariateBeta` and `Dirichlet` can be used to denote `MultivariateBetaDistributions` . 

#### Release Notes

<!-- Write the release notes for this release below. See
https://github.com/sympy/sympy/wiki/Writing-Release-Notes for more information
on how to write release notes. The bot will check your release notes
automatically to see if they are formatted correctly. -->

<!-- BEGIN RELEASE NOTES -->
* stats
  * multivariate distributions added to sympy.stats.joint_rv_types
<!-- END RELEASE NOTES -->

Update

The release notes on the wiki have been updated.

@Sc0rpi0n101

This comment has been minimized.

Copy link
Contributor

commented Apr 7, 2019

The build notes say you have to add a test for the new class in test_args.py

@codecov

This comment has been minimized.

Copy link

commented Apr 7, 2019

Codecov Report

Merging #16576 into master will increase coverage by 0.022%.
The diff coverage is 100%.

@@              Coverage Diff              @@
##            master    #16576       +/-   ##
=============================================
+ Coverage   73.849%   73.871%   +0.022%     
=============================================
  Files          619       619               
  Lines       159669    159708       +39     
  Branches     37476     37482        +6     
=============================================
+ Hits        117915    117979       +64     
+ Misses       36294     36268       -26     
- Partials      5460      5461        +1
@czgdp1807

This comment has been minimized.

Copy link
Member Author

commented Apr 11, 2019

@jksuom @Upabjojr Please review this.
It would be great if we can discuss on the other comments section in the description.

@czgdp1807 czgdp1807 changed the title [WIP] Addition of Multivariable Distributions [WIP][GSoC] Addition of Multivariable Distributions May 8, 2019

@sidhantnagpal
Copy link
Member

left a comment

Seems like a good start. I would have expected MultivariateBeta('B', a_1, a_2, ...) to work, but I would wait to see what someone with more experience in stats thinks about the API.

Show resolved Hide resolved sympy/stats/joint_rv_types.py Outdated
Show resolved Hide resolved sympy/stats/joint_rv_types.py Outdated
Show resolved Hide resolved sympy/stats/joint_rv_types.py Outdated
@czgdp1807

This comment has been minimized.

Copy link
Member Author

commented May 9, 2019

@sidhantnagpal I have made the changes.
About API:
I also agree that MultivariateBeta('B', a_1, a_2, ...) should work. Therefore, I have added the provision of both passing the parameters in the form of list as well as in the format that you suggested. I think @Upabjojr will also agree to such an API. If not, then please let me know which API should I go for?

@czgdp1807

This comment has been minimized.

Copy link
Member Author

commented May 9, 2019

@Abdullahjavednesar Please add the GSoC label on this PR.


def check(self, alpha):
_value_check(len(alpha) >= 2, "At least two categories should be passed.")
_value_check(all([a_k > 0 for a_k in alpha]), "Each concentration parameter"

This comment has been minimized.

Copy link
@Upabjojr

Upabjojr May 10, 2019

Contributor

this will raise an exception if a_k is symbolic. a_k > 0 creates a GreaterThan object which cannot be cast to a boolean.

I usually do (a_k > 0) == True.

This comment has been minimized.

Copy link
@jksuom

jksuom May 10, 2019

Member

I think that _value_check should be able to handle also Relational type conditions. It has if condition == False. But all, on the other hand, cannot do that.

This comment has been minimized.

Copy link
@Upabjojr

Upabjojr May 10, 2019

Contributor

Maybe (a_k > 0) != False is the correct condition. You would probably allow generic symbols.

This comment has been minimized.

Copy link
@Upabjojr

Upabjojr May 10, 2019

Contributor
for a_k in alpha:
    _value_check(a_k > 0, "...")

This comment has been minimized.

Copy link
@czgdp1807

czgdp1807 May 10, 2019

Author Member

@jksuom @Upabjojr I think in _value_check the error message should be raised when condition == None apart from condition == False. As in the following session, theta.is_positive is None but it will get through checks which require theta to be positive and not negative.

Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sympy import *
>>> theta = symbols('theta', real = True)
>>> theta.is_positive
>>> theta.is_negative
>>> theta.is_real
True

PS - I haven't tried the change but may be it can require change at lots of places but I believe that efforts required hold little value when concepts need to be corrected. Though that is just my point of view.

This comment has been minimized.

Copy link
@smichr

smichr May 14, 2019

Member

If _value_check is made to return True if it doesn't raise then this can be written if all(_value_check(a_k < 0, msg) for a_k in alpha):

Show resolved Hide resolved sympy/stats/joint_rv_types.py Outdated
Show resolved Hide resolved sympy/stats/joint_rv_types.py Outdated
Show resolved Hide resolved sympy/stats/tests/test_joint_rv.py Outdated
@Upabjojr

This comment has been minimized.

Copy link
Contributor

commented May 10, 2019

This distribution is usually referred to as Dirichlet distribution. Is there a reason why you prefer multivariate beta?

Could you please add some references in the documentations (e.g. a link to Wikipedia or MathWorld)?

@czgdp1807

This comment has been minimized.

Copy link
Member Author

commented May 10, 2019

@Upabjojr Thank you for the reviews. From yours and Sidhant's reviews, I conclude that the current API being used is fine.
This line allows both Dirichlet and MultivariateBeta. Wikipedia of Dirichlet Distribution suggests that alternative name being used is MultivariateBeta, so I thought of using both. I will add more references in the documentation.

Added Ewens Distribution
Multivariate Ewens Distribution has been added
to sympy.stats.joint_rv_types.Few changes to limit
format have also been made in marginal_distribution
of discrete joint distribution.
@czgdp1807

This comment has been minimized.

Copy link
Member Author

commented May 10, 2019

@Upabjojr Please review the changes I have made in the latest commit.
Brief Summary of Changes:

  1. Added Multivariate Ewens Distribution. Few unused imports are there, I will remove them when I will add the next multivariate distribution.
  2. I have changed the format of limits being passed to summation while computing marginal_distributions of discrete joint distributions.

@czgdp1807 czgdp1807 changed the title [WIP][GSoC] Addition of Multivariable Distributions [GSoC] Addition of Dirichlet, Ewens Distributions May 11, 2019

@czgdp1807 czgdp1807 force-pushed the czgdp1807:multivar_dists branch from cd9cad7 to 4f7b879 May 11, 2019

czgdp1807 added some commits May 11, 2019

@czgdp1807

This comment has been minimized.

Copy link
Member Author

commented May 11, 2019

@sidhantnagpal Changes have been made according to the reviews. If there is anything else to change then please comment.
Rest of the distributions will be added in two more PRs.

Show resolved Hide resolved sympy/stats/tests/test_joint_rv.py Outdated
Show resolved Hide resolved sympy/stats/tests/test_joint_rv.py Outdated
Show resolved Hide resolved sympy/stats/joint_rv_types.py Outdated
def test_MultivariateEwens():
from sympy.stats.joint_rv_types import MultivariateEwens
n, theta = symbols('n theta', positive = True)
theta_f = symbols('t_f', negative = True)

This comment has been minimized.

Copy link
@sidhantnagpal

sidhantnagpal May 11, 2019

Member

negative=True and similar.

This comment has been minimized.

Copy link
@sidhantnagpal

sidhantnagpal May 11, 2019

Member

Test(s) for symbolic n and theta with valid assumptions should also be included.

This comment has been minimized.

Copy link
@czgdp1807

czgdp1807 May 12, 2019

Author Member

@sidhantnagpal Tests can be added for symbolic theta but for n it is not possible. The reason for my claim is based on the fact that we can determine n from the the length of syms. So, one way or the other n is an Integer and not a Symbol and I thought of accepting a fixed n from the user rather than figuring out what n is from the length of syms.

@czgdp1807

This comment has been minimized.

Copy link
Member Author

commented May 12, 2019

@sylee957 Please add the GSoC label here too. Thanks.

@sylee957 sylee957 added the GSoC label May 12, 2019

@czgdp1807 czgdp1807 changed the title [GSoC] Addition of Dirichlet, Ewens Distributions Addition of Dirichlet, Ewens Distributions May 12, 2019

@Upabjojr

This comment has been minimized.

Copy link
Contributor

commented May 12, 2019

Master needs to be merged.

czgdp1807 added some commits May 13, 2019

@czgdp1807

This comment has been minimized.

Copy link
Member Author

commented May 14, 2019

@sidhantnagpal @Upabjojr Are there any more changes to make apart from the un-evaluated lists? If not, then I think for the time being we can go for only, Integer dimensions and once something like ArrayComprehension is developed we can make separate PR for improving on that part.

@Upabjojr Upabjojr merged commit 496e75c into sympy:master May 15, 2019

3 checks passed

codecov/project 73.871% (target 0%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
sympy-bot/release-notes The release notes look OK
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.