Skip to content

Conversation

@Smit-create
Copy link
Member

@Smit-create Smit-create commented May 17, 2020

Sampling from External Libraries of sympy random variables

References to other Issues or PRs

FIxes #19061
Fixes #16942
FIxes #8478
Fixes #6160
FIxes #7358

Brief description of what is fixed or changed

Initially, as decided in #18754, there is some improvement in the design as previously all the external sampling was handled by a single class. Instead, there will be separate classes for each library to avoid load on a single class. Also, this PR completes the whole sampling of stats from external libraries including CRV, FRV, and DRV.

Other comments

Release Notes

  • stats
    • Added Sampling from external libraries for all the random variables of sympy

ping @czgdp1807 @Upabjojr @jmig5776

@sympy-bot
Copy link

sympy-bot commented May 17, 2020

Hi, I am the SymPy bot (v158). I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

  • stats
    • Added Sampling from external libraries for all the random variables of sympy (#19342 by @Smit-create)

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.7.

Note: This comment will be updated with the latest check if you edit the pull request. You need to reload the page to see it.

Click here to see the pull request description that was parsed.

<!-- Your title above should be a short description of what
was changed. Do not include the issue number in the title. -->
Sampling from External Libraries of sympy random variables
#### References to other Issues or PRs
<!-- If this pull request fixes an issue, write "Fixes #NNNN" in that exact
format, e.g. "Fixes #1234" (see
https://tinyurl.com/auto-closing for more information). Also, please
write a comment on that issue linking back to this pull request once it is
open. -->
FIxes #19061
Fixes #16942
FIxes #8478
Fixes #6160
#### Brief description of what is fixed or changed
Initially, as decided in #18754, there is some improvement in the design as previously all the external sampling was handled by a single class. Instead, there will be separate classes for each library to avoid load on a single class. Also, this PR completes the whole sampling of stats from external libraries including CRV, FRV, and DRV.


#### Other comments


#### Release Notes

<!-- Write the release notes for this release below. See
https://github.com/sympy/sympy/wiki/Writing-Release-Notes for more information
on how to write release notes. The bot will check your release notes
automatically to see if they are formatted correctly. -->

<!-- BEGIN RELEASE NOTES -->
* stats
   * Added Sampling from external libraries for all the random variables of sympy
<!-- END RELEASE NOTES -->
ping @czgdp1807 @Upabjojr @jmig5776 

Update

The release notes on the wiki have been updated.

@codecov
Copy link

codecov bot commented May 17, 2020

Codecov Report

Merging #19342 into master will increase coverage by 0.031%.
The diff coverage is 32.824%.

@@              Coverage Diff              @@
##            master    #19342       +/-   ##
=============================================
+ Coverage   75.611%   75.643%   +0.031%     
=============================================
  Files          652       652               
  Lines       169634    169706       +72     
  Branches     40032     40058       +26     
=============================================
+ Hits        128263    128371      +108     
+ Misses       35758     35731       -27     
+ Partials      5613      5604        -9     

@Smit-create
Copy link
Member Author

This implementation works good for every CRV and FRV, I found only two failing examples in DRV:

  1. NegativeBinomial-> lambdify error
  2. Skellam-> Internal error in scipy..might be because of complex pdf

@czgdp1807
Copy link
Member

czgdp1807 commented May 17, 2020

I think @Upabjojr 's idea of using adaptors can work. Here's what I think,
Interface should accept random expression of sympy and sample it according to the base library selected and return only numpy arrays.
For class design a base class, Sampler can be made. Samplers specific to libraries can inherit from base class. For example, SciPySampler. Sampler can transfer the calls to the methods of it's child class. Here, Sampler will act as kind of adaptor.
Each sampler can have implementations according to the libraries they are using for sampling purposes. This can be easy to extend, maintain and is more decoupled.

@czgdp1807
Copy link
Member

Removing currently supported libraries have a downside that, users may have to modify their code a lot. And may be, one library can be having different algorithms(which can be better). So, removal of libraries narrows down use cases for the end user.

@Smit-create
Copy link
Member Author

Then, the previous implementation looks fine if we think of providing multiple libraries.
We need not change that as the return type of all three of them is numpy.ndarray. So, the previous implementation looks good:

>>> from sympy.stats import *
>>> N = Normal('N',0,1)
>>> resp=next(sample(N, library='pymc3', size=2))
>>> resp, resp.__class__
(array([0.28739264, 1.0677173 ]), <class 'numpy.ndarray'>)
>>> ress=next(sample(N, library='scipy', size=2))
>>> ress, ress.__class__
(array([-0.10553114, -1.55432616]), <class 'numpy.ndarray'>)
>>> resn=next(sample(N, library='numpy', size=2))
>>> resn, resn.__class__
(array([ 0.94539252, -0.68432651]), <class 'numpy.ndarray'>)

Then, the only change I suggest, is to use scipy custom distributions when library is scipy to avoid mechanically adding scipy rvs. Is it good?

@czgdp1807
Copy link
Member

Well, the problem with current design is that everything is stored in a single sampling class and that makes it a bit hard to maintain.

@Smit-create
Copy link
Member Author

So need to break into three classes?

@czgdp1807
Copy link
Member

czgdp1807 commented May 18, 2020

Yes, like a base sampler class which decides which one of it's child class should get the task of sampling.
In addition, can we return numpy.ndarray instead of iterator?

@Smit-create
Copy link
Member Author

In addition, can we return numpy.ndarray instead of iterator?

Yes, Should I?

@Smit-create
Copy link
Member Author

But then this may break the current API and the meaning of sample_iter

@czgdp1807
Copy link
Member

Let's decide on the return type later on. First improve the design.

@Smit-create
Copy link
Member Author

I have made a change for CRV, if this looks fine then will make similar to DRV and FRV

@czgdp1807
Copy link
Member

I think you can apply the same design for discrete and finite variables.

@Smit-create Smit-create changed the title [GSoC] Handle sampling from scipy [GSoC] Sampling from external libraries for all RVs May 20, 2020
@Smit-create
Copy link
Member Author

@czgdp1807 Does this require any further changes?

@czgdp1807
Copy link
Member

LGTM. Any thoughts @Upabjojr @jmig5776 ?

@czgdp1807
Copy link
Member

Will be merged tomorrow. Please express your opinions.

@Upabjojr
Copy link
Contributor

Upabjojr commented May 22, 2020

Just as a minor thought... Maybe separating the code into files by the external library used... But that's just a thought, no need to do it.

Or maybe better so.

@Upabjojr
Copy link
Contributor

I don't see the point in removing the class inheritance and implementing it manually with a dictionary. Too much added complexity for no clear gain.

@jmig5776
Copy link
Member

@czgdp1807 Looks good to me.

@Smit-create
Copy link
Member Author

@Upabjojr Is this good to go now?

@Smit-create Smit-create requested a review from Upabjojr May 25, 2020 10:48
@czgdp1807
Copy link
Member

Can we merge this by tomorrow?

@Upabjojr
Copy link
Contributor

Can we merge this by tomorrow?

If you want, yes.

@czgdp1807
Copy link
Member

I don't know why but a lot of lines are uncovered, https://codecov.io/gh/sympy/sympy/pull/19342/diff
@Smit-create Can you please check?

@Smit-create
Copy link
Member Author

Hi, I have covered all the lines and for all the RVs in the map. The reason here can be because of dependencies on external libraries, the tests might be skipped, so they are not reaching to their respective files. I observed the same with the CRV pr too earlier.

@Smit-create
Copy link
Member Author

Smit-create commented May 27, 2020

Locally running for sympy.stats.crv covers almost all the lines of diff,
Screenshot from 2020-05-27 11-09-59

@czgdp1807
Copy link
Member

Will be merged tomorrow.

@czgdp1807 czgdp1807 merged commit e4e24da into sympy:master May 28, 2020
@czgdp1807
Copy link
Member

Let's start discussion of simulating stochastic process, probably using external libraries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

5 participants