Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handmade distributions have the option not to check for validity #20120

Merged
merged 9 commits into from Oct 5, 2020

Conversation

Maelstrom6
Copy link
Contributor

@Maelstrom6 Maelstrom6 commented Sep 20, 2020

References to other Issues or PRs

Fixes #19959

Brief description of what is fixed or changed

(stats module)

Some handmade distributions raised errors saying that the density function did not integrate to 1. This was simply because the integrate function did not integrate to 1 exactly, but rather an expression that is simplifies to 1. I have given the option for the user to allow SymPy to check the validity.

Other comments

On request, I will add similar functionality for drv_types.py, frv_types.py and possibly some vector and matrix handmade distributions.

I can also add a .simplify() after integrating.

Release Notes

NO ENTRY

@sympy-bot
Copy link

sympy-bot commented Sep 20, 2020

Hi, I am the SymPy bot (v161). I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

  • No release notes entry will be added for this pull request.
Click here to see the pull request description that was parsed.
<!-- Your title above should be a short description of what
was changed. Do not include the issue number in the title. -->

#### References to other Issues or PRs
<!-- If this pull request fixes an issue, write "Fixes #NNNN" in that exact
format, e.g. "Fixes #1234" (see
https://tinyurl.com/auto-closing for more information). Also, please
write a comment on that issue linking back to this pull request once it is
open. -->
Fixes #19959

#### Brief description of what is fixed or changed

(stats module)

Some handmade distributions raised errors saying that the density function did not integrate to 1. This was simply because the `integrate` function did not integrate to 1 exactly, but rather an expression that is simplifies to 1. I have given the option for the user to allow SymPy to check the validity.

#### Other comments

On request, I will add similar functionality for `drv_types.py`, `frv_types.py` and possibly some vector and matrix handmade distributions.

I can also add a `.simplify()` after integrating.

#### Release Notes

<!-- Write the release notes for this release below. See
https://github.com/sympy/sympy/wiki/Writing-Release-Notes for more information
on how to write release notes. The bot will check your release notes
automatically to see if they are formatted correctly. -->

<!-- BEGIN RELEASE NOTES -->
NO ENTRY
<!-- END RELEASE NOTES -->

@codecov
Copy link

codecov bot commented Sep 20, 2020

Codecov Report

Merging #20120 into master will decrease coverage by 0.007%.
The diff coverage is 89.473%.

@@              Coverage Diff              @@
##            master    #20120       +/-   ##
=============================================
- Coverage   75.854%   75.847%   -0.008%     
=============================================
  Files          671       671               
  Lines       173815    173788       -27     
  Branches     41046     41065       +19     
=============================================
- Hits        131847    131814       -33     
+ Misses       36221     36220        -1     
- Partials      5747      5754        +7     

@Smit-create
Copy link
Member

Thanks for the PR!
This looks good to me. I request you to add the same of drv and frv.

@Maelstrom6
Copy link
Contributor Author

joint_rv_types.py has a completely different setup for JointRV. Currently, JointRV does not check for validity of the distribution.

@Maelstrom6 Maelstrom6 closed this Sep 22, 2020
@Maelstrom6 Maelstrom6 reopened this Sep 22, 2020
@Smit-create
Copy link
Member

@czgdp1807 @oscarbenjamin Please have a look if this can be merged.

@@ -74,6 +75,14 @@ def DiscreteRV(symbol, density, set=S.Integers):
set : set
Represents the region where the pdf is valid, by default is real line.

Other Parameters
================
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have a section called "Other parameters".

The expected sections are listed here:
https://docs.sympy.org/latest/documentation-style-guide.html#docstring-sections

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added it under the "Parameters" section in that case.

I found an error in the guide: they state that one needs an extra section for Returns but Returns is not part of the list of the valid sections. Probably not a big deal though.

@oscarbenjamin
Copy link
Contributor

I'm not sure it's necessary to have a check at all but if it is then shouldn't we just fix the check so it doesn't incorrectly return False?

We could use e.g. Eq(expr, 1) is S.false rather than expr == 1.

@Maelstrom6
Copy link
Contributor Author

Maelstrom6 commented Oct 2, 2020

I'm not sure it's necessary to have a check at all but if it is then shouldn't we just fix the check so it doesn't incorrectly return False?

We could use e.g. Eq(expr, 1) is S.false rather than expr == 1.

I'll add that as well. But I fear that for some complicated distributions, Integrate might not be able to be simplified, and users will be stuck with the situation now.

Edit: Never mind. I see that you changed the condition now. But maybe Integrate could still hang. Users could be stuck on the check for ever.

@oscarbenjamin
Copy link
Contributor

It's probably better not to use integrate by default anyway. Checking should be something that a user chooses to do if it isn't guaranteed to be cheap. Or perhaps there could be an API that "normalises" a pdf to produce a distribution.

@Maelstrom6
Copy link
Contributor Author

Checking should be something that a user chooses to do if it isn't guaranteed to be cheap.

This is the goal of this PR. My handmade distributions had this problem.

Or perhaps there could be an API that "normalises" a pdf to produce a distribution.

I think this could be done by the users on their own since its only 1 line of work:
new_pdf = old_pdf/integrate(old_pdf, (x, set))

@oscarbenjamin
Copy link
Contributor

I think this could be done by the users on their own since its only 1 line of work:

Then we could just remove the checking. I'd rather do that than have a "check" argument.

@Maelstrom6
Copy link
Contributor Author

What if the default was not to check? I'm sure some users indirectly rely on the check.

@oscarbenjamin
Copy link
Contributor

How would they rely on the check? Doesn't it just throw an exception (incorrectly some of the time).

@Maelstrom6
Copy link
Contributor Author

How would they rely on the check? Doesn't it just throw an exception (incorrectly some of the time).

One uses of a custom RV is so that a developer can design an API for clients that can input their own distributions. The developer might assume that these checks are done by sympy and has a try-catch ready for incorrect user input. If we change this, their code might break since their functions are running with an invalid distribution.

Even though the checks fail for complicated distributions, they don't fail for most of the usual ones.

@oscarbenjamin
Copy link
Contributor

Is that a hypothetical use case or do you know of anyone doing that?

I don't think we need to worry about checking. It should be up to users to pass in valid input because sympy can not check that in general. We can provide a function for checking or otherwise an API that explicitly states that it will normalise a pdf.

@Maelstrom6
Copy link
Contributor Author

Is that a hypothetical use case or do you know of anyone doing that?

I am the only one I know doing this. This is how I found out about the problem.

I can add that extra normalising functionality as an extra keyword arg in this PR. I can also set the default for the check to false. Should I do these two things?

@oscarbenjamin
Copy link
Contributor

I'll let someone more familiar with stats and how it is used comment. My views here are just in general for sympy:

  1. It's usually better to provide separate functions for things rather than having boolean flags.
  2. Expensive operations like integration should not be enabled by default unless strictly necessary.
  3. We should not use **kwargs when we can use an explicit keyword argument (now that Python 2 isn't supported any more)

@czgdp1807
Copy link
Member

2. Expensive operations like integration should not be enabled by default unless strictly necessary.

Agreed.

sympy/stats/crv_types.py Outdated Show resolved Hide resolved
sympy/stats/frv_types.py Outdated Show resolved Hide resolved
sympy/stats/drv_types.py Outdated Show resolved Hide resolved
@@ -355,7 +355,7 @@ def compute_moment_generating_function(self, **kwargs):
"""
x, t = symbols('x, t', real=True, cls=Dummy)
pdf = self.pdf(x)
mgf = integrate(exp(t * x) * pdf, (x, -oo, oo))
mgf = integrate(exp(t * x) * pdf, (x, self.set))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@czgdp1807
Copy link
Member

LGTM. Will merge tomorrow if no objections.

@czgdp1807 czgdp1807 merged commit a0a7add into sympy:master Oct 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error with handmade Gamma Distribution (Easy Fix)
5 participants