Fixed DiscreteUniform to produce correct density and cdf #18614

Merged
smichr merged 7 commits into sympy:master
Feb 13, 2020

Conversation

Smit-create
Member

@Smit-create Smit-create commented Feb 9, 2020

Fixed DiscreteUniform to produce correct density and cdf

References to other Issues or PRs

Fixes #18611

Brief description of what is fixed or changed

Other comments

Release Notes

  • stats
    • DiscreteUniform raises ValueError for duplicate args

@sympy-bot

sympy-bot commented Feb 9, 2020

Hi, I am the SymPy bot (v149). I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.6.

Note: This comment will be updated with the latest check if you edit the pull request. You need to reload the page to see it.

Update

The release notes on the wiki have been updated.

@Smit-create
Member Author

Smit-create commented Feb 9, 2020

Please review. One test is failing on Travis, but it passes locally:

________________________________________________________________________________
 sympy/tensor/tests/test_tensor_operators.py:test_expand_partial_derivative_constant_factor_rule 
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.6.7/lib/python3.6/site-packages/sympy-1.6.dev0-py3.6.egg/sympy/testing/runtests.py", line 1328, in _timeout
    function()
  File "/home/travis/virtualenv/python3.6.7/lib/python3.6/site-packages/sympy-1.6.dev0-py3.6.egg/sympy/tensor/tests/test_tensor_operators.py", line 203, in test_expand_partial_derivative_constant_factor_rule
    nneg*PartialDerivative(A(i), D(j))
AssertionError

 tests finished: 1326 passed, 1 failed, 77 skipped, 185 expected to fail, 
3 expected to fail but passed, in 1258.65 seconds 
DO *NOT* COMMIT!
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
Exception: Tests failed

@sylee957
Member

I guess it's a random test failing.

@codecov

codecov bot commented Feb 10, 2020

Codecov Report

Merging #18614 into master will increase coverage by 0.063%.
The diff coverage is 100%.

@@              Coverage Diff              @@
##            master    #18614       +/-   ##
=============================================
+ Coverage   75.533%   75.597%   +0.063%     
=============================================
  Files          644       644               
  Lines       167360    167545      +185     
  Branches     39440     39492       +52     
=============================================
+ Hits        126413    126659      +246     
+ Misses       35426     35367       -59     
+ Partials      5521      5519        -2

@sylee957
Member

I wonder about sanitizing the inputs when creating the classes.

@Smit-create
Member Author

sanitizing the inputs

I haven't used it because, according to this comment, DiscreteUniformDistribution was made to handle general things.

@smichr
Member

smichr commented Feb 11, 2020

Should the result for [1,1,1,2,2] be {1:3/5, 2:2/5}? The documentation is ambiguous: does 'input set' mean 'finite collection' or 'finite set'? If it allows duplicates then that suggests the former. And if that is the case and the distribution is "uniform over the collection", then elements that appear more often will have a greater probability of being drawn.

@Smit-create
Member Author

I think DiscreteUniform here was made to be general, in the sense that it works not only with numbers but also with symbols. Wikipedia's definition takes an interval as input, so the current implementation differs somewhat from it. Also, IMO "input set" here means that the probability is distributed uniformly over the unique elements of the input set (a finite set), giving the density {1: 1/2, 2: 1/2}.
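
The two readings of "input set" under discussion can be compared in a short plain-Python sketch. It uses only the standard library rather than SymPy, and both function names are made up for illustration; neither is how sympy.stats is implemented.

```python
from collections import Counter
from fractions import Fraction


def density_over_collection(items):
    """Hypothetical helper: weight each value by how often it appears."""
    counts = Counter(items)
    total = len(items)
    return {value: Fraction(count, total) for value, count in counts.items()}


def density_over_unique(items):
    """Hypothetical helper: equal weight for each distinct value."""
    unique = list(dict.fromkeys(items))  # drop duplicates, keep first-seen order
    return {value: Fraction(1, len(unique)) for value in unique}


sample = [1, 1, 1, 2, 2]
by_collection = density_over_collection(sample)  # 1 -> 3/5, 2 -> 2/5
by_unique = density_over_unique(sample)          # 1 -> 1/2, 2 -> 1/2
print(by_collection)
print(by_unique)
```

Under the first reading, repeated values carry more mass; under the second, duplicates are ignored and every distinct value is equally likely.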

@czgdp1807
Member

Maybe the following can be overridden in DiscreteUniformDistribution.

class SingleFiniteDistribution(Basic, NamedArgsMixin):
    def __new__(cls, *args):
        args = list(map(sympify, args))
        return Basic.__new__(cls, *args)

The above can reduce the amortized cost of computing results.

@smichr
Member

smichr commented Feb 11, 2020

What type of object would you use to give a distribution that has probabilities {1:2/3, 2:1/3}? (I ask because I am not versed in the stats module.)

I would have made this change

diff --git a/sympy/stats/frv_types.py b/sympy/stats/frv_types.py
index d5b3560..61e72ec 100644
--- a/sympy/stats/frv_types.py
+++ b/sympy/stats/frv_types.py
@@ -97,7 +97,10 @@ def p(self):
     @property  # type: ignore
     @cacheit
     def dict(self):
-        return dict((k, self.p) for k in self.set)
+        d = {k: 0 for k in self.set}
+        for k in self.args:
+            d[k] += self.p
+        return d

to give

>>> from sympy.stats import *
>>> Z = DiscreteUniform('Z', [1, 1, 1, 2, 2, 3, 3, 3, 4])
>>> dict(density(Z).items())
{1: 1/3, 2: 2/9, 3: 1/3, 4: 1/9}

@@ -92,7 +92,7 @@ def FiniteRV(name, density):
 class DiscreteUniformDistribution(SingleFiniteDistribution):
     @property
     def p(self):
-        return Rational(1, len(self.args))
+        return Rational(1, len(set(self.args)))
Member

Suggested change
-        return Rational(1, len(set(self.args)))
+        return Rational(1, len(self.set))

Member Author

Okay, I will make that change.
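
To see why the denominator in this hunk matters, here is a plain-Python sketch with standard-library fractions instead of SymPy. It assumes, based on the diffs in this thread, that the density is built over the distinct outcomes (as with self.set) while the old p divided by the raw argument count; the variable names are illustrative only.

```python
from fractions import Fraction

args = [1, 1, 2]          # input with a repeated value
support = set(args)       # distinct outcomes, standing in for self.set

# Denominator taken from the raw argument count: total mass falls short of 1.
p_from_args = Fraction(1, len(args))
density_before = {k: p_from_args for k in support}

# Denominator taken from the distinct outcomes: total mass is exactly 1.
p_from_set = Fraction(1, len(support))
density_after = {k: p_from_set for k in support}

total_before = sum(density_before.values())
total_after = sum(density_after.values())
print(total_before, total_after)
```

With duplicates present, the first variant assigns 1/3 to each of two outcomes, so the probabilities no longer sum to 1.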

@sylee957
Member

I still think that the issues will ultimately remain if values like sin(x)**2 + cos(x)**2 and 1 are supplied simultaneously.

@smichr
Member

smichr commented Feb 12, 2020

issues will ultimately remain

are you commenting on the right issue?

@sylee957
Member

Yes. What should the density be for something like this?
Z = DiscreteUniform('Z', [x, 1, sin(y)**2 + cos(y)**2])
Should it be 1/3 each? Or should it be 1/2?

@Smit-create
Member Author

What type of object would you use to give a distribution that has probabilities {1:2/3, 2:1/3}? (I ask because I am not versed in the stats module.)

I would have made this change

diff --git a/sympy/stats/frv_types.py b/sympy/stats/frv_types.py
index d5b3560..61e72ec 100644
--- a/sympy/stats/frv_types.py
+++ b/sympy/stats/frv_types.py
@@ -97,7 +97,10 @@ def p(self):
     @property  # type: ignore
     @cacheit
     def dict(self):
-        return dict((k, self.p) for k in self.set)
+        d = {k: 0 for k in self.set}
+        for k in self.args:
+            d[k] += self.p
+        return d

to give

>>> from sympy.stats import *
>>> Z = DiscreteUniform('Z', [1, 1, 1, 2, 2, 3, 3, 3, 4])
>>> dict(density(Z).items())
{1: 1/3, 2: 2/9, 3: 1/3, 4: 1/9}

I think it should not work this way because, in DiscreteUniform, the probability does not depend on the frequency of the data; it is distributed uniformly over the input set (a finite set).
To obtain a density of {1: 2/3, 2: 1/3}, you can use:

>>> from sympy.stats import *
>>> from sympy import S
>>> X = FiniteRV('X', {1: S(2)/3, 2: S(1)/3})
>>> dict(density(X).items())
{1: 2/3, 2: 1/3}
>>> cdf(X)
{1: 2/3, 2: 1}
>>> Y = Bernoulli('Y', p=S(2)/3, succ=1, fail=2)
>>> dict(density(Y).items())
{1: 2/3, 2: 1/3}
>>> cdf(Y)
{1: 2/3, 2: 1}
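
The cdf values in this session are running totals of the density. A minimal standard-library sketch of that relationship follows; cdf_from_density is a hypothetical helper written for this illustration, not a SymPy function.

```python
from fractions import Fraction
from itertools import accumulate


def cdf_from_density(density):
    """Hypothetical helper: cumulative probability per outcome, in sorted order."""
    outcomes = sorted(density)
    running = accumulate(density[k] for k in outcomes)
    return dict(zip(outcomes, running))


weights = {1: Fraction(2, 3), 2: Fraction(1, 3)}
cdf_values = cdf_from_density(weights)
print(cdf_values)  # 1 -> 2/3, 2 -> 1
```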

@Smit-create
Member Author

Smit-create commented Feb 12, 2020

How would the density like this be.
Z = DiscreteUniform('Z', [x, 1, sin(y)**2 + cos(y)**2])

On the current branch it produces the density {1: 1/3, x: 1/3, sin(y)**2 + cos(y)**2: 1/3}.
But this can be fixed by overriding __new__, as suggested by @czgdp1807.

diff --git a/sympy/stats/frv_types.py b/sympy/stats/frv_types.py
index a0500228d8..3c112e01dd 100644
--- a/sympy/stats/frv_types.py
+++ b/sympy/stats/frv_types.py
@@ -18,8 +18,8 @@
 
 import random
 
-from sympy import (S, sympify, Rational, binomial, cacheit, Integer,
-                   Dummy, Eq, Intersection, Interval,
+from sympy import (Basic, S, sympify, Rational, binomial, cacheit, Integer,
+                   Dummy, Eq, Intersection, Interval, simplify,
                    Symbol, Lambda, Piecewise, Or, Gt, Lt, Ge, Le, Contains)
 from sympy import beta as beta_fn
 from sympy.external import import_module
@@ -90,9 +90,14 @@ def FiniteRV(name, density):
     return rv(name, FiniteDistributionHandmade, density)
 
 class DiscreteUniformDistribution(SingleFiniteDistribution):
+    def __new__(cls, *args):
+        args = list(map(sympify, args))
+        args = list(set(map(simplify, args)))
+        return Basic.__new__(cls, *args)
+
     @property
     def p(self):
-        return Rational(1, len(set(self.args)))
+        return Rational(1, len(self.set))
 
     @property  # type: ignore
     @cacheit

From this diff, I get the following results:

>>> from sympy.stats import *
>>> from sympy import sin, cos
>>> from sympy.abc import x,y
>>> Z = DiscreteUniform('Z', [x, 1, sin(y)**2 + cos(y)**2])
>>> dict(density(Z).items())
{1: 1/2, x: 1/2}

If this change looks good, should I commit it?

@smichr
Member

smichr commented Feb 12, 2020

If this change looks good, should I commit it?

I would definitely advise against this. A distribution should be over things and those things should be math-agnostic.

Seeing that there is a way to give a distribution for a multiset via FiniteRV then I would raise an error in DiscreteUniform if len(set(args)) != len(args) and advise them to use FiniteRV, something like:

ValueError(filldedent('''
    Repeated args detected but set expected. If you want a
    distribution that has different weightings for each item
    consider using FiniteRV(%s, %s)''' % (symbol, multiset(args))))
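
A rough standard-library sketch of this kind of check is shown below. check_distinct is a hypothetical function, and the message wording is only an approximation of what is proposed here, not the text that was merged; the suggested weights are relative frequencies computed from the repeated args.

```python
from collections import Counter
from fractions import Fraction


def check_distinct(args):
    """Hypothetical check: reject repeated args and suggest explicit weights."""
    if len(set(args)) == len(args):
        return
    total = len(args)
    # Relative frequency of each value, rendered as "2/3"-style strings.
    weights = {k: str(Fraction(n, total)) for k, n in Counter(args).items()}
    suggestion = ", ".join(f"{k}: {w}" for k, w in weights.items())
    raise ValueError(
        "Repeated args detected but set expected. For a distribution with "
        "different weights for each item, consider:\n"
        f"FiniteRV('X', {{{suggestion}}})"
    )


check_distinct([1, 2, 3])  # distinct values pass silently
try:
    check_distinct([1, 1, 2])
except ValueError as err:
    message = str(err)
    print(message)
```

Putting the suggestion on its own line keeps it from being split mid-expression when the surrounding text is wrapped.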

@Smit-create
Member Author

Sure! Then I would change accordingly.

@sylee957
Member

Okay, I’d agree with this direction: raise errors for weird usage.

@smichr
Member

smichr commented Feb 12, 2020

Your suggestion in the error string would not have helped them get a weighted distribution.

@Smit-create
Member Author

Smit-create commented Feb 13, 2020

Thanks @smichr for adding the error message. The build is failing with an unrelated error.

@smichr
Member

smichr commented Feb 13, 2020

The error message still looks bad unless the newline is inserted afterwards. Now it looks like this:

ValueError:
Repeated args detected but set expected. For a distribution having
different weights for each item use the following:
FiniteRV(X, {a: 1/2, b: 1/3, c: 1/6}).

instead of

ValueError:
Repeated args detected but set expected. For a distribution having
different weights for each item use the following: FiniteRV(X, {a: 
1/2, b: 1/3, c: 1/6}).

@smichr smichr merged commit fcefd30 into sympy:master Feb 13, 2020
@Smit-create
Member Author

Thanks @smichr

Successfully merging this pull request may close these issues.

Incorrect density, cdf calculation in DiscreteUniform
6 participants