BUG: entropy calculations failing for large values #18093

Closed · 9 of 12 tasks · Tracked by #18435
dschmitz89 opened this issue Mar 4, 2023 · 23 comments

dschmitz89 (Contributor) commented Mar 4, 2023

Describe your issue.

For a few distributions, the entropy calculation currently fails for large values. As an example, consider the gamma distribution:
for $x > 10^{12}$, the computed entropy starts to oscillate, and beyond approximately $2\cdot 10^{16}$ it returns just 0.

[plot: gamma distribution entropy for large shape parameters, SciPy main vs. the mpmath reference]

Affected distributions and the current PRs that aim to fix them are tracked in this issue's task list.

The reason for the failures is catastrophic cancellation occurring in the subtraction of the logarithmic beta/gamma functions (betaln/gammaln) on one side and the digamma function (psi) or simply log on the other.

This can be avoided by using asymptotic expansions instead of calling the functions from scipy.special. See #17930 for examples.

The expansions for $\psi$ and $\ln(\Gamma)$ used in PRs such as #17929 are:

$$ \begin{align} \psi(x) & \sim\ln(x) - \frac{1}{2x} - \frac{1}{12x^2}+\frac{1}{120x^4}-\frac{1}{252x^6} \\ \ln(\Gamma(x)) &\sim x\ln(x)-x-\frac{1}{2}\ln(x)+\frac{1}{2}\ln(2\pi)+\frac{1}{12x}-\frac{1}{360x^3}+\frac{1}{1260x^5} \end{align} $$
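To illustrate how these expansions are used (a minimal sketch with made-up helper names, not the code from the PRs): substituting them into the gamma entropy $h(x)=\psi(x)(1-x)+x+\ln(\Gamma(x))$ and collecting terms gives $h(x)\sim\frac{1}{2}\ln(x)+\frac{1}{2}(1+\ln(2\pi))-\frac{1}{3x}$, which stays accurate exactly where the direct formula cancels catastrophically:

import numpy as np
from scipy import special as sc

def gamma_entropy_direct(x):
    # direct formula; the large terms cancel catastrophically for big x
    return sc.psi(x) * (1 - x) + x + sc.gammaln(x)

def gamma_entropy_asymptotic(x):
    # leading terms after substituting the expansions above
    return 0.5 * np.log(x) + 0.5 * (1 + np.log(2 * np.pi)) - 1 / (3 * x)

x = 1e18
print(gamma_entropy_direct(x))      # dominated by rounding error
print(gamma_entropy_asymptotic(x))  # agrees with the mpmath reference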

For the beta function, expansions are listed on Wikipedia:

If $x$ and $y$ are large:

$$ \begin{align} B(x, y) & \sim\sqrt{2\pi}\frac{x^{x-1/2}y^{y-1/2}}{(x+y)^{x+y-1/2}} \\ \ln(B(x, y)) & \sim \ln(\sqrt{2\pi})+(x-1/2)\ln(x)+(y-1/2)\ln(y)-(x+y-1/2)\ln(x+y) \end{align} $$

If one argument is large (here y, same holds for large x as betaln is symmetric):

$$ \begin{align} B(x, y) & \sim\Gamma(x)y^{-x} \\ \ln(B(x, y)) & \sim \ln(\Gamma(x)) - x\ln(y) \\ \end{align} $$
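A quick numerical sanity check of these two regimes (illustrative snippet with made-up function names), comparing against scipy.special.betaln at argument sizes where betaln itself is still accurate:

import numpy as np
from scipy import special as sc

def betaln_both_large(x, y):
    # ln(B(x, y)) when both arguments are large
    return (0.5 * np.log(2 * np.pi) + (x - 0.5) * np.log(x)
            + (y - 0.5) * np.log(y) - (x + y - 0.5) * np.log(x + y))

def betaln_one_large(x, y):
    # ln(B(x, y)) when y is large and x is moderate
    return sc.gammaln(x) - x * np.log(y)

print(sc.betaln(1e3, 1e3), betaln_both_large(1e3, 1e3))  # agree to several digits
print(sc.betaln(2.5, 1e6), betaln_one_large(2.5, 1e6))   # likewise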

This issue can keep track of the necessary fixes.

Reproducing Code Example

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from mpmath import mp

mp.dps = 50

@np.vectorize
def gamma_entropy_reference(x):
    x = mp.mpf(x)
    return mp.digamma(x) * (mp.one - x) + x + mp.loggamma(x)

x = np.logspace(5, 20, 1000)
reference_entropy = np.array(gamma_entropy_reference(x), np.float64)
plt.semilogx(x, stats.gamma.entropy(x), label="scipy main")
plt.semilogx(x, reference_entropy, label="mpmath")
plt.legend(loc='upper left')
plt.xlabel("$x$")
plt.ylabel("$h$")
plt.ylim(0, 40)
plt.show()

Error message

See plot above

SciPy/NumPy/Python version and system information

current main branch
dschmitz89 added the defect, scipy.stats, and task labels on Mar 4, 2023
MatteoRaso (Contributor) commented

> If one argument is large (here y, same holds for large x as betaln is symmetric):

Just a heads up that this formula is wrong. You applied the expansion of gammaln for large x, even though x is defined to not be large.

dschmitz89 (Contributor, Author) commented Mar 11, 2023

Thanks for catching that, corrected it!

mdhaber (Contributor) commented Mar 14, 2023

BTW, technically these implementations don't really need to use _lazywhere. It looks like if...else would work fine, because the distribution infrastructure (apparently) applies np.vectorize to _entropy before it is called, so _entropy only ever gets passed scalar arguments. But go ahead and keep using _lazywhere for now; the new infrastructure will take advantage of the native vectorization. (No need for a PR or an enhancement request. I'll track this in gh-15928, which I will address this summer.)
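As a small illustration of why scalar branching is enough under np.vectorize (a toy sketch, not SciPy's actual infrastructure or thresholds):

import numpy as np

def scalar_entropy(c):
    # plain if/else is fine: np.vectorize calls this element by element
    if c > 1e10:
        return 1.0  # placeholder for the asymptotic branch
    return 0.0      # placeholder for the direct branch

print(np.vectorize(scalar_entropy)(np.array([1.0, 1e15])))  # -> [0. 1.]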

When this issue is ready to be closed, it would be good to take one more pass to clean up a few things:

  • PEP8 issues found after merge
  • Inconsistent use of _lazywhere. For instance, the t entropy has an if df == np.inf: check at the top, which makes it obvious that either the if statement will raise an error or _lazywhere is unnecessary. I'd suggest just removing the if statement, since the asymptotic expansion makes it unnecessary.
  • Ensure that overflow warnings are avoided (e.g. raise to negative powers rather than dividing by positive powers)
  • Ensure that ValueError: Integers to negative integer powers are not allowed. is avoided (see the sketch at the end of this comment)
  • Take a look for obvious simplifications. For instance, I suggested np.log(2) + np.log(np.pi) in at least one PR because that's how it came out of mpmath, but np.log(2*np.pi) is simpler, faster, and not in danger of causing any problems.
  • Take another look at the comments. There are some comments that show part of the asymptotic expansion of gammaln and psi, but not enough terms to actually derive the formula. (Additional terms were added after the comment was written, and the comment wasn't updated.) I think it would be better to just link back to this issue or to Wolfram Alpha than to provide incomplete information.

Individually, these are mostly insignificant. But if we're going to take the time to do all this, we might as well do it right.
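To make the two power-related bullets concrete, here is a small demonstration (toy values chosen only for illustration; this is not code from any of the PRs):

import numpy as np

x = np.float64(1e200)
print(1 / (12 * x**2))   # x**2 overflows to inf first -> RuntimeWarning, result 0.0
print(x**-2.0 / 12)      # raising to the negative power underflows quietly to 0.0, no warning

n = np.int64(10)
try:
    n**-2                # NumPy integers raise for negative integer powers
except ValueError as exc:
    print(exc)           # Integers to negative integer powers are not allowed.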

dschmitz89 (Contributor, Author) commented Mar 26, 2023

@MatteoRaso: in case you are also still looking into this. I would like to use this issue to get more familiar with sympy, so if you don't mind, I would like to finish the last two cases :).

Anyone interested is welcome to help fix this issue.

MatteoRaso (Contributor) commented


Yes, that sounds good. I'm a bit busy right now, so I wouldn't be able to handle this for a while anyway.

OmarManzoor (Contributor) commented

Hi @dschmitz89,
Is there any of these that I could work on?

dschmitz89 (Contributor, Author) commented


Yes, definitely! I guess that multivariate_t is the easiest one to start with. In the univariate case, Wolfram Alpha gave a very neat formula directly, without any symbolic math effort on our part, in #18903.

OmarManzoor (Contributor) commented


I see thank you. I'll have a look at that then as a start.

OmarManzoor (Contributor) commented May 12, 2023


@dschmitz89 I tried using Wolfram Alpha and the output it returns is quite long. Could you kindly have a look to see if I have entered this correctly? Wolfram Alpha link
I left out the last term + 0.5 * shape_info.log_pdet in the original formula when creating the equation.

Also, when comparing the results with mpmath, I am not sure how we can handle this shape_info.log_pdet term.

dschmitz89 (Contributor, Author) commented May 12, 2023


I think Wolfram Alpha did not recognize the digamma function in your input. I tried this out; maybe the result works better?

[screenshot: dschmitz89's Wolfram Alpha input and result]

I would not worry about the 0.5 * shape_info.log_pdet term. Just use an identity matrix for the tests, then shape_info.log_pdet=0.

OmarManzoor (Contributor) commented

@dschmitz89 Thank you! This certainly looks much better.

OmarManzoor (Contributor) commented May 17, 2023

Hi @dschmitz89

If we want to handle the beta entropy

def _entropy(self, a, b):
    return (sc.betaln(a, b) - (a - 1) * sc.psi(a) -
            (b - 1) * sc.psi(b) + (a + b - 2) * sc.psi(a + b))

do we need to consider each of the three conditions separately for the expansions, as mentioned in the description, i.e.

  • both a and b are large
  • a is large
  • b is large

dschmitz89 (Contributor, Author) commented May 17, 2023


Edit: I think we only need two cases: one if both arguments are large and one if only one argument is large. The formula is kind of "symmetric", so $a\to\infty$ yields the same formula as $b\to\infty$ if $a$ and $b$ are switched (see the quick check below). CC @OmarManzoor
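A quick numerical check of that symmetry, using the direct formula from the snippet above (illustrative only):

from scipy import special as sc

def beta_entropy(a, b):
    return (sc.betaln(a, b) - (a - 1) * sc.psi(a)
            - (b - 1) * sc.psi(b) + (a + b - 2) * sc.psi(a + b))

# swapping the arguments gives the same value, so one "only b is large" branch
# plus an argument swap covers both single-large-argument cases
print(beta_entropy(2.0, 50.0))
print(beta_entropy(50.0, 2.0))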

OmarManzoor (Contributor) commented Jun 16, 2023


@dschmitz89
For the case where only one of the arguments is large, the expansion for betaln involves the gammaln term, so we need to substitute the expansion of gammaln as well, right? Actually, I tried both plain gammaln and its expansion, and I cannot get as close as expected to mpmath when, for example, a=1.0 and b=1e50.

This is the code when I also add the expansion of gammaln.


import sympy

from sympy import log, pi, sqrt, simplify, expand, Rational

x, y, a, b = sympy.symbols("x, y, a, b")

def digamma(x):
    return log(x) - 1/(2*x) - 1/(12*x**2) + 1/(120*x**4)

def gammaln(x):
    return x*log(x) - x - Rational(1,2)*log(x) + Rational(1,2)*log(2*pi) + 1/(12*x) - 1/(360*x**3) + 1/(1260*x**5)

def betaln(x, y):
    return gammaln(x) - x*log(y)

old = betaln(a, b) - (a - 1)*digamma(a) - (b - 1)*digamma(b) + (a + b - 2)*digamma(a+b)

ref = simplify(simplify(simplify(old)))

ref

[screenshot: simplified sympy expression]

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from mpmath import mp
mp.dps = 500
def beta_entropy_mpmath(a, b):
    a = mp.mpf(a)
    b = mp.mpf(b)
    entropy = mp.log(mp.beta(a, b)) - (a - 1) * mp.digamma(a) - (b - 1) * mp.digamma(b) + (a + b - 2) * mp.digamma(a + b)
    return float(entropy)

def asymptotic_one_large(a, b):
    return (
        - a*np.log(b) + a*np.log(a + b) - a - a*(a + b)**-2./12 + a*(a + b)**-4./120
        - b*np.log(b) + b*np.log(a + b) - b*(a + b)**-2./12 + b*(a + b)**-4./120
        + np.log(a)/2 + np.log(b) - 2*np.log(a + b) + 1/2 + np.log(2*np.pi)/2 + 1/(a + b)
        + (a + b)**-2./6 - (a + b)**-4./60 - 5/(12*b) - b**-2./12
        - b**-3./120 + b**-4./120 - 1/(3*a) - a**-2./12 - a**-3./90
        + a**-4./120 + a**-5./1260
    )

a, b = 1.0, 1e50
print(asymptotic_one_large(a, b))
print(beta_entropy_mpmath(a, b))

>>>-114.12896691014839
>>>-114.12925464970229

dschmitz89 (Contributor, Author) commented

@OmarManzoor
I think the problem is that the asymptotic expansion is very inaccurate for digamma(a) at a=1. It should only be applied for b here; for a, the regular digamma function is correct.

OmarManzoor (Contributor) commented


Thanks.

OmarManzoor (Contributor) commented


I tried this out by leaving the - (a-1)*digamma(a) term out of the expansion:

import sympy

from sympy import log, pi, sqrt, simplify, expand, Rational

x, y, a, b = sympy.symbols("x, y, a, b")

def digamma(x):
    return log(x) - 1/(2*x) - 1/(12*x**2) + 1/(120*x**4)

def gammaln(x):
    return x*log(x) - x - Rational(1,2)*log(x) + Rational(1,2)*log(2*pi) + 1/(12*x) - 1/(360*x**3)
    
def betaln(x, y):
    return gammaln(x) - x*log(y)

#old_1 = betaln(a, b) - (a - 1)*digamma(a) - (b - 1)*digamma(b) + (a + b - 2)*digamma(a+b)
old = betaln(a, b) - (b - 1)*digamma(b) + (a + b - 2)*digamma(a+b)

ref = simplify(simplify(simplify(old)))
ref
[screenshot: simplified sympy expression]

I simplified this expression and added the - (a - 1) * psi(a) at the end

def asymptotic_one_large(a, b):
    sm = a + b
    return (
        a*np.log(a) - a*(np.log(b) + 1) + (np.log(2*np.pi) - np.log(a)) / 2
        + (sm - 2) * (-1/(2*sm) - sm**-2.0/12 + np.log(sm) + sm**-4.0/120)
        - (b - 1) * (np.log(b) - 1/(2*b) - b**-2.0/12 + b**-4.0/120)
        + 1/(12*a) - a**-3.0/360
        - (a - 1) * psi(a)
    )

def beta_entropy_mpmath(a, b):
    a = mp.mpf(a)
    b = mp.mpf(b)
    entropy = mp.log(mp.beta(a, b)) - (a - 1) * mp.digamma(a) - (b - 1) * mp.digamma(b) + (a + b - 2) * mp.digamma(a + b)
    return float(entropy)

a, b = 1.0, 1e50
print(asymptotic_one_large(a, b))
print(beta_entropy_mpmath(a, b))

>>>0.08055555555555555
>>>-114.12925464970229

The overall results seem to be really off. Could you kindly have a look?

dschmitz89 (Contributor, Author) commented Jun 16, 2023


We are confusing the arguments with each other. For large $b$, $\ln(\Gamma(a))$ must not be expanded. Let's look at the original expansion from Wikipedia:

[image: Wikipedia's asymptotic formula for B(x, y) when one argument is large]

That means here we need to do:

$$ \begin{align} \lim_{b\to\infty} h_{\beta}(a, b) & = \lim_{b\to\infty} \left[\ln(B(a, b)) - (a-1)\psi(a) - (b-1)\psi(b) + (a+b-2)\psi(a+b)\right] \\ & = \ln(\Gamma(a)) - a\ln(b) - (a-1)\psi(a) - (b-1)\psi_{exp}(b) + (a+b-2)\psi_{exp}(a+b) \end{align} $$

where $\psi_{exp}$ denotes the asymptotic expansion of $\psi$.

OmarManzoor (Contributor) commented

@dschmitz89

I tried this as well, but it just does not seem to match. Here is the code:

from scipy.special import psi
from scipy.special import gammaln

def asymptotic_one_large(a, b):
    s = a + b
    t1 = gammaln(a) - a*np.log(b) - (a - 1)*psi(a)
    t2 = np.log(b) - b*np.log(b) - 1/(2*b) + 1/(12*b) - b**-2./12
    t3 = a*np.log(s) + b*np.log(s) - 2*np.log(s) - 1/(12*s) + 1/s + s**-2./6
    return t1 + t2 + t3

def beta_entropy_mpmath(a, b):
    a = mp.mpf(a)
    b = mp.mpf(b)
    entropy = mp.log(mp.beta(a, b)) - (a - 1) * mp.digamma(a) - (b - 1) * mp.digamma(b) + (a + b - 2) * mp.digamma(a + b)
    return float(entropy)

a, b = 2.0, 1e20
print(asymptotic_one_large(a, b))
print(beta_entropy_mpmath(a, b))

>>>0.0
>>>-44.47448619497938

By the way, if I define asymptotic_one_large like this, returning the value all at once instead of keeping it in t1, t2, t3 and then adding them, the result is completely different:

def asymptotic_one_large(a, b):
    s = a + b
    return (
        gammaln(a) - a*np.log(b) - (a - 1)*psi(a)
        + np.log(b) - b*np.log(b) - 1/(2*b) + 1/(12*b) - b**-2./12
        + a*np.log(s) + b*np.log(s) - 2*np.log(s) - 1/(12*s) + 1/s + s**-2./6
    )

a, b = 2.0, 1e20
print(asymptotic_one_large(a, b))
print(beta_entropy_mpmath(a, b))

>>>-92.10340371976183
>>>-44.47448619497938

dschmitz89 (Contributor, Author) commented Jun 20, 2023

@OmarManzoor sorry for all this back and forth. I finally found a way around the catastrophic cancellations that occur in the logarithmic terms.

from scipy.special import psi
from scipy.special import gammaln
from mpmath import mp
import numpy as np

mp.dps = 100

def asymptotic_one_large(a, b):
    s = a + b
    simple_terms = gammaln(a) - (a -1)*psi(a) - 1/(2*b) + 1/(12*b) - b**-2./12 - 1/(12*s) + 1/s + s**-2./6
    difficult_terms = s * np.log1p(a/b) + np.log(b) - 2 * np.log(s) #np.log(b) * (1 - s) + (s - 2) * np.log(s)

    return simple_terms + difficult_terms

def beta_entropy_mpmath(a, b):
    a = mp.mpf(a)
    b = mp.mpf(b)
    entropy = mp.log(mp.beta(a, b)) - (a - mp.one) * mp.digamma(a) - (b - mp.one) * mp.digamma(b) + (a + b - 2*mp.one) * mp.digamma(a + b)
    return float(entropy)

a, b = 20, 1e20
print(asymptotic_one_large(a, b))
print(beta_entropy_mpmath(a, b))

>>> -43.151773525282245
>>> -43.15177352528225

The trick is to write $\ln(a+b)-\ln(b)=\ln((b+a)/b)=\ln(1+a/b)=\mathrm{log1p}(a/b)$.
That way, the terms labeled "difficult terms" above can be calculated accurately:

a, b = 20., 1e20
s = a+b
np.log(b) * (1 - s) + (s - 2) * np.log(s), s * np.log1p(a/b) + np.log(b) - 2 * np.log(s)

>>> (0.0, -26.051701859880907)

OmarManzoor (Contributor) commented


Very nice! Thank you for this.

mdhaber (Contributor) commented Jun 28, 2023

I was curious whether anyone could share an example use of the numerical value of the entropy of a distribution. The concept of differential entropy is used all over, of course, but I haven't been able to find examples where the entropy of a named distribution (like one of those named above) is used. Thanks for considering it!

dschmitz89 (Contributor, Author) commented

Closing for some time until more reviewer capacity is available.

dschmitz89 closed this as not planned on Jul 7, 2023
mdhaber added this to the 1.12.0 milestone on Sep 11, 2023