ENH: override sf for rdist distribution #18586

OmarManzoor · 2023-05-30T08:53:40Z

Reference issue

What does this implement/fix?

Override and add the sf method for rdist distribution
Add some tests

Additional information

CC: @mdhaber Could you kindly have a look to see if this change makes sense? If it looks feasible then we can add the reference distribution to set up the tests.

OmarManzoor · 2023-05-30T08:55:37Z

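import numpy as np
import matplotlib.pyplot as plt
from mpmath import mp
from scipy import stats

mp.dps = 200  # working precision (decimal digits) for the mpmath reference

def rdist_sf_mpmath(x, c):
    # Reference SF: one minus the regularized incomplete beta integral from 0 to (x+1)/2
    x = mp.mpf(x)
    c = mp.mpf(c)
    return float(mp.one - mp.betainc(c/2, c/2, 0, (x+1)/2, regularized=True))

def rdist_sf(x, c):
    # Proposed override: SF of beta(c/2, c/2) evaluated at (x+1)/2
    return stats.beta._sf((x+1)/2, c/2, c/2)

c = 541.0
x = np.logspace(-5, 10)  # note: values above 1 lie outside the rdist support [-1, 1]
# Compare the current default (main) with the proposed implementation (pr)
plt.loglog(x, stats.rdist.sf(x, c), label="rdist sf main", ls="dashed")
plt.loglog(x, rdist_sf(x, c), label="rdist sf pr", ls="dashdot")
plt.legend()
plt.show()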

mdhaber · 2023-05-30T14:51:40Z

It makes sense. Go ahead and revise the tests, and we can probably merge. (Would you also show the plot for arguments 1-x where x = np.logspace(-16, -0.5), and show the curve for the mpmath implementation, too? That would show the important region more clearly.)

Alternatively, we could omit the custom test here. The generic tests confirm that this override is consistent with the rest of the distribution, and this is a very straightforward implementation of _sf. If we had an existing implementation with numerical difficulties that we were upgrading, I would see a need for a custom test to check that the problem had been resolved. But this is not fixing a problem specific to the distribution; it's the same cancellation error experienced by all 1 - cdf implementations, so it could be argued that no specific test is needed. (Instead, we should have some sort of generic test for all _sf overrides, which need not be added here.)
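The cancellation described here can be reproduced without SciPy in one special case. As an illustrative assumption (not taken from the PR), take c = 4, where the rdist density reduces to 3/4 * (1 - x**2) on [-1, 1], so both the CDF and the SF have closed forms. The function names below are hypothetical.

```python
def rdist4_cdf(x):
    """Closed-form CDF of rdist with c = 4 (density 3/4 * (1 - x**2))."""
    return 0.5 + 0.75 * (x - x**3 / 3.0)


def rdist4_sf_naive(x):
    """Generic fallback: subtract the CDF from one."""
    return 1.0 - rdist4_cdf(x)


def rdist4_sf_direct(eps):
    """SF at x = 1 - eps, from the factored form (1 - x)**2 * (2 + x) / 4."""
    return eps**2 * (3.0 - eps) / 4.0


eps = 1e-9  # distance of x from the upper end of the support
naive = rdist4_sf_naive(1.0 - eps)
direct = rdist4_sf_direct(eps)
print(f"naive 1 - cdf: {naive:.3e}")
print(f"direct sf:     {direct:.3e}")
```

The direct value is about 7.5e-19, while the naive result can only be zero or a multiple of roughly 1.1e-16, so none of its digits are meaningful this far into the tail.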

OmarManzoor · 2023-05-31T08:19:26Z

@mdhaber

Here is the plot for the requested arguments:

c = 541.0
x = np.logspace(-15, -0.5)
x = 1 - x  # evaluate just below the upper end of the support
# High-precision reference values, converted back to float64 for plotting
mpmath_values = np.array([rdist_sf_mpmath(_x, c) for _x in x], np.float64)
plt.loglog(x, stats.rdist.sf(x, c), label="rdist sf main", ls="dashed")
plt.loglog(x, rdist_sf(x, c), label="rdist sf pr", ls="dashdot")
plt.loglog(x, mpmath_values, label="rdist mpmath", ls="dotted")
plt.legend()
plt.show()

mdhaber · 2023-05-31T19:09:41Z

This is what I was going for.

import numpy as np
import matplotlib.pyplot as plt
from mpmath import mp
from scipy import stats

mp.dps = 1000  # working precision (decimal digits) for the mpmath reference

def rdist_sf_mpmath(x, c):
    # Reference SF: one minus the regularized incomplete beta integral from 0 to (x+1)/2
    x = mp.mpf(x)
    c = mp.mpf(c)
    return float(mp.one - mp.betainc(c/2, c/2, 0, (x+1)/2, regularized=True))

def rdist_sf(x, c):
    # Proposed override: SF of beta(c/2, c/2) evaluated at (x+1)/2
    return stats.beta._sf((x+1)/2, c/2, c/2)

c = 10
x = np.logspace(-15, -0.5)  # distance from the upper end of the support
# Evaluate at 1 - x but plot against x, so the tail is spread out on the log axis
mpmath_values = np.array([rdist_sf_mpmath(1-_x, c) for _x in x], dtype=np.float64)
plt.loglog(x, stats.rdist.sf(1-x, c), label="rdist sf main", ls="dashed")
plt.loglog(x, rdist_sf(1-x, c), label="rdist sf pr", ls="dashdot")
plt.loglog(x, mpmath_values, label="rdist mpmath", ls="dotted")
plt.legend()
plt.show()

This shows that for moderate values of c, there is a meaningful improvement because the default implementation, sf = 1 - cdf, turns values of the survival function less than ~1e-16 into zero, whereas the implementation that computes the SF directly does not.
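The ~1e-16 threshold in this explanation follows from the spacing of double-precision numbers just below 1. A minimal sketch using only the standard library (`math.nextafter` needs Python 3.9 or later); the variable names and the sample value 1e-20 are illustrative, not taken from the PR:

```python
import math

# Largest double strictly below 1.0; a CDF cannot return anything closer to 1.
largest_below_one = math.nextafter(1.0, 0.0)

# Smallest nonzero value that 1 - cdf can produce in the upper tail.
smallest_nonzero_sf = 1.0 - largest_below_one  # 2**-53, about 1.11e-16

# A true tail probability below half of that gap is lost: the CDF rounds to 1.0.
true_sf = 1e-20
cdf_rounded = 1.0 - true_sf      # evaluates to exactly 1.0
sf_from_cdf = 1.0 - cdf_rounded  # 0.0, although the true value is 1e-20

print(smallest_nonzero_sf, sf_from_cdf)
```

An implementation that evaluates the tail integral itself never forms a number close to 1, so it is limited only by the much smaller underflow threshold of the floating-point format.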

OmarManzoor · 2023-06-01T05:52:50Z

Nice! I understand, thank you for the explanation. Should I remove the tests that I added for this?

mdhaber · 2023-06-01T06:28:59Z

I went ahead and checked the tests as they are. I think the first two would pass in main, so they aren't really needed, but having the second two can't hurt, even if they are mostly a test of the accuracy of beta.sf. I refactored

float(mp.one - mp.betainc(c/2, c/2, 0, (x+1)/2, regularized=True))

to

float(mp.betainc(c/2, c/2, (x+1)/2, mp.one, regularized=True))

because we might as well calculate the SF directly if we can.
In the future, please use ReferenceDistribution, but I'll go ahead and merge as-is now.
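Why the refactored expression computes the SF directly can be written out as follows. This assumes the relationship the `rdist_sf` helper in this thread relies on, namely that (X+1)/2 follows a beta distribution with both shape parameters equal to c/2 when X follows rdist with shape c. Here I_z(a, b) denotes the regularized incomplete beta function and B(a, b) the beta function.

```latex
\begin{align*}
\mathrm{SF}(x; c)
  &= 1 - I_{(x+1)/2}\!\left(\tfrac{c}{2}, \tfrac{c}{2}\right) \\
  &= \frac{1}{B\!\left(\tfrac{c}{2}, \tfrac{c}{2}\right)}
     \int_{(x+1)/2}^{1} t^{c/2-1} (1-t)^{c/2-1} \, dt \\
  &= I_{(1-x)/2}\!\left(\tfrac{c}{2}, \tfrac{c}{2}\right)
\end{align*}
```

The first line is the 1 - CDF form. The second is the integral that `mp.betainc` evaluates when given the lower limit (x+1)/2 and the upper limit 1. The third follows from the symmetry I_z(a, b) = 1 - I_{1-z}(b, a) with a = b.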

ENH: override sf for rdist distribution

7dae97d

github-actions bot added the scipy.stats label May 30, 2023

j-bowhay added the enhancement label May 30, 2023

mdhaber reviewed Jun 1, 2023

Update scipy/stats/tests/test_distributions.py

995a953

[skip ci]

mdhaber merged commit b7c2cb5 into scipy:main Jun 1, 2023
1 check passed

OmarManzoor deleted the rdist_sf branch June 1, 2023 06:43

j-bowhay added this to the 1.12.0 milestone Jun 1, 2023
