Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: override sf for rdist distribution #18586

Merged
merged 2 commits into from Jun 1, 2023
Merged

Conversation

OmarManzoor
Copy link
Contributor

Reference issue

Towards gh-17832

What does this implement/fix?

  • Override and add the sf method for rdist distribution
  • Add some tests

Additional information

CC: @mdhaber Could you kindly have a look to see if this change makes sense? If it looks feasible then we can add the reference distribution to set up the tests.

@OmarManzoor
Copy link
Contributor Author

image

from scipy import stats
import numpy as np
from time import perf_counter
import matplotlib.pyplot as plt
rng = np.random.default_rng()
from mpmath import mp
mp.dps = 200

def rdist_sf_mpmath(x, c):
    x = mp.mpf(x)
    c = mp.mpf(c)
    return float(mp.one - mp.betainc(c/2, c/2, 0, (x+1)/2, regularized=True))

def rdist_sf(x, c):
    return stats.beta._sf((x+1)/2, c/2, c/2)

c = 541.0
x = np.logspace(-5, 10)
plt.loglog(x, stats.rdist.sf(x, c), label="rdist sf main", ls="dashed")
plt.loglog(x, rdist_sf(x, c), label="rdist sf pr", ls="dashdot")
plt.legend()
plt.show()

@j-bowhay j-bowhay added the enhancement A new feature or improvement label May 30, 2023
@mdhaber
Copy link
Contributor

mdhaber commented May 30, 2023

It makes sense. Go ahead and revise the tests, and we can probably merge. (Would you also show the plot for arguments 1-x where x = np.logspace(-16, -0.5), and show the curve for the mpmath implementation, too? That would show the important region more clearly.)

Alternatively, we could omit the custom test here. The generic tests confirms that this override is consistent with the rest of the distribution, and this is a very straightforward implementation of _sf. If we had an existing implementation with numerical difficulties that we were upgrading, I would see a need for a custom test to check that the problem had been resolved. But this is not fixing a problem specific to the distribution - it's the same cancellation error experienced by all 1 - cdf implementations - so it could be argued that no specific test is needed. (Instead, we should have some sort of generic test for all _sf overrides, which need not be added here.)

@OmarManzoor
Copy link
Contributor Author

OmarManzoor commented May 31, 2023

@mdhaber

Here is the plot for

c = 541.0
x = np.logspace(-15, -0.5)
x = 1 - x
mpmath_values = np.array([rdist_sf_mpmath(_x, c) for _x in x], np.float64)
plt.loglog(x, stats.rdist.sf(x, c), label="rdist sf main", ls="dashed")
plt.loglog(x, rdist_sf(x, c), label="rdist sf pr", ls="dashdot")
plt.loglog(x, mpmath_values, label="rdist mpmath", ls="dotted")
plt.legend()
plt.show()

image

@mdhaber
Copy link
Contributor

mdhaber commented May 31, 2023

This is what i was going for.

from scipy import stats
import numpy as np
from time import perf_counter
import matplotlib.pyplot as plt
rng = np.random.default_rng()
from mpmath import mp
mp.dps = 1000

def rdist_sf_mpmath(x, c):
    x = mp.mpf(x)
    c = mp.mpf(c)
    return float(mp.one - mp.betainc(c/2, c/2, 0, (x+1)/2, regularized=True))

def rdist_sf(x, c):
    return stats.beta._sf((x+1)/2, c/2, c/2)

c = 10
x = np.logspace(-15, -0.5)
mpmath_values = np.array([rdist_sf_mpmath(1-_x, c) for _x in x], dtype=np.float64)
plt.loglog(x, stats.rdist.sf(1-x, c), label="rdist sf main", ls="dashed")
plt.loglog(x, rdist_sf(1-x, c), label="rdist sf pr", ls="dashdot")
plt.loglog(x, mpmath_values, label="rdist mpmath", ls="dotted")
plt.legend()
plt.show()

image

This shows that for moderate values of c, there is a meaningful improvement because the default implementation, sf = 1 - cdf, turns values of the survival function less than ~1e-16 into zero, whereas the implementation that computes the SF directly does not.

@OmarManzoor
Copy link
Contributor Author

OmarManzoor commented Jun 1, 2023

This is what i was going for.

from scipy import stats
import numpy as np
from time import perf_counter
import matplotlib.pyplot as plt
rng = np.random.default_rng()
from mpmath import mp
mp.dps = 1000

def rdist_sf_mpmath(x, c):
    x = mp.mpf(x)
    c = mp.mpf(c)
    return float(mp.one - mp.betainc(c/2, c/2, 0, (x+1)/2, regularized=True))

def rdist_sf(x, c):
    return stats.beta._sf((x+1)/2, c/2, c/2)

c = 10
x = np.logspace(-15, -0.5)
mpmath_values = np.array([rdist_sf_mpmath(1-_x, c) for _x in x], dtype=np.float64)
plt.loglog(x, stats.rdist.sf(1-x, c), label="rdist sf main", ls="dashed")
plt.loglog(x, rdist_sf(1-x, c), label="rdist sf pr", ls="dashdot")
plt.loglog(x, mpmath_values, label="rdist mpmath", ls="dotted")
plt.legend()
plt.show()

image

This shows that for moderate values of c, there is a meaningful improvement because the default implementation, sf = 1 - cdf, turns values of the survival function less than ~1e-16 into zero, whereas the implementation that computes the SF directly does not.

Nice! I understand, thank you for the explanation. Should I remove the tests that I added for this?

@mdhaber
Copy link
Contributor

mdhaber commented Jun 1, 2023

I went ahead and checked the tests as they are. I think the first two would pass in main, so they aren't really needed, but having the second two can't hurt, even if they are mostly a test of the accuracy of beta.sf. I refactored

float(mp.one - mp.betainc(c/2, c/2, (x+1)/2, mp.one, regularized=True))

to

float(mp.betainc(c/2, c/2, (x+1)/2, mp.one, regularized=True))

because we might as well calculate the SF directly if we can.
In the future, please use ReferenceDistribution, but I'll go ahead and merge as-is now.

@mdhaber mdhaber merged commit b7c2cb5 into scipy:main Jun 1, 2023
1 check passed
@OmarManzoor OmarManzoor deleted the rdist_sf branch June 1, 2023 06:43
@j-bowhay j-bowhay added this to the 1.12.0 milestone Jun 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants