New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: don't override sf for loguniform/reciprocal distribution #18614
Conversation
@mdhaber I think the _sf method is not being picked when we call loguniform.sf. That's why the tests are failing. |
Code from scipy import stats
import numpy as np
import mpmath
from mpmath import mp
import matplotlib.pyplot as plt
rng = np.random.default_rng()
mp.dps = 50
class Reciprocal(ReferenceDistribution):
"""
Also the LogUniform distribution
"""
def __init__(self, *, a, b):
super().__init__(a=a, b=b)
def _pdf(self, x, a, b):
return mp.one / (x * (mp.log(b) - mp.log(a)))
def _cdf(self, x, a, b):
return (mp.log(x) - mp.log(a)) / (mp.log(b) - mp.log(a))
def loguniform_sf(x, a, b):
return (np.log(b) - np.log(x)) / (np.log(b) - np.log(a))
a, b = 1.0, 5.0
x = np.logspace(-3, 2)
log_uniform = Reciprocal(a=a, b=b)
mpmath_values = np.array([log_uniform.sf(_x) for _x in x], np.float64)
plt.loglog(x, stats.loguniform.sf(x, a, b), label="loguniform main", ls="dashed")
plt.loglog(x, loguniform_sf(x, a, b), label="loguniform pr", ls="dotted")
plt.loglog(x, mpmath_values, label="loguniform mpmath", ls="dashdot")
plt.legend()
plt.show() |
I think the plot shows that the survival function in main has a maximum of 1, right? That is the maximum of any survival function. If the other formulations exceed that, it is not an advantage or a disadvantage; I assume the formulation just doesn't inherently return 0 outside the support of the distribution. Please look for improvement near the right end of the support. There is unlikely to be a problem with |
I see. That means that the difference between mpmath, pr with main towards the left is not significant. I think we can close this PR then? |
Not necessarily. Look near the right side of the support. All distributions without custom |
Okay thank you, I will try that. |
I tried various values for a and b using ranges for x but the right side of the curve aligns very well between main and mpmath/pr. Do you have any suggestions for values I could try which might show an improvement? |
It turns out that I don't. The PR doesn't really avoid catastrophic cancellation after all. You need a more accurate way to calculate import numpy as np
from scipy import stats, special
import matplotlib.pyplot as plt
from mpmath import mp
from scipy.stats.tests.test_generation.reference_distributions import ReferenceDistribution
mp.dps = 100
class Reciprocal(ReferenceDistribution):
def __init__(self, *, a, b):
super().__init__(a=a, b=b)
def _pdf(self, x, a, b):
return mp.one / (x * (mp.log(b) - mp.log(a)))
def _sf(self, x, a, b):
res = (mp.log(b) - mp.log(x)) / (mp.log(b) - mp.log(a))
return res
def loguniform_sf(x, a, b): # PR
return (np.log(b) - np.log(x)) / (np.log(b) - np.log(a))
def loguniform_sf2(x, a, b): # "PR2"
def log_delta(a, b):
# Wolfram Alpha "asymptotic expansion log(b)-log(b-x) as x approaches 0"
x = b - a
i = np.arange(1, 20)[:, np.newaxis]
return np.sum(x ** i / (i * b ** i), axis=0)
@np.vectorize
def log_delta2(a, b):
# log_delta with logsumexp trick to prevent overflow for large x
x = b - a
i = np.arange(1, 100)
return np.exp(
special.logsumexp(i * np.log(x) - np.log(i) - i * np.log(b)))
return log_delta2(x, b)/(np.log(b) - np.log(a))
a, b = 1, 5
x = np.logspace(-13, -1)
# a, b = 1e-21, 1e-10
# x = np.logspace(-20, -11)
log_uniform = Reciprocal(a=a, b=b)
mpmath = log_uniform.sf(b-x, dtype=object)
main = stats.loguniform.sf(b-x, a, b)
pr = loguniform_sf(b-x, a, b)
pr2 = loguniform_sf2(b-x, a, b)
err_main = np.abs((main - mpmath) / mpmath).astype(np.float64)
err_pr = np.abs((pr - mpmath) / mpmath).astype(np.float64)
err_pr2 = np.abs((pr2 - mpmath) / mpmath).astype(np.float64)
plt.loglog(x, err_main, label="main", ls="dashdot")
plt.loglog(x, err_pr, label="PR", ls="dotted")
plt.loglog(x, err_pr2, label="PR2")
plt.legend()
plt.show() I'd suggest changing this PR to replace the custom If you're interested in doing more of these, I'd suggest going systematically through the distributions to prioritize those that really need improvement. Also, before implementing the override, please write tests (that don't pass in main) so that the goal of the override is explicit. This will give us some assurance that the improvement will justify the effort that goes into it. Not everything you study will lead to a PR, but the work is still useful if you keep track of it in gh-17832 (or a document linked from there). |
Thank you for the detailed explanation. I understand that we can leave out the addition of sf for this distribution. |
scipy/stats/_continuous_distns.py
Outdated
Details related to the decision of not defining | ||
the surival function for this distribution can be | ||
found in the PR: https://github.com/scipy/scipy/pull/18614 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make this a (code) comment to developers reviewing the code, not users reviewing the documentation. Users would likely take this to mean that the survival function of this distribution is not implemented, which is not what we mean to convey. They don't need to know whether _sf
is overridden; only developers do.
I adjusted the label to |
Reference issue
Towards: gh-17832
What does this implement/fix?
Additional information
cc: @mdhaber