Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: override dweibull distribution survival and inverse survival function #18232

Merged
merged 12 commits into from
Apr 13, 2023

Conversation

dschmitz89
Copy link
Contributor

Reference issue

Part of #17832

What does this implement/fix?

Overrides sf and isf for the double Weibull distribution.

Additional information

For both functions, the available range and precision improved for positive values.

Survival function

image

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x=np.logspace(0, 30, 10000)
c = 1e-1
plt.loglog(x, stats.dweibull.sf(x, c), label="main")
plt.loglog(x, np.where(x > 0, 0.5 * np.exp(-abs(x)**c), 1 - 0.5 * np.exp(-abs(x)**c)), label="PR")
plt.legend()
plt.title(f"Dweibull survival function: $c={c}$")
plt.show()

image

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from mpmath import mp

mp.dps = 100

x=np.logspace(0, 30, 10000)
@np.vectorize
def sf_mpmath(x, c):
    x = mp.mpf(x)
    c = mp.mpf(c)
    if x > 0:
        return mp.mpf(0.5) * mp.exp(-x**c)
    else:
        return mp.one - mp.mpf(0.5) * mp.exp(-(-x)**c)

def sf_pr(x, c):
    return np.where(x > 0,
                    0.5 * np.exp(-abs(x)**c),
                    1 - 0.5 * np.exp(-abs(x)**c))

c = 0.1

x = np.logspace(0, 30, 10000)
main = stats.dweibull.sf(x, c)
pr = sf_pr(x, c)
ref = np.array(sf_mpmath(x, c), np.float64)

plt.loglog(x, np.abs(main - ref)/ref, label="main", alpha=0.5)
plt.loglog(x, np.abs(pr - ref)/ref, label="PR", alpha=0.5)
plt.title(f"Dweibull survival function relative error: $c={c}$")
plt.show()

Inverse survival function

image

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def dweibull_isf(q, c):
    fac = 2. * np.where(q <= 0.5, q, 1. - q)
    fac = np.power(-np.log(fac), 1.0 / c)
    return np.where(q > 0.5, -fac, fac)

q = np.logspace(-20, -10, 1000)
plt.loglog(q, stats.dweibull.isf(q, c), label="main", ls="dashed")
plt.loglog(q, dweibull_isf(q, c), label="PR", ls="dashdot")
plt.legend()
plt.title(f"Dweibull inverse survival function: $c={c}$")
plt.show()

image

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from mpmath import mp

mp.dps = 50

@np.vectorize
def dweibull_isf_mpmath(q, c):
    q = mp.mpf(q)
    c = mp.mpf(c)
    if q <= mp.mpf(0.5):
        fac = mp.mpf(2.) * q
    else:
        fac = mp.mpf(2.) * (mp.one - q)
    fac = (-mp.log(fac))**(mp.one / c)
    #print(fac)
    if q > mp.mpf(0.5):
        return float(-fac)
    else:
        return float(fac)

q = np.logspace(-20, 0, 1000)
c = 0.1
ref_vals = dweibull_isf_mpmath(q, c)
ref = np.array(ref_vals, np.float64)
main = stats.dweibull.isf(q, c)
pr = dweibull_isf(q, c)

plt.loglog(q, np.abs(main - ref)/main, label="main", alpha=0.5)
plt.loglog(q, np.abs(pr - ref)/main, label="PR", alpha=0.5)
plt.title(f"Dweibull inverse survival function relative error: $c={c}$")
plt.show()

@dschmitz89 dschmitz89 added scipy.stats enhancement A new feature or improvement labels Apr 2, 2023
@dschmitz89 dschmitz89 added this to the 1.11.0 milestone Apr 2, 2023
@mdhaber
Copy link
Contributor

mdhaber commented Apr 2, 2023

The dweibull survival function is half the weibull_min survival function for positive arguments.

import numpy as np
from scipy import stats
x, c = stats.uniform.rvs(size=(2, 10))
np.testing.assert_allclose(stats.dweibull(c).sf(x), stats.weibull_min(c).sf(abs(x))/2)

To simplify, can you use the private functions of weibull_min (e.g. weibull_min._sf) as we did for _entropy to make this improvement? (And if the accuracy is not as good, the weibull_min function is really what needs to be improved.)
If you're willing, defining logsf would probably be good while our attention is here. Then maybe _sf could just exponentiate _logsf.

For tests, rather than comparing against reference values from mpmath, we only need to test that the dweibull functions follow those of weibull_min, assuming weibull_min is well tested. If not, the mpmath tests can be added to weibull_min, since it's the more commonly used and fundamental distribution.

Comment on lines 1735 to 1743
def _sf(self, x, c):
Cx1 = 0.5 * np.exp(-abs(x)**c)
return np.where(x > 0, Cx1, 1 - Cx1)

def _isf(self, q, c):
fac = 2. * np.where(q <= 0.5, q, 1. - q)
fac = np.power(-np.log(fac), 1.0 / c)
return np.where(q > 0.5, -fac, fac)

Copy link
Contributor

@mdhaber mdhaber Apr 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _sf(self, x, c):
Cx1 = 0.5 * np.exp(-abs(x)**c)
return np.where(x > 0, Cx1, 1 - Cx1)
def _isf(self, q, c):
fac = 2. * np.where(q <= 0.5, q, 1. - q)
fac = np.power(-np.log(fac), 1.0 / c)
return np.where(q > 0.5, -fac, fac)
def _sf(self, x, c):
p = weibull_min._sf(x, c) / 2
return np.where(x > 0, p, 1 - p)
def _isf(self, p, c):
i = (p <= 0.5)
p = 2 * np.where(i, p, 1 - p)
q = weibull_min._isf(p, c)
return np.where(i, q, -q)

or it's OK to follow the (unfortunate) naming conventions where _isf accepts argument q.

This fails tests until _sf is overridden for weibull_min. Following the style of its ppf,

    def _isf(self, q, c):
        return pow(-np.log(q), 1.0/c)

or

    def _isf(self, p, c):
        return (-np.log(p))**(1.0/c)

would be fine, too. I don't know how the inverse cdf/sf methods started accepting q as the input. p would make more sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not accept and silently revert changes in the future, please.

Comment on lines 6354 to 6371
# reference values were computed with mpmath
# from mpmath import mp
# mp.dps = 50
# def sf_mpmath(x, c):
# x = mp.mpf(x)
# c = mp.mpf(c)
# if x > 0:
# return mp.mpf(0.5) * mp.exp(-x**c)
# else:
# return mp.one - mp.mpf(0.5) * mp.exp(-(-x)**c)
# for the inverse survival function tests, swap ref and x

@pytest.mark.parametrize('x, c, ref',
[(1e20, 0.1, 1.8600379880103705e-44),
(1e5, 0.5, 2.306726997904701e-138)])
def test_sf_isf(self, x, c, ref):
assert_allclose(stats.dweibull.sf(x, c), ref, rtol=5e-14)
assert_allclose(stats.dweibull.isf(ref, c), x, rtol=5e-14)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's convert this to a test for weibull_min.sf/isf. The test for dweibull.sf/isf can follow the style of test_entropy. There the comment doesn't need to be repeated, though.

Comment on lines 6359 to 6364
@pytest.mark.parametrize('x, c, ref', [(50, 1, 1.9287498479639178e-22),
(1000, 0.8,
8.131269637872743e-110)])
def test_sf_isf(self, x, c, ref):
assert_allclose(stats.weibull_min.sf(x, c), ref, rtol=1e-14)
assert_allclose(stats.weibull_min.isf(ref, c), x, rtol=1e-14)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests for the inverse survival function fail in main.

Copy link
Contributor

@mdhaber mdhaber Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. You'll need to override weibull_min._sfweibull_min._isf with the same idea as you originally had for dweibull._sfdweibull._isf.

Copy link
Contributor Author

@dschmitz89 dschmitz89 Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand exactly: weibull_min's sf method is already overwritten and very accurate.

Copy link
Contributor

@mdhaber mdhaber Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. As you might have guessed, I meant _isf, since we're talking about the inverse survival function. That is not overridden. (I tried it locally when I made the suggestions for what to change about dweibull, and overriding it made the existing dweibull tests pass.

@@ -6349,6 +6349,20 @@ def test_fit_min(self):
ref = np.mean(rvs), stats.skew(rvs)
assert_allclose(res, ref)

# reference values were computed via mpmath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with

class WeibullMin(ReferenceDistribution):

    def __init__(self, *, c):
        super().__init__(c=c)

    def _pdf(self, x, c):
        return c*x**(c-mp.one)*mp.exp(-x**c)

@mdhaber mdhaber merged commit c62be9c into scipy:main Apr 13, 2023
@mdhaber mdhaber mentioned this pull request May 27, 2023
5 tasks
@dschmitz89 dschmitz89 deleted the dweibull_sf_isf branch July 18, 2023 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants