Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: override halflogistic sf and isf #18002

Merged
merged 5 commits into from
Mar 7, 2023

Conversation

dschmitz89
Copy link
Contributor

@dschmitz89 dschmitz89 commented Feb 19, 2023

Reference issue

#17832

What does this implement/fix?

Overrides survival function and inverse survival function for the halflogistic distribution. Using a little algebra (or Wolfram Alpha for convenience) we get:
$SF(x)=1-CDF(x)=1-\tanh(x/2)=2/(1+e^x)=2\text{expit}(-x)$
$ISF(q)=-\text{logit}(q/2)$

Additional information

The current survival function fails at about $x=38$.

halflogistic_survival

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy import special as sc


x = np.linspace(0, 400, 200)
plt.semilogy(x, stats.halflogistic.sf(x), label="main", ls="dashed")
plt.semilogy(x, 2 * sc.expit(-x), label="PR", ls="dotted")
plt.legend()
plt.title("Halflogistic survival function")
plt.show()

Inverse survival function: greatly improved range for small values.

halflogistic_inverse_survival

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy import special as sc


x = np.logspace(-40, -10, 200)
plt.semilogx(x, stats.halflogistic.isf(x), label="main", ls="dashed")
plt.semilogx(x, -sc.logit(0.5 * x), label="PR", ls="dotted")
plt.legend()
plt.title("Halflogistic inverse survival function")
plt.show()

@dschmitz89 dschmitz89 added scipy.stats enhancement A new feature or improvement labels Feb 19, 2023
Copy link
Member

@WarrenWeckesser WarrenWeckesser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new _sf works great, but for _isf, it looks like we have to play whack-a-mole on the relative error. Here's a plot of the relative error of the original version of isf and the version in this pull request. The pull request fixes the large errors near 0, but it has significantly worse relative error than the default for inputs near 1. If you zoom in, you can see that a good threshold for switching between the versions is right in the middle, p=0.5.

halflogistic_isf_relerr

Note: the script uses mpsci.distribution.half_logistic, a very recent addition to mpsci.

script that generates the plot

import numpy as np
from scipy import stats
from scipy.special import logit
from mpmath import mp
from mpsci.distributions import half_logistic
import matplotlib.pyplot as plt


p = np.linspace(1e-5, 1, 10000, endpoint=False)
# x1 = stats.halflogistic.isf(p)
x1 = 2*np.arctanh(1 - p)  # Same as stats.halflogistic.isf(p) in main.
x2 = -logit(0.5*p)

mp.dps = 100
x = np.array([float(half_logistic.invsf(t)) for t in p])

plt.plot(p, np.abs(x1 - x)/x, alpha=0.35, label='original')
plt.plot(p, np.abs(x2 - x)/x, alpha=0.35, label='pull request')
plt.title('half-logistic isf relative error')
plt.xlabel('p')
plt.grid()
plt.legend(framealpha=1, shadow=True)
plt.ylim(-1e-15, 4e-14)
plt.show()

# from mpmath import mp
# mp.dps = 50
# def sf_mpmath(x):
# mp.dps = 50
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this--you already set dps two lines above.

@dschmitz89
Copy link
Contributor Author

dschmitz89 commented Mar 4, 2023

Thanks for the suggestions @WarrenWeckesser , the relative error of isf is much lower now. The remaining test failures are unrelated.

Relative error using the new formulation
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy import special as sc

@np.vectorize
def isf_mpmath(x):
    x = mp.mpf(x)
    x = x/mp.mpf(2.)
    return float(-mp.log(x/(mp.one - x)))

def isf_special(x):
    return np.where(x < 0.5, -sc.logit(0.5 * x),
                    2*np.arctanh(1 - x))

x = np.linspace(0, 1, 100000)
mpmath_isf = np.array(isf_mpmath(x), dtype=np.float64)
result_special = isf_special(x)

plt.plot(x, np.abs((result_special - mpmath_isf)/mpmath_isf), label="arctanh")
plt.title("Relative error: halflogistic inv. survival function")

image

Copy link
Member

@WarrenWeckesser WarrenWeckesser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'll make a tiny tweak to a few lines of commented code and merge.

scipy/stats/tests/test_distributions.py Outdated Show resolved Hide resolved
@WarrenWeckesser WarrenWeckesser merged commit 81f84c8 into scipy:main Mar 7, 2023
@WarrenWeckesser WarrenWeckesser added this to the 1.11.0 milestone Mar 7, 2023
@dschmitz89 dschmitz89 deleted the halflogistic_sf branch July 18, 2023 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants