BUG: stats.tukeylambda: Bad behavior of the `cdf()` and `sf()` methods in the tails. #21370

WarrenWeckesser · 2024-08-12T20:41:58Z

Describe your issue.

When lam < 0, as x gets larger, tukeylambda.cdf(x, lam) saturates at 0.9999999999999929 instead of 1 and tukeylambda.sf(x, lam) levels off at 7.105427357601002e-15 instead of decreasing to 0. tukeylambda.pdf(x, lam) levels off at 5.9894274089194295e-22 in the positive and negative tails.

The source of the problem is in the bisection method that is used in special.tklmbda, which can be found in https://github.com/scipy/scipy/blob/main/scipy/special/xsf/cephes/tukey.h. That code needs to be refined.

Reproducing Code Example

In [6]: tukeylambda.cdf([1e4, 1e6, 1e8, 1e10, 1e12, 1e24], -0.5).tolist()
Out[6]: 
[0.999999960015991,
 0.9999999999959996,
 0.9999999999999929,
 0.9999999999999929,
 0.9999999999999929,
 0.9999999999999929]

In [7]: tukeylambda.sf([1e4, 1e6, 1e8, 1e10, 1e12, 1e24], -0.5).tolist()
Out[7]: 
[3.998400899263288e-08,
 4.000355602329364e-12,
 7.105427357601002e-15,
 7.105427357601002e-15,
 7.105427357601002e-15,
 7.105427357601002e-15]

In [8]: tukeylambda.pdf([1e4, 1e6, 1e8, 1e10, 1e12, 1e24], -0.5).tolist()
Out[8]: 
[7.995203177218486e-12,
 8.001066830697682e-18,
 5.9894274089194295e-22,
 5.9894274089194295e-22,
 5.9894274089194295e-22,
 5.9894274089194295e-22]

SciPy/NumPy/Python version and system information

1.15.0.dev0+1377.235e778 2.0.0 sys.version_info(major=3, minor=12, micro=4, releaselevel='final', serial=0)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /usr/include/x86_64-linux-gnu/openblas-pthread/
    lib directory: /usr/lib/x86_64-linux-gnu/openblas-pthread/
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 NO_CBLAS=
      NO_LAPACK= NO_LAPACKE=1 NO_AFFINITY=1 USE_OPENMP=0 generic MAX_THREADS=64
    pc file directory: /usr/lib/x86_64-linux-gnu/pkgconfig
    version: 0.3.20
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /usr/include/x86_64-linux-gnu/openblas-pthread/
    lib directory: /usr/lib/x86_64-linux-gnu/openblas-pthread/
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 NO_CBLAS=
      NO_LAPACK= NO_LAPACKE=1 NO_AFFINITY=1 USE_OPENMP=0 generic MAX_THREADS=64
    pc file directory: /usr/lib/x86_64-linux-gnu/pkgconfig
    version: 0.3.20
  pybind11:
    detection method: config-tool
    include directory: unknown
    name: pybind11
    version: 2.13.1
Compilers:
  c:
    commands: cc
    linker: ld.bfd
    name: gcc
    version: 11.4.0
  c++:
    commands: c++
    linker: ld.bfd
    name: gcc
    version: 11.4.0
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 3.0.10
  fortran:
    commands: gfortran
    linker: ld.bfd
    name: gcc
    version: 11.4.0
  pythran:
    include directory: ../../../../../py3.12.4/lib/python3.12/site-packages/pythran
    version: 0.16.1
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
  cross-compiled: false
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
Python Information:
  path: /home/warren/py3.12.4/bin/python3
  version: '3.12'

The text was updated successfully, but these errors were encountered:

fancidev · 2024-08-14T08:56:01Z

Interesting catch!

It seems Newton’s method may work for this problem and may consume fewer iterations, but a closer look is probably necessary to make sure it is robust for different inputs.

WarrenWeckesser · 2024-08-14T12:03:01Z

I have a fix in progress. Currently I'm using the toms748_solve function from Boost/math, and so far it is working well. I still have some details to work out, so a pull request is not imminent.

fancidev · 2024-08-15T01:30:08Z

Btw, when $p$ is close to $0.5$, the numerical accuracy of tukeylambda.ppf has room for improvement. For a contrived example, the relative error of tukeylambda(-0.2).ppf(np.nextafter(0.5,0)) is 56%. For a less contrived example, the relative error of tukeylambda(0.2).ppf(0.5001) is 3e-13.

WarrenWeckesser · 2024-08-15T01:51:50Z

@fancidev I noticed that too. In fact, it was while investigating that behavior that I ran into the loss of precision of special.logit, resulting in #21388.

I'm experimenting with a technique that gives good results over a wide range of λ, but I think it will be slow, even when implemented in C++. It will be part of the changes that I am working on for tukeylambda. If you have any ideas for improving the numerical behavior of tukeylambda.ppf, let us know!

fancidev · 2024-08-15T03:28:50Z

If you have any ideas for improving the numerical behavior of tukeylambda.ppf, let us know!

For $p$ close to 0.5 and $\lambda$ not very large (in absolute value), Taylor expansion around 0.5 could help to improve the accuracy of tukeylambda.ppf,

$$ Q(p) = \frac{1}{\lambda}\left[p^\lambda-(1-p)^\lambda\right] $$

For $0 < p < 1$, let $u := p-\frac{1}{2}$, so $-\frac{1}{2} < u < \frac{1}{2}$. Using the Taylor series $(1+x)^a = \sum_{k=0}^\infty {a \choose k} x^k$, which converges for $|x| < 1$, the quantile function can be written as

$$ \begin{align*} Q (u) &= \frac{1}{\lambda}\left[ \left(\frac12+u\right)^\lambda - \left(\frac12-u\right)^\lambda\right] \\ &= \frac{1}{\lambda \cdot 2^\lambda}\left[ \left(1+2u\right)^\lambda - \left(1-2u\right)^\lambda\right] \\ &= \frac{1}{\lambda \cdot 2^\lambda}\left[ \sum_{k=0}^\infty {\lambda \choose k } (2u)^k - \sum_{k=0}^\infty {\lambda \choose k } (-2u)^k\right] \\ &= \frac{1}{2^{\lambda-1}}\sum_{k=0}^\infty \frac{1}{\lambda} {\lambda \choose 2k+1 } (2u)^{2k+1} \\ &= \frac{1}{2^{\lambda-1}} \left[ (2u) + \frac{(\lambda-1)(\lambda-2)}{6} (2u)^3 + \frac{(\lambda-1)(\lambda-2)(\lambda-3)(\lambda-4)}{120}(2u)^5 + \cdots \right] \end{align*} $$

It remains to determine the domain of $\lambda$ and $p$ for which the Taylor expansion is applicable, as well as the number of terms needed to achieve desired accuracy. Hopefully outside this domain the direct formula works well enough.

Wolfram Alpha suggests that an alternative first-order approximation for $p \approx 0.5$ is $Q(p) \approx 2^{-\lambda} \mathrm{logit}(p)$.

fancidev · 2024-08-16T01:50:14Z

For $\lambda < 0$ and $p \le 0.5$, an accurate formula for tukeylambda.ppf(p, lambda) appears to be

$$ \begin{align*} Q(p) &= \frac{1}{\lambda}\left[p^\lambda-(1-p)^\lambda\right]\\ &= \frac{1}{\lambda} p^\lambda\left[1-\left(\frac{1-p}{p}\right)^\lambda\right]\\ &=\frac{1}{\lambda} p^\lambda\left[1-e^{\lambda \ln\frac{1-p}{p}} \right] \\ &= \frac{1}{\lambda} p^\lambda\left[1-e^{-\lambda t } \right] \end{align*} $$

where $t \equiv \ln [p/(1-p)]$. The last term is evaluated using expm1. t is evaluated using logit (following gh-21388). Numerical experiments show that this expression achieves relative error of a few machine epsilon, provided no premature overflow occurs. Premature overflow occurs if $p^\lambda$ overflows but $Q(p)$ doesn’t.

For $\lambda > 0$ and $p \le 0.5$, a similar derivation yields the formula

$$ \begin{align*} Q(p) &= \frac{1}{\lambda}\left[p^\lambda-(1-p)^\lambda\right]\\ &= \frac{1}{\lambda} (1-p)^\lambda\left[\left(\frac{p}{1-p}\right)^\lambda-1\right]\\ &=\frac{1}{\lambda} (1-p)^\lambda\left[e^{\lambda \ln\frac{p}{1-p}} -1\right] \\ &= \frac{1}{\lambda} (1-p)^\lambda\left[e^{\lambda t } -1\right] \end{align*} $$

The last term is evaluated using expm1. The term $(1-p)^\lambda$ would be evaluated using the (non-existent) pow1p function. If such a function exists, then numerical experiments suggest that the expression achieves relative error of a few machine epsilon if the result does not underflow. (An example that underflows is $Q(10/74; \lambda=5000) \approx -1.1\times 10^{-319}$.)

If $p > 0.5$, it is accurate to evaluate $Q(p)$ by $Q(p)=-Q(1-p)$.

pow1p(x,a) is defined as $(1+x)^a$ for $x \ge -1$ and $a > 0$. Write $1+x=s+t$ where $|t|\ll |s|$; this may be done using the Fast2Sum algorithm. Then

$$ (1+x)^a = (s+t)^a=s^a \left(1+\frac{t}{s}\right)^a=s^a e^{a\ln\left(1+\frac{t}{s}\right)} $$

WarrenWeckesser added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats labels Aug 12, 2024

fancidev mentioned this issue Aug 16, 2024

ENH: special: add function pow1p(x,y) to compute (1+x)**y #21404

Open

mdhaber added the scipy.special label Oct 5, 2024

mdhaber mentioned this issue Oct 5, 2024

ENH: special: Add root finding tools to xsf and implement gdtrib for GPU while making substantial improvements #21454

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: stats.tukeylambda: Bad behavior of the `cdf()` and `sf()` methods in the tails. #21370

BUG: stats.tukeylambda: Bad behavior of the `cdf()` and `sf()` methods in the tails. #21370

WarrenWeckesser commented Aug 12, 2024 •

edited

Loading

fancidev commented Aug 14, 2024 •

edited

Loading

WarrenWeckesser commented Aug 14, 2024

fancidev commented Aug 15, 2024 •

edited

Loading

WarrenWeckesser commented Aug 15, 2024

fancidev commented Aug 15, 2024 •

edited

Loading

fancidev commented Aug 16, 2024 •

edited

Loading

BUG: stats.tukeylambda: Bad behavior of the cdf() and sf() methods in the tails. #21370

BUG: stats.tukeylambda: Bad behavior of the cdf() and sf() methods in the tails. #21370

Comments

WarrenWeckesser commented Aug 12, 2024 • edited Loading

Describe your issue.

Reproducing Code Example

fancidev commented Aug 14, 2024 • edited Loading

WarrenWeckesser commented Aug 14, 2024

fancidev commented Aug 15, 2024 • edited Loading

WarrenWeckesser commented Aug 15, 2024

fancidev commented Aug 15, 2024 • edited Loading

fancidev commented Aug 16, 2024 • edited Loading

BUG: stats.tukeylambda: Bad behavior of the `cdf()` and `sf()` methods in the tails. #21370

BUG: stats.tukeylambda: Bad behavior of the `cdf()` and `sf()` methods in the tails. #21370

WarrenWeckesser commented Aug 12, 2024 •

edited

Loading

fancidev commented Aug 14, 2024 •

edited

Loading

fancidev commented Aug 15, 2024 •

edited

Loading

fancidev commented Aug 15, 2024 •

edited

Loading

fancidev commented Aug 16, 2024 •

edited

Loading