ENH: simplify gausshyper.pdf #18799

Merged
8 commits merged on Jul 27, 2023

Conversation

dschmitz89 (Contributor) commented Jun 30, 2023

Reference issue

Part of #10106 and #17832

What does this implement/fix?

There is no analytical expression available for the gausshyper distribution's PPF. To make things worse, implementing the CDF is not possible without high-precision code for Appell's hypergeometric function (see #18056). That means that the default random variate generation computes the CDF by numerical integration and then inverts it via a root finder. At least the quadrature integration can be sped up by providing a low-level callable of the PDF in Cython.
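
For context, here is a minimal standalone sketch of what that default path boils down to (a simplification for illustration, not the actual rv_continuous internals): the CDF at a point is obtained by quadrature over the PDF, and rvs/PPF then invert that CDF with a scalar root finder.

import numpy as np
from scipy import integrate, optimize, stats

a, b, c, z = 1.5, 2.5, 2, 0

def cdf_single(x):
    # numerical CDF: quadrature of the PDF from the left end of the support
    return integrate.quad(stats.gausshyper._pdf, 0, x, args=(a, b, c, z))[0]

def ppf_single(q):
    # numerical PPF: invert the CDF with a bracketing root finder on [0, 1]
    return optimize.brentq(lambda t: cdf_single(t) - q, 0, 1)

# one random variate is essentially ppf_single(u) for a uniform sample u
print(ppf_single(np.random.default_rng(0).random()))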

I pretty much blindly copied the approach from the genhyperbolic distribution implemented in #18134.

Additional information

I did not further check the accuracy of the PDF in the tails; here I focused on performance only. Locally, this passed the generic distribution tests for me at least. Going via the logpdf could be done in a follow-up.
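
As a hypothetical illustration of that follow-up (my own sketch, not code from this PR), the log-PDF could be computed directly in log space, with betaln handling the normalization:

import numpy as np
from scipy import special as sc
from scipy import stats

def gausshyper_logpdf(x, a, b, c, z):
    # log of C * x**(a-1) * (1-x)**(b-1) / (1+z*x)**c, where
    # 1/C = B(a, b) * 2F1(c, a; a+b; -z)
    lognorm = sc.betaln(a, b) + np.log(sc.hyp2f1(c, a, a + b, -z))
    return (a - 1)*np.log(x) + (b - 1)*np.log1p(-x) - c*np.log1p(z*x) - lognorm

xs = np.linspace(0.1, 0.9, 5)
print(np.exp(gausshyper_logpdf(xs, 1.5, 2.5, 2, 0)))
print(stats.gausshyper.pdf(xs, 1.5, 2.5, 2, 0))  # should agree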

Benchmark:

import numpy as np
from scipy import stats
from time import time

a, b, c, z = 1.5, 2.5, 2, 0

gh = stats.gausshyper(a=a, b=b, c=c, z=z)

x = np.linspace(0, 1, 1000)
t0 = time()
cdf_vals = gh.cdf(x)
print(f"Time for 1000 CDF values: {time() - t0}")

t0 = time()
samples = gh.rvs(size=1000)
print(f"Time for 1000 samples: {time() - t0}")

This PR:
Time for 1000 CDF values: 0.03378629684448242
Time for 1000 samples: 1.2594482898712158

Main:
Time for 1000 CDF values: 0.8763303756713867
Time for 1000 samples: 8.12756633758545

Sampling is still not "fast" but at least faster. Maybe with a vectorized root finder (#17719), the PPF calculation will become much faster in the future.

dschmitz89 added the "enhancement" label (A new feature or improvement) on Jun 30, 2023
dschmitz89 changed the title from "ENH: cythonize gausshyper PDF" to "ENH: cythonize gausshyper.pdf" on Jun 30, 2023
mdhaber (Contributor) commented Jun 30, 2023

_tanhsinh provides a 3.7x speedup even for individual evaluations of this distribution's CDF (except at x=0... I need to add a special condition for that edge case), and it will be vectorized following the approach of _chandrupatla.

For vectorized calls, I've seen _chandrupatla be 440x faster than scipy.optimize.brent, even though individual evaluations take twice as long.

This is on a CPU, but when the function can be evaluated on a GPU, it will take advantage of that, and we can expect a few more orders of magnitude improvement. (hyp2f1 isn't available in CuPy yet, though.)

So if we can wait until the end of the summer, during which I'm working on gh-15928, I think we can get a substantial speed boost without any special effort. There will be similar speedups for vectorized calls to rvs and ppf.

Another advantage of _tanhsinh is evaluation of the logcdf (using logsumexp) if logpdf is available. I see some gamma functions in there, so this could come in handy.
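
Roughly, the idea (sketched here with a crude fixed Riemann rule purely for illustration; this is not the _tanhsinh algorithm) is that the log of the integral of exp(logpdf) can be accumulated as logsumexp(logpdf(nodes) + log(weights)) without ever leaving log space:

import numpy as np
from scipy import stats
from scipy.special import logsumexp

a, b, c, z = 1.5, 2.5, 2, 0
logpdf = lambda t: stats.gausshyper.logpdf(t, a, b, c, z)

def logcdf_riemann(x, n=100_000):
    # crude Riemann sum of exp(logpdf), evaluated entirely in log space
    nodes, h = np.linspace(1e-12, x, n, retstep=True)
    return logsumexp(logpdf(nodes) + np.log(h))

print(logcdf_riemann(0.5))
print(np.log(stats.gausshyper.cdf(0.5, a, b, c, z)))  # reference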

OTOH I see you asked for Warren's review, so it's fine with me if he reviews it.

BTW, I'd suggest time.perf_counter. time.time doesn't have great resolution on all platforms.

Update: make that 7.5x speedup when doing a Python loop over the 999 points. gmean of relative error is <1e-15.

import numpy as np
from scipy import stats
from scipy import special as sc
from scipy.integrate._tanhsinh import _tanhsinh_noiv
from time import perf_counter
from mpmath import mp
mp.dps = 50

a, b, c, z = 1.5, 2.5, 2, 0


def _pdf(x):
    Cinv = sc.gamma(a) * sc.gamma(b) / sc.gamma(a + b) * sc.hyp2f1(c, a, a + b, -z)
    return 1.0 / Cinv * x ** (a - 1.0) * (1.0 - x) ** (b - 1.0) / (1.0 + z * x) ** c

def _pdf_mpmath(x):
    a, b, c, z = 1.5, 2.5, 2, 0
    a, b, c, z = mp.mpf(a), mp.mpf(b), mp.mpf(c), mp.mpf(z)
    Cinv = mp.gamma(a) * mp.gamma(b) / mp.gamma(a + b) * mp.hyp2f1(c, a, a + b, -z)
    return mp.one / Cinv * x ** (a - mp.one) * (mp.one - x) ** (b - mp.one) / (mp.one + z * x) ** c

gh = stats.gausshyper(a=a, b=b, c=c, z=z)

x = np.linspace(0, 1, 1000)[1:]
tic = perf_counter()
cdf_vals = gh.cdf(x)
toc = perf_counter()
t1 = toc-tic
print(f"Time for 1000 CDF value (main): {t1}")

tic = perf_counter()
cdf_vals2 = [_tanhsinh_noiv(_pdf, 0, xi, log=False, maxfun=np.inf, maxlevel=10,
                            minlevel=2, atol=0, rtol=1e-12).integral for xi in x]
toc = perf_counter()
t2 = toc-tic
print(f"Time for 1000 CDF values (_tanhsinh): {t2}")

cdf_vals3 = np.asarray([mp.quad(_pdf_mpmath, (mp.mpf(0), mp.mpf(xi))) for xi in x], dtype=np.float64)

rerr = abs((cdf_vals - cdf_vals3)/(cdf_vals3))
rerr2 = abs((np.asarray(cdf_vals2) - cdf_vals3)/(cdf_vals3))
rerr = rerr[rerr != 0]
rerr2 = rerr2[rerr2 != 0]

print(f"Speedup: {t1/t2}")
print(f"gmean of relative error (quad): {stats.gmean(rerr)}")
print(f"gmean of relative error (tanhsinh): {stats.gmean(rerr2)}")

print(f"max relative error (quad): {np.max(rerr)}")
print(f"max relative error (tanhsinh): {np.max(rerr2)}")

dschmitz89 (Contributor Author) commented Jun 30, 2023

Sounds like awesome times are coming for scipy.stats! Still, the integration via the low-level callable looks faster for the 1000 data points here. I agree, hopefully this extra effort will not be necessary in the future.

For the fun of it, what effect does changing

Cinv = sc.gamma(a) * sc.gamma(b) / sc.gamma(a + b) * sc.hyp2f1(c, a, a + b, -z)

to

Cinv = sc.beta(a, b) * sc.hyp2f1(c, a, a + b, -z)

have on the timing with the new integrator? I thought this could reduce some inaccuracy.

mdhaber (Contributor) commented Jun 30, 2023

Actually, we can reduce some inaccuracy just by using _tanhsinh. Code comparing against mpmath updated above.

Speedup: 7.49806608190174
gmean of relative error (quad): 7.961428032643411e-16
gmean of relative error (tanhsinh): 4.46206718903016e-16
max relative error (quad): 1.1235272457145405e-07
max relative error (tanhsinh): 1.445515587446478e-13

dschmitz89 (Contributor Author) commented:

I will add some tests against the analytical CDF expression implemented in mpmath over the next few days, as I have started to doubt mpmath's quadrature results at times.

mdhaber (Contributor) commented Jun 30, 2023

I've seen inaccuracy for entropy calculations when dps is not high enough. I do not doubt it in general when the methods agree with one another, dps is sufficiently high, and there is convergence as dps is increased.

I agree that it may not be fair to compare _tanhsinh against mpmath's tanh-sinh implementation. But _tanhsinh still has less error if we trust mpmath's Gauss-Legendre.

gmean of relative error (quad): 5.981882710120695e-10
gmean of relative error (tanhsinh): 5.91909957455664e-10
max relative error (quad): 1.1201215834892658e-07
max relative error (tanhsinh): 1.155285199503168e-09

Using beta is a bit faster.

Time for 1000 CDF value (main): 1.1051812999940012
Time for 1000 CDF values (_tanhsinh): 0.1437306999869179
Speedup: 7.689250105193899

All the integrals are identical.

ilayn (Member) commented Jun 30, 2023

I can help speed up this code, since np.vectorize is still a glorified for loop on the Python side. But I'm not sure I understand what the code is doing. Is x an array and the rest doubles? Then we can speed this up much more. Not sure if you want to, though, Matt's comment about the summer work notwithstanding.

dschmitz89 added the "Cython" label (Issues with the internal Cython code base) on Jun 30, 2023
dschmitz89 (Contributor Author) commented:

I can help speed up this code, since np.vectorize is still a glorified for loop on the Python side. But I'm not sure I understand what the code is doing. Is x an array and the rest doubles? Then we can speed this up much more. Not sure if you want to, though, Matt's comment about the summer work notwithstanding.

Thanks for the offer. If you have an idea how to speed up the integration further with some Cython magic, I am all ears. On the other hand, the gausshyper.pdf method itself is not that performance-critical.

dschmitz89 (Contributor Author) commented:

quad results in CDF values > 1 in certain parameter regions of this distribution:

from scipy.stats import gausshyper

x, a, b, c, z = 0.999999, 1.5, 2.5, 2, -0.5

gausshyper.cdf(x, a, b, c, z)
#1.000000000000055

@mdhaber: if the new integrator does better in this case, let's close this PR. I understood that its relative error was much lower anyway.

mdhaber (Contributor) commented Jul 5, 2023

I wouldn't consider that a dealbreaker, since the result is very close to one and can just be clipped, but:

QuadratureResult(integral=0.9999999999999949, error=2.4863916783069763e-16, feval=130, success=True, status=0, message='The algorithm completed successfully, and the error estimate meets the requested tolerance.')

mpmath's two methods disagree about this integral as posed, but they agree that the complement of this integral is 2.037182544157116e-15, so _tanhsinh is about 3.1086244689504383e-15 too low whereas this gausshyper.cdf would be 5.706546346573305e-14 too high.
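
(For reference, a sketch of that clipping workaround, not something in this PR:)

import numpy as np
from scipy.stats import gausshyper

x, a, b, c, z = 0.999999, 1.5, 2.5, 2, -0.5
val = gausshyper.cdf(x, a, b, c, z)  # > 1 by roundoff, as reported above
print(np.clip(val, 0.0, 1.0))        # clipped back into [0, 1]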

But I'm happy with this being closed since this will already speed up quite a bit in pure Python. Besides _tanhsinh, I also plan to add a way of generating a distribution object based on an interpolated PDF, similar to the UNURAN stuff. To demonstrate the idea:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats, interpolate
import scipy.stats.sampling as sampling

a, b, c, z = 13.8, 3.12, 2.51, 5.18
f = lambda x: stats.gausshyper._pdf(x, a, b, c, z)

def integrate(f):
  x = np.linspace(0, 1, 1000)
  f = interpolate.PchipInterpolator(x, f(x), axis=-1)
  F = f.antiderivative()
  return F

F = integrate(f)

x = np.linspace(0, 1, 200)
res = F(x)
ref = stats.gausshyper.cdf(x, a, b, c, z)

plt.plot(x, ref, '-', label='original')
plt.plot(x, res, '--', label='interpolant')
plt.legend()

[image: plot of the reference CDF ('original') against the PCHIP interpolant ('interpolant') on [0, 1]]

Even with the cost of setting up the interpolant, this is a faster way of getting the CDF at a few points.

%timeit F = integrate(f); F(x)  # 898 µs ± 153 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit stats.gausshyper.cdf(x, a, b, c, z)  # 35.6 ms ± 836 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# this is for evaluating at 200 points, but even for a single point gausshyper.cdf currently takes about 0.5 ms.

and of course once the interpolant has been generated, it's super fast to compute with and reasonably accurate.

%timeit F(x)  # 19.1 µs ± 6.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
np.max(abs(F(x) - stats.gausshyper.cdf(x, a, b, c, z)))  # 6.080010717113282e-09

And the accuracy is something that the user can specify; the tradeoff is just additional evaluations of the PDF.
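
A quick sketch of that tradeoff (hypothetical helper name, reusing the parameters from above): increasing the number of PDF evaluations used to build the interpolant shrinks the CDF error.

import numpy as np
from scipy import stats, interpolate

a, b, c, z = 13.8, 3.12, 2.51, 5.18
f = lambda t: stats.gausshyper._pdf(t, a, b, c, z)
x = np.linspace(0, 1, 200)
ref = stats.gausshyper.cdf(x, a, b, c, z)

def integrate_n(f, n):
    # build the interpolated CDF from n evaluations of the PDF
    xg = np.linspace(0, 1, n)
    return interpolate.PchipInterpolator(xg, f(xg), axis=-1).antiderivative()

for n in (100, 1000, 10000):
    print(n, np.max(abs(integrate_n(f, n)(x) - ref)))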

Update: I developed these ideas a bit further in https://gist.github.com/mdhaber/b112732e5b19889a5589cfdce7e055c9. It's promising, so @tupui and I might work on that after the main infrastructure improvements. For large draws, that has the potential to be faster than anything else discussed here.

mdhaber (Contributor) commented Jul 25, 2023

I vectorized tanhsinh. Now, re-running the benchmark with cdf_vals2 = _tanhsinh2(_pdf, 0, x).integral in place of the list comprehension:

Time for 1000 CDF value (main): 1.0867439999710768
Time for 1000 CDF values (_tanhsinh): 0.005081499926745892
Speedup: 213.8628388541589
gmean of relative error (quad): 7.961428032643411e-16
gmean of relative error (tanhsinh): 4.46206718903016e-16
max relative error (quad): 1.1235272457145405e-07
max relative error (tanhsinh): 1.445515587446478e-13

That's a 213.86x speedup, compared to the roughly 25.9x implied by the timings in the top post.

Then for the random samples, using the scalar root finder from gh-17719

from scipy.optimize._zeros_py import _chandrupatla
rng = np.random.default_rng(8346584583456)
f = lambda x, p: _tanhsinh2(_pdf, 0, x).integral - p
tic = perf_counter()
p = rng.random(1000)
res = _chandrupatla(f, 0, 1, args=(p,))
toc = perf_counter()
print(f"Time for 1000 observations: {toc-tic}")
# Time for 1000 observations: 0.05970019998494536

tic = perf_counter()
samples = gh.rvs(size=1000)
toc = perf_counter()
print(f"Time for 1000 observations (main): {toc-tic}")
# Time for 1000 observations (main): 10.13877489999868

That's a 169.8x speedup, compared to the roughly 6.45x implied by the timings in the top post.

Re: #18799 (comment), it is certainly possible to beat these pure-Python solutions if the integrand and all the loops are brought into C++. So if that speed is needed, I'd be happy for it to happen, although personally I'd be more interested in gh-18056. I just thought I should mention the speedups we will get for free pretty soon, and the pure-Python solutions will be useful for execution with non-CPU arrays.

tupui (Member) commented Jul 25, 2023

Looks like the CDF part might not be wanted then, but what about the PDF and other additions? This still seems to have value no?

dschmitz89 (Contributor Author) commented:

Doing the whole integration in Cython then definitely seems unnecessary.

Regarding @tupui's comment, there are two things that could be done in my opinion:

  • easy: replace sc.gamma(a)*sc.gamma(b)/sc.gamma(a+b) with sc.beta(a, b) for simplification (see the sketch below)
  • harder: potentially implement the PDF as a ufunc using the machinery from special. That should also yield some performance improvement when used with the new vectorized integrator.

Any opinions?
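
For reference, a minimal sketch of the easy option (not necessarily the exact code that ends up being merged):

import scipy.special as sc

def _pdf(x, a, b, c, z):
    # normalization: B(a, b) * 2F1(c, a; a + b; -z)
    normalization = sc.beta(a, b) * sc.hyp2f1(c, a, a + b, -z)
    return 1.0 / normalization * x**(a - 1.0) * (1.0 - x)**(b - 1.0) / (1.0 + z*x)**c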

mdhaber (Contributor) commented Jul 26, 2023

I'd suggest just doing the easy thing for now. The existing tests are (hopefully) strong enough to detect a mistake, so you shouldn't need any tests if the intent is just simplification. It would be good to test locally to make sure there are no new numerical issues, but I have a hard time seeing how there would be since you're replacing a ratio of gammas, which overflow easily, with a single call to beta.
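
To make the overflow point concrete (illustrative parameter values, not ones tied to this PR):

import scipy.special as sc

a, b = 200.0, 300.0
print(sc.gamma(a) * sc.gamma(b) / sc.gamma(a + b))  # nan: the individual gammas overflow to inf
print(sc.beta(a, b))                                # small but finite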

@tirthasheshpatel and @steppi are working on something that will make it even easier to do the harder thing, so I'd suggest waiting on that.

This still seems to have value no?

The short answer is not really, as-is. You can see that the PR (as-is) replaces a natively vectorized PDF with one that uses @np.vectorize. That can't be good for the pdf. The idea of replacing the ratio of gammas with beta is good, but that doesn't really preserve much that's here.

tupui (Member) commented Jul 26, 2023

This still seems to have value no?

The short answer is not really, as-is. You can see that the PR (as-is) replaces a natively vectorized PDF with one that uses @np.vectorize. That can't be good for the pdf. The idea of replacing the ratio of gammas with beta is good, but that doesn't really preserve much that's here.

I am really talking about the Cython part here, not the Python.

dschmitz89 changed the title from "ENH: cythonize gausshyper.pdf" to "ENH: simplify gausshyper.pdf" on Jul 26, 2023
mdhaber (Contributor) left a review comment:

If the style is going to change, I'd prefer to keep it self-consistent (e.g. 1. or 1.0 throughout). It would also be nice to avoid the line continuation, which is easy with a slightly shorter variable name (e.g. normalization). It would also be OK to leave it all alone to reduce the diff. But to avoid further delay, I'll merge once the tests finish.

mdhaber merged commit 42224e8 into scipy:main on Jul 27, 2023
20 of 25 checks passed
j-bowhay added this to the 1.12.0 milestone on Jul 27, 2023
scottshambaugh pushed a commit to scottshambaugh/scipy that referenced this pull request on Jul 28, 2023
* ENH: stats.gausshyper.pdf: simplify implementation
dschmitz89 mentioned this pull request on Dec 10, 2023
Labels: Cython (Issues with the internal Cython code base), enhancement (A new feature or improvement), scipy.stats