ENH: simplify gausshyper.pdf #18799

Merged
8 commits merged on Jul 27, 2023

Conversation

dschmitz89 (Contributor) commented Jun 30, 2023

Reference issue

Part of #10106 and #17832

What does this implement/fix?

There is no analytical expression available for the gausshyper distribution's PPF. To make things worse, implementing the CDF is not possible without high-precision code for Appell's hypergeometric function (see #18056). That means that the default random variate generation computes the CDF by numerical integration and then inverts it via a root finder. At least the quadrature integration can be sped up by providing a low-level callable of the PDF in Cython.
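
For context, here is a minimal standalone sketch of what that default path boils down to (a simplification for illustration, not the actual rv_continuous internals): the CDF at a point is obtained by quadrature over the PDF, and rvs/PPF then invert that CDF with a scalar root finder.

import numpy as np
from scipy import integrate, optimize, stats

a, b, c, z = 1.5, 2.5, 2, 0

def cdf_single(x):
    # numerical CDF: quadrature of the PDF from the left end of the support
    return integrate.quad(stats.gausshyper._pdf, 0, x, args=(a, b, c, z))[0]

def ppf_single(q):
    # numerical PPF: invert the CDF with a bracketing root finder on [0, 1]
    return optimize.brentq(lambda t: cdf_single(t) - q, 0, 1)

# one random variate is essentially ppf_single(u) for a uniform sample u
print(ppf_single(np.random.default_rng(0).random()))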

I pretty much blindly copied the approach from the genhyperbolic distribution implemented in #18134.

Additional information

I did not further check the accuracy of the PDF in the tails; here I focused on performance only. Locally, this passed the generic distribution tests for me at least. Going via the logpdf could be done in a follow-up.
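
As a hypothetical illustration of that follow-up (my own sketch, not code from this PR), the log-PDF could be computed directly in log space, with betaln handling the normalization:

import numpy as np
from scipy import special as sc
from scipy import stats

def gausshyper_logpdf(x, a, b, c, z):
    # log of C * x**(a-1) * (1-x)**(b-1) / (1+z*x)**c, where
    # 1/C = B(a, b) * 2F1(c, a; a+b; -z)
    lognorm = sc.betaln(a, b) + np.log(sc.hyp2f1(c, a, a + b, -z))
    return (a - 1)*np.log(x) + (b - 1)*np.log1p(-x) - c*np.log1p(z*x) - lognorm

xs = np.linspace(0.1, 0.9, 5)
print(np.exp(gausshyper_logpdf(xs, 1.5, 2.5, 2, 0)))
print(stats.gausshyper.pdf(xs, 1.5, 2.5, 2, 0))  # should agree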

Benchmark:

import numpy as np
from scipy import stats
from time import time

a, b, c, z = 1.5, 2.5, 2, 0

gh = stats.gausshyper(a=a, b=b, c=c, z=z)

x = np.linspace(0, 1, 1000)
t0 = time()
cdf_vals = gh.cdf(x)
print(f"Time for 1000 CDF values: {time() - t0}")

t0 = time()
samples = gh.rvs(size=1000)
print(f"Time for 1000 samples: {time() - t0}")

This PR:
Time for 1000 CDF values: 0.03378629684448242
Time for 1000 samples: 1.2594482898712158

Main:
Time for 1000 CDF values: 0.8763303756713867
Time for 1000 samples: 8.12756633758545

Sampling is still not "fast" but at least faster. Maybe with a vectorized root finder (#17719), the PPF calculation will become much faster in the future.

dschmitz89 added the "enhancement" label (A new feature or improvement) on Jun 30, 2023
dschmitz89 changed the title from "ENH: cythonize gausshyper PDF" to "ENH: cythonize gausshyper.pdf" on Jun 30, 2023
mdhaber (Contributor) commented Jun 30, 2023

_tanhsinh provides a 3.7x speedup even for individual evaluations of this distribution's CDF (except at x=0... I need to add a special condition for that edge case), and it will be vectorized following the approach of _chandrupatla.

For vectorized calls, I've seen _chandrupatla be 440x faster than scipy.optimize.brent, even though individual evaluations take twice as long.

This is on a CPU, but when the function can be evaluated on a GPU, it will take advantage of that, and we can expect a few more orders of magnitude improvement. (hyp2f1 isn't available in CuPy yet, though.)

So if we can wait until the end of the summer, during which I'm working on gh-15928, I think we can get a substantial speed boost without any special effort. There will be similar speedups for vectorized calls to rvs and ppf.

Another advantage of _tanhsinh is evaluation of the logcdf (using logsumexp) if logpdf is available. I see some gamma functions in there, so this could come in handy.
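
Roughly, the idea (sketched here with a crude fixed Riemann rule purely for illustration; this is not the _tanhsinh algorithm) is that the log of the integral of exp(logpdf) can be accumulated as logsumexp(logpdf(nodes) + log(weights)) without ever leaving log space:

import numpy as np
from scipy import stats
from scipy.special import logsumexp

a, b, c, z = 1.5, 2.5, 2, 0
logpdf = lambda t: stats.gausshyper.logpdf(t, a, b, c, z)

def logcdf_riemann(x, n=100_000):
    # crude Riemann sum of exp(logpdf), evaluated entirely in log space
    nodes, h = np.linspace(1e-12, x, n, retstep=True)
    return logsumexp(logpdf(nodes) + np.log(h))

print(logcdf_riemann(0.5))
print(np.log(stats.gausshyper.cdf(0.5, a, b, c, z)))  # reference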

OTOH I see you asked for Warren's review, so it's fine with me if he reviews it.

BTW, I'd suggest time.perf_counter. time.time doesn't have great resolution on all platforms.

Update: make that 7.5x speedup when doing a Python loop over the 999 points. gmean of relative error is <1e-15.

import numpy as np
from scipy import stats
from scipy import special as sc
from scipy.integrate._tanhsinh import _tanhsinh_noiv
from time import perf_counter
from mpmath import mp
mp.dps = 50

a, b, c, z = 1.5, 2.5, 2, 0


def _pdf(x):
    Cinv = sc.gamma(a) * sc.gamma(b) / sc.gamma(a + b) * sc.hyp2f1(c, a, a + b, -z)
    return 1.0 / Cinv * x ** (a - 1.0) * (1.0 - x) ** (b - 1.0) / (1.0 + z * x) ** c

def _pdf_mpmath(x):
    a, b, c, z = 1.5, 2.5, 2, 0
    a, b, c, z = mp.mpf(a), mp.mpf(b), mp.mpf(c), mp.mpf(z)
    Cinv = mp.gamma(a) * mp.gamma(b) / mp.gamma(a + b) * mp.hyp2f1(c, a, a + b, -z)
    return mp.one / Cinv * x ** (a - mp.one) * (mp.one - x) ** (b - mp.one) / (mp.one + z * x) ** c

gh = stats.gausshyper(a=a, b=b, c=c, z=z)

x = np.linspace(0, 1, 1000)[1:]
tic = perf_counter()
cdf_vals = gh.cdf(x)
toc = perf_counter()
t1 = toc-tic
print(f"Time for 1000 CDF value (main): {t1}")

tic = perf_counter()
cdf_vals2 = [_tanhsinh_noiv(_pdf, 0, xi, log=False, maxfun=np.inf, maxlevel=10,
                            minlevel=2, atol=0, rtol=1e-12).integral for xi in x]
toc = perf_counter()
t2 = toc-tic
print(f"Time for 1000 CDF values (_tanhsinh): {t2}")

cdf_vals3 = np.asarray([mp.quad(_pdf_mpmath, (mp.mpf(0), mp.mpf(xi))) for xi in x], dtype=np.float64)

rerr = abs((cdf_vals - cdf_vals3)/(cdf_vals3))
rerr2 = abs((np.asarray(cdf_vals2) - cdf_vals3)/(cdf_vals3))
rerr = rerr[rerr != 0]
rerr2 = rerr2[rerr2 != 0]

print(f"Speedup: {t1/t2}")
print(f"gmean of relative error (quad): {stats.gmean(rerr)}")
print(f"gmean of relative error (tanhsinh): {stats.gmean(rerr2)}")

print(f"max relative error (quad): {np.max(rerr)}")
print(f"max relative error (tanhsinh): {np.max(rerr2)}")

dschmitz89 (Contributor Author) commented Jun 30, 2023

Sounds like awesome times are coming for scipy.stats! Still, the integration via the low-level callable looks faster for the 1000 data points here. I agree, hopefully this extra effort will not be necessary in the future.

For the fun of it, what effect does changing

Cinv = sc.gamma(a) * sc.gamma(b) / sc.gamma(a + b) * sc.hyp2f1(c, a, a + b, -z)

to

Cinv = sc.beta(a, b) * sc.hyp2f1(c, a, a + b, -z)

have on the timing with the new integrator? I thought this could reduce some inaccuracy.

mdhaber (Contributor) commented Jun 30, 2023

Actually, we can reduce some inaccuracy just by using _tanhsinh. Code comparing against mpmath updated above.

Speedup: 7.49806608190174
gmean of relative error (quad): 7.961428032643411e-16
gmean of relative error (tanhsinh): 4.46206718903016e-16
max relative error (quad): 1.1235272457145405e-07
max relative error (tanhsinh): 1.445515587446478e-13

dschmitz89 (Contributor Author) commented:

I will add some tests against the analytical CDF expression implemented in mpmath over the next few days, as I have started to doubt mpmath's quadrature results at times.

mdhaber (Contributor) commented Jun 30, 2023

I've seen inaccuracy for entropy calculations when dps is not high enough. I do not doubt it in general when the methods agree with one another, dps is sufficiently high, and there is convergence as dps is increased.

I agree that it may not be fair to compare _tanhsinh against mpmath's tanh-sinh implementation. But _tanhsinh still has less error if we trust mpmath's Gauss-Legendre.

gmean of relative error (quad): 5.981882710120695e-10
gmean of relative error (tanhsinh): 5.91909957455664e-10
max relative error (quad): 1.1201215834892658e-07
max relative error (tanhsinh): 1.155285199503168e-09

Using beta is a bit faster.

Time for 1000 CDF value (main): 1.1051812999940012
Time for 1000 CDF values (_tanhsinh): 0.1437306999869179
Speedup: 7.689250105193899

All the integrals are identical.

ilayn (Member) commented Jun 30, 2023

I can help speed up this code, since np.vectorize is still a glorified for loop on the Python side. But I'm not sure I understand what the code is doing. Is x an array and the rest doubles? Then we can speed this up much more. Not sure if you want to, though, Matt's comment about the summer work notwithstanding.

dschmitz89 added the "Cython" label (Issues with the internal Cython code base) on Jun 30, 2023
dschmitz89 (Contributor Author) commented:

I can help speed up this code, since np.vectorize is still a glorified for loop on the Python side. But I'm not sure I understand what the code is doing. Is x an array and the rest doubles? Then we can speed this up much more. Not sure if you want to, though, Matt's comment about the summer work notwithstanding.

Thanks for the offer. If you have an idea how to speed up the integration further with some Cython magic, I am all ears. On the other hand, the gausshyper.pdf method itself is not that performance-critical.

dschmitz89 (Contributor Author) commented:

quad results in CDF values > 1 in certain parameter regions of this distribution:

from scipy.stats import gausshyper

x, a, b, c, z = 0.999999, 1.5, 2.5, 2, -0.5

gausshyper.cdf(x, a, b, c, z)
#1.000000000000055

@mdhaber: if the new integrator does better in this case, let's close this PR. I understood that its relative error was much lower anyway.

mdhaber (Contributor) commented Jul 5, 2023

I wouldn't consider that a dealbreaker, since the result is very close to one and can just be clipped, but:

QuadratureResult(integral=0.9999999999999949, error=2.4863916783069763e-16, feval=130, success=True, status=0, message='The algorithm completed successfully, and the error estimate meets the requested tolerance.')

mpmath's two methods disagree about this integral as posed, but they agree that the complement of this integral is 2.037182544157116e-15, so _tanhsinh is about 3.1086244689504383e-15 too low whereas this gausshyper.cdf would be 5.706546346573305e-14 too high.
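
(For reference, a sketch of that clipping workaround, not something in this PR:)

import numpy as np
from scipy.stats import gausshyper

x, a, b, c, z = 0.999999, 1.5, 2.5, 2, -0.5
val = gausshyper.cdf(x, a, b, c, z)  # > 1 by roundoff, as reported above
print(np.clip(val, 0.0, 1.0))        # clipped back into [0, 1]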

But I'm happy with this being closed since this will already speed up quite a bit in pure Python. Besides _tanhsinh, I also plan to add a way of generating a distribution object based on an interpolated PDF, similar to the UNURAN stuff. To demonstrate the idea:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats, interpolate
import scipy.stats.sampling as sampling

a, b, c, z = 13.8, 3.12, 2.51, 5.18
f = lambda x: stats.gausshyper._pdf(x, a, b, c, z)

def integrate(f):
  x = np.linspace(0, 1, 1000)
  f = interpolate.PchipInterpolator(x, f(x), axis=-1)
  F = f.antiderivative()
  return F

F = integrate(f)

x = np.linspace(0, 1, 200)
res = F(x)
ref = stats.gausshyper.cdf(x, a, b, c, z)

plt.plot(x, ref, '-', label='original')
plt.plot(x, res, '--', label='interpolant')
plt.legend()

[image: plot of the reference CDF ('original') against the PCHIP interpolant ('interpolant') on [0, 1]]

Even with the cost of setting up the interpolant, this is a faster way of getting the CDF at a few points.

%timeit F = integrate(f); F(x)  # 898 µs ± 153 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit stats.gausshyper.cdf(x, a, b, c, z)  # 35.6 ms ± 836 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# this is for evaluating at 200 points, but even for a single point gausshyper.cdf currently takes about 0.5 ms.

and of course once the interpolant has been generated, it's super fast to compute with and reasonably accurate.

%timeit F(x)  # 19.1 µs ± 6.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
np.max(abs(F(x) - stats.gausshyper.cdf(x, a, b, c, z)))  # 6.080010717113282e-09

And the accuracy is something that the user can specify; the tradeoff is just additional evaluations of the PDF.
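
A quick sketch of that tradeoff (hypothetical helper name, reusing the parameters from above): increasing the number of PDF evaluations used to build the interpolant shrinks the CDF error.

import numpy as np
from scipy import stats, interpolate

a, b, c, z = 13.8, 3.12, 2.51, 5.18
f = lambda t: stats.gausshyper._pdf(t, a, b, c, z)
x = np.linspace(0, 1, 200)
ref = stats.gausshyper.cdf(x, a, b, c, z)

def integrate_n(f, n):
    # build the interpolated CDF from n evaluations of the PDF
    xg = np.linspace(0, 1, n)
    return interpolate.PchipInterpolator(xg, f(xg), axis=-1).antiderivative()

for n in (100, 1000, 10000):
    print(n, np.max(abs(integrate_n(f, n)(x) - ref)))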

Update: I developed these ideas a bit further in https://gist.github.com/mdhaber/b112732e5b19889a5589cfdce7e055c9. It's promising, so @tupui and I might work on that after the main infrastructure improvements. For large draws, that has the potential to be faster than anything else discussed here.

mdhaber (Contributor) commented Jul 25, 2023

I vectorized tanhsinh. Now, re-running the benchmark with cdf_vals2 = _tanhsinh2(_pdf, 0, x).integral in place of the list comprehension:

Time for 1000 CDF value (main): 1.0867439999710768
Time for 1000 CDF values (_tanhsinh): 0.005081499926745892
Speedup: 213.8628388541589
gmean of relative error (quad): 7.961428032643411e-16
gmean of relative error (tanhsinh): 4.46206718903016e-16
max relative error (quad): 1.1235272457145405e-07
max relative error (tanhsinh): 1.445515587446478e-13

That's a 213.86x speedup, compared to the roughly 25.9x implied by the timings in the top post.

Then for the random samples, using the scalar root finder from gh-17719

from scipy.optimize._zeros_py import _chandrupatla
rng = np.random.default_rng(8346584583456)
f = lambda x, p: _tanhsinh2(_pdf, 0, x).integral - p
tic = perf_counter()
p = rng.random(1000)
res = _chandrupatla(f, 0, 1, args=(p,))
toc = perf_counter()
print(f"Time for 1000 observations: {toc-tic}")
# Time for 1000 observations: 0.05970019998494536

tic = perf_counter()
samples = gh.rvs(size=1000)
toc = perf_counter()
print(f"Time for 1000 observations (main): {toc-tic}")
# Time for 1000 observations (main): 10.13877489999868

That's a 169.8x speedup, compared to the roughly 6.45x implied by the timings in the top post.

Re: #18799 (comment), it is certainly possible to beat these pure-Python solutions if the integrand and all the loops are brought into C++. So if that speed is needed, I'd be happy for it to happen, although personally I'd be more interested in gh-18056. I just thought I should mention the speedups we will get for free pretty soon, and the pure-Python solutions will be useful for execution with non-CPU arrays.

tupui (Member) commented Jul 25, 2023

Looks like the CDF part might not be wanted then, but what about the PDF and other additions? This still seems to have value no?

dschmitz89 (Contributor Author) commented:

Doing the whole integration in Cython then definitely seems unnecessary.

Regarding @tupui's comment, there are two things that could be done in my opinion:

  • easy: replace sc.gamma(a)*sc.gamma(b)/sc.gamma(a+b) with sc.beta(a, b) for simplification (see the sketch below)
  • harder: potentially implement the PDF as a ufunc using the machinery from special. That should also yield some performance improvement when used with the new vectorized integrator.

Any opinions?
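
For reference, a minimal sketch of the easy option (not necessarily the exact code that ends up being merged):

import scipy.special as sc

def _pdf(x, a, b, c, z):
    # normalization: B(a, b) * 2F1(c, a; a + b; -z)
    normalization = sc.beta(a, b) * sc.hyp2f1(c, a, a + b, -z)
    return 1.0 / normalization * x**(a - 1.0) * (1.0 - x)**(b - 1.0) / (1.0 + z*x)**c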

mdhaber (Contributor) commented Jul 26, 2023

I'd suggest just doing the easy thing for now. The existing tests are (hopefully) strong enough to detect a mistake, so you shouldn't need any tests if the intent is just simplification. It would be good to test locally to make sure there are no new numerical issues, but I have a hard time seeing how there would be since you're replacing a ratio of gammas, which overflow easily, with a single call to beta.
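
To make the overflow point concrete (illustrative parameter values, not ones tied to this PR):

import scipy.special as sc

a, b = 200.0, 300.0
print(sc.gamma(a) * sc.gamma(b) / sc.gamma(a + b))  # nan: the individual gammas overflow to inf
print(sc.beta(a, b))                                # small but finite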

@tirthasheshpatel and @steppi are working on something that will make it even easier to do the harder thing, so I'd suggest waiting on that.

This still seems to have value no?

The short answer is not really, as-is. You can see that the PR (as-is) replaces a natively vectorized PDF with one that uses @np.vectorize. That can't be good for the pdf. The idea of replacing the ratio of gammas with beta is good, but that doesn't really preserve much that's here.

tupui (Member) commented Jul 26, 2023

This still seems to have value no?

The short answer is not really, as-is. You can see that the PR (as-is) replaces a natively vectorized PDF with one that uses @np.vectorize. That can't be good for the pdf. The idea of replacing the ratio of gammas with beta is good, but that doesn't really preserve much that's here.

I am really talking about the Cython part here, not the Python.

dschmitz89 changed the title from "ENH: cythonize gausshyper.pdf" to "ENH: simplify gausshyper.pdf" on Jul 26, 2023
mdhaber (Contributor) left a review comment:

If the style is going to change, I'd prefer to keep it self-consistent (e.g. 1. or 1.0 throughout). It would also be nice to avoid the line continuation, which is easy with a slightly shorter variable name (e.g. normalization). It would also be OK to leave it all alone to reduce the diff. But to avoid further delay, I'll merge once the tests finish.

mdhaber merged commit 42224e8 into scipy:main on Jul 27, 2023
20 of 25 checks passed
j-bowhay added this to the 1.12.0 milestone on Jul 27, 2023
scottshambaugh pushed a commit to scottshambaugh/scipy that referenced this pull request on Jul 28, 2023
* ENH: stats.gausshyper.pdf: simplify implementation
dschmitz89 mentioned this pull request on Dec 10, 2023
Labels: Cython (Issues with the internal Cython code base), enhancement (A new feature or improvement), scipy.stats