ENH: simplify gausshyper.pdf #18799

Conversation
`_tanhsinh` provides a 3.7x speedup even for individual evaluations of this function's CDF (except at […]). For vectorized calls, I've seen […]. This is on a CPU, but when the function can be evaluated on a GPU, it will take advantage of that, and we can expect a few more orders of magnitude of improvement. So if we can wait until the end of the summer, during which I'm working on gh-15928, I think we can get a substantial speed boost without any special effort. There will be similar speedups for vectorized calls to `rvs` and `ppf`. Another advantage of […]. OTOH, I see you asked for Warren's review, so it's fine with me if he will review it. BTW, I'd suggest […].

Update: make that a 7.5x speedup when doing a Python loop over the 999 points.

```python
import numpy as np
from scipy import stats
from scipy import special as sc
from scipy.integrate._tanhsinh import _tanhsinh_noiv
from time import perf_counter
from mpmath import mp

mp.dps = 50
a, b, c, z = 1.5, 2.5, 2, 0

def _pdf(x):
    Cinv = sc.gamma(a) * sc.gamma(b) / sc.gamma(a + b) * sc.hyp2f1(c, a, a + b, -z)
    return 1.0 / Cinv * x ** (a - 1.0) * (1.0 - x) ** (b - 1.0) / (1.0 + z * x) ** c

def _pdf_mpmath(x):
    a, b, c, z = 1.5, 2.5, 2, 0
    a, b, c, z = mp.mpf(a), mp.mpf(b), mp.mpf(c), mp.mpf(z)
    Cinv = mp.gamma(a) * mp.gamma(b) / mp.gamma(a + b) * mp.hyp2f1(c, a, a + b, -z)
    return mp.one / Cinv * x ** (a - mp.one) * (mp.one - x) ** (b - mp.one) / (mp.one + z * x) ** c

gh = stats.gausshyper(a=a, b=b, c=c, z=z)
x = np.linspace(0, 1, 1000)[1:]

tic = perf_counter()
cdf_vals = gh.cdf(x)
toc = perf_counter()
t1 = toc - tic
print(f"Time for 1000 CDF values (main): {t1}")

tic = perf_counter()
cdf_vals2 = [_tanhsinh_noiv(_pdf, 0, xi, log=False, maxfun=np.inf, maxlevel=10,
                            minlevel=2, atol=0, rtol=1e-12).integral for xi in x]
toc = perf_counter()
t2 = toc - tic
print(f"Time for 1000 CDF values (_tanhsinh): {t2}")

# high-precision reference values from mpmath
cdf_vals3 = np.asarray([mp.quad(_pdf_mpmath, (mp.mpf(0), mp.mpf(xi))) for xi in x],
                       dtype=np.float64)
rerr = abs((cdf_vals - cdf_vals3) / cdf_vals3)
rerr2 = abs((np.asarray(cdf_vals2) - cdf_vals3) / cdf_vals3)
rerr = rerr[rerr != 0]
rerr2 = rerr2[rerr2 != 0]

print(f"Speedup: {t1/t2}")
print(f"gmean of relative error (quad): {stats.gmean(rerr)}")
print(f"gmean of relative error (tanhsinh): {stats.gmean(rerr2)}")
print(f"max relative error (quad): {np.max(rerr)}")
print(f"max relative error (tanhsinh): {np.max(rerr2)}")
```
Sounds like awesome times are coming for […]! For the fun of it, what effect does changing […] have?
Actually, we can reduce some of the inaccuracy just by using […].
I will add some tests against the analytical CDF expression implemented in mpmath over the next few days, as I have started to doubt mpmath's quadrature results sometimes.
I've seen inaccuracy for entropy calculations when […]. I agree that it may not be fair to compare […].

Using […], all the integrals are identical.
I can help with speeding up this code, since `np.vectorize` is still a glorified for loop on the Python side. But I'm not sure I understand what the code is doing. The […]
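(Aside: a minimal, self-contained illustration of that point about `np.vectorize`, assuming nothing beyond NumPy itself — it makes one Python-level call per element, just like a list comprehension, so it buys convenience rather than speed:)

```python
import numpy as np

def f(x):
    return x ** 2 + 1.0

x = np.linspace(0, 1, 100_000)
vf = np.vectorize(f)  # convenience wrapper, not a compiled loop

# Both lines below make 100,000 Python-level calls to f; applying f to the
# whole array at once (f(x)) would be much faster.
y_loop = np.array([f(xi) for xi in x])
y_vec = vf(x)
assert np.allclose(y_loop, y_vec)
```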
Thanks for the offer. If you have an idea of how to speed up the integration further with some Cython magic, I am all ears. The […]
@mdhaber: if the new integrator does better in this case, let's close this PR. I understood that its relative error was much lower anyway.
I wouldn't consider that a dealbreaker, since the result is very close to one and can just be clipped, but: […]
But I'm happy with this being closed, since this will already speed things up quite a bit in pure Python. Besides […]:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats, interpolate
import scipy.stats.sampling as sampling

a, b, c, z = 13.8, 3.12, 2.51, 5.18
f = lambda x: stats.gausshyper._pdf(x, a, b, c, z)

def integrate(f):
    # build a monotone interpolant of the PDF, then antidifferentiate it
    x = np.linspace(0, 1, 1000)
    f = interpolate.PchipInterpolator(x, f(x), axis=-1)
    F = f.antiderivative()
    return F

F = integrate(f)
x = np.linspace(0, 1, 200)
res = F(x)
ref = stats.gausshyper.cdf(x, a, b, c, z)
plt.plot(x, ref, '-', label='original')
plt.plot(x, res, '--', label='interpolant')
plt.legend()
```

Even with the cost of setting up the interpolant, this is a faster way of getting the CDF at a few points.

```
%timeit F = integrate(f); F(x)
# 898 µs ± 153 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit stats.gausshyper.cdf(x, a, b, c, z)
# 35.6 ms ± 836 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

This is for evaluating at 200 points, but even for a single point, `gausshyper.cdf` currently takes about 0.5 ms. And of course, once the interpolant has been generated, it's super fast to compute with and reasonably accurate.

```
%timeit F(x)
# 19.1 µs ± 6.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
np.max(abs(F(x) - stats.gausshyper.cdf(x, a, b, c, z)))
# 6.080010717113282e-09
```

And the accuracy is something that the user can specify; the tradeoff is just additional evaluations of the PDF.

Update: I developed these ideas a bit further in https://gist.github.com/mdhaber/b112732e5b19889a5589cfdce7e055c9. It's promising, so @tupui and I might work on that after the main infrastructure improvements. For large draws, that has the potential to be faster than anything else discussed here.
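(For readers who want to try the precomputed-interpolant idea for sampling today: the snippet above imports `scipy.stats.sampling` but never uses it; its `NumericalInversePolynomial` generator already implements a setup-once, sample-fast inverse-CDF interpolant. A minimal sketch — an illustration of the same idea, not necessarily the approach in the gist:)

```python
import numpy as np
from scipy import stats
from scipy.stats import sampling

a, b, c, z = 13.8, 3.12, 2.51, 5.18

class GaussHyperPDF:
    # NumericalInversePolynomial only needs an object exposing a pdf method
    # (it need not even be normalized)
    def pdf(self, x):
        return stats.gausshyper._pdf(x, a, b, c, z)

# one-time setup cost: builds a polynomial interpolant of the inverse CDF
gen = sampling.NumericalInversePolynomial(
    GaussHyperPDF(), domain=(0, 1), random_state=np.random.default_rng())

samples = gen.rvs(size=100_000)     # sampling is then just interpolant evaluation
quantiles = gen.ppf([0.1, 0.5, 0.9])  # and so is the PPF
```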
I vectorized `_tanhsinh` (the vectorized version appears as `_tanhsinh2` below):

```
Time for 1000 CDF values (main): 1.0867439999710768
Time for 1000 CDF values (_tanhsinh): 0.005081499926745892
Speedup: 213.8628388541589
gmean of relative error (quad): 7.961428032643411e-16
gmean of relative error (tanhsinh): 4.46206718903016e-16
max relative error (quad): 1.1235272457145405e-07
max relative error (tanhsinh): 1.445515587446478e-13
```

That's a factor of 213.86, compared to the 25.9 reported in the top post. Then for the random samples, using the scalar root finder from gh-17719:

```python
from scipy.optimize._zeros_py import _chandrupatla

rng = np.random.default_rng(8346584583456)
# _tanhsinh2 is the vectorized variant of _tanhsinh mentioned above
f = lambda x, p: _tanhsinh2(_pdf, 0, x).integral - p

tic = perf_counter()
p = rng.random(1000)
res = _chandrupatla(f, 0, 1, args=(p,))
toc = perf_counter()
print(f"Time for 1000 observations: {toc-tic}")
# Time for 1000 observations: 0.05970019998494536

tic = perf_counter()
samples = gh.rvs(size=1000)
toc = perf_counter()
print(f"Time for 1000 observations (main): {toc-tic}")
# Time for 1000 observations (main): 10.13877489999868
```

That's a factor of 169.8, compared to the 6.45 reported in the top post.

Re: #18799 (comment), it is certainly possible to beat these pure-Python solutions if the integrand and all the loops are brought into C++. So if that speed is needed, I'd be happy for it to happen, although personally I'd be more interested in gh-18056. I just thought I should mention the speedups we will get for free pretty soon, and the pure-Python solutions will be useful for execution with non-CPU arrays.
Looks like the CDF part might not be wanted then, but what about the PDF and other additions? This still seems to have value, no?
Doing the whole integration in Cython definitely seems unnecessary then. Regarding @tupui's comment, there are two things that can be done in my opinion: […]

Any opinions?
I'd suggest just doing the easy thing for now. The existing tests are (hopefully) strong enough to detect a mistake, so you shouldn't need any new tests if the intent is just simplification. It would be good to test locally to make sure there are no new numerical issues, but I have a hard time seeing how there would be, since you're replacing a ratio of […].

@tirthasheshpatel and @steppi are working on something that will make it even easier to do the harder thing, so I'd suggest waiting on that.
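(For context: the ratio being replaced is presumably Γ(a)Γ(b)/Γ(a+b), which equals the beta function B(a, b) exactly, so the substitution is an identity. A hedged sketch of what the simplified PDF could look like — the variable name and layout are illustrative, and the merged code may differ:)

```python
from scipy import special as sc

def _pdf(x, a, b, c, z):
    # B(a, b) = gamma(a)*gamma(b)/gamma(a + b); sc.beta avoids explicitly
    # forming a ratio of three potentially huge gamma values
    normalization = sc.beta(a, b) * sc.hyp2f1(c, a, a + b, -z)
    return x**(a - 1.0) * (1.0 - x)**(b - 1.0) / ((1.0 + z*x)**c * normalization)
```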
The short answer is not really, as-is. You can see that the PR (as-is) replaces a natively vectorized PDF with one that uses […].
I am really talking about the Cython part here, not the Python.
If the style is going to change, I'd prefer to keep it self-consistent (e.g. `1.` or `1.0` throughout). It would also be nice to avoid the line continuation, which is easy with a slightly shorter variable name (e.g. `normalization`). It would also be OK to leave them all alone to reduce the diff. But to avoid further delay, I'll merge once tests finish.
* ENH: stats.gausshyper.pdf: simplify implementation
Reference issue
Part of #10106 and #17832
What does this implement/fix?
There is no analytical expression available for the PPF of the gausshyper distribution. To make things worse, implementing the CDF is not possible without high-precision implementations of Appell's hypergeometric function (see #18056). That means the default random variate generation goes through numerical integration to calculate the CDF and then inverts it via a root finder. At least the quadrature can be sped up by providing a low-level callable of the PDF in Cython.
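(Schematically, that default integrate-then-invert path looks roughly like the following — a simplified illustration of the mechanism, not the actual generic machinery in `rv_continuous`:)

```python
import numpy as np
from scipy import integrate, optimize, stats

a, b, c, z = 1.5, 2.5, 2.0, 1.0

def cdf(x):
    # CDF via numerical integration of the PDF from the left end of the support
    return integrate.quad(lambda t: stats.gausshyper._pdf(t, a, b, c, z), 0, x)[0]

def ppf(q):
    # invert the CDF with a bracketing root finder on the support [0, 1]
    return optimize.brentq(lambda x: cdf(x) - q, 0, 1)

print(ppf(0.5))  # median via integrate-then-invert: one quad call per root-finder step
```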
I pretty much blindly copied the approach from the genhyperbolic distribution implemented in #18134.
Additional information
I did not further check the accuracy of the PDF in the tails; here I focused on performance only. Locally, this passed the generic distribution tests for me at least. Going via the logpdf could be done in a follow-up (see the sketch below).
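(Not part of this PR — a hedged sketch of what the logpdf follow-up could look like, using `betaln` and the `xlogy`/`xlog1py` helpers; the function name and details are illustrative:)

```python
import numpy as np
from scipy import special as sc

def _logpdf(x, a, b, c, z):
    # log of the normalization constant: log B(a, b) + log 2F1(c, a; a + b; -z)
    log_norm = sc.betaln(a, b) + np.log(sc.hyp2f1(c, a, a + b, -z))
    # xlogy/xlog1py behave gracefully at the x -> 0 and x -> 1 boundaries
    return (sc.xlogy(a - 1.0, x) + sc.xlog1py(b - 1.0, -x)
            - sc.xlog1py(c, z * x) - log_norm)
```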
Benchmark:

This PR:

```
Time for 1000 CDF values: 0.03378629684448242
Time for 1000 samples: 1.2594482898712158
```

Main:

```
Time for 1000 CDF values: 0.8763303756713867
Time for 1000 samples: 8.12756633758545
```

Sampling is still not "fast", but at least it is faster. Maybe with a vectorized root finder (#17719), the PPF calculation will become much faster in the future.