improve erf performance #4653

Merged
merged 2 commits into from Mar 21, 2015

juliantaylor (Contributor) commented Mar 21, 2015

As suggested by @pv, this allows the polynomial evaluations to be inlined. Combined with loop unrolling and replacing isnan with the compiler builtin, this improves performance considerably:

import numpy as np
import scipy.special as sps
r=np.random.rand(int(1e5))
%timeit -n 1000 -r 5 sps.erf(r)
%timeit -n 1000 -r 5 sps.erf(r + 2.)

before:

1000 loops, best of 5: 2.7 ms per loop
1000 loops, best of 5: 9.56 ms per loop

after:

1000 loops, best of 5: 1.17 ms per loop
1000 loops, best of 5: 6.42 ms per loop
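
For reference, the ratios implied by these timings can be worked out directly. The sketch below uses only the four numbers reported above; the helper names `speedup` and `time_reduction` are introduced here for illustration and are not part of the PR.

```python
# Timings reported above, in milliseconds per loop.
before = {"erf(r)": 2.7, "erf(r + 2.)": 9.56}
after = {"erf(r)": 1.17, "erf(r + 2.)": 6.42}


def speedup(t_before, t_after):
    """How many times faster the new code is (old time / new time)."""
    return t_before / t_after


def time_reduction(t_before, t_after):
    """Fraction of the original run time that was saved."""
    return 1.0 - t_after / t_before


for name in before:
    s = speedup(before[name], after[name])
    r = time_reduction(before[name], after[name])
    print(f"{name}: {s:.2f}x faster, {r:.0%} less time per loop")
```

On these figures, erf(r) runs a little more than twice as fast, and erf(r + 2.) takes roughly a third less time per loop.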

…ndtr

Allows GCC to inline the constant-size polynomial evaluations.
Unrolling the polynomial evaluations and replacing isnan with the builtin
roughly doubles the speed of erf(|x| < 1) and speeds up erfc by about 30%.
Unfortunately, GCC does not replace isnan with the builtin automatically.
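
To illustrate what unrolling a constant-size polynomial evaluation means, here is a minimal Python sketch of Horner's rule written once as a generic loop and once with the loop written out for a fixed degree. It is only an illustration: the actual change is in Cephes' C code and relies on the compiler, and the function names and coefficients below are hypothetical, not taken from the PR.

```python
def horner_loop(x, coef):
    """Horner's rule with a generic loop; coef[0] is the highest-degree term."""
    ans = coef[0]
    for c in coef[1:]:
        ans = ans * x + c
    return ans


def horner_unrolled_deg4(x, coef):
    """The same evaluation for a fixed degree-4 polynomial, loop written out.

    When the number of coefficients is known at compile time, a C compiler
    can produce this straight-line form itself once the call is inlined.
    """
    c0, c1, c2, c3, c4 = coef
    return (((c0 * x + c1) * x + c2) * x + c3) * x + c4


# Hypothetical coefficients: 2x^4 - 3x^3 + 0.5x^2 + x - 7
COEF = (2.0, -3.0, 0.5, 1.0, -7.0)

print(horner_loop(0.5, COEF), horner_unrolled_deg4(0.5, COEF))
```

Both functions perform the same multiply-add sequence in the same order, so they return identical results; the unrolled form simply has no loop counter or branch.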
#ifndef cephes_isnan
#define cephes_isnan(x) npy_isnan(x)
#endif

Contributor Author

This might be worthwhile in many more places than just ndtr; it might be a candidate for the main header.

@pv pv merged commit f9e381b into scipy:master Mar 21, 2015
pv added a commit that referenced this pull request Mar 21, 2015
Allow inlining polynomials, combined with unrolling and isnan replace.
Do this for all Cephes functions.
pv (Member) commented Mar 21, 2015

Thanks, merged. I also moved the unroll + builtin-isnan part so that it applies to the whole of Cephes.
It didn't seem to cause slowdowns, and at least the gamma function was sped up by 30%.

@pv pv added this to the 0.16.0 milestone Mar 21, 2015
@ewmoore ewmoore added the enhancement (A new feature or improvement) label Mar 21, 2015
@juliantaylor juliantaylor deleted the erf-improve branch August 30, 2015 11:18
rgommers (Member) commented Aug 7, 2022

I ran into this when cleaning up _c99compat.h (see gh-16800), and tested whether this still helps - it indeed does, so I didn't touch it:

In [2]: import numpy as np
   ...: import scipy.special as sps
   ...: r=np.random.rand(int(1e5))
   ...: %timeit -n 1000 -r 5 sps.erf(r)
   ...: %timeit -n 1000 -r 5 sps.erf(r + 2.)
395 µs ± 20.3 µs per loop (mean ± std. dev. of 5 runs, 1,000 loops each)
1.74 ms ± 31.1 µs per loop (mean ± std. dev. of 5 runs, 1,000 loops each)

With the unroll pragma removed:

517 µs ± 7.35 µs per loop (mean ± std. dev. of 5 runs, 1,000 loops each)
1.87 ms ± 25.7 µs per loop (mean ± std. dev. of 5 runs, 1,000 loops each)

Without using __builtin_isnan at all:

723 µs ± 4.76 µs per loop (mean ± std. dev. of 5 runs, 1,000 loops each)
2.04 ms ± 7.47 µs per loop (mean ± std. dev. of 5 runs, 1,000 loops each)
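
To rerun this kind of comparison outside IPython, the `%timeit -n 1000 -r 5` calls can be approximated with the standard-library `timeit` module. The sketch below is an assumption-laden stand-in: it times `math.erf` over a plain list so that it runs without NumPy or SciPy, and the helper `best_of` is a name introduced here. To reproduce the measurements above, the two workload functions would call `sps.erf(r)` and `sps.erf(r + 2.)` instead.

```python
import math
import timeit


def best_of(func, number=1000, repeat=5):
    """Best per-call time in seconds over `repeat` runs of `number` calls each."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


# Stand-in inputs in [0, 1), mirroring the range of np.random.rand (assumption).
xs = [i / 100.0 for i in range(100)]


def erf_small():
    # Arguments below 1, analogous to sps.erf(r).
    return [math.erf(x) for x in xs]


def erf_shifted():
    # Arguments in [2, 3), analogous to sps.erf(r + 2.).
    return [math.erf(x + 2.0) for x in xs]


if __name__ == "__main__":
    for f in (erf_small, erf_shifted):
        print(f"{f.__name__}: {best_of(f) * 1e6:.1f} us per loop")
```

Note that this reports the best of five runs, as older IPython versions did, whereas the output above shows the mean and standard deviation, so the numbers are not directly comparable.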

Labels: enhancement (A new feature or improvement), scipy.special