
use numba rather than C++ for fast polymer calculations #103

Merged · 7 commits into master from polymer_numba on Feb 9, 2021

Conversation

@pkienzle (Member) commented Feb 8, 2021

Note: similar to code from https://github.com/richardsheridan/sf_nscf/blob/3e72e5f3777928e8eb844f93bd48549bca8c9041/util.py#L185

The numba version of compose made no difference to the speed of the tests, so I did not include it.

This version drops the C++ entirely, falling back to the numpy version if numba is not available.
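For context, a minimal sketch of that fallback pattern (the names here are illustrative, not necessarily the module's actual contents):

import numpy as np

# Prefer numba when it is installed; otherwise use the pure-numpy kernel.
try:
    from numba import njit
    USE_NUMBA = True
except ImportError:
    USE_NUMBA = False

def calc_g_zs_numpy(g_z, lambda_array, segments):
    # Vectorized numpy fallback for the chain propagator recurrence.
    g_zs = np.empty((g_z.shape[0], segments))
    g_zs[:, 0] = pg_zs = g_z  # initial condition assumed here
    for r in range(1, segments):
        g_zs[:, r] = pg_zs = np.correlate(pg_zs, lambda_array, 'same') * g_z
    return g_zs

When numba is importable, a loop-based kernel decorated with @njit is bound in place of the numpy one (see the sketch further down the thread).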

@richardsheridan (Contributor) left a comment:

If you merge this, you should consider rewriting reflmodule with numba as well in light of #100.

Thanks for pinging me on this!

LAMBDA_ARRAY = np.array([LAMBDA_1, LAMBDA_0, LAMBDA_1])
MINLAT = 25
MINBULK = 5
SQRT_PI = sqrt(pi)

def correlate_same(a, b):
    return np.correlate(a, b, 'same')
@richardsheridan (Contributor) commented:

Looking up this correlation mode was absurdly expensive in the pure-Python path back in the day; that's why I was selecting it with the weird internal int. See numpy/numpy#4999 and consider using multiarray.correlate, based on timings.

@pkienzle (Member, Author) replied:

I didn't notice significant timing differences when I chose one versus the other, so I went with the newer interface.

Yes, the multiarray.correlate function is faster when I test it alone:

In [187]: %timeit np.convolve(x, y, 'same')
5.4 µs ± 32.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [188]: %timeit np.correlate(x, y, 'same')
3.05 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [189]: %timeit np.core.multiarray.correlate(x, y, 1)
1.5 µs ± 27 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Trying again, I now see the times dropping significantly (5.7s down to 4.0s), so I'll revert to the deprecated function.
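For reference, the reverted helper would then look something like this (a sketch; the integer 1 is numpy's internal code for 'same' mode, and np.core.multiarray.correlate is a deprecated interface that may disappear in a future numpy release):

import numpy as np

def correlate_same(a, b):
    # Bypass the python-level mode-string lookup (see numpy/numpy#4999);
    # mode 1 selects 'same' directly.
    return np.core.multiarray.correlate(a, b, 1)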

except ImportError:
    USE_NUMBA = False

USE_NUMBA = True # Uncomment when doing timing tests
@richardsheridan (Contributor) commented:

I guess this should be commented?
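Presumably the intent was for the override to ship commented out, something like:

except ImportError:
    USE_NUMBA = False

#USE_NUMBA = True  # Uncomment when doing timing tests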

for r in range(1, segments):
    g_zs[:, r] = pg_zs = correlate(pg_zs, LAMBDA_ARRAY, 1) * g_z
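A self-contained sketch of what a numba-jitted version of this loop can look like, with the 3-point correlation written as explicit loops so numba can compile it (the function name, initial condition, and zero-padded boundary handling are assumptions here):

import numpy as np
from numba import njit

@njit(cache=True)
def calc_g_zs_numba(g_z, lambda_0, lambda_1, segments):
    # Each step correlates the previous column with [lambda_1, lambda_0,
    # lambda_1] (zero padding at the edges, matching 'same' mode), then
    # multiplies pointwise by the free-segment weight g_z.
    layers = g_z.shape[0]
    g_zs = np.empty((layers, segments))
    g_zs[:, 0] = g_z  # initial condition assumed here
    for r in range(1, segments):
        for z in range(layers):
            acc = lambda_0 * g_zs[z, r - 1]
            if z > 0:
                acc += lambda_1 * g_zs[z - 1, r - 1]
            if z < layers - 1:
                acc += lambda_1 * g_zs[z + 1, r - 1]
            g_zs[z, r] = acc * g_z[z]
    return g_zs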
try:
    from numba import njit, prange
@richardsheridan (Contributor) commented:

I'd be interested to know whether prange helps for large numbers of segments. It seems like it could lead to cache contention in most cases, though.

@pkienzle (Member, Author) replied:

prange made no significant difference in my tests, so I stopped using it.
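For the record, since the r-loop carries a sequential dependence, prange could only be applied to the per-layer update; a sketch of that variant (assumed, not the merged code):

from numba import njit, prange

@njit(parallel=True, cache=True)
def propagate_step(prev, g_z, out, lambda_0, lambda_1):
    # One step of the recurrence with the layer loop distributed across
    # threads; for small layer counts the threading overhead and cache
    # contention can easily cancel any gain.
    n = prev.shape[0]
    for z in prange(n):
        acc = lambda_0 * prev[z]
        if z > 0:
            acc += lambda_1 * prev[z - 1]
        if z < n - 1:
            acc += lambda_1 * prev[z + 1]
        out[z] = acc * g_z[z]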

@bmaranville merged commit 0b82a51 into master on Feb 9, 2021
@pkienzle deleted the polymer_numba branch on May 3, 2022