Use cupy / numba to speed up computation? #48
Comments
https://github.com/astrofrog/fast-histogram also looks promising
By the way, I looked at speed a bit, and I think there's a bit to be gained! I had 128 1M-element arrays I needed 2D histograms from, and I had switched to Physt to make it simple to rebin and sum them (which it did!). But I found that computing the histogram with numpy and then giving it to Physt gave 3x better performance. I then redid it in numba and got 8-9x more performance. I hadn't tried fast-histogram, but it looks pretty similar to the numba approach. Here are the timings. I originally did this on a Xeon Phi, which has 64 very slow cores and 4x hyperthreading. Obviously, none of these solutions is using it very well (to be honest, not much does). I reran it on my laptop too.

```python
import numpy as np
import numba
import math
from physt import h2
import fast_histogram
from physt.histogram_nd import Histogram2D

bins = (100, 100)
ranges = ((-1, 1), (-1, 1))

bins = np.asarray(bins).astype(np.int64)
ranges = np.asarray(ranges).astype(np.float64)

edges = (np.linspace(*ranges[0, :], bins[0] + 1),
         np.linspace(*ranges[1, :], bins[1] + 1))
```

```python
# This is 8-9 times faster than np.histogram
# Giving an explicit signature here would let numba precompile this
# In theory, this could even be threaded (nogil!)
@numba.njit(nogil=True)
def fast_histogram2d(tracks, bins, ranges):
    H = np.zeros((bins[0], bins[1]), dtype=np.uint64)
    delta = 1 / ((ranges[:, 1] - ranges[:, 0]) / bins)
    for t in range(tracks.shape[1]):
        i = math.floor((tracks[0, t] - ranges[0, 0]) * delta[0])
        j = math.floor((tracks[1, t] - ranges[1, 0]) * delta[1])
        if 0 <= i < bins[0] and 0 <= j < bins[1]:
            H[i, j] += 1
    return H

vals = np.random.normal(size=[2, 1000000]).astype(np.float32)
```

```python
%%timeit
h2(*vals, bins=edges)
```

```python
%%timeit
H, *ledges = np.histogram2d(*vals, bins=bins, range=ranges)
Histogram2D(ledges, H)
```

```python
%%timeit
H = fast_histogram.histogram2d(*vals, bins=100, range=((-1, 1), (-1, 1)))
Histogram2D(edges, H)
```

```python
%%timeit
H = fast_histogram2d(vals, bins=bins, ranges=ranges)
Histogram2D(edges, H)
```
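For readers who want to sanity-check the binning logic without a numba install, here is a sketch of the same kernel in pure, vectorized NumPy (`hist2d_numpy` is an illustrative name, not part of any package; the loop is replaced by `np.add.at`, which is an assumption of this sketch rather than anything from the thread):

```python
import numpy as np

def hist2d_numpy(tracks, bins, ranges):
    """Vectorized NumPy equivalent of the numba kernel above."""
    delta = 1 / ((ranges[:, 1] - ranges[:, 0]) / bins)
    i = np.floor((tracks[0] - ranges[0, 0]) * delta[0]).astype(np.int64)
    j = np.floor((tracks[1] - ranges[1, 0]) * delta[1]).astype(np.int64)
    # Keep only points that fall inside the histogram range
    mask = (i >= 0) & (i < bins[0]) & (j >= 0) & (j < bins[1])
    H = np.zeros((bins[0], bins[1]), dtype=np.uint64)
    # Unbuffered accumulation handles repeated (i, j) pairs correctly
    np.add.at(H, (i[mask], j[mask]), 1)
    return H
```

For points away from bin edges this agrees with `np.histogram2d`; exactly on an edge, the floor-based formula and numpy's edge comparison can disagree by one bin due to floating point, so treat it as a correctness check rather than a drop-in replacement.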
I also think it would be helpful to add the numpy-to-Physt example to the docs, since there's a speed benefit.
Extra example: with numba you can also release the GIL, though for some reason it only works on my MacBook or a normal Xeon:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
```

```python
%%timeit
splits = 4
with ThreadPoolExecutor(max_workers=splits) as pool:
    chunk = vals.shape[1] // splits
    chunks = (vals[:, i * chunk:(i + 1) * chunk] for i in range(splits))
    f = partial(fast_histogram2d, bins=bins, ranges=ranges)
    results = pool.map(f, chunks)
results = sum(results)
```
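The split / map / sum structure above is a generic parallel reduction. Here is a minimal stdlib-only sketch of the same shape with a trivial stand-in kernel (the names `count_in_range` and `parallel_count` are hypothetical); note that plain Python workers still hold the GIL, so unlike the nogil numba kernel this illustrates only the reduction pattern, not a speedup:

```python
from concurrent.futures import ThreadPoolExecutor

def count_in_range(chunk, lo=-1.0, hi=1.0):
    # Stand-in for the per-chunk histogram kernel
    # (a nogil numba function in the real example above)
    return sum(1 for v in chunk if lo <= v < hi)

def parallel_count(data, splits=4):
    chunk = len(data) // splits
    chunks = [data[i * chunk:(i + 1) * chunk] for i in range(splits)]
    chunks[-1] = data[(splits - 1) * chunk:]  # fold any remainder into the last chunk
    with ThreadPoolExecutor(max_workers=splits) as pool:
        # Reduce the per-chunk partial results, just like sum(results)
        return sum(pool.map(count_in_range, chunks))
```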
Hi Henry, thanks a lot for the ideas. I'd like to incorporate numba as well (as an optional dependency only; I don't want it as a hard dependency). Unfortunately, I won't be able to touch the code until the end of November (or even attend a discussion about it). I am putting this issue in my internal queue and will get to it (and you) immediately afterwards. Sorry for the necessary delay :-( Cheers,
Here's my post on histogram performance: https://iscinumpy.gitlab.io/post/histogram-speeds-in-python/ (all examples eventually produce a Physt histogram)
Use https://github.com/cupy/cupy as a computing backend? Given its API similarity to numpy, this should be trivial.
Investigate whether numba can lead to any speed-up.
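A backend-agnostic wrapper might look like the sketch below. This is only an illustration, not physt API: it assumes `cupy.histogram2d` (available since CuPy v8 and mirroring numpy's signature), and `histogram2d_backend` is a hypothetical name. It falls back to NumPy when CuPy or a GPU is unavailable:

```python
import numpy as np

try:
    import cupy as xp  # near drop-in replacement for numpy
    xp.cuda.runtime.getDeviceCount()  # raises if no GPU is available
except Exception:
    xp = np  # graceful fallback: same API, CPU execution

def histogram2d_backend(x, y, bins, hist_range):
    """Compute a 2D histogram on whichever backend is available."""
    H, xedges, yedges = xp.histogram2d(xp.asarray(x), xp.asarray(y),
                                       bins=bins, range=hist_range)
    if xp is not np:
        # Bring results back to host memory for physt / plotting
        H, xedges, yedges = (xp.asnumpy(a) for a in (H, xedges, yedges))
    return H, xedges, yedges
```

The `xp` alias is the usual convention for numpy/cupy-agnostic code; the caller never needs to know which backend ran.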