
Use cupy / numba to speed up computation? #48

Open
janpipek opened this issue Oct 3, 2018 · 6 comments

@janpipek
Owner

janpipek commented Oct 3, 2018

Use https://github.com/cupy/cupy as a computing backend? Given its API similarity to numpy, it should be trivial.

Investigate whether numba can lead to any speed-up.
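Because CuPy mirrors the NumPy API, a backend swap could in principle be a one-line import switch. A minimal sketch of that idea (the try/except fallback and the `xp` alias are illustrative conventions, not physt code; it falls back to NumPy when no GPU is available):

```python
import numpy as np

# Pick the GPU backend if CuPy is installed, otherwise plain NumPy.
try:
    import cupy as xp
except ImportError:
    xp = np

data = xp.asarray([0.1, 0.4, 0.4, 0.9])
counts, edges = xp.histogram(data, bins=2, range=(0.0, 1.0))

# Normalise back to a NumPy array regardless of backend
# (CuPy arrays expose .get() to copy device -> host).
counts = np.asarray(counts.get() if hasattr(counts, "get") else counts)
print(counts)  # [3 1]
```

The same call works unchanged on either backend, which is what makes the "trivial" claim plausible for the simple fixed-binning cases.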

@janpipek janpipek added this to the 0.4 milestone Oct 3, 2018
@janpipek janpipek self-assigned this Oct 3, 2018
@janpipek
Owner Author

janpipek commented Oct 3, 2018

https://github.com/astrofrog/fast-histogram also looks promising
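fast-histogram gets its speed by assuming regular (fixed-width) bins, which turns bin lookup into plain index arithmetic instead of a bin-edge search. A pure-NumPy sketch of that trick (the function name `regular_histogram2d` is made up for illustration; fast_histogram itself is compiled C):

```python
import numpy as np

def regular_histogram2d(x, y, bins, ranges):
    """Fixed-width 2D histogram via index arithmetic, in pure NumPy."""
    (x0, x1), (y0, y1) = ranges
    nx, ny = bins
    # Map each value directly to its bin index (no searchsorted needed)
    ix = np.floor((x - x0) * nx / (x1 - x0)).astype(np.int64)
    iy = np.floor((y - y0) * ny / (y1 - y0)).astype(np.int64)
    # Drop out-of-range points, then count flattened cell indices
    ok = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    flat = ix[ok] * ny + iy[ok]
    return np.bincount(flat, minlength=nx * ny).reshape(nx, ny)

x = np.array([-0.5, 0.5, 0.5, 2.0])  # last point falls outside the range
y = np.array([-0.5, 0.5, 0.5, 0.0])
H = regular_histogram2d(x, y, bins=(2, 2), ranges=((-1, 1), (-1, 1)))
print(H.sum())  # 3 -- the out-of-range point is dropped
```

This matches np.histogram2d on interior points; only values exactly on the upper edge are treated differently (np.histogram2d includes them in the last bin).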

@henryiii

henryiii commented Oct 31, 2018

By the way, I looked at speed a bit, and I think there's something to be gained! I had 128 arrays of 1M elements each that I needed 2D histograms from, and I had switched to physt to make it simple to rebin and sum them (which it did!). But I found that histogramming with numpy and then handing the result to physt gave 3x better performance, and redoing it in numba gave 8-9x on top of that. I hadn't tried fast-histogram before, but it looks pretty similar to the numba version.

Here are the timings. I originally ran this on a Xeon Phi, which has 64 very slow cores with 4x hyperthreading; obviously, none of these solutions uses it very well (to be honest, not much does). I reran it on my laptop too.

import math

import numba
import numpy as np
import fast_histogram
from physt import h2
from physt.histogram_nd import Histogram2D

bins = np.asarray((100, 100)).astype(np.int64)
ranges = np.asarray(((-1, 1), (-1, 1))).astype(np.float64)

edges = (np.linspace(*ranges[0, :], bins[0] + 1),
         np.linspace(*ranges[1, :], bins[1] + 1))

# This is 8-9 times faster than np.histogram.
# Passing an explicit signature to njit would precompile it.
# In theory, this could even be threaded (nogil!)
@numba.njit(nogil=True)
def fast_histogram2d(tracks, bins, ranges):
    H = np.zeros((bins[0], bins[1]), dtype=np.uint64)
    delta = 1 / ((ranges[:, 1] - ranges[:, 0]) / bins)
    for t in range(tracks.shape[1]):
        i = math.floor((tracks[0, t] - ranges[0, 0]) * delta[0])
        j = math.floor((tracks[1, t] - ranges[1, 0]) * delta[1])
        if 0 <= i < bins[0] and 0 <= j < bins[1]:
            H[i, j] += 1
    return H

vals = np.random.normal(size=[2, 1000000]).astype(np.float32)
%%timeit
h2(*vals, bins=edges)

Xeon Phi: 1.14 s ± 556 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
MacBook Pro: 293 ms ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
H, *ledges = np.histogram2d(*vals, bins=bins, range=ranges)
Histogram2D(ledges, H)

Xeon Phi: 437 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
MacBook Pro: 114 ms ± 7.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
H = fast_histogram.histogram2d(*vals, bins=100, range=((-1,1),(-1,1)))
Histogram2D(edges, H)

Xeon Phi: 53.7 ms ± 88.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
MacBook Pro: 10.4 ms ± 1.8 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
H = fast_histogram2d(vals, bins=bins, ranges=ranges)
Histogram2D(edges, H)

Xeon Phi: 41.6 ms ± 187 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
MacBook Pro: 10.2 ms ± 216 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

@henryiii

henryiii commented Oct 31, 2018

I also think it would be helpful to add the numpy-to-physt example to the docs, since there's a speed benefit.

@henryiii

henryiii commented Oct 31, 2018

Extra example: with numba you can also release the GIL, though for some reason the threaded speed-up only shows up on my MacBook or a normal Xeon:

from concurrent.futures import ThreadPoolExecutor
from functools import partial

%%timeit
splits = 4
with ThreadPoolExecutor(max_workers=splits) as pool:
    chunk = vals.shape[1] // splits
    chunks = (vals[:,i*chunk:(i+1)*chunk] for i in range(splits))
    f = partial(fast_histogram2d, bins=bins, ranges=ranges)
    results = pool.map(f, chunks)
    results = sum(results)
MacBook Pro: 4.23 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
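The snippet above relies on the numba kernel defined earlier. A self-contained variant of the same chunk-and-sum idea using np.histogram2d per chunk (note: plain NumPy mostly holds the GIL, so this sketch demonstrates the correctness of splitting, not the threading speed-up, which needs a nogil kernel like the numba one):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from functools import partial

rng = np.random.default_rng(0)
vals = rng.normal(size=(2, 100_000)).astype(np.float32)
bins = (100, 100)
ranges = ((-1.0, 1.0), (-1.0, 1.0))

def hist_chunk(chunk, bins, ranges):
    # Histogram one chunk; counts are additive over disjoint chunks
    H, _, _ = np.histogram2d(chunk[0], chunk[1], bins=bins, range=ranges)
    return H

splits = 4
chunks = np.array_split(vals, splits, axis=1)
f = partial(hist_chunk, bins=bins, ranges=ranges)
with ThreadPoolExecutor(max_workers=splits) as pool:
    H = sum(pool.map(f, chunks))
```

Summing the per-chunk histograms reproduces the single-pass result exactly, which is what makes this decomposition safe.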

@janpipek
Owner Author

janpipek commented Nov 1, 2018

Hi Henry,

Thanks a lot for the ideas. I know that physt is a bit slow (especially for cases that are trivial with numpy). In the new design (if it ever comes), I aim to distinguish the cases that require a special approach (various clever binnings) from those that don't (where the computation can be passed to numpy directly). By the way, a very basic kick-off for supporting various backends (and their prioritisation) has started in the branch https://github.com/janpipek/physt/tree/v04-engines/.
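The backend prioritisation mentioned above could look roughly like the following sketch. Everything here (`register_backend`, the `_BACKENDS` registry, the `histogram` dispatcher) is hypothetical for illustration, not the actual v04-engines API:

```python
import numpy as np

_BACKENDS = []

def register_backend(name, priority, can_handle, compute):
    """Register a histogramming backend; higher priority wins."""
    _BACKENDS.append((priority, name, can_handle, compute))
    _BACKENDS.sort(key=lambda entry: entry[0], reverse=True)

def histogram(data, **kwargs):
    # Try backends in priority order; fall through if one declines
    for _prio, name, can_handle, compute in _BACKENDS:
        if can_handle(data, **kwargs):
            return name, compute(data, **kwargs)
    raise RuntimeError("no backend available")

# numpy fallback: always applicable, lowest priority; a numba or
# fast-histogram backend would register itself with higher priority
# only for the regular-binning cases it can handle.
register_backend(
    "numpy", 0,
    can_handle=lambda data, **kw: True,
    compute=lambda data, **kw: np.histogram(data, **kw)[0],
)

name, counts = histogram(np.array([0.1, 0.2, 0.8]), bins=2, range=(0, 1))
print(name, counts)  # numpy [2 1]
```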

I'd like to incorporate numba as well (as an optional dependency - I don't want it as a hard one).

Unfortunately, I won't be able to touch the code (or even join a discussion about it) until the end of November. I am putting this issue in my internal queue and will get to it (and to you) immediately afterwards. Sorry for the unavoidable delay :-(

Cheers,
Jan

@henryiii

henryiii commented Nov 1, 2018

Here's my post on histogram performance: https://iscinumpy.gitlab.io/post/histogram-speeds-in-python/ (all examples eventually produce a Physt histogram)

@janpipek janpipek added this to Backlog in Core and API Redesign Dec 21, 2018
@janpipek janpipek removed this from the 0.5 milestone Jan 20, 2021