Allow multiple outputs for guvectorize on CUDA target (#8303)
Hi @s-m-e, thanks for the report. I'll put this up for discussion at the next triage meeting.
Testing with this example from numba#8303:

```python
from numba import guvectorize, jit, cuda
import numpy as np

TARGET = 'cuda'
assert TARGET in ('cpu', 'parallel', 'cuda')

if TARGET == 'cuda':
    hjit = cuda.jit
    hkwargs = dict(device=True, inline=True)
else:
    hjit = jit
    hkwargs = dict(nopython=True, inline='always')

def _parse_signature(s):
    s = s.replace('M', 'Tuple([V,V,V])')
    s = s.replace('V', 'Tuple([f,f,f])')
    return s.replace('f', 'f8')

@hjit(_parse_signature('V(V,M)'), **hkwargs)
def matmul_VM_(a, b):
    return (
        a[0] * b[0][0] + a[1] * b[1][0] + a[2] * b[2][0],
        a[0] * b[0][1] + a[1] * b[1][1] + a[2] * b[2][1],
        a[0] * b[0][2] + a[1] * b[1][2] + a[2] * b[2][2],
    )

@guvectorize(
    _parse_signature('void(f[:],f[:],f[:],f[:],f[:],f[:])'),
    '(n),(n),(n)->(n),(n),(n)',  # "layout" in numba terms
    target=TARGET,
    nopython=True,
)
def foo(a, b, c, x, y, z):
    R = (
        (0.2, 0.8, 0.3),
        (0.3, 0.5, 0.6),
        (0.4, 0.1, 0.8),
    )
    for idx in range(a.shape[0]):
        x[idx], y[idx], z[idx] = matmul_VM_((a[idx], b[idx], c[idx]), R)

LEN = 100_000_000
data = np.arange(0, 3 * LEN, dtype='f8').reshape(3, LEN)
res = np.zeros_like(data)

foo(data[0], data[1], data[2], res[0], res[1], res[2])

print(res)
```

There are still some mismatches to fix up yet.
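For reference, the per-element operation in the reproducer above is a row-vector times fixed 3×3 matrix product. A plain-NumPy sketch of the same computation (an illustrative cross-check added here for clarity, not part of the original report; `LEN` is shrunk for readability):

```python
import numpy as np

# Plain-NumPy equivalent of the guvectorize kernel above: each column
# (a[i], b[i], c[i]) of `data` is treated as a row vector and multiplied
# by the fixed 3x3 matrix R, giving the column (x[i], y[i], z[i]) of `res`.
R = np.array([
    [0.2, 0.8, 0.3],
    [0.3, 0.5, 0.6],
    [0.4, 0.1, 0.8],
])

LEN = 5  # small size for illustration; the report uses 100_000_000
data = np.arange(0, 3 * LEN, dtype='f8').reshape(3, LEN)

res = (data.T @ R).T  # shape (3, LEN); rows correspond to x, y, z

print(res)
```

This is useful for checking that the gufunc result matches on any target.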
@s-m-e I've started working on this on this branch: https://github.com/gmarkall/numba/tree/issue-8303 Presently I'm just hacking away to see what needs changing ("brute force" modifications 🙂) because I'm not too familiar with the code in this area. Hopefully it will just need a bunch of small fixes to turn the assumption of a single output into iteration over multiple outputs in various locations.
Feature request

The following works just fine for targets `cpu` and `parallel` ... but fails to compile with `AssertionError: only support 1 output` for target `cuda`. It would be really useful if this also worked for CUDA. In the meantime, the current behavior is, as far as I can tell, not documented.

For a longer discussion, context and examples, see here.
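As background (an added note, not part of the original report): NumPy itself already ships ufuncs with more than one output, e.g. `np.divmod`, and the `'(n),(n),(n)->(n),(n),(n)'` layout in the reproducer generalizes that pattern to gufuncs. The request is for the CUDA target to accept this multiple-output calling convention too:

```python
import numpy as np

# np.divmod is a built-in ufunc with two outputs -- the same
# multiple-output calling convention that guvectorize supports on the
# 'cpu' and 'parallel' targets and that this issue requests for 'cuda'.
q, r = np.divmod(np.arange(7), 3)

# q holds the elementwise quotients, r the elementwise remainders.
print(q, r)
```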