Allow multiple outputs for guvectorize on CUDA target #8303

Closed · s-m-e opened this issue Aug 1, 2022 · 2 comments · Fixed by #8341
Labels: CUDA (CUDA related issue/PR), feature_request

@s-m-e commented Aug 1, 2022

Feature request

The following works just fine for targets `cpu` and `parallel`:

```python
from numba import guvectorize
import numpy as np

@guvectorize(
    'void(f8[:],f8[:],f8[:],f8[:],f8[:],f8[:])',
    '(n),(n),(n)->(n),(n),(n)',
    target='cpu',
    nopython=True,
)
def foo(a, b, c, x, y, z):
    for idx in range(a.shape[0]):
        x[idx], y[idx], z[idx] = a[idx] * b[idx], b[idx] * c[idx], c[idx] * a[idx]

LEN = 100_000_000

data = np.arange(0, 3 * LEN, dtype='f8').reshape(3, LEN)
res = np.zeros_like(data)

foo(data[0], data[1], data[2], res[0], res[1], res[2])  # providing views on `res`

print(res)
```

However, it fails to compile for target `cuda` with `AssertionError: only support 1 output`. It would be really useful if this also worked on CUDA.

As far as I can tell, this restriction is currently not documented.
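
Until then, one possible workaround is to pack the three results into a single 2-D output, since a single output of higher rank compiles fine on the CUDA target. Below is a minimal sketch (not from the original report): `foo_packed` and the dummy `dims` input are illustrative names, and `dims` exists only so the layout can name the output's leading dimension `m`.

```python
from numba import guvectorize
import numpy as np

# Workaround sketch: fold the three outputs into one (m, n) output.
# The dummy input `dims` only supplies the size m of the leading output axis.
@guvectorize(
    'void(f8[:],f8[:],f8[:],f8[:],f8[:,:])',
    '(n),(n),(n),(m)->(m,n)',
    target='cuda',
)
def foo_packed(a, b, c, dims, out):
    for idx in range(a.shape[0]):
        out[0, idx] = a[idx] * b[idx]
        out[1, idx] = b[idx] * c[idx]
        out[2, idx] = c[idx] * a[idx]

LEN = 1_000

data = np.arange(0, 3 * LEN, dtype='f8').reshape(3, LEN)
res = np.zeros_like(data)

foo_packed(data[0], data[1], data[2], np.zeros(3), res)
print(res)  # rows hold a*b, b*c, c*a
```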


For a longer discussion, context, and examples, see here.

@guilhermeleobas (Contributor)

Hi @s-m-e, thanks for the report. I'll put this up for discussion at the next triage meeting.

gmarkall added the CUDA (CUDA related issue/PR) label on Aug 3, 2022
gmarkall added a commit to gmarkall/numba that referenced this issue Aug 3, 2022
Testing with this example from numba#8303:

```python
from numba import guvectorize, jit, cuda
import numpy as np

TARGET = 'cuda'
assert TARGET in ('cpu', 'parallel', 'cuda')

if TARGET == 'cuda':
    hjit = cuda.jit
    hkwargs = dict(device=True, inline=True)
else:
    hjit = jit
    hkwargs = dict(nopython=True, inline='always')

def _parse_signature(s):
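    # Expand shorthand types: 'M' is a 3x3 matrix of 'V', 'V' is a 3-tuple
    # of 'f', and 'f' is float64. Replacement order matters: 'M' before 'V',
    # and 'V' before 'f'.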
    s = s.replace('M', 'Tuple([V,V,V])')
    s = s.replace('V', 'Tuple([f,f,f])')
    return s.replace('f', 'f8')

@hjit(_parse_signature('V(V,M)'), **hkwargs)
def matmul_VM_(a, b):
    return (
        a[0] * b[0][0] + a[1] * b[1][0] + a[2] * b[2][0],
        a[0] * b[0][1] + a[1] * b[1][1] + a[2] * b[2][1],
        a[0] * b[0][2] + a[1] * b[1][2] + a[2] * b[2][2],
    )

@guvectorize(
    _parse_signature('void(f[:],f[:],f[:],f[:],f[:],f[:])'),
    '(n),(n),(n)->(n),(n),(n)',  # "layout" in numba terms
    target=TARGET,
    nopython=True,
)
def foo(a, b, c, x, y, z):
    R = (
        (0.2, 0.8, 0.3),
        (0.3, 0.5, 0.6),
        (0.4, 0.1, 0.8),
    )
    for idx in range(a.shape[0]):
        x[idx], y[idx], z[idx] = matmul_VM_((a[idx], b[idx], c[idx]), R)

LEN = 100_000_000

data = np.arange(0, 3 * LEN, dtype='f8').reshape(3, LEN)
res = np.zeros_like(data)

foo(data[0], data[1], data[2], res[0], res[1], res[2])
print(res)
```

There are still some mismatches to fix up.
@gmarkall (Member) commented Aug 3, 2022

@s-m-e I'm starting to work on this on this branch: https://github.com/gmarkall/numba/tree/issue-8303

Presently I'm just hacking away to see what needs changing ("brute force" modifications 🙂) because I'm not too familiar with the code in this area. Hopefully it will just need a bunch of small fixes to turn the assumption of a single output into iterations over multiple outputs in various locations.
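
For anyone following along, the shape of that change is roughly the following. This is a hypothetical illustration with made-up helper names, not Numba's actual internals:

```python
import numpy as np

# Old pattern: a hard-wired single-output assumption.
def allocate_outputs_single(out_dtypes, shape):
    assert len(out_dtypes) == 1, 'only support 1 output'
    return np.empty(shape, dtype=out_dtypes[0])

# Generalized pattern: iterate over however many outputs the gufunc
# layout declares and handle each one.
def allocate_outputs_multi(out_dtypes, shape):
    return [np.empty(shape, dtype=dt) for dt in out_dtypes]
```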
