This page describes the CUDA ufunc-like object.
To support the programming pattern of CUDA programs, CUDA Vectorize and GUVectorize cannot produce a conventional ufunc. Instead, they return a ufunc-like object. This object is a close analog of a regular NumPy ufunc, but it is not fully compatible with one. The CUDA ufunc adds support for passing device arrays (arrays that are already in GPU memory), which reduces traffic over the PCI Express bus. It also accepts a stream keyword argument for launching in asynchronous mode.
    import math
    from numba import vectorize, cuda
    import numpy as np

    @vectorize(['float32(float32, float32, float32)',
                'float64(float64, float64, float64)'],
               target='cuda')
    def cu_discriminant(a, b, c):
        return math.sqrt(b ** 2 - 4 * a * c)

    N = 10000
    dtype = np.float32

    # prepare the input
    A = np.array(np.random.sample(N), dtype=dtype)
    B = np.array(np.random.sample(N) + 10, dtype=dtype)
    C = np.array(np.random.sample(N), dtype=dtype)

    D = cu_discriminant(A, B, C)

    print(D)  # print result
All CUDA ufunc kernels have the ability to call other CUDA device functions:
    from numba import vectorize, cuda

    # define a device function
    @cuda.jit('float32(float32, float32, float32)', device=True, inline=True)
    def cu_device_fn(x, y, z):
        return x ** y / z

    # define a ufunc that calls our device function
    @vectorize(['float32(float32, float32, float32)'], target='cuda')
    def cu_ufunc(x, y, z):
        return cu_device_fn(x, y, z)
Generalized ufuncs may be executed on the GPU using CUDA, analogous to the CUDA ufunc functionality. This may be accomplished as follows:
    from numba import guvectorize

    @guvectorize(['void(float32[:,:], float32[:,:], float32[:,:])'],
                 '(m,n),(n,p)->(m,p)', target='cuda')
    def matmulcore(A, B, C):
        ...