In [1]:
import cupy as cp
import numpy as np

In [2]:
squared_diff = cp.ElementwiseKernel(
    'float32 x, float32 y',   # input parameters
    'float32 z',              # output parameter
    'z = (x - y) * (x - y)',  # loop body, applied element by element
    'squared_diff')           # kernel name
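The first two arguments of `ElementwiseKernel` are comma-separated lists of `type name` pairs (inputs, then outputs), the third is the loop body and the fourth is the kernel name. The sketch below is a hypothetical helper, `parse_params`, written only to illustrate that string format on the CPU. It is not part of CuPy, it does not handle modifiers such as `raw`, and it flags one-character types using the placeholder rule described under *Type-generic kernels*.

```python
def parse_params(spec):
    """Split a spec such as 'float32 x, float32 y' into (type, name, is_placeholder).

    Toy illustration only (hypothetical helper, not a CuPy API): a
    one-character type name is flagged as a type placeholder.
    """
    params = []
    for item in spec.split(','):
        type_name, name = item.split()
        params.append((type_name, name, len(type_name) == 1))
    return params

print(parse_params('float32 x, float32 y'))
print(parse_params('T z'))
```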

In [6]:
x = cp.arange(10, dtype=np.float32).reshape(2, 5)
y = cp.arange(5, dtype=np.float32)
(x - y)**2  # same computation with ordinary array operations, for comparison

array([[ 0.,  0.,  0.,  0.,  0.],
       [25., 25., 25., 25., 25.]], dtype=float32)

The kernel defined above can be called with either scalars or arrays, and array arguments are broadcast against each other. It gives the same result as the expression above:

In [5]:
squared_diff(x, y)

array([[ 0.,  0.,  0.,  0.,  0.],
       [25., 25., 25., 25., 25.]], dtype=float32)
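The result shape follows NumPy's broadcasting rules, which CuPy mirrors: shapes are compared from the trailing dimension, and a missing or size-1 dimension is stretched. The NumPy-only sketch below checks the shapes from the call above, plus the case of a scalar second argument, and builds a float32 CPU reference that a GPU result could be compared against. It runs on the CPU and does not call the kernel.

```python
import numpy as np

# Shapes from the example: x is (2, 5), y is (5,).
print(np.broadcast_shapes((2, 5), (5,)))
# A scalar behaves like a 0-d array and broadcasts to the other shape.
print(np.broadcast_shapes((2, 5), ()))

# CPU reference for the same computation, in float32 like the kernel.
x = np.arange(10, dtype=np.float32).reshape(2, 5)
y = np.arange(5, dtype=np.float32)
ref = (x - y) ** 2
print(ref.dtype, ref)
```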

Output arrays can also be passed explicitly, after the input arguments:

In [8]:
z = cp.empty((2, 5), dtype=np.float32)
squared_diff(x, y, z)
print(z)

[[ 0.  0.  0.  0.  0.]
 [25. 25. 25. 25. 25.]]
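Passing a preallocated output lets the same buffer be reused across calls instead of allocating a new array each time. The sketch below shows the analogous pattern in NumPy using the `out=` keyword of ufuncs; it is a CPU illustration of the idea, not of the kernel call itself. As in the cell above, the output array is given the broadcast shape (2, 5).

```python
import numpy as np

x = np.arange(10, dtype=np.float32).reshape(2, 5)
y = np.arange(5, dtype=np.float32)

# Allocate the output once; both ufunc calls below write into it in place.
z = np.empty((2, 5), dtype=np.float32)
np.subtract(x, y, out=z)   # z = x - y, with y broadcast over the rows
np.square(z, out=z)        # z = z * z
print(z)
```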


### Type-generic kernels

If a type specifier is a single character, it is treated as a type placeholder, which can be used to define type-generic kernels.

Type placeholders written with the same character in the kernel definition denote the same type. The concrete type of a placeholder is determined by the types of the actual arguments. The ElementwiseKernel class first checks the output arguments and then the input arguments to determine the concrete type. If no output arguments are given when the kernel is invoked, only the input arguments are used to determine the type.
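The resolution rule above can be modelled in a few lines. The sketch below is a hypothetical toy, `resolve_placeholders`, that binds each one-character type name to a NumPy dtype by looking at output arguments first and then at inputs. It is not CuPy's implementation: raising `TypeError` when a placeholder would be bound to two different dtypes is a choice made for this sketch, and Python scalars are not modelled.

```python
import numpy as np

def resolve_placeholders(in_types, out_types, in_dtypes, out_dtypes=()):
    """Toy model (hypothetical helper): bind one-character type names to dtypes.

    Output arguments, if any were passed, are consulted first, then inputs.
    A placeholder bound twice to different dtypes raises TypeError; this is
    a choice made for the sketch, not a description of CuPy internals.
    """
    bindings = {}
    pairs = list(zip(out_types, out_dtypes)) + list(zip(in_types, in_dtypes))
    for type_name, dtype in pairs:
        if len(type_name) != 1:
            continue  # concrete type such as 'float32', not a placeholder
        dtype = np.dtype(dtype)
        if type_name not in bindings:
            bindings[type_name] = dtype
        elif bindings[type_name] != dtype:
            raise TypeError(f"placeholder {type_name!r} bound to both "
                            f"{bindings[type_name]} and {dtype}")
    return bindings

# No output given: T is taken from the float64 inputs.
print(resolve_placeholders(['T', 'T'], ['T'], [np.float64, np.float64]))
# Output given: it is checked first and binds T to float32.
print(resolve_placeholders(['T', 'T'], ['T'], [np.float32, np.float32],
                           [np.float32]))
```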

In [20]:
squared_diff_generic = cp.ElementwiseKernel(
    'T x, T y',
    'T z',
    'z = (x - y) * (x - y)',
    'squared_diff_generic')

x = cp.linspace(0, 5, 6).reshape((2, 3))
y = cp.linspace(1, 6, 6).reshape((2, 3))
squared_diff_generic(x, y)

array([[1., 1., 1.],
       [1., 1., 1.]])

The type placeholder can also be used inside the loop body, for example to declare a temporary variable:

In [33]:
squared_diff_generic = cp.ElementwiseKernel(
    'T x, T y',
    'T z',
    '''
        T diff = x - y;
        z = diff * diff;
    ''',
    'squared_diff_generic')

squared_diff_generic(x, y)

array([[1., 1., 1.],
       [1., 1., 1.]])
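One way to picture why `T` is usable inside the body: once the placeholder is resolved, the generated CUDA source can declare it as a type alias ahead of the loop body. The sketch below is a hypothetical `render_body` helper that produces such a fragment for the float64 case above, assuming the alias is emitted as a C `typedef`. The dtype-to-C-name table is a small illustrative subset, and CuPy's real code generation adds much more (argument handling, indexing, the loop itself), so treat this as an illustration only.

```python
import textwrap

import numpy as np

# Illustrative subset of NumPy dtype -> CUDA C type names (assumption).
_C_TYPE_NAMES = {
    np.dtype(np.float32): "float",
    np.dtype(np.float64): "double",
    np.dtype(np.int32): "int",
    np.dtype(np.int64): "long long",
}

def render_body(bindings, operation):
    """Prefix the loop body with one typedef per resolved placeholder."""
    typedefs = [f"typedef {_C_TYPE_NAMES[np.dtype(dt)]} {name};"
                for name, dt in sorted(bindings.items())]
    return "\n".join(typedefs + [textwrap.dedent(operation).strip()])

operation = '''
    T diff = x - y;
    z = diff * diff;
'''
print(render_body({'T': np.float64}, operation))
```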

In [31]:
x.size

6