# Zero to Hero with User Defined Functions in RAPIDS

Sometimes, we need to write our own functions (commonly known as user defined functions or UDFs) and execute them on our data. RAPIDS and the broader GPU PyData ecosystem allow us to execute UDFs on a variety of data structures. This guide covers writing and executing UDFs on all the following data structures:

- Series
- DataFrame
- Rolling Windows
- GroupBy DataFrames
- CuPy NDArrays
- Numba DeviceNDArrays

It also demonstrates cuDF's default null handling behavior, and how you can write UDFs that interact with null values if need be. With that, let's dive in.

## Series UDFs

You can execute UDFs on Series in two core ways:

- Writing a standard Python function and using `applymap`
- Writing a Numba kernel and using Numba's `forall` syntax

Using `applymap` is easier, but writing a Numba kernel gives you the flexibility to build more complex functions (though we'll only be writing simple kernels in this guide).


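Let's start by importing a few libraries and creating a DataFrame of several Series. Because `randomdata` generates random values, the numbers you see will differ from ours, and some outputs in this guide come from separate runs of the notebook.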

In [94]:
import numpy as np
from numba import cuda

import cudf
from cudf.datasets import randomdata 
from librmm_cffi import librmm as rmm # RAPIDS Memory Manager

df = randomdata(nrows=10, dtypes={'a':float, 'b':bool, 'c':str})
df.head()

|   | a | b | c |
|---|---|---|---|
| 0 | 0.376004 | True | Frank |
| 1 | 0.686498 | True | Charlie |
| 2 | -0.560749 | True | Sarah |
| 3 | -0.038857 | True | Patricia |
| 4 | -0.034129 | False | Alice |


Next, we'll define a basic Python UDF and use it with `applymap`.

In [30]:
def udf(x):
    return x + 5

In [31]:
df['a'].applymap(udf).head()

0    4.437562
1    4.033587
2    5.466282
3    4.849833
4    4.425846
Name: a, dtype: float64

That's all there is to it. For more complex UDFs, though, we'll want to write an actual Numba kernel.

The easiest way to write a Numba kernel is to use `cuda.grid(1)` to manage thread indices, and then let Numba's `forall` method configure the kernel launch for us. Below, we define a basic multiplication kernel using `numba.cuda`, which we imported above.

In [97]:
@cuda.jit
def multiply(in_col, out_col, multiplier):
    i = cuda.grid(1)
    if i < in_col.size:
        out_col[i] = in_col[i] * multiplier

This kernel takes an input array, multiplies it by a value supplied at runtime, and stores the result in an output array. To execute it, we just need to pre-allocate an output array and use the `forall` method mentioned above. First, we add a column `e` of all `0.0` to our DataFrame, since we want `float64` output. Next, we run the kernel with `forall`. `forall` requires us to specify the number of tasks, so we supply the length of our Series (which we store in `size`).

In [33]:
size = len(df['a'])
df['e'] = 0.0
multiply.forall(size)(df['a'], df['e'], 10.0)

After calling our kernel, our DataFrame is now populated with the result.

In [34]:
df.head()

|   | a | b | c | e |
|---|---|---|---|---|
| 0 | -0.562438 | False | Bob | -5.62438 |
| 1 | -0.966413 | False | Laura | -9.664129 |
| 2 | 0.466282 | False | Wendy | 4.662817 |
| 3 | -0.150167 | True | Frank | -1.501671 |
| 4 | -0.574154 | False | Dan | -5.74154 |


Note that although we pass Series objects like `df['a']` and `df['e']` to the kernel, it actually executes on the DeviceNDArrays "underneath" each Series. If you ever need the underlying DeviceNDArray of a Series, you can get it with `Series.data.mem`. We'll see an example of this in the Null Handling section of this guide.
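`cuda.grid(1)` returns a thread's global index, `blockIdx.x * blockDim.x + threadIdx.x`, and `forall` launches enough threads to cover at least `size` elements. The last block is usually only partly needed, so threads whose index falls past the end must do nothing. That is the job of the `if i < in_col.size` guard in `multiply`. The following is a plain-Python CPU emulation of such a launch. The `emulate_launch` and `multiply_cpu` helpers are hypothetical, and the block size of 4 is an arbitrary assumption, not the configuration `forall` actually picks.

```python
import math


def emulate_launch(kernel, size, block_dim, *args):
    """Run `kernel` once per (block, thread) pair, the way a 1D CUDA launch would."""
    grid_dim = math.ceil(size / block_dim)  # enough blocks to cover every element
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            # Equivalent of cuda.grid(1): the thread's global 1D index
            i = block_idx * block_dim + thread_idx
            kernel(i, *args)
    return grid_dim * block_dim  # total number of threads launched


def multiply_cpu(i, in_col, out_col, multiplier):
    # Same body as the `multiply` kernel, with the thread index passed in explicitly
    if i < len(in_col):  # boundary guard: surplus threads in the last block do nothing
        out_col[i] = in_col[i] * multiplier


in_col = [-0.5, 1.0, 2.5, 3.0, 4.0, 7.0]
out_col = [0.0] * len(in_col)
threads = emulate_launch(multiply_cpu, len(in_col), 4, in_col, out_col, 10.0)
print(threads, out_col)  # 8 [-5.0, 10.0, 25.0, 30.0, 40.0, 70.0]
```

Two blocks of four threads cover the six elements. Without the guard, threads 6 and 7 would index past the end of the arrays.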

## DataFrame UDFs

We can apply a UDF to a DataFrame just like we did above with `forall`: we simply write a kernel that expects multiple inputs and pass multiple Series as arguments when we execute it. Because this is fairly common and can get difficult to manage, cuDF provides two APIs to streamline it: `apply_rows` and `apply_chunks`. Below, we walk through an example of using `apply_rows`. `apply_chunks` works the same way, but also gives you more control over low-level kernel behavior.

Now that we have two numeric columns in our DataFrame, let's write a kernel that uses both of them.

In [35]:
def conditional_add(a, e, out):
    for i, (ai, ei) in enumerate(zip(a, e)):
        if ai > 0:
            out[i] = ai + ei
        else:
            out[i] = ai

Notice that we `enumerate` over the `zip` of our function arguments, which correspond to our input column names. We can pass this kernel to `apply_rows`, specifying a few arguments:
- `incols`
    - A list of the input columns to use from the DataFrame. These must match both our DataFrame column names and our function arguments.
- `outcols`
    - A dictionary mapping our output column names to their data types. The names must also match our function arguments.
- `kwargs`
    - A dictionary of extra keyword arguments to pass to the function. Since we don't need any, we pass an empty one.

With that, we're ready to use our UDF.

In [36]:
df = df.apply_rows(conditional_add, 
                   incols=['a', 'e'],
                   outcols={'out': np.float64},
                   kwargs={}
                  )
df.head()

|   | a | b | c | e | out |
|---|---|---|---|---|---|
| 0 | -0.562438 | False | Bob | -5.62438 | -0.562438 |
| 1 | -0.966413 | False | Laura | -9.664129 | -0.966413 |
| 2 | 0.466282 | False | Wendy | 4.662817 | 5.129099 |
| 3 | -0.150167 | True | Frank | -1.501671 | -0.150167 |
| 4 | -0.574154 | False | Dan | -5.74154 | -0.574154 |


As expected, we see our conditional addition worked. At this point, we're able to execute UDFs on the core data structures of cuDF.
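A useful mental model for the `incols`/`outcols`/`kwargs` contract is a call that matches by name. Each input column is passed as the argument with the same name, one output array is allocated per `outcols` entry, and `kwargs` are forwarded as-is. The `apply_rows_cpu` helper below is hypothetical. It is a plain-NumPy sketch of that contract for intuition only, since cuDF instead compiles the function with Numba and runs it on the GPU. It reuses the body of `conditional_add` on the five rows shown above.

```python
import numpy as np


def conditional_add(a, e, out):
    # Same UDF as above: add `e` only where `a` is positive
    for i, (ai, ei) in enumerate(zip(a, e)):
        if ai > 0:
            out[i] = ai + ei
        else:
            out[i] = ai


def apply_rows_cpu(func, data, incols, outcols, kwargs):
    """Hypothetical CPU stand-in for DataFrame.apply_rows on a dict of NumPy arrays."""
    nrows = len(data[incols[0]])
    # Allocate one output array per outcols entry, with the requested dtype
    outputs = {name: np.empty(nrows, dtype=dtype) for name, dtype in outcols.items()}
    # Inputs, outputs, and kwargs are all bound to the UDF's parameters by name
    func(**{name: data[name] for name in incols}, **outputs, **kwargs)
    result = dict(data)
    result.update(outputs)
    return result


data = {
    'a': np.array([-0.562438, -0.966413, 0.466282, -0.150167, -0.574154]),
    'e': np.array([-5.62438, -9.664129, 4.662817, -1.501671, -5.74154]),
}
result = apply_rows_cpu(conditional_add, data, incols=['a', 'e'],
                        outcols={'out': np.float64}, kwargs={})
print(result['out'])  # only row 2 has a > 0, so only it becomes a + e
```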

## Rolling Window UDFs

We can also directly apply UDFs to `rolling` Series and DataFrames using `apply`. This example is adapted from the cuDF API documentation. First, we'll create an example Series and then create a `rolling` object from the Series.

In [37]:
ser = cudf.Series([16, 25, 36, 49, 64, 81], dtype='float64')
ser

0    16.0
1    25.0
2    36.0
3    49.0
4    64.0
5    81.0
dtype: float64

In [38]:
rolling = ser.rolling(window=3, min_periods=3, center=False)
rolling

Rolling [window=3,min_periods=3,center=False]

Next, we'll define a function to use on our rolling windows. We created this one to highlight how you can include things like loops, mathematical functions, and conditionals. Rolling window UDFs do not yet support null values.

In [39]:
import math

def example_func(window):
    b = 0
    for a in window:
        b = max(b, math.sqrt(a))
    if b == 8:
        return 100    
    return b

We can execute the function by passing it to `apply`. With `window=3`, `min_periods=3`, and `center=False`, our first two values are `null`.
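For `ser`, the result can be worked out by hand. Each output position sees the current value and the two before it, and positions with fewer than `min_periods` values are null. The sketch below reproduces those trailing-window semantics in plain Python, without cuDF. It uses `None` for null, and `rolling_apply` is a hypothetical helper, not cuDF's implementation.

```python
import math


def example_func(window):
    # Same UDF as above
    b = 0
    for a in window:
        b = max(b, math.sqrt(a))
    if b == 8:
        return 100
    return b


def rolling_apply(values, func, window, min_periods):
    """Trailing-window apply (center=False); None marks positions with too little data."""
    out = []
    for i in range(len(values)):
        win = values[max(0, i - window + 1): i + 1]
        out.append(func(win) if len(win) >= min_periods else None)
    return out


ser = [16.0, 25.0, 36.0, 49.0, 64.0, 81.0]
print(rolling_apply(ser, example_func, window=3, min_periods=3))
# [None, None, 6.0, 7.0, 100, 9.0]
```

The window `[36, 49, 64]` has a maximum square root of exactly 8, so the conditional in `example_func` replaces it with 100.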

We can apply this function to every column in a DataFrame, too.

In [24]:
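rolling_df = cudf.DataFrame()  # a new name keeps `df` intact for the GroupBy section
rolling_df['a'] = np.arange(10, dtype='float64')
rolling_df['b'] = np.arange(10, dtype='float64')
rolling_df.head()

|   | a | b |
|---|---|---|
| 0 | 0.0 | 0.0 |
| 1 | 1.0 | 1.0 |
| 2 | 2.0 | 2.0 |
| 3 | 3.0 | 3.0 |
| 4 | 4.0 | 4.0 |


In [26]:
rolling = rolling_df.rolling(window=3, min_periods=3, center=False)
rolling.apply(example_func)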

|   | a | b |
|---|---|---|
| 0 |  |  |
| 1 |  |  |
| 2 | 1.414213562 | 1.414213562 |
| 3 | 1.732050808 | 1.732050808 |
| 4 | 2.0 | 2.0 |
| 5 | 2.236067977 | 2.236067977 |
| 6 | 2.449489743 | 2.449489743 |
| 7 | 2.645751311 | 2.645751311 |
| 8 | 2.828427125 | 2.828427125 |
| 9 | 3.0 | 3.0 |


## GroupBy DataFrame UDFs

We can also apply UDFs to grouped DataFrames using `apply_grouped`. This example is also adapted from the RAPIDS API documentation.

First, we'll group our DataFrame based on column `b`, which is either True or False. Note that we currently need to pass `method="cudf"` to use UDFs with GroupBy objects.

In [85]:
df.head()

|   | a | b | c | e | out |
|---|---|---|---|---|---|
| 0 | -0.562438 | False | Bob | -5.62438 | -0.562438 |
| 1 | -0.966413 | False | Laura | -9.664129 | -0.966413 |
| 2 | 0.466282 | False | Wendy | 4.662817 | 5.129099 |
| 3 | -0.150167 | True | Frank | -1.501671 | -0.150167 |
| 4 | -0.574154 | False | Dan | -5.74154 | -0.574154 |


In [86]:
grouped = df.groupby(['b'], method="cudf")

Next we'll define a function to apply to each group independently. In this case, we'll take the rolling average of column `e`, and call that new column `rolling_avg_e`.

In [90]:
def rolling_avg(e, rolling_avg_e):
    win_size = 3
    for i in range(cuda.threadIdx.x, len(e), cuda.blockDim.x):
        if i < win_size - 1:
            # If there is not enough data to fill the window,
            # take the average to be NaN
            rolling_avg_e[i] = np.nan
        else:
            total = 0
            for j in range(i - win_size + 1, i + 1):
                total += e[j]
            rolling_avg_e[i] = total / win_size

We execute it with an API much like `apply_rows`, except that the function is applied to each group.

In [91]:
results = grouped.apply_grouped(rolling_avg,
                               incols=['e'],
                               outcols=dict(rolling_avg_e=np.float64))
results

|   | a | b | c | e | out | rolling_avg_e |
|---|---|---|---|---|---|---|
| 0 | -0.562438 | False | Bob | -5.62438 | -0.562438 |  |
| 1 | -0.966413 | False | Laura | -9.664129 | -0.966413 |  |
| 2 | 0.466282 | False | Wendy | 4.662817 | 5.129099 | -3.541897251 |
| 3 | -0.574154 | False | Dan | -5.74154 | -0.574154 | -3.580950778 |
| 4 | -0.559391 | False | Ray | -5.593913 | -0.559391 | -2.224212087 |
| 5 | -0.150167 | True | Frank | -1.501671 | -0.150167 |  |
| 6 | -0.98994 | True | Xavier | -9.899398 | -0.98994 |  |
| 7 | -0.469771 | True | Hannah | -4.697712 | -0.469771 | -5.366260518 |
| 8 | -0.858747 | True | Ray | -8.587473 | -0.858747 | -7.728194578 |
| 9 | 0.841863 | True | Patricia | 8.418625 | 9.260488 | -1.622186712 |


Notice how, with a window size of three, the first two values of our output column in each group are `NaN`, because there isn't enough data to fill the window.
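Unlike the `apply_rows` functions, `rolling_avg` is written from the point of view of one thread in a block. As we understand `apply_grouped`, each group is handled by a CUDA thread block and every thread runs `rolling_avg`. The `range(cuda.threadIdx.x, len(e), cuda.blockDim.x)` loop then hands out rows round-robin, so with `b` threads, thread 0 takes rows 0, b, 2b, and so on. The CPU sketch below emulates that with a hypothetical `run_group` helper. It checks the result against the `b == False` group above, and the block sizes are arbitrary.

```python
import math


def rolling_avg(e, rolling_avg_e, thread_idx, block_dim):
    # Body of the UDF above, with cuda.threadIdx.x / cuda.blockDim.x as parameters
    win_size = 3
    for i in range(thread_idx, len(e), block_dim):
        if i < win_size - 1:
            rolling_avg_e[i] = math.nan  # not enough data to fill the window
        else:
            total = 0
            for j in range(i - win_size + 1, i + 1):
                total += e[j]
            rolling_avg_e[i] = total / win_size


def run_group(e, block_dim):
    """Emulate one thread block processing one group: every thread runs the same body."""
    out = [0.0] * len(e)
    for thread_idx in range(block_dim):
        rolling_avg(e, out, thread_idx, block_dim)
    return out


# Column `e` for the b == False group in the output above
e_false = [-5.62438, -9.664129, 4.662817, -5.74154, -5.593913]
print(run_group(e_false, block_dim=2))
```

The result doesn't depend on the block size. With 2 threads each thread handles several rows, and with 64 threads most threads skip the loop entirely.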

## Numba Kernels on CuPy Arrays

We can also execute Numba kernels on CuPy NDArrays, thanks to the CUDA Array Interface (`__cuda_array_interface__`). We can even run the same UDF on both the Series and the CuPy array. First, we define a Series and then create a CuPy array from that Series.

In [96]:
import cupy as cp

s = cudf.Series([1.0, 2, 3, 4, 10])
arr = cp.asarray(s)
arr

array([ 1.,  2.,  3.,  4., 10.])

Next, we define our UDF and execute it on our Series. We need to allocate a Series for our output, which we'll call `out`.

In [184]:
@cuda.jit
def multiply_by_5(x, out):
    i = cuda.grid(1)
    if i < x.size: # boundary guard
        out[i] = x[i] * 5
        
out = cudf.Series(rmm.device_array(5))
multiply_by_5.forall(s.shape[0])(s, out)
print(out)

0     5.0
1    10.0
2    15.0
3    20.0
4    50.0
dtype: float64


Finally, we execute the same function on our array.

In [185]:
out = cp.zeros_like(arr)
multiply_by_5.forall(arr.size)(arr, out)
print(out)

[ 5. 10. 15. 20. 50.]
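`__cuda_array_interface__` is the GPU counterpart of NumPy's `__array_interface__`. An object publishes a dictionary with a data pointer, shape, and type string, and a consumer such as Numba or CuPy wraps that memory without copying it. The GPU version needs a device, so this CPU sketch shows the same idea with NumPy's interface instead. The `Wrapper` class is hypothetical and stands in for a container like a cuDF Series.

```python
import numpy as np


class Wrapper:
    """Hypothetical container that exposes its buffer only through the array interface."""

    def __init__(self, values):
        self._values = values  # keeps the buffer alive while the wrapper exists

    @property
    def __array_interface__(self):
        # Includes 'data' (pointer, read-only flag), 'shape', 'typestr', and 'version'
        return self._values.__array_interface__


buffer = np.array([1.0, 2, 3, 4, 10])
view = np.asarray(Wrapper(buffer))  # no copy: NumPy reuses the published pointer

view *= 5                           # writes land in the shared memory
print(buffer)                       # [ 5. 10. 15. 20. 50.]
print(np.shares_memory(view, buffer))  # True
```

This zero-copy sharing is why `multiply_by_5` could write into both the cuDF Series and the CuPy array above.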


## Null Handling in UDFs

At this point, we've covered almost everything you need to know to use UDFs in the RAPIDS ecosystem. This section covers null handling in UDFs, which is an advanced topic.

Writing UDFs that can handle null values is complicated by the fact that a separate bitmask is used to identify when a value is valid and when it's null. By default, DataFrame methods for applying UDFs like `apply_rows` handle nulls pessimistically: any row with a null value in an input column produces a null value in the output. Explaining how skipping this can lead to undefined behavior is outside the scope of this guide; suffice it to say, pessimistic null handling is the safe and consistent approach. You can see an example below.

In [142]:
def gpu_add(a, b, out):
    for i, (x, y) in enumerate(zip(a, b)):
        out[i] = x + y

df = randomdata(nrows=5, dtypes={'a':int, 'b':int})
df.loc[2, 'a'] = None
df.loc[3, 'b'] = None
df.head()

|   | a | b |
|---|---|---|
| 0 | 1033.0 | 953.0 |
| 1 | 984.0 | 942.0 |
| 2 |  | 1050.0 |
| 3 | 1034.0 |  |
| 4 | 1009.0 | 994.0 |


In [143]:
df = df.apply_rows(gpu_add,
                   incols=['a', 'b'],
                   outcols={'out': np.float64},
                   kwargs={})
df.head()

|   | a | b | out |
|---|---|---|---|
| 0 | 1033.0 | 953.0 | 1986.0 |
| 1 | 984.0 | 942.0 | 1926.0 |
| 2 |  | 1050.0 |  |
| 3 | 1034.0 |  |  |
| 4 | 1009.0 | 994.0 | 2003.0 |


We can see that all input rows containing a null value resulted in a null value in the output.
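Conceptually, pessimistic null handling ANDs together the validity of every input column, so an output row is valid only if all of its inputs are. The plain-Python sketch below reproduces the table above, with `None` as null. The `apply_pessimistic` helper is hypothetical, not a cuDF function. For simplicity it skips the UDF on null rows, whereas cuDF may still run the kernel on every row and null the result afterwards. Either way, the output row is null.

```python
def gpu_add_row(x, y):
    # Per-row body of the gpu_add UDF above
    return x + y


def apply_pessimistic(row_func, *columns):
    """A null (None) in any input column makes the output for that row null."""
    out = []
    for values in zip(*columns):
        if any(v is None for v in values):
            out.append(None)  # the output row is marked null
        else:
            out.append(float(row_func(*values)))  # outcols asked for float64
    return out


a = [1033, 984, None, 1034, 1009]
b = [953, 942, 1050, None, 994]
print(apply_pessimistic(gpu_add_row, a, b))
# [1986.0, 1926.0, None, None, 2003.0]
```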

### Operating on Null Values

If you don't need to conditionally handle null values in your UDFs, feel free to skip these final two sections.

As a developer or data scientist, you may sometimes need to write UDFs that operate on null values. This means you need to think about the null bitmask array when writing your UDF. cuDF allows you to turn off pessimistic null handling, and provides the `mask_get` utility function to help you interact with null bitmasks from Python. The following examples illustrate how you can use them in stand-alone `numba.cuda` kernels and with `apply_rows`.
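cuDF's null mask follows the Apache Arrow convention: one validity bit per row, packed least-significant bit first. Row `i` lives in bit `i % 8` of byte `i // 8`, and a set bit means the value is valid. The sketch below packs and reads such a mask in plain Python for the 10-row DataFrame used next, where rows 2 and 4 are null. `pack_validity` and `mask_get_cpu` are hypothetical helpers, and `mask_get_cpu` is only our assumption of the logic `mask_get` implements, assuming 8-bit mask words.

```python
def pack_validity(valid):
    """Pack a list of booleans into bytes, LSB-first (Arrow-style validity bitmask)."""
    mask = [0] * ((len(valid) + 7) // 8)
    for i, is_valid in enumerate(valid):
        if is_valid:
            mask[i // 8] |= 1 << (i % 8)
    return mask


def mask_get_cpu(mask, i):
    # Read bit i: 1 if row i holds a value, 0 if it is null
    return (mask[i // 8] >> (i % 8)) & 1


valid = [i not in (2, 4) for i in range(10)]  # rows 2 and 4 are null
mask = pack_validity(valid)
print([bin(byte) for byte in mask])           # ['0b11101011', '0b11']
print([mask_get_cpu(mask, i) for i in range(10)])
```

Ten rows need two bytes. The unused high bits of the second byte stay zero.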

#### Stand-alone Kernels

First, we import `mask_get` and create a DataFrame with some null values.

In [144]:
from cudf.utils.cudautils import mask_get

df = randomdata(nrows=10, dtypes={'a':float, 'b':bool})
df.loc[[2,4], 'a'] = None
df.head()

|   | a | b |
|---|---|---|
| 0 | -0.374178995 | True |
| 1 | 0.542777075 | True |
| 2 |  | False |
| 3 | -0.818403822 | True |
| 4 |  | True |


Next, we'll define a simple kernel like before, with a couple of differences. This kernel needs access to the null bitmask, so we include a `validity_mask` argument. We also wrap our logic in a conditional based on the results of `mask_get`:
- If the result of `mask_get` for that index **is** valid (there is a value), do the multiplication
- If the result of `mask_get` for that index **is not** valid (it's null), set the output to -999999

In [145]:
@cuda.jit
def gpu_kernel_masked(in_col, validity_mask, out_col, multiplier):
    i = cuda.grid(1)
    if i < in_col.size:
        valid = mask_get(validity_mask, i)
        if valid:
            out_col[i] = in_col[i] * multiplier
        else:
            out_col[i] = -999999

We now grab the underlying DeviceNDArrays and execute our kernel like we did previously, except that this time we also pass in the DeviceNDArray of our column's null mask. Because Numba doesn't yet handle masked CUDA arrays, we can't pass our `Series` directly here.

In [154]:
a_dary = df.a._column.data.mem
a_mask = df.a.nullmask.mem
output_dary = rmm.device_array_like(a_dary)

gpu_kernel_masked.forall(output_dary.size)(a_dary, a_mask, output_dary, 10)
df['result'] = output_dary
df.head()

|   | a | b | result |
|---|---|---|---|
| 0 | -0.374178995 | True | -3.74179 |
| 1 | 0.542777075 | True | 5.427771 |
| 2 |  | False | -999999.0 |
| 3 | -0.818403822 | True | -8.184038 |
| 4 |  | True | -999999.0 |
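The kernel checks the mask before reading `in_col[i]` because the data buffer still holds *some* value in every null slot, and nothing guarantees what it is. The sketch below runs the same logic as `gpu_kernel_masked` on the CPU. It uses a hypothetical data buffer whose null slots hold arbitrary values, with a plain list of booleans standing in for the packed bitmask. `masked_multiply` is a hypothetical name. The output depends only on the valid rows.

```python
SENTINEL = -999999


def masked_multiply(in_col, valid, multiplier):
    """CPU version of gpu_kernel_masked; valid[i] stands in for mask_get(validity_mask, i)."""
    out_col = [0.0] * len(in_col)
    for i in range(len(in_col)):  # one iteration per GPU thread index
        if valid[i]:
            out_col[i] = in_col[i] * multiplier
        else:
            out_col[i] = SENTINEL  # in_col[i] is never read for a null row
    return out_col


valid = [True, True, False, True, False]
clean = [-0.5, 0.25, 0.0, -1.0, 0.0]
dirty = [-0.5, 0.25, 1e300, -1.0, -7.0]  # same valid rows, arbitrary values in null slots
print(masked_multiply(clean, valid, 10))  # [-5.0, 2.5, -999999, -10.0, -999999]
print(masked_multiply(dirty, valid, 10) == masked_multiply(clean, valid, 10))  # True
```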


#### Apply Rows

Let's now define a similar kernel to use with the `apply_rows` method.

In [62]:
def gpu_kernel_masked(a, out_col, validity_mask, multiplier):
    for i, x in enumerate(a):
        valid = mask_get(validity_mask, i)
        if valid:
            out_col[i] = x * multiplier
        else:
            out_col[i] = -999999

We can pass **both** the `validity_mask` and `multiplier` arguments to the kernel as `kwargs`.

In [63]:
df = df.apply_rows(gpu_kernel_masked, 
                   incols=['a'],
                   outcols=dict(out_col=np.float64),
                   kwargs=dict(validity_mask=a_mask, multiplier=10)
                  )
df.head()

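|   | a | b | result | out_col |
|---|---|---|---|---|
| 0 | 0.017407146 | False | 0.174071 | 0.174071461 |
| 1 | -0.279925393 | False | -2.799254 | -2.799253933 |
| 2 |  | False | -999999.0 |  |
| 3 | -0.221737247 | True | -2.217372 | -2.217372472 |
| 4 |  | False | -999999.0 |  |

Notice that `out_col` is null, not -999999, in rows 2 and 4. `apply_rows` still applies its default pessimistic null handling here, so any row with a null input comes back null regardless of what the kernel wrote. To keep the kernel's sentinel values, pass `pessimistic_nulls=False` to `apply_rows`.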


## Summary

This guide has covered a lot of content. At this point, you should hopefully feel comfortable writing UDFs (with or without null values) that operate on:

- Series
- DataFrame
- Rolling Windows
- GroupBy DataFrames
- CuPy NDArrays
- Numba DeviceNDArrays


For more information please see the cuDF, Numba.cuda, 