See blog post: [https://medium.com/rapids-ai/user-defined-functions-in-rapids-cudf-2d7c3fc2728d](https://medium.com/rapids-ai/user-defined-functions-in-rapids-cudf-2d7c3fc2728d)

In [None]:
from math import cos, sin, asin, sqrt, pi

import cudf
import numpy as np
from numba import cuda

In [None]:
np.random.seed(12)
data_length = 1000

df = cudf.DataFrame()
df['lat1'] = np.random.normal(10, 1, data_length)
df['lon1'] = np.random.normal(10, 1, data_length)
df['lat2'] = np.random.normal(10, 1, data_length)
df['lon2'] = np.random.normal(10, 1, data_length)

In [None]:
def haversine_distance_kernel(lat1, lon1, lat2, lon2, out):
    """Haversine distance formula taken from Michael Dunn's StackOverflow post:
    https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points
    """
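    # Each thread sees only its own slice of the input columns, so this loop
    # visits just the rows assigned to the current thread.
    for i, (x_1, y_1, x_2, y_2) in enumerate(zip(lat1, lon1, lat2, lon2)):
        # Uncomment to trace every row a thread processes:
        # print('thread_id:', cuda.threadIdx.x, 'bid:', cuda.blockIdx.x,
        #       'array size:', lat1.size, 'block threads:', cuda.blockDim.x, 'i:', i)

        # Convert degrees to radians (x = latitude, y = longitude).
        x_1 = pi/180 * x_1
        y_1 = pi/180 * y_1
        x_2 = pi/180 * x_2
        y_2 = pi/180 * y_2

        dlon = y_2 - y_1
        dlat = x_2 - x_1
        a = sin(dlat/2)**2 + cos(x_1) * cos(x_2) * sin(dlon/2)**2

        # Central angle between the two points, in radians.
        c = 2 * asin(sqrt(a))
        r = 6371 # Radius of earth in kilometers

        out[i] = c * r
    # Report how many rows this thread processed (i is the last loop index).
    print('thread_id:', cuda.threadIdx.x, 'bid:', cuda.blockIdx.x,
          'array size:', lat1.size, 'block threads:', cuda.blockDim.x, ' ran ', i+1, ' times.')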
    

On my RTX 2070, **apply_rows** launches **15** blocks of **64** threads each, for a total of 15 x 64 = **960** threads. Most threads execute the loop body once, which covers 960 entries. However, 40 threads of block _0_ execute it twice to cover the remaining 1000 - 960 = **40** entries.
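
The arithmetic above can be checked on the CPU with a small sketch. It assumes that thread `tid` is handed rows `tid`, `tid + 960`, and so on (a strided partition). That assumption is consistent with the array sizes in the sample output further down, but it is not taken from the cuDF documentation, and `rows_per_thread` is a hypothetical helper written only for this illustration.

```python
from collections import Counter


def rows_per_thread(n_rows, n_blocks, threads_per_block):
    """Map each (block id, thread id) pair to the number of rows it would process."""
    total_threads = n_blocks * threads_per_block
    counts = {}
    for tid in range(total_threads):
        # Assumed partition: thread `tid` sees rows tid, tid + total_threads, ...
        n = len(range(tid, n_rows, total_threads))
        counts[(tid // threads_per_block, tid % threads_per_block)] = n
    return counts


counts = rows_per_thread(n_rows=1000, n_blocks=15, threads_per_block=64)

# How many threads process one row, and how many process two.
print(Counter(counts.values()))  # Counter({1: 920, 2: 40})

# The threads that run twice all sit at the start of block 0.
twice = sorted(key for key, n in counts.items() if n == 2)
print(twice[0], twice[-1])  # (0, 0) (0, 39)
```

Under that assumption, 920 threads handle one row each and threads 0 through 39 of block 0 handle two, which adds up to the 1000 rows in the DataFrame.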

In [None]:
df = df.apply_rows(haversine_distance_kernel,
                   incols=['lat1', 'lon1', 'lat2', 'lon2'],
                   outcols=dict(out=np.float64),
                   kwargs=dict())

In [None]:
print(df.head())
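
To sanity-check the `out` column, a CPU reference for the same formula can be handy. The following is a minimal sketch using NumPy only; `haversine_numpy` and its `radius_km` parameter are hypothetical names introduced here, not part of cuDF or the original blog post.

```python
import numpy as np


def haversine_numpy(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Vectorized haversine distance in kilometers; inputs are in degrees."""
    # Convert every input to a float64 array of radians.
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=np.float64))
        for v in (lat1, lon1, lat2, lon2)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # Central angle times the radius gives the great-circle distance.
    return 2 * radius_km * np.arcsin(np.sqrt(a))


# A quarter of the way around the equator: radius * pi / 2, roughly 10007.5 km.
print(haversine_numpy(0.0, 0.0, 0.0, 90.0))
```

Once the four input columns and `out` have been copied to host memory, the two results can be compared with `np.allclose`.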

### _Note: print statements in kernels will only appear in terminal output; Jupyter Notebooks won't display them_
### _Sample print statement output:_
```
thread_id: 61 bid: 2 array size: 1 block threads: 64  ran  1  times.
thread_id: 62 bid: 2 array size: 1 block threads: 64  ran  1  times.
thread_id: 63 bid: 2 array size: 1 block threads: 64  ran  1  times.
thread_id: 0 bid: 0 array size: 2 block threads: 64  ran  2  times.
thread_id: 1 bid: 0 array size: 2 block threads: 64  ran  2  times.
thread_id: 2 bid: 0 array size: 2 block threads: 64  ran  2  times.
thread_id: 3 bid: 0 array size: 2 block threads: 64  ran  2  times.
thread_id: 4 bid: 0 array size: 2 block threads: 64  ran  2  times.
thread_id: 5 bid: 0 array size: 2 block threads: 64  ran  2  times.
thread_id: 6 bid: 0 array size: 2 block threads: 64  ran  2  times.
thread_id: 7 bid: 0 array size: 2 block threads: 64  ran  2  times.
thread_id: 8 bid: 0 array size: 2 block threads: 64  ran  2  times.
thread_id: 9 bid: 0 array size: 2 block threads: 64  ran  2  times.
thread_id: 10 bid: 0 array size: 2 block threads: 64  ran  2  times.
thread_id: 11 bid: 0 array size: 2 block threads: 64  ran  2  times.
...
```