# Euclidean distance matrix with CuPy

<p style="font-size:19px"> CuPy is an open-source array library accelerated with NVIDIA CUDA. It provides GPU-accelerated computing with Python.</p>
<table><tr>
<td> <img src=https://raw.githubusercontent.com/cupy/cupy/master/docs/image/cupy_logo_1000px.png alt="Drawing" style="width: 250px;"> </td>
<td> 
<ul style="font-size:18px">
 <li> CuPy uses CUDA-related libraries including cuBLAS, cuDNN, cuRAND, cuSOLVER, cuSPARSE, cuFFT and NCCL </li>
 <li> CuPy's interface is highly compatible with NumPy; in most cases it can be used as a drop-in replacement </li>
 <li> CuPy compiles kernel code optimized for the shapes and dtypes of the given arguments, sends it to the GPU device, and executes the kernel </li>
</ul>
</td>
</tr></table>

In [None]:
import numpy as np
import cupy as cp

In [None]:
# CUDA kernels are launched asynchronously, so %timeit may stop the
# clock before the GPU has finished and report a misleading time.
# We use a timer built on CuPy's utilities to time the CuPy calls.
from cupy_timer import timer

In [None]:
def euclidean_cp(x, y):
    x2 = (x * x).sum(axis=1)[:, cp.newaxis]
    y2 = (y * y).sum(axis=1)[cp.newaxis, :]

    xy = x @ y.T

    return cp.abs(x2 + y2 - 2. * xy)

In [None]:
def euclidean_np(x, y):
    x2 = (x * x).sum(axis=1)[:, np.newaxis]
    y2 = (y * y).sum(axis=1)[np.newaxis, :]

    xy = x @ y.T

    return np.abs(x2 + y2 - 2. * xy)
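
Both functions rely on expanding the squared Euclidean distance between row $x_i$ of `x` and row $y_j$ of `y`:

$$
\lVert x_i - y_j \rVert^2 = \lVert x_i \rVert^2 + \lVert y_j \rVert^2 - 2\, x_i \cdot y_j
$$

In the code, `x2` holds the terms $\lVert x_i \rVert^2$ as a column, `y2` holds the terms $\lVert y_j \rVert^2$ as a row, and `xy` holds the dot products $x_i \cdot y_j$, so broadcasting `x2 + y2 - 2. * xy` fills the whole matrix at once. Note that, as written, the functions return *squared* distances; taking the element-wise square root of the result would give the distances themselves. The final `abs` is presumably there to guard against small negative values caused by floating-point rounding, which would otherwise appear where the true distance is zero (for example on the diagonal when `x` and `y` are the same array).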

In [None]:
nsamples = 6000
nfeat = 50

x_np = np.random.random([nsamples, nfeat])

In [None]:
%timeit euclidean_np(x_np, x_np)

In [None]:
x_cp = cp.asarray(x_np)

with timer():
    euclidean_cp(x_cp, x_cp)

Notice that the first call to `euclidean_cp` is the slowest. This is because CuPy compiles kernels on the fly: when a kernel is needed, CuPy compiles code optimized for the shapes and dtypes of the given arguments and sends it to the GPU. These steps are skipped on later calls, since the compiled kernel is cached in memory and already loaded on the device. More information is available in the [CuPy overview](https://docs.cupy.dev/en/stable/overview.html).
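
One way to observe this warm-up effect is to time several consecutive calls and compare the first with the rest. The following is a minimal sketch, not the `cupy_timer` module used in this notebook: `time_calls` and `squared_distances` are hypothetical names introduced for illustration, and the code falls back to NumPy when no usable GPU is found (in which case no warm-up effect should be expected).

```python
import time

import numpy as np

try:
    import cupy as cp
    # Assumption: treat CuPy as unusable if no CUDA device is visible
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device")
    xp = cp
except Exception:
    cp = None
    xp = np  # fall back to the CPU so the sketch still runs


def time_calls(fn, *args, repeats=3):
    """Hypothetical helper: seconds taken by each of `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        if cp is not None:
            # Kernel launches are asynchronous: wait for the GPU to finish
            cp.cuda.Device().synchronize()
        timings.append(time.perf_counter() - start)
    return timings


def squared_distances(x, y):
    # Same computation as euclidean_cp / euclidean_np
    x2 = (x * x).sum(axis=1)[:, None]
    y2 = (y * y).sum(axis=1)[None, :]
    return xp.abs(x2 + y2 - 2.0 * (x @ y.T))


x = xp.random.random((1000, 50))
timings = time_calls(squared_distances, x, x)
print(["%.6f s" % t for t in timings])
```

On a GPU, the first entry would typically be the largest, since it includes kernel compilation. The explicit synchronization is what makes the wall-clock measurement meaningful; without it the timer could stop before the GPU work has completed.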

In [None]:
# Check that both implementations agree: show the largest absolute difference
edm_cp = euclidean_cp(x_cp, x_cp)
edm_np = euclidean_np(x_np, x_np)

np.abs(cp.asnumpy(edm_cp) - edm_np).max()
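
Comparing the two vectorized versions shows that they agree with each other, but not that either one is correct. As a complementary check, the vectorized formula can be compared against a direct evaluation of the definition on a small input. The sketch below runs on NumPy only; `squared_distances` and `squared_distances_loop` are hypothetical names introduced here, and the tolerance is an assumption chosen for small inputs with values in [0, 1).

```python
import numpy as np


def squared_distances(x, y):
    # Vectorized version, same computation as euclidean_np
    x2 = (x * x).sum(axis=1)[:, np.newaxis]
    y2 = (y * y).sum(axis=1)[np.newaxis, :]
    return np.abs(x2 + y2 - 2.0 * (x @ y.T))


def squared_distances_loop(x, y):
    # Reference version: evaluate ||x_i - y_j||^2 pair by pair
    out = np.empty((x.shape[0], y.shape[0]))
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            diff = x[i] - y[j]
            out[i, j] = diff @ diff
    return out


rng = np.random.default_rng(0)
x = rng.random((40, 5))
y = rng.random((30, 5))

fast = squared_distances(x, y)
slow = squared_distances_loop(x, y)

max_err = np.abs(fast - slow).max()
# Raises AssertionError if the two results differ beyond the tolerance
np.testing.assert_allclose(fast, slow, atol=1e-12)
```

The loop version is far too slow for the 6000-sample input used above, which is why the check is done on a small array.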

## CPU/GPU agnostic code

CuPy's `cp.get_array_module` can be used to determine whether an array comes from NumPy or from CuPy. With it, functions can be written in a CPU/GPU-agnostic way!

In [None]:
xp = cp.get_array_module(x_cp)
xp

In [None]:
xp = cp.get_array_module(x_np)
xp

Let's *merge* our functions `euclidean_cp` and `euclidean_np` into a single function, `euclidean`:

In [None]:
def euclidean(x, y):
    xp = cp.get_array_module(x)
    # verify that x and y are arrays of the same kind
    assert xp == cp.get_array_module(y)

    x2 = (x * x).sum(axis=1)[:, xp.newaxis]
    y2 = (y * y).sum(axis=1)[xp.newaxis, :]

    xy = x @ y.T

    return xp.abs(x2 + y2 - 2. * xy)

In [None]:
euclidean(x_cp, x_cp);  # returns a CuPy array
euclidean(x_np, x_np);  # returns a NumPy array
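
The agnostic function above still needs CuPy to be importable, because it calls `cp.get_array_module`. A possible variation, sketched below under the assumption that the code should also run on machines without CuPy or without a GPU, wraps the lookup in a small helper. The name `array_module` is hypothetical, and the `assert` is replaced by an explicit exception, since assertions are skipped when Python runs with `-O`.

```python
import numpy as np

try:
    import cupy as cp
    # Assumption: treat CuPy as unusable if no CUDA device is visible
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device")
except Exception:
    cp = None


def array_module(a):
    """Hypothetical wrapper: cp.get_array_module if CuPy works, else NumPy."""
    return cp.get_array_module(a) if cp is not None else np


def euclidean(x, y):
    xp = array_module(x)
    # x and y must be arrays of the same kind
    if xp is not array_module(y):
        raise TypeError("x and y must both be NumPy or both be CuPy arrays")

    x2 = (x * x).sum(axis=1)[:, xp.newaxis]
    y2 = (y * y).sum(axis=1)[xp.newaxis, :]
    xy = x @ y.T
    return xp.abs(x2 + y2 - 2.0 * xy)


x_np = np.random.random((100, 8))
edm_np = euclidean(x_np, x_np)
print(type(edm_np))  # the output type follows the input type

if cp is not None:
    x_cp = cp.asarray(x_np)
    edm_cp = euclidean(x_cp, x_cp)
    print(type(edm_cp))
```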