# Computing pairwise distances with a GPU

The goal of this exercise is to compute the pairwise distances among a set of $n$-dimensional vectors.
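To make the target explicit, here is the quantity being computed, written for the Euclidean distance used in the rest of this notebook. The symbols $x_i$ (the $i$-th vector) and $D$ (the distance matrix) are notation assumed here for illustration; $m$ is the number of points and $n$ the number of dimensions.

$$
D_{ij} = \sqrt{\sum_{k=1}^{n} \left( x_{ik} - x_{jk} \right)^2}, \qquad i, j = 1, \dots, m
$$

The matrix $D$ is symmetric ($D_{ij} = D_{ji}$) and has zeros on its diagonal.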


### What CPU and GPU am I using?

Before we start, let's check which CPU and GPU we will be using. Performance can vary a lot from one model to another. Google Colab does not let us choose the model, but it is free.

In [1]:
!echo "CPU:"
!cat /proc/cpuinfo | grep name
!echo "GPU:"
!nvidia-smi

CPU:
model name	: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
model name	: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
model name	: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
model name	: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
model name	: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
model name	: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
model name	: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
model name	: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
GPU:
/bin/bash: nvidia-smi: command not found
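The output above shows that `nvidia-smi` was not found on this machine, which usually means no NVIDIA driver (and so no usable CUDA GPU) is installed. As a rough sketch, the same check can be done from Python without failing when the tool is missing. The helper name `gpu_summary` is hypothetical; it only relies on the standard library and on the `nvidia-smi -L` option, which lists the detected GPUs.

```python
import shutil
import subprocess

def gpu_summary():
    """Return the GPU list reported by nvidia-smi, or None if unavailable.

    Hypothetical helper: it only checks that the nvidia-smi tool exists and
    runs successfully; it does not prove that CUDA kernels will work.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # tool not on PATH, as in the output above
    result = subprocess.run([exe, "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # tool present but no working driver/GPU
    return result.stdout.strip()

summary = gpu_summary()
print(summary if summary else "No NVIDIA GPU detected")
```

If this prints "No NVIDIA GPU detected", the CUDA part of the exercise needs a different runtime (for instance a GPU-enabled Colab session).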


## CPU implementation

We provide a standard Python implementation for reference.

In [30]:
from numpy.random import seed
from numpy.random import random

## code to generate a random set of m points in n dimensions
m = 5
n = 3

small_data = random((m , n))
small_data

array([[0.00927446, 0.54247679, 0.76656062],
       [0.71933676, 0.69045087, 0.81789706],
       [0.54323127, 0.02609889, 0.87506565],
       [0.89616195, 0.96904053, 0.86202521],
       [0.4855708 , 0.57671429, 0.52784172]])

In [31]:
## scipy implementation
from scipy.spatial.distance import pdist,squareform

D_scipy = squareform( pdist(small_data) )
D_scipy

array([[0.        , 0.72713151, 0.75068592, 0.98875657, 0.53386995],
       [0.72713151, 0.        , 0.68967017, 0.33290635, 0.38950562],
       [0.75068592, 0.68967017, 0.        , 1.00691065, 0.65350325],
       [0.98875657, 0.33290635, 1.00691065, 0.        , 0.65892608],
       [0.53386995, 0.38950562, 0.65350325, 0.65892608, 0.        ]])

In [32]:
## simple numba cpu implementation
from numba import jit,njit,prange,cuda, types, float32
import numpy as np

@njit(parallel = True)
def pairwise_dist( points ):
    """
    Takes:
        - points : np.array  of dimension m x n  
                    - m: number of points
                    - n: number of dimensions
    Returns:
        m x m euclidean distance matrix
    """
    nb_points = points.shape[0]
    nb_dims = points.shape[1]
    
    D = np.zeros((nb_points,nb_points))
    
    for i in prange(nb_points):
        for j in range(nb_points):
            
            d = 0

            for k in range(nb_dims):

                d += (points[i,k] - points[j,k])**2

            D[i,j] = d**0.5
        

    return D
    

D_cpu = pairwise_dist( small_data )
D_cpu

array([[0.        , 0.72713151, 0.75068592, 0.98875657, 0.53386995],
       [0.72713151, 0.        , 0.68967017, 0.33290635, 0.38950562],
       [0.75068592, 0.68967017, 0.        , 1.00691065, 0.65350325],
       [0.98875657, 0.33290635, 1.00691065, 0.        , 0.65892608],
       [0.53386995, 0.38950562, 0.65350325, 0.65892608, 0.        ]])
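For comparison, the same triple loop can be expressed with NumPy broadcasting. This is only a sketch and not part of the original exercise: the name `pairwise_dist_numpy` is hypothetical, and the approach builds an intermediate array of shape `(m, m, n)`, so it is only practical for small inputs (with 2000 points in 1000 dimensions that intermediate would hold 4 billion doubles, about 32 GB).

```python
import numpy as np

def pairwise_dist_numpy(points):
    """Euclidean distance matrix via broadcasting (hypothetical helper)."""
    # diff[i, j, k] = points[i, k] - points[j, k], shape (m, m, n)
    diff = points[:, None, :] - points[None, :, :]
    # sum the squared differences over the dimension axis, then take the root
    return np.sqrt((diff ** 2).sum(axis=-1))

# Small fixed example: three points in 2D forming a 3-4-5 right triangle
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D_np = pairwise_dist_numpy(pts)
print(D_np)
```

On this example the off-diagonal entries are the triangle's side lengths 3, 4 and 5, which makes the result easy to verify by hand.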

In [33]:
## we check that the two methods give the same results
np.all(D_scipy == D_cpu)

True
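The exact comparison above succeeds here, but bitwise equality between two floating-point implementations is not guaranteed in general: a different order of operations or a different hardware unit can change the last digits. A tolerance-based check such as `np.allclose` is usually a safer habit. The tiny example below, using made-up values, illustrates the difference.

```python
import numpy as np

# 0.1 + 0.2 is not exactly 0.3 in binary floating point
a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

exact = bool(np.all(a == b))     # strict, bitwise comparison
close = bool(np.allclose(a, b))  # comparison within a small tolerance

print(exact, close)  # False True
```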

The computational cost grows linearly with the number of dimensions and quadratically with the number of points.

We therefore use more points and more dimensions, so that the execution time is long enough to measure.

In [35]:
# timing the estimate with a bigger dataset
m = 2000
n = 1000
data = random((m , n))

## scipy
%time D_scipy = squareform( pdist(data) )

## numba
%time D_cpu = pairwise_dist( data )

np.all( D_scipy == D_cpu )

CPU times: user 778 ms, sys: 3.1 ms, total: 781 ms
Wall time: 780 ms
CPU times: user 4.51 s, sys: 11.9 ms, total: 4.52 s
Wall time: 583 ms


True
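To get a feel for the amount of work involved, the cost can be estimated by counting the squared-difference terms evaluated by the double loop in `pairwise_dist`. The helper names below are hypothetical and the count is only a back-of-the-envelope figure, not a timing model.

```python
def pairwise_op_count(m, n):
    """Squared-difference terms evaluated when filling the full m x m matrix."""
    return m * m * n

def unique_pairs(m):
    """Number of distinct pairs (i < j), i.e. the entries that are not redundant."""
    return m * (m - 1) // 2

small = pairwise_op_count(5, 3)        # the 5 x 3 toy dataset
large = pairwise_op_count(2000, 1000)  # the 2000 x 1000 dataset timed above

print(small)               # 75
print(large)               # 4000000000
print(unique_pairs(2000))  # 1999000
```

Because the matrix is symmetric, only the `unique_pairs(m)` distances are strictly needed; this is what `pdist` computes before `squareform` expands the result, whereas the Numba loop fills all `m * m` entries.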

## The CUDA implementation

Now it's your turn to implement the CUDA kernel! 



In [None]:
@cuda.jit
def pairwise_dist_gpu( points, D ):
    ## compute the distance between 2 points
    ## determined by the position within the grid,
    ## and store it in D (a CUDA kernel cannot return a value)
    

# calling the function
size = data.size

blocksize = # block size = number of threads per block dimension
gridsize = # grid size = number of blocks per grid dimension

# the kernel writes its result into a preallocated output array
D_gpu = np.zeros((data.shape[0], data.shape[0]))
pairwise_dist_gpu[gridsize, blocksize]( data, D_gpu )

# Check! (with a tolerance, since GPU rounding may differ slightly from the CPU)
np.allclose( D_scipy, D_gpu )

Now, time your function:

In [None]:
size = data.size

blocksize = # block size = number of threads per block dimension
gridsize = # grid size = number of blocks per grid dimension

D_gpu = np.zeros((data.shape[0], data.shape[0]))
%time pairwise_dist_gpu[gridsize, blocksize]( data, D_gpu )
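As a hint for the launch configuration left blank above, here is one possible way to reason about it, emulated on the CPU so that it runs without a GPU. It assumes a 2D grid in which each thread handles one `(i, j)` cell of the distance matrix, with 16 x 16 threads per block; both choices are illustrative assumptions, not the only valid ones, and the helper names are hypothetical.

```python
import math

def launch_config(m, threads_per_block=(16, 16)):
    """Blocks per grid so that a 2D grid of threads covers an m x m matrix."""
    tx, ty = threads_per_block
    # round up so the last, possibly partial, block is still launched
    blocks = (math.ceil(m / tx), math.ceil(m / ty))
    return blocks, threads_per_block

def covered_cells(m, gridsize, blocksize):
    """Emulate on the CPU which (i, j) cells the threads would handle."""
    cells = set()
    for bx in range(gridsize[0]):
        for by in range(gridsize[1]):
            for tx in range(blocksize[0]):
                for ty in range(blocksize[1]):
                    # global position of the thread within the grid
                    i = bx * blocksize[0] + tx
                    j = by * blocksize[1] + ty
                    # threads falling outside the matrix must do nothing
                    if i < m and j < m:
                        cells.add((i, j))
    return cells

gridsize, blocksize = launch_config(2000)
print(gridsize, blocksize)  # (125, 125) (16, 16)

# With 20 points, a 2 x 2 grid of 16 x 16 blocks covers all 400 cells
grid20, block20 = launch_config(20)
print(grid20, len(covered_cells(20, grid20, block20)))  # (2, 2) 400
```

The bounds check matters whenever the number of points is not a multiple of the block size: without it, the extra threads of the last block would read and write out of range.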