# K Nearest Neighbours Algorithm Using Numba-dpex

![Assets/knn2.png](Assets/knn2.png)

## Sections
- [KNN Algorithm](#KNN-Algorithm)
- _Code:_ [Implementation of KNN targeting CPU using Numba JIT](#Implementation-of-KNN-targeting-CPU-using-Numba-JIT)
- _Code:_ [Implementation of KNN targeting GPU using Numba JIT](#Implementation-of-KNN-targeting-GPU-using-Numba-JIT)
- _Code:_ [Implementation of KNN targeting GPU using Kernels](#Implementation-of-KNN-targeting-GPU-using-Kernels)

## Learning Objectives

* Build a Numba implementation of KNN targeting CPU and GPU using Numba JIT
* Build a Numba-dpex implementation of KNN on CPU and GPU using kernel approach

## numba-dpex

Numba-dpex is a standalone extension to the Numba JIT compiler that adds SYCL programming capabilities to Numba. It is packaged as part of the Intel® Distribution for Python (IDP) that comes with the oneAPI Base Toolkit, so you don’t need to install any additional Conda packages. SYCL support is provided through the DPC++ runtime; other SYCL compilers are not supported by Numba-dpex.



## Command Line parameters

| Parameter | Default Value | Description |
|:---|:---|:---|
| --steps | 10 | Number of workload runs |
| --step | 2  | Data growth factor on each iteration |
| --size | 2 ** 28 | Initial data size |
| --repeat | 1 | Iterations inside measured region |
| --json | False | Output json data filename |
| -d | 1 | Data Dimension |
| --usm | False | Use USM Shared |
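
As a rough illustration of how the parameters in the table could be parsed, here is a minimal `argparse` sketch. It is not the lab's actual parser: `build_parser` and `workload_sizes` are hypothetical names, and the idea that the data size is multiplied by `--step` on each of the `--steps` runs is an assumption read from the descriptions above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the parameter table (not the lab's own code)."""
    parser = argparse.ArgumentParser(description="KNN workload (illustrative sketch)")
    parser.add_argument("--steps", type=int, default=10, help="Number of workload runs")
    parser.add_argument("--step", type=int, default=2, help="Data growth factor on each iteration")
    parser.add_argument("--size", type=int, default=2**28, help="Initial data size")
    parser.add_argument("--repeat", type=int, default=1, help="Iterations inside measured region")
    parser.add_argument("--json", default=False, help="Output JSON data filename")
    parser.add_argument("-d", dest="dims", type=int, default=1, help="Data dimension")
    parser.add_argument("--usm", action="store_true", help="Use USM shared memory")
    return parser


def workload_sizes(size, step, steps):
    """Assumed growth rule: the size is multiplied by `step` on each run."""
    return [size * step**i for i in range(steps)]


args = build_parser().parse_args(["--steps", "3", "--size", "1024"])
print(workload_sizes(args.size, args.step, args.steps))  # [1024, 2048, 4096]
```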

# KNN Algorithm
KNN is a supervised ML algorithm based on the idea that similar items lie close to each other. Supervised machine learning covers situations where there is a known outcome in historical data. The goal is to take a small subset of that data and learn how to predict these outcomes, so that the same method can later be applied to as yet unseen data. Supervised learning needs both “X”, the feature data, and “y”, also known as the target variable.

Classification routines analyze the features of a dataset (think of these as columns of data in a spreadsheet) to find patterns that predict or classify a target variable (think of this as, typically, a single column that we might call the “y” value). In classification, the “y” value is usually a discrete variable, such as “tumor” versus “no tumor”. Another example might be “cat” versus “dog” versus “bird”.

In regression problems, the outcome is a continuous number, for example house prices, box office revenue, or attendance at an event.
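
To make the difference concrete, the sketch below shows how a neighbor-based method might turn the targets of nearby points into a prediction in each setting. It is an illustration only and is not part of the lab code; `classify` and `regress` are hypothetical helper names, and the sample values are made up.

```python
from collections import Counter
from statistics import mean


def classify(neighbor_labels):
    """Classification: return the most common discrete label."""
    return Counter(neighbor_labels).most_common(1)[0][0]


def regress(neighbor_values):
    """Regression: return the average of the continuous targets."""
    return mean(neighbor_values)


print(classify(["cat", "dog", "cat"]))              # cat
print(regress([310_000.0, 295_000.0, 325_000.0]))   # 310000.0
```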

KNN is a simple and powerful ML algorithm that groups similar items together. It needs model data with features that can be quantified, labels that are known, and a method to measure similarity.

The first thing KNN needs is a well-chosen value of k, the number of neighbors consulted for each prediction. The second is a way to measure the distance between neighbors, for example the L2 (Euclidean) distance.
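
The two ingredients above, a value of k and a distance measure, can be sketched in plain Python before looking at the Numba versions. This is a minimal reference sketch, not the lab implementation: `l2_distance` and `knn_predict` are hypothetical names, the sample points are made up, and the standard library `heapq.nsmallest` stands in for the hand-written sorted queue used in the lab code.

```python
import heapq
import math
from collections import Counter


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, train_labels, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's distance to the query with its label.
    scored = ((l2_distance(p, query), label) for p, label in zip(train, train_labels))
    # Keep only the k pairs with the smallest distance.
    nearest = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels = [0, 0, 0, 1, 1, 1]
print(knn_predict(train, labels, (0.3, 0.3), k=3))  # 0
print(knn_predict(train, labels, (4.9, 5.0), k=3))  # 1
```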

# Implementation of KNN targeting CPU using Numba JIT
In the following example, we introduce a naive KNN implementation that targets a CPU using the Numba JIT.

This is the decorator-based approach, where data-parallel code sections, such as parallel-for loops and certain NumPy function calls, are offloaded by annotating a function. The programmer only needs to identify the most time-consuming parts of the program. If those parts can be parallelized, annotating them with the Numba `@jit` decorator (here with `parallel=True` and `numba.prange`) runs them in parallel on the CPU, and the same decorator approach in Numba-dpex lets those sections execute on a GPU.




1. Inspect the code cell below and click run ▶ to save the code to a file.
2. Next run ▶ the cell in the __Build and Run__ section below the code to compile and execute the code.

In [None]:
%%writefile lab/knn_jit.py
import base_knn_jit
import numpy as np

import numba
from numba import jit, njit, vectorize


@jit(nopython=True)
def euclidean_dist(x1, x2):
    distance = 0

    for i in range(len(x1)):
        diff = x1[i] - x2[i]
        distance += diff * diff

    result = distance**0.5
    # result = np.sqrt(distance)
    return result


@jit(nopython=True)
def push_queue(queue_neighbors, new_distance, index=4):
    while index > 0 and new_distance[0] < queue_neighbors[index - 1][0]:
        queue_neighbors[index] = queue_neighbors[index - 1]
        index = index - 1
        queue_neighbors[index] = new_distance


@jit(nopython=True)
def sort_queue(queue_neighbors):
    for i in range(len(queue_neighbors)):
        push_queue(queue_neighbors, queue_neighbors[i], i)


@jit(nopython=True)
def simple_vote(neighbors, classes_num=3):
    votes_to_classes = np.zeros(classes_num)

    for neighbor in neighbors:
        votes_to_classes[neighbor[1]] += 1

    max_ind = 0
    max_value = 0

    for i in range(classes_num):
        if votes_to_classes[i] > max_value:
            max_value = votes_to_classes[i]
            max_ind = i

    return max_ind


@jit(nopython=True, parallel=True)
def run_knn(train, train_labels, test, k=5, classes_num=3):
    test_size = len(test)
    train_size = len(train)

    predictions = np.empty(test_size)

    for i in numba.prange(test_size):
        queue_neighbors = []

        for j in range(k):
            dist = euclidean_dist(train[j], test[i])
            # queue_neighbors[j] = (dist, train_labels[j])
            queue_neighbors.append((dist, train_labels[j]))

        sort_queue(queue_neighbors)

        for j in range(k, train_size):
            dist = euclidean_dist(train[j], test[i])
            new_neighbor = (dist, train_labels[j])

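            if dist < queue_neighbors[k - 1][0]:
                # Replace the current farthest neighbor, then restore sorted order.
                # Pass k - 1 explicitly so the queue also works when k != 5.
                queue_neighbors[k - 1] = new_neighbor
                push_queue(queue_neighbors, new_neighbor, k - 1)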

        predictions[i] = simple_vote(queue_neighbors, classes_num)

    return predictions


base_knn_jit.run("K-Nearest-Neighbors Numba", run_knn)

### Build and Run
Select the cell below and click run ▶ to compile and execute the code:

In [None]:
! chmod 755 q; chmod 755 run_knn_cpu.sh; if [ -x "$(command -v qsub)" ]; then ./q run_knn_cpu.sh; else ./run_knn_cpu.sh; fi

_If the Jupyter cells are not responsive, or if they error out when you compile the code samples, please restart the Jupyter Kernel:
"Kernel->Restart Kernel and Clear All Outputs" and compile the code samples again._

## Implementation of KNN targeting GPU using Kernels

## Writing Explicit Kernels in numba-dpex

Writing a SYCL kernel using the `@numba_dpex.kernel` decorator has similar syntax to writing OpenCL kernels. As such, the numba-dpex module provides similar indexing and other functions as OpenCL. The indexing functions supported inside a `numba_dpex.kernel` are:

* `numba_dpex.get_global_id`: Gets the global ID of the item
* `numba_dpex.get_local_id`: Gets the local ID of the item
* `numba_dpex.get_local_size`: Gets the local work group size of the device
* `numba_dpex.get_group_id`: Gets the group ID of the item
* `numba_dpex.get_num_groups`: Gets the number of work groups

Refer to https://intelpython.github.io/numba-dpex/latest/user_guides/kernel_programming_guide/index.html for more details.
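
To show how these indices relate to each other, the sketch below emulates a one-dimensional launch in plain Python. It does not call numba-dpex. `work_item_indices` is a hypothetical helper, and it assumes the usual OpenCL/SYCL convention (with no global offset) that the global size is a multiple of the local size and that `global_id = group_id * local_size + local_id`.

```python
def work_item_indices(global_size, local_size):
    """Enumerate (global_id, group_id, local_id) for a 1-D index space."""
    if global_size % local_size != 0:
        raise ValueError("this sketch assumes global_size is a multiple of local_size")
    num_groups = global_size // local_size
    items = []
    for group_id in range(num_groups):
        for local_id in range(local_size):
            # Each work item's global ID follows from its group and local IDs.
            global_id = group_id * local_size + local_id
            items.append((global_id, group_id, local_id))
    return num_groups, items


num_groups, items = work_item_indices(global_size=8, local_size=4)
print(num_groups)  # 2
for item in items:
    print(item)    # (0, 0, 0), (1, 0, 1), ... (7, 1, 3)
```

In the KNN kernel in this section, only the global ID is used: each work item handles one test point.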

The following example uses the explicit kernel programming approach with dpex kernels. A programmer who wants to extract further performance from the offloaded code can write the kernel explicitly and tune the GPU parameters, taking advantage of the work groups and work items on the device.

1. Inspect the code cell below and click run ▶ to save the code to a file.
2. Next run ▶ the cell in the __Build and Run__ section below the code to compile and execute the code.

In [None]:
%%writefile lab/knn_kernel.py
# SPDX-FileCopyrightText: 2022 - 2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

from math import sqrt
import base_knn

import numba_dpex as dpex
import numpy as np


@dpex.kernel
def _knn_kernel(  # noqa: C901: TODO: can we simplify logic?
    train,
    train_labels,
    test,
    k,
    classes_num,
    train_size,
    predictions,
    votes_to_classes_lst,
    data_dim,
):
    dtype = train.dtype
    i = dpex.get_global_id(0)
    # here k has to be 5 in order to match with numpy
    queue_neighbors = dpex.private.array(shape=(5, 2), dtype=dtype)

    for j in range(k):
        x1 = train[j]
        x2 = test[i]

        distance = dtype.type(0.0)
        for jj in range(data_dim):
            diff = x1[jj] - x2[jj]
            distance += diff * diff
        dist = sqrt(distance)

        queue_neighbors[j, 0] = dist
        queue_neighbors[j, 1] = train_labels[j]

    for j in range(k):
        new_distance = queue_neighbors[j, 0]
        new_neighbor_label = queue_neighbors[j, 1]
        index = j

        while index > 0 and new_distance < queue_neighbors[index - 1, 0]:
            queue_neighbors[index, 0] = queue_neighbors[index - 1, 0]
            queue_neighbors[index, 1] = queue_neighbors[index - 1, 1]

            index = index - 1

            queue_neighbors[index, 0] = new_distance
            queue_neighbors[index, 1] = new_neighbor_label

    for j in range(k, train_size):
        x1 = train[j]
        x2 = test[i]

        distance = dtype.type(0.0)
        for jj in range(data_dim):
            diff = x1[jj] - x2[jj]
            distance += diff * diff
        dist = sqrt(distance)

        if dist < queue_neighbors[k - 1, 0]:
            queue_neighbors[k - 1, 0] = dist
            queue_neighbors[k - 1, 1] = train_labels[j]
            new_distance = queue_neighbors[k - 1, 0]
            new_neighbor_label = queue_neighbors[k - 1, 1]
            index = k - 1

            while index > 0 and new_distance < queue_neighbors[index - 1, 0]:
                queue_neighbors[index, 0] = queue_neighbors[index - 1, 0]
                queue_neighbors[index, 1] = queue_neighbors[index - 1, 1]

                index = index - 1

                queue_neighbors[index, 0] = new_distance
                queue_neighbors[index, 1] = new_neighbor_label

    votes_to_classes = votes_to_classes_lst[i]

    for j in range(len(queue_neighbors)):
        votes_to_classes[int(queue_neighbors[j, 1])] += 1

    max_ind = 0
    max_value = dtype.type(0)

    for j in range(classes_num):
        if votes_to_classes[j] > max_value:
            max_value = votes_to_classes[j]
            max_ind = j

    predictions[i] = max_ind


def knn(
    x_train,
    y_train,
    x_test,
    k,
    classes_num,
    test_size,
    train_size,
    predictions,
    votes_to_classes,
    data_dim,
):
    _knn_kernel[test_size,](
        x_train,
        y_train,
        x_test,
        k,
        classes_num,
        train_size,
        predictions,
        votes_to_classes,
        data_dim,
    )


### Build and Run
Select the cell below and click run ▶ to compile and execute the code:

In [None]:
! chmod 755 q; chmod 755 run_knn_kernel.sh; if [ -x "$(command -v qsub)" ]; then ./q  run_knn_kernel.sh; else ./run_knn_kernel.sh; fi

# Plot GPU Results

The sample below takes the results of running the KNN algorithm with numba-dpex on the GPU, prints the first ten predictions, and plots the input data points (red) together with the test data points (blue).

### View the results
Select the cell below and click run ▶ to view the graph:

In [None]:
%matplotlib inline
import joblib
import matplotlib.pyplot as plt


def read_dictionary(fn):
    # Load data (deserialize)
    with open(fn, 'rb') as handle:
        dictionary = joblib.load(handle)
    return dictionary


resultsDict = read_dictionary('resultsDict_knn.dat')
predictions = resultsDict['predictions']
votes_to_classes_lst = resultsDict['votes_to_classes_lst']
xC = resultsDict['xC']
yC = resultsDict['yC']
zC = resultsDict['zC']
xtest = resultsDict['xtest']
ytest = resultsDict['ytest']

plt.style.use('default')
#plt.style.use('dark_background')

# Print the predicted class for the first 10 test points
for prediction in predictions[:10]:
    print(prediction)

plt.scatter(x=xC[:100], y=yC[:100],s=75,  c='r', edgecolor="k")
plt.scatter(x=xtest[:10], y=ytest[:10],s=150,  c='b', edgecolor="k")
#plt.scatter(x=predictions[:100], y=predictions[:100],s=75,  c='b', edgecolor="k")

plt.title('KNN')

#plt.grid()
plt.gcf().set_size_inches((16, 8))
plt.show()

## Summary
In this module you learned the following:
* Numba implementation of KNN using Numba JIT
* Numba-dpex implementation of KNN on GPU using the kernel approach
