## Nearest Neighbors

Nearest Neighbors finds the k nearest neighbors of each query point within a set of input samples.
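
To make the query concrete, here is a minimal brute-force sketch in plain NumPy. It is not how scikit-learn or cuML implement the search; the function name `brute_force_kneighbors` and the toy `points` array are hypothetical, and squared Euclidean distance is assumed as the metric.

```python
import numpy as np


def brute_force_kneighbors(X, queries, k):
    """Return (distances, indices) of the k nearest rows of X for each query.

    Illustrative only: uses squared Euclidean distance and builds the full
    (n_queries, n_samples) distance matrix, so it only suits small inputs.
    """
    X = np.asarray(X, dtype=np.float64)
    queries = np.asarray(queries, dtype=np.float64)
    # Pairwise differences, shape (n_queries, n_samples, n_features)
    diff = queries[:, None, :] - X[None, :, :]
    # Squared Euclidean distances, shape (n_queries, n_samples)
    sq_dists = np.einsum("qnd,qnd->qn", diff, diff)
    # Indices of the k smallest distances per query, nearest first
    indices = np.argsort(sq_dists, axis=1, kind="stable")[:, :k]
    distances = np.take_along_axis(sq_dists, indices, axis=1)
    return distances, indices


points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
D, I = brute_force_kneighbors(points, points, k=2)
print(I)
print(D)
```

When the query set is the training set itself, as in this notebook, each point's first neighbor is the point itself at distance 0.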

The model accepts array-like input either on the host (NumPy arrays) or on the device (Numba device arrays or any other `__cuda_array_interface__`-compliant object), as well as cuDF DataFrames.
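
As a rough illustration of the host/device distinction, the sketch below classifies an object by the array protocol it exposes. The helper name `input_location` is hypothetical and is not part of cuML; the check relies only on the `__cuda_array_interface__` attribute (device arrays) and NumPy's `__array_interface__` attribute (host arrays). cuDF DataFrames are not covered by this check.

```python
import numpy as np


def input_location(obj):
    """Classify an array-like object by the array protocol it exposes."""
    if hasattr(obj, "__cuda_array_interface__"):
        return "device"  # e.g. a Numba device array
    if hasattr(obj, "__array_interface__"):
        return "host"    # e.g. a NumPy ndarray
    return "other"       # lists, DataFrames, and anything else


print(input_location(np.zeros((2, 2))))
```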

For information on converting your dataset to cuDF format, refer to the [cuDF documentation](https://rapidsai.github.io/projects/cudf/en/latest/).

For additional information on cuML's Nearest Neighbors implementation, refer to the [cuML documentation](https://rapidsai.github.io/projects/cuml/en/latest/api.html#nearest-neighbors).

In [None]:
import os

import numpy as np

from sklearn import datasets

import pandas as pd
import cudf as gd

from sklearn.neighbors import NearestNeighbors as skNN
from cuml.neighbors import NearestNeighbors as cumlNN

## Define Parameters

In [None]:
n_samples = 2**15
n_features = 40

n_neighbors = 10

## Generate Data

In [None]:
data, labels = datasets.make_blobs(
    n_samples=n_samples, n_features=n_features, centers=5, random_state=7)

## Fit Scikit-learn Model

In [None]:
%%time
knn_sk = skNN(metric='sqeuclidean')
knn_sk.fit(data)

D_sk, I_sk = knn_sk.kneighbors(data, n_neighbors)
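
The `sqeuclidean` metric passed to scikit-learn above is the square of the Euclidean distance. Because squaring preserves the order of non-negative values, both metrics rank neighbors identically; only the reported distances differ. A small sketch with made-up points (the arrays `X` and `q` are illustrative):

```python
import numpy as np

# Four toy samples and a query at the origin
X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [6.0, 8.0]])
q = np.array([0.0, 0.0])

sq = np.sum((X - q) ** 2, axis=1)  # squared Euclidean distances
eu = np.sqrt(sq)                   # Euclidean distances

order_sq = np.argsort(sq, kind="stable")
order_eu = np.argsort(eu, kind="stable")

print(order_sq, order_eu)  # same neighbor ranking under both metrics
print(sq, eu)              # different distance values
```

When comparing distances from two libraries, it helps to confirm that both report the same quantity (squared or not) before treating a mismatch as an error.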

## Fit cuML Model

In [None]:
%%time
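# make_blobs returns a NumPy array, so wrap it in a pandas DataFrame before converting to cuDF
device_data = gd.DataFrame.from_pandas(pd.DataFrame(data))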

In [None]:
%%time
knn_cuml = cumlNN()
knn_cuml.fit(device_data)

D_cuml, I_cuml = knn_cuml.kneighbors(device_data, n_neighbors)

## Compare Results
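
The comparison cells in this section call `array_equal` and `accuracy`, neither of which is imported in this notebook. The block below is a minimal sketch of what such helpers could look like, not the original implementation: the signatures are inferred from the calls, the meaning of `threshold` and of each `metric` value is an assumption, and `to_host` is a hypothetical conversion helper.

```python
import numpy as np


def to_host(a):
    """Best-effort conversion of an array-like input to a NumPy array."""
    if hasattr(a, "to_pandas"):  # e.g. a cuDF DataFrame or Series
        a = a.to_pandas()
    return np.asarray(a)


def array_equal(a, b, threshold=1e-4, metric="mse"):
    """Compare two arrays under an assumed interpretation of each metric."""
    a, b = to_host(a), to_host(b)
    if metric == "mse":   # mean squared error below threshold
        return float(np.mean((a - b) ** 2)) < threshold
    if metric == "abs":   # every absolute difference within threshold
        return bool(np.all(np.abs(a - b) <= threshold))
    if metric == "acc":   # fraction of mismatched entries below threshold
        return float(np.mean(a != b)) < threshold
    raise ValueError("metric must be 'acc', 'mse', or 'abs'")


def accuracy(a, b, threshold=1e-4):
    """Assumed to mean: fraction of mismatched entries is below threshold."""
    return array_equal(a, b, threshold=threshold, metric="acc")
```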

In [None]:
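# array_equal is a comparison helper that is not imported in this notebook;
# it must be defined before this cell runs. metric can be 'acc', 'mse', or 'abs'.
passed = array_equal(D_sk, D_cuml, metric='abs')
message = 'compare knn: cuml vs sklearn distances %s' % ('equal' if passed else 'NOT equal')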
print(message)

In [None]:
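# compare the neighbor indices returned by the sklearn and cuml models
# (accuracy is a helper that is not imported in this notebook; it must be defined before this cell runs)
passed = accuracy(I_sk, I_cuml, threshold=1e-1)
message = 'compare knn: cuml vs sklearn indices %s' % ('equal' if passed else 'NOT equal')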
print(message)
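
Exact index equality can be stricter than necessary: when several samples are equidistant from a query, two correct implementations may return them in a different order. One alternative, sketched here under that assumption, is to measure how much the neighbor sets overlap per row. The function name `neighbor_recall` and the toy arrays are hypothetical.

```python
import numpy as np


def neighbor_recall(idx_ref, idx_test):
    """Fraction of reference neighbors also found in the test result, row by row."""
    idx_ref = np.asarray(idx_ref)
    idx_test = np.asarray(idx_test)
    if idx_ref.shape != idx_test.shape:
        raise ValueError("index arrays must have the same shape")
    # Count shared indices per row, ignoring their order within the row
    hits = sum(len(np.intersect1d(r, t)) for r, t in zip(idx_ref, idx_test))
    return hits / idx_ref.size


ref = np.array([[0, 1, 2], [3, 4, 5]])
test = np.array([[0, 2, 1], [3, 4, 9]])  # row 0 reordered, row 1 has one miss
recall = neighbor_recall(ref, test)
print(recall)
```

A recall of 1.0 means both results contain the same neighbors for every query, even if the ordering within a row differs.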