[class API](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.neighbors)

#### UNSUPERVISED

In [3]:
# find each point's 2 nearest neighbors within one dataset
# (the query set is the training set, so each point's closest neighbor is itself)

from sklearn.neighbors import NearestNeighbors
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)

distances, indices = nbrs.kneighbors(X)

In [4]:
indices

array([[0, 1],
       [1, 0],
       [2, 1],
       [3, 4],
       [4, 3],
       [5, 4]])

In [5]:
distances

array([[ 0.        ,  1.        ],
       [ 0.        ,  1.        ],
       [ 0.        ,  1.41421356],
       [ 0.        ,  1.        ],
       [ 0.        ,  1.        ],
       [ 0.        ,  1.41421356]])
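As a sanity check on the distances above, the same values can be recomputed by brute force with plain NumPy. This is only a sketch; the names `pairwise` and `manual_distances` are illustrative and not part of scikit-learn.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Brute-force pairwise Euclidean distances, shape (6, 6).
diff = X[:, None, :] - X[None, :, :]
pairwise = np.sqrt((diff ** 2).sum(axis=-1))

# The two smallest distances in each row (the first is the point itself).
manual_distances = np.sort(pairwise, axis=1)[:, :2]

nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
distances, indices = nbrs.kneighbors(X)

print(np.allclose(distances, manual_distances))
```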

In [6]:
# build sparse graph showing connections between neighboring points

nbrs.kneighbors_graph(X).toarray()

array([[ 1.,  1.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.]])
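The graph above stores connectivity (ones). A possible variation, sketched here, passes `mode='distance'` so each edge carries the Euclidean distance instead. The variable names `graph` and `dense` are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)

# Sparse matrix whose entries are neighbor distances rather than 0/1 flags.
graph = nbrs.kneighbors_graph(X, mode='distance')
dense = graph.toarray()

# The self-neighbor entries have distance 0, so only the edge to the
# other neighbor shows up as a non-zero value in each row.
print(dense)
```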

In [7]:
# KDTree

from sklearn.neighbors import KDTree
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
kdt = KDTree(X, leaf_size=30, metric='euclidean')
kdt.query(X, k=2, return_distance=False)

array([[0, 1],
       [1, 0],
       [2, 1],
       [3, 4],
       [4, 3],
       [5, 4]])

In [8]:
# BallTree

from sklearn.neighbors import BallTree
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
bt = BallTree(X, leaf_size=30, metric='euclidean')
bt.query(X, k=2, return_distance=False)

array([[0, 1],
       [1, 0],
       [2, 1],
       [3, 4],
       [4, 3],
       [5, 4]])
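Both trees return the same neighbors on this data. The sketch below checks that directly and also shows a radius query with `query_radius`. The radius `r=1.5` is an arbitrary value chosen for illustration.

```python
import numpy as np
from sklearn.neighbors import KDTree, BallTree

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
kdt = KDTree(X, leaf_size=30, metric='euclidean')
bt = BallTree(X, leaf_size=30, metric='euclidean')

# k-nearest-neighbor queries from both structures.
kd_dist, kd_ind = kdt.query(X, k=2)
bt_dist, bt_ind = bt.query(X, k=2)
trees_agree = np.array_equal(kd_ind, bt_ind) and np.allclose(kd_dist, bt_dist)
print(trees_agree)

# Radius query: indices of all points within r of the first point.
within = bt.query_radius(X[:1], r=1.5)
neighbors_of_first = sorted(within[0].tolist())
print(neighbors_of_first)
```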

#### NN CLASSIFICATION

[NN, iris dataset](plot_classification.ipynb)
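A minimal classification sketch on the iris dataset, independent of the linked notebook. The split ratio, `n_neighbors=15`, and `weights='distance'` are assumptions picked for illustration, not tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Each test point gets the majority label of its 15 nearest training
# points, with closer neighbors weighted more heavily.
clf = KNeighborsClassifier(n_neighbors=15, weights='distance')
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
print(accuracy)
```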

#### NN REGRESSION

[API (k-neighbors-based)](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html#sklearn.neighbors.KNeighborsRegressor) | [API (radius-based)](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.RadiusNeighborsRegressor.html#sklearn.neighbors.RadiusNeighborsRegressor)

[NN, random dataset](plot_regression.ipynb) | [face completion, multi-output](plot_multioutput_face_completion.ipynb)
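A small sketch of k-neighbors regression on synthetic data: with uniform weights, the prediction is the mean target of the k nearest training points. The sine data, `n_neighbors=3`, and the query value 2.5 are assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(40, 1), axis=0)
y = np.sin(X).ravel()

reg = KNeighborsRegressor(n_neighbors=3)  # uniform weights by default
reg.fit(X, y)

query = np.array([[2.5]])
pred = reg.predict(query)[0]

# Reproduce the prediction by averaging the targets of the 3 neighbors.
_, ind = reg.kneighbors(query)
manual = y[ind[0]].mean()
print(np.isclose(pred, manual))
```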

#### ALGOS (BRUTE FORCE, K-D TREE, BALL TREE)

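* Brute force: computes the distance from the query to every training point; simple, but cost grows quickly with #samples.
* K-D tree: binary tree that recursively splits the data along coordinate axes; fast in low dimensions, degrades as dimensionality grows.
* Ball tree: partitions the data into nested hyperspheres; costlier to build than a K-D tree, but copes better with higher dimensions.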

##### Tradeoffs:

* #samples
* dimensionality (#features)
* data structure (intrinsic dimensionality, sparsity)
* #neighbors requested (K)
* #query points

##### Leaf Size:

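* brute force can be faster than tree-based search for small sample sizes, so the trees switch to brute force inside each leaf
* larger leaf size ==> faster tree construction (fewer nodes)
* good compromise: leaf_size = 30 (the default); very small values add node-traversal overhead, very large values approach brute force
* as leaf_size goes up, memory requirements for storing the tree go down (fewer nodes to store).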
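Algorithm choice and leaf size affect speed and memory, not the neighbors returned by an exact search. The sketch below checks this on random data. The dataset shape, the leaf sizes tried, and the tolerance are arbitrary assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X = rng.rand(500, 3)        # training points
X_query = rng.rand(20, 3)   # separate query points

results = {}
for algorithm in ('brute', 'kd_tree', 'ball_tree'):
    for leaf_size in (2, 30, 200):
        # leaf_size is ignored by the brute-force algorithm.
        nn = NearestNeighbors(n_neighbors=5, algorithm=algorithm,
                              leaf_size=leaf_size).fit(X)
        dist, _ = nn.kneighbors(X_query)
        results[(algorithm, leaf_size)] = dist

# Every configuration should report the same neighbor distances.
reference = results[('brute', 30)]
all_agree = all(np.allclose(d, reference, atol=1e-6) for d in results.values())
print(all_agree)
```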

#### NEAREST CENTROID (CLASSIFIER)

[example](plot_nearest_centroid.ipynb)

In [9]:
from sklearn.neighbors import NearestCentroid  # public import path
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = NearestCentroid()
clf.fit(X, y)

print(clf.predict([[-0.8, -1]]))

[1]
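What the classifier does can be reproduced by hand: compute the mean of each class, then assign the query to the class whose centroid is closest. A sketch with plain NumPy follows; the names `centroids` and `manual_label` are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
query = np.array([[-0.8, -1]])

# One centroid per class: the per-feature mean of that class's points.
classes = np.unique(y)
centroids = np.array([X[y == c].mean(axis=0) for c in classes])

# Pick the class whose centroid has the smallest Euclidean distance.
dists = np.linalg.norm(centroids - query, axis=1)
manual_label = classes[np.argmin(dists)]

clf = NearestCentroid().fit(X, y)
print(manual_label, clf.predict(query)[0])
```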


#### APPROXIMATE NEAREST NEIGHBOR SEARCH

*** For apps with high dimensionality (roughly D > 50 features), where exact search suffers from the 'Curse of Dimensionality' ***

[LSH Forest (API)](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LSHForest.html#sklearn.neighbors.LSHForest) (`LSHForest` was deprecated in scikit-learn 0.19 and removed in 0.21)

[LSHF: accuracy vs #candidates,#estimators](plot_approximate_nearest_neighbors_hyperparameters.ipynb)

[LSHF: query time vs #samples](plot_approximate_nearest_neighbors_scalability.ipynb)