# $k$-Nearest Neighbors (kNN): Classification

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import mglearn.plots
from mglearn.datasets import make_wave
from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier
import numpy as np

np.random.seed(1)

## Nearest Neighbors Classification

The $k$-NN algorithm is arguably the simplest machine learning algorithm. Building the model consists only of storing the training dataset. To make a prediction for a new data point, the algorithm finds the closest data points in the training dataset—its “nearest neighbors.”

The `sklearn.neighbors.KNeighborsClassifier` class implements the $k$-nearest neighbors vote.
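
To make the idea of a neighbors vote concrete, here is a minimal from-scratch sketch using only the standard library. The function name `knn_predict` and the toy data are hypothetical and chosen for illustration; this is not how scikit-learn implements the classifier, and ties are simply broken in favor of the label encountered first.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # "Training" amounts to keeping train_points and train_labels around.
    # Sort the training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)), key=lambda i: dist(train_points[i], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small clusters labelled 0 and 1.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (1.2, 1.4), k=3))  # query near the first cluster
print(knn_predict(train_points, train_labels, (6.4, 6.2), k=3))  # query near the second cluster
```

All the work happens at prediction time: every query requires a distance to each stored training point, which is why this naive version becomes slow on large training sets.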

### Trying it out on the forge data

The `make_forge` function from `mglearn.datasets` generates a small, deterministic synthetic dataset.

In [None]:
X, y = mglearn.datasets.make_forge()
X.shape

In [None]:
y

As usual, the next step is to split the data into training and test sets.

In [None]:
x_train, x_test, y_train, y_test = train_test_split(X, y)
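
The split above is random; in this notebook it is reproducible only because `np.random.seed(1)` was called earlier. As a sketch of an alternative, `train_test_split` also accepts a `random_state` argument, and by default it holds out 25% of the samples for testing. The arrays below are hypothetical stand-ins, not the forge data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 20 samples with 2 features and alternating labels.
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0, 1] * 10)

# Passing random_state fixes the shuffle, independently of the global NumPy seed.
Xa_train, Xa_test, ya_train, ya_test = train_test_split(X_demo, y_demo, random_state=0)
Xb_train, Xb_test, yb_train, yb_test = train_test_split(X_demo, y_demo, random_state=0)

print(Xa_train.shape, Xa_test.shape)       # default split: 75% train, 25% test
print(np.array_equal(Xa_test, Xb_test))    # same random_state, same split
```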

Then "train" the model, which for $k$-NN only means storing the training data:

In [None]:
knc = KNeighborsClassifier(n_neighbors=3).fit(x_train, y_train)

Predictions are made with the `predict` method, which returns the class labels for the provided data.

In [None]:
print(x_test)
knc.predict(x_test)

The true labels are in the `y_test` vector.

In [None]:
y_test

The `score` method evaluates the classifier. It returns the mean accuracy on the given test data and labels, so a value of 1 means every test point was classified correctly.

In [None]:
knc.score(x_test, y_test)
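
To illustrate what "mean accuracy" means, the following sketch compares `score` with the fraction of correct predictions computed by hand. The one-dimensional toy data is hypothetical and only serves to make the result easy to check.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D toy data, used only to illustrate what `score` computes.
X_toy_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_toy_train = np.array([0, 0, 0, 1, 1, 1])
X_toy_test = np.array([[0.5], [1.5], [10.5], [11.5]])
y_toy_test = np.array([0, 0, 1, 0])  # the last label deliberately disagrees with its neighbors

toy_clf = KNeighborsClassifier(n_neighbors=3).fit(X_toy_train, y_toy_train)

# Accuracy by hand: the fraction of predictions that match the true labels.
manual_accuracy = np.mean(toy_clf.predict(X_toy_test) == y_toy_test)
reported_accuracy = toy_clf.score(X_toy_test, y_toy_test)

print(manual_accuracy, reported_accuracy)
```

Three of the four test points are classified correctly here, so both values are 0.75.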

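The accuracy depends on the choice of $k$. The next cell fits one classifier for each $k$ from 1 to 9 and plots the resulting test accuracy.

In [None]:
ks = list(range(1, 10))
scores = []
for n_neighbors in ks:
    # Fit a classifier with this number of neighbors and record its test accuracy
    clf = KNeighborsClassifier(n_neighbors=n_neighbors).fit(x_train, y_train)
    scores.append(clf.score(x_test, y_test))

plt.plot(ks, scores)
plt.xlabel("n_neighbors")
plt.ylabel("test accuracy")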

Let us see how the data is separated for different values of $k$.

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(15, 3))
for n_neighbors, ax in zip([1, 3, 9], axes):
    clf = KNeighborsClassifier(n_neighbors=n_neighbors).fit(x_train, y_train)
    mglearn.plots.plot_2d_separator(clf, x_train, fill=True, eps=0.5, ax=ax, alpha=.4)
    mglearn.discrete_scatter(x_train[:, 0], x_train[:, 1], y_train, ax=ax)
    mglearn.discrete_scatter(x_test[:, 0], x_test[:, 1], y_test, ax=ax, markers='d')
    ax.set_title("{} neighbor(s)".format(n_neighbors))
    ax.set_xlabel("feature 0")
    ax.set_ylabel("feature 1")
axes[0].legend(loc=3)
plt.show()

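`predict_proba` returns class probability estimates for the test data. With the default uniform weights, each estimate is the fraction of the $k$ nearest neighbors that belong to the class.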

In [None]:
knc.predict_proba(x_test)
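
As a rough check of how these probabilities arise, the sketch below recomputes them from the neighbor indices returned by `kneighbors`, assuming the default uniform weights. The one-dimensional toy data is hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D toy data: class 0 on the left, class 1 on the right.
X_line = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_line = np.array([0, 0, 0, 1, 1, 1])
line_clf = KNeighborsClassifier(n_neighbors=3).fit(X_line, y_line)

X_query = np.array([[0.9], [2.6], [4.2]])
proba = line_clf.predict_proba(X_query)

# Recompute by hand: look up the labels of the 3 nearest neighbors of each query
# and take, per class, the fraction of neighbors carrying that label.
_, neighbor_idx = line_clf.kneighbors(X_query)
neighbor_labels = y_line[neighbor_idx]
manual_proba = np.stack(
    [(neighbor_labels == c).mean(axis=1) for c in line_clf.classes_], axis=1
)

print(proba)
print(np.allclose(proba, manual_proba))
```

The middle query at 2.6 has neighbors at 2, 3 and 4, so one neighbor votes for class 0 and two vote for class 1, giving estimates of 1/3 and 2/3.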