1. kNN Algorithm

K Nearest Neighbors (kNN) is a supervised learning algorithm that predicts new instances from the most similar cases in the training data. It is also called instance-based or memory-based supervised learning: these methods memorize the labeled examples in the training set and use those memorized examples to classify new objects.

kNN can be used for both classification and regression problems. A kNN classification or regression task can be broken down into three primary functions (a from-scratch sketch follows the list):

  1. Calculate the distance between any two points
  2. Find the k nearest neighbors using those pairwise distances
  3. Take a majority vote on the class labels of the nearest-neighbor list (for regression, average the neighbors' target values instead)
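
A minimal from-scratch sketch of these three functions, using plain NumPy (the function and variable names here are illustrative, not from any particular library):

import numpy as np
from collections import Counter

def euclidean_distance(a, b):
    # Step 1: distance between two points (Minkowski with p = 2)
    return np.sqrt(np.sum((a - b) ** 2))

def nearest_neighbors(X_train, query, k):
    # Step 2: indices of the k smallest pairwise distances
    distances = [euclidean_distance(x, query) for x in X_train]
    return np.argsort(distances)[:k]

def knn_predict(X_train, y_train, query, k=5):
    # Step 3: majority vote over the labels of the k nearest neighbors
    neighbor_idx = nearest_neighbors(X_train, query, k)
    votes = Counter(y_train[i] for i in neighbor_idx)
    return votes.most_common(1)[0][0]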

Important Parameters to consider

  1. Model Complexity

    • n_neighbors - number of nearest neighbors (k) to consider. Smaller k gives a more flexible (complex) model; larger k gives a smoother decision boundary.
    • Default = 5.
  2. Model Fitting

    • metric - the distance measure between data points.
    • Default: Minkowski distance with power parameter p = 2, which is equivalent to Euclidean distance (see the sketch after this list).
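
These defaults can be spelled out explicitly when constructing the estimator. As a quick check (a sketch with arbitrary values), Minkowski distance with p = 2 reduces to ordinary Euclidean distance:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# the defaults, written out: k = 5 neighbors, Minkowski metric with p = 2
knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)

a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
minkowski = np.sum(np.abs(a - b) ** 2) ** (1 / 2)  # 5.0
euclidean = np.sqrt(np.sum((a - b) ** 2))          # 5.0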

(Figure: kNN steps)

Python Implementation using scikit-learn

Split into training and testing sets

from sklearn.model_selection import train_test_split
# default split is 75% train / 25% test; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

Create Classifier Object

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 nearest neighbors

OR, Create Regressor Object

from sklearn.neighbors import KNeighborsRegressor
knn = KNeighborsRegressor(n_neighbors=3)  # predicts the average target of the 3 nearest neighbors

Train Classifier / Regressor

knn.fit(X_train, y_train)  # kNN is a lazy learner: fitting mostly memorizes the training set

Predict with Classifier / Regressor

knn.predict(X_test)  # class labels (classifier) or continuous values (regressor)

Accuracy Score for Classifier / R-squared for Regressor

knn.score(X_test, y_test)  # mean accuracy (classifier) or R-squared (regressor)
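
Putting the pieces together, a minimal end-to-end run (scikit-learn's built-in iris dataset is used here purely for illustration; any feature matrix X and label vector y work the same way):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print(knn.predict(X_test[:5]))    # predicted class labels for five test points
print(knn.score(X_test, y_test))  # mean accuracy on the test set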

Examples

  1. A complete hands-on example of implementing the kNN algorithm to build an Object Recognition System
  2. kNN classification example on a synthetic dataset [ipynb] [py]
  3. kNN regression example on a synthetic dataset [ipynb] [py]
