K Nearest neighbours

KNN classifies data according to the majority of labels in its nearest neighbourhood, as measured by some underlying distance function $d(x, x')$.

For $k = 1$, the label of a test point $x^*$ is predicted to be the same as that of its closest training point $x_k$, i.e. $y_k$, where

$$k = \arg\min_j d(x^*, x_j).$$
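
As a purely illustrative sketch (independent of Shogun, with made-up toy data), the 1-NN rule above can be written directly in NumPy:

```python
import numpy as np

# Toy training set: four 2-D points with class labels (made-up data)
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1, 2])

x_star = np.array([0.9, 0.2])  # test point x*

# Euclidean distances d(x*, x_j) to every training point
d = np.linalg.norm(X_train - x_star, axis=1)

k_idx = np.argmin(d)   # k = argmin_j d(x*, x_j)
print(y_train[k_idx])  # predicted label y_k (here: 0)
```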

See Chapter 14 in [barber2012bayesian] for a detailed introduction.

See issue 2996 for known issues.

Example

Imagine we have files with training and test data. We create CDenseFeatures (here 64-bit floats, aka RealFeatures) and CMulticlassLabels as follows:

knn.sg:create_features
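
For instance, using Shogun's Python interface (where the SWIG wrappers drop the C prefix from class names), this step might look like the following sketch; the file names are placeholders for your own data:

```python
from shogun import CSVFile, RealFeatures, MulticlassLabels

# Placeholder file names -- substitute your own CSV data files
features_train = RealFeatures(CSVFile("feats_train.csv"))
features_test = RealFeatures(CSVFile("feats_test.csv"))
labels_train = MulticlassLabels(CSVFile("labels_train.csv"))
```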

In order to run CKNN, we need to choose a distance, for example CEuclideanDistance or another subclass of CDistance. The distance is initialized with the data we want to classify.

knn.sg:choose_distance
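
Continuing the Python sketch from above, choosing the Euclidean distance could read:

```python
from shogun import EuclideanDistance

# Euclidean distance initialised with the training features
distance = EuclideanDistance(features_train, features_train)
```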

Once we have chosen a distance, we create an instance of the CKNN classifier, passing it training data and labels.

knn.sg:create_instance
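
In the Python sketch, creating the classifier with, say, k = 3 neighbours (an arbitrary example value) might look like:

```python
from shogun import KNN

k = 3  # number of neighbours, an arbitrary example value
classifier = KNN(k, distance, labels_train)
```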

Then we train the KNN algorithm, apply it to test data, and print the predictions.

knn.sg:train_and_apply
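
The last step of the Python sketch trains the classifier, applies it to the test features, and prints the predicted labels:

```python
# Train the classifier, then predict labels for the test features
classifier.train()
labels_predict = classifier.apply_multiclass(features_test)

# get_labels() returns the predictions as a NumPy array
print(labels_predict.get_labels())
```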

References

Wikipedia: K-nearest_neighbors_algorithm

../../references.bib