KNN classifies a data point according to the majority of labels in its nearest neighbourhood, as measured by some underlying distance function d(x, x′).
For k = 1, the label of a test point x* is predicted to be the same as that of its closest training point x_k, i.e. y_k, where

    k = argmin_j d(x*, x_j).
See Chapter 14 in barber2012bayesian for a detailed introduction. See issue 2996 for known issues.
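To make the 1-NN rule above concrete, here is a minimal NumPy sketch, independent of the Shogun listings below; the helper name one_nn_predict and the toy data are made up for illustration:

    import numpy as np

    def one_nn_predict(X_train, y_train, x_star):
        # distances d(x*, x_j) to every training point (Euclidean here)
        dists = np.linalg.norm(X_train - x_star, axis=1)
        # k = argmin_j d(x*, x_j): index of the closest training point
        k = np.argmin(dists)
        # predict the label y_k of that closest point
        return y_train[k]

    # toy example: two 2D classes
    X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
    y_train = np.array([0, 0, 1, 1])
    print(one_nn_predict(X_train, y_train, np.array([0.95, 1.0])))  # -> 1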
Imagine we have files with training and test data. We create CDenseFeatures (here 64-bit floats aka RealFeatures) and CMulticlassLabels as follows.
knn.sg:create_features
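As a rough illustration of what this step can look like in the Python bindings, here is a hedged sketch; the shogun import path, the CSVFile loader, and the file names are assumptions rather than the actual listing:

    # a sketch assuming the Shogun Python bindings; file names are hypothetical
    from shogun import CSVFile, RealFeatures, MulticlassLabels

    features_train = RealFeatures(CSVFile("features_train.dat"))
    features_test = RealFeatures(CSVFile("features_test.dat"))
    labels_train = MulticlassLabels(CSVFile("labels_train.dat"))
    labels_test = MulticlassLabels(CSVFile("labels_test.dat"))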
In order to run CKNN, we need to choose a distance, for example CEuclideanDistance, or another sub-class of CDistance. The distance is initialized with the data we want to classify.
knn.sg:choose_distance
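Continuing the same hypothetical Python sketch, the Euclidean distance over the training points could be set up as:

    from shogun import EuclideanDistance

    # distance initialized with the training data (a sketch, not the listing itself)
    distance = EuclideanDistance(features_train, features_train)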
Once we have chosen a distance, we create an instance of the CKNN classifier, passing it the number of neighbours k, the distance, and the training labels.
knn.sg:create_instance
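In the hypothetical Python sketch this step might read as follows, where k = 3 is an arbitrary choice:

    from shogun import KNN

    k = 3  # number of nearest neighbours, chosen arbitrarily here
    knn = KNN(k, distance, labels_train)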
Then we train the KNN model, apply it to the test data, and print the predictions.
knn.sg:train_and_apply
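A corresponding, still hypothetical, Python sketch:

    # train on the data held by the distance, then classify the test points
    knn.train()
    labels_predict = knn.apply_multiclass(features_test)
    print(labels_predict.get_labels())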
Wikipedia: K-nearest_neighbors_algorithm
../../references.bib