In [1]:
%matplotlib inline
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
iris_data = iris.data
iris_labels = iris.target

In [2]:
def distance(instance1, instance2):
    # just in case, if the instances are lists or tuples:
    instance1 = np.array(instance1) 
    instance2 = np.array(instance2)
    
    return np.linalg.norm(instance1 - instance2)
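
As a quick cross-check of what 'distance' computes: 'np.linalg.norm' applied to a difference vector gives the Euclidean distance, that is, the square root of the summed squared coordinate differences. The following sketch writes this out in plain Python. The helper name 'euclidean' is hypothetical and only used for illustration; it is not part of the notebook.

```python
import math

def euclidean(p, q):
    """Euclidean distance written out explicitly (hypothetical helper)."""
    # sum the squared differences per coordinate, then take the square root
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# differences are 2 and 4, so the result is sqrt(2**2 + 4**2) = sqrt(20)
d = euclidean([3, 5], [1, 1])
print(d)
```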

The function 'get_neighbors' returns a list of the 'k' neighbors that are closest to the instance 'test_instance':

In [3]:
def get_neighbors(training_set, 
                  labels, 
                  test_instance, 
                  k, 
                  distance=distance):
    """
    get_neighbors calculates a list of the k nearest neighbors
    of an instance 'test_instance'.
    The list neighbors contains 3-tuples with
    (instance, dist, label)
    where
    instance is the training instance training_set[index],
    dist     is the distance between the test_instance and the
             instance training_set[index],
    label    is the class label labels[index].
    The parameter 'distance' is a reference to the function used
    to calculate the distances.
    """
    distances = []
    for index in range(len(training_set)):
        dist = distance(test_instance, training_set[index])
        distances.append((training_set[index], dist, labels[index]))
    distances.sort(key=lambda x: x[1])
    neighbors = distances[:k]
    return neighbors
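
To see the shape of the returned list, it can help to run the same idea on a handful of hand-made points. The sketch below is a minimal stand-in, assuming Python 3.8+ for 'math.dist'. The function name 'k_nearest' and the toy data are hypothetical and chosen only so that the example runs on its own; it returns (instance, dist, label) tuples sorted by distance, like the function above.

```python
import math

def k_nearest(training_set, labels, test_instance, k):
    """Stand-in for get_neighbors (hypothetical name), using math.dist."""
    # pair every training instance with its distance and its label
    scored = [(inst, math.dist(inst, test_instance), lab)
              for inst, lab in zip(training_set, labels)]
    # smallest distance first, then keep only the k closest
    scored.sort(key=lambda t: t[1])
    return scored[:k]

# toy data: three points near (1, 1) with label 0, two far away with label 1
points = [(1.0, 1.0), (2.0, 2.0), (8.0, 8.0), (9.0, 9.0), (1.0, 2.0)]
point_labels = [0, 0, 1, 1, 0]

nearest = k_nearest(points, point_labels, (1.0, 1.2), k=3)
for instance, dist, label in nearest:
    print(instance, round(dist, 3), label)
```

All three neighbors found here carry label 0, because the two points with label 1 lie far away from the query point.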

__Voting to get a Single Result__

We now write a vote function. The function 'vote_prob' uses the class 'Counter' from collections to count how often each class occurs in a list of instances, which here is the list of neighbors. It returns the most common class together with the fraction of neighbors that voted for it:

In [4]:
from collections import Counter

def vote_prob(neighbors):
    # count how often each class label occurs among the neighbors
    class_counter = Counter()
    for neighbor in neighbors:
        class_counter[neighbor[2]] += 1
    # most_common(1) yields [(label, count)] for the most frequent class
    winner, votes4winner = class_counter.most_common(1)[0]
    # fraction of the neighbors that voted for the winning class
    return winner, votes4winner / len(neighbors)
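
One detail worth checking is what happens when two classes receive the same number of votes. Since Python 3.7, 'Counter.most_common' lists elements with equal counts in the order in which they were first encountered, so with neighbors sorted by distance a tie should go to the class of the nearest tied neighbor. The sketch below illustrates this with made-up neighbor tuples. It restates the voting logic compactly so that it runs on its own, and the sample data is hypothetical.

```python
from collections import Counter

def vote_prob(neighbors):
    """Compact restatement of the voting logic so this sketch runs alone."""
    counts = Counter(label for _, _, label in neighbors)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(neighbors)

# (instance, dist, label) tuples, already sorted by distance; the instances
# play no role in the vote, so None is used as a placeholder
clear_case = [(None, 0.1, 2), (None, 0.2, 2), (None, 0.3, 1), (None, 0.4, 2)]
tie_case = [(None, 0.1, 1), (None, 0.2, 2)]

clear_result = vote_prob(clear_case)  # class 2 holds 3 of the 4 votes
tie_result = vote_prob(tie_case)      # classes 1 and 2 hold one vote each
print(clear_result, tie_result)
```

Choosing an odd 'k' reduces the chance of ties when there are two classes, but it does not rule them out when there are three, as in the iris data.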

---

Building on this, we make a prediction for the given data point 4.8, 2.5, 5.3, 2.4:

---

In [5]:
to_find_testset = [4.8, 2.5, 5.3, 2.4]  # the instance whose class we want to predict

neighbors = get_neighbors(iris_data,  # use the full iris data set
                          iris_labels,
                          to_find_testset,
                          5,
                          distance=distance)
print("vote_prob: ", vote_prob(neighbors),
      ", data: ", to_find_testset)

vote_prob:  (2, 1.0) , data:  [4.8, 2.5, 5.3, 2.4]


---
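vote_prob returns the class number and the fraction of the 5 nearest neighbors that voted for that class, which serves as an estimate of the class probability. Here all 5 neighbors belong to class 2, so we again get a very high probability for Virginica!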

---
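
The class number can be turned into a readable name by indexing a list of class names. The following minimal sketch hard-codes the three names in the order used by the iris data set in scikit-learn. The tuple 'target_names' is an assumption made so that the example runs without scikit-learn; in the notebook itself, 'iris.target_names' could be indexed in the same way.

```python
# assumed to mirror iris.target_names (class 0, 1, 2 in this order)
target_names = ("setosa", "versicolor", "virginica")

# the result printed by vote_prob above: (class number, share of the votes)
winner, share = (2, 1.0)

message = f"{target_names[winner]}: {share:.0%} of the neighbors"
print(message)
```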