# KNN classifier

Code a k-nearest neighbors (KNN) classifier that works on two-dimensional data. You can use either arrays or DataFrames. Test it against scikit-learn's `KNeighborsClassifier` on the music dataset from above to make sure it's correct. The goal is to confirm your understanding of the model and keep practicing your Python skills. We're only expecting a brute-force method: compute the distance from the query point to every training point, then take a majority vote among the k closest.
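For reference, the brute-force rule can be written out as follows. This assumes Euclidean distance (the metric `get_distance_between` computes, and the effective default for `KNeighborsClassifier`, whose default Minkowski metric with `p=2` is Euclidean) and an unweighted majority vote. Here $\mathbf{q}$ is the query point, $\mathbf{x}_i = (x_{i1}, x_{i2})$ is the $i$-th training point, and $y_i$ is its label:

$$
d(\mathbf{q}, \mathbf{x}_i) = \sqrt{(q_1 - x_{i1})^2 + (q_2 - x_{i2})^2}
$$

$$
N_k(\mathbf{q}) = \text{the } k \text{ indices } i \text{ with the smallest } d(\mathbf{q}, \mathbf{x}_i),
\qquad
\hat{y}(\mathbf{q}) = \arg\max_{c} \sum_{i \in N_k(\mathbf{q})} \mathbf{1}[y_i = c]
$$

"Brute force" means every one of the $N$ distances is computed for each query, so a single prediction costs $O(N)$ distance evaluations.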

## My KNN Classifier 

```python
import math

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier


def get_distance_between(point_1, point_2):
    """Return the Euclidean distance between two 2-D points."""
    x_diff = point_1[0] - point_2[0]
    y_diff = point_1[1] - point_2[1]
    return math.sqrt(x_diff ** 2 + y_diff ** 2)


def my_knn(x_vals, y_vals, outcome, observation, n):
    """Brute-force KNN: predict the majority label among the n training points
    closest to `observation`, given as [x, y].

    Assumes x_vals, y_vals and outcome share a default 0..N-1 index,
    which holds for the columns of the music DataFrame.
    """
    # Running list of the n closest points seen so far, as [distance, index] pairs.
    all_distances = []

    for idx in range(len(x_vals)):
        current_distance = get_distance_between([x_vals[idx], y_vals[idx]], observation)
        if len(all_distances) < n:
            all_distances.append([current_distance, idx])
        else:
            # Position of the farthest point currently kept.
            farthest = max(range(n), key=lambda i: all_distances[i][0])
            if current_distance < all_distances[farthest][0]:
                # Replace only that one entry; filtering on the distance value
                # would drop every tied entry and shrink the list below n.
                all_distances[farthest] = [current_distance, idx]

    closest_outcomes = [outcome[idx] for _, idx in all_distances]

    # Majority vote over the neighbors' labels.
    verdict = max(set(closest_outcomes), key=closest_outcomes.count)
    return verdict
```
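The same brute-force idea can be written more compactly with the standard library. This is a sketch rather than a drop-in replacement for `my_knn`: `knn_predict` is a hypothetical name, it takes a list of (x, y) tuples instead of two separate columns, and it assumes Python 3.8+ for `math.dist`.

```python
import heapq
import math
from collections import Counter


def knn_predict(train_points, labels, query, k):
    """Brute-force KNN sketch: rank every training point by Euclidean distance
    to `query`, keep the k closest, and return their most common label."""
    # heapq.nsmallest scans all indices once and keeps the k with the smallest distance.
    nearest = heapq.nsmallest(
        k,
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(labels[i] for i in nearest)
    # most_common(1) returns [(label, count)]; equal counts keep first-seen order.
    return votes.most_common(1)[0][0]


# Tiny made-up example: two well-separated clusters in 2-D.
train = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = [0, 0, 0, 1, 1, 1]
print(knn_predict(train, labels, (2, 2), k=3))  # 0
print(knn_predict(train, labels, (7, 8), k=3))  # 1
```

`heapq.nsmallest` avoids maintaining the "current farthest neighbor" by hand, which is where the tie-handling bug in the original loop came from.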
    

```python
music = pd.DataFrame()

# Some data to play with.
music['duration'] = [184, 134, 243, 186, 122, 197, 294, 382, 102, 264,
                     205, 110, 307, 110, 397, 153, 190, 192, 210, 403,
                     164, 198, 204, 253, 234, 190, 182, 401, 376, 102]
music['loudness'] = [18, 34, 43, 36, 22, 9, 29, 22, 10, 24,
                     20, 10, 17, 51, 7, 13, 19, 12, 21, 22,
                     16, 18, 4, 23, 34, 19, 14, 11, 37, 42]

# We know whether the songs in our training data are jazz or not.
music['jazz'] = [1, 0, 0, 0, 1, 1, 0, 1, 1, 0,
                 0, 1, 1, 0, 1, 1, 0, 1, 1, 1,
                 1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
```


```python
# Query points are [loudness, duration], matching the column order used for sklearn below.
points = [[30, 234], [30, 134], [10, 234]]
knns = [3, 5, 10]

for point, k in zip(points, knns):
    print(my_knn(music['loudness'], music['duration'], music['jazz'], point, k))
```

```
0
1
1
```
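To see why the first query is classified as not jazz, the sketch below redoes the k = 3 computation for [30, 234] with plain lists. It copies the `music` values so it runs on its own, and it assumes Python 3.8+ for `math.dist`.

```python
import math

# Same values as the music DataFrame above, as plain lists.
duration = [184, 134, 243, 186, 122, 197, 294, 382, 102, 264,
            205, 110, 307, 110, 397, 153, 190, 192, 210, 403,
            164, 198, 204, 253, 234, 190, 182, 401, 376, 102]
loudness = [18, 34, 43, 36, 22, 9, 29, 22, 10, 24,
            20, 10, 17, 51, 7, 13, 19, 12, 21, 22,
            16, 18, 4, 23, 34, 19, 14, 11, 37, 42]
jazz = [1, 0, 0, 0, 1, 1, 0, 1, 1, 0,
        0, 1, 1, 0, 1, 1, 0, 1, 1, 1,
        1, 1, 1, 1, 0, 0, 1, 1, 0, 0]

query = (30, 234)  # (loudness, duration)
k = 3

# Distance from the query to every song, sorted ascending; keep the k smallest.
ranked = sorted(
    range(len(jazz)),
    key=lambda i: math.dist((loudness[i], duration[i]), query),
)
nearest = ranked[:k]

for i in nearest:
    d = math.dist((loudness[i], duration[i]), query)
    print(f"row {i}: loudness={loudness[i]}, duration={duration[i]}, "
          f"distance={d:.2f}, jazz={jazz[i]}")
```

With these values the three nearest songs are rows 24, 2 and 23, at distances 4.00, 15.81 and 20.25, with jazz labels 0, 0 and 1. The majority vote is therefore 0, which matches the first printed prediction.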


## Testing My KNN Classifier Against sklearn KNeighborsClassifier

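```python
# Same feature order as my_knn: loudness first, then duration.
X = music[['loudness', 'duration']]
Y = music.jazz

for point, k in zip(points, knns):
    neighbors = KNeighborsClassifier(n_neighbors=k)
    neighbors.fit(X, Y)
    # Wrap the query in a DataFrame with X's column names; passing a bare list
    # to a model fitted on a DataFrame triggers a feature-names warning.
    prediction = neighbors.predict(pd.DataFrame([point], columns=X.columns))
    print(prediction)
```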

```
[0]
[1]
[1]
```
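Both implementations agree on all three queries, but ties can make them diverge on other inputs. Distance ties are one source: this dataset even contains a duplicated point (rows 16 and 25 are both loudness 19, duration 190), and scikit-learn's documentation warns that when two neighbors are equidistant but differently labeled, the result depends on the ordering of the training data. Vote ties are the other: with an even `n_neighbors` such as 10, a 5-to-5 split is possible, and `max(set(closest_outcomes), key=closest_outcomes.count)` then returns whichever label the set happens to yield first. The hypothetical helper `majority_vote` below makes the vote tie-break explicit (smallest label wins). That rule is an assumption chosen for determinism; check it against scikit-learn's actual behavior before relying on the two always matching.

```python
from collections import Counter


def majority_vote(neighbor_labels):
    """Return the most common label; on a tie, return the smallest tied label.

    The tie-break rule is an illustrative assumption, not taken from scikit-learn.
    """
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


print(majority_vote([1, 0, 1, 0]))  # 0 (2-2 tie, smallest label wins)
print(majority_vote([1, 1, 0]))     # 1
```

Swapping this in for the `max(set(...))` line in `my_knn` would make its output independent of set iteration order.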
