## Introduction
This Python program predicts whether a song with a given combination of 'duration' and 'loudness' is closer to a jazz song or a rock song. If it predicts jazz, the function returns a label of 1; otherwise the song is classified as rock and the function returns a label of 0.

This program reproduces, by brute force, the calculation behind the nearest neighbor classifier from scikit-learn (`KNeighborsClassifier` with `n_neighbors=1`).
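
As a sketch of that calculation, assuming plain Euclidean distance on the unscaled features: write the new song as $(d, l)$ (duration, loudness) and training song $i$ as $(x_i, y_i)$ with label $\text{jazz}_i$. The symbols $\text{dist}_i$ and $i^*$ are notation introduced here for illustration.

$$
\text{dist}_i = \sqrt{(d - x_i)^2 + (l - y_i)^2}, \qquad
i^* = \arg\min_i \, \text{dist}_i, \qquad
\text{prediction} = \text{jazz}_{i^*}
$$

For example, for a new song $(d, l) = (190, 24)$ and the training song $(184, 18)$:

$$
\sqrt{(190 - 184)^2 + (24 - 18)^2} = \sqrt{36 + 36} = \sqrt{72} \approx 8.49
$$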

## Import Python modules

In [81]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy
import math
%matplotlib inline

## Create the dataframe 

In [82]:
music = pd.DataFrame()

# Some data to play with.
music['duration'] = [184, 134, 243, 186, 122, 197, 294, 382, 102, 264, 
                     205, 110, 307, 110, 397, 153, 190, 192, 210, 403,
                     164, 198, 204, 253, 234, 190, 182, 401, 376, 102]
music['loudness'] = [18, 34, 43, 36, 22, 9, 29, 22, 10, 24, 
                     20, 10, 17, 51, 7, 13, 19, 12, 21, 22,
                     16, 18, 4, 23, 34, 19, 14, 11, 37, 42]

# We know whether the songs in our training data are jazz or not.
music['jazz'] = [ 1, 0, 0, 0, 1, 1, 0, 1, 1, 0,
                  0, 1, 1, 0, 1, 1, 0, 1, 1, 1,
                  1, 1, 1, 1, 0, 0, 1, 1, 0, 0]



In [83]:
# convert the columns of the dataframe to lists
dur = list(music['duration'])
loud = list(music['loudness'])
jazz = list(music['jazz'])

## Build the program

In [84]:
# create the nn (nearest neighbor) function
def nn(d, l):
    # distances from the new input (d, l) to every training point
    list_distance = []
    for x, y in zip(dur, loud):

        # Euclidean distance between the new input and one training point
        distance = math.sqrt((d - x)**2 + (l - y)**2)

        # append the distance to 'list_distance'
        list_distance.append(distance)

    # find the minimum distance in list_distance
    min_distance = min(list_distance)

    # capture the index of the closest training point
    index = list_distance.index(min_distance)

    # return the minimum distance between the input and the closest training point,
    # together with that point's jazz (1) or not-jazz (0) label
    return min_distance, jazz[index]

# run some tests; if the second returned value is a 1 we can classify as jazz, otherwise we classify as rock
print(nn(190, 24))
print(nn(400, 5))
print(nn(100, 3))
print(nn(100, 50))


(5.0, 0)
(3.605551275463989, 1)
(7.280109889280518, 1)
(8.246211251235321, 0)


## Evaluate the program

In [85]:
# run the same tests again; if the second returned value is a 1 we can classify as jazz, otherwise we classify as rock
print(nn(190, 24))
print(nn(400, 5))
print(nn(100, 3))
print(nn(100, 50))


(5.0, 0)
(3.605551275463989, 1)
(7.280109889280518, 1)
(8.246211251235321, 0)


In [88]:
from sklearn.neighbors import KNeighborsClassifier
neighbors = KNeighborsClassifier(n_neighbors=1)
X = music[['loudness', 'duration']]
Y = music.jazz
neighbors.fit(X,Y)

# Predict for songs given as [loudness, duration], e.g. a song with 24 loudness that's 190 seconds long.
print(neighbors.predict([[24, 190]]))
print(neighbors.predict([[5, 400]]))
print(neighbors.predict([[3, 100]]))
print(neighbors.predict([[50, 100]]))


[0]
[1]
[1]
[0]


As you can see, the predictions from the hand-written code match the predictions from the nearest neighbor classifier in scikit-learn.
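
For comparison, the same brute-force search can be written more compactly in plain Python. This is only a sketch, assuming Euclidean distance on the raw (unscaled) features; the names `songs` and `nearest_label` are introduced here for illustration and are not part of the notebook.

```python
import math

# (duration, loudness, jazz) rows copied from the training data above
songs = [
    (184, 18, 1), (134, 34, 0), (243, 43, 0), (186, 36, 0), (122, 22, 1),
    (197, 9, 1), (294, 29, 0), (382, 22, 1), (102, 10, 1), (264, 24, 0),
    (205, 20, 0), (110, 10, 1), (307, 17, 1), (110, 51, 0), (397, 7, 1),
    (153, 13, 1), (190, 19, 0), (192, 12, 1), (210, 21, 1), (403, 22, 1),
    (164, 16, 1), (198, 18, 1), (204, 4, 1), (253, 23, 1), (234, 34, 0),
    (190, 19, 0), (182, 14, 1), (401, 11, 1), (376, 37, 0), (102, 42, 0),
]

def nearest_label(d, l):
    """Return the jazz label (1 or 0) of the training song closest to (d, l)."""
    # math.hypot(a, b) computes sqrt(a**2 + b**2), i.e. the Euclidean distance
    closest = min(songs, key=lambda s: math.hypot(d - s[0], l - s[1]))
    return closest[2]

# the four test inputs used above, as (duration, loudness)
for d, l in [(190, 24), (400, 5), (100, 3), (100, 50)]:
    print(nearest_label(d, l))  # prints 0, 1, 1, 0 (one per line)
```

When two training songs are equally close, `min` keeps the first one it encounters, which mirrors how `list.index` picks the first minimum in the loop-based version.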

## Conclusion and discussion
As shown above, the nearest neighbor algorithm is fairly simple and can easily be coded by hand in this brute-force manner, by computing the distance between the input and every training point. Because that cost grows with the size of the training set, if the training data had millions of points we could instead draw a random sample of it to use as the training dataset for prediction.
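
The random-sampling idea could look roughly like the sketch below. It is an illustration under stated assumptions rather than a tuned implementation: the function name `nn_sampled` and its `sample_size` and `seed` parameters are hypothetical, and the 30 songs from this notebook stand in for a much larger dataset.

```python
import math
import random

# (duration, loudness, jazz) rows copied from the training data in this notebook
songs = [
    (184, 18, 1), (134, 34, 0), (243, 43, 0), (186, 36, 0), (122, 22, 1),
    (197, 9, 1), (294, 29, 0), (382, 22, 1), (102, 10, 1), (264, 24, 0),
    (205, 20, 0), (110, 10, 1), (307, 17, 1), (110, 51, 0), (397, 7, 1),
    (153, 13, 1), (190, 19, 0), (192, 12, 1), (210, 21, 1), (403, 22, 1),
    (164, 16, 1), (198, 18, 1), (204, 4, 1), (253, 23, 1), (234, 34, 0),
    (190, 19, 0), (182, 14, 1), (401, 11, 1), (376, 37, 0), (102, 42, 0),
]

def nn_sampled(d, l, training, sample_size, seed=0):
    """Nearest neighbor label using only a random subset of the training rows."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = min(sample_size, len(training))
    # draw k distinct row indices, then search only those rows
    subset = [training[i] for i in rng.sample(range(len(training)), k)]
    closest = min(subset, key=lambda s: math.hypot(d - s[0], l - s[1]))
    return closest[2]

# using every row reproduces the full brute-force search
print(nn_sampled(190, 24, songs, sample_size=len(songs)))

# using a third of the rows is cheaper but may pick a different neighbor
print(nn_sampled(190, 24, songs, sample_size=10))
```

The trade-off is that a smaller sample does less work per prediction, but the true nearest song may not be in the sample, so the predicted label can differ from the full search.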