# KNN Recommender

👉 K-Nearest-Neighbors (KNN) models can be used to make predictions, but they can also be used to find the points in a dataset that are closest to a given point.
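Finding the closest points comes down to measuring the distance from a query to every point and keeping the smallest ones. Below is a minimal brute-force sketch in plain Python. It assumes Euclidean distance, which is scikit-learn's default metric for KNN (Minkowski with p=2), and uses a small hypothetical list of feature vectors. Note that scikit-learn may build a tree-based index instead of running this exhaustive scan.

```python
import math


def k_nearest(points, query, k):
    """Return (distances, indices) of the k points closest to `query`.

    Brute force: compute the Euclidean distance to every point, then keep
    the k smallest. Ties keep index order because sorted() is stable.
    """
    distances = [math.dist(p, query) for p in points]
    order = sorted(range(len(points)), key=lambda i: distances[i])[:k]
    return [distances[i] for i in order], order


# Hypothetical 2-feature points (e.g. already-scaled energy and liveness)
points = [(0.9, 0.9), (0.1, 0.2), (0.8, 1.0), (0.5, 0.5)]
dists, idx = k_nearest(points, query=(0.9, 0.9), k=2)
print(idx)  # [0, 2]
```

The query here is itself in the dataset, so it comes back first at distance 0. The same thing happens with the Queen song later in this notebook.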

In [10]:
import pandas as pd

url = 'https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_spotify_data.csv'

# Using pandas, load the data from the provided URL
df = pd.read_csv(url)
df.head()

|   | name | artists | popularity | danceability | valence | energy | explicit | key | liveness | loudness | speechiness | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | We're For The Dark - Remastered 2010 | ['Badfinger'] | 22 | 0.678 | 0.559 | 0.432 | 0 | 3 | 0.0727 | -12.696 | 0.0334 | 117.674 |
| 1 | Sixty Years On - Piano Demo | ['Elton John'] | 25 | 0.456 | 0.259 | 0.368 | 0 | 6 | 0.156 | -10.692 | 0.028 | 143.783 |
| 2 | Got to Find Another Way | ['The Guess Who'] | 21 | 0.433 | 0.833 | 0.724 | 0 | 0 | 0.17 | -9.803 | 0.0378 | 84.341 |
| 3 | Feelin' Alright - Live At The Fillmore East/1970 | ['Joe Cocker'] | 22 | 0.436 | 0.87 | 0.914 | 0 | 5 | 0.855 | -6.955 | 0.061 | 174.005 |
| 4 | Caravan - Take 7 | ['Van Morrison'] | 23 | 0.669 | 0.564 | 0.412 | 0 | 7 | 0.401 | -13.095 | 0.0679 | 78.716 |


In [3]:
queen_song = df.iloc[4295:4296]  # Another One Bites the Dust (Live at Wembley '86) by Queen; slicing keeps a one-row DataFrame
queen_song

|   | name | artists | popularity | danceability | valence | energy | explicit | key | liveness | loudness | speechiness | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4295 | Another One Bites The Dust - Live at Wembley '86 | ['Queen'] | 29 | 0.534 | 0.114 | 0.984 | 0 | 4 | 0.982 | -5.058 | 0.297 | 115.991 |


🎯 Let's find songs that are "similar" to Queen's legendary "Another One Bites the Dust".

# Calculating the distances

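👇 First, fit a KNN model on the scaled dataset. Scaling matters because KNN relies on distances: without it, features with large ranges, such as tempo or loudness, would dominate features whose values lie between 0 and 1.

💡 We only care about how similar the songs' features are, so it doesn't matter which target the model is fitted on. The neighbor search uses only the features, and the target is used only when making predictions.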
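To see why scaling changes the neighbors, here is a plain-Python sketch of min-max scaling, (x - min) / (max - min). This matches what MinMaxScaler computes with its default [0, 1] range. The sketch keeps only two features, energy and tempo, for four songs that appear in this notebook. The min and max come from these four rows only, not from the full dataset, so it illustrates the effect rather than reproducing the model's distances. It also assumes no column is constant; such a column would divide by zero here, while MinMaxScaler handles it separately.

```python
import math


def min_max_scale(rows):
    """Scale each column to [0, 1] with (x - min) / (max - min).

    Assumes every column has max > min (no constant columns).
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((x - lo) / (hi - lo) for x, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def nearest_other(rows, query_index):
    """Index of the row closest (Euclidean) to rows[query_index], excluding itself."""
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: math.dist(rows[i], rows[query_index]))


# (energy, tempo) for four songs shown in this notebook
names = ["Queen", "Wings", "The Guess Who", "Badfinger"]
songs = [(0.984, 115.991), (0.939, 140.832), (0.724, 84.341), (0.432, 117.674)]

print(names[nearest_other(songs, 0)])                 # Badfinger: tempo dominates
print(names[nearest_other(min_max_scale(songs), 0)])  # Wings: energy now counts too
```

Without scaling, Badfinger wins because its tempo is within 2 BPM of Queen's, even though its energy is far lower. After scaling, both features contribute on the same footing.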

In [4]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler

# Define X and y
X = df.drop(columns=['name', 'artists'])  # Remove non-numerical features
y = df['tempo']  # Arbitrary target: only the features matter for finding neighbors

# Scale every feature to [0, 1] so no single feature dominates the distances
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

knn_model = KNeighborsRegressor().fit(X_scaled, y)  # Instantiate and train the model

# Passing the new point

In [5]:
X_new = queen_song.drop(columns=['name', 'artists'])
X_new_scaled = scaler.transform(X_new)  # Reuse the scaler fitted on the full dataset

knn_model.kneighbors(X_new_scaled, n_neighbors=2)  # Return the distances and indices of the 2 closest points

(array([[0.        , 0.35999219]]), array([[4295, 1164]]))

In [6]:
df.iloc[4295]  # The closest point (distance 0) is the song itself

name            Another One Bites The Dust - Live at Wembley '86
artists                                                ['Queen']
popularity                                                    29
danceability                                               0.534
valence                                                    0.114
energy                                                     0.984
explicit                                                       0
key                                                            4
liveness                                                   0.982
loudness                                                  -5.058
speechiness                                                0.297
tempo                                                    115.991
Name: 4295, dtype: object

In [7]:
df.iloc[1164] # The second closest point is this song

name            Hi, Hi, Hi - Live / Remastered
artists                              ['Wings']
popularity                                  27
danceability                             0.219
valence                                  0.162
energy                                   0.939
explicit                                     0
key                                          4
liveness                                 0.993
loudness                                -9.275
speechiness                              0.226
tempo                                  140.832
Name: 1164, dtype: object

# Making a playlist

In [8]:
queen_song

|   | name | artists | popularity | danceability | valence | energy | explicit | key | liveness | loudness | speechiness | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4295 | Another One Bites The Dust - Live at Wembley '86 | ['Queen'] | 29 | 0.534 | 0.114 | 0.984 | 0 | 4 | 0.982 | -5.058 | 0.297 | 115.991 |


In [9]:
# Ask the model for the 11 closest points (the song itself plus 10 recommendations)
# kneighbors returns (distances, indices): [1] selects the indices, [0] the row for our single song
ind_list = list(knn_model.kneighbors(X_new_scaled, n_neighbors=11)[1][0])

# Filter original dataframe with indices list and sort by tempo
df.iloc[ind_list, :].sort_values(by="tempo")

|   | name | artists | popularity | danceability | valence | energy | explicit | key | liveness | loudness | speechiness | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3307 | Graveyard | ['Butthole Surfers'] | 27 | 0.504 | 0.135 | 0.949 | 0 | 7 | 0.913 | -8.797 | 0.0385 | 98.128 |
| 704 | It Ain't Me, Babe - Live at LA Forum, Inglewoo... | ['Bob Dylan'] | 23 | 0.455 | 0.308 | 0.981 | 0 | 7 | 0.995 | -6.409 | 0.183 | 100.49 |
| 737 | Like a Rolling Stone - Live at LA Forum, Ingle... | ['Bob Dylan'] | 23 | 0.392 | 0.234 | 0.983 | 0 | 0 | 0.99 | -5.436 | 0.207 | 103.616 |
| 1761 | Liar | ['The Damned'] | 25 | 0.348 | 0.203 | 0.939 | 0 | 4 | 0.838 | -11.54 | 0.0745 | 107.064 |
| 1211 | A Light In The Black | ['Rainbow'] | 32 | 0.334 | 0.0936 | 0.982 | 0 | 4 | 0.753 | -10.19 | 0.0735 | 109.414 |
| 4295 | Another One Bites The Dust - Live at Wembley '86 | ['Queen'] | 29 | 0.534 | 0.114 | 0.984 | 0 | 4 | 0.982 | -5.058 | 0.297 | 115.991 |
| 2705 | A Sort Of Homecoming - Live | ['U2'] | 22 | 0.505 | 0.363 | 0.883 | 0 | 6 | 0.97 | -6.794 | 0.0578 | 125.824 |
| 8607 | Cheat Codes | ['Nitro Fun'] | 51 | 0.626 | 0.146 | 0.96 | 0 | 4 | 0.894 | -4.234 | 0.0837 | 128.001 |
| 1164 | Hi, Hi, Hi - Live / Remastered | ['Wings'] | 27 | 0.219 | 0.162 | 0.939 | 0 | 4 | 0.993 | -9.275 | 0.226 | 140.832 |
| 2233 | YYZ - Live In Canada / 1980 | ['Rush'] | 26 | 0.334 | 0.278 | 0.911 | 0 | 4 | 0.937 | -12.017 | 0.0642 | 145.905 |
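The 11 neighbors include the seed song itself at distance 0, which is why Another One Bites the Dust appears in the playlist above. Here is a sketch of a hypothetical build_playlist helper, in plain Python so that it is self-contained. It drops the seed, keeps the next k songs, and sorts them by a chosen field, mirroring sort_values(by="tempo"). It assumes the neighbor indices are ordered from closest to farthest, as in the second array returned by kneighbors for a single query. The song titles and tempos in the usage example are made up.

```python
def build_playlist(neighbor_indices, songs, seed_index, k=10, sort_key="tempo"):
    """Hypothetical helper: turn ordered neighbor indices into a playlist.

    neighbor_indices: positions ordered from closest to farthest.
    songs: mapping (or list) from position to a dict of song fields.
    The seed song is removed, the first k remaining neighbors are kept,
    and the result is sorted by `sort_key`.
    """
    picks = [i for i in neighbor_indices if i != seed_index][:k]
    return sorted((songs[i] for i in picks), key=lambda s: s[sort_key])


# Made-up songs, keyed by position
songs = {
    0: {"name": "Seed", "tempo": 116.0},
    1: {"name": "A", "tempo": 140.8},
    2: {"name": "B", "tempo": 98.1},
    3: {"name": "C", "tempo": 109.4},
}
playlist = build_playlist([0, 1, 2, 3], songs, seed_index=0, k=2)
print([s["name"] for s in playlist])  # ['B', 'A']
```

Because the seed is always its own closest neighbor, you ask for k + 1 neighbors to end up with k recommendations. That is why the notebook requests 11 points for a 10-song playlist.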
