# KNN Recommender 

👉 K-Nearest Neighbors (KNN) can be used to model and make predictions, but it can also be used to find the closest points in a dataset. In this recap, we'll use a KNN to build a basic music recommender system.

In [0]:
import pandas as pd

df = pd.read_csv('data/spotify_data.csv')

df.head()

| | name | artists | popularity | danceability | valence | energy | explicit | key | liveness | loudness | speechiness | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | We're For The Dark - Remastered 2010 | ['Badfinger'] | 22 | 0.678 | 0.559 | 0.432 | 0 | 3 | 0.0727 | -12.696 | 0.0334 | 117.674 |
| 1 | Sixty Years On - Piano Demo | ['Elton John'] | 25 | 0.456 | 0.259 | 0.368 | 0 | 6 | 0.156 | -10.692 | 0.028 | 143.783 |
| 2 | Got to Find Another Way | ['The Guess Who'] | 21 | 0.433 | 0.833 | 0.724 | 0 | 0 | 0.17 | -9.803 | 0.0378 | 84.341 |
| 3 | Feelin' Alright - Live At The Fillmore East/1970 | ['Joe Cocker'] | 22 | 0.436 | 0.87 | 0.914 | 0 | 5 | 0.855 | -6.955 | 0.061 | 174.005 |
| 4 | Caravan - Take 7 | ['Van Morrison'] | 23 | 0.669 | 0.564 | 0.412 | 0 | 7 | 0.401 | -13.095 | 0.0679 | 78.716 |


🎯 Let's find songs similar to Queen's legendary "Another One Bites the Dust" (here, the live version recorded at Wembley in '86).

In [0]:
queen_song = df.iloc[4295:4296]  # "Another One Bites the Dust" by Queen; slicing keeps a one-row DataFrame

queen_song

| | name | artists | popularity | danceability | valence | energy | explicit | key | liveness | loudness | speechiness | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4295 | Another One Bites The Dust - Live at Wembley '86 | ['Queen'] | 29 | 0.534 | 0.114 | 0.984 | 0 | 4 | 0.982 | -5.058 | 0.297 | 115.991 |
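Why `df.iloc[4295:4296]` rather than `df.iloc[4295]`? A slice returns a one-row DataFrame, while a single integer position returns a Series, and scikit-learn's `kneighbors` expects 2-D input (one row per query point). A minimal illustration on a made-up toy frame (the values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy frame standing in for the Spotify data
toy = pd.DataFrame({"name": ["a", "b", "c"], "tempo": [100.0, 120.0, 140.0]})

row_as_frame = toy.iloc[1:2]   # slice -> one-row DataFrame (2-D)
row_as_series = toy.iloc[1]    # single position -> Series (1-D)

print(type(row_as_frame).__name__, row_as_frame.shape)    # DataFrame (1, 2)
print(type(row_as_series).__name__, row_as_series.shape)  # Series (2,)
```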


# Calculating the distances

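👇 First, fit the KNN on the dataset. Fitting does not precompute the song-to-song distances: the model stores the observations (and, depending on its `algorithm` parameter, builds a search structure such as a KD-tree or ball tree), and the distances are computed when we query it. Since we only care about how similar the songs' features are, it doesn't matter which target the model is fitted to: neighbor search only uses `X`.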

In [0]:
from sklearn.neighbors import KNeighborsRegressor

# Define X and y
X = df.drop(columns=['name', 'artists'])  # Keep only the numerical features
y = df['tempo']  # Arbitrary target: neighbor search only uses X

knn_model = KNeighborsRegressor().fit(X, y)  # Instantiate and fit the model

Check out the [`kneighbors` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html#sklearn.neighbors.KNeighborsRegressor.kneighbors) for its parameters and return values.
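To see what `kneighbors` computes, here is a minimal brute-force sketch in NumPy. It assumes the default Euclidean metric (`metric='minkowski'` with `p=2`), and `brute_force_kneighbors` is a hypothetical helper name. scikit-learn may use a tree-based index internally and can break ties differently, but its neighbors are defined the same way. Note that no target appears anywhere, which is why fitting on `tempo` (or any other column) makes no difference; scikit-learn also offers `NearestNeighbors`, an unsupervised estimator with the same `kneighbors` method that needs no target at all.

```python
import numpy as np

def brute_force_kneighbors(X_train, x_query, n_neighbors):
    """Hypothetical helper: (distances, positions) of the n_neighbors rows
    of X_train closest to x_query, using Euclidean distance. No target y."""
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Euclidean distance from the query to every training row
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Row positions sorted by distance, nearest first
    order = np.argsort(dists, kind="stable")[:n_neighbors]
    return dists[order], order

# Toy data (hypothetical): row 2 is closer to row 0 than row 1 is
X_toy = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
d, idx = brute_force_kneighbors(X_toy, X_toy[0], n_neighbors=2)
print(d, idx)  # [0.         1.41421356] [0 2]
```

As with `kneighbors`, the query point itself comes back first when it belongs to the training data, at distance 0.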

# Passing the new point

👇 You can now pass a new point to the KNN model and find its closest points.

In [0]:
queen_song

| | name | artists | popularity | danceability | valence | energy | explicit | key | liveness | loudness | speechiness | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4295 | Another One Bites The Dust - Live at Wembley '86 | ['Queen'] | 29 | 0.534 | 0.114 | 0.984 | 0 | 4 | 0.982 | -5.058 | 0.297 | 115.991 |


In [0]:
knn_model.kneighbors(queen_song.drop(columns=['name', 'artists']), n_neighbors=2)  # Return the distances and indices of the 2 closest points

(array([[0.        , 2.55269431]]), array([[4295, 3488]]))
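As a sanity check, the second distance above (2.55269431) can be reproduced by hand: it is the plain, unscaled Euclidean distance between the two rows' numerical features. The values below are copied from the Queen row and from row 3488 as displayed in this notebook; the per-feature breakdown is an extra illustration, not part of the original recap.

```python
import numpy as np

features = ["popularity", "danceability", "valence", "energy", "explicit",
            "key", "liveness", "loudness", "speechiness", "tempo"]

# Feature values copied from the rows shown in this notebook
queen = np.array([29, 0.534, 0.114, 0.984, 0, 4, 0.982, -5.058, 0.297, 115.991])
jeff_healey = np.array([30, 0.56, 0.868, 0.927, 0, 6, 0.316, -5.682, 0.0715, 116.236])

squared_diffs = (queen - jeff_healey) ** 2
distance = np.sqrt(squared_diffs.sum())
print(round(distance, 8))  # 2.55269431, matching kneighbors

# Share of the squared distance contributed by each feature, largest first
shares = squared_diffs / squared_diffs.sum()
for name, share in sorted(zip(features, shares), key=lambda t: -t[1]):
    print(f"{name:>12}: {share:.1%}")
```

For this pair, `key` (a gap of 2) accounts for roughly 61% of the squared distance and `popularity` (a gap of 1) for about 15%, while the large `valence` gap (0.114 vs 0.868) contributes under 9%. Because the features are not scaled, columns with wide numeric ranges such as `tempo`, `popularity` and `key` can dominate the distance over the 0-to-1 audio features.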

In [0]:
df.iloc[4295]  # The closest point is the song itself (distance 0)

name            Another One Bites The Dust - Live at Wembley '86
artists                                                ['Queen']
popularity                                                    29
danceability                                               0.534
valence                                                    0.114
energy                                                     0.984
explicit                                                       0
key                                                            4
liveness                                                   0.982
loudness                                                  -5.058
speechiness                                                0.297
tempo                                                    115.991
Name: 4295, dtype: object

In [0]:
df.iloc[3488] # The second closest point is this song

name                      Confidence Man
artists         ['The Jeff Healey Band']
popularity                            30
danceability                        0.56
valence                            0.868
energy                             0.927
explicit                               0
key                                    6
liveness                           0.316
loudness                          -5.682
speechiness                       0.0715
tempo                            116.236
Name: 3488, dtype: object

# Making a playlist!

👇 Make a 10-song playlist based on Queen's "Another One Bites the Dust", sorted by increasing tempo.

In [0]:
queen_song

| | name | artists | popularity | danceability | valence | energy | explicit | key | liveness | loudness | speechiness | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4295 | Another One Bites The Dust - Live at Wembley '86 | ['Queen'] | 29 | 0.534 | 0.114 | 0.984 | 0 | 4 | 0.982 | -5.058 | 0.297 | 115.991 |


In [0]:
# Pass the song to the model and ask for the 11 closest points
# (the song itself comes first), then unpack their positional indices into a list
ind_list = list(knn_model.kneighbors(queen_song.drop(columns=['name', 'artists']), n_neighbors=11)[1][0])

# kneighbors returns row positions, so select with iloc, then sort by tempo
df.iloc[ind_list, :].sort_values(by="tempo")

| | name | artists | popularity | danceability | valence | energy | explicit | key | liveness | loudness | speechiness | tempo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3648 | Mary Jane - Remastered | ['Megadeth'] | 30 | 0.429 | 0.364 | 0.959 | 0 | 2 | 0.342 | -4.789 | 0.12 | 113.361 |
| 2700 | 君のハートはマリンブルー | ['オメガトライブ', 'Kiyotaka Sugiyama'] | 29 | 0.602 | 0.624 | 0.794 | 0 | 4 | 0.413 | -5.512 | 0.0271 | 113.612 |
| 1704 | Baba O'Riley - Live At Shepperton | ['The Who'] | 27 | 0.304 | 0.412 | 0.835 | 0 | 5 | 0.857 | -7.372 | 0.0662 | 114.621 |
| 2794 | Reaction to Action | ['Foreigner'] | 30 | 0.631 | 0.404 | 0.935 | 0 | 2 | 0.151 | -6.459 | 0.0564 | 115.687 |
| 5179 | On Silent Wings | ['Tina Turner'] | 30 | 0.519 | 0.518 | 0.581 | 0 | 2 | 0.0613 | -6.9 | 0.0337 | 115.851 |
| 4295 | Another One Bites The Dust - Live at Wembley '86 | ['Queen'] | 29 | 0.534 | 0.114 | 0.984 | 0 | 4 | 0.982 | -5.058 | 0.297 | 115.991 |
| 3586 | LOVE IN THE FIRST DEGREE \~悪いあなた\~ (Remastered 2... | ['Wink'] | 29 | 0.784 | 0.757 | 0.944 | 0 | 2 | 0.234 | -6.579 | 0.0505 | 116.058 |
| 3488 | Confidence Man | ['The Jeff Healey Band'] | 30 | 0.56 | 0.868 | 0.927 | 0 | 6 | 0.316 | -5.682 | 0.0715 | 116.236 |
| 2507 | Too Much Blood - Remastered | ['The Rolling Stones'] | 30 | 0.592 | 0.479 | 0.909 | 0 | 6 | 0.0571 | -5.887 | 0.0512 | 116.439 |
| 3047 | Millionaires Against Hunger | ['Red Hot Chili Peppers'] | 29 | 0.815 | 0.549 | 0.97 | 1 | 2 | 0.0348 | -3.384 | 0.0834 | 117.264 |
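The same steps can be wrapped in a reusable function. This is a minimal sketch assuming unscaled Euclidean distance on every column except `name` and `artists`, as above; `make_playlist` is a hypothetical helper, and it computes the distances directly with NumPy instead of calling the fitted `knn_model`.

```python
import numpy as np
import pandas as pd

def make_playlist(df, seed_position, n_songs=10, sort_by="tempo"):
    """Hypothetical helper: the seed song plus its n_songs nearest neighbors,
    sorted by `sort_by`. Distances are unscaled Euclidean over every column
    except name and artists, mirroring the KNN used above."""
    X = df.drop(columns=["name", "artists"]).to_numpy(dtype=float)
    dists = np.sqrt(((X - X[seed_position]) ** 2).sum(axis=1))
    # n_songs + 1 rows, because the seed itself is included (distance 0)
    nearest = np.argsort(dists, kind="stable")[: n_songs + 1]
    return df.iloc[nearest].sort_values(by=sort_by)

# Hypothetical toy data, not from the Spotify dataset
toy = pd.DataFrame({
    "name": ["s0", "s1", "s2", "s3"],
    "artists": ["a", "b", "c", "d"],
    "energy": [0.9, 0.8, 0.1, 0.85],
    "tempo": [120.0, 110.0, 60.0, 125.0],
})
print(make_playlist(toy, seed_position=0, n_songs=2)["name"].tolist())  # ['s1', 's0', 's3']
```

Notice that every song in the playlist above has a tempo between about 113 and 117 BPM: with unscaled features, `tempo` differences weigh heavily in the distance, so the nearest neighbors tend to share the seed's tempo.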


# 🏁