# KNN Recommender 

👉 K-Nearest-Neighbors (KNN) models are usually used to make predictions, but they can also be used to find the points in a dataset that are closest to a given point.  

👨🏻‍🏫 In this recap, we will use a KNN model to create a basic music recommender system.

In [3]:
import pandas as pd

url = 'https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_spotify_data.csv'

# Using pandas, load the data from the provided URL
df = pd.read_csv(url)
df.head()

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
0,We're For The Dark - Remastered 2010,['Badfinger'],22,0.678,0.559,0.432,0,3,0.0727,-12.696,0.0334,117.674
1,Sixty Years On - Piano Demo,['Elton John'],25,0.456,0.259,0.368,0,6,0.156,-10.692,0.028,143.783
2,Got to Find Another Way,['The Guess Who'],21,0.433,0.833,0.724,0,0,0.17,-9.803,0.0378,84.341
3,Feelin' Alright - Live At The Fillmore East/1970,['Joe Cocker'],22,0.436,0.87,0.914,0,5,0.855,-6.955,0.061,174.005
4,Caravan - Take 7,['Van Morrison'],23,0.669,0.564,0.412,0,7,0.401,-13.095,0.0679,78.716


🎯 Let's find songs that are "similar" to Queen's legendary *Another One Bites the Dust*.

In [4]:
queen_song = df.iloc[4295:4296] # Another One Bites the Dust - Queen

queen_song

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
4295,Another One Bites The Dust - Live at Wembley '86,['Queen'],29,0.534,0.114,0.984,0,4,0.982,-5.058,0.297,115.991


## 1. Calculating the distances

👇 First, fit a nearest-neighbors model on the dataset so that it can compute the distances between observations.  
Since we only care about how similar the songs' features are, we use the unsupervised `NearestNeighbors` class, which is fitted on the features alone and needs no target.

In [5]:
from sklearn.neighbors import NearestNeighbors

# select the feature columns
features = ['danceability', 'valence', 'energy', 'explicit', 'key', 'liveness', 'loudness', 'speechiness', 'tempo']

# extract features
X = df[features]

# create KNN
knn = NearestNeighbors()

# train with data
knn.fit(X)

Check out the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html#sklearn.neighbors.NearestNeighbors.kneighbors) of the `kneighbors` method.
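
To make the return values of `kneighbors` concrete, here is a minimal sketch on made-up toy data (the `points` array and the `toy_knn` name are illustrative, not part of the notebook). It assumes scikit-learn's default metric, which is the Euclidean distance.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy data (made up for illustration): four points in 2-D
points = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 2.0],
    [5.0, 5.0],
])

# fit() receives only the features: there is no target to pass
toy_knn = NearestNeighbors(n_neighbors=2).fit(points)

# Query with the first point itself (kept 2-D by indexing with a list)
distances, indices = toy_knn.kneighbors(points[[0]])

print(distances)  # distances to the neighbors, sorted in ascending order
print(indices)    # row positions of those neighbors in `points`
```

Both arrays have one row per query point. Because the query point belongs to the fitted data, it comes back as its own nearest neighbor at distance 0.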

## 2. Passing the new point

👇 You can now pass a new point to the KNN model and find its closest points. Since the Queen song is itself part of the dataset, it comes back as its own nearest neighbor.

In [6]:
queen_song

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
4295,Another One Bites The Dust - Live at Wembley '86,['Queen'],29,0.534,0.114,0.984,0,4,0.982,-5.058,0.297,115.991


In [7]:
queen_features = queen_song[features]

distances, indices = knn.kneighbors(queen_features, n_neighbors=5)

# get indices of nearest songs
closest_songs = df.iloc[indices[0]]
closest_songs

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
4295,Another One Bites The Dust - Live at Wembley '86,['Queen'],29,0.534,0.114,0.984,0,4,0.982,-5.058,0.297,115.991
8633,Horsey,['Macross 82-99'],50,0.484,0.388,0.829,0,4,0.548,-5.434,0.0996,116.996
9097,Tony Montana,"['Agust D', 'Yankie']",60,0.609,0.272,0.776,0,4,0.105,-5.837,0.059,116.01
599,Tonight,['Raspberries'],24,0.261,0.442,0.897,0,4,0.174,-4.585,0.0836,114.926
8516,Let the Groove Get In,['Justin Timberlake'],50,0.785,0.437,0.841,0,4,0.29,-5.962,0.064,116.921
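
All the neighbors above share the same `key` and have a tempo within about 1 BPM of the query, which hints that the columns with the widest ranges weigh most in the distance. The sketch below recomputes by hand the distance between the Queen song and *Horsey*, using the values from the output table and assuming the default Euclidean metric, and shows how much each feature contributes.

```python
import numpy as np

# Feature order used in the notebook
features = ['danceability', 'valence', 'energy', 'explicit', 'key',
            'liveness', 'loudness', 'speechiness', 'tempo']

# Feature values copied from the output table above
queen = np.array([0.534, 0.114, 0.984, 0, 4, 0.982, -5.058, 0.297, 115.991])
horsey = np.array([0.484, 0.388, 0.829, 0, 4, 0.548, -5.434, 0.0996, 116.996])

# Squared difference per feature, then the Euclidean distance
squared_diffs = (queen - horsey) ** 2
distance = np.sqrt(squared_diffs.sum())

# Share of the squared distance contributed by each feature
shares = dict(zip(features, squared_diffs / squared_diffs.sum()))

print(f"distance: {distance:.4f}")
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>14}: {share:.1%}")
```

On these two rows, tempo alone accounts for roughly two thirds of the squared distance. Standardizing the features before fitting (for example with scikit-learn's `StandardScaler`) is a common way to keep wide-ranging columns from dominating the result.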


## 3. Making a playlist!

👇 Make a playlist with 10 songs based on Queen's *Another One Bites the Dust*, sorted by increasing tempo.

In [None]:
queen_song

In [None]:
# Ask for 11 neighbors: the Queen song itself plus 10 recommendations
distances, indices = knn.kneighbors(queen_features, n_neighbors=11)

# Get the rows of the nearest songs
closest_songs = df.iloc[indices[0]]

# Remove the Queen song itself BEFORE sorting: once the rows are sorted by tempo,
# the first row is the slowest song, not necessarily the Queen song
playlist = closest_songs.drop(index=queen_song.index)

# Sort the 10 remaining songs by increasing tempo
playlist = playlist.sort_values(by='tempo')
playlist
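
The same steps can be wrapped into a reusable function. This is only a sketch: `make_playlist`, its parameters, and the toy songs below are hypothetical names and values chosen for illustration, not part of the original notebook.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def make_playlist(songs, feature_cols, seed_position, n_songs=10):
    """Return the n_songs nearest neighbors of the seed song, sorted by tempo.

    Hypothetical helper: the name and signature are illustrative only.
    """
    model = NearestNeighbors().fit(songs[feature_cols])
    seed = songs.iloc[[seed_position]][feature_cols]

    # Ask for one extra neighbor, since the seed is normally its own nearest neighbor
    _, idx = model.kneighbors(seed, n_neighbors=n_songs + 1)
    neighbors = songs.iloc[idx[0]]

    # Drop the seed by index label rather than by position in the result
    neighbors = neighbors.drop(index=songs.index[seed_position], errors="ignore")

    # Keep the closest n_songs, then order them by increasing tempo
    return neighbors.head(n_songs).sort_values(by="tempo")


# Toy songs (values made up for illustration)
toy_songs = pd.DataFrame({
    "name": ["seed", "a", "b", "c", "d", "e"],
    "energy": [0.9, 0.8, 0.7, 0.2, 0.95, 0.1],
    "tempo": [116.0, 118.0, 110.0, 60.0, 117.0, 200.0],
})

toy_playlist = make_playlist(toy_songs, ["energy", "tempo"], seed_position=0, n_songs=3)
print(toy_playlist["name"].tolist())
```

With the notebook's data, a call such as `make_playlist(df, features, 4295)` would be the equivalent of the cell above, under the assumption that `df` keeps its default integer index.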