# Music Recommender Engine

This notebook uses a Kaggle dataset: [Spotify Dataset 1921-2020, 160k+ Tracks](https://www.kaggle.com/yamaerenay/spotify-dataset-19212020-160k-tracks).

The `data.csv` file contains more than 160,000 songs collected from the Spotify Web API. Each song is identified by a primary key, the ID generated by the Spotify API.


This notebook uses K-Nearest Neighbors (KNN) on each track's audio features to recommend the N tracks most similar to an input track.

In [1]:
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from scipy.sparse import csr_matrix

In [2]:
df = pd.read_csv('Dataset/data.csv')

In [3]:
df.head()

,Unnamed: 0,acousticness,artists,danceability,duration_ms,energy,explicit,id,instrumentalness,key,liveness,loudness,mode,name,popularity,release_date,speechiness,tempo,valence,year
0,0,0.732,['Dennis Day'],0.819,180533,0.341,0,7xPhfUan2yNtyFG0cUWkt8,0.0,7,0.16,-12.441,1,Clancy Lowered the Boom,8,1921,0.415,60.936,0.963,1921
1,1,0.982,"['Sergei Rachmaninoff', 'James Levine', 'Berli...",0.279,831667,0.211,0,4BJqT0PrAfrxzMOxytFOIz,0.878,10,0.665,-20.096,1,"Piano Concerto No. 3 in D Minor, Op. 30: III. ...",5,1921,0.0366,80.954,0.0594,1921
2,2,0.996,['John McCormack'],0.518,159507,0.203,0,5uNZnElqOS3W4fRmRYPk4T,0.0,0,0.115,-10.589,1,The Wearing of the Green,6,1921,0.0615,66.221,0.406,1921
3,3,0.982,"['Sergei Rachmaninoff', 'James Levine', 'Berli...",0.279,831667,0.211,0,1SCWBjhk5WmXPxhDduD3HM,0.878,10,0.665,-20.096,1,"Piano Concerto No. 3 in D Minor, Op. 30: III. ...",4,1921,0.0366,80.954,0.0594,1921
4,4,0.957,['Phil Regan'],0.418,166693,0.193,0,4d6HGyGT8e121BsdKmw9v6,2e-06,3,0.229,-10.096,1,When Irish Eyes Are Smiling,4,1921,0.038,101.665,0.253,1921


In [4]:
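# Drop the columns not used as similarity features (metadata, categorical
# fields, and the leftover index column); keep the audio features and the track ID
df_new = df.drop(
    ['Unnamed: 0', 'artists', 'duration_ms', 'explicit', 'key', 'mode',
     'release_date', 'name', 'popularity', 'year'],
    axis=1)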

df_new.head()

,acousticness,danceability,energy,id,instrumentalness,liveness,loudness,speechiness,tempo,valence
0,0.732,0.819,0.341,7xPhfUan2yNtyFG0cUWkt8,0.0,0.16,-12.441,0.415,60.936,0.963
1,0.982,0.279,0.211,4BJqT0PrAfrxzMOxytFOIz,0.878,0.665,-20.096,0.0366,80.954,0.0594
2,0.996,0.518,0.203,5uNZnElqOS3W4fRmRYPk4T,0.0,0.115,-10.589,0.0615,66.221,0.406
3,0.982,0.279,0.211,1SCWBjhk5WmXPxhDduD3HM,0.878,0.665,-20.096,0.0366,80.954,0.0594
4,0.957,0.418,0.193,4d6HGyGT8e121BsdKmw9v6,2e-06,0.229,-10.096,0.038,101.665,0.253


In [5]:
# Rescale loudness and tempo to roughly [0, 1] so they are comparable to the
# other features (the constants are presumably the dataset's observed loudness
# range and maximum tempo), then use the Spotify track ID as the index

df_new['loudness'] = (df_new['loudness'] + 60) / 63.855
df_new['tempo'] = df_new['tempo'] / 244.091
df_new = df_new.set_index('id')

df_new.head()

id,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence
7xPhfUan2yNtyFG0cUWkt8,0.732,0.819,0.341,0.0,0.16,0.744797,0.415,0.249645,0.963
4BJqT0PrAfrxzMOxytFOIz,0.982,0.279,0.211,0.878,0.665,0.624916,0.0366,0.331655,0.0594
5uNZnElqOS3W4fRmRYPk4T,0.996,0.518,0.203,0.0,0.115,0.7738,0.0615,0.271296,0.406
1SCWBjhk5WmXPxhDduD3HM,0.982,0.279,0.211,0.878,0.665,0.624916,0.0366,0.331655,0.0594
4d6HGyGT8e121BsdKmw9v6,0.957,0.418,0.193,2e-06,0.229,0.781521,0.038,0.416505,0.253
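
The fixed constants used for rescaling (60, 63.855, 244.091) look like the dataset's loudness range and maximum tempo, although the notebook does not say so. As a minimal sketch, assuming that is the intent, the same kind of min-max scaling can be derived from the data itself. The helper name `min_max_scale` and the toy table are illustrative only and not part of the original notebook.

```python
import pandas as pd

def min_max_scale(frame, columns):
    """Return a copy of `frame` with each listed column rescaled to [0, 1]."""
    scaled = frame.copy()
    for col in columns:
        lo, hi = scaled[col].min(), scaled[col].max()
        # Assumes hi > lo; a constant column would divide by zero
        scaled[col] = (scaled[col] - lo) / (hi - lo)
    return scaled

# Toy stand-in for the real feature table: the middle row is the first track
# shown in the notebook, the outer rows are assumed extremes
toy = pd.DataFrame({
    'loudness': [-60.0, -12.441, 3.855],
    'tempo': [0.0, 60.936, 244.091],
})
toy_scaled = min_max_scale(toy, ['loudness', 'tempo'])
print(toy_scaled)
```

Under these assumed extremes, the middle row reproduces the scaled values the notebook shows for the first track. Computing the bounds from the data avoids hard-coded constants that would silently go stale if the dataset changed.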


In [6]:
model_knn = NearestNeighbors(algorithm='kd_tree',n_neighbors=20)

In [7]:
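# Pass the dense feature array directly: tree-based algorithms such as 'kd_tree'
# do not support sparse input (scikit-learn warns and falls back to brute force)
mat_songs = df_new.to_numpy()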

In [8]:
model_knn.fit(mat_songs)



NearestNeighbors(algorithm='kd_tree', n_neighbors=20)

In [9]:
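def recommend(idx, model, number_of_recommendations=5):
    # Build a 1 x n_features query from the selected track's feature row
    query = df_new.loc[idx].to_numpy().reshape(1, -1)
    print('Searching for recommendations...')
    # Request one extra neighbor: the query track is its own nearest neighbor
    distances, indices = model.kneighbors(query, n_neighbors=number_of_recommendations + 1)

    # kneighbors returns positional row indices, so use iloc, then drop the query track
    results = df[['name', 'artists', 'id']].iloc[indices[0]]
    print(results[results['id'] != idx][['name', 'artists']].head(number_of_recommendations))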
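
The function above relies on two properties of `kneighbors`: it returns positional row indices into the fitted data, and a track that is part of the fitted data comes back as its own nearest neighbor at distance 0. A minimal sketch on made-up 2-D points (the array `points` and the name `toy_model` are illustrative, not from the dataset) shows both.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Four made-up points standing in for track feature vectors
points = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.3],
    [1.0, 1.0],
])

toy_model = NearestNeighbors(algorithm='kd_tree', n_neighbors=3)
toy_model.fit(points)

# Query with a point that is itself part of the fitted data (row 0)
distances, indices = toy_model.kneighbors(points[0].reshape(1, -1), n_neighbors=3)

print(indices[0])    # positional row indices, nearest first
print(distances[0])  # the first distance is 0.0 because the query matched itself
```

Any function that filters the input track out of the results therefore gets at most N - 1 other tracks from a request for N neighbors.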

In [10]:
# Interactive test: search by title, pick a result by its index, then get recommendations

name = input('Enter song title: ')
print('Search results: ')
print(df.loc[df['name'] == name, ['artists', 'name']])

ind = int(input('Enter the index value of the required song: '))
idx = df['id'].loc[ind]

song = df['name'].loc[ind]
artists = df['artists'].loc[ind]

print('Song selected is ', song, 'by', artists)

nor = int(input('Enter number of recommendations: '))

recommend(idx, model_knn, nor)

Enter song title: Believer
Search results: 
                     artists      name
9593     ['Imagine Dragons']  Believer
34960      ['Ozzy Osbourne']  Believer
114436  ['American Authors']  Believer
Enter the index value of the required song: 9593
Song selected is  Believer by ['Imagine Dragons']
Enter number of recommendations: 20
Searching for recommendations...
                                                     name  \
9780                                      Don't Start Now   
19676                                     Don't Start Now   
106105                                          Superstar   
86448                                       I'm Satisfied   
160428                                  Country Boy Fresh   
153123                   Can't Believe It (feat. Pitbull)   
114866                                       Take Me Away   
130203                                    Dare (La La La)   
166102                                       Buenos Aires   
114761                

For a given input track, the output is always the same. The model is deterministic: it compares the audio features of the input track against those of every other track and returns the closest matches.
No user data is taken into account, so the results are not personalized.
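
The repeatability of the output can be checked directly. A minimal sketch, using made-up feature vectors rather than the Spotify data (the names `features` and `knn` are illustrative), queries the same fitted model twice and compares the results.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Made-up feature vectors (seeded so the example itself is reproducible)
rng = np.random.default_rng(0)
features = rng.random((50, 9))  # 50 hypothetical tracks, 9 features each

knn = NearestNeighbors(algorithm='kd_tree', n_neighbors=5).fit(features)
query = features[7].reshape(1, -1)

first_dist, first_idx = knn.kneighbors(query)
second_dist, second_idx = knn.kneighbors(query)

# The same fitted data and the same query give the same neighbors every time
same_result = bool(np.array_equal(first_idx, second_idx)
                   and np.array_equal(first_dist, second_dist))
print(same_result)
```

Nothing in the query depends on who is asking, which is why two users selecting the same track would see the same list.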