## Music Recommendation System (Machine Learning)

This project builds a music recommendation system that suggests music based on a user's taste, inferred from the user's previously heard music and playlists. Two approaches are used: 'User - to - User Recommendation' and 'Item - to - Item Recommendation'. The Birch, MiniBatchKMeans and KMeans algorithms are used, along with the 'Surprise' module, to compute the similarity between the recommendations and the user's existing playlist for evaluation.

### Obtaining Data

In [8]:
import pandas as pd
import numpy as np

In [9]:
final = pd.read_csv('datasets/final/final.csv')
metadata = pd.read_csv('datasets/final/metadata.csv')

### Model Selection - K Means Algorithm

In [10]:
from sklearn.cluster import KMeans
from sklearn.utils import shuffle

In [11]:
final = shuffle(final)

In [12]:
# .loc selects by index label, so X holds the rows labelled 0-5999 and Y the rest, whatever the shuffled order
X = final.loc[list(range(0, 6000))]
Y = final.loc[list(range(6000, final.shape[0]))]

In [13]:
X = shuffle(X)
Y = shuffle(Y)
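`shuffle` keeps the original index labels, so selecting labels 0-5999 with `.loc` returns the same rows whether or not the frame was shuffled first. The following is a minimal sketch of a position-based split, assuming a small made-up frame (`toy`, its sizes and the `random_state` value are illustrative and not part of the notebook):

```python
import pandas as pd
from sklearn.utils import shuffle

# Toy stand-in for `final`; the column name and sizes are illustrative only.
toy = pd.DataFrame({"track_id": range(100, 110)})

# random_state makes the shuffled order reproducible.
shuffled = shuffle(toy, random_state=0)

# Label-based selection ignores the shuffle: it always returns labels 0..5.
by_label = shuffled.loc[list(range(6))]

# Position-based selection takes the first 6 rows of the shuffled order.
train = shuffled.iloc[:6]
test = shuffled.iloc[6:]

print(list(by_label.index))
print(len(train), len(test))
```

With `.iloc`, the split depends on the shuffled order, which is usually what a random train/test split intends.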

In [14]:
metadata.head()

Unnamed: 0,track_id,album_title,artist_name,genre,track_title
0,2,AWOL - A Way Of Life,AWOL,HipHop,Food
1,3,AWOL - A Way Of Life,AWOL,HipHop,Electric Ave
2,5,AWOL - A Way Of Life,AWOL,HipHop,This World
3,10,Constant Hitmaker,Kurt Vile,Pop,Freeway
4,134,AWOL - A Way Of Life,AWOL,HipHop,Street Music


In [15]:
metadata = metadata.set_index('track_id')

In [16]:
# 'label' only exists once a model has been fitted, so ignore it if it is missing
X.drop(columns=['label'], errors='ignore', inplace=True)

In [17]:
kmeans = KMeans(n_clusters=6)
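The notebook fixes `n_clusters=6` without showing how that value was chosen. One common way to compare candidate values is the silhouette score. The sketch below uses synthetic data with four known groups, not the notebook's dataset, and the range of `k` values tried is an assumption:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 4 well-separated groups; not the notebook's dataset.
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
features, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

scores = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = model.fit_predict(features)
    # Silhouette is higher when clusters are compact and far apart.
    scores[k] = silhouette_score(features, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

On real audio features the curve is rarely this clear-cut, so the score is a guide rather than a definitive answer.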

In [18]:
Y.head()

Unnamed: 0.1,Unnamed: 0,track_id,acousticness,danceability,energy,instrumentalness,liveness,speechiness,tempo,valence,...,Holiday,Salsa,NuJazz,HipHop Beats,Modern Jazz,Turkish,Tango,Fado,Christmas,Instrumental
6014,6014,24460,0.518125,0.660847,0.238628,0.822161,0.064348,0.109718,119.972,0.187625,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12318,12318,97973,1.4e-05,0.405093,0.805474,0.866243,0.076751,0.048676,159.982,0.066659,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12829,12829,116456,0.754971,0.642743,0.260128,0.875182,0.105528,0.026497,159.969,0.445554,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10511,10511,48693,0.057085,0.521876,0.404832,0.916337,0.085267,0.033999,120.002,0.472159,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9031,9031,40512,0.188422,0.725841,0.768943,0.92225,0.121649,0.049036,89.92,0.980723,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [19]:
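def fit(df, algo, flag=0):
    """Fit `algo` on `df` (full fit if `flag` is truthy, otherwise partial_fit) and store the cluster labels."""
    if flag:
        algo.fit(df)
    else:
        algo.partial_fit(df)
    # labels_ covers only the rows passed in this call
    df['label'] = algo.labels_
    return (df, algo)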

In [20]:
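def predict(t, Y):
    # t is the (labelled DataFrame, fitted model) tuple returned by fit()
    y_pred = t[1].predict(Y)
    # Most frequent cluster among the user's tracks
    mode = pd.Series(y_pred).mode()
    return t[0][t[0]['label'] == mode.iloc[0]]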
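The idea behind `predict` is to assign every track in the user's playlist to a cluster and then return all catalogue tracks from the most frequent cluster. Here is a minimal sketch of that idea on a tiny made-up catalogue (the feature values, `catalogue` and `playlist` are illustrative assumptions):

```python
import pandas as pd
from sklearn.cluster import KMeans

# Tiny made-up catalogue: two audio features per track (values are illustrative).
catalogue = pd.DataFrame(
    {"energy": [0.1, 0.2, 0.15, 0.9, 0.85, 0.95],
     "valence": [0.2, 0.1, 0.25, 0.8, 0.9, 0.85]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="track_id"),
)
# A listener's playlist: two calm tracks and one energetic outlier.
playlist = pd.DataFrame({"energy": [0.12, 0.18, 0.9], "valence": [0.15, 0.2, 0.9]})

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(catalogue)
catalogue_labels = pd.Series(model.labels_, index=catalogue.index)

# Assign each playlist track to a cluster and keep the most frequent one.
playlist_labels = pd.Series(model.predict(playlist))
dominant = playlist_labels.mode().iloc[0]

# Recommend every catalogue track that shares the dominant cluster.
recommended_ids = catalogue_labels[catalogue_labels == dominant].index.tolist()
print(recommended_ids)
```

Because only the single most frequent cluster is kept, the outlier track in the playlist has no influence on the result.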

In [21]:
def recommend(recommendations, meta, Y):
    # Track ids of the user's playlist, used to look up their metadata
    dat = Y['track_id'].tolist()
    genre_mode = meta.loc[dat]['genre'].mode()
    artist_mode = meta.loc[dat]['artist_name'].mode()
    # Returns (same-genre tracks, same-artist tracks, tracks from the predicted cluster)
    return (
        meta[meta['genre'] == genre_mode.iloc[0]],
        meta[meta['artist_name'] == artist_mode.iloc[0]],
        meta.loc[recommendations['track_id']],
    )
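The genre and artist recommendations in `recommend` come from a plain metadata lookup: find the most common genre and artist in the playlist, then return every track that shares them. A small sketch with made-up metadata rows (the artist names and ids are hypothetical):

```python
import pandas as pd

# Made-up metadata indexed by track_id (rows are illustrative only).
meta = pd.DataFrame(
    {"artist_name": ["A", "A", "B", "C", "D"],
     "genre": ["HipHop", "HipHop", "Pop", "HipHop", "Rock"]},
    index=pd.Index([2, 3, 10, 20, 30], name="track_id"),
)
playlist_ids = [2, 3, 10]

# Most common genre and artist among the playlist's tracks.
genre_mode = meta.loc[playlist_ids, "genre"].mode().iloc[0]
artist_mode = meta.loc[playlist_ids, "artist_name"].mode().iloc[0]

same_genre = meta[meta["genre"] == genre_mode]
same_artist = meta[meta["artist_name"] == artist_mode]
print(genre_mode, list(same_genre.index))
print(artist_mode, list(same_artist.index))
```

`Series.mode()` returns all tied values in sorted order, so `.iloc[0]` picks the first one in sort order when two genres or artists are equally frequent.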

In [22]:
t = fit(X, kmeans, 1)

In [23]:
recommendations = predict(t, Y)

In [24]:
output = recommend(recommendations, metadata, Y)

In [25]:
genre_recommend, artist_name_recommend, mixed_recommend = output[0], output[1], output[2]

In [26]:
genre_recommend.shape

(3892, 4)

In [27]:
artist_name_recommend.shape

(52, 4)

In [28]:
mixed_recommend.shape

(1142, 4)

In [29]:
# Genre wise recommendations
genre_recommend.head()

track_id,album_title,artist_name,genre,track_title
153,Arc and Sender,Arc and Sender,Rock,Hundred-Year Flood
154,Arc and Sender,Arc and Sender,Rock,Squares And Circles
155,unreleased demo,Arc and Sender,Rock,Maps of the Stars Homes
169,Boss of Goth,Argumentix,Rock,Boss of Goth
170,Nightmarcher,Argumentix,Rock,Industry Standard Massacre


In [30]:
# Artist wise recommendations
artist_name_recommend.head()

track_id,album_title,artist_name,genre,track_title
34660,Zehu,51%,AvantGarde|International|Blues|Jazz|,Hadri Ha'Kat
34661,Zehu,51%,AvantGarde|International|Blues|Jazz|,Blender Tzivoni
34662,Zehu,51%,AvantGarde|International|Blues|Jazz|,Naniah
34663,Zehu,51%,AvantGarde|International|Blues|Jazz|,Yoter Miday
34664,Zehu,51%,AvantGarde|International|Blues|Jazz|,"Yamim, Lielot"


In [31]:
# Mixed Recommendations
mixed_recommend.head()

track_id,album_title,artist_name,genre,track_title
14533,Love in the Air,dmyra,AvantGarde|International|Blues|,Cherry Chrome
22097,netBloc Vol. 25: From Darkness Cometh The Light,The Gasoline Brothers,Pop,Over Me
15860,Come Fly With Me,The Kid Daytona,HipHop,The Groove feat. Mickey Factz {prod. Deputy}
12194,netBloc Vol. 17: Refined Excursions For The Di...,Just Plain Ant,HipHop,Revolution (Featuring Precise)
17717,Le Voyage,Pigeons & Crazy Porridgemakers,Rock,Birds Tomtits


In [32]:
recommendations.head()

Unnamed: 0.1,Unnamed: 0,track_id,acousticness,danceability,energy,instrumentalness,liveness,speechiness,tempo,valence,...,Salsa,NuJazz,HipHop Beats,Modern Jazz,Turkish,Tango,Fado,Christmas,Instrumental,label
3673,3673,14533,0.226239,0.850336,0.559656,0.8300631,0.145556,0.066582,130.049,0.673632,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
5533,5533,22097,0.984798,0.572549,0.165733,0.04473591,0.307328,0.030137,75.082,0.253154,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
3975,3975,15860,0.519513,0.4452,0.723961,2e-10,0.374165,0.116073,116.549,0.639169,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
3036,3036,12194,0.418959,0.596007,0.503217,1.80215e-05,0.535645,0.14199,92.948,0.564175,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
4446,4446,17717,0.928122,0.538063,0.51706,0.9488267,0.672513,0.026933,85.032,0.501998,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3


In [33]:
artist_name_recommend['artist_name'].value_counts()

artist_name
51%    52
Name: count, dtype: int64

In [34]:
genre_recommend['genre'].value_counts()

genre
Rock    3892
Name: count, dtype: int64

In [35]:
genre_recommend['artist_name'].value_counts()

artist_name
Glove Compartment               65
Blah Blah Blah                  62
Mors Ontologica                 50
Les Baudouins Morts             38
Kraus                           35
                                ..
Alone in 1982                    1
Ostrich Tuning                   1
The Dalai Lama Rama Fa Fa Fa     1
The Rusty Bells                  1
Lost Boy                         1
Name: count, Length: 725, dtype: int64

#### Testing

In [36]:
testing = Y.iloc[6:12]['track_id']

In [37]:
testing

10072     46728
11857     75910
7884      33701
11028     54172
12569    108461
8378      36726
Name: track_id, dtype: int64

In [38]:
ids = testing.loc[testing.index]

In [39]:
songs = metadata.loc[testing]

In [40]:
songs

track_id,album_title,artist_name,genre,track_title
46728,netBloc Vol. 32: Make Way For What Lies Ahead,AEED,AvantGarde|International|,Electricity Part 2
75910,The Fired Dept.,The Monitors,Rock,Got a Job
33701,Húsares de la Muerte,H.D.M.,HipHop,Furioso 15
54172,Favorites 01: Sebastian Blanck,Benji Cossa,AvantGarde|International|,"Uh-Huh (Solo, Live at the Pink House Near the ..."
108461,MEEMS,Miracles of Modern Science,Pop,Physics Is Our Business
36726,Ekleipsi Net-Label Halloween Compilation,Comascape,Rock,Procession


In [41]:
re = predict(t, Y.iloc[6:12])

In [42]:
output = recommend(re, metadata, Y.iloc[6:12])

In [43]:
ge_re, ge_ar, ge_mix = output[0], output[1], output[2]

In [44]:
ge_re.head()

track_id,album_title,artist_name,genre,track_title
236,Bersa Discos #8,Banana Clipz,AvantGarde|International|,"Push Am (Left, Right)"
461,blissblood.com,Cantonement Jazz Band,AvantGarde|International|,Bessemer
462,blissblood.com,Cantonement Jazz Band,AvantGarde|International|,Has Been Blues
463,blissblood.com,Cantonement Jazz Band,AvantGarde|International|,I'll Be Blue
464,blissblood.com,Cantonement Jazz Band,AvantGarde|International|,The Way I Feel Today


In [45]:
ge_ar.head(10)

track_id,album_title,artist_name,genre,track_title
25037,Bag of Nothingness,AEED,AvantGarde|International|,Particles
43153,netlabelism.com - Compilation 01/11,AEED,Electronic,Through The City
46728,netBloc Vol. 32: Make Way For What Lies Ahead,AEED,AvantGarde|International|,Electricity Part 2


In [46]:
ge_mix.head(10)

track_id,album_title,artist_name,genre,track_title
14533,Love in the Air,dmyra,AvantGarde|International|Blues|,Cherry Chrome
22097,netBloc Vol. 25: From Darkness Cometh The Light,The Gasoline Brothers,Pop,Over Me
15860,Come Fly With Me,The Kid Daytona,HipHop,The Groove feat. Mickey Factz {prod. Deputy}
12194,netBloc Vol. 17: Refined Excursions For The Di...,Just Plain Ant,HipHop,Revolution (Featuring Precise)
17717,Le Voyage,Pigeons & Crazy Porridgemakers,Rock,Birds Tomtits
10026,Beethoven's Sonata No. 1 In F Minor,Daniel Veesey,Classical,"Sonata No. 1 in F Minor, Op. 2 No. 1 - I. Allegro"
23313,Accident Consultancy Live / Undead,THF Drenching,AvantGarde|International|,Farah Khan (Dressed As A Woman) (Undead)
19434,Atlas Sound Live at ATP-NY 2009 on WFMU,Atlas Sound,AvantGarde|International|,Criminals
18764,Exploding Head Disease,Saskrotch,Electronic,Miss Lady in Green
20002,Embered Recollections,Heosphoros,AvantGarde|International|,Currents of Chaos


In [47]:
ge_re.shape

(1902, 4)

In [48]:
ge_ar.shape

(3, 4)

In [49]:
ge_mix.shape

(1142, 4)
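The shapes above say how many tracks each strategy returns, but not how well they match the test playlist. One simple check is the share of recommended tracks whose genre also occurs in the playlist. The helper below, `genre_overlap`, is hypothetical and not part of the notebook, and the rows are made up:

```python
import pandas as pd

def genre_overlap(recommended: pd.DataFrame, playlist: pd.DataFrame) -> float:
    """Share of recommended tracks whose genre also occurs in the playlist."""
    if recommended.empty:
        return 0.0
    playlist_genres = set(playlist["genre"])
    return float(recommended["genre"].isin(playlist_genres).mean())

# Made-up metadata rows, shaped like the `metadata` frame (values illustrative).
playlist = pd.DataFrame({"genre": ["Rock", "Pop", "Rock"]})
recommended = pd.DataFrame({"genre": ["Rock", "HipHop", "Pop", "Classical"]})

print(genre_overlap(recommended, playlist))
```

In the notebook's terms, this could be applied to `ge_mix` and `songs`, assuming both keep their `genre` column.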

### Model Selection - MiniBatchKMeans

In [50]:
from sklearn.cluster import MiniBatchKMeans

In [51]:
mini = MiniBatchKMeans(n_clusters = 6)

In [52]:
X.drop('label', axis=1, inplace=True)

In [53]:
# Let's divide the initial dataset into pieces to demonstrate online learning
# (.copy() so that adding the 'label' column does not write to a slice of X)
part_1, part_2, part_3 = X.iloc[0:2000].copy(), X.iloc[2000:4000].copy(), X.iloc[4000:6000].copy()

In [54]:
for i in [part_1, part_2, part_3]:
    # fit() adds the 'label' column to each part in place
    t = fit(i, mini)
    mini = t[1]


In [55]:
X = pd.concat([part_1, part_2, part_3])
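With `partial_fit`, the centroids move after every batch, so the labels stored for an earlier batch were assigned with older centroids than those for a later batch. Below is a minimal sketch on synthetic data (the two groups and the batch count are assumptions) that relabels everything with the final centroids via `predict`:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# Two synthetic groups centred at 0 and 10; stand-ins for real audio features.
data = np.vstack([rng.normal(0, 0.5, (300, 2)), rng.normal(10, 0.5, (300, 2))])
rng.shuffle(data)  # shuffles rows in place

model = MiniBatchKMeans(n_clusters=2, n_init=3, random_state=0)
for batch in np.array_split(data, 3):
    model.partial_fit(batch)  # updates the centroids using this batch only
    # labels_ describes just this batch, not everything seen so far
    assert len(model.labels_) == len(batch)

# Label the full dataset consistently with the final centroids.
labels = model.predict(data)
print(np.bincount(labels))
```

Whether relabelling changes the outcome on the notebook's data has not been checked here; it only guarantees that all rows are labelled against the same centroids.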

In [56]:
X.columns

Index(['Unnamed: 0', 'track_id', 'acousticness', 'danceability', 'energy',
       'instrumentalness', 'liveness', 'speechiness', 'tempo', 'valence',
       ...
       'Salsa', 'NuJazz', 'HipHop Beats', 'Modern Jazz', 'Turkish', 'Tango',
       'Fado', 'Christmas', 'Instrumental', 'label'],
      dtype='object', length=931)

In [57]:
X.head(3)

Unnamed: 0.1,Unnamed: 0,track_id,acousticness,danceability,energy,instrumentalness,liveness,speechiness,tempo,valence,...,Salsa,NuJazz,HipHop Beats,Modern Jazz,Turkish,Tango,Fado,Christmas,Instrumental,label
2648,2648,11271,0.254378,0.718421,0.480101,0.519283,0.10634,0.052537,125.329,0.526295,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
712,712,1696,0.955104,0.214842,0.808108,0.956953,0.096952,0.095159,147.269,0.036735,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
2229,2229,8261,0.970445,0.58804,0.398071,0.000609,0.10346,0.444726,190.081,0.764157,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2


In [58]:
X['label'].value_counts()

label
2    2813
1    1182
0    1147
4     586
3     151
5     121
Name: count, dtype: int64

In [59]:
recommendations = predict((X, mini), Y)

In [60]:
output = recommend(recommendations, metadata, Y)

In [61]:
genre_recommend_mini, artist_name_recommend_mini, mixed_mini = output[0], output[1], output[2]

In [62]:
genre_recommend_mini.shape

(3892, 4)

In [63]:
artist_name_recommend_mini.shape

(52, 4)

In [64]:
# Genre wise recommendations
genre_recommend_mini.head()

track_id,album_title,artist_name,genre,track_title
153,Arc and Sender,Arc and Sender,Rock,Hundred-Year Flood
154,Arc and Sender,Arc and Sender,Rock,Squares And Circles
155,unreleased demo,Arc and Sender,Rock,Maps of the Stars Homes
169,Boss of Goth,Argumentix,Rock,Boss of Goth
170,Nightmarcher,Argumentix,Rock,Industry Standard Massacre


In [65]:
# Artist wise recommendations
artist_name_recommend_mini.head()

track_id,album_title,artist_name,genre,track_title
34660,Zehu,51%,AvantGarde|International|Blues|Jazz|,Hadri Ha'Kat
34661,Zehu,51%,AvantGarde|International|Blues|Jazz|,Blender Tzivoni
34662,Zehu,51%,AvantGarde|International|Blues|Jazz|,Naniah
34663,Zehu,51%,AvantGarde|International|Blues|Jazz|,Yoter Miday
34664,Zehu,51%,AvantGarde|International|Blues|Jazz|,"Yamim, Lielot"


In [66]:
# Mixed Recommendations
mixed_mini.head()

track_id,album_title,artist_name,genre,track_title
14533,Love in the Air,dmyra,AvantGarde|International|Blues|,Cherry Chrome
22097,netBloc Vol. 25: From Darkness Cometh The Light,The Gasoline Brothers,Pop,Over Me
15860,Come Fly With Me,The Kid Daytona,HipHop,The Groove feat. Mickey Factz {prod. Deputy}
12194,netBloc Vol. 17: Refined Excursions For The Di...,Just Plain Ant,HipHop,Revolution (Featuring Precise)
17717,Le Voyage,Pigeons & Crazy Porridgemakers,Rock,Birds Tomtits


### Model Selection - Birch

In [67]:
from sklearn.cluster import Birch

In [68]:
birch = Birch(n_clusters = 6)

In [69]:
X.drop('label', axis=1, inplace=True)

In [70]:
# Let's divide the initial dataset into pieces to demonstrate online learning
# (.copy() so that adding the 'label' column does not write to a slice of X)
part_1, part_2, part_3 = X.iloc[0:2000].copy(), X.iloc[2000:4000].copy(), X.iloc[4000:6000].copy()

In [71]:
for i in [part_1, part_2, part_3]:
    # fit() adds the 'label' column to each part in place
    t = fit(i, birch)
    birch = t[1]


In [72]:
X = pd.concat([part_1, part_2, part_3])
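Birch also supports `partial_fit`: each call adds the batch to its clustering-feature tree and then re-runs the global clustering step over the subclusters. A minimal sketch on synthetic data (the groups, `threshold` and batch count are illustrative assumptions, not the notebook's settings):

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
# Two synthetic groups centred at 0 and 10; not the notebook's dataset.
data = np.vstack([rng.normal(0, 0.3, (150, 2)), rng.normal(10, 0.3, (150, 2))])
rng.shuffle(data)  # shuffles rows in place

model = Birch(n_clusters=2, threshold=0.5)
for batch in np.array_split(data, 3):
    # Adds the batch to the existing tree, then regroups the subclusters.
    model.partial_fit(batch)

# Assign every point using the final tree so all labels are consistent.
labels = model.predict(data)
print(np.bincount(labels))
```

As with MiniBatchKMeans, calling `predict` after the last batch labels all rows against the same final model.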

In [73]:
X.columns

Index(['Unnamed: 0', 'track_id', 'acousticness', 'danceability', 'energy',
       'instrumentalness', 'liveness', 'speechiness', 'tempo', 'valence',
       ...
       'Salsa', 'NuJazz', 'HipHop Beats', 'Modern Jazz', 'Turkish', 'Tango',
       'Fado', 'Christmas', 'Instrumental', 'label'],
      dtype='object', length=931)

In [74]:
X.head(3)

Unnamed: 0.1,Unnamed: 0,track_id,acousticness,danceability,energy,instrumentalness,liveness,speechiness,tempo,valence,...,Salsa,NuJazz,HipHop Beats,Modern Jazz,Turkish,Tango,Fado,Christmas,Instrumental,label
2648,2648,11271,0.254378,0.718421,0.480101,0.519283,0.10634,0.052537,125.329,0.526295,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
712,712,1696,0.955104,0.214842,0.808108,0.956953,0.096952,0.095159,147.269,0.036735,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
2229,2229,8261,0.970445,0.58804,0.398071,0.000609,0.10346,0.444726,190.081,0.764157,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2


In [75]:
X['label'].value_counts()

label
2    1962
1    1063
4     892
0     869
5     750
3     464
Name: count, dtype: int64

In [76]:
recommendations = predict((X, birch), Y)

In [77]:
output = recommend(recommendations, metadata, Y)

In [78]:
genre_recommend_birch, artist_name_recommend_birch, mixed_birch = output[0], output[1], output[2]

In [79]:
genre_recommend_birch.shape

(3892, 4)

In [80]:
artist_name_recommend_birch.shape

(52, 4)

In [81]:
# Genre wise recommendations
genre_recommend_birch.head()

track_id,album_title,artist_name,genre,track_title
153,Arc and Sender,Arc and Sender,Rock,Hundred-Year Flood
154,Arc and Sender,Arc and Sender,Rock,Squares And Circles
155,unreleased demo,Arc and Sender,Rock,Maps of the Stars Homes
169,Boss of Goth,Argumentix,Rock,Boss of Goth
170,Nightmarcher,Argumentix,Rock,Industry Standard Massacre


In [82]:
# Artist wise recommendations
artist_name_recommend_birch.head()

track_id,album_title,artist_name,genre,track_title
34660,Zehu,51%,AvantGarde|International|Blues|Jazz|,Hadri Ha'Kat
34661,Zehu,51%,AvantGarde|International|Blues|Jazz|,Blender Tzivoni
34662,Zehu,51%,AvantGarde|International|Blues|Jazz|,Naniah
34663,Zehu,51%,AvantGarde|International|Blues|Jazz|,Yoter Miday
34664,Zehu,51%,AvantGarde|International|Blues|Jazz|,"Yamim, Lielot"


In [83]:
# Mixed Recommendations
mixed_birch.head()

track_id,album_title,artist_name,genre,track_title
11271,The WIRED CD: Rip. Sample. Mash. Share.,Spoon,AvantGarde|International|,Revenge!
12853,Folk Den Project,Roger McGuinn,Folk,America for Me
23188,Instrumentals 1,Lee Maddeford,Jazz,Irresistible Yvette (with Les Gauchers Quintet)
12815,Folk Den Project,Roger McGuinn,Folk,King Kong Kitchie Kitchie Ki Me O
12794,Folk Den Project,Roger McGuinn,Folk,Go Tell It On The Mountain
