<center><h2> Collaborative Model-Based Recommendation System </h2></center>

Model-based collaborative filtering algorithms make item recommendations by first fitting a model to the user ratings. The model can be deterministic or Bayesian; in this project we use alternating least squares (ALS), which is a deterministic approach.

Model-based approaches include: 

* Matrix-factorisation-based approaches,
* Clustering-approaches and
* Deep-learning-based approaches.

Some of the features of the ALS algorithm are discussed below:

* Alternating least squares (ALS) is a matrix factorization algorithm that approximates the user-rating matrix as the product of two smaller matrices: the user matrix and the item matrix.
* It learns the user and item matrices in an alternating manner: it holds one matrix fixed and solves a regularized least-squares problem for the other, then swaps their roles and repeats.
* Once the user and item matrices are learnt, we multiply them to estimate the entries of the user-rating matrix where the rating was unknown, and these estimates drive the recommendations.
* Within each step, the least-squares problems for the individual users (or items) are independent of one another, so they can be solved in parallel rather than sequentially.
* This makes ALS well suited to large-scale collaborative filtering on huge data sets.
* The user-rating matrix may contain many empty entries, indicating that users have not listened to those songs yet. Such matrices are called sparse matrices, and ALS works directly on them.
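
The alternating updates described above can be sketched on a tiny dense matrix. This is a simplified, explicit-rating version written only for illustration; it is not the confidence-weighted implicit-feedback variant that the `implicit` library fits, and the function name `als_sketch` and the toy ratings are made up for the example.

```python
import numpy as np

def als_sketch(ratings, observed, factors=2, reg=0.1, iterations=10, seed=0):
    """Toy ALS on a small dense matrix; `observed` marks the known entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    user_f = rng.normal(scale=0.1, size=(n_users, factors))
    item_f = rng.normal(scale=0.1, size=(n_items, factors))
    ridge = reg * np.eye(factors)
    for _ in range(iterations):
        # Hold the item factors fixed and solve a ridge regression per user.
        for u in range(n_users):
            idx = observed[u]
            Y = item_f[idx]
            user_f[u] = np.linalg.solve(Y.T @ Y + ridge, Y.T @ ratings[u, idx])
        # Hold the user factors fixed and solve a ridge regression per item.
        for i in range(n_items):
            idx = observed[:, i]
            X = user_f[idx]
            item_f[i] = np.linalg.solve(X.T @ X + ridge, X.T @ ratings[idx, i])
    return user_f, item_f

# Hypothetical 4x4 rating matrix; 0 means "not rated".
ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [0, 1, 5, 4]], dtype=float)
observed = ratings > 0
user_f, item_f = als_sketch(ratings, observed)
predicted = user_f @ item_f.T  # estimates for every cell, including unrated ones
```

Each inner loop iteration is independent of the others in the same loop, which is what makes the per-user and per-item solves easy to parallelize.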

In [1]:
import string
import warnings
import implicit
import numpy as np
import pandas as pd
from sklearn import preprocessing
from scipy.sparse import csr_matrix
from sklearn.model_selection import train_test_split
warnings.filterwarnings('ignore')

In [2]:
song_data = pd.read_csv("kaggle/song_data.txt", sep = ',')
song_data.head()

Unnamed: 0,user,song,play_count,track_id,artist,title
0,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOBONKR12A58A7A7E0,1,TRAEHHJ12903CF492F,Dwight Yoakam,You're The One
1,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOEGIYH12A6D4FC0E3,1,TRLGMFJ128F4217DBE,Barry Tuckwell/Academy of St Martin-in-the-Fie...,Horn Concerto No. 4 in E flat K495: II. Romanc...
2,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOFLJQZ12A6D4FADA6,1,TRTNDNE128F1486812,Cartola,Tive Sim
3,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOHTKMO12AB01843B0,1,TRASTUE128F930D488,Lonnie Gordon,Catch You Baby (Steve Pitron & Max Sanna Radio...
4,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SODQZCY12A6D4F9D11,1,TRFPLWO128F1486B9E,Miguel Calo,El Cuatrero


In [3]:
song_data.shape

(1450933, 6)

In [4]:
song_data1 = song_data.copy()

In [5]:
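# Use one encoder per column so each fitted mapping stays available for inverse_transform.
user_encoder = preprocessing.OrdinalEncoder()
song_encoder = preprocessing.OrdinalEncoder()
song_data1['user_id'] = user_encoder.fit_transform(song_data1[['user']])
song_data1['song_id'] = song_encoder.fit_transform(song_data1[['song']])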

In [6]:
song_data1.head()

Unnamed: 0,user,song,play_count,track_id,artist,title,user_id,song_id
0,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOBONKR12A58A7A7E0,1,TRAEHHJ12903CF492F,Dwight Yoakam,You're The One,108811.0,10546.0
1,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOEGIYH12A6D4FC0E3,1,TRLGMFJ128F4217DBE,Barry Tuckwell/Academy of St Martin-in-the-Fie...,Horn Concerto No. 4 in E flat K495: II. Romanc...,108811.0,28684.0
2,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOFLJQZ12A6D4FADA6,1,TRTNDNE128F1486812,Cartola,Tive Sim,108811.0,36622.0
3,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOHTKMO12AB01843B0,1,TRASTUE128F930D488,Lonnie Gordon,Catch You Baby (Steve Pitron & Max Sanna Radio...,108811.0,51861.0
4,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SODQZCY12A6D4F9D11,1,TRFPLWO128F1486B9E,Miguel Calo,El Cuatrero,108811.0,24663.0


In [7]:
print(song_data1.user.nunique(), song_data1.user_id.nunique())

110000 110000


In [8]:
print(song_data1.song.nunique(), song_data1.song_id.nunique())

163206 163206


In [9]:
song_data1['user_id'] = song_data1['user_id'].astype('int')
song_data1['song_id'] = song_data1['song_id'].astype('int')

In [10]:
song_data1.head()

Unnamed: 0,user,song,play_count,track_id,artist,title,user_id,song_id
0,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOBONKR12A58A7A7E0,1,TRAEHHJ12903CF492F,Dwight Yoakam,You're The One,108811,10546
1,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOEGIYH12A6D4FC0E3,1,TRLGMFJ128F4217DBE,Barry Tuckwell/Academy of St Martin-in-the-Fie...,Horn Concerto No. 4 in E flat K495: II. Romanc...,108811,28684
2,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOFLJQZ12A6D4FADA6,1,TRTNDNE128F1486812,Cartola,Tive Sim,108811,36622
3,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOHTKMO12AB01843B0,1,TRASTUE128F930D488,Lonnie Gordon,Catch You Baby (Steve Pitron & Max Sanna Radio...,108811,51861
4,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SODQZCY12A6D4F9D11,1,TRFPLWO128F1486B9E,Miguel Calo,El Cuatrero,108811,24663
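
The encode-then-cast steps above turn string identifiers into contiguous integer ids. A minimal alternative sketch, assuming plain pandas is acceptable, uses categorical codes, which are integers from the start and come with a ready-made lookup back to the original strings. The toy frame and the names `user_lookup` and `song_lookup` are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the (user, song) columns.
toy = pd.DataFrame({
    "user": ["u_b", "u_a", "u_b", "u_c"],
    "song": ["s_2", "s_2", "s_1", "s_3"],
})

user_cat = toy["user"].astype("category")
song_cat = toy["song"].astype("category")

# Category codes are integers assigned in sorted order of the labels.
toy["user_id"] = user_cat.cat.codes.astype("int64")
toy["song_id"] = song_cat.cat.codes.astype("int64")

# Lookup tables from integer id back to the original string.
user_lookup = dict(enumerate(user_cat.cat.categories))
song_lookup = dict(enumerate(song_cat.cat.categories))
```

Keeping the lookup tables around makes it possible to translate recommended ids back to song identifiers without a merge.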


<h3> Creating a user-item sparse matrix </h3>

In [11]:
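# Every observed (user, song) pair gets the same confidence weight alpha; play_count is not used here.
alpha = 40
sparse_user_item = csr_matrix(([alpha]*song_data1.shape[0], (song_data1['user_id'], song_data1['song_id'])))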

In [12]:
sparse_user_item

<110000x163206 sparse matrix of type '<class 'numpy.intc'>'
	with 1450933 stored elements in Compressed Sparse Row format>
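
The matrix above stores the constant `alpha` for every observed pair. A common variation, shown here only as a sketch and not as what this notebook does, scales the confidence by the play count so that repeated listens weigh more. The toy arrays are hypothetical, and the explicit `shape` argument is an addition that pins the matrix size even if the highest id has no interactions.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical encoded interactions: (user_id, song_id, play_count).
user_ids = np.array([0, 0, 1, 2])
song_ids = np.array([1, 2, 0, 2])
play_count = np.array([1, 5, 2, 1])
alpha = 40

# Confidence grows linearly with the number of plays.
confidence = csr_matrix(
    (alpha * play_count, (user_ids, song_ids)),
    shape=(3, 3),
)
```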

<h3> Creating an item-user sparse matrix </h3>

In [13]:
sparse_item_user = sparse_user_item.T.tocsr()

In [14]:
sparse_item_user

<163206x110000 sparse matrix of type '<class 'numpy.intc'>'
	with 1450933 stored elements in Compressed Sparse Row format>

<h3> Model training </h3>

In [15]:
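# train_test_split shuffles and splits the rows (songs) of the item-user matrix,
# so row i of train is generally not the song with song_id i.
train, test = train_test_split(sparse_item_user, train_size=0.8)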

In [16]:
train

<130564x110000 sparse matrix of type '<class 'numpy.intc'>'
	with 1163574 stored elements in Compressed Sparse Row format>

In [17]:
test

<32642x110000 sparse matrix of type '<class 'numpy.intc'>'
	with 287359 stored elements in Compressed Sparse Row format>
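
The split above partitions the rows of the item-user matrix, so each song lands entirely in either `train` or `test`. An alternative that keeps the full matrix shape, and therefore keeps row indices aligned with `song_id`, is to hold out individual interactions instead. The sketch below is one way to do that under simple assumptions (uniform random holdout, no per-user stratification); `split_interactions` is a hypothetical helper, not part of `implicit` or scikit-learn.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

def split_interactions(matrix, train_fraction=0.8, seed=0):
    """Randomly assign each stored entry to train or test, keeping the shape."""
    coo = coo_matrix(matrix)
    rng = np.random.default_rng(seed)
    in_train = rng.random(coo.nnz) < train_fraction

    def build(mask):
        return csr_matrix(
            (coo.data[mask], (coo.row[mask], coo.col[mask])),
            shape=coo.shape,
        )

    return build(in_train), build(~in_train)

# Hypothetical 4x3 item-user matrix with constant confidence 40.
toy = csr_matrix(np.array([[40, 0, 40],
                           [0, 40, 0],
                           [40, 40, 0],
                           [0, 0, 40]]))
train_toy, test_toy = split_interactions(toy)
```

Because both parts have the same shape as the input, ids returned by a model fitted on the training part can still be joined against the original `song_id` column.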

In [18]:
model = implicit.als.AlternatingLeastSquares(factors=100, regularization=0.1, iterations=20, calculate_training_loss=False)



In [19]:
model.fit(train)

  0%|          | 0/20 [00:00<?, ?it/s]

<h3> Song recommendations for any user id </h3>

In [20]:
user = 108811
model.recommend(user, sparse_user_item)

[(25353, 0.95523936),
 (80060, 0.95519245),
 (52441, 0.91682243),
 (67381, 0.91670614),
 (106530, 0.88098913),
 (72129, 0.7032826),
 (68336, 0.6019877),
 (94663, 0.57464206),
 (45588, 0.53022385),
 (1287, 0.5071815)]
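
Conceptually, a recommendation list like the one above comes from scoring every item against the user's factor vector and keeping the best unseen items. The sketch below illustrates that idea with hand-written factors; it is an assumption about the general technique, not a copy of the library's internals, and `recommend_sketch` and the toy factors are hypothetical.

```python
import numpy as np

def recommend_sketch(user_factors, item_factors, user, liked_items, n=3):
    """Score all items for one user and return the top-n unseen ones."""
    scores = (item_factors @ user_factors[user]).astype(float)
    # Exclude items the user has already interacted with.
    scores[list(liked_items)] = -np.inf
    top = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in top]

# Hypothetical factors: 2 users, 4 items, 2 latent dimensions.
user_factors = np.array([[1.0, 0.0],
                         [0.0, 1.0]])
item_factors = np.array([[0.9, 0.1],
                         [0.2, 0.8],
                         [0.7, 0.3],
                         [0.1, 0.1]])
recs = recommend_sketch(user_factors, item_factors, user=0, liked_items={0}, n=2)
```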

In [21]:
user = 'fd50c4007b68a3737fe052d5a4f78ce8aa117f3d'
# Look up the encoded id for this user; .iloc[0] takes the first match by position.
user_id = song_data1.loc[song_data1['user'] == user, 'user_id']
user = user_id.iloc[0]

In [22]:
model.recommend(user, sparse_user_item)

[(25353, 0.95523936),
 (80060, 0.95519245),
 (52441, 0.91682243),
 (67381, 0.91670614),
 (106530, 0.88098913),
 (72129, 0.7032826),
 (68336, 0.6019877),
 (94663, 0.57464206),
 (45588, 0.53022385),
 (1287, 0.5071815)]

In [23]:
output = model.recommend(user, sparse_user_item)

In [24]:
output_df = pd.DataFrame(output, columns=['song_id', 'als_score'])

In [25]:
output_df.shape

(10, 2)

In [26]:
song_data2 = song_data1[['song', 'song_id', 'title']].copy()

In [27]:
merged = pd.merge(output_df, song_data2, how='left', on='song_id')

In [28]:
merged.drop_duplicates(inplace=True)

In [29]:
merged.reset_index(drop=True)

Unnamed: 0,song_id,als_score,song,title
0,25353,0.955239,SODTOQT12A8AE4596E,Medley SPC: Nosso Sonho Não É Ilusão / Tão Só ...
1,80060,0.955192,SOMDQVL12A58A7CF19,Cupido
2,52441,0.916822,SOHVTMT12A58A79A18,Timekiller (Video Edit / Original)
3,67381,0.916706,SOKDXEU12A8C139241,Margarita
4,106530,0.880989,SOQKHPE12A6D4FB4FF,Throes Of Rejection (LP Version)
5,72129,0.703283,SOKXIZH12AAF3B34D6,Among The Lazarae
6,68336,0.601988,SOKHRWE12A6D4F6BE7,Cuando Seolvida El Amor
7,94663,0.574642,SOOLXMH12AB018AB09,Welcome The Day
8,45588,0.530224,SOGUJIM12AB0183637,Do You Know What It Means To Miss New Orleans?
9,1287,0.507182,SOAEZBT12A6701FDCA,Little Ghetto Boy (Live Version)
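
The merge above joins against a table with one row per interaction, so each recommended song matches many rows and the duplicates have to be dropped afterwards. A sketch of an alternative, assuming one title per `song_id`, is to deduplicate the lookup table first so that the merge returns exactly one row per recommendation with a clean index. The toy frames and the name `song_lookup` are hypothetical.

```python
import pandas as pd

# Hypothetical interaction-level table: one row per (user, song) play.
interactions = pd.DataFrame({
    "song_id": [7, 7, 3, 3, 3],
    "song": ["S7", "S7", "S3", "S3", "S3"],
    "title": ["Seven", "Seven", "Three", "Three", "Three"],
})
recs = pd.DataFrame({"song_id": [3, 7], "als_score": [0.9, 0.5]})

# Keep one row per song before joining.
song_lookup = interactions.drop_duplicates(subset="song_id")

# validate raises if the lookup still has repeated song_id values.
named = recs.merge(song_lookup, how="left", on="song_id", validate="many_to_one")
```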


<h3> Normalize the ALS score </h3>

In [30]:
merged['als_score_normalized'] = (merged['als_score'] - min(merged['als_score'])) / (max(merged['als_score']) - min(merged['als_score']))

In [31]:
merged

Unnamed: 0,song_id,als_score,song,title,als_score_normalized
0,25353,0.955239,SODTOQT12A8AE4596E,Medley SPC: Nosso Sonho Não É Ilusão / Tão Só ...,1.0
2,80060,0.955192,SOMDQVL12A58A7CF19,Cupido,0.999895
12,52441,0.916822,SOHVTMT12A58A79A18,Timekiller (Video Edit / Original),0.914259
14,67381,0.916706,SOKDXEU12A8C139241,Margarita,0.913999
52,106530,0.880989,SOQKHPE12A6D4FB4FF,Throes Of Rejection (LP Version),0.834284
55,72129,0.703283,SOKXIZH12AAF3B34D6,Among The Lazarae,0.437669
56,68336,0.601988,SOKHRWE12A6D4F6BE7,Cuando Seolvida El Amor,0.211594
57,94663,0.574642,SOOLXMH12AB018AB09,Welcome The Day,0.150562
58,45588,0.530224,SOGUJIM12AB0183637,Do You Know What It Means To Miss New Orleans?,0.051427
59,1287,0.507182,SOAEZBT12A6701FDCA,Little Ghetto Boy (Live Version),0.0
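
The normalization above is min-max scaling: the best score in the list maps to 1 and the worst to 0. A small helper, sketched here with a hypothetical name `min_max`, makes the step reusable and guards against the case where all scores are equal, which would otherwise divide by zero.

```python
import pandas as pd

def min_max(scores: pd.Series) -> pd.Series:
    """Rescale scores to [0, 1]; return zeros if every score is identical."""
    span = scores.max() - scores.min()
    if span == 0:
        return pd.Series(0.0, index=scores.index)
    return (scores - scores.min()) / span

normalized = min_max(pd.Series([0.95, 0.70, 0.50]))
```

Note that the normalized values are relative to the current list only, so they are not comparable across different users or queries.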


<h3> Similar songs for any song id </h3>

In [32]:
song = 10546
model.similar_items(song)

[(10546, 0.08205632),
 (2010, 0.072531745),
 (120294, 0.06939317),
 (96296, 0.06870074),
 (79849, 0.06819586),
 (11663, 0.0646287),
 (28211, 0.06420706),
 (3720, 0.063761875),
 (114492, 0.06371321),
 (20641, 0.063381895)]
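
Item-to-item similarity in a factor model is usually computed from the item factor vectors. The sketch below uses cosine similarity, which is one common choice; whether the installed version of the library scores items in exactly this way is an assumption, and `similar_items_sketch` and the toy factors are hypothetical.

```python
import numpy as np

def similar_items_sketch(item_factors, item, n=3):
    """Rank items by cosine similarity to the given item's factor vector."""
    norms = np.linalg.norm(item_factors, axis=1)
    norms[norms == 0] = 1e-10  # avoid dividing by zero for empty items
    scores = item_factors @ item_factors[item] / (norms * norms[item])
    top = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in top]

# Hypothetical factors for 4 items in 2 latent dimensions.
item_factors = np.array([[1.0, 0.0],
                         [0.9, 0.1],
                         [0.0, 1.0],
                         [-1.0, 0.0]])
neighbours = similar_items_sketch(item_factors, item=0, n=2)
```

With cosine similarity the query item always ranks first with a score of 1, so callers that want only other items typically drop the first result.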

In [33]:
song = 'SOBONKR12A58A7A7E0'
# Look up the encoded id for this song; .iloc[0] takes the first match by position.
song_id = song_data1.loc[song_data1['song'] == song, 'song_id']
song = song_id.iloc[0]

In [34]:
similar_items = model.similar_items(song)

In [35]:
similar_items_df = pd.DataFrame(similar_items, columns=['song_id', 'als_score'])

In [36]:
similar_items_df.shape

(10, 2)

In [37]:
merged1 = pd.merge(similar_items_df, song_data2, how='left', on='song_id')

In [38]:
merged1.drop_duplicates(inplace=True)

In [39]:
merged1.reset_index(drop=True)

Unnamed: 0,song_id,als_score,song,title
0,10546,0.082056,SOBONKR12A58A7A7E0,You're The One
1,2010,0.072532,SOAHSAF12A81C238F8,Hoop Dreams (Ralph Myerz 5th Floor Magic Retake)
2,120294,0.069393,SOSRXLJ12A8C13F168,Bullet In The Gun (Club Mix)
3,96296,0.068701,SOOSXDG12A8C13A139,Meditation
4,79849,0.068196,SOMCSNL12AB018A0DE,VERONA - Stay With Me
5,11663,0.064629,SOBSPFN12A8C138896,Wherever Jah Send Me
6,28211,0.064207,SOEELYL12A6D4F6F26,Homage To The Mountain
7,3720,0.063762,SOAOCSW12AB0189D7C,Mi Gran Noche
8,114492,0.063713,SORTAKH12A8C135C53,The Big Bright Green Pleasure Machine
9,20641,0.063382,SODBSIY12A6D4F8CCE,La Boulette (Génération Nan Nan)


In [40]:
merged1['als_score_normalized'] = (merged1['als_score'] - min(merged1['als_score'])) / (max(merged1['als_score']) - min(merged1['als_score']))

In [41]:
merged1

Unnamed: 0,song_id,als_score,song,title,als_score_normalized
0,10546,0.082056,SOBONKR12A58A7A7E0,You're The One,1.0
4136,2010,0.072532,SOAHSAF12A81C238F8,Hoop Dreams (Ralph Myerz 5th Floor Magic Retake),0.489967
4140,120294,0.069393,SOSRXLJ12A8C13F168,Bullet In The Gun (Club Mix),0.321899
4141,96296,0.068701,SOOSXDG12A8C13A139,Meditation,0.28482
4153,79849,0.068196,SOMCSNL12AB018A0DE,VERONA - Stay With Me,0.257784
4169,11663,0.064629,SOBSPFN12A8C138896,Wherever Jah Send Me,0.066765
4180,28211,0.064207,SOEELYL12A6D4F6F26,Homage To The Mountain,0.044187
4181,3720,0.063762,SOAOCSW12AB0189D7C,Mi Gran Noche,0.020348
4182,114492,0.063713,SORTAKH12A8C135C53,The Big Bright Green Pleasure Machine,0.017742
4185,20641,0.063382,SODBSIY12A6D4F8CCE,La Boulette (Génération Nan Nan),0.0
