<center><h2> Collaborative Memory Based Recommendation system </center></h2>

In the memory-based approach, we try to predict a user’s preference based on the ratings given by other similar users or received by other similar items.

Memory-based approaches include: 

* user-based collaborative filtering,
* item-based collaborative filtering

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
import string
import warnings
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics.pairwise import pairwise_distances 
from sklearn.model_selection import train_test_split
warnings.filterwarnings('ignore')

In [3]:
cd drive/MyDrive/Colab Notebooks/rs

/content/drive/MyDrive/Colab Notebooks/rs


In [4]:
song_data = pd.read_csv("song_data.txt", sep = ',')
song_data.head()

Unnamed: 0,user,song,play_count,track_id,artist,title
0,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOBONKR12A58A7A7E0,1,TRAEHHJ12903CF492F,Dwight Yoakam,You're The One
1,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOEGIYH12A6D4FC0E3,1,TRLGMFJ128F4217DBE,Barry Tuckwell/Academy of St Martin-in-the-Fie...,Horn Concerto No. 4 in E flat K495: II. Romanc...
2,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOFLJQZ12A6D4FADA6,1,TRTNDNE128F1486812,Cartola,Tive Sim
3,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOHTKMO12AB01843B0,1,TRASTUE128F930D488,Lonnie Gordon,Catch You Baby (Steve Pitron & Max Sanna Radio...
4,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SODQZCY12A6D4F9D11,1,TRFPLWO128F1486B9E,Miguel Calo,El Cuatrero


In [5]:
song_data1 = song_data[['user', 'song', 'play_count', 'title']]

In [6]:
song_data1

Unnamed: 0,user,song,play_count,title
0,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOBONKR12A58A7A7E0,1,You're The One
1,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOEGIYH12A6D4FC0E3,1,Horn Concerto No. 4 in E flat K495: II. Romanc...
2,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOFLJQZ12A6D4FADA6,1,Tive Sim
3,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOHTKMO12AB01843B0,1,Catch You Baby (Steve Pitron & Max Sanna Radio...
4,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SODQZCY12A6D4F9D11,1,El Cuatrero
...,...,...,...,...
1450928,5e650759ebf89012044c6d52121eeada8b0ec814,SOVLNXV12A6D4F706E,1,Ms. Fat Booty
1450929,5e650759ebf89012044c6d52121eeada8b0ec814,SOVDSJC12A58A7A271,2,Ain't Misbehavin
1450930,5e650759ebf89012044c6d52121eeada8b0ec814,SOBRHVR12A8C133F35,2,Luvstruck
1450931,5e650759ebf89012044c6d52121eeada8b0ec814,SOMGVYU12A8C1314FF,2,Sinisten tähtien alla


In [7]:
song_data1 = song_data1[:80000]

In [8]:
encoding_user_song = preprocessing.OrdinalEncoder()
song_data1['user_id'] = encoding_user_song.fit_transform(song_data1[['user']])
song_data1['song_id'] = encoding_user_song.fit_transform(song_data1[['song']])

In [9]:
song_data1['user_id'] = song_data1['user_id'].astype('int')
song_data1['song_id'] = song_data1['song_id'].astype('int')

In [10]:
train, test = train_test_split(song_data1, test_size=0.30, random_state=31)

In [11]:
train.head()

Unnamed: 0,user,song,play_count,title,user_id,song_id
41872,68a462e9d2da60c7218a9a542c9b7b86e5726ce8,SOXFPND12AB017C9D1,1,I Gotta Feeling,2372,32185
6646,350c1b37d7f71de913783f47c38a9f50d3dd9592,SOULBZW12A58A80BA2,1,Steady As We Go,1200,28624
58152,5dd52bf193de097951b2d347fbc4d1d46b3788e2,SOFIJKI12A8C13C950,3,Still (Album Version),2123,7791
77573,1860137a21a76eca2ae5c7802417b3a599edd64b,SOWKSFG12A8C13E535,2,Torn,521,31153
13418,2ae49e99b5e0c012a6e3d77e2da5e0b0de1ce9b9,SOFLJQZ12A6D4FADA6,1,Tive Sim,955,7962


In [12]:
song_data1.shape

(80000, 6)

In [13]:
print(train.shape, test.shape)

(56000, 6) (24000, 6)


In [14]:
n_users = song_data1.user.nunique()
n_items = song_data1.song.nunique()

In [15]:
print(n_users, n_items)

6003 35727


<h3> Item based Recommendation system </h3>

<h4> Create empty data matrix: item*song </h4>

In [16]:
data_matrix = np.zeros((n_users, n_items))

<h4> Fill item*song matrix with rating values </h4> 

In [17]:
for line in train.itertuples():
    data_matrix[line[5]-1, line[6]-1] = line[3]

In [18]:
data_matrix.shape

(6003, 35727)

<h3> Pairwise distance with cosine metric </h3>

In [19]:
item_similarity = 1 - pairwise_distances(data_matrix.T, metric='cosine')

In [20]:

np.unique(item_similarity)

array([0.00000000e+00, 1.79324668e-05, 1.83263170e-05, ...,
       1.00000000e+00, 1.00000000e+00, 1.00000000e+00])

In [21]:
item_similarity.shape

(35727, 35727)

<h3> Dot product of Data Matrix with Item similarity </h3>

In [22]:
prediction_df = pd.DataFrame(item_similarity)

In [23]:
prediction_df.shape

(35727, 35727)

<h3> Song recommendations for any song id </h3>

In [24]:
item_recommendation = pd.DataFrame(prediction_df.iloc[20].sort_values(ascending=False))

In [25]:
song = 'SOBONKR12A58A7A7E0'
song_id = song_data1['song_id'][song_data['song']==song]
song = song_id[0]

In [26]:
song

2307

In [27]:
prediction_df.iloc[song].sort_values(ascending=False)[:10]

2307     1.000000
23756    1.000000
14240    1.000000
33972    1.000000
19387    1.000000
16983    1.000000
11098    0.707107
12276    0.102598
656      0.052200
11902    0.000000
Name: 2307, dtype: float64

In [28]:
recommended_songs_df = pd.DataFrame(prediction_df.iloc[song].sort_values(ascending=False)[:10])

In [29]:
recommended_songs_df.reset_index(inplace=True)
recommended_songs_df.columns = ['song_id', 'score']

In [30]:
song_data2 = song_data1[['song', 'song_id', 'title']].copy()

In [31]:
merged = pd.merge(recommended_songs_df, song_data2, how='left', on='song_id')

In [32]:
merged.drop_duplicates(inplace=True)

In [33]:
merged.reset_index(drop=True)

Unnamed: 0,song_id,score,song,title
0,2307,1.0,SOBONKR12A58A7A7E0,You're The One
1,23756,1.0,SOQQPZA12A6D4F9B69,Travel Agent
2,14240,1.0,SOJUYXY12A8C143472,You Can Get Murked
3,33972,1.0,SOYPGAR12A6D4F742E,Se Dagen Kom
4,19387,1.0,SONLURB12A8C13C72E,Another Perfect Day
5,16983,1.0,SOLTEEM12A8C13A5BD,Are you Sleeping?
6,11098,0.707107,SOHOXQK12A6D4FBF7C,Amerika (Album Version)
7,12276,0.102598,SOIJVVR12A6701C2EE,Everything About You
8,656,0.0522,SOALDLA12A6D4F8657,Loverboy
9,11902,0.0,SOICZHZ12AAA8C6654,Laichzeit


<h3> Normalize the score </h3>

In [34]:

merged['score_normalized'] = (merged['score'] - min(merged['score'])) / (max(merged['score']) - min(merged['score']))

In [35]:
merged

Unnamed: 0,song_id,score,song,title,score_normalized
0,2307,1.0,SOBONKR12A58A7A7E0,You're The One,1.0
249,23756,1.0,SOQQPZA12A6D4F9B69,Travel Agent,1.0
250,14240,1.0,SOJUYXY12A8C143472,You Can Get Murked,1.0
251,33972,1.0,SOYPGAR12A6D4F742E,Se Dagen Kom,1.0
255,19387,1.0,SONLURB12A8C13C72E,Another Perfect Day,1.0
256,16983,1.0,SOLTEEM12A8C13A5BD,Are you Sleeping?,1.0
257,11098,0.707107,SOHOXQK12A6D4FBF7C,Amerika (Album Version),0.707107
262,12276,0.102598,SOIJVVR12A6701C2EE,Everything About You,0.102598
264,656,0.0522,SOALDLA12A6D4F8657,Loverboy,0.0522
266,11902,0.0,SOICZHZ12AAA8C6654,Laichzeit,0.0
