## Music Recommendation Engine

* This notebook recommends songs similar to a given song, based on crowdsourced per-user play counts.
* It uses item-based collaborative filtering to achieve this.
* The dataset was downloaded from Kaggle.

In [30]:
%matplotlib inline
import pandas as pd
import numpy as np
from numpy import int64

import requests
import IPython.display as Disp
import sklearn
from sklearn.decomposition import TruncatedSVD

In [31]:
dataset_file = r"C:\Users\HP\Downloads\Music Info.csv\song_data.csv"

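# Read the song metadata from the path defined above (the original referenced an undefined `dataset_url`)
song_df = pd.read_csv(dataset_file)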
song_df.head()

Unnamed: 0,song_id,title,release,artist_name,year
0,SOQMMHC12AB0180CB8,Silent Night,Monster Ballads X-Mas,Faster Pussy cat,2003
1,SOVFVAK12A8C1350D9,Tanssi vaan,Karkuteillä,Karkkiautomaatti,1995
2,SOGTUKN12AB017F4F1,No One Could Ever,Butter,Hudson Mohawke,2006
3,SOBNYVR12A8C13558C,Si Vos Querés,De Culo,Yerba Brava,2003
4,SOHSBXH12A8C13B0DF,Tangle Of Aspens,Rene Ablaze Presents Winter Sessions,Der Mystic,0


### Removing Duplicates
* First, let us check whether any song titles are duplicated.
* Duplicate titles are redundant to the algorithm and should be removed:

In [32]:
song_df.duplicated(subset='title').sum() # 297,571

297571
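
The cell above only counts the duplicated titles; the removal step itself is not shown. One possible way to drop them is sketched below on a made-up toy frame. The names `toy_songs` and `deduped`, and the choice of `keep="first"`, are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for song_df; these rows are invented for illustration only.
toy_songs = pd.DataFrame({
    "song_id": ["S1", "S2", "S3", "S4"],
    "title": ["Rain", "Rain", "Intro", "Flash"],
    "artist_name": ["A", "B", "C", "D"],
})

# Count rows whose title already appeared earlier in the frame.
n_duplicates = toy_songs.duplicated(subset="title").sum()

# Keep the first row for each title and rebuild a clean 0..n-1 index.
deduped = toy_songs.drop_duplicates(subset="title", keep="first").reset_index(drop=True)

print(n_duplicates, len(deduped))
```

Note that deduplicating on `title` alone also discards different songs that happen to share a name, so deduplicating on `song_id` may be the safer choice depending on the goal.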

### Random Sampling
We randomly sample 15,000 rows from the dataframe to avoid running into memory errors:

In [50]:
sample_size = 15000
song_df = song_df.sample(n=sample_size, replace=False, random_state=490)

# Rebuild a clean 0..n-1 index and discard the old one
song_df = song_df.reset_index(drop=True)

### Processing Text Data
Now, let us print the head of the dataframe again:

In [51]:
song_df.head()

Unnamed: 0,song_id,title,release,artist_name,year
0,SOASSJV12AC468A410,Rain,Radio for the deaf,Unreal,0
1,SOXTVAV12A58A7F54A,Rachel Blues,Legendary Country Blues Artists - CD C,Yank Rachell,0
2,SOWXPVJ12A8C135EC7,Sexomatic (Für Dich),Here We Are (Back Again),Orange Sector,1993
3,SOCKOXX12AB01852D7,Partout_ C'Est L'Amour,Fleur De Paris,Maurice Chevalier,0
4,SOJINIC12A8C14148F,Work. Rest. Play. Reggae.,All Sewn Up - A Tribute To Patrik Fitzgerald,Benjamin Zephaniah,0


In [52]:
song_df.describe()

Unnamed: 0,year
count,15000.0
mean,1038.707733
std,998.422285
min,0.0
25%,0.0
50%,1971.0
75%,2002.0
max,2010.0


In [53]:
song_df.groupby("artist_name")["song_id"].count().sort_values(ascending=False)

artist_name
Pearl Jam                8
Blind Lemon Jefferson    6
Donovan                  6
Björk                    6
Herbie Hancock           6
                        ..
I Wayne                  1
I-F                      1
I-Octane                 1
I:Scintilla              1
Überflüssig              1
Name: song_id, Length: 11477, dtype: int64

In [54]:
Filter_Artist=song_df['artist_name']=='Johnny Cash'
song_df[Filter_Artist]

Unnamed: 0,song_id,title,release,artist_name,year
1020,SOHKWON12AB0182B4F,Five Feet High and Rising,The Legendary Performance,Johnny Cash,1959
9529,SOODHRX12A8C13F958,Come Along And Ride This Train,The Great Lost Performance,Johnny Cash,2000


### Read the play-count dataset (how many times each user played each song) into a pandas dataframe

In [55]:
triplets_file = r"C:\Users\HP\Downloads\Music Info.csv\10000.txt"
songs_to_user_df = pd.read_table(triplets_file,header=None)
songs_to_user_df.columns = ['user_id', 'song_id', 'listen_count']
songs_to_user_df.head()

Unnamed: 0,user_id,song_id,listen_count
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1


In [56]:
songs_to_user_df.describe()

Unnamed: 0,listen_count
count,2000000.0
mean,3.045485
std,6.57972
min,1.0
25%,1.0
50%,1.0
75%,3.0
max,2213.0


In [57]:
songs_to_user_df.groupby('user_id')['listen_count'].count().sort_values(ascending=False)

user_id
6d625c6557df84b60d90426c0116138b617b9449    711
fbee1c8ce1a346fa07d2ef648cec81117438b91f    643
4e11f45d732f4861772b2906f81a7d384552ad12    556
24b98f8ab023f6e7a1c37c7729c623f7b821eb95    540
1aa4fd215aadb160965110ed8a829745cde319eb    533
                                           ... 
10d3b027f494805b9223551e3db03f903953e2cf      1
87c22fcd7f5f833a8e33ba8bc5c7f4863dab5aa8      1
421be8356c6464ae9da340754c1b0b9510ae50b5      1
87a2826a059570052283d542fc03651c3a570afb      1
bec79e2e90bf0fe7238385b2ae6af711dd6c6d1d      1
Name: listen_count, Length: 76353, dtype: int64

### Merge songs and songs to user dataset

In [58]:
combined_songs_df = pd.merge(songs_to_user_df, song_df, on='song_id')

In [59]:
combined_songs_df.head()

Unnamed: 0,user_id,song_id,listen_count,title,release,artist_name,year
0,969cc6fb74e076a68e36a04409cb9d3765757508,SOXIIIM12A6D4F66C8,2,All My Friends,All My Friends,LCD Soundsystem,2007
1,5a905f000fc1ff3df7ca807d57edb608863db05d,SOXIIIM12A6D4F66C8,3,All My Friends,All My Friends,LCD Soundsystem,2007
2,b61afb42335287239bd40e1dea50d849cbf8a9a9,SOXIIIM12A6D4F66C8,4,All My Friends,All My Friends,LCD Soundsystem,2007
3,e21477efb83bd323205ce6f5bd662f3df9d477e5,SOXIIIM12A6D4F66C8,1,All My Friends,All My Friends,LCD Soundsystem,2007
4,b12c786deef0e618b5f277bc337f67128f425efe,SOXIIIM12A6D4F66C8,20,All My Friends,All My Friends,LCD Soundsystem,2007
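
`pd.merge` performs an inner join by default, so play records whose `song_id` is not in the sampled song table are dropped. The sketch below shows this on toy data; the frames `plays` and `songs` and their contents are hypothetical.

```python
import pandas as pd

# Toy play counts; "S9" has no matching row in the song table below.
plays = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "song_id": ["S1", "S2", "S1", "S9"],
    "listen_count": [1, 2, 5, 3],
})

# Toy song metadata; "S3" was never played by anyone.
songs = pd.DataFrame({
    "song_id": ["S1", "S2", "S3"],
    "title": ["Rain", "Intro", "Flash"],
})

# how="inner" is the default: only song_ids present in both frames survive.
merged = pd.merge(plays, songs, on="song_id")
print(merged)
```

This is likely why only a small number of distinct songs remain after merging the 15,000-row sample with the full play-count table.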


### Get the most listened-to songs

In [60]:
combined_songs_df.groupby('song_id')['listen_count'].count().sort_values(ascending=False)

song_id
SOLWZVR12AB01849C6    2028
SOUDLVN12AAFF43658    1481
SOBBKGF12A8C1311EE     720
SOBEVGM12A67ADBCA7     675
SOODSPH12AB01819C3     618
                      ... 
SOHNDOZ12A8C13684C      69
SOFODNB12A6D4FD584      67
SOKBJJO12A8C140826      65
SOGMORP12A8C13EF63      59
SOYYJAM12A6701ED36      56
Name: listen_count, Length: 178, dtype: int64

In [61]:
combined_songs_df.groupby('title')['listen_count'].count().sort_values(ascending=False)

title
All The Right Moves              2028
Make Love To Your Mind           1481
Where Is My Mind?                 720
Baby Boy [feat. Beyonce]          675
Between Two Lungs                 618
                                 ... 
World Looking In (Radio Edit)      69
Just Friends (Sunny)               67
Drop The Pressure                  65
Auburn and Ivory                   59
Sleeping In The Ground             56
Name: listen_count, Length: 178, dtype: int64

In [62]:
# Number of rows (user plays) recorded for each title
songs_df_2 = combined_songs_df.groupby('title').size().reset_index(name='count')
# Title with the highest count (2028 in this sample)
song_title = str(songs_df_2[songs_df_2['count'] == 2028]['title'].values[0])
print("this is title")
print(song_title)

this is title
All The Right Moves


In [63]:
filtered_songs = songs_df_2[(songs_df_2['count'] > 1000) & (songs_df_2['count'] < 2028)]
song_titles = filtered_songs['title']
print(song_titles)

96    Make Love To Your Mind
Name: title, dtype: object


In [64]:
combined_songs_df.groupby('artist_name')['listen_count'].count().sort_values(ascending=False)

artist_name
OneRepublic     2028
Bill Withers    1481
Pixies           720
The Strokes      678
Sean Paul        675
                ... 
Morcheeba         69
Musiq             67
Mylo              65
Beach House       59
Blind Faith       56
Name: listen_count, Length: 161, dtype: int64

In [65]:
Filter = combined_songs_df['song_id']=="SOLWZVR12AB01849C6"
combined_songs_df[Filter]['artist_name'].unique()
#printSongCover(Filter)

array(['OneRepublic'], dtype=object)

### Create Pivot Table of User Vs Songs

In [66]:
ct_df = combined_songs_df.pivot_table(values='listen_count', index='user_id', columns='title', fill_value=0)

In [67]:
ct_df.head()

title,10 Miles Wide,A Horse Is Not A Home,Ain't Nobody,All My Friends,All The Right Moves,Angel Malherido,Ante Up (Robbin Hoodz Theory),Auburn and Ivory,Away With Murder,B-B-B-Baby,...,Whenever I Say Your Name,Where Is My Mind?,Where You'll Find Me Now,Whisper,With Everything,Workin For A Livin,World Looking In (Radio Edit),Wrecking Ball,Yellow Sun,You Never Let Go
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
000a5c8b4d8b2c98f7a205219181d039edcd4506,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
000d2df2cb8ad7300f89512f8fe8fadc4f99e733,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
000ebc858861aca26bac9b49f650ed424cf882fc,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
000ef25cc955ad5841c915d269432eea41f4a1a5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
001322829b5dc3edc59bf78189617ddd8f23c82a,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [68]:
X = ct_df.values.T
X.shape

(178, 22251)
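
To make the pivot and transpose steps concrete, here is a minimal sketch on toy data. The frame `toy` and the names `user_by_song` and `song_by_user` are made up for illustration. `pivot_table` aggregates repeated (user, title) pairs with the mean by default, and `fill_value=0` marks songs a user never played.

```python
import pandas as pd

# Toy merged data: one row per (user, title) play record.
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u4"],
    "title": ["Rain", "Intro", "Rain", "Flash", "Rain"],
    "listen_count": [1, 2, 5, 3, 1],
})

# Rows are users, columns are titles, missing combinations become 0.
user_by_song = toy.pivot_table(
    values="listen_count", index="user_id", columns="title", fill_value=0
)

# Transpose so each row describes one song across all users (item-based view).
song_by_user = user_by_song.values.T
print(user_by_song.shape, song_by_user.shape)
```

The transposed matrix is what makes the approach item-based: similarity is later computed between rows, and each row is a song.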

### Compress dataset by applying Singular Value Decomposition (SVD)

In [69]:
SVD  = TruncatedSVD(n_components=20, random_state=17)
result_matrix = SVD.fit_transform(X)
result_matrix.shape

(178, 20)
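
The same compression step can be tried on a small random matrix to see what it produces. In this sketch, `toy_matrix`, its size, and `n_components=5` are arbitrary assumptions; only the `TruncatedSVD` call mirrors the cell above.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Toy song-by-user matrix: 12 songs, 40 users, mostly zeros like real play counts.
rng = np.random.default_rng(0)
toy_matrix = rng.poisson(lam=0.3, size=(12, 40)).astype(float)

# Project each song's 40-dimensional user vector onto 5 latent components.
svd = TruncatedSVD(n_components=5, random_state=17)
reduced = svd.fit_transform(toy_matrix)

# Fraction of the variance retained by the kept components.
explained = svd.explained_variance_ratio_.sum()
print(reduced.shape, round(float(explained), 3))
```

Checking `explained_variance_ratio_` is one way to judge whether the chosen number of components keeps enough of the signal.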

### Create Pearson correlation matrix

In [70]:
corr_mat = np.corrcoef(result_matrix)
corr_mat.shape

(178, 178)
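
`np.corrcoef` treats each row as one variable, so passing the song-by-component matrix yields a song-by-song matrix. A small hand-checkable sketch follows; the `embeddings` array is invented for illustration.

```python
import numpy as np

# Three toy "songs" described by three latent components each.
embeddings = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],  # a scaled copy of row 0: perfectly correlated with it
    [3.0, 2.0, 1.0],  # row 0 reversed: perfectly anti-correlated with it
])

# Entry (i, j) is the Pearson correlation between row i and row j.
corr = np.corrcoef(embeddings)
print(corr.shape)
print(np.round(corr, 2))
```

Pearson correlation ignores scale and offset, which is why row 1 matches row 0 exactly even though its values are twice as large.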

### Print songs related to specific song

In [71]:
song_names = ct_df.columns
song_list = list(song_names)
print(song_list)


['10 Miles Wide', 'A Horse Is Not A Home', "Ain't Nobody", 'All My Friends', 'All The Right Moves', 'Angel Malherido', 'Ante Up (Robbin Hoodz Theory)', 'Auburn and Ivory', 'Away With Murder', 'B-B-B-Baby', 'Baby Boy [feat. Beyonce]', 'Back At One', 'Bad Actors', 'Barfly', 'Beat It', 'Bermuda', 'Best Of Both Worlds (Remastered Album Version)', 'Better in time', 'Between Two Lungs', 'Big Weenie', 'Born to raise hell', 'Break Through', 'Calm Down Baby', 'Cannabis', 'Change Down / The Sugar Rhyme', 'Che Sara', 'Check On It', "Crockett's Theme", 'Cruel Summer', 'Damn Girl', 'Dial Me Up', 'Do You Love Me', "Don't Wake Me (Album Version)", "Don't You Evah (Album version)", 'Downfall', 'Driven To Tears', 'Drop The Pressure', 'Dysfunctional (feat. Big Scoob & Krizz Kaliko)', 'Elevator Love Letter', 'Engwish Bwudd', 'Espera', 'Every Little Thing (Album Version)', 'Faster [Explicit Version]', "Fire Coming Out Of The Monkey's Head", 'Flash', 'Floaty', 'Fog. Vs. Mould For The Next Of Love', 'Foreve

In [72]:
#query_index = song_list.index('Yellow Sun')
#query_index = song_list.index('The Hardest Part')
#query_index = song_list.index('Whenever I Say Your Name')
query_index = song_list.index("Tobacco Island")

#query_index = song_list.index(song_title)
print(query_index)

154


In [73]:
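# Correlation of the query song with every song in the matrix
corr_similar_songs = corr_mat[query_index]
print(corr_similar_songs)
print(type(song_list))
# Boolean mask of songs whose correlation lies strictly between 0.9 and 1.0
print((corr_similar_songs<1.0) & (corr_similar_songs>0.9))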

[ 1.75621668e-01 -1.08128919e-01  6.80193218e-01 -4.04942478e-02
 -6.93222622e-02 -1.03811950e-01  9.73068588e-02 -1.21051886e-01
  1.44125933e-01 -4.05837759e-02 -1.43233947e-02  7.30931164e-01
  7.52742921e-02 -6.74826304e-02  4.77369299e-01 -7.04825135e-02
 -4.15638163e-02 -5.49527132e-02  2.74128937e-03  4.36831778e-02
  4.97805576e-01 -4.14198278e-02 -1.42018148e-01 -1.60442221e-02
  2.97651096e-01 -9.20702087e-02 -1.30827000e-01 -3.28028784e-02
  3.43322227e-01  2.65900093e-01 -7.10491427e-02  7.43728915e-02
  1.03870893e-01  2.07495783e-02  3.57417480e-01  5.74654025e-01
  1.12847404e-02  1.71272414e-01 -4.36296044e-02 -1.23748821e-01
 -1.45028604e-01 -7.10574802e-02 -3.76019460e-02 -6.70734800e-02
 -5.43792463e-02  1.48621881e-01  1.66226394e-01  6.50993670e-01
 -8.36220561e-02  1.48163063e-01 -9.25550434e-02 -1.57141645e-01
 -7.03815770e-02 -7.11921253e-02  8.50737278e-01  6.63992046e-02
  4.78253054e-01  8.29431656e-02 -1.78224757e-02  4.47989572e-01
 -9.22389844e-02 -9.04421

In [74]:
list(song_names[(corr_similar_songs<1.0) & (corr_similar_songs>0.98)])

['I Put A Spell On You', 'Tobacco Island']
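
The result above still contains the query song itself, probably because its self-correlation is stored as a floating-point value just below 1.0, so the `< 1.0` test does not exclude it. A minimal sketch of an alternative that excludes the query by position and returns a ranked list is shown below. The helper `top_similar` and the toy inputs are hypothetical and not part of the original notebook.

```python
import numpy as np

def top_similar(corr_mat, names, query, k=5):
    """Return up to k (name, correlation) pairs most correlated with `query`, excluding `query` itself."""
    names = list(names)
    query_index = names.index(query)
    scores = np.asarray(corr_mat[query_index], dtype=float).copy()
    # Exclude the query by position instead of relying on a "< 1.0" comparison.
    scores[query_index] = -np.inf
    order = np.argsort(scores)[::-1][:k]
    return [(names[i], float(scores[i])) for i in order if np.isfinite(scores[i])]

# Toy correlation matrix for three made-up songs.
toy_corr = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, -0.2],
    [0.1, -0.2, 1.0],
])
toy_names = ["Song A", "Song B", "Song C"]

result = top_similar(toy_corr, toy_names, "Song A", k=2)
print(result)
```

With the notebook's own objects, the call would look like `top_similar(corr_mat, song_names, "Tobacco Island", k=5)`. Ranking by top-k also avoids hand-tuning a threshold such as 0.98.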