![Green Heritage logo](https://greenheritagellc.github.io/portal/images//Green%20Heritage%20Logo.png "Green Heritage logo")

# Music Recommendation Engine

* This notebook recommends songs similar to a given song, based on crowdsourced per-user play counts.
* It uses item-based collaborative filtering to achieve this.
* Citation for the songs dataset (BibTeX entry: http://millionsongdataset.com/sites/default/files/millionsong_ismir11_1.bib):
  Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere.
  The Million Song Dataset. In Proceedings of the 12th International Society
  for Music Information Retrieval Conference (ISMIR 2011), 2011.
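
As a rough illustration of the item-based idea, the sketch below compares songs by the users who played them. The data and names are made up, and it uses cosine similarity on raw counts for brevity, which is a simplification of the SVD and Pearson correlation steps this notebook applies later.

```python
import numpy as np

# Toy play-count matrix (hypothetical data): rows are songs, columns are users.
songs = ["song_a", "song_b", "song_c"]
plays = np.array([
    [5, 3, 0, 0],   # song_a
    [4, 2, 0, 1],   # song_b
    [0, 0, 6, 2],   # song_c
], dtype=float)

# Cosine similarity between every pair of song rows.
unit_rows = plays / np.linalg.norm(plays, axis=1, keepdims=True)
similarity = unit_rows @ unit_rows.T

# For song_a, rank the other songs from most to least similar.
query = songs.index("song_a")
order = np.argsort(-similarity[query])
ranked = [songs[i] for i in order if i != query]
print(ranked)
```

Here `song_b` ranks ahead of `song_c` because it shares listeners with `song_a`, while `song_c` was played by a different set of users.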

In [12]:
%matplotlib inline
import pandas as pd
import numpy as np
from numpy import int64

import requests
import IPython.display as Disp
import sklearn
from sklearn.decomposition import TruncatedSVD

### Read the song metadata dataset into a pandas DataFrame

In [17]:
songs_metadata_file = 'https://static.turi.com/datasets/millionsong/song_data.csv'
songs_df =  pd.read_csv(songs_metadata_file)
songs_df.head()

Unnamed: 0,song_id,title,release,artist_name,year
0,SOQMMHC12AB0180CB8,Silent Night,Monster Ballads X-Mas,Faster Pussy cat,2003
1,SOVFVAK12A8C1350D9,Tanssi vaan,Karkuteillä,Karkkiautomaatti,1995
2,SOGTUKN12AB017F4F1,No One Could Ever,Butter,Hudson Mohawke,2006
3,SOBNYVR12A8C13558C,Si Vos Querés,De Culo,Yerba Brava,2003
4,SOHSBXH12A8C13B0DF,Tangle Of Aspens,Rene Ablaze Presents Winter Sessions,Der Mystic,0


In [26]:
songs_df.describe()

Unnamed: 0,year
count,1000000.0
mean,1030.325652
std,998.745002
min,0.0
25%,0.0
50%,1969.0
75%,2002.0
max,2011.0
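
The minimum and 25th percentile of `year` are both 0, which suggests that 0 stands in for an unknown release year. That reading is an assumption, not something the dataset documentation is quoted on here. Under it, a minimal sketch on made-up rows shows how the placeholder distorts the mean and how masking it changes the result:

```python
import pandas as pd

# Hypothetical miniature of the metadata table; 0 is assumed to mean "year unknown".
toy_songs = pd.DataFrame({
    "title": ["Silent Night", "Tanssi vaan", "Tangle Of Aspens"],
    "year": [2003, 1995, 0],
})

# Keep only non-zero years; the masked entries become NaN and are skipped by mean().
year_clean = toy_songs["year"].where(toy_songs["year"] != 0)

print(toy_songs["year"].mean())   # pulled down by the placeholder zero
print(year_clean.mean())          # mean over the known years only
```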


In [43]:
songs_df.groupby("artist_name")["song_id"].count().sort_values(ascending=False)

artist_name
Michael Jackson                            194
Johnny Cash                                193
Beastie Boys                               187
Joan Baez                                  181
Neil Diamond                               176
                                          ... 
Greg Davis & Jeph Jerman                     1
Greg Davis_ Sébastien Roux                   1
Greg Edwards                                 1
Greg Hawks & The Tremblers                   1

Little Louie" Vega Feat. Arnold Jarvis      1
Name: song_id, Length: 72665, dtype: int64

In [128]:
Filter_Artist=songs_df['artist_name']=='Johnny Cash'
songs_df[Filter_Artist]

Unnamed: 0,song_id,title,release,artist_name,year
178,SONTDUZ12D021989FD,Rowboat,Unchained,Johnny Cash,1996
7317,SOILMNQ12AB0186349,I Just Thought Youd Like To Know (Digitally R...,Mi Love Collection,Johnny Cash,0
9618,SOLKQPF12A8BEE956D,Old Doc Brown,Original Album Classics,Johnny Cash,1960
9633,SOWLYYP12D0219044F,I Was There When It Happened,The Legend,Johnny Cash,1957
14721,SOGATAN12AB01859E3,Thanks A Lot - Alternate,The Sun Years CD3,Johnny Cash,0
...,...,...,...,...,...
959408,SOHMWYF12AB018C0D5,Born To Lose,The Legend,Johnny Cash,1964
971336,SOXMOBG12D0219798A,Orange Blossom Special,Greatest Hits,Johnny Cash,1965
978035,SOOUGLY12A8BEE954A,You Dreamer You,Original Album Classics,Johnny Cash,1968
988937,SOKVPFB12A58A79D86,What On Earth Will You Do (For Heaven's Sake),God,Johnny Cash,1974


### Read the dataset of per-user song play counts into a pandas DataFrame

In [19]:
triplets_file = 'https://static.turi.com/datasets/millionsong/10000.txt'
songs_to_user_df = pd.read_table(triplets_file,header=None)
songs_to_user_df.columns = ['user_id', 'song_id', 'listen_count']
songs_to_user_df.head()

Unnamed: 0,user_id,song_id,listen_count
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1


In [25]:
songs_to_user_df.describe()

Unnamed: 0,listen_count
count,2000000.0
mean,3.045485
std,6.57972
min,1.0
25%,1.0
50%,1.0
75%,3.0
max,2213.0


In [37]:
songs_to_user_df.groupby('user_id')['listen_count'].count().sort_values(ascending=False)

user_id
6d625c6557df84b60d90426c0116138b617b9449    711
fbee1c8ce1a346fa07d2ef648cec81117438b91f    643
4e11f45d732f4861772b2906f81a7d384552ad12    556
24b98f8ab023f6e7a1c37c7729c623f7b821eb95    540
1aa4fd215aadb160965110ed8a829745cde319eb    533
                                           ... 
f19b6e160a2c88805c66aa5068001a96f22ef228      1
eb0a706547d7e173757a83358def009af40fd74f      1
b506264c38f0739cb352605ce534e7d03d016553      1
cd4a00b93a113ededaab455e3a62118ab11b5c47      1
d33330810b25dd593e6645e4a6ad791bc2f05685      1
Name: listen_count, Length: 76353, dtype: int64

### Merge the song metadata and user play-count datasets

In [44]:
combined_songs_df = pd.merge(songs_to_user_df, songs_df, on='song_id')

In [45]:
combined_songs_df.head()

Unnamed: 0,user_id,song_id,listen_count,title,release,artist_name,year
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1,The Cove,Thicker Than Water,Jack Johnson,0
1,7c86176941718984fed11b7c0674ff04c029b480,SOAKIMP12A8C130995,1,The Cove,Thicker Than Water,Jack Johnson,0
2,76235885b32c4e8c82760c340dc54f9b608d7d7e,SOAKIMP12A8C130995,3,The Cove,Thicker Than Water,Jack Johnson,0
3,250c0fa2a77bc6695046e7c47882ecd85c42d748,SOAKIMP12A8C130995,1,The Cove,Thicker Than Water,Jack Johnson,0
4,3f73f44560e822344b0fb7c6b463869743eb9860,SOAKIMP12A8C130995,6,The Cove,Thicker Than Water,Jack Johnson,0
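
This notebook does not check whether the metadata table lists any `song_id` more than once. If it did, an inner merge would repeat each play record once per matching metadata row and inflate later counts. The sketch below, on made-up frames, shows the effect and one way to guard against it with `drop_duplicates` and the `validate` argument of `pd.merge`:

```python
import pandas as pd

# Hypothetical play counts: two users played the same song.
toy_plays = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "song_id": ["S1", "S1"],
    "listen_count": [1, 3],
})

# Hypothetical metadata in which song S1 is listed twice.
toy_meta = pd.DataFrame({
    "song_id": ["S1", "S1", "S2"],
    "title": ["The Cove", "The Cove", "Undo"],
})

# A plain inner merge repeats each play row once per matching metadata row.
inflated = pd.merge(toy_plays, toy_meta, on="song_id")

# Keeping one metadata row per song_id restores one output row per play row;
# validate="many_to_one" raises an error if the right-hand keys are not unique.
meta_unique = toy_meta.drop_duplicates(subset="song_id")
merged = pd.merge(toy_plays, meta_unique, on="song_id", validate="many_to_one")

print(len(toy_plays), len(inflated), len(merged))
```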


### Get the most listened-to songs

In [47]:
combined_songs_df.groupby('song_id')['listen_count'].count().sort_values(ascending=False)

song_id
SOFRQTD12A81C233C0    8277
SOWCKVR12A8C142411    7952
SOAUWYT12A81C206F1    7032
SOAXGDH12A8C13F8A1    6949
SOBONKR12A58A7A7E0    6412
                      ... 
SOLIGVL12AB017DBAE      51
SOWNLZF12A58A79811      51
SOBPGWB12A6D4F7EF3      50
SOGSPGJ12A8C134FAA      48
SOYYBJJ12AB017E9FD      48
Name: listen_count, Length: 10000, dtype: int64
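
Note that `.count()` returns the number of rows per song, that is, how many user/song records exist, not the total number of plays. A small sketch on made-up data shows how the two rankings can differ:

```python
import pandas as pd

# Hypothetical records: S1 has two listeners, S2 has one heavy listener.
toy = pd.DataFrame({
    "song_id": ["S1", "S1", "S2"],
    "listen_count": [1, 2, 10],
})

summary = toy.groupby("song_id")["listen_count"].agg(
    listeners="count",    # number of user/song rows
    total_plays="sum",    # plays summed over those rows
)
print(summary)
```

In this toy table `S1` leads by `listeners` while `S2` leads by `total_plays`, so which aggregate to use depends on what "most listened" should mean.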

In [76]:
combined_songs_df.groupby('title')['listen_count'].count().sort_values(ascending=False)

title
Sehr kosmisch                     8277
Use Somebody                      7952
Undo                              7032
Dog Days Are Over (Radio Edit)    6949
You're The One                    6729
                                  ... 
Scared                              51
Historia Del Portero                51
Don´t Leave Me Now                  50
Ghosts (Toxic Avenger Mix)          48
No Creo En El Jamas                 48
Name: listen_count, Length: 9593, dtype: int64
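
Grouping by `song_id` gives 10000 groups while grouping by `title` gives 9593, so some distinct songs share a title and are pooled together when titles are used as keys. One possible workaround, sketched on made-up rows with a hypothetical `song_label` column, is to combine title and artist into a single label:

```python
import pandas as pd

# Hypothetical rows: two different songs happen to share the title "Intro".
toy = pd.DataFrame({
    "song_id": ["S1", "S2", "S3"],
    "title": ["Intro", "Intro", "Undo"],
    "artist_name": ["Artist A", "Artist B", "Artist C"],
})

# Titles alone merge S1 and S2 into one group.
n_titles = toy["title"].nunique()

# A combined label keeps them apart.
toy["song_label"] = toy["title"] + " - " + toy["artist_name"]
n_labels = toy["song_label"].nunique()

print(n_titles, n_labels)
```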

In [234]:
# Number of play records per title, as a two-column DataFrame (title, count)
songs_df_2 = combined_songs_df.groupby('title').size().reset_index(name='count')
# Pick the title that has exactly 3113 play records to use as an example query
song_title = str(songs_df_2.loc[songs_df_2['count'] == 3113, 'title'].values[0])
print("this is title")
print(song_title)


this is title
Billionaire [feat. Bruno Mars]  (Explicit Album Version)


In [77]:
combined_songs_df.groupby('artist_name')['listen_count'].count().sort_values(ascending=False)

artist_name
Coldplay            32572
Kings Of Leon       26169
The Black Keys      19862
Jack Johnson        19590
Muse                19282
                    ...  
Shotta                 54
Umphrey's McGee        52
Ricardo Montaner       52
The Four Seasons       52
Amparanoia             50
Name: listen_count, Length: 3379, dtype: int64

In [78]:
# Look up which artist recorded the song_id with the second-highest record count
Filter = combined_songs_df['song_id']=="SOWCKVR12A8C142411"
combined_songs_df[Filter]['artist_name'].unique()

array(['Kings Of Leon'], dtype=object)

### Create Pivot Table of User Vs Songs

In [72]:
# Rows are users, columns are song titles; user/title pairs with no plays are filled with 0
ct_df = combined_songs_df.pivot_table(values='listen_count', index='user_id', columns='title', fill_value=0)

In [73]:
ct_df.head()

title,#!*@ You Tonight [Featuring R. Kelly] (Explicit Album Version),#40,& Down,' Cello Song,'97 Bonnie & Clyde,'Round Midnight,'Til We Die (Album Version),'Till I Collapse,('Til) I Kissed You,(Anaesthesia) Pulling Teath,...,sillyworld (Album Version),sleep_ eat food_ have visions,smile around the face,sun drums and soil,teachme (Album Version),the Love Song,you were there with me,¡Viva La Gloria! (Album Version),¿Lo Ves? [Piano Y Voz],Época
user_id
00003a4459f33b92906be11abe0e93efc423c0ff,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
00005c6177188f12fb5e2e82cdbd93e8a3f35e64,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
00030033e3a2f904a48ec1dd53019c9969b6ef1f,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0007235c769e610e3d339a17818a5708e41008d9,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0007c0e74728ca9ef0fe4eb7f75732e8026a278b,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [79]:
# Transpose so that each row is a song and each column is a user
X = ct_df.values.T
X.shape

(9593, 76353)
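
A dense 9593 x 76353 array holds roughly 730 million cells, which at 8 bytes per cell is close to 6 GB, even though almost all entries are 0. As a possible alternative, the sketch below builds a song-by-user matrix directly in sparse form with `scipy.sparse.csr_matrix`, which `TruncatedSVD` also accepts. The data is made up, and note one difference: `csr_matrix` sums duplicate user/title entries, whereas `pivot_table` averages them by default.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical play records.
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "title": ["Undo", "Clocks", "Undo", "Fireflies"],
    "listen_count": [1, 4, 2, 5],
})

# Map titles and users to integer row/column positions.
titles = toy["title"].astype("category")
users = toy["user_id"].astype("category")

# Rows are songs and columns are users, matching the transposed pivot table.
X_sparse = csr_matrix(
    (toy["listen_count"].to_numpy(),
     (titles.cat.codes.to_numpy(), users.cat.codes.to_numpy())),
    shape=(len(titles.cat.categories), len(users.cat.categories)),
)

print(X_sparse.shape, X_sparse.nnz)
print(list(titles.cat.categories))  # row order of the matrix
```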

### Compress dataset by applying Singular Value Decomposition (SVD)

In [80]:
SVD  = TruncatedSVD(n_components=20, random_state=17)
result_matrix = SVD.fit_transform(X)
result_matrix.shape

(9593, 20)
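
To make the compression step more concrete, the sketch below reproduces the idea with `numpy.linalg.svd` on a small made-up matrix: keep the `k` largest singular values and represent each song by `k` numbers. This is an illustration only; `TruncatedSVD` uses a randomized solver by default, so its output matches this construction only up to sign and numerical approximation.

```python
import numpy as np

rng = np.random.default_rng(17)
X_toy = rng.poisson(1.0, size=(6, 8)).astype(float)  # 6 songs x 8 users (made-up counts)
k = 2                                                # number of components to keep

# Full SVD: X_toy = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(X_toy, full_matrices=False)

# Keep the k strongest components: each song becomes a k-dimensional vector.
reduced = U[:, :k] * s[:k]

# Rank-k reconstruction and its relative error.
approx = reduced @ Vt[:k]
rel_error = np.linalg.norm(X_toy - approx) / np.linalg.norm(X_toy)
print(reduced.shape, round(float(rel_error), 3))
```

The reduced matrix plays the role of `result_matrix` above: one short vector per song, which the next step compares across songs.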

### Create Pearson correlation matrix

In [81]:
corr_mat = np.corrcoef(result_matrix)
corr_mat.shape


(9593, 9593)

In [82]:
corr_mat[0][2]

0.5110047853800223
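
`np.corrcoef` treats each row of its input as one variable, so entry `[i][j]` is the Pearson correlation between the reduced vectors of song `i` and song `j`. The sketch below checks this on a small made-up array against a hand-written Pearson formula (the `pearson` helper is introduced here for illustration):

```python
import numpy as np

# Three made-up row vectors; the third is 5 minus the first.
vectors = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.5, 8.0],
    [4.0, 3.0, 2.0, 1.0],
])

corr = np.corrcoef(vectors)  # rows are the variables being correlated

def pearson(a, b):
    """Pearson correlation of two 1-D arrays."""
    a_c, b_c = a - a.mean(), b - b.mean()
    return (a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c))

print(corr.shape)
print(np.isclose(corr[0, 1], pearson(vectors[0], vectors[1])))
print(np.isclose(corr[0, 2], -1.0))  # perfectly anti-correlated rows
```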

### Print songs related to a specified song


In [83]:
song_names = ct_df.columns
song_list = list(song_names)
print(song_list)




In [240]:
#query_index = song_list.index('Waiting For Tonight')
#query_index = song_list.index('Dog Days Are Over (Radio Edit)')
#query_index = song_list.index('My Happy Ending')
query_index = song_list.index("Heartbreak Warfare")

#query_index = song_list.index(song_title)
print(query_index)

3234
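
`list.index` raises a `ValueError` when the title is not an exact match, which is easy to trigger with titles that carry suffixes such as "(Album Version)". A small helper, hypothetical and not part of the original notebook, could report near matches instead:

```python
def find_title_index(titles, query):
    """Return the position of `query` in `titles`, or raise with close matches."""
    try:
        return titles.index(query)
    except ValueError:
        # Fall back to a case-insensitive substring search for suggestions.
        needle = query.casefold()
        candidates = [t for t in titles if needle in t.casefold()]
        raise KeyError(f"{query!r} not found; similar titles: {candidates[:5]}") from None

toy_titles = ["Clocks", "Heartbreak Warfare", "Uprising"]
print(find_title_index(toy_titles, "Heartbreak Warfare"))
```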


In [241]:
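# Correlation of the query song with every song in the matrix
corr_similar_songs = corr_mat[query_index]
print(corr_similar_songs)
print(type(song_list))
# Boolean mask of songs whose correlation with the query lies strictly between 0.9 and 1.0
print((corr_similar_songs<1.0) & (corr_similar_songs>0.9))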

[0.76008426 0.7608103  0.67635799 ... 0.87151773 0.75495399 0.80549467]
<class 'list'>
[False False False ... False False False]


In [242]:
list(song_names[(corr_similar_songs<1.0) & (corr_similar_songs>0.98)])

['Billionaire [feat. Bruno Mars]  (Explicit Album Version)',
 'Black',
 'Bulletproof',
 'Clocks',
 'Fag Hag',
 'Fireflies',
 'Girls_ Girls_ Girls',
 'Half Of My Heart',
 'How You Remind Me',
 'If I Had You',
 "Livin' On A Prayer",
 'OMG',
 'Resistance',
 'Supermassive Black Hole (Album Version)',
 'Supermassive Black Hole (Twilight Soundtrack Version)',
 'Un Violinista En Tu Tejado',
 'Uprising']
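
A fixed threshold such as 0.98 returns a varying number of songs depending on the query. As an alternative sketch, the hypothetical helper below returns the `n` most correlated songs in ranked order; the names and the toy correlation matrix are made up for illustration.

```python
import numpy as np

def top_n_similar(corr_mat, names, query_name, n=3):
    """Return the n names most correlated with query_name, best first."""
    names = list(names)
    query_index = names.index(query_name)
    scores = corr_mat[query_index].astype(float)  # astype returns a copy of the row
    scores[query_index] = -np.inf                 # exclude the query itself
    best = np.argsort(scores)[::-1][:n]
    return [(names[i], float(scores[i])) for i in best]

toy_names = ["A", "B", "C", "D"]
toy_corr = np.array([
    [1.00, 0.90, 0.20, 0.95],
    [0.90, 1.00, 0.10, 0.80],
    [0.20, 0.10, 1.00, 0.30],
    [0.95, 0.80, 0.30, 1.00],
])
print(top_n_similar(toy_corr, toy_names, "A", n=2))
```

With the notebook's variables, the equivalent call would be `top_n_similar(corr_mat, song_names, "Heartbreak Warfare", n=10)`.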