# Million Songs Recommendation Engine
- Using the Python module `Recommenders`

The dataset consists of two files:
- The triplets file (`triplets_file.csv`) contains user_id, song_id and listen_count.
- The metadata file (`song_data.csv`) contains song_id, title, release, artist_name and year.


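The Million Song Dataset is a collection of songs gathered from various websites, together with the ratings users gave after listening to them. In the files used here, that feedback is recorded as a listen count per user and song.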

### Import Modules

In [1]:
import numpy as np
import pandas as pd
import Recommenders  # provides popularity_recommender_py and item_similarity_recommender_py

### Load dataset

In [2]:
song_df = pd.read_csv('triplets_file.csv')
song_df.head(5)

Unnamed: 0,user_id,song_id,listen_count
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1


In [3]:
song_df2 = pd.read_csv('song_data.csv')
song_df2.head(5)

Unnamed: 0,song_id,title,release,artist_name,year
0,SOQMMHC12AB0180CB8,Silent Night,Monster Ballads X-Mas,Faster Pussy cat,2003
1,SOVFVAK12A8C1350D9,Tanssi vaan,Karkuteillä,Karkkiautomaatti,1995
2,SOGTUKN12AB017F4F1,No One Could Ever,Butter,Hudson Mohawke,2006
3,SOBNYVR12A8C13558C,Si Vos Querés,De Culo,Yerba Brava,2003
4,SOHSBXH12A8C13B0DF,Tangle Of Aspens,Rene Ablaze Presents Winter Sessions,Der Mystic,0


In [4]:
# combine both datasets on song_id, keeping one metadata row per song

song_df = pd.merge(song_df, song_df2.drop_duplicates(['song_id']), on = 'song_id', how = 'left')

song_df.head()

Unnamed: 0,user_id,song_id,listen_count,title,release,artist_name,year
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1,The Cove,Thicker Than Water,Jack Johnson,0
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Flamenco Para Niños,Paco De Lucia,1976
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1,Stronger,Graduation,Kanye West,2007
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1,Constellations,In Between Dreams,Jack Johnson,2005
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1,Learn To Fly,There Is Nothing Left To Lose,Foo Fighters,1999
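
To see why the metadata is passed through `drop_duplicates(['song_id'])` before merging, here is a minimal sketch on made-up data. The frames `plays` and `meta` and all their values are hypothetical and are not taken from the dataset. In a left merge, every repeated `song_id` on the right side would otherwise repeat the matching play rows.

```python
import pandas as pd

# Hypothetical toy data, not from the real files
plays = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "song_id": ["s1", "s2", "s1"],
    "listen_count": [1, 2, 5],
})
meta = pd.DataFrame({
    "song_id": ["s1", "s1", "s2"],  # s1 appears twice in the metadata
    "title": ["Song A", "Song A (dup)", "Song B"],
})

# Without de-duplication, each play of s1 matches two metadata rows
naive = pd.merge(plays, meta, on="song_id", how="left")

# With de-duplication, the merge keeps exactly one row per play
deduped = pd.merge(plays, meta.drop_duplicates(["song_id"]), on="song_id", how="left")

print(len(plays), len(naive), len(deduped))
```

Keeping the row count of the play data unchanged matters here because later steps count rows per song.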


### Data Preprocessing 

In [5]:
# create a new feature 'song' by combining artist name and title
song_df['song'] = song_df['artist_name'] + '-' + song_df['title']
song_df.head()

Unnamed: 0,user_id,song_id,listen_count,title,release,artist_name,year,song
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOAKIMP12A8C130995,1,The Cove,Thicker Than Water,Jack Johnson,0,Jack Johnson-The Cove
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBBMDR12A8C13253B,2,Entre Dos Aguas,Flamenco Para Niños,Paco De Lucia,1976,Paco De Lucia-Entre Dos Aguas
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBXHDL12A81C204C0,1,Stronger,Graduation,Kanye West,2007,Kanye West-Stronger
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SOBYHAJ12A6701BF1D,1,Constellations,In Between Dreams,Jack Johnson,2005,Jack Johnson-Constellations
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,SODACBL12A8C13C273,1,Learn To Fly,There Is Nothing Left To Lose,Foo Fighters,1999,Foo Fighters-Learn To Fly


In [6]:
# keep only the first 10,000 rows for faster prediction
song_df = song_df.head(10000)

# group by song and count the rows (one per listening user) for each song
song_grouped = song_df.groupby(['song']).aggregate({"listen_count": "count"}).reset_index()

song_grouped.head(5)

Unnamed: 0,song,listen_count
0,+ / - {Plus/Minus}-The Queen of Nothing,2
1,10 Years-Beautiful,1
2,112-Only You-Bad Boy Remix (Featuring The Noto...,1
3,12 Stones-The Way I Feel (Not Our Master),3
4,16Volt-Machine Kit,1


In [7]:
grouped_sum = song_grouped['listen_count'].sum()
song_grouped['percentage'] = (song_grouped['listen_count'] / grouped_sum) * 100
song_grouped.sort_values(['listen_count', 'song'], ascending=False)


Unnamed: 0,song,listen_count,percentage
1994,Harmonia-Sehr kosmisch,45,0.45
1402,Dwight Yoakam-You're The One,32,0.32
515,Björk-Undo,32,0.32
3429,OneRepublic-Secrets,28,0.28
1694,Florence + The Machine-Dog Days Are Over (Radi...,28,0.28
...,...,...,...
6,1990s-Pollockshields,1,0.01
5,1990s-Cult Status,1,0.01
4,16Volt-Machine Kit,1,0.01
2,112-Only You-Bad Boy Remix (Featuring The Noto...,1,0.01
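
Note that the aggregation above uses `"count"`, so the `listen_count` column in the grouped frame holds the number of rows per song, not the total number of plays. The sketch below, on hypothetical toy data with made-up column names `listeners` and `total_plays`, contrasts the two and computes the same kind of percentage.

```python
import pandas as pd

# Hypothetical toy data: one row per (user, song) pair
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1"],
    "song": ["A", "A", "A", "B"],
    "listen_count": [1, 4, 2, 10],
})

stats = (
    toy.groupby("song")["listen_count"]
       .agg(listeners="count", total_plays="sum")  # rows per song vs. summed plays
       .reset_index()
)

# Share of all (user, song) rows that belong to each song
stats["percentage"] = stats["listeners"] / stats["listeners"].sum() * 100
print(stats)
```

Song B has the most plays in this toy example, yet song A has the larger share, because only the number of listeners is counted.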


### Popularity Recommendation 

In [25]:
# look at the row at index 12; its user is the example user below
song_df[song_df.index == 12]['song']

12    The B-52's-Love Shack
Name: song, dtype: object

In [26]:
# creating a popularity based recommendation for that user 

reco = Recommenders.popularity_recommender_py()

reco.create(song_df, 'user_id','song')

reco.recommend(song_df['user_id'][12])

Unnamed: 0,user_id,song,score,Rank
1994,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Harmonia-Sehr kosmisch,45,1.0
515,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Björk-Undo,32,2.0
1402,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Dwight Yoakam-You're The One,32,3.0
1694,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Florence + The Machine-Dog Days Are Over (Radi...,28,4.0
3429,b80344d063b5ccb3212f76538f3d9e43d87dca9e,OneRepublic-Secrets,28,5.0
988,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Coldplay-The Scientist,27,6.0
2603,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Kings Of Leon-Use Somebody,27,7.0
2598,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Kings Of Leon-Revelry,26,8.0
874,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Charttraxx Karaoke-Fireflies,24,9.0
374,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Barry Tuckwell/Academy of St Martin-in-the-Fie...,23,10.0
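
The source of `Recommenders.popularity_recommender_py` is not shown in this notebook, so the following is only a minimal sketch of how a popularity-based recommender could work, assuming it scores each song by its number of rows and ranks the result. The function `popularity_recommend` and the toy data are hypothetical.

```python
import pandas as pd

def popularity_recommend(df, user_id, user_col="user_id", item_col="song", top_n=10):
    """Hypothetical helper: rank items by how many rows (listeners) they have."""
    scores = (
        df.groupby(item_col)[user_col]
          .count()
          .reset_index(name="score")
          .sort_values(["score", item_col], ascending=[False, True])
    )
    # Ties are broken by position, so ranks are 1.0, 2.0, 3.0, ...
    scores["Rank"] = scores["score"].rank(ascending=False, method="first")
    top = scores.head(top_n).copy()
    top.insert(0, user_col, user_id)  # the list itself does not depend on the user
    return top

# Hypothetical toy data
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "song": ["A", "A", "A", "B", "B", "C"],
})
print(popularity_recommend(toy, "u1", top_n=2))
```

In this sketch every user receives the same list, including songs they have already played, which matches the output above where a song from the user's own history appears at rank 1.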


### Item Similarity Recommendation

In [29]:
reco = Recommenders.item_similarity_recommender_py()
reco.create(song_df, 'user_id','song')

user_items = reco.get_user_items(song_df['user_id'][12])

# the user's song history (first 10 songs)
for song in user_items[:10]:
    print(song)

Jack Johnson-The Cove
Paco De Lucia-Entre Dos Aguas
Kanye West-Stronger
Jack Johnson-Constellations
Foo Fighters-Learn To Fly
Héroes del Silencio-Apuesta Por El Rock 'N' Roll
Lady GaGa-Paper Gangsta
Foo Fighters-Stacked Actors
Harmonia-Sehr kosmisch
Thievery Corporation feat. Emiliana Torrini-Heaven's gonna burn your eyes


In [30]:
# recommending the user some songs 

reco.recommend(song_df['user_id'][12])

No. of unique songs for the user: 45
no. of unique songs in the training set: 5151
Non zero values in cooccurence_matrix :6844


Unnamed: 0,user_id,song,score,rank
0,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Fleet Foxes-Oliver James,0.043076,1
1,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Fleet Foxes-Quiet Houses,0.043076,2
2,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Fleet Foxes-Your Protector,0.043076,3
3,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Fleet Foxes-Tiger Mountain Peasant Song,0.043076,4
4,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Fleet Foxes-Sun It Rises,0.043076,5
5,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Pearl Jam-The End,0.037531,6
6,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Dave Grusin-St. Elsewhere,0.037531,7
7,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Céline Dion-Misled,0.037531,8
8,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Incubus-Oil And Water,0.037531,9
9,b80344d063b5ccb3212f76538f3d9e43d87dca9e,Fleet Foxes-Meadowlarks,0.037531,10


In [31]:
# find songs similar to the given songs

reco.get_similar_items(['Fleet Foxes-Oliver James','Fleet Foxes-Quiet Houses'])

no. of unique songs in the training set: 5151
Non zero values in cooccurence_matrix :92


Unnamed: 0,user_id,song,score,rank
0,,Fleet Foxes-Your Protector,1.0,1
1,,Fleet Foxes-Tiger Mountain Peasant Song,1.0,2
2,,Fleet Foxes-Sun It Rises,1.0,3
3,,Fleet Foxes-He Doesn't Know Why,0.666667,4
4,,Velvet Underground & Nico-There She Goes Again,0.5,5
5,,Pixies-Wave of Mutilation (UK Surf Version),0.5,6
6,,Fleet Foxes-Drops In The River,0.5,7
7,,The Velvet Underground-Oh! Sweet Nuthin' (LP V...,0.5,8
8,,Pixies-Debaser,0.5,9
9,,Pixies-Monkey Gone To Heaven,0.5,10
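
The internals of `item_similarity_recommender_py` are not shown here. The log lines mention a co-occurrence matrix, and scores such as 1.0, 0.666667 and 0.5 are values a Jaccard index would produce on small listener sets, so the sketch below assumes Jaccard similarity between the sets of users who played each song. The function `jaccard_similar_items` and the toy data are hypothetical and may differ from what the module actually does.

```python
from collections import defaultdict

def jaccard_similar_items(plays, query_items, top_n=10):
    """Hypothetical helper: score songs by mean Jaccard similarity to the query songs.

    plays: iterable of (user_id, song) pairs.
    """
    if not query_items:
        return []

    # Map each song to the set of users who played it
    listeners = defaultdict(set)
    for user, song in plays:
        listeners[song].add(user)

    scores = {}
    for song, users in listeners.items():
        if song in query_items:
            continue  # do not return the query songs themselves
        total = 0.0
        for q in query_items:
            q_users = listeners.get(q, set())
            union = users | q_users
            # Jaccard index: shared listeners / all listeners of either song
            total += len(users & q_users) / len(union) if union else 0.0
        scores[song] = total / len(query_items)

    # Highest score first; ties broken alphabetically
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# Hypothetical toy data
plays = [
    ("u1", "A"), ("u2", "A"),
    ("u1", "B"), ("u2", "B"),
    ("u1", "C"), ("u3", "C"),
    ("u3", "D"),
]
print(jaccard_similar_items(plays, ["A"]))
```

Songs played by exactly the same users as the query song score 1.0, which is one plausible reading of why several tracks by the same artist tie at the top of the table above.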
