# Song Embeddings - Skipgram Recommender
> In this notebook, we'll use human-made music playlists to learn song embeddings. We'll treat a playlist as if it's a sentence and the songs it contains as words. We feed that to the word2vec algorithm which then learns embeddings for every song we have. These embeddings can then be used to recommend similar songs.

- toc: true
- badges: true
- comments: true
- categories: [Word2Vec, Embedding, Music, Sequence]
- author: "<a href='https://github.com/jalammar/jalammar.github.io'>Jay Alammar</a>"
- image:

This technique is used by Spotify, AirBnB, Alibaba, and others. It accounts for a vast portion of their user activity, user media consumption, and/or sales (in the case of Alibaba). The dataset we'll use was collected by Shuo Chen from Cornell University. The [dataset](https://www.cs.cornell.edu/~shuochen/lme/data_page.html) contains playlists from hundreds of radio stations from around the US.

## Downloading data

In [1]:
!wget -q https://www.cs.cornell.edu/~shuochen/lme/dataset.tar.gz
!tar -xf dataset.tar.gz

## Setup

In [7]:
import numpy as np
import pandas as pd
import gensim 
from gensim.models import Word2Vec
from urllib import request

In [8]:
import warnings
warnings.filterwarnings('ignore')

## Training dataset

In [14]:
with open("/content/dataset/yes_complete/train.txt", 'r') as f:
  # skipping first 2 lines as they contain only metadata
  lines = f.read().split('\n')[2:]
  # select playlists with at least 2 songs, a minimum threshold for sequence learning 
  playlists = [s.rstrip().split() for s in lines if len(s.split()) > 1]

In [18]:
print( 'Playlist #1:\n ', playlists[0], '\n')
print( 'Playlist #2:\n ', playlists[1])

Playlist #1:
  ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '2', '42', '43', '44', '45', '46', '47', '48', '20', '49', '8', '50', '51', '52', '53', '54', '55', '56', '57', '25', '58', '59', '60', '61', '62', '3', '63', '64', '65', '66', '46', '47', '67', '2', '48', '68', '69', '70', '57', '50', '71', '72', '53', '73', '25', '74', '59', '20', '46', '75', '76', '77', '59', '20', '43'] 

Playlist #2:
  ['78', '79', '80', '3', '62', '81', '14', '82', '48', '83', '84', '17', '85', '86', '87', '88', '74', '89', '90', '91', '4', '73', '62', '92', '17', '53', '59', '93', '94', '51', '50', '27', '95', '48', '96', '97', '98', '99', '100', '57', '101', '102', '25', '103', '3', '104', '105', '106', '107', '47', '108', '109', '110', '111', '112', '113', '25', '63', '62', '114', '115', '84', '116', '117',

## Training Word2vec

Our dataset is now in the shape the the Word2Vec model expects as input. We pass the dataset to the model, and set the following key parameters:

- size: Embedding size for the songs.
- window: word2vec algorithm parameter -- maximum distance between the current and predicted word (song) within a sentence
- negative: word2vec algorithm parameter -- Number of negative examples to use at each training step that the model needs to identify as noise

In [19]:
model = Word2Vec(playlists, size=32, window=20, negative=50, min_count=1, workers=-1)

The model is now trained. Every song has an embedding. We only have song IDs, though, no titles or other info. Let's grab the song information file.

## Prepare songs metadata

### Title and artist

In [4]:
!head /content/dataset/yes_complete/song_hash.txt

0 	Gucci Time (w\/ Swizz Beatz)	Gucci Mane
1 	Aston Martin Music (w\/ Drake & Chrisette Michelle)	Rick Ross
2 	Get Back Up (w\/ Chris Brown)	T.I.
3 	Hot Toddy (w\/ Jay-Z & Ester Dean)	Usher
4 	Whip My Hair	Willow
5 	Down On Me (w\/ 50 Cent)	Jeremih
6 	Black And Yellow	Wiz Khalifa
7 	Blowing Me Kisses	Soulja Boy
8 	Lay It Down	Lloyd
9 	Good For My Money (w\/ Lloyd)	Baby Bash


In [20]:
with open("/content/dataset/yes_complete/song_hash.txt", 'r') as f:
  songs_file = f.read().split('\n')
  songs = [s.rstrip().split('\t') for s in songs_file]

In [21]:
songs_df = pd.DataFrame(data=songs, columns = ['id', 'title', 'artist'])
songs_df = songs_df.set_index('id')
songs_df.head()

Unnamed: 0_level_0,title,artist
id,Unnamed: 1_level_1,Unnamed: 2_level_1
0,Gucci Time (w\/ Swizz Beatz),Gucci Mane
1,Aston Martin Music (w\/ Drake & Chrisette Mich...,Rick Ross
2,Get Back Up (w\/ Chris Brown),T.I.
3,Hot Toddy (w\/ Jay-Z & Ester Dean),Usher
4,Whip My Hair,Willow


In [62]:
songs_df.iloc[[1,10,100]]

Unnamed: 0_level_0,title,artist
id,Unnamed: 1_level_1,Unnamed: 2_level_1
1,Aston Martin Music (w\/ Drake & Chrisette Mich...,Rick Ross
10,Shake It,Elephant Man
100,I'm Yours,Jason Mraz


In [71]:
songs_df[songs_df.artist == 'Rush'].head()

Unnamed: 0_level_0,title,artist
id,Unnamed: 1_level_1,Unnamed: 2_level_1
1861,Tom Sawyer,Rush
2640,Red Barchetta,Rush
2655,Fly By Night,Rush
2691,Freewill,Rush
2748,Limelight,Rush


### Tags

In [5]:
!head /content/dataset/yes_complete/tag_hash.txt

0, rock
1, pop
2, favorites
3, alternative
4, love
5, male vocalists
6, american
7, indie
8, classic rock
9, awesome


In [85]:
with open("/content/dataset/yes_complete/tag_hash.txt", 'r') as f:
  tags_file = f.read().split('\n')
  tags = [s.rstrip().split(',') for s in tags_file]
  tag_name = {a:b.strip() for a,b in tags}
  tag_name['#'] = 'no tag'

In [88]:
print('Tag name for tag id {} is "{}"\n'.format('10', tag_name['10']))
print('Tag name for tag id {} is "{}"\n'.format('80', tag_name['80']))
print('There are total {} tags'.format(len(tag_name.items())))

Tag name for tag id 10 is "jazz"

Tag name for tag id 80 is "rhythm and blues"

There are total 251 tags


In [34]:
!head /content/dataset/yes_complete/tags.txt

154
20 35 40 65 72 130 154 193
154
1 49
1 6 21 35 49 65 78 80 141 154
21 35 38 49 65 72 114 141 154
1 5 6 21 33 49 63 65 72 87 98 110 141 147 154 197
49 65 72 141 197
11 35 154
#


In [38]:
with open("/content/dataset/yes_complete/tags.txt", 'r') as f:
  song_tags = f.read().split('\n')
  song_tags = [s.split(' ') for s in song_tags]
  song_tags = {a:b for a,b in enumerate(song_tags)}

In [89]:
def tags_for_song(song_id=0):
  tag_ids = song_tags[int(song_id)]
  return [tag_name[tag_id] for tag_id in tag_ids]

In [61]:
print('Tags for song "{}" : {}\n'.format(songs_df.iloc[0].title, tags_for_song(0)))

Tags for song "Gucci Time (w\/ Swizz Beatz)" : ['wjlb-fm']



## Recommend

In [147]:
def recommend(song_id=0, topn=5):
  # song info
  song_info = songs_df.iloc[song_id]
  song_tags = [', '.join(tags_for_song(song_id))]
  query_song = pd.DataFrame({'title':song_info.title,
                             'artist':song_info.artist,
                             'tags':song_tags})

  # similar songs
  similar_songs = np.array(model.wv.most_similar(positive=str(song_id), topn=topn))[:,0]
  recommendations = songs_df.iloc[similar_songs]
  recommendations['tags'] = [tags_for_song(i) for i in similar_songs]

  recommendations = pd.concat([query_song, recommendations])

  axis_name = ['Query'] + ['Recommendation '+str((i+1)) for i in range(topn)]
  # recommendations.index = axis_name
  recommendations = recommendations.style.set_table_styles([{'selector': 'th', 'props': [('background-color', 'gray')]}])
  
  return recommendations

In [128]:
recs = recommend(10)
recs

Unnamed: 0,title,artist,tags
Query,Shake It,Elephant Man,no tag
Recommendation 1,Pudrete,Banda MS,['no tag']
Recommendation 2,Let Me Know,Roisin Murphy,"['rock', 'pop', 'favorites', 'love', 'female vocalists', '00s', 'dance', 'favourites', 'cool', 'chillout', 'electronic', 'sexy', 'british', 'upbeat', 'sad', 'seen live', 'indie pop', 'love it', 'electronica', 'female', 'good stuff', 'uk', 'lovely', 'disco', 'electro', 'favorite artists', '2007']"
Recommendation 3,In This Lifetime,The Psycho Realm,['no tag']
Recommendation 4,Take A Bow,Rihanna,"['pop', 'love', 'american', 'beautiful', 'soul', 'female vocalists', '00s', 'mellow', 'favorite', 'dance', 'favourites', 'cool', 'chillout', 'rnb', 'sexy', 'female vocalist', 'hip-hop', 'love songs', 'sad', 'hip hop', 'ballad', 'piano', 'memories', 'relaxing', 'love at first listen', 'female', 'r&b', 'slow', 'sweet', 'love song', 'soft', 'rb', 'r and b', 'emo', '<3', 'slow jams', 'major key tonality', 'guilty pleasures', 'emotional', '2008', 'a subtle use of vocal harmony', 'cute']"
Recommendation 5,Get Right,Jennifer Lopez,"['pop', 'favorites', 'love', 'american', 'soul', 'female vocalists', '00s', 'dance', 'singer-songwriter', '90s', 'favourites', 'cool', 'catchy', 'rnb', 'sexy', 'fun', 'party', 'happy', 'female vocalist', 'hip-hop', 'funk', 'upbeat', 'hip hop', 'female', 'funky', 'r&b', '2000s', 'latin', 'energetic', 'top 40', 'vocal', 'female vocals', 'english', 'urban', 'uplifting', 'r and b', 'hardcore', 'guilty pleasures', 'guilty pleasure', 'hiphop', 'new york', 'sing along', 'feelgood']"


### Paranoid Android - Radiohead

In [148]:
recommend(song_id=19563)

Unnamed: 0,title,artist,tags
0,Paranoid Android,Radiohead,"rock, pop, favorites, alternative, love, male vocalists, indie, classic rock, awesome, beautiful, mellow, alternative rock, favorite, chill, 90s, classic, favourites, chillout, indie rock, guitar, favorite songs, male vocalist, electronic, loved, british, favourite, soundtrack, amazing, sad, favourite songs, great song, ballad, melancholy, epic, experimental, psychedelic, memories, electronica, love at first listen, fucking awesome, progressive rock, great, best, nostalgia, melancholic, fav, good stuff, uk, great lyrics, ambient, perfect, psychedelic rock, dark, britpop, brilliant, alternative punk, progressive, emotional, masterpiece, best songs ever, rockin, genius, all time favourites, alt rock, 1990s"
43036,Que Te Quieran Mas Que Yo,Marco Antonio Solis,['no tag']
64157,Paryer And Meditation,Jessica Williams,['no tag']
65275,Hallelujah Goat,Volbeat,"['rock', 'awesome', 'hard rock', 'metal', 'heavy metal', 'good', 'rock and roll']"
66070,You're My Christmas Present,Jimmy Beaumont & The Skyliners,['christmas']
16550,Jump Start,Nils,['no tag']


### California Love - 2Pac

In [149]:
recommend(song_id=842)

Unnamed: 0,title,artist,tags
0,California Love (w\/ Dr. Dre & Roger Troutman),2Pac,"favorites, love, oldies, dance, 90s, classic, loved, party, hip-hop, hip hop, rap, fav, old school, songs i absolutely love, hiphop, acclaimed music top 3000, california, 1990s"
20597,Monk'n Around,Ryan Cohan,['no tag']
41172,Nadie Como Tu,Ramon Ayala Y Sus Bravos Del Norte,['no tag']
44549,Crash,The Primitives,"['rock', 'pop', 'favorites', 'alternative', 'indie', 'female vocalists', '80s', '90s', 'cool', 'catchy', 'party', 'favourite', 'happy', 'female vocalist', 'upbeat', 'soundtrack', 'indie pop', 'memories', 'female', 'new wave', 'uplifting', 'britpop', 'major key tonality', ""80's"", '1980s', 'acclaimed music top 3000', 'rockin']"
53636,Always There For You,Stryper,"['80s', 'hard rock', 'heavy metal', 'christian', 'christian rock']"
48409,-,-,['no tag']


### Billie Jean - Michael Jackson

In [150]:
recommend(song_id=3822)

Unnamed: 0,title,artist,tags
0,Billie Jean,Michael Jackson,"rock, pop, favorites, alternative, love, male vocalists, american, classic rock, awesome, beautiful, soul, oldies, favorite, 80s, dance, singer-songwriter, 90s, classic, favourites, cool, 70s, catchy, favorite songs, rnb, male vocalist, electronic, sexy, loved, fun, party, favourite, pop rock, funk, amazing, usa, rhythm and blues, memories, funky, r&b, best, nostalgia, energetic, top 40, old school, nice, english, rb, urban, groovy, disco, perfect, guilty pleasures, 80's, brilliant, my favorites, guilty pleasure, 1980s, retro, masterpiece, motown, classics, best songs ever, classic soul, legend"
48164,Heartbeat,Could Nothings,['no tag']
30835,Steve's Tune,Steve Lambert,['no tag']
11901,So Much Trouble In The World,Bob Marley & The Wailers,"['mellow', 'chill', 'classic', 'favourites', 'cool', '70s', 'favourite', 'sad', 'great song', 'summer', 'relax', 'best', 'drjazzmrfunkmusic', 'reggae', 'the best', 'legend']"
70557,Carmelita,Warren Zevon,"['rock', 'classic rock', 'mellow', 'singer-songwriter', '70s', 'hard rock', 'guitar', 'male vocalist', 'folk', 'acoustic', 'progressive rock', 'vocal', 'folk rock', 'americana', 'radioparadise', 'covers', 'perfect', 'radio paradise', 'my pop music', 'southern rock']"
14550,Heavy,Collective Soul,"['rock', 'pop', 'favorites', 'alternative', 'male vocalists', 'awesome', '00s', 'alternative rock', '90s', 'hard rock', 'loved', 'upbeat', 'great song', 'memories', 'progressive rock', 'grunge', 'faves', 'heavy', 'a subtle use of vocal harmony', 'favs']"
