# Lastfm Song Recommender

The data used in this notebook is the Last.fm dataset from the Million Song Dataset: https://labrosa.ee.columbia.edu/millionsong/lastfm

This notebook is built for fun. To try it:
- Download the dataset
- Replace the `tags` list with your favorite song tags
- PageRank will do the rest :)

In [8]:
import pandas as pd
import numpy as np
import zipfile
import os
import json
from functools import reduce
from scipy.sparse import coo_matrix
import glob
import urllib.parse

## Data Preparation

In [9]:
#One way to prepare the data is to flatten the nested directories
#!find ../../lastfm_train/ -iname *.json -type f -exec mv -i '{}' ../../train_data ';'

- Replace the data folder in `data_path`
- Adjust the file name pattern accordingly

In [10]:
data_path = "../../lastfm_train//"
files = glob.glob(data_path + "*/**/**/**")
N = len(files)

## PageRank on Songs

Parameters

In [22]:
# (g) Tags Threshold
g = 50
# (t) Similarity Threshold
t = 0
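# (beta) Probability of following a similarity link; the teleport probability is 1 - beta
beta = 0.8
# Convergence threshold on the L1 change of the rank vector
epsilon = 0.00001
max_iterations = 100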
tags = ['Hip-Hop']
n_songs_to_recommend = 10

Lookup Indices

In [12]:
index_to_song = [x.split("/")[-1].split(".")[0] for x in files]
song_to_index = {index_to_song[index]: index for index in range(len(files))}
index_to_title = [None] * N
song_to_path = {x.split("/")[-1].split(".json")[0]: "/".join(x.split("/")[:-1]) for x in files}

In [13]:
tag_to_songs = {}
def load_tags(js, song_index):
    global tag_to_songs
    for tag in js['tags']:
        if int(tag[1]) < g: continue
        tag_to_songs.setdefault(tag[0], []).append(song_index)

In [14]:
def union_by_tag(tags):
    return list(set([song for t in tags for song in tag_to_songs[t]]))
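
To make the tag index concrete, here is a minimal sketch on three made-up records. It assumes, as the code above does, that each entry in `tags` is a `[name, weight]` pair whose weight can be passed to `int()`. The records and the helper names `build_tag_index` and `songs_for_tags` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical records shaped like the fields used above:
# 'tags' holds [name, weight] pairs.
toy_records = [
    {"title": "Song A", "tags": [["Hip-Hop", "100"], ["rap", "40"]]},
    {"title": "Song B", "tags": [["Hip-Hop", "60"], ["jazz", "75"]]},
    {"title": "Song C", "tags": [["jazz", "90"]]},
]

def build_tag_index(records, min_weight):
    """Map each tag to the indices of songs whose weight for it is >= min_weight."""
    index = defaultdict(list)
    for song_index, record in enumerate(records):
        for name, weight in record["tags"]:
            if int(weight) >= min_weight:
                index[name].append(song_index)
    return index

def songs_for_tags(index, wanted_tags):
    """Union of song indices over the requested tags; unknown tags add nothing."""
    return sorted({song for tag in wanted_tags for song in index.get(tag, [])})

toy_index = build_tag_index(toy_records, min_weight=50)
print(dict(toy_index))
print(songs_for_tags(toy_index, ["Hip-Hop", "jazz"]))
```

With a threshold of 50 the weak `rap` tag is dropped, so only `Hip-Hop` and `jazz` survive as topics in this toy index.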

In [15]:
similarity_graph = [None] * len(index_to_song)
def load_similarity(js, song_index):
    global similarity_graph
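    # Keep neighbours that meet the similarity threshold and exist in the index.
    # A comprehension is used so that a neighbour with index 0 is not dropped as falsy.
    similarity_graph[song_index] = [song_to_index[edge[0]] for edge in js['similars']
                                    if edge[1] >= t and edge[0] in song_to_index]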
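
A small sketch of the edge filter, assuming each entry in `similars` is a `[track_id, score]` pair as the code above implies. The track IDs and the helper name `neighbours` are hypothetical. Note that a neighbour whose index is `0` has to be kept explicitly: a filter based on truthiness would discard it, because `0` is falsy in Python.

```python
# Hypothetical lookup from track ID to row index.
toy_song_to_index = {"TR_A": 0, "TR_B": 1, "TR_C": 2}

def neighbours(similars, song_to_index, min_score):
    """Indices of known tracks whose similarity score is at least min_score."""
    return [
        song_to_index[track_id]
        for track_id, score in similars
        if score >= min_score and track_id in song_to_index
    ]

# One known strong edge, one known weak edge, one edge to a track not in the index.
toy_similars = [["TR_A", 0.9], ["TR_C", 0.2], ["TR_MISSING", 0.8]]
print(neighbours(toy_similars, toy_song_to_index, min_score=0.5))
print(neighbours(toy_similars, toy_song_to_index, min_score=0.0))
```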
    

In [16]:
for index in range(len(index_to_song)):
    song_id = index_to_song[index]
    with open(song_to_path[song_id] + "/" + song_id + ".json") as f:
        j = json.load(f)
        index_to_title[index] = j['title']
        load_tags(j, index)
        load_similarity(j, index)

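Convert the graph to a sparse SciPy matrix. After the transpose, column `i` holds the outgoing links of song `i`, each weighted by 1 / out-degree.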

In [17]:
rows = []; cols = []; data = [] 
for i in range(len(similarity_graph)):
    for j in similarity_graph[i]:
        rows.append(i)
        cols.append(j)
    if similarity_graph[i]:
        data += ([1.0 / len(similarity_graph[i])] * len(similarity_graph[i]))
graph = coo_matrix((data, (rows, cols)), shape=(N, N))
graph = graph.transpose()
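
The construction above can be checked on a toy graph. This is a sketch under assumptions: the four-song adjacency list is invented, and the matrix is built directly in its transposed form (row = destination, column = source) instead of calling `transpose()` afterwards.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical adjacency lists: song i links to the songs in toy_graph[i].
toy_graph = [[1, 2], [2], [0], []]  # song 3 has no outgoing links
n = len(toy_graph)

rows, cols, data = [], [], []
for src, targets in enumerate(toy_graph):
    for dst in targets:
        rows.append(dst)                  # row = destination song
        cols.append(src)                  # column = source song
        data.append(1.0 / len(targets))   # split the source's weight evenly

M = coo_matrix((data, (rows, cols)), shape=(n, n)).tocsr()
column_sums = np.asarray(M.sum(axis=0)).ravel()
print(M.toarray())
print(column_sums)
```

Every song with at least one outgoing link yields a column that sums to 1. A song without links, like song 3 here, yields an all-zero column, so any rank it holds is not passed on to other songs.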

Initialize the rank vector

In [24]:
N

633515

In [18]:
rank = np.array([1.0 / N] * N)
rank.shape = (N, 1)

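Iterate until the rank vector $r$ converges, where $M$ is the column-normalized similarity matrix built above and $S$ is the topic vector (uniform over the songs carrying the chosen tags):
$$ r \leftarrow \beta M r + (1-\beta) S $$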

In [23]:
topic_songs = union_by_tag(tags)
# Topic vector S: uniform mass over the songs that carry the chosen tags
topic_vector = coo_matrix(([1 / len(topic_songs)] * len(topic_songs),
                           (topic_songs, [0] * len(topic_songs))), shape=(N, 1))
previous_rank = rank
for i in range(max_iterations):
    print(i)
    rank = beta * graph.dot(rank) + (1 - beta) * topic_vector
    # Stop once the L1 change between successive rank vectors falls below epsilon
    time_to_end = np.abs(previous_rank - rank).sum() < epsilon
    previous_rank = rank
    if time_to_end: break
print("Number of Iterations to converge = " + str(i + 1))

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
Number of Iterations to converge = 59
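
The same update rule can be followed by hand on a tiny graph. The sketch below assumes a dense, made-up three-song cycle and a hypothetical helper `topic_pagerank`; it is meant to illustrate the iteration, not to replace the sparse version above.

```python
import numpy as np

def topic_pagerank(M, topic, beta=0.8, epsilon=1e-5, max_iterations=100):
    """Iterate r <- beta * M r + (1 - beta) * S until the L1 change is below epsilon."""
    n = M.shape[0]
    S = np.zeros(n)
    S[topic] = 1.0 / len(topic)      # uniform mass over the topic songs
    r = np.full(n, 1.0 / n)          # start from the uniform distribution
    for iteration in range(1, max_iterations + 1):
        new_r = beta * M.dot(r) + (1 - beta) * S
        converged = np.abs(new_r - r).sum() < epsilon
        r = new_r
        if converged:
            break
    return r, iteration

# Hypothetical 3-song cycle 0 -> 1 -> 2 -> 0; columns are sources.
toy_M = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
toy_rank, n_iter = topic_pagerank(toy_M, topic=[0])
print(toy_rank, n_iter)
```

Because teleports always land on song 0, it ends up with the highest rank, and the rank decays along the cycle by a factor of `beta` per hop.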


## Results

Get the top `n_songs_to_recommend` songs

In [26]:
our_lovely_recommended_songs = []
for _ in range(n_songs_to_recommend):
    # Take the highest-ranked song, then mask it so the next pass finds the runner-up
    index = np.argmax(rank)
    rank[index] = -2
    our_lovely_recommended_songs.append(index)

print("Below are the recommended songs for tags " + str(tags) + "\n")
for name in [index_to_title[i] for i in our_lovely_recommended_songs]:
    print(name + "\thttps://www.youtube.com/results?search_query="
          + urllib.parse.quote_plus(name + " Song"))

Below are the recommended songs for tags ['Hip-Hop']

It's a Shame	https://www.youtube.com/results?search_query=It%27s+a+Shame+Song
It's a Shame	https://www.youtube.com/results?search_query=It%27s+a+Shame+Song
Buck Em Down	https://www.youtube.com/results?search_query=Buck+Em+Down+Song
Robot	https://www.youtube.com/results?search_query=Robot+Song
Physical Stamina	https://www.youtube.com/results?search_query=Physical+Stamina+Song
Physical Stamina	https://www.youtube.com/results?search_query=Physical+Stamina+Song
Ghetto Knows	https://www.youtube.com/results?search_query=Ghetto+Knows+Song
Journey	https://www.youtube.com/results?search_query=Journey+Song
9th Chamber	https://www.youtube.com/results?search_query=9th+Chamber+Song
Runnin'	https://www.youtube.com/results?search_query=Runnin%27+Song
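
The selection loop above overwrites entries of `rank` as it goes. As an alternative, here is a minimal sketch that picks the top entries without modifying the scores. The helper `top_k_indices` and the toy scores are hypothetical; `np.argpartition` and `np.argsort` are standard NumPy calls.

```python
import numpy as np

def top_k_indices(scores, k):
    """Indices of the k largest scores, best first, leaving scores untouched."""
    flat = np.asarray(scores).ravel()
    k = min(k, flat.size)
    if k <= 0:
        return np.array([], dtype=int)
    # argpartition isolates the k largest in linear time; only those k get sorted.
    candidates = np.argpartition(flat, -k)[-k:]
    return candidates[np.argsort(flat[candidates])[::-1]]

# Toy scores shaped (N, 1), like the rank vector.
toy_scores = np.array([[0.10], [0.40], [0.05], [0.30], [0.15]])
top3 = top_k_indices(toy_scores, 3)
print(top3)
```

The output above lists some titles twice. Since each index is selected only once, those rows are distinct tracks that happen to share a title; deduplicating on `index_to_title` would be one way to hide them.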
