# Building a Song Recommender  
by Braden Weber (blw22) and Kelsey Yen (kny4)

## Goal  

The goal of our project was to create a song recommender. Using the fundamentals from Units 6 and 12, we built the recommender and validated it by how well it recommends songs for a genre-specific playlist.

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
import math
import json

pd.options.mode.chained_assignment = None  # default='warn'

def num_parameters(model):
    """Count the number of trainable parameters in a model"""
    return sum(param.numel() for param in model.parameters() if param.requires_grad)

## The Dataset  

The dataset we used is from the [Spotify Million Playlist Dataset Challenge](https://research.atspotify.com/the-million-playlist-dataset-remastered/). The data comes from sampling random existing Spotify playlists made by real users for Spotify's AI research (much of which focuses on recommendation systems).  
Each playlist is a dictionary that holds playlist-level metadata and a list of tracks. The data given about each track is as follows:
* **track_name** - the name of the track
* **track_uri** - the Spotify URI of the track
* **album_name** - the name of the track's album
* **album_uri** - the Spotify URI of the album
* **artist_name** - the name of the track's primary artist
* **artist_uri** - the Spotify URI of track's primary artist
* **duration_ms** - the duration of the track in milliseconds
* **pos** - the position of the track in the playlist (zero-based)  

Unlike other datasets we explored, this dataset does not include comparable song attributes such as tempo, genre, or danceability (although such attributes are available through the Spotify Web API). Since our model is trained on the occurrence of songs in a certain type of playlist, the data is only used to identify unique songs in the dataset.  

To use the dataset, we unzipped it and then loaded it into a dataframe using this `for` loop:

In [2]:
# How many .json files to load in
files_to_load = 1

# Path to the folder containing the songs
path = "/home/blw22/Desktop/songs"

# Start the playlists dataframe from the first slice
with open(path + '/mpd.slice.0-999.json') as f:
    data = json.load(f)
playlists = pd.DataFrame(data['playlists'])

# Load in each remaining json file and append it to the playlists dataframe
for i in range(1000, 1000 * (files_to_load - 1) + 1, 1000):
    file = path + '/mpd.slice.' + str(i) + '-' + str(i + 999) + '.json'
    print(i, end = ' ')
    with open(file) as f:
        data = json.load(f)
    playlists2 = pd.DataFrame(data['playlists'])
    playlists = pd.concat([playlists, playlists2])
# Reset the index so there are no duplicate indices
playlists = playlists.reset_index(drop=True)

# Drop all song data that isn't going to be used
playlists = playlists.drop(labels=['collaborative', 'modified_at', 'num_tracks', 'num_albums', 'num_followers', 'num_edits', 'duration_ms', 'num_artists', 'description', 'name', 'pid'], axis=1)
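The slice files follow the naming pattern visible in the loading loop (`mpd.slice.0-999.json`, `mpd.slice.1000-1999.json`, and so on). As a minimal sketch, the file names can be generated in one place; `slice_filenames` is a hypothetical helper, not part of our notebook, and it assumes every slice holds 1000 playlists.

```python
def slice_filenames(files_to_load, slice_size=1000):
    """Return the names of the first `files_to_load` slice files (hypothetical helper)."""
    return [
        "mpd.slice.{}-{}.json".format(start, start + slice_size - 1)
        for start in range(0, files_to_load * slice_size, slice_size)
    ]

print(slice_filenames(2))
# ['mpd.slice.0-999.json', 'mpd.slice.1000-1999.json']
```

Generating the names this way keeps the first file and the later files on the same code path.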

To make the data easier to handle, we shortened playlists to 30 songs and deleted playlists with fewer than 30 songs. This ensures that every playlist has the same size and can be stacked into a tensor.  

Next, we create the 10 training sets by pulling out the last ten songs in each playlist and assigning each one as the target song for its own training set (train_1 uses the song at index 20, train_2 uses the song at index 21, and so on, using zero-based indexing).

In [3]:
playlists_train1 = []
playlists_train2 = []
playlists_train3 = []
playlists_train4 = []
playlists_train5 = []
playlists_train6 = []
playlists_train7 = []
playlists_train8 = []
playlists_train9 = []
playlists_train10 = []

for index, row in playlists.iterrows():
    # Drop the playlist if it has fewer than 30 songs
    if (len(row['tracks']) < 30):
        playlists = playlists.drop(labels=index, axis=0)
        continue
    # Take the last 10 songs and add them all to the 10 training sets
    playlists_train1.append(row['tracks'][20])
    playlists_train2.append(row['tracks'][21])
    playlists_train3.append(row['tracks'][22])
    playlists_train4.append(row['tracks'][23])
    playlists_train5.append(row['tracks'][24])
    playlists_train6.append(row['tracks'][25])
    playlists_train7.append(row['tracks'][26])
    playlists_train8.append(row['tracks'][27])
    playlists_train9.append(row['tracks'][28])
    playlists_train10.append(row['tracks'][29])
    # Leave the first 20 songs of the playlist as our playlist data
    # Use .at instead of chained indexing so the write reaches the dataframe itself
    playlists.at[index, 'tracks'] = row['tracks'][0:20]
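The split described above (first 20 songs as input, songs 20 through 29 as the ten targets) can also be written with list slicing. This is only a sketch on toy data; `split_playlist` is a hypothetical helper and the song names are placeholders.

```python
def split_playlist(track_list, n_input=20, n_target=10):
    """Split a playlist into input tracks and held-out target tracks.

    Returns None when the playlist is too short to split.
    """
    if len(track_list) < n_input + n_target:
        return None
    inputs = track_list[:n_input]
    targets = track_list[n_input:n_input + n_target]
    return inputs, targets

toy = ["song_{}".format(i) for i in range(35)]   # placeholder playlist
inputs, targets = split_playlist(toy)
print(len(inputs), len(targets), targets[0])
# 20 10 song_20
```

Keeping the ten targets in one list per playlist would also avoid maintaining ten separately named variables.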

The model needs to turn each unique track into a number in order to work, so we put each track into a set, turn that set into a list, and use this list in the future to match songs up with their index.

In [4]:
tracks = set()
for index, row in playlists.iterrows():
    for track in row['tracks']:
        tracks.add(track['track_name'] + ' - ' + track['artist_name'])
for song in playlists_train1:
    tracks.add(song['track_name'] + ' - ' + song['artist_name'])
for song in playlists_train2:
    tracks.add(song['track_name'] + ' - ' + song['artist_name'])
for song in playlists_train3:
    tracks.add(song['track_name'] + ' - ' + song['artist_name'])
for song in playlists_train4:
    tracks.add(song['track_name'] + ' - ' + song['artist_name'])
for song in playlists_train5:
    tracks.add(song['track_name'] + ' - ' + song['artist_name'])
for song in playlists_train6:
    tracks.add(song['track_name'] + ' - ' + song['artist_name'])
for song in playlists_train7:
    tracks.add(song['track_name'] + ' - ' + song['artist_name'])
for song in playlists_train8:
    tracks.add(song['track_name'] + ' - ' + song['artist_name'])
for song in playlists_train9:
    tracks.add(song['track_name'] + ' - ' + song['artist_name'])
for song in playlists_train10:
    tracks.add(song['track_name'] + ' - ' + song['artist_name'])
tracks = list(tracks)

len(tracks)

13572

Now that each song can be identified with a unique index, we go back through the playlists and replace the songs with their track index.

In [5]:
playlist_list = []

for index, row in playlists.iterrows():
    playlist = []
    for track in row['tracks']:
        playlist.append(tracks.index(track['track_name'] + ' - ' + track['artist_name']))
    playlist_list.append(playlist)
playlists_tensor = torch.tensor(playlist_list)

In [6]:
playlists_train1 = [tracks.index(song['track_name'] + ' - ' + song['artist_name']) for song in playlists_train1]
playlists_train2 = [tracks.index(song['track_name'] + ' - ' + song['artist_name']) for song in playlists_train2]
playlists_train3 = [tracks.index(song['track_name'] + ' - ' + song['artist_name']) for song in playlists_train3]
playlists_train4 = [tracks.index(song['track_name'] + ' - ' + song['artist_name']) for song in playlists_train4]
playlists_train5 = [tracks.index(song['track_name'] + ' - ' + song['artist_name']) for song in playlists_train5]
playlists_train6 = [tracks.index(song['track_name'] + ' - ' + song['artist_name']) for song in playlists_train6]
playlists_train7 = [tracks.index(song['track_name'] + ' - ' + song['artist_name']) for song in playlists_train7]
playlists_train8 = [tracks.index(song['track_name'] + ' - ' + song['artist_name']) for song in playlists_train8]
playlists_train9 = [tracks.index(song['track_name'] + ' - ' + song['artist_name']) for song in playlists_train9]
playlists_train10 = [tracks.index(song['track_name'] + ' - ' + song['artist_name']) for song in playlists_train10]

playlists_train1 = torch.tensor(playlists_train1)
playlists_train2 = torch.tensor(playlists_train2)
playlists_train3 = torch.tensor(playlists_train3)
playlists_train4 = torch.tensor(playlists_train4)
playlists_train5 = torch.tensor(playlists_train5)
playlists_train6 = torch.tensor(playlists_train6)
playlists_train7 = torch.tensor(playlists_train7)
playlists_train8 = torch.tensor(playlists_train8)
playlists_train9 = torch.tensor(playlists_train9)
playlists_train10 = torch.tensor(playlists_train10)
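`list.index` scans the list from the start on every call, so encoding many songs against 13,572 tracks gets slow as the data grows. A dictionary built once gives the same indices with constant-time lookups. The sketch below uses placeholder names; `build_index` is a hypothetical helper.

```python
def build_index(keys):
    """Map each key to its position in the list (hypothetical helper)."""
    return {key: i for i, key in enumerate(keys)}

# Placeholder keys in the same 'track_name - artist_name' format used above
tracks_demo = ["Song A - Artist 1", "Song B - Artist 2", "Song C - Artist 1"]
track_to_index = build_index(tracks_demo)

playlist_demo = ["Song C - Artist 1", "Song A - Artist 1"]
encoded = [track_to_index[key] for key in playlist_demo]
print(encoded)
# [2, 0]
```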

Let's look at the shapes of our tensors:

Training shape should equal the number of playlists.    

In [7]:
playlists_train1.shape

torch.Size([700])

The playlists tensor's shape shows that there are 700 playlists with 20 songs each.

In [8]:
playlists_tensor.shape

torch.Size([700, 20])

## Embedder setup

To set up the embedding of each song, we create an `nn.Embedding` layer sized by the number of unique songs and the embedding dimension (50).

In [9]:
unique_songs = len(tracks)
unique_songs

13572

In [10]:
n_vocab = unique_songs
emb_dim = 50
embedder = nn.Embedding(n_vocab, emb_dim)
embedder.weight.shape

torch.Size([13572, 50])

In [11]:
num_parameters(embedder)

678600

Next, we passed the tensor of playlists through `embedder` to create a tensor that holds the embedding of every song in each playlist.

In [12]:
emb_play = embedder(playlists_tensor)
emb_play.shape

torch.Size([700, 20, 50])
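To see where the extra dimension comes from, here is a small sketch with toy sizes (6 songs, 4 embedding dimensions, 2 playlists of 3 songs). The `toy_` names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
toy_embedder = nn.Embedding(num_embeddings=6, embedding_dim=4)

# Two toy playlists of three song indices each
toy_playlists = torch.tensor([[0, 1, 2], [3, 4, 5]])
toy_emb = toy_embedder(toy_playlists)

print(toy_emb.shape)
# torch.Size([2, 3, 4])

# Each entry is simply the matching row of the embedding table
print(torch.equal(toy_emb[1, 0], toy_embedder.weight[3]))
# True
```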

## Model Setup

Our model architecture is as follows:

<div>
    <img src="model.png" width="500">
</div>

The playlist tensor on top holds all the playlists and their 20 songs. Each playlist can be put into the model to output the top 15 songs recommended to add to the playlist.

Let's talk about the MLPs. Each corresponds to a new input added to the model:
* MLP_mean: The input and output sizes are both the embedding dimension. The mean is a tensor of length 50 that averages the embeddings of all the songs in the playlist. 
* MLP_variance: This runs after the variance of the playlist's embeddings is added into the model. Its input and output sizes also equal the embedding dimension.
* MLP_head: This has three layers with output sizes of 100, 50, and the number of unique tracks, respectively. We keep the layer before the final one at 50 because if it is too large, there are too many parameters and the kernel dies.

In [13]:
hidden_layer1 = 100
hidden_layer2 = 50
mlp_mean = nn.Sequential(
    nn.Linear(in_features=emb_dim, out_features=hidden_layer1),
    nn.ReLU(),
    nn.Linear(in_features=hidden_layer1, out_features=emb_dim)
)
mlp_variance = nn.Sequential(
    nn.Linear(in_features=emb_dim, out_features=hidden_layer1),
    nn.ReLU(),
    nn.Linear(in_features=hidden_layer1, out_features=emb_dim)
)
mlp_head = nn.Sequential(
    nn.Linear(in_features=emb_dim, out_features = hidden_layer1),
    nn.ReLU(),
    nn.Linear(in_features=hidden_layer1, out_features = hidden_layer2),
    nn.ReLU(),
    nn.Linear(in_features=hidden_layer2, out_features = unique_songs)
)
print(num_parameters(mlp_mean))
print(num_parameters(mlp_variance))
print(num_parameters(mlp_head))

10150
10150
702322
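These counts can be checked by hand: a linear layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases, and an embedding table has one row of `emb_dim` values per song. The helper name `linear_params` is ours, for illustration.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

emb_dim, hidden1, hidden2, n_songs = 50, 100, 50, 13572

small_mlp = linear_params(emb_dim, hidden1) + linear_params(hidden1, emb_dim)
head = (linear_params(emb_dim, hidden1)
        + linear_params(hidden1, hidden2)
        + linear_params(hidden2, n_songs))
embedding = n_songs * emb_dim

print(small_mlp, head, embedding)
# 10150 702322 678600
```

Almost all of the head's parameters (692,172 of 702,322) sit in the final layer, which is why the width of the layer feeding it matters so much.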


This function runs our model, which takes in the mean and variance of a playlist. The output of each of the first two MLPs is added back onto its input, and the head returns one score for each unique song.

In [14]:
def my_model(mean, var):
    output1 = mlp_mean(mean)
    output1 += var + mean
    output2 = mlp_variance(output1)
    output2 += output1
    return mlp_head(output2)
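The same forward pass can be packaged as a single `nn.Module`, which keeps all three MLPs under one `parameters()` call. This is a sketch of an alternative layout, not the code we ran; the class name `PlaylistModel` and the toy vocabulary size are assumptions, and the additions are written out-of-place rather than with `+=`.

```python
import torch
import torch.nn as nn

class PlaylistModel(nn.Module):
    """Residual stack mirroring my_model (illustrative sketch)."""

    def __init__(self, emb_dim=50, hidden1=100, hidden2=50, n_songs=13572):
        super().__init__()
        self.mlp_mean = nn.Sequential(
            nn.Linear(emb_dim, hidden1), nn.ReLU(), nn.Linear(hidden1, emb_dim))
        self.mlp_variance = nn.Sequential(
            nn.Linear(emb_dim, hidden1), nn.ReLU(), nn.Linear(hidden1, emb_dim))
        self.mlp_head = nn.Sequential(
            nn.Linear(emb_dim, hidden1), nn.ReLU(),
            nn.Linear(hidden1, hidden2), nn.ReLU(),
            nn.Linear(hidden2, n_songs))

    def forward(self, mean, var):
        h1 = self.mlp_mean(mean) + var + mean   # first residual step
        h2 = self.mlp_variance(h1) + h1         # second residual step
        return self.mlp_head(h2)                # one score per song

model = PlaylistModel(n_songs=200)              # small toy vocabulary
scores = model(torch.randn(50), torch.rand(50))
print(scores.shape)
# torch.Size([200])
```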

## Training

Each part of the model gets its own optimizer, which steps the parameters along their gradients; the gradients are zeroed out afterward. PyTorch's `torch.optim` optimizers wrap this update logic into one object.

In [15]:
learning_rate = 0.00001

optimizer_mean = torch.optim.SGD(mlp_mean.parameters(), lr=learning_rate, momentum=0.9)
optimizer_variance = torch.optim.SGD(mlp_variance.parameters(), lr=learning_rate, momentum=0.9)
optimizer_head = torch.optim.SGD(mlp_head.parameters(), lr=learning_rate, momentum=0.9)
optimizer_embedding = torch.optim.SGD(embedder.parameters(), lr=learning_rate, momentum=0.9)
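Since all four optimizers share the same settings, a single optimizer over the combined parameters would behave the same way with less bookkeeping. The sketch below shows the idea on two small stand-in modules (`embedder_demo` and `mlp_demo` are placeholders, not our model).

```python
import itertools
import torch
import torch.nn as nn

embedder_demo = nn.Embedding(10, 4)   # stand-in for embedder
mlp_demo = nn.Linear(4, 10)           # stand-in for the three MLPs

# Chain the parameter iterators so one optimizer manages everything
params = itertools.chain(embedder_demo.parameters(), mlp_demo.parameters())
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9)

n_managed = sum(p.numel() for group in optimizer.param_groups
                for p in group["params"])
print(n_managed)
# 90  (10*4 embedding values + 4*10 weights + 10 biases)
```

With this layout, one `optimizer.step()` and one `optimizer.zero_grad()` replace the four calls of each.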

To run a playlist through the model, we need some way to convert the playlist into its mean and variance tensors. This function takes in a playlist index and the tensor of all playlist embeddings, computes the mean and variance over the playlist's songs, runs the model on them, and returns the result. Note that the code takes the square root of the variance, so `var` actually holds the per-dimension standard deviation.

In [16]:
def get_recommendations(playlist_index, emb):
    # Embeddings of the songs in this playlist, shape (20, emb_dim)
    songs = emb[playlist_index]
    
    mean = torch.zeros(emb_dim)
    var = torch.zeros(emb_dim)

    # calculate mean (loop variable renamed so it no longer shadows `emb`)
    for song_emb in songs:
        mean += song_emb
    mean /= len(songs)

    # calculate variance
    for song_emb in songs:
        var += (song_emb - mean) * (song_emb - mean)
    var /= len(songs)
    # take the square root, so var ends up holding the standard deviation
    for j, _ in enumerate(var):
        var[j] = math.sqrt(var[j])

    # Run through the model
    return my_model(mean, var)
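The same statistics can be computed without Python loops by reducing over the song dimension. The sketch below is an alternative, not the code we ran; `playlist_stats` is a hypothetical helper. One difference worth noting: `math.sqrt` returns a plain Python float, so the square root in the loop version is cut out of the autograd graph, while `Tensor.sqrt` stays differentiable.

```python
import torch

def playlist_stats(songs):
    """Per-dimension mean and standard deviation over a (n_songs, emb_dim) tensor."""
    mean = songs.mean(dim=0)
    # unbiased=False divides by n, matching the loop version above
    std = songs.var(dim=0, unbiased=False).sqrt()
    return mean, std

torch.manual_seed(0)
songs = torch.randn(20, 50)           # toy playlist embeddings
mean, std = playlist_stats(songs)

# Loop-style reference for comparison
loop_mean = sum(songs) / len(songs)
loop_var = sum((s - loop_mean) ** 2 for s in songs) / len(songs)

print(torch.allclose(mean, loop_mean, atol=1e-5))
print(torch.allclose(std, loop_var.sqrt(), atol=1e-5))
```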

To train the model, this function loops over every playlist in `playlists` (the tensor of embedded playlists), runs it through the model, computes the cross-entropy loss against the target song, and then backpropagates on the sum of all the losses.

In [17]:
# training_set selects which of the 10 training sets to use (1 through 10)
def train(playlists, training_set):
    loss_total = torch.tensor([0.])
    
    for i, songs in enumerate(playlists):
                
        # Run through the model
        output = get_recommendations(i, playlists)

        # Select training set
        if training_set == 1:
            answer = playlists_train1[i].reshape((1))
        if training_set == 2:
            answer = playlists_train2[i].reshape((1))
        if training_set == 3:
            answer = playlists_train3[i].reshape((1))
        if training_set == 4:
            answer = playlists_train4[i].reshape((1))
        if training_set == 5:
            answer = playlists_train5[i].reshape((1))
        if training_set == 6:
            answer = playlists_train6[i].reshape((1))
        if training_set == 7:
            answer = playlists_train7[i].reshape((1))
        if training_set == 8:
            answer = playlists_train8[i].reshape((1))
        if training_set == 9:
            answer = playlists_train9[i].reshape((1))
        if training_set == 10:
            answer = playlists_train10[i].reshape((1))
        
        # Calculate loss
        output = output.reshape((1, unique_songs))
        # Cross-entropy takes two arguments: the array of raw scores (logits)
        # for each unique song and the index of the target song.
        loss_total += F.cross_entropy(output, answer, reduction='none')

    # Goes backward down the gradient to update each parameter in the model
    loss_total.backward(retain_graph=True)
    
    # Step along gradient
    optimizer_mean.step()
    optimizer_variance.step()
    optimizer_head.step()
    optimizer_embedding.step()

    # IMPORTANT: Zero the gradients after every step. Otherwise the next backward()
    # call would add its gradients on top of the ones left over from this step.
    optimizer_mean.zero_grad()
    optimizer_variance.zero_grad()
    optimizer_head.zero_grad()
    optimizer_embedding.zero_grad()
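One possible explanation for needing `retain_graph=True`, offered as an assumption rather than a confirmed diagnosis: `emb_play` is computed once (in cell 12) and then reused, so every call to `train` backpropagates through the same embedding graph, which PyTorch frees after the first `backward()`. Reusing it also means the forward pass keeps seeing the original embedding values even after `optimizer_embedding.step()` changes the weights. The sketch below recomputes the embeddings inside each step on toy data; the sizes, the single linear `head`, and `train_step` are stand-ins, not our model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n_songs, emb_dim = 30, 8                        # toy sizes, not the notebook's
embedder = nn.Embedding(n_songs, emb_dim)
head = nn.Linear(emb_dim, n_songs)              # stand-in for the three MLPs
optimizer = torch.optim.SGD(
    list(embedder.parameters()) + list(head.parameters()), lr=0.1)

playlists = torch.randint(0, n_songs, (16, 5))  # 16 toy playlists of 5 songs
targets = torch.randint(0, n_songs, (16,))      # one held-out song each

def train_step():
    optimizer.zero_grad()
    emb = embedder(playlists)                   # fresh graph on every call
    logits = head(emb.mean(dim=1))              # shape (16, n_songs)
    loss = F.cross_entropy(logits, targets, reduction="sum")
    loss.backward()                             # no retain_graph needed
    optimizer.step()
    return loss.item()

losses = [train_step() for _ in range(3)]
print(losses)
```

Because the graph is rebuilt from the current weights each time, repeated calls run without the flag.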

## Optimizing

The optimization of the model comes from adjusting the following four parameters:
* **learning rate**: how large a step the optimizer takes along the gradient when updating the weights  
* **files**: how many dataset files (and so how many playlists) are loaded in  
* **training sets**: how many of the 10 training sets the model runs through  
* **epochs**: how many times the whole process is repeated  

To handle training, we have the two parameters `epochs` and `training_sets`. `epochs` determines how many times the full training pass is run, and `training_sets` determines how many of the 10 training sets it runs through per epoch. 

In [18]:
# currently, if either of these is set to one, the loss_total.backward() call gives an error
training_sets = 1 
epochs = 2

print('Epochs: {}, Sets: {}'.format(epochs, training_sets))
print()

for epoch in range(epochs):
    print('epoch ' + str(epoch))
    for i in range(1 , training_sets + 1):
        train(emb_play, i)
        print('set {},'.format(i), end=' ')
    print()

Epochs: 2, Sets: 1

epoch 0
set 1, 
epoch 1
set 1, 


Unfortunately, our model currently relies on the `retain_graph=True` flag in the `loss_total.backward()` call. We believe this is causing errors, but we could not find a way around it. Because of this, the recommended songs come out a little funky, as explained in the next section.

## Validation

This function shows the top 15 songs that the model recommends. It takes in the index of the playlist (index 25 is a country playlist), prints the training target from each of the 10 training sets, and then prints the recommended songs along with the model's confidence in each.

In [19]:
def show_recommendation(playlist_index):
    top_fifteen = get_recommendations(playlist_index, emb_play).softmax(dim=0).topk(15)

    recommend = [[tracks[song], ] for song in top_fifteen.indices]
    probability = [prop for prop in top_fifteen.values]
    print('Training target 1: ' + tracks[playlists_train1[playlist_index].item()] + '  ')
    print('Training target 2: ' + tracks[playlists_train2[playlist_index].item()] + '  ')
    print('Training target 3: ' + tracks[playlists_train3[playlist_index].item()] + '  ')
    print('Training target 4: ' + tracks[playlists_train4[playlist_index].item()] + '  ')
    print('Training target 5: ' + tracks[playlists_train5[playlist_index].item()] + '  ')
    print('Training target 6: ' + tracks[playlists_train6[playlist_index].item()] + '  ')
    print('Training target 7: ' + tracks[playlists_train7[playlist_index].item()] + '  ')
    print('Training target 8: ' + tracks[playlists_train8[playlist_index].item()] + '  ')
    print('Training target 9: ' + tracks[playlists_train9[playlist_index].item()] + '  ')
    print('Training target 10: ' + tracks[playlists_train10[playlist_index].item()] + '  ')
    print()
    for song, prob in zip(recommend, probability):
        print('{:.6f}, {}  '.format(prob.item(), song[0]))

show_recommendation(0)

Training target 1: Soak Up The Sun - Sheryl Crow  
Training target 2: Where Is The Love? - The Black Eyed Peas  
Training target 3: Stacy's Mom - Bowling For Soup  
Training target 4: Just The Girl - The Click Five  
Training target 5: Yo (Excuse Me Miss) - Chris Brown  
Training target 6: Year 3000 - Jonas Brothers  
Training target 7: Lip Gloss - Lil Mama  
Training target 8: Everytime We Touch - Radio Edit - Cascada  
Training target 9: Whatcha Say - Jason Derulo  
Training target 10: Miss Independent - Ne-Yo  

0.000114, Overcome - As I Lay Dying  
0.000113, Young & Gettin' It - feat. Kirko Bangz - Meek Mill  
0.000113, Bang Bang Bang - Russ Chimes Remix - Mark Ronson  
0.000112, All Over The Road - Easton Corbin  
0.000112, Untitled - Matt Corby  
0.000108, Two Step - Mel Waiters  
0.000107, Blame It - Jamie Foxx  
0.000106, When I See Ya (feat. Fetty Wap) - Ty Dolla $ign  
0.000106, When The Saints Go To Worship (feat. Kelly Price) - Donald Lawrence  
0.000106, Astral Weeks - Van
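The confidences come from applying softmax to the model's scores and keeping the largest values with `topk`. The sketch below shows the two calls on five made-up scores, and also computes the probability a uniform guess over 13,572 songs would assign to each song, as a reference point for reading the numbers above.

```python
import torch

logits = torch.tensor([2.0, 0.5, 1.0, 3.0, -1.0])   # made-up scores
probs = logits.softmax(dim=0)                        # non-negative, sums to 1
top = probs.topk(2)                                  # two largest probabilities

print(top.indices.tolist())
# [3, 0]

# Probability per song under a uniform distribution over 13,572 songs
uniform = 1 / 13572
print(uniform)   # about 7.4e-05
```

The printed confidences of roughly 0.0001 are close to this uniform value, which suggests the model was still near its untrained state when the output was produced.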

To validate the model, we analyzed how many of the recommended songs were country songs (country is generally the most distinguishable genre).

Here is the country playlist we found in position 25.

In [20]:
def get_playlist(num):
    raw_playlist = playlist_list[num][:]
    for i, song in enumerate(raw_playlist):
        raw_playlist[i] = tracks[song]
    return raw_playlist

In [21]:
get_playlist(25)

['Drink A Beer - Luke Bryan',
 'Crash My Party - Luke Bryan',
 'Country Girl (Shake It For Me) - Luke Bryan',
 'Drunk On You - Luke Bryan',
 "That's My Kind Of Night - Luke Bryan",
 "If Heaven Wasn't So Far Away - Justin Moore",
 'Cowboys and Angels - Dustin Lynch',
 "Where It's At - Dustin Lynch",
 'Little Toy Guns - Carrie Underwood',
 'Two Black Cadillacs - Carrie Underwood',
 'Good Girl - Carrie Underwood',
 'Blown Away - Carrie Underwood',
 'I Drive Your Truck - Lee Brice',
 'Drinking class - Lee Brice',
 "I Don't Dance - Lee Brice",
 "Beachin' - Jake Owen",
 'Barefoot Blue Jean Night - Jake Owen',
 'Cop Car - Keith Urban',
 'Somewhere In My Car - Keith Urban',
 'Six Foot Town - Big & Rich']

We found example playlists for three popular genres to use in validation:

**Example playlist(25) - country**

Training target 1: Caught Up In The Moment - Big & Rich  
Training target 2: 8th Of November - Album Version w/o Intro - Big & Rich  
Training target 3: Save A Horse (Ride A Cowboy) - Big & Rich  
Training target 4: Rollin' (The Ballad Of Big & Rich) - The Ballad Of Big & Rich Album Version - Big & Rich  
Training target 5: Holy Water - Big & Rich  
Training target 6: Something in the Water - Carrie Underwood  
Training target 7: Cowboy Casanova - Carrie Underwood  
Training target 8: One Way Ticket - Carrie Underwood  
Training target 9: Thank God For Hometowns - Carrie Underwood  
Training target 10: Girl In A Country Song - Maddie & Tae  

**Example playlist(7) - rap/hip-hop**

Training target 1: I Don't Fuck With You - Big Sean  
Training target 2: Drowning (feat. Kodak Black) - A Boogie Wit da Hoodie  
Training target 3: goosebumps - Travis Scott  
Training target 4: STFU - mansionz  
Training target 5: Exposed - Russ  
Training target 6: Slippery (feat. Gucci Mane) - Migos  
Training target 7: m.A.A.d city - Kendrick Lamar  
Training target 8: Yellow - Aminé  
Training target 9: Juke Jam (feat. Justin Bieber & Towkio) - Chance The Rapper  
Training target 10: Bodak Yellow - Cardi B

**Example playlist(83) - Christmas**

Training target 1: Run Rudolph Run - Single Version - Chuck Berry  
Training target 2: Santa Claus Is Comin' to Town - Single Version - Bruce Springsteen  
Training target 3: Linus And Lucy - Vince Guaraldi Trio  
Training target 4: Christmas Time Is Here - Vocal - Vince Guaraldi Trio  
Training target 5: Do They Know It's Christmas? - 1984 Version - Band Aid  
Training target 6: All I Want for Christmas Is You - Mariah Carey  
Training target 7: Santa Claus Go Straight To The Ghetto - James Brown  
Training target 8: Joy To The World - Sufjan Stevens  
Training target 9: Little Drummer Boy/Silent Night/Auld Lang Syne - Extended Version - Jimi Hendrix  
Training target 10: Christmas All Over Again - Tom Petty and the Heartbreakers

Unfortunately, the model appears to be broken. When it trains, it gains more and more confidence in its top 15 songs, and they are the same for every playlist. That is to say, the longer it trains, the less it distinguishes between playlists and the more confidence it has in the same 15 songs. The reason for this behavior is unknown, but we assume it has to do with the `loss_total.backward()` call during training. We currently have to use `retain_graph=True` because it gives errors otherwise, and we assume the model would work a lot better if we could get rid of this flag.

Because of this, even though we have a way to validate the model, we haven't used this method much because the output is so obviously wrong. If the model gave a different set of songs for each playlist, it would be easier to check the recommendations for these three playlists and see if they match the genre. 
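Two simple numbers could make this check less subjective. These are suggestions we did not run on the model: the share of held-out target songs that show up in the recommendations, and the overlap between the recommendation sets of two different playlists. Both helpers below are hypothetical and operate on made-up song indices.

```python
def hit_rate(recommended, held_out):
    """Fraction of held-out songs that appear among the recommendations."""
    recommended = set(recommended)
    hits = sum(1 for song in held_out if song in recommended)
    return hits / len(held_out)

def jaccard(a, b):
    """Overlap of two recommendation sets: 1.0 means identical, 0.0 means disjoint."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

rec_a = [1, 2, 3, 4]          # made-up top songs for playlist A
rec_b = [1, 2, 3, 5]          # made-up top songs for playlist B

print(hit_rate(rec_a, [2, 9]))
# 0.5
print(jaccard(rec_a, rec_b))
# 0.6
```

A Jaccard overlap near 1.0 between playlists of different genres would confirm the "same 15 songs for everyone" behavior in a single number.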

## Improvements

We had an earlier version of the model that concatenated the mean, variance, and target song for training rather than passing the mean and variance through sequentially. We moved away from this architecture once we realized that gradient information was not being retained. At the time we attributed this to the concatenation itself, although `torch.cat` is differentiable, so the gradients were more likely lost in how we built the tensors being concatenated. So our first improvement was to change models.
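A small check of how concatenation interacts with gradients, written as a sketch on toy tensors: `torch.cat` passes gradients back to its inputs, while rebuilding a tensor from Python numbers with `torch.tensor` does not.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = torch.ones(2, requires_grad=True)

joined = torch.cat([a, b])        # stays connected to a and b
joined.sum().backward()
print(a.grad.tolist(), b.grad.tolist())
# [1.0, 1.0, 1.0] [1.0, 1.0]

# Rebuilding from plain Python floats cuts the connection to a and b
rebuilt = torch.tensor([x.item() for x in joined])
print(joined.requires_grad, rebuilt.requires_grad)
# True False
```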

When we changed to the new model, we initially fed it a song that was meant to go into the playlist to help it along in training. With that setup, we tried to optimize by changing four parameters: the amount of data loaded in, the learning rate, the training sets used, and the number of epochs.

Eventually, it hit us that the final model can't accept a song that is supposed to go into the playlist, because the whole point is to recommend a song using only the playlist itself. Once we made this adjustment, the model started acting funny: it recommended the same set of songs for every playlist. Unfortunately, we could not figure out where this quirk came from, and as a result, we did not have a meaningful way to optimize it.

If we got the model working, we could load in more of the dataset to train on.

## Alternative Methods

There are many other ways to set up a sequential model. We could have set up each MLP to calculate its output directly instead of calculating the change in its input. We also could have added more or fewer layers. We took into account both mean and variance, but it could be viable to use only one.  

We also could have built our model on PyTorch's LSTM, but we found the online documentation confusing and unhelpful. For someone with experience using the PyTorch LSTM, this could be a better way to run our architecture.
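For reference, a minimal sketch of what an LSTM-based variant might look like. This is an untested alternative on our data, and the class name `LSTMRecommender`, the hidden size, and the toy vocabulary are all assumptions. The LSTM reads the song embeddings in playlist order, and its final hidden state takes the place of the mean and variance summary.

```python
import torch
import torch.nn as nn

class LSTMRecommender(nn.Module):
    """Embeds a playlist, runs an LSTM over it, and scores every song (sketch)."""

    def __init__(self, n_songs, emb_dim=50, hidden_dim=64):
        super().__init__()
        self.embedder = nn.Embedding(n_songs, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_songs)

    def forward(self, playlists):
        emb = self.embedder(playlists)    # (batch, songs, emb_dim)
        _, (h_n, _) = self.lstm(emb)      # h_n: (num_layers, batch, hidden_dim)
        return self.head(h_n[-1])         # (batch, n_songs)

model = LSTMRecommender(n_songs=100)               # toy vocabulary
scores = model(torch.randint(0, 100, (4, 20)))     # 4 playlists of 20 songs
print(scores.shape)
# torch.Size([4, 100])
```

Unlike the mean and variance summary, this variant is sensitive to the order of the songs in the playlist.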

## Conclusion

Obviously, our model does not perform the way we intended it to. It recommends similar songs for every playlist, and the longer it trains, the more it locks in its favorite songs no matter what the playlist is.  

In the future, more research on how to loop through backpropagation without retaining the gradient graph would probably be necessary to make the model work. In the 90 minutes of research the team did, we could not find the answer to this question. Since we have found no other flaw in the architecture, our conclusion is that this problem is the cause of our unhelpful recommendations.

To conclude, this project is open for future development and research.