# Evaluation of Collaborative vs. Content-Based Methods

It is now time to compare the two methods. For this we use a test set that I separated from the dataset before training, so it comes from the same data as the training set but was held out from it.

First, we load the playlist-song matrix created in the [collaborative filtering notebook](collaborative_filtering.ipynb) and the features extracted in the [content-based recommendation system notebook](content_based_recsys.ipynb), which also form a matrix.

In [1]:
import numpy as np
import pandas as pd

from sklearn.metrics.pairwise import cosine_similarity
import random

In [72]:
# define file paths
content_matrix_file = 'data/complete_feature.csv'  # features of all songs in the train set

# all these files are created with notebook collaborative_filtering.ipynb
collaborative_matrix_file = 'data/matrix.npy'  # modified playlist-song matrix
playlists_file = 'data/playlists.txt'  # all playlists for collaborative matrix row header
unique_songs_uris_file = 'data/unique_songs_uris.txt'  # all uris for collaborative matrix column header

# load the files into data structures
with open(collaborative_matrix_file, 'rb') as f:
    coll_matrix = np.load(f)

with open(content_matrix_file, 'rb') as f:
    cont_matrix = pd.read_csv(f)

with open(playlists_file, 'r', encoding='utf-8') as f:
    playlists = f.read().splitlines()

with open(unique_songs_uris_file, 'r') as f:
    unique_songs_uris = f.read().splitlines()

Next, we load the test dataset. The following cells give an overview of it.

In [3]:
df = pd.read_csv("data/sorted_processed_data_test.csv")
df.head()

| Unnamed: 0.3 | Unnamed: 0.2 | Unnamed: 0.1 | Unnamed: 0 | pos | artist_name | track_uri | artist_uri | track_name | album_uri | duration_ms_x | ... | type | id | uri | track_href | analysis_url | duration_ms_y | time_signature | artist_pop | genres | track_pop |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13103 | 13103 | 10884 | 4 | Migos | 4Km5HrUvYTaSUfiSGPJeQR | spotify:artist:6oMuImdp5ZcFhWP0ESe6mG | Bad and Boujee (feat. Lil Uzi Vert) | spotify:album:2AvupjUeMnSffKEV05x222 | 343150 | ... | audio_features | 4Km5HrUvYTaSUfiSGPJeQR | spotify:track:4Km5HrUvYTaSUfiSGPJeQR | https://api.spotify.com/v1/tracks/4Km5HrUvYTaS... | https://api.spotify.com/v1/audio-analysis/4Km5... | 343150 | 4 | 82 | atl_hip_hop pop_rap rap trap | 76 |
| 1 | 9173 | 9173 | 10920 | 40 | Snoop Dogg | 2NBQmPrOEEjA8VbeWOQGxO | spotify:artist:7hJcb9fa4alzcOq3EaNPoG | Drop It Like It's Hot | spotify:album:797fkvAtk0iZvP1HHPCWbp | 266066 | ... | audio_features | 2NBQmPrOEEjA8VbeWOQGxO | spotify:track:2NBQmPrOEEjA8VbeWOQGxO | https://api.spotify.com/v1/tracks/2NBQmPrOEEjA... | https://api.spotify.com/v1/audio-analysis/2NBQ... | 266067 | 4 | 85 | g_funk gangster_rap hip_hop pop_rap rap west_c... | 69 |
| 2 | 13686 | 13686 | 10922 | 42 | Dr. Dre | 6ltPEsP4edATzvinHOzvk2 | spotify:artist:6DPYiyq5kWVQS4RGwxzPC7 | Still D.R.E. | spotify:album:5csXMdS69VOvh8MjyfwkjB | 268004 | ... | audio_features | 6ltPEsP4edATzvinHOzvk2 | spotify:track:6ltPEsP4edATzvinHOzvk2 | https://api.spotify.com/v1/tracks/6ltPEsP4edAT... | https://api.spotify.com/v1/audio-analysis/6ltP... | 268005 | 4 | 81 | g_funk gangster_rap hip_hop rap west_coast_rap | 0 |
| 3 | 13475 | 13475 | 10912 | 32 | Chris Brown | 0LWQWOFoz5GJLqcHk1fRO2 | spotify:artist:7bXgB6jMjp9ATFy66eO08Z | Look At Me Now | spotify:album:6rEm9wAyZP79RFa2qW2bf7 | 222586 | ... | audio_features | 0LWQWOFoz5GJLqcHk1fRO2 | spotify:track:0LWQWOFoz5GJLqcHk1fRO2 | https://api.spotify.com/v1/tracks/0LWQWOFoz5GJ... | https://api.spotify.com/v1/audio-analysis/0LWQ... | 222587 | 4 | 90 | dance_pop pop pop_rap r&b rap | 0 |
| 4 | 8962 | 8962 | 10889 | 9 | Drake | 27GmP9AWRs744SzKcpJsTZ | spotify:artist:3TVXtAsR1Inumwj472S9r4 | Jumpman | spotify:album:1ozpmkWcCHwsQ4QTnxOOdT | 205879 | ... | audio_features | 27GmP9AWRs744SzKcpJsTZ | spotify:track:27GmP9AWRs744SzKcpJsTZ | https://api.spotify.com/v1/tracks/27GmP9AWRs74... | https://api.spotify.com/v1/audio-analysis/27Gm... | 205879 | 4 | 98 | canadian_hip_hop canadian_pop hip_hop rap toro... | 75 |


In [4]:
num_playlists = df["name"].nunique()  # count distinct values, this is the number of playlists
num_tracks = df["track_name"].nunique()  # count distinct values, this is the number of tracks
print(f"Playlists: {num_playlists} \nTracks: {num_tracks}")

Playlists: 36 
Tracks: 2427


In [5]:
playlists_test = df.groupby('name')["track_name"].apply(list)
playlists_test.head()

name
volleyball playlist     [Bad and Boujee (feat. Lil Uzi Vert), Drop It ...
w o r k o u t           [Remember The Name (feat. Styles Of Beyond), N...
wake up                 [Young Dumb & Broke, Clouds, D.A.N.C.E., Rearv...
whatever                [The Wild Life, Come On! Feel the Illinoise! P...
will                    [Good Times Roll, Fortunate Son, Fly Like An E...
Name: track_name, dtype: object

## Evaluation Method

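We evaluate both approaches with the R-Precision metric, which is also one of the metrics used in the official evaluation of the Million Playlist Dataset Challenge. It is the number of relevant tracks among the top R retrieved tracks, divided by R, the number of relevant tracks. In our case, the relevant tracks are the songs withheld from a test playlist, and we request exactly as many recommendations as there are withheld songs, so the score is the fraction of recommended songs that appear in the withheld test set.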
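A minimal sketch of the metric as it is used in this notebook, assuming recommendations arrive as an ordered list of song IDs and the relevant set is the withheld songs of one playlist. The function name `r_precision` is our own illustration; the evaluation loops below compute the same quantity inline.

```python
def r_precision(recommended, relevant):
    """R-Precision: relevant items among the top R recommendations, divided by R,
    where R is the number of relevant (withheld) items."""
    relevant = set(relevant)
    r = len(relevant)
    if r == 0:
        raise ValueError("R-Precision is undefined without relevant items")
    top_r = list(recommended)[:r]  # only the first R recommendations count
    hits = len(set(top_r) & relevant)
    return hits / r


# Example: 4 withheld songs; only the first 4 recommendations are considered,
# and one of them ("b") was withheld, so the score is 1/4.
withheld = ["a", "b", "c", "d"]
recs = ["x", "b", "y", "z", "a"]
print(r_precision(recs, withheld))  # 0.25
```

Note that "a" is recommended too, but only at position 5, so it does not count toward the score.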

## Collaborative Filtering

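We already have the playlist-song matrix built from the training data. Now we want to see what it recommends for a new playlist vector: for each test playlist we withhold 20% of its songs, encode the remaining songs as a binary vector, find the most similar training playlist by cosine similarity, and recommend songs from that playlist.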
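A toy sketch of the encoding and nearest-playlist steps, using a hypothetical 3×5 matrix in place of the real `coll_matrix`. It also swaps the notebook's `unique_songs_uris.index(...)` lookup for a dictionary, which gives the same column for a list of unique URIs but takes constant time per song instead of scanning the whole list.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: 3 training playlists (rows) x 5 songs (columns)
song_uris = ["s0", "s1", "s2", "s3", "s4"]
train_matrix = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
], dtype=float)

# Map each song URI to its column once, instead of calling list.index() per song
uri_to_col = {uri: col for col, uri in enumerate(song_uris)}


def playlist_vector(song_ids, uri_to_col, num_songs):
    """Binary vector with a 1 for every playlist song seen during training."""
    vec = np.zeros(num_songs)
    for sid in song_ids:
        col = uri_to_col.get(sid)
        if col is not None:  # songs unseen in training cannot be encoded
            vec[col] = 1
    return vec


new_playlist = playlist_vector(["s2", "s3", "unknown"], uri_to_col, len(song_uris))
sims = cosine_similarity(train_matrix, new_playlist.reshape(1, -1))  # shape (3, 1)
best = int(sims.argmax())  # row index of the most similar training playlist
print(best)  # 1
```

Playlist 1 shares two songs with the new playlist (cosine similarity about 0.82), playlist 2 shares one (0.5), and playlist 0 shares none.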

In [71]:
test_per = 0.2
num_tracks = len(unique_songs_uris)  # number of unique songs is length of row in matrix
score = 0

for playlist_name in list(playlists_test.index):  # loop over all playlists in test set
    pl = df[df['name'] == playlist_name]
    
    # select random tracks to withhold for testing
    n = int(len(pl) * test_per)
    test_data = random.sample(list(pl.index), k=n)
    test_df = pl.drop([i for i in list(pl.index) if i not in test_data])
    train_df = pl.drop([i for i in list(pl.index) if i in test_data])
    
    # create playlist vector
    playlist_array = np.zeros(num_tracks)  # this array represents the new playlist
    for song_id in train_df["id"]:  # set array 1 if song in this playlist
        try:
            playlist_array[unique_songs_uris.index(song_id)] = 1  # set array to 1 at index of the song
        except ValueError:
            continue  # this song has not been seen yet, it can not be added to vector
    
    # compute cosine similarity to all other playlists
    pl_sim = cosine_similarity(coll_matrix, playlist_array.reshape(1, -1))
    
    # find the most similar songs according to most similar playlist
    diff_vals = coll_matrix[pl_sim.argmax(), :] - 1  # row of the most similar playlist, shifted by -1 (the shift does not change the ordering)
    smallest_n_indexes = np.argpartition(diff_vals, n)  # partial sort: only position n is guaranteed sorted; smaller values before it, larger after, both sides unordered

    i = 0  # number of candidate songs examined, including songs already in the playlist
    recommendations = list()
    for idx in np.flip(smallest_n_indexes):  # walk back from the end: values >= the value at position n, not sorted among themselves
        if playlist_array[idx] == 0:  # song was not in playlist already
            recommendations.append(unique_songs_uris[idx])  # use index to recover uri of song
        i += 1
        if i == n:  # stops after n candidates, so fewer than n songs may be recommended
            break
            
    # compute R-precision
    retrieved_rel = set(recommendations).intersection(test_df['id'])  # how many songs retrieved were relevant
    score += len(retrieved_rel)/n  # sum score and divide afterwards

score = score/len(list(playlists_test.index))  # divide by number of playlists
score    

0.0024509803921568627

As we can see, the score is almost 0, at about 0.2% R-Precision. This low score can be traced back to several limitations. First, we do not have access to the data Spotify has, such as play counts for each song in a playlist; we only used a binary matrix. Also, the evaluation method does not fully reflect real user behavior, since users might like the recommended songs as well, even though they are not in their playlists yet.
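The ranking step in the collaborative loop relies on `np.argpartition`, which only partially sorts: the element at position `kth` ends up where a full sort would put it, with smaller values before it and larger values after it, but neither side is ordered. A sketch of an explicit top-n selection that skips songs already in the playlist and stops once n new songs have been collected (`top_n_new_songs` is a hypothetical helper, not part of the notebooks):

```python
import numpy as np


def top_n_new_songs(scores, already_in_playlist, n):
    """Return column indexes of the n highest-scoring songs not yet in the playlist."""
    ranked = np.argsort(-scores, kind="stable")  # highest score first, ties by lower index
    picked = []
    for idx in ranked:
        if already_in_playlist[idx] == 0:  # skip songs the playlist already has
            picked.append(int(idx))
            if len(picked) == n:  # count only songs actually recommended
                break
    return picked


scores = np.array([0.9, 0.1, 0.8, 0.5, 0.7])
in_playlist = np.array([1, 0, 0, 0, 0])  # song 0 is already in the playlist
print(top_n_new_songs(scores, in_playlist, 2))  # [2, 4]
```

For very large catalogs, `np.argpartition(-scores, k)[:k]` returns the k highest-scoring indexes (unordered) without a full sort; sorting just those k afterwards works as long as k leaves at least n songs after removing the ones already in the playlist.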

## Content-Based Recommendation

Load the data created with the [content_based_recsys](content_based_recsys.ipynb) notebook and import the helper functions `generate_playlist_feature` and `generate_playlist_recos` from `scripts.data_handling`.
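The helpers in `scripts.data_handling` are not reproduced in this notebook. As a hypothetical, minimal sketch of the general content-based idea, the code below assumes a feature table with an `id` column plus numeric feature columns, summarizes the playlist as the mean of its songs' feature vectors (our assumption; the real helpers may weight or combine features differently), and ranks all other songs by cosine similarity to that vector. The function name and toy data are illustrative only.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def recommend_by_content(features, playlist_ids, n):
    """Rank songs outside the playlist by cosine similarity to the playlist's mean feature vector."""
    in_playlist = features["id"].isin(playlist_ids)
    feat_cols = features.columns.drop("id")
    playlist_vec = features.loc[in_playlist, feat_cols].mean().to_numpy().reshape(1, -1)
    candidates = features.loc[~in_playlist].copy()  # explicit copy avoids SettingWithCopyWarning
    candidates["sim"] = cosine_similarity(candidates[feat_cols].to_numpy(), playlist_vec)[:, 0]
    return candidates.sort_values("sim", ascending=False).head(n)["id"].tolist()


# Hypothetical toy features for four songs
features = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "energy": [0.9, 0.8, 0.1, 0.8],
    "danceability": [0.7, 0.6, 0.9, 0.6],
})
print(recommend_by_content(features, ["a", "b"], 1))  # ['d']
```

The playlist's mean vector is (0.85, 0.65); song "d" points in almost the same direction, while "c" (low energy, high danceability) does not.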

In [16]:
from scripts.data_handling import generate_playlist_feature
from scripts.data_handling import generate_playlist_recos

song_df = pd.read_csv("data/allsong_data_test.csv")
complete_feature_set = pd.read_csv("data/complete_feature_test.csv")  # all the features for the allsong_data_test set

Now we go through every test playlist and withhold 20% of its songs to test the recommendations. After making as many recommendations as there are withheld songs, we compute the fraction of songs that were recommended "right": a song counts as right if it is among the top n recommendations and in the withheld test set. Averaging this R-Precision over all playlists gives our final score.
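A small sketch of the withholding step on plain song IDs (the notebook samples DataFrame row indexes instead). `split_playlist` and its `seed` parameter are hypothetical additions for illustration; the notebook calls `random.sample` without a seed.

```python
import random


def split_playlist(song_ids, test_fraction=0.2, seed=None):
    """Withhold a random fraction of a playlist's songs; return (train, test) lists."""
    rng = random.Random(seed)  # a seeded generator makes the split reproducible
    n = int(len(song_ids) * test_fraction)  # rounds down, as in the notebook
    test = rng.sample(list(song_ids), k=n)
    withheld = set(test)
    train = [s for s in song_ids if s not in withheld]
    return train, test


songs = [f"song{i}" for i in range(10)]
train, test = split_playlist(songs, test_fraction=0.2, seed=42)
print(len(train), len(test))  # 8 2
```

Because `int()` rounds down, a playlist with fewer than five songs gets n = 0, and the `len(retrieved_rel)/n` step in the evaluation loops would raise a `ZeroDivisionError`, so such playlists would need to be skipped. Without a fixed seed, each run draws a different split, so the reported scores vary somewhat between runs.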

In [38]:
test_per = 0.2
score = 0

for playlist_name in list(playlists_test.index):  # loop over all playlists in test set
    pl_df = df[df['name'] == playlist_name]  # get the playlist by its name
    
    # select random tracks to withhold for testing
    n = int(len(pl_df) * test_per)
    test_data = random.sample(list(pl_df.index), k=n)
    test_df = pl_df.drop([i for i in list(pl_df.index) if i not in test_data])
    train_df = pl_df.drop([i for i in list(pl_df.index) if i in test_data])
    
    # generate the features
    complete_feature_set_playlist_vector, complete_feature_set_nonplaylist = generate_playlist_feature(complete_feature_set, train_df)
    
    # generate recommendations and keep the top n
    recommend = generate_playlist_recos(song_df, complete_feature_set_playlist_vector, complete_feature_set_nonplaylist)
    recommendations = recommend.head(n)  
    
    # compute R-precision
    retrieved_rel = set(recommendations['id']).intersection(test_df['id'])  # how many songs retrieved were relevant
    score += len(retrieved_rel)/n  # sum score and divide afterwards

score = score/len(list(playlists_test.index))  # divide by number of playlists
score

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  non_playlist_df['sim'] = cosine_similarity(nonplaylist_features.drop('id', axis=1).values,

0.16908674392337195

We can see a large increase in the R-Precision score, which is now 16.9%. This recommendation was possible without any knowledge of user data, based solely on the characteristics of the songs themselves. Again, our evaluation method is limited in knowing what a "right" recommendation is. Still, on average roughly one in every six recommended songs was one of the withheld songs. Actual user data, such as favorite songs, skipped songs, or listening time, would likely improve the recommendations much further, but it would also make the system considerably more complex.
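Inverting the average score gives the approximate number of recommendations per correctly predicted song (approximate, since the score is an average of per-playlist ratios):

$$
\frac{1}{\text{R-Precision}} = \frac{1}{0.169} \approx 5.9
$$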