# Evaluation of Collaborative vs. Content-Based Methods

It is now time to evaluate both methods against each other. For this, we use a test set that I split off before training from the same dataset I used for training.

First, we need to load the playlist-song matrix created in the [collaborative filtering notebook](collaborative_filtering.ipynb) and the features extracted in the [content-based recommendation system notebook](content_based_recsys.ipynb), which essentially form a matrix too.

In [9]:
import numpy as np
import pandas as pd

from sklearn.metrics.pairwise import cosine_similarity
import random

In [10]:
# define file paths
# all these files are created with notebook collaborative_filtering.ipynb
collaborative_matrix_file = 'data/matrix.npy'  # modified playlist-song matrix
playlists_file = 'data/playlists.txt'  # all playlists for collaborative matrix row header
unique_songs_uris_file = 'data/unique_songs_uris.txt'  # all uris for collaborative matrix column header

# load the files into data structures
with open(collaborative_matrix_file, 'rb') as f:
    coll_matrix = np.load(f)

with open(playlists_file, 'r', encoding='utf-8') as f:
    playlists = f.read().splitlines()

with open(unique_songs_uris_file, 'r') as f:
    unique_songs_uris = f.read().splitlines()

Now we need to load our test data set. The following cells illustrate the data.

In [11]:
df = pd.read_csv("data/processed_data_test.csv")
df.head()

Unnamed: 0.3,Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,pos,artist_name,track_uri,artist_uri,track_name,album_uri,duration_ms_x,...,type,id,uri,track_href,analysis_url,duration_ms_y,time_signature,artist_pop,genres,track_pop
0,66245,66245,63853,52,Bea Miller,3UerZr7GF7qO2hQf6FwEbz,spotify:artist:1o2NpYGqHiCq7FoiYdyd1x,Young Blood,spotify:album:1ukpF3eKewIjkKGpu70sKm,220706,...,audio_features,3UerZr7GF7qO2hQf6FwEbz,spotify:track:3UerZr7GF7qO2hQf6FwEbz,https://api.spotify.com/v1/tracks/3UerZr7GF7qO...,https://api.spotify.com/v1/audio-analysis/3Uer...,220707,4,76,alt_z dance_pop electropop indie_poptimism mod...,45
1,66244,66244,63852,51,Mike Stud,2S1IGardidwgCnx3DINcc4,spotify:artist:5G9kmDLg3OeUyj8KVBLzbu,Closer,spotify:album:0seBYRYRZtSjEG7Pr2enf9,219413,...,audio_features,2S1IGardidwgCnx3DINcc4,spotify:track:2S1IGardidwgCnx3DINcc4,https://api.spotify.com/v1/tracks/2S1IGardidwg...,https://api.spotify.com/v1/audio-analysis/2S1I...,219413,4,72,indie_pop_rap pop_rap rap rhode_island_rap,35
2,51826,51826,63810,9,Michael Bublé,3I09LQbHS3NSU46Ly3tPpR,spotify:artist:1GxkXlMwML1oSg5eLPiAz3,Feeling Good,spotify:album:2koUTBXkwUt2uJYv0uezHx,237333,...,audio_features,3I09LQbHS3NSU46Ly3tPpR,spotify:track:3I09LQbHS3NSU46Ly3tPpR,https://api.spotify.com/v1/tracks/3I09LQbHS3NS...,https://api.spotify.com/v1/audio-analysis/3I09...,237333,3,90,adult_standards canadian_pop jazz_pop lounge,70
3,66243,66243,63834,33,Maisey Rika,5xJrAqkBlOyXzNe5QMMT1K,spotify:artist:6YNeVWrhrAjUEkDWn9TRrl,Sink or Swim,spotify:album:46fOEA7AWiS8slSH8JGISV,196600,...,audio_features,5xJrAqkBlOyXzNe5QMMT1K,spotify:track:5xJrAqkBlOyXzNe5QMMT1K,https://api.spotify.com/v1/tracks/5xJrAqkBlOyX...,https://api.spotify.com/v1/audio-analysis/5xJr...,196600,4,45,nz_folk pacific_islands_pop waiata_maori waiat...,37
4,51476,51476,63851,50,Matt Nathanson,63e7gdWf1DtM6AVifeEzO9,spotify:artist:4NGiEU3Pkd8ASRyQR30jcA,Headphones,spotify:album:4kgP3zp264iAOJoRumNyFb,208600,...,audio_features,63e7gdWf1DtM6AVifeEzO9,spotify:track:63e7gdWf1DtM6AVifeEzO9,https://api.spotify.com/v1/tracks/63e7gdWf1DtM...,https://api.spotify.com/v1/audio-analysis/63e7...,208600,4,60,acoustic_pop indiecoustica neo_mellow pop_rock,42


In [12]:
num_playlists = df["name"].nunique()  # count distinct values, this is the number of playlists
num_tracks = df["track_name"].nunique()  # count distinct values, this is the number of tracks
print(f"Playlists: {num_playlists} \nTracks: {num_tracks}")

Playlists: 90 
Tracks: 5448


In [13]:
playlists_test = df.groupby('name')["track_name"].apply(list)
playlists_test.head()

name
#chill    [Young Blood, Closer, Feeling Good, Sink or Sw...
2k17      [No Flockin, Go Flex, Digits, March Madness, A...
2pac      [Whatz Ya Phone #, All Eyez On Me, Don't You T...
80s       [Who Can It Be Now?, Nothing's Gonna Stop Us N...
90s       [Home, Only Wanna Be With You, All The Small T...
Name: track_name, dtype: object

## Evaluation Method

We evaluate both approaches with the R-Precision metric, which is also one of the evaluation metrics of the official Million Playlist Dataset challenge. R-Precision is the number of relevant tracks retrieved divided by the number of relevant tracks, where only as many recommendations are considered as there are relevant tracks. In our case, the relevant tracks are the songs withheld from a test playlist, so the score is the share of withheld songs that appear among the recommendations.
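To make the metric concrete, here is a minimal sketch of R-Precision on a toy example. The function name `r_precision` and the track ids are made up for illustration, and scoring an empty hold-out set as 0 is an assumption, not something the notebooks define.

```python
def r_precision(recommended, relevant):
    """Share of the relevant (withheld) tracks found among the top-R
    recommendations, where R is the number of relevant tracks."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumption: an empty hold-out set scores 0 instead of dividing by zero
    top_r = recommended[:len(relevant)]  # only the first R recommendations count
    hits = len(set(top_r) & relevant)    # relevant tracks that were retrieved
    return hits / len(relevant)


# toy example with made-up track ids
withheld = ["t1", "t2", "t3", "t4"]
ranked = ["t9", "t2", "t7", "t4", "t1"]
print(r_precision(ranked, withheld))  # 0.5: t2 and t4 are among the first four
```

Note that `t1` is recommended at position five and therefore does not count, because only the first four recommendations are considered.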

# Collaborative Filtering

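We have the matrix we created with our training data. Now we want to see which songs it yields for a new playlist: we turn each test playlist into a binary vector, find the most similar training playlist by cosine similarity, and take recommendations from that playlist's row.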
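As a small illustration of this idea, the following sketch builds a binary playlist vector and compares it to the rows of a toy matrix. All names and values here (`toy_matrix`, `toy_song_ids`, `playlist_vector`, `cosine_to_rows`) are hypothetical; the notebook itself uses `cosine_similarity` from scikit-learn on the real matrix.

```python
import numpy as np

# hypothetical toy data: 3 training playlists (rows) over 5 songs (columns)
toy_song_ids = ["s0", "s1", "s2", "s3", "s4"]
toy_matrix = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
], dtype=float)

# dictionary lookup from song id to column index
col_of = {song_id: col for col, song_id in enumerate(toy_song_ids)}


def playlist_vector(song_ids, col_of):
    """Binary vector with a 1 for every known song; unseen songs are skipped."""
    vec = np.zeros(len(col_of))
    for song_id in song_ids:
        col = col_of.get(song_id)
        if col is not None:
            vec[col] = 1.0
    return vec


def cosine_to_rows(matrix, vec):
    """Cosine similarity between vec and every row of matrix (0 where a norm is 0)."""
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(vec)
    dots = matrix @ vec
    return np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)


new_playlist = playlist_vector(["s1", "s2", "unseen"], col_of)
sims = cosine_to_rows(toy_matrix, new_playlist)
best = int(sims.argmax())  # row index of the most similar training playlist
print(sims, best)
```

The dictionary lookup is a design choice for the sketch: it avoids scanning the whole song list for every track, which matters once the list contains many thousands of songs.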

In [14]:
test_per = 0.2
num_tracks = len(unique_songs_uris)  # number of unique songs is length of row in matrix
score = 0

for playlist_name in list(playlists_test.index):  # loop over all playlists in test set
    pl = df[df['name'] == playlist_name]
    
    # select random tracks to withhold for testing
    n = int(len(pl) * test_per)
    test_data = random.sample(list(pl.index), k=n)
    test_df = pl.drop([i for i in list(pl.index) if i not in test_data])
    train_df = pl.drop([i for i in list(pl.index) if i in test_data])
    
    # create playlist vector
    playlist_array = np.zeros(num_tracks)  # this array represents the new playlist
    for song_id in train_df["id"]:  # set array 1 if song in this playlist
        try:
            playlist_array[unique_songs_uris.index(song_id)] = 1  # set array to 1 at index of the song
        except ValueError:
            continue  # this song has not been seen yet, it can not be added to vector
    
    # compute cosine similarity to all other playlists
    pl_sim = cosine_similarity(coll_matrix, playlist_array.reshape(1, -1))
    
    # find the most similar songs according to most similar playlist
    # take the row of the most similar training playlist and subtract 1 from each value
    diff_vals = coll_matrix[pl_sim.argmax(), :] - 1
    # np.argpartition only partitions the indexes around the n-th smallest value;
    # the resulting order is not fully sorted
    smallest_n_indexes = np.argpartition(diff_vals, n)

    i = 0  # counts the indexes examined, including songs skipped below
    recommendations = list()
    for idx in np.flip(smallest_n_indexes):  # look at which song corresponds to found indexes
        if playlist_array[idx] == 0:  # song was not in playlist already
            recommendations.append(unique_songs_uris[idx])  # use index to recover uri of song
        i += 1
        if i == n:
            break
            
    # compute R-precision
    retrieved_rel = set(recommendations).intersection(test_df['id'])  # how many songs retrieved were relevant
    score += len(retrieved_rel)/n  # sum score and divide afterwards

score = score/len(list(playlists_test.index))  # divide by number of playlists
score    

0.0004938271604938272

As we can see, the score is almost 0. This low score can be traced back to several of our limitations. First of all, we do not have the data Spotify has, which is a matrix with play counts for each song in a playlist, instead of the binary matrix we used. Also, the evaluation method does not fully reflect reality: users might have liked the recommended songs as well, even though they have not added them to their playlist yet.
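One detail of the ranking step is worth keeping in mind when reading this score: `np.argpartition(a, k)` only guarantees that the element at position `k` is where it would be in sorted order, with smaller values before it and larger values after it, in no particular order. A minimal sketch of an explicit descending ranking that also skips songs already in the playlist could look like this. The helper `top_n_unseen` and the toy values are assumptions for illustration, not the code that produced the score above.

```python
import numpy as np


def top_n_unseen(scores, already_in_playlist, n):
    """Indices of the n highest-scoring songs that are not yet in the playlist."""
    scores = np.asarray(scores, dtype=float)
    # column indices of songs that are not in the playlist yet
    candidates = np.flatnonzero(~np.asarray(already_in_playlist, dtype=bool))
    # stable sort on negated scores: highest score first, ties keep column order
    order = np.argsort(-scores[candidates], kind="stable")
    return candidates[order[:n]].tolist()


toy_scores = [0.2, 0.9, 0.0, 0.7, 0.9]  # made-up scores from the most similar playlist
toy_in_playlist = [0, 1, 0, 0, 0]       # song 1 is already in the playlist
print(top_n_unseen(toy_scores, toy_in_playlist, n=2))  # [4, 3]
```

Filtering before ranking means the function returns exactly `n` songs whenever at least `n` candidates exist, instead of spending part of the budget on songs that are already known.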

# Content Based Recommendation

We load the data created with the notebook [content_based_recsys](content_based_recsys.ipynb) and import the helper functions from `scripts.data_handling`.

In [15]:
from scripts.data_handling import generate_playlist_feature
from scripts.data_handling import generate_playlist_recos

song_df = pd.read_csv("data/allsong_data_test.csv")
complete_feature_set = pd.read_csv("data/complete_feature_test.csv")  # all the features for the allsong_data_test set

Now we go through every test playlist and withhold 20% of its songs in order to test the recommendation. After making as many recommendations as there are withheld songs, we compute the share of recommendations that were "right". A recommendation counts as right if the song is among the top n recommendations and in the withheld test set. We average this so-called R-Precision over all playlists, which gives our final score.
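The hold-out procedure described above can be sketched independently of pandas. The helpers `split_playlist` and `mean_r_precision` are hypothetical names; the `seed` parameter and the choice to skip playlists with nothing withheld (fewer than five tracks at 20%) are assumptions added for the sketch.

```python
import random


def split_playlist(track_ids, test_per=0.2, seed=None):
    """Randomly withhold a share of a playlist; returns (kept, withheld)."""
    rng = random.Random(seed)  # assumption: a seed makes the split repeatable
    n = int(len(track_ids) * test_per)
    withheld_pos = set(rng.sample(range(len(track_ids)), k=n))
    kept = [t for i, t in enumerate(track_ids) if i not in withheld_pos]
    withheld = [t for i, t in enumerate(track_ids) if i in withheld_pos]
    return kept, withheld


def mean_r_precision(hits_and_sizes):
    """Average hits / n over playlists, skipping playlists with nothing withheld."""
    scores = [hits / n for hits, n in hits_and_sizes if n > 0]
    return sum(scores) / len(scores) if scores else 0.0


# toy playlist with made-up track ids
kept, withheld = split_playlist([f"t{i}" for i in range(10)], test_per=0.2, seed=0)
print(len(kept), len(withheld))  # 8 2

# (hits, number withheld) per playlist; the last playlist was too short to withhold anything
print(mean_r_precision([(1, 2), (0, 4), (0, 0)]))  # 0.25
```

Sampling positions instead of values keeps the split correct even if a track appears twice in the same playlist.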

In [16]:
test_per = 0.2
score = 0

for playlist_name in list(playlists_test.index):  # loop over all playlists in test set
    pl_df = df[df['name'] == playlist_name]  # get the playlist by its name
    
    # select random tracks to withhold for testing
    n = int(len(pl_df) * test_per)
    test_data = random.sample(list(pl_df.index), k=n)
    test_df = pl_df.drop([i for i in list(pl_df.index) if i not in test_data])
    train_df = pl_df.drop([i for i in list(pl_df.index) if i in test_data])
    
    # generate the features
    complete_feature_set_playlist_vector, complete_feature_set_nonplaylist = generate_playlist_feature(complete_feature_set, train_df)
    
    # Generate top n recommendation
    recommend = generate_playlist_recos(song_df, complete_feature_set_playlist_vector, complete_feature_set_nonplaylist)
    recommendations = recommend.head(n)  
    
    # compute R-precision
    retrieved_rel = set(recommendations['id']).intersection(test_df['id'])  # how many songs retrieved were relevant
    score += len(retrieved_rel)/n  # sum score and divide afterwards

score = score/len(list(playlists_test.index))  # divide by number of playlists
score

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  non_playlist_df['sim'] = cosine_similarity(nonplaylist_features.drop('id', axis=1).values,

0.13254412392025872

The R-Precision score is much higher than for collaborative filtering: it is now 13.25%. This recommendation was possible without any user data and is based solely on the characteristics of the songs themselves. Again, our evaluation method is limited in knowing what a "right" recommendation is. Still, on average roughly one in every 7 to 8 recommended songs was a withheld song (1 / 0.1325 ≈ 7.5). You can imagine how much better this could become with actual user data such as favorite songs, skipped songs, or listening time. Such data would likely improve the recommendations, but it would also make the system much more complex.