# SoundCloud - Bonus Question

For this question, I will use Python to generate playlist recommendations for SoundCloud users based on their playlist history. I'll do this by first finding similarities between the available playlists, and then using those similarities to score the playlists each user has not yet heard.

In [1]:
import pandas as pd
from scipy.spatial.distance import cosine

First, I populated a table with hypothetical data for Playlist_ID, timestamp, user_id, track_id, and listening_duration.
Then I loaded the table into a Jupyter Notebook.

In [2]:
plays = pd.read_csv("../asset/plays.csv")

In [3]:
plays.head()

Unnamed: 0,Playlist_ID,timestamp,user_id,track_id,listening_duration
0,NM,11:54,regina,falz,750
1,BP,7:30,qi,TLC,780
2,NI,9:45,marit,korede,420
3,QB,3:35,tenele,rihanna,677
4,KJ,4:40,ebehi,kari,500


I only need the Playlist_ID and user_id columns.

In [4]:
userPlay = plays[["Playlist_ID", "user_id"]].copy()

Now I want to turn Playlist_ID into dummy variables, one column per playlist.

In [5]:
dummies = pd.get_dummies(userPlay["Playlist_ID"], dtype=float) # 0.0/1.0 flags, one column per playlist
userPlay2 = userPlay[["user_id"]].join(dummies)
userPlay2.head()

Unnamed: 0,user_id,BP,FJ,KJ,NG,NI,NM,QB
0,regina,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,qi,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2,marit,0.0,0.0,0.0,0.0,1.0,0.0,0.0
3,tenele,0.0,0.0,0.0,0.0,0.0,0.0,1.0
4,ebehi,0.0,0.0,1.0,0.0,0.0,0.0,0.0


For each user_id, 1 indicates that the user has listened to a particular playlist and 0 indicates that the user hasn't.

Next, I want to group the table by user_id, taking the maximum of each dummy column so that every user ends up with a single row.

In [6]:
aggMap = { 
           "BP" : "max", "FJ" : "max", 
           "KJ" : "max", "NG" : "max",
           "NI" : "max", "NM" : "max",
           "QB" : "max" 
         }

userPlay3 = userPlay2.groupby(["user_id"]).agg(aggMap).reset_index()
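
The dummy-variable and group-by steps above can also be sketched as a single cross-tabulation. This is only an illustration on a made-up play log (`toy_plays` and its rows are hypothetical, not the contents of plays.csv); it assumes that clipping play counts at 1 is an acceptable stand-in for the "max" aggregation.

```python
import pandas as pd

# Hypothetical play log: one row per play, same column names as plays.csv
toy_plays = pd.DataFrame({
    "Playlist_ID": ["NM", "BP", "NM", "BP", "KJ"],
    "user_id": ["regina", "qi", "qi", "qi", "ebehi"],
})

# crosstab counts plays per (user, playlist); clipping at 1 turns counts into 0/1 flags
user_playlist = pd.crosstab(toy_plays["user_id"], toy_plays["Playlist_ID"]).clip(upper=1)
print(user_playlist)
```

Here user_id becomes the index rather than a column, so a `reset_index()` would be needed to get the same shape as userPlay3.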

Before creating a list of recommendations for each user, I need to find which playlists are similar to one another, based on which users have played them. To do that, I'll drop the user_id column and create an empty data frame for my results.

In [7]:
userPlay4 = userPlay3.drop(columns="user_id")
userPlay5 = pd.DataFrame(index = userPlay4.columns, columns = userPlay4.columns)

In [8]:
userPlay5

Unnamed: 0,NI,NM,KJ,NG,QB,BP,FJ
NI,,,,,,,
NM,,,,,,,
KJ,,,,,,,
NG,,,,,,,
QB,,,,,,,
BP,,,,,,,
FJ,,,,,,,


I'll use the cosine similarity between each pair of playlist columns as the basis for my recommendations.

In [9]:
for i in range(0, len(userPlay4.columns)):
    for j in range(0, len(userPlay4.columns)):
        # scipy's cosine() returns a distance, so similarity = 1 - distance
        userPlay5.iloc[i, j] = 1 - cosine(userPlay4.iloc[:, i], userPlay4.iloc[:, j])
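
To make the similarity measure concrete: the cosine similarity of two play vectors is their dot product divided by the product of their lengths. The following is a minimal pure-Python sketch of that formula; the function name `cosine_similarity` and the two vectors are made up for illustration and are not part of the notebook.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hypothetical 0/1 play vectors for two playlists across four users
playlist_a = [1, 0, 1, 1]
playlist_b = [1, 1, 0, 1]

# Two shared listeners, three listeners each: 2 / (sqrt(3) * sqrt(3))
print(cosine_similarity(playlist_a, playlist_b))
```

Note that this sketch divides by zero for a playlist nobody has played, so such columns would need to be handled separately.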

In [10]:
simPlaylist = pd.DataFrame(index = userPlay5.columns, columns = range(1,8))

for i in range(0, len(userPlay4.columns)):
    # Rank all playlists by their similarity to playlist i, most similar first
    simPlaylist.iloc[i, :] = userPlay5.iloc[:, i].sort_values(ascending=False).index

In [11]:
simPlaylist

Unnamed: 0,1,2,3,4,5,6,7
NI,NI,QB,KJ,NG,NM,FJ,BP
NM,NM,NG,BP,FJ,QB,NI,KJ
KJ,KJ,NI,QB,BP,NM,FJ,NG
NG,NG,NM,FJ,BP,QB,NI,KJ
QB,QB,NI,NG,NM,FJ,KJ,BP
BP,BP,NM,FJ,NG,KJ,QB,NI
FJ,FJ,NG,BP,NM,QB,NI,KJ


The table above lists, for each playlist, all playlists ranked by similarity in descending order from left to right. Column 1 is the playlist itself, since every playlist has a similarity of 1 with itself. For the playlist named NI, QB is the most similar other playlist, followed by KJ, and so on.

Now, I want to combine these playlist similarities with each user's history to create recommendations for each user. I'll start off by creating an empty data frame to hold each user's score for each playlist.

In [12]:
similarPlaylist = pd.DataFrame(index = userPlay3.index, columns = userPlay3.columns)
similarPlaylist["user_id"] = userPlay3["user_id"] # This is to include the user_id in my data frame
similarPlaylist

Unnamed: 0,user_id,NI,NM,KJ,NG,QB,BP,FJ
0,amy,,,,,,,
1,ebehi,,,,,,,
2,jason,,,,,,,
3,kevin,,,,,,,
4,kobe,,,,,,,
5,marit,,,,,,,
6,qi,,,,,,,
7,regina,,,,,,,
8,saket,,,,,,,
9,tenele,,,,,,,


In [13]:
similarPlaylist.iloc[:, 0]

0         amy
1       ebehi
2       jason
3       kevin
4        kobe
5       marit
6          qi
7      regina
8       saket
9      tenele
10    yoonhee
Name: user_id, dtype: object

In [14]:
def similar(a,b):
    # Similarity-weighted average of the user's 0/1 plays (a), weighted by the similarities (b)
    return sum(a*b)/sum(b)

for i in range(0, len(similarPlaylist.index)):
    for j in range(1, len(similarPlaylist.columns)):
        user = similarPlaylist.index[i] # To select the users
        playlist = similarPlaylist.columns[j] # To select the playlist

        if userPlay3.iloc[i, j] == 1:
            similarPlaylist.iloc[i, j] = 0 # To eliminate playlists that each user has heard before

        else:
            # Selecting the top matches for each playlist (position 0 is the playlist itself)
            topPlaylist = simPlaylist.loc[playlist].iloc[1:7]
            # Sorting the cosine similarities for each playlist, again skipping the playlist itself
            topSimilarities = userPlay5.loc[playlist].sort_values(ascending=False).iloc[1:7]
            user_playlist = userPlay4.loc[user, list(topPlaylist)]

            similarPlaylist.iloc[i, j] = similar(user_playlist, topSimilarities)
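
The score computed by `similar` is a weighted average: each neighbouring playlist contributes its similarity if the user has played it, and the total is divided by the sum of all the similarities. Below is a small worked sketch with made-up numbers; `weighted_score`, `similarities`, and `listened` are hypothetical names used only for this illustration.

```python
def weighted_score(listened, similarities):
    """Similarity-weighted average of 0/1 listen flags (same formula as similar())."""
    numerator = sum(l * s for l, s in zip(listened, similarities))
    return numerator / sum(similarities)

# Hypothetical numbers: three neighbour playlists of one candidate playlist
similarities = [0.5, 0.3, 0.2]   # how similar each neighbour is to the candidate
listened = [1, 0, 1]             # whether the user has played each neighbour

# (0.5*1 + 0.3*0 + 0.2*1) / (0.5 + 0.3 + 0.2)
score = weighted_score(listened, similarities)
print(round(score, 3))
```

The score lies between 0 (the user has played none of the neighbours) and 1 (the user has played all of them).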

In [15]:
similarPlaylist.head() 

Unnamed: 0,user_id,NI,NM,KJ,NG,QB,BP,FJ
0,amy,0.151916,0.0,0.343462,0.453971,0.181428,0.0,0.427376
1,ebehi,0.0,0.208495,0.0,0.140485,0.498577,0.131881,0.12948
2,jason,0.303832,0.0,0.157319,0.0,0.362855,0.604356,0.489388
3,kevin,0.41986,0.0,0.343462,0.0,0.501423,0.0,0.0
4,kobe,0.0,0.0,0.813857,0.0,0.0,0.868119,0.0


In [16]:
# Creating an empty data frame for the top recommendations
recommendation = pd.DataFrame(index=similarPlaylist.index, columns=['user','1','2','3','4','5'])
recommendation["user"] = similarPlaylist["user_id"]
recommendation

Unnamed: 0,user,1,2,3,4,5
0,amy,,,,,
1,ebehi,,,,,
2,jason,,,,,
3,kevin,,,,,
4,kobe,,,,,
5,marit,,,,,
6,qi,,,,,
7,regina,,,,,
8,saket,,,,,
9,tenele,,,,,


In [17]:
# Inserting the top recommended playlists into the empty data frame
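for i in range(0, len(similarPlaylist.index)):
    # Sort only the numeric scores (skipping user_id) and keep the five highest-scoring playlists
    scores = similarPlaylist.iloc[i, 1:].astype(float)
    recommendation.iloc[i, 1:] = scores.sort_values(ascending=False).index[:5].tolist()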
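
Picking the five best-scoring playlists for one user can also be sketched with `Series.nlargest`. The scores below are amy's row from the similarPlaylist output above; the variable names are made up for this example.

```python
import pandas as pd

# amy's scores, copied from the similarPlaylist output above
amy_scores = pd.Series(
    {"NI": 0.151916, "NM": 0.0, "KJ": 0.343462, "NG": 0.453971,
     "QB": 0.181428, "BP": 0.0, "FJ": 0.427376}
)

# Keep the labels of the five highest scores, best first
top5 = amy_scores.nlargest(5).index.tolist()
print(top5)
```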

Now we have a data frame that highlights the top recommendations for each user.

In [18]:
recommendation

Unnamed: 0,user,1,2,3,4,5
0,amy,NG,FJ,KJ,QB,NI
1,ebehi,QB,NM,NG,BP,FJ
2,jason,BP,FJ,QB,NI,KJ
3,kevin,QB,NI,KJ,FJ,BP
4,kobe,BP,KJ,FJ,QB,NG
5,marit,KJ,FJ,BP,QB,NG
6,qi,NG,FJ,NI,QB,BP
7,regina,FJ,QB,KJ,NI,BP
8,saket,FJ,BP,NM,QB,NI
9,tenele,NM,NG,FJ,BP,QB


We can go further and write a function that returns the top recommendations for a given user.

In [19]:
def youmightlike(x):
    for i in range(0, len(recommendation.index)):
        if recommendation.iloc[i, 0] == x:
            return recommendation.iloc[i, 1:]

And voilà! We have playlist recommendations for Kevin!

In [20]:
youmightlike("kevin")

1    QB
2    NI
3    KJ
4    FJ
5    BP
Name: 3, dtype: object
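
The row-by-row search in youmightlike could also be written as an index lookup. This is a sketch under assumptions, not the notebook's method: `recs` holds two rows copied from the recommendation table above, and `you_might_like` is a hypothetical variant that returns None for an unknown user.

```python
import pandas as pd

# Two rows copied from the recommendation table above
recs = pd.DataFrame(
    [["amy", "NG", "FJ", "KJ", "QB", "NI"],
     ["kevin", "QB", "NI", "KJ", "FJ", "BP"]],
    columns=["user", "1", "2", "3", "4", "5"],
)

def you_might_like(user, table=recs):
    """Return the recommendation row for `user`, or None if the user is unknown."""
    by_user = table.set_index("user")
    if user not in by_user.index:
        return None
    return by_user.loc[user]

print(you_might_like("kevin").tolist())
```

This assumes each user appears in exactly one row, which holds here because the table was built from data grouped by user_id.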