# Goal:
Analyze and synthesize 2 users' Spotify recent taste profiles so that we can create a playlist filled with brand new song recommendations for both of them to discover together. The playlist we curate will automatically end up in both of their spotify libraries. A novel way for people to socialize because the playlist is unique to them and can create a special experience for both to find new music together that matches their music tastes.

In [1]:
# Import the libraries

import os
import pandas as pd
import numpy as np
import json
import matplotlib.pyplot as plt
import seaborn as sns
%config InlineBackend.figure_format ='retina'
import spotipy
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler
import random
from pandas.io.json import json_normalize

## Authorization Flow

In [2]:
# Declare the credentials

cid = '0b2103231ba64a70885c27fbb38cfa97'
secret = '45d7d10e11474298abcb52df745e2b25'
redirect_uri='http://localhost:7777/callback'
username = 'areddy12434'

# Authorization flow
scope = 'user-top-read'
token = util.prompt_for_user_token(username, scope, client_id=cid, client_secret=secret, redirect_uri=redirect_uri)

if token:
    sp = spotipy.Spotify(auth=token)
else:
    print("Can't get token for", username)

# Data Collection
## Extract User's Top Medium Term Songs
Spotipy has a built-in function to fetch users' top 50 songs. We extract our songs with that function in the Save_User1_Top_50_Songs.ipynb and Save_User2_Top_50_Songs.ipynb notebooks, and we import the resulting csv files from there below

In [5]:
user1_songs = pd.read_csv('User1_top_50_songs.csv')
user2_songs = pd.read_csv('User2_top_50_songs.csv')
temp = [user1_songs, user2_songs]
temp = pd.concat(temp)
temp.reset_index(drop=True,inplace=True)

### Extract Users' Top 50 Tracks' Audio Features

In [6]:
user1_list = []
for song in user1_songs['song_uri']:
    row = pd.DataFrame(sp.audio_features(tracks=[song]))
    user1_list.append(row)
user1_df = pd.concat(user1_list)

user2_list = []
for song in user2_songs['song_uri']:
    row = pd.DataFrame(sp.audio_features(tracks=[song]))
    user2_list.append(row)
user2_df = pd.concat(user2_list)

# Combine both users' top 50 songs into one dataframe of 100 songs

dfs = [user1_df, user2_df]
dfs = pd.concat(dfs)

#### Data Cleaning

In [7]:
# Drop unnecessary features

dfs.drop(['type','track_href','analysis_url','time_signature','duration_ms','uri','instrumentalness','liveness','loudness','key','mode'],1,inplace=True)
dfs.set_index('id',inplace=True)

In [8]:
# Normalize tempo feature

columns = ['danceability','energy','speechiness','acousticness','valence','tempo']
scaler = MinMaxScaler()
scaler.fit(dfs[columns])
dfs[columns] = scaler.transform(dfs[columns])

# Building the Song Recommender
## K-Means Cluster Analysis
We used k-means clustering to identify clusters (essentially subgenres) of similar songs in the dataframe that combines all our favorite recent songs together. This would allow us to call the final recommended songs based on the clusters. We chose k=20 since we ideally want 5 songs per cluster to feed into Spotipy's recommendation() function which only takes 5 seed songs. We have tried the elbow method and saw that the ideal k is around 6-8 depending on the users, however, having a larger k will be able curate much more specified subgenres than having a smaller k that groups more songs per cluster.

In [9]:
# Get 20 clusters from 100 songs

clusters = 20
kmeans = KMeans(n_clusters=clusters)
kmeans.fit(dfs)

KMeans(n_clusters=20)

### Cluster Analysis
Now that we have visualized and confirmed the clusters correspond to the songs, we should update our dataframe to label each song with its cluster.

In [10]:
scaler = MinMaxScaler()
scaled = scaler.fit_transform(dfs)
y_kmeans = kmeans.fit_predict(scaled)
y_kmeans

array([ 2,  7, 15,  5,  2,  0, 10, 10, 17, 19, 16,  1,  0, 18,  6, 18,  1,
       14, 10,  9,  8, 14,  5,  8, 15, 10, 15, 11, 10,  9,  2, 11,  1,  0,
       10,  9,  5,  1,  2, 10, 18, 18,  2,  0, 10,  9,  6,  4, 14, 18,  8,
       18, 18, 14,  7,  9, 13,  5,  1, 14,  1, 18,  9,  3,  4, 14,  5, 12,
        6, 17, 14,  1,  1, 18,  6, 17, 13, 15,  8,  4,  3, 13,  4,  4,  5,
       13,  6,  7, 13,  4,  8,  0,  7, 10, 10,  9, 11,  9,  7,  9])

In [11]:
# Updating dataframe with assigned clusters 

dfs['cluster'] = y_kmeans
dfs['artist'] = temp.artist.tolist()
dfs['title'] = temp.song.tolist()

Unnamed: 0_level_0,danceability,energy,speechiness,acousticness,valence,tempo,cluster,artist,title
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
3ZCTVFBt2Brf31RLEnCkWJ,0.673716,0.100381,0.081903,0.92418,0.228867,0.405763,2,Billie Eilish,everything i wanted
4NhDYoQTYCdWHTvlbGVgwo,0.533233,0.688691,0.35827,0.027042,0.397482,0.984351,7,6ix9ine,GOOBA
1kVYXfxTWSftIZtmYr6yH8,0.953172,0.390089,0.266758,0.028066,0.436826,0.603908,15,Andy Mineo,Coming In Hot
39Yp9wwQiSRIDOvrVg7mbk,0.691843,0.496823,0.026996,0.238724,0.270459,0.504474,5,THE SCOTTS,THE SCOTTS
5ehVOwEZ1Q7Ckkdtq0dY1W,0.693353,0.05972,0.020705,0.82172,0.005958,0.306645,2,WYS,Snowman


In [13]:
# Removing clusters that only have one song in them

delete_clusters = []
cluster = 0
while cluster < (len(dfs.cluster.unique())-1):
    if dfs.groupby('cluster').count().loc[cluster].danceability == 1:
        delete_clusters.append(cluster)
    cluster+=1

In [14]:
dfs.reset_index(inplace=True)

In [15]:
i = 0
while i < (len(dfs.cluster.unique())-1):
    if dfs.loc[[i]].cluster.tolist()[0] in delete_clusters:
        dfs.drop(i,0,inplace=True)
    i+=1

In [16]:
dfs.set_index('id',inplace=True)

Now we want to make a nested list so that it contains a list of songs for every cluster. The purpose of this is so that we can find 1-2 recommended songs for each cluster using spotipy's built-in recommendations() function that takes in 5 seed songs.

In [18]:
# Create list of lists of song ids to put into recommendation function

i=0
list_of_recs = [0]*len(dfs.groupby('cluster').count())
while i<len(dfs.groupby('cluster').count()):
    list_of_recs[i] = dfs.loc[dfs['cluster'] == i].index.to_list()
    i+=1

list_of_recs = [ele for ele in list_of_recs if ele != []] 

In [20]:
# Adjust list for clusters so that each cluster has a maximum of 5 seed songs

j = 0
adj_list_of_recs = [0]*len(list_of_recs)
while j<len(list_of_recs):
    if 0 < len(list_of_recs[j]) < 6:
        adj_list_of_recs[j] = list_of_recs[j]
    elif len(list_of_recs[j]) > 5:
        adj_list_of_recs[j] = random.sample(list_of_recs[j], 5)
    j += 1

We want to get 1 recommended song from each cluster with less than 4 songs and 2 recommended songs from each cluster with 4-5 songs. This is because we assume that bigger clusters generally mean that we enjoy songs similar to that cluster more. We give weight to the song recommender to accomodate for this preference towards bigger clusters.

In [21]:
#Getting 1 recommended song from each cluster with less than 4 songs, 2 recommended songs from each cluster with 4-5 songs

k = 0
list_of_recommendations = [0]*len(list_of_recs)
while k < len(list_of_recs):
    if len(adj_list_of_recs[k]) < 4:
        list_of_recommendations[k] = sp.recommendations(seed_tracks=adj_list_of_recs[k],limit=1)
    else:
        list_of_recommendations[k] = sp.recommendations(seed_tracks=adj_list_of_recs[k],limit=2)
    k += 1
    
pd.json_normalize(list_of_recommendations[15], record_path='tracks').id

0    7KSSdFCBHCfq4KPzz78ghk
1    4otQJBpb8okSeykALR3eCH
Name: id, dtype: object

In [22]:
list_of_recommendations_converted = [0]*len(list_of_recs)

l = 0
while l < len(list_of_recs):
    list_of_recommendations_converted.append(pd.json_normalize(list_of_recommendations[l], record_path='tracks').id.tolist())
    l += 1

no_integers = [x for x in list_of_recommendations_converted if not isinstance(x, int)]
list_of_recommendations_converted = [item for elem in no_integers for item in elem]

len(list_of_recommendations_converted)

32

## Create the New Playlist
Next, we are going to create a new playlist and add to it all the tracks we ended up selecting:

In [30]:
# Authorization flow

scope = "playlist-modify-public"
token = util.prompt_for_user_token(username, scope, client_id=cid, client_secret=secret, redirect_uri=redirect_uri)

if token:
    sp = spotipy.Spotify(auth=token)
else:
    print("Can't get token for", username)

In [31]:
# Create new playlist and insert it straight to user's library

def create_playlist(sp, username, playlist_name, playlist_description):
    playlists = sp.user_playlist_create(username, playlist_name, description = playlist_description)

In [32]:
create_playlist(sp, username, 'Spotify Discover Together', 'Choose a friend to discover brand new music with. We create an adventurous playlist curated to both of your tastes!')

In [33]:
# Fetch user's playlist library

def fetch_playlists(sp, username):
    """
    Returns the user's playlists.
    """
        
    id = []
    name = []
    num_tracks = []
    
    # Make the API request
    playlists = sp.user_playlists(username)
    for playlist in playlists['items']:
        id.append(playlist['id'])
        name.append(playlist['name'])
        num_tracks.append(playlist['tracks']['total'])

    # Create the final df   
    df_playlists = pd.DataFrame({"id":id, "name": name, "#tracks": num_tracks})
    return df_playlists

In [35]:
extracted_id = fetch_playlists(sp,username).id[0]

In [36]:
# Finally, fill the new playlist with the recommended songs straight to the user's library!

sp.user_playlist_add_tracks(username, extracted_id, list_of_recommendations_converted, position=None)

{'snapshot_id': 'MywwZTVlZGM2YmE2MTljY2I3MDkwZDEwNzU3M2Q5MDNmOTUxMWQwZjRj'}