# K-pop Song Recommender

**JC Nacpil (last updated: September 17, 2021)**

This notebook pulls your top tracks from Spotify, then recommends songs from a database of more than 11,000 K-pop songs based on similar audio features such as danceability, tempo, and energy. Through this tool, I hope you discover an unfamiliar genre and new songs to enjoy! 😁

## Set-Up

### Installation of Dependencies

Before starting, make sure that you have installed all dependencies in `reqs/requirements.txt`. The following cell installs these packages in one line.

<div class="alert alert-block alert-warning">
<b>Note</b>: Currently, using `conda install` does not work for scikit-learn. You may opt to install using `pip install`, or install it yourself using the Anaconda Navigator or Anaconda Prompt. More details <a href="https://scikit-learn.org/stable/install.html">here</a>.
</div>

In [None]:
!pip install -r reqs/requirements.txt
# !conda install --file reqs/requirements.txt

### Importing Libraries

Run this cell to import all required libraries.

In [None]:
# Library for accessing Spotify API
import spotipy
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.oauth2 import SpotifyOAuth

# Scientific and vector computation for python
import numpy as np

# Data manipulation and analysis
import pandas as pd

# Utility functions written for this notebook
from utils import repeatAPICall

# Progress bar
from tqdm import tqdm

# Cosine similarity calculation
from sklearn.metrics.pairwise import cosine_similarity

# Deep copy of python data structures
from copy import deepcopy


### Spotify API Set-up

**Scope**: This notebook requests the `user-top-read`, `playlist-modify-private`, and `playlist-modify-public` scopes. Once authorized, these give us access to the user's top tracks and the ability to create and modify private and public playlists.

<div class="alert alert-block alert-warning">
<b>Note:</b> To run this notebook, you need to create a Spotify Developer account and set API credentials for your application (a client id, a client secret, and a redirect uri). Get started by going to <a href = "https://developer.spotify.com/dashboard/">My Dashboard</a>. For more information, you can refer to this <a href='https://developer.spotify.com/documentation/web-api/quick-start/'>link.</a> <br><br>
    
Redirect URIs are set after registering the application by going to `Edit Settings`. A redirect URI can be any link and does not have to be reachable, but it must match the `REDIRECT_URI` value used in this notebook exactly. I recommend using `https://open.spotify.com/` so that the authorization page redirects to the Spotify Web Player after completion.
    
Lastly, to give access to a user, add their email address to the `Users and Access` section of the Dashboard. 
</div>

In [None]:
# Spotify API Credentials
CLIENT_ID = ""
CLIENT_SECRET = ""
REDIRECT_URI = 'https://open.spotify.com/'

scope = 'user-top-read playlist-modify-private playlist-modify-public'
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id = CLIENT_ID, client_secret = CLIENT_SECRET, redirect_uri = REDIRECT_URI,  scope=scope))

# Confirm that we have access to Spotify
user_result = sp.me()
user_id = user_result['id']
user_name = user_result['display_name']
print("Hello {}! (user_id: {})".format(user_name,user_id))

## Step 1: Settings

You can modify the following settings before running the recommender system.

`num_recomms` (default 20): Number of recommendations to generate <br>
`similarity_choice` (default 'median'): Whether to rank similar tracks by median or average similarity <br>
`num_toptracks` (default 50): Number of user top tracks to base recommendations on <br>

`time_range` (default 'medium_term'): Time range of top tracks to be considered
* short_term: last 4 weeks
* medium_term: last 6 months
* long_term: all time

`create_playlist` (default True): Whether to create a new playlist in your Spotify profile, using the following settings
* `public` (default False): Whether it is a public (True) or private (False) playlist 
* `collaborative` (default False): Whether it is a collaborative playlist that allows other people to add new songs
* `playlist_description`: Optional playlist description


In [None]:
#### Recommender Options ####
num_recomms = 20 # Maximum of 1000
similarity_choice = 'median' # Can also set to 'average'

#### Top tracks options ####
num_toptracks = 50 # Minimum 1, Maximum 50
time_range = 'medium_term' # Can also be 'short_term' or 'long_term'

#### Playlist options ######
create_playlist = True # If True, the notebook will create a playlist 
playlist_name = 'Kpop Song Recommendations'
public = False
collaborative = False
playlist_description =  "Autogenerated Kpop Playlist"
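
The limits noted in the comments above can be checked before any API call is made. The sketch below is one possible way to do it. `validate_settings` is a hypothetical helper that is not part of this notebook, and the lower bound of 1 for `num_recomms` is my assumption (the notebook only states a maximum of 1000).

```python
# Hypothetical helper for illustration; not part of the notebook's utils module.
VALID_TIME_RANGES = {"short_term", "medium_term", "long_term"}
VALID_SIMILARITY_CHOICES = {"median", "average"}


def validate_settings(num_recomms, similarity_choice, num_toptracks, time_range):
    """Return a list of problems found in the settings (an empty list means OK)."""
    problems = []
    # Assumption: at least 1 recommendation; the notebook only documents the maximum.
    if not 1 <= num_recomms <= 1000:
        problems.append("num_recomms must be between 1 and 1000")
    if similarity_choice not in VALID_SIMILARITY_CHOICES:
        problems.append("similarity_choice must be 'median' or 'average'")
    if not 1 <= num_toptracks <= 50:
        problems.append("num_toptracks must be between 1 and 50")
    if time_range not in VALID_TIME_RANGES:
        problems.append("time_range must be 'short_term', 'medium_term' or 'long_term'")
    return problems


# The default settings pass; three invalid values produce three messages.
default_problems = validate_settings(20, "median", 50, "medium_term")
bad_problems = validate_settings(20, "mode", 51, "weekly")
print(default_problems)
print(bad_problems)
```

Returning a list of messages instead of raising on the first problem lets you report every invalid setting at once.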

**At this point**, we are ready to run the recommender system! You can run the cells one by one, or run all remaining cells by clicking `Run > Run All Cells` in the toolbar above.

## Step 2: Load Song Data 

In [None]:
# Load the track data 
database_dir = 'Data/tracks_top10_features.csv'
database_df = pd.read_csv(database_dir)

# Load user top track data
print("Loading user's top {} tracks data for {}".format(num_toptracks,time_range))
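results, success = repeatAPICall(sp.current_user_top_tracks,{
        'limit': num_toptracks, 
        'time_range': time_range
    })
# Stop early if the API call failed, since the next steps need the results
if not success:
    raise RuntimeError("Could not load top tracks from Spotify. Check your credentials and try again.")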

# Create dataframe for top tracks
df_cols = ['artist','track','track_uri']
artists = []
tracks = []
track_uris = []
for track in results['items']:
    artists.append(track['artists'][0]['name'])
    tracks.append(track['name'])
    track_uris.append(track['uri'])

user_toptracks_df = pd.DataFrame(zip(artists,tracks,track_uris),columns=df_cols)
print('Done! These are your top tracks')
display(user_toptracks_df)
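
The cell above calls `repeatAPICall` with a function and a dictionary of arguments, and unpacks a `(result, success)` pair. The helper itself lives in `utils` and is not shown in this notebook, so the following is only a rough sketch of how such a retry wrapper might look. The name `repeat_api_call_sketch`, the `max_retries` and `wait_seconds` parameters, and the choice to pass the dictionary as keyword arguments are all assumptions, not the actual implementation.

```python
import time


def repeat_api_call_sketch(func, kwargs, max_retries=3, wait_seconds=0.0):
    """Call func(**kwargs), retrying when it raises an exception.

    Returns (result, True) on success, or (None, False) once retries run out.
    Hypothetical stand-in for utils.repeatAPICall, whose code is not shown here.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return func(**kwargs), True
        except Exception as error:
            print("Attempt {} failed: {}".format(attempt, error))
            time.sleep(wait_seconds)
    return None, False


# Demo with a fake API function that fails twice before succeeding.
calls = {"count": 0}


def flaky_lookup(name):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"name": name}


result, success = repeat_api_call_sketch(flaky_lookup, {"name": "demo"})
print(result, success)
```

Returning a success flag rather than raising lets the caller decide what to do on failure, for example skipping a batch and moving on.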

### Preprocessing: Getting track features

In [None]:
tracks = user_toptracks_df.track.values
track_uris = user_toptracks_df.track_uri.values

batch_size = 10

# This list of columns is taken directly from the keys of a feature dictionary
features_cols = ['danceability', 
                 'energy', 
                 'key', 
                 'loudness', 
                 'mode', 
                 'speechiness', 
                 'acousticness', 
                 'instrumentalness', 
                 'liveness', 
                 'valence', 
                 'tempo', 
                 'type', 
                 'id', 
                 'uri', 
                 'track_href', 
                 'analysis_url', 
                 'duration_ms', 
                 'time_signature']

features_df = pd.DataFrame(columns = features_cols)

for i in tqdm(range(0, len(track_uris), batch_size)):
    
    # Select the current batch
    track_uris_batch = track_uris[i:i+batch_size]
    
    features_result, success = repeatAPICall(sp.audio_features,{'tracks':track_uris_batch})
    if not success: 
        print("Skipping to next batch.")
        continue
    
    # Deepcopy the list of dictionaries to be modified
    # This is necessary for this particular structure
    features_dicts = deepcopy(features_result)
    
    
    # Drop None in features_dict
    # This will mean that some of our songs will not have features
    if any(d is None for d in features_dicts):
        
        print("Batch: {} to {} | Some songs do not have features; dropping from list.".format(i+1, i+len(track_uris_batch))) 
        print("Count: {}".format(len(features_dicts)))
        features_dicts = [d for d in features_dicts if d is not None]
        print("New count: {}".format(len(features_dicts)))

    temp_df = pd.DataFrame.from_records(features_dicts) 
    features_df = pd.concat([features_df.reset_index(drop=True), temp_df.reset_index(drop=True)])
    
    temp_df_count = len(temp_df.index)
    if temp_df_count != batch_size:
        print("Batch: {} to {} | Dataframe rows count: {}".format(i+1, i+len(track_uris_batch), temp_df_count)) 


# Rename 'uri' to 'track_uri', then drop duplicates based on track_uri
features_df = features_df.rename(columns={'uri':'track_uri'}).drop_duplicates(subset=['track_uri'])

# Merge features to track_df by track_uri
# Note: some rows will not have features. We keep them for now to retain the track info
user_toptracks_df = user_toptracks_df.merge(features_df, on='track_uri', how='left').reset_index(drop = True)
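
To make the note about missing features concrete, here is a small sketch with made-up data. It assumes the API returns `None` for a track without audio features, as the batch loop above handles. The track names, URIs, and feature values are invented for illustration.

```python
import pandas as pd

# Toy track list standing in for user_toptracks_df
tracks_df = pd.DataFrame({
    "track": ["Song A", "Song B", "Song C"],
    "track_uri": ["uri:a", "uri:b", "uri:c"],
})

# Simulated API response: the second track has no audio features
api_result = [
    {"uri": "uri:a", "energy": 0.8, "tempo": 120.0},
    None,
    {"uri": "uri:c", "energy": 0.4, "tempo": 95.0},
]

# Drop the None entries, then build the features table
feature_records = [d for d in api_result if d is not None]
features = pd.DataFrame.from_records(feature_records).rename(columns={"uri": "track_uri"})

# A left merge keeps every track; tracks without features get NaN
merged = tracks_df.merge(features, on="track_uri", how="left")
missing = merged[merged["energy"].isna()]["track"].tolist()
print(merged)
print("Tracks without features:", missing)
```

Because the left merge keeps "Song B" with `NaN` features, any later step that needs numeric input has to drop or fill those rows first.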

### Preprocessing: Feature Scaling and Mean Normalization

In [None]:
# Only features in this list will be modified
target_features = [
    'danceability', 
    'energy', 
    'loudness', 
    'speechiness', 
    'acousticness', 
    'liveness', 
    'valence', 
    'tempo'
]

# Database with normalized features
database_normalized_df = database_df.copy()
database_normalized_df[target_features] = (database_df[target_features] - database_df[target_features].mean()) / database_df[target_features].std()

# Top tracks with normalized features
user_toptracks_normalized_df = user_toptracks_df.copy()
user_toptracks_normalized_df[target_features] = (user_toptracks_normalized_df[target_features] - database_df[target_features].mean()) / database_df[target_features].std()
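
Note that both tables are normalized with the mean and standard deviation of the database, so the user's tracks are expressed on the same scale as the songs they are compared against. A minimal sketch with made-up numbers (two features, four database songs, one user track):

```python
import pandas as pd

features = ["energy", "tempo"]

# Toy stand-ins for database_df and user_toptracks_df
database = pd.DataFrame({
    "energy": [0.2, 0.4, 0.6, 0.8],
    "tempo": [90.0, 110.0, 130.0, 150.0],
})
user = pd.DataFrame({"energy": [0.5], "tempo": [120.0]})

# Statistics come from the database only
mean = database[features].mean()
std = database[features].std()  # pandas default: sample standard deviation (ddof=1)

database_z = (database[features] - mean) / std
user_z = (user[features] - mean) / std

print(database_z)
print(user_z)
```

The toy user track sits exactly at the database mean for both features, so its normalized values are 0. A value of 1 would mean one database standard deviation above the average K-pop song.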

## Step 3: Generate Recommendations

In [None]:
# List of top tracks uris 
user_tracks = user_toptracks_normalized_df.track_uri.values

# Filter out songs in database if they already occur in the user's top tracks
database_excluser_df = database_normalized_df[~database_normalized_df['track_uri'].isin(user_tracks)].dropna().reset_index(drop=True)
database_excluser_df.drop_duplicates('track_uri',inplace=True)

# Get the similarity values between the items of the two dataframes
# Similarity values has a shape of (number of database songs x number of top tracks)
database_features = database_excluser_df[target_features].values
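# Drop top tracks without audio features, since cosine_similarity raises an error on NaN input
toptracks_features = user_toptracks_normalized_df[target_features].dropna().astype(float).values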
similarity_values = cosine_similarity(database_features,toptracks_features)

# We take the mean across toptracks
ave_similarity_values = similarity_values.mean(axis = 1)

# Do a similar calculation for median
median_similarity_values = np.median(similarity_values, axis = 1)

# Store similarity values per track
database_similarity_df = database_excluser_df[df_cols].copy()
database_similarity_df['ave_similarity'] = ave_similarity_values
database_similarity_df['median_similarity'] = median_similarity_values

# Take first n songs with highest average (median) similarity values
if similarity_choice == 'average':
    recomms_df = database_similarity_df.sort_values(by = 'ave_similarity', ascending = False).head(num_recomms).reset_index(drop=True)
else:
    recomms_df = database_similarity_df.sort_values(by = 'median_similarity', ascending = False).head(num_recomms).reset_index(drop=True)

print("Here are your recommendations!")
display(recomms_df)
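
The difference between `'median'` and `'average'` is easiest to see on a tiny example. The sketch below uses made-up two-dimensional feature vectors and computes cosine similarity with plain NumPy to stay self-contained. `cosine_similarity_matrix` is a hypothetical helper that assumes no all-zero rows; the notebook itself uses scikit-learn's `cosine_similarity`.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Cosine similarity between every row of a and every row of b.

    Hypothetical helper for illustration; assumes no row is all zeros.
    """
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Three candidate songs (rows) with two made-up features
database_features = np.array([
    [1.0, 0.0],  # candidate 0: matches most top tracks exactly
    [0.0, 1.0],  # candidate 1: matches only one top track
    [1.0, 1.0],  # candidate 2: moderately similar to every top track
])

# Three user top tracks
toptracks_features = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
])

# Shape: (number of candidates, number of top tracks)
similarity = cosine_similarity_matrix(database_features, toptracks_features)

mean_scores = similarity.mean(axis=1)
median_scores = np.median(similarity, axis=1)

# Candidate indices sorted from most to least similar
mean_ranking = np.argsort(-mean_scores).tolist()
median_ranking = np.argsort(-median_scores).tolist()
print("Ranking by average:", mean_ranking)
print("Ranking by median: ", median_ranking)
```

With the average, candidate 2 ranks first because it is moderately similar to everything. With the median, candidate 0 ranks first because it closely matches the majority of the top tracks and the one dissimilar track is ignored.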

## Step 4: Save as new playlist

In [None]:
if not create_playlist:
    print("You have set the notebook to not save a playlist. To save a playlist to your personal profile, set create_playlist = True")
else: 
    print("""Creating playlist \nName: {}\nPublic: {}\nCollaborative: {}""".format(playlist_name,public,collaborative))
    print("Description: ", playlist_description)
    
    # Get list of track uris from recomms_df
    tracks_to_add = recomms_df.track_uri.values.tolist()
    
    playlist_result = sp.user_playlist_create(user_id, playlist_name, public, collaborative, description = playlist_description)
    playlist_link = playlist_result['external_urls']['spotify']
    playlist_id = playlist_result['id']
    sp.playlist_change_details(playlist_id, description = playlist_description)
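    # Add tracks in batches of 100, the per-request limit of the Spotify Web API
    for start in range(0, len(tracks_to_add), 100):
        sp.playlist_add_items(playlist_id, tracks_to_add[start:start+100])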
    print("Here's the link to your new playlist: {}".format(playlist_link))

## Done! Happy Listening! 🎵