<img src='gnod.png' width='75' align='left'/> <h1> Song recommender project </h1>
_____
                                                                                Nelson Lage
                                                                                       Gnod

In [1]:
import config
import pickle
import pandas as pd
from operator import itemgetter
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from IPython.display import IFrame, Image

%config Completer.use_jedi = False

## The data

* Hot songs - 225 unique songs obtained by web scraping the US, UK and Germany official charts
* Recommendation songs - 34358 unique songs from different genres and countries retrieved through the Spotify API

* 13 features:
    * **danceability:** how suitable a track is for dancing, based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. Ranges from 0.0 to 1.0.
    * **energy:** a perceptual measure of intensity and activity, from 0.0 to 1.0. Typically, energetic tracks feel fast, loud, and noisy.
    * **key:** integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D. Ranges from -1 (no key detected) to 11.
    * **loudness:** loudness in decibels (dB), averaged across the entire track. Values typically range between -60 and 0 dB.
    * **mode:** the modality of a track (major = 1, minor = 0), i.e. the type of scale from which its melodic content is derived.
    * **speechiness:** detects the presence of spoken words in a track. The more exclusively speech-like the recording, the closer the value is to 1.0.
    * **acousticness:** confidence measure from 0.0 to 1.0 of whether the track is acoustic.
    * **instrumentalness:** predicts whether a track contains no vocals (0.0 to 1.0).
    * **liveness:** detects the presence of an audience in the recording (0.0 to 1.0).
    * **valence:** measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track (the closer to 1.0, the happier).
    * **tempo:** overall estimated tempo of a track in beats per minute (BPM).
    * **duration_ms:** duration of the track in milliseconds.
    * **time_signature:** a notational convention to specify how many beats are in each bar (or measure). It ranges from 3 to 
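
The `key` and `mode` encodings described above can be turned into a readable label. This is a minimal sketch; `PITCH_CLASSES` and `describe_key` are illustrative names that are not part of the project.

```python
# Standard pitch-class names, indexed by the integer stored in `key`.
PITCH_CLASSES = ['C', 'C♯/D♭', 'D', 'D♯/E♭', 'E', 'F',
                 'F♯/G♭', 'G', 'G♯/A♭', 'A', 'A♯/B♭', 'B']


def describe_key(key, mode):
    """Return a label such as 'C major' for a (key, mode) pair.

    `key` is -1 when no key was detected; `mode` is 1 for major, 0 for minor.
    """
    if key == -1:
        return 'no key detected'
    scale = 'major' if mode == 1 else 'minor'
    return f'{PITCH_CLASSES[key]} {scale}'


label = describe_key(9, 0)
```

Such a helper is only useful on the raw API values, not on the scaled features used for clustering.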

## Choosing k (K-Means)

### Elbow method
<img src='elbow.png' width='800' align='left'/>

### Silhouette method
<img src='silhouette.png' width='800' align='left'/>
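
The two plots above come from scanning a range of k values. The sketch below shows one way such a scan could be done with scikit-learn. It runs on synthetic data from `make_blobs`, not on the project's songs, and `scan_k` is a hypothetical helper name.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 13 scaled audio features (assumption).
X, _ = make_blobs(n_samples=300, centers=4, n_features=13, random_state=0)
X_scaled = StandardScaler().fit_transform(X)


def scan_k(X, k_values, random_state=0):
    """Fit K-Means for each k and collect inertia and silhouette score."""
    inertias, silhouettes = {}, {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        inertias[k] = km.inertia_                          # elbow method
        silhouettes[k] = silhouette_score(X, km.labels_)   # silhouette method
    return inertias, silhouettes


inertias, silhouettes = scan_k(X_scaled, range(2, 9))
best_k = max(silhouettes, key=silhouettes.get)  # k with the highest silhouette
```

Plotting `inertias` against k gives the elbow curve, and plotting `silhouettes` gives the silhouette curve. The two methods do not always agree, so the final k is a judgment call.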

In [2]:
with open('Model/scaler.pickle', 'rb') as f: 
    scaler = pickle.load(f)

with open('Model/kmeans_10.pickle', 'rb') as f: 
    model = pickle.load(f)
    
songs_scaled = pd.read_pickle('songs_clustered.pickle')
hot_songs = pd.read_pickle('hot_songs.pickle')
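
For context, the sketch below shows how a scaler and a 10-cluster model like the ones loaded above might be fitted and pickled. It assumes they are a scikit-learn `StandardScaler` and `KMeans`, which the notebook does not state explicitly, and it uses random data and a temporary directory in place of the real songs and the `Model/` folder.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Random stand-in for the raw 13-feature song matrix (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 13))

scaler_demo = StandardScaler().fit(X)
X_scaled = scaler_demo.transform(X)
model_demo = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_scaled)

# Round trip through pickle, mirroring the load step in the notebook.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'kmeans_10.pickle'
    with open(path, 'wb') as f:
        pickle.dump(model_demo, f)
    with open(path, 'rb') as f:
        model_loaded = pickle.load(f)

# The reloaded model should assign every point to the same cluster.
same_labels = np.array_equal(model_demo.predict(X_scaled),
                             model_loaded.predict(X_scaled))
```

Pickled estimators are generally only safe to load with the same scikit-learn version that created them.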

In [3]:
songs_scaled

Unnamed: 0,id,title,artists,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,cluster
0,33xMbeHzmWd6Od0BmLZEUs,2k,nosaj thing,-1.599636,-0.857371,0.487216,-1.256731,-1.388730,0.134288,-0.552637,-0.176626,-0.492752,-1.535655,-0.922147,-0.455235,-2.650057,4
1,3UnyplmZaq547hwsfOR5yy,4 billion souls,the doors,-0.939541,-0.335376,-0.071465,-0.829752,0.720082,-0.502148,-0.412299,0.884016,-0.399305,0.560588,1.072669,-0.244593,0.191635,9
2,1w8QCSDH4QobcQeT4uMKLm,4 minute warning,radiohead,-1.333175,-1.479415,1.045897,-1.190656,0.720082,-0.528050,1.066146,-0.164413,-0.505212,-1.133145,0.090762,-0.027275,0.191635,5
3,7J9mBHG4J2eIfDAv5BehKA,7 element,vitas,0.925682,0.621614,-0.071465,0.329052,-1.388730,-0.186397,0.201272,0.052680,0.734522,1.803988,0.301099,-0.000891,0.191635,1
4,1VZedwJj1gyi88WFRhfThb,#9 dream,r.e.m.,-0.019041,0.356267,-1.468168,0.505569,0.720082,-0.609455,-0.784032,-0.431504,-0.624201,-0.487534,-0.158889,0.131522,0.191635,8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34353,6pWgRkpqVfxnj3WuIcJ7WP,cornfield chase,hans zimmer,-2.386905,-1.810011,1.045897,-1.996667,-1.388730,-0.412110,2.244332,3.425302,-0.574986,-1.830564,-0.967846,-0.574677,0.191635,2
34354,6VfNTf0N1HwfFKl7Y18diU,omen,the prodigy,-0.176495,1.352407,0.487216,0.695204,0.720082,-0.386209,-0.856352,0.017220,0.553857,0.201915,0.670438,-0.159118,0.191635,9
34355,28d1X9lfagOD4iFULH4qEK,dark star - homemade weapons remix,"quadrant, iris, homemade weapons",-1.436126,1.456806,1.045897,1.453031,0.720082,0.273662,-0.843365,2.991904,0.423031,-1.789915,1.735860,0.445923,0.191635,6
34356,5HiSc2ZCGn8L3cH3qSwzBT,러시안 룰렛 russian roulette,red velvet,0.919626,1.134909,0.487216,1.385764,-1.388730,-0.489814,-0.679921,-0.442871,-0.212410,1.796017,0.313121,-0.181434,0.191635,1


In [4]:
hot_songs

Unnamed: 0,song,artist
0,all too well (taylor's version),taylor swift
1,easy on me,adele
2,stay,the kid laroi & justin bieber
3,industry baby,lil nas x & jack harlow
4,smokin out the window,silk sonic (bruno mars & anderson .paak)
...,...,...
220,in da getto,j balvin & skrillex
221,stimmen,musso
222,wasted love,ofenbach feat. lagique
223,mond,montez x badmómzjay
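
Since the titles in `hot_songs` are stored in lowercase, user input has to be normalised before it is compared with them. The sketch below illustrates that lookup on three rows copied from the table above; `another_hot_song` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Three rows from the hot list, used as a small stand-in.
hot_demo = pd.DataFrame({
    'song': ['easy on me', 'stay', 'industry baby'],
    'artist': ['adele', 'the kid laroi & justin bieber',
               'lil nas x & jack harlow'],
})


def another_hot_song(user_song, hot, random_state=None):
    """Return a random hot song other than `user_song`, or None if not hot."""
    query = user_song.strip().lower()
    if query not in hot['song'].values:
        return None
    others = hot[hot['song'] != query]  # never suggest the input itself
    return others.sample(1, random_state=random_state).iloc[0]


pick = another_hot_song('  Stay ', hot_demo, random_state=0)
miss = another_hot_song('irreplaceable', hot_demo)
```

Exact matching after lowercasing is strict: a title typed without "(taylor's version)", for example, would not match.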


In [5]:
feat_names = ['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness',
              'instrumentalness', 'liveness', 'valence', 'tempo', 'duration_ms', 'time_signature']

In [6]:
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=config.client_id,
                                                           client_secret=config.client_secret))

In [7]:
def spotify_player(track_id):
    """Return an embedded Spotify player for the given track ID."""
    return IFrame(src=f"https://open.spotify.com/embed/track/{track_id}",
                  width="320",
                  height="80",
                  frameborder="0",
                  allowtransparency="true",
                  allow="encrypted-media")

In [10]:
def song_suggestion():
    """Suggest another hot song, or embed a song from the input's cluster."""
    user_song = input('Enter a song: ').strip()
    query = user_song.lower()

    # Case 1: the song is in the hot list, so suggest a different hot song.
    if query in hot_songs['song'].values:
        suggestion = hot_songs[hot_songs['song'] != query].sample(1).iloc[0]
        print('Here\'s another hot song for you: ' +
              '\033[1m' + suggestion['song'].title() + '\033[0m' +
              ' by ' + suggestion['artist'].title())
        return None

    # Case 2: look the song up on Spotify.
    try:
        responses = sp.search(q=user_song, type='track', limit=25, market='US')
        tracks = responses['tracks']['items']
    except spotipy.SpotifyException:
        tracks = []

    # Keep the first track ID found for each distinct artist line-up.
    artists_song_id = {}
    for track in tracks:
        artists = ', '.join(artist['name'] for artist in track['artists'])
        artists_song_id.setdefault(artists, track['id'])
    artists_list = list(artists_song_id)

    if not artists_list:
        print('It seems like this song doesn\'t exist. Please, try again!')
        return song_suggestion()

    print('\nUnfortunately the song is not in the hot list!\n')
    print('Maybe we can recommend you something else based on your input.')
    print('First, we need to know the artist/s from your song:\n')
    print(*enumerate(artists_list), sep='\n')
    artist_number = input('\nPlease, choose a number from the list: ')

    # Ask again until the input is a valid index into the list.
    while True:
        try:
            index = int(artist_number)
            if not 0 <= index < len(artists_list):
                raise IndexError
            break
        except (ValueError, IndexError):
            artist_number = input('Please, enter a valid number: ')

    id_ = artists_song_id[artists_list[index]]

    # Scale the track's audio features the same way as the training data.
    all_feats = sp.audio_features(id_)[0]
    selected_feats = {name: all_feats[name] for name in feat_names}
    features_df = pd.DataFrame([selected_feats])
    feat_scaled = pd.DataFrame(scaler.transform(features_df), columns=feat_names)

    # Recommend a random song from the predicted cluster.
    cluster_recommended = model.predict(feat_scaled)[0]
    same_cluster = songs_scaled['cluster'] == cluster_recommended
    recommended_id = songs_scaled.loc[same_cluster, 'id'].sample(1).values[0]

    return spotify_player(recommended_id)
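
The scale, predict and sample steps at the end of the function can be isolated so that they run without Spotify credentials. The sketch below does this on random data with three features and three clusters. `FEATS`, `recommend_from_cluster` and the `track_<n>` IDs are hypothetical, and the scaler and model are assumed to be scikit-learn objects.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

FEATS = ['danceability', 'energy', 'tempo']  # shortened stand-in for feat_names


def recommend_from_cluster(raw_features, scaler, model, catalog,
                           random_state=None):
    """Return (track id, cluster) for a random catalog song in the same cluster."""
    features_df = pd.DataFrame([raw_features])[FEATS]  # enforce column order
    scaled = pd.DataFrame(scaler.transform(features_df), columns=FEATS)
    cluster = int(model.predict(scaled)[0])
    candidates = catalog.loc[catalog['cluster'] == cluster, 'id']
    return candidates.sample(1, random_state=random_state).iloc[0], cluster


# Toy catalog with random features (assumption, not the project's data).
rng = np.random.default_rng(0)
raw = pd.DataFrame({'danceability': rng.uniform(0, 1, 200),
                    'energy': rng.uniform(0, 1, 200),
                    'tempo': rng.uniform(60, 200, 200)})
scaler_demo = StandardScaler().fit(raw)
scaled_demo = pd.DataFrame(scaler_demo.transform(raw), columns=FEATS)
model_demo = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled_demo)
catalog = pd.DataFrame({'id': [f'track_{i}' for i in range(200)],
                        'cluster': model_demo.labels_})

track_id, cluster = recommend_from_cluster(
    {'danceability': 0.8, 'energy': 0.7, 'tempo': 120.0},
    scaler_demo, model_demo, catalog, random_state=0)
```

Keeping this step free of `input()` and API calls makes it easy to test, while the interactive function only has to collect the track ID and features.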

## Our recommender

In [13]:
song_suggestion()

Enter a song: irreplaceable

Unfortunately the song is not in the hot list!

Maybe we can recommend you something else based on your input.
First, we need to know the artist/s from your song:

(0, 'Beyoncé')
(1, 'Madilyn Paige')
(2, 'NCT DREAM')
(3, 'C Blu')
(4, 'Riley Clemmons')
(5, 'Ed Patrick')
(6, 'JUNG')
(7, 'Tundra Beats, Darkforestdrives')
(8, 'IcyDaRabbit, CG5')
(9, 'Zak Manley')
(10, 'Sugarland')
(11, 'Kendra Logozar')
(12, 'Beyoncé, DJ Speedy, Ghost')
(13, 'Vybz Kartel, Lisa Hyper')
(14, 'Rockabye Baby!')
(15, 'Anthem Lights')

Please, choose a number from the list: 0
