# Song Recommender Project

The goal of this notebook is to develop a Song Recommender based on the Billboard Hot 100 chart (https://www.billboard.com/charts/hot-100/).

When the user enters the name of a song included in the hot list, the Song Recommender will suggest another song from the hot list.

If the song entered by the user is not on the Billboard list, the Song Recommender connects to Spotify, retrieves the audio features of that song, and recommends another song with similar features.

Song Recommender will base its recommendations on the following features: 

* **Danceability**: Danceability describes how suitable for dancing a track is, based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

* **Acousticness**: A measure from 0.0 to 1.0 of whether the track is acoustic.

* **Energy**: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.

* **Instrumentalness**: Predicts whether a track contains no vocals. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content.

* **Liveness**: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.

* **Loudness**: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track. Values typically range between -60 and 0 dB.

* **Speechiness**: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.

* **Tempo**: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

* **Valence**: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
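
The ranges stated above can be turned into a quick sanity check on a row of audio features. The sketch below is illustrative only: `EXPECTED_RANGES` and `out_of_range` are hypothetical names that are not part of this notebook, and only the features whose bounds are given explicitly in the list are covered.

```python
# Bounds taken from the feature descriptions above
# (loudness is a typical range, not a hard limit).
EXPECTED_RANGES = {
    "danceability": (0.0, 1.0),
    "acousticness": (0.0, 1.0),
    "energy": (0.0, 1.0),
    "valence": (0.0, 1.0),
    "loudness": (-60.0, 0.0),
}

def out_of_range(features):
    """Return the names of features whose value falls outside its expected range."""
    flagged = []
    for name, (low, high) in EXPECTED_RANGES.items():
        value = features.get(name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged

# Values copied from the "lose yourself" row displayed in the "Connect with Spotify" section.
lose_yourself = {"danceability": 0.689, "energy": 0.735, "loudness": -4.545,
                 "acousticness": 0.00922, "valence": 0.059}
print(out_of_range(lose_yourself))
```

An empty list means every checked feature is inside its documented range.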


In [1]:
# Import all necessary libraries:

import json
import pickle
import random
from random import randint
from time import sleep

import numpy as np
import pandas as pd
import spotipy  # imported without the "sp" alias: "sp" is used below for the Spotify client
from spotipy.oauth2 import SpotifyClientCredentials
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.metrics import pairwise_distances_argmin_min
from matplotlib import pyplot
from IPython.display import Image
from IPython.display import display
from IPython.display import IFrame

import config  # local module holding client_id and client_secret

In [2]:
# Helper to load the pickled objects saved by the "Spotify Clustering" notebook (available in the repository)

def load(filename = "filename.pickle"): 
    try: 
        with open(filename, "rb") as f: 
            return pickle.load(f) 
    except FileNotFoundError: 
        print("File not found!")
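
Because `load` only prints a message when the file is missing, it returns `None` in that case, so the caller should check the result before using it. The sketch below shows a full round trip; `save` is a hypothetical counterpart written for illustration (the code that actually wrote the pickles lives in the "Spotify Clustering" notebook and is not shown here), and the dictionary is a placeholder object.

```python
import os
import pickle
import tempfile

def save(obj, filename):
    """Hypothetical counterpart to load(): pickle obj to filename."""
    with open(filename, "wb") as f:
        pickle.dump(obj, f)

def load(filename="filename.pickle"):
    """Same behaviour as the helper above: returns None if the file is missing."""
    try:
        with open(filename, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        print("File not found!")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.pickle")
    save({"n_clusters": 6}, path)                               # placeholder object
    restored = load(path)                                       # the object comes back unchanged
    missing = load(os.path.join(tmp, "does_not_exist.pickle"))  # prints a message, returns None

print(restored)
print(missing is None)
```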

In [3]:
# Load the fitted scaler saved by the "Spotify Clustering" notebook (available in the repository)
scaler2 = load("Model/scaler.pickle")
scaler2

StandardScaler()

In [4]:
# Load the fitted KMeans model saved by the "Spotify Clustering" notebook (available in the repository)
kmeans2 = load("Model/kmeans_4.pickle")
kmeans2

KMeans(n_clusters=6, random_state=1234)

# Loading data

In [5]:

billboard_df = pd.read_csv("Data/hot100.csv")  # exported by the "Billboard - The Hot 100" notebook (available in the repository)

clus_df = pd.read_csv("Data/clustered_df_v2.csv")  # exported by the "Spotify Clustering" notebook (available in the repository)

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=config.client_id,
                                                           client_secret=config.client_secret)) # Credentials to connect to Spotify
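
The `config` module itself is not shown in this notebook. One possible way to keep the credentials out of the repository is to read them from environment variables, as in the sketch below. `read_credentials` is a hypothetical helper; the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are the ones spotipy documents for its own credential lookup, and the values used here are placeholders.

```python
import os

def read_credentials(env=None):
    """Return (client_id, client_secret) from environment variables, or raise if missing."""
    env = os.environ if env is None else env
    client_id = env.get("SPOTIPY_CLIENT_ID")
    client_secret = env.get("SPOTIPY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET first.")
    return client_id, client_secret

# Demonstration with a plain dict standing in for os.environ (placeholder values).
client_id, client_secret = read_credentials(
    {"SPOTIPY_CLIENT_ID": "my-id", "SPOTIPY_CLIENT_SECRET": "my-secret"}
)
print(client_id)
```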


In [6]:
#Let's check our DataFrame
billboard_df

Unnamed: 0,artists,titles
0,glass animals,heat waves
1,"carolina gaitan, mauro castillo, adassa, rhenz...",we don't talk about bruno
2,gayle,abcdefu
3,kodak black,super gremlin
4,the kid laroi & justin bieber,stay
...,...,...
95,mary j. blige,good morning gorgeous
96,king von & 21 savage,don't play that
97,saweetie featuring h.e.r.,closer
98,blake shelton,come back as a country boy


In [7]:
billboard_df.rename(columns = {'artists' : 'artist', 'titles' : 'song_name'}, inplace = True)
billboard_df

Unnamed: 0,artist,song_name
0,glass animals,heat waves
1,"carolina gaitan, mauro castillo, adassa, rhenz...",we don't talk about bruno
2,gayle,abcdefu
3,kodak black,super gremlin
4,the kid laroi & justin bieber,stay
...,...,...
95,mary j. blige,good morning gorgeous
96,king von & 21 savage,don't play that
97,saweetie featuring h.e.r.,closer
98,blake shelton,come back as a country boy


In [8]:
clus_df

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,...,time_signature,cluster,type,id,uri,track_href,analysis_url,track_name,artist_name,artist_id
0,0.776,0.378,0,-8.035,1,0.0322,0.4350,0.001870,0.1100,0.453,...,4,1,audio_features,2EgfLUS0jNiujIWc3ZLEtn,spotify:track:2EgfLUS0jNiujIWc3ZLEtn,https://api.spotify.com/v1/tracks/2EgfLUS0jNiu...,https://api.spotify.com/v1/audio-analysis/2Egf...,Tangerine,Tim Atlas,3CiuXDKttPUT0tWGHicFUH
1,0.663,0.697,0,-5.503,1,0.0508,0.2720,0.008860,0.1530,0.873,...,4,1,audio_features,2ngRZDAluwYoJeuqEA4dhK,spotify:track:2ngRZDAluwYoJeuqEA4dhK,https://api.spotify.com/v1/tracks/2ngRZDAluwYo...,https://api.spotify.com/v1/audio-analysis/2ngR...,Sidestep,Tim Atlas,3CiuXDKttPUT0tWGHicFUH
2,0.596,0.675,9,-7.790,1,0.0517,0.1880,0.739000,0.1020,0.155,...,4,4,audio_features,3tcJ3yUXKtJpsgpAyVzP7R,spotify:track:3tcJ3yUXKtJpsgpAyVzP7R,https://api.spotify.com/v1/tracks/3tcJ3yUXKtJp...,https://api.spotify.com/v1/audio-analysis/3tcJ...,Crime of Passion,Tim Atlas,3CiuXDKttPUT0tWGHicFUH
3,0.593,0.274,2,-15.402,1,0.2780,0.9340,0.000569,0.0758,0.346,...,4,0,audio_features,6W4osAjSVCvUwOlVFBP76n,spotify:track:6W4osAjSVCvUwOlVFBP76n,https://api.spotify.com/v1/tracks/6W4osAjSVCvU...,https://api.spotify.com/v1/audio-analysis/6W4o...,Together Lonely,Tim Atlas,3CiuXDKttPUT0tWGHicFUH
4,0.871,0.281,9,-10.650,0,0.0466,0.7670,0.000180,0.1470,0.541,...,4,3,audio_features,1vdpFZ4rsQevl8WC6m3m9y,spotify:track:1vdpFZ4rsQevl8WC6m3m9y,https://api.spotify.com/v1/tracks/1vdpFZ4rsQev...,https://api.spotify.com/v1/audio-analysis/1vdp...,Small Talk,Tim Atlas,3CiuXDKttPUT0tWGHicFUH
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
222308,0.906,0.724,6,-4.109,0,0.0931,0.4460,0.000000,0.0848,0.744,...,4,3,audio_features,4dUlJyHgdz6IeCJIYMHHDm,spotify:track:4dUlJyHgdz6IeCJIYMHHDm,https://api.spotify.com/v1/tracks/4dUlJyHgdz6I...,https://api.spotify.com/v1/audio-analysis/4dUl...,Me Gusta - Remix (feat. Cardi B & 24kGoldn),Cardi B,4kYSro6naA4h99UJvo89HB
222309,0.934,0.443,1,-7.541,1,0.4100,0.0272,0.000000,0.0889,0.359,...,4,5,audio_features,051wt8AyLFgYnVuberd3vO,spotify:track:051wt8AyLFgYnVuberd3vO,https://api.spotify.com/v1/tracks/051wt8AyLFgY...,https://api.spotify.com/v1/audio-analysis/051w...,WAP (feat. Megan Thee Stallion),Cardi B,4kYSro6naA4h99UJvo89HB
222310,0.903,0.447,6,-11.554,1,0.1160,0.0873,0.000000,0.1360,0.239,...,4,1,audio_features,3DyiAk1BzIF8rq9rimypG4,spotify:track:3DyiAk1BzIF8rq9rimypG4,https://api.spotify.com/v1/tracks/3DyiAk1BzIF8...,https://api.spotify.com/v1/audio-analysis/3Dyi...,La Bebe - Remix,Cardi B,4kYSro6naA4h99UJvo89HB
222311,0.805,0.835,0,-4.603,1,0.0896,0.1300,0.000005,0.3650,0.722,...,4,1,audio_features,1EJgymgJHcjSOGSHcYaxvW,spotify:track:1EJgymgJHcjSOGSHcYaxvW,https://api.spotify.com/v1/tracks/1EJgymgJHcjS...,https://api.spotify.com/v1/audio-analysis/1EJg...,South of the Border (feat. Camila Cabello & Ca...,Cardi B,4kYSro6naA4h99UJvo89HB


In [9]:
clus_df.rename(columns = {'artist_name' : 'artist', 'track_name' : 'song_name'}, inplace = True)
clus_df

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,...,time_signature,cluster,type,id,uri,track_href,analysis_url,song_name,artist,artist_id
0,0.776,0.378,0,-8.035,1,0.0322,0.4350,0.001870,0.1100,0.453,...,4,1,audio_features,2EgfLUS0jNiujIWc3ZLEtn,spotify:track:2EgfLUS0jNiujIWc3ZLEtn,https://api.spotify.com/v1/tracks/2EgfLUS0jNiu...,https://api.spotify.com/v1/audio-analysis/2Egf...,Tangerine,Tim Atlas,3CiuXDKttPUT0tWGHicFUH
1,0.663,0.697,0,-5.503,1,0.0508,0.2720,0.008860,0.1530,0.873,...,4,1,audio_features,2ngRZDAluwYoJeuqEA4dhK,spotify:track:2ngRZDAluwYoJeuqEA4dhK,https://api.spotify.com/v1/tracks/2ngRZDAluwYo...,https://api.spotify.com/v1/audio-analysis/2ngR...,Sidestep,Tim Atlas,3CiuXDKttPUT0tWGHicFUH
2,0.596,0.675,9,-7.790,1,0.0517,0.1880,0.739000,0.1020,0.155,...,4,4,audio_features,3tcJ3yUXKtJpsgpAyVzP7R,spotify:track:3tcJ3yUXKtJpsgpAyVzP7R,https://api.spotify.com/v1/tracks/3tcJ3yUXKtJp...,https://api.spotify.com/v1/audio-analysis/3tcJ...,Crime of Passion,Tim Atlas,3CiuXDKttPUT0tWGHicFUH
3,0.593,0.274,2,-15.402,1,0.2780,0.9340,0.000569,0.0758,0.346,...,4,0,audio_features,6W4osAjSVCvUwOlVFBP76n,spotify:track:6W4osAjSVCvUwOlVFBP76n,https://api.spotify.com/v1/tracks/6W4osAjSVCvU...,https://api.spotify.com/v1/audio-analysis/6W4o...,Together Lonely,Tim Atlas,3CiuXDKttPUT0tWGHicFUH
4,0.871,0.281,9,-10.650,0,0.0466,0.7670,0.000180,0.1470,0.541,...,4,3,audio_features,1vdpFZ4rsQevl8WC6m3m9y,spotify:track:1vdpFZ4rsQevl8WC6m3m9y,https://api.spotify.com/v1/tracks/1vdpFZ4rsQev...,https://api.spotify.com/v1/audio-analysis/1vdp...,Small Talk,Tim Atlas,3CiuXDKttPUT0tWGHicFUH
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
222308,0.906,0.724,6,-4.109,0,0.0931,0.4460,0.000000,0.0848,0.744,...,4,3,audio_features,4dUlJyHgdz6IeCJIYMHHDm,spotify:track:4dUlJyHgdz6IeCJIYMHHDm,https://api.spotify.com/v1/tracks/4dUlJyHgdz6I...,https://api.spotify.com/v1/audio-analysis/4dUl...,Me Gusta - Remix (feat. Cardi B & 24kGoldn),Cardi B,4kYSro6naA4h99UJvo89HB
222309,0.934,0.443,1,-7.541,1,0.4100,0.0272,0.000000,0.0889,0.359,...,4,5,audio_features,051wt8AyLFgYnVuberd3vO,spotify:track:051wt8AyLFgYnVuberd3vO,https://api.spotify.com/v1/tracks/051wt8AyLFgY...,https://api.spotify.com/v1/audio-analysis/051w...,WAP (feat. Megan Thee Stallion),Cardi B,4kYSro6naA4h99UJvo89HB
222310,0.903,0.447,6,-11.554,1,0.1160,0.0873,0.000000,0.1360,0.239,...,4,1,audio_features,3DyiAk1BzIF8rq9rimypG4,spotify:track:3DyiAk1BzIF8rq9rimypG4,https://api.spotify.com/v1/tracks/3DyiAk1BzIF8...,https://api.spotify.com/v1/audio-analysis/3Dyi...,La Bebe - Remix,Cardi B,4kYSro6naA4h99UJvo89HB
222311,0.805,0.835,0,-4.603,1,0.0896,0.1300,0.000005,0.3650,0.722,...,4,1,audio_features,1EJgymgJHcjSOGSHcYaxvW,spotify:track:1EJgymgJHcjSOGSHcYaxvW,https://api.spotify.com/v1/tracks/1EJgymgJHcjS...,https://api.spotify.com/v1/audio-analysis/1EJg...,South of the Border (feat. Camila Cabello & Ca...,Cardi B,4kYSro6naA4h99UJvo89HB


# Connect with Spotify

In [10]:
# Connect to Spotify to retrieve the audio features of a song.
# First, search for a single track by name:

results = sp.search(q='lose yourself', limit=1)


In [11]:
#Extract the track id
track_ids = [track['id'] for track in results['tracks']['items']]

# extract the audio features
audio_features = sp.audio_features(track_ids)

# store audio features in a dataframe
audio_df = pd.DataFrame(audio_features)
audio_df


Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.689,0.735,2,-4.545,1,0.267,0.00922,0.00072,0.365,0.059,171.403,audio_features,77Ft1RJngppZlq59B6uP0z,spotify:track:77Ft1RJngppZlq59B6uP0z,https://api.spotify.com/v1/tracks/77Ft1RJngppZ...,https://api.spotify.com/v1/audio-analysis/77Ft...,320627,4
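
The track id is read from `results['tracks']['items']`, which is an empty list when the search finds nothing. The sketch below isolates that lookup and handles the empty case. `first_track_id` is a hypothetical helper, and the two dictionaries are hand-written stand-ins that contain only the keys this notebook reads, not full Spotify responses.

```python
def first_track_id(results):
    """Return the id of the first track in a search response, or None if there is none."""
    items = results.get("tracks", {}).get("items", [])
    return items[0]["id"] if items else None

# Hand-written stand-ins containing only the keys this notebook reads.
found = {"tracks": {"items": [{"id": "77Ft1RJngppZlq59B6uP0z"}]}}
empty = {"tracks": {"items": []}}

print(first_track_id(found))
print(first_track_id(empty))
```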


In [12]:
# Wrap the steps above in a function to automate the work
def get_audio_features(song_name, artist):
    # search for the track by name
    results = sp.search(q=f'track:{song_name}', limit=1)
    # extract the track ids and names
    track_ids = [track['id'] for track in results['tracks']['items']]
    song_names = [track['name'] for track in results['tracks']['items']]
    # extract the audio features
    audio_features = sp.audio_features(track_ids)
    # store audio features in a dataframe
    audio_df = pd.DataFrame(audio_features)
    audio_df['artist'] = artist
    audio_df['song_name'] = song_names
    return audio_df

In [13]:
# Keep only the numerical columns (public API instead of the private _get_numeric_data):
X = audio_df.select_dtypes(include="number")

In [14]:
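# Scale with the scaler fitted in the "Spotify Clustering" notebook.
# Fitting a new StandardScaler on this single row would map every feature to 0.
# Assumption: scaler2 was fitted on these same numeric columns.
X_scaled = scaler2.transform(X)
X_scaled_df = pd.DataFrame(X_scaled, columns = X.columns)
display(X.head())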

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
0,0.689,0.735,2,-4.545,1,0.267,0.00922,0.00072,0.365,0.059,171.403,320627,4


In [15]:
clusters = kmeans2.predict(X_scaled_df)

In [16]:
clusters

array([1])
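
A scaler learns a per-feature mean and standard deviation when it is fitted, so the data it was fitted on matters: standardising a single row against its own statistics always yields zeros, whatever the song. The sketch below reproduces that arithmetic with NumPy. The training rows are made-up numbers and `standardize` is a hypothetical helper, not the fitted scaler loaded above.

```python
import numpy as np

def standardize(X, mean, std):
    """Z-score X using the given per-feature mean and standard deviation."""
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant features
    return (X - mean) / std

# Made-up training data: columns are (danceability, tempo).
train = np.array([[0.2, 90.0], [0.5, 120.0], [0.8, 150.0]])
train_mean, train_std = train.mean(axis=0), train.std(axis=0)

new_song = np.array([[0.689, 171.403]])

# Statistics computed from the single row itself: every feature collapses to 0.
self_scaled = standardize(new_song, new_song.mean(axis=0), new_song.std(axis=0))

# Statistics from the training data: the row keeps its position relative to that data.
train_scaled = standardize(new_song, train_mean, train_std)

print(self_scaled)
print(train_scaled)
```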

# Create the recommender function

In [30]:
choice = ["is a great choice!", "is amazing!", "is cool!"]

def recommended_song():
    # Expect input of the form "song name, artist name"
    hot_song, hot_artist = input("Please enter a song name and the artist name: ").split(",", 1)
    print("hot song is:",hot_song)
    print("hot artist is:",hot_artist)

    # Normalise case and surrounding whitespace to match the lowercase Billboard data
    song_name = hot_song.strip().lower()
    artist_name = hot_artist.strip().lower()

    if song_name in billboard_df["song_name"].values or artist_name in billboard_df["artist"].values:
        # The song or the artist is on the hot list: suggest a different song from the list
        others = billboard_df.loc[billboard_df["song_name"] != song_name, "song_name"].tolist()
        print(song_name, random.choice(choice), "You might like this one from the hot list too:", random.choice(others))
    else:
        # Otherwise, look the song up on Spotify and return the closest song in clus_df
        results = sp.search(q=f'track:{song_name}', limit=1)
        items = results['tracks']['items']
        if not items:
            # Guard against an empty search result instead of raising an IndexError
            return "Sorry, that song was not found on Spotify."
        track_id = items[0]['id']
        audio_features = sp.audio_features(track_id)
        df_ = pd.DataFrame(audio_features)
        x = df_[['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']]
        # Index (position) of the row in clus_df with the smallest distance to x
        closest, _ = pairwise_distances_argmin_min(x, clus_df[x.columns])
        return ' - '.join([clus_df.iloc[closest]['song_name'].values[0], clus_df.iloc[closest]['artist'].values[0]])
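
The prompt expects input of the form "song name, artist name". The sketch below isolates that parsing step so it can be tried without calling `input`. `parse_song_and_artist` is a hypothetical helper, not part of the function above; it returns an empty artist when no comma is typed instead of raising an error.

```python
def parse_song_and_artist(raw):
    """Split 'song, artist' into two lowercase, whitespace-trimmed strings."""
    song, _, artist = raw.partition(",")
    return song.strip().lower(), artist.strip().lower()

print(parse_song_and_artist("Let Her Go, Passengers"))
print(parse_song_and_artist("heat waves"))
```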
       


In [33]:
recommended_song() 

Please enter a song name and the artist name: let her go, passengers
hot song is: let her go
hot artist is:  passengers


'Let Her Go - Harry Styles'
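
The Spotify branch returns the row of `clus_df` with the smallest Euclidean distance to the searched song, computed on raw feature values. The sketch below reproduces that idea with NumPy on made-up numbers. `nearest_row` is a hypothetical helper, and the standardised variant is shown only for comparison; it is not what the function above does.

```python
import numpy as np

def nearest_row(query, candidates):
    """Return the index of the candidate row with the smallest Euclidean distance to query."""
    distances = np.linalg.norm(candidates - query, axis=1)
    return int(np.argmin(distances))

# Made-up rows: columns are (danceability, tempo in BPM).
candidates = np.array([
    [0.10, 121.0],   # very different danceability, almost the same tempo
    [0.88, 126.0],   # almost the same danceability, slightly different tempo
    [0.50, 100.0],
    [0.55, 140.0],
])
query = np.array([0.90, 120.0])

# Raw values: tempo differences are numerically much larger than danceability differences.
raw_choice = nearest_row(query, candidates)

# Standardise with the candidates' statistics so each feature contributes comparably.
mean, std = candidates.mean(axis=0), candidates.std(axis=0)
scaled_choice = nearest_row((query - mean) / std, (candidates - mean) / std)

print(raw_choice, scaled_choice)
```

With these made-up numbers, the raw distance picks the row with the closest tempo, while the standardised distance picks the row with the closest danceability. Features measured on large numeric scales, such as tempo, key, and loudness, therefore carry more weight than the 0.0 to 1.0 features when the distance is computed on unscaled values.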

In [19]:
billboard_df["artist"].value_counts()



kodak black                                3
nicki minaj x lil baby                     2
adele                                      2
doja cat                                   2
ed sheeran                                 2
                                          ..
drake featuring 21 savage & project pat    1
ckay                                       1
morgan wallen                              1
doja cat & the weeknd                      1
lil shordie scott                          1
Name: artist, Length: 93, dtype: int64
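
Several entries above combine more than one artist in a single string (for example "doja cat & the weeknd" or "nicki minaj x lil baby"), so an exact match on the full `artist` value will not find an individual artist from a collaboration. The sketch below shows one possible way to split such credits. `split_artists` is a hypothetical helper, and the separators are guessed from the strings visible in this notebook, so other separators may exist in the full list.

```python
import re

# Separators guessed from the artist strings visible in this notebook.
SEPARATORS = re.compile(r"\s+featuring\s+|\s+x\s+|\s*&\s*|\s*,\s*")

def split_artists(credit):
    """Split a combined artist credit into individual lowercase artist names."""
    return [name.strip() for name in SEPARATORS.split(credit.lower()) if name.strip()]

print(split_artists("doja cat & the weeknd"))
print(split_artists("saweetie featuring h.e.r."))
print(split_artists("drake featuring 21 savage & project pat"))
```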

In [20]:
clus_df.columns


Index(['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
       'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo',
       'duration_ms', 'time_signature', 'cluster', 'type', 'id', 'uri',
       'track_href', 'analysis_url', 'song_name', 'artist', 'artist_id'],
      dtype='object')