# Lab | Unsupervised learning intro

#### Instructions 


It's time to cluster the songs you collected. Remember that the ultimate goal of this little project is to improve the recommendations of artists. Clustering the songs lets the recommendation system limit its scope to songs that belong to the same cluster, that is, songs with similar audio features.

The experiments you did with the `Spotify` API and the Billboard web scraping will allow you to create a pipeline such that when the user enters a song, you:

1. Check whether or not the song is in the Billboard Hot 100.
2. Collect the audio features from the `Spotify` API.

After that, you want to send the `Spotify` audio features of the submitted song to the clustering model, which should return a cluster number.
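
To make "return a cluster number" concrete, here is a minimal sketch of what a centroid-based model such as k-means does at prediction time: it assigns the song to the nearest cluster centre. The choice of k-means, the three features, and the centroid values are assumptions for illustration; the lab does not prescribe a specific algorithm.

```python
import math


def assign_cluster(features, centroids):
    """Return the index of the centroid closest to `features` (Euclidean distance)."""
    distances = [math.dist(features, centroid) for centroid in centroids]
    return distances.index(min(distances))


# Hypothetical, already-scaled centroids over (danceability, energy, valence)
centroids = [
    (0.2, 0.3, 0.2),  # cluster 0: calm, low-energy songs
    (0.8, 0.9, 0.7),  # cluster 1: upbeat, danceable songs
]

# Hypothetical audio features of the submitted song, on the same scale
new_song_features = (0.75, 0.85, 0.6)
cluster = assign_cluster(new_song_features, centroids)
```

The returned index is the cluster number that the recommendation step would use to narrow down candidate songs.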

##### 0. Train a clustering model on the song database

In [46]:
# Import relevant libraries
import numpy as np 
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import getpass

In [47]:
# Create connection to Spotify API
client_id = "31bb38d4d2c54b0e9b994db2a71040d5"
client_secret = getpass.getpass('Write client secret:')

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=client_id,
                                                           client_secret=client_secret))

In [50]:
# Import the song database
song_database = pd.read_csv("song_database.csv", index_col="song_id")

In [58]:
# Save audio features for the songs in a dataframe.
# Collect one dict per song and build the dataframe once:
# DataFrame.append was removed in pandas 2.0 and is slow inside a loop.
feature_rows = []

for song_id in song_database.index:
    audio_features = sp.audio_features(tracks=song_id)[0]
    # Skip tracks for which the API returns no audio features
    if audio_features is not None:
        feature_rows.append(audio_features)

audio_database = pd.DataFrame(feature_rows)

# Set the database index to the song id
audio_database.set_index("id", drop=True, inplace=True)

# Remove the columns unnecessary for the training
to_drop = ["uri", "track_href", "analysis_url"]
audio_database.drop(columns=to_drop, inplace=True)
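
The training step itself is not shown above. One possible sketch, assuming scikit-learn's `KMeans` with standardised features, is given below. The synthetic `features` dataframe stands in for the numeric columns of `audio_database`, and the number of clusters is an arbitrary choice for the example, not a recommendation.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric audio features (two obvious groups)
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "danceability": np.concatenate([rng.normal(0.3, 0.05, 20), rng.normal(0.8, 0.05, 20)]),
    "energy": np.concatenate([rng.normal(0.4, 0.05, 20), rng.normal(0.9, 0.05, 20)]),
    "tempo": np.concatenate([rng.normal(80, 5, 20), rng.normal(140, 5, 20)]),
})

# Scale first: tempo is on a much larger scale than the 0-1 features
# and would otherwise dominate the distance computation
scaler = StandardScaler()
X_scaled = scaler.fit_transform(features)

# Fit k-means and store each song's cluster label next to its features
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
features["cluster"] = kmeans.fit_predict(X_scaled)

# Number of songs per cluster
cluster_sizes = features["cluster"].value_counts()
```

Keeping the fitted `scaler` and `kmeans` objects around matters, because a submitted song has to go through the same scaling before it is assigned to a cluster.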

##### 1. Check if the song is in the Billboard Hot 100

In [24]:
new_song = input("Write a song you like:")
new_artist = input("Write the artist of that song:")

In [4]:
hot_100 = pd.read_csv("billboard_hot_100.csv")

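# `in` on a Series tests the index labels, not the values, so compare the
# values instead and require song and artist to match on the same row
is_hot = ((hot_100["Song"] == new_song) & (hot_100["Artist"] == new_artist)).any()

if is_hot: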
    print("This song is hot right now!")
else:
    print("This song is not that popular nowadays.")

This song is not that popular nowadays.


##### 2. Collect the song audio features

In [41]:
# Ask the user for a song until they find something within the database
song_found = False

while not song_found:
    matches = song_database[(song_database["song_name"] == new_song) &
                            (song_database["artist_name"] == new_artist)]

    if not matches.empty:
        new_song_id = matches.index[0]
        song_found = True
    else:
        print("Sorry, we couldn't find this song in our database")
        new_song = input("Please add another song you like:")
        new_artist = input("And add the name of the artist:")

In [56]:
# Get the song features for the recommendation system
sp.audio_features(tracks=new_song_id)[0]["id"]

'5Q0Nhxo0l2bP3pNjpGJwV1'

In [57]:
new_song_id

'5Q0Nhxo0l2bP3pNjpGJwV1'

##### 3. Test the clustering model on submitted songs and make a recommendation from the same cluster
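
This step is left open in the notebook. A minimal sketch of the recommendation part, assuming each song in the database already carries a `cluster` label, could look like the following. The `clustered` dataframe, its contents, and the helper `recommend_from_cluster` are hypothetical names introduced only for this example.

```python
import pandas as pd

# Hypothetical database in which every song already has a cluster label
clustered = pd.DataFrame(
    {
        "song_name": ["Song A", "Song B", "Song C", "Song D"],
        "artist_name": ["Artist 1", "Artist 2", "Artist 3", "Artist 4"],
        "cluster": [0, 1, 0, 1],
    },
    index=pd.Index(["id_a", "id_b", "id_c", "id_d"], name="song_id"),
)


def recommend_from_cluster(database, song_id, n=1, random_state=None):
    """Return up to `n` random songs from the same cluster as `song_id`."""
    cluster = database.loc[song_id, "cluster"]
    # Same cluster, but never recommend the submitted song itself
    candidates = database[(database["cluster"] == cluster) & (database.index != song_id)]
    return candidates.sample(n=min(n, len(candidates)), random_state=random_state)


recommendation = recommend_from_cluster(clustered, "id_a", n=1, random_state=0)
```

Sampling at random keeps the recommendations varied between runs; ranking candidates by distance to the submitted song would be an alternative design.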