# GNOD

Business goal:
Check the case_study_gnod.md file.

Make sure you've understood the big picture of your project:

- the goal of the company (Gnod),
- their current product (Gnoosic),
- their strategy, and
- how your project fits into this context.

Re-read the business case and the e-mail from the CTO, look at the flowchart, and create an initial Trello board with the tasks you expect to accomplish.

# GNOD | Part 5 

## Song Recommendation

### Importing Libraries

In [55]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')

#for plots
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

#scaling numerical variables
from sklearn.preprocessing import StandardScaler

#for clustering
from sklearn import cluster, datasets
from sklearn.cluster import KMeans
import pickle

#for spotify:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

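### Importing the Scaler and the K-Means Model

In [56]:
# K-means model (7 clusters); 'with' closes the file after loading
with open('kmeans_7_cluster.p', 'rb') as f:
    kmeans = pickle.load(f)

# scaler used to standardize the audio features before clustering
with open('scaler.p', 'rb') as f:
    scaler = pickle.load(f)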
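The scaler and the K-means model were fitted and pickled in an earlier part of the project. As a minimal sketch of how such a pair could be produced and reloaded, the block below fits a `StandardScaler`, clusters the *scaled* data into 7 groups (mirroring the filename), and round-trips both objects through pickle. Random data stands in for the real audio features, and the temporary directory and `demo_` names are assumptions chosen so the sketch does not overwrite the notebook's `scaler` and `kmeans`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 13 numeric audio-feature columns (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))

# Fit the scaler first, then cluster the scaled data
demo_scaler = StandardScaler().fit(X)
demo_kmeans = KMeans(n_clusters=7, n_init=10, random_state=42).fit(demo_scaler.transform(X))

out_dir = tempfile.mkdtemp()
# Save both objects; 'with' closes each file even if pickling fails
with open(os.path.join(out_dir, 'scaler.p'), 'wb') as f:
    pickle.dump(demo_scaler, f)
with open(os.path.join(out_dir, 'kmeans_7_cluster.p'), 'wb') as f:
    pickle.dump(demo_kmeans, f)

# Reload them, as the notebook does
with open(os.path.join(out_dir, 'scaler.p'), 'rb') as f:
    scaler_loaded = pickle.load(f)
with open(os.path.join(out_dir, 'kmeans_7_cluster.p'), 'rb') as f:
    kmeans_loaded = pickle.load(f)
```

Pickled scikit-learn objects are generally only guaranteed to load with the same scikit-learn version that saved them, so it helps to record that version next to the files.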

### Getting the playlists

In [57]:
playlist_7_clusters = pd.read_csv('playlist_7_clusters.csv')
top_100_songs = pd.read_csv('top_100_songs.csv')

### Initialize Spotipy Credentials

In [58]:
# secrets.txt must be listed in .gitignore so the client ID and client secret are never pushed to GitHub
with open("secrets.txt", "r") as secrets_file:
    string = secrets_file.read()
string.split('\n')

['clientid: <your-client-id>',
 'clientsecret: <your-client-secret>']

In [59]:
secrets_dict = {}
for line in string.split('\n'):
    if len(line) > 0:
        # split on the first ':' only, so a value containing ':' stays intact
        key, value = line.split(':', 1)
        secrets_dict[key.strip()] = value.strip()
secrets_dict

{'clientid': '<your-client-id>',
 'clientsecret': '<your-client-secret>'}
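The parsing loop can be wrapped in a small, testable helper. `parse_secrets` is a hypothetical name, not part of the original project; this sketch assumes the same `key: value` line format as secrets.txt, splits on the first ':' only, and skips blank lines such as a trailing newline. The values are placeholders.

```python
def parse_secrets(text):
    """Parse 'key: value' lines into a dict (hypothetical helper)."""
    secrets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines, e.g. a trailing newline
        key, value = line.split(':', 1)  # split on the first ':' only
        secrets[key.strip()] = value.strip()
    return secrets


# Placeholder values only; never paste real credentials into a notebook
example = "clientid: <your-client-id>\nclientsecret: <your-client-secret>\n"
demo_secrets = parse_secrets(example)
```

Alternatively, `SpotifyClientCredentials()` called without arguments reads the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables, which keeps the credentials out of the notebook entirely.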

In [60]:
# Initialize Spotipy with the app's client credentials (Client Credentials flow: no user login needed)
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=secrets_dict['clientid'],
                                                          client_secret=secrets_dict['clientsecret']))

### The song recommender from GNOD Parts 1 & 2

#### 1. Input a song

In [61]:
song = input("Enter your value: ")
print("your song is:", song)

your song is: unholy


#### 2. Song recommender

In [None]:
# Case-insensitive, literal substring match against the top-100 titles
# (regex=False so characters such as "(" or "?" in a title are not treated as regex syntax)
mask = top_100_songs['title'].str.lower().str.contains(str(song).lower(), regex=False, na=False)
if mask.any():
    match = top_100_songs[mask]
    index = match.index[0]
    title = match['title'].values[0]
    genre = match['genre'].values[0]

    print("your picked song was: ' ", title,"' the genre is:", genre)
    # Candidate pool: every other top-100 song of the same genre
    songs_tobe_sampled = top_100_songs.drop(index)
    songs_tobe_sampled_genre = songs_tobe_sampled[songs_tobe_sampled.genre == str(genre)]
    if len(songs_tobe_sampled_genre) > 0:
        sampled_songs = songs_tobe_sampled_genre.sample()  # random sample
        print("here is another song recommended for you:", sampled_songs['title'].values[0], "from the artist: ", sampled_songs['artist'].values[0], "with the genre", sampled_songs['genre'].values[0])
    else:
        print("We have no recommendation for you at this time")
else:
    print("We have no recommendation for you at this time")
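The genre-based recommender can also be written as a function, which makes it easier to test. `recommend_by_genre` is a hypothetical helper, not part of the original project. It is a sketch that mirrors the cell's logic (first matching title, same genre, input song excluded) and returns `None` instead of printing when no recommendation is possible. The toy DataFrame is made up.

```python
import pandas as pd


def recommend_by_genre(songs, query, random_state=None):
    """Return (picked_title, genre, recommended_row), or None if nothing fits."""
    # Literal, case-insensitive substring match on the title
    mask = songs['title'].str.lower().str.contains(query.lower(), regex=False, na=False)
    if not mask.any():
        return None
    index = songs[mask].index[0]
    title = songs.at[index, 'title']
    genre = songs.at[index, 'genre']
    # Other songs of the same genre, excluding the picked one
    candidates = songs.drop(index)
    candidates = candidates[candidates['genre'] == genre]
    if candidates.empty:
        return None
    return title, genre, candidates.sample(1, random_state=random_state).iloc[0]


# Hypothetical toy data; the notebook uses top_100_songs.csv instead
toy = pd.DataFrame({
    'title': ['Song A', 'Song B (Live)', 'Song C', 'Song D', 'Song E'],
    'artist': ['Artist A', 'Artist B', 'Artist C', 'Artist D', 'Artist E'],
    'genre': ['pop', 'pop', 'rock', 'rock', 'jazz'],
})
result = recommend_by_genre(toy, 'song b (live)', random_state=0)
```

Here the parentheses in "song b (live)" are matched literally; with the default `regex=True` they would form a regex group and the title would not be found.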

### Improving the recommender

1. Input a song
2. Look for that song on Spotify
3. Retrieve the audio features
4. Cluster these audio features
5. Recommend another song in the same cluster
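The five steps above can be sketched as a single function. To keep it runnable offline, the Spotify lookup (steps 2 and 3) is passed in as a callable; in the notebook it would wrap `sp.search` and `sp.audio_features`. `recommend_from_cluster`, `fake_fetch`, the three-feature subset and the synthetic playlist are all assumptions for illustration, and a freshly fitted scaler and 2-cluster K-means stand in for the pickled ones.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

FEATURES = ['danceability', 'energy', 'tempo']  # assumption: reduced feature set for the demo


def recommend_from_cluster(song, fetch_features, scaler, kmeans, playlist, random_state=None):
    """Song name -> audio features -> scaled -> cluster -> random song from that cluster."""
    features = fetch_features(song)  # steps 2-3: one-row DataFrame, or None if not found
    if features is None:
        return None
    scaled = scaler.transform(features[FEATURES])  # same column order as at fit time
    cluster = kmeans.predict(scaled)[0]  # step 4
    same_cluster = playlist[playlist['clusters'] == cluster]
    if same_cluster.empty:
        return None
    return same_cluster.sample(1, random_state=random_state).iloc[0]  # step 5


# Synthetic playlist with two well-separated groups (calm vs. energetic)
rng = np.random.default_rng(1)
calm = pd.DataFrame({'danceability': rng.uniform(0.1, 0.2, 20),
                     'energy': rng.uniform(0.1, 0.2, 20),
                     'tempo': rng.uniform(60, 70, 20)})
lively = pd.DataFrame({'danceability': rng.uniform(0.8, 0.9, 20),
                       'energy': rng.uniform(0.8, 0.9, 20),
                       'tempo': rng.uniform(160, 170, 20)})
demo_playlist = pd.concat([calm, lively], ignore_index=True)
demo_playlist['title'] = [f'track {i}' for i in range(len(demo_playlist))]

demo_scaler = StandardScaler().fit(demo_playlist[FEATURES])
demo_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(
    demo_scaler.transform(demo_playlist[FEATURES]))
demo_playlist['clusters'] = demo_kmeans.labels_


def fake_fetch(song):
    # Stand-in for sp.search + sp.audio_features: always returns an energetic track
    return pd.DataFrame([{'danceability': 0.85, 'energy': 0.85, 'tempo': 165.0}])


rec = recommend_from_cluster('any song', fake_fetch, demo_scaler, demo_kmeans,
                             demo_playlist, random_state=0)
```

Injecting the lookup as a parameter keeps the clustering logic separate from the network call, so it can be checked without Spotify credentials.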

#### 1. Input a song

In [None]:
song = input("Enter your value: ")
print("your song is:", song)

your song is: unholy


#### 2. Look for that song on Spotify

In [62]:
results = sp.search(q="track:" + song, type="track")
results

{'tracks': {'href': 'https://api.spotify.com/v1/search?query=track%3Aunholy&type=track&offset=0&limit=10',
  'items': [{'album': {'album_type': 'single',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/2wY79sveU1sp5g7SokKOiI'},
       'href': 'https://api.spotify.com/v1/artists/2wY79sveU1sp5g7SokKOiI',
       'id': '2wY79sveU1sp5g7SokKOiI',
       'name': 'Sam Smith',
       'type': 'artist',
       'uri': 'spotify:artist:2wY79sveU1sp5g7SokKOiI'},
      {'external_urls': {'spotify': 'https://open.spotify.com/artist/3Xt3RrJMFv5SZkCfUE8C1J'},
       'href': 'https://api.spotify.com/v1/artists/3Xt3RrJMFv5SZkCfUE8C1J',
       'id': '3Xt3RrJMFv5SZkCfUE8C1J',
       'name': 'Kim Petras',
       'type': 'artist',
       'uri': 'spotify:artist:3Xt3RrJMFv5SZkCfUE8C1J'}],
     'available_markets': ['AD', 'AE', 'AG', 'AL', 'AM', 'AO', 'AR', 'AT', 'AU', 'AZ', 'BA', 'BB', 'BD', 'BE',
 ... (output truncated)
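The search response is a nested dict: `results['tracks']['items']` is a list of track objects, each with `name`, `artists` and `uri` fields. Indexing `items[0]` directly raises an `IndexError` when the search finds nothing. A small guard could look like the sketch below; `top_track` is a hypothetical helper, and the response is hand-made with the same nesting, not real API output.

```python
def top_track(results):
    """Return (name, artist names, uri) of the first search hit, or None if there are no hits."""
    items = results.get('tracks', {}).get('items', [])
    if not items:
        return None
    track = items[0]
    artists = [artist['name'] for artist in track['artists']]
    return track['name'], artists, track['uri']


# Hand-made response with the same nesting as sp.search(..., type="track")
fake_results = {'tracks': {'items': [{
    'name': 'Example Track',
    'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}],
    'uri': 'spotify:track:example',
}]}}
hit = top_track(fake_results)
```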

#### 3. Retrieve the audio features

In [63]:
song_uri = results['tracks']['items'][0]['uri']
song_uri

'spotify:track:3nqQXoyQOWXiESFLlDF1hG'

In [64]:
audio_features = pd.DataFrame(sp.audio_features(song_uri))
audio_features

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.714 | 0.472 | 2 | -7.375 | 1 | 0.0864 | 0.013 | 5e-06 | 0.266 | 0.238 | 131.121 | audio_features | 3nqQXoyQOWXiESFLlDF1hG | spotify:track:3nqQXoyQOWXiESFLlDF1hG | https://api.spotify.com/v1/tracks/3nqQXoyQOWXi... | https://api.spotify.com/v1/audio-analysis/3nqQ... | 156943 | 4 |


#### 4. Cluster these audio features

Prepare the data for clustering:
1. Drop the non-numeric columns
2. Scale the data

In [65]:
# drop the non-numeric columns; the remaining 13 must match the columns the scaler was fitted on
audio_features = audio_features.drop(columns=['type', 'id', 'uri', 'track_href', 'analysis_url'])
display(audio_features.shape)
audio_features.head()

(1, 13)

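| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.714 | 0.472 | 2 | -7.375 | 1 | 0.0864 | 0.013 | 5e-06 | 0.266 | 0.238 | 131.121 | 156943 | 4 |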


In [66]:
audio_features_scaled = scaler.transform(audio_features)

In [67]:
cluster_pred = kmeans.predict(audio_features_scaled)

In [68]:
cluster_pred

array([3], dtype=int32)

In [69]:
cluster_pred[0]

3
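`kmeans.predict` returns one label per input row, hence the `[0]`. The label is the index of the nearest centroid in `kmeans.cluster_centers_`, measured by Euclidean distance in the scaled feature space. The sketch below checks this on synthetic, already-scaled data with a freshly fitted 3-cluster `demo_kmeans`, not the pickled 7-cluster model.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))  # synthetic, already-scaled features
demo_kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

new_point = rng.normal(size=(1, 4))
# Squared Euclidean distance from the new point to every centroid
distances = ((demo_kmeans.cluster_centers_ - new_point) ** 2).sum(axis=1)
nearest = int(np.argmin(distances))
```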

#### 5. Recommend another song in the same cluster

In [70]:
playlist_7_clusters.columns

Index(['title', 'artist', 'danceability', 'energy', 'key', 'loudness', 'mode',
       'speechiness', 'acousticness', 'instrumentalness', 'liveness',
       'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 'analysis_url',
       'duration_ms', 'time_signature', 'clusters'],
      dtype='object')

In [91]:
recommended_song = playlist_7_clusters[playlist_7_clusters['clusters'] == cluster_pred[0]].sample(1)
recommended_song.head()

| | title | artist | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | ... | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | clusters |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6549 | If I Can't Have You | Shawn Mendes | 0.691 | 0.823 | 2.0 | -4.197 | 1.0 | 0.0623 | 0.487 | 0.0 | ... | 0.87 | 123.935 | audio_features | 2bT1PH7Cw3J9p3t7nlXCdh | spotify:track:2bT1PH7Cw3J9p3t7nlXCdh | https://api.spotify.com/v1/tracks/2bT1PH7Cw3J9... | https://api.spotify.com/v1/audio-analysis/2bT1... | 191467.0 | 4.0 | 3 |


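The cell below was executed before the one above (In [90] ran before In [91]), so its output comes from an earlier random sample and does not match the row displayed above.

In [90]: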
recommended_song['title'].values[0]

'Rather Be (feat. Jess Glynne)'
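Two details of the sampling step are worth noting: `.sample(1)` without `random_state` returns a different row on every run, and if the input song is itself in `playlist_7_clusters` it can be recommended back to the user. The sketch below addresses both, assuming the input song's Spotify id is known (for example from `results['tracks']['items'][0]['id']`, since `id` is dropped before scaling). `sample_from_cluster` and the mini-playlist are hypothetical.

```python
import pandas as pd


def sample_from_cluster(playlist, cluster, exclude_id=None, random_state=None):
    """Pick one random row from `cluster`, skipping the track whose Spotify id is `exclude_id`."""
    candidates = playlist[playlist['clusters'] == cluster]
    if exclude_id is not None:
        candidates = candidates[candidates['id'] != exclude_id]
    if candidates.empty:
        return None
    return candidates.sample(1, random_state=random_state).iloc[0]


# Hypothetical mini-playlist with the same 'id' and 'clusters' columns as playlist_7_clusters
demo_playlist = pd.DataFrame({
    'title': ['Track A', 'Track B', 'Track C'],
    'id': ['id_a', 'id_b', 'id_c'],
    'clusters': [3, 3, 1],
})
pick = sample_from_cluster(demo_playlist, 3, exclude_id='id_a', random_state=0)
```

Fixing `random_state` is mainly useful while developing or testing; in production, leaving it unset keeps the recommendations varied.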

### Finalized song recommender: Parts 1 & 2 + Part 5

In [93]:
song = input("Enter your value: ")
print("your song is:", song)

# Case-insensitive, literal substring match against the top-100 titles
mask = top_100_songs['title'].str.lower().str.contains(str(song).lower(), regex=False, na=False)
if mask.any():
    # Song is in the top 100: recommend another top-100 song of the same genre (Parts 1 & 2)
    match = top_100_songs[mask]
    index = match.index[0]
    title = match['title'].values[0]
    genre = match['genre'].values[0]

    print("your picked song was: ' ", title,"' the genre is:", genre)
    songs_tobe_sampled = top_100_songs.drop(index)
    songs_tobe_sampled_genre = songs_tobe_sampled[songs_tobe_sampled.genre == str(genre)]
    if len(songs_tobe_sampled_genre) > 0:
        sampled_songs = songs_tobe_sampled_genre.sample()  # random sample
        print("here is another song recommended for you:", sampled_songs['title'].values[0], "from the artist: ", sampled_songs['artist'].values[0], "with the genre", sampled_songs['genre'].values[0])
    else:
        print("We have no recommendation for you at this time")
else:
    # Song is not in the top 100: look it up on Spotify and recommend from its K-means cluster (Part 5)
    results = sp.search(q="track:" + song, type="track")
    items = results['tracks']['items']
    if not items:
        print("We have no recommendation for you at this time")
    else:
        # retrieve the audio features of the best match
        song_uri = items[0]['uri']
        audio_features = pd.DataFrame(sp.audio_features(song_uri))
        # prepare data for clustering: drop non-numeric columns, then scale with the fitted scaler
        audio_features = audio_features.drop(columns=['type', 'id', 'uri', 'track_href', 'analysis_url'])
        audio_features_scaled = scaler.transform(audio_features)
        # assign the song to a cluster
        cluster_pred = kmeans.predict(audio_features_scaled)
        # recommend a random song from the same cluster
        recommended_song = playlist_7_clusters[playlist_7_clusters['clusters'] == cluster_pred[0]].sample(1)
        print("here is another song recommended for you:", recommended_song['title'].values[0], "from the artist: ", recommended_song['artist'].values[0])


your song is: adrift
here is another song recommended for you: Why Must The Show Go On - 2003 Digital Remaster from the artist:  Phil Oakey
