# GNOD

Business goal:
Check the case_study_gnod.md file.

Make sure you've understood the big picture of your project:

- the goal of the company (Gnod),
- their current product (Gnoosic),
- their strategy, and
- how your project fits into this context.

Re-read the business case and the e-mail from the CTO, take a look at the flowchart and create an initial Trello board with the tasks you think you'll have to accomplish.

# GNOD | Part 3

## Lab | API wrappers - Create your collection of songs & audio features

Instructions  
To move forward with the project, you need to create a collection of songs with their audio features - as large as possible!

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster. The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.
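The recommendation step described above can be sketched as follows. This is only a minimal illustration, assuming the cluster labels have already been computed by some clustering model; the `song_clusters` mapping, its made-up labels, and the `recommend_from_same_cluster` helper are hypothetical names that are not part of the lab.

```python
import random


def recommend_from_same_cluster(song, song_clusters, rng=random):
    """Return a random other song from the cluster `song` belongs to, or None."""
    cluster = song_clusters.get(song)
    if cluster is None:
        return None  # the song is not in our collection
    candidates = [s for s, c in song_clusters.items() if c == cluster and s != song]
    return rng.choice(candidates) if candidates else None


# hypothetical cluster labels, made up for illustration only
song_clusters = {"Magic Moments": 0, "Christmas Dream": 0, "Walking On Sunshine": 1}
print(recommend_from_same_cluster("Magic Moments", song_clusters))  # Christmas Dream
```

With only one song in a cluster there is nothing to recommend, which is one reason a larger collection helps.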

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

One way to collect as many songs as possible is to start with all the songs of a big, diverse playlist, then go to every artist in that playlist and grab every song from every album of that artist. The number of songs collected per playlist grows very quickly this way: every track can contribute an artist, every artist several albums, and every album several tracks.
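A rough sketch of that artist-to-albums-to-tracks expansion is shown here. It assumes `client` is an authenticated `spotipy.Spotify` instance (or any object offering the same `artist_albums`, `album_tracks` and `next` methods); the helper name `collect_artist_tracks` is hypothetical, and no sleeping or error handling is included.

```python
def collect_artist_tracks(client, artist_id):
    """Collect (track name, track URI) pairs from every album of one artist."""
    tracks = []
    albums = client.artist_albums(artist_id)
    while albums:
        for album in albums["items"]:
            page = client.album_tracks(album["id"])
            while page:
                tracks.extend((t["name"], t["uri"]) for t in page["items"])
                # follow the pagination link of the album's track list, if any
                page = client.next(page) if page["next"] else None
        # follow the pagination link of the artist's album list, if any
        albums = client.next(albums) if albums["next"] else None
    return tracks
```

In a real run you would probably add a short pause between requests, as the playlist code in this notebook does, and remove duplicates afterwards, since the same song often appears on several albums.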

In [33]:
# Top 10,000 songs of all time:
#https://open.spotify.com/playlist/1G8IpkZKobrIlXcVPoSIuf?si=77396c77e4394657

### Authentication Process:

In [34]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

In [35]:
# secrets.txt must be listed in .gitignore so that the client ID and client secret are not pushed to GitHub
with open("secrets.txt", "r") as secrets_file:
    string = secrets_file.read()
string.split('\n')


['clientid: <redacted>',
 'clientsecret: <redacted>']

In [36]:
secrets_dict={}
for line in string.split('\n'):
    if len(line) > 0:
        secrets_dict[line.split(':')[0]]=line.split(':')[1].strip()
secrets_dict

{'clientid': '<redacted>',
 'clientsecret': '<redacted>'}
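The parsing step can also be wrapped in a small function. The sketch below is one possible variant, not the notebook's own code: `parse_secrets` is a hypothetical helper, and the example values are placeholders rather than real credentials. It splits each line only on the first colon, so a value that itself contains a colon is kept whole.

```python
def parse_secrets(text):
    """Parse 'key: value' lines into a dict, ignoring blank or malformed lines."""
    secrets = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and lines without a key/value separator
        key, value = line.split(":", 1)  # split only on the first colon
        secrets[key.strip()] = value.strip()
    return secrets


# placeholder values, not real credentials
example = "clientid: my-client-id\nclientsecret: my-client-secret\n"
print(parse_secrets(example))
```

Avoid printing the parsed dictionary in a notebook that will be committed, because cell outputs are stored in the notebook file together with the code.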

In [37]:
# Initialize Spotipy with the app's client credentials (client ID and client secret)
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=secrets_dict['clientid'],
                                                          client_secret=secrets_dict['clientsecret']))

### Getting Songs from Playlists

In [38]:
# "best of" playlists from the 1950s to the 2020s
playlist_50_id = "37i9dQZF1DWSV3Tk4GO2fq"
playlist_60_id = "3SbE5Sk5MWtNc0GRTU0X6d"
playlist_70_id = "37i9dQZF1DX1Hya1sRqqxI"
playlist_80_id = "37i9dQZF1DXb57FjYWz00c"
playlist_90_id = "37i9dQZF1DXbTxeAdrVG2l"
playlist_00_id = "37i9dQZF1DX4o1oenSJRJd"
playlist_10_id = "37i9dQZF1DX5Ejj0EkURtP"
playlist_20_id = "4vSTV61efRmetmaoz95Vet"
playlist_top_1000_1920_2019 = "23HsgHgvpjludlObYNpA1S"
playlist_top_500_oat = "0JiVp7Z0pYKI8diUV6HJyQ"
playlist_bollywood_100 = "37i9dQZF1DWZNJXX2UeBij"
playlist_best_2000 = "37i9dQZF1DWTmvXBN4DgpA"

In [39]:
playlist_ids = [playlist_50_id, playlist_60_id, playlist_70_id, playlist_80_id, playlist_90_id,
                playlist_00_id, playlist_10_id, playlist_20_id, playlist_top_1000_1920_2019]
# not included: playlist_top_500_oat, playlist_bollywood_100, playlist_best_2000

### Function to extract songs of a playlist

Import

In [40]:
from random import randint
from time import sleep
import pandas as pd
import requests
from pandas import json_normalize  # top-level import (pandas >= 1.0); pandas.io.json.json_normalize is deprecated
import numpy as np

In [41]:
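def get_playlist_tracks(playlist_id):
    # playlist_items replaces user_playlist_tracks, which is deprecated in recent Spotipy versions
    results = sp.playlist_items(playlist_id)
    tracks = results['items']
    # the API returns one page at a time; follow the 'next' link until there are no more pages
    while results['next'] is not None:
        results = sp.next(results)
        tracks.extend(results['items'])
        sleep(randint(1, 3000) / 1000)  # a respectful nap of up to 3 seconds
    return tracks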
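The paging logic in `get_playlist_tracks` can be tried offline with a stand-in for the API. The sketch below assumes only that each page is a dict with an `items` list and a `next` entry that is `None` on the last page; `collect_items` and the fake `pages` data are hypothetical and exist purely for illustration.

```python
def collect_items(first_page, fetch_next):
    """Gather 'items' from a first page and from every following page."""
    items = list(first_page["items"])
    page = first_page
    while page["next"] is not None:
        page = fetch_next(page)
        items.extend(page["items"])
    return items


# fake two-page response standing in for the Spotify API
pages = {"p2": {"items": ["c"], "next": None}}
first = {"items": ["a", "b"], "next": "p2"}
print(collect_items(first, lambda page: pages[page["next"]]))  # ['a', 'b', 'c']
```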

In [42]:
# the nap happens once per additional page of results, not once per song, so this takes roughly
# num_extra_pages * (avg_sleep_time + request_time), with avg_sleep_time around 1.5 seconds
all_tracks = []
for playlist in playlist_ids:
    all_tracks.extend(get_playlist_tracks(playlist))
len(all_tracks)

1051

In [43]:
artist_name = []

# get the name of the first listed artist of every track:
for item in all_tracks:
    artist_name.append(item['track']['artists'][0]['name'])
# removing duplicates:
artist_name = list(set(artist_name))
# artist_name
# len(artist_name)
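A slightly more defensive variant of the artist extraction is sketched below. It assumes that a playlist item can carry an empty `track` entry (for example when a track is no longer available), and it collects every credited artist rather than only the first one; `unique_artist_names` is a hypothetical helper, not part of the notebook.

```python
def unique_artist_names(items):
    """Return the sorted, de-duplicated artist names found in playlist items."""
    names = set()
    for item in items:
        track = item.get("track")
        if not track:
            continue  # skip items without track data
        for artist in track.get("artists", []):
            names.add(artist["name"])
    return sorted(names)


# made-up playlist items for illustration
items = [
    {"track": {"artists": [{"name": "B"}, {"name": "A"}]}},
    {"track": None},
    {"track": {"artists": [{"name": "A"}]}},
]
print(unique_artist_names(items))  # ['A', 'B']
```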

In [44]:
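# search for up to 10 tracks per artist name
# note: this is a free-text track search, so the results are not guaranteed to be
# the artist's top tracks, or even to be by that artist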
top_10_tracks = []
for artist in artist_name:
    top_10_tracks.append(sp.search(q=artist, limit=10))
    sleep((randint(1,3000)/1000))# a respectful nap
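Because `sp.search` matches the query as free text, some returned tracks may be credited to a different artist than the one searched for; the output further down, for instance, lists a track credited to "Katrina" next to tracks by "Katrina & The Waves". One possible way to keep only exact matches is sketched here. `tracks_by_artist` is a hypothetical helper, and the sample result is made up to mimic the shape of a search response.

```python
def tracks_by_artist(search_result, artist):
    """Keep only the tracks on which `artist` is credited (case-insensitive)."""
    wanted = artist.casefold()
    return [
        item
        for item in search_result["tracks"]["items"]
        if any(a["name"].casefold() == wanted for a in item["artists"])
    ]


# made-up search result with the same nesting as a Spotify track search
result = {"tracks": {"items": [
    {"name": "x", "artists": [{"name": "Katrina & The Waves"}]},
    {"name": "y", "artists": [{"name": "Katrina"}]},
]}}
print([t["name"] for t in tracks_by_artist(result, "Katrina & The Waves")])  # ['x']
```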
    

In [45]:
# for every search result, get the title, artist and uri (for the audio features)
song_title = []
song_artist = []
song_uri = []

for result in top_10_tracks:
    # iterate over the returned items instead of assuming exactly 10 results per artist
    for item in result['tracks']['items']:
        song_title.append(item['name'])
        song_artist.append(item['artists'][0]['name'])
        # the uri is needed later to request the audio features
        song_uri.append(item['uri'])


In [46]:
# get the audio features
# append to a list the audio features found with each song uri
song_audio_f = []

for uri in song_uri:
    song_audio_f.append(sp.audio_features(uri)[0])
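The loop above sends one request per song. Spotipy's `audio_features` also accepts a list of up to 100 tracks per call, so batching the URIs could cut the number of requests considerably. The sketch below shows the idea; `chunked` and `fetch_audio_features` are hypothetical helpers, and `fetch_batch` stands for any callable that takes a list of URIs and returns one result per URI.

```python
def chunked(seq, size=100):
    """Yield consecutive slices of `seq` holding at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_audio_features(uris, fetch_batch, batch_size=100):
    """Call `fetch_batch` once per batch of URIs and flatten the results."""
    features = []
    for batch in chunked(uris, batch_size):
        features.extend(fetch_batch(batch))
    return features
```

Assuming `sp` is the authenticated client from this notebook, a call could look like `song_audio_f = fetch_audio_features(song_uri, sp.audio_features)`.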

In [47]:
#create a DataFrame with title, artist and audio features
data_songs = pd.DataFrame({'title':song_title,'artist':song_artist,'audio':song_audio_f})
#"explode" and create the audio features: data_songs['audio'].apply(pd.Series)

#concat the old data_songs without audio and the audiofeatures df. 
data_songs = pd.concat([data_songs.drop(['audio'], axis=1), data_songs['audio'].apply(pd.Series)], axis=1)
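An alternative to expanding the `audio` column with `apply(pd.Series)` is to merge each feature dict into its row before building the DataFrame. The sketch below assumes that the audio-features entry for a track may be `None` when no features are available and skips such songs; `build_rows` is a hypothetical helper.

```python
def build_rows(titles, artists, features):
    """Combine title, artist and audio-feature dicts into flat row dicts."""
    rows = []
    for title, artist, feat in zip(titles, artists, features):
        if feat is None:
            continue  # no audio features available for this song
        rows.append({"title": title, "artist": artist, **feat})
    return rows


# made-up values for illustration
rows = build_rows(["a", "b"], ["x", "y"], [{"tempo": 100.0}, None])
print(rows)  # [{'title': 'a', 'artist': 'x', 'tempo': 100.0}]
```

The resulting list of dicts can be passed directly to `pd.DataFrame(rows)`.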



In [48]:
data_songs

| | title | artist | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | It's Beginning to Look a Lot Like Christmas (w... | Perry Como | 0.724 | 0.276 | 7.0 | -12.250 | 1.0 | 0.0599 | 0.76000 | 0.000000 | 0.1160 | 0.718 | 112.874 | audio_features | 2pXpURmn6zC5ZYDMms6fwa | spotify:track:2pXpURmn6zC5ZYDMms6fwa | https://api.spotify.com/v1/tracks/2pXpURmn6zC5... | https://api.spotify.com/v1/audio-analysis/2pXp... | 155933.0 | 4.0 |
| 1 | And I Love You So | Perry Como | 0.415 | 0.220 | 8.0 | -17.470 | 1.0 | 0.0261 | 0.87800 | 0.651000 | 0.1250 | 0.268 | 90.267 | audio_features | 1naVD19eofGpFf6wosmHIe | spotify:track:1naVD19eofGpFf6wosmHIe | https://api.spotify.com/v1/tracks/1naVD19eofGp... | https://api.spotify.com/v1/audio-analysis/1naV... | 197627.0 | 4.0 |
| 2 | Christmas Dream | Perry Como | 0.514 | 0.287 | 10.0 | -13.971 | 1.0 | 0.0352 | 0.67200 | 0.000004 | 0.0677 | 0.834 | 184.356 | audio_features | 1PrhnQxWAVYikCHcieRQiy | spotify:track:1PrhnQxWAVYikCHcieRQiy | https://api.spotify.com/v1/tracks/1PrhnQxWAVYi... | https://api.spotify.com/v1/audio-analysis/1Prh... | 169160.0 | 4.0 |
| 3 | (There's No Place Like) Home for the Holidays ... | Perry Como | 0.532 | 0.401 | 5.0 | -10.629 | 1.0 | 0.0525 | 0.86900 | 0.000000 | 0.2220 | 0.450 | 143.823 | audio_features | 2GapxG7BxK55ihQRAlR39e | spotify:track:2GapxG7BxK55ihQRAlR39e | https://api.spotify.com/v1/tracks/2GapxG7BxK55... | https://api.spotify.com/v1/audio-analysis/2Gap... | 178293.0 | 4.0 |
| 4 | Magic Moments | Perry Como | 0.719 | 0.257 | 0.0 | -14.446 | 0.0 | 0.0435 | 0.88000 | 0.000003 | 0.2570 | 0.750 | 103.829 | audio_features | 7bflxIMDz5mFxhQyYx1CEh | spotify:track:7bflxIMDz5mFxhQyYx1CEh | https://api.spotify.com/v1/tracks/7bflxIMDz5mF... | https://api.spotify.com/v1/audio-analysis/7bfl... | 159907.0 | 4.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5985 | Walkin' On Sunshine (Re-Recorded Version) | Katrina | 0.458 | 0.928 | 10.0 | -5.235 | 1.0 | 0.0798 | 0.02460 | 0.000001 | 0.0550 | 0.962 | 218.967 | audio_features | 3Wq1XLo82SODnXJpYHvnyD | spotify:track:3Wq1XLo82SODnXJpYHvnyD | https://api.spotify.com/v1/tracks/3Wq1XLo82SOD... | https://api.spotify.com/v1/audio-analysis/3Wq1... | 219429.0 | 4.0 |
| 5986 | Walking On Sunshine | Katrina & The Waves | 0.604 | 0.929 | 10.0 | -8.790 | 1.0 | 0.0389 | 0.01170 | 0.205000 | 0.0583 | 0.945 | 109.885 | audio_features | 2NynOElNo3XKrYILRLHnJL | spotify:track:2NynOElNo3XKrYILRLHnJL | https://api.spotify.com/v1/tracks/2NynOElNo3XK... | https://api.spotify.com/v1/audio-analysis/2Nyn... | 239027.0 | 4.0 |
| 5987 | Walking On Sunshine | Katrina & The Waves | 0.610 | 0.874 | 10.0 | -11.469 | 1.0 | 0.0367 | 0.01110 | 0.257000 | 0.0373 | 0.931 | 109.851 | audio_features | 1R6sDs5ovdcweZlFzGxQVD | spotify:track:1R6sDs5ovdcweZlFzGxQVD | https://api.spotify.com/v1/tracks/1R6sDs5ovdcw... | https://api.spotify.com/v1/audio-analysis/1R6s... | 241427.0 | 4.0 |
| 5988 | Do You Want Crying | Katrina & The Waves | 0.427 | 0.780 | 2.0 | -10.671 | 1.0 | 0.0350 | 0.00009 | 0.029400 | 0.2020 | 0.815 | 155.964 | audio_features | 0Ea4GdyudP4ZTJXmm7PNd2 | spotify:track:0Ea4GdyudP4ZTJXmm7PNd2 | https://api.spotify.com/v1/tracks/0Ea4GdyudP4Z... | https://api.spotify.com/v1/audio-analysis/0Ea4... | 214773.0 | 4.0 |


In [60]:
data_songs.head()

| | title | artist | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | It's Beginning to Look a Lot Like Christmas (w... | Perry Como | 0.724 | 0.276 | 7.0 | -12.25 | 1.0 | 0.0599 | 0.76 | 0.0 | 0.116 | 0.718 | 112.874 | audio_features | 2pXpURmn6zC5ZYDMms6fwa | spotify:track:2pXpURmn6zC5ZYDMms6fwa | https://api.spotify.com/v1/tracks/2pXpURmn6zC5... | https://api.spotify.com/v1/audio-analysis/2pXp... | 155933.0 | 4.0 |
| 1 | And I Love You So | Perry Como | 0.415 | 0.22 | 8.0 | -17.47 | 1.0 | 0.0261 | 0.878 | 0.651 | 0.125 | 0.268 | 90.267 | audio_features | 1naVD19eofGpFf6wosmHIe | spotify:track:1naVD19eofGpFf6wosmHIe | https://api.spotify.com/v1/tracks/1naVD19eofGp... | https://api.spotify.com/v1/audio-analysis/1naV... | 197627.0 | 4.0 |
| 2 | Christmas Dream | Perry Como | 0.514 | 0.287 | 10.0 | -13.971 | 1.0 | 0.0352 | 0.672 | 4e-06 | 0.0677 | 0.834 | 184.356 | audio_features | 1PrhnQxWAVYikCHcieRQiy | spotify:track:1PrhnQxWAVYikCHcieRQiy | https://api.spotify.com/v1/tracks/1PrhnQxWAVYi... | https://api.spotify.com/v1/audio-analysis/1Prh... | 169160.0 | 4.0 |
| 3 | (There's No Place Like) Home for the Holidays ... | Perry Como | 0.532 | 0.401 | 5.0 | -10.629 | 1.0 | 0.0525 | 0.869 | 0.0 | 0.222 | 0.45 | 143.823 | audio_features | 2GapxG7BxK55ihQRAlR39e | spotify:track:2GapxG7BxK55ihQRAlR39e | https://api.spotify.com/v1/tracks/2GapxG7BxK55... | https://api.spotify.com/v1/audio-analysis/2Gap... | 178293.0 | 4.0 |
| 4 | Magic Moments | Perry Como | 0.719 | 0.257 | 0.0 | -14.446 | 0.0 | 0.0435 | 0.88 | 3e-06 | 0.257 | 0.75 | 103.829 | audio_features | 7bflxIMDz5mFxhQyYx1CEh | spotify:track:7bflxIMDz5mFxhQyYx1CEh | https://api.spotify.com/v1/tracks/7bflxIMDz5mF... | https://api.spotify.com/v1/audio-analysis/7bfl... | 159907.0 | 4.0 |


In [62]:
data_songs.shape

(5990, 20)

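Drop duplicates in the title column

In [63]:
# drop_duplicates returns a new DataFrame; data_songs itself is left unchanged
data_songs_cleaned = data_songs.drop_duplicates(subset=['title'])

In [64]:
# note: this still inspects the original data_songs, so the shape below is unchanged;
# use data_songs_cleaned.shape to see the effect of the de-duplication
data_songs.shape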

(5990, 20)
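De-duplicating on the title alone also merges different songs that happen to share a title. A possible refinement, sketched here in plain Python, is to treat two rows as duplicates only when both title and artist match. `drop_duplicate_songs` is a hypothetical helper; the sample rows reuse titles, artists and ids from the output above.

```python
def drop_duplicate_songs(rows):
    """Keep the first row for every (title, artist) pair, ignoring case."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["title"].casefold(), row["artist"].casefold())
        if key in seen:
            continue  # an earlier row already covers this title/artist pair
        seen.add(key)
        unique.append(row)
    return unique


rows = [
    {"title": "Walking On Sunshine", "artist": "Katrina & The Waves", "id": "2NynOElNo3XKrYILRLHnJL"},
    {"title": "Walking On Sunshine", "artist": "Katrina & The Waves", "id": "1R6sDs5ovdcweZlFzGxQVD"},
    {"title": "Magic Moments", "artist": "Perry Como", "id": "7bflxIMDz5mFxhQyYx1CEh"},
]
print(len(drop_duplicate_songs(rows)))  # 2
```

With pandas, a similar effect can be obtained with `data_songs.drop_duplicates(subset=['title', 'artist'])`, although that comparison is case-sensitive.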

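Save to a CSV file

In [65]:
# note: this writes the original data_songs; save data_songs_cleaned instead to persist the de-duplicated rows
data_songs.to_csv('data_songs.csv', index = False)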

In [66]:
data = pd.read_csv('data_songs.csv')
data.head()

| | title | artist | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | It's Beginning to Look a Lot Like Christmas (w... | Perry Como | 0.724 | 0.276 | 7.0 | -12.25 | 1.0 | 0.0599 | 0.76 | 0.0 | 0.116 | 0.718 | 112.874 | audio_features | 2pXpURmn6zC5ZYDMms6fwa | spotify:track:2pXpURmn6zC5ZYDMms6fwa | https://api.spotify.com/v1/tracks/2pXpURmn6zC5... | https://api.spotify.com/v1/audio-analysis/2pXp... | 155933.0 | 4.0 |
| 1 | And I Love You So | Perry Como | 0.415 | 0.22 | 8.0 | -17.47 | 1.0 | 0.0261 | 0.878 | 0.651 | 0.125 | 0.268 | 90.267 | audio_features | 1naVD19eofGpFf6wosmHIe | spotify:track:1naVD19eofGpFf6wosmHIe | https://api.spotify.com/v1/tracks/1naVD19eofGp... | https://api.spotify.com/v1/audio-analysis/1naV... | 197627.0 | 4.0 |
| 2 | Christmas Dream | Perry Como | 0.514 | 0.287 | 10.0 | -13.971 | 1.0 | 0.0352 | 0.672 | 4e-06 | 0.0677 | 0.834 | 184.356 | audio_features | 1PrhnQxWAVYikCHcieRQiy | spotify:track:1PrhnQxWAVYikCHcieRQiy | https://api.spotify.com/v1/tracks/1PrhnQxWAVYi... | https://api.spotify.com/v1/audio-analysis/1Prh... | 169160.0 | 4.0 |
| 3 | (There's No Place Like) Home for the Holidays ... | Perry Como | 0.532 | 0.401 | 5.0 | -10.629 | 1.0 | 0.0525 | 0.869 | 0.0 | 0.222 | 0.45 | 143.823 | audio_features | 2GapxG7BxK55ihQRAlR39e | spotify:track:2GapxG7BxK55ihQRAlR39e | https://api.spotify.com/v1/tracks/2GapxG7BxK55... | https://api.spotify.com/v1/audio-analysis/2Gap... | 178293.0 | 4.0 |
| 4 | Magic Moments | Perry Como | 0.719 | 0.257 | 0.0 | -14.446 | 0.0 | 0.0435 | 0.88 | 3e-06 | 0.257 | 0.75 | 103.829 | audio_features | 7bflxIMDz5mFxhQyYx1CEh | spotify:track:7bflxIMDz5mFxhQyYx1CEh | https://api.spotify.com/v1/tracks/7bflxIMDz5mF... | https://api.spotify.com/v1/audio-analysis/7bfl... | 159907.0 | 4.0 |


##### Exploring the output

In [49]:
# type(all_tracks[0]['track']['album'])

In [50]:
# all_tracks[0]['track']['artists'][0]['name']#.keys()

In [51]:
#get the id of an artist
# all_tracks[0]['track']['artists'][0]['id']

In [52]:
# get the song title/name of the first song:
# all_tracks[0]["track"]["name"]

In [53]:
# get the song artist of the first song:
# all_tracks[0]["track"]["artists"][0]['name']

In [54]:
# # get the uri of the first song:
# song_uri = all_tracks[0]['track']["uri"]
# song_uri
# # 

In [55]:
# # get the audio features for that song
# audio_features=sp.audio_features(song_uri)

#### Function to extract song title, artist name and audio features from a playlist

In [56]:
# def get_title_artist_audiofeatures_from_playlist_alltracks(tracks):
#     title = []
#     artist = []
#     audio = []

#     for item in tracks:
#         # store the title and the artist in new lists
#         title.append(item["track"]["name"])
#         artist.append(item["track"]["artists"][0]['name'])

#         # get the audio features via the song uri
#         song_uri = item['track']["uri"]
#         audio.append(sp.audio_features(song_uri)[0])

#     # build the DataFrame once, after the loop, instead of rebuilding it for every track
#     data_songs = pd.DataFrame({'title':title,'artist':artist,'audio':audio})
#     # "explode" the audio-feature dicts into columns and concat them with title and artist
#     data_songs = pd.concat([data_songs.drop(['audio'], axis=1), data_songs['audio'].apply(pd.Series)], axis=1)
#     return data_songs