# ![logo_ironhack_blue 7](https://user-images.githubusercontent.com/23629340/40541063-a07a0a8a-601a-11e8-91b5-2f13e4e6b441.png)

# Lab | API wrappers - Create your collection of songs & audio features


#### Instructions 


To move forward with the project, you need to create a collection of songs with their audio features - as large as possible! 

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster.
The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

One way to collect as many songs as possible is to start with all the songs of a big, diverse playlist, then go to every artist present in the playlist and grab every song of every album of that artist. The number of songs you collect per playlist grows very quickly this way: each playlist yields many artists, and each artist yields many albums and tracks.
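As a rough sketch of that idea, the helper below walks one artist's albums and collects every track on them. It assumes an authenticated spotipy client is passed in as `sp` and relies on the client's `artist_albums`, `album_tracks` and `next` methods. The function name `get_all_tracks_of_artist` and the choice of `album_type="album"` are illustrative assumptions, not part of the lab.

```python
def get_all_tracks_of_artist(sp, artist_id, album_type="album"):
    """Return {track_id: track_name} for every track on the artist's albums.

    `sp` is assumed to be an authenticated spotipy.Spotify client.
    """
    tracks = {}

    # Page through the artist's albums.
    albums_page = sp.artist_albums(artist_id, album_type=album_type, limit=50)
    while albums_page:
        for album in albums_page["items"]:
            # Page through the tracks of each album.
            tracks_page = sp.album_tracks(album["id"], limit=50)
            while tracks_page:
                for track in tracks_page["items"]:
                    # Keying by id means a track seen twice is stored once.
                    tracks[track["id"]] = track["name"]
                tracks_page = sp.next(tracks_page) if tracks_page["next"] else None
        albums_page = sp.next(albums_page) if albums_page["next"] else None

    return tracks
```

Calling this once per artist of a playlist gives the "playlist, then artists, then albums, then tracks" expansion described above. Each call sends one request per page of albums and per page of album tracks, so expect it to be slow for large artist lists.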

In [2]:
import spotipy
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials

# Credentials are read from the SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET
# environment variables; avoid pasting real secrets into a shared notebook.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

In [3]:
playlist = sp.user_playlist_tracks("turgon1993", "44sPcoQ10on4NGGNtvsjI4")
playlist.keys()
#playlist['total']

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

We stored the response in the dictionary "playlist". It contains all the information about the playlist I requested, parsed from the JSON that the API returns.

In [4]:
len(playlist['items']) 

91

We are interested in the items, which are the tracks in the playlist.

In [5]:
results = sp.user_playlist_tracks("turgon1993", "44sPcoQ10on4NGGNtvsjI4")
tracks = results['items']

while results['next']:
    results = sp.next(results)
    tracks.extend(results['items'])

#This while loop follows the 'next' link of each page of results
#and appends the items of every page to the tracks list

#The tracks list is a list of dictionaries, one per playlist item

#The song titles can be accessed like this:
tracks[0]['track']['name']

'Under Pressure - Remastered 2011'

In [6]:
len(tracks)

91

In [7]:
#We get the artists participating in a song this way:
tracks[0]['track']['artists']

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/1dfeR4HaWDbWqFHLkxsg1d'},
  'href': 'https://api.spotify.com/v1/artists/1dfeR4HaWDbWqFHLkxsg1d',
  'id': '1dfeR4HaWDbWqFHLkxsg1d',
  'name': 'Queen',
  'type': 'artist',
  'uri': 'spotify:artist:1dfeR4HaWDbWqFHLkxsg1d'},
 {'external_urls': {'spotify': 'https://open.spotify.com/artist/0oSGxfWSnnOXhD2fKuz2Gy'},
  'href': 'https://api.spotify.com/v1/artists/0oSGxfWSnnOXhD2fKuz2Gy',
  'id': '0oSGxfWSnnOXhD2fKuz2Gy',
  'name': 'David Bowie',
  'type': 'artist',
  'uri': 'spotify:artist:0oSGxfWSnnOXhD2fKuz2Gy'}]
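The nesting shown above (item, then `track`, then `artists`) can be flattened into one small record per song. The sketch below is only an illustration: `summarize_item` is a hypothetical helper name, and the `None` check assumes that a playlist item can come back without a track (for example when the track is no longer available).

```python
def summarize_item(item):
    """Return a small dict describing one playlist item, or None if it has no track."""
    track = item.get("track")
    if track is None:  # assumed possible for unavailable tracks
        return None
    return {
        "id": track["id"],
        "name": track["name"],
        # One song can have several artists, as in the output above.
        "artists": [artist["name"] for artist in track["artists"]],
    }
```

Applied to every element of `tracks`, this yields records that are easier to inspect than the raw dictionaries.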

In [8]:
#Get all the song names from the playlist
song_names = [item['track']['name'] for item in tracks]
    
#song_names

### Get info using functions

In [9]:
username="turgon1993"
playlist_id="44sPcoQ10on4NGGNtvsjI4"

In [10]:
def get_playlist_tracks(username, playlist_id):
    
    results = sp.user_playlist_tracks(username, playlist_id)
    tracks = results['items']
    
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    
    return tracks

In [11]:
def get_playlist_tracklist(username, playlist_id):
    
    # Reuse get_playlist_tracks() to fetch every page of the playlist
    tracks = get_playlist_tracks(username, playlist_id)
    
    # Keep only the song names
    return [item['track']['name'] for item in tracks]

In [13]:
def get_artists_from_playlist(username,playlist_id):
    
    tracks_from_playlist = get_playlist_tracks(username, playlist_id)
    
    artists = []
    
    for track in tracks_from_playlist:
        artists_info = track['track']['artists']
        
        for artist_info in artists_info:
            artists.append(artist_info['name'])
    
    return list(set(artists))

In [14]:
artists_playlist=get_artists_from_playlist(username,playlist_id)
len(artists_playlist)

68
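Artist names are convenient to read, but two artists can share a name and a name search can return tracks by someone else. A possible alternative, sketched here under the assumption that each artist entry carries an `id` field as in the output shown earlier, is to keep the artist ids alongside the names. `get_artist_ids` is a hypothetical helper, not part of the notebook.

```python
def get_artist_ids(playlist_items):
    """Map artist id -> artist name for every artist in the given playlist items."""
    artists = {}
    for item in playlist_items:
        # Items without a track are skipped.
        track = item.get("track") or {}
        for artist in track.get("artists", []):
            artists[artist["id"]] = artist["name"]
    return artists
```

The ids can then be used for lookups that take an artist id instead of searching by name.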

In [16]:
def get_playlists(username):
    # Request the user's playlists once instead of on every loop iteration.
    # Note: this reads only the first page of results.
    playlists = sp.user_playlists(username)['items']
    return [playlist['id'] for playlist in playlists]

In [19]:
turgon1993_playlists=get_playlists(username)

In [26]:
artists=sum([get_artists_from_playlist(username,playlist) for playlist in turgon1993_playlists], [])

In [28]:
len(artists)

1554

In [36]:
clean = [x for x in artists if x is not None]

In [37]:
len(clean)

1554
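Note that get_artists_from_playlist() removes duplicates within a single playlist only, so an artist who appears in several playlists is listed several times in the combined list (Cream shows up twice in the error log further down). A minimal sketch of an order-preserving clean-up follows; `unique_in_order` is a hypothetical helper name.

```python
def unique_in_order(names):
    """Drop None values and repeated names, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order.
    return list(dict.fromkeys(name for name in names if name is not None))
```

Running the later steps on `unique_in_order(artists)` would avoid requesting the same artist twice; the resulting list would likely be shorter than the 1554 entries counted above.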

In [29]:
def get_audio_features(artist):
    # get tracks from artist
    results = sp.search(q=f'artist:{artist}', limit=50)
    # extract the track ids
    track_ids = [track['id'] for track in results['tracks']['items']]
    song_names = [track['name'] for track in results['tracks']['items']]
    # extract the audio features
    audio_features = sp.audio_features(track_ids)
    # store audio features in a dataframe
    df = pd.DataFrame(audio_features)
    df['artist'] = artist
    df['song name'] = song_names
    return df

In [32]:
artists[13]

'Mike Candys'

In [33]:
df_13=get_audio_features(artists[13])
df_13

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | artist | song name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.937 | 0.823 | 11 | -2.867 | 1 | 0.0956 | 0.013 | 0.397 | 0.121 | 0.616 | 126.063 | audio_features | 4YprqLB3jEAlHNOMXEcZtm | spotify:track:4YprqLB3jEAlHNOMXEcZtm | https://api.spotify.com/v1/tracks/4YprqLB3jEAl... | https://api.spotify.com/v1/audio-analysis/4Ypr... | 141002 | 4 | Mike Candys | Vibe |
| 1 | 0.843 | 0.861 | 6 | -4.48 | 1 | 0.059 | 0.0125 | 0.916 | 0.0703 | 0.518 | 129.967 | audio_features | 2OvqYU6lo9GA6hO2jVttKm | spotify:track:2OvqYU6lo9GA6hO2jVttKm | https://api.spotify.com/v1/tracks/2OvqYU6lo9GA... | https://api.spotify.com/v1/audio-analysis/2Ovq... | 135706 | 4 | Mike Candys | Flexin |
| 2 | 0.873 | 0.924 | 10 | -2.316 | 1 | 0.0675 | 0.00515 | 0.369 | 0.0997 | 0.537 | 125.989 | audio_features | 7qwQ2UGJnixDow4kyz9PDe | spotify:track:7qwQ2UGJnixDow4kyz9PDe | https://api.spotify.com/v1/tracks/7qwQ2UGJnixD... | https://api.spotify.com/v1/audio-analysis/7qwQ... | 141429 | 4 | Mike Candys | Push It |
| 3 | 0.771 | 0.888 | 8 | -4.447 | 1 | 0.0963 | 0.0445 | 0.00431 | 0.0976 | 0.619 | 101.285 | audio_features | 1umrjqwMMDpuwXBzDx8YEv | spotify:track:1umrjqwMMDpuwXBzDx8YEv | https://api.spotify.com/v1/tracks/1umrjqwMMDpu... | https://api.spotify.com/v1/audio-analysis/1umr... | 140887 | 3 | Mike Candys | Overdose |
| 4 | 0.825 | 0.946 | 9 | -4.585 | 1 | 0.0444 | 0.0196 | 0.472 | 0.134 | 0.322 | 126.016 | audio_features | 3831W9bQFWQyiH2F9Y2G0U | spotify:track:3831W9bQFWQyiH2F9Y2G0U | https://api.spotify.com/v1/tracks/3831W9bQFWQy... | https://api.spotify.com/v1/audio-analysis/3831... | 146667 | 4 | Mike Candys | Like That |
| 5 | 0.57 | 0.993 | 4 | -2.12 | 0 | 0.0408 | 0.0122 | 0.392 | 0.236 | 0.175 | 125.989 | audio_features | 6uRUfq1y0VayUjh2M935g7 | spotify:track:6uRUfq1y0VayUjh2M935g7 | https://api.spotify.com/v1/tracks/6uRUfq1y0Vay... | https://api.spotify.com/v1/audio-analysis/6uRU... | 180000 | 4 | Mike Candys | Insomnia - Rework |
| 6 | 0.796 | 0.926 | 6 | -4.925 | 0 | 0.0401 | 0.0027 | 0.0 | 0.0599 | 0.458 | 126.03 | audio_features | 4E18v8nVzQCeZBAGMZOtCI | spotify:track:4E18v8nVzQCeZBAGMZOtCI | https://api.spotify.com/v1/tracks/4E18v8nVzQCe... | https://api.spotify.com/v1/audio-analysis/4E18... | 173333 | 4 | Mike Candys | The Riddle Anthem - Rework |
| 7 | 0.635 | 0.743 | 0 | -4.266 | 1 | 0.162 | 0.00785 | 2e-05 | 0.302 | 0.224 | 141.945 | audio_features | 2kbzyz1CZHSAX1dH4btQLY | spotify:track:2kbzyz1CZHSAX1dH4btQLY | https://api.spotify.com/v1/tracks/2kbzyz1CZHSA... | https://api.spotify.com/v1/audio-analysis/2kbz... | 151268 | 4 | Mike Candys | Louder |
| 8 | 0.876 | 0.894 | 10 | -3.555 | 1 | 0.0529 | 0.00959 | 0.478 | 0.201 | 0.476 | 126.042 | audio_features | 0YwlHzRYRU2LV2gtWklmD5 | spotify:track:0YwlHzRYRU2LV2gtWklmD5 | https://api.spotify.com/v1/tracks/0YwlHzRYRU2L... | https://api.spotify.com/v1/audio-analysis/0Ywl... | 139405 | 4 | Mike Candys | Boom |
| 9 | 0.789 | 0.941 | 6 | -3.326 | 1 | 0.0412 | 0.00918 | 0.856 | 0.0719 | 0.614 | 127.998 | audio_features | 5q2cqCySPgK595wNyN0XEr | spotify:track:5q2cqCySPgK595wNyN0XEr | https://api.spotify.com/v1/tracks/5q2cqCySPgK5... | https://api.spotify.com/v1/audio-analysis/5q2c... | 139687 | 4 | Mike Candys | Baby |


In [46]:
df = pd.DataFrame()
error_log={'artist':[],'error':[]}

for artist in artists:
    #print("Getting features for:",artist)
    try:
        df_artist = get_audio_features(artist)
        df = pd.concat([df, df_artist])
    except Exception as e:
        #print("Error found:",e,"in artist",artist)
        error_log['artist'].append(artist)
        error_log['error'].append(e)
        # Logs the error appropriately. 
        continue
        
df = df.reset_index(drop=True)
error_log_df=pd.DataFrame(error_log)

In [62]:
df

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | artist | song name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.745 | 0.7710 | 7 | -4.051 | 1 | 0.0967 | 0.00543 | 0.000000 | 0.1540 | 0.701 | 124.005 | audio_features | 0wpEtERZ78hweuJCWD9eAE | spotify:track:0wpEtERZ78hweuJCWD9eAE | https://api.spotify.com/v1/tracks/0wpEtERZ78hw... | https://api.spotify.com/v1/audio-analysis/0wpE... | 163209 | 4 | Tiscore | Fire To Smoke |
| 1 | 0.718 | 0.8590 | 0 | -3.603 | 1 | 0.0417 | 0.29600 | 0.000000 | 0.3980 | 0.757 | 120.041 | audio_features | 5Av85j2myp1Zcp7ICKDN4P | spotify:track:5Av85j2myp1Zcp7ICKDN4P | https://api.spotify.com/v1/tracks/5Av85j2myp1Z... | https://api.spotify.com/v1/audio-analysis/5Av8... | 162000 | 4 | Tiscore | The Tide Is High |
| 2 | 0.724 | 0.9150 | 5 | -3.358 | 1 | 0.1610 | 0.09750 | 0.000638 | 0.3130 | 0.518 | 120.045 | audio_features | 7l33AaaNrYHcr0kjzNiBUf | spotify:track:7l33AaaNrYHcr0kjzNiBUf | https://api.spotify.com/v1/tracks/7l33AaaNrYHc... | https://api.spotify.com/v1/audio-analysis/7l33... | 164750 | 4 | Tiscore | Down |
| 3 | 0.605 | 0.9010 | 4 | -5.141 | 0 | 0.0837 | 0.04730 | 0.000000 | 0.3980 | 0.546 | 123.990 | audio_features | 5aavp5BSNuKzqpq11m34fi | spotify:track:5aavp5BSNuKzqpq11m34fi | https://api.spotify.com/v1/tracks/5aavp5BSNuKz... | https://api.spotify.com/v1/audio-analysis/5aav... | 151452 | 4 | Tiscore | Fire To Smoke - Yves V Remix |
| 4 | 0.774 | 0.8080 | 11 | -5.206 | 0 | 0.0544 | 0.05300 | 0.002210 | 0.1230 | 0.264 | 112.896 | audio_features | 4ETYOkzLdLBHGqy05q5a8e | spotify:track:4ETYOkzLdLBHGqy05q5a8e | https://api.spotify.com/v1/tracks/4ETYOkzLdLBH... | https://api.spotify.com/v1/audio-analysis/4ETY... | 161090 | 4 | Tiscore | Where The Roses Grow - VIZE & NOØN Remix |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 71925 | 0.306 | 0.4160 | 4 | -20.363 | 0 | 0.0435 | 0.62200 | 0.898000 | 0.0748 | 0.913 | 81.589 | audio_features | 25HFZB3SgFI1zEOeURx17w | spotify:track:25HFZB3SgFI1zEOeURx17w | https://api.spotify.com/v1/tracks/25HFZB3SgFI1... | https://api.spotify.com/v1/audio-analysis/25HF... | 101435 | 4 | Andrew Zack | Fur Elise |
| 71926 | 0.451 | 0.5850 | 2 | -13.421 | 0 | 0.0505 | 0.17700 | 0.324000 | 0.0905 | 0.155 | 110.028 | audio_features | 7AIgqX0acXyzkOs64ahhrm | spotify:track:7AIgqX0acXyzkOs64ahhrm | https://api.spotify.com/v1/tracks/7AIgqX0acXyz... | https://api.spotify.com/v1/audio-analysis/7AIg... | 233336 | 4 | Andrew Zack | Requiem Deus |
| 71927 | 0.576 | 0.8860 | 8 | -8.534 | 1 | 0.0505 | 0.22400 | 0.004310 | 0.1250 | 0.326 | 131.953 | audio_features | 3NfAAwntISr3hmJ01n5Yrw | spotify:track:3NfAAwntISr3hmJ01n5Yrw | https://api.spotify.com/v1/tracks/3NfAAwntISr3... | https://api.spotify.com/v1/audio-analysis/3NfA... | 206563 | 4 | Andrew Zack | Shimmer |
| 71928 | 0.445 | 0.6750 | 2 | -7.965 | 1 | 0.0349 | 0.01200 | 0.029800 | 0.1470 | 0.311 | 134.075 | audio_features | 2igdn18wrfICzkiD6zHLpb | spotify:track:2igdn18wrfICzkiD6zHLpb | https://api.spotify.com/v1/tracks/2igdn18wrfIC... | https://api.spotify.com/v1/audio-analysis/2igd... | 281200 | 4 | Andrew Zack | Here We Are |


In [64]:
df.to_csv('./song_features.csv') 

In [49]:
error_log_df.head()

| | artist | error |
|---|---|---|
| 0 | Sam Gray | 'NoneType' object has no attribute 'keys' |
| 1 | Rebecca Helena | Length of values (0) does not match length of ... |
| 2 | Agent Zed | Length of values (0) does not match length of ... |
| 3 | The Lyndhurst Orchestra | Length of values (0) does not match length of ... |
| 4 | Cream | 'NoneType' object has no attribute 'keys' |
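The two messages in the error log above are consistent with two situations: a search that returns no tracks for the artist, and an audio-features response that contains `None` for some track. This is an inference from the messages, not something the log confirms. Under that assumption, a more defensive way to assemble the rows could look like the sketch below; `build_rows` is a hypothetical helper that takes the search items and the features list as plain arguments, so it makes no API calls itself.

```python
def build_rows(artist, tracks, features):
    """Pair each searched track with its audio features, skipping missing ones.

    tracks:   list of track dicts with 'id' and 'name',
              e.g. results['tracks']['items'] from the search
    features: list returned for those tracks, in the same order;
              entries are assumed to be possibly None
    """
    rows = []
    for track, feat in zip(tracks, features):
        if feat is None:
            # No audio features for this track: skip it instead of failing.
            continue
        row = dict(feat)
        row["artist"] = artist
        row["song name"] = track["name"]
        rows.append(row)
    return rows
```

The resulting list can be passed to `pd.DataFrame(rows)`. When `tracks` is empty the function returns an empty list, so an artist without search results contributes nothing instead of raising an error.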


### Approach for Error Handling

Do not run the loop over the entire list at once. Instead, run it in batches. Here, the loop with get_audio_features() runs for the first 300 artists, then for artists 300-600, and so on in groups of 300 until the end of the list. This lets me avoid rerunning the whole loop every time, which matters because a long run can crash somewhere in the middle.

With this approach, a crash costs at most one batch of about 300 artists, which is much faster to debug and rerun than the whole list.
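The batching can also be written as a small generic helper so that no slice boundary has to be typed by hand. This is a minimal sketch; `chunked` is a hypothetical helper name, and the batch size of 300 simply mirrors the text above.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        # Each slice starts exactly where the previous one ended.
        yield items[start:start + size]


# Usage sketch (get_features_artist_list is the function defined in this notebook;
# the CSV file names are illustrative):
# for n, batch in enumerate(chunked(artists, 300), start=1):
#     df_n, err_n = get_features_artist_list(batch)
#     df_n.to_csv(f"song_features_{n}.csv", index=False)
```

Saving each batch as soon as it finishes means that a crash in a later batch does not discard the earlier results.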

In [50]:
len(artists)

1554

In [55]:
# Slice ends are exclusive, so each batch starts where the previous one ended
art_1=artists[0:300]
art_2=artists[300:600]
art_3=artists[600:900]
art_4=artists[900:1200]
art_5=artists[1200:]

In [59]:
def get_features_artist_list(list_artists):
    df = pd.DataFrame()
    error_log={'artist':[],'error':[]}

    for artist in list_artists:
        #print("Getting features for:",artist)
        try:
            df_artist = get_audio_features(artist)
            df = pd.concat([df, df_artist])
        except Exception as e:
            #print("Error found:",e,"in artist",artist)
            error_log['artist'].append(artist)
            error_log['error'].append(e)
            # Logs the error appropriately. 
            continue

    df = df.reset_index(drop=True)
    error_log_df=pd.DataFrame(error_log)
    
    return df, error_log_df

In [60]:
df_1, err_log_1 = get_features_artist_list(art_1)

In [63]:
df_1.head()
err_log_1

| | artist | error |
|---|---|---|
| 0 | Sam Gray | 'NoneType' object has no attribute 'keys' |
| 1 | Rebecca Helena | Length of values (0) does not match length of ... |
| 2 | Agent Zed | Length of values (0) does not match length of ... |
| 3 | The Lyndhurst Orchestra | Length of values (0) does not match length of ... |
| 4 | Cream | 'NoneType' object has no attribute 'keys' |
| 5 | Cream | 'NoneType' object has no attribute 'keys' |
