### Lab: Create your collection of songs & audio features

To move forward with the project, you need to create a collection of songs with their audio features, as large as possible!

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster.

The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

An idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist, then go to every artist present in the playlist and grab every song from every album of that artist. The number of songs you collect per playlist will multiply very quickly!
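The first step of that idea, gathering the unique artists of a playlist, can be sketched as below. This assumes the playlist items have the shape returned by `sp.playlist_tracks` (each item has a `track` whose `artists` list holds dicts with an `id`). `collect_artist_ids` is a hypothetical helper and sends no requests itself; the sample items are made up for illustration.

```python
def collect_artist_ids(playlist_items):
    """Return the unique artist IDs in a list of playlist items, in order of first appearance.

    Hypothetical helper: assumes each item looks like the items returned by
    sp.playlist_tracks, i.e. {"track": {"artists": [{"id": ...}, ...]}}.
    """
    seen = set()
    artist_ids = []
    for item in playlist_items:
        track = item.get("track")
        if not track:  # unavailable tracks can come back as None
            continue
        for artist in track.get("artists", []):
            artist_id = artist.get("id")
            # local files have artists without an ID; skip those and repeats
            if artist_id and artist_id not in seen:
                seen.add(artist_id)
                artist_ids.append(artist_id)
    return artist_ids


# Made-up items in the same shape as the API response
items = [
    {"track": {"artists": [{"id": "a1"}, {"id": "a2"}]}},
    {"track": {"artists": [{"id": "a2"}]}},
    {"track": None},
    {"track": {"artists": [{"id": None}]}},
]
print(collect_artist_ids(items))  # ['a1', 'a2']
```

From there, each artist ID could be passed to `sp.artist_albums(artist_id)` and each album ID to `sp.album_tracks(album_id)`. Both return paginated results, so the same `next`-based loop used for playlists applies, and every artist adds several more requests.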

## Spotipy API

Create a Spotify account and follow these steps to register an app: https://developer.spotify.com/documentation/general/guides/app-settings/

After the app is created, you can see it on your dashboard:
https://developer.spotify.com/dashboard/applications

Click on it and you'll find the client id and client secret.

#### Authentication

```python
import spotipy  # install if needed: pip install spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd  # used later to build the audio-features DataFrames
```

```python
# Initialize Spotipy with your app credentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="your-client-id",
    client_secret="your-client-secret"))
```
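Hard-coding the secret in a notebook makes it easy to leak when you share the file. A minimal sketch of reading it from the environment instead, assuming you have exported `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` (the same variable names Spotipy's `SpotifyClientCredentials` looks for when called without arguments). `read_spotify_credentials` is a hypothetical helper.

```python
import os


def read_spotify_credentials():
    """Hypothetical helper: read the app credentials from environment variables."""
    client_id = os.environ.get("SPOTIPY_CLIENT_ID")
    client_secret = os.environ.get("SPOTIPY_CLIENT_SECRET")
    if not client_id or not client_secret:
        # fail early with a clear message instead of an authentication error later
        raise RuntimeError(
            "Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET before authenticating"
        )
    return client_id, client_secret


# Usage (requires spotipy and network access):
# client_id, client_secret = read_spotify_credentials()
# sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
#     client_id=client_id, client_secret=client_secret))
```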

#### Pagination using "next"

When you collect songs from a playlist using `sp.playlist_tracks`, you're limited by the `limit` parameter, which has a maximum (and default) value of 100. When the playlist has more than 100 songs, you have to collect them by navigating through the "pages" of the results.

The parameter `offset` lets you retrieve results starting at a certain position. Offsets are zero-based, so with `limit=100` an offset of 100 (the 101st song) gives you the second "page" of results, an offset of 200 gives you the third page, and so on.
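The offset arithmetic can be sketched as below, assuming you know the total number of items (the paginated results returned by `sp.playlist_tracks` include it under the `total` key). `page_offsets` is a hypothetical helper name.

```python
def page_offsets(total, limit=100):
    """Return the zero-based offset of every page needed to cover `total` items."""
    return list(range(0, total, limit))


# A 111-song playlist needs two pages: songs 1-100 and 101-111
print(page_offsets(111))  # [0, 100]

# Hypothetical usage with Spotipy:
# first = sp.playlist_tracks(playlist_id)
# for offset in page_offsets(first["total"])[1:]:  # first page already fetched
#     page = sp.playlist_tracks(playlist_id, limit=100, offset=offset)
```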

The function `sp.next()` does the same thing more simply: pass it the results of any paginated request and it returns the results for the next page (or `None` if there is no next page).

We can check whether there's a next page by accessing the `next` key of the results: it holds the URL of the next page, or `None` on the last page.
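Because `sp.next()` works on the results of any paginated request, the "follow `next` until it is `None`" loop can be written once as a generator. A minimal sketch: `iter_all_items` is a hypothetical name, and its `get_next` argument stands in for `sp.next` so the logic can be tried offline with fake pages.

```python
def iter_all_items(first_page, get_next):
    """Yield every item of a paginated result, following the "next" links.

    first_page: a result dict with "items" and "next" keys (as returned by Spotipy).
    get_next:   a function that takes a page and returns the next one, e.g. sp.next.
    """
    page = first_page
    while page:
        yield from page["items"]
        # only request another page when the current one points to it
        page = get_next(page) if page["next"] else None


# Offline check with two fake pages shaped like Spotipy results
pages = {
    "p1": {"items": [1, 2], "next": "p2"},
    "p2": {"items": [3], "next": None},
}
fake_next = lambda page: pages[page["next"]]
print(list(iter_all_items(pages["p1"], fake_next)))  # [1, 2, 3]

# With Spotipy:
# items = list(iter_all_items(sp.playlist_tracks(playlist_id), sp.next))
```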

```python
tracks_from_playlist = sp.playlist_tracks("https://open.spotify.com/playlist/6uYt5DwcyaOxxPUundboC2")
```

```python
# this is the link to the "next" page
next_page_link = tracks_from_playlist["next"]
next_page_link
```

```
'https://api.spotify.com/v1/playlists/6uYt5DwcyaOxxPUundboC2/tracks?offset=100&limit=100&additional_types=track'
```

```python
# this gives the full results of the next page directly
next_results = sp.next(tracks_from_playlist)
next_results
```

{'href': 'https://api.spotify.com/v1/playlists/6uYt5DwcyaOxxPUundboC2/tracks?offset=100&limit=100&additional_types=track',
 'items': [{'added_at': '2018-05-23T08:44:29Z',
   'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/haeltotoe'},
    'href': 'https://api.spotify.com/v1/users/haeltotoe',
    'id': 'haeltotoe',
    'type': 'user',
    'uri': 'spotify:user:haeltotoe'},
   'is_local': False,
   'primary_color': None,
   'track': {'album': {'album_type': 'album',
     'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/0LyfQWJT6nXafLPZqxe9Of'},
       'href': 'https://api.spotify.com/v1/artists/0LyfQWJT6nXafLPZqxe9Of',
       'id': '0LyfQWJT6nXafLPZqxe9Of',
       'name': 'Various Artists',
       'type': 'artist',
       'uri': 'spotify:artist:0LyfQWJT6nXafLPZqxe9Of'}],
     'available_markets': ['AD',
      'AE',
      'AL',
      'AR',
      'AU',
      'BA',
      'BE',
      'BG',
      'BH',
      'BO',
      'BR',
      'BY',
     

```python
# this playlist has 111 songs, so there's no 3rd page:
print(sp.next(next_results))
print(next_results["next"])
```

```
None
None
```


## Sample lab solution

#### Step 1: get the IDs of all the songs in a playlist (no matter how many songs the playlist has)

```python
def get_tracks_ids_from_playlist(playlist_id):
    # first page of results (up to 100 items)
    result = sp.playlist_tracks(playlist_id)
    final_results = list(result["items"])

    # keep following the "next" link until the last page
    while result["next"]:
        result = sp.next(result)
        final_results.extend(result["items"])

    # "track" can be None for unavailable items, and local files have no ID:
    # we return only real track IDs
    return [item["track"]["id"] for item in final_results
            if item["track"] and item["track"]["id"]]
```

```python
track_ids = get_tracks_ids_from_playlist("https://open.spotify.com/playlist/6uYt5DwcyaOxxPUundboC2")
track_ids
```

['74X1epeRufHckhuX1KFD04',
 '1QEEqeFIZktqIpPI4jSVSF',
 '6m4HWTYMRSJkaUuvXTaNmE',
 '2WfaOiMkCvy7F5fcp2zZ8L',
 '0sDqo9UPzPUtu9wEkI3zRB',
 '67Hna13dNDkZvBpTXRIaOJ',
 '2X485T9Z5Ly0xyaghN73ed',
 '1KDsONFxp3YtnJTaLeWFIi',
 '5AhDb4oM6f4YmHPXW123Fg',
 '6cr6UDpkjEaMQ80OjWqEBQ',
 '7E1boGBVKRPqbHuEDXXZ7D',
 '3MODES4TNtygekLl146Dxd',
 '131yybV7A3TmC34a0qE8u8',
 '2LkaNhCrNVmcYgXJeLVmsw',
 '05f8Hg3RSfiPSCBQOtxl3i',
 '3gdewACMIVMEWVbyb8O9sY',
 '5dRwQffP46e2zNsBiEEJ9P',
 '70gbuMqwNBE2Y5rkQJE9By',
 '28VC9MNZSJZOqiAHUU8XSP',
 '38Ngied9rBORlAbLYNCl4k',
 '6pnwfWyaWjQiHCKTiZLItr',
 '2nVHqZbOGkKWzlcy1aMbE7',
 '7BY005dacJkbO6EPiOh2wb',
 '5ihS6UUlyQAfmp48eSkxuQ',
 '1lFC3sMgOcDrVzNh8zXRnl',
 '0hKRSZhUGEhKU6aNSPBACZ',
 '0YveezON7jpiaHA8fnUHxN',
 '6aBUnkXuCEQQHAlTokv9or',
 '6mcxQ1Y3uQRU0IHsvdNLH1',
 '3TO7bbrUKrOSPGRTB5MeCz',
 '7qL6WYHu1148Y58bhgFZC2',
 '72Z17vmmeQKAg8bptWvpVG',
 '7Jh1bpe76CNTCgdgAdBw4Z',
 '3FCto7hnn1shUyZL42YgfO',
 '57zJeqbOA6AHsv2n6BMcm6',
 '1ju7EsSGvRybSNEsRvc7qY',
 '2QVmiA93GVhWNTWQctyY1K',
 

#### Step 2: build a df with all the audio features

```python
# we'll iterate through the IDs returned by the function above;
# for a single ID, audio_features returns a list holding one dict
sp.audio_features(track_ids[0])
```

[{'danceability': 0.339,
  'energy': 0.143,
  'key': 0,
  'loudness': -10.78,
  'mode': 1,
  'speechiness': 0.0317,
  'acousticness': 0.921,
  'instrumentalness': 0.000636,
  'liveness': 0.372,
  'valence': 0.0948,
  'tempo': 101.213,
  'type': 'audio_features',
  'id': '74X1epeRufHckhuX1KFD04',
  'uri': 'spotify:track:74X1epeRufHckhuX1KFD04',
  'track_href': 'https://api.spotify.com/v1/tracks/74X1epeRufHckhuX1KFD04',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/74X1epeRufHckhuX1KFD04',
  'duration_ms': 413320,
  'time_signature': 3}]

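```python
def get_audio_features_df(playlist_id):
    track_ids = get_tracks_ids_from_playlist(playlist_id)
    # one request per track; [0] takes the single dict out of the returned list
    aud_feat = [sp.audio_features(track)[0] for track in track_ids]
    # tracks without audio features come back as None, so drop them
    return pd.DataFrame([f for f in aud_feat if f])
```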
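Requesting one track at a time means one request per song, which is a large part of why collecting thousands of songs is slow. The Web API's audio-features endpoint accepts up to 100 track IDs per request and `sp.audio_features` takes a list of IDs, so the IDs can be sent in batches instead. A minimal sketch; `batched_ids` is a hypothetical helper.

```python
def batched_ids(ids, size=100):
    """Split a list of track IDs into consecutive chunks of at most `size` IDs."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


print([len(batch) for batch in batched_ids(["id"] * 250)])  # [100, 100, 50]

# Hypothetical usage with Spotipy (one request per batch instead of per track):
# aud_feat = []
# for batch in batched_ids(track_ids):
#     # one dict per ID, or None when a track has no audio features
#     aud_feat.extend(f for f in sp.audio_features(batch) if f)
# df = pd.DataFrame(aud_feat)
```

With 250 songs this sends 3 requests instead of 250.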

```python
audio_f = get_audio_features_df("https://open.spotify.com/playlist/6uYt5DwcyaOxxPUundboC2")
audio_f.head()
```

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.339 | 0.143 | 0 | -10.78 | 1 | 0.0317 | 0.921 | 0.000636 | 0.372 | 0.0948 | 101.213 | audio_features | 74X1epeRufHckhuX1KFD04 | spotify:track:74X1epeRufHckhuX1KFD04 | https://api.spotify.com/v1/tracks/74X1epeRufHc... | https://api.spotify.com/v1/audio-analysis/74X1... | 413320 | 3 |
| 1 | 0.377 | 0.682 | 7 | -8.039 | 1 | 0.0299 | 0.000894 | 0.00217 | 0.0504 | 0.288 | 108.736 | audio_features | 1QEEqeFIZktqIpPI4jSVSF | spotify:track:1QEEqeFIZktqIpPI4jSVSF | https://api.spotify.com/v1/tracks/1QEEqeFIZktq... | https://api.spotify.com/v1/audio-analysis/1QEE... | 285133 | 4 |
| 2 | 0.138 | 0.712 | 0 | -5.271 | 1 | 0.069 | 0.746 | 0.773 | 0.0714 | 0.133 | 99.877 | audio_features | 6m4HWTYMRSJkaUuvXTaNmE | spotify:track:6m4HWTYMRSJkaUuvXTaNmE | https://api.spotify.com/v1/tracks/6m4HWTYMRSJk... | https://api.spotify.com/v1/audio-analysis/6m4H... | 203067 | 4 |
| 3 | 0.573 | 0.902 | 6 | -7.638 | 0 | 0.054 | 0.018 | 0.00125 | 0.0928 | 0.876 | 84.412 | audio_features | 2WfaOiMkCvy7F5fcp2zZ8L | spotify:track:2WfaOiMkCvy7F5fcp2zZ8L | https://api.spotify.com/v1/tracks/2WfaOiMkCvy7... | https://api.spotify.com/v1/audio-analysis/2Wfa... | 225280 | 4 |
| 4 | 0.63 | 0.519 | 9 | -10.997 | 1 | 0.0528 | 0.225 | 4e-06 | 0.0974 | 0.502 | 143.942 | audio_features | 0sDqo9UPzPUtu9wEkI3zRB | spotify:track:0sDqo9UPzPUtu9wEkI3zRB | https://api.spotify.com/v1/tracks/0sDqo9UPzPUt... | https://api.spotify.com/v1/audio-analysis/0sDq... | 278627 | 4 |


#### Step 3: grow the df with multiple playlists

```python
playlists = ["https://open.spotify.com/playlist/2FaOyGU5gIiwDpck5Ui7L9",
             "https://open.spotify.com/playlist/10h9MYW9lIRp31fawvmjE8",
             "https://open.spotify.com/playlist/3YdCzysSu8jLNYXYaheZAF",
             "https://open.spotify.com/playlist/6IfGK9nLC9ChgD7FTZzkLJ"]
```

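```python
def get_big_audio_df(playlists_list):
    # DataFrame.append was removed in pandas 2.0, so concatenate the
    # per-playlist DataFrames in one go instead. Each playlist keeps its own
    # 0-based index; pass ignore_index=True to renumber all rows.
    return pd.concat([get_audio_features_df(p) for p in playlists_list])
```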

```python
big_df_songs_audiofeat = get_big_audio_df(playlists)
```

```python
big_df_songs_audiofeat
```

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.658 | 0.2590 | 11 | -13.141 | 0 | 0.0705 | 0.69400 | 0.000059 | 0.9750 | 0.306 | 110.376 | audio_features | 1n7JnwviZ7zf0LR1tcGFq7 | spotify:track:1n7JnwviZ7zf0LR1tcGFq7 | https://api.spotify.com/v1/tracks/1n7JnwviZ7zf... | https://api.spotify.com/v1/audio-analysis/1n7J... | 256213 | 4 |
| 1 | 0.742 | 0.3990 | 2 | -12.646 | 1 | 0.0346 | 0.21700 | 0.000002 | 0.1070 | 0.693 | 125.039 | audio_features | 5QGM1U0eCYrQuwSJwTm5Zq | spotify:track:5QGM1U0eCYrQuwSJwTm5Zq | https://api.spotify.com/v1/tracks/5QGM1U0eCYrQ... | https://api.spotify.com/v1/audio-analysis/5QGM... | 191867 | 4 |
| 2 | 0.851 | 0.7300 | 2 | -11.048 | 1 | 0.3470 | 0.45300 | 0.000063 | 0.1240 | 0.905 | 93.698 | audio_features | 0NLIFSZxPzQhCwnkn5PJYs | spotify:track:0NLIFSZxPzQhCwnkn5PJYs | https://api.spotify.com/v1/tracks/0NLIFSZxPzQh... | https://api.spotify.com/v1/audio-analysis/0NLI... | 152267 | 4 |
| 3 | 0.705 | 0.0502 | 4 | -18.115 | 1 | 0.0471 | 0.87900 | 0.000041 | 0.3860 | 0.524 | 106.802 | audio_features | 3mXqOdlLE1k67WsAxryPFs | spotify:track:3mXqOdlLE1k67WsAxryPFs | https://api.spotify.com/v1/tracks/3mXqOdlLE1k6... | https://api.spotify.com/v1/audio-analysis/3mXq... | 186227 | 4 |
| 4 | 0.651 | 0.1190 | 6 | -19.807 | 1 | 0.0380 | 0.91600 | 0.000343 | 0.1040 | 0.402 | 120.941 | audio_features | 7bSzjzjTkWT2CkIPPdp0eA | spotify:track:7bSzjzjTkWT2CkIPPdp0eA | https://api.spotify.com/v1/tracks/7bSzjzjTkWT2... | https://api.spotify.com/v1/audio-analysis/7bSz... | 273680 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 218 | 0.617 | 0.5670 | 0 | -4.188 | 1 | 0.0828 | 0.05840 | 0.000000 | 0.0933 | 0.505 | 90.246 | audio_features | 1CnPYaKxTVb4LWOtiGOm0m | spotify:track:1CnPYaKxTVb4LWOtiGOm0m | https://api.spotify.com/v1/tracks/1CnPYaKxTVb4... | https://api.spotify.com/v1/audio-analysis/1CnP... | 217603 | 4 |
| 219 | 0.608 | 0.5360 | 8 | -5.355 | 0 | 0.0469 | 0.00704 | 0.000036 | 0.4600 | 0.167 | 112.035 | audio_features | 1bjd4UgjnK5hedJJKmi3YP | spotify:track:1bjd4UgjnK5hedJJKmi3YP | https://api.spotify.com/v1/tracks/1bjd4UgjnK5h... | https://api.spotify.com/v1/audio-analysis/1bjd... | 203841 | 4 |
| 220 | 0.541 | 0.7910 | 4 | -4.183 | 1 | 0.2230 | 0.05390 | 0.000000 | 0.1100 | 0.314 | 105.792 | audio_features | 2B5uUYyiDpFCgEGdbVpOZj | spotify:track:2B5uUYyiDpFCgEGdbVpOZj | https://api.spotify.com/v1/tracks/2B5uUYyiDpFC... | https://api.spotify.com/v1/audio-analysis/2B5u... | 208474 | 4 |
| 221 | 0.501 | 0.6740 | 5 | -6.363 | 1 | 0.0408 | 0.00346 | 0.000036 | 0.2820 | 0.152 | 155.051 | audio_features | 1fipvP2zmef6vN2IwXfJhY | spotify:track:1fipvP2zmef6vN2IwXfJhY | https://api.spotify.com/v1/tracks/1fipvP2zmef6... | https://api.spotify.com/v1/audio-analysis/1fip... | 200838 | 4 |
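When several playlists are combined, the same song can appear more than once and would then be counted twice when clustering. A minimal sketch of removing repeats, assuming the combined DataFrame has the `id` column shown above; `drop_duplicate_songs` is a hypothetical helper and the two small DataFrames are made up for illustration.

```python
import pandas as pd


def drop_duplicate_songs(df):
    """Keep the first row for each track ID and renumber the rows from 0."""
    return df.drop_duplicates(subset="id").reset_index(drop=True)


# Made-up example: track "b" appears in two playlists
combined = pd.concat([
    pd.DataFrame({"id": ["a", "b"], "energy": [0.1, 0.2]}),
    pd.DataFrame({"id": ["b", "c"], "energy": [0.2, 0.3]}),
])
print(drop_duplicate_songs(combined)["id"].tolist())  # ['a', 'b', 'c']
```

Resetting the index also removes the repeated 0-based indices that each playlist brings into the concatenated DataFrame.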


Disclaimer: when Spotify's API takes more than 5 seconds to respond, the request fails with a read timeout error (`ReadTimeout`). Ideally we'd wrap the requests in `try`/`except` to handle that. It can be in our next iteration!
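A minimal sketch of that `try`/`except` idea: retry a request a few times, waiting longer after each failure. `call_with_retries` is a hypothetical helper, and which exception to catch is an assumption (for a timeout it would be `requests.exceptions.ReadTimeout`). Spotipy also lets you raise the 5-second limit itself through the `requests_timeout` argument of `spotipy.Spotify`.

```python
import time


def call_with_retries(func, *args, retries=3, wait=2.0, exceptions=(Exception,), **kwargs):
    """Call func(*args, **kwargs), retrying up to `retries` extra times on `exceptions`.

    The pause doubles after each failed attempt (simple exponential backoff).
    """
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except exceptions:
            if attempt == retries:
                raise  # out of attempts: let the caller see the error
            time.sleep(wait * 2 ** attempt)


# Hypothetical usage with Spotipy:
# import requests
# features = call_with_retries(sp.audio_features, track_ids[:100],
#                              exceptions=(requests.exceptions.ReadTimeout,))
```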