![logo_ironhack_blue 7](https://user-images.githubusercontent.com/23629340/40541063-a07a0a8a-601a-11e8-91b5-2f13e4e6b441.png)

# Lab | API wrappers - Create your collection of songs & audio features


#### Instructions 


To move forward with the project, you need to create a collection of songs with their audio features - as large as possible! 

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster.
The more songs you have, the more accurate and diverse your recommendations can be. Still, make sure the collected songs are "curated" in some way: look for playlists that are diverse but that also meet certain standards.
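
The recommendation step described above could look roughly like the following. This is only a sketch under stated assumptions: `recommend_from_same_cluster` and `cluster_of` are hypothetical names, the cluster labels are assumed to have been computed already, and the input song is assumed to be part of the labelled collection.

```python
import random


def recommend_from_same_cluster(song_id, cluster_of, rng=random):
    """Pick a different song from the cluster `song_id` belongs to.

    `cluster_of` maps song id -> cluster label (hypothetical structure).
    Returns None when the song is alone in its cluster.
    """
    target = cluster_of[song_id]
    candidates = [sid for sid, label in cluster_of.items()
                  if label == target and sid != song_id]
    return rng.choice(candidates) if candidates else None


# Toy labels, not real data: s1, s2 and s4 share cluster 0.
cluster_of = {"s1": 0, "s2": 0, "s3": 1, "s4": 0}
pick = recommend_from_same_cluster("s1", cluster_of, rng=random.Random(0))
print(pick)  # either "s2" or "s4"
```

A larger, more varied collection gives each cluster more candidates to draw from, which is why the size of the collection matters.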

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

One way to collect as many songs as possible is to start with all the songs in a big, diverse playlist, then visit every artist in that playlist and grab every song from every album by that artist. The number of songs you collect per playlist multiplies quickly this way!
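
The artist-by-artist expansion could be sketched as below. It is a minimal sketch, not a definitive implementation: `expand_artist_tracks` and `_all_pages` are hypothetical helper names, and `client` is assumed to be an authenticated `spotipy.Spotify` instance (only its `artist_albums`, `album_tracks` and `next` methods are used).

```python
def _all_pages(client, page):
    """Yield every item from a paginated Spotify response."""
    while page:
        yield from page["items"]
        # `next` holds the URL of the following page, or None on the last one.
        page = client.next(page) if page.get("next") else None


def expand_artist_tracks(client, artist_ids):
    """Collect {track_id: track_name} for every album track of the given artists.

    Hypothetical helper; `client` is assumed to be a spotipy.Spotify instance.
    """
    tracks = {}
    seen_albums = set()
    for artist_id in artist_ids:
        albums = client.artist_albums(artist_id, limit=50)
        for album in _all_pages(client, albums):
            if album["id"] in seen_albums:
                continue  # an album shared by several artists is read only once
            seen_albums.add(album["id"])
            album_tracks = client.album_tracks(album["id"], limit=50)
            for track in _all_pages(client, album_tracks):
                if track.get("id"):  # skip entries that carry no Spotify id
                    tracks[track["id"]] = track["name"]
    return tracks
```

Keying the result by track id removes duplicates as they arrive, which helps because the same album is often reachable through several artists.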

In [21]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd
from tqdm.notebook import tqdm
import getpass
import math
from collections import defaultdict

In [4]:
client_id = getpass.getpass(prompt='input client_id') 
client_secret = getpass.getpass(prompt='input client_secret')

input client_id········
input client_secret········


In [5]:
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=client_id,
                                                           client_secret=client_secret))

Get many songs

In [6]:
user = 'spotify'

playlists = sp.user_playlists(user)
playlist_uri = []

while playlists:
    for playlist in playlists['items']:
        playlist_uri.append(playlist['uri'])
    if playlists['next']:
        playlists = sp.next(playlists)
    else:
        playlists = None

In [10]:
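def get_tracks_from_playlist(username, playlist_id):
    # `username` is kept so existing calls still work; the playlist id (or URI)
    # alone identifies the playlist. `user_playlist_tracks` is deprecated in
    # spotipy, so `playlist_items` is used here, restricted to tracks.
    results = sp.playlist_items(playlist_id, additional_types=('track',))
    tracks = results['items']
    # Follow the `next` links until the last page has been read.
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks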
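
When thousands of playlists are read, it can help to ask the API only for the fields the lab actually uses. The following is a sketch under assumptions: `get_playlist_tracks_slim` is a hypothetical name, `client` is assumed to be a `spotipy.Spotify` instance, and the `fields` filter string follows the syntax of the Spotify Web API playlist endpoint.

```python
def get_playlist_tracks_slim(client, playlist_id):
    """Return every track object of a playlist, requesting only selected fields.

    Hypothetical helper; `client` is assumed to be a spotipy.Spotify instance.
    """
    # Keep only id, name, artist names, and the paging URL in each response.
    fields = "items(track(id,name,artists(name))),next"
    page = client.playlist_items(playlist_id, fields=fields,
                                 additional_types=("track",))
    tracks = []
    while page:
        # Entries whose `track` is None are skipped.
        tracks.extend(item["track"] for item in page["items"] if item.get("track"))
        page = client.next(page) if page.get("next") else None
    return tracks
```

Smaller responses do not reduce the number of requests, but they do cut the amount of data transferred and parsed per page.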

In [8]:
def get_all_tracks(uri):
    s_id = []
    s_name = []
    s_artists = []

    results = get_tracks_from_playlist("", uri)
    for item in results:
        track = item['track']
        # Some playlist entries have no track object; skip them.
        if track is not None:
            # A track can come back without an id; keep a placeholder then.
            s_id.append("missing value" if track['id'] is None else track['id'])
            s_name.append(track['name'])
            s_artists.append([artist['name'] for artist in track['artists']])

    return {'song_id': s_id, 'song_name': s_name, 'artists': s_artists}

In [11]:
songs = {'song_id':[], 'song_name':[], 'artists':[]}

for uri in tqdm(playlist_uri):
    d = get_all_tracks(uri)
    # Append this playlist's values to the running lists, column by column.
    for key in songs:
        songs[key].extend(d[key])

  0%|          | 0/1396 [00:00<?, ?it/s]

In [12]:
df_songs = pd.DataFrame(songs)
df_songs

| | song_id | song_name | artists |
|---|---|---|---|
| 0 | 5Kskr9LcNYa0tpt5f0ZEJx | Calling My Phone | [Lil Tjay, 6LACK] |
| 1 | 5uEYRdEIh9Bo4fpjDd4Na9 | Goosebumps - Remix | [Travis Scott, HVME] |
| 2 | 1diS6nkxMQc3wwC4G1j0bh | We're Good | [Dua Lipa] |
| 3 | 5QO79kh1waicV47BqGRL3g | Save Your Tears | [The Weeknd] |
| 4 | 7lPN2DXiMsVn7XUKtOW1CS | drivers license | [Olivia Rodrigo] |
| ... | ... | ... | ... |
| 102939 | 5kqIPrATaCc2LqxVWzQGbk | 7 Years | [Lukas Graham] |
| 102940 | 3FCto7hnn1shUyZL42YgfO | Piano Man | [Billy Joel] |
| 102941 | 3XVBdLihbNbxUwZosxcGuJ | If I Ain't Got You | [Alicia Keys] |
| 102942 | 6QPKYGnAW9QozVz2dSWqRg | Someone Like You | [Adele] |


Remove the duplicates

In [13]:
df_songs = df_songs.drop_duplicates(subset='song_id').reset_index(drop=True)
df_songs

| | song_id | song_name | artists |
|---|---|---|---|
| 0 | 5Kskr9LcNYa0tpt5f0ZEJx | Calling My Phone | [Lil Tjay, 6LACK] |
| 1 | 5uEYRdEIh9Bo4fpjDd4Na9 | Goosebumps - Remix | [Travis Scott, HVME] |
| 2 | 1diS6nkxMQc3wwC4G1j0bh | We're Good | [Dua Lipa] |
| 3 | 5QO79kh1waicV47BqGRL3g | Save Your Tears | [The Weeknd] |
| 4 | 7lPN2DXiMsVn7XUKtOW1CS | drivers license | [Olivia Rodrigo] |
| ... | ... | ... | ... |
| 81654 | 3Zuf70897YkrVRAsrBMMSF | You're The One That I Want - From "Grease" Ori... | [John Travolta, Olivia Newton-John] |
| 81655 | 1MDoll6jK4rrk2BcFRP5i7 | Hello | [Adele] |
| 81656 | 0ygTmpa6uSotkBkTiwcMZ4 | Warwick Avenue | [Duffy] |
| 81657 | 7GJClzimvMSghjcrKxuf1M | Budapest | [George Ezra] |


Let's get the features, 100 ids at a time

In [16]:
features_list = []
song_ids = df_songs['song_id'].to_list()
# Request the audio features in batches of 100 ids.
for i in tqdm(range(math.ceil(len(song_ids)/100))):
    features_list.append(sp.audio_features(song_ids[i*100:(i+1)*100]))

  0%|          | 0/817 [00:00<?, ?it/s]
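
The index arithmetic in the batching loop can also be wrapped in a small generator. This is a sketch only: `chunked` is a hypothetical helper name, and the ids below are placeholders rather than real Spotify ids.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


ids = [f"id{n}" for n in range(250)]  # placeholder ids for illustration
batches = list(chunked(ids, size=100))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

In the notebook, each batch would be passed to `sp.audio_features(batch)`. The last batch is simply shorter, so no `math.ceil` is needed.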

In [17]:
# Flatten the batches and drop the entries for which no features came back (None).
features = [f for batch in features_list for f in batch if f is not None]

In [22]:
# Each entry of `features` is a dict, so pandas can build the frame directly.
df_features = pd.DataFrame(features)

In [23]:
df_songs.shape

(81659, 3)

In [24]:
df_features.shape

(81573, 18)
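
The two shapes differ by 86 rows (81659 songs against 81573 feature rows), presumably the ids for which no features were returned. One way to list them is sketched below with plain Python lists. `ids_without_features` is a hypothetical helper name and the ids are toy values.

```python
def ids_without_features(song_ids, feature_ids):
    """Return the song ids that have no matching feature row, in input order."""
    have = set(feature_ids)  # set lookup keeps this fast for large collections
    return [sid for sid in song_ids if sid not in have]


song_ids = ["a", "b", "missing value", "c"]  # toy ids, not real data
feature_ids = ["a", "c"]
print(ids_without_features(song_ids, feature_ids))  # ['b', 'missing value']
```

With the notebook's frames, the same call would be `ids_without_features(df_songs['song_id'], df_features['id'])`.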

In [25]:
df_all = df_songs.merge(df_features.rename({'id':'song_id'}, axis=1), on='song_id', how='right')
df_all.columns

Index(['song_id', 'song_name', 'artists', 'danceability', 'energy', 'key',
       'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness',
       'liveness', 'valence', 'tempo', 'type', 'uri', 'track_href',
       'analysis_url', 'duration_ms', 'time_signature'],
      dtype='object')

In [26]:
df_all = df_all.drop(['type', 'uri', 'track_href', 'analysis_url'], axis=1)

In [27]:
df_all.head()

| | song_id | song_name | artists | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5Kskr9LcNYa0tpt5f0ZEJx | Calling My Phone | [Lil Tjay, 6LACK] | 0.907 | 0.393 | 4 | -7.636 | 0 | 0.0539 | 0.451 | 1e-06 | 0.135 | 0.202 | 104.949 | 205458 | 4 |
| 1 | 5uEYRdEIh9Bo4fpjDd4Na9 | Goosebumps - Remix | [Travis Scott, HVME] | 0.841 | 0.593 | 1 | -7.846 | 1 | 0.0379 | 0.418 | 0.0 | 0.124 | 0.808 | 124.917 | 162803 | 4 |
| 2 | 1diS6nkxMQc3wwC4G1j0bh | We're Good | [Dua Lipa] | 0.722 | 0.588 | 6 | -5.932 | 1 | 0.0544 | 0.0319 | 0.0 | 0.183 | 0.59 | 134.01 | 165507 | 4 |
| 3 | 5QO79kh1waicV47BqGRL3g | Save Your Tears | [The Weeknd] | 0.68 | 0.826 | 0 | -5.487 | 1 | 0.0309 | 0.0212 | 1.2e-05 | 0.543 | 0.644 | 118.051 | 215627 | 4 |
| 4 | 7lPN2DXiMsVn7XUKtOW1CS | drivers license | [Olivia Rodrigo] | 0.585 | 0.436 | 10 | -8.761 | 1 | 0.0601 | 0.721 | 1.3e-05 | 0.105 | 0.132 | 143.874 | 242014 | 4 |


In [28]:
df_all.shape

(81573, 16)