# Lab | API wrappers - Create your collection of songs & audio features
### Instructions
To move forward with the project, you need to create a collection of songs with their audio features - as large as possible!

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster. The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

One way to collect as many songs as possible is to start with all the songs of a big, diverse playlist, then go to every artist in the playlist and grab every song of every album by that artist. The number of songs you collect per playlist will grow very quickly!
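The expansion idea above could be sketched roughly as follows. This is only a sketch under assumptions: `expand_from_artists` and `_all_pages` are hypothetical helper names, `max_albums_per_artist` is a made-up knob to cap the number of requests, and `sp` is assumed to be an authenticated `spotipy.Spotify` client. It relies on SpotiPy's `artist_albums`, `album_tracks` and `next` calls.

```python
def _all_pages(sp, page):
    """Yield every item of a paged Spotify response, following 'next' links."""
    while page:
        yield from page["items"]
        page = sp.next(page) if page.get("next") else None


def expand_from_artists(sp, artist_ids, max_albums_per_artist=None):
    """Collect the IDs of every track on every album of the given artists.

    `sp` is assumed to be an authenticated spotipy.Spotify client.
    `max_albums_per_artist` is a hypothetical cap to limit the number of requests.
    """
    track_ids = set()
    for artist_id in dict.fromkeys(artist_ids):  # de-duplicate, keep order
        albums = _all_pages(sp, sp.artist_albums(artist_id, limit=50))
        for n, album in enumerate(albums):
            if max_albums_per_artist is not None and n >= max_albums_per_artist:
                break
            for track in _all_pages(sp, sp.album_tracks(album["id"], limit=50)):
                if track.get("id"):  # skip entries that carry no track ID
                    track_ids.add(track["id"])
    return track_ids
```

Using a set means a track ID that shows up on several albums is stored only once. Expect this to issue one request per artist plus at least one per album, so it can take a while on a large playlist.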

In [1]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd

In [2]:
# Initialize SpotiPy with client credentials
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id="af3a4e21d9974f798b0ddef081728f2b",
                                                           client_secret="99a65d20eff04d64bcf24b11824dffc4"))


# Get the playlist

In [3]:
playlist = sp.user_playlist_tracks("spotify", "5fo41o54DPTvdPO2uMTDH1")
# playlist#.keys() 

In [4]:
# Number of songs in the playlist
playlist["total"]

249

In [5]:
# A single request returns at most 100 songs of this playlist
len(playlist["items"]) 

100

- By default, a single call returns at most 100 entries in playlist["items"]
- But we want to get all 249 songs in this playlist

# Get all tracks in the playlist
- Since each request is limited to 100 tracks
- We can use a while loop to request the remaining pages

[Ref.](https://stackoverflow.com/questions/39086287/spotipy-how-to-read-more-than-100-tracks-from-a-playlist)

---

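#### Info that might help you navigate through the API

- results["tracks"]["limit"] # Limit we have chosen (maximum number of items per page)

- results["tracks"]["next"] # Link to the next page of results (None on the last page)

- results["tracks"]["offset"] # Current offset (index of the first item on this page)

- results["tracks"]["previous"] # Link to the previous page of results

- results["tracks"]["total"] # Total number of matches

For the playlist response used in this notebook, the same keys sit at the top level, e.g. playlist["total"] and results["next"].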

---
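To make the paging fields concrete, here is a small sketch of the offsets a client would need to request to cover a whole playlist. `page_offsets` is a hypothetical helper, not part of SpotiPy; it only illustrates the arithmetic behind limit, offset and total.

```python
def page_offsets(total, limit=100):
    """Return the offset of each page needed to cover `total` items."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return list(range(0, total, limit))


offsets = page_offsets(249, limit=100)
print(offsets)       # [0, 100, 200]
print(len(offsets))  # 3 requests: 100 + 100 + 49 tracks
```

Following the "next" link, as the while loop below does, amounts to the same thing without computing the offsets by hand.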

In [6]:
# While loop to get all tracks
def get_playlist_tracks(username, playlist_id):
    results = sp.user_playlist_tracks(username, playlist_id)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks

all_tracks = get_playlist_tracks("spotify", "5fo41o54DPTvdPO2uMTDH1")
# all_tracks

In [7]:
# Check the name of the first track
all_tracks[0]['track']['name']

'Remedy'

In [8]:
# Get all the song titles from the all_tracks list
best_2023_li = []
for i in range(0, len(all_tracks)):
    title = all_tracks[i]['track']['name']
    best_2023_li.append(title)

print(f" There are {len(best_2023_li)} songs in the list")

 There are 249 songs in the list


In [9]:
# Put title into a dataframe
song_title = pd.DataFrame(best_2023_li)
song_title.columns = ['title']
song_title.head()

| | title |
|---|---|
| 0 | Remedy |
| 1 | Auf & Ab |
| 2 | Heat Waves |
| 3 | Acapulco |
| 4 | Pepas |


# Get all artists in the playlist

In [10]:
# Check artist info in the first track
artist_info = all_tracks[0]['track']['artists']
artist_info

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/2NpPlwwDVYR5dIj0F31EcC'},
  'href': 'https://api.spotify.com/v1/artists/2NpPlwwDVYR5dIj0F31EcC',
  'id': '2NpPlwwDVYR5dIj0F31EcC',
  'name': 'Leony',
  'type': 'artist',
  'uri': 'spotify:artist:2NpPlwwDVYR5dIj0F31EcC'}]

In [11]:
# Check artist id in the first track
artist_info[0]['id']

'2NpPlwwDVYR5dIj0F31EcC'

In [12]:
# Check artist name in the first track
artist_info[0]['name']

'Leony'

In [13]:
# Loop through all_tracks (249 songs) and collect the id and name of each track's first artist

artist_ids = []
artist_names = []

for item in all_tracks:
    first_artist = item['track']['artists'][0]
    artist_ids.append(first_artist['id'])
    artist_names.append(first_artist['name'])

In [14]:
# Get artist_id, artist_name into a pandas dataframe
df_artist = pd.DataFrame(list(zip(artist_ids, artist_names)))
df_artist.columns = ['artist_id', 'artist_name']
df_artist.head()

| | artist_id | artist_name |
|---|---|---|
| 0 | 2NpPlwwDVYR5dIj0F31EcC | Leony |
| 1 | 5ZY4M2aGiTaZQEP6HfqeJc | Montez |
| 2 | 4yvcSjfu4PC0CYQyLy4wSq | Glass Animals |
| 3 | 07YZf4WDAMNwqr4jfgOZ8y | Jason Derulo |
| 4 | 329e4yvIujISKGKz1BZZbO | Farruko |
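The loop above keeps only the first artist of each track. If you later want to expand the collection artist by artist, it may help to gather every credited artist exactly once. A minimal sketch, assuming the items have the structure shown in the outputs above; `unique_artists` is a hypothetical helper name.

```python
def unique_artists(playlist_items):
    """Map artist_id -> artist_name for every artist credited on any track."""
    artists = {}
    for item in playlist_items:
        track = item.get("track")
        if not track:  # an item can come back without track data
            continue
        for artist in track.get("artists", []):
            if artist.get("id"):
                # setdefault keeps the first name seen for each artist ID
                artists.setdefault(artist["id"], artist["name"])
    return artists
```

Calling `unique_artists(all_tracks)` would return a dictionary whose keys can be passed on as a de-duplicated list of artist IDs.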


# Get all the track_ids from the playlist

- We want to get audio features to build our prototype and we need the track_id to retrieve the features

In [15]:
# Check the track_id of the first track
all_tracks[0]['track']['id']

'5JVA0t7r2Y7m9NaHmgaeiC'

In [16]:
# Loop through to get all track_id and append in the track_ids list
track_ids = [all_tracks[j]['track']['id'] for j in range(len(all_tracks))]
print(f"Number of track_id in the list: {len(track_ids)}")

Number of track_id in the list: 249


# Get audio features of all songs in the playlist

- The function audio_features() accepts at most 100 track ids per call
- So we need to loop. I tried to write a paging function like the one used for the playlist tracks above, but could not get it to work.
- Instead, I break the track_id list into small chunks and call audio_features() once per chunk in a for loop --> see script below [Ref.](https://www.geeksforgeeks.org/break-list-chunks-size-n-python/)
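For comparison, the chunk-and-call idea can also be wrapped in a single function. This is a sketch under assumptions rather than the notebook's actual approach: `fetch_audio_features` is a hypothetical name, and `sp` is assumed to be an authenticated `spotipy.Spotify` client whose `audio_features()` takes a list of track ids.

```python
def fetch_audio_features(sp, track_ids, batch_size=100):
    """Request audio features in batches and return one flat list.

    `sp` is assumed to be an authenticated spotipy.Spotify client.
    """
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    features = []
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        # The API may return None for IDs it has no features for; drop those.
        features.extend(f for f in sp.audio_features(batch) if f is not None)
    return features
```

Note that if any entries are dropped, the result no longer lines up row by row with `track_ids`, so matching on the `id` field is safer than matching by position.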

In [17]:
# Break a list into chunks of size 50
step = 50
chunks = [track_ids[j:j + step] for j in range(0, len(track_ids), step)]

print(f"We now have {len(chunks)} chunks of track_ids")

We now have 5 chunks of track_ids


In [18]:
# Get audio features chunk by chunk and append each into the feature_chunks list
feature_chunks = []

for chunk in chunks:
    feature_chunks.append(sp.audio_features(chunk))

print(f"We now have {len(feature_chunks)} chunks of audio features, covering a total of {len(track_ids)} songs")

We now have 5 chunks of audio features, covering a total of 249 songs


In [19]:
### Check feature in the first chunk --> Note: I will not print it out here because it's hard to read (also too long)
# feature_chunks[0]

print(f"We have retrieved {len(feature_chunks[0])} audio features in the first chunk")
print(f"and {len(feature_chunks[4])} features for the last chunk")

We have retrieved 50 audio features in the first chunk
and 49 features for the last chunk


# Get all the audio features & song title into a dataframe

In [20]:
# Put all the features from those chunks into one list

audio_features = []

for feature_chunk in feature_chunks:
    audio_features.extend(feature_chunk)

len(audio_features)

249

In [21]:
# Create a dataframe from the audio_features list
df = pd.DataFrame(audio_features)
# df.head()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 249 entries, 0 to 248
Data columns (total 18 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   danceability      249 non-null    float64
 1   energy            249 non-null    float64
 2   key               249 non-null    int64  
 3   loudness          249 non-null    float64
 4   mode              249 non-null    int64  
 5   speechiness       249 non-null    float64
 6   acousticness      249 non-null    float64
 7   instrumentalness  249 non-null    float64
 8   liveness          249 non-null    float64
 9   valence           249 non-null    float64
 10  tempo             249 non-null    float64
 11  type              249 non-null    object 
 12  id                249 non-null    object 
 13  uri               249 non-null    object 
 14  track_href        249 non-null    object 
 15  analysis_url      249 non-null    object 
 16  duration_ms       249 non-null    int64  
 1

- The dataframe so far contains only the id of each song
- I also want to include the song title as well as artist_id and artist_name
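Combining the columns by position works as long as every list is in the same order and has the same length. As a hedged alternative, the rows can be matched on the track id instead. In this sketch `join_features_with_metadata` is a hypothetical helper, and the item structure is assumed to be the one shown in the outputs above.

```python
def join_features_with_metadata(playlist_items, audio_features):
    """Return one dict per track: audio features plus title and first artist."""
    meta = {}
    for item in playlist_items:
        track = item.get("track")
        if not track or not track.get("id"):
            continue
        first_artist = (track.get("artists") or [{}])[0]
        meta[track["id"]] = {
            "title": track["name"],
            "artist_id": first_artist.get("id"),
            "artist_name": first_artist.get("name"),
        }
    # Keep only feature rows whose id has matching metadata.
    return [{**f, **meta[f["id"]]} for f in audio_features if f and f["id"] in meta]
```

The returned list of dictionaries can be passed straight to `pd.DataFrame(...)` to get one row per track.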

In [22]:
# Concat all together
df = pd.concat([df, song_title, df_artist], axis=1)

# Rename 'id' to 'song_id' so it is not confused with artist_id
# (rename returns a new dataframe, so the result must be assigned back)
df = df.rename(columns={'id': 'song_id'})
df.head()

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | ... | type | song_id | uri | track_href | analysis_url | duration_ms | time_signature | title | artist_id | artist_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.55 | 0.75 | 0 | -3.289 | 1 | 0.146 | 0.0561 | 0.0 | 0.129 | 0.381 | ... | audio_features | 5JVA0t7r2Y7m9NaHmgaeiC | spotify:track:5JVA0t7r2Y7m9NaHmgaeiC | https://api.spotify.com/v1/tracks/5JVA0t7r2Y7m... | https://api.spotify.com/v1/audio-analysis/5JVA... | 147410 | 4 | Remedy | 2NpPlwwDVYR5dIj0F31EcC | Leony |
| 1 | 0.714 | 0.425 | 1 | -8.064 | 1 | 0.0809 | 0.403 | 0.000105 | 0.115 | 0.316 | ... | audio_features | 0RSZ8EmUPEN3ySfCgytPke | spotify:track:0RSZ8EmUPEN3ySfCgytPke | https://api.spotify.com/v1/tracks/0RSZ8EmUPEN3... | https://api.spotify.com/v1/audio-analysis/0RSZ... | 165477 | 4 | Auf & Ab | 5ZY4M2aGiTaZQEP6HfqeJc | Montez |
| 2 | 0.761 | 0.525 | 11 | -6.9 | 1 | 0.0944 | 0.44 | 7e-06 | 0.0921 | 0.531 | ... | audio_features | 6CDzDgIUqeDY5g8ujExx2f | spotify:track:6CDzDgIUqeDY5g8ujExx2f | https://api.spotify.com/v1/tracks/6CDzDgIUqeDY... | https://api.spotify.com/v1/audio-analysis/6CDz... | 238805 | 4 | Heat Waves | 4yvcSjfu4PC0CYQyLy4wSq | Glass Animals |
| 3 | 0.774 | 0.792 | 10 | -4.021 | 1 | 0.0523 | 0.051 | 0.0 | 0.155 | 0.507 | ... | audio_features | 3eJH2nAjvNXdmPfBkALiPZ | spotify:track:3eJH2nAjvNXdmPfBkALiPZ | https://api.spotify.com/v1/tracks/3eJH2nAjvNXd... | https://api.spotify.com/v1/audio-analysis/3eJH... | 139672 | 4 | Acapulco | 07YZf4WDAMNwqr4jfgOZ8y | Jason Derulo |
| 4 | 0.762 | 0.766 | 7 | -3.955 | 1 | 0.0343 | 0.00776 | 7e-05 | 0.128 | 0.442 | ... | audio_features | 5fwSHlTEWpluwOM0Sxnh5k | spotify:track:5fwSHlTEWpluwOM0Sxnh5k | https://api.spotify.com/v1/tracks/5fwSHlTEWplu... | https://api.spotify.com/v1/audio-analysis/5fwS... | 287120 | 4 | Pepas | 329e4yvIujISKGKz1BZZbO | Farruko |


In [23]:
df.to_csv('audio_features_249.csv', index=False)

# Result
- The playlist contains 249 songs 
- We retrieved the audio features & title of every song in the playlist
- Finally, we have a dataframe that we can save to a .csv file for further use