# Lab | API wrappers - Create your collection of songs & audio features


#### Instructions 


To move forward with the project, you need to create a collection of songs with their audio features - as large as possible! 

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster.
The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

An idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist and then go to every artist present in the playlist and grab every song of every album of that artist. The amount of songs you'll be collecting per playlist will grow exponentially!

In [2]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import getpass
import pandas as pd

In [3]:
client_id = "31bb38d4d2c54b0e9b994db2a71040d5"
client_secret = getpass.getpass('Write client secret:')

In [4]:
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=client_id,
                                                           client_secret=client_secret))

In [5]:
country_codes= ['AD', 'AR', 'AU', 'AT', 'BE', 'BO', 'BR', 'BG', 'CA', 'CL', \
    'CO', 'CR', 'CY', 'CZ', 'DK', 'DO', 'EC', 'SV', 'EE', 'FI', 'FR', 'DE', \
    'GR', 'GT', 'HN', 'HK', 'HU', 'IS', 'ID', 'IE', 'IT', 'JP', 'LV', 'LI', \
    'LT', 'LU', 'MY', 'MT', 'MX', 'MC', 'NL', 'NZ', 'NI', 'NO', 'PA', 'PY', \
    'PE', 'PH', 'PL', 'PT', 'SG', 'ES', 'SK', 'SE', 'CH', 'TW', 'TR', 'GB', \
    'US', 'UY']

In [6]:
# Save all Spotify featured playlists worldwide
playlists = pd.DataFrame(columns=["playlist", "playlist_id"])

for country in country_codes:
    featured = sp.featured_playlists(country=country, limit=50)
    for playlist in featured['playlists']['items']:
        playlist_name = playlist["name"]
        playlist_id = playlist["id"]
        playlists = playlists.append({"playlist": playlist_name, \
            "playlist_id": playlist_id}, ignore_index=True)

playlists.drop_duplicates(inplace=True)

In [7]:
# Save all the songs from the playlists extracted earlier
song_database = pd.DataFrame(columns=["song_id", "song_name", "artist_name"])

for playlist_id in playlists.playlist_id:
    offset = 0
    songs_dict = sp.playlist_items(playlist_id, offset=offset, limit=100)

    # Loop through the playlist items one by one until the second to last song
    while songs_dict["next"] != None:
        song_name = songs_dict["items"][0]["track"]["name"]
        song_id = songs_dict["items"][0]["track"]["id"]
        song_artists = songs_dict["items"][0]["track"]["artists"][0]["name"]
        artists = songs_dict["items"][0]["track"]["artists"]

        # If there are more artists, add them to the string
        if len(artists) > 1:
            for artist in artists[1:]:
                song_artists += " and "
                song_artists += artist["name"]

        song_database = song_database.append({"song_name": song_name, \
        "song_id": song_id, "artist_name": song_artists}, ignore_index=True)
        offset += 1
        songs_dict = sp.playlist_items(playlist_id, offset=offset, limit=100)

# Remove songs present in multiple playlists
song_database.drop_duplicates(inplace=True)

In [8]:
song_database.head()

Unnamed: 0,song_id,song_name,artist_name
0,4kbj5MwxO1bq9wjT5g9HaA,Shut Up and Dance,WALK THE MOON
1,0B9x2BRHqj3Qer7biM3pU3,You're The One That I Want - From “Grease”,John Travolta and Olivia Newton-John
2,68y6OIiE1nDdI1MLQdJNh8,I Want It That Way,Backstreet Boys
3,5Q0Nhxo0l2bP3pNjpGJwV1,Party In The U.S.A.,Miley Cyrus
4,4cluDES4hQEUhmXj6TXkSo,What Makes You Beautiful,One Direction


In [9]:
song_database.shape[0] # We extracted 3935 unique songs

3935

In [12]:
# Save song database to csv for the clustering
song_database.to_csv("song_database.csv", index=False)