## Extracting Spotify Data

To get started, we extract data for the tracks in one of Spotify's most-followed playlists. The **Spotify Web API** provides detailed data for each song, such as the performing artist, the album it belongs to, its release date, its popularity, and audio features like danceability, energy, and tempo.

The Python library `spotipy` wraps the Spotify API in a set of functions that handle API authentication, retrieve playlist data, and return information about any given song.


### Accessing the Spotify Web API

To access data from Spotify, we import the `spotipy` library and its `SpotifyClientCredentials` class. We also use the `pandas` package for data manipulation and display. To authenticate against the Spotify API, we pass our **client ID** and **client secret** to a client credentials manager. Once authenticated, we can use the resulting `spotipy.Spotify` client to query the API and retrieve data.


In [1]:
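# Import the Spotify client, its credentials manager, and pandas
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd

# Placeholders: replace with the client ID and secret of your own Spotify app
client_id = "xxx"
client_secret = "xxx"
# Create the client credentials manager
my_auth = SpotifyClientCredentials(client_id, client_secret)
# Create the authenticated Spotify client used in the rest of this section
sp = spotipy.Spotify(auth_manager=my_auth)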
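
Hard-coding a client secret in a notebook makes it easy to leak by accident. As a minimal sketch, the credentials could instead be read from environment variables. The helper name `load_spotify_credentials` is hypothetical; the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` follow the convention that `spotipy` itself reads when `SpotifyClientCredentials()` is created without arguments.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    Hypothetical helper: `env` defaults to the process environment, but any
    mapping can be passed in, which keeps the function easy to test.
    """
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a clear message instead of sending empty credentials
        raise RuntimeError(f"Missing environment variable: {missing}") from None
```

The returned pair can then be passed to `SpotifyClientCredentials(client_id, client_secret)` exactly as in the cell above.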


#### Spotify's Featured Playlists

Let's take a look at the popular Spotify playlists. Below, the code retrieves a range of Spotify playlists and generates a dataframe containing details for each playlist, including its name, ID, description, thumbnail, total number of tracks, and follower count. The resulting dataframe is displayed as an HTML table.


In [3]:
# Get the follower count for the playlist
def get_playlist_follower_count(playlist_id):
    playlist = sp.playlist(playlist_id)
    return playlist["followers"]["total"]

In [4]:
def get_top_playlists(username: str, lim: int):
    all_playlists = sp.user_playlists(username)  # Get user playlists
    playlist_data = [
        {
            "thumbnail": item["images"][0]["url"],
            "name": item["name"],
            "id": item["id"],
            "description": item["description"],
            "tracks": item["tracks"]["total"],
            "followers": get_playlist_follower_count(item["id"]),
        }
        for item in all_playlists["items"]  # Iterate over each playlist item
    ]
    # Create DataFrame from list of dictionaries
    playlist_df = pd.DataFrame(playlist_data)
    return playlist_df.nlargest(lim, "followers")

In [5]:
username = "spotify"
top_playlists = get_top_playlists(username, 6)

The function `get_top_playlists` retrieves the playlists of a given user and returns those with the most followers. It calls `sp.user_playlists`, which returns a single page of results (50 playlists by default in `spotipy`), and iterates over each playlist item to extract the thumbnail image URL, name, ID, description, and total number of tracks. The follower count is not part of that response, so `get_playlist_follower_count` makes one additional request per playlist. Finally, the function builds a DataFrame from the collected data and uses `nlargest` to return the `lim` playlists with the most followers, in descending order.
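
If a user has more playlists than fit on one page, the remaining pages have to be requested explicitly. The sketch below shows one way to do that, following the same `next` pattern this section uses for playlist tracks. The function name `fetch_all_playlists` is hypothetical, and `client` is assumed to be an authenticated `spotipy.Spotify` instance such as `sp`.

```python
def fetch_all_playlists(client, username):
    """Collect playlist items from every page of a user's playlists.

    Hypothetical helper: `client` is assumed to expose `user_playlists`
    and `next` the way a `spotipy.Spotify` instance does.
    """
    page = client.user_playlists(username)
    items = []
    while page:
        items.extend(page["items"])
        # Request the following page only if the response links to one
        page = client.next(page) if page.get("next") else None
    return items
```

The returned list could replace `all_playlists["items"]` inside `get_top_playlists`, at the cost of one follower-count request per additional playlist.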

In [6]:
from IPython.display import HTML

# Converting links to html tags
def path_to_image_html(path):
    return f'<img src="{path}" width="40" >'

# Rendering the dataframe as HTML table
HTML(top_playlists.to_html(escape=False, formatters=dict(thumbnail=path_to_image_html)))

| | thumbnail | name | id | description | tracks | followers |
|---|---|---|---|---|---|---|
| 0 | | Today’s Top Hits | 37i9dQZF1DXcBWIGoYBM5M | Olivia Rodrigo is on top of the Hottest 50! | 50 | 34686033 |
| 1 | | RapCaviar | 37i9dQZF1DX0XUsuxWHRQd | New music from Future and Metro Boomin. | 50 | 15846375 |
| 3 | | Viva Latino | 37i9dQZF1DX10zKzsJ2jva | Today's top Latin hits, elevando nuestra música. Shakira tiene algo especial para ti. ❤️‍🔥 | 50 | 14838128 |
| 7 | | Rock Classics | 37i9dQZF1DWXRqgorJj26U | Rock legends & epic songs that continue to inspire generations. Cover: Nirvana | 200 | 12095843 |
| 12 | | All Out 2000s | 37i9dQZF1DX4o1oenSJRJd | The biggest songs of the 2000s. Cover: Kelly Clarkson | 150 | 11125180 |
| 14 | | All Out 80s | 37i9dQZF1DX4UtSsGT1Sbe | The biggest songs of the 1980s. Cover: Madonna | 150 | 10805624 |
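
The row labels in this output (0, 1, 3, 7, 12, 14) are not consecutive because `nlargest` keeps each row's original index label. The toy example below, with made-up follower counts, illustrates the behavior and why positional access with `.iloc` is the safer way to pick the top row.

```python
import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame(
    {"id": ["a", "b", "c", "d"], "followers": [10, 500, 30, 200]}
)
top = toy.nlargest(2, "followers")

# The original row labels are preserved, not renumbered from 0
print(list(top.index))
# Positional access returns the most-followed row regardless of its label
print(top["id"].iloc[0])
```

Here label `0` is not in the result at all, so `top["id"][0]` would raise a `KeyError`, whereas `top["id"].iloc[0]` returns the first row.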


---

### Extracting Tracks From a Playlist

The following function collects the tracks of any Spotify playlist given its URI. To analyze a particular playlist, copy its URI from the Spotify Player interface and pass it to the function defined below. `get_playlist_tracks` follows the API's pagination and returns the complete list of playlist items, each of which holds a track's ID, its artists, and other metadata.


In [7]:
def get_playlist_tracks(playlist_URI):
    # Fetch the first page of playlist items
    results = sp.playlist_tracks(playlist_URI)
    tracks = results["items"]
    # Follow the "next" links until every page has been collected
    while results["next"]:
        results = sp.next(results)
        tracks.extend(results["items"])
    return tracks
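
Each returned item is a nested dictionary with the track stored under the `"track"` key. As a hedged sketch, the IDs needed later can be pulled out as shown below. The helper name `track_and_artist_ids` and the sample data are hypothetical, and the guard assumes that some playlist entries may carry no usable track data.

```python
def track_and_artist_ids(items):
    """Return (track_id, first_artist_id) pairs from playlist items.

    Hypothetical helper: entries without a track, a track ID, or any
    artist are skipped rather than raising an error.
    """
    pairs = []
    for item in items:
        track = item.get("track")
        if not track or not track.get("id") or not track.get("artists"):
            continue
        pairs.append((track["id"], track["artists"][0]["id"]))
    return pairs


# Made-up items mirroring the shape of the playlist response
sample_items = [
    {"track": {"id": "t1", "artists": [{"id": "a1"}, {"id": "a2"}]}},
    {"track": None},  # stands in for an entry with no track data
]
print(track_and_artist_ids(sample_items))
```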

#### Extracting Features from Tracks

The following function uses Spotify's API to extract further details about a single song in the playlist. It obtains metadata such as the track name, the artist, the album, the release date, and the popularity, along with audio features such as danceability, energy, and tempo.


In [8]:
def playlist_features(track_id, artist_id, playlist_id):
    # Look up the track, its audio features, the artist, and the playlist
    meta = sp.track(track_id)
    audio_features = sp.audio_features(track_id)
    artist_info = sp.artist(artist_id)
    playlist_info = sp.playlist(playlist_id)

    # Skip tracks for which Spotify returns no audio features
    if audio_features[0] is None:
        return None
    features = audio_features[0]

    # Track metadata
    name = meta['name']
    track_id = meta['id']
    album = meta['album']['name']
    # Use the track's first credited artist so that the name and ID
    # refer to the same artist as artist_info
    artist = meta['artists'][0]['name']
    artist_id = meta['artists'][0]['id']
    release_date = meta['album']['release_date']
    length = meta['duration_ms']
    popularity = meta['popularity']

    # Artist metadata
    artist_pop = artist_info["popularity"]
    artist_genres = artist_info["genres"]

    # Audio features
    acousticness = features['acousticness']
    danceability = features['danceability']
    energy = features['energy']
    instrumentalness = features['instrumentalness']
    liveness = features['liveness']
    loudness = features['loudness']
    speechiness = features['speechiness']
    tempo = features['tempo']
    valence = features['valence']
    key = features['key']
    mode = features['mode']
    time_signature = features['time_signature']

    playlist_name = playlist_info['name']

    return [name, track_id, album, artist, artist_id, release_date, length, popularity,
            artist_pop, artist_genres, acousticness, danceability,
            energy, instrumentalness, liveness, loudness, speechiness,
            tempo, valence, key, mode, time_signature, playlist_name]
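
Calling `playlist_features` once per track issues several API requests for every song. As a sketch of one possible optimization, audio features can be requested in batches, since `sp.audio_features` also accepts a list of track IDs (up to 100 per call). The helper name `audio_features_in_batches` is hypothetical, and `client` is assumed to behave like the `sp` object used above.

```python
def audio_features_in_batches(client, track_ids, batch_size=100):
    """Map each track ID to its audio-features dict, fetched in batches.

    Hypothetical helper: `client.audio_features` is assumed to return one
    entry per requested ID, in order, with None for unavailable tracks.
    """
    features = {}
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        for track_id, feat in zip(batch, client.audio_features(batch)):
            # Skip tracks without audio features, as playlist_features does
            if feat is not None:
                features[track_id] = feat
    return features
```

For a 50-track playlist this replaces 50 audio-feature requests with one; the track, artist, and playlist lookups would need similar treatment to see the full benefit.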


Next, choose a playlist to analyze. Here we take the ID of the most-followed playlist in `top_playlists`, Spotify's *Today’s Top Hits*; a playlist URI copied from the Spotify Player interface can be used instead. The `get_playlist_tracks` function retrieves the playlist's track entries, and `playlist_features` is then called with each track's ID and first artist.

In [9]:
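# ID of the most-followed playlist (first row of top_playlists)
playlist_links = [top_playlists['id'].iloc[0]]

all_tracks = []
for playlist_URI in playlist_links:
    # For a full playlist URL, extract the ID first:
    # playlist_URI = link.split("/")[-1].split("?")[0]

    # Pass each track's ID and first artist's URI to playlist_features
    all_tracks.extend(
        playlist_features(i["track"]["id"], i["track"]["artists"][0]["uri"], playlist_URI)
        for i in get_playlist_tracks(playlist_URI)
    )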

In [10]:
# Drop tracks for which no audio features were available
all_tracks = [i for i in all_tracks if i is not None]


Putting it all together, the `get_playlist_tracks` function retrieves basic details for each song in a specified Spotify playlist using its URI. The `playlist_features` function is then called for each of these tracks, using the track ID to extract additional information such as danceability, energy, loudness, speechiness, acousticness, instrumentalness, liveness, valence, and tempo. From there, we create a pandas dataframe from the extracted rows.


In [11]:
# Create dataframe
df = pd.DataFrame(
    all_tracks, columns=['name', 'track_id', 'album', 'artist', 'artist_id','release_date',
                     'length', 'popularity', 'artist_pop', 'artist_genres',
                     'acousticness', 'danceability', 'energy',
                     'instrumentalness', 'liveness', 'loudness',
                     'speechiness', 'tempo', 'valence', 'key', 'mode',
                     'time_signature', 'playlist'])

df.tail(4)

| | name | track_id | album | artist | artist_id | release_date | length | popularity | artist_pop | artist_genres | ... | instrumentalness | liveness | loudness | speechiness | tempo | valence | key | mode | time_signature | playlist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 46 | Daylight | 1odExI7RdWc4BT515LTAwj | Daylight | David Kushner | 33NVpKoXjItPwUJTMZIOiY | 2023-04-14 | 212953 | 91 | 77 | [gen z singer-songwriter, singer-songwriter pop] | ... | 0.000441 | 0.093 | -9.475 | 0.0335 | 130.09 | 0.324 | 2 | 0 | 4 | Today’s Top Hits |
| 47 | LA FALDA | 7iUtQNMRB8ZkKC4AmEuCJC | LVEU: VIVE LA TUYA...NO LA MIA | Myke Towers | 7iK8PXO48WeuP03g8YR51W | 2023-09-18 | 174229 | 92 | 87 | [reggaeton, trap latino, urbano latino] | ... | 0.00877 | 0.19 | -4.26 | 0.0691 | 103.008 | 0.267 | 7 | 1 | 4 | Today’s Top Hits |
| 48 | Is It Over Now? (Taylor's Version) (From The V... | 1Iq8oo9XkmmvCQiGOfORiz | 1989 (Taylor's Version) | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 | 2023-10-26 | 229477 | 89 | 100 | [pop] | ... | 0.0 | 0.127 | -7.346 | 0.036 | 100.012 | 0.176 | 0 | 1 | 4 | Today’s Top Hits |
| 49 | Standing Next to You | 2KslE17cAJNHTsI2MI0jb2 | GOLDEN | Jung Kook | 6HaGTQPmzraVmaVxvz6EUc | 2023-11-03 | 206019 | 92 | 86 | [k-pop] | ... | 0.0 | 0.339 | -4.389 | 0.0955 | 106.017 | 0.816 | 2 | 0 | 4 | Today’s Top Hits |


In [12]:
df.to_csv("../assets/data/all_tracks.csv", index=False)

---
