Objective:

The main goal of this notebook is to interface with the Spotify API, retrieve our saved tracks (includes album info), top tracks, and top artists and preprocess the data for subsequent analysis. This initial step is crucial to ensure data quality and set the foundation for our data visualisation and modeling stages.

In [3]:
!pip install spotipy pandas

import spotipy
from spotipy.oauth2 import SpotifyOAuth
import pandas as pd




Now that our libraries are in place, we'll proceed to authenticate with the Spotify API using the credentials from our developer dashboard and ther spotipy library to make things easier (doing this slightly more manually as I wanted to be explicit with my work). Remember to always keep your client_id and client_secret confidential.

## Note: For the below two cells the outputs have been removed so as not to leak sensitive information. You must include your own Client_ID,  Client_secret, and redirect URI for OAuth2 for the Spotify API after creating your Spotify developer account (and specifying your redirect URI) if you wish to do this yourself.

# Running the code in the below two cells as is will result in an error. 

# Some info: 
- The output of the 1st cell will provide an auth_url link which you must use (only once and you don't need to run it again) to authorize use of your/a user's account.

- The output of the 2nd cell will print your access and refresh tokens for your reference, but the code itself will store them and utilise them appropritately to make API calls via the Python Spotipy library. 

## Alternatively, if you're here just to browse my work, the other cells will show you the resulting data pulled from my Spotify account (the final dataframes after cleaning are saved as CSV files at the end of this notebook and are also provided in the Github repo). 

In [None]:
auth_manager = SpotifyOAuth(client_id="YOUR_CLIENT_ID_HERE",
                            client_secret="YOUR_CLIENT_SECRET_HERE",
                            redirect_uri="YOUR_REDIRECT_URI_HERE",
                            scope="user-library-read user-top-read")

auth_url = auth_manager.get_authorize_url()
print(auth_url)

In [None]:
import urllib.parse

# Replace this with the auth URL you generated in the preceeding cell
redirected_url = "YOUR_AUTH_URL_HERE"

# Extract the code from the URL
parsed_url = urllib.parse.urlparse(redirected_url)
code = urllib.parse.parse_qs(parsed_url.query)["code"][0]

# Get the access token using the code
token_info = auth_manager.get_access_token(code)
sp = spotipy.Spotify(auth=token_info['access_token'])
print(token_info)

With our Spotify API connection established for a given authenticated user, let's now fetch saved tracks. 

From now on, the data you see will be from my Spotify account.

This will give us a first glance at the songs that the user manually chooses to save on Spotify.  We'll then convert this data into a Pandas DataFrame for easy manipulation and cleaning.

In [120]:
from IPython.display import display
import pandas as pd

# Function to fetch a user's liked/saved tracks from Spotify.
def fetch_saved_tracks(sp):
    offset = 0  # Offset for paginated results.
    tracks = []  # List to hold all the fetched tracks.
    
    # Continuously fetch tracks in batches of 50 until all tracks are fetched.
    while True:
        # Fetching the next set of tracks.
        response = sp.current_user_saved_tracks(limit=50, offset=offset)
        
        # Extending our list with the fetched tracks.
        tracks.extend(response['items'])
        
        # If less than 50 tracks are returned, we've fetched all the tracks.
        if len(response['items']) < 50:
            break
            
        # Increase the offset by 50 to fetch the next set of tracks.
        offset += 50
        
    return tracks

# Calling the function to fetch saved tracks.
saved_tracks = fetch_saved_tracks(sp)

# Constructing a DataFrame from the fetched tracks.
df_tracks = pd.DataFrame([(track['track']['id'],
                           track['track']['name'],
                           track['track']['artists'][0]['name'],
                           track['track']['album']['name'],
                           track['track']['album']['release_date'],
                           track['track']['duration_ms'],
                           track['added_at'])
                          for track in saved_tracks],
                         columns=['id', 'name', 'artist', 'album', 'release_date', 'duration_ms', 'added_at'])

# Making a copy of the DataFrame for further use.
df_tracks_copy = df_tracks.copy()

# Displaying the first few rows of the DataFrame.
display(df_tracks.head())
# Displaying info about the DataFrame.
df_tracks.info()

Unnamed: 0,id,name,artist,album,release_date,duration_ms,added_at
0,2UOopL3Y405ruJyMzJcdWD,Khserna Baad,Maya Diab,Khserna Baad,2019-06-03,187561,2023-08-03T19:11:51Z
1,0WVTQp3SOCuMr08jh1jweV,Bring It On Home to Me,Sam Cooke,The Man Who Invented Soul,2000-09-26,162533,2023-08-01T16:48:21Z
2,1XPj5quoeV5Gd0paSUDpvm,You Send Me - Mono,Sam Cooke,Sam Cooke,1958-02-01,165733,2023-07-23T14:55:49Z
3,2G0GextMwZJLkNxcSZ7ZJ3,(What A) Wonderful World - Mono,Sam Cooke,The Wonderful World Of Sam Cooke,1960-02,128786,2023-07-20T01:18:55Z
4,1HkOPLwAJH3kE8UnqgxF4s,Skinny Love,Bon Iver,"For Emma, Forever Ago",2008-05-12,238532,2023-07-20T00:55:48Z


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 791 entries, 0 to 790
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   id            791 non-null    object
 1   name          791 non-null    object
 2   artist        791 non-null    object
 3   album         791 non-null    object
 4   release_date  791 non-null    object
 5   duration_ms   791 non-null    int64 
 6   added_at      791 non-null    object
dtypes: int64(1), object(6)
memory usage: 43.4+ KB


Now, well utilise the info obtained from Spotify's search artist endpoint to provide the genre for all our saved tracks and merge it into our saved tracks dataframe for genre-specific analysis/visualisations

In [121]:
#add in genre info for each saved track
# Get a list of all unique artists in your saved tracks
unique_artists_saved = df_tracks['artist'].unique()

# Initialize a list to store individual artist dataframes
artist_frames = []

# Loop through each unique artist
for artist in unique_artists:

    # Use the Spotify search API to get details about the artist
    # 'q=' specifies the search query, in this case the artist's name
    # 'type=' specifies the type of search, in this case we're searching for an artist
    results = sp.search(q='artist:' + artist, type='artist')

    # Check if any artist details were returned
    if results['artists']['items']:

        # Get the first (and most relevant) artist's details from the returned items
        artist_info = results['artists']['items'][0]

        # Create a dataframe from the artist's details and add it to our list
        # We're only interested in the artist's name and their genres in this case
        artist_frames.append(pd.DataFrame([{
            'name': artist_info['name'],
            'genres': artist_info['genres'],
        }]))

# Combine all the individual artist dataframes into one big dataframe
df_artists = pd.concat(artist_frames, ignore_index=True)

# Merge the saved tracks dataframe with the artist details dataframe
# 'left_on=' specifies the column in the left dataframe to merge on, in this case the artist's name
# 'right_on=' specifies the column in the right dataframe to merge on, in this case also the artist's name
# 'how=' specifies the type of merge, in this case a left merge which keeps all rows from the left dataframe and only matching rows from the right dataframe
df_saved_tracks_genre = df_tracks.merge(df_artists, left_on='artist', right_on='name', how='left')
#save a copy to refer to in case issues arise when cleaning the data e.g. dropping columns/rows
df_saved_tracks_genre_copy=df_saved_tracks_genre.copy()
display(df_saved_tracks_genre.head())

Unnamed: 0,id,name_x,artist,album,release_date,duration_ms,added_at,name_y,genres
0,2UOopL3Y405ruJyMzJcdWD,Khserna Baad,Maya Diab,Khserna Baad,2019-06-03,187561,2023-08-03T19:11:51Z,Maya Diab,"[arab pop, lebanese pop]"
1,0WVTQp3SOCuMr08jh1jweV,Bring It On Home to Me,Sam Cooke,The Man Who Invented Soul,2000-09-26,162533,2023-08-01T16:48:21Z,Sam Cooke,"[classic soul, soul, vocal jazz]"
2,1XPj5quoeV5Gd0paSUDpvm,You Send Me - Mono,Sam Cooke,Sam Cooke,1958-02-01,165733,2023-07-23T14:55:49Z,Sam Cooke,"[classic soul, soul, vocal jazz]"
3,2G0GextMwZJLkNxcSZ7ZJ3,(What A) Wonderful World - Mono,Sam Cooke,The Wonderful World Of Sam Cooke,1960-02,128786,2023-07-20T01:18:55Z,Sam Cooke,"[classic soul, soul, vocal jazz]"
4,1HkOPLwAJH3kE8UnqgxF4s,Skinny Love,Bon Iver,"For Emma, Forever Ago",2008-05-12,238532,2023-07-20T00:55:48Z,Bon Iver,"[chamber pop, eau claire indie, indie folk, me..."


Now, lets further enrich our saved tracks by utilising Spotify's audio features endpoint to obtain detailed audio feature information for each saved track

In [158]:
#define our fetch_audio_features function and obtain detailed audio features for all saved tracks in batches of 50 (to not violate the API request limit)
def fetch_audio_features(sp, track_ids):
    audio_features = []
    for i in range(0, len(track_ids), 50):
        audio_features += sp.audio_features(track_ids[i:i+50])
    return audio_features
# Fetch the audio features for all tracks
audio_features = fetch_audio_features(sp, df_saved_tracks_genre['id'].tolist())
# Create a dataframe from the audio features
df_audio_features = pd.DataFrame.from_records(audio_features)
# Merge the saved tracks+genre dataframe with the audio features dataframe for full info for my saved tracks
df_saved_tracks_with_audio_features = pd.merge(df_saved_tracks_genre, df_audio_features, how='inner', left_on='id', right_on='id')
df_saved_tracks_with_audio_features_copy=df_saved_tracks_with_audio_features.copy()
display(df_saved_tracks_with_audio_features.head())

Unnamed: 0,id,name_x,artist,album,release_date,duration_ms_x,added_at,genres,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,uri,track_href,analysis_url,duration_ms_y,time_signature
0,2UOopL3Y405ruJyMzJcdWD,Khserna Baad,Maya Diab,Khserna Baad,2019-06-03,187561,2023-08-03 19:11:51+00:00,"[arab pop, lebanese pop]",0.689,0.605,7,-7.917,0,0.0776,0.127,0.0,0.0827,0.146,87.017,audio_features,spotify:track:2UOopL3Y405ruJyMzJcdWD,https://api.spotify.com/v1/tracks/2UOopL3Y405r...,https://api.spotify.com/v1/audio-analysis/2UOo...,187562,4
1,0WVTQp3SOCuMr08jh1jweV,Bring It On Home to Me,Sam Cooke,The Man Who Invented Soul,2000-09-26,162533,2023-08-01 16:48:21+00:00,"[classic soul, soul, vocal jazz]",0.523,0.402,0,-8.233,1,0.0305,0.778,0.0,0.432,0.675,70.863,audio_features,spotify:track:0WVTQp3SOCuMr08jh1jweV,https://api.spotify.com/v1/tracks/0WVTQp3SOCuM...,https://api.spotify.com/v1/audio-analysis/0WVT...,162533,4
2,1XPj5quoeV5Gd0paSUDpvm,You Send Me - Mono,Sam Cooke,Sam Cooke,1958-02-01,165733,2023-07-23 14:55:49+00:00,"[classic soul, soul, vocal jazz]",0.572,0.365,4,-7.583,0,0.0276,0.88,0.0,0.125,0.41,96.022,audio_features,spotify:track:1XPj5quoeV5Gd0paSUDpvm,https://api.spotify.com/v1/tracks/1XPj5quoeV5G...,https://api.spotify.com/v1/audio-analysis/1XPj...,165733,4
3,2G0GextMwZJLkNxcSZ7ZJ3,(What A) Wonderful World - Mono,Sam Cooke,The Wonderful World Of Sam Cooke,1960-02-01,128786,2023-07-20 01:18:55+00:00,"[classic soul, soul, vocal jazz]",0.686,0.672,11,-5.523,1,0.0323,0.7,0.0,0.135,0.857,128.55,audio_features,spotify:track:2G0GextMwZJLkNxcSZ7ZJ3,https://api.spotify.com/v1/tracks/2G0GextMwZJL...,https://api.spotify.com/v1/audio-analysis/2G0G...,128787,4
4,1HkOPLwAJH3kE8UnqgxF4s,Skinny Love,Bon Iver,"For Emma, Forever Ago",2008-05-12,238532,2023-07-20 00:55:48+00:00,"[chamber pop, eau claire indie, indie folk, me...",0.592,0.256,4,-14.031,0,0.0449,0.842,2e-06,0.088,0.103,76.358,audio_features,spotify:track:1HkOPLwAJH3kE8UnqgxF4s,https://api.spotify.com/v1/tracks/1HkOPLwAJH3k...,https://api.spotify.com/v1/audio-analysis/1HkO...,238533,4


Now, we'll retrieve a user's top artists and tracks via the top_artists and top_tracks endpoints over short, medium, and long term time ranges (ranges defined on the API documentation). This data can be insightful to understand personal music preferences over different timeframes and curate playtlists for different moods/activities effectively.

In [133]:
def fetch_top_artists_tracks(sp, type='tracks', time_range='short_term', limit=50):
    """
    Fetch top artists or tracks for a user.
    
    Parameters:
        sp : spotipy client
        type : 'artists' or 'tracks'
        time_range : 'short_term', 'medium_term', or 'long_term'
        limit : Number of results to fetch (maximum 50)
        
    Returns:
        List of results
    """
    if type == 'tracks':
        return sp.current_user_top_tracks(time_range=time_range, limit=limit)['items']
    elif type == 'artists':
        return sp.current_user_top_artists(time_range=time_range, limit=limit)['items']

# Fetching top artists and tracks for all time ranges
time_ranges = ['short_term', 'medium_term', 'long_term']
top_artists = {tr: fetch_top_artists_tracks(sp, 'artists', tr) for tr in time_ranges}
top_tracks = {tr: fetch_top_artists_tracks(sp, 'tracks', tr) for tr in time_ranges}

Top Artists DataFrame (df_top_artists):

The df_top_artists DataFrame contains data about your top artists from Spotify, categorized by timeframes (short term, medium term, and long term). The columns in this DataFrame are:

id: The unique identifier for the artist on Spotify.
name: The name of the artist.
genres: Genres associated with the artist, concatenated as a comma-separated string.
followers_count: The number of followers the artist has on Spotify.
popularity: A measure of the popularity of the artist on Spotify, ranging from 0 to 100.
time_range: The time frame for which the artist is considered "top" (can be "short_term", "medium_term", or "long_term").
Differences from the Saved Tracks DataFrame:

This DataFrame focuses on artists rather than individual tracks.
It contains metrics like genres, followers count, and popularity specific to artists.
The artists in this DataFrame are based on your listening patterns and are ranked as top artists for the specified time frame, whereas the Saved Tracks DataFrame contains tracks you've explicitly saved.

In [134]:
# Convert top artists data to DataFrame
artist_frames = []

for tr in time_ranges:
    df_artist = pd.DataFrame([(artist['id'],
                               artist['name'],
                               ', '.join(artist['genres']),
                               artist['followers']['total'],
                               artist['popularity'])
                              for artist in top_artists[tr]],
                             columns=['id', 'name', 'genres', 'followers_count', 'popularity'])
    df_artist['time_range'] = tr  # adding a column to specify the time range
    artist_frames.append(df_artist)

df_top_artists = pd.concat(artist_frames, ignore_index=True)
#create a copy of the original version we can refer to just in case after data cleaning/manipulation
df_top_artists_copy = df_top_artists.copy()
display(df_top_artists.head())

Unnamed: 0,id,name,genres,followers_count,popularity,time_range
0,04N4sGkSTSxjVfbiItLvTj,Ziad Bourji,"arab pop, lebanese pop",0,43,short_term
1,5K4W6rqBFWDnAN6FQUkS6x,Kanye West,"chicago rap, hip hop, rap",0,88,short_term
2,2h93pZq0e7k5yf4dywlkpM,Frank Ocean,"lgbtq+ hip hop, neo soul",0,81,short_term
3,09A6IffSw0t8L8sfuOCVws,Wael Kfoury,"arab pop, belly dance, lebanese pop",0,52,short_term
4,5DPb3SKW8QZFwkRlmt7Gvo,Joseph Attieh,"arab pop, dabke, lebanese pop",0,40,short_term


Top Tracks DataFrame (df_top_tracks):

The df_top_tracks DataFrame provides information on your top tracks on Spotify, split by the same timeframes. The columns are:

id: The unique identifier for the track on Spotify.
name: The title of the track.
album: The album to which the track belongs.
artist: The primary artist of the track.
duration_ms: The duration of the track in milliseconds.
popularity: A measure of the popularity of the track on Spotify, ranging from 0 to 100.
time_range: The timeframe for which the track is deemed "top" (can be "short_term", "medium_term", or "long_term").
Differences from the Saved Tracks DataFrame:

While both DataFrames contain track-related information, the Top Tracks DataFrame specifically ranks tracks based on your listening habits within a certain timeframe.
The Saved Tracks DataFrame contains tracks you've chosen to save, regardless of how often you've listened to them.
The Saved Tracks DataFrame has an "added_at" column indicating when the track was saved, which isn't present in the Top Tracks DataFrame.

In [135]:
# Convert top tracks data to DataFrame
track_frames = []
for tr in time_ranges:
    df_track = pd.DataFrame([(track['id'],
                              track['name'],
                              track['album']['name'],
                              track['artists'][0]['name'],  # taking only the primary artist for simplicity
                              track['duration_ms'],
                              track['popularity'])
                             for track in top_tracks[tr]],
                            columns=['id', 'name', 'album', 'artist', 'duration_ms', 'popularity'])
    df_track['time_range'] = tr  # adding a column to specify the time range
    track_frames.append(df_track)
    
df_top_tracks = pd.concat(track_frames, ignore_index=True)
#create a copy of the original version we can refer to just in case after data cleaning/manipulation
df_top_tracks_copy = df_top_tracks.copy()
display(df_top_tracks.head())

Unnamed: 0,id,name,album,artist,duration_ms,popularity,time_range
0,1HkOPLwAJH3kE8UnqgxF4s,Skinny Love,"For Emma, Forever Ago",Bon Iver,238532,69,short_term
1,5DBmXF7QO43Cuy9yqva116,Family Business,The College Dropout,Kanye West,278893,67,short_term
2,4vHNeBWDQpVCmGbaccrRzi,Goodie Bag,Goodie Bag,Still Woozy,146390,78,short_term
3,694vvR5o19xHPhhJ5QdLN7,NO HALO,GINGER,BROCKHAMPTON,259746,59,short_term
4,093CMFUwvPyFIsjBsVfBPO,El Hob El Kebir,El Hob El Kebir,Ragheb Alama,315640,50,short_term


Let us now incorporate genre information for our top tracks in a similar fashion to how we incorporated it into our saved tracks

In [136]:
#we now add genre column to top tracks as we did for saved tracks

# Get a list of all unique artists in your top tracks
unique_artists = df_top_tracks['artist'].unique()

# Initialize a list to store individual artist dataframes
artist_frames = []

# Loop through each unique artist
for artist in unique_artists:

    # Use the Spotify search API to get details about the artist
    results = sp.search(q='artist:' + artist, type='artist')

    # Check if any artist details were returned
    if results['artists']['items']:

        # Get the first (and most relevant) artist's details from the returned items
        artist_info = results['artists']['items'][0]

        # Create a dataframe from the artist's details and add it to our list
        artist_frames.append(pd.DataFrame([{
            'name': artist_info['name'],
            'genres': artist_info['genres'],
        }]))

# Combine all the individual artist dataframes into one big dataframe
df_artists = pd.concat(artist_frames, ignore_index=True)

# Merge the top tracks dataframe with the artist details dataframe
df_top_tracks_genre = df_top_tracks.merge(df_artists, left_on='artist', right_on='name', how='left')
df_top_tracks_genre_copy = df_top_tracks_genre.copy()
display(df_top_tracks_genre.head())

Unnamed: 0,id,name_x,album,artist,duration_ms,popularity,time_range,name_y,genres
0,1HkOPLwAJH3kE8UnqgxF4s,Skinny Love,"For Emma, Forever Ago",Bon Iver,238532,69,short_term,Bon Iver,"[chamber pop, eau claire indie, indie folk, me..."
1,5DBmXF7QO43Cuy9yqva116,Family Business,The College Dropout,Kanye West,278893,67,short_term,Kanye West,"[chicago rap, hip hop, rap]"
2,4vHNeBWDQpVCmGbaccrRzi,Goodie Bag,Goodie Bag,Still Woozy,146390,78,short_term,Still Woozy,"[bedroom pop, oakland indie, pov: indie]"
3,694vvR5o19xHPhhJ5QdLN7,NO HALO,GINGER,BROCKHAMPTON,259746,59,short_term,BROCKHAMPTON,"[boy band, rap]"
4,093CMFUwvPyFIsjBsVfBPO,El Hob El Kebir,El Hob El Kebir,Ragheb Alama,315640,50,short_term,Ragheb Alama,"[classic arab pop, lebanese pop]"


We now access rich track details for each track in top tracks. For each track ID from your top tracks, we can access a detailed set of attributes using the audio_features endpoint in Spotipy. These attributes, such as danceability, energy, and tempo, provide insights into the nature and mood of each track. This is identical to how we included it for our saved tracks list as well.

In [137]:
# Getting Audio Features for Top Tracks
track_ids = df_top_tracks_genre['id'].tolist()
audio_features_list = []

# Batching requests for efficiency
for i in range(0, len(track_ids), 50):
    batch = track_ids[i:i+50]
    audio_features = sp.audio_features(batch)
    audio_features_list.extend(audio_features)

df_audio_features = pd.DataFrame(audio_features_list)
#create a copy of the original version we can refer to just in case after data cleaning/manipulation
df_audio_features_copy = df_audio_features.copy()
display(df_audio_features.head())

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.592,0.256,4,-14.031,0,0.0449,0.842,2e-06,0.088,0.103,76.358,audio_features,1HkOPLwAJH3kE8UnqgxF4s,spotify:track:1HkOPLwAJH3kE8UnqgxF4s,https://api.spotify.com/v1/tracks/1HkOPLwAJH3k...,https://api.spotify.com/v1/audio-analysis/1HkO...,238533,4
1,0.744,0.524,1,-7.452,1,0.21,0.139,0.0,0.392,0.606,93.691,audio_features,5DBmXF7QO43Cuy9yqva116,spotify:track:5DBmXF7QO43Cuy9yqva116,https://api.spotify.com/v1/tracks/5DBmXF7QO43C...,https://api.spotify.com/v1/audio-analysis/5DBm...,278893,4
2,0.739,0.522,7,-12.304,0,0.151,0.821,0.00124,0.233,0.619,138.263,audio_features,4vHNeBWDQpVCmGbaccrRzi,spotify:track:4vHNeBWDQpVCmGbaccrRzi,https://api.spotify.com/v1/tracks/4vHNeBWDQpVC...,https://api.spotify.com/v1/audio-analysis/4vHN...,146390,4
3,0.643,0.573,4,-7.034,1,0.0709,0.41,0.00017,0.0695,0.47,165.192,audio_features,694vvR5o19xHPhhJ5QdLN7,spotify:track:694vvR5o19xHPhhJ5QdLN7,https://api.spotify.com/v1/tracks/694vvR5o19xH...,https://api.spotify.com/v1/audio-analysis/694v...,259747,4
4,0.621,0.965,7,-3.887,0,0.0431,0.00248,0.000169,0.119,0.809,144.971,audio_features,093CMFUwvPyFIsjBsVfBPO,spotify:track:093CMFUwvPyFIsjBsVfBPO,https://api.spotify.com/v1/tracks/093CMFUwvPyF...,https://api.spotify.com/v1/audio-analysis/093C...,315640,4


Artist Additional Details:

For every artist ID from your top artists, we can retrieve more extensive information using the artist endpoint in Spotipy. This data provides a deeper understanding of each artist, including genres they're associated with, their popularity, and more.

In [138]:
# Getting Additional Details for Top Artists
artist_ids = df_top_artists['id'].tolist()
artist_details_list = []

# Batching requests for efficiency
for i in range(0, len(artist_ids), 50):
    batch = artist_ids[i:i+50]
    artist_details = sp.artists(batch)
    artist_details_list.extend(artist_details['artists'])
df_artist_details = pd.DataFrame(artist_details_list)
#create a copy of the original version we can refer to just in case after data cleaning/manipulation
df_artist_details_copy = df_artist_details.copy()
display(df_artist_details.head())

Unnamed: 0,external_urls,followers,genres,href,id,images,name,popularity,type,uri
0,{'spotify': 'https://open.spotify.com/artist/0...,"{'href': None, 'total': 274200}","[arab pop, lebanese pop]",https://api.spotify.com/v1/artists/04N4sGkSTSx...,04N4sGkSTSxjVfbiItLvTj,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Ziad Bourji,43,artist,spotify:artist:04N4sGkSTSxjVfbiItLvTj
1,{'spotify': 'https://open.spotify.com/artist/5...,"{'href': None, 'total': 20790700}","[chicago rap, hip hop, rap]",https://api.spotify.com/v1/artists/5K4W6rqBFWD...,5K4W6rqBFWDnAN6FQUkS6x,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Kanye West,88,artist,spotify:artist:5K4W6rqBFWDnAN6FQUkS6x
2,{'spotify': 'https://open.spotify.com/artist/2...,"{'href': None, 'total': 11541738}","[lgbtq+ hip hop, neo soul]",https://api.spotify.com/v1/artists/2h93pZq0e7k...,2h93pZq0e7k5yf4dywlkpM,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Frank Ocean,81,artist,spotify:artist:2h93pZq0e7k5yf4dywlkpM
3,{'spotify': 'https://open.spotify.com/artist/0...,"{'href': None, 'total': 1518868}","[arab pop, belly dance, lebanese pop]",https://api.spotify.com/v1/artists/09A6IffSw0t...,09A6IffSw0t8L8sfuOCVws,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Wael Kfoury,52,artist,spotify:artist:09A6IffSw0t8L8sfuOCVws
4,{'spotify': 'https://open.spotify.com/artist/5...,"{'href': None, 'total': 301593}","[arab pop, dabke, lebanese pop]",https://api.spotify.com/v1/artists/5DPb3SKW8QZ...,5DPb3SKW8QZFwkRlmt7Gvo,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Joseph Attieh,40,artist,spotify:artist:5DPb3SKW8QZFwkRlmt7Gvo


We now merge our original top tracks and top artists dataframes with the additonal track details and additional artist details (relevant columns only) dataframes to provide a compehrensive summary of track and artist info in a single dataframe for each.

In [139]:
# Merge on track IDs
df_top_tracks_detailed = df_top_tracks_genre.merge(df_audio_features, on='id', how='left')
#keep an original of the merged top_Tracks as we will make changes to the original after this point
df_top_tracks_detailed_copy = df_top_tracks_detailed.copy()
# Extract relevant columns from artist details
df_artist_details_subset = df_artist_details[['id', 'popularity', 'followers']]

# Merge on artist IDs
df_top_artists_detailed = df_top_artists.merge(df_artist_details_subset, on='id', how='left')
#keep an original of the merged top artists as we will make changes to the original after this point
df_top_artists_detailed_copy = df_top_artists_detailed.copy()

display(df_top_tracks_detailed.head())
display(df_top_artists_detailed.head())

Unnamed: 0,id,name_x,album,artist,duration_ms_x,popularity,time_range,name_y,genres,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,uri,track_href,analysis_url,duration_ms_y,time_signature
0,1HkOPLwAJH3kE8UnqgxF4s,Skinny Love,"For Emma, Forever Ago",Bon Iver,238532,69,short_term,Bon Iver,"[chamber pop, eau claire indie, indie folk, me...",0.592,0.256,4,-14.031,0,0.0449,0.842,2e-06,0.088,0.103,76.358,audio_features,spotify:track:1HkOPLwAJH3kE8UnqgxF4s,https://api.spotify.com/v1/tracks/1HkOPLwAJH3k...,https://api.spotify.com/v1/audio-analysis/1HkO...,238533,4
1,1HkOPLwAJH3kE8UnqgxF4s,Skinny Love,"For Emma, Forever Ago",Bon Iver,238532,69,short_term,Bon Iver,"[chamber pop, eau claire indie, indie folk, me...",0.592,0.256,4,-14.031,0,0.0449,0.842,2e-06,0.088,0.103,76.358,audio_features,spotify:track:1HkOPLwAJH3kE8UnqgxF4s,https://api.spotify.com/v1/tracks/1HkOPLwAJH3k...,https://api.spotify.com/v1/audio-analysis/1HkO...,238533,4
2,5DBmXF7QO43Cuy9yqva116,Family Business,The College Dropout,Kanye West,278893,67,short_term,Kanye West,"[chicago rap, hip hop, rap]",0.744,0.524,1,-7.452,1,0.21,0.139,0.0,0.392,0.606,93.691,audio_features,spotify:track:5DBmXF7QO43Cuy9yqva116,https://api.spotify.com/v1/tracks/5DBmXF7QO43C...,https://api.spotify.com/v1/audio-analysis/5DBm...,278893,4
3,5DBmXF7QO43Cuy9yqva116,Family Business,The College Dropout,Kanye West,278893,67,short_term,Kanye West,"[chicago rap, hip hop, rap]",0.744,0.524,1,-7.452,1,0.21,0.139,0.0,0.392,0.606,93.691,audio_features,spotify:track:5DBmXF7QO43Cuy9yqva116,https://api.spotify.com/v1/tracks/5DBmXF7QO43C...,https://api.spotify.com/v1/audio-analysis/5DBm...,278893,4
4,4vHNeBWDQpVCmGbaccrRzi,Goodie Bag,Goodie Bag,Still Woozy,146390,78,short_term,Still Woozy,"[bedroom pop, oakland indie, pov: indie]",0.739,0.522,7,-12.304,0,0.151,0.821,0.00124,0.233,0.619,138.263,audio_features,spotify:track:4vHNeBWDQpVCmGbaccrRzi,https://api.spotify.com/v1/tracks/4vHNeBWDQpVC...,https://api.spotify.com/v1/audio-analysis/4vHN...,146390,4


Unnamed: 0,id,name,genres,followers_count,popularity_x,time_range,popularity_y,followers
0,04N4sGkSTSxjVfbiItLvTj,Ziad Bourji,"arab pop, lebanese pop",0,43,short_term,43,"{'href': None, 'total': 274200}"
1,04N4sGkSTSxjVfbiItLvTj,Ziad Bourji,"arab pop, lebanese pop",0,43,short_term,43,"{'href': None, 'total': 274200}"
2,04N4sGkSTSxjVfbiItLvTj,Ziad Bourji,"arab pop, lebanese pop",0,43,short_term,43,"{'href': None, 'total': 274200}"
3,5K4W6rqBFWDnAN6FQUkS6x,Kanye West,"chicago rap, hip hop, rap",0,88,short_term,88,"{'href': None, 'total': 20790700}"
4,5K4W6rqBFWDnAN6FQUkS6x,Kanye West,"chicago rap, hip hop, rap",0,88,short_term,88,"{'href': None, 'total': 20790700}"


We now begin our data cleaning, each cell will provide an overview  (statistical distribution, null count/column, datframe size, datatypes) for each dataframe in question (saved tracks, top tracks, top artists). Utilising df.head(), df.info(), df.describe() allows us to get a good overview of our data for cleaning and any other things we should note for the visualisations performed in the next notebook file.

In [160]:
# Convert 'release_date' to datetime
df_saved_tracks_with_audio_features['release_date'] = pd.to_datetime(df_saved_tracks_with_audio_features['release_date'])
# Convert 'added_at' to datetime
df_saved_tracks_with_audio_features['added_at'] = pd.to_datetime(df_saved_tracks_with_audio_features['added_at'])
#convert genre to string for duplicate_drop
df_saved_tracks_with_audio_features['genres'] = df_saved_tracks_with_audio_features['genres'].astype(str)
#drop duplicate column occurring after adding genre for saved tracks
if 'name_y' in df_saved_tracks_genre:
    df_saved_tracks_with_audio_features = df_saved_tracks_with_audio_features.drop(columns='name_y')
#drop duplicate column occurring after merging audio features with saved tracks
if "duration_ms_y" in df_saved_tracks_with_audio_features:
    df_saved_tracks_with_audio_features=df_saved_tracks_with_audio_features.drop(columns='duration_ms_y')
#drop duplicate rows
df_saved_tracks_with_audio_features.drop_duplicates(inplace=True)
#reset index to maintain index order
df_saved_tracks_with_audio_features = df_saved_tracks_with_audio_features.reset_index(drop=True)
#overview of saved tracks dataframe after cleaning/dropping duplicates
display(df_saved_tracks_with_audio_features.head(10))
display(df_saved_tracks_with_audio_features.info())
display(df_saved_tracks_with_audio_features.isnull().sum())
display(df_saved_tracks_with_audio_features.describe())

Unnamed: 0,id,name_x,artist,album,release_date,duration_ms_x,added_at,genres,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,uri,track_href,analysis_url,time_signature
0,2UOopL3Y405ruJyMzJcdWD,Khserna Baad,Maya Diab,Khserna Baad,2019-06-03,187561,2023-08-03 19:11:51+00:00,"['arab pop', 'lebanese pop']",0.689,0.605,7,-7.917,0,0.0776,0.127,0.0,0.0827,0.146,87.017,audio_features,spotify:track:2UOopL3Y405ruJyMzJcdWD,https://api.spotify.com/v1/tracks/2UOopL3Y405r...,https://api.spotify.com/v1/audio-analysis/2UOo...,4
1,0WVTQp3SOCuMr08jh1jweV,Bring It On Home to Me,Sam Cooke,The Man Who Invented Soul,2000-09-26,162533,2023-08-01 16:48:21+00:00,"['classic soul', 'soul', 'vocal jazz']",0.523,0.402,0,-8.233,1,0.0305,0.778,0.0,0.432,0.675,70.863,audio_features,spotify:track:0WVTQp3SOCuMr08jh1jweV,https://api.spotify.com/v1/tracks/0WVTQp3SOCuM...,https://api.spotify.com/v1/audio-analysis/0WVT...,4
2,1XPj5quoeV5Gd0paSUDpvm,You Send Me - Mono,Sam Cooke,Sam Cooke,1958-02-01,165733,2023-07-23 14:55:49+00:00,"['classic soul', 'soul', 'vocal jazz']",0.572,0.365,4,-7.583,0,0.0276,0.88,0.0,0.125,0.41,96.022,audio_features,spotify:track:1XPj5quoeV5Gd0paSUDpvm,https://api.spotify.com/v1/tracks/1XPj5quoeV5G...,https://api.spotify.com/v1/audio-analysis/1XPj...,4
3,2G0GextMwZJLkNxcSZ7ZJ3,(What A) Wonderful World - Mono,Sam Cooke,The Wonderful World Of Sam Cooke,1960-02-01,128786,2023-07-20 01:18:55+00:00,"['classic soul', 'soul', 'vocal jazz']",0.686,0.672,11,-5.523,1,0.0323,0.7,0.0,0.135,0.857,128.55,audio_features,spotify:track:2G0GextMwZJLkNxcSZ7ZJ3,https://api.spotify.com/v1/tracks/2G0GextMwZJL...,https://api.spotify.com/v1/audio-analysis/2G0G...,4
4,1HkOPLwAJH3kE8UnqgxF4s,Skinny Love,Bon Iver,"For Emma, Forever Ago",2008-05-12,238532,2023-07-20 00:55:48+00:00,"['chamber pop', 'eau claire indie', 'indie fol...",0.592,0.256,4,-14.031,0,0.0449,0.842,2e-06,0.088,0.103,76.358,audio_features,spotify:track:1HkOPLwAJH3kE8UnqgxF4s,https://api.spotify.com/v1/tracks/1HkOPLwAJH3k...,https://api.spotify.com/v1/audio-analysis/1HkO...,4
5,432hUIl3ISDeytYW5XBQ5h,Wolves,Kanye West,The Life Of Pablo,2016-06-10,301586,2023-07-16 12:12:03+00:00,"['chicago rap', 'hip hop', 'rap']",0.395,0.392,1,-8.034,1,0.0925,0.219,0.00173,0.134,0.103,155.856,audio_features,spotify:track:432hUIl3ISDeytYW5XBQ5h,https://api.spotify.com/v1/tracks/432hUIl3ISDe...,https://api.spotify.com/v1/audio-analysis/432h...,4
6,1w327AHTCoChRIkJUprAnV,Mirage,Orion Sun,A Collection of Fleeting Moments and Daydreams,2020-01-06,57495,2023-06-23 23:50:17+00:00,"['alternative r&b', 'bedroom pop']",0.683,0.468,9,-10.058,0,0.0743,0.87,0.673,0.108,0.47,117.095,audio_features,spotify:track:1w327AHTCoChRIkJUprAnV,https://api.spotify.com/v1/tracks/1w327AHTCoCh...,https://api.spotify.com/v1/audio-analysis/1w32...,4
7,5DBmXF7QO43Cuy9yqva116,Family Business,Kanye West,The College Dropout,2004-02-10,278893,2023-06-16 01:15:36+00:00,"['chicago rap', 'hip hop', 'rap']",0.744,0.524,1,-7.452,1,0.21,0.139,0.0,0.392,0.606,93.691,audio_features,spotify:track:5DBmXF7QO43Cuy9yqva116,https://api.spotify.com/v1/tracks/5DBmXF7QO43C...,https://api.spotify.com/v1/audio-analysis/5DBm...,4
8,4vHNeBWDQpVCmGbaccrRzi,Goodie Bag,Still Woozy,Goodie Bag,2017-11-05,146390,2023-06-09 18:11:19+00:00,"['bedroom pop', 'oakland indie', 'pov: indie']",0.739,0.522,7,-12.304,0,0.151,0.821,0.00124,0.233,0.619,138.263,audio_features,spotify:track:4vHNeBWDQpVCmGbaccrRzi,https://api.spotify.com/v1/tracks/4vHNeBWDQpVC...,https://api.spotify.com/v1/audio-analysis/4vHN...,4
9,4vju55Ag7apDL2CfotuE7Q,Sunny,Bobby Hebb,Sunny,1966-01-01,165066,2023-05-25 01:24:16+00:00,['northern soul'],0.714,0.338,4,-10.994,0,0.0456,0.93,0.00644,0.389,0.667,128.385,audio_features,spotify:track:4vju55Ag7apDL2CfotuE7Q,https://api.spotify.com/v1/tracks/4vju55Ag7apD...,https://api.spotify.com/v1/audio-analysis/4vju...,4


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 791 entries, 0 to 790
Data columns (total 24 columns):
 #   Column            Non-Null Count  Dtype              
---  ------            --------------  -----              
 0   id                791 non-null    object             
 1   name_x            791 non-null    object             
 2   artist            791 non-null    object             
 3   album             791 non-null    object             
 4   release_date      791 non-null    datetime64[ns]     
 5   duration_ms_x     791 non-null    int64              
 6   added_at          791 non-null    datetime64[ns, UTC]
 7   genres            791 non-null    object             
 8   danceability      791 non-null    float64            
 9   energy            791 non-null    float64            
 10  key               791 non-null    int64              
 11  loudness          791 non-null    float64            
 12  mode              791 non-null    int64              
 13  speec

None

id                  0
name_x              0
artist              0
album               0
release_date        0
duration_ms_x       0
added_at            0
genres              0
danceability        0
energy              0
key                 0
loudness            0
mode                0
speechiness         0
acousticness        0
instrumentalness    0
liveness            0
valence             0
tempo               0
type                0
uri                 0
track_href          0
analysis_url        0
time_signature      0
dtype: int64

Unnamed: 0,duration_ms_x,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature
count,791.0,791.0,791.0,791.0,791.0,791.0,791.0,791.0,791.0,791.0,791.0,791.0,791.0
mean,225172.45512,0.609549,0.552569,5.16182,-8.751134,0.590392,0.154147,0.368665,0.074728,0.210558,0.46897,118.155296,3.876106
std,91691.547592,0.167871,0.202468,3.581826,4.167771,0.492073,0.156323,0.303945,0.207064,0.170028,0.235077,32.459954,0.560129
min,9106.0,0.0,0.0113,0.0,-34.673,0.0,0.0,6.8e-05,0.0,0.0298,0.0,0.0,0.0
25%,174799.5,0.498,0.4145,2.0,-10.7125,0.0,0.04055,0.0882,0.0,0.103,0.285,92.892,4.0
50%,216320.0,0.622,0.565,5.0,-7.915,1.0,0.0811,0.29,2.3e-05,0.138,0.466,116.919,4.0
75%,270743.5,0.735,0.7095,8.0,-5.9365,1.0,0.233,0.624,0.006005,0.2735,0.642,140.1175,4.0
max,760973.0,0.967,0.977,11.0,-0.158,1.0,0.948,0.996,0.968,0.982,0.972,210.164,5.0


In [141]:
#Cleaning top tracks  dataframe

#first we take a look at top_tracks and of audio_features info
#top tracks
print('General Top Tracks Info:')
display(df_top_tracks_genre.head(10))
display(df_top_tracks_genre.info())
display(df_top_tracks_genre.isnull().sum())
display(df_top_tracks_genre.describe())
#audio features
print('Audio features Top Tracks Info:')
display(df_audio_features.head(10))
display(df_audio_features.info())
display(df_audio_features.isnull().sum())
display(df_audio_features.describe())
# now lets ensure our merged dataframe is as expected
print('Merged Top Tracks and Audio Features')
#converting genres from list to string to drop duplicate rows
df_top_tracks_detailed['genres'] = df_top_tracks_detailed['genres'].astype(str)
# Drop duplicate rows and duration
df_top_tracks_detailed.drop_duplicates(inplace=True)
#dropping duplicate columns
if 'duration_y' in df_top_tracks_detailed:
    df_top_tracks_detailed.drop('duration_y', axis=1, inplace=True)
if 'name_y' in df_top_tracks_detailed:
    df_top_tracks_detailed.drop('name_y', axis=1, inplace=True)
df_top_tracks_detailed = df_top_tracks_detailed.reset_index(drop=True)
display(df_top_tracks_detailed.head(10))
display(df_top_tracks_detailed.info())
display(df_top_tracks_detailed.isnull().sum())
display(df_top_tracks_detailed.describe())

General Top Tracks Info:


Unnamed: 0,id,name_x,album,artist,duration_ms,popularity,time_range,name_y,genres
0,1HkOPLwAJH3kE8UnqgxF4s,Skinny Love,"For Emma, Forever Ago",Bon Iver,238532,69,short_term,Bon Iver,"[chamber pop, eau claire indie, indie folk, me..."
1,5DBmXF7QO43Cuy9yqva116,Family Business,The College Dropout,Kanye West,278893,67,short_term,Kanye West,"[chicago rap, hip hop, rap]"
2,4vHNeBWDQpVCmGbaccrRzi,Goodie Bag,Goodie Bag,Still Woozy,146390,78,short_term,Still Woozy,"[bedroom pop, oakland indie, pov: indie]"
3,694vvR5o19xHPhhJ5QdLN7,NO HALO,GINGER,BROCKHAMPTON,259746,59,short_term,BROCKHAMPTON,"[boy band, rap]"
4,093CMFUwvPyFIsjBsVfBPO,El Hob El Kebir,El Hob El Kebir,Ragheb Alama,315640,50,short_term,Ragheb Alama,"[classic arab pop, lebanese pop]"
5,4vju55Ag7apDL2CfotuE7Q,Sunny,Sunny,Bobby Hebb,165066,71,short_term,Bobby Hebb,[northern soul]
6,1T8BJvWzqm59RIuwQaTob8,Aghla El Habayeb,Yama Alo,Nawal Al Zoghbi,280673,44,short_term,Nawal Al Zoghbi,"[arab pop, dabke, lebanese pop]"
7,2OG1u9eX4Zjsi7fGQB9T8t,اخدني معك,حبك علمني,فضل شاكر,313939,46,short_term,فضل شاكر,[]
8,2UOopL3Y405ruJyMzJcdWD,Khserna Baad,Khserna Baad,Maya Diab,187561,44,short_term,Maya Diab,"[arab pop, lebanese pop]"
9,2nZKHO69mzHL9GtNWzAKpy,Habibi Wayno,Habibi Wayno,Ziad Bourji,286149,42,short_term,Ziad Bourji,"[arab pop, lebanese pop]"


<class 'pandas.core.frame.DataFrame'>
Int64Index: 150 entries, 0 to 149
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   id           150 non-null    object
 1   name_x       150 non-null    object
 2   album        150 non-null    object
 3   artist       150 non-null    object
 4   duration_ms  150 non-null    int64 
 5   popularity   150 non-null    int64 
 6   time_range   150 non-null    object
 7   name_y       148 non-null    object
 8   genres       148 non-null    object
dtypes: int64(2), object(7)
memory usage: 11.7+ KB


None

id             0
name_x         0
album          0
artist         0
duration_ms    0
popularity     0
time_range     0
name_y         2
genres         2
dtype: int64

Unnamed: 0,duration_ms,popularity
count,150.0,150.0
mean,239424.48,54.52
std,66858.571266,18.76657
min,62375.0,0.0
25%,198885.75,43.0
50%,232098.5,54.5
75%,276988.5,70.0
max,427093.0,92.0


Audio features Top Tracks Info:


Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.592,0.256,4,-14.031,0,0.0449,0.842,2e-06,0.088,0.103,76.358,audio_features,1HkOPLwAJH3kE8UnqgxF4s,spotify:track:1HkOPLwAJH3kE8UnqgxF4s,https://api.spotify.com/v1/tracks/1HkOPLwAJH3k...,https://api.spotify.com/v1/audio-analysis/1HkO...,238533,4
1,0.744,0.524,1,-7.452,1,0.21,0.139,0.0,0.392,0.606,93.691,audio_features,5DBmXF7QO43Cuy9yqva116,spotify:track:5DBmXF7QO43Cuy9yqva116,https://api.spotify.com/v1/tracks/5DBmXF7QO43C...,https://api.spotify.com/v1/audio-analysis/5DBm...,278893,4
2,0.739,0.522,7,-12.304,0,0.151,0.821,0.00124,0.233,0.619,138.263,audio_features,4vHNeBWDQpVCmGbaccrRzi,spotify:track:4vHNeBWDQpVCmGbaccrRzi,https://api.spotify.com/v1/tracks/4vHNeBWDQpVC...,https://api.spotify.com/v1/audio-analysis/4vHN...,146390,4
3,0.643,0.573,4,-7.034,1,0.0709,0.41,0.00017,0.0695,0.47,165.192,audio_features,694vvR5o19xHPhhJ5QdLN7,spotify:track:694vvR5o19xHPhhJ5QdLN7,https://api.spotify.com/v1/tracks/694vvR5o19xH...,https://api.spotify.com/v1/audio-analysis/694v...,259747,4
4,0.621,0.965,7,-3.887,0,0.0431,0.00248,0.000169,0.119,0.809,144.971,audio_features,093CMFUwvPyFIsjBsVfBPO,spotify:track:093CMFUwvPyFIsjBsVfBPO,https://api.spotify.com/v1/tracks/093CMFUwvPyF...,https://api.spotify.com/v1/audio-analysis/093C...,315640,4
5,0.714,0.338,4,-10.994,0,0.0456,0.93,0.00644,0.389,0.667,128.385,audio_features,4vju55Ag7apDL2CfotuE7Q,spotify:track:4vju55Ag7apDL2CfotuE7Q,https://api.spotify.com/v1/tracks/4vju55Ag7apD...,https://api.spotify.com/v1/audio-analysis/4vju...,165067,4
6,0.612,0.736,7,-8.63,0,0.0696,0.402,0.000575,0.268,0.688,167.972,audio_features,1T8BJvWzqm59RIuwQaTob8,spotify:track:1T8BJvWzqm59RIuwQaTob8,https://api.spotify.com/v1/tracks/1T8BJvWzqm59...,https://api.spotify.com/v1/audio-analysis/1T8B...,280674,4
7,0.607,0.623,4,-8.804,0,0.0463,0.224,0.000156,0.0482,0.28,88.093,audio_features,2OG1u9eX4Zjsi7fGQB9T8t,spotify:track:2OG1u9eX4Zjsi7fGQB9T8t,https://api.spotify.com/v1/tracks/2OG1u9eX4Zjs...,https://api.spotify.com/v1/audio-analysis/2OG1...,313940,4
8,0.689,0.605,7,-7.917,0,0.0776,0.127,0.0,0.0827,0.146,87.017,audio_features,2UOopL3Y405ruJyMzJcdWD,spotify:track:2UOopL3Y405ruJyMzJcdWD,https://api.spotify.com/v1/tracks/2UOopL3Y405r...,https://api.spotify.com/v1/audio-analysis/2UOo...,187562,4
9,0.635,0.889,10,-5.512,1,0.0859,0.205,4e-06,0.0437,0.682,96.946,audio_features,2nZKHO69mzHL9GtNWzAKpy,spotify:track:2nZKHO69mzHL9GtNWzAKpy,https://api.spotify.com/v1/tracks/2nZKHO69mzHL...,https://api.spotify.com/v1/audio-analysis/2nZK...,286150,4


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 18 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   danceability      150 non-null    float64
 1   energy            150 non-null    float64
 2   key               150 non-null    int64  
 3   loudness          150 non-null    float64
 4   mode              150 non-null    int64  
 5   speechiness       150 non-null    float64
 6   acousticness      150 non-null    float64
 7   instrumentalness  150 non-null    float64
 8   liveness          150 non-null    float64
 9   valence           150 non-null    float64
 10  tempo             150 non-null    float64
 11  type              150 non-null    object 
 12  id                150 non-null    object 
 13  uri               150 non-null    object 
 14  track_href        150 non-null    object 
 15  analysis_url      150 non-null    object 
 16  duration_ms       150 non-null    int64  
 1

None

danceability        0
energy              0
key                 0
loudness            0
mode                0
speechiness         0
acousticness        0
instrumentalness    0
liveness            0
valence             0
tempo               0
type                0
id                  0
uri                 0
track_href          0
analysis_url        0
duration_ms         0
time_signature      0
dtype: int64

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
count,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0
mean,0.6048,0.536238,5.013333,-8.520553,0.48,0.066732,0.419996,0.027663,0.174531,0.524309,122.536713,239433.52,3.926667
std,0.142184,0.186905,3.525529,2.912444,0.501274,0.052623,0.289847,0.113705,0.124021,0.239391,34.067496,66851.896543,0.261556
min,0.278,0.0316,0.0,-22.895,0.0,0.0243,0.00239,0.0,0.0335,0.0553,48.718,62375.0,3.0
25%,0.52625,0.41975,2.0,-9.93925,0.0,0.0332,0.15475,0.0,0.0926,0.32825,92.80175,198886.5,4.0
50%,0.621,0.561,5.0,-7.9765,0.0,0.04325,0.389,4.1e-05,0.132,0.545,120.652,232099.0,4.0
75%,0.7105,0.6575,8.0,-6.634,1.0,0.084325,0.653,0.002375,0.20975,0.6925,148.07275,276988.5,4.0
max,0.913,0.965,11.0,-3.715,1.0,0.382,0.979,0.944,0.641,0.97,201.96,427093.0,4.0


Merged Top Tracks and Audio Features


Unnamed: 0,id,name_x,album,artist,duration_ms_x,popularity,time_range,genres,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,uri,track_href,analysis_url,duration_ms_y,time_signature
0,1HkOPLwAJH3kE8UnqgxF4s,Skinny Love,"For Emma, Forever Ago",Bon Iver,238532,69,short_term,"['chamber pop', 'eau claire indie', 'indie fol...",0.592,0.256,4,-14.031,0,0.0449,0.842,2e-06,0.088,0.103,76.358,audio_features,spotify:track:1HkOPLwAJH3kE8UnqgxF4s,https://api.spotify.com/v1/tracks/1HkOPLwAJH3k...,https://api.spotify.com/v1/audio-analysis/1HkO...,238533,4
1,5DBmXF7QO43Cuy9yqva116,Family Business,The College Dropout,Kanye West,278893,67,short_term,"['chicago rap', 'hip hop', 'rap']",0.744,0.524,1,-7.452,1,0.21,0.139,0.0,0.392,0.606,93.691,audio_features,spotify:track:5DBmXF7QO43Cuy9yqva116,https://api.spotify.com/v1/tracks/5DBmXF7QO43C...,https://api.spotify.com/v1/audio-analysis/5DBm...,278893,4
2,4vHNeBWDQpVCmGbaccrRzi,Goodie Bag,Goodie Bag,Still Woozy,146390,78,short_term,"['bedroom pop', 'oakland indie', 'pov: indie']",0.739,0.522,7,-12.304,0,0.151,0.821,0.00124,0.233,0.619,138.263,audio_features,spotify:track:4vHNeBWDQpVCmGbaccrRzi,https://api.spotify.com/v1/tracks/4vHNeBWDQpVC...,https://api.spotify.com/v1/audio-analysis/4vHN...,146390,4
3,694vvR5o19xHPhhJ5QdLN7,NO HALO,GINGER,BROCKHAMPTON,259746,59,short_term,"['boy band', 'rap']",0.643,0.573,4,-7.034,1,0.0709,0.41,0.00017,0.0695,0.47,165.192,audio_features,spotify:track:694vvR5o19xHPhhJ5QdLN7,https://api.spotify.com/v1/tracks/694vvR5o19xH...,https://api.spotify.com/v1/audio-analysis/694v...,259747,4
4,093CMFUwvPyFIsjBsVfBPO,El Hob El Kebir,El Hob El Kebir,Ragheb Alama,315640,50,short_term,"['classic arab pop', 'lebanese pop']",0.621,0.965,7,-3.887,0,0.0431,0.00248,0.000169,0.119,0.809,144.971,audio_features,spotify:track:093CMFUwvPyFIsjBsVfBPO,https://api.spotify.com/v1/tracks/093CMFUwvPyF...,https://api.spotify.com/v1/audio-analysis/093C...,315640,4
5,4vju55Ag7apDL2CfotuE7Q,Sunny,Sunny,Bobby Hebb,165066,71,short_term,['northern soul'],0.714,0.338,4,-10.994,0,0.0456,0.93,0.00644,0.389,0.667,128.385,audio_features,spotify:track:4vju55Ag7apDL2CfotuE7Q,https://api.spotify.com/v1/tracks/4vju55Ag7apD...,https://api.spotify.com/v1/audio-analysis/4vju...,165067,4
6,1T8BJvWzqm59RIuwQaTob8,Aghla El Habayeb,Yama Alo,Nawal Al Zoghbi,280673,44,short_term,"['arab pop', 'dabke', 'lebanese pop']",0.612,0.736,7,-8.63,0,0.0696,0.402,0.000575,0.268,0.688,167.972,audio_features,spotify:track:1T8BJvWzqm59RIuwQaTob8,https://api.spotify.com/v1/tracks/1T8BJvWzqm59...,https://api.spotify.com/v1/audio-analysis/1T8B...,280674,4
7,2OG1u9eX4Zjsi7fGQB9T8t,اخدني معك,حبك علمني,فضل شاكر,313939,46,short_term,[],0.607,0.623,4,-8.804,0,0.0463,0.224,0.000156,0.0482,0.28,88.093,audio_features,spotify:track:2OG1u9eX4Zjsi7fGQB9T8t,https://api.spotify.com/v1/tracks/2OG1u9eX4Zjs...,https://api.spotify.com/v1/audio-analysis/2OG1...,313940,4
8,2UOopL3Y405ruJyMzJcdWD,Khserna Baad,Khserna Baad,Maya Diab,187561,44,short_term,"['arab pop', 'lebanese pop']",0.689,0.605,7,-7.917,0,0.0776,0.127,0.0,0.0827,0.146,87.017,audio_features,spotify:track:2UOopL3Y405ruJyMzJcdWD,https://api.spotify.com/v1/tracks/2UOopL3Y405r...,https://api.spotify.com/v1/audio-analysis/2UOo...,187562,4
9,2nZKHO69mzHL9GtNWzAKpy,Habibi Wayno,Habibi Wayno,Ziad Bourji,286149,42,short_term,"['arab pop', 'lebanese pop']",0.635,0.889,10,-5.512,1,0.0859,0.205,4e-06,0.0437,0.682,96.946,audio_features,spotify:track:2nZKHO69mzHL9GtNWzAKpy,https://api.spotify.com/v1/tracks/2nZKHO69mzHL...,https://api.spotify.com/v1/audio-analysis/2nZK...,286150,4


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 25 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   id                150 non-null    object 
 1   name_x            150 non-null    object 
 2   album             150 non-null    object 
 3   artist            150 non-null    object 
 4   duration_ms_x     150 non-null    int64  
 5   popularity        150 non-null    int64  
 6   time_range        150 non-null    object 
 7   genres            150 non-null    object 
 8   danceability      150 non-null    float64
 9   energy            150 non-null    float64
 10  key               150 non-null    int64  
 11  loudness          150 non-null    float64
 12  mode              150 non-null    int64  
 13  speechiness       150 non-null    float64
 14  acousticness      150 non-null    float64
 15  instrumentalness  150 non-null    float64
 16  liveness          150 non-null    float64
 1

None

id                  0
name_x              0
album               0
artist              0
duration_ms_x       0
popularity          0
time_range          0
genres              0
danceability        0
energy              0
key                 0
loudness            0
mode                0
speechiness         0
acousticness        0
instrumentalness    0
liveness            0
valence             0
tempo               0
type                0
uri                 0
track_href          0
analysis_url        0
duration_ms_y       0
time_signature      0
dtype: int64

Unnamed: 0,duration_ms_x,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms_y,time_signature
count,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0,150.0
mean,239424.48,54.52,0.6048,0.536238,5.013333,-8.520553,0.48,0.066732,0.419996,0.027663,0.174531,0.524309,122.536713,239433.52,3.926667
std,66858.571266,18.76657,0.142184,0.186905,3.525529,2.912444,0.501274,0.052623,0.289847,0.113705,0.124021,0.239391,34.067496,66851.896543,0.261556
min,62375.0,0.0,0.278,0.0316,0.0,-22.895,0.0,0.0243,0.00239,0.0,0.0335,0.0553,48.718,62375.0,3.0
25%,198885.75,43.0,0.52625,0.41975,2.0,-9.93925,0.0,0.0332,0.15475,0.0,0.0926,0.32825,92.80175,198886.5,4.0
50%,232098.5,54.5,0.621,0.561,5.0,-7.9765,0.0,0.04325,0.389,4.1e-05,0.132,0.545,120.652,232099.0,4.0
75%,276988.5,70.0,0.7105,0.6575,8.0,-6.634,1.0,0.084325,0.653,0.002375,0.20975,0.6925,148.07275,276988.5,4.0
max,427093.0,92.0,0.913,0.965,11.0,-3.715,1.0,0.382,0.979,0.944,0.641,0.97,201.96,427093.0,4.0


In [142]:
#Cleaning top artists detailed dataframe

#first we take a look at top_artist and of artist_detailed info
#top artists
print('General Top Artist Info:')
display(df_top_artists.head(10))
display(df_top_artists.info())
display(df_top_artists.isnull().sum())
display(df_top_artists.describe())
#artist details
print('Artist Detail General Info:')
display(df_artist_details.head(10))
display(df_artist_details.info())
display(df_artist_details.isnull().sum())
display(df_artist_details.describe())
#merged dataframe for artists
print('Merged Artist and Artist Detail Dataframe')
#change dictionary datatype to string so as not to encounter errors when dropping row duplicates
df_top_artists_detailed['followers'] = df_top_artists_detailed['followers'].astype(str)
# Drop duplicate rows
df_top_artists_detailed.drop_duplicates(inplace=True)
#drop duplicate columns
if 'followers_count' in df_top_artists_detailed:
    df_top_artists_detailed.drop('followers_count', axis=1, inplace=True)
if 'popularity_y' in df_top_artists_detailed:  
    df_top_artists_detailed.drop('popularity_y', axis=1, inplace=True)
#reset index
df_top_artists_detailed = df_top_artists_detailed.reset_index(drop=True)
display(df_top_artists_detailed.head(10))
display(df_top_artists_detailed.info())
display(df_top_artists_detailed.isnull().sum())
display(df_top_artists_detailed.describe())

General Top Artist Info:


Unnamed: 0,id,name,genres,followers_count,popularity,time_range
0,04N4sGkSTSxjVfbiItLvTj,Ziad Bourji,"arab pop, lebanese pop",0,43,short_term
1,5K4W6rqBFWDnAN6FQUkS6x,Kanye West,"chicago rap, hip hop, rap",0,88,short_term
2,2h93pZq0e7k5yf4dywlkpM,Frank Ocean,"lgbtq+ hip hop, neo soul",0,81,short_term
3,09A6IffSw0t8L8sfuOCVws,Wael Kfoury,"arab pop, belly dance, lebanese pop",0,52,short_term
4,5DPb3SKW8QZFwkRlmt7Gvo,Joseph Attieh,"arab pop, dabke, lebanese pop",0,40,short_term
5,3AA28KZvwAUcZuOKwyblJQ,Gorillaz,"alternative hip hop, modern rock, rock",0,77,short_term
6,4LLpKhyESsyAXpc4laK94U,Mac Miller,"hip hop, pittsburgh rap, pop rap, rap",0,81,short_term
7,6hnWRPzGGKiapVX1UCdEAC,Sam Cooke,"classic soul, soul, vocal jazz",0,66,short_term
8,4b5UHpUmrPycvsgu2M3ujz,Maya Diab,"arab pop, lebanese pop",0,41,short_term
9,1Bl6wpkWCQ4KVgnASpvzzA,BROCKHAMPTON,"boy band, rap",0,64,short_term


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   id               146 non-null    object
 1   name             146 non-null    object
 2   genres           146 non-null    object
 3   followers_count  146 non-null    int64 
 4   popularity       146 non-null    int64 
 5   time_range       146 non-null    object
dtypes: int64(2), object(4)
memory usage: 7.0+ KB


None

id                 0
name               0
genres             0
followers_count    0
popularity         0
time_range         0
dtype: int64

Unnamed: 0,followers_count,popularity
count,146.0,146.0
mean,0.0,60.356164
std,0.0,19.38707
min,0.0,20.0
25%,0.0,43.0
50%,0.0,61.0
75%,0.0,77.0
max,0.0,100.0


Artist Detail General Info:


Unnamed: 0,external_urls,followers,genres,href,id,images,name,popularity,type,uri
0,{'spotify': 'https://open.spotify.com/artist/0...,"{'href': None, 'total': 274200}","[arab pop, lebanese pop]",https://api.spotify.com/v1/artists/04N4sGkSTSx...,04N4sGkSTSxjVfbiItLvTj,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Ziad Bourji,43,artist,spotify:artist:04N4sGkSTSxjVfbiItLvTj
1,{'spotify': 'https://open.spotify.com/artist/5...,"{'href': None, 'total': 20790700}","[chicago rap, hip hop, rap]",https://api.spotify.com/v1/artists/5K4W6rqBFWD...,5K4W6rqBFWDnAN6FQUkS6x,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Kanye West,88,artist,spotify:artist:5K4W6rqBFWDnAN6FQUkS6x
2,{'spotify': 'https://open.spotify.com/artist/2...,"{'href': None, 'total': 11541738}","[lgbtq+ hip hop, neo soul]",https://api.spotify.com/v1/artists/2h93pZq0e7k...,2h93pZq0e7k5yf4dywlkpM,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Frank Ocean,81,artist,spotify:artist:2h93pZq0e7k5yf4dywlkpM
3,{'spotify': 'https://open.spotify.com/artist/0...,"{'href': None, 'total': 1518868}","[arab pop, belly dance, lebanese pop]",https://api.spotify.com/v1/artists/09A6IffSw0t...,09A6IffSw0t8L8sfuOCVws,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Wael Kfoury,52,artist,spotify:artist:09A6IffSw0t8L8sfuOCVws
4,{'spotify': 'https://open.spotify.com/artist/5...,"{'href': None, 'total': 301593}","[arab pop, dabke, lebanese pop]",https://api.spotify.com/v1/artists/5DPb3SKW8QZ...,5DPb3SKW8QZFwkRlmt7Gvo,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Joseph Attieh,40,artist,spotify:artist:5DPb3SKW8QZFwkRlmt7Gvo
5,{'spotify': 'https://open.spotify.com/artist/3...,"{'href': None, 'total': 10725672}","[alternative hip hop, modern rock, rock]",https://api.spotify.com/v1/artists/3AA28KZvwAU...,3AA28KZvwAUcZuOKwyblJQ,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Gorillaz,77,artist,spotify:artist:3AA28KZvwAUcZuOKwyblJQ
6,{'spotify': 'https://open.spotify.com/artist/4...,"{'href': None, 'total': 9319926}","[hip hop, pittsburgh rap, pop rap, rap]",https://api.spotify.com/v1/artists/4LLpKhyESsy...,4LLpKhyESsyAXpc4laK94U,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Mac Miller,81,artist,spotify:artist:4LLpKhyESsyAXpc4laK94U
7,{'spotify': 'https://open.spotify.com/artist/6...,"{'href': None, 'total': 1516294}","[classic soul, soul, vocal jazz]",https://api.spotify.com/v1/artists/6hnWRPzGGKi...,6hnWRPzGGKiapVX1UCdEAC,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Sam Cooke,66,artist,spotify:artist:6hnWRPzGGKiapVX1UCdEAC
8,{'spotify': 'https://open.spotify.com/artist/4...,"{'href': None, 'total': 161555}","[arab pop, lebanese pop]",https://api.spotify.com/v1/artists/4b5UHpUmrPy...,4b5UHpUmrPycvsgu2M3ujz,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",Maya Diab,41,artist,spotify:artist:4b5UHpUmrPycvsgu2M3ujz
9,{'spotify': 'https://open.spotify.com/artist/1...,"{'href': None, 'total': 2127842}","[boy band, rap]",https://api.spotify.com/v1/artists/1Bl6wpkWCQ4...,1Bl6wpkWCQ4KVgnASpvzzA,"[{'height': 640, 'url': 'https://i.scdn.co/ima...",BROCKHAMPTON,64,artist,spotify:artist:1Bl6wpkWCQ4KVgnASpvzzA


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   external_urls  146 non-null    object
 1   followers      146 non-null    object
 2   genres         146 non-null    object
 3   href           146 non-null    object
 4   id             146 non-null    object
 5   images         146 non-null    object
 6   name           146 non-null    object
 7   popularity     146 non-null    int64 
 8   type           146 non-null    object
 9   uri            146 non-null    object
dtypes: int64(1), object(9)
memory usage: 11.5+ KB


None

external_urls    0
followers        0
genres           0
href             0
id               0
images           0
name             0
popularity       0
type             0
uri              0
dtype: int64

Unnamed: 0,popularity
count,146.0
mean,60.356164
std,19.38707
min,20.0
25%,43.0
50%,61.0
75%,77.0
max,100.0


Merged Artist and Artist Detail Dataframe


Unnamed: 0,id,name,genres,popularity_x,time_range,followers
0,04N4sGkSTSxjVfbiItLvTj,Ziad Bourji,"arab pop, lebanese pop",43,short_term,"{'href': None, 'total': 274200}"
1,5K4W6rqBFWDnAN6FQUkS6x,Kanye West,"chicago rap, hip hop, rap",88,short_term,"{'href': None, 'total': 20790700}"
2,2h93pZq0e7k5yf4dywlkpM,Frank Ocean,"lgbtq+ hip hop, neo soul",81,short_term,"{'href': None, 'total': 11541738}"
3,09A6IffSw0t8L8sfuOCVws,Wael Kfoury,"arab pop, belly dance, lebanese pop",52,short_term,"{'href': None, 'total': 1518868}"
4,5DPb3SKW8QZFwkRlmt7Gvo,Joseph Attieh,"arab pop, dabke, lebanese pop",40,short_term,"{'href': None, 'total': 301593}"
5,3AA28KZvwAUcZuOKwyblJQ,Gorillaz,"alternative hip hop, modern rock, rock",77,short_term,"{'href': None, 'total': 10725672}"
6,4LLpKhyESsyAXpc4laK94U,Mac Miller,"hip hop, pittsburgh rap, pop rap, rap",81,short_term,"{'href': None, 'total': 9319926}"
7,6hnWRPzGGKiapVX1UCdEAC,Sam Cooke,"classic soul, soul, vocal jazz",66,short_term,"{'href': None, 'total': 1516294}"
8,4b5UHpUmrPycvsgu2M3ujz,Maya Diab,"arab pop, lebanese pop",41,short_term,"{'href': None, 'total': 161555}"
9,1Bl6wpkWCQ4KVgnASpvzzA,BROCKHAMPTON,"boy band, rap",64,short_term,"{'href': None, 'total': 2127842}"


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   id            146 non-null    object
 1   name          146 non-null    object
 2   genres        146 non-null    object
 3   popularity_x  146 non-null    int64 
 4   time_range    146 non-null    object
 5   followers     146 non-null    object
dtypes: int64(1), object(5)
memory usage: 7.0+ KB


None

id              0
name            0
genres          0
popularity_x    0
time_range      0
followers       0
dtype: int64

Unnamed: 0,popularity_x
count,146.0
mean,60.356164
std,19.38707
min,20.0
25%,43.0
50%,61.0
75%,77.0
max,100.0


Let us now save all our clean dataframes that we intend to use for data visualisation in our data visualisation notebook (see next file)

In [161]:
df_saved_tracks_with_audio_features.to_csv('saved_tracks.csv', index=False)
df_top_tracks_detailed.to_csv('top_tracks.csv', index=False)
df_top_artists_detailed.to_csv('top_artists.csv',index=False)

The below is me having fun by extracting the features of a recently released single, produced in my home studio, by my friends in the band Koteri :)

In [154]:
song_name = "Something I Need"
df_song = df_saved_tracks_genre.loc[df_saved_tracks_genre['name_x'] == song_name]
song_row = df_saved_tracks_genre.loc[df_saved_tracks_genre['name_x'] == song_name]
song_id = song_row['id'].values[0] # This gets the id of the song
audio_features = sp.audio_features(song_id)[0]  # The [0] is to unpack the result from a list
audio_features=pd.DataFrame([audio_features])
display(df_song.head())
display(audio_features.head())

Unnamed: 0,id,name_x,artist,album,release_date,duration_ms,added_at,genres
19,5j3so8OA4ddWtvObRxy0zm,Something I Need,Koteri,Something I Need,2023-03-21,216236,2023-03-21 16:11:21+00:00,[]


Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.585,0.563,9,-7.198,0,0.0414,0.706,0.00342,0.132,0.626,80.003,audio_features,5j3so8OA4ddWtvObRxy0zm,spotify:track:5j3so8OA4ddWtvObRxy0zm,https://api.spotify.com/v1/tracks/5j3so8OA4ddW...,https://api.spotify.com/v1/audio-analysis/5j3s...,216237,4


Spotipy library will typically handle refreshing the access token automatically in most instances. And since this is not a large scale applciation setting up a way to handle token refresh is not so important even though it's good practice. Considering the above points, the following cell below allows one to verify if the user's access token is still active or is expired/close to expiring, especially handy for debugging to ensure token expiry or authentication is not the issue.  

In [144]:
import time

def is_token_expired(token_info):
    now = int(time.time())
    return token_info['expires_at'] - now < 60  # less than a minute to expire

if is_token_expired(token_info):
    print("Token has expired or is about to expire.")
else:
    print("Token is still valid.")

Token is still valid.
