# Get Spotify data
This Jupyter notebook uses Python to call Spotify's Web API and retrieve data about playlists, playlist items, track audio features, and track audio analysis. The Web API exposes a wide range of information about a user's music library and preferences, which can feed various music analytics applications.

See Spotify's official documentation on [how to request an access token](https://developer.spotify.com/documentation/web-api/tutorials/getting-started#request-an-access-token).

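You'll need your own `CLIENT_ID`, `CLIENT_SECRET` and `USER_ID` credentials to proceed. The cells below read them from environment variables, which the `dotenv` extension loads from a local `.env` file.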

In [1]:
import os
import pandas as pd
import re
import requests
import time
import json
from utils import *
%load_ext dotenv
%dotenv

In [2]:
# Retrieve credentials from the environment variables
CLIENT_ID = os.environ['CLIENT_ID']
CLIENT_SECRET = os.environ['CLIENT_SECRET']
USER_ID = os.environ['USER_ID']

In [3]:
access_token = request_an_access_token(CLIENT_ID, CLIENT_SECRET)
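
`request_an_access_token` comes from the local `utils` module, which is not shown in this notebook. The following is a minimal sketch of what such a helper might look like, assuming it uses the Client Credentials flow described in the documentation linked above. The name `request_token_sketch` and its `post` parameter are hypothetical; `post` is only there so the function can be exercised with a stub instead of a real network call.

```python
TOKEN_URL = "https://accounts.spotify.com/api/token"


def request_token_sketch(client_id, client_secret, post=None):
    """Exchange app credentials for a bearer token (illustrative sketch)."""
    if post is None:
        # Imported lazily so a stub can be injected without requests installed.
        import requests
        post = requests.post

    # Form-encoded body, as in Spotify's getting-started tutorial.
    response = post(
        TOKEN_URL,
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        },
        timeout=10,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()["access_token"]
```

Tokens issued this way expire after about an hour, so a long first-time download may need to request a fresh one partway through.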

## Get playlist
Get the playlists owned by a Spotify user. [See Spotify documentation](https://developer.spotify.com/documentation/web-api/reference/get-playlist)

In [4]:
# Get playlist for the first time and save raw data to a file
# data = get_playlist(USER_ID, access_token)
# with open('./data/raw/playlists.json', 'w') as f:
#     json.dump(data, f)

# Or, load saved data from a file
with open('./data/raw/playlists.json', 'r') as f:
    data = json.load(f)

In [5]:
# Normalize the data and explode the 'items' column to create a row for each playlist
df_playlists = pd.json_normalize(data).explode('items')

# Normalize the 'items' column to extract the relevant playlist information
df_playlists = pd.json_normalize(df_playlists['items'])

# Select only the columns we need for our analysis
df_playlists = df_playlists[['id', 'name', 'tracks.total']]
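
To see what the three steps above do, here is a small self-contained sketch on a hand-made payload. The sample dictionary only imitates the shape of the saved response (a top-level `items` list of playlist objects); its IDs, names and URLs are invented.

```python
import pandas as pd

# Invented sample shaped like the saved response: one page with two playlists.
sample = {
    "href": "https://api.spotify.com/v1/users/example/playlists",
    "items": [
        {"id": "aaa", "name": "First", "tracks": {"href": "x", "total": 2}},
        {"id": "bbb", "name": "Second", "tracks": {"href": "y", "total": 1074}},
    ],
}

# One row per page, then one row per playlist (each 'items' cell holds a dict).
exploded = pd.json_normalize(sample).explode("items")

# Flatten the dicts; nested keys become dotted names such as 'tracks.total'.
# Passing a plain list gives the result a fresh 0..n-1 index.
flat = pd.json_normalize(exploded["items"].tolist())

flat = flat[["id", "name", "tracks.total"]]
print(flat)
```

Converting the exploded column to a list first is a choice made for this sketch: it avoids depending on how a given pandas version treats a `Series` passed to `json_normalize`.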

In [6]:
df_playlists.head()

Unnamed: 0,id,name,tracks.total
0,13Qsm0axSPpL11U5yGhwFS,Awkward,2
1,730Ce3gbzbasRtac5l8eXs,My Songs,1074
2,2gibcQ6TCJyyQgdJaDNWsT,Nicole&Shaun Wedding Playlist :),106
3,78TQufEn9zE564Is7DKk46,Karaoke,10
4,6mAJ1EqzlLuD5o97BB1VNP,1) 29.10 Pre-walk in,16


In [7]:
# Save tabulated playlists to a file
# df_playlists.to_csv('./data/playlists.csv', index = False)

## Get playlist items
Get full details of the items of a playlist owned by a Spotify user. [See Spotify documentation](https://developer.spotify.com/documentation/web-api/reference/get-playlists-tracks)

In [8]:
df_playlists = pd.read_csv('./data/playlists.csv')

# Calculate the number of 100-item requests needed to retrieve all tracks in each playlist
# (ceiling division, with at least one request per playlist)
df_playlists['offsets_needed'] = ((df_playlists['tracks.total'] + 99) // 100).clip(lower=1)
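
The number of requests per playlist is a ceiling division of the track count by the page size. The sketch below works this out in plain Python, assuming a page size of 100 as in the cell above; `page_offsets` is a hypothetical helper, and the sample totals are taken from the playlist table shown earlier.

```python
PAGE_SIZE = 100  # items requested per call, as assumed in the cell above


def page_offsets(total, page_size=PAGE_SIZE):
    """Return the offsets needed to cover `total` items (at least one page)."""
    pages = max(1, -(-total // page_size))  # ceiling division
    return [page * page_size for page in range(pages)]


print(page_offsets(2))           # [0]
print(page_offsets(106))         # [0, 100]
print(len(page_offsets(1074)))   # 11
print(page_offsets(200))         # [0, 100]
```

Note the last case: a plain `total // 100 + 1` would schedule a third, empty request for a playlist of exactly 200 tracks, whereas ceiling division stops at two.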

In [9]:
# Get playlist items for the first time and save raw data to a file
# frames = []
# for index, row in df_playlists.iterrows():
#     playlist_id = row['id']
#     for i in range(row['offsets_needed']):
#         offset = i * 100
#
#         data = get_playlist_tracks(playlist_id, access_token, offset)
#         frames.append(pd.json_normalize(data))
#
#         print(f'Playlist: {index}; Offset: {offset}')
#
# df_raw_tracks = pd.concat(frames)
# df_raw_tracks.to_json('./data/raw/tracks.json', orient = 'records')

# Or, load saved data from a file
df_raw_tracks = pd.read_json('./data/raw/tracks.json')

In [10]:
# Explode the 'items' column to create a row for each track
df_tracks = df_raw_tracks[['href', 'items']].explode('items')

# Normalize the 'items' column and append the resulting columns to df_tracks
df_tracks = pd.concat([df_tracks.reset_index(drop = True), pd.json_normalize(df_tracks['items']).reset_index(drop = True)], axis = 1)

# Extract the playlist ID from the request URL stored in 'href'
df_tracks['playlist_id'] = df_tracks.apply(lambda x: re.sub(r'https:\/\/.*\/([\w-]+)\/.*$', r'\1', x['href']), axis = 1)

# Select only the columns we need for our analysis
df_tracks = df_tracks[['href', 'track.id', 'track.name', 'playlist_id']]
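
The playlist ID can be recovered because each page's `href` is the URL of the playlist-items request itself. Here is a small sketch of that extraction on a single string, using only the standard library. The full URL is an assumption (the output below shows it truncated), and `playlist_id_from_href` is a hypothetical helper that anchors on the `/playlists/` path segment rather than matching the whole URL.

```python
import re

# Assumed full form of an 'href' value; the notebook output shows it truncated.
href = (
    "https://api.spotify.com/v1/playlists/"
    "13Qsm0axSPpL11U5yGhwFS/tracks?offset=0&limit=100"
)

PLAYLIST_ID = re.compile(r"/playlists/([\w-]+)/")


def playlist_id_from_href(href):
    """Return the playlist ID embedded in a playlist-items URL, or None."""
    match = PLAYLIST_ID.search(href)
    return match.group(1) if match else None


print(playlist_id_from_href(href))  # 13Qsm0axSPpL11U5yGhwFS
```

Returning `None` for a URL that does not match makes unexpected input visible, whereas a `re.sub` that finds no match hands back the original string unchanged.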

In [11]:
df_tracks.head()

Unnamed: 0,href,track.id,track.name,playlist_id
0,https://api.spotify.com/v1/playlists/13Qsm0axS...,4CPYZtb4tX2V03jcsJAZCD,Where's Kevin (From 'Overcooked! 2'),13Qsm0axSPpL11U5yGhwFS
1,https://api.spotify.com/v1/playlists/13Qsm0axS...,4cmRCH5q4Mp5DKqsGkQ2eu,"Super Mario Theme (From ""Super Mario"")",13Qsm0axSPpL11U5yGhwFS
2,https://api.spotify.com/v1/playlists/730Ce3gbz...,0VjIjW4GlUZAMYd2vXMi3b,Blinding Lights,730Ce3gbzbasRtac5l8eXs
3,https://api.spotify.com/v1/playlists/730Ce3gbz...,4jPy3l0RUwlUI9T5XHBW2m,Mood (feat. iann dior),730Ce3gbzbasRtac5l8eXs
4,https://api.spotify.com/v1/playlists/730Ce3gbz...,2tGvwE8GcFKwNdAXMnlbfl,happier,730Ce3gbzbasRtac5l8eXs


In [12]:
# Save tabulated playlist items to a file
# df_tracks.to_csv('./data/tracks.csv', index = False)

## Get track's audio features
Get audio feature information for a single track identified by its unique Spotify ID. [See Spotify documentation](https://developer.spotify.com/documentation/web-api/reference/get-audio-features)

In [13]:
df_tracks = pd.read_csv('./data/tracks.csv')

# Select only the unique track IDs and names
df_unq_tracks = df_tracks[['track.id', 'track.name']].drop_duplicates().reset_index(drop=True)

# Assign a batch number to each unique track to group them for API requests
df_unq_tracks['batch'] = (df_unq_tracks.index.values.astype(int) // 100) + 1
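
Batches of 100 also match the maximum number of IDs accepted by Spotify's "Get Several Tracks' Audio Features" endpoint, which takes a comma-separated `ids` parameter. The sketch below shows only the grouping step in plain Python. `chunk_ids` is a hypothetical helper, the IDs are placeholders, and whether `utils` uses the bulk endpoint is not shown in this notebook.

```python
def chunk_ids(track_ids, size=100):
    """Yield (batch_number, comma-joined IDs) with at most `size` IDs each."""
    for start in range(0, len(track_ids), size):
        batch_number = start // size + 1  # same numbering as the 'batch' column
        yield batch_number, ",".join(track_ids[start:start + size])


# 1,177 placeholder IDs, matching the number of unique tracks shown below.
ids = [f"id{n}" for n in range(1177)]
batches = list(chunk_ids(ids))

print(len(batches))                    # 12
print(len(batches[-1][1].split(",")))  # 77
```

With one request per batch instead of one per track, the same 1,177 tracks would need 12 calls.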

In [14]:
df_unq_tracks

Unnamed: 0,track.id,track.name,batch
0,4CPYZtb4tX2V03jcsJAZCD,Where's Kevin (From 'Overcooked! 2'),1
1,4cmRCH5q4Mp5DKqsGkQ2eu,"Super Mario Theme (From ""Super Mario"")",1
2,0VjIjW4GlUZAMYd2vXMi3b,Blinding Lights,1
3,4jPy3l0RUwlUI9T5XHBW2m,Mood (feat. iann dior),1
4,2tGvwE8GcFKwNdAXMnlbfl,happier,1
...,...,...,...
1173,37dYAkMa4lzRCH6kDbMT1L,We No Speak Americano (Edit),12
1174,3dxDj8pDPlIHCIrUPXuCeG,Sandstorm,12
1175,09cM9BjyNFizKUOXh6j9rT,Sparks Fly,12
1176,0tr6XR58KBdDYd8qvHVTs8,Back To December,12


In [15]:
# Get track audio features for the first time and save raw data to a file
# frames = []
# for i in df_unq_tracks['batch'].unique():
#     df_batch_i = df_unq_tracks[df_unq_tracks['batch'] == i]
#     data = df_batch_i.apply(lambda x: get_audio_features(x['track.id'], access_token), axis = 1)
#     frames.append(pd.json_normalize(data))
#
#     print(f'Batch: {i}')
#     time.sleep(30)  # pause between batches
#
# df_audio_features = pd.concat(frames)
# df_audio_features.to_csv('./data/raw/audio_features.csv', index = False)

# Or, load saved data from a file
df_audio_features = pd.read_csv('./data/raw/audio_features.csv')
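
The fixed `time.sleep(30)` between batches is a simple way to space out requests. When Spotify's rate limit is hit anyway, the API answers with status 429 and a `Retry-After` header giving the wait in seconds. Below is a hedged sketch of a request wrapper that honours that header. `get_with_retry` and its `get` and `sleep` parameters are hypothetical and not part of `utils`; they exist so the logic can be tested without network access.

```python
import time


def get_with_retry(url, access_token, get=None, sleep=time.sleep, max_attempts=5):
    """GET `url` with a bearer token, waiting out 429 responses (sketch)."""
    if get is None:
        # Imported lazily so a stub can be injected without requests installed.
        import requests
        get = requests.get

    headers = {"Authorization": f"Bearer {access_token}"}
    for _ in range(max_attempts):
        response = get(url, headers=headers, timeout=10)
        if response.status_code != 429:
            response.raise_for_status()  # surface any other HTTP error
            return response.json()
        # Rate limited: wait as long as the server asks (default 1 second).
        sleep(int(response.headers.get("Retry-After", 1)))
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts: {url}")
```

Waiting only when the server asks for it keeps a first-time download as fast as the rate limit allows, at the cost of slightly more code than a fixed pause.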

In [16]:
df_audio_features.head()

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.9,0.316,0,-14.0,0,0.171,0.137,0.757,0.0824,0.972,110.026,audio_features,4CPYZtb4tX2V03jcsJAZCD,spotify:track:4CPYZtb4tX2V03jcsJAZCD,https://api.spotify.com/v1/tracks/4CPYZtb4tX2V...,https://api.spotify.com/v1/audio-analysis/4CPY...,132620,4
1,0.769,0.324,0,-13.916,1,0.43,0.214,0.846,0.0505,0.964,100.123,audio_features,4cmRCH5q4Mp5DKqsGkQ2eu,spotify:track:4cmRCH5q4Mp5DKqsGkQ2eu,https://api.spotify.com/v1/tracks/4cmRCH5q4Mp5...,https://api.spotify.com/v1/audio-analysis/4cmR...,188843,4
2,0.514,0.73,1,-5.934,1,0.0598,0.00146,9.5e-05,0.0897,0.334,171.005,audio_features,0VjIjW4GlUZAMYd2vXMi3b,spotify:track:0VjIjW4GlUZAMYd2vXMi3b,https://api.spotify.com/v1/tracks/0VjIjW4GlUZA...,https://api.spotify.com/v1/audio-analysis/0VjI...,200040,4
3,0.701,0.716,7,-3.671,0,0.0361,0.174,0.0,0.324,0.732,91.007,audio_features,4jPy3l0RUwlUI9T5XHBW2m,spotify:track:4jPy3l0RUwlUI9T5XHBW2m,https://api.spotify.com/v1/tracks/4jPy3l0RUwlU...,https://api.spotify.com/v1/audio-analysis/4jPy...,140533,4
4,0.395,0.443,6,-9.72,1,0.133,0.765,1e-05,0.0839,0.338,168.924,audio_features,2tGvwE8GcFKwNdAXMnlbfl,spotify:track:2tGvwE8GcFKwNdAXMnlbfl,https://api.spotify.com/v1/tracks/2tGvwE8GcFKw...,https://api.spotify.com/v1/audio-analysis/2tGv...,175933,3


## Get track's audio analysis
Get a low-level audio analysis for a track in the Spotify catalog. The audio analysis describes the track’s structure and musical content, including rhythm, pitch, and timbre. [See Spotify documentation](https://developer.spotify.com/documentation/web-api/reference/get-audio-analysis)

In [17]:
# Get track audio analysis for the first time and save raw data to a file
# frames = []
# for i in df_unq_tracks['batch'].unique():
#     df_batch_i = df_unq_tracks[df_unq_tracks['batch'] == i]
#     data = df_batch_i.apply(lambda x: get_audio_analysis(x['track.id'], access_token), axis = 1)
#     frames.append(pd.json_normalize(data))
#
#     print(f'Batch: {i}')
#     time.sleep(30)  # pause between batches
#
# df_audio_analysis = pd.concat(frames)
# df_audio_analysis = pd.concat([df_unq_tracks.reset_index(drop = True), df_audio_analysis.reset_index(drop = True)], axis = 1)
# df_audio_analysis.to_csv('./data/raw/audio_analysis.csv', index = False)
    
# Or, load saved data from a file
df_audio_analysis = pd.read_csv('./data/raw/audio_analysis.csv')

In [18]:
df_audio_analysis.head()

Unnamed: 0,track.id,track.name,batch,bars,beats,sections,segments,tatums,meta.analyzer_version,meta.platform,...,track.mode,track.mode_confidence,track.codestring,track.code_version,track.echoprintstring,track.echoprint_version,track.synchstring,track.synch_version,track.rhythmstring,track.rhythm_version
0,4CPYZtb4tX2V03jcsJAZCD,Where's Kevin (From 'Overcooked! 2'),1,"[{'start': 0.68467, 'duration': 2.18383, 'conf...","[{'start': 0.13989, 'duration': 0.54478, 'conf...","[{'start': 0.0, 'duration': 18.13542, 'confide...","[{'start': 0.0, 'duration': 0.13338, 'confiden...","[{'start': 0.13989, 'duration': 0.27239, 'conf...",4.0.0,Linux,...,0,0.253,eJxVmVuC5CgMBK_iI_AUcP-LTYRwV9V87GzjwjZIqVQmXn...,3.15,eJzlnQuOJDmPpK_kekvH0fP-R9jPGD0V1QmEcgu5tY3FDj...,4.12,eJxNlwmSLDkIQ6_iI6TB6_0vNtLD1X-iK7oqnV5ACIEjvv...,1.0,eJyNWgm2HDEKu0odwbvN_S82RgLs6vqZmfeSdKcWLyCEwJ...,1.0
1,4cmRCH5q4Mp5DKqsGkQ2eu,"Super Mario Theme (From ""Super Mario"")",1,"[{'start': 1.36865, 'duration': 2.39713, 'conf...","[{'start': 0.76593, 'duration': 0.60272, 'conf...","[{'start': 0.0, 'duration': 16.95381, 'confide...","[{'start': 0.0, 'duration': 0.16259, 'confiden...","[{'start': 0.76593, 'duration': 0.30136, 'conf...",4.0.0,Linux,...,1,0.543,eJxdm9mB6zoMQ1txCdop9d_YHEBelPl4940Ux6YpEgSX5J...,3.15,eJzNnQvSa7mNpLd0-CaXw-f-lzBfQmXLfaNFhUdTUdMOo8...,4.12,eJxVWQmS3CAM_IqfYCQu__9joQ8xm8rWZMZg0NGSGhHxjv...,1.0,eJxtXA2S5LwKu0qOEP9h-_4XWyMJnJ7aqm_fzHR3EtuAEI...,1.0
2,0VjIjW4GlUZAMYd2vXMi3b,Blinding Lights,1,"[{'start': 1.26362, 'duration': 1.40759, 'conf...","[{'start': 0.20422, 'duration': 0.36619, 'conf...","[{'start': 0.0, 'duration': 11.98776, 'confide...","[{'start': 0.0, 'duration': 0.08857, 'confiden...","[{'start': 0.20422, 'duration': 0.1831, 'confi...",4.0.0,Linux,...,1,0.152,eJxdmgl2ZLkNBK9SR-C-3P9ijkhWSxr3s-eJLH4uIJBIAJ...,3.15,eJzNvQmWLLmNtLsl50wuh-P-l_A-Q5QqpNvPmaf_PHXUPU...,4.12,eJxVWAlyxDAI-4qfYMDn_z9WkOS0ne5mG8cHhxAQ9z57i2...,1.0,eJyVXAmS5Dquu4qPYO3S_S82BEBKclbPxP8Rr7u6MmUtFA...,1.0
3,4jPy3l0RUwlUI9T5XHBW2m,Mood (feat. iann dior),1,"[{'start': 0.24518, 'duration': 2.61968, 'conf...","[{'start': 0.24518, 'duration': 0.65366, 'conf...","[{'start': 0.0, 'duration': 8.14435, 'confiden...","[{'start': 0.0, 'duration': 0.22045, 'confiden...","[{'start': 0.24518, 'duration': 0.32683, 'conf...",4.0.0,Linux,...,0,0.672,eJxVmgmy5TYOBK-iI3Bf7n-xySy-7vY4wvYnJZHEViiAr-...,3.15,eJzdnQm2HLmSXLcUmIHlYNz_EnTNsz6zyNOJJ-o1RbX6n_...,4.12,eJxdWQmS5EgI-4qfYEjy-v_HFkng6tmYimqX8-IUIsf9ne...,1.0,eJxlWolxJLsOS6VDkKg7_8Q-cUiefb9qXV7PdOsgQRCkVG...,1.0
4,2tGvwE8GcFKwNdAXMnlbfl,happier,1,"[{'start': 1.90661, 'duration': 1.07988, 'conf...","[{'start': 1.17935, 'duration': 0.3577, 'confi...","[{'start': 0.0, 'duration': 4.06404, 'confiden...","[{'start': 0.0, 'duration': 0.93401, 'confiden...","[{'start': 1.17935, 'duration': 0.17885, 'conf...",4.0.0,Linux,...,1,0.648,eJxVmlmi5DYOBK-iI3Bf7n-xiUiWnz3-aDdZEkUCyEQC7F...,3.15,eJzNnWmObTluhLekeViOxv0vwV8oG37dBR_mjwfDBmxWV-...,4.12,eJydWAeS4zgM_AqfIASm_39s0Q1Qksc7W1dX4-LQIIicaN...,1.0,eJxNmwt2ZDkIQ7fyluD_Z_8ba3SFK31OZ5JUqvwwBiGEp9...,1.0
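
Because the flattened analysis was written to CSV, list-valued columns such as `bars`, `beats` and `segments` come back from `pd.read_csv` as strings holding a Python-style literal (note the single quotes in the output above), not as lists. Here is a minimal sketch of turning one such cell back into Python objects with the standard library. The sample string is a shortened stand-in, and its `confidence` value is invented for illustration.

```python
import ast
import json

# Shortened stand-in for one 'bars' cell; the confidence value is made up.
cell = "[{'start': 0.68467, 'duration': 2.18383, 'confidence': 0.5}]"

# json.loads rejects the single-quoted keys ...
try:
    json.loads(cell)
except json.JSONDecodeError:
    print("not valid JSON")

# ... but ast.literal_eval parses Python literals without executing any code.
bars = ast.literal_eval(cell)
print(bars[0]["start"], bars[0]["duration"])  # 0.68467 2.18383
```

Applied to a whole column with no missing values, this would look like `df_audio_analysis['bars'].apply(ast.literal_eval)`. Saving the raw data as JSON, as done for `tracks.json`, sidesteps the conversion.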
