This notebook collects the tracks in a playlist and retrieves Spotify's audio_analysis data for each of them. The analysis describes a track in two ways. The first is rhythmic: it lists the start and duration of bars, beats, and tatums. These are closely related concepts, but tatums are the finest subdivision and therefore yield the most observations. The second splits the track into musically similar parts, called sections and segments, with segments being the finer of the two. After the processing below, each segment has 30 features, including its duration, loudness, pitch, and timbre.

In [2]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd

client_credentials_manager = SpotifyClientCredentials(
        "26399552a8ce4d1285397254189cac50",
        "fdacbbba2dd34dbeb127dedb459f7ea3")
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

In [3]:
# Get the track URIs in the Germany Top 50 playlist
ger_pl = sp.playlist("spotify:playlist:37i9dQZEVXbJiZcmkrIHGU")
ger_t_list = [item["track"]["uri"] for item in ger_pl["tracks"]["items"]]
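
The cell above reads only the first page of playlist items, which is enough for a 50-track chart. For longer playlists, a minimal sketch of paginated collection could look like the following. The helper name `collect_track_uris` is hypothetical; it assumes a spotipy client that provides `playlist_items` and `next`, and it skips entries whose `track` field is empty.

```python
def collect_track_uris(client, playlist_id):
    """Return the URIs of all tracks in a playlist, following pagination.

    `client` is assumed to behave like a spotipy.Spotify instance.
    """
    uris = []
    page = client.playlist_items(playlist_id, additional_types=("track",))
    while page:
        for item in page["items"]:
            track = item.get("track")
            # Skip removed or unavailable entries, which carry no track data
            if track and track.get("uri"):
                uris.append(track["uri"])
        # Fetch the next page only if the response advertises one
        page = client.next(page) if page.get("next") else None
    return uris

# Hypothetical usage with the client created earlier in this notebook:
# ger_t_list = collect_track_uris(sp, "spotify:playlist:37i9dQZEVXbJiZcmkrIHGU")
```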

In [10]:
# Audio analysis of a sample track
audio = sp.audio_analysis(ger_t_list[0])
print(audio.keys())

dict_keys(['meta', 'track', 'bars', 'beats', 'tatums', 'sections', 'segments'])


In [12]:
# Beats are subdivisions of bars. Tatums are subdivisions of beats.
tatum_data = pd.DataFrame(audio["tatums"])
tatum_data = tatum_data.assign(end=tatum_data.start + tatum_data.duration)
# Sanity check: each tatum should end where the next one starts,
# so the largest gap between consecutive tatums should be close to zero.
gap = tatum_data.start - tatum_data.end.shift()
print("Max difference between the end of a tatum and the start of the next is: ",
      '{:.5}'.format(gap.max()))
# The tatum durations form a single univariate sequence,
# which could be used to train a simple model.

Max difference between the end of a tatum and the start of the next is:  1e-05
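
Since the tatum durations form one univariate sequence, one possible way to prepare them for a simple next-value model is to cut the sequence into fixed-length windows. The sketch below is only an illustration: `make_windows` is a hypothetical helper, the window length is arbitrary, and the durations are made-up values rather than API output.

```python
import numpy as np

def make_windows(series, window):
    """Split a 1-D series into (inputs, targets).

    Each input row holds `window` consecutive values and the matching
    target is the value that immediately follows that window.
    """
    values = np.asarray(series, dtype=float)
    if len(values) <= window:
        raise ValueError("series must be longer than the window")
    X = np.array([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]
    return X, y

# Synthetic tatum durations in seconds (made-up values, not API output)
durations = [0.25, 0.26, 0.24, 0.25, 0.27, 0.25]
X, y = make_windows(durations, window=3)
print(X.shape, y.shape)  # (3, 3) (3,)
```

With the notebook's data, `tatum_data.duration` could be passed in place of `durations`.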


In [17]:
# Audio segments subdivide a song into many short pieces, each with a
# roughly consistent sound throughout its duration.
# After expanding pitches and timbre, a segment has 30 features.
frames = []
for track in ger_t_list:
    audio = sp.audio_analysis(track)
    segments_data = pd.DataFrame(audio["segments"])
    # Expand the 12-element pitch and timbre lists into one column each
    pitch = pd.DataFrame(segments_data.pitches.tolist(),
                         columns=["p" + str(i) for i in range(1, 13)])
    timbre = pd.DataFrame(segments_data.timbre.tolist(),
                          columns=["t" + str(i) for i in range(1, 13)])
    segments_data = segments_data.drop(["pitches", "timbre", "loudness_end"],
                                       axis=1)
    segments_data = segments_data.join([pitch, timbre])
    segments_data["track_id"] = track
    frames.append(segments_data)
# DataFrame.append was removed in pandas 2.0; concatenate once instead
ger_data = pd.concat(frames)
print(ger_data.groupby("track_id").size())

track_id
spotify:track:07f2b3CTdgKKlhv0mqUksz     667
spotify:track:0C6bsQq58Ue1XfL5PKTO6D     649
spotify:track:0LPRq5I35z8FoqYo84xn48     933
spotify:track:0Vl4eICpXMjtiK0RhdaWov     805
spotify:track:0dv22i02nk3o8JwmK6BwjI     723
spotify:track:0nbXyq5TXYPCO7pr3N8S4I     751
spotify:track:0oFrTaO9UIgqu6MuzkFu7B     689
spotify:track:0sf12qNH5qcw8qpgymFOqD     772
spotify:track:0ui2kVwPZKHaZxGhdIzBrp     940
spotify:track:16wAOAZ2OkqoIDN7TpChjR     812
spotify:track:1B89LtaW92jj4AqT7OZ0Fj     875
spotify:track:1E1YyZjbteIz2XQyLvtRxD     759
spotify:track:1R4xkZXQUQ8QJtAdwHkSgC     651
spotify:track:1V7JaMp11LKGwKiVmSetf0     678
spotify:track:1hoLUVBx0ixX3kn0EX0P5n     692
spotify:track:1rgnBhdG2JDFTbYkYRZAku     754
spotify:track:24Yi9hE78yPEbZ4kxyoXAI     776
spotify:track:2GdDsXV5v47AZsuwtGjKKy     815
spotify:track:2PGA1AsJal6cyMNmKyE56q     760
spotify:track:2RaKlveGCllSaXloN8kmzV     805
spotify:track:2fzPZOozISzcAU7FSwkN7g     737
spotify:track:2rWnTpXD0jq5lymyg4xIKQ     642
...

In [16]:
print(ger_data.columns)

Index(['start', 'duration', 'confidence', 'loudness_start',
       'loudness_max_time', 'loudness_max', 'p1', 'p2', 'p3', 'p4', 'p5', 'p6',
       'p7', 'p8', 'p9', 'p10', 'p11', 'p12', 't1', 't2', 't3', 't4', 't5',
       't6', 't7', 't8', 't9', 't10', 't11', 't12', 'track_id'],
      dtype='object')


### Pitch
Pitch content is given by a “chroma” vector, corresponding to the 12 pitch classes C, C#, D to B, with values ranging from 0 to 1 that describe the relative dominance of every pitch in the chromatic scale. For example, a C major chord would likely be represented by large values of C, E and G (i.e. classes 0, 4, and 7). Vectors are normalized to 1 by their strongest dimension. Noisy sounds are therefore likely to be represented by values that are all close to 1, while pure tones are described by one value at 1 (the pitch) and others near 0. (Source: https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-analysis/)
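
To make the chroma vector more concrete, the sketch below maps its 12 positions to pitch class names and picks the strongest ones. The function `strongest_pitch_classes` and the example vector are illustrative assumptions, not real API output. Note that the columns `p1` to `p12` in this notebook correspond to classes 0 to 11, so `p1` is C, `p5` is E and `p8` is G.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def strongest_pitch_classes(chroma, top=3):
    """Return the names of the `top` strongest pitch classes, strongest first."""
    chroma = np.asarray(chroma, dtype=float)
    if chroma.shape != (12,):
        raise ValueError("a chroma vector must have exactly 12 values")
    # Sort indices by descending strength; a stable sort keeps ties in pitch order
    order = np.argsort(-chroma, kind="stable")[:top]
    return [PITCH_CLASSES[i] for i in order]

# Made-up chroma vector resembling a C major chord (not real API output)
c_major = [1.0, 0.05, 0.1, 0.04, 0.8, 0.12, 0.03, 0.9, 0.06, 0.1, 0.02, 0.07]
print(strongest_pitch_classes(c_major))  # ['C', 'G', 'E']
```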

### Timbre
Timbre is the quality of a musical note or sound that distinguishes different types of musical instruments or voices. It is a complex notion also referred to as sound color, texture, or tone quality, and is derived from the shape of a segment’s spectro-temporal surface, independently of pitch and loudness. The timbre feature is a vector that includes 12 unbounded values roughly centered around 0. Those values are high-level abstractions of the spectral surface, ordered by degree of importance.
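
Because the timbre values are unbounded, it may help to bring the timbre columns to a common scale before modelling. The sketch below is one possible approach, not part of the original analysis: `standardize_timbre` is a hypothetical helper that assumes the columns are named `t1`, `t2`, and so on as in this notebook, and the demo frame contains made-up numbers.

```python
import pandas as pd

def standardize_timbre(df, prefix="t"):
    """Return a copy of `df` with timbre columns scaled to zero mean, unit variance."""
    # Match t1..t12 but not other columns that start with "t", such as track_id
    cols = [c for c in df.columns
            if c.startswith(prefix) and c[len(prefix):].isdigit()]
    out = df.copy()
    # Guard against constant columns, whose standard deviation is zero
    std = out[cols].std(ddof=0).replace(0, 1.0)
    out[cols] = (out[cols] - out[cols].mean()) / std
    return out

# Made-up timbre values for three segments (not real API output)
demo = pd.DataFrame({
    "t1": [10.0, 20.0, 30.0],
    "t2": [-5.0, 0.0, 5.0],
    "track_id": ["a", "a", "b"],
})
scaled = standardize_timbre(demo)
print(scaled)
```

With the notebook's data, `ger_data` could be passed in place of `demo`.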