# Dissecting Spotify Valence - Getting the Data

Spotify offers several metrics for each song, such as danceability, instrumentalness, and acousticness. Among them is *valence*, a measure of how happy a song sounds, which comes from The Echo Nest, a company that Spotify acquired in 2014.

In this notebook, data is downloaded from Spotify using the **spotipy** library.

---

> Natalia Katsiapi <br />
> Department of Management Science and Technology <br />
> Athens University of Economics and Business <br />
> t8180040@aueb.gr

To connect to the Spotify API, a file named `spotify_config.py` with the following contents needs to be created:

  ```
  config = {
      'client_id' : 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
      'client_secret' :'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
  }
  ```
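
As an alternative to hard-coding the credentials, the same `config` dictionary could be built from environment variables. The sketch below is only an illustration: `load_config` is a hypothetical helper, not part of spotipy, and the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are assumed to be set beforehand.

```python
import os


def load_config(env=None):
    """Build the `config` dict from environment variables (hypothetical helper).

    Raises KeyError if a variable is missing, so a misconfigured
    environment fails early instead of at the first API call.
    """
    env = os.environ if env is None else env
    return {
        "client_id": env["SPOTIPY_CLIENT_ID"],
        "client_secret": env["SPOTIPY_CLIENT_SECRET"],
    }


# Example with an explicit mapping instead of the real environment.
config = load_config({
    "SPOTIPY_CLIENT_ID": "X" * 32,
    "SPOTIPY_CLIENT_SECRET": "Y" * 32,
})
```

With this variant, the rest of the notebook can keep using `config['client_id']` and `config['client_secret']` unchanged.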

In [1]:
import spotipy
import random
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials

from spotify_config import config

client_credentials_manager = SpotifyClientCredentials(config['client_id'],
                                                      config['client_secret'])
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)


The idea is to get songs that are as random as possible, since taking songs from specific playlists, which usually focus on certain genres, would bias our models. No seeds were set, because the generator relies on Spotify's search, which is itself biased toward the logged-in user.

The code was adapted from [here](https://perryjanssen.medium.com/getting-random-tracks-using-the-spotify-api-61889b0c0c27). A random market was added to the query later, once it became clear that the search results were skewed toward what people in the logged-in user's market listen to.

In [2]:
def get_random_query():
    # A list of all characters that can be chosen.
    characters = 'abcdefghijklmnopqrstuvwxyz'

    country_codes = ['AD', 'AR', 'AU', 'AT', 'BE', 'BO', 'BR', 'BG', 'CA', 'CL', 'CO', 'CR', 'CY', 'CZ', 'DK', 'DO', 'EC', 
        'SV', 'EE', 'FI', 'FR', 'DE', 'GR', 'GT', 'HN', 'HK', 'HU', 'IS', 'ID', 'IE', 'IT', 'JP', 'LV', 'LI', 'LT', 'LU', 'MY', 'MT', 'MX', 'MC', 'NL', 
        'NZ', 'NI', 'NO', 'PA', 'PY', 'PE', 'PH', 'PL', 'PT', 'SG', 'ES', 'SK', 'SE', 'CH', 'TW', 'TR', 'GB', 'US', 'UY']
    
    # Gets a random character from the characters string.
    random_character = random.choice(characters)
    random_search = ''

    # Gets a random market
    random_market = random.choice(country_codes)
    offset = random.randint(0, 500)


    # Places the wildcard after the character, or on both sides of it, randomly.
    rand = random.randint(0, 1)
    if rand == 0:
        random_search = random_character + '%'
    else:
        random_search = '%' + random_character + '%'
    return random_search, random_market, offset
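
For debugging, it can help to make the query generation itself repeatable, even though the search results are not. The following is a minimal sketch under that assumption: `make_query` is a hypothetical variant of `get_random_query` that takes a `random.Random` instance, and the three-market list is shortened for illustration only.

```python
import random
import string


def make_query(rng, markets):
    """Return (search pattern, market, offset) drawn from `rng`."""
    letter = rng.choice(string.ascii_lowercase)
    # Wildcard after the letter, or on both sides, with equal probability.
    pattern = letter + "%" if rng.random() < 0.5 else "%" + letter + "%"
    return pattern, rng.choice(markets), rng.randint(0, 500)


markets = ["GR", "US", "JP"]  # shortened list, for illustration only
first = make_query(random.Random(42), markets)
second = make_query(random.Random(42), markets)
print(first == second)  # True: same seed, same query
```

Passing the generator in as an argument keeps the global `random` state untouched, so the data collection loop stays as random as before.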


15000 was considered a reasonable number of tracks to analyze.

In [3]:
SONGS_LIMIT = 15000
songs = set()

The following loop queries the Spotify API. Interestingly, each iteration tends to add fewer new songs than the previous ones. In fact, without the random market the loop might never have finished. This suggests that the search results are biased.

In [4]:
while len(songs) < SONGS_LIMIT:
    random_search, random_market, offset = get_random_query()
    print(f"Searching for {random_search} {offset} in market {random_market}, current songs length {len(songs)}")
    
    result = sp.search(q=random_search, type="track", limit=50, offset=offset, market=random_market)
    items = result["tracks"]["items"]
    current_songs = {(song["name"], song["id"], song["artists"][0]["name"]) for song in items}
    songs = songs.union(current_songs)

Searching for b% 402 in market ES, current songs length 0
Searching for z% 337 in market HK, current songs length 50
Searching for %g% 25 in market IE, current songs length 100
Searching for h% 51 in market BG, current songs length 149
Searching for j% 470 in market EC, current songs length 198
Searching for %b% 467 in market AR, current songs length 248
Searching for %j% 422 in market SK, current songs length 298
Searching for n% 158 in market ES, current songs length 348
Searching for e% 158 in market CO, current songs length 393
Searching for %n% 87 in market GT, current songs length 438
Searching for x% 60 in market GB, current songs length 482
Searching for w% 425 in market DK, current songs length 531
Searching for y% 53 in market LI, current songs length 579
Searching for l% 99 in market EC, current songs length 625
Searching for %n% 208 in market EE, current songs length 672
Searching for %x% 479 in market CO, current songs length 719
Searching for w% 112 in market IS, current 

Searching for k% 131 in market NO, current songs length 4813
Searching for e% 321 in market PT, current songs length 4847
Searching for %j% 386 in market EE, current songs length 4870
Searching for u% 219 in market AT, current songs length 4891
Searching for x% 424 in market AU, current songs length 4920
Searching for k% 56 in market BG, current songs length 4959
Searching for d% 151 in market LU, current songs length 4983
Searching for %i% 211 in market AT, current songs length 5006
Searching for u% 148 in market LT, current songs length 5036
Searching for %l% 494 in market PY, current songs length 5057
Searching for %d% 5 in market IS, current songs length 5079
Searching for s% 289 in market NL, current songs length 5107
Searching for %z% 328 in market TW, current songs length 5127
Searching for %g% 44 in market ID, current songs length 5161
Searching for h% 447 in market SK, current songs length 5179
Searching for %v% 210 in market DO, current songs length 5212
Searching for %q% 477

Searching for t% 113 in market LU, current songs length 7837
Searching for %l% 326 in market HK, current songs length 7843
Searching for %v% 388 in market BG, current songs length 7851
Searching for t% 330 in market IE, current songs length 7882
Searching for %s% 3 in market FR, current songs length 7898
Searching for y% 32 in market ID, current songs length 7918
Searching for j% 279 in market LI, current songs length 7937
Searching for %b% 497 in market LU, current songs length 7942
Searching for %s% 25 in market PA, current songs length 7964
Searching for %n% 245 in market MT, current songs length 7980
Searching for k% 264 in market IE, current songs length 7997
Searching for %v% 300 in market SE, current songs length 8020
Searching for h% 305 in market TR, current songs length 8043
Searching for %n% 63 in market LT, current songs length 8079
Searching for %q% 246 in market GT, current songs length 8085
Searching for %j% 418 in market GB, current songs length 8122
Searching for r% 10

Searching for %r% 225 in market DK, current songs length 10062
Searching for u% 326 in market IS, current songs length 10066
Searching for l% 395 in market MT, current songs length 10078
Searching for j% 398 in market PA, current songs length 10084
Searching for %z% 198 in market NL, current songs length 10084
Searching for %p% 153 in market ES, current songs length 10120
Searching for %k% 270 in market DO, current songs length 10140
Searching for %t% 325 in market ES, current songs length 10158
Searching for n% 183 in market EE, current songs length 10168
Searching for %n% 217 in market HU, current songs length 10177
Searching for a% 305 in market HK, current songs length 10192
Searching for %y% 228 in market NO, current songs length 10198
Searching for %t% 306 in market BO, current songs length 10209
Searching for %s% 216 in market MX, current songs length 10216
Searching for e% 310 in market GB, current songs length 10234
Searching for %g% 475 in market CH, current songs length 1025

Searching for k% 303 in market FR, current songs length 11979
Searching for %f% 415 in market NO, current songs length 11995
Searching for u% 197 in market LT, current songs length 12009
Searching for %o% 290 in market AU, current songs length 12021
Searching for a% 106 in market BE, current songs length 12031
Searching for j% 305 in market JP, current songs length 12036
Searching for %v% 310 in market PH, current songs length 12055
Searching for %p% 303 in market PY, current songs length 12065
Searching for o% 440 in market LV, current songs length 12078
Searching for %b% 299 in market ES, current songs length 12098
Searching for r% 110 in market TR, current songs length 12108
Searching for %d% 196 in market MT, current songs length 12113
Searching for %t% 490 in market EE, current songs length 12114
Searching for u% 57 in market TW, current songs length 12127
Searching for %c% 153 in market NL, current songs length 12131
Searching for n% 498 in market BG, current songs length 12132
S

Searching for %f% 18 in market HU, current songs length 13628
Searching for n% 71 in market ID, current songs length 13640
Searching for c% 320 in market DO, current songs length 13654
Searching for o% 161 in market LU, current songs length 13656
Searching for %i% 295 in market CR, current songs length 13660
Searching for k% 272 in market FI, current songs length 13677
Searching for %b% 201 in market CY, current songs length 13714
Searching for k% 63 in market US, current songs length 13715
Searching for v% 10 in market BE, current songs length 13719
Searching for q% 217 in market SV, current songs length 13733
Searching for k% 212 in market HK, current songs length 13746
Searching for x% 34 in market SE, current songs length 13748
Searching for d% 475 in market CA, current songs length 13764
Searching for r% 461 in market PT, current songs length 13783
Searching for h% 292 in market NZ, current songs length 13792
Searching for %p% 233 in market PL, current songs length 13815
Searching
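
The shrinking yield visible in the log above can be measured directly by counting how many songs each query actually adds. This is a minimal offline sketch: `add_batch` is a hypothetical helper, and the three small batches are made-up stand-ins for successive search results.

```python
def add_batch(songs, batch):
    """Add a batch of (name, id, artist) tuples to `songs` in place.

    Returns how many of them were not in the set before.
    """
    before = len(songs)
    songs.update(batch)
    return len(songs) - before


# Made-up batches standing in for successive search results.
songs = set()
batches = [
    [("A", "id1", "x"), ("B", "id2", "y")],
    [("B", "id2", "y"), ("C", "id3", "z")],
    [("A", "id1", "x"), ("C", "id3", "z")],
]
new_per_query = [add_batch(songs, batch) for batch in batches]
print(new_per_query)  # [2, 1, 0]
```

Logging this count next to the running total would show at a glance when queries stop contributing new tracks.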

The fetched songs are shown below. We see famous artists such as Bach, Drake and Bruno Mars, which suggests that the search favors well-known ones.

In [5]:
songs

{('On Hold', '5w3CRkbTWXfbYepIdFpGUN', 'The xx'),
 ('Tranquilito', '2LJRoFOtMGLnSxJqlJuhge', 'Gerardo Ortiz'),
 ('Summertime Sadness (Lana Del Rey Vs. Cedric Gervais) - Cedric Gervais Remix',
  '1XZMileUddtQ2XwTMfUL3n',
  'Lana Del Rey'),
 ('Flower Shops (feat. Morgan Wallen)', '2ccuOtUjIyx3tPcsnpeBzJ', 'ERNEST'),
 ('Sad Song (feat. Elena Coats)', '05CrK6Q5VGtfPDtyQFJ4Kf', 'We The Kings'),
 ('Paramedic!', '2tPcTFiQF9MbVUyjZ3zDhA', 'SOB X RBE'),
 ('Wellerman - Sea Shanty / 220 KID x Billen Ted Remix',
  '4Ao22PDzNV4NeQ8skauZCZ',
  'Nathan Evans'),
 ('BAGDAD - Cap.7: Liturgia', '1zZ7vl1amOLI4GE5oUn0YB', 'ROSALÍA'),
 ('Today (feat. Gunna)', '7z3rc7SCgjbAztt1cbXkVj', 'Cordae'),
 ('En Cero', '0QcexelVPB2NLdlcYTTmt6', 'Yandel'),
 ('Sin Ti Estoy Bien', '4RrOSjdnV8rkpIuOIfkKYS', 'Nanpa Básico'),
 ('Major (feat. Key Glock)', '3Yt2ph8Ko0JBANpdawzSF2', 'Young Dolph'),
 ('Heat Waves', '4eVodI68OEjzC7MK8uyQPX', 'Glass Animals'),
 ('Hotline Bling', '0wwPcA6wtMf6HUMpIRdeP7', 'Drake'),
 ('Heroes (we c

In [6]:
len(songs)

15009

Then we put the songs into a DataFrame. A set was used up to this point because it discards duplicate songs efficiently.

In [7]:
df = pd.DataFrame()
df["id"] = [song[1] for song in songs]
df["title"] = [song[0] for song in songs]
df["main_artist"] = [song[2] for song in songs]

Afterwards, we fetch the audio features for the songs. Luckily, it is possible to fetch them in batches of 100 tracks, so it runs in a reasonable time period.

In [8]:
features = {}
all_track_ids = list(df['id'])

In [9]:
def get_features(start, num_tracks):
    while start < len(all_track_ids):
        print(f'getting from {start} to {start+num_tracks}')
        tracks_batch = all_track_ids[start:start+num_tracks]
        features_batch = sp.audio_features(tracks_batch)
        features.update({ track_id : track_features 
                        for track_id, track_features in zip(tracks_batch, features_batch) })
        start += num_tracks
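
The batching logic can also be written without global state, which makes it easy to try out offline. The sketch below is an illustration rather than the notebook's method: `collect_features` and `fake_fetch` are hypothetical names, `fetch` stands in for `sp.audio_features`, and the `None` check assumes that the call may return `None` for a track without features.

```python
def collect_features(track_ids, fetch, batch_size=100):
    """Fetch audio features in batches and return {track_id: features}.

    `fetch` is any callable that takes a list of ids and returns a list of
    the same length; in the notebook it would be `sp.audio_features`.
    """
    features = {}
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        for track_id, track_features in zip(batch, fetch(batch)):
            # Skip tracks for which no features came back.
            if track_features is not None:
                features[track_id] = track_features
    return features


# Stand-in for the API call, used only to exercise the helper offline.
def fake_fetch(ids):
    return [None if i.endswith("7") else {"id": i, "valence": 0.5} for i in ids]


ids = [f"track{n}" for n in range(250)]
result = collect_features(ids, fake_fetch)
print(len(result))  # 225: the 25 ids ending in 7 are skipped
```

Because the helper returns a new dictionary, it can be reused for the test tracks without first resetting a shared `features` variable.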

In [10]:
get_features(0, 100)

getting from 0 to 100
getting from 100 to 200
getting from 200 to 300
getting from 300 to 400
getting from 400 to 500
getting from 500 to 600
getting from 600 to 700
getting from 700 to 800
getting from 800 to 900
getting from 900 to 1000
getting from 1000 to 1100
getting from 1100 to 1200
getting from 1200 to 1300
getting from 1300 to 1400
getting from 1400 to 1500
getting from 1500 to 1600
getting from 1600 to 1700
getting from 1700 to 1800
getting from 1800 to 1900
getting from 1900 to 2000
getting from 2000 to 2100
getting from 2100 to 2200
getting from 2200 to 2300
getting from 2300 to 2400
getting from 2400 to 2500
getting from 2500 to 2600
getting from 2600 to 2700
getting from 2700 to 2800
getting from 2800 to 2900
getting from 2900 to 3000
getting from 3000 to 3100
getting from 3100 to 3200
getting from 3200 to 3300
getting from 3300 to 3400
getting from 3400 to 3500
getting from 3500 to 3600
getting from 3600 to 3700
getting from 3700 to 3800
getting from 3800 to 3900
getting

We then add features to a DataFrame and merge it with the previous DataFrame.

In [11]:
tracks = pd.DataFrame.from_dict(features)
tracks = tracks.T
tracks = tracks.reset_index().rename(columns={'index' : 'song_id'})
tracks

Unnamed: 0,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,5w3CRkbTWXfbYepIdFpGUN,0.723,0.669,0,-6.784,1,0.0348,0.0522,0.183,0.109,0.349,125.053,audio_features,5w3CRkbTWXfbYepIdFpGUN,spotify:track:5w3CRkbTWXfbYepIdFpGUN,https://api.spotify.com/v1/tracks/5w3CRkbTWXfb...,https://api.spotify.com/v1/audio-analysis/5w3C...,224133,4
1,2LJRoFOtMGLnSxJqlJuhge,0.858,0.634,0,-4.058,1,0.0286,0.518,0,0.236,0.909,110.03,audio_features,2LJRoFOtMGLnSxJqlJuhge,spotify:track:2LJRoFOtMGLnSxJqlJuhge,https://api.spotify.com/v1/tracks/2LJRoFOtMGLn...,https://api.spotify.com/v1/audio-analysis/2LJR...,164360,3
2,1XZMileUddtQ2XwTMfUL3n,0.572,0.81,1,-5.791,0,0.0557,0.0147,0.000007,0.13,0.106,126.045,audio_features,1XZMileUddtQ2XwTMfUL3n,spotify:track:1XZMileUddtQ2XwTMfUL3n,https://api.spotify.com/v1/tracks/1XZMileUddtQ...,https://api.spotify.com/v1/audio-analysis/1XZM...,214912,4
3,2ccuOtUjIyx3tPcsnpeBzJ,0.527,0.461,7,-5.908,1,0.0269,0.118,0,0.0831,0.227,128.153,audio_features,2ccuOtUjIyx3tPcsnpeBzJ,spotify:track:2ccuOtUjIyx3tPcsnpeBzJ,https://api.spotify.com/v1/tracks/2ccuOtUjIyx3...,https://api.spotify.com/v1/audio-analysis/2ccu...,214405,3
4,05CrK6Q5VGtfPDtyQFJ4Kf,0.512,0.526,1,-5.44,1,0.0251,0.0724,0,0.0675,0.249,85.024,audio_features,05CrK6Q5VGtfPDtyQFJ4Kf,spotify:track:05CrK6Q5VGtfPDtyQFJ4Kf,https://api.spotify.com/v1/tracks/05CrK6Q5VGtf...,https://api.spotify.com/v1/audio-analysis/05Cr...,226330,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15004,4wo0NtF3OzPrYiugNbUDvT,0.81,0.784,3,-3.223,0,0.0622,0.0874,0.000166,0.083,0.817,88.001,audio_features,4wo0NtF3OzPrYiugNbUDvT,spotify:track:4wo0NtF3OzPrYiugNbUDvT,https://api.spotify.com/v1/tracks/4wo0NtF3OzPr...,https://api.spotify.com/v1/audio-analysis/4wo0...,254907,4
15005,630Ug0XtmhhFvAKo0PNuEI,0.84,0.344,5,-8.613,0,0.0374,0.456,0.000034,0.35,0.526,111.994,audio_features,630Ug0XtmhhFvAKo0PNuEI,spotify:track:630Ug0XtmhhFvAKo0PNuEI,https://api.spotify.com/v1/tracks/630Ug0XtmhhF...,https://api.spotify.com/v1/audio-analysis/630U...,199227,4
15006,5HujDMiRNqZwEzVAh3oFD3,0.733,0.784,3,-5.211,1,0.0606,0.0927,0,0.195,0.377,120.016,audio_features,5HujDMiRNqZwEzVAh3oFD3,spotify:track:5HujDMiRNqZwEzVAh3oFD3,https://api.spotify.com/v1/tracks/5HujDMiRNqZw...,https://api.spotify.com/v1/audio-analysis/5Huj...,164000,4
15007,439TlnnznSiBbQbgXiBqAd,0.487,0.729,2,-6.815,1,0.271,0.0538,0.000004,0.44,0.217,91.048,audio_features,439TlnnznSiBbQbgXiBqAd,spotify:track:439TlnnznSiBbQbgXiBqAd,https://api.spotify.com/v1/tracks/439TlnnznSiB...,https://api.spotify.com/v1/audio-analysis/439T...,350120,4


In [12]:
tracks = tracks.merge(df, left_on="song_id", right_on="id")


In [13]:
tracks

Unnamed: 0,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,...,type,id_x,uri,track_href,analysis_url,duration_ms,time_signature,id_y,title,main_artist
0,5w3CRkbTWXfbYepIdFpGUN,0.723,0.669,0,-6.784,1,0.0348,0.0522,0.183,0.109,...,audio_features,5w3CRkbTWXfbYepIdFpGUN,spotify:track:5w3CRkbTWXfbYepIdFpGUN,https://api.spotify.com/v1/tracks/5w3CRkbTWXfb...,https://api.spotify.com/v1/audio-analysis/5w3C...,224133,4,5w3CRkbTWXfbYepIdFpGUN,On Hold,The xx
1,2LJRoFOtMGLnSxJqlJuhge,0.858,0.634,0,-4.058,1,0.0286,0.518,0,0.236,...,audio_features,2LJRoFOtMGLnSxJqlJuhge,spotify:track:2LJRoFOtMGLnSxJqlJuhge,https://api.spotify.com/v1/tracks/2LJRoFOtMGLn...,https://api.spotify.com/v1/audio-analysis/2LJR...,164360,3,2LJRoFOtMGLnSxJqlJuhge,Tranquilito,Gerardo Ortiz
2,1XZMileUddtQ2XwTMfUL3n,0.572,0.81,1,-5.791,0,0.0557,0.0147,0.000007,0.13,...,audio_features,1XZMileUddtQ2XwTMfUL3n,spotify:track:1XZMileUddtQ2XwTMfUL3n,https://api.spotify.com/v1/tracks/1XZMileUddtQ...,https://api.spotify.com/v1/audio-analysis/1XZM...,214912,4,1XZMileUddtQ2XwTMfUL3n,Summertime Sadness (Lana Del Rey Vs. Cedric Ge...,Lana Del Rey
3,2ccuOtUjIyx3tPcsnpeBzJ,0.527,0.461,7,-5.908,1,0.0269,0.118,0,0.0831,...,audio_features,2ccuOtUjIyx3tPcsnpeBzJ,spotify:track:2ccuOtUjIyx3tPcsnpeBzJ,https://api.spotify.com/v1/tracks/2ccuOtUjIyx3...,https://api.spotify.com/v1/audio-analysis/2ccu...,214405,3,2ccuOtUjIyx3tPcsnpeBzJ,Flower Shops (feat. Morgan Wallen),ERNEST
4,05CrK6Q5VGtfPDtyQFJ4Kf,0.512,0.526,1,-5.44,1,0.0251,0.0724,0,0.0675,...,audio_features,05CrK6Q5VGtfPDtyQFJ4Kf,spotify:track:05CrK6Q5VGtfPDtyQFJ4Kf,https://api.spotify.com/v1/tracks/05CrK6Q5VGtf...,https://api.spotify.com/v1/audio-analysis/05Cr...,226330,4,05CrK6Q5VGtfPDtyQFJ4Kf,Sad Song (feat. Elena Coats),We The Kings
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15004,4wo0NtF3OzPrYiugNbUDvT,0.81,0.784,3,-3.223,0,0.0622,0.0874,0.000166,0.083,...,audio_features,4wo0NtF3OzPrYiugNbUDvT,spotify:track:4wo0NtF3OzPrYiugNbUDvT,https://api.spotify.com/v1/tracks/4wo0NtF3OzPr...,https://api.spotify.com/v1/audio-analysis/4wo0...,254907,4,4wo0NtF3OzPrYiugNbUDvT,Ahora,J Balvin
15005,630Ug0XtmhhFvAKo0PNuEI,0.84,0.344,5,-8.613,0,0.0374,0.456,0.000034,0.35,...,audio_features,630Ug0XtmhhFvAKo0PNuEI,spotify:track:630Ug0XtmhhFvAKo0PNuEI,https://api.spotify.com/v1/tracks/630Ug0XtmhhF...,https://api.spotify.com/v1/audio-analysis/630U...,199227,4,630Ug0XtmhhFvAKo0PNuEI,Remember Me,UMI
15006,5HujDMiRNqZwEzVAh3oFD3,0.733,0.784,3,-5.211,1,0.0606,0.0927,0,0.195,...,audio_features,5HujDMiRNqZwEzVAh3oFD3,spotify:track:5HujDMiRNqZwEzVAh3oFD3,https://api.spotify.com/v1/tracks/5HujDMiRNqZw...,https://api.spotify.com/v1/audio-analysis/5Huj...,164000,4,5HujDMiRNqZwEzVAh3oFD3,farfalle,sangiovanni
15007,439TlnnznSiBbQbgXiBqAd,0.487,0.729,2,-6.815,1,0.271,0.0538,0.000004,0.44,...,audio_features,439TlnnznSiBbQbgXiBqAd,spotify:track:439TlnnznSiBbQbgXiBqAd,https://api.spotify.com/v1/tracks/439TlnnznSiB...,https://api.spotify.com/v1/audio-analysis/439T...,350120,4,439TlnnznSiBbQbgXiBqAd,m.A.A.d city,Kendrick Lamar


Finally, we save the songs to a file for use in the analysis notebook.

In [14]:
tracks.to_csv("songs15000.csv", index=False)

## Test Data

**spotify_ids.txt** contains the track ids given to test the models, so we initially read the file.

In [15]:
# Use a context manager so the file is closed after reading.
with open("spotify_ids.txt", "r") as test_ids:
    all_track_ids = test_ids.readlines()
all_track_ids

['7lPN2DXiMsVn7XUKtOW1CS\n',
 '5QO79kh1waicV47BqGRL3g\n',
 '0VjIjW4GlUZAMYd2vXMi3b\n',
 '4MzXwWMhyBbmu6hOcLVD49\n',
 '5Kskr9LcNYa0tpt5f0ZEJx\n',
 '6tDDoYIxWvMLTdKpjFkc1B\n',
 '3VT8hOC5vuDXBsHrR53WFh\n',
 '1xK1Gg9SxG8fy2Ya373oqb\n',
 '6f3Slt0GbA2bPZlz0aIFXN\n',
 '3tjFYV6RSFtuktYl3ZtYcq\n',
 '27OeeYzk6klgBh83TSvGMA\n',
 '2XIc1pqjXV3Cr2BQUGNBck\n',
 '60ynsPSSKe6O3sfwRnIBRf\n',
 '1M4OcYkxAtu3ErzSgDEfoi\n',
 '3YJJjQPAbDT7mGpX3WtQ9A\n',
 '5nujrmhLynf4yMoMtj8AQF\n',
 '1t9WgS8FN0534tLBRwbaxO\n',
 '7vrJn5hDSXRmdXoR30KgF1\n',
 '4saklk6nie3yiGePpBwUoc\n',
 '3FAJ6O0NOHQV8Mc5Ri6ENp\n',
 '0D75ciM842cdUMKSMfAR9y\n',
 '35mvY5S1H3J2QZyna3TFe0\n',
 '6Im9k8u9iIzKMrmV7BWtlF\n',
 '5YYW3yRktprLRr47WK219Y\n',
 '7hxHWCCAIIxFLCzvDgnQHX\n',
 '3VvA1wSxukMLsvXoXtlwWx\n',
 '1diS6nkxMQc3wwC4G1j0bh\n',
 '6ft4hAq6yde8jPZY2i5zLr\n',
 '7qEHsqek33rTcFNT9PFqLf\n',
 '31qCy5ZaophVA81wtlwLc4\n',
 '3iw6V4LH7yPj1ESORX9RIN\n',
 '45bE4HXI0AwGZXfZtMp8JR\n',
 '02MWAaffLxlfxAUY7c5dvx\n',
 '1tkg4EHVoqnhR6iFEXb60y\n',
 '5uEYRdEIh9Bo

Then we strip the trailing newline character (`\n`) from each ID.

In [16]:
all_track_ids = [x.split()[0] for x in all_track_ids]
all_track_ids

['7lPN2DXiMsVn7XUKtOW1CS',
 '5QO79kh1waicV47BqGRL3g',
 '0VjIjW4GlUZAMYd2vXMi3b',
 '4MzXwWMhyBbmu6hOcLVD49',
 '5Kskr9LcNYa0tpt5f0ZEJx',
 '6tDDoYIxWvMLTdKpjFkc1B',
 '3VT8hOC5vuDXBsHrR53WFh',
 '1xK1Gg9SxG8fy2Ya373oqb',
 '6f3Slt0GbA2bPZlz0aIFXN',
 '3tjFYV6RSFtuktYl3ZtYcq',
 '27OeeYzk6klgBh83TSvGMA',
 '2XIc1pqjXV3Cr2BQUGNBck',
 '60ynsPSSKe6O3sfwRnIBRf',
 '1M4OcYkxAtu3ErzSgDEfoi',
 '3YJJjQPAbDT7mGpX3WtQ9A',
 '5nujrmhLynf4yMoMtj8AQF',
 '1t9WgS8FN0534tLBRwbaxO',
 '7vrJn5hDSXRmdXoR30KgF1',
 '4saklk6nie3yiGePpBwUoc',
 '3FAJ6O0NOHQV8Mc5Ri6ENp',
 '0D75ciM842cdUMKSMfAR9y',
 '35mvY5S1H3J2QZyna3TFe0',
 '6Im9k8u9iIzKMrmV7BWtlF',
 '5YYW3yRktprLRr47WK219Y',
 '7hxHWCCAIIxFLCzvDgnQHX',
 '3VvA1wSxukMLsvXoXtlwWx',
 '1diS6nkxMQc3wwC4G1j0bh',
 '6ft4hAq6yde8jPZY2i5zLr',
 '7qEHsqek33rTcFNT9PFqLf',
 '31qCy5ZaophVA81wtlwLc4',
 '3iw6V4LH7yPj1ESORX9RIN',
 '45bE4HXI0AwGZXfZtMp8JR',
 '02MWAaffLxlfxAUY7c5dvx',
 '1tkg4EHVoqnhR6iFEXb60y',
 '5uEYRdEIh9Bo4fpjDd4Na9',
 '6UelLqGlWMcVH1E5c4H7lY',
 '1J14CdDAvBTE1AJYUOwl6C',
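
Note that `x.split()[0]` raises an `IndexError` on an empty line, for example a blank line at the end of the file. A slightly more defensive version is sketched below, where `read_track_ids` is a hypothetical helper and `io.StringIO` stands in for the opened `spotify_ids.txt`.

```python
import io


def read_track_ids(lines):
    """Return the non-empty, whitespace-stripped ids from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


# io.StringIO stands in for the opened spotify_ids.txt file.
sample = io.StringIO("7lPN2DXiMsVn7XUKtOW1CS\n5QO79kh1waicV47BqGRL3g\n\n")
track_ids = read_track_ids(sample)
print(track_ids)
```

The same function accepts an open file object directly, since iterating over a file yields its lines.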
 

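Since `get_features` reads the global `all_track_ids`, which now holds the test IDs, we only need to empty the `features` dictionary before fetching the tracks' features.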

In [17]:
features = {}
get_features(0, 100)


getting from 0 to 100
getting from 100 to 200
getting from 200 to 300
getting from 300 to 400
getting from 400 to 500
getting from 500 to 600
getting from 600 to 700
getting from 700 to 800
getting from 800 to 900
getting from 900 to 1000
getting from 1000 to 1100
getting from 1100 to 1200


In [18]:
features

{'7lPN2DXiMsVn7XUKtOW1CS': {'danceability': 0.585,
  'energy': 0.436,
  'key': 10,
  'loudness': -8.761,
  'mode': 1,
  'speechiness': 0.0601,
  'acousticness': 0.721,
  'instrumentalness': 1.31e-05,
  'liveness': 0.105,
  'valence': 0.132,
  'tempo': 143.874,
  'type': 'audio_features',
  'id': '7lPN2DXiMsVn7XUKtOW1CS',
  'uri': 'spotify:track:7lPN2DXiMsVn7XUKtOW1CS',
  'track_href': 'https://api.spotify.com/v1/tracks/7lPN2DXiMsVn7XUKtOW1CS',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/7lPN2DXiMsVn7XUKtOW1CS',
  'duration_ms': 242014,
  'time_signature': 4},
 '5QO79kh1waicV47BqGRL3g': {'danceability': 0.68,
  'energy': 0.826,
  'key': 0,
  'loudness': -5.487,
  'mode': 1,
  'speechiness': 0.0309,
  'acousticness': 0.0212,
  'instrumentalness': 1.24e-05,
  'liveness': 0.543,
  'valence': 0.644,
  'tempo': 118.051,
  'type': 'audio_features',
  'id': '5QO79kh1waicV47BqGRL3g',
  'uri': 'spotify:track:5QO79kh1waicV47BqGRL3g',
  'track_href': 'https://api.spotify.com/v1/tr

Finally, we save them to a DataFrame and then to a file. There is no need to fetch the titles or artists of these songs, since we do not intend to analyze them; they are only used for testing the models.

In [19]:
tracks = pd.DataFrame.from_dict(features)
tracks = tracks.T
tracks = tracks.reset_index().rename(columns={'index' : 'song_id'})
tracks

Unnamed: 0,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,7lPN2DXiMsVn7XUKtOW1CS,0.585,0.436,10,-8.761,1,0.0601,0.721,0.000013,0.105,0.132,143.874,audio_features,7lPN2DXiMsVn7XUKtOW1CS,spotify:track:7lPN2DXiMsVn7XUKtOW1CS,https://api.spotify.com/v1/tracks/7lPN2DXiMsVn...,https://api.spotify.com/v1/audio-analysis/7lPN...,242014,4
1,5QO79kh1waicV47BqGRL3g,0.68,0.826,0,-5.487,1,0.0309,0.0212,0.000012,0.543,0.644,118.051,audio_features,5QO79kh1waicV47BqGRL3g,spotify:track:5QO79kh1waicV47BqGRL3g,https://api.spotify.com/v1/tracks/5QO79kh1waic...,https://api.spotify.com/v1/audio-analysis/5QO7...,215627,4
2,0VjIjW4GlUZAMYd2vXMi3b,0.514,0.73,1,-5.934,1,0.0598,0.00146,0.000095,0.0897,0.334,171.005,audio_features,0VjIjW4GlUZAMYd2vXMi3b,spotify:track:0VjIjW4GlUZAMYd2vXMi3b,https://api.spotify.com/v1/tracks/0VjIjW4GlUZA...,https://api.spotify.com/v1/audio-analysis/0VjI...,200040,4
3,4MzXwWMhyBbmu6hOcLVD49,0.731,0.573,4,-10.059,0,0.0544,0.401,0.000052,0.113,0.145,109.928,audio_features,4MzXwWMhyBbmu6hOcLVD49,spotify:track:4MzXwWMhyBbmu6hOcLVD49,https://api.spotify.com/v1/tracks/4MzXwWMhyBbm...,https://api.spotify.com/v1/audio-analysis/4MzX...,205090,4
4,5Kskr9LcNYa0tpt5f0ZEJx,0.907,0.393,4,-7.636,0,0.0539,0.451,0.000001,0.135,0.202,104.949,audio_features,5Kskr9LcNYa0tpt5f0ZEJx,spotify:track:5Kskr9LcNYa0tpt5f0ZEJx,https://api.spotify.com/v1/tracks/5Kskr9LcNYa0...,https://api.spotify.com/v1/audio-analysis/5Ksk...,205458,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1157,4lUmnwRybYH7mMzf16xB0y,0.596,0.65,9,-5.167,1,0.337,0.138,0,0.14,0.188,133.997,audio_features,4lUmnwRybYH7mMzf16xB0y,spotify:track:4lUmnwRybYH7mMzf16xB0y,https://api.spotify.com/v1/tracks/4lUmnwRybYH7...,https://api.spotify.com/v1/audio-analysis/4lUm...,257428,4
1158,1fzf9Aad4y1RWrmwosAK5y,0.588,0.85,4,-6.431,1,0.0318,0.168,0.00202,0.0465,0.768,93.003,audio_features,1fzf9Aad4y1RWrmwosAK5y,spotify:track:1fzf9Aad4y1RWrmwosAK5y,https://api.spotify.com/v1/tracks/1fzf9Aad4y1R...,https://api.spotify.com/v1/audio-analysis/1fzf...,187310,4
1159,3E3pb3qH11iny6TFDJvsg5,0.754,0.66,0,-6.811,1,0.267,0.179,0,0.194,0.316,83.0,audio_features,3E3pb3qH11iny6TFDJvsg5,spotify:track:3E3pb3qH11iny6TFDJvsg5,https://api.spotify.com/v1/tracks/3E3pb3qH11in...,https://api.spotify.com/v1/audio-analysis/3E3p...,209299,4
1160,3yTkoTuiKRGL2VAlQd7xsC,0.584,0.836,0,-4.925,1,0.079,0.0558,0,0.0663,0.484,104.973,audio_features,3yTkoTuiKRGL2VAlQd7xsC,spotify:track:3yTkoTuiKRGL2VAlQd7xsC,https://api.spotify.com/v1/tracks/3yTkoTuiKRGL...,https://api.spotify.com/v1/audio-analysis/3yTk...,202204,4


In [20]:
tracks.to_csv("test_tracks.csv", index=False)