# Spotify Exploration

As of now (8/17/2023) I have only done this in R, so before we create the functions we will recreate the functionality of my R script in Python.

1. Install Necessary Packages
2. Set system environment variables for the Spotify IDs
3. Experiment with the Spotipy package
   a. Get songs from an artist
   b. Using track_id, get the song info (length, tempo, danceability, energy)
4. How many songs do I have saved?
5. Read in names of all songs
6. Get relevant data from all songs
7. Manage the dataframe
8. Create playlist groupings by tempo
9. Create playlist groupings by danceability
10. Create playlist groupings by energy
11. Write Playlist
12. Work with functions

# 1. Import Packages

In [5]:
import config
import pandas as pd
import numpy as np
import os
import spotipy
from spotipy.oauth2 import SpotifyOAuth
import math
import time

# 2. Authorization

Here we'll keep an example from the Spotipy docs for client authorization (this can search the Spotify database but cannot read any user data)
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

auth_manager = SpotifyClientCredentials()
sp = spotipy.Spotify(auth_manager=auth_manager)

playlists = sp.user_playlists('spotify')
while playlists:
    for i, playlist in enumerate(playlists['items']):
        print("%4d %s %s" % (i + 1 + playlists['offset'], playlist['uri'],  playlist['name']))
    if playlists['next']:
        playlists = sp.next(playlists)
    else:
        playlists = None

Set system environments

In [6]:
os.environ["SPOTIPY_CLIENT_ID"] = config.SPOTIPY_CLIENT_ID
os.environ["SPOTIPY_CLIENT_SECRET"] = config.SPOTIPY_CLIENT_SECRET
os.environ["SPOTIPY_REDIRECT_URI"] = config.SPOTIPY_REDIRECT_URI
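Spotipy looks up these three variables in the environment when credentials are not passed in directly, so a quick check before authorizing can save a confusing OAuth error. A minimal sketch, assuming nothing beyond the standard library; `missing_spotify_env` is a hypothetical helper, not part of Spotipy, and the values in the example dict are placeholders:

```python
import os

# The three variable names Spotipy looks up in the environment.
REQUIRED_VARS = (
    "SPOTIPY_CLIENT_ID",
    "SPOTIPY_CLIENT_SECRET",
    "SPOTIPY_REDIRECT_URI",
)


def missing_spotify_env(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# A plain dict stands in for os.environ here (placeholder values).
example_env = {"SPOTIPY_CLIENT_ID": "abc", "SPOTIPY_REDIRECT_URI": ""}
print(missing_spotify_env(example_env))
```

Calling `missing_spotify_env()` with no argument checks the real environment, which is the useful form right after the cell above.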

Set scope & do a data pull

* Note that "user-library-read" is the only necessary scope to read saved tracks

In [7]:
scope = "user-library-read playlist-read-private playlist-modify-public playlist-modify-private"

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))

profile = sp.current_user()
results = sp.current_user_saved_tracks(limit = 20, offset = 0, market = None)

# Item is a specific "row". Or a single song
for idx, item in enumerate(results['items']):
    
    track = item['track']
    print(idx, track['artists'][0]['name'], " – ", track['name'])

0 Lil Uzi Vert  –  Watch This - ARIZONATEARS Pluggnb Remix
1 Post Malone  –  Don't Understand
2 Dominic Fike  –  3 Nights
3 Dominic Fike  –  Double Negative (Skeleton Milkshake)
4 Post Malone  –  Something Real
5 Lit  –  My Own Worst Enemy
6 J. Cole  –  h u n g e r . o n . h i l l s i d e (with Bas)
7 J. Cole  –  a p p l y i n g . p r e s s u r e
8 J. Cole  –  m y . l i f e (with 21 Savage & Morray)
9 J. Cole  –  p u n c h i n ‘ . t h e . c l o c k
10 J. Cole  –  9 5 . s o u t h
11 J. Cole  –  a m a r i
12 BlocBoy JB  –  Look Alive (feat. Drake)
13 J. Cole  –  1 0 0 . m i l ‘ (with Bas)
14 jxdn  –  ANGELS & DEMONS
15 The White Stripes  –  Seven Nation Army
16 Nirvana  –  Come As You Are
17 Sublime  –  Santeria
18 100 gecs  –  Hollywood Baby
19 Lovejoy  –  Call Me What You Like


Each song returned by current_user_saved_tracks is a dictionary with two keys: 'added_at' (when the song was saved) and 'track' (the song itself)
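To make that structure concrete, here is a small sketch that flattens one item into a single-level dict. The item is written by hand and only includes the keys this notebook uses; the `added_at` timestamp and the artist ID are placeholders, and `track_row` is a hypothetical helper name.

```python
# A hand-written stand-in for one entry of results['items'].
# Real responses carry many more keys than the ones shown here.
sample_item = {
    "added_at": "2023-08-17T00:00:00Z",  # placeholder timestamp
    "track": {
        "id": "2QF8FbGBTXTzm0CRUWqndE",
        "name": "Call Me What You Like",
        "duration_ms": 226960,
        "album": {"name": "Wake Up & It's Over"},
        "artists": [{"id": "placeholder-artist-id", "name": "Lovejoy"}],
    },
}


def track_row(item):
    """Flatten one saved-track item into a single-level dict (hypothetical helper)."""
    track = item["track"]
    first_artist = track["artists"][0]
    return {
        "Track_Name": track["name"],
        "Track_ID": track["id"],
        "Artist_Name": first_artist["name"],
        "Artist_Num": len(track["artists"]),
        "Track_Len": track["duration_ms"] / 1000,  # milliseconds -> seconds
        "Added_At": item["added_at"],
    }


row = track_row(sample_item)
print(sorted(sample_item.keys()))
print(row["Track_Name"], row["Track_Len"])
```

A list of these dicts can be passed straight to `pd.DataFrame`, which avoids growing a DataFrame row by row.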

In [4]:
print('you have', results['total'], 'saved songs')

you have 4412 saved songs


In [5]:
profile

{'display_name': 'Dante Goss',
 'external_urls': {'spotify': 'https://open.spotify.com/user/1218158724'},
 'href': 'https://api.spotify.com/v1/users/1218158724',
 'id': '1218158724',
 'images': [{'url': 'https://scontent-sea1-1.xx.fbcdn.net/v/t39.30808-1/287739885_5170783559641650_7329131775805179735_n.jpg?stp=cp0_dst-jpg_p50x50&_nc_cat=108&ccb=1-7&_nc_sid=dbb9e7&_nc_ohc=k_WMIxgYS_gAX_9-18Z&_nc_ht=scontent-sea1-1.xx&edm=AP4hL3IEAAAA&oh=00_AfB4chcvxYSWVwW_RFoGMcsuq0Kr59kgMHX3OLWHHW9N_g&oe=64E583B5',
   'height': 64,
   'width': 64},
  {'url': 'https://scontent-sea1-1.xx.fbcdn.net/v/t39.30808-1/287739885_5170783559641650_7329131775805179735_n.jpg?stp=dst-jpg_p320x320&_nc_cat=108&ccb=1-7&_nc_sid=0c64ff&_nc_ohc=k_WMIxgYS_gAX_9-18Z&_nc_ht=scontent-sea1-1.xx&edm=AP4hL3IEAAAA&oh=00_AfBoB0rWAKun62cOmvLti53YbXDdTInKzdxcL9HUTjV2qw&oe=64E583B5',
   'height': 300,
   'width': 300}],
 'type': 'user',
 'uri': 'spotify:user:1218158724',
 'followers': {'href': None, 'total': 15},
 'country': 'US',
 'p

In [6]:
# item is a dictionary
print('track id: ', item['track']['id'])

print('track name: ', item['track']['name'])

print('track length in seconds: ',item['track']['duration_ms']/1000)

print('album name: ',item['track']['album']['name'])

print('first artist: ', item['track']['artists'][0]['name'])

print('number of artists: ', len(item['track']['artists']))


track id:  2QF8FbGBTXTzm0CRUWqndE
track name:  Call Me What You Like
track length in seconds:  226.96
album name:  Wake Up & It's Over
first artist:  Lovejoy
number of artists:  1


In [7]:
# First get a loop that will capture the data we want into a df
d =[]
scope = "user-library-read playlist-read-private playlist-modify-public playlist-modify-private"
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))
result = sp.current_user_saved_tracks(limit = 20, offset = 0, market = None)
for item in enumerate(result['items']):
    d.append(
        {
            'Track_Name' : item['track']['name'],
            'Track_Id' : item['track']['id'],
            'Artist_Name' : item['track']['artists'][0]['name'],
            'Artist_Num' : len(item['track']['artists']),
            'Track_Length' : (item['track']['duration_ms']/1000)
        }
    )
d2 = pd.DataFrame(d)
d2.head()

# This loop didn't work: enumerate() yields (index, item) tuples, so item['track'] tries to index a tuple with a string

TypeError: tuple indices must be integers or slices, not str
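The TypeError comes from `enumerate`, which hands back `(index, value)` pairs rather than the values themselves. A short sketch of the two usual ways around it, using a tiny hand-made list in place of `result['items']`:

```python
# Stand-in for result['items']; the real list comes from the Spotify API.
items = [
    {"track": {"name": "Seven Nation Army"}},
    {"track": {"name": "Santeria"}},
]

# enumerate() yields (index, value) pairs, so a single loop variable is a tuple.
first = next(iter(enumerate(items)))
print(type(first).__name__)

# Option 1: unpack the pair when the index is useful.
names_with_index = [(idx, item["track"]["name"]) for idx, item in enumerate(items)]

# Option 2: iterate the list directly when it is not.
names = [item["track"]["name"] for item in items]

print(names_with_index)
print(names)
```

Since the DataFrame loop never uses the index, the second form is the simpler fit.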

In [8]:
# Now lets make a new loop

# Initialize df
df = pd.DataFrame(columns=['Track_Name','Track_ID','Artist_Name','Artist_Num','Track_Len'])

# Set scope
scope = "user-library-read playlist-read-private playlist-modify-public playlist-modify-private"
# Authorize
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))
# Get saved tracks
result = sp.current_user_saved_tracks(limit = 20, offset = 0, market = None)
# Loop through Saved tracks
for item in result['items']:
    track = item['track']
    df = df.append({
        'Track_Name': track['name'],
        'Track_ID' : track['id'],
        'Artist_Name' : track['artists'][0]['name'],
        'Artist_Num' : len(track['artists']),
        'Track_Len' : track['duration_ms']/1000
    })
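# Note: DataFrame.append is deprecated (and removed in pandas 2.0); use pandas.concat instead.
# Appending a dict also requires ignore_index=True, which is what the TypeError below is about.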

  df = df.append({


TypeError: Can only append a dict if ignore_index=True

In [9]:
# Loop 3

# Initialize df
df = []
# Set scope
scope = "user-library-read playlist-read-private playlist-modify-public playlist-modify-private"
# Authorize
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))
# Get saved tracks
result = sp.current_user_saved_tracks(limit = 20, offset = 0, market = None)

# Loop through saved tracks
for item in result['items']:
    track = item['track']
    
    # Assign Variables
    Track_Name = track['name']
    Track_ID = track['id']
    Artist_Name = track['artists'][0]['name']
    Artist_ID = track['artists'][0]['id']
    Artist_Num = len(track['artists'])
    Track_Len = track['duration_ms']/1000
    # Add values to df
    df.append({
        'Track_Name' : Track_Name,
        'Track_ID' : Track_ID,
        'Artist_Name' : Artist_Name,
        'Artist_ID' : Artist_ID,
        'Artist_Num' : Artist_Num,
        'Track_Len' : Track_Len
    })
df = pd.DataFrame(df, columns=['Track_Name','Track_ID','Artist_Name','Artist_ID','Artist_Num','Track_Len'])
print(df)
    

                                        Track_Name                Track_ID  \
0          Watch This - ARIZONATEARS Pluggnb Remix  0FA4wrjDJvJTTU8AepZTup   
1                                 Don't Understand  4MTuL20LF3pWebeJbcNh7p   
2                                         3 Nights  0uI7yAKUf52Cn7y3sYyjiX   
3             Double Negative (Skeleton Milkshake)  7ACT6YaXbYvl7hRWEOOEHQ   
4                                   Something Real  444vevlQjTnKioLLncteGv   
5                               My Own Worst Enemy  33iv3wnGMrrDugd7GBso1z   
6   h u n g e r . o n . h i l l s i d e (with Bas)  5BwQjRasNcdRPuVWKcHto2   
7                a p p l y i n g . p r e s s u r e  1d7q712nXjG98HiwHk7HFS   
8          m y . l i f e (with 21 Savage & Morray)  1D3z6HTiQsNmZxjl7F7eoG   
9              p u n c h i n ‘ . t h e . c l o c k  57ZUX6TNyKLBydAdVVd02x   
10                                 9 5 . s o u t h  5R691ipUYRDYW6ehapjoj6   
11                                       a m a r i  2cnKST6T9qUo

In [9]:
# Now we need to make a loop that will go through all saved songs
# Set scope
scope = "user-library-read playlist-read-private playlist-modify-public playlist-modify-private"
# Authorize
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))
# Step 1: How many saved songs do we have?
# You can get this by doing a current_user_saved_tracks search
song = sp.current_user_saved_tracks(limit = 1, offset = 0, market = None)
num_songs = song['total']

# Initialize Variables
track_name = [] 
track_id = []
artist_name = []
artist_id = []
artist_num = []
track_len = []
# current_user_saved_tracks returns 20 tracks per request by default (the API allows up to 50),
# so the number of requests is the total divided by 20, rounded up with math.ceil
num_loops = math.ceil(num_songs/20)

# Now make a big loop that will go through it all
for i in range(num_loops):
    print('Loop Iteration',i+1)
    result = sp.current_user_saved_tracks(limit = 20, offset = (i*20), market = None)
    # sleep
    time.sleep(3)# 3 second sleep
    # Loop through saved tracks
    for item in result['items']:
        track = item['track']
        track_name.append(track['name'])
        track_id.append(track['id'])   
        artist_name.append(track['artists'][0]['name'])
        artist_id.append(track['artists'][0]['id'])
        artist_num.append(len(track['artists']))
        track_len.append(track['duration_ms']/1000)
# Convert to DF
Track_Name=pd.DataFrame(track_name,columns=['Track_Name'])
Track_ID=pd.DataFrame(track_id,columns=['Track_ID'])
Artist_Name=pd.DataFrame(artist_name,columns=['Artist_Name'])
Artist_ID=pd.DataFrame(artist_id,columns=['Artist_ID'])
Artist_Num=pd.DataFrame(artist_num,columns=['Artist_Num'])
Track_Len=pd.DataFrame(track_len,columns=['Track_Len'])
# Combine
df = pd.concat([Track_Name,Track_ID,Artist_Name,Artist_ID,Artist_Num,Track_Len],axis =1)
df



Loop Iteration 1
Loop Iteration 2


KeyboardInterrupt: 
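The paging arithmetic is easy to get off by one, so here is a sketch that separates it from the API call. `page_offsets` and `fetch_all` are hypothetical helpers, and `fake_fetch` is a stand-in for `sp.current_user_saved_tracks` so the example runs without network access. It assumes only that each page is a dict with 'items' and 'total', as used above.

```python
import math


def page_offsets(total, page_size):
    """Offsets needed to cover `total` items in pages of `page_size`."""
    num_pages = math.ceil(total / page_size)
    return [page * page_size for page in range(num_pages)]


def fetch_all(fetch_page, page_size=50):
    """Collect every item from a paged endpoint.

    `fetch_page(limit, offset)` must return a dict with 'items' and 'total'.
    """
    first = fetch_page(limit=page_size, offset=0)
    items = list(first["items"])
    # The first page is already in hand, so skip offset 0.
    for offset in page_offsets(first["total"], page_size)[1:]:
        items.extend(fetch_page(limit=page_size, offset=offset)["items"])
    return items


# Fake endpoint over 4412 integers, matching the library size reported earlier.
LIBRARY = list(range(4412))


def fake_fetch(limit, offset):
    return {"items": LIBRARY[offset:offset + limit], "total": len(LIBRARY)}


print(len(page_offsets(4412, 20)))   # pages needed at 20 per page
print(len(fetch_all(fake_fetch)))    # all items collected
```

With a real client this would be called as `fetch_all(lambda limit, offset: sp.current_user_saved_tracks(limit=limit, offset=offset))`. Using pages of 50 instead of 20 cuts the number of requests by more than half.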

In [11]:
# Now we can do a loop that gets key info for each track
# This version will be slow because it's one at a time.
# Set scope
#scope = "user-library-read playlist-read-private playlist-modify-public playlist-modify-private"
# Authorize
#sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))

#trackfeatures = []
#for tracks in track_id:
    #trackfeatures.append( sp.audio_features(tracks = tracks) )
    #time.sleep(3)
#track_features = pd.DataFrame(trackfeatures)

# Step 1: How many saved songs do we have?
# You can get this by doing a current_user_saved_tracks search
song = sp.current_user_saved_tracks(limit = 1, offset = 0, market = None)
num_songs = song['total']
num_loops = math.ceil(song['total']/20)

# audio_features accepts up to 100 track IDs per call; this loop sends them in batches of 20
scope = "user-library-read playlist-read-private playlist-modify-public playlist-modify-private"
# Authorize
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))

trackfeatures = pd.DataFrame(columns = ['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
       'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo',
       'type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms',
       'time_signature'])
for i in range(0,num_loops):
    firstnum = i*20
    secondnum = firstnum +20
    print(firstnum,secondnum)
    audio = (sp.audio_features(tracks = track_id[firstnum:secondnum]))
    audio = pd.DataFrame(audio)
    trackfeatures = pd.concat([trackfeatures,audio],axis=0)
    time.sleep(2.1)
    idxval = num_loops+1-i
    print(idxval,'Loops to go! :)')
    if idxval <1:
        print("\a")# alarm?
        print('We have liftoff!')
        print('Come back to your code!')

trackfeatures

0 20
222 Loops to go! :)
20 40
221 Loops to go! :)
40 60
220 Loops to go! :)
60 80
219 Loops to go! :)
80 100
218 Loops to go! :)
100 120
217 Loops to go! :)
120 140
216 Loops to go! :)
140 160
215 Loops to go! :)
160 180
214 Loops to go! :)
180 200
213 Loops to go! :)
200 220
212 Loops to go! :)
220 240
211 Loops to go! :)
240 260
210 Loops to go! :)
260 280
209 Loops to go! :)
280 300
208 Loops to go! :)
300 320
207 Loops to go! :)
320 340
206 Loops to go! :)
340 360
205 Loops to go! :)
360 380
204 Loops to go! :)
380 400
203 Loops to go! :)
400 420
202 Loops to go! :)
420 440
201 Loops to go! :)
440 460
200 Loops to go! :)
460 480
199 Loops to go! :)
480 500
198 Loops to go! :)
500 520
197 Loops to go! :)
520 540
196 Loops to go! :)
540 560
195 Loops to go! :)
560 580
194 Loops to go! :)
580 600
193 Loops to go! :)
600 620
192 Loops to go! :)
620 640
191 Loops to go! :)
640 660
190 Loops to go! :)
660 680
189 Loops to go! :)
680 700
188 Loops to go! :)
700 720
187 Loops to go! :)
72

NameError: name 'track_features' is not defined
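Since `audio_features` takes up to 100 IDs per call, slicing the ID list into batches of 100 would cut the number of requests to about a fifth. A minimal sketch of the batching step on its own; `chunked` is a hypothetical helper and the IDs are placeholders standing in for the `track_id` list:

```python
def chunked(seq, size):
    """Yield consecutive slices of `seq`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


# Placeholder IDs standing in for the track_id list built earlier.
fake_ids = [f"id{n}" for n in range(4412)]

batches = list(chunked(fake_ids, 100))
print(len(batches))        # number of requests needed
print(len(batches[-1]))    # size of the final, partial batch
```

In the real loop each batch would be passed as `sp.audio_features(tracks=batch)`, with the results collected in a list and turned into a DataFrame once at the end.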

In [39]:
test = pd.DataFrame(columns = ['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
       'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo',
       'type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms',
       'time_signature'])
#test = sp.audio_features(tracks = track_id[0:20])
#test = pd.DataFrame(test)

test2 = sp.audio_features(tracks = track_id[20:40])
test2 = pd.DataFrame(test2)
pd.concat([test,test2])
#test2.columns

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.754,0.646,7,-5.795,1,0.317,0.152,1.8e-05,0.108,0.429,176.089,audio_features,0fea68AdmYNygeTGI4RC18,spotify:track:0fea68AdmYNygeTGI4RC18,https://api.spotify.com/v1/tracks/0fea68AdmYNy...,https://api.spotify.com/v1/audio-analysis/0fea...,242573,4
1,0.82,0.752,1,-7.635,0,0.0648,0.0584,0.00417,0.628,0.423,127.999,audio_features,4hceSKjrkDTO0nMKFcb3sj,spotify:track:4hceSKjrkDTO0nMKFcb3sj,https://api.spotify.com/v1/tracks/4hceSKjrkDTO...,https://api.spotify.com/v1/audio-analysis/4hce...,187500,4
2,0.668,0.875,1,-4.829,1,0.0411,0.00285,3e-06,0.151,0.703,163.036,audio_features,1PkbdjdrsV2lFMzT7q9MlS,spotify:track:1PkbdjdrsV2lFMzT7q9MlS,https://api.spotify.com/v1/tracks/1PkbdjdrsV2l...,https://api.spotify.com/v1/audio-analysis/1Pkb...,178045,4
3,0.574,0.79,0,-5.541,1,0.0411,0.00443,0.0,0.121,0.44,106.997,audio_features,3FtQes77xlbS9QTVts7p2u,spotify:track:3FtQes77xlbS9QTVts7p2u,https://api.spotify.com/v1/tracks/3FtQes77xlbS...,https://api.spotify.com/v1/audio-analysis/3FtQ...,204953,4
4,0.419,0.729,1,-5.1,0,0.0586,0.0429,0.0,0.19,0.0964,155.057,audio_features,6V1TqJxtw3P0ouCKICvl9l,spotify:track:6V1TqJxtw3P0ouCKICvl9l,https://api.spotify.com/v1/tracks/6V1TqJxtw3P0...,https://api.spotify.com/v1/audio-analysis/6V1T...,197287,5
5,0.555,0.729,1,-5.062,1,0.0443,0.000374,0.000139,0.347,0.482,139.864,audio_features,3t0ic4mkhvhamrKDkulB8v,spotify:track:3t0ic4mkhvhamrKDkulB8v,https://api.spotify.com/v1/tracks/3t0ic4mkhvha...,https://api.spotify.com/v1/audio-analysis/3t0i...,147680,4
6,0.531,0.89,2,-6.308,1,0.144,0.00198,0.497,0.0913,0.411,169.963,audio_features,7B0gxo0jQCy5Lk93RIODAC,spotify:track:7B0gxo0jQCy5Lk93RIODAC,https://api.spotify.com/v1/tracks/7B0gxo0jQCy5...,https://api.spotify.com/v1/audio-analysis/7B0g...,152559,4
7,0.822,0.498,2,-8.47,1,0.245,0.0188,0.0,0.0989,0.207,94.017,audio_features,5e2jIB5KT9kmTDCBzAmvQr,spotify:track:5e2jIB5KT9kmTDCBzAmvQr,https://api.spotify.com/v1/tracks/5e2jIB5KT9km...,https://api.spotify.com/v1/audio-analysis/5e2j...,153199,4
8,0.821,0.783,2,-4.498,1,0.0423,0.118,0.0,0.137,0.333,106.989,audio_features,6vz3Fyhj6smbuYuaIZHksu,spotify:track:6vz3Fyhj6smbuYuaIZHksu,https://api.spotify.com/v1/tracks/6vz3Fyhj6smb...,https://api.spotify.com/v1/audio-analysis/6vz3...,226542,4
9,0.795,0.55,7,-5.704,0,0.0882,0.123,0.0,0.0873,0.152,119.975,audio_features,48rsYvIQXUAtxcmIoStOaM,spotify:track:48rsYvIQXUAtxcmIoStOaM,https://api.spotify.com/v1/tracks/48rsYvIQXUAt...,https://api.spotify.com/v1/audio-analysis/48rs...,131493,4


Note: you may need to restart the kernel to use updated packages.


ERROR: Could not find a version that satisfies the requirement winsound (from versions: none)
ERROR: No matching distribution found for winsound



In [58]:
def NumSavedSongs():
    song = sp.current_user_saved_tracks(limit = 1, offset = 0, market = None)
    num_songs = song['total']
    return num_songs

0
1
2
...
218
219
220
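`NumSavedSongs` relies on a global `sp`. One possible variation is to pass the client in as an argument, which makes the function easy to try out without logging in. This is only a sketch: `FakeSpotify` is a hypothetical stand-in that mimics the one method used, and `num_saved_songs` is an assumed name.

```python
class FakeSpotify:
    """Minimal stand-in for a spotipy.Spotify client (hypothetical, for offline runs)."""

    def __init__(self, total):
        self._total = total

    def current_user_saved_tracks(self, limit=20, offset=0, market=None):
        # Only the 'total' field matters for this example.
        return {"items": [], "total": self._total}


def num_saved_songs(client):
    """Return the number of saved tracks reported by `client`."""
    # limit=1 keeps the response small; only the 'total' field is needed.
    return client.current_user_saved_tracks(limit=1)["total"]


print(num_saved_songs(FakeSpotify(total=4412)))
```

With the real client the call is simply `num_saved_songs(sp)`.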
