# Exploratory Analysis

## Eploring SpotiPy
- Explore available data
- Select features of interest
- Generate initial dataframe and database

###### Note:
###### - Potentially building a recomender system that can take a set of the most frequently played songs from one user, and match them with a second users profile; potentially between courting couples and friends
###### - Consider doing podcasts as a feature for the people that might be interested in that
###### - Also, consider calling the playlists `"{user_1} and {user_2}'s Playlist Baby"`

In [1]:
# imports

import sys
import json
import spotipy
import webbrowser
import numpy as np
import pandas as pd
from os import getenv
import spotipy.util as util
from dotenv import load_dotenv
from json.decoder import JSONDecodeError
from spotipy.oauth2 import SpotifyClientCredentials, SpotifyOAuth


notes: pivoting given that we cannot create two playlists for two users simultaneously
- we can create a single play list for one user given the other user's library
- so what we'll do it take the two user libraries and generate a playlist for a single user
- given the music from a second play list

In [2]:
# We are using the client Module from the Python library for the Spotify API
# (https://spotipy.readthedocs.io/en/2.13.0/#module-spotipy.client)
# Client Credentials Flow


load_dotenv()  # this imports all .env variables

# Setting up env variables to connect to API
uri = getenv('uri') # must match in the Spotify app dashboard
SPOTIFY_CLIENT_ID = getenv('SPOTIFY_CLIENT_ID')
SPOTIFY_CLIENT_SECRET = getenv('SPOTIFY_CLIENT_SECRET')
username = getenv('USER_ID')  #  user who's data we are collecting
# scope = 'playlist-modify-public'  #  determines the kind of access you have to a user profile
# scope = 'user-top-read'
scope = 'user-library-read'

# Access token to obtain user info
token = util.prompt_for_user_token(username='spotify',
                                   client_id=SPOTIFY_CLIENT_ID,
                                   client_secret=SPOTIFY_CLIENT_SECRET,
                                   scope=scope,
                                  redirect_uri=uri)

# activating spotify session
spotify_session = spotipy.Spotify(auth=token)

## Goals 
- Connect to user library using [scopes](https://developer.spotify.com/documentation/general/guides/scopes/)
- Scopes to connect to are [user-library-read](https://developer.spotify.com/documentation/general/guides/scopes/#user-library-read), [playlist-modify-public](https://developer.spotify.com/documentation/general/guides/scopes/#playlist-modify-public), and [user-top-read](https://developer.spotify.com/documentation/general/guides/scopes/#user-top-read)

##### **The goal here is to cnnect to the users' respective libraries, analyze them, and create a new playlist.**
- For this analysis I will exlpore both the Audio Analysis Objects and the Audio Features Objects.  From the looks of it, the Audio Features Objects might provide better ensight regarding the machine learning model.

In [3]:
# Playing with the api: accessing user top read, modify playlist, read library

# User top artists
top_artists = spotify_session.current_user_top_artists(limit=1)

# User top tracks
top_tracks = spotify_session.current_user_top_tracks(limit=50, time_range='medium_term')

# Exploring Track_id
top_tracks['items'][0]['id'] == "0akyEssGRVHstqCSWXusJL"
top_track_id = top_tracks['items'][0]['id']

# Top ten tracks ids
top_50_tracks_id = [top_tracks['items'][x]['id'] for x in range(len(top_tracks['items']))]
top_50_tracks_id

# Top Track Audio Analysis Object
audio_anal = spotify_session.audio_analysis(top_track_id)

# Top track Audio Features Object
audio_feat = spotify_session.audio_features(tracks=top_50_tracks_id)

In [4]:
# Obtainiing track-name and artist name

top_tracks.keys()
top_tracks['items'][0].keys()
top_tracks['items'][0]['name']                # Generates track name
top_tracks['items'][0]['artists'][0]['name']  # Generates artist name

'Ed Maverick'

In [5]:
# Generating lists containing Artist names and Track names

# Lists to be populated
track_names_lst = []
artist_names_lst = []

# Iterates over top tracks to append track/artist name to correpsonding list
for t in range(len(top_50_tracks_id)):
    track_names_lst.append(top_tracks['items'][t]['name'])
    artist_names_lst.append(top_tracks['items'][t]['artists'][0]['name'])

print(track_names_lst, '\n', artist_names_lst)

['Fuentes de Ortiz', 'Llevo', 'Wru - (donde estás)', 'River', 'Diez Pasos Hacia Ti', 'Catorce', 'Baby Blue', 'Tú', 'Ropa De Bazar', 'Gracias Por Nada', 'Somos Algo', 'Staring - Acoustic', 'Si Nos Dejan', 'Fluir', 'From the Dining Table', 'Watermelon Sugar', 'lo que pienso', 'Me Gustas', "Let's Fall In Love For The Night - One World: Together At Home", 'Catch Me I’m Falling', "I Don't Know You", 'El Amor de Mi Vida - Versión Acústica', 'Canela', 'Las Vacas', 'Mujer Distante', 'Three Lovers', 'Ansiedad', 'Bonita', 'Sin Ti Estoy Bien', 'Nos Queda Mucho Dolor Por Recorrer', 'Vámonos a Marte', 'Vino Tinto', 'Tierrita Mojada', 'Azul (with Rodrigo Amarante)', 'Gravity', 'Agua Con Chía', 'No Sé Decirte No', 'Marry Yourself', 'Del río', 'Globos', 'Dormir Contigo', 'El Amante', 'siempreestoypati', 'Ojos Café', 'a mis amigos', 'Acurrucar', 'Lluvias de Mayo', 'Inside Friend (feat. John Mayer)', 'New Girl', 'MY SUN'] 
 ['Ed Maverick', 'Luke Martinez', 'Ed Maverick', 'Leon Bridges', 'Daniel, Me Está

In [6]:
# Creating a dataframe from the previously created lists

track_artist_names = pd.DataFrame(list(zip(track_names_lst, artist_names_lst)), columns=['track name', 'artist'])
track_artist_names[:10]

Unnamed: 0,track name,artist
0,Fuentes de Ortiz,Ed Maverick
1,Llevo,Luke Martinez
2,Wru - (donde estás),Ed Maverick
3,River,Leon Bridges
4,Diez Pasos Hacia Ti,"Daniel, Me Estás Matando"
5,Catorce,Sebastián Romero
6,Baby Blue,Kevin Kaarl
7,Tú,maye
8,Ropa De Bazar,Ed Maverick
9,Gracias Por Nada,Jordano


In [7]:
# Audio Analysis Objects

# See the following link for discriptive information on objects
# (https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-analysis/#time-interval-object)

# The following are one of three kinds of objects: 'Time interval'(TI), 'Sections'(Sc), or 'Segments'(Sg)
# audio_anal['bars']   # (TI) Segment of time defined as given number of beats
# audio_anal['beats']  # (TI) Time intervals of beats throughout track
# audio_anal['sections']  # (Sc) Defined by large variations in rhythm
# audio_anal['segments']  # (Sg) Song subdivisions, attempting to with each segment containing roughly a consistent sound
# audio_anal['tatums']  # (TI) Beats are subdivisions of bars, and Tatums are subdivisions of beats

# consider taking podcasts into this shit; make a playlist baby with fill in the bank
# print(audio_anal.keys(),
#       '\n'*2,
#       audio_anal['track'].keys(),
#       '\n'*2,
#       top_50_tracks_id)

In [8]:
# Audio Features Objects

# See the reference README file for a description of the Audio Features Objects, or explore the following link
# (https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/)

# Note: 'key' is -1 if no key is detected.  Consider when training model, or processing data

# The following explores the keys of the Audio Features Object for a single track:
print("The following are the key value pairs contained in the audio features for 10 tracks:", '\n')

k_lst = list(audio_feat[0].keys())    # will eventually become the column names
lst_v_lst= []                         # a list of lists of values
for i in range(len(audio_feat)):      # for loop to run through the Audio Feat. Objects
    v_lst = []
    for _, v in audio_feat[i].items():  # for loop through the 0th item to append values
        v_lst.append(v)                 # values appended to list
    lst_v_lst.append(v_lst)         # list of values appending to list
# print(k_lst)
lst_v_lst[0]

The following are the key value pairs contained in the audio features for 10 tracks: 

['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms', 'time_signature']


[0.83,
 0.159,
 1,
 -14.461,
 1,
 0.0383,
 0.946,
 2.02e-05,
 0.362,
 0.189,
 104.95,
 'audio_features',
 '0akyEssGRVHstqCSWXusJL',
 'spotify:track:0akyEssGRVHstqCSWXusJL',
 'https://api.spotify.com/v1/tracks/0akyEssGRVHstqCSWXusJL',
 'https://api.spotify.com/v1/audio-analysis/0akyEssGRVHstqCSWXusJL',
 207400,
 4]

In [9]:
# Generating the dataframe for the tracks

# Take the two lists (K_LST, V_LST) and turn those into the dataframeh; the K_LST will be the column names
# V_LST will be the values; and they will be indexed by the track id's

top_tracks_df = pd.DataFrame(lst_v_lst, columns=k_lst)
top_tracks_df.head()

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.83,0.159,1,-14.461,1,0.0383,0.946,2e-05,0.362,0.189,104.95,audio_features,0akyEssGRVHstqCSWXusJL,spotify:track:0akyEssGRVHstqCSWXusJL,https://api.spotify.com/v1/tracks/0akyEssGRVHs...,https://api.spotify.com/v1/audio-analysis/0aky...,207400,4
1,0.726,0.125,5,-9.194,0,0.0803,0.835,0.0,0.131,0.277,92.23,audio_features,02gaYAEdeR6poHcBH1KUQF,spotify:track:02gaYAEdeR6poHcBH1KUQF,https://api.spotify.com/v1/tracks/02gaYAEdeR6p...,https://api.spotify.com/v1/audio-analysis/02ga...,183711,4
2,0.78,0.23,4,-12.706,1,0.0448,0.913,0.00279,0.0798,0.125,123.937,audio_features,6plO0gM4tUvRC9TKFGIuaN,spotify:track:6plO0gM4tUvRC9TKFGIuaN,https://api.spotify.com/v1/tracks/6plO0gM4tUvR...,https://api.spotify.com/v1/audio-analysis/6plO...,240307,4
3,0.658,0.179,8,-10.866,1,0.0448,0.689,0.0,0.17,0.191,128.128,audio_features,0NeJjNlprGfZpeX2LQuN6c,spotify:track:0NeJjNlprGfZpeX2LQuN6c,https://api.spotify.com/v1/tracks/0NeJjNlprGfZ...,https://api.spotify.com/v1/audio-analysis/0NeJ...,238560,4
4,0.77,0.325,7,-11.301,1,0.0322,0.899,0.000556,0.22,0.721,103.085,audio_features,54KsfVVnN4YWI2mMrnyUcC,spotify:track:54KsfVVnN4YWI2mMrnyUcC,https://api.spotify.com/v1/tracks/54KsfVVnN4YW...,https://api.spotify.com/v1/audio-analysis/54Ks...,209652,4


In [10]:
# Combining 'track/artist name' and 'top_tracks_df' DataFrames and droping useless columns

# Dopping columns from 'top_tracks_df'
drop_col = ['type', 'track_href', 'analysis_url']
top_tracks_df = top_tracks_df.drop(drop_col, axis=1)
top_tracks_df

# DataFrame containing the top 30 tracks for a given user
top_tracks_df = pd.concat([track_artist_names, top_tracks_df.reindex(track_artist_names.index)], axis=1)
top_tracks_df.head()

Unnamed: 0,track name,artist,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,id,uri,duration_ms,time_signature
0,Fuentes de Ortiz,Ed Maverick,0.83,0.159,1,-14.461,1,0.0383,0.946,2e-05,0.362,0.189,104.95,0akyEssGRVHstqCSWXusJL,spotify:track:0akyEssGRVHstqCSWXusJL,207400,4
1,Llevo,Luke Martinez,0.726,0.125,5,-9.194,0,0.0803,0.835,0.0,0.131,0.277,92.23,02gaYAEdeR6poHcBH1KUQF,spotify:track:02gaYAEdeR6poHcBH1KUQF,183711,4
2,Wru - (donde estás),Ed Maverick,0.78,0.23,4,-12.706,1,0.0448,0.913,0.00279,0.0798,0.125,123.937,6plO0gM4tUvRC9TKFGIuaN,spotify:track:6plO0gM4tUvRC9TKFGIuaN,240307,4
3,River,Leon Bridges,0.658,0.179,8,-10.866,1,0.0448,0.689,0.0,0.17,0.191,128.128,0NeJjNlprGfZpeX2LQuN6c,spotify:track:0NeJjNlprGfZpeX2LQuN6c,238560,4
4,Diez Pasos Hacia Ti,"Daniel, Me Estás Matando",0.77,0.325,7,-11.301,1,0.0322,0.899,0.000556,0.22,0.721,103.085,54KsfVVnN4YWI2mMrnyUcC,spotify:track:54KsfVVnN4YWI2mMrnyUcC,209652,4


In [11]:
top_tracks_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   track name        50 non-null     object 
 1   artist            50 non-null     object 
 2   danceability      50 non-null     float64
 3   energy            50 non-null     float64
 4   key               50 non-null     int64  
 5   loudness          50 non-null     float64
 6   mode              50 non-null     int64  
 7   speechiness       50 non-null     float64
 8   acousticness      50 non-null     float64
 9   instrumentalness  50 non-null     float64
 10  liveness          50 non-null     float64
 11  valence           50 non-null     float64
 12  tempo             50 non-null     float64
 13  id                50 non-null     object 
 14  uri               50 non-null     object 
 15  duration_ms       50 non-null     int64  
 16  time_signature    50 non-null     int64  
dtyp

In [12]:
# Function: Generates DataFrame with user library

Experimenting with a use and playlists

In [13]:
# user = spotify_session.user('gabriela_ayala19')
# user

# creates a playlist
# spotify_session.user_playlist_create(userid='some number', name='some name', public=True)

# deletes a playlist
# spotify_session.user_playlist_unfollow('37t3cvb5u3o97hin4bsj40abw', '6SKkyCwyHieUTNhiSxazhd')

In [15]:
# Getting the playlists for the user Spotify

playlists = spotify_session.user_playlists('spotify')
playlist_ids = []
while playlists:
    for i, playlist in enumerate(playlists['items']):
#         print(i +
#              1 +
#              playlists['offset'],
#              playlist['uri'].strip('spotify:playlist'))
        playlist_ids.append(playlist['id'])
    if playlists['next']:
        playlists = spotify_session.next(playlists)
    else:
        playlists = None

1 37i9dQZF1DXcBWIGoYBM5M
2 37i9dQZF1DX0XUsuxWHRQd
3 37i9dQZF1DX1lVhptIYRd
4 37i9dQZF1DX10zKzsJ2jv
5 37i9dQZF1DX4JAvHpjipBk
6 37i9dQZF1DX4dyzvuaRJ0n
7 37i9dQZF1DX4SBhb3fqCJd
8 37i9dQZF1DWXRqgorJj26U
9 37i9dQZF1DX4sWSpwq3LiO
10 37i9dQZF1DXcF6B6QPhFDv
11 37i9dQZF1DWXJfnUiYjUKT
12 37i9dQZF1DXcRXFNfZr7T
13 37i9dQZF1DX4o1oenSJRJd
14 37i9dQZF1DXbTxeAdrVG2
15 37i9dQZF1DX4UtSsGT1Sbe
16 37i9dQZF1DWTJ7xPn4vNaz
17 37i9dQZF1DXaKIA8E7WcJj
18 37i9dQZF1DWSV3Tk4GO2fq
19 37i9dQZF1DWTwnEm1IYyoj
20 37i9dQZF1DX2A29LI7xHn1
21 37i9dQZF1DX2RxBh64BHjQ
22 37i9dQZF1DWVA1Gq4XHa6U
23 37i9dQZF1DWY4xHQp97fN6
24 37i9dQZF1DWX3387IZmjN
25 37i9dQZF1DWYkaDif7Ztb
26 37i9dQZF1DX5hR0J49CmXC
27 37i9dQZF1DXan38dNVDdl4
28 37i9dQZF1DWSvKsRPPnv5
29 37i9dQZF1DWUVpAXiEPK8P
30 37i9dQZF1DX0Tkc6ltcBfU
31 37i9dQZF1DX1YPTAhwehsC
32 37i9dQZF1DWTggY0yqBxES
33 37i9dQZF1DX0HRj9P7NxeE
34 37i9dQZF1DWT6SJaitNDax
35 37i9dQZF1DX2r0FByV5U4C
36 37i9dQZF1DXaKctwWdt4be
37 37i9dQZF1DWT2jS7NwYPVI
38 37i9dQZF1DX82GYcclJ3Ug
39 37i9dQZF1DX49jUV2NfGku
40

In [52]:
# playing with the 'playlist_tracks' method to see how to obtain large number of track ids

response = spotify_session.playlist_tracks(playlist_ids[0],
                                         offset=1,
                                         fields='items.track.id')

[{'track': {'id': '4Oun2ylbjFKMPTiaSbbCih'}},
 {'track': {'id': '6UelLqGlWMcVH1E5c4H7lY'}},
 {'track': {'id': '7ytR5pFWmSjzHJIeQkgog4'}},
 {'track': {'id': '2SAqBLGA283SUiwJ3xOUVI'}},
 {'track': {'id': '3H7ihDc1dqLriiWXwsc2po'}},
 {'track': {'id': '5T490vvoFNU6psep0NPmxs'}},
 {'track': {'id': '27ycaQnQAxaPiyeg3nr2aB'}},
 {'track': {'id': '4wosxLl0mAqhneDzya2MfY'}},
 {'track': {'id': '551xyaSJsg8hILXFq9JdST'}},
 {'track': {'id': '0EhpEsp4L0oRGM0vmeaN5e'}},
 {'track': {'id': '6wQlQrTY5mVS8EGaFZVwVF'}},
 {'track': {'id': '3tjFYV6RSFtuktYl3ZtYcq'}},
 {'track': {'id': '1IIKrJVP1C9N7iPtG6eOsK'}},
 {'track': {'id': '1raaNykBg1bDnWENUiglUA'}},
 {'track': {'id': '59qrUpoplZxbIZxk6X0Bm3'}},
 {'track': {'id': '7y7w4tl4MaRC2UMEj1mPtr'}},
 {'track': {'id': '5f1joOtoMeyppIcJGZQvqJ'}},
 {'track': {'id': '2ygvZOXrIeVL4xZmAWJT2C'}},
 {'track': {'id': '24Yi9hE78yPEbZ4kxyoXAI'}},
 {'track': {'id': '3kwgqoBqTwoAH4nT29TYrq'}},
 {'track': {'id': '6o3QUC5oAE4g6WxRIFcZtb'}},
 {'track': {'id': '45bE4HXI0AwGZXf

In [53]:
# Obtaining track IDs for tracks in a playlist

trx = []
for i in playlist_ids:
    offset = 0
    while True:
        response = spotify_session.playlist_tracks(i,
                                                   offset=offset,
                                                   fields='items.track.id')
#         trx.append(response['items'])
        offset = offset + len(response['items'])
        if len(response['items']) == 0:
            break
        trx.append(response['items'])

retrying ...1secs
retrying ...1secs


In [None]:
# Dropping empty lists

for k, v in enumerate(trx):
    if len(trx[k]) == 0:
#         print(trx[k])
        trx.pop(k)

In [56]:
# Creating a list of track-id strings

track_ids = []

for lst in trx:
#     print(lst)
    for tracks in lst:
        if tracks['track'] == None:
            continue
        track_ids.append(tracks['track']['id'])
#         print(tracks['track']['id'])

In [58]:
len(track_ids)

104195

In [94]:
# Removing None type track ids

for k, track in enumerate(track_ids):
    if track == None:
        track_ids.pop(k)
len(track_ids)

104177

In [165]:
# Need to figure out a way to get the audio features for all 38k, or even potentially more

# pseudo code
token = spotipy.prompt_for_user_token('agustinvargas',
                                     client_id=SPOTIFY_CLIENT_ID,
                                     client_secret=SPOTIFY_CLIENT_SECRET,
                                     redirect_uri=uri)

spotify_session = spotipy.Spotify(auth=token)

# end_offset = 0
# start_offset = 0
# while end_offset <= len(track_ids):
#     start_offset = start_offset + end_offset
#     end_offset = start_offset + 50
#     get the audio_feats for the first 50
#     
len(track_ids)
j = track_ids[0]
audio_feat_2 = spotify_session.audio_features(tracks=j)
for _, v in audio_feat_2[0].items():
    print(v)
print(audio_feat_2[0])

0.746
0.765
6
-4.41
0
0.0993
0.0112
0
0.0936
0.737
114.044
audio_features
0v1x6rN6JHRapa03JElljE
spotify:track:0v1x6rN6JHRapa03JElljE
https://api.spotify.com/v1/tracks/0v1x6rN6JHRapa03JElljE
https://api.spotify.com/v1/audio-analysis/0v1x6rN6JHRapa03JElljE
199054
4
{'danceability': 0.746, 'energy': 0.765, 'key': 6, 'loudness': -4.41, 'mode': 0, 'speechiness': 0.0993, 'acousticness': 0.0112, 'instrumentalness': 0, 'liveness': 0.0936, 'valence': 0.737, 'tempo': 114.044, 'type': 'audio_features', 'id': '0v1x6rN6JHRapa03JElljE', 'uri': 'spotify:track:0v1x6rN6JHRapa03JElljE', 'track_href': 'https://api.spotify.com/v1/tracks/0v1x6rN6JHRapa03JElljE', 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/0v1x6rN6JHRapa03JElljE', 'duration_ms': 199054, 'time_signature': 4}


In [216]:

spot_cc = spotipy.oauth2.SpotifyOAuth(username='agustinvargas',
                                      client_id=SPOTIFY_CLIENT_ID,
                                      client_secret=SPOTIFY_CLIENT_SECRET,
                                      redirect_uri=uri)
# token = spot_cc.get_access_token('agustinvargas',
#                                      client_id=SPOTIFY_CLIENT_ID,
#                                      client_secret=SPOTIFY_CLIENT_SECRET,
#                                      redirect_uri=uri)
# spotipy.Spotify.is_token_expired(token)
accs_token = spot_cc.get_access_token(as_dict=True)
token_exp = spot_cc.is_token_expired(accs_token)
refresh_accs_token = spot_cc.refresh_access_token(accs_token['refresh_token'])
# refresh_accs_token['access_token']
# refresh_accs_token
# accs_token
# token_exp





  # Remove the CWD from sys.path while we load stuff.


'BQDFWzUnjMKHPGBI87uYEePrtI5Iv_QIY4frXZc8-OnWJfAdoNrlbkhaOmHa6NhY5wh2TVsLxGxHqksqJkEtlQrfmKVzuhzfjr79GpntIpVeAm_lCiU8dryB2Kyd3PMzWhfgLqKiOrajL16heknHCm1sHDEly3_w05bOx9VKxqGOKCYzFJnRwNHrpgSdUQZYTp_tP_3_86o5tk4j9akt1blq9GN4'

In [225]:
accs_token = spot_cc.refresh_access_token(accs_token['refresh_token'])
token_exp = spot_cc.is_token_expired(accs_token)
token_exp

False

In [226]:
spotify_session = spotipy.client.Spotify(auth=accs_token['access_token'])
accs_token = spot_cc.get_access_token(as_dict=True)
token_exp = spot_cc.is_token_expired(accs_token)
lst_v_lst= []                         # a list of lists of values
for j in track_ids:
    token_exp = spot_cc.is_token_expired(accs_token)
    if token_exp == False:
        audio_feat = spotify_session.audio_features(tracks=j)
        if audio_feat[0] is None:
            continue
        else:
            v_lst = []
            for _, v in audio_feat[0].items():  # for loop through the 0th item to append values
                v_lst.append(v)                 # values appended to list
            lst_v_lst.append(v_lst)
    else:
        accs_token = spot_cc.refresh_access_token(accs_token['refresh_token'])
        token_exp = spot_cc.is_token_expired(accs_token)
        spotify_session = spotipy.client.Spotify(auth=accs_token['access_token'])





  


retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...2secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...2secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying .

In [241]:
# Collecting IDs from the songs that generated Audio Features

# list to be populated
track_ids_aud_feat = []

for k, v in enumerate(lst_v_lst):
    track_ids_aud_feat.append(lst_v_lst[k][12])
len(track_ids_aud_feat)

104144

In [None]:
# Generating lists containing Artist names and Track names

# refreshing token
accs_token = spot_cc.refresh_access_token(accs_token['refresh_token'])
spotify_session = spotipy.client.Spotify(auth=accs_token['access_token'])

# Getting artist name and track name (test)
track = spotify_session.track(track_ids_aud_feat[0])
print(track['artists'][0]['name'])    # Generates artist name
print(track['name'])                  # Generates track name      

# Lists to be populated
track_names_lst = []
artist_names_lst = []

# Iterating over list of track IDs (track_ids_aud_feat)
for j in track_ids_aud_feat:
    token_exp = spot_cc.is_token_expired(accs_token)
    if token_exp == False:
        track = spotify_session.track(j)
        track_names_lst.append(track['name'])
        artist_names_lst.append(track['artists'][0]['name'])
    else:
        accs_token = spot_cc.refresh_access_token(accs_token['refresh_token'])
        token_exp = spot_cc.is_token_expired(accs_token)
        spotify_session = spotipy.client.Spotify(auth=accs_token['access_token'])

BTS
Dynamite
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1secs
retrying ...1secs
retrying ...2secs
retrying ...1se

In [266]:
len(track_names_lst), len(artist_names_lst)

['BTS',
 'Cardi B',
 'Harry Styles',
 'DaBaby',
 'Drake',
 'Topic',
 'Jawsh 685',
 'Miley Cyrus',
 'Joel Corry',
 'Travis Scott',
 'J Balvin',
 'Juice WRLD',
 '24kGoldn',
 'Chris Brown']

In [229]:
# Generating a DF from the obtained audio features

full_df = pd.DataFrame(lst_v_lst, columns=k_lst)
full_df.head()

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.746,0.765,6,-4.41,0,0.0993,0.0112,0.0,0.0936,0.737,114.044,audio_features,0v1x6rN6JHRapa03JElljE,spotify:track:0v1x6rN6JHRapa03JElljE,https://api.spotify.com/v1/tracks/0v1x6rN6JHRa...,https://api.spotify.com/v1/audio-analysis/0v1x...,199054,4
1,0.935,0.454,1,-7.509,1,0.375,0.0194,0.0,0.0824,0.357,133.073,audio_features,4Oun2ylbjFKMPTiaSbbCih,spotify:track:4Oun2ylbjFKMPTiaSbbCih,https://api.spotify.com/v1/tracks/4Oun2ylbjFKM...,https://api.spotify.com/v1/audio-analysis/4Oun...,187541,4
2,0.548,0.816,0,-4.209,1,0.0465,0.122,0.0,0.335,0.557,95.39,audio_features,6UelLqGlWMcVH1E5c4H7lY,spotify:track:6UelLqGlWMcVH1E5c4H7lY,https://api.spotify.com/v1/tracks/6UelLqGlWMcV...,https://api.spotify.com/v1/audio-analysis/6Uel...,174000,4
3,0.746,0.69,11,-7.956,1,0.164,0.247,0.0,0.101,0.497,89.977,audio_features,7ytR5pFWmSjzHJIeQkgog4,spotify:track:7ytR5pFWmSjzHJIeQkgog4,https://api.spotify.com/v1/tracks/7ytR5pFWmSjz...,https://api.spotify.com/v1/audio-analysis/7ytR...,181733,4
4,0.761,0.518,0,-8.871,1,0.134,0.244,3.5e-05,0.107,0.522,133.976,audio_features,2SAqBLGA283SUiwJ3xOUVI,spotify:track:2SAqBLGA283SUiwJ3xOUVI,https://api.spotify.com/v1/tracks/2SAqBLGA283S...,https://api.spotify.com/v1/audio-analysis/2SAq...,261493,4


In [231]:
# Dopping columns from 'top_tracks_df'
drop_col = ['type', 'track_href', 'analysis_url']
full_df = full_df.drop(drop_col, axis=1)
full_df.head(3)

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,id,uri,duration_ms,time_signature
0,0.746,0.765,6,-4.41,0,0.0993,0.0112,0.0,0.0936,0.737,114.044,0v1x6rN6JHRapa03JElljE,spotify:track:0v1x6rN6JHRapa03JElljE,199054,4
1,0.935,0.454,1,-7.509,1,0.375,0.0194,0.0,0.0824,0.357,133.073,4Oun2ylbjFKMPTiaSbbCih,spotify:track:4Oun2ylbjFKMPTiaSbbCih,187541,4
2,0.548,0.816,0,-4.209,1,0.0465,0.122,0.0,0.335,0.557,95.39,6UelLqGlWMcVH1E5c4H7lY,spotify:track:6UelLqGlWMcVH1E5c4H7lY,174000,4


In [262]:
# Exporting dataframe of +100k songs to csv file
# full_df.to_csv(r'/Users/flanuer/Downloads/Lambda/Course_material/misc_datasets/100k_song_aud_feat.csv')

# Baseline Explorations
- Select type of problem type (class/reg)
- Determine model baselines
- Model evaluations/comparisons

In [171]:
# About needs description from me