# Spotify Song Data Scraping Sandbox

### References:
#### Based on the following tutorials: <br />
Max Hilsdorf, "How to Create Large Music Datasets Using Spotipy", <i>Towards Data Science</i>, 25 April 2020: <br />
https://towardsdatascience.com/how-to-create-large-music-datasets-using-spotipy-40e7242cc6a6 <br />
Max Tingle, "Getting Started with Spotify’s API & Spotipy", <i>Towards Data Science</i>, 3 Oct 2019: <br />
https://medium.com/@maxtingle/getting-started-with-spotifys-api-spotipy-197c3dc6353b <br />
Sandra Radgowska, "How to use Spotify API and what data science opportunities can it open up?", <i>My Journey As A Data Scientist</i>, 18 August 2021:<br />
https://datascientistdiary.com/index.php/2021/03/04/how-to-use-spotify-api-and-what-data-science-opportunities-can-it-open-up/<br />
Angelica Dietzel, "How to Extract Any Artist’s Data Using Spotify’s API, Python, and Spotipy", <i>Better Programming</i>, 25 March 2020:<br />
https://betterprogramming.pub/how-to-extract-any-artists-data-using-spotify-s-api-python-and-spotipy-4c079401bc37<br />
StackOverflow: Spotipy: How To Read More Than 100 Tracks From A Playlist:<br />
https://stackoverflow.com/questions/39086287/spotipy-how-to-read-more-than-100-tracks-from-a-playlist<br />
Github: How Do I Get Every Track of A Playlist:<br /> https://github.com/plamere/spotipy/issues/246

## Setup

### Import packages

In [179]:
import json
import time
from tqdm import tqdm
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import creds

### Spotify Credentials

#### Load credentials

Loads the `creds.py` file, which defines the two variables `client_id` and `secret` as shown below. The file is gitignored so that the credentials are not shared with the repository.

client_id = 'Your Client ID Here'<br />
secret = 'Your secret here'
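
As an alternative to `%run`, the two variables can be read without executing the file inside the notebook namespace. This is a minimal sketch, assuming `creds.py` contains only the two assignments shown above; `load_credentials` is a hypothetical helper, not part of Spotipy.

```python
import runpy


def load_credentials(path="creds.py"):
    """Run a creds.py-style file in isolation and return (client_id, secret)."""
    # run_path executes the file and returns its global variables as a dict
    namespace = runpy.run_path(path)
    try:
        return namespace["client_id"], namespace["secret"]
    except KeyError as missing:
        raise KeyError(f"{path} does not define {missing}") from None
```

The returned pair could then be passed to `SpotifyClientCredentials(client_id=..., client_secret=...)` in the same way as the variables loaded by `%run`.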

In [None]:
%run -i 'creds.py'

#### Set credentials

In [None]:
client_credentials_manager = SpotifyClientCredentials(client_id=client_id,client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

## Functions for data extraction

### Get track data including features (updated to ensure retrieval of all artists)
#### Details: uri, name, album, artist name, release date, explicit T/F, duration in mins
#### Audio features: acousticness, danceability, energy, instrumentalness, liveness, loudness, speechiness, tempo, time_signature

#### Function to extract all the track ids from your playlist (returns only 100 items)

In [147]:
def get_track_ids(playlist_id):
    music_id_list = []
    playlist = sp.playlist(playlist_id)
    for item in playlist['tracks']['items']:
        music_track = item['track']
        music_id_list.append(music_track['id'])
    return music_id_list 

In [148]:
music_id_list = get_track_ids('3avCwQPH6DkhMTRsizon7N')
print(len(music_id_list))

100
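
The count of 100 reflects paging: each response carries one page of items plus a `next` link, and the remaining pages have to be requested separately. The sketch below shows that loop in isolation. `FakeClient` is a hypothetical stand-in for the Spotipy client so that the example runs offline; with the real client, `sp.next(page)` would play the role of `client.next(page)`.

```python
def collect_all_items(first_page, client):
    """Follow 'next' links until every page's items have been gathered."""
    items = list(first_page["items"])
    page = first_page
    while page["next"]:
        page = client.next(page)
        items.extend(page["items"])
    return items


class FakeClient:
    """Hypothetical offline stand-in that serves pre-built pages."""

    def __init__(self, pages):
        self.pages = pages

    def next(self, page):
        # Look up the following page by the key stored in 'next'
        return self.pages[page["next"]]


# Two pages: 100 items first, then 50 more
pages = {"page2": {"items": list(range(100, 150)), "next": None}}
first = {"items": list(range(100)), "next": "page2"}
all_items = collect_all_items(first, FakeClient(pages))
print(len(all_items))
```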


#### Function to extract all album artists given a track id:

In [None]:
def get_all_album_artists_names(track_id):
    meta = sp.track(track_id)
    album_artist_list = []
    for item in (meta['album']['artists']):
        album_artist = item['name']
        album_artist_list.append(album_artist)
    return album_artist_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [None]:
get_all_album_artists_names('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')
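
The test id above carries a `?si=...` suffix, which looks like the query string Spotify appends to links copied from its share menu. The calls in this notebook accept it, but a clean id is safer to store or compare. A small sketch for stripping it; `clean_spotify_id` is a hypothetical helper name:

```python
def clean_spotify_id(raw_id):
    """Drop any query string (such as '?si=...') from a copied Spotify id."""
    return raw_id.split("?", 1)[0].strip()


cleaned = clean_spotify_id("3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a")
print(cleaned)
```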

#### Function to extract all the track artists given the track id:

In [None]:
def get_all_track_artists_names(track_id):
    meta = sp.track(track_id)
    track_artist_list = []
    for item in (meta['artists']):
        track_artist = item['name']
        track_artist_list.append(track_artist)
    return track_artist_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [None]:
get_all_track_artists_names('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

#### Function to extract all the track artists' ids given a track id:

In [None]:
def get_all_track_artists_ids(track_id):
    meta = sp.track(track_id)
    track_artist_id_list = []
    for item in (meta['artists']):
        track_artist = item['id']
        track_artist_id_list.append(track_artist)
    return track_artist_id_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [None]:
get_all_track_artists_ids('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

#### Function to extract all the details and features of each track by passing its ID:

In [None]:
def get_track_data(track_id):
    meta = sp.track(track_id)
    features = sp.audio_features(track_id)
    analysis = sp.audio_analysis(track_id)
    track_details = {'uri': meta['uri'],
                    'name': meta['name'],
                    'track_artists': get_all_track_artists_names(track_id),
                    'track_artists_ids': get_all_track_artists_ids(track_id),
                    'album': meta['album']['name'],
                    'album_artists': get_all_album_artists_names(track_id),
                    'release_date': meta['album']['release_date'],
                    'explicit': meta['explicit'],
                    'duration_in_mins': round((meta['duration_ms'] * 0.001) / 60.0, 2),
                    'acousticness' : features[0]['acousticness'],
                    'danceability' : features[0]['danceability'],
                    'energy' : features[0]['energy'],
                    'instrumentalness' : features[0]['instrumentalness'],
                    'liveness' : features[0]['liveness'],
                    'loudness' : features[0]['loudness'],
                    'speechiness' : features[0]['speechiness'],
                    'tempo' : features[0]['tempo'],
                    'time_signature' : features[0]['time_signature'],
                    'track_duration_in_seconds' : analysis['track']['duration'],
                    'end_of_fade_in' : analysis['track']['end_of_fade_in'],
                    'start_of_fade_out' : analysis['track']['start_of_fade_out'],
                    'key' : analysis['track']['key'],
                    'mode' : analysis['track']['mode']
                    }
    return track_details
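
`get_track_data` requests the same track several times: once directly and once inside each artist helper. A possible refinement, sketched here on a plain dictionary, is to derive the artist lists from a single `meta` response. `artist_fields` is a hypothetical helper, and `sample_meta` is a hand-written fragment containing only the keys the functions above read, with values copied from test outputs elsewhere in this notebook.

```python
def artist_fields(meta):
    """Extract track and album artist names and ids from one track response."""
    return {
        "track_artists": [a["name"] for a in meta["artists"]],
        "track_artists_ids": [a["id"] for a in meta["artists"]],
        "album_artists": [a["name"] for a in meta["album"]["artists"]],
    }


# Hand-written fragment of a track response (assumption: only these keys matter)
sample_meta = {
    "artists": [
        {"name": "Future", "id": "1RyvyyTE3xzB2ZywiAwp0i"},
        {"name": "Juice WRLD", "id": "4MCBfE4596Uoi2O4DtmEMz"},
        {"name": "Young Thug", "id": "50co4Is1HCEo8bhOyUWKpn"},
    ],
    "album": {
        "artists": [
            {"name": "Future", "id": "1RyvyyTE3xzB2ZywiAwp0i"},
            {"name": "Juice WRLD", "id": "4MCBfE4596Uoi2O4DtmEMz"},
        ]
    },
}
fields = artist_fields(sample_meta)
print(fields["track_artists"])
```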

#### Loop function possibility 1: 

In [117]:
def get_playlist_tracks(playlist_id):
    # Fetch the first page, then follow the 'next' links until all pages are read
    results = sp.playlist_tracks(playlist_id)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks

###  Extract track data

#### Create track container list

In [None]:
tracks_with_allartists = []

#### Extract info of each track

For testing:  playlist_id = '27Wi4y5VlHr43Q6UpZMVyS'

In [None]:
# Get the ids for all the songs in your playlist
playlist_id = input('Enter the playlist id')
track_ids = get_track_ids(playlist_id)
print(len(track_ids))
print(track_ids)

#  Loop over track ids and get their data points
for i in tqdm(range(len(track_ids))):
    time.sleep(.5)
    track = get_track_data(track_ids[i])
    tracks_with_allartists.append(track)

#### Create dataframe

In [None]:
df_allartists = pd.DataFrame(tracks_with_allartists)
df_allartists

In [None]:
df_allartists['track_artists'][2]

### Get artist data (id, artist name, genre, popularity, followers)

#### Extract track artist id column

In [None]:
artist_ids = df_allartists['track_artists_ids']
artist_ids

#### Explode column

In [None]:
# artist_ids is a Series, so explode() takes no column name
splody_ids = artist_ids.explode(ignore_index=True)
id_df = pd.DataFrame(splody_ids)
id_df

#### Remove duplicates

In [None]:
id_df2 = id_df.drop_duplicates(subset=['track_artists_ids'], keep='first')
id_df2

#### Convert to list

In [None]:
artist_id_list = id_df2['track_artists_ids'].tolist()
artist_id_list
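
The explode, drop-duplicates, and to-list steps above amount to flattening a list of lists while keeping the first occurrence of each id. For reference, the same result can be sketched without pandas. `unique_ids` is a hypothetical helper, and the ids are taken from outputs shown in this notebook.

```python
from itertools import chain


def unique_ids(nested_ids):
    """Flatten lists of ids and keep the first occurrence of each, in order."""
    # dict.fromkeys preserves insertion order and discards repeats
    return list(dict.fromkeys(chain.from_iterable(nested_ids)))


nested = [
    ["246dkjvS1zLTtiykXe5h60", "1URnnhqYAYcrqrcwql10ft"],
    ["06HL4z0CvFAxyc27GXpf02"],
    ["246dkjvS1zLTtiykXe5h60", "1URnnhqYAYcrqrcwql10ft"],
]
flat_ids = unique_ids(nested)
print(flat_ids)
```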

#### Function to extract all the details of each artist by passing their ID:

In [None]:
def get_artist_data(artist_id):
    meta = sp.artist(artist_id)
    artist_details = {'artist id': meta['id'],
                    'artist name': meta['name'],
                    'genres': meta['genres'],
                    'popularity': meta['popularity'],
                    'followers': meta['followers']['total']
                    }
    return artist_details

####  Extract artist data

Extract artist data from list

In [None]:
artists = []
#  Loop over artist ids and get their data points
for i in tqdm(range(len(artist_id_list))):
    time.sleep(.5)
    artist = get_artist_data(artist_id_list[i])
    artists.append(artist)

In [None]:
artists

#### Create dataframe

In [None]:
artist_df = pd.DataFrame(artists)
artist_df

# =================================================

## Self Contained Loop Example

### Supporting function definitions

#### Album artists names function

In [182]:
def get_all_album_artists_names(track_id):
    meta = sp.track(track_id)
    album_artist_list = []
    for item in (meta['album']['artists']):
        album_artist = item['name']
        album_artist_list.append(album_artist)
    return album_artist_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [181]:
get_all_album_artists_names('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['Future', 'Juice WRLD']

#### Album artists ids function

In [187]:
def get_all_album_artists_ids(track_id):
    meta = sp.track(track_id)
    album_artist_id_list = []
    for item in (meta['album']['artists']):
        album_artist_id = item['id']
        album_artist_id_list.append(album_artist_id)
    return album_artist_id_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [188]:
get_all_album_artists_ids('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['1RyvyyTE3xzB2ZywiAwp0i', '4MCBfE4596Uoi2O4DtmEMz']

#### Album artists genres function

In [205]:
def get_all_album_artists_genres(track_id):
    meta = sp.track(track_id)
    album_artist_genre_list = []
    for item in (meta['album']['artists']):
        album_artist_id = item['id']
        album_artist_genres = sp.artist(album_artist_id)['genres']
        album_artist_genre_list.append(album_artist_genres)
    return album_artist_genre_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [206]:
get_all_album_artists_genres('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

[['atl hip hop', 'hip hop', 'pop rap', 'rap', 'southern hip hop', 'trap'],
 ['chicago rap', 'melodic rap']]

#### Album artists popularity function

In [209]:
def get_all_album_artists_popularity(track_id):
    meta = sp.track(track_id)
    album_artist_popularity_list = []
    for item in (meta['album']['artists']):
        album_artist_id = item['id']
        album_artist_popularity = sp.artist(album_artist_id)['popularity']
        album_artist_popularity_list.append(album_artist_popularity)
    return album_artist_popularity_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [210]:
get_all_album_artists_popularity('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

[91, 97]

#### Album artists followers function

In [213]:
def get_all_album_artists_followers(track_id):
    meta = sp.track(track_id)
    album_artist_followers_list = []
    for item in (meta['album']['artists']):
        album_artist_id = item['id']
        album_artist_followers = sp.artist(album_artist_id)['followers']['total']
        album_artist_followers_list.append(album_artist_followers)
    return album_artist_followers_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [214]:
get_all_album_artists_followers('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

[10972060, 21500025]

#### Track artists names function

In [189]:
def get_all_track_artists_names(track_id):
    meta = sp.track(track_id)
    track_artist_list = []
    for item in (meta['artists']):
        track_artist = item['name']
        track_artist_list.append(track_artist)
    return track_artist_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [190]:
get_all_track_artists_names('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['Future', 'Juice WRLD', 'Young Thug']

#### Track artists ids function

In [191]:
def get_all_track_artists_ids(track_id):
    meta = sp.track(track_id)
    track_artist_id_list = []
    for item in (meta['artists']):
        track_artist = item['id']
        track_artist_id_list.append(track_artist)
    return track_artist_id_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [192]:
get_all_track_artists_ids('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['1RyvyyTE3xzB2ZywiAwp0i', '4MCBfE4596Uoi2O4DtmEMz', '50co4Is1HCEo8bhOyUWKpn']

#### Track artists genres function

In [215]:
def get_all_track_artists_genres(track_id):
    meta = sp.track(track_id)
    track_artist_genre_list = []
    for item in (meta['artists']):
        track_artist_id = item['id']
        track_artist_genres = sp.artist(track_artist_id)['genres']
        track_artist_genre_list.append(track_artist_genres)
    return track_artist_genre_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [216]:
get_all_track_artists_genres('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

[['atl hip hop', 'hip hop', 'pop rap', 'rap', 'southern hip hop', 'trap'],
 ['chicago rap', 'melodic rap'],
 ['atl hip hop',
  'atl trap',
  'gangster rap',
  'hip hop',
  'melodic rap',
  'rap',
  'trap']]

#### Track artists popularity function

In [221]:
def get_all_track_artists_popularity(track_id):
    meta = sp.track(track_id)
    track_artist_popularity_list = []
    for item in (meta['artists']):
        track_artist_id = item['id']
        track_artist_popularity = sp.artist(track_artist_id)['popularity']
        track_artist_popularity_list.append(track_artist_popularity)
    return track_artist_popularity_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [222]:
get_all_track_artists_popularity('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

[91, 97, 90]

#### Track artists followers function

In [223]:
def get_all_track_artists_followers(track_id):
    meta = sp.track(track_id)
    track_artist_followers_list = []
    for item in (meta['artists']):
        track_artist_id = item['id']
        track_artist_followers = sp.artist(track_artist_id)['followers']['total']
        track_artist_followers_list.append(track_artist_followers)
    return track_artist_followers_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [224]:
get_all_track_artists_followers('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

[10972060, 21500025, 6833604]

### Playlist data return function

In [266]:
def get_playlist_tracks_more_than_100_songs(username, playlist_id):
    # Page through the playlist; each request returns at most 100 items
    results = sp.user_playlist_tracks(username, playlist_id)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])

    # Audio features copied into each row, in output column order
    feature_columns = ['danceability', 'energy', 'key', 'loudness', 'mode',
                       'acousticness', 'instrumentalness', 'liveness',
                       'valence', 'tempo', 'duration_ms', 'time_signature']

    rows = []
    for i, item in enumerate(tqdm(tracks)):
        try:
            track = item['track']
            track_id = track['id']
            features = sp.audio_features(track_id)[0]
            row = {'playlist_id': playlist_id,
                   'position': i + 1,
                   'album_name': track['album']['name'],
                   'album_id': track['album']['id'],
                   'album_release_date': track['album']['release_date'],
                   'album_artists': get_all_album_artists_names(track_id),
                   'album_artists_ids': get_all_album_artists_ids(track_id),
                   'album_artists_genres': get_all_album_artists_genres(track_id),
                   'album_artists_popularity': get_all_album_artists_popularity(track_id),
                   'album_artists_followers': get_all_album_artists_followers(track_id),
                   'track_name': track['name'],
                   'track_id': track_id,
                   'track_popularity': track['popularity'],
                   # Track-level fields use the track artist helpers, not the album ones
                   'track_artists': get_all_track_artists_names(track_id),
                   'track_artists_ids': get_all_track_artists_ids(track_id),
                   'track_artists_genres': get_all_track_artists_genres(track_id),
                   'track_artists_popularity': get_all_track_artists_popularity(track_id),
                   'track_explicit': track['explicit'],
                   'track_artists_followers': get_all_track_artists_followers(track_id)}
            for column in feature_columns:
                row[column] = features[column]
            rows.append(row)
        except Exception as err:
            # Skip tracks whose data cannot be retrieved, but report them
            print(f'Skipping position {i + 1}: {err}')

    # One row per track; list-valued cells (artists, genres) stay as lists
    return pd.DataFrame(rows)

#### Playlist data return test

In [267]:
chart2021 = get_playlist_tracks_more_than_100_songs('katiekellert', '3avCwQPH6DkhMTRsizon7N')

100%|██████████| 200/200 [05:45<00:00,  1.73s/it]
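
The progress bar reports 5 minutes 45 seconds for 200 tracks. Much of that time plausibly goes to repeated lookups, since the helper functions request the same track and the same artists many times. One possible mitigation is to memoize lookups by id. The sketch below is an illustration under assumptions: `make_cached` is a hypothetical helper, and `fake_artist_lookup` stands in for `sp.artist` so that the example runs offline.

```python
def make_cached(fetch):
    """Wrap a one-argument lookup so each distinct id is fetched only once."""
    cache = {}

    def cached(item_id):
        if item_id not in cache:
            cache[item_id] = fetch(item_id)
        return cache[item_id]

    return cached


calls = []


def fake_artist_lookup(artist_id):
    # Hypothetical stand-in for sp.artist; records each real lookup
    calls.append(artist_id)
    return {"id": artist_id, "genres": []}


get_artist = make_cached(fake_artist_lookup)
for artist_id in ["a1", "a2", "a1", "a1"]:
    get_artist(artist_id)
print(len(calls))
```

With the real client this would be `get_artist = make_cached(sp.artist)`. Cached values can go stale in a long-running session, which matters most for popularity and follower counts.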


In [270]:
chart2021['track_explicit']

0      False
1       True
2       True
3      False
4       True
       ...  
195    False
196    False
197    False
198    False
199    False
Name: track_explicit, Length: 200, dtype: bool

# =====================================================

# WIP AREA

### Get artist IDs from playlists or tracks

#### Create artist container list

* Note: because the `artists` list is created in its own cell rather than in the cell that runs the extraction loop, re-running the loop appends to the same list and produces duplicate entries.

In [271]:
artists = []
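
One way to avoid the accumulation described in the note is to build the list inside a function, so that every call starts empty. A minimal sketch; `collect_details` is a hypothetical helper and `fake_fetch` stands in for `get_artist_data`:

```python
def collect_details(ids, fetch):
    """Return a new list of fetch(id) results; nothing persists between calls."""
    details = []
    for item_id in ids:
        details.append(fetch(item_id))
    return details


def fake_fetch(item_id):
    # Hypothetical stand-in for get_artist_data
    return {"artist id": item_id}


first_run = collect_details(["a", "b"], fake_fetch)
second_run = collect_details(["a", "b"], fake_fetch)
print(len(first_run), len(second_run))
```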

#### Function to extract all of the tracks' artist ids from your playlist:

In [272]:
def get_artist_ids(playlist_id):
    artist_id_list = []
    playlist = sp.playlist(playlist_id)
    for item in playlist['tracks']['items']:
        music_track = item['track']
        artist_id_list.append(music_track['artists'][0]['id'])
    return artist_id_list 

Test with playlist id '27Wi4y5VlHr43Q6UpZMVyS'

In [273]:
get_artist_ids('27Wi4y5VlHr43Q6UpZMVyS')

['246dkjvS1zLTtiykXe5h60',
 '06HL4z0CvFAxyc27GXpf02',
 '4kYSro6naA4h99UJvo89HB',
 '15UsOTVnJzReFVN1VCnxy4',
 '2YZyLoL8N0Wb9xBt1NhZWg']

In [274]:
def get_all_track_artists_ids(track_id):
    meta = sp.track(track_id)
    track_artist_id_list = []
    for item in (meta['artists']):
        track_artist = item['id']
        track_artist_id_list.append(track_artist)
    return track_artist_id_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [275]:
get_all_track_artists_ids('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['1RyvyyTE3xzB2ZywiAwp0i', '4MCBfE4596Uoi2O4DtmEMz', '50co4Is1HCEo8bhOyUWKpn']

#### Function to extract all the details of each artist by passing their ID:

In [276]:
def get_artist_data(artist_id):
    meta = sp.artist(artist_id)
    artist_details = {'artist id': meta['id'],
                    'artist name': meta['name'],
                    'genres': meta['genres'],
                    'popularity': meta['popularity'],
                    'followers': meta['followers']['total']
                    }
    return artist_details

####  Extract artist data

Extract artist data of each track

For testing:  playlist_id = '27Wi4y5VlHr43Q6UpZMVyS'

In [279]:
# Get the first artist's id for each song in your playlist
playlist_id = input('Enter the playlist id')
artist_ids = get_artist_ids(playlist_id)
print(len(artist_ids))
print(artist_ids)

#  Loop over artist ids and get their data points
for i in tqdm(range(len(artist_ids))):
    time.sleep(.5)
    artist = get_artist_data(artist_ids[i])
    artists.append(artist)

Enter the playlist id27Wi4y5VlHr43Q6UpZMVyS


  0%|          | 0/5 [00:00<?, ?it/s]

5
['246dkjvS1zLTtiykXe5h60', '06HL4z0CvFAxyc27GXpf02', '4kYSro6naA4h99UJvo89HB', '15UsOTVnJzReFVN1VCnxy4', '2YZyLoL8N0Wb9xBt1NhZWg']


100%|██████████| 5/5 [00:02<00:00,  1.68it/s]


#### Create dataframe

In [280]:
artist_df = pd.DataFrame(artists)
artist_df.head()

| | artist id | artist name | genres | popularity | followers |
|---|---|---|---|---|---|
| 0 | 246dkjvS1zLTtiykXe5h60 | Post Malone | [dfw rap, melodic rap, rap] | 92 | 35378269 |
| 1 | 06HL4z0CvFAxyc27GXpf02 | Taylor Swift | [pop] | 100 | 47325018 |
| 2 | 4kYSro6naA4h99UJvo89HB | Cardi B | [dance pop, pop, pop rap, rap] | 86 | 18844041 |
| 3 | 15UsOTVnJzReFVN1VCnxy4 | XXXTENTACION | [emo rap, miami hip hop] | 91 | 33440056 |
| 4 | 2YZyLoL8N0Wb9xBt1NhZWg | Kendrick Lamar | [conscious hip hop, hip hop, rap, west coast rap] | 90 | 18984347 |


### Get track's audio features directly from playlist (for concept only, still a WIP)

#### Function to extract each track's audio features from a playlist directly

In [281]:
def get_playlist_tracks(playlist_id):
    track_attributes = sp.playlist_tracks(playlist_id)
    return track_attributes

In [282]:
playlist_tracks_data = []
playlist_ids = ['0qfagBJB5ou0r1kwQDZ8Op']

#  Loop over playlist ids and get their data points
for i in tqdm(range(len(playlist_ids))):
    time.sleep(.5)
    playlist_track = get_playlist_tracks(playlist_ids[i])
    playlist_tracks_data.append(playlist_track)

100%|██████████| 1/1 [00:00<00:00,  1.57it/s]


In [283]:
playlist_df = pd.DataFrame(playlist_tracks_data)
playlist_df

| | href | items | limit | next | offset | previous | total |
|---|---|---|---|---|---|---|---|
| 0 | https://api.spotify.com/v1/playlists/0qfagBJB5... | [{'added_at': '2015-12-04T17:25:30Z', 'added_b... | 100 | | 0 | | 21 |




# ================================================

In [287]:
tracks_count = sp.playlist('3avCwQPH6DkhMTRsizon7N')['tracks']['total']
tracks_count

200

Extract info of each track

For testing:  playlist_id = '3avCwQPH6DkhMTRsizon7N'

#### Loop testing: 

In [288]:
playlist_id = '3avCwQPH6DkhMTRsizon7N'
tracks_count = sp.playlist(playlist_id)['tracks']['total']
playlist_name = sp.playlist(playlist_id)['name']
print(playlist_name)
print(tracks_count)

Billboard 200 Top Albums 2021
200


In [290]:
def get_playlist_tracks(username, playlist_id):
    # Page through the playlist; each request returns at most 100 items
    results = sp.user_playlist_tracks(username, playlist_id)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks


# NOTE: df_tracks is not defined anywhere in this notebook; the loop below
# assumes it is a DataFrame with a 'track_id' column.

# empty list, batchsize and the counter for None results
rows = []
batchsize = 100
None_counter = 0

for i in range(0, len(df_tracks['track_id']), batchsize):
    batch = df_tracks['track_id'][i:i+batchsize]
    feature_results = sp.audio_features(batch)
    for t in feature_results:
        if t is None:
            None_counter = None_counter + 1
        else:
            rows.append(t)

print('Number of tracks where no audio features were available:', None_counter)


def show_tracks(results, uriArray):
    # Append the id of every track on this page to uriArray
    for i, item in enumerate(results['items']):
        track = item['track']
        uriArray.append(track['id'])


def get_playlist_track_id(username, playlist_id):
    # Collect track ids from the first page, then from each following page
    trackId = []
    results = sp.user_playlist(username, playlist_id)
    tracks = results['tracks']
    show_tracks(tracks, trackId)
    while tracks['next']:
        tracks = sp.next(tracks)
        show_tracks(tracks, trackId)
    return trackId
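
The batching loop above slices the id column into groups of `batchsize` (100 in that cell), so that the audio-features request is made once per batch rather than once per track. The slicing logic on its own can be sketched as follows; `chunked` is a hypothetical helper name and the ids are placeholders.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each holding at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder ids, only to show how 250 entries are split
track_ids = [f"track{i}" for i in range(250)]
batch_sizes = [len(batch) for batch in chunked(track_ids, 100)]
print(batch_sizes)
```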

#### Create dataframe

In [291]:
df_allartists = pd.DataFrame(tracks_with_allartists)
df_allartists

| | uri | name | track_artists | track_artists_ids | album | album_artists | release_date | explicit | duration_in_mins | acousticness | ... | liveness | loudness | speechiness | tempo | time_signature | track_duration_in_seconds | end_of_fade_in | start_of_fade_out | key | mode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | spotify:track:0e7ipj03S05BNilyu5bRzt | rockstar (feat. 21 Savage) | [Post Malone, 21 Savage] | [246dkjvS1zLTtiykXe5h60, 1URnnhqYAYcrqrcwql10ft] | beerbongs & bentleys | [Post Malone] | 2018-04-27 | True | 3.64 | 0.124 | ... | 0.131 | -6.136 | 0.0712 | 159.801 | 4 | 218.14667 | 0.0 | 215.41151 | 5 | 0 |
| 1 | spotify:track:1P17dC1amhFzptugyAO7Il | Look What You Made Me Do | [Taylor Swift] | [06HL4z0CvFAxyc27GXpf02] | reputation | [Taylor Swift] | 2017-11-10 | False | 3.53 | 0.204 | ... | 0.126 | -6.471 | 0.123 | 128.07 | 4 | 211.85333 | 0.34884 | 209.84454 | 9 | 0 |
| 2 | spotify:track:3bsycjdQtbcJeR6822SBvd | I Like It | [Cardi B, Bad Bunny, J Balvin] | [4kYSro6naA4h99UJvo89HB, 4q3ewBCX7sLwd24euuV69... | Invasion of Privacy | [Cardi B] | 2018-04-05 | False | 4.22 | 0.0981 | ... | 0.378 | -4.026 | 0.136 | 136.05 | 4 | 253.39029 | 0.15116 | 241.78938 | 5 | 0 |
| 3 | spotify:track:3ee8Jmje8o58CHK66QrVC2 | SAD! | [XXXTENTACION] | [15UsOTVnJzReFVN1VCnxy4] | ? | [XXXTENTACION] | 2018-03-16 | True | 2.78 | 0.258 | ... | 0.123 | -4.88 | 0.145 | 75.023 | 4 | 166.60553 | 0.0 | 155.38794 | 8 | 1 |
| 4 | spotify:track:3GCdLUSnKSMJhs4Tj6CV3s | All The Stars (with SZA) | [Kendrick Lamar, SZA] | [2YZyLoL8N0Wb9xBt1NhZWg, 7tYKF4w9nC0nq9CsPZTHyP] | Black Panther The Album Music From And Inspire... | [Kendrick Lamar, SZA] | 2018-02-09 | True | 3.87 | 0.0605 | ... | 0.0926 | -4.946 | 0.0597 | 96.924 | 4 | 232.18668 | 0.0 | 227.4917 | 8 | 1 |
| 5 | spotify:track:0e7ipj03S05BNilyu5bRzt | rockstar (feat. 21 Savage) | [Post Malone, 21 Savage] | [246dkjvS1zLTtiykXe5h60, 1URnnhqYAYcrqrcwql10ft] | beerbongs & bentleys | [Post Malone] | 2018-04-27 | True | 3.64 | 0.124 | ... | 0.131 | -6.136 | 0.0712 | 159.801 | 4 | 218.14667 | 0.0 | 215.41151 | 5 | 0 |
| 6 | spotify:track:1P17dC1amhFzptugyAO7Il | Look What You Made Me Do | [Taylor Swift] | [06HL4z0CvFAxyc27GXpf02] | reputation | [Taylor Swift] | 2017-11-10 | False | 3.53 | 0.204 | ... | 0.126 | -6.471 | 0.123 | 128.07 | 4 | 211.85333 | 0.34884 | 209.84454 | 9 | 0 |
| 7 | spotify:track:3bsycjdQtbcJeR6822SBvd | I Like It | [Cardi B, Bad Bunny, J Balvin] | [4kYSro6naA4h99UJvo89HB, 4q3ewBCX7sLwd24euuV69... | Invasion of Privacy | [Cardi B] | 2018-04-05 | False | 4.22 | 0.0981 | ... | 0.378 | -4.026 | 0.136 | 136.05 | 4 | 253.39029 | 0.15116 | 241.78938 | 5 | 0 |
| 8 | spotify:track:3ee8Jmje8o58CHK66QrVC2 | SAD! | [XXXTENTACION] | [15UsOTVnJzReFVN1VCnxy4] | ? | [XXXTENTACION] | 2018-03-16 | True | 2.78 | 0.258 | ... | 0.123 | -4.88 | 0.145 | 75.023 | 4 | 166.60553 | 0.0 | 155.38794 | 8 | 1 |
| 9 | spotify:track:3GCdLUSnKSMJhs4Tj6CV3s | All The Stars (with SZA) | [Kendrick Lamar, SZA] | [2YZyLoL8N0Wb9xBt1NhZWg, 7tYKF4w9nC0nq9CsPZTHyP] | Black Panther The Album Music From And Inspire... | [Kendrick Lamar, SZA] | 2018-02-09 | True | 3.87 | 0.0605 | ... | 0.0926 | -4.946 | 0.0597 | 96.924 | 4 | 232.18668 | 0.0 | 227.4917 | 8 | 1 |
