# Spotify Song Data Scraping Sandbox

### References:
#### Based on techniques from the following tutorials/feeds: <br />
Max Hilsdorf, "How to Create Large Music Datasets Using Spotipy", <i>Towards Data Science</i>, 25 April 2020: <br />
https://towardsdatascience.com/how-to-create-large-music-datasets-using-spotipy-40e7242cc6a6 <br />
Max Tingle, "Getting Started with Spotify’s API & Spotipy", <i>Towards Data Science</i>, 3 Oct 2019: <br />
https://medium.com/@maxtingle/getting-started-with-spotifys-api-spotipy-197c3dc6353b <br />
Sandra Radgowska, "How to use Spotify API and what data science opportunities can it open up?", <i>My Journey As A Data Scientist</i>, 18 August 2021:<br />
https://datascientistdiary.com/index.php/2021/03/04/how-to-use-spotify-api-and-what-data-science-opportunities-can-it-open-up/<br />
Angelica Dietzel, "How to Extract Any Artist’s Data Using Spotify’s API, Python, and Spotipy", <i>Better Programming</i>, 25 March 2020:<br />
https://betterprogramming.pub/how-to-extract-any-artists-data-using-spotify-s-api-python-and-spotipy-4c079401bc37<br />
StackOverflow: Spotipy: How To Read More Than 100 Tracks From A Playlist:<br />
https://stackoverflow.com/questions/39086287/spotipy-how-to-read-more-than-100-tracks-from-a-playlist<br />
Github: How Do I Get Every Track of A Playlist:<br /> https://github.com/plamere/spotipy/issues/246

## Setup

### Import packages

In [3]:
import json
import time
from tqdm import tqdm
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import creds

### Set display options

In [4]:
pd.set_option('display.max_columns', 500)

### Spotify Credentials

#### Load credentials

Loads the creds.py file, containing the following two lines for variables client_id and secret, which is gitignored for sharing. 

client_id = 'Your Client ID Here'<br />
secret = 'Your secret here'

In [5]:
%run -i 'creds.py'

#### Set credentials

In [6]:
# Load client credentials for Spotipy
client_credentials_manager = SpotifyClientCredentials(client_id=client_id,client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

## Functions for data extraction

### Get track data including features (updated to ensure retrieval of all artists)
#### Details: uri, name, album, artist name, release date, explicit T/F, duration in mins
#### Audio features: acousticness, danceability, energy, instrumentalness, liveness, loudness, speechiness, tempo, time_signature

#### Function to extract all the track ids from your playlist (returns only 100 items)

In [17]:
def get_track_ids(playlist_id):
    """Extracts track ids from playlist"""
    music_id_list = []    
    playlist = sp.playlist(playlist_id)    
    for item in tqdm(playlist['tracks']['items']):    
        music_track = item['track']    
        music_id_list.append(music_track['id'])    
    return music_id_list 

In [18]:
# Test function with playlist id for list of 200 tracks
music_id_list = get_track_ids('3avCwQPH6DkhMTRsizon7N')
# Check all IDs returned
print("Music ID list length:", len(music_id_list))

# Truncated by API track return limit (100)

100%|██████████| 100/100 [00:00<00:00, 438276.28it/s]

Music ID list length: 100





#### Function to extract all album artists given a track id:

In [19]:
def get_all_album_artists_names(track_id):
    """ Returns a list of artist names for the given track's album """
    meta = sp.track(track_id)
    album_artist_list = [] 
    for item in (meta['album']['artists']):
        album_artist = item['name']
        album_artist_list.append(album_artist)
    return album_artist_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [20]:
get_all_album_artists_names('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['Future', 'Juice WRLD']

#### Function to extract all the track artists given the track id:

In [21]:
def get_all_track_artists_names(track_id):
    """Returns the list of artists' names for the given track """
    meta = sp.track(track_id)
    track_artist_list = []
    for item in (meta['artists']):
        track_artist = item['name']
        track_artist_list.append(track_artist)
    return track_artist_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [22]:
get_all_track_artists_names('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['Future', 'Juice WRLD', 'Young Thug']

#### Function to extract all the track artists' ids given a track id:

In [23]:
def get_all_track_artists_ids(track_id):
    """ Returns list of artists' ids for the given track """
    meta = sp.track(track_id)
    track_artist_id_list = []
    for item in (meta['artists']):
        track_artist = item['id']
        track_artist_id_list.append(track_artist)
    return track_artist_id_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [24]:
get_all_track_artists_ids('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['1RyvyyTE3xzB2ZywiAwp0i', '4MCBfE4596Uoi2O4DtmEMz', '50co4Is1HCEo8bhOyUWKpn']

#### Function to extract all the details and features of each track by passing its ID:

In [25]:
def get_track_data(track_id):
    """Returns list of data for given track id and builds a dataframe"""
    meta = sp.track(track_id)
    features = sp.audio_features(track_id)
    analysis = sp.audio_analysis(track_id)
    track_details = {'uri': meta['uri'],
                    'name': meta['name'],
                    'track_artists': get_all_track_artists_names(track_id),
                    'track_artists_ids': get_all_track_artists_ids(track_id),
                    'album': meta['album']['name'],
                    'album_artists': get_all_album_artists_names(track_id),
                    'release_date': meta['album']['release_date'],
                    'explicit': meta['explicit'],
                    'duration_in_mins': round((meta['duration_ms'] * 0.001) / 60.0, 2),
                    'acousticness' : features[0]['acousticness'],
                    'danceability' : features[0]['danceability'],
                    'energy' : features[0]['energy'],
                    'instrumentalness' : features[0]['instrumentalness'],
                    'liveness' : features[0]['liveness'],
                    'loudness' : features[0]['loudness'],
                    'speechiness' : features[0]['speechiness'],
                    'tempo' : features[0]['tempo'],
                    'time_signature' : features[0]['time_signature'],
                    'track_duration_in_seconds' : analysis['track']['duration'],
                    'end_of_fade_in' : analysis['track']['end_of_fade_in'],
                    'start_of_fade_out' : analysis['track']['start_of_fade_out'],
                    'key' : analysis['track']['key'],
                    'mode' : analysis['track']['mode']
                    }
    return track_details

#### Loop function possibility 1: 

In [92]:
def get_playlist_tracks(username, playlist_id):
    """Loops through track pages in Spotify >100 tracks"""
    results = sp.user_playlist_tracks(username, playlist_id)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks
    

In [93]:
test_tracks = get_playlist_tracks('katiekellert', '3avCwQPH6DkhMTRsizon7N')
# Returns 200 for 200 tracks

In [94]:
test_tracks_df = pd.DataFrame(test_tracks)
test_tracks_df

Unnamed: 0,added_at,added_by,is_local,primary_color,track,video_thumbnail
0,2021-12-28T22:47:43Z,{'external_urls': {'spotify': 'https://open.sp...,False,,"{'album': {'album_type': 'album', 'artists': [...",{'url': None}
1,2021-12-28T22:46:26Z,{'external_urls': {'spotify': 'https://open.sp...,False,,"{'album': {'album_type': 'album', 'artists': [...",{'url': None}
2,2021-12-28T18:04:19Z,{'external_urls': {'spotify': 'https://open.sp...,False,,"{'album': {'album_type': 'album', 'artists': [...",{'url': None}
3,2021-12-28T22:45:24Z,{'external_urls': {'spotify': 'https://open.sp...,False,,"{'album': {'album_type': 'album', 'artists': [...",{'url': None}
4,2021-12-28T22:57:11Z,{'external_urls': {'spotify': 'https://open.sp...,False,,"{'album': {'album_type': 'album', 'artists': [...",{'url': None}
...,...,...,...,...,...,...
195,2021-12-29T03:40:46Z,{'external_urls': {'spotify': 'https://open.sp...,False,,"{'album': {'album_type': 'album', 'artists': [...",{'url': None}
196,2021-12-29T03:41:25Z,{'external_urls': {'spotify': 'https://open.sp...,False,,"{'album': {'album_type': 'album', 'artists': [...",{'url': None}
197,2021-12-29T03:41:59Z,{'external_urls': {'spotify': 'https://open.sp...,False,,"{'album': {'album_type': 'album', 'artists': [...",{'url': None}
198,2021-12-29T03:42:28Z,{'external_urls': {'spotify': 'https://open.sp...,False,,"{'album': {'album_type': 'album', 'artists': [...",{'url': None}


###  Extract track data

#### Create track container dictionaries

In [100]:
tracks_with_allartists = []

#### Extract info of each track

For testing:  playlist_id = '27Wi4y5VlHr43Q6UpZMVyS'

In [101]:
# Get the ids for all the songs in your playlist
playlist_id = input('Enter the playlist id')
track_ids = get_track_ids(playlist_id)
print(len(track_ids))
print(track_ids)

#  Loop over track ids and get their data points
for i in tqdm(range(len(track_ids))):
    time.sleep(.5)
    track = get_track_data(track_ids[i])
    tracks_with_allartists.append(track)

# *Returns <=100 tracks for longer playlists

Enter the playlist id27Wi4y5VlHr43Q6UpZMVyS


100%|██████████| 5/5 [00:00<00:00, 31871.61it/s]
  0%|          | 0/5 [00:00<?, ?it/s]

5
['0e7ipj03S05BNilyu5bRzt', '1P17dC1amhFzptugyAO7Il', '3bsycjdQtbcJeR6822SBvd', '3ee8Jmje8o58CHK66QrVC2', '3GCdLUSnKSMJhs4Tj6CV3s']


100%|██████████| 5/5 [00:06<00:00,  1.37s/it]


#### Create dataframe

In [102]:
df_allartists = pd.DataFrame(tracks_with_allartists)
df_allartists

Unnamed: 0,uri,name,track_artists,track_artists_ids,album,album_artists,release_date,explicit,duration_in_mins,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,tempo,time_signature,track_duration_in_seconds,end_of_fade_in,start_of_fade_out,key,mode
0,spotify:track:0e7ipj03S05BNilyu5bRzt,rockstar (feat. 21 Savage),"[Post Malone, 21 Savage]","[246dkjvS1zLTtiykXe5h60, 1URnnhqYAYcrqrcwql10ft]",beerbongs & bentleys,[Post Malone],2018-04-27,True,3.64,0.124,0.585,0.52,7e-05,0.131,-6.136,0.0712,159.801,4,218.14667,0.0,215.41151,5,0
1,spotify:track:1P17dC1amhFzptugyAO7Il,Look What You Made Me Do,[Taylor Swift],[06HL4z0CvFAxyc27GXpf02],reputation,[Taylor Swift],2017-11-10,False,3.53,0.204,0.766,0.709,1.4e-05,0.126,-6.471,0.123,128.07,4,211.85333,0.34884,209.84454,9,0
2,spotify:track:3bsycjdQtbcJeR6822SBvd,I Like It,"[Cardi B, Bad Bunny, J Balvin]","[4kYSro6naA4h99UJvo89HB, 4q3ewBCX7sLwd24euuV69...",Invasion of Privacy,[Cardi B],2018-04-05,False,4.22,0.0981,0.814,0.721,0.0,0.378,-4.026,0.136,136.05,4,253.39029,0.15116,241.78938,5,0
3,spotify:track:3ee8Jmje8o58CHK66QrVC2,SAD!,[XXXTENTACION],[15UsOTVnJzReFVN1VCnxy4],?,[XXXTENTACION],2018-03-16,True,2.78,0.258,0.74,0.613,0.00372,0.123,-4.88,0.145,75.023,4,166.60553,0.0,155.38794,8,1
4,spotify:track:3GCdLUSnKSMJhs4Tj6CV3s,All The Stars (with SZA),"[Kendrick Lamar, SZA]","[2YZyLoL8N0Wb9xBt1NhZWg, 7tYKF4w9nC0nq9CsPZTHyP]",Black Panther The Album Music From And Inspire...,"[Kendrick Lamar, SZA]",2018-02-09,True,3.87,0.0605,0.698,0.633,0.000194,0.0926,-4.946,0.0597,96.924,4,232.18668,0.0,227.4917,8,1


In [103]:
df_allartists['track_artists'][2]

['Cardi B', 'Bad Bunny', 'J Balvin']

### Get artist data (id, artist name, genre, popularity, followers)

#### Extract track artist id column

In [109]:
artist_ids = df_allartists['track_artists_ids']
artist_ids

0     [246dkjvS1zLTtiykXe5h60, 1URnnhqYAYcrqrcwql10ft]
1                             [06HL4z0CvFAxyc27GXpf02]
2    [4kYSro6naA4h99UJvo89HB, 4q3ewBCX7sLwd24euuV69...
3                             [15UsOTVnJzReFVN1VCnxy4]
4     [2YZyLoL8N0Wb9xBt1NhZWg, 7tYKF4w9nC0nq9CsPZTHyP]
Name: track_artists_ids, dtype: object

#### Explode column

In [110]:
splody_ids = artist_ids.explode('track_artists_ids')
id_df= pd.DataFrame(splody_ids)
id_df

Unnamed: 0,track_artists_ids
0,246dkjvS1zLTtiykXe5h60
1,1URnnhqYAYcrqrcwql10ft
2,06HL4z0CvFAxyc27GXpf02
3,4kYSro6naA4h99UJvo89HB
4,4q3ewBCX7sLwd24euuV69X
5,1vyhD5VmyZ7KMfW5gqLgo5
6,15UsOTVnJzReFVN1VCnxy4
7,2YZyLoL8N0Wb9xBt1NhZWg
8,7tYKF4w9nC0nq9CsPZTHyP


#### Remove duplicates

In [111]:
id_df2 = id_df.drop_duplicates(subset=['track_artists_ids'], keep='first')
id_df2

Unnamed: 0,track_artists_ids
0,246dkjvS1zLTtiykXe5h60
1,1URnnhqYAYcrqrcwql10ft
2,06HL4z0CvFAxyc27GXpf02
3,4kYSro6naA4h99UJvo89HB
4,4q3ewBCX7sLwd24euuV69X
5,1vyhD5VmyZ7KMfW5gqLgo5
6,15UsOTVnJzReFVN1VCnxy4
7,2YZyLoL8N0Wb9xBt1NhZWg
8,7tYKF4w9nC0nq9CsPZTHyP


#### Convert to list

In [112]:
artist_id_list = id_df2['track_artists_ids'].tolist()
artist_id_list

['246dkjvS1zLTtiykXe5h60',
 '1URnnhqYAYcrqrcwql10ft',
 '06HL4z0CvFAxyc27GXpf02',
 '4kYSro6naA4h99UJvo89HB',
 '4q3ewBCX7sLwd24euuV69X',
 '1vyhD5VmyZ7KMfW5gqLgo5',
 '15UsOTVnJzReFVN1VCnxy4',
 '2YZyLoL8N0Wb9xBt1NhZWg',
 '7tYKF4w9nC0nq9CsPZTHyP']

#### Function to extract all the details of each artist by passing their ID:

In [113]:
def get_artist_data(artist_id):
    """Returns artist data for given id"""
    meta = sp.artist(artist_id)
    artist_details = {'artist id': meta['id'],
                    'artist name': meta['name'],
                    'genres': meta['genres'],
                    'popularity': meta['popularity'],
                    'followers': meta['followers']['total']
                    }
    return artist_details

####  Extract artist data

Extract artist data from list

In [114]:
artists = []
#  Loop over track ids and get their data points
for i in tqdm(range(len(artist_id_list))):
    time.sleep(.5)
    artist = get_artist_data(artist_id_list[i])
    artists.append(artist)

100%|██████████| 9/9 [00:05<00:00,  1.67it/s]


In [115]:
artists

[{'artist id': '246dkjvS1zLTtiykXe5h60',
  'artist name': 'Post Malone',
  'genres': ['dfw rap', 'melodic rap', 'rap'],
  'popularity': 92,
  'followers': 35414369},
 {'artist id': '1URnnhqYAYcrqrcwql10ft',
  'artist name': '21 Savage',
  'genres': ['atl hip hop', 'rap', 'trap'],
  'popularity': 89,
  'followers': 10588325},
 {'artist id': '06HL4z0CvFAxyc27GXpf02',
  'artist name': 'Taylor Swift',
  'genres': ['pop'],
  'popularity': 100,
  'followers': 47442319},
 {'artist id': '4kYSro6naA4h99UJvo89HB',
  'artist name': 'Cardi B',
  'genres': ['dance pop', 'pop', 'pop rap', 'rap'],
  'popularity': 86,
  'followers': 18871226},
 {'artist id': '4q3ewBCX7sLwd24euuV69X',
  'artist name': 'Bad Bunny',
  'genres': ['latin', 'reggaeton', 'trap latino'],
  'popularity': 99,
  'followers': 41622707},
 {'artist id': '1vyhD5VmyZ7KMfW5gqLgo5',
  'artist name': 'J Balvin',
  'genres': ['latin', 'reggaeton', 'reggaeton colombiano', 'trap latino'],
  'popularity': 94,
  'followers': 31646698},
 {'ar

#### Create dataframe

In [116]:
artist_df = pd.DataFrame(artists)
artist_df

Unnamed: 0,artist id,artist name,genres,popularity,followers
0,246dkjvS1zLTtiykXe5h60,Post Malone,"[dfw rap, melodic rap, rap]",92,35414369
1,1URnnhqYAYcrqrcwql10ft,21 Savage,"[atl hip hop, rap, trap]",89,10588325
2,06HL4z0CvFAxyc27GXpf02,Taylor Swift,[pop],100,47442319
3,4kYSro6naA4h99UJvo89HB,Cardi B,"[dance pop, pop, pop rap, rap]",86,18871226
4,4q3ewBCX7sLwd24euuV69X,Bad Bunny,"[latin, reggaeton, trap latino]",99,41622707
5,1vyhD5VmyZ7KMfW5gqLgo5,J Balvin,"[latin, reggaeton, reggaeton colombiano, trap ...",94,31646698
6,15UsOTVnJzReFVN1VCnxy4,XXXTENTACION,"[emo rap, miami hip hop]",92,33521355
7,2YZyLoL8N0Wb9xBt1NhZWg,Kendrick Lamar,"[conscious hip hop, hip hop, rap, west coast rap]",90,19003331
8,7tYKF4w9nC0nq9CsPZTHyP,SZA,"[pop, r&b, rap]",89,6894392


# =================================================

## API Workaround Loop Example

### Supporting function definitions

#### Album artists names function

In [117]:
def get_all_album_artists_names(track_id):
    """Returns a list of all artist names for the given trac's album"""
    meta = sp.track(track_id)
    album_artist_list = []
    for item in (meta['album']['artists']):
        album_artist = item['name']
        album_artist_list.append(album_artist)
    return album_artist_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [118]:
get_all_album_artists_names('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['Future', 'Juice WRLD']

#### Album artists ids function

In [119]:
def get_all_album_artists_ids(track_id):
    """ Returns a list of all artist ids for the given track's album"""
    meta = sp.track(track_id)
    album_artist_id_list = []
    for item in (meta['album']['artists']):
        album_artist_id = item['id']
        album_artist_id_list.append(album_artist_id)
    return album_artist_id_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [120]:
get_all_album_artists_ids('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['1RyvyyTE3xzB2ZywiAwp0i', '4MCBfE4596Uoi2O4DtmEMz']

#### Album artists genres function

In [121]:
def get_all_album_artists_genres(track_id):
    """ Returns a lits of genres for the given track's album artists"""
    meta = sp.track(track_id)
    album_artist_genre_list = []
    for item in (meta['album']['artists']):
        album_artist_id = item['id']
        album_artist_genres = sp.artist(album_artist_id)['genres']
        album_artist_genre_list.append(album_artist_genres)
    return album_artist_genre_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [122]:
get_all_album_artists_genres('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

[['atl hip hop', 'hip hop', 'pop rap', 'rap', 'southern hip hop', 'trap'],
 ['chicago rap', 'melodic rap']]

#### Album artists popularity function

In [123]:
def get_all_album_artists_popularity(track_id):
    """Returns a list of popularity values for the given track's album artists"""
    meta = sp.track(track_id)
    album_artist_popularity_list = []
    for item in (meta['album']['artists']):
        album_artist_id = item['id']
        album_artist_popularity = sp.artist(album_artist_id)['popularity']
        album_artist_popularity_list.append(album_artist_popularity)
    return album_artist_popularity_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [124]:
get_all_album_artists_popularity('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

[91, 97]

#### Album artists followers function

In [125]:
def get_all_album_artists_followers(track_id):
    """ Returns the follower count for all of the given thack's album artists"""
    meta = sp.track(track_id)
    album_artist_followers_list = []
    for item in (meta['album']['artists']):
        album_artist_id = item['id']
        album_artist_followers = sp.artist(album_artist_id)['followers']['total']
        album_artist_followers_list.append(album_artist_followers)
    return album_artist_followers_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [126]:
get_all_album_artists_followers('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

[10985372, 21575514]

#### Track artists names function

In [127]:
def get_all_track_artists_names(track_id):
    """ Returns a list of the given track's artist names"""
    meta = sp.track(track_id)
    track_artist_list = []
    for item in (meta['artists']):
        track_artist = item['name']
        track_artist_list.append(track_artist)
    return track_artist_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [128]:
get_all_track_artists_names('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['Future', 'Juice WRLD', 'Young Thug']

#### Track artists ids function

In [129]:
def get_all_track_artists_ids(track_id):
    """ Returns a list of the given track's artist ids"""
    meta = sp.track(track_id)
    track_artist_id_list = []
    for item in (meta['artists']):
        track_artist = item['id']
        track_artist_id_list.append(track_artist)
    return track_artist_id_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [130]:
get_all_track_artists_ids('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['1RyvyyTE3xzB2ZywiAwp0i', '4MCBfE4596Uoi2O4DtmEMz', '50co4Is1HCEo8bhOyUWKpn']

#### Track artists genres function

In [131]:
def get_all_track_artists_genres(track_id):
    """ Returns a list of the given track's artists' genres """
    meta = sp.track(track_id)
    track_artist_genre_list = []
    for item in (meta['artists']):
        track_artist_id = item['id']
        track_artist_genres = sp.artist(track_artist_id)['genres']
        track_artist_genre_list.append(track_artist_genres)
    return track_artist_genre_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [132]:
get_all_track_artists_genres('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

[['atl hip hop', 'hip hop', 'pop rap', 'rap', 'southern hip hop', 'trap'],
 ['chicago rap', 'melodic rap'],
 ['atl hip hop',
  'atl trap',
  'gangster rap',
  'hip hop',
  'melodic rap',
  'rap',
  'trap']]

#### Track artists popularity function

In [133]:
def get_all_track_artists_popularity(track_id):
    """Returns a list of the given track's artists' popularity values"""
    meta = sp.track(track_id)
    track_artist_popularity_list = []
    for item in (meta['artists']):
        track_artist_id = item['id']
        track_artist_popularity = sp.artist(track_artist_id)['popularity']
        track_artist_popularity_list.append(track_artist_popularity)
    return track_artist_popularity_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [134]:
get_all_track_artists_popularity('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

[91, 97, 90]

#### Track artists followers function

In [135]:
def get_all_track_artists_followers(track_id):
    """ Returns a list of the given track's artists' follwer counts"""
    meta = sp.track(track_id)
    track_artist_followers_list = []
    for item in (meta['artists']):
        track_artist_id = item['id']
        track_artist_followers = sp.artist(track_artist_id)['followers']['total']
        track_artist_followers_list.append(track_artist_followers)
    return track_artist_followers_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [136]:
get_all_track_artists_followers('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

[10985372, 21575514, 6843959]

### Playlist data return function

In [137]:
def get_playlist_tracks_more_than_100_songs(username, playlist_id):
    """ Returns data for a playlists's tracks and creates a dataframe """
    results = sp.user_playlist_tracks(username, playlist_id)
    playlist_name = sp.playlist(playlist_id)['name']
    tracks = results['items']
    """ Loop for continuing through Spotify pagination of playlist tracks"""
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    results = tracks    
    
    """Create list containers for track information"""
    pl_id = []
    pl_name = []
    chart_position = []
    album_name = []
    album_id = []
    album_release_date = []
    album_artists = []
    album_artists_ids = []
    album_artists_genres = []
    album_artists_popularity = []
    album_artists_followers = []
    track_name = []
    track_id = []
    track_popularity = []
    track_explicit = []
    track_artists = []
    track_artists_ids = []
    track_artists_genres = []
    track_artists_popularity = []
    track_artists_followers = []

    """Loop for pulling track information"""
    """tqdm included to provide a progress bar"""
    for i in tqdm(range(len(results))):
        if i == 0:
            """ Set inital call variablesfor first row"""
            pl_id = playlist_id
            pl_name = playlist_name
            chart_position = i + 1
            album_name = results[i]['track']['album']['name']
            album_id = results[i]['track']['album']['id']
            album_release_date = results[i]['track']['album']['release_date']
            album_artists = get_all_album_artists_names(results[i]['track']['id'])
            album_artists_ids = get_all_album_artists_ids(results[i]['track']['id'])
            album_artists_genres = get_all_album_artists_genres(results[i]['track']['id'])
            album_artists_popularity = get_all_album_artists_popularity(results[i]['track']['id'])
            album_artists_followers = get_all_album_artists_followers(results[i]['track']['id'])
            track_name = results[i]['track']['name']
            track_id = results[i]['track']['id']
            track_popularity = results[i]['track']['popularity']
            track_explicit = results[i]['track']['explicit']
            track_artists = get_all_track_artists_names(results[i]['track']['id'])
            track_artists_ids = get_all_album_artists_ids(results[i]['track']['id'])
            track_artists_genres = get_all_album_artists_genres(results[i]['track']['id'])
            track_artists_popularity = get_all_album_artists_popularity(results[i]['track']['id'])
            track_artists_followers = get_all_album_artists_followers(results[i]['track']['id'])
            """Call a list of all audio features"""
            features = sp.audio_features(track_id)
            """Create dataframe"""
            features_df = pd.DataFrame(data=features, columns=features[0].keys())
            """ Set data value location from created variables"""
            features_df['playlist_id'] = pl_id
            features_df['playlist_name'] = pl_name
            features_df['position'] = chart_position
            features_df['album_name'] = album_name
            features_df['album_id'] = album_id
            features_df['album_release_date'] = album_release_date
            features_df['album_artists'] = album_artists
            features_df['album_artists_ids'] = album_artists_ids
            features_df['album_artists_genres'] = album_artists_genres
            features_df['album_artists_popularity'] = album_artists_popularity
            features_df['album_artists_followers'] = album_artists_followers
            features_df['track_name'] = track_name
            features_df['track_id'] = track_id
            features_df['track_popularity'] = track_popularity
            features_df['track_explicit'] = track_explicit
            features_df['track_artists'] = track_artists
            features_df['track_artists_ids'] = track_artists_ids
            features_df['track_artists_genres'] = track_artists_genres
            features_df['track_artists_popularity'] = track_artists_popularity
            features_df['track_artists_followers'] = track_artists_followers            
            """ Set index values"""
            features_df = features_df[['playlist_id', 'playlist_name', 'position', 
                                       'album_name', 'album_id', 'album_release_date', 'album_artists', 'album_artists_ids',
                                       'album_artists_genres', 'album_artists_popularity', 'album_artists_followers',
                                       'track_name', 'track_id', 'track_popularity', 'track_artists', 
                                       'track_artists_ids', 'track_artists_genres', 'track_artists_popularity', 
                                       'track_explicit', 'track_artists_followers', 'danceability', 'energy', 
                                       'key', 'loudness', 'mode', 'acousticness', 'instrumentalness',
                                       'liveness', 'valence', 'tempo',
                                       'duration_ms', 'time_signature']]
            continue
        else:
            try:
                """ Set inital call variablesfor subsequent rows"""
                pl_id = playlist_id
                pl_name = playlist_name
                chart_position = (i + 1)
                album_name = results[i]['track']['album']['name']
                album_id = results[i]['track']['album']['id']
                album_release_date = results[i]['track']['album']['release_date']
                album_artists = get_all_album_artists_names(results[i]['track']['id'])
                album_artists_ids = get_all_album_artists_ids(results[i]['track']['id'])
                album_artists_genres = get_all_album_artists_genres(results[i]['track']['id'])
                album_artists_popularity = get_all_album_artists_popularity(results[i]['track']['id'])
                album_artists_followers = get_all_album_artists_followers(results[i]['track']['id'])
                track_name = results[i]['track']['name']
                track_id = results[i]['track']['id']
                track_popularity = results[i]['track']['popularity']
                track_artists = get_all_track_artists_names(results[i]['track']['id'])
                track_artists_ids = get_all_album_artists_ids(results[i]['track']['id'])
                track_artists_genres = get_all_album_artists_genres(results[i]['track']['id'])
                track_artists_popularity = get_all_album_artists_popularity(results[i]['track']['id'])
                track_explicit = results[i]['track']['explicit']
                track_artists_followers = get_all_album_artists_followers(results[i]['track']['id'])
                """Call a list of all audio features"""
                features = sp.audio_features(track_id)
                """Create new row dict"""
                new_row = {'playlist_id': [pl_id],
                    'playlist_name': [pl_name],
                    'position': [chart_position],
                    'album_name': [album_name],
                    'album_id': [album_id],
                    'album_release_date': [album_release_date],
                    'album_artists': [album_artists],
                    'album_artists_ids': [album_artists_ids],
                    'album_artists_genres': [album_artists_genres],
                    'album_artists_popularity': [album_artists_popularity],
                    'album_artists_followers': [album_artists_followers],
                    'track_name': [track_name],
                    'track_id': [track_id],
                    'track_popularity': [track_popularity],
                    'track_artists': [track_artists],
                    'track_artists_ids': [track_artists_ids],
                    'track_artists_genres': [track_artists_genres],
                    'track_artists_popularity': [track_artists_popularity],
                    'track_explicit': [track_explicit],
                    'track_artists_followers': [track_artists_followers],
                    'danceability':[features[0]['danceability']],
                    'energy':[features[0]['energy']],
                    'key':[features[0]['key']],
                    'loudness':[features[0]['loudness']],
                    'mode':[features[0]['mode']],
                    'acousticness':[features[0]['acousticness']],
                    'instrumentalness':[features[0]['instrumentalness']],
                    'liveness':[features[0]['liveness']],
                    'valence':[features[0]['valence']],
                    'tempo':[features[0]['tempo']],
                    'duration_ms':[features[0]['duration_ms']],
                    'time_signature':[features[0]['time_signature']]
                }

                """Convert new row to dataframe and concat to working df"""
                dfs = [features_df, pd.DataFrame(new_row)]
                features_df = pd.concat(dfs, ignore_index = True)
            except:
                continue
                
    return features_df

#### Playlist data return test

In [69]:
chart2021 = get_playlist_tracks_more_than_100_songs('katiekellert', '3avCwQPH6DkhMTRsizon7N')

100%|██████████| 200/200 [05:52<00:00,  1.76s/it]


In [138]:
chart2021

Unnamed: 0,playlist_id,playlist_name,position,album_name,album_id,album_release_date,album_artists,album_artists_ids,album_artists_genres,album_artists_popularity,album_artists_followers,track_name,track_id,track_popularity,track_artists,track_artists_ids,track_artists_genres,track_artists_popularity,track_explicit,track_artists_followers,danceability,energy,key,loudness,mode,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
0,3avCwQPH6DkhMTRsizon7N,Billboard 200 Top Albums 2021,1,Dangerous: The Double Album,1qW1C4kDOXnrly22daHbxz,2021-01-08,Morgan Wallen,4oUHIQIBe0LHzYfvXNW4QM,[contemporary country],87,2715211,More Than My Hometown,65mMCEOu5Ll1DBAfEUmerU,35,Morgan Wallen,4oUHIQIBe0LHzYfvXNW4QM,[contemporary country],87,False,2715211,0.621,0.868,6,-5.478,1,0.617000,0.000000,0.1310,0.594,126.010,216573,4
1,3avCwQPH6DkhMTRsizon7N,Billboard 200 Top Albums 2021,2,SOUR,6s84u2TUpR3wdUv4NgKA2j,2021-05-21,[Olivia Rodrigo],[1McMsnEElThX1knmY4oliG],[[pop]],[92],[12169365],drivers license,5wANPM4fQCJwkGd4rN57mH,93,[Olivia Rodrigo],[1McMsnEElThX1knmY4oliG],[[pop]],[92],True,[12169365],0.561,0.431,10,-8.810,1,0.768000,0.000014,0.1060,0.137,143.875,242013,4
2,3avCwQPH6DkhMTRsizon7N,Billboard 200 Top Albums 2021,3,Shoot For The Stars Aim For The Moon,7e7t0MCrNDcJZsPwUKjmOc,2020-07-03,[Pop Smoke],[0eDvMgVFoNV3TpwtrVCoTj],[[brooklyn drill]],[89],[8745351],What You Know Bout Love,1tkg4EHVoqnhR6iFEXb60y,86,[Pop Smoke],[0eDvMgVFoNV3TpwtrVCoTj],[[brooklyn drill]],[89],True,[8745351],0.709,0.548,10,-8.493,1,0.650000,0.000002,0.1330,0.543,83.995,160000,4
3,3avCwQPH6DkhMTRsizon7N,Billboard 200 Top Albums 2021,4,evermore,2Xoteh7uEpea4TohMxjtaq,2020-12-11,[Taylor Swift],[06HL4z0CvFAxyc27GXpf02],[[pop]],[100],[47442319],willow,0lx2cLdOt3piJbcaXIV74f,82,[Taylor Swift],[06HL4z0CvFAxyc27GXpf02],[[pop]],[100],False,[47442319],0.392,0.574,7,-9.195,1,0.833000,0.001790,0.1450,0.529,81.112,214707,4
4,3avCwQPH6DkhMTRsizon7N,Billboard 200 Top Albums 2021,5,Certified Lover Boy,3SpBlxme9WbeQdI9kx7KAV,2021-09-03,[Drake],[3TVXtAsR1Inumwj472S9r4],"[[canadian hip hop, canadian pop, hip hop, rap...",[98],[59891739],Way 2 Sexy (with Future & Young Thug),0k1WUmIRnG3xU6fvvDVfRG,90,"[Drake, Future, Young Thug]",[3TVXtAsR1Inumwj472S9r4],"[[canadian hip hop, canadian pop, hip hop, rap...",[98],True,[59891739],0.803,0.597,11,-6.035,0,0.000619,0.000005,0.3230,0.331,136.008,257605,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,3avCwQPH6DkhMTRsizon7N,Billboard 200 Top Albums 2021,196,Hybrid Theory,2pKw6GERJVAD61449B1EEM,2000-10-24,[Linkin Park],[6XyY86QOPPrYVGvF9ch6wz],"[[alternative metal, nu metal, post-grunge, ra...",[87],[20797814],In the End,3tSmXSxaAnU1EPGKa6NytH,67,[Linkin Park],[6XyY86QOPPrYVGvF9ch6wz],"[[alternative metal, nu metal, post-grunge, ra...",[87],False,[20797814],0.555,0.844,3,-6.567,0,0.008340,0.000000,0.0895,0.464,105.195,216800,4
196,3avCwQPH6DkhMTRsizon7N,Billboard 200 Top Albums 2021,197,The Chaos Chapter: FREEZE,5Zdr9vactwnJH4Vpe9Mid9,2021-05-31,[TOMORROW X TOGETHER],[0ghlgldX5Dd6720Q3qFyQB],"[[k-pop, k-pop boy group]]",[81],[5280216],0X1=LOVESONG (I Know I Love You) feat. Seori,1Z8TPHiKeCUyClxV6WTTIf,81,[TOMORROW X TOGETHER],[0ghlgldX5Dd6720Q3qFyQB],"[[k-pop, k-pop boy group]]",[81],False,[5280216],0.584,0.836,0,-4.925,1,0.055800,0.000000,0.0663,0.484,104.973,202204,4
197,3avCwQPH6DkhMTRsizon7N,Billboard 200 Top Albums 2021,198,21,0Lg1uZvI312TPqxNWShFXL,2011-01-24,[Adele],[4dpARuHxo51G3z768sgnrY],"[[british soul, pop, pop soul, uk pop]]",[96],[32679969],Someone Like You,1zwMYTA5nlNjZxYrvBB2pV,82,[Adele],[4dpARuHxo51G3z768sgnrY],"[[british soul, pop, pop soul, uk pop]]",[96],False,[32679969],0.556,0.319,9,-8.251,1,0.893000,0.000000,0.0996,0.294,135.187,285240,4
198,3avCwQPH6DkhMTRsizon7N,Billboard 200 Top Albums 2021,199,OK ORCHESTRA,1y2AzG31F4CuCKQ1rpIzaI,2021-03-26,[AJR],[6s22t5Y3prQHyaHWUN1R1C],"[[modern rock, pop rap]]",[79],[2074176],Bang!,5FQPpxOXsvkMN6v18gtpwY,68,[AJR],[6s22t5Y3prQHyaHWUN1R1C],"[[modern rock, pop rap]]",[79],False,[2074176],0.740,0.517,1,-6.233,0,0.018300,0.000000,0.0558,0.698,139.917,170858,4


In [139]:
print(chart2021.columns)
len(chart2021.columns)

Index(['playlist_id', 'playlist_name', 'position', 'album_name', 'album_id',
       'album_release_date', 'album_artists', 'album_artists_ids',
       'album_artists_genres', 'album_artists_popularity',
       'album_artists_followers', 'track_name', 'track_id', 'track_popularity',
       'track_artists', 'track_artists_ids', 'track_artists_genres',
       'track_artists_popularity', 'track_explicit', 'track_artists_followers',
       'danceability', 'energy', 'key', 'loudness', 'mode', 'acousticness',
       'instrumentalness', 'liveness', 'valence', 'tempo', 'duration_ms',
       'time_signature'],
      dtype='object')


32

# =====================================================

### Get artist IDs from playlists or tracks

#### Create track container dictionaries

* Note that since artists variable is not created in cell with function call, subsequent calls will be appended to the same dictionary

In [140]:
artists = []

#### Function to extract all of the tracks' artist ids from your playlist:

In [141]:
def get_artist_ids(playlist_id):
    """ Return a list of first listed track artists from playlist id"""
    artist_id_list = []
    playlist = sp.playlist(playlist_id)
    for item in playlist['tracks']['items']:
        music_track = item['track']
        artist_id_list.append(music_track['artists'][0]['id'])
    return artist_id_list 

Test with playlist id '27Wi4y5VlHr43Q6UpZMVyS'

In [142]:
get_artist_ids('27Wi4y5VlHr43Q6UpZMVyS')

['246dkjvS1zLTtiykXe5h60',
 '06HL4z0CvFAxyc27GXpf02',
 '4kYSro6naA4h99UJvo89HB',
 '15UsOTVnJzReFVN1VCnxy4',
 '2YZyLoL8N0Wb9xBt1NhZWg']

In [143]:
def get_all_track_artists_ids(track_id):
    """ Return a list of all track artists from a track id"""
    meta = sp.track(track_id)
    track_artist_id_list = []
    for item in (meta['artists']):
        track_artist = item['id']
        track_artist_id_list.append(track_artist)
    return track_artist_id_list     

Test with track id '3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a'

In [144]:
get_all_track_artists_ids('3fFBZvG777xoKyvcrBq7lc?si=f241a44d776b451a')

['1RyvyyTE3xzB2ZywiAwp0i', '4MCBfE4596Uoi2O4DtmEMz', '50co4Is1HCEo8bhOyUWKpn']

#### Function to extract all the details of each artist by passing their ID:

In [145]:
def get_artist_data(artist_id):
    """ Return artist data from a given artist id"""
    meta = sp.artist(artist_id)
    artist_details = {'artist id': meta['id'],
                    'artist name': meta['name'],
                    'genres': meta['genres'],
                    'popularity': meta['popularity'],
                    'followers': meta['followers']['total']
                    }
    return artist_details

####  Extract artist data

Extract artist data of each track

For testing:  playlist_id = '27Wi4y5VlHr43Q6UpZMVyS'

In [146]:
# Get the ids for all the songs in your playlist
playlist_id = input('Enter the playlist id')
artist_ids = get_artist_ids(playlist_id)
print(len(artist_ids))
print(artist_ids)

#  Loop over track ids and get their data points
for i in tqdm(range(len(artist_ids))):
    time.sleep(.5)
    artist = get_artist_data(artist_ids[i])
    artists.append(artist)

Enter the playlist id27Wi4y5VlHr43Q6UpZMVyS


  0%|          | 0/5 [00:00<?, ?it/s]

5
['246dkjvS1zLTtiykXe5h60', '06HL4z0CvFAxyc27GXpf02', '4kYSro6naA4h99UJvo89HB', '15UsOTVnJzReFVN1VCnxy4', '2YZyLoL8N0Wb9xBt1NhZWg']


100%|██████████| 5/5 [00:02<00:00,  1.69it/s]


#### Create dataframe

In [147]:
artist_df = pd.DataFrame(artists)
artist_df.head()

Unnamed: 0,artist id,artist name,genres,popularity,followers
0,246dkjvS1zLTtiykXe5h60,Post Malone,"[dfw rap, melodic rap, rap]",92,35414369
1,06HL4z0CvFAxyc27GXpf02,Taylor Swift,[pop],100,47442319
2,4kYSro6naA4h99UJvo89HB,Cardi B,"[dance pop, pop, pop rap, rap]",86,18871226
3,15UsOTVnJzReFVN1VCnxy4,XXXTENTACION,"[emo rap, miami hip hop]",92,33521355
4,2YZyLoL8N0Wb9xBt1NhZWg,Kendrick Lamar,"[conscious hip hop, hip hop, rap, west coast rap]",90,19003331


### Get track's audio features directly from playlist (for concept only, still a WIP)

#### Function to extract each track's data from a playlist directly

In [148]:
def get_playlist_tracks(playlist_id):
    """Returns all playlist track data in a single row of lists"""
    track_attributes = sp.playlist_tracks(playlist_id)
    return track_attributes

In [149]:
playlist_tracks_data = []
playlist_ids = ['0qfagBJB5ou0r1kwQDZ8Op']

#  Loop over playlist ids and get their data points
for i in tqdm(range(len(playlist_ids))):
    time.sleep(.5)
    playlist_track = get_playlist_tracks(playlist_ids[i])
    playlist_tracks_data.append(playlist_track)

100%|██████████| 1/1 [00:00<00:00,  1.55it/s]


In [150]:
playlist_df = pd.DataFrame(playlist_tracks_data)
playlist_df

Unnamed: 0,href,items,limit,next,offset,previous,total
0,https://api.spotify.com/v1/playlists/0qfagBJB5...,"[{'added_at': '2015-12-04T17:25:30Z', 'added_b...",100,,0,,21


### Get track's audio features directly from playlist 

#### Function to extract each track's data from a playlist directly

In [151]:
def get_playlist_data(playlist_id):
    """Returns playlist track data """
    playlist_tracks = sp.playlist_tracks(playlist_id)
    return playlist_tracks

In [152]:
playlist_tracks_data = []
playlist_ids = ['0qfagBJB5ou0r1kwQDZ8Op']

#  Loop over playlist ids list and get their data as a single row
for i in tqdm(range(len(playlist_ids))):
    time.sleep(.5)
    playlist_track_data = get_playlist_data(playlist_ids[i])
    playlist_tracks_data.append(playlist_track_data)

100%|██████████| 1/1 [00:00<00:00,  1.53it/s]


In [158]:
print(len(playlist_tracks_data))

1


In [154]:
playlist_df = pd.DataFrame(playlist_tracks_data)
playlist_df

Unnamed: 0,href,items,limit,next,offset,previous,total
0,https://api.spotify.com/v1/playlists/0qfagBJB5...,"[{'added_at': '2015-12-04T17:25:30Z', 'added_b...",100,,0,,21


# ================================================

#### Total track number and name count code

In [155]:
tracks_count = sp.playlist('3avCwQPH6DkhMTRsizon7N')['tracks']['total']
tracks_count

200

Extract name and playlist count of each track

For testing:  playlist_id = '3avCwQPH6DkhMTRsizon7N'

In [156]:
playlist_id = '3avCwQPH6DkhMTRsizon7N'
tracks_count = sp.playlist(playlist_id)['tracks']['total']
playlist_name = sp.playlist(playlist_id)['name']
print(playlist_name)
print(tracks_count)

Billboard 200 Top Albums 2021
200
