# Connecting & Interacting with the Spotify API (without Wrappers)

## Data Cleaning, Feature Engineering, Authorization Code Flow, Data Verification & Integration

### John Warlick • 10/20/2021 • john.warlick.2021@anderson.ucla.edu

This project serves as an example of using the OAuth 2.0 Authorization Code Flow to connect to the Spotify API directly, without a wrapper library.


Here, I'll use the Spotify API to augment a dataset I have -- a couple of years' worth of Spotify Viral 50 chart data -- with more complete and accurate information. I'll also engineer new features that might be relevant for music business applications, and then create a Spotify playlist using some of the songs in the dataset via the API.

Note: I reran this code on the day I published this document (1/9/22).

## Part 1: Loading and Cleaning the Data

In [117]:
import glob
import numpy as np
import pandas as pd

In [102]:
# Concatenate all CSVs into single dataframe
files = glob.glob('spotify_top200_viral50_data/*.csv')
df = pd.concat((pd.read_csv(f) for f in files)).reset_index()

df

Unnamed: 0,index,Track,ISRC,Spotify Track Ids,Artist(s),Spotify Artist Id(s),Album(s),UPC,Spotify Album Ids,Release Date,...,Peak Position,Peak Date,Latest Position,Position Change,Latest Charting Date,Historical Positions,Streams,Spotify Track Link,Spotify Album Link,Spotify Artist Link
0,0,Someone You Loved,DEUM71807062,2TIlqbIneP0ZY1O0EzYLlc,Lewis Capaldi,4GNC7GD6oZMSxPGyXy4MNB,BREACH,602577308642,0NVQ9k3wKmuK6T02lLMl6y,2018-11-07,...,1,2019-03-02,1,0.0,2019-03-17,"198, 179, 181, 167, 166, 158, 167, 152, 127, 1...",489306.0,https://open.spotify.com/track/2TIlqbIneP0ZY1O...,https://open.spotify.com/album/0NVQ9k3wKmuK6T0...,https://open.spotify.com/artist/4GNC7GD6oZMSxP...
1,1,Giant (with Rag'n'Bone Man),GBARL1801703,5itOtNx0WxtJmi1TQ3RuRd,"Calvin Harris, RagnBone Man","7CajNmpbOovFoOoasH2HaY, 4f9iBmdUOhQWeP7dcAn1pf",Giant (with Rag'n'Bone Man),886447461051,4PwXTHenZZx7ebgsnTM65K,2019-01-10,...,2,2019-03-09,2,0.0,2019-03-17,"9, 9, 9, 8, 7, 7, 7, 8, 8, 8, 8, 7, 5, 5, 5, 5...",405203.0,https://open.spotify.com/track/5itOtNx0WxtJmi1...,https://open.spotify.com/album/4PwXTHenZZx7ebg...,https://open.spotify.com/artist/7CajNmpbOovFoO...
2,2,Don't Call Me Up,GBUM71808052,5WHTFyqSii0lmT9R21abT8,Mabel,1MIVXf74SZHmTIp4V4paH4,Ivy To Roses (Mixtape),602577411656,0syM7OUAhV7S6XmOa4nLUZ,2019-01-17,...,3,2019-02-11,3,0.0,2019-03-17,"43, 48, 37, 26, 20, 14, 12, 13, 12, 11, 10, 8,...",335986.0,https://open.spotify.com/track/5WHTFyqSii0lmT9...,https://open.spotify.com/album/0syM7OUAhV7S6Xm...,https://open.spotify.com/artist/1MIVXf74SZHmTI...
3,3,"break up with your girlfriend, i'm bored",USUM71900409,4kV4N9D1iKVxx1KLvtTpjS,Ariana Grande,66CXWjxzNUsdJxJ2JdwvnR,"thank u, next",602577490385,2fYhqwDWXjbpjaIJPEfKFw,2019-02-07,...,1,2019-02-09,4,0.0,2019-03-17,"2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...",314619.0,https://open.spotify.com/track/4kV4N9D1iKVxx1K...,https://open.spotify.com/album/2fYhqwDWXjbpjaI...,https://open.spotify.com/artist/66CXWjxzNUsdJx...
4,4,Sucker,USUG11900515,4y3OI86AEP6PQoDE6olYhO,Jonas Brothers,7gOdHgIoIKoe4i9Tta6qdD,Sucker,602577558719,4W0r9HOcuCC6Vh7aze2hwi,2019-02-28,...,5,2019-03-17,5,1.0,2019-03-17,"14, 12, 9, 7, 7, 7, 7, 11, 7, 7, 8, 7, 7, 8, 8...",312009.0,https://open.spotify.com/track/4y3OI86AEP6PQoD...,https://open.spotify.com/album/4W0r9HOcuCC6Vh7...,https://open.spotify.com/artist/7gOdHgIoIKoe4i...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1745,45,Trapanese,QM24S1801171,3eZNHwRkJ8V75psXi9Utyo,"lil ricefield, seiji oda","2lNuUKrRrHRQ6cYVmrAazL, 02hku5R1SCUiTPydXMdKBp",Trapanese,843357112268,55tzHxe6FXGmENTExMN9tZ,2018-02-01,...,39,2019-03-15,46,-7.0,2019-03-16,"39, 46",,https://open.spotify.com/track/3eZNHwRkJ8V75ps...,https://open.spotify.com/album/55tzHxe6FXGmENT...,https://open.spotify.com/artist/2lNuUKrRrHRQ6c...
1746,46,Act Up,USUG11802247,4eIT2gZ0WZyJpgfmoD6saJ,City Girls,37hAfseJWi0G3Scife12Il,Girl Code,602577328282,45TU0GIOO5AxFoiOPk08i1,2018-11-15,...,41,2019-03-15,47,-6.0,2019-03-16,"47, 50, 44, 41, 47",,https://open.spotify.com/track/4eIT2gZ0WZyJpgf...,https://open.spotify.com/album/45TU0GIOO5AxFoi...,https://open.spotify.com/artist/37hAfseJWi0G3S...
1747,47,An Irish Pub Song,AUAB01001108,1ewYoK8HL3Rflxyrjo1CAY,The Rumjacks,0w2KUuMj7dvP8dV4tzoltd,Gangs Of New Holland,602557084481,38OLdGMUHBEjvRvuWWR33f,2010-09-21,...,48,2019-03-16,48,,2019-03-16,48,,https://open.spotify.com/track/1ewYoK8HL3Rflxy...,https://open.spotify.com/album/38OLdGMUHBEjvRv...,https://open.spotify.com/artist/0w2KUuMj7dvP8d...
1748,48,Let Me Down Slowly,USAT21802284,2qxmye6gAegTMjLKEBoR3d,Alec Benjamin,5IH6FPUwQTxPSXurCrcIov,Narrated For You,75679858450,6jKZplJpy21R5lHaYHHjmZ,2018-11-15,...,25,2019-02-11,49,-1.0,2019-03-16,"44, 46, 46, 37, 38, 38, 38, 46, 42, 33, 34, 32...",,https://open.spotify.com/track/2qxmye6gAegTMjL...,https://open.spotify.com/album/6jKZplJpy21R5lH...,https://open.spotify.com/artist/5IH6FPUwQTxPSX...


In [103]:
# Drop columns I don't need for my task
cols = ['Album(s)', 'UPC', 'Latest Charting Date', 'Spotify Track Link', 'Spotify Artist Link', 'Spotify Album Link', 'Chart Cycle']
df.drop(cols, axis=1, inplace=True)

# Rename columns for readability
df.rename(columns={
                   'Track': 'track',
                   'ISRC': 'isrc',
                   'Spotify Track Ids': 'track_id',
                   'Artist(s)': 'artist',
                   'Spotify Artist Id(s)': 'artist_id',
                   'Spotify Album Ids':'album_id',
                   'Release Date': 'r_date',
                   'Record Label': 'label',
                   'Country': 'country',
                   'Days on Chart': 'days_chart',
                   'Chart Type': 'chart_type',
                   'Peak Position': 'peak_pos',
                   'Peak Date': 'peak_date',
                   'Latest Position': 'latest_pos',
                   'Position Change': 'pos_chg',
                   'Historical Positions': 'hist_pos'
                  },
         inplace=True)

# Cast artist and track columns to appropriate dtype
df[['artist','track']] = df[['artist','track']].astype(str)

In [104]:
# Check all columns for nulls
df.isnull().sum()

index           0
track           0
isrc            0
track_id        0
artist          0
artist_id       0
album_id        0
r_date          0
label          36
country         0
days_chart      0
chart_type      0
peak_pos        0
peak_date       0
latest_pos      0
pos_chg        27
hist_pos        0
Streams       350
dtype: int64

In [105]:
# Fill nulls in the non-numeric 'label' column with 'UNKNOWN'
df.loc[df['label'].isnull(), 'label'] = 'UNKNOWN'

In [106]:
# Define 'prev_pos' as the second-to-last value in 'hist_pos', or 0 for rows with no previous position
df['prev_pos'] = df['hist_pos'].str.split(', ').str[-2].fillna(0).astype('int')
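
To make the 'prev_pos' logic concrete, here is a minimal sketch on a toy frame. The three `hist_pos` strings are made-up stand-ins for real rows, and `toy` and `positions` are names introduced only for this illustration.

```python
import pandas as pd

# Toy stand-in for the 'hist_pos' column (illustrative values only)
toy = pd.DataFrame({"hist_pos": ["9, 9, 8, 7", "39, 46", "48"]})

# Split each comma-separated string into a list of position strings
positions = toy["hist_pos"].str.split(", ")

# .str[-2] yields NaN when a list has fewer than two entries,
# so convert to numbers first and then fill the gaps with 0
toy["prev_pos"] = pd.to_numeric(positions.str[-2]).fillna(0).astype(int)

print(toy)
```

The last row has only one charted position, so its 'prev_pos' falls back to 0.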

In [107]:
# 'pos_chg' has nulls, so recompute it as 'prev_pos' minus 'latest_pos' (or 0 when there is no previous position)
df['pos_chg'] = df['prev_pos'].where(df['prev_pos'] == 0, df['prev_pos'] - df['latest_pos'])
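
`Series.where` can be easy to misread: it keeps the original value wherever the condition is True and substitutes `other` everywhere else. The sketch below replays the 'pos_chg' calculation on a small made-up frame (`toy` is a hypothetical name, and the numbers are illustrative).

```python
import pandas as pd

# Illustrative values: the third row has no previous position (prev_pos == 0)
toy = pd.DataFrame({"prev_pos": [8, 39, 0], "latest_pos": [7, 46, 48]})

# Where prev_pos == 0, keep the 0; otherwise use prev_pos - latest_pos,
# so a positive value means the track moved up the chart
toy["pos_chg"] = toy["prev_pos"].where(
    toy["prev_pos"] == 0,
    toy["prev_pos"] - toy["latest_pos"],
)

print(toy)
```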

In [108]:
# Disable chained assignment warnings, then define 'freq' column as number of times an 'artist_id' appears within df
pd.options.mode.chained_assignment = None 
freqmap = dict(df.groupby('artist_id')['artist_id'].agg(['count']).reset_index().values)
df['freq'] = df['artist_id'].map(freqmap)
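
As an aside, the same per-row count can probably be obtained without building an intermediate dictionary. This is only a sketch of an equivalent approach on made-up IDs, not a change to the cell above.

```python
import pandas as pd

# Hypothetical artist IDs, used only for illustration
toy = pd.DataFrame({"artist_id": ["a1", "a2", "a1", "a3", "a1"]})

# transform('count') broadcasts each group's size back onto its rows
toy["freq"] = toy.groupby("artist_id")["artist_id"].transform("count")

# Mapping value_counts() onto the column gives the same numbers
toy["freq_alt"] = toy["artist_id"].map(toy["artist_id"].value_counts())

print(toy)
```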

In [109]:
# Rearrange column order for convenience
order = [
    'artist', 
    'track', 
    'r_date', 
    'label', 
    'country', 
    'chart_type',
    'latest_pos',
    'days_chart', 
    'freq', 
    'peak_pos', 
    'peak_date', 
    'pos_chg', 
    'hist_pos',  
    'track_id', 
    'artist_id', 
    'album_id', 
    'isrc'
]
df = df[order]

## Part 2: Accessing the Spotify API

In [21]:
import base64
import json
import requests
import secrets
import string
import time

In [86]:
# Credentials (hidden, because this is a public document -- see https://developer.spotify.com/documentation/general/guides/authorization/code-flow/)
client_id = ""      # paste your app's Client ID here
client_secret = ""  # paste your app's Client Secret here
redirect = "http://localhost/"

In [87]:
# Create state value, define scope
state = ''.join(secrets.choice(string.ascii_uppercase + string.digits) for _ in range(16))
scope = ["user-library-read", "user-read-email", "user-read-private", "playlist-modify-private", "playlist-modify-public", "playlist-read-collaborative"]

In [None]:
# Visit the link below, log in and authorize the app, then copy the authorization code from the redirected URL
print("https://accounts.spotify.com/authorize/?client_id=" + client_id + "&response_type=code&redirect_uri=" + redirect + "&state=" + state + "&scope=" + "%20".join(scope))
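
String concatenation works here because none of the values contain characters that need escaping. A slightly more robust sketch lets `urllib.parse.urlencode` percent-encode every parameter. The helper name `build_authorize_url` and the example client ID and state are hypothetical.

```python
from urllib.parse import urlencode, quote

def build_authorize_url(client_id, redirect_uri, scopes, state):
    """Return a Spotify authorization URL with every parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "state": state,
        "scope": " ".join(scopes),  # scopes are space-separated before encoding
    }
    # quote_via=quote encodes spaces as %20 rather than '+'
    return "https://accounts.spotify.com/authorize?" + urlencode(params, quote_via=quote)

# Hypothetical values, for illustration only
example_url = build_authorize_url(
    client_id="hypothetical-client-id",
    redirect_uri="http://localhost/",
    scopes=["user-read-email", "playlist-modify-private"],
    state="EXAMPLESTATE123",
)
print(example_url)
```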

In [89]:
# Authorization code: paste the part of the redirected URL after 'code=' and before '&state' below (hidden because this is a public document)
code = ""
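
Instead of copying the code by hand, the redirected URL can be parsed, which also makes it easy to confirm that the returned `state` matches the one that was sent. This is a sketch under the assumption that the whole redirected URL is pasted in. `extract_auth_code` and the sample URL are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def extract_auth_code(redirected_url, expected_state):
    """Pull the authorization code out of the redirect URL after checking `state`."""
    query = parse_qs(urlparse(redirected_url).query)
    # If the user denies access, the redirect carries 'error' instead of 'code'
    if "error" in query:
        raise RuntimeError(f"Authorization failed: {query['error'][0]}")
    # A mismatched state suggests the redirect did not come from our own request
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("State mismatch: do not use this code")
    return query["code"][0]

# Made-up redirect URL, for illustration only
example_code = extract_auth_code(
    "http://localhost/?code=abc123&state=EXAMPLESTATE123",
    expected_state="EXAMPLESTATE123",
)
print(example_code)
```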

In [90]:
# Endpoints
authURL = 'https://accounts.spotify.com/authorize'
tokenURL = 'https://accounts.spotify.com/api/token'
meURL = 'https://api.spotify.com/v1/me'

# Encode credentials
creds = f'{client_id}:{client_secret}'
creds64 = base64.b64encode(creds.encode())

# Define token data and header
tokenData = {
    'grant_type': "authorization_code",
    'code': code,
    'redirect_uri': redirect  # must match the redirect URI used in the authorization request
}

tokenHeaders = {
    'Authorization': f'Basic {creds64.decode()}',
    'Content-Type': 'application/x-www-form-urlencoded'
}

In [None]:
# Exchange the authorization code for an access token and a refresh token
req = requests.post(tokenURL, data=tokenData, headers=tokenHeaders)
tokenResponseData = req.json()
accessToken = tokenResponseData['access_token']
refreshToken = tokenResponseData['refresh_token']
tokenResponseData
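
Access tokens are short-lived (the token response includes an `expires_in` field), and the loops further down take a while to run. The sketch below shows how a refresh request could be assembled for the same token endpoint, using `grant_type=refresh_token` and the same Basic credentials. `build_refresh_request` is a hypothetical helper, and the credential values are placeholders.

```python
import base64

def build_refresh_request(refresh_token, client_id, client_secret):
    """Return (headers, data) for a refresh-token POST to the token endpoint."""
    creds64 = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {creds64}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    data = {"grant_type": "refresh_token", "refresh_token": refresh_token}
    return headers, data

# Placeholder values, for illustration only
refresh_headers, refresh_data = build_refresh_request(
    "example-refresh-token", "example-id", "example-secret"
)
print(refresh_data)
```

With real values, `requests.post(tokenURL, data=refresh_data, headers=refresh_headers).json()['access_token']` should yield a fresh token, after which the request header needs to be rebuilt.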

In [92]:
# Define request header
requestheader = {"Accept": "application/json", "Content-Type":"application/json", "Authorization": f"Bearer {accessToken}"}

In [29]:
# Create lists to collect track data from the API: for each track in the dataset, its Spotify URI, its 'popularity', and its release date
track_uri = []
song_pop = []
release_date = []

# Retry the request once on KeyError, which occurs when the response is an error payload (e.g. a rate-limit or expired-token message) instead of track data
for track in df['track_id']:
    try:
        track_result = requests.get(f'https://api.spotify.com/v1/tracks/{track}', headers=requestheader)
        track_data = json.loads(track_result.text) 
        track_uri.append(track_data['uri'])
        song_pop.append(track_data['popularity'])
        release_date.append(track_data['album']['release_date'])
    except KeyError:
        track_result = requests.get(f'https://api.spotify.com/v1/tracks/{track}', headers=requestheader)
        track_data = json.loads(track_result.text) 
        track_uri.append(track_data['uri'])
        song_pop.append(track_data['popularity'])
        release_date.append(track_data['album']['release_date'])
    # Sleep to avoid rate limit
    time.sleep(1)
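
The loop above retries once when an expected key is missing. A somewhat more defensive pattern is to check the HTTP status and, on a 429 (rate limited) response, wait for the number of seconds given in the `Retry-After` header before trying again. The helper below is a sketch: `get_json_with_retry` and its parameters are hypothetical, and the `get` callable is passed in so that `requests.get` can be supplied at call time.

```python
import time

def get_json_with_retry(get, url, headers, max_retries=3, sleep=time.sleep):
    """GET `url` and return parsed JSON, waiting and retrying on HTTP 429."""
    for attempt in range(max_retries + 1):
        response = get(url, headers=headers)
        if response.status_code == 429 and attempt < max_retries:
            # Retry-After is expressed in seconds; fall back to 1 if absent
            sleep(int(response.headers.get("Retry-After", 1)))
            continue
        # Raise on any remaining error status instead of failing later on a KeyError
        response.raise_for_status()
        return response.json()
```

A call would look like `get_json_with_retry(requests.get, f'https://api.spotify.com/v1/tracks/{track}', requestheader)`.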

In [48]:
# For each track in dataset, scrape label that album was released on
label = []

for album in df['album_id']:
    try:
        album_result = requests.get(f'https://api.spotify.com/v1/albums/{album}', headers=requestheader)
        label.append(json.loads(album_result.text)['label'])
    except KeyError:
        album_result = requests.get(f'https://api.spotify.com/v1/albums/{album}', headers=requestheader)
        label.append(json.loads(album_result.text)['label'])
    time.sleep(1)

In [60]:
# For each track in dataset, scrape the artist's 'popularity'
artist_pop = []

for artists in df['artist_id']:
    pops = []
    # 'artist_id' holds a comma-separated string of IDs, so split it before looping
    for artist in artists.split(', '):
        try:
            artist_result = requests.get(f'https://api.spotify.com/v1/artists/{artist}', headers=requestheader)
            pops.append(json.loads(artist_result.text)['popularity'])
        except KeyError:
            artist_result = requests.get(f'https://api.spotify.com/v1/artists/{artist}', headers=requestheader)
            pops.append(json.loads(artist_result.text)['popularity'])
    artist_pop.append(pops)
    time.sleep(1)
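
Requesting one item at a time means one HTTP call (and one second of sleep) per row. Spotify also offers batch endpoints; to the best of my knowledge, "Get Several Tracks" accepts up to 50 comma-separated IDs via an `ids` query parameter and returns them under a `tracks` key. The sketch below only builds the batched URLs. `chunked` and `several_tracks_urls` are hypothetical helpers, and the batch size should be checked against the current API reference, since limits differ by endpoint.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` containing at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def several_tracks_urls(track_ids, batch_size=50):
    """Build one batched request URL per group of track IDs."""
    return [
        "https://api.spotify.com/v1/tracks?ids=" + ",".join(batch)
        for batch in chunked(track_ids, batch_size)
    ]

# Made-up IDs, for illustration only
example_ids = [f"id{i}" for i in range(120)]
example_urls = several_tracks_urls(example_ids)
print(len(example_urls))
```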

In [111]:
# Fix release date formatting
import datetime
release_date = [date if len(date)==10 else (date + '-01-01') if len(date)==4 else (date + '-01') for date in release_date]
release_date = [datetime.datetime.strptime(date, "%Y-%m-%d").date() for date in release_date]
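
Spotify reports release dates at year, month, or day precision (the album object also carries a `release_date_precision` field), which is why the strings above come in three lengths. The same padding logic can be written as a small function that is easy to test. `normalize_release_date` is a hypothetical name.

```python
import datetime

def normalize_release_date(raw):
    """Convert 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD' into a date, padding with 01."""
    parts = raw.split("-")
    # Missing month and/or day default to the first of the period
    parts += ["01"] * (3 - len(parts))
    return datetime.date(*map(int, parts))

print(normalize_release_date("2010"))
print(normalize_release_date("2018-02"))
print(normalize_release_date("2018-11-07"))
```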

# Update 'r_date' and 'label' columns
df.loc[:, 'r_date'] = release_date
df.loc[:,'label'] = label

# Create new columns from scraped data
df['track_uri'] = track_uri
df['sp'] = song_pop
df['ap'] = artist_pop

In [112]:
# Drop old rows: i.e. rows that have songs released prior to 2018-06-01
df = df[df['r_date'] >= datetime.date(2018,6,1)]

In [113]:
# Sort dataframe by desired features
df = df.sort_values(['latest_pos','days_chart','freq'], ascending=[True, False, False])

In [114]:
# Rearrange columns
order2 = [
    'artist', 
    'track', 
    'r_date', 
    'label', 
    'sp', 
    'ap', 
    'country', 
    'chart_type',
    'latest_pos',
    'days_chart', 
    'freq', 
    'peak_pos', 
    'peak_date', 
    'pos_chg', 
    'hist_pos', 
    'track_uri' 
]
df = df[order2]

In [115]:
# Delete duplicates in track URIs
df = df.drop_duplicates(subset=['track_uri'])

# Only keep first 40 songs, so our final playlist isn't prohibitively huge
df = df.iloc[0:40,:]

tracks = df['track_uri'].tolist()

In [93]:
# Obtain Spotify User ID
user = requests.get(meURL, headers=requestheader)
userID = json.loads(user.text)['id']

In [94]:
# Create new playlist, and obtain Playlist ID
playlistinfo = {
    'name':'Test Playlist',
    'description':"This playlist was made via Spotify's API. Hi! :-)"
}
newplaylist = requests.post(f'https://api.spotify.com/v1/users/{userID}/playlists', headers=requestheader, json=playlistinfo)
playlistID = json.loads(newplaylist.text)['id']

In [98]:
# Add the remaining songs to the playlist, using the URIs collected in 'tracks' above
songsinfo = {'uris': tracks}
addsongs = requests.post(f'https://api.spotify.com/v1/users/{userID}/playlists/{playlistID}/tracks', headers=requestheader, json=songsinfo)

In [116]:
# Final df
df

Unnamed: 0,artist,track,r_date,label,sp,ap,country,chart_type,latest_pos,days_chart,freq,peak_pos,peak_date,pos_chg,hist_pos,track_uri
700,Lewis Capaldi,Someone You Loved,2018-11-08,Vertigo Berlin,78,[83],Ireland,regional,1,123,12,1,2019-02-23,0,"139, 184, 172, 124, 134, 144, 165, 172, 165, 1...",spotify:track:2TIlqbIneP0ZY1O0EzYLlc
1500,"Post Malone, Swae Lee",Sunflower - Spider-Man: Into the Spider-Verse,2018-12-14,Universal Records,87,"[93, 83]",United States,regional,1,75,8,1,2018-12-16,3,"1, 1, 1, 1, 1, 1, 1, 1, 20, 20, 2, 2, 2, 2, 2,...",spotify:track:3KkXRkHbMCARz0aVfEt68P
1250,Ariana Grande,7 rings,2019-02-08,Republic Records,87,[96],Canada,regional,1,39,79,1,2019-02-07,0,"1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1...",spotify:track:6ocbgoVGwYJhOv1GgI9NsF
400,"Khalid, Disclosure",Talk,2019-02-07,"Right Hand Music Group, LLC/RCA Records",73,"[90, 77]",New Zealand,regional,1,39,14,1,2019-03-05,0,"50, 37, 24, 17, 9, 8, 7, 6, 6, 6, 6, 6, 6, 6, ...",spotify:track:0rTV5WefWd1J3OwIheTzxM
950,Jonas Brothers,Sucker,2019-03-01,Jonas Brothers Recording,5,[82],Australia,regional,1,18,7,1,2019-03-09,0,"59, 22, 8, 6, 6, 6, 6, 5, 5, 1, 1, 1, 1, 1, 1,...",spotify:track:4y3OI86AEP6PQoDE6olYhO
600,Lil Nas X,Old Town Road,2019-03-14,Columbia,20,[92],Ireland,viral,1,2,14,1,2019-03-15,0,"1, 1",spotify:track:53CJANUxooaqGOtdsBTh7O
1,"Calvin Harris, RagnBone Man",Giant (with Rag'n'Bone Man),2019-01-11,Columbia,76,"[87, 73]",United Kingdom,regional,2,66,9,2,2019-03-09,0,"9, 9, 9, 8, 7, 7, 7, 8, 8, 8, 8, 7, 5, 5, 5, 5...",spotify:track:5itOtNx0WxtJmi1TQ3RuRd
1501,J. Cole,MIDDLE CHILD,2019-01-23,"Dreamville, Inc., Under exclusive license to R...",82,[90],United States,regional,2,53,18,1,2019-01-23,-1,"1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1...",spotify:track:2JvzF1RMd7lE3KmFlsyZD8
201,"Daddy Yankee, Snow",Con Calma,2019-01-24,El Cartel Records (EC3),80,"[89, 67]",Global,regional,2,52,14,2,2019-03-08,0,"119, 114, 114, 91, 74, 64, 56, 42, 31, 29, 28,...",spotify:track:5w9c2J52mkdntKOmRLeM2m
1201,FLETCHER,Undrunk,2019-01-25,Capitol Records,0,[73],United States,viral,2,28,10,2,2019-03-10,0,"29, 27, 24, 26, 25, 22, 17, 22, 16, 14, 14, 9,...",spotify:track:5SHhPFh68OhUmuRPymKX9d


The resultant playlist made from the process outlined in this document can be found at the link below:

https://open.spotify.com/playlist/5wyViSqX2kt3wXd6coDX83?si=50fa41e5febd4d67

Thanks for reading, and enjoy your music streaming!