# BonnaRythms project

I wanted to see whether I could build a playlist for Bonnaroo 2019 using the Spotify API (in Python) and some unsupervised machine learning (in R).

In [1]:
SPOTIFY_CLIENT_ID='#'
SPOTIFY_CLIENT_SECRET='#'
SPOTIFY_REDIRECT_URI='http://localhost:8888/callback/'
SPOTIFY_USER_ID='abc' # id of user creating the playlist

In [2]:
# Imports for R magics

# import warnings
# warnings.filterwarnings('ignore')
# %load_ext rpy2.ipython

# # Install R packages from CRAN
# from rpy2.robjects.packages import importr
# utils = importr('utils')
# utils.chooseCRANmirror(ind=1)
# utils.install_packages('tidyverse')
# utils.install_packages('cluster')

In [3]:
import pandas as pd
import spotipy
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials

# Spotify authorization
scope = 'playlist-modify-public'
username = SPOTIFY_USER_ID
token = util.prompt_for_user_token(username, scope, SPOTIFY_CLIENT_ID, SPOTIFY_CLIENT_SECRET, SPOTIFY_REDIRECT_URI)
spotify = spotipy.Spotify(auth=token)

### Matching Bonnaroo 2019 artists to their Spotify IDs

Using the spotipy package, I look up the Spotify ID associated with each artist (if they have one).

In [4]:
# Type in the names of artists as they appear on the lineup
artist_names = ['GRAND OLE OPRY',
               'SABA',
               'SPACE JESUS',
               'EPROM',
               'SHLUMP',
               '12TH PLANET',
               'SUNSQUABI',
               'ALL THEM WITCHES',
               'MAGIC CITY HIPPIES',
               'THE NUDE PARTY',
               'ROLLING BLACKOUTS COASTAL FEVER',
               'THE COMET IS COMING',
               'JACK HARLOW',
               'CAROLINE ROSE',
               'DONNA MISSAL',
               'PEACH PIT',
               'HEKLER',
               'DORFEX BOS',
               'PHISH',
               'CHILDISH GAMBINO',
               'SOLANGE',
               'THE AVETT BROTHERS',
               'BROCKHAMPTON',
               'GRIZ',
               'RL GRIME',
               'BEACH HOUSE',
               'NGHTMRE',
               'GOJIRA',
               'COURTNEY BARNETT',
               'GIRL TALK',
               'AJR',
               'CATFISH AND THE BOTTLEMEN',
               'K.FLAY',
               'ANOUSHKA SHANKAR',
               'NAHKO & MEDICINE FOR THE PEOPLE',
               'LIQUID STRANGER',
               'DEAFHEAVEN',
               'PARQUET COURTS',
               'RIVAL SONS',
               'IBEYI',
               'JADE CICADA',
               'LAS CAFETERAS',
               'CHERRY GLAZERR',
               'THE TESKEY BROTHERS',
               'MEDASIN',
               'TYLA YAWEH',
               # 'DUCKY', # returns 'the rubber ducky band'
               'MONSIEUR PERINE',
               'MERSIV',
               'CROOKED COLOURS',
               'POST MALONE',
               'ODESZA',
               'HOZIER',
               'KACEY MUSGRAVES',
               'THE NATIONAL',
               'THE LONELY ISLAND',
               'ZHU',
               'JOHN PRINE',
               'JUICE WRLD',
               # "JOE RUSSO'S ALMOST DEAD", # weird apostrophe
               'GUCCI MANE',
               'JIM JAMES',
               'MAREN MORRIS',
               'GRAMATIK',
               'SHOVELS & ROPE',
               'UNKNOWN MORTAL ORCHESTRA',
               'QUINN XCII',
               'CLAIRO',
               'BISHOP BRIGGS',
               'HIPPO CAMPUS',
               'SPACE JESUS',
               'TOKIMONSTA',
               'CHELSEA CUTLER',
               'THE RECORD COMPANY',
               'SNBRN',
               'RUSTON KELLY',
               'WHIPPED CREAM',
               'RUBBLEBUCKET',
               'LITTLE SIMZ',
               'MEMBA',
               'DEVA MAHAL',
               'DJ MEL',
               'THE LUMINEERS',
               'CARDI B',
               'BRANDI CARLILE',
               'ILLENIUM',
               'WALK THE MOON',
               'MAC DEMARCO',
               'KING PRINCESS',
               'LIL DICKY',
               'G JONES',
               'TRAMPLED BY TURTLES',
               'THE WOOD BROTHERS',
               # 'HOBO JOHNSON AND THE LOVEMAKERS', # no results...
               # 'PRINCESS', # no results...
               'MAYA RUDOLPH',
               'GRETCHEN LIEBERUM',
               'THE SOUL REBELS',
               'THE LEMON TWIGS',
               'TWO FEET',
               'AC SLATER',
               'CID',
               'DOMBRESKY',
               'BOMBINO',
               'FAYE WEBSTER',
               'RIPE',
               'KIKAGAKU MOYO',
               'IGLOOGHOST']



In [5]:
artists = {}

for name in artist_names:
    results = spotify.search(q='artist:' + name, type='artist')
    items = results['artists']['items']
    if items:
        # keep the top search hit: map the artist's Spotify name to their Spotify ID
        artists[items[0]['name']] = items[0]['id']
    else:
        print(name)  # no search results for this lineup name

print('Identified ' + str(len(artists)) + ' artists')


Identified 103 artists
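
Taking the first search hit can pick the wrong act, as the commented-out 'DUCKY' entry in the lineup list shows. A minimal sketch of one way to guard against that, assuming the `items` list has the same shape as `results['artists']['items']` (dicts with `name` and `id` keys). The helper `pick_artist` and the sample data are hypothetical, not part of spotipy or of the original notebook:

```python
def pick_artist(items, wanted_name):
    """Prefer a case-insensitive exact name match; otherwise fall back to the top hit."""
    wanted = wanted_name.casefold().strip()
    for item in items:
        if item['name'].casefold().strip() == wanted:
            return item
    # No exact match: keep the notebook's behaviour (first hit), or None if empty
    return items[0] if items else None


# Made-up search results, shaped like results['artists']['items']
sample_items = [
    {'name': 'The Rubber Ducky Band', 'id': 'placeholder-id-1'},
    {'name': 'Ducky', 'id': 'placeholder-id-2'},
]

match = pick_artist(sample_items, 'DUCKY')
print(match['name'], match['id'])
```

Falling back to the first hit keeps the original behaviour for names that Spotify spells differently from the lineup poster; returning `None` instead would be the stricter choice.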


### Identifying top 10 tracks for each artist

Each artist contributes up to 10 of their top tracks to the "pool" of music that will be considered for my playlist.

In [6]:
tracks = []

for artist_id in artists.values():
    results = spotify.artist_top_tracks('spotify:artist:' + artist_id)
    for track in results['tracks'][:10]:
        tracks.append('spotify:track:' + track['id'])  # append the URIs of each artist's top tracks

print('Selected ' + str(len(tracks)) + ' songs')

Selected 1009 songs
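
The pool can contain the same track more than once (for example when two lineup artists share a collaboration), which is why the R step later drops duplicates. A minimal sketch of doing the same thing on the Python side while keeping the original order. The helper name `dedupe_keep_order` and the sample URIs are made up for illustration:

```python
def dedupe_keep_order(uris):
    """Drop repeated URIs, keeping the first occurrence of each."""
    # dict keys preserve insertion order, so this is an order-preserving de-duplication
    return list(dict.fromkeys(uris))


# Placeholder URIs, not real Spotify tracks
sample_tracks = ['spotify:track:aaa', 'spotify:track:bbb', 'spotify:track:aaa']
unique_tracks = dedupe_keep_order(sample_tracks)
print(len(sample_tracks), 'tracks before,', len(unique_tracks), 'after')
```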


### Gather audio features for all Roo tracks

Collect the audio features for every track URI in the pool.

In [7]:
# Obtain audio features
frames = []
for track_chunk in [tracks[i:i + 20] for i in range(0, len(tracks), 20)]:  # process in chunks of 20 to stay within API limits
    frames.append(pd.DataFrame(spotify.audio_features(track_chunk)))

# DataFrame.append was removed in pandas 2.0, so combine the chunks with pd.concat
features = pd.concat(frames, ignore_index=True)

features.to_csv('features.csv')
features.describe()

| | acousticness | danceability | duration_ms | energy | instrumentalness | key | liveness | loudness | mode | speechiness | tempo | time_signature | valence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1009.0 | 1009.0 | 1009.0 | 1009.0 | 1009.0 | 1009.0 | 1009.0 | 1009.0 | 1009.0 | 1009.0 | 1009.0 | 1009.0 | 1009.0 |
| mean | 0.23106 | 0.594934 | 235191.617443 | 0.647522 | 0.143896 | 5.393459 | 0.194028 | -6.998004 | 0.616452 | 0.088621 | 121.050939 | 3.923687 | 0.434965 |
| std | 0.268624 | 0.159115 | 70459.726436 | 0.200614 | 0.270289 | 3.558022 | 0.168174 | 2.969636 | 0.486491 | 0.088964 | 29.828652 | 0.349495 | 0.22892 |
| min | 3e-06 | 0.106 | 42000.0 | 0.0128 | 0.0 | 0.0 | 0.0188 | -25.623 | 0.0 | 0.0229 | 60.352 | 1.0 | 0.0286 |
| 25% | 0.0175 | 0.488 | 194067.0 | 0.511 | 4e-06 | 2.0 | 0.0952 | -8.153 | 0.0 | 0.0363 | 95.981 | 4.0 | 0.246 |
| 50% | 0.116 | 0.603 | 223948.0 | 0.672 | 0.00151 | 6.0 | 0.122 | -6.507 | 1.0 | 0.0509 | 121.993 | 4.0 | 0.426 |
| 75% | 0.354 | 0.713 | 260088.0 | 0.806 | 0.101 | 9.0 | 0.231 | -5.054 | 1.0 | 0.0981 | 142.621 | 4.0 | 0.603 |
| max | 0.994 | 0.965 | 737408.0 | 0.999 | 0.972 | 11.0 | 0.99 | 1.28 | 1.0 | 0.501 | 219.331 | 5.0 | 0.969 |
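
The audio-features loop above batches the track URIs with an inline list comprehension. A small generator does the same job and can be reused for any batched API call. This is a sketch; the name `chunked` is my own and not something spotipy provides:

```python
def chunked(seq, size):
    """Yield consecutive slices of seq with at most `size` items each."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


# Five placeholder items split into batches of two
batches = list(chunked(['a', 'b', 'c', 'd', 'e'], 2))
print(batches)
```

Keeping the batch size as a parameter makes it easy to adjust if the API limit for a given endpoint is different.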


### In R, identify 100 representative songs from the universe of Bonnaroo music

I use PCA and PAM clustering to identify the 100 songs that best represent the audio attributes of the artists' tracks at Roo '19.

In [8]:
# %%R -i features -o representatives

# library(cluster) # provides pam()

# # drop duplicate tracks
# features = read.csv('/Users/i868290/Documents/bonnarhythm/features.csv' )
# features = features[!duplicated(features['uri']),]

# # extract auditory features with track uri as row name
# audio = features[c('acousticness', 'danceability', 'energy', 'instrumentalness', 'liveness', 'loudness', 'speechiness', 'tempo', 'valence')]
# rownames(audio) = unlist(features['uri'])

# # min-max scale each feature to [0, 1] and apply principal components analysis
# audio_std = data.frame(apply(audio, 2, function(x) (x - min(x)) / (max(x) - min(x))))
# audio_pca = prcomp(audio_std)

# # number of leading principal components needed to explain more than `cutoff` of the variance
# pca_cutoff = function(pca_sdev, cutoff=.9){
#   for(i in 1:length(pca_sdev)) {
#     if((sum(pca_sdev[1:i]^2)/sum(pca_sdev^2)) > cutoff) return(i);
#   }
# }

# # create Euclidean distance matrix from the scores of the retained principal components (90% variance explained)
# df_pca = data.frame(audio_pca$x[,1:pca_cutoff(audio_pca$sdev, cutoff=.9)])
# dist_matrix = as.matrix(dist(df_pca, method = "euclidean"))

# # identify 100 representative songs through PAM clustering
# pam_fit = pam(dist_matrix, diss = TRUE, k = 100)
# representatives = pam_fit$medoids
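
For readers who do not use R, the two preprocessing ideas in the cell above (min-max scaling each feature, then keeping just enough principal components to pass 90% of the variance) can be sketched in plain Python. This is an illustrative port under my own assumptions, not the code that produced the results: the function names mirror the R cell, the sample numbers are made up, and the standard deviations would in practice come from a PCA routine.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    spread = hi - lo
    if spread == 0:
        # A constant feature carries no information; map it to zeros
        return [0.0 for _ in values]
    return [(v - lo) / spread for v in values]


def pca_cutoff(sdev, cutoff=0.9):
    """Number of leading components whose cumulative variance share exceeds cutoff."""
    total = sum(s ** 2 for s in sdev)
    running = 0.0
    for count, s in enumerate(sdev, start=1):
        running += s ** 2  # variance of a component is its standard deviation squared
        if running / total > cutoff:
            return count
    return len(sdev)


# Made-up tempo values and made-up component standard deviations
scaled_tempo = min_max_scale([60.0, 120.0, 180.0])
n_components = pca_cutoff([2.0, 1.0, 0.5, 0.1], cutoff=0.9)
print(scaled_tempo, n_components)
```

Min-max scaling matters here because features such as `tempo` and `loudness` live on much larger scales than the 0 to 1 features and would otherwise dominate the principal components.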

In [9]:
# Load the representative tracks produced by the R clustering step
representatives = pd.read_csv('representatives.csv')
representatives.drop_duplicates(inplace=True)
representatives.dropna(inplace=True)
representatives.reset_index(inplace=True, drop=True)
bonnarhythms = representatives['x']
print('Identified '+str(len(bonnarhythms))+' representative Bonnarhythms')

Identified 100 representative Bonnarhythms
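
The representatives are medoids: real tracks from the pool that sit at the center of their cluster, which is why they can go straight into a playlist. To show the idea, here is a minimal k-medoids sketch in plain Python. Note that it uses a simple alternating scheme (assign points to the nearest medoid, then re-pick each cluster's medoid), not the build/swap procedure of PAM in R's `cluster` package, and it starts from the first `k` points rather than a smart initialization. The function name and toy data are my own.

```python
def k_medoids(dist, k, max_iter=100):
    """Return sorted indices of k medoids for a square distance matrix (list of lists)."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError('k must be between 1 and the number of points')
    medoids = list(range(k))  # naive start: the first k points
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid (a medoid stays with itself)
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance to its cluster members
        new_medoids = sorted(
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        )
        if new_medoids == sorted(medoids):
            break  # converged: no medoid changed
        medoids = new_medoids
    return sorted(medoids)


# Toy one-dimensional "songs" forming two obvious groups
points = [0, 1, 2, 10, 11, 12]
sample_dist = [[abs(a - b) for b in points] for a in points]
medoid_indices = k_medoids(sample_dist, k=2)
print([points[i] for i in medoid_indices])
```

On the toy data the two medoids are the middle point of each group, which is the same intuition behind picking 100 "central" tracks from the Bonnaroo pool.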


### Generate a playlist of 100 Bonnarhythms to share with friends

Creating the perfect Bonnaroo playlist to jam along with on the drive to Manchester, TN

In [10]:
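playlist_name = 'Bonnarhythms 2019'
playlist_description = 'Representative rhythms from the Bonnaroo 2019 lineup'

# pass the description too (needs a spotipy version whose user_playlist_create accepts `description`)
playlist_json = spotify.user_playlist_create(SPOTIFY_USER_ID, playlist_name, description=playlist_description)

for track_chunk in [bonnarhythms[i:i + 20] for i in range(0, len(bonnarhythms), 20)]:
    # convert the pandas Series slice to a plain list of track URIs
    spotify.user_playlist_add_tracks(SPOTIFY_USER_ID, playlist_id=playlist_json['id'], tracks=track_chunk.tolist())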