Rickord, Jake JJ
Project 3: Spotify Song Classifications

Import needed packages

In [118]:
import os
import pandas as pd
import numpy as np
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.oauth2 import SpotifyOAuth
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import precision_recall_fscore_support as score

Authenticate requests

In [119]:
scope = "user-library-read"

auth_manager = SpotifyClientCredentials(client_id='######', client_secret='#########')
sp = spotipy.Spotify(client_credentials_manager = auth_manager)

Extract track listings from liked playlist

In [120]:
track_uris = []
playlist_uri='3gfQ9CScIyMxMZwphYM4cZ'
pl_pull = sp.playlist_tracks(playlist_uri)['items']
for track in pl_pull:
    track_uris.append(track['track']['uri'])

In [121]:
print(track_uris)

['spotify:track:36lV4io8Gc69fDinNaazAg', 'spotify:track:0YOOXnCJihgyluizqhAcrz', 'spotify:track:49YQ9Mq8CKelQsmjdNiICu', 'spotify:track:1vXHRgrkWnCMab8dhVPh0f', 'spotify:track:3h5AZDf5z7D18plaLtHTfi', 'spotify:track:1tQGRq2WOBXjL3JWdWMONg', 'spotify:track:3UCnS94uOqgvTmHohbgH9W', 'spotify:track:1WSEWiuB7KeEchBhvRb7qX', 'spotify:track:4z5kF7eIQhc6cvdb2MxEDD', 'spotify:track:1kStecVGAvK2RLhf0S9ywc', 'spotify:track:5tN0LIuQDapSFedlYJuG0k', 'spotify:track:3R6yNicsZrWF8ybl02imcB', 'spotify:track:0YnbCBn2UdoM5Y1l6Am5EY', 'spotify:track:4a2uqVlpRChHj32EjJLu7G', 'spotify:track:7rwKNqgVO0OfdOLNSKfYR4', 'spotify:track:4Oz6xl8qhJTdjYZeT98V4m', 'spotify:track:48nZLOpJOINkICPWpUuhaN', 'spotify:track:6P7kZB7hSJFyaYJcu3cYQJ', 'spotify:track:0Ld3o5VR3A5c8Je92xFG4e', 'spotify:track:3BCsNM2w6HcCbrVWlhjTpk', 'spotify:track:3Ozx6IrGdoQyAworJzvBDE', 'spotify:track:3t5UYY2uXE6CIEd6FYXrho', 'spotify:track:5Os2lSlPvvc40Wmun45Tz8', 'spotify:track:62EcdBCqjB62CxnhgBkkJT', 'spotify:track:5XpJVr7FpYkZrgPCHbFF4E',

In [122]:
print(len(track_uris))

100


Looks like we only grabbed the first 100 tracks. According to the spotipy documentation, results are paginated, so let's keep requesting the following pages until we have the rest of the entries as well.

In [123]:
track_uris = []
playlist_uri='3gfQ9CScIyMxMZwphYM4cZ'
#grabs return
pl_pull = sp.playlist_tracks(playlist_uri)
#page 1 of return
track_list = pl_pull['items']
#grabs all pages, adds them to track_list
while pl_pull['next']:
    pl_pull = sp.next(pl_pull)
    track_list.extend(pl_pull['items'])
#just grab the uris from it
for track in track_list:
    track_uris.append(track['track']['uri'])

In [124]:
print(len(track_uris))

568


Now it looks like we've grabbed the full playlist. Let's grab their sound details now for each track, and add these details into a dataframe.

First let's set up the dataframe with the style we'd like it in

In [125]:
track_features = list(sp.audio_features(track_uris[0])[0].keys())
track_features.insert(0, 'track_uri')
track_features.append('Liked?')
df = pd.DataFrame(columns = track_features)
print(df)

Empty DataFrame
Columns: [track_uri, danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, type, id, uri, track_href, analysis_url, duration_ms, time_signature, Liked?]
Index: []


Looks great, now let's add our songs to it

In [126]:
for track in track_uris:
    i_tf = list(sp.audio_features(track)[0].values())
    i_tf.insert(0, track[track.index('track:')+len('track:'):])
    i_tf.append(1)
    df.loc[len(df)] = i_tf
print(df)

                  track_uri  danceability  energy  key  loudness  mode  \
0    36lV4io8Gc69fDinNaazAg         0.591   0.411    9    -7.496     1   
1    0YOOXnCJihgyluizqhAcrz         0.495   0.434    2    -8.079     1   
2    49YQ9Mq8CKelQsmjdNiICu         0.470   0.220    3   -11.727     1   
3    1vXHRgrkWnCMab8dhVPh0f         0.490   0.180    7   -13.332     1   
4    3h5AZDf5z7D18plaLtHTfi         0.632   0.254    3   -10.898     1   
..                      ...           ...     ...  ...       ...   ...   
563  1UVaHJQCmYxBK4XpYnYazU         0.519   0.691    9   -10.254     0   
564  4ma2AbPdsy5ijCyJn16mAv         0.515   0.418    2   -11.794     1   
565  5i4cie6F2AD20KiZ3XxUTq         0.371   0.640    9    -9.154     0   
566  3e9dlCDa68YOOJ5kFSiEdp         0.524   0.733    0    -7.390     1   
567  77R5bi9hfAUKZmu6tjOcxg         0.470   0.133   11   -13.678     1   

     speechiness  acousticness  instrumentalness  liveness  valence    tempo  \
0         0.0299        0.7130 
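The loop above makes one API request per track, so 568 requests for this playlist. As a hedged alternative, `sp.audio_features` also accepts a list of tracks, and the underlying Web API endpoint is documented as taking up to 100 IDs per request, so the same data can be pulled in a handful of calls. The sketch below is not the notebook's code: `chunked` and `fetch_feature_rows` are hypothetical helper names, and `client` is assumed to be any object with an `audio_features(list)` method, such as the `sp` object above.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_feature_rows(client, uris, liked, batch_size=100):
    """Return one dict per track: its audio features plus a 'Liked?' label.

    `client` is assumed to expose audio_features(list_of_uris), as the
    spotipy client does. Entries that come back as None (no features
    available) are skipped.
    """
    rows = []
    for batch in chunked(uris, batch_size):
        for uri, features in zip(batch, client.audio_features(batch)):
            if features is None:
                continue
            row = {'track_uri': uri.split(':')[-1]}  # keep only the bare ID
            row.update(features)
            row['Liked?'] = liked
            rows.append(row)
    return rows
```

With this in place, `df = pd.DataFrame(fetch_feature_rows(sp, track_uris, liked=1))` would build the frame in one step instead of appending row by row, with the same column order as before.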

Looking spectacular. Now we have our dataset for songs we like. Now let's grab a dataset of songs we dislike. We've sampled some here in a collection: "6kT6nn9Vd1e52XRG7OS2IE"

In [127]:
bad_track_uris = []
bad_playlist_uri='6kT6nn9Vd1e52XRG7OS2IE'
bad_pl_pull = sp.playlist_tracks(bad_playlist_uri)['items']
for track in bad_pl_pull:
    bad_track_uris.append(track['track']['uri'])
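This single-page pull only works because the disliked playlist is short enough to fit in one page of results. A small helper that follows the `next` links, the same way the liked-playlist cell did earlier, would cover both playlists regardless of size. A minimal sketch, assuming `client` offers `playlist_tracks(playlist_id)` and `next(page)` the way the `sp` object does; `all_playlist_uris` is a hypothetical name, not part of spotipy.

```python
def all_playlist_uris(client, playlist_id):
    """Collect the URI of every track in a playlist, across all result pages."""
    page = client.playlist_tracks(playlist_id)
    uris = []
    while page:
        for item in page['items']:
            track = item.get('track')
            if track:  # skip entries whose track is missing
                uris.append(track['uri'])
        # 'next' is None on the last page, which ends the loop
        page = client.next(page) if page['next'] else None
    return uris
```

Usage would look like `bad_track_uris = all_playlist_uris(sp, bad_playlist_uri)`, and the same call with `playlist_uri` would replace the longer loop used for the liked playlist.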

And let's add these to that dataframe

In [128]:
for track in bad_track_uris:
    i_tf = list(sp.audio_features(track)[0].values())
    i_tf.insert(0, track[track.index('track:')+len('track:'):])
    i_tf.append(0)
    df.loc[len(df)] = i_tf
print(df)

                  track_uri  danceability  energy  key  loudness  mode  \
0    36lV4io8Gc69fDinNaazAg         0.591   0.411    9    -7.496     1   
1    0YOOXnCJihgyluizqhAcrz         0.495   0.434    2    -8.079     1   
2    49YQ9Mq8CKelQsmjdNiICu         0.470   0.220    3   -11.727     1   
3    1vXHRgrkWnCMab8dhVPh0f         0.490   0.180    7   -13.332     1   
4    3h5AZDf5z7D18plaLtHTfi         0.632   0.254    3   -10.898     1   
..                      ...           ...     ...  ...       ...   ...   
613  1LJYn86ysceH708AIkw0VZ         0.975   0.482    9    -7.940     1   
614  7FERnkDYqD9D9DJdNjQlnf         0.761   0.246    0   -11.676     1   
615  4F0flTfcEU2ZcStYhJsaRy         0.862   0.717    9    -7.158     1   
616  15hJmqqEtASVXl6sM7i4UF         0.615   0.600   10    -5.620     1   
617  0lTqQTHcYb0RzGa5STjVrf         0.821   0.714    2    -5.977     1   

     speechiness  acousticness  instrumentalness  liveness  valence    tempo  \
0         0.0299       0.71300 

And now our dataset is complete with song info! Let's drop the auxiliary columns that aren't related to how a song sounds.

In [129]:
df.columns

Index(['track_uri', 'danceability', 'energy', 'key', 'loudness', 'mode',
       'speechiness', 'acousticness', 'instrumentalness', 'liveness',
       'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 'analysis_url',
       'duration_ms', 'time_signature', 'Liked?'],
      dtype='object')

In [130]:
df = df.drop(['id', 'type', 'uri', 'track_href', 'analysis_url'], axis=1)

In [131]:
print(df)

                  track_uri  danceability  energy  key  loudness  mode  \
0    36lV4io8Gc69fDinNaazAg         0.591   0.411    9    -7.496     1   
1    0YOOXnCJihgyluizqhAcrz         0.495   0.434    2    -8.079     1   
2    49YQ9Mq8CKelQsmjdNiICu         0.470   0.220    3   -11.727     1   
3    1vXHRgrkWnCMab8dhVPh0f         0.490   0.180    7   -13.332     1   
4    3h5AZDf5z7D18plaLtHTfi         0.632   0.254    3   -10.898     1   
..                      ...           ...     ...  ...       ...   ...   
613  1LJYn86ysceH708AIkw0VZ         0.975   0.482    9    -7.940     1   
614  7FERnkDYqD9D9DJdNjQlnf         0.761   0.246    0   -11.676     1   
615  4F0flTfcEU2ZcStYhJsaRy         0.862   0.717    9    -7.158     1   
616  15hJmqqEtASVXl6sM7i4UF         0.615   0.600   10    -5.620     1   
617  0lTqQTHcYb0RzGa5STjVrf         0.821   0.714    2    -5.977     1   

     speechiness  acousticness  instrumentalness  liveness  valence    tempo  \
0         0.0299       0.71300 

Okay, now with our dataset all rev'd up, time to split it into training and test sets!

In [169]:
X = df.drop(['track_uri', 'Liked?'], axis=1)
y = df['Liked?']
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=0)
print(X_train)
print(y_train)

     danceability  energy  key  loudness  mode  speechiness  acousticness  \
89          0.575   0.765    9    -6.608     1       0.0345       0.00703   
285         0.484   0.793    2    -6.797     1       0.2110       0.00069   
310         0.625   0.715    7    -4.539     0       0.0942       0.04130   
367         0.529   0.934    0    -4.808     1       0.0602       0.00153   
46          0.410   0.233    0   -12.878     1       0.0286       0.83100   
..            ...     ...  ...       ...   ...          ...           ...   
277         0.674   0.204    5   -13.535     1       0.0384       0.88700   
9           0.544   0.224    7   -14.410     1       0.0351       0.78100   
359         0.397   0.903    2    -4.577     1       0.0457       0.00575   
192         0.708   0.413    7   -10.856     1       0.1010       0.78300   
559         0.499   0.160    7   -13.632     1       0.0362       0.89900   

     instrumentalness  liveness  valence    tempo  duration_ms  time_signat
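With so few disliked songs, a purely random split can leave the test set with more or fewer of them than their share of the data. `train_test_split` takes a `stratify` argument that keeps the class ratio the same in both parts. The sketch below uses made-up feature values with the same class counts as this dataset (568 liked, 50 disliked); it is an illustration, not a rerun of the notebook, and the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 568 liked (1) and 50 disliked (0)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(618, 3))
y_demo = np.array([1] * 568 + [0] * 50)

# stratify=y_demo preserves the liked/disliked ratio in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

# Test-set size and number of disliked songs that landed in it
print(len(y_te), int((y_te == 0).sum()))
```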

Alrighty, now with that being broken up, we shall train our model and evaluate

Let's try out Logistic Regression, which will likely make the most sense considering our output of Liked? is binary, either yes or no.

In [170]:
logmodel = LogisticRegression()
logmodel.fit(X_train,y_train)
predictions = logmodel.predict(X_test)

In [171]:
print(classification_report(y_test,predictions))

              precision    recall  f1-score   support

           0       0.00      0.00      0.00        13
           1       0.92      1.00      0.96       142

    accuracy                           0.92       155
   macro avg       0.46      0.50      0.48       155
weighted avg       0.84      0.92      0.88       155



Interesting, it predicted that I don't dislike any songs. My tastes are decently diverse, but I think this is more likely due to the severe imbalance between the size of the liked songs playlist (568 tracks) and the size of the disliked songs playlist (50 tracks). Let's see if another algorithm can take some of this imbalance into account. The first one that comes to mind is the SVC model, which accepts per-class weights.
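Before moving on, the report above can be reproduced by hand, which makes the problem concrete. A model that labels every test song as liked gets 142 of 155 right, so the 0.92 accuracy is exactly the majority-class baseline, while the macro average exposes the missing class. The counts below are read off the report; the variable names are illustrative.

```python
# Confusion counts for "predict liked (1) for everything" on the test set
liked_total, disliked_total = 142, 13   # supports from the report above

tp = liked_total      # liked songs predicted liked
fp = disliked_total   # disliked songs predicted liked
fn = 0                # liked songs predicted disliked

accuracy = tp / (liked_total + disliked_total)
precision_liked = tp / (tp + fp)
recall_liked = tp / (tp + fn)
f1_liked = 2 * precision_liked * recall_liked / (precision_liked + recall_liked)

# Class 0 is never predicted, so its F1 is reported as 0
f1_disliked = 0.0
macro_f1 = (f1_liked + f1_disliked) / 2

print(round(accuracy, 2), round(f1_liked, 2), round(macro_f1, 2))
# 0.92 0.96 0.48
```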

In [172]:
from sklearn.svm import SVC

In [173]:
svcmodel = SVC(kernel='linear', class_weight = {0:0.92, 1:0.08})
svcmodel.fit(X_train, y_train)
predictions = svcmodel.predict(X_test)
print(classification_report(y_test,predictions))

              precision    recall  f1-score   support

           0       0.08      0.62      0.14        13
           1       0.91      0.37      0.52       142

    accuracy                           0.39       155
   macro avg       0.50      0.49      0.33       155
weighted avg       0.84      0.39      0.49       155



Looks like even with weights roughly inverse to the class proportions, our F1 and accuracy scores aren't great. Let's test a few different weights and see which weighting gives the best macro F1 score.
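As a reference point for that search, scikit-learn's `class_weight='balanced'` option sets each class weight to `n_samples / (n_classes * class_count)`. The sketch below applies that formula to the full dataset counts (568 liked, 50 disliked); the training split would give slightly different numbers, and `balanced_weights` is a hypothetical helper, not a library function.

```python
from collections import Counter


def balanced_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


labels = [1] * 568 + [0] * 50
weights = balanced_weights(labels)
print({cls: round(w, 3) for cls, w in sorted(weights.items())})
# {0: 6.18, 1: 0.544}
```

The ratio between the two weights is a little over 11 to 1, close to the 0.92/0.08 pair tried above. Passing `class_weight='balanced'` to `SVC` applies the same formula to whatever labels are given to `fit`.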

In [174]:
weight0 = 0
weight1 = 1
bestweight0 = 0
bestweight1 = 1
bestf1 = 0
for asweight in np.arange(0, 1, 0.05):
    svcmodel = SVC(kernel='linear', class_weight = {0:weight0+asweight, 1:weight1-asweight})
    svcmodel.fit(X_train, y_train)
    predictions = svcmodel.predict(X_test)
    precision,recall,fscore,support=score(y_test,predictions,average='macro')
    if fscore>bestf1:
        bestf1=fscore
        bestweight0 = weight0+asweight
        bestweight1 = weight1-asweight

print(bestf1)
print(bestweight0)
print(bestweight1)

0.4894598155467721
0.8500000000000001
0.1499999999999999


In [181]:
svcmodel = SVC(kernel = 'linear', class_weight = {0:0.85, 1:0.15})
svcmodel.fit(X_train, y_train)
predictions = svcmodel.predict(X_test)
print(classification_report(y_test,predictions))

              precision    recall  f1-score   support

           0       0.08      0.15      0.11        13
           1       0.92      0.84      0.88       142

    accuracy                           0.78       155
   macro avg       0.50      0.50      0.49       155
weighted avg       0.85      0.78      0.81       155



In [153]:
import warnings
warnings.filterwarnings('ignore')

Looks like even with some class weighting involved, the most accurate model is still the one that never predicts a disliked song. Let's try taking a random sample of our liked songs to even out the number of liked and disliked songs.

In [175]:
bdf = pd.DataFrame(columns = df.columns)

Grab 50 random rows from the liked playlist

In [176]:
bdf = df.loc[df['Liked?']==1].sample(n=50)
bdf

Unnamed: 0,track_uri,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,Liked?
104,6mPpHS9ocuKjsehu8NgD7r,0.43,0.501,9,-8.864,1,0.0378,0.735,0.00254,0.111,0.407,115.027,308373,4,1
521,4urpFW2JCm486dHJmsQ8Ps,0.607,0.336,3,-11.854,1,0.0284,0.603,8e-06,0.101,0.337,86.042,198219,4,1
526,7c6TRb4cvgHWVhTJDfRqth,0.598,0.898,10,-5.155,1,0.113,0.0433,0.0,0.0976,0.422,77.639,211933,4,1
38,5AeoHJUx0PJXAzN425xryh,0.413,0.1,0,-16.349,1,0.0332,0.784,0.381,0.0819,0.186,110.696,184907,3,1
469,0M3adYbGtyRHACP86dey1H,0.426,0.888,0,-3.72,0,0.0987,0.000455,0.0,0.306,0.387,144.111,191520,4,1
153,3VGjFO89Sh53UjTRGJXWmi,0.529,0.497,2,-8.979,1,0.0436,0.196,4e-06,0.101,0.449,135.315,285307,4,1
424,1Jj6MF0xDOMA3Ut2Z368Bx,0.724,0.436,0,-9.321,1,0.0282,0.576,1e-06,0.0908,0.324,130.439,243067,4,1
102,2Zgnaip1c876zmBhz9HifI,0.544,0.745,3,-3.401,1,0.0276,0.439,0.0,0.348,0.455,103.971,195547,4,1
248,2gA74HvN6NKFrhgzpd5oNE,0.645,0.486,0,-7.299,1,0.0425,0.274,0.0,0.135,0.453,72.536,208187,4,1
191,5zXNvJgCNNqmC262XJp8TW,0.663,0.649,9,-4.808,1,0.155,0.0862,1e-06,0.0744,0.121,85.497,201253,4,1


In [177]:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
bdf = pd.concat([bdf, df.loc[df['Liked?']==0]], ignore_index=True)
bdf

Unnamed: 0,track_uri,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,Liked?
0,6mPpHS9ocuKjsehu8NgD7r,0.430,0.501,9,-8.864,1,0.0378,0.735000,0.002540,0.1110,0.4070,115.027,308373,4,1
1,4urpFW2JCm486dHJmsQ8Ps,0.607,0.336,3,-11.854,1,0.0284,0.603000,0.000008,0.1010,0.3370,86.042,198219,4,1
2,7c6TRb4cvgHWVhTJDfRqth,0.598,0.898,10,-5.155,1,0.1130,0.043300,0.000000,0.0976,0.4220,77.639,211933,4,1
3,5AeoHJUx0PJXAzN425xryh,0.413,0.100,0,-16.349,1,0.0332,0.784000,0.381000,0.0819,0.1860,110.696,184907,3,1
4,0M3adYbGtyRHACP86dey1H,0.426,0.888,0,-3.720,0,0.0987,0.000455,0.000000,0.3060,0.3870,144.111,191520,4,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,1LJYn86ysceH708AIkw0VZ,0.975,0.482,9,-7.940,1,0.2460,0.007160,0.000000,0.0596,0.7370,125.953,146598,4,0
96,7FERnkDYqD9D9DJdNjQlnf,0.761,0.246,0,-11.676,1,0.2450,0.132000,0.000000,0.0800,0.2260,130.012,132989,4,0
97,4F0flTfcEU2ZcStYhJsaRy,0.862,0.717,9,-7.158,1,0.4600,0.166000,0.000000,0.1970,0.7750,77.620,137806,4,0
98,15hJmqqEtASVXl6sM7i4UF,0.615,0.600,10,-5.620,1,0.2700,0.107000,0.000002,0.2830,0.0661,130.027,270671,4,0
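One caveat on this step: `sample(n=50)` draws a different subset on every run, so the reports that follow will change from run to run. pandas' `sample` accepts a `random_state` argument to pin the draw. The sketch below shows the same undersampling idea with the standard library only, on made-up rows; `undersample` is a hypothetical helper, not something used in this notebook.

```python
import random


def undersample(rows, label_key, seed=0):
    """Downsample every class to the size of the smallest one."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    smallest = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed makes the draw repeatable
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Made-up rows with the same class counts as the real dataset
rows = [{'id': i, 'Liked?': 1} for i in range(568)]
rows += [{'id': i, 'Liked?': 0} for i in range(568, 618)]
balanced = undersample(rows, 'Liked?')
print(len(balanced))  # 100
```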


And now that we have our balanced sample, let's proceed with our data split

In [178]:
bX = bdf.drop(['track_uri', 'Liked?'], axis=1)
by = bdf['Liked?']
bX_train,bX_test,by_train,by_test=train_test_split(bX,by,test_size=0.25,random_state=0)
# print the balanced split, not the original X_train / y_train
print(bX_train)
print(by_train)

And now let's fit the same models on the balanced sample to see if performance is any better.

In [179]:
blogmodel = LogisticRegression()
blogmodel.fit(bX_train,by_train)
bpredictions = blogmodel.predict(bX_test)
print(classification_report(by_test,bpredictions))

              precision    recall  f1-score   support

           0       0.47      0.67      0.55        12
           1       0.50      0.31      0.38        13

    accuracy                           0.48        25
   macro avg       0.49      0.49      0.47        25
weighted avg       0.49      0.48      0.46        25



In [180]:
bsvcmodel = SVC(kernel='linear')
bsvcmodel.fit(bX_train, by_train)
bpredictions = bsvcmodel.predict(bX_test)
print(classification_report(by_test,bpredictions))

              precision    recall  f1-score   support

           0       0.33      0.25      0.29        12
           1       0.44      0.54      0.48        13

    accuracy                           0.40        25
   macro avg       0.39      0.39      0.38        25
weighted avg       0.39      0.40      0.39        25



We've now tried quite a few classification models, so which one performs best? The short answer is that none of them is fantastic. Which of those shown is best depends on whether false positives or false negatives matter more. In our case neither is desirable: we either listen to a song that we don't like, or we miss songs we'd like because they aren't recommended. As a result, I'd say the F1 score is the most important element of the evaluation, so we'd default to picking the model with the best macro F1. That is the SVC model with weights of 0.85 for disliked and 0.15 for liked songs to offset the imbalance between playlists, at a macro F1 of 0.49. That is only marginally ahead of the unweighted logistic regression (0.48), and the two models trained on the balanced sample score lower still (0.47 and 0.38).
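Two things may be holding all of these models back, though neither was tested here: the features sit on very different scales (`duration_ms` is in the hundreds of thousands while most others lie between 0 and 1), and a test set of 25 songs gives noisy scores. A hedged sketch of one way to address both, using feature scaling, `class_weight='balanced'`, and stratified cross-validation. The data is a synthetic stand-in, not the playlist data, and the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: two features on very different scales, imbalanced labels
rng = np.random.default_rng(0)
y_demo = np.array([1] * 568 + [0] * 50)
X_demo = np.column_stack([
    rng.normal(loc=0.5 + 0.2 * (y_demo == 0), scale=0.15),    # 0-1 style feature
    rng.normal(loc=200_000, scale=40_000, size=y_demo.size),  # duration_ms style
])

# Scale the features, then fit a class-weighted logistic regression
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight='balanced', max_iter=1000),
)

# Stratified folds keep some disliked songs in every validation fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring='f1_macro')
print(scores.mean())
```

Averaging macro F1 over five folds uses every disliked song for validation once, which should give a steadier comparison between models than a single small test set.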