#### Christopher Kramer
#### 2021-01-04
## Spotify energy score via KNN

In [1]:
import pandas as pd
import numpy as np
from itertools import repeat
from typing import Union
import seaborn as sns
from collections import Counter

Find your own dataset suitable for classification or regression with at least three input variables and 200 or more cases. Depending on the target variable of interest, build a k-nearest neighbor classifier or regressor using the appropriate sklearn estimator. Find an interesting, unique dataset that is not widely used on the internet.
Address the following and include code/output snippets for b) to f). Include the response under each sub-question.


a)	State your research question 


Can Spotify's "energy" score (an audio feature that Spotify computes for each track) be predicted from the other Spotify track metadata using KNN regression?

Data from: https://www.kaggle.com/yamaerenay/spotify-dataset-19212020-160k-tracks

b)	Data pre-processing (to the extent deemed necessary: remember the knn algorithm depends on distances, so you need to rescale, normalize or standardize your input values to make sure no variable influences the predictions due to its scale).
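To illustrate why rescaling matters for a distance-based model, here is a small sketch using the danceability and duration_ms values of the first two rows of the feature table printed later in this section. Restricting the comparison to two columns is an assumption made for illustration only; it is not part of the notebook's pipeline.

```python
import math

# (danceability, duration_ms) for two tracks, copied from the printed feature table
track_a = (0.598, 168333)
track_b = (0.852, 150200)

# Euclidean distance on the raw, unscaled values
raw_distance = math.dist(track_a, track_b)

# Fraction of the squared distance that comes from duration_ms alone
duration_share = (track_a[1] - track_b[1]) ** 2 / raw_distance ** 2

print(f"raw distance: {raw_distance:.1f}")
print(f"share of squared distance from duration_ms: {duration_share:.10f}")
```

On raw values the danceability difference is effectively invisible to the distance, which is the situation standardization is meant to prevent.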


In [56]:
spotify = pd.read_csv('data.csv')

In [57]:
spotify.sample(5)

Unnamed: 0,acousticness,artists,danceability,duration_ms,energy,explicit,id,instrumentalness,key,liveness,loudness,mode,name,popularity,release_date,speechiness,tempo,valence,year
18674,0.22,"['Ty Dolla $ign', 'The Weeknd', 'Wiz Khalifa',...",0.805,242983,0.33,1,7t2bFihaDvhIrd2gn2CWJO,0.0,1,0.105,-8.712,0,"Or Nah (feat. The Weeknd, Wiz Khalifa & DJ Mus...",79,2014-06-10,0.1,121.97,0.211,2014
29587,0.0574,['Rod Stewart'],0.546,359733,0.859,0,7cYuolexJpsjD91G6i7XIm,0.0,7,0.0855,-9.745,1,Every Picture Tells A Story,47,1971-05-18,0.0401,143.677,0.742,1971
15551,0.422,"['Billy Bragg', 'Wilco']",0.622,298533,0.736,0,38paDDziQ57k1f4VfKTeGk,1.2e-05,9,0.0829,-9.229,1,California Stars,61,1998,0.0292,110.24,0.723,1998
45074,0.629,"['Richard Strauss', 'Fritz Reiner', 'Chicago S...",0.199,93587,0.207,0,2BdJKDe828AtUAEuZa50ZT,0.793,5,0.0634,-13.703,1,"Also sprach Zarathustra, Op. 30: I. Sunrise",13,1954,0.0332,132.457,0.0839,1954
120806,0.564,['Queen'],0.638,262133,0.576,0,0sA5xCFx2bF3jrz5Y5r0m1,6e-06,9,0.156,-14.736,0,I'm Going Slightly Mad,30,1991-02-05,0.0532,115.96,0.364,1991


Check nulls

In [58]:
spotify.isna().sum()

acousticness        0
artists             0
danceability        0
duration_ms         0
energy              0
explicit            0
id                  0
instrumentalness    0
key                 0
liveness            0
loudness            0
mode                0
name                0
popularity          0
release_date        0
speechiness         0
tempo               0
valence             0
year                0
dtype: int64

Drop string columns

In [59]:
spotify.dtypes

acousticness        float64
artists              object
danceability        float64
duration_ms           int64
energy              float64
explicit              int64
id                   object
instrumentalness    float64
key                   int64
liveness            float64
loudness            float64
mode                  int64
name                 object
popularity            int64
release_date         object
speechiness         float64
tempo               float64
valence             float64
year                  int64
dtype: object

In [60]:
spotify = spotify.select_dtypes(['float64', 'int64'])

In [61]:
spotify.dtypes

acousticness        float64
danceability        float64
duration_ms           int64
energy              float64
explicit              int64
instrumentalness    float64
key                   int64
liveness            float64
loudness            float64
mode                  int64
popularity            int64
speechiness         float64
tempo               float64
valence             float64
year                  int64
dtype: object

Convert 'year' to track age (years before the current year)

In [62]:
spotify['year'] = pd.Timestamp.now().year - spotify['year']

Get dummies

In [63]:
spotify = pd.get_dummies(spotify, columns = ['explicit', 'key', 'mode'])

In [64]:
spotify.dtypes

acousticness        float64
danceability        float64
duration_ms           int64
energy              float64
instrumentalness    float64
liveness            float64
loudness            float64
popularity            int64
speechiness         float64
tempo               float64
valence             float64
year                  int64
explicit_0            uint8
explicit_1            uint8
key_0                 uint8
key_1                 uint8
key_2                 uint8
key_3                 uint8
key_4                 uint8
key_5                 uint8
key_6                 uint8
key_7                 uint8
key_8                 uint8
key_9                 uint8
key_10                uint8
key_11                uint8
mode_0                uint8
mode_1                uint8
dtype: object

Scale dummies (recode 0/1 as -1/1)

In [65]:
spotify[spotify.select_dtypes('uint8').columns] = spotify[spotify.select_dtypes('uint8').columns].replace({0:-1})
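One side effect of recoding the dummies from 0/1 to -1/1 is worth keeping in mind: it doubles the Euclidean distance contributed by a categorical mismatch. The sketch below checks this for the 12-level `key` variable. The `one_hot` helper is hypothetical and written only for this illustration.

```python
import math

def one_hot(index, size, off=0, on=1):
    """Return a one-hot list with `on` at `index` and `off` elsewhere."""
    return [on if i == index else off for i in range(size)]

# Two tracks in different keys (key 5 vs key 10), 12 possible keys
d_01 = math.dist(one_hot(5, 12), one_hot(10, 12))                   # 0/1 coding
d_pm1 = math.dist(one_hot(5, 12, off=-1), one_hot(10, 12, off=-1))  # -1/1 coding

print(d_01, d_pm1, d_pm1 / d_01)
```

Under this coding, a difference in key weighs about as much as a difference of 2.8 standard deviations in one standardized numeric column, so the recoding is a modeling choice and not a neutral rescaling.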

Scale Floats

In [66]:
from sklearn.compose import make_column_transformer, make_column_selector
from sklearn.preprocessing import StandardScaler

In [67]:
X = spotify[spotify.columns[~spotify.columns.isin(['energy'])]]
y = spotify['energy']

In [68]:
X

Unnamed: 0,acousticness,danceability,duration_ms,instrumentalness,liveness,loudness,popularity,speechiness,tempo,valence,...,key_4,key_5,key_6,key_7,key_8,key_9,key_10,key_11,mode_0,mode_1
0,0.991000,0.598,168333,0.000522,0.3790,-12.628,12,0.0936,149.976,0.6340,...,-1,1,-1,-1,-1,-1,-1,-1,1,-1
1,0.643000,0.852,150200,0.026400,0.0809,-7.261,7,0.0534,86.889,0.9500,...,-1,1,-1,-1,-1,-1,-1,-1,1,-1
2,0.993000,0.647,163827,0.000018,0.5190,-12.098,4,0.1740,97.600,0.6890,...,-1,-1,-1,-1,-1,-1,-1,-1,-1,1
3,0.000173,0.730,422087,0.801000,0.1280,-7.311,17,0.0425,127.997,0.0422,...,-1,-1,-1,-1,-1,-1,-1,-1,-1,1
4,0.295000,0.704,165224,0.000246,0.4020,-6.036,2,0.0768,122.076,0.2990,...,-1,-1,-1,-1,-1,-1,1,-1,1,-1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
174384,0.009170,0.792,147615,0.000060,0.1780,-5.089,0,0.0356,125.972,0.1860,...,-1,-1,1,-1,-1,-1,-1,-1,1,-1
174385,0.795000,0.429,144720,0.000000,0.1960,-11.665,0,0.0360,94.710,0.2280,...,1,-1,-1,-1,-1,-1,-1,-1,-1,1
174386,0.806000,0.671,218147,0.920000,0.1130,-12.393,0,0.0282,108.058,0.7140,...,1,-1,-1,-1,-1,-1,-1,-1,1,-1
174387,0.920000,0.462,244000,0.000000,0.1130,-12.077,69,0.0377,171.319,0.3200,...,-1,-1,-1,-1,-1,-1,-1,-1,-1,1


In [69]:
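# Standardize every column up to and including 'year'; the -1/1 dummies pass through unchanged
numeric_cols = X.columns[:list(X.columns).index('year') + 1]
ct = make_column_transformer((StandardScaler(), numeric_cols), remainder='passthrough')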

In [71]:
X = pd.DataFrame(ct.fit_transform(X), columns=X.columns)
X

Unnamed: 0,acousticness,danceability,duration_ms,instrumentalness,liveness,loudness,popularity,speechiness,tempo,valence,...,key_4,key_5,key_6,key_7,key_8,key_9,key_10,key_11,mode_0,mode_1
0,1.294358,0.347919,-0.434495,-0.588004,0.930106,-0.154111,-0.626050,-0.066549,1.089753,0.413903,...,-1.0,1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,1.0,-1.0
1,0.378411,1.790898,-0.556689,-0.510657,-0.721489,0.788862,-0.854645,-0.287113,-0.995485,1.608718,...,-1.0,1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,1.0,-1.0
2,1.299622,0.626289,-0.464860,-0.589511,1.705763,-0.060991,-0.991803,0.374580,-0.641450,0.621861,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,1.0
3,-1.313529,1.097814,1.275491,1.804534,-0.460536,0.780077,-0.397454,-0.346918,0.363273,-1.823729,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,1.0
4,-0.537536,0.950107,-0.455446,-0.588829,1.057535,1.004092,-1.083241,-0.158725,0.167564,-0.852753,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,1.0,-1.0,1.0,-1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
174384,-1.289849,1.450037,-0.574108,-0.589385,-0.183516,1.170478,-1.174679,-0.384776,0.296340,-1.280013,...,-1.0,-1.0,1.0,-1.0,-1.0,-1.0,-1.0,-1.0,1.0,-1.0
174385,0.778480,-0.612173,-0.593617,-0.589564,-0.083788,0.015086,-1.174679,-0.382582,-0.736975,-1.121208,...,1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,1.0
174386,0.807432,0.762634,-0.098811,2.160212,-0.543642,-0.112822,-1.174679,-0.425378,-0.295778,0.716387,...,1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,1.0,-1.0
174387,1.107484,-0.424699,0.075406,-0.589564,-0.543642,-0.057301,1.979941,-0.373254,1.795212,-0.773351,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,1.0


c)	Data splitting 


In [72]:
from sklearn.model_selection import train_test_split

In [73]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)

In [74]:
X_train.shape, y_train.shape

((139511, 27), (139511,))

In [75]:
X_test.shape, y_test.shape

((34878, 27), (34878,))
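Note that the scaler in section b) was fit on all rows before this split, so the test rows contributed to the means and standard deviations. A common alternative is to estimate the scaling statistics on the training split only and reuse them for the test split. The following is a minimal pure-Python sketch of that idea on made-up duration values, not the notebook's actual code. `statistics.pstdev` is used because StandardScaler likewise divides by the population standard deviation.

```python
import statistics

# Hypothetical durations in seconds; these numbers are illustrative only
train = [150.0, 200.0, 250.0, 300.0]
test = [400.0]

# Estimate the scaling statistics from the training split only
mu = statistics.fmean(train)
sigma = statistics.pstdev(train)

# Apply the same training statistics to both splits
train_scaled = [(x - mu) / sigma for x in train]
test_scaled = [(x - mu) / sigma for x in test]

print(train_scaled, test_scaled)
```

In scikit-learn the same effect is usually obtained by placing the scaler and the estimator in one pipeline and fitting it on the training data only.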

d)	Model construction 


In [76]:
from sklearn.neighbors import KNeighborsRegressor

knn = KNeighborsRegressor(n_jobs=-1)

In [77]:
knn.fit(X_train, y_train)

KNeighborsRegressor(n_jobs=-1)
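With default settings, KNeighborsRegressor predicts the unweighted mean target of the 5 nearest training rows. A minimal sketch of that rule on toy data follows. The `knn_predict` helper and the toy values are hypothetical, and scikit-learn's real implementation uses much faster neighbor searches than this brute-force sort.

```python
import math

def knn_predict(X_train, y_train, query, k=5):
    """Average the targets of the k training rows closest to `query`."""
    # Rank training rows by Euclidean distance to the query point
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], query))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / k

# Toy one-feature data set
X_toy = [[0.0], [1.0], [2.0], [3.0], [10.0]]
y_toy = [0.0, 0.1, 0.2, 0.3, 1.0]

# The three nearest rows to 1.4 are x=1, x=2 and x=0
pred = knn_predict(X_toy, y_toy, [1.4], k=3)
print(pred)
```

Because every prediction requires distances to all training rows, prediction cost grows with the size of the training set, which is consistent with the long run times reported in the tuning section.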

Base metrics

In [78]:
from sklearn.metrics import r2_score, mean_squared_error

In [79]:
y_pred_train = knn.predict(X_train)

In [80]:
y_pred_test = knn.predict(X_test)

Train

In [81]:
r2_score(y_train, y_pred_train)

0.8692657365579515

In [82]:
mean_squared_error(y_train, y_pred_train, squared=False)

0.09855464044834011

Test

In [83]:
r2_score(y_test, y_pred_test)

0.8029904002733431

In [84]:
mean_squared_error(y_test, y_pred_test, squared=False)

0.12123105702934114
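For reference, the two metrics reported above can be written out by hand. This is a sketch of the standard definitions on made-up values, with hypothetical helper names `r2` and `rmse`. It also avoids the `squared=False` argument, which newer scikit-learn releases deprecate in favor of a separate `root_mean_squared_error` function.

```python
import math

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

# Illustrative energy values, not taken from the data set
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.3, 0.4, 0.5, 0.8]

score_r2 = r2(y_true, y_pred)
score_rmse = rmse(y_true, y_pred)
print(score_r2, score_rmse)
```

Since energy lies between 0 and 1, a test RMSE of about 0.12 means a typical prediction misses by roughly 12% of the full range.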

e)	Hyperparameter tuning (choose whatever approach you like)


In [None]:
# trying tune-sklearn which is supposed to be faster than built-in tuners. 
from tune_sklearn import TuneSearchCV

In [88]:
# This dataset takes a long time to train, so I've limited my choices and number of folds
params = {
    'n_neighbors': list(range(1, 52, 10))
}

In [96]:
# Below can be replaced with "GridSearchCV" for classic tuning
grid = TuneSearchCV(knn, params, n_jobs=10, verbose=2, cv=4, use_gpu = True)

In [97]:
grid.fit(X_train, y_train)

Checkpointing the experiment state took 1.796 s, which may be a performance bottleneck. Please ensure the `TUNE_GLOBAL_CHECKPOINT_S` environment variable is something significantly higher than this duration to ensure compute time is mostly spent on the main training loop.


The `start_trial` operation took 0.643 s, which may be a performance bottleneck.


Trial _Trainable_db558_00000 reported split0_test_score=0.8070193020045853,split1_test_score=0.8047274625036641,split2_test_score=0.8035835462868018,split3_test_score=0.8030203857801017,average_test_score=0.8045876741437882,objective=0.8045876741437882 with parameters={'early_stopping': False, 'early_stop_type': <EarlyStopping.NO_EARLY_STOP: 7>, 'X_id': ObjectRef(ffffffffffffffffffffffffffffffffffffffff0100000008000000), 'y_id': ObjectRef(ffffffffffffffffffffffffffffffffffffffff0100000009000000), 'groups': None, 'cv': KFold(n_splits=4, random_state=None, shuffle=False), 'fit_params': {}, 'scoring': {'score': <function _passthrough_scorer at 0x000002986C6F50D0>}, 'max_iters': 1, 'return_train_score': False, 'n_jobs': 1, 'metric_name': 'average_test_score', 'n_neighbors': 11, 'estimator_list': [KNeighborsRegressor(n_jobs=-1)]}. This trial completed.


The `process_trial_result` operation took 1.558 s, which may be a performance bottleneck.
Processing trial results took 1.560 s, which may be a performance bottleneck. Please consider reporting results less frequently to Ray Tune.
The `process_trial` operation took 1.561 s, which may be a performance bottleneck.
Checkpointing the experiment state took 113.195 s, which may be a performance bottleneck. Please ensure the `TUNE_GLOBAL_CHECKPOINT_S` environment variable is something significantly higher than this duration to ensure compute time is mostly spent on the main training loop.
Trial Runner checkpointing failed: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\KittheKat\\ray_results\\_Trainable_2021-02-15_17-28-14\\.tmp_generator' -> 'C:\\Users\\KittheKat\\ray_results\\_Trainable_2021-02-15_17-28-14\\basic-variant-state-2021-02-15_17-28-14.json'


The `on_step_begin` operation took 0.945 s, which may be a performance bottleneck.
The `get_next_failed_trial` operation took 0.955 s, which may be a performance bottleneck.


Trial _Trainable_db558_00002 reported split0_test_score=0.8047373708677723,split1_test_score=0.8022934609645567,split2_test_score=0.8026419798556756,split3_test_score=0.8011765557721433,average_test_score=0.8027123418650369,objective=0.8027123418650369 with parameters={'early_stopping': False, 'early_stop_type': <EarlyStopping.NO_EARLY_STOP: 7>, 'X_id': ObjectRef(ffffffffffffffffffffffffffffffffffffffff0100000008000000), 'y_id': ObjectRef(ffffffffffffffffffffffffffffffffffffffff0100000009000000), 'groups': None, 'cv': KFold(n_splits=4, random_state=None, shuffle=False), 'fit_params': {}, 'scoring': {'score': <function _passthrough_scorer at 0x000002986C6F50D0>}, 'max_iters': 1, 'return_train_score': False, 'n_jobs': 1, 'metric_name': 'average_test_score', 'n_neighbors': 21, 'estimator_list': [KNeighborsRegressor(n_jobs=-1)]}. This trial completed.


Trial _Trainable_db558_00003 reported split0_test_score=0.798619731128189,split1_test_score=0.7962843298862314,split2_test_score=0.7959358894703314,split3_test_score=0.7953211430923094,average_test_score=0.7965402733942654,objective=0.7965402733942654 with parameters={'early_stopping': False, 'early_stop_type': <EarlyStopping.NO_EARLY_STOP: 7>, 'X_id': ObjectRef(ffffffffffffffffffffffffffffffffffffffff0100000008000000), 'y_id': ObjectRef(ffffffffffffffffffffffffffffffffffffffff0100000009000000), 'groups': None, 'cv': KFold(n_splits=4, random_state=None, shuffle=False), 'fit_params': {}, 'scoring': {'score': <function _passthrough_scorer at 0x000002986C6F50D0>}, 'max_iters': 1, 'return_train_score': False, 'n_jobs': 1, 'metric_name': 'average_test_score', 'n_neighbors': 41, 'estimator_list': [KNeighborsRegressor(n_jobs=-1)]}. This trial completed.


Trial _Trainable_db558_00004 reported split0_test_score=0.8015592205895563,split1_test_score=0.7995203593510085,split2_test_score=0.7990281333235592,split3_test_score=0.7981037617861465,average_test_score=0.7995528687625676,objective=0.7995528687625676 with parameters={'early_stopping': False, 'early_stop_type': <EarlyStopping.NO_EARLY_STOP: 7>, 'X_id': ObjectRef(ffffffffffffffffffffffffffffffffffffffff0100000008000000), 'y_id': ObjectRef(ffffffffffffffffffffffffffffffffffffffff0100000009000000), 'groups': None, 'cv': KFold(n_splits=4, random_state=None, shuffle=False), 'fit_params': {}, 'scoring': {'score': <function _passthrough_scorer at 0x000002986C6F50D0>}, 'max_iters': 1, 'return_train_score': False, 'n_jobs': 1, 'metric_name': 'average_test_score', 'n_neighbors': 31, 'estimator_list': [KNeighborsRegressor(n_jobs=-1)]}. This trial completed.


Trial _Trainable_db558_00001 reported split0_test_score=0.798619731128189,split1_test_score=0.7962843298862314,split2_test_score=0.7959358894703314,split3_test_score=0.7953211430923094,average_test_score=0.7965402733942654,objective=0.7965402733942654 with parameters={'early_stopping': False, 'early_stop_type': <EarlyStopping.NO_EARLY_STOP: 7>, 'X_id': ObjectRef(ffffffffffffffffffffffffffffffffffffffff0100000008000000), 'y_id': ObjectRef(ffffffffffffffffffffffffffffffffffffffff0100000009000000), 'groups': None, 'cv': KFold(n_splits=4, random_state=None, shuffle=False), 'fit_params': {}, 'scoring': {'score': <function _passthrough_scorer at 0x000002986C6F50D0>}, 'max_iters': 1, 'return_train_score': False, 'n_jobs': 1, 'metric_name': 'average_test_score', 'n_neighbors': 41, 'estimator_list': [KNeighborsRegressor(n_jobs=-1)]}. This trial completed.


Trial name,status,loc,n_neighbors,iter,total time (s),split0_test_score,split1_test_score,split2_test_score
_Trainable_db558_00000,TERMINATED,,11,1,301.131,0.807019,0.804727,0.803584
_Trainable_db558_00001,TERMINATED,,41,1,456.843,0.79862,0.796284,0.795936
_Trainable_db558_00002,TERMINATED,,21,1,443.254,0.804737,0.802293,0.802642
_Trainable_db558_00003,TERMINATED,,41,1,452.329,0.79862,0.796284,0.795936
_Trainable_db558_00004,TERMINATED,,31,1,453.878,0.801559,0.79952,0.799028


[2m[36m(pid=24860)[0m Windows fatal exception: access violation
[2m[36m(pid=24860)[0m 
[2m[36m(pid=24860)[0m Windows fatal exception: access violation
[2m[36m(pid=24860)[0m 


TuneSearchCV(cv=4, estimator=KNeighborsRegressor(n_jobs=-1),
             loggers=[<class 'ray.tune.logger.CSVLogger'>,
                      <class 'ray.tune.logger.JsonLogger'>],
             n_jobs=10, n_trials=5,
             param_distributions={'n_neighbors': [1, 11, 21, 31, 41]},
             scoring={'score': <function _passthrough_scorer at 0x000002986C6F50D0>},
             sk_n_jobs=1, use_gpu=True, verbose=2)

In [94]:
import joblib

In [95]:
joblib.dump(grid, 'grid_search.joblib')

['grid_search.joblib']

In [98]:
grid.best_params_

{'n_neighbors': 11}
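The search above scores each candidate `n_neighbors` with 4-fold cross-validation (unshuffled, contiguous folds) and keeps the value with the best average R2. The sketch below reproduces that procedure in plain Python on synthetic data. The helpers `knn_predict`, `r2` and `cv_r2`, the data-generating function, and the sample size are all assumptions made for illustration; this is not how tune-sklearn or scikit-learn implement the search internally.

```python
import math
import random

def knn_predict(X_train, y_train, query, k):
    """Mean target of the k nearest training rows (brute force)."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], query))
    return sum(y_train[i] for i in order[:k]) / k

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def cv_r2(X, y, k, folds=4):
    """Average validation R2 over contiguous, unshuffled folds."""
    fold_size = len(X) // folds
    scores = []
    for f in range(folds):
        lo, hi = f * fold_size, (f + 1) * fold_size
        X_val, y_val = X[lo:hi], y[lo:hi]
        X_fit, y_fit = X[:lo] + X[hi:], y[:lo] + y[hi:]
        preds = [knn_predict(X_fit, y_fit, q, k) for q in X_val]
        scores.append(r2(y_val, preds))
    return sum(scores) / folds

# Synthetic one-feature regression problem (illustrative only)
rng = random.Random(0)
X = [[rng.uniform(0, 1)] for _ in range(80)]
y = [math.sin(6 * x[0]) + rng.gauss(0, 0.2) for x in X]

candidates = [1, 11, 21, 31, 41]
scores = {k: cv_r2(X, y, k) for k in candidates}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

One detail visible in the trial table above: the five sampled trials include `n_neighbors=41` twice and never evaluate `n_neighbors=1`, so the search covered four distinct values. An exhaustive grid search evaluates each listed value exactly once.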

Micro-tuning

In [111]:
# This dataset takes a long time to train, so I've limited my choices and number of folds
params = {
    'n_neighbors': [11, 12, 13, 14, 15, 16]
}

In [112]:
from sklearn.model_selection import GridSearchCV

microgrid = GridSearchCV(KNeighborsRegressor(), params, n_jobs=-1, verbose=2, cv=4)

In [113]:
microgrid.fit(X_train, y_train)

Fitting 4 folds for each of 6 candidates, totalling 24 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   6 out of  24 | elapsed:  5.3min remaining: 16.0min
[Parallel(n_jobs=-1)]: Done  19 out of  24 | elapsed:  8.6min remaining:  2.3min
[Parallel(n_jobs=-1)]: Done  24 out of  24 | elapsed:  8.8min finished


GridSearchCV(cv=4, estimator=KNeighborsRegressor(), n_jobs=-1,
             param_grid={'n_neighbors': [11, 12, 13, 14, 15, 16]}, verbose=2)

In [114]:
joblib.dump(microgrid, 'microgrid_search.joblib')

['microgrid_search.joblib']

In [115]:
microgrid.best_params_

{'n_neighbors': 13}

f)	Use the best or optimal parameter values to build a model, then compute the accuracy score for your estimator. 


In [116]:
knn_grid = microgrid.best_estimator_

In [117]:
y_pred_train = knn_grid.predict(X_train)
y_pred_test = knn_grid.predict(X_test)

Training

In [118]:
r2_score(y_train, y_pred_train)

0.8380230790180042

In [119]:
mean_squared_error(y_train, y_pred_train, squared=False)

0.10970058290931115

Test

In [120]:
r2_score(y_test, y_pred_test)

0.8112129507922109

In [121]:
mean_squared_error(y_test, y_pred_test, squared=False)

0.11867419561831806

Discuss overfitting for the model


Given the most recent (KNN grid) test scores, I do not believe this model is overfit. Relative to the out-of-the-box model, the training fit got slightly worse (R2 fell from 0.869 to 0.838 and RMSE rose from 0.099 to 0.110), while the test scores improved slightly (R2 rose from 0.803 to 0.811 and RMSE fell from 0.121 to 0.119). The remaining gap between training and test scores is not large enough to conclude that the model is overfit.
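The size of the train/test gap can be checked directly from the R2 values reported in sections d) and f). The sketch below only does that arithmetic. The label "n_neighbors=5" assumes the scikit-learn default for the out-of-the-box model, which the notebook does not set explicitly.

```python
# R2 scores copied from the notebook outputs above
reported = {
    "default (n_neighbors=5)": {"train": 0.8692657365579515, "test": 0.8029904002733431},
    "tuned (n_neighbors=13)": {"train": 0.8380230790180042, "test": 0.8112129507922109},
}

# Gap between training and test R2 for each model
gaps = {name: s["train"] - s["test"] for name, s in reported.items()}

for name, gap in gaps.items():
    print(f"{name}: train-test R2 gap = {gap:.4f}")
```

The gap shrinks from about 0.066 to about 0.027 after tuning, which supports the reading that the larger neighborhood generalizes slightly better.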

Overall, I was very impressed with this model. It is very computationally intensive, but I'm surprised that it predicted so well using only non-text features (no artist name, etc.) and without access to the actual mp3/song data files.