# Spotify Song Recommendation


The problem we aim to solve is classifying songs and recommending new ones based on a user's liked and disliked songs on Spotify. The approach has three steps. First, the user provides a list of liked songs and a list of disliked songs. Second, a model is trained on the audio features of those songs. Third, the model predicts whether the user will like a given song. We implement training and prediction with scikit-learn, which provides a wide range of algorithms that fit easily into our training pipeline. We use a binary classifier to provide individual song recommendations for each of the five members of our group: for each member, it labels a song as liked or disliked.

In this notebook, we import each group member's audio features for their liked and disliked songs. Next, we create and train a model for each member based on their preferences. Lastly, we use these models to predict each member's preference (like or dislike) for a group of test songs.

# Prerequisites

First, we imported all the relevant libraries that we plan to use for our project.

In [None]:
import numpy as np
import pandas as pd

#preprocessing and pipeline helpers
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import (StandardScaler, OneHotEncoder, Normalizer,
                                   MinMaxScaler, MaxAbsScaler,
                                   PowerTransformer, RobustScaler)

#estimators
from sklearn.linear_model import LinearRegression, RidgeClassifier
from sklearn.svm import SVC, SVR
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.gaussian_process import GaussianProcessClassifier

#model selection and metrics
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.metrics import precision_score, recall_score, accuracy_score


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Data Set

## Data Preparation

In this section, we imported data from our drive that we previously prepared.  The preparation and preprocessing of the data can be found in this notebook:
https://colab.research.google.com/drive/1xnEQLrNbTqDThdCqaXgDgdr_73w0fVmk#scrollTo=PuVZ5vWiIrNJ

To summarize our data extraction, we used the Spotify API and Spotipy library to extract audio features for songs in 3 different playlists: each member's "likes", "dislikes", and "test". To do so, we established our credentials with Spotify, imported our playlist data, and then finally went through every song in the playlist to retrieve their audio features. 
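The extraction step summarized above could look roughly like the sketch below. It is not the code from the linked preparation notebook: the function name `fetch_playlist_features` and the shape of the returned rows are hypothetical, and `sp` is assumed to be an authenticated `spotipy.Spotify` client.

```python
def fetch_playlist_features(sp, playlist_id, batch_size=100):
    """Collect audio features for every track in a playlist.

    `sp` is assumed to be an authenticated spotipy.Spotify client; the
    function name and the returned row layout are illustrative only.
    """
    # Page through the playlist to gather every track entry.
    tracks = []
    page = sp.playlist_items(playlist_id)
    while page:
        tracks.extend(item["track"] for item in page["items"] if item.get("track"))
        page = sp.next(page) if page.get("next") else None

    # Request audio features in batches, since the endpoint caps ids per call.
    rows = []
    for start in range(0, len(tracks), batch_size):
        batch = tracks[start:start + batch_size]
        features = sp.audio_features([track["id"] for track in batch])
        for track, feature in zip(batch, features):
            if feature:  # skip tracks for which no features were returned
                rows.append({"track.name": track["name"],
                             "track.id": track["id"], **feature})
    return rows
```

With Spotipy installed and the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables set, a client could be created with `spotipy.Spotify(auth_manager=SpotifyClientCredentials())`, and the returned rows could be wrapped in `pd.DataFrame(rows)` before saving them with `to_csv`.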

We will first demonstrate how we prepared the data for a single user, Kelly. After that, we will import the other group members' data, which has already been prepared.


### Preparing Data for a Single User

First, we imported Kelly's audio features for her "liked" songs and added the column "Kelly Preference", filling each row with "likes". Next, we imported Kelly's audio features for her "disliked" songs and added the same column, filling each row with "DISLIKES". The class names deliberately differ in case so that the output is easier to read ("likes" and "dislikes" look too similar). We then combined the two dataframes into one.


In [None]:
path = "drive/My Drive/Colab Notebooks/playlist_song_features.csv"
df_kelly_likes = pd.read_csv(path)
df_kelly_likes = df_kelly_likes.drop(columns=['Unnamed: 0'])
df_kelly_likes["Kelly Preference"] = "likes"

#this takes the first 400 of my likes, so that i have around the same number 
#of liked and disliked songs
df_kelly_part = df_kelly_likes[0:400]

In [None]:
path = "drive/My Drive/Colab Notebooks/disliked_features.csv"
df_kelly_disliked = pd.read_csv(path)
df_kelly_disliked = df_kelly_disliked.drop(columns=['Unnamed: 0'])
df_kelly_disliked["Kelly Preference"] = "DISLIKES"

In [None]:
#adding my liked and disliked songs into one dataframe
#(pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
df_kelly = pd.concat([df_kelly_part, df_kelly_disliked], ignore_index=True)
df_kelly

Unnamed: 0,track.name,track.id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature,Kelly Preference
0,It Goes In Waves,6vNUlpx3Lxy3Ilr61kFkC8,0.795,0.645,2,-7.589,0,0.0871,0.239000,0.043600,0.3400,0.353,129.971,audio_features,6vNUlpx3Lxy3Ilr61kFkC8,spotify:track:6vNUlpx3Lxy3Ilr61kFkC8,https://api.spotify.com/v1/tracks/6vNUlpx3Lxy3...,https://api.spotify.com/v1/audio-analysis/6vNU...,214154,4,likes
1,Sunburn - Reimagined,0i27kJRbxmdzQzhVDJVgzO,0.828,0.690,8,-4.723,1,0.0338,0.011600,0.261000,0.1140,0.495,105.996,audio_features,0i27kJRbxmdzQzhVDJVgzO,spotify:track:0i27kJRbxmdzQzhVDJVgzO,https://api.spotify.com/v1/tracks/0i27kJRbxmdz...,https://api.spotify.com/v1/audio-analysis/0i27...,247530,4,likes
2,Sweet,3vA6H5yARRohQkpcHKjZN9,0.662,0.766,9,-5.941,0,0.0448,0.014900,0.004010,0.1160,0.638,113.316,audio_features,3vA6H5yARRohQkpcHKjZN9,spotify:track:3vA6H5yARRohQkpcHKjZN9,https://api.spotify.com/v1/tracks/3vA6H5yARRoh...,https://api.spotify.com/v1/audio-analysis/3vA6...,237176,4,likes
3,Figure A (NASAYA Remix),5COquaK9Wx28EMPLydTVPI,0.744,0.450,0,-6.522,1,0.0573,0.285000,0.000013,0.0908,0.651,94.038,audio_features,5COquaK9Wx28EMPLydTVPI,spotify:track:5COquaK9Wx28EMPLydTVPI,https://api.spotify.com/v1/tracks/5COquaK9Wx28...,https://api.spotify.com/v1/audio-analysis/5COq...,210798,4,likes
4,Only One,6ZILYi8SaRLbGIdgej1WIA,0.393,0.542,1,-7.254,1,0.3070,0.291000,0.000000,0.0966,0.579,178.318,audio_features,6ZILYi8SaRLbGIdgej1WIA,spotify:track:6ZILYi8SaRLbGIdgej1WIA,https://api.spotify.com/v1/tracks/6ZILYi8SaRLb...,https://api.spotify.com/v1/audio-analysis/6ZIL...,208120,4,likes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
737,Casey Jones - 2013 Remaster,7LbfuQVct78YoghmoPtsQ8,0.671,0.405,0,-10.052,1,0.0392,0.385000,0.000000,0.1320,0.828,99.678,audio_features,7LbfuQVct78YoghmoPtsQ8,spotify:track:7LbfuQVct78YoghmoPtsQ8,https://api.spotify.com/v1/tracks/7LbfuQVct78Y...,https://api.spotify.com/v1/audio-analysis/7Lbf...,265067,4,DISLIKES
738,Born in the U.S.A.,0dOg1ySSI7NkpAe89Zo0b9,0.398,0.952,4,-6.042,1,0.0610,0.000373,0.000077,0.1000,0.584,122.093,audio_features,0dOg1ySSI7NkpAe89Zo0b9,spotify:track:0dOg1ySSI7NkpAe89Zo0b9,https://api.spotify.com/v1/tracks/0dOg1ySSI7Nk...,https://api.spotify.com/v1/audio-analysis/0dOg...,278680,4,DISLIKES
739,London Calling - Remastered,5jzma6gCzYtKB1DbEwFZKH,0.651,0.801,0,-7.340,1,0.0513,0.123000,0.000000,0.0825,0.776,133.763,audio_features,5jzma6gCzYtKB1DbEwFZKH,spotify:track:5jzma6gCzYtKB1DbEwFZKH,https://api.spotify.com/v1/tracks/5jzma6gCzYtK...,https://api.spotify.com/v1/audio-analysis/5jzm...,200480,4,DISLIKES
740,Give A Little Bit,6XUHsYE38CEbYunT983O9G,0.531,0.818,2,-5.358,1,0.0452,0.069400,0.009600,0.2630,0.471,90.767,audio_features,6XUHsYE38CEbYunT983O9G,spotify:track:6XUHsYE38CEbYunT983O9G,https://api.spotify.com/v1/tracks/6XUHsYE38CEb...,https://api.spotify.com/v1/audio-analysis/6XUH...,248173,4,DISLIKES
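The labeling and combining steps above can be sketched as one reusable helper. This is a minimal illustration on made-up data, not the notebook's own code: `label_and_combine`, its `max_likes` parameter, and the toy frames are all hypothetical names introduced here.

```python
import pandas as pd

def label_and_combine(df_likes, df_dislikes, column, max_likes=None):
    """Tag each playlist with a class label and stack them into one frame."""
    # Optionally cap the liked songs so both classes have similar sizes.
    likes = df_likes.head(max_likes) if max_likes is not None else df_likes
    likes = likes.assign(**{column: "likes"})
    dislikes = df_dislikes.assign(**{column: "DISLIKES"})
    return pd.concat([likes, dislikes], ignore_index=True)

# Toy stand-ins for the two CSV files (values are made up for illustration).
toy_likes = pd.DataFrame({"track.name": ["a", "b", "c"], "energy": [0.6, 0.7, 0.8]})
toy_dislikes = pd.DataFrame({"track.name": ["x", "y"], "energy": [0.2, 0.3]})

combined = label_and_combine(toy_likes, toy_dislikes, "Kelly Preference", max_likes=2)
print(combined["Kelly Preference"].value_counts())
```

Capping the liked songs keeps the two classes roughly balanced, which makes accuracy easier to interpret later on.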


### Retrieving Each Member's Data

Now that we've demonstrated how the data was prepared for a single member, we will import the prepared data for the rest of the members. 

In [None]:
#getting file paths for each member
path1 = "drive/My Drive/Colab Notebooks/480 Final Project Notebooks/ritu_playlist_songs.csv"
path2 = "drive/My Drive/Colab Notebooks/480 Final Project Notebooks/df_nikita.csv"
path3 = "drive/My Drive/Colab Notebooks/480 Final Project Notebooks/riti_playlist_songs.csv"
path4 = "drive/My Drive/Colab Notebooks/480 Final Project Notebooks/df_kaile.csv"

#reading csv files to dataframes
df_ritu = pd.read_csv(path1)
df_nikita = pd.read_csv(path2)
df_riti = pd.read_csv(path3)
df_kaile = pd.read_csv(path4)

#cleaning up the dataframe columns
df_ritu = df_ritu.drop(columns=['Unnamed: 0'])
df_nikita = df_nikita.drop(columns=['Unnamed: 0'])
df_riti = df_riti.drop(columns=['Unnamed: 0'])
df_kaile = df_kaile.drop(columns=['Unnamed: 0'])

df_kaile

## Get Training Data

For the training data, we will be using the following traits: danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, duration_ms, and time signature. 

Each model will use these traits to determine the preference type for their respective user.

In [None]:
#KELLY

X_train = df_kelly[['danceability', 'energy', 'key', 'loudness', 'mode', 
                         'speechiness', 'acousticness', 'instrumentalness', 
                         'liveness', 'valence', 'tempo', 'duration_ms', 
                         'time_signature']]

y_train = df_kelly['Kelly Preference']


In [None]:
#RITU

X_train2 = df_ritu[['danceability', 'energy', 'key', 'loudness', 'mode', 
                         'speechiness', 'acousticness', 'instrumentalness', 
                         'liveness', 'valence', 'tempo', 'duration_ms', 
                         'time_signature']]

y_train2 = df_ritu['Ritu Preference']

In [None]:
#NIKITA

X_train3 = df_nikita[['danceability', 'energy', 'key', 'loudness', 'mode', 
                         'speechiness', 'acousticness', 'instrumentalness', 
                         'liveness', 'valence', 'tempo', 'duration_ms', 
                         'time_signature']]

y_train3 = df_nikita['Nikita Preference']

In [None]:
#RITI

X_train4 = df_riti[['danceability', 'energy', 'key', 'loudness', 'mode', 
                         'speechiness', 'acousticness', 'instrumentalness', 
                         'liveness', 'valence', 'tempo', 'duration_ms', 
                         'time_signature']]

y_train4 = df_riti['Riti Preference']

In [None]:
#KAILE

X_train5 = df_kaile[['danceability', 'energy', 'key', 'loudness', 'mode', 
                         'speechiness', 'acousticness', 'instrumentalness', 
                         'liveness', 'valence', 'tempo', 'duration_ms', 
                         'time_signature']]

y_train5 = df_kaile['Kaile Preference']
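Since the same thirteen columns are selected for every member, the selection could also be written once. The sketch below assumes each dataframe has a "<Member> Preference" column, as in the cells above; `FEATURE_COLUMNS` and `split_features` are hypothetical names, and the toy frame is made up.

```python
import pandas as pd

FEATURE_COLUMNS = ['danceability', 'energy', 'key', 'loudness', 'mode',
                   'speechiness', 'acousticness', 'instrumentalness',
                   'liveness', 'valence', 'tempo', 'duration_ms',
                   'time_signature']

def split_features(df, member):
    """Return (X, y) for one member, assuming a '<member> Preference' column."""
    return df[FEATURE_COLUMNS], df[f"{member} Preference"]

# Tiny made-up frame with the same columns, just to show the call.
toy = pd.DataFrame([[0.5] * len(FEATURE_COLUMNS)] * 2, columns=FEATURE_COLUMNS)
toy["Kelly Preference"] = ["likes", "DISLIKES"]

X_toy, y_toy = split_features(toy, "Kelly")
print(X_toy.shape, list(y_toy))
```

Keeping the column list in one place means a feature can be added or removed for all five members with a single change.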

# Model

## Model Selection

In order to determine the best classifier and scaler to use, we created a list of classifiers and a list of scalers, and built a model from every possible combination of the two. For every model, we calculated the F1 score using 10-fold cross-validation and used it to compare the classifier/scaler combinations. For simplicity, we used only a single member's data (Kelly's) as training data.

In [None]:
classifiers = [
    KNeighborsClassifier(10),
    SVC(kernel="linear", C=0.025),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    MLPClassifier(alpha=1, max_iter=1000),
    AdaBoostClassifier(),
    GaussianNB()]

scalers = [
           StandardScaler(with_mean = False),
           Normalizer(),
           MaxAbsScaler(),
           RobustScaler(with_centering=False)
]

ct = make_column_transformer((OneHotEncoder(), ['key' ,"mode", "time_signature"]), remainder = 'passthrough')


In [None]:
for classifier in classifiers:

    for scaler in scalers:

        #pipeline: one-hot encode the categorical columns, scale, then classify
        testmodel = make_pipeline(
            ct,
            scaler,
            classifier
        )

        print('Classifier: ' + str(classifier))
        print('Scaler: ' + str(scaler))

        #cross_val_score clones and refits the pipeline on every fold,
        #so no separate fit call is needed here
        is_popular = (y_train == "likes")  #True for liked songs (positive class)
        precision = cross_val_score(testmodel, X_train, is_popular, 
                                    cv=10, scoring="precision").mean()
        recall = cross_val_score(testmodel, X_train, is_popular, 
                                cv=10, scoring="recall").mean()

        #F1 computed from the mean precision and mean recall over the 10 folds
        f1_score = 2 * (precision * recall) / (precision + recall)
        print('F1 Score: ' + str(f1_score))

        print('\n')

Classifier: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=10, p=2,
                     weights='uniform')
Scaler: StandardScaler(copy=True, with_mean=False, with_std=True)
F1 Score: 0.7137317088344713


Classifier: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=10, p=2,
                     weights='uniform')
Scaler: Normalizer(copy=True, norm='l2')
F1 Score: 0.7007382478225342


Classifier: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=10, p=2,
                     weights='uniform')
Scaler: MaxAbsScaler(copy=True)
F1 Score: 0.7188438943584993


Classifier: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=10, p=2,
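As an alternative to reading the printed scores one by one, the comparison could be collected into a table and sorted. The sketch below is an assumption-laden illustration, not the notebook's procedure: it runs on synthetic data from `make_classification` (so the column transformer is omitted), uses only a few of the classifiers and scalers, and relies on `cross_validate` to compute all metrics in one pass. Note that `scoring="f1"` averages the per-fold F1 scores, which can differ slightly from applying the F1 formula to the mean precision and mean recall.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler, StandardScaler

# Synthetic stand-in for one member's features and labels (not real data).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

demo_classifiers = [KNeighborsClassifier(10), GaussianNB(),
                    RandomForestClassifier(n_estimators=25, random_state=0)]
demo_scalers = [StandardScaler(with_mean=False), MaxAbsScaler()]

rows = []
for classifier in demo_classifiers:
    for scaler in demo_scalers:
        pipeline = make_pipeline(scaler, classifier)
        # One cross-validation run yields every metric at once.
        scores = cross_validate(pipeline, X_demo, y_demo, cv=5,
                                scoring=["precision", "recall", "accuracy", "f1"])
        rows.append({
            "classifier": type(classifier).__name__,
            "scaler": type(scaler).__name__,
            "precision": scores["test_precision"].mean(),
            "recall": scores["test_recall"].mean(),
            "accuracy": scores["test_accuracy"].mean(),
            "f1": scores["test_f1"].mean(),
        })

# Best combination first.
results = pd.DataFrame(rows).sort_values("f1", ascending=False, ignore_index=True)
print(results)
```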
        

## Model Design & Training

Here, we created a pipeline for each model. Each pipeline starts with a column transformer that uses OneHotEncoder() to encode the categorical variables key, mode, and time signature. Next comes a scaler, StandardScaler, to scale the data. Lastly, we passed the best-performing classifier, RandomForestClassifier, into the pipeline.

In [None]:
# Kelly's Model


ct = make_column_transformer((OneHotEncoder(), ['key' ,"mode", "time_signature"]), remainder = 'passthrough')

#make the pipeline, pass in the column transformer, a data scaler, and a classifier
model_kelly = make_pipeline(
    ct,
    StandardScaler(with_mean=False),
    RandomForestClassifier() #so far random forest has done the best
)

#fit the model with the training data
model_kelly.fit(X_train, y_train)


In [None]:
#Ritu's Model

model_ritu = make_pipeline(
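    #fresh column transformer: pipelines do not clone their steps, so reusing ct
    #would refit the transformer inside the other members' pipelines too
    make_column_transformer((OneHotEncoder(), ['key' ,"mode", "time_signature"]), remainder = 'passthrough'),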
    StandardScaler(with_mean=False),
    RandomForestClassifier()
)

model_ritu.fit(X_train2, y_train2)

In [None]:
#Nikita's Model

model_nikita = make_pipeline(
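    #fresh column transformer: pipelines do not clone their steps, so reusing ct
    #would refit the transformer inside the other members' pipelines too
    make_column_transformer((OneHotEncoder(), ['key' ,"mode", "time_signature"]), remainder = 'passthrough'),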
    StandardScaler(with_mean=False),
    RandomForestClassifier()
)

model_nikita.fit(X_train3, y_train3)

In [None]:
#Riti's Model

model_riti = make_pipeline(
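    #fresh column transformer: pipelines do not clone their steps, so reusing ct
    #would refit the transformer inside the other members' pipelines too
    make_column_transformer((OneHotEncoder(), ['key' ,"mode", "time_signature"]), remainder = 'passthrough'),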
    StandardScaler(with_mean=False),
    RandomForestClassifier()
)

model_riti.fit(X_train4, y_train4)

In [None]:
#Kaile's Model

model_kaile = make_pipeline(
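    #fresh column transformer: pipelines do not clone their steps, so reusing ct
    #would refit the transformer inside the other members' pipelines too
    make_column_transformer((OneHotEncoder(), ['key' ,"mode", "time_signature"]), remainder = 'passthrough'),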
    StandardScaler(with_mean=False),
    RandomForestClassifier()
)

model_kaile.fit(X_train5, y_train5)
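Because a scikit-learn pipeline keeps the step objects it is given rather than copying them, one option is a small factory function that returns a brand-new pipeline for each member. The sketch below is illustrative only: `build_model` and `CATEGORICAL` are hypothetical names, the rows are made up, and `handle_unknown='ignore'` is an assumption added here so that a category missing from the training data (for example, an unusual time signature in the test set) does not raise an error at prediction time.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

CATEGORICAL = ['key', 'mode', 'time_signature']

def build_model():
    """Return a new, unfitted pipeline with its own column transformer."""
    encoder = make_column_transformer(
        # handle_unknown='ignore' is an assumption added here: categories unseen
        # during training are encoded as all zeros instead of raising an error.
        (OneHotEncoder(handle_unknown='ignore'), CATEGORICAL),
        remainder='passthrough')
    return make_pipeline(encoder, StandardScaler(with_mean=False),
                         RandomForestClassifier(random_state=0))

# Made-up rows: the training data never contains time_signature 5.
train = pd.DataFrame({'key': [0, 2, 0, 2], 'mode': [0, 1, 0, 1],
                      'time_signature': [4, 4, 3, 4],
                      'energy': [0.9, 0.2, 0.8, 0.1]})
labels = ['likes', 'DISLIKES', 'likes', 'DISLIKES']
unseen = pd.DataFrame({'key': [0], 'mode': [0], 'time_signature': [5],
                       'energy': [0.85]})

# Each call creates independent steps, so fitting one model leaves the other untouched.
model_a, model_b = build_model(), build_model()
model_a.fit(train, labels)
print(model_a.predict(unseen))
```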

# Model Evaluation

We fit each user's model with their respective training data and used cross validation with 10 folds. We determined precision, recall, accuracy, and F1 score for each user's model. 

### Kelly's Model

In [None]:
print("Kelly's Model Evaluation:")

is_popular = (y_train == "likes")

precision = cross_val_score(model_kelly, X_train, is_popular, 
                            cv=10, scoring="precision").mean()
recall = cross_val_score(model_kelly, X_train, is_popular, 
                         cv=10, scoring="recall").mean()

accuracy = cross_val_score(model_kelly, X_train, is_popular, 
                         cv=10, scoring="accuracy").mean()

f1_score = 2 * (precision * recall) / (precision + recall)


print("Precision:   " + str(precision))
print("Recall:      " + str(recall))
print("Accuracy:    " + str(accuracy))
print("F1 Score:    " + str(f1_score))

Kelly's Model Evaluation:
Precision:   0.8043057649301095
Recall:      0.8424999999999999
Accuracy:    0.79390990990991
F1 Score:    0.822959964537622


### Ritu's Model

In [None]:
print("Ritu's Model Evaluation:")

is_popular = (y_train2 == "likes")

precision = cross_val_score(model_ritu, X_train2, is_popular, 
                            cv=10, scoring="precision").mean()
recall = cross_val_score(model_ritu, X_train2, is_popular, 
                         cv=10, scoring="recall").mean()

accuracy = cross_val_score(model_ritu, X_train2, is_popular, 
                         cv=10, scoring="accuracy").mean()

f1_score = 2 * (precision * recall) / (precision + recall)

print("Precision:   " + str(precision))
print("Recall:      " + str(recall))
print("Accuracy:    " + str(accuracy))
print("F1 Score:    " + str(f1_score))

Ritu's Model Evaluation:
Precision:   0.8047641670257004
Recall:      0.785
Accuracy:    0.79125
F1 Score:    0.7947592280899132


### Nikita's Model

In [None]:
print("Nikita's Model Evaluation:")

is_popular = (y_train3 == "likes")

precision = cross_val_score(model_nikita, X_train3, is_popular, 
                            cv=10, scoring="precision").mean()
recall = cross_val_score(model_nikita, X_train3, is_popular, 
                         cv=10, scoring="recall").mean()

accuracy = cross_val_score(model_nikita, X_train3, is_popular, 
                         cv=10, scoring="accuracy").mean()

f1_score = 2 * (precision * recall) / (precision + recall)

print("Precision:   " + str(precision))
print("Recall:      " + str(recall))
print("Accuracy:    " + str(accuracy))
print("F1 Score:    " + str(f1_score))

Nikita's Model Evaluation:
Precision:   0.7928691293451019
Recall:      0.7849999999999999
Accuracy:    0.7825
F1 Score:    0.7889149422604324


### Riti's Model

In [None]:
print("Riti's Model Evaluation:")

is_popular = (y_train4 == "likes")

precision = cross_val_score(model_riti, X_train4, is_popular, 
                            cv=10, scoring="precision").mean()
recall = cross_val_score(model_riti, X_train4, is_popular, 
                         cv=10, scoring="recall").mean()

accuracy = cross_val_score(model_riti, X_train4, is_popular, 
                         cv=10, scoring="accuracy").mean()

f1_score = 2 * (precision * recall) / (precision + recall)

print("Precision:   " + str(precision))
print("Recall:      " + str(recall))
print("Accuracy:    " + str(accuracy))
print("F1 Score:    " + str(f1_score))

Riti's Model Evaluation:
Precision:   0.7664225583385735
Recall:      0.7424999999999999
Accuracy:    0.745
F1 Score:    0.7542716442558512


### Kaile's Model

In [None]:
print("Kaile's Model Evaluation:")

is_popular = (y_train5 == "likes")

precision = cross_val_score(model_kaile, X_train5, is_popular, 
                            cv=10, scoring="precision").mean()
recall = cross_val_score(model_kaile, X_train5, is_popular, 
                         cv=10, scoring="recall").mean()

accuracy = cross_val_score(model_kaile, X_train5, is_popular, 
                         cv=10, scoring="accuracy").mean()

f1_score = 2 * (precision * recall) / (precision + recall)

print("Precision:   " + str(precision))
print("Recall:      " + str(recall))
print("Accuracy:    " + str(accuracy))
print("F1 Score:    " + str(f1_score))

Kaile's Model Evaluation:
Precision:   0.8089268465334112
Recall:      0.86
Accuracy:    0.7929166666666667
F1 Score:    0.8336819429368637
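As a quick consistency check, the F1 scores can be recomputed from the mean precision and recall reported above using the same formula as the evaluation cells, F1 = 2PR / (P + R). The numbers below are copied from the outputs; the helper name `f1_from` is introduced here only for illustration.

```python
# (mean precision, mean recall, reported F1) copied from the evaluation outputs.
reported = {
    "Kelly":  (0.8043057649301095, 0.8424999999999999, 0.822959964537622),
    "Ritu":   (0.8047641670257004, 0.785, 0.7947592280899132),
    "Nikita": (0.7928691293451019, 0.7849999999999999, 0.7889149422604324),
    "Riti":   (0.7664225583385735, 0.7424999999999999, 0.7542716442558512),
    "Kaile":  (0.8089268465334112, 0.86, 0.8336819429368637),
}

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * (precision * recall) / (precision + recall)

for member, (precision, recall, f1_reported) in reported.items():
    f1 = f1_from(precision, recall)
    print(f"{member}: recomputed F1 = {f1:.4f}, reported F1 = {f1_reported:.4f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a model with uneven precision and recall gets a lower F1 than its arithmetic average would suggest.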


# Model Validation

## Predicting on Test Data

For our testing data, we have audio features for around 150 songs (that were not used in the training set).

In [None]:
path = "drive/My Drive/Colab Notebooks/test_features.csv"
df_test_tracks = pd.read_csv(path)
df_test_tracks = df_test_tracks.drop(columns=['Unnamed: 0'])
df_test_tracks

In [None]:
#all the traits from the test tracks that we'll test on
X_test = df_test_tracks[['danceability', 'energy', 'key', 'loudness', 'mode', 
                         'speechiness', 'acousticness', 'instrumentalness', 
                         'liveness', 'valence', 'tempo', 'duration_ms', 
                         'time_signature']]

We used each member's model to predict on the test data, and then added each model's results to our test track dataframe as a new column named "<Member> Predicted Preference".

In [None]:
#for every song in the test tracks, each model predicts whether it's more similar to
#that member's dislikes or likes

kelly_results = model_kelly.predict(X_test)
ritu_results = model_ritu.predict(X_test)
nikita_results = model_nikita.predict(X_test)
riti_results = model_riti.predict(X_test)
kaile_results = model_kaile.predict(X_test)

In [None]:
df_test_tracks["Kelly Predicted Preference"] = kelly_results
df_test_tracks["Ritu Predicted Preference"] = ritu_results
df_test_tracks["Nikita Predicted Preference"] = nikita_results
df_test_tracks["Riti Predicted Preference"] = riti_results
df_test_tracks["Kaile Predicted Preference"] = kaile_results
df_test_tracks

In [None]:
df_predictions = df_test_tracks[["track.name", "Kelly Predicted Preference", "Ritu Predicted Preference","Nikita Predicted Preference","Riti Predicted Preference","Kaile Predicted Preference"]]
df_predictions

Unnamed: 0,track.name,Kelly Predicted Preference,Ritu Predicted Preference,Nikita Predicted Preference,Riti Predicted Preference,Kaile Predicted Preference
0,Beer Can’t Fix,DISLIKES,likes,DISLIKES,DISLIKES,DISLIKES
1,Cheatin’ Songs,DISLIKES,likes,DISLIKES,DISLIKES,likes
2,Break Things,DISLIKES,DISLIKES,DISLIKES,DISLIKES,DISLIKES
3,She's Mine,DISLIKES,DISLIKES,DISLIKES,DISLIKES,DISLIKES
4,I Hope You’re Happy Now,likes,DISLIKES,DISLIKES,DISLIKES,DISLIKES
...,...,...,...,...,...,...
148,Like Mike,DISLIKES,likes,DISLIKES,likes,likes
149,Scares,likes,DISLIKES,likes,DISLIKES,DISLIKES
150,I'm Bad,likes,likes,DISLIKES,likes,likes
151,Believe I'm Leaving,likes,DISLIKES,DISLIKES,likes,likes
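The per-member prediction columns could also be added in a loop over a dictionary of models, which makes it easy to summarize how many members are predicted to like each song. The sketch below is a hypothetical illustration: `add_predictions` is a name introduced here, and `DummyClassifier` stands in for the fitted pipelines so the example runs on made-up data.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier

def add_predictions(df_tracks, feature_columns, models):
    """Append one '<member> Predicted Preference' column per fitted model."""
    result = df_tracks.copy()
    for member, model in models.items():
        result[f"{member} Predicted Preference"] = model.predict(result[feature_columns])
    return result

# Made-up tracks and constant-output stand-ins for the fitted pipelines.
toy_tracks = pd.DataFrame({"track.name": ["s1", "s2"], "energy": [0.4, 0.9]})
toy_models = {
    "Kelly": DummyClassifier(strategy="constant", constant="likes"),
    "Ritu": DummyClassifier(strategy="constant", constant="DISLIKES"),
}
for model in toy_models.values():
    model.fit(toy_tracks[["energy"]], ["likes", "DISLIKES"])

toy_predictions = add_predictions(toy_tracks, ["energy"], toy_models)

# Count how many members are predicted to like each song.
like_counts = (toy_predictions.filter(like="Predicted Preference") == "likes").sum(axis=1)
print(toy_predictions.assign(predicted_likes=like_counts))
```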


In [None]:
df_predictions.to_csv("drive/My Drive/Colab Notebooks/480 Final Project Notebooks/predictions.csv")

# Conclusion


Overall, the problem we set out to solve is classifying songs based on a user's liked and disliked songs on Spotify. To implement song classification, we used a binary classifier, which labels each data point with one of two tags. In our project, the two tags represent liking or disliking a song. We explored several classifiers, including K-Nearest Neighbors, MLP, and Random Forest. The toolkit that was crucial to our implementation is scikit-learn, a machine learning library for Python. Because scikit-learn provides many algorithms behind a common interface, we were able to train our model with each classifier and compare them to see which was best suited to our project. Our investigation revealed that the Random Forest classifier was the best-performing method for this task. In 10-fold cross-validation, the five members' models reached F1 scores between roughly 0.75 and 0.83 and accuracies between roughly 0.75 and 0.79. Our exploration of scikit-learn and binary classification proved very helpful for the overall project: we were successful in implementing these tools, and our models predict liked and disliked songs reasonably well.