By: @kggold4 @TalSomech

# Spotify dataset classification

Dataset link

Mark: 96
The main project is in the `spotify_classification.ipynb` notebook; see also `utils.py` for our utility functions.
The goal of this project is to train supervised machine learning models that classify the popularity of a Spotify song into three classes:
- high popular
- medium popular
- non popular
Features:
- acousticness (Ranges from 0 to 1)
- artists (List of artists mentioned)
- danceability (Ranges from 0 to 1)
- duration_ms (Integer typically ranging from 200k to 300k)
- energy (Ranges from 0 to 1)
- explicit (0 = No explicit content, 1 = Explicit content) - Categorical.
- id (Id of track generated by Spotify) - Categorical.
- id_artists.
- instrumentalness (Ranges from 0 to 1).
- key (All keys on octave encoded as values ranging from 0 to 11, starting on C as 0, C# as 1 and so on…).
- liveness (Ranges from 0 to 1).
- loudness (Float typically ranging from -60 to 0).
- mode (0 = Minor, 1 = Major).
- name (Name of the song).
- popularity (Ranges from 0 to 100).
- release_date (Date of release mostly in yyyy-mm-dd format, however precision of date may vary).
- speechiness (Ranges from 0 to 1).
- tempo (Float typically ranging from 50 to 150).
- time_signature.
- valence (Ranges from 0 to 1).
NOTE: when preparing the data we map the popularity score to classes in the following format:
class | real value | class value |
---|---|---|
high popular | 70 <= x | 2 |
medium popular | 40 <= x < 70 | 1 |
non popular | x < 40 | 0 |
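The mapping above can be sketched with pandas; the small frame below is a hypothetical stand-in for the real dataset (only the `popularity` column from the feature list is assumed):

```python
import pandas as pd

# Hypothetical mini-frame standing in for the Spotify dataset;
# the column name "popularity" matches the feature list above.
df = pd.DataFrame({"popularity": [85, 55, 12, 70, 39]})

def popularity_class(p):
    """Map a 0-100 popularity score to the project's three classes."""
    if p >= 70:
        return 2  # high popular
    elif p >= 40:
        return 1  # medium popular
    return 0      # non popular

df["popularity_class"] = df["popularity"].apply(popularity_class)
print(df["popularity_class"].tolist())  # → [2, 1, 0, 2, 0]
```

Note that the boundary value 70 falls into the high-popular class and 40 into the medium-popular class, matching the `<=` signs in the table.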
The number of tracks per popularity class in the dataset is unbalanced.
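A quick way to inspect the class distribution is `value_counts`; the labels below are a hypothetical sample, while in the real notebook they come from mapping the full dataset's `popularity` column:

```python
import pandas as pd

# Hypothetical class labels after the popularity mapping.
labels = pd.Series([0, 0, 0, 0, 1, 1, 2])

counts = labels.value_counts().sort_index()
print(counts.to_dict())  # → {0: 4, 1: 2, 2: 1}
```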
Model accuracy when training on the unbalanced data:

model | accuracy |
---|---|
KNeighbors Classifier | 74.20 % |
Logistic Regression | 72.32 % |
XGB Classifier | 77.74 % |
MLP Classifier | 70.82 % |
Model accuracy when training on the balanced data:

model | accuracy |
---|---|
KNeighbors Classifier | 59.35 % |
Logistic Regression | 60.06 % |
XGB Classifier | 65.41 % |
MLP Classifier | 64.16 % |
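Each model was scored on held-out data; a minimal sketch of that evaluation loop is below, using synthetic 3-class data in place of the real audio features and one of the listed models (KNeighborsClassifier), so the printed accuracy is not comparable to the table:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the audio features: 3 classes, 8 numeric features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Hold-out evaluation: fit on the training split, score on the test split.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
acc = accuracy_score(y_test, knn.predict(X_test))
print(f"accuracy: {acc:.2%}")
```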
It is very difficult to predict the popularity of Spotify tracks with the data in our dataset. Even after cleaning and normalizing the data, and training our models on both balanced and unbalanced training sets, the accuracy of our models remains moderate.
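One simple way to build a balanced training set is random undersampling, shown below on a hypothetical unbalanced frame; this is a sketch of the idea, not necessarily the exact balancing method used in the notebook:

```python
import pandas as pd

# Hypothetical unbalanced frame: 6 non-popular, 3 medium, 2 high.
df = pd.DataFrame({"popularity_class": [0] * 6 + [1] * 3 + [2] * 2,
                   "energy": range(11)})

# Undersample every class down to the size of the smallest class.
n_min = df["popularity_class"].value_counts().min()
balanced = df.groupby("popularity_class").sample(n=n_min, random_state=0)

print(balanced["popularity_class"].value_counts().sort_index().to_dict())
# → {0: 2, 1: 2, 2: 2}
```

The trade-off is visible in the tables above: balancing removes the majority-class shortcut, so accuracy drops but the score is a more honest measure across all three classes.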