# Classification of songs based on mood
This notebook classifies songs by mood, using most of the audio features Spotify provides for each song.
The data used to train the model come from [this repository](https://github.com/cristobalvch/Spotify-Machine-Learning/blob/master/data/data_moods.csv).
The songs to classify come from the warehouse we created and were exported to an Excel file called songs_warehouse.xlsx.

## Import libraries

In [1]:
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

## Load the data

In [2]:
# Load the data
df = pd.read_csv('data/data_moods.csv')
df.head(100)

|  | name | album | artist | id | release_date | popularity | length | danceability | acousticness | energy | instrumentalness | liveness | valence | loudness | speechiness | tempo | key | time_signature | mood |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1999 | 1999 | Prince | 2H7PHVdQ3mXqEHXcvclTB0 | 1982-10-27 | 68 | 379266 | 0.866 | 0.13700 | 0.730 | 0.000000 | 0.0843 | 0.6250 | -8.201 | 0.0767 | 118.523 | 5 | 4 | Happy |
| 1 | 23 | 23 | Blonde Redhead | 4HIwL9ii9CcXpTOTzMq0MP | 2007-04-16 | 43 | 318800 | 0.381 | 0.01890 | 0.832 | 0.196000 | 0.1530 | 0.1660 | -5.069 | 0.0492 | 120.255 | 8 | 4 | Sad |
| 2 | 9 Crimes | 9 | Damien Rice | 5GZEeowhvSieFDiR8fQ2im | 2006-11-06 | 60 | 217946 | 0.346 | 0.91300 | 0.139 | 0.000077 | 0.0934 | 0.1160 | -15.326 | 0.0321 | 136.168 | 0 | 4 | Sad |
| 3 | 99 Luftballons | 99 Luftballons | Nena | 6HA97v4wEGQ5TUClRM0XLc | 1984-08-21 | 2 | 233000 | 0.466 | 0.08900 | 0.438 | 0.000006 | 0.1130 | 0.5870 | -12.858 | 0.0608 | 193.100 | 4 | 4 | Happy |
| 4 | A Boy Brushed Red Living In Black And White | They're Only Chasing Safety | Underoath | 47IWLfIKOKhFnz1FUEUIkE | 2004-01-01 | 60 | 268000 | 0.419 | 0.00171 | 0.932 | 0.000000 | 0.1370 | 0.4450 | -3.604 | 0.1060 | 169.881 | 1 | 4 | Energetic |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | Clear Skies | Clear Skies | Dhyana Thomas | 7bKMZYXVYzGMvnHiRDCmjy | 2019-04-26 | 51 | 162085 | 0.421 | 0.94000 | 0.136 | 0.890000 | 0.1130 | 0.0505 | -23.521 | 0.0305 | 110.036 | 11 | 4 | Calm |
| 96 | Click Click Boom | Every Six Seconds | Saliva | 1LMVGL3030W3mGmRrd2hCm | 2001-01-01 | 0 | 252400 | 0.607 | 0.00259 | 0.935 | 0.000221 | 0.1950 | 0.5100 | -2.918 | 0.0506 | 95.970 | 6 | 4 | Energetic |
| 97 | Coda | Endeavour | Jakob Ahlbom | 2TFQPFUqRtgj1auq9b5PlR | 2019-10-04 | 58 | 180614 | 0.351 | 0.93000 | 0.240 | 0.904000 | 0.1010 | 0.0968 | -15.363 | 0.0340 | 79.919 | 9 | 3 | Calm |
| 98 | Cold Arms | Wilder Mind | Mumford & Sons | 7kpZ9isu48poYKpaAb0wiR | 2015-05-01 | 0 | 169906 | 0.431 | 0.81500 | 0.144 | 0.000001 | 0.1240 | 0.0491 | -12.070 | 0.0390 | 137.683 | 11 | 4 | Sad |


In [3]:
# read the data from songs_warehouse.xlsx
df2 = pd.read_excel('data/songs_warehouse.xlsx', sheet_name='song_dimension')
artistID = pd.read_excel('data/songs_warehouse.xlsx', sheet_name='artist_of_song')

## Drop unnecessary columns

In [4]:
# drop columns label_key, label_mode, label_time_signature, label_duration_ms
df2 = df2.drop(['label_key', 'label_mode', 'label_time_signature', 'label_duration_ms'], axis=1)
df2.head()

|  | id_song | label_spotify_id | label_name | label_is_explicit | label_album_name | label_album_release_date | label_danceability | label_energy | label_loudness | label_speechiness | label_acousticness | label_instrumentalness | label_liveness | label_valence | label_tempo |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 42 | 03Dpt8Z4Zww4NGJb8503zb | Do They Know It's Christmas? - 2014 | False | Pop Christmas Songs | 2018-11-09 | 0.626 | 0.541 | -7.615 | 0.0308 | 0.352 | 0.0 | 0.119 | 0.255 | 112.000999 |
| 1 | 128 | 0a0zPUrwviAua4IhhaYUsP | Ajándék | False | Duett Karácsony | 2009-01-01 | 0.668 | 0.864 | -4.404 | 0.0343 | 0.0473 | 0.0 | 0.105 | 0.593 | 102.014 |
| 2 | 182 | 0cVyQfDyRnMJ0V3rjjdlU3 | Lil Boo Thang | False | Lil Boo Thang | 2023-08-18 | 0.85 | 0.699 | -3.292 | 0.0776 | 0.152 | 0.0 | 0.32 | 0.915 | 114.481003 |
| 3 | 270 | 0gq4UgDPGFdqpsWshU7dmv | Vanavond (Uit M'n Bol) | False | Vanavond (Uit M'n Bol) | 2022-03-25 | 0.799 | 0.705 | -7.582 | 0.0698 | 0.0554 | 0.0 | 0.426 | 0.88 | 106.978996 |
| 4 | 289 | 0hI4TphLTs4ar0mQ8t0dLf | Мой счастливый билет | False | Мой счастливый билет | 2023-02-24 | 0.783 | 0.524 | -6.772 | 0.13 | 0.633 | 0.0 | 0.189 | 0.47 | 110.035004 |


In [5]:
# rename the columns to match the ones in df
df2 = df2.rename(columns={'label_danceability': 'danceability', 'label_energy': 'energy', 'label_speechiness': 'speechiness', 'label_acousticness': 'acousticness', 'label_instrumentalness': 'instrumentalness', 'label_liveness': 'liveness', 'label_valence': 'valence', 'label_tempo': 'tempo', 'label_loudness': 'loudness'})

# rearrange the columns
df2 = df2[['id_song', 'label_spotify_id', 'label_name', 'label_is_explicit', 'label_album_name','label_album_release_date', 'danceability', 'acousticness', 'energy', 'instrumentalness','liveness', 'valence', 'loudness', 'speechiness','tempo']]

df2.head()

|  | id_song | label_spotify_id | label_name | label_is_explicit | label_album_name | label_album_release_date | danceability | acousticness | energy | instrumentalness | liveness | valence | loudness | speechiness | tempo |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 42 | 03Dpt8Z4Zww4NGJb8503zb | Do They Know It's Christmas? - 2014 | False | Pop Christmas Songs | 2018-11-09 | 0.626 | 0.352 | 0.541 | 0.0 | 0.119 | 0.255 | -7.615 | 0.0308 | 112.000999 |
| 1 | 128 | 0a0zPUrwviAua4IhhaYUsP | Ajándék | False | Duett Karácsony | 2009-01-01 | 0.668 | 0.0473 | 0.864 | 0.0 | 0.105 | 0.593 | -4.404 | 0.0343 | 102.014 |
| 2 | 182 | 0cVyQfDyRnMJ0V3rjjdlU3 | Lil Boo Thang | False | Lil Boo Thang | 2023-08-18 | 0.85 | 0.152 | 0.699 | 0.0 | 0.32 | 0.915 | -3.292 | 0.0776 | 114.481003 |
| 3 | 270 | 0gq4UgDPGFdqpsWshU7dmv | Vanavond (Uit M'n Bol) | False | Vanavond (Uit M'n Bol) | 2022-03-25 | 0.799 | 0.0554 | 0.705 | 0.0 | 0.426 | 0.88 | -7.582 | 0.0698 | 106.978996 |
| 4 | 289 | 0hI4TphLTs4ar0mQ8t0dLf | Мой счастливый билет | False | Мой счастливый билет | 2023-02-24 | 0.783 | 0.633 | 0.524 | 0.0 | 0.189 | 0.47 | -6.772 | 0.13 | 110.035004 |
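The rename above lists every column by hand. A minimal sketch of deriving the same mapping by stripping the `label_` prefix, assuming the warehouse keeps that naming pattern; the tiny DataFrame below is a hypothetical stand-in for the song_dimension sheet, and only a subset of the features is shown:

```python
import pandas as pd

# Hypothetical miniature of the song_dimension sheet: only the column names matter here.
df2 = pd.DataFrame(
    [[42, "03Dpt8Z4Zww4NGJb8503zb", 0.626, 0.541, 112.0]],
    columns=["id_song", "label_spotify_id", "label_danceability", "label_energy", "label_tempo"],
)

# Audio features expected by the model (subset shown for brevity).
features = ["danceability", "energy", "tempo"]

# Strip the "label_" prefix, but only for columns whose stripped name is a model feature,
# so identifiers such as label_spotify_id keep their original names.
prefix = "label_"
mapping = {
    col: col[len(prefix):]
    for col in df2.columns
    if col.startswith(prefix) and col[len(prefix):] in features
}
df2 = df2.rename(columns=mapping)
print(list(df2.columns))
```

Restricting the mapping to known feature names keeps metadata columns such as label_name untouched, which matches what the hand-written rename does.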


## Data selection

In [6]:
# choose the columns to use as features (positions 7 to -3: danceability through tempo)
col_features = df.columns[7:-3]
print(col_features)

Index(['danceability', 'acousticness', 'energy', 'instrumentalness',
       'liveness', 'valence', 'loudness', 'speechiness', 'tempo'],
      dtype='object')
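Selecting features by position depends on the column order of data_moods.csv. A sketch of the equivalent, order-independent selection by name, using an empty DataFrame that only mimics the column order of the loaded data (`FEATURES` is a name introduced here for illustration):

```python
import pandas as pd

# Column order of data_moods.csv, as displayed by df.head().
columns = ["name", "album", "artist", "id", "release_date", "popularity", "length",
           "danceability", "acousticness", "energy", "instrumentalness", "liveness",
           "valence", "loudness", "speechiness", "tempo", "key", "time_signature", "mood"]
df = pd.DataFrame(columns=columns)

# Explicit feature list: keeps working if columns are added or reordered.
FEATURES = ["danceability", "acousticness", "energy", "instrumentalness",
            "liveness", "valence", "loudness", "speechiness", "tempo"]

# The positional slice used in the notebook selects exactly the same nine columns.
assert list(df.columns[7:-3]) == FEATURES
col_features = FEATURES
```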


## Data preprocessing

In [7]:
# show the number of songs for each mood
df['mood'].value_counts()

Sad          197
Calm         195
Energetic    154
Happy        140
Name: mood, dtype: int64
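The classes are moderately imbalanced. One way to put the later accuracy in context is the majority-class baseline, i.e. the accuracy of always predicting the most frequent mood. The sketch below computes it from the counts printed above; note that it uses the whole dataset, not the test split:

```python
import pandas as pd

# Class counts reported by df['mood'].value_counts().
counts = pd.Series({"Sad": 197, "Calm": 195, "Energetic": 154, "Happy": 140})

# Share of each mood in the whole dataset.
proportions = counts / counts.sum()
print(proportions.round(3))

# Accuracy of a trivial model that always predicts the most frequent mood.
baseline = proportions.max()
print(f"Majority-class baseline: {baseline:.1%}")  # Majority-class baseline: 28.7%
```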

## Encode the mood column

In [8]:
# encode the mood column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['mood'] = le.fit_transform(df['mood'])
# show the encoded values corresponding to each mood
le.classes_

array(['Calm', 'Energetic', 'Happy', 'Sad'], dtype=object)
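LabelEncoder assigns integer codes in sorted (alphabetical) order of the class names, which is why Calm is 0 and Sad is 3 in the report further down. A small self-contained check on a handful of made-up labels:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# The encoder sorts the distinct labels, so codes follow alphabetical order.
codes = le.fit_transform(["Sad", "Happy", "Calm", "Energetic", "Happy"])
print(le.classes_.tolist())  # ['Calm', 'Energetic', 'Happy', 'Sad']
print(codes.tolist())        # [3, 2, 0, 1, 2]

# inverse_transform maps integer codes back to mood names.
print(le.inverse_transform([2, 3]).tolist())  # ['Happy', 'Sad']
```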

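## Standardize the data and split it into train and test sets

In [9]:
# standardize the features (zero mean, unit variance)
scaler = StandardScaler()
df[col_features] = scaler.fit_transform(df[col_features])
# apply the same scaling, learned from the training songs, to the warehouse songs
df2[col_features] = scaler.transform(df2[col_features])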

# split the data into train and test
X_train, X_test, y_train, y_test = train_test_split(df[col_features], df['mood'], test_size=0.2, random_state=0)
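StandardScaler maps each feature to z = (x - mean) / std, with the mean and the standard deviation (ddof=0) learned during `fit`. Calling `transform` on new data reuses those statistics, whereas `fit_transform` would re-estimate them from the new data. A sketch on toy arrays with hypothetical values that verifies the formula by hand:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrices (hypothetical): rows are songs, columns are audio features.
X_train = np.array([[0.2, 100.0], [0.4, 120.0], [0.6, 140.0]])
X_new = np.array([[0.5, 130.0]])

scaler = StandardScaler().fit(X_train)  # learns per-column mean and std from the training rows
z_new = scaler.transform(X_new)         # applies those same statistics to the new rows

# Same result by hand: z = (x - mean_train) / std_train, with the population std (ddof=0).
manual = (X_new - X_train.mean(axis=0)) / X_train.std(axis=0)
print(z_new)
print(manual)
```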

## Create and train the model

In [10]:
# support vector classifier (RBF kernel by default) with hand-picked hyperparameters
clf = svm.SVC(gamma=0.001, C=1000)
# fit the model
clf.fit(X_train, y_train)
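The hyperparameters gamma=0.001 and C=1000 are fixed by hand. One common alternative, sketched here on synthetic data from `make_classification` rather than on the real songs, is to put the scaler and the SVC in a Pipeline and let GridSearchCV choose gamma and C by cross-validation; because the scaler is refitted inside each fold, no validation or test data leaks into the scaling. The grid values are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the mood data: 9 features, 4 classes (not the real songs).
X, y = make_classification(n_samples=400, n_features=9, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# The scaler is part of the pipeline, so each CV fold is scaled with its own training part.
pipe = make_pipeline(StandardScaler(), SVC())

# Candidate values around the notebook's choice (gamma=0.001, C=1000).
param_grid = {"svc__gamma": [0.001, 0.01, 0.1], "svc__C": [1, 10, 100, 1000]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Held-out accuracy: {search.score(X_test, y_test):.3f}")
```

Passing `stratify=y` to `train_test_split` also keeps the class proportions of the split close to those of the full dataset, which matters when some moods are rarer than others.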

## Find the accuracy of the model

In [11]:
# find the accuracy
accuracy = clf.score(X_test, y_test)
print('The model has an accuracy of:', round(accuracy*100, 2), '%')

The model has an accuracy of: 84.06 %


In [12]:
# show the per-class classification report (precision, recall, F1-score)
from sklearn.metrics import classification_report
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.97      1.00      0.99        36
           1       0.72      0.85      0.78        27
           2       0.73      0.59      0.66        32
           3       0.88      0.88      0.88        43

    accuracy                           0.84       138
   macro avg       0.83      0.83      0.83       138
weighted avg       0.84      0.84      0.84       138
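The headline figures can be checked by hand. The per-class counts below are not printed by the notebook; they are inferred from the rounded recall and precision values together with the support column (for example, a recall of 0.59 on 32 Happy songs only fits 19/32, and a precision of 0.73 only fits 19/26), so treat them as a reconstruction:

```python
# Correctly classified songs per class, inferred from recall x support in the report.
correct_per_class = {"Calm (0)": 36, "Energetic (1)": 23, "Happy (2)": 19, "Sad (3)": 38}
support_total = 138

# Accuracy = correctly classified songs / all test songs.
accuracy = sum(correct_per_class.values()) / support_total
print(round(accuracy * 100, 2))  # 84.06

# F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (songs predicted Happy + songs actually Happy).
tp_happy = 19
predicted_happy = 26  # inferred from precision 0.73: 19/26 is about 0.731
actual_happy = 32     # support of class 2
f1_happy = 2 * tp_happy / (predicted_happy + actual_happy)
print(round(f1_happy, 2))  # 0.66
```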


The model reaches an accuracy of about 84% on the test set. The weakest class is Happy (label 2), with an F1-score of 0.66 and a recall of only 0.59; Happy is also the least represented mood in the data (140 songs), which may partly explain the lower score.
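If the imbalance is part of the cause, one option to try is `SVC(class_weight='balanced')`, which multiplies C for each class by a weight inversely proportional to that class's frequency. The sketch below only computes those weights from the class counts reported by `value_counts()`; whether they improve the Happy F1-score on this data has not been tested:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Rebuild a label vector with the class counts reported by value_counts().
y = np.array(["Sad"] * 197 + ["Calm"] * 195 + ["Energetic"] * 154 + ["Happy"] * 140)
classes = np.unique(y)  # sorted: Calm, Energetic, Happy, Sad

# 'balanced' weight for class c = n_samples / (n_classes * count_c).
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
for mood, w in zip(classes, weights):
    print(f"{mood:<10} {w:.3f}")
```

In the notebook this would amount to `svm.SVC(gamma=0.001, C=1000, class_weight='balanced')`, which makes mistakes on the rarer Happy class cost the most during training.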

## Predict the mood of the songs in our warehouse

In [13]:
# create a new column mood in df2 with the predicted values
df2['mood'] = clf.predict(df2[col_features])
# decode the encoded values
df2['mood'] = le.inverse_transform(df2['mood'])

In [14]:
# join artistID and df2 on id_song
df2 = df2.join(artistID.set_index('song_id'), on='id_song')
# songs with several artists now appear once per artist; keep only the first row of each song
df2 = df2.drop_duplicates(subset='id_song', keep='first')
# show only the columns label_name, label_album_name, mood
df2 = df2[['label_name', 'label_album_name', 'mood']]
df2

|  | label_name | label_album_name | mood |
| --- | --- | --- | --- |
| 0 | Do They Know It's Christmas? - 2014 | Pop Christmas Songs | Sad |
| 1 | Ajándék | Duett Karácsony | Energetic |
| 2 | Lil Boo Thang | Lil Boo Thang | Happy |
| 3 | Vanavond (Uit M'n Bol) | Vanavond (Uit M'n Bol) | Happy |
| 4 | Мой счастливый билет | Мой счастливый билет | Sad |
| ... | ... | ... | ... |
| 5493 | Melancolia | X Amor | Sad |
| 5494 | Desi Kalakaar | Desi Kalakaar | Happy |
| 5495 | Шукав тебе. Знайшов тебе | Шукав тебе. Знайшов тебе | Calm |
| 5496 | Thath'Indawo (Live) | Spirit of Praise, Vol. 8 (Live) | Sad |
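The join with artist_of_song is one-to-many: a song credited to several artists appears once per artist, which is what the `drop_duplicates` call undoes. A toy reproduction, with hypothetical song and artist values and an assumed `artist_id` column in the artist sheet:

```python
import pandas as pd

# Hypothetical miniature of the two sheets; artist_id is an assumed column name.
songs = pd.DataFrame({"id_song": [42, 128], "label_name": ["Song A", "Song B"]})
artist_of_song = pd.DataFrame({"song_id": [42, 42, 128], "artist_id": [7, 9, 3]})

# A left join on id_song repeats a song once per matching artist.
joined = songs.join(artist_of_song.set_index("song_id"), on="id_song")
print(len(joined))  # 3

# drop_duplicates keeps only the first artist row of each song.
deduped = joined.drop_duplicates(subset="id_song", keep="first")
print(deduped[["id_song", "artist_id"]])
```

Since the final selection keeps only label_name, label_album_name and mood, and assuming id_song is unique in the song sheet, the result has one row per song, just as before the join.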
