# Multiclass Classification to Predict the Genre of a Song
The classification works as follows:

1. Spotify registers each song with its artists, but it does not provide a genre for the song itself; genres are only available per artist. This multiclass classification therefore uses the artists' genres as the **target value** for predicting a song's genre.
2. If a song's artists (one or more) are registered with more than one genre, the song is duplicated into several rows, one per genre, each with a different target value. The intent is to give the model a broader picture of what each genre covers.
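Steps 1 and 2 can be sketched on a made-up three-song table (the song IDs, feature values and genres below are invented for illustration). The sketch assumes, as the output further down suggests, that the CSV stores each genre list as its string representation. It therefore parses each list with `ast.literal_eval` before `DataFrame.explode` gives every (song, genre) pair its own row.

```python
import ast

import pandas as pd

# Hypothetical toy data: the genre lists arrive as strings, as they would from a CSV
songs = pd.DataFrame({
    "songID": ["a", "b", "c"],
    "danceability": [0.8, 0.5, 0.3],
    "SongArtistTitle": ["['pop', 'r&b']", "['j-rock']", "[]"],
})

# Turn each string such as "['pop', 'r&b']" into a real Python list
songs["genre"] = songs["SongArtistTitle"].apply(ast.literal_eval)

# One row per genre: song "a" becomes two rows with identical features,
# while the empty list of song "c" becomes NaN and is dropped
exploded = songs.explode("genre").dropna(subset=["genre"])

print(exploded[["songID", "danceability", "genre"]])
```

The duplicated rows of song `a` share every feature value and differ only in their target.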

## Procedure
In this multiclass classification, we will use several algorithms:
- Random Forest
- Gradient Boosting
- SVM
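One way to compare the three algorithms fairly is to run them through the same cross-validation loop. The sketch below uses synthetic data from `make_classification` as a stand-in for the audio features, so its scores say nothing about this dataset. The `StandardScaler` in front of the SVM is our own assumption: SVMs are sensitive to feature scale, and columns such as `tempo` and `danceability` live on very different ranges.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the audio features: 300 samples, 10 features, 4 classes
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=6, n_classes=4, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    # Scale features before the SVM, since its kernel depends on distances
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 folds; SVC handles multiclass via one-vs-one internally
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```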

In [7]:
# Loading the Data
import pandas as pd
from configparser import ConfigParser

parser = ConfigParser()
_ = parser.read('notebook.cfg')

main_df = pd.read_csv(parser.get('database', 'main_database'))
main_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 897 entries, 0 to 896
Data columns (total 22 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   songID            897 non-null    object 
 1   songTitle         897 non-null    object 
 2   SongArtistTitle   897 non-null    object 
 3   popularity        897 non-null    int64  
 4   danceability      897 non-null    float64
 5   energy            897 non-null    float64
 6   key               897 non-null    int64  
 7   loudness          897 non-null    float64
 8   mode              897 non-null    int64  
 9   speechiness       897 non-null    float64
 10  acousticness      897 non-null    float64
 11  instrumentalness  897 non-null    float64
 12  liveness          897 non-null    float64
 13  valence           897 non-null    float64
 14  tempo             897 non-null    float64
 15  type              897 non-null    object 
 16  id                897 non-null    object 
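The loading cell expects a `notebook.cfg` file with a `[database]` section that contains a `main_database` key. The file itself is not shown, so the sketch below uses a hypothetical path together with `ConfigParser.read_string` to demonstrate the same lookup.

```python
from configparser import ConfigParser

# Hypothetical contents of notebook.cfg; the CSV path is a placeholder
example_cfg = """
[database]
main_database = data/songs.csv
"""

parser = ConfigParser()
parser.read_string(example_cfg)

# The same lookup the loading cell performs
print(parser.get("database", "main_database"))
```

`ConfigParser.read` returns the list of files it managed to read and silently skips missing ones. If `notebook.cfg` is missing, the later `parser.get` call therefore fails with `NoSectionError`. Checking the return value first, e.g. `if not parser.read('notebook.cfg'): raise FileNotFoundError('notebook.cfg')`, reports the actual cause.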

In [59]:
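# Features: numeric audio features from column 4 onward
# (object columns such as `type` and `id` cannot be used by the models)
X_values = main_df.iloc[:, 4:].select_dtypes(include='number')

# Target: the genre list, stored as a string in the SongArtistTitle column
y_value = main_df.iloc[:, 2]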
y_value

0          ['reggaeton', 'trap latino', 'urbano latino']
1                                  ['pop', 'r&b', 'rap']
2      ['afrobeats', 'nigerian pop', 'pop', 'post-tee...
3                                                ['pop']
4      ['pop', 'r&b', 'conscious hip hop', 'hip hop',...
                             ...                        
892                     ['anime', 'anime rock', 'j-pop']
893                                                   []
894    ['j-acoustic', 'j-pop', 'japanese singer-songw...
895              ['j-rock', 'japanese alternative rock']
896                        ['anime', 'j-pop girl group']
Name: SongArtistTitle, Length: 897, dtype: object

## Genre Divisions
As the output of the previous cell shows, a row may have **no genre** (not ideal), **one genre** (ideal), or more than one genre. In this step, each song with more than one genre is duplicated into multiple rows, each carrying a single genre.
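Before duplicating anything, it can help to count how many songs fall into each of the three cases. This is a minimal sketch on invented genre strings, assuming the same string-encoded lists that appear in the output above.

```python
import ast

import pandas as pd

# Hypothetical genre column, stored as strings like the CSV column above
genres = pd.Series(["['pop']", "[]", "['anime', 'j-pop']", "['j-rock']"])

# Number of genres per song, after parsing each string into a list
n_genres = genres.apply(ast.literal_eval).apply(len)

# Bucket every song into one of the three cases (2 stands for "2 or more")
labels = {0: "no genre", 1: "one genre", 2: "multiple genres"}
cases = n_genres.clip(upper=2).map(labels)

print(cases.value_counts())
```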

In [58]:
import ast

# Parse the first row's genre string, e.g. "['reggaeton', 'trap latino', 'urbano latino']",
# back into a Python list: ['reggaeton', 'trap latino', 'urbano latino']
singularValue = ast.literal_eval(main_df.iloc[0, 2])
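Stripping brackets and quotes and then splitting on `', '` works for simple genre names. However, it also deletes apostrophes that belong to a name, and it turns an empty list into `['']`. `ast.literal_eval` avoids both problems. Here is a small comparison on hypothetical values:

```python
import ast

# Hypothetical stored value; Python's repr wraps a name containing ' in double quotes
raw = str(["children's music", "pop"])

# Manual approach: strip brackets and quotes, then split
stripped = raw.replace('[', '').replace(']', '').replace('"', '').replace("'", "")
manual = stripped.split(', ')

# ast.literal_eval parses the string back into the original list
parsed = ast.literal_eval(raw)

print(manual)  # ['childrens music', 'pop']
print(parsed)  # ["children's music", 'pop']

# An empty genre list: manual stripping yields [''] instead of []
print("[]".replace('[', '').replace(']', '').split(', '))  # ['']
print(ast.literal_eval("[]"))                                # []
```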







In [None]:
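# Random Forest Classifier
import ast

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Build the training table: one row per (song, genre) pair.
# Songs with an empty genre list become NaN after explode() and are dropped.
genre_df = main_df.copy()
genre_df['genre'] = genre_df['SongArtistTitle'].apply(ast.literal_eval)
genre_df = genre_df.explode('genre').dropna(subset=['genre'])

# Numeric audio features only; object columns (e.g. `type`, `id`, `genre`) are excluded
X = genre_df.iloc[:, 4:].select_dtypes(include='number')
y = genre_df['genre']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Encode the genre labels (the target) as integers
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)

# Create a Random Forest classifier with 100 trees
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
rf_classifier.fit(X_train, y_train_encoded)

# Predict on the test set and decode the predictions back to genre names
y_pred_encoded = rf_classifier.predict(X_test)
y_pred = label_encoder.inverse_transform(y_pred_encoded)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)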
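Because each song is duplicated once per genre, a plain random split can place copies of the same song (identical features, different labels) in both the training set and the test set. This inflates the test score. One option is to split by song with scikit-learn's `GroupShuffleSplit`, using `songID` as the group key. The sketch below runs on made-up data and is a suggestion, not what the cell above does.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical exploded table: songs "s1" and "s3" appear once per genre
df = pd.DataFrame({
    "songID": ["s1", "s1", "s2", "s3", "s3", "s4", "s5", "s6"],
    "energy": [0.9, 0.9, 0.4, 0.7, 0.7, 0.2, 0.5, 0.6],
    "genre": ["pop", "r&b", "j-rock", "anime", "j-pop", "pop", "j-rock", "anime"],
})

X = df[["energy"]]
y = df["genre"]
groups = df["songID"]

# All rows sharing a songID land on the same side of the split
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

train_songs = set(groups.iloc[train_idx])
test_songs = set(groups.iloc[test_idx])
print("overlap:", train_songs & test_songs)  # overlap: set()
```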
