# Music genre classifier
## Classifiers
This notebook should be seen as the second step in a series of notebooks aimed at building an ML audio classifier.

We will start by looking into traditional ML classifiers, such as SVM, KNN, and Random Forest.
Then, we can move on to more complex models, such as CNNs and RNNs.
Lastly, we will see how our results compare against a pretrained model. 

## Goal
Train a high-performing ML classifier to predict the genre of a song.

## Dataset
The dataset contains 1000 audio tracks, each 30 seconds long. It covers 10 genres, each represented by 100 tracks. The tracks are all 22050 Hz mono 16-bit audio files in .wav format.
In [preprocess.py](preprocess.py), we convert the .wav files to MFCC features, and store them as PyTorch tensors (`mfcc.pt`). Labels and file paths are stored as NumPy arrays. Note that the arrays loaded below hold 999 samples, one fewer than the 1000 tracks in the dataset.

## Source
https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification/ (accessed 2023-10-20)

# Load data


In [23]:
from functools import partial
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, f1_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

import numpy as np

import torch 
import torch.nn as nn

import os 
import plotly.express as ex

In [8]:

mfcc_tensor = torch.load("mfcc.pt")
covariance_tensor =  torch.load("covariance.pt")
file_paths = np.load("file_paths.npy")
labels = np.load("labels.npy")


In [9]:
mfcc_tensor.shape

torch.Size([999, 2986, 13])

In [10]:
covariance_tensor.shape

torch.Size([999, 13, 13])

In [11]:
labels.shape

(999,)

In [12]:
file_paths.shape

(999,)

## Train test split
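To keep the comparison fair, we split the data once, early on, and use the same sets across all classifiers. 
We will use an 80-20 split. Ideally the split is stratified on the labels, so that the distribution of labels is the same in both sets. Note that the cell below does not pass `stratify` to `train_test_split`, so the class counts differ between the sets (see the support column in the classification reports further down).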

Further, we can take 10% of the training set as a validation set, to be used for hyperparameter tuning.
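
A minimal sketch of a stratified version of this split, run on synthetic stand-in labels rather than the real `labels.npy`. The toy arrays and variable names are assumptions for illustration; `stratify` is the `train_test_split` argument that preserves class proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10 genres with 100 tracks each (not the real dataset)
toy_labels = np.repeat(np.arange(10), 100)
toy_features = np.arange(len(toy_labels)).reshape(-1, 1)

# 80-20 split, stratified on the labels
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_features, toy_labels, test_size=0.2, stratify=toy_labels, random_state=42
)

# 10% of the training set as a validation set, stratified again
X_tr, X_va, y_tr, y_va = train_test_split(
    X_tr, y_tr, test_size=0.1, stratify=y_tr, random_state=42
)

# Per-class counts in each set
train_counts = np.bincount(y_tr)
val_counts = np.bincount(y_va)
test_counts = np.bincount(y_te)
print(train_counts, val_counts, test_counts)
```

With a stratified split, every genre contributes the same number of tracks to each set, which makes per-class metrics easier to compare.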

In [16]:
# Reshape the data into a 2D array (num_samples, num_features)
num_samples, num_frames, num_mfcc = mfcc_tensor.shape
mfcc_tensor_2d = np.reshape(mfcc_tensor, (num_samples, num_frames * num_mfcc))

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(mfcc_tensor_2d, labels, test_size=0.2, random_state=42)

# Get validation set
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42)
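
As a sanity check on the sizes involved, the sketch below repeats the two splits on placeholder indices instead of the real feature matrix. The shape is the one reported for `mfcc_tensor` above; the index array is an assumption used only to keep the example light.

```python
import numpy as np
from sklearn.model_selection import train_test_split

num_samples, num_frames, num_mfcc = 999, 2986, 13  # shape of mfcc_tensor
num_features = num_frames * num_mfcc  # length of each flattened feature vector

# Placeholder indices stand in for the real feature matrix
idx = np.arange(num_samples)
train_idx, test_idx = train_test_split(idx, test_size=0.2, random_state=42)
train_idx, val_idx = train_test_split(train_idx, test_size=0.1, random_state=42)

split_sizes = (len(train_idx), len(val_idx), len(test_idx))
print(num_features, split_sizes)
```

Each track becomes a vector of 38818 features, and the 999 tracks end up as 719 training, 80 validation, and 200 test samples. The 200 matches the total support in the classification reports.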

## Traditional ML classifiers
In this section, we will build three simple, traditional ML classifiers: SVM, KNN, and Random Forest. We will use the MFCC features as input, and the genre label as output.


## Random Forest
A random forest classifier is a tree-based model that combines multiple decision trees to make predictions. 
Each tree is trained on a random subset of the data, and the final prediction aggregates the predictions of all the trees (scikit-learn averages the trees' class probabilities and predicts the most probable class). 
This helps to reduce overfitting, and makes the model more robust.

These models often perform well on tabular data, and are relatively easy to train.
We will use the `RandomForestClassifier` from `sklearn.ensemble` to train our model.
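
To make the "combine many trees" idea concrete, here is a small sketch on synthetic data (a stand-in, not the MFCC features). It averages the class probabilities of the individual trees by hand and compares the result with what the forest reports. The dataset parameters and variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic 3-class problem (a stand-in, not the MFCC data)
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, n_classes=3, random_state=0
)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Average the class probabilities predicted by the individual trees
tree_probas = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
mean_proba = tree_probas.mean(axis=0)

# The forest's own probabilities should match this average
matches_forest = np.allclose(mean_proba, forest.predict_proba(X))

# The predicted class is the one with the highest mean probability
manual_pred = forest.classes_[mean_proba.argmax(axis=1)]
same_labels = np.array_equal(manual_pred, forest.predict(X))
print(matches_forest, same_labels)
```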

In [39]:
# Initialize the Random Forest classifier
rf_classifier = RandomForestClassifier(random_state=42)

# Train the classifier
rf_classifier.fit(X_train, y_train)

# Make predictions on the test data
predictions = rf_classifier.predict(X_test)

# Evaluate 
rf_accuracy = accuracy_score(y_test, predictions)
rf_f1_score= f1_score(y_test, predictions, average="weighted")
# sorted() gives a deterministic, alphabetical label order (set order is arbitrary),
# which matches the order scikit-learn uses when `labels` is not passed
rf_confusion_matrix = confusion_matrix(y_test, predictions, labels=sorted(set(labels)))
rf_classification_report = classification_report(y_test, predictions)
print("Accuracy:", rf_accuracy)
print("Classification Report:\n", rf_classification_report)

Accuracy: 0.5
Classification Report:
               precision    recall  f1-score   support

       blues       0.42      0.45      0.43        22
   classical       0.96      0.93      0.95        28
     country       0.46      0.27      0.34        22
       disco       0.31      0.21      0.25        24
      hiphop       0.54      0.35      0.42        20
        jazz       0.42      0.42      0.42        19
       metal       0.41      0.92      0.56        12
         pop       0.58      0.90      0.70        21
      reggae       0.50      0.33      0.40        15
        rock       0.17      0.18      0.17        17

    accuracy                           0.50       200
   macro avg       0.48      0.50      0.47       200
weighted avg       0.50      0.50      0.48       200



In [42]:
def plot_confusion_matrix(cm, classes=sorted(set(labels)), title='Confusion matrix', cmap=ex.colors.sequential.Blues):
    """
    Plot the confusion matrix as a heatmap.
    `classes` must be in the same order as the rows and columns of `cm`.
    """
    fig = ex.imshow(cm, x=classes, y=classes, color_continuous_scale=cmap)
    fig.update_layout(title=title, xaxis_title="Predicted", yaxis_title="Actual")
    fig.show()

plot_confusion_matrix(rf_confusion_matrix, title="Confusion Matrix for Random Forest Classifier")


## SVM 

An SVM is a kernel-based supervised learning model that tries to find a (set of) hyperplane(s) that separates the data into classes.
The hyperplane should maximize the margin between the classes.
Kernels can be linear, polynomial, or radial basis functions (RBF).
A deep-dive on SVMs is beyond the scope of this notebook, but you can read more about them [here](https://queirozf.com/entries/choosing-c-hyperparameter-for-svm-classifiers-examples-with-scikit-learn).

Our implementation will make use of the `SVC` class from `sklearn.svm` to train our model. 
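
The choice of kernel matters when the classes cannot be separated by a straight line. The sketch below compares the three kernels mentioned above on a synthetic two-ring dataset (`make_circles`), which is an assumption for illustration and unrelated to the MFCC data.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not separable by a straight line
X, y = make_circles(n_samples=400, noise=0.05, factor=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf"):
    model = SVC(kernel=kernel, C=1.0)
    model.fit(X_tr, y_tr)
    scores[kernel] = model.score(X_te, y_te)  # test accuracy per kernel

print(scores)
```

On data like this, the RBF kernel can wrap a boundary around the inner ring, while a linear kernel is limited to a single straight cut.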

In [43]:
# Initialize the SVM classifier
svm_classifier = SVC(kernel='linear', C=1.0, random_state=42)

# Train the classifier
svm_classifier.fit(X_train, y_train)

# Make predictions on the test data
predictions = svm_classifier.predict(X_test)

# Evaluate
svm_accuracy = accuracy_score(y_test, predictions)
svm_f1_score = f1_score(y_test, predictions, average='weighted')
svm_confusion_matrix = confusion_matrix(y_test, predictions)
svm_classification_report = classification_report(y_test, predictions)

print("Accuracy:", svm_accuracy)
print("F1 Score:", svm_f1_score)
print("Classification Report:\n", svm_classification_report)
plot_confusion_matrix(svm_confusion_matrix, title="Confusion Matrix for SVM Classifier")


Accuracy: 0.585
F1 Score: 0.5796226458334879
Classification Report:
               precision    recall  f1-score   support

       blues       0.42      0.45      0.43        22
   classical       0.93      0.96      0.95        28
     country       0.53      0.36      0.43        22
       disco       0.54      0.54      0.54        24
      hiphop       0.53      0.45      0.49        20
        jazz       0.53      0.47      0.50        19
       metal       0.53      0.75      0.62        12
         pop       0.76      0.90      0.83        21
      reggae       0.62      0.53      0.57        15
        rock       0.26      0.29      0.28        17

    accuracy                           0.58       200
   macro avg       0.56      0.57      0.56       200
weighted avg       0.58      0.58      0.58       200



## KNN

A KNN classifier is a simple model that classifies a data point based on the class of its nearest neighbors.
The number of neighbors to consider is a hyperparameter, and should be tuned to find the optimal value.
The first music genre classification blog post I came across used a KNN classifier, so I thought it would be fun to try it here, and compare against the other simple models.

We will use the `KNeighborsClassifier` from `sklearn.neighbors` to train our model.
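
The idea behind KNN fits in a few lines of NumPy. This is a minimal sketch for intuition, not how scikit-learn implements it; the function name `knn_predict` and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest training points."""
    predictions = []
    for x in X_query:
        # Euclidean distance from the query point to every training point
        distances = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        # Majority vote over the neighbors' labels
        values, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(values[np.argmax(counts)])
    return np.array(predictions)

# Toy 2D data: two well-separated groups (hypothetical values)
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array(["jazz", "jazz", "jazz", "metal", "metal", "metal"])

queries = np.array([[0.15, 0.1], [5.05, 5.0]])
toy_predictions = knn_predict(X_toy, y_toy, queries, k=3)
print(toy_predictions)
```

The first query sits inside the "jazz" group and the second inside the "metal" group, so the vote among the three nearest neighbors is unanimous in both cases.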

In [44]:
# Initialize the KNN classifier with the default number of neighbors (n_neighbors=5)
knn_classifier = KNeighborsClassifier()

# Train the classifier
knn_classifier.fit(X_train, y_train)

# Make predictions on the test data
predictions = knn_classifier.predict(X_test)

# Evaluate 
knn_accuracy = accuracy_score(y_test, predictions)
knn_f1_score = f1_score(y_test, predictions, average='weighted')
knn_confusion_matrix = confusion_matrix(y_test, predictions)
knn_classification_report = classification_report(y_test, predictions)

print("Accuracy:", knn_accuracy)
print("F1 Score:", knn_f1_score)
print("Classification Report:\n", knn_classification_report)
plot_confusion_matrix(knn_confusion_matrix, title="Confusion Matrix for KNN Classifier")

Accuracy: 0.31
F1 Score: 0.2670479172726008
Classification Report:
               precision    recall  f1-score   support

       blues       1.00      0.18      0.31        22
   classical       0.75      0.96      0.84        28
     country       0.50      0.09      0.15        22
       disco       0.00      0.00      0.00        24
      hiphop       1.00      0.10      0.18        20
        jazz       0.15      0.47      0.23        19
       metal       0.18      1.00      0.31        12
         pop       0.67      0.10      0.17        21
      reggae       1.00      0.07      0.12        15
        rock       0.13      0.18      0.15        17

    accuracy                           0.31       200
   macro avg       0.54      0.31      0.25       200
weighted avg       0.55      0.31      0.27       200



## Out-of-the-box analysis
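We've now trained three traditional ML classifiers out of the box to classify music genres. 
This means we used the default hyperparameter values for each of our three classifiers (apart from choosing a linear kernel for the SVM). 
On the test set, the SVM scored highest with an accuracy of 0.585, followed by Random Forest at 0.50 and KNN at 0.31. All three models classify classical tracks well, while rock is among the weakest genres for each of them.

Of course, there is quite a lot of optimization that can be done to improve model performance. 
That'll be our next step. 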


# Hyperparameter optimization

We'll now use `sklearn`'s `GridSearchCV` to tune our hyperparameters for each of these classifier types. 
Here's a brief overview of each classifier's tunable hyperparameters:

**RandomForestClassifier**
- `n_estimators`: number of trees in the forest
- `max_depth`: maximum depth of each tree
- `min_samples_leaf`: minimum number of samples required at each leaf node
- `min_samples_split`: minimum number of samples required to split a node

_(For simplicity, we'll reduce the scope of our search to only `n_estimators` and `max_depth`.)_


**SVC**
- `C`: regularization parameter; larger values penalize misclassifications more heavily
- `kernel`: kernel type, one of ['linear', 'poly', 'rbf', 'sigmoid', 'precomputed']

_(For simplicity, we'll stick to these two.)_

**KNN**
- `n_neighbors`: number of neighbors to consider when determining a category
- `algorithm`: algorithm used to find the nearest neighbors, one of ['auto', 'ball_tree', 'kd_tree', 'brute']

_(For simplicity, we'll stick to these two.)_
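
A sketch of how the hyperparameters listed above could be written as `GridSearchCV` parameter grids. The candidate values are placeholders chosen for illustration, not tuned recommendations, and the search itself runs on a small synthetic dataset rather than the MFCC features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical search spaces; the values are placeholders, not tuned choices
param_grids = {
    RandomForestClassifier: {"n_estimators": [50, 100], "max_depth": [5, None]},
    # 'precomputed' is left out: it expects a kernel matrix instead of features
    SVC: {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "poly", "rbf", "sigmoid"]},
    KNeighborsClassifier: {"n_neighbors": [3, 5, 7], "algorithm": ["auto", "brute"]},
}

# Small synthetic problem so the search runs in seconds
X, y = make_classification(
    n_samples=150, n_features=8, n_informative=4, n_classes=3, random_state=0
)

# GridSearchCV takes an estimator instance and tries every combination in the grid
search = GridSearchCV(KNeighborsClassifier(), param_grids[KNeighborsClassifier], cv=5)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```

The number of fits grows multiplicatively with the grid: the KNN grid above has 3 x 2 = 6 combinations, each evaluated on 5 folds.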



In [None]:

rfc_42 = partial(RandomForestClassifier, random_state=42)


In [None]:
def grid_search(classifier, params, cv=5, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test):
    """Grid-search `params` for `classifier` (an estimator class) and evaluate the best model on the test set."""
    # GridSearchCV needs an estimator instance, not a class or a partial
    estimator = classifier()
    # Fix the seed only for estimators that have one (KNN does not)
    if "random_state" in estimator.get_params():
        estimator.set_params(random_state=42)

    # Create a GridSearchCV object with the specified parameter grid and estimator
    search = GridSearchCV(estimator=estimator, param_grid=params, cv=cv, n_jobs=-1)

    # Perform grid search on the training data
    search.fit(X_train, y_train)

    # Print the best parameters found by the grid search
    print("Best Parameters:", search.best_params_)

    # Make predictions using the best estimator
    predictions = search.predict(X_test)

    # Evaluate; the local names must not shadow the imported metric functions
    accuracy = accuracy_score(y_test, predictions)
    f1 = f1_score(y_test, predictions, average='weighted')
    cm = confusion_matrix(y_test, predictions)
    report = classification_report(y_test, predictions)

    print("Accuracy:", accuracy)
    print("F1 Score:", f1)
    print("Classification Report:\n", report)
    plot_confusion_matrix(cm, title=f"Confusion Matrix for {classifier.__name__}")

    # Return the fitted search so the best estimator can be reused
    return search