# 🎵 Music Genre Classification

This notebook presents a machine learning pipeline for classifying music genres using Support Vector Machines (SVM) with the One-vs-Rest (OvR) and One-vs-One (OvO) multiclass strategies.

We use the GTZAN dataset with MFCC (Mel-frequency cepstral coefficient) audio features. The base model is a binary kernel SVM with an RBF kernel, combined with each of the two multiclass strategies in turn.
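
To make the difference between the two strategies concrete, the short sketch below counts the binary problems each one creates. The genre names are the ten GTZAN labels that appear in the evaluation output of this notebook. The variable names `ovr_tasks` and `ovo_tasks` are illustrative only and are not part of `courselib`.

```python
from itertools import combinations

genres = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

# One-vs-Rest: one binary problem per class (that class vs. all others)
ovr_tasks = [(g, "rest") for g in genres]

# One-vs-One: one binary problem per unordered pair of classes
ovo_tasks = list(combinations(genres, 2))

print(len(ovr_tasks))  # 10 = K
print(len(ovo_tasks))  # 45 = K * (K - 1) / 2
```

OvR trains fewer models, each on the full training set. OvO trains more models, each on only the samples of two classes.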


In [45]:
%load_ext autoreload
%autoreload 2

import numpy as np
import matplotlib.pyplot as plt
import sys
import os

# Add the repo root (two levels up from this notebook) to sys.path
sys.path.insert(0, os.path.abspath("../../"))



### Loading
 
The GTZAN dataset provides data in two formats:
1. 30-second fragments of each song
2. 3-second segments, obtained by splitting each fragment from 1. into 10 parts

We will work with the 30-second file.

In [46]:
import courselib.utils.loaders as loaders

df = loaders.load_music_30_sec()
#df = loaders.load_music_3_sec()

Loading from `features_30_sec.csv`...


### Train-Test Split

We train the models on all MFCC features and use an 80/20 train-test split.

In [47]:
from courselib.utils.splits import train_test_split

# Extract only MFCC columns from df
mfcc_columns = [col for col in df.columns if col.startswith('mfcc')]
features = mfcc_columns + ['label']

# Do train test split
X, Y, X_train, Y_train, X_test, Y_test = train_test_split(
    df[features],
    training_data_fraction=0.8,
    class_column_name='label',
    shuffle=True,
    return_numpy=True
)

print('Training data split as follows:')
print(f'  Training data samples: {len(X_train)}')
print(f'      Test data samples: {len(X_test)}')

Training data split as follows:
  Training data samples: 800
      Test data samples: 200


### Preprocessing

We standardize our features using z-score normalization. This step rescales all features to have zero mean and unit variance.

We also compute a reasonable value for the kernel width parameter $\sigma$ using the *median heuristic*:
$$
\sigma = \sqrt{ \frac{ \text{median}( \| x_i - x_j \|^2 ) }{2} }
$$

This tuning was introduced because accuracy was low on data that had not been preprocessed.

In [None]:
from sklearn.preprocessing import StandardScaler

# Standardize features (z-score)
scaler = StandardScaler().fit(X_train)
X_train_z = scaler.transform(X_train)
X_test_z  = scaler.transform(X_test)

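# Compute kernel width using median heuristic (for RBF kernel)
# on a random subset of at most 500 training points
n_sub = min(500, len(X_train_z))
subset = X_train_z[np.random.choice(len(X_train_z), n_sub, replace=False)]
# Pairwise squared Euclidean distances within the subset
d2 = np.sum((subset[:, None, :] - subset[None, :, :])**2, axis=-1)
sigma = np.sqrt(0.5 * np.median(d2[d2 > 0]))  # exclude zero distances (the diagonal)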

X_train = X_train_z
X_test = X_test_z
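
As a sanity check on the median heuristic, the sketch below uses synthetic data and assumes the common RBF convention $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$. Whether `courselib` uses exactly this parameterization is an assumption here. Under that convention, a pair of points at the median squared distance gets kernel value $e^{-1} \approx 0.37$, so typical kernel values are neither saturated near 1 nor vanishing near 0. The names `rbf_kernel`, `X_demo` and `sigma_demo` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, sigma):
    """Gaussian RBF kernel, assuming the convention exp(-||a-b||^2 / (2 sigma^2))."""
    d2 = np.sum((a - b) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((50, 20))  # synthetic stand-in for standardized features

# Pairwise squared distances, keeping each unordered pair once (upper triangle)
diff = X_demo[:, None, :] - X_demo[None, :, :]
d2 = np.sum(diff ** 2, axis=-1)
iu = np.triu_indices(len(X_demo), k=1)
median_d2 = np.median(d2[iu])

# Median heuristic: sigma^2 = median(d^2) / 2
sigma_demo = np.sqrt(0.5 * median_d2)

# Kernel value for a pair at exactly the median squared distance
k_median = np.exp(-median_d2 / (2.0 * sigma_demo ** 2))
# Kernel value of a point with itself
k_same = rbf_kernel(X_demo[0], X_demo[0], sigma_demo)

print(k_median, np.exp(-1.0))
print(k_same)
```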

### OvR model

First, we train the OvR classification strategy with binary RBF-kernel SVMs.

Then we evaluate performance on the test data, reporting both the accuracy of each binary model and the overall multiclass accuracy.
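
A minimal sketch of the usual OvR prediction rule follows: each "class vs. rest" model produces a decision score, and the class with the highest score wins. This is the textbook rule, not necessarily how `KernelMulticlassOvR` is implemented, and the scores are made up for illustration.

```python
import numpy as np

classes = np.array(["blues", "classical", "country"])

# Hypothetical decision scores: rows are samples, columns are the
# "class k vs. rest" classifiers in the same order as `classes`
scores = np.array([
    [ 0.8, -1.2, -0.3],
    [-0.5, -0.1, -0.9],   # all negative: the least negative score still wins
    [ 0.2,  0.1,  1.4],
])

# Predict the class whose binary model is most confident
pred = classes[np.argmax(scores, axis=1)]
print(pred)
```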

In [50]:
from courselib.models.multiclass_svm import KernelMulticlassOvR

svmOvR = KernelMulticlassOvR(kernel='rbf', sigma=sigma)
svmOvR.fit(X_train, Y_train)

In [51]:
svmOvR.evaluate_models(X_test, Y_test)
svmOvR.evaluate_accuracy(X_test, Y_test)

📊 Accuracy of each binary model (One-vs-Rest):
  - Class 'blues': 93.0000 (Support vectors: 223)
  - Class 'classical': 94.0000 (Support vectors: 122)
  - Class 'country': 92.5000 (Support vectors: 220)
  - Class 'disco': 92.0000 (Support vectors: 225)
  - Class 'hiphop': 94.0000 (Support vectors: 215)
  - Class 'jazz': 91.5000 (Support vectors: 188)
  - Class 'metal': 95.0000 (Support vectors: 171)
  - Class 'pop': 94.5000 (Support vectors: 155)
  - Class 'reggae': 91.5000 (Support vectors: 210)
  - Class 'rock': 91.0000 (Support vectors: 242)
🎯 Overall accuracy (OvR): 70.0000 %
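
The per-class binary accuracies above (91 to 95) are much higher than the overall accuracy (70 %). One likely reason is class imbalance inside each OvR task: with ten genres, roughly 90 % of the samples belong to "rest". The sketch below assumes the 200 test samples are spread evenly over the ten genres, which the split above does not guarantee exactly. It shows that a trivial model that always answers "rest" already reaches 90 % binary accuracy.

```python
import numpy as np

n_classes = 10
per_class = 20  # assumption: 200 test samples spread evenly over 10 genres
y_true = np.repeat(np.arange(n_classes), per_class)

# Binary "class 0 vs. rest" labels, and a trivial classifier that always says "rest"
y_bin = (y_true == 0)
always_rest = np.zeros_like(y_bin)

baseline_acc = np.mean(always_rest == y_bin)
print(baseline_acc)  # 0.9
```

The binary accuracies are therefore best read against a baseline of about 90, not 50.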


### OvO model

Next, we train the OvO classification strategy with binary RBF-kernel SVMs.

Then we evaluate performance on the test data, reporting both the accuracy of each pairwise model and the overall multiclass accuracy.
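
A minimal sketch of the standard OvO prediction rule follows: every pairwise classifier casts one vote, and the class with the most votes is predicted. This illustrates majority voting in general and may differ from what `KernelMulticlassOvO` does internally, for example in how it breaks ties. The pairwise outcomes in `winners` are made up.

```python
import numpy as np
from itertools import combinations

classes = ["blues", "classical", "country"]
pairs = list(combinations(range(len(classes)), 2))  # (0, 1), (0, 2), (1, 2)

# Hypothetical pairwise outcomes for one sample: winning class index per pair
winners = {(0, 1): 0, (0, 2): 2, (1, 2): 2}

# Tally one vote per pairwise classifier
votes = np.zeros(len(classes), dtype=int)
for pair in pairs:
    votes[winners[pair]] += 1

# np.argmax returns the first maximum, so ties go to the lower class index
pred = classes[int(np.argmax(votes))]
print(votes, pred)
```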

In [52]:
from courselib.models.multiclass_svm import KernelMulticlassOvO

svmOvO = KernelMulticlassOvO(kernel='rbf', sigma=sigma)
svmOvO.fit(X_train, Y_train)

In [53]:
svmOvO.evaluate_models(X_test, Y_test)
svmOvO.evaluate_accuracy(X_test, Y_test)

📊 Accuracy of each OvO binary classifier:
  - Classifier 'blues' vs 'classical': 100.0000 (Support vectors: 60)
  - Classifier 'blues' vs 'country': 89.1892 (Support vectors: 113)
  - Classifier 'blues' vs 'disco': 91.1765 (Support vectors: 114)
  - Classifier 'blues' vs 'hiphop': 96.7742 (Support vectors: 98)
  - Classifier 'blues' vs 'jazz': 95.3488 (Support vectors: 87)
  - Classifier 'blues' vs 'metal': 89.7436 (Support vectors: 89)
  - Classifier 'blues' vs 'pop': 97.6744 (Support vectors: 57)
  - Classifier 'blues' vs 'reggae': 87.5000 (Support vectors: 94)
  - Classifier 'blues' vs 'rock': 86.1111 (Support vectors: 127)
  - Classifier 'classical' vs 'country': 100.0000 (Support vectors: 67)
  - Classifier 'classical' vs 'disco': 94.8718 (Support vectors: 53)
  - Classifier 'classical' vs 'hiphop': 100.0000 (Support vectors: 45)
  - Classifier 'classical' vs 'jazz': 87.5000 (Support vectors: 89)
  - Classifier 'classical' vs 'metal': 100.0000 (Support vectors: 37)
  - Classifier 