### Description
This notebook shows how to train and evaluate the `auto_audio` library's `AutoAudioModel` for music genre classification on the GTZAN dataset.

### Download data
This step requires the Kaggle CLI (`pip install kaggle`) and Kaggle API credentials.
Alternatively, download the dataset from `https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification` and manually extract it into `../data/gtzan-dataset-music-genre-classification`.
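
For the manual route, the downloaded archive can be unpacked with Python's standard `zipfile` module. The following is a minimal sketch, not part of the library: the helper name `extract_archive` is hypothetical, and the archive file name depends on how your browser saved the download.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a zip archive into target_dir and return the archive's member names.

    Hypothetical helper for the manual download route.
    """
    target = Path(target_dir)
    # Create the target directory (and any missing parents) before extracting
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

Calling `extract_archive("archive.zip", target_dir)`, where `archive.zip` is a placeholder for the downloaded file and `target_dir` is the dataset directory named above, unpacks the archive in place of the Kaggle CLI download.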

In [1]:
import os
import subprocess

dataset_path = "data/gtzan-dataset-music-genre-classification"

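# Download only if the dataset directory is missing
if not os.path.exists(dataset_path):
    print("Dataset not found. Downloading...")
    os.makedirs(dataset_path, exist_ok=True)
    # check=True raises CalledProcessError if the kaggle CLI exits with an error,
    # so "Download complete." is only printed after a successful download
    subprocess.run(
        ["kaggle", "datasets", "download",
         "-d", "andradaolteanu/gtzan-dataset-music-genre-classification",
         "-p", dataset_path, "--unzip"],
        check=True,
    )
    print("Download complete.")
else:
    print("Dataset already exists.")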

Dataset already exists.


### Data format
First, build a DataFrame that stores the file path and genre label of every audio file.

In [2]:
import pandas as pd

In [3]:
genres_path = os.path.join(dataset_path, "Data/genres_original")
paths = []
labels = []
for genre in os.listdir(genres_path):
    folder_path = os.path.join(genres_path, genre)
    for filename in os.listdir(folder_path):
        paths.append(os.path.join(folder_path, filename))
        labels.append(genre)
df = pd.DataFrame({"file_path": paths, "label": labels})
# Remove the corrupted file; matching on the file name avoids hard-coded path separators
df = df[df["file_path"].map(os.path.basename) != "jazz.00054.wav"]
df.sample(5)

index,file_path,label
777,data/gtzan-dataset-music-genre-classification\...,pop
865,data/gtzan-dataset-music-genre-classification\...,reggae
752,data/gtzan-dataset-music-genre-classification\...,pop
622,data/gtzan-dataset-music-genre-classification\...,metal
288,data/gtzan-dataset-music-genre-classification\...,country
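
The cell above excludes one known corrupted file by name. As a complementary check, the sketch below scans a list of paths and reports files that Python's standard `wave` module cannot open. The helper name `find_unreadable_wavs` is hypothetical, and this is an assumption-laden check: `wave` only parses the header of PCM WAV files, so whether it flags the file excluded above depends on how that file is damaged.

```python
import wave


def find_unreadable_wavs(paths):
    """Return the paths that the standard-library wave module fails to open.

    Hypothetical helper; it only validates the WAV header, not the audio data.
    """
    unreadable = []
    for path in paths:
        try:
            # wave.open parses the RIFF/WAVE header and raises on malformed files
            with wave.open(str(path), "rb"):
                pass
        except (wave.Error, EOFError, OSError):
            unreadable.append(path)
    return unreadable
```

With the DataFrame built above, `df[~df["file_path"].isin(find_unreadable_wavs(df["file_path"]))]` would drop every file the check rejects.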


In [5]:
import sys
import os

# Add the repository's source directories to the import path
sys.path.insert(0, os.path.abspath('../src'))
sys.path.insert(0, os.path.abspath('../'))

### Model training

In [7]:
from sklearn.model_selection import train_test_split

from auto_audio.auto_audio_model import AutoAudioModel
from auto_audio.hyperparameter_tuner import HyperparameterTuner

# Hold out 20% of the files for evaluation
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)

# Tune hyperparameters with random search while fitting
tuner = HyperparameterTuner(search_method="random", n_iter=4)
model = AutoAudioModel()
model.fit(df_train, time_limit=600, tuner=tuner)

Preprocessing audio files.
Finished preprocessing files.
Cuda not available. Not training transformer model.
Training SVM
SVM achieved 33.1% accuracy.
Tuning model hyperparameters.
Fitting 5 folds for each of 10 candidates, totalling 50 fits
Best parameters found: {'kernel': 'linear', 'gamma': 'scale', 'degree': 3, 'C': 100}
Tuned SVM achieved 55.0% accuracy.
Training KNN
KNN achieved 38.8% accuracy.
Tuning model hyperparameters.
Fitting 5 folds for each of 10 candidates, totalling 50 fits
Best parameters found: {'weights': 'distance', 'n_neighbors': 7, 'metric': 'manhattan'}
Tuned KNN achieved 43.1% accuracy.
Training Gradient Boosting
Gradient Boosting achieved 59.4% accuracy.
Tuning model hyperparameters.
Fitting 5 folds for each of 10 candidates, totalling 50 fits


KeyboardInterrupt: 

In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, accuracy_score
import matplotlib.pyplot as plt
import numpy as np

y_test = df_test["label"]
y_pred = model.predict(df_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

# Use one sorted label list for both the matrix and the axis tick labels
class_labels = np.unique(y_test)
cm = confusion_matrix(y_test, y_pred, labels=class_labels)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_labels)
disp.plot()
plt.xticks(rotation=90)
plt.show()
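
The confusion matrix shows where predictions go wrong, while a single accuracy figure can hide genres that are classified poorly. As a rough complement, the sketch below computes per-genre recall (the fraction of each genre's files that were predicted correctly) in plain Python. The helper name `per_class_recall` is hypothetical and not part of `auto_audio` or scikit-learn.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: fraction of that label's samples that were predicted correctly}."""
    # Number of samples per true label
    totals = Counter(y_true)
    # Number of correctly predicted samples per true label
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in sorted(totals)}
```

Assuming `model.predict` returns one label per row, as the cell above implies, `per_class_recall(y_test, y_pred)` gives one value per genre in alphabetical order.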