# Music genre classifier
## Neural networks as classifiers
This notebook is the third step in a series of notebooks that build an ML audio classifier.

We continue our journey of music classification by training more complex models, such as CNNs and RNNs.
After this, we will see how our results compare against a pretrained model.
If you missed our previous steps, you can find them here:

- [preprocessing](https://github.com/pmhalvor/public-data/blob/master/notebooks/music-genre/preprocess.py)
- [traditional classifiers](https://github.com/pmhalvor/public-data/blob/master/notebooks/music-genre/classifiers.py) (note: currently only on branch [add/classifiers](https://github.com/pmhalvor/public-data/blob/add/classifiers/notebooks/music-genre/classifiers.ipynb))


## Goal
Train neural net classifiers to predict the genre of a song.

## Dataset
The dataset contains 1000 audio tracks, each 30 seconds long. It covers 10 genres, each represented by 100 tracks. All tracks are 22050 Hz mono 16-bit audio files in .wav format.
In [preprocess.py](preprocess.py), we convert the .wav files to MFCC features and store them as PyTorch tensors (`mfcc.pt`). Labels and file paths are stored as NumPy arrays.

## Source
https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification/ (accessed 2023-10-20)

In [170]:
from functools import partial
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, f1_score
from tqdm import tqdm

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Load data

In [3]:
mfcc_tensor = torch.load("mfcc.pt")
covariance_tensor =  torch.load("covariance.pt")
file_paths = np.load("file_paths.npy")
labels = np.load("labels.npy")

In [4]:
mfcc_tensor.shape

torch.Size([999, 2986, 13])

In [5]:
labels.shape

(999,)

In [6]:
# for plotting
file_paths.shape

(999,)

In [7]:
labels_to_idx = {label: idx for idx, label in enumerate(np.unique(labels))}
idx_to_labels = {idx: label for idx, label in enumerate(np.unique(labels))}
labels_to_idx

{'blues': 0,
 'classical': 1,
 'country': 2,
 'disco': 3,
 'hiphop': 4,
 'jazz': 5,
 'metal': 6,
 'pop': 7,
 'reggae': 8,
 'rock': 9}
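
As a quick sanity check, the two mappings should be inverses of each other. A minimal sketch, using a small hypothetical `toy_labels` array in place of the real `labels.npy`:

```python
import numpy as np

# Hypothetical stand-in for labels.npy; the real array holds 999 genre strings.
toy_labels = np.array(["rock", "blues", "jazz", "blues", "rock"])

# np.unique returns the sorted unique values, so ids follow alphabetical order.
classes = np.unique(toy_labels)
labels_to_idx = {label: idx for idx, label in enumerate(classes)}
idx_to_labels = {idx: label for label, idx in labels_to_idx.items()}

# Encode strings to integer ids, then decode them back.
encoded = np.array([labels_to_idx[label] for label in toy_labels])
decoded = np.array([idx_to_labels[idx] for idx in encoded])

print(encoded)
print((decoded == toy_labels).all())
```

Because the ids come from the sorted unique labels, the same ordering can be reused wherever class names need to line up with integer predictions.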

# Build simple classifiers

I asked ChatGPT to generate the classifiers for us, using this prompt:

Me: _I want to test a basic feed-forward neural network, a CNN, and an RNN. I will be using PyTorch as my ML framework. Could you help me generate the basic boilerplate code?_ 


It came up with the following three models:


In [8]:
features = 13
measurements = 2986
input_size = measurements * features  # Flattened input size: frames x MFCC coefficients
hidden_size = 128  # Number of neurons in the hidden layer
num_classes = 10  # Number of music genres
criterion = nn.CrossEntropyLoss()

# Hyperparameters
num_epochs = 10
batch_size = 100
learning_rate = 0.001
rnn_layers = 2

In [182]:
class FFN(nn.Module):
    def __init__(self, input_size=input_size, hidden_size=hidden_size, num_classes=num_classes, num_layers=2):
        super(FFN, self).__init__()
        self.fc_first = nn.Linear(input_size, hidden_size)
        self.fc_last = nn.Linear(hidden_size, num_classes)
        self.num_layers = num_layers

        if num_layers > 1:
            self.fc_hidden = nn.ModuleList()
            for i in range(num_layers - 1):
                self.fc_hidden.append(nn.Linear(hidden_size, hidden_size))
    
    def forward(self, x):
        x = F.relu(self.fc_first(x))
        
        # hidden layers
        if self.num_layers > 1:
            for i in range(self.num_layers - 1):
                x = F.relu(self.fc_hidden[i](x))

        # output layer
        x = self.fc_last(x)
        
        return x
    
    # def predict(self, x):
    #     logits = self.forward(x)
    #     return F.softmax(logits, dim=1)
    
    # def fit(self, X, y, num_epochs=num_epochs, batch_size=batch_size, learning_rate=learning_rate):
    #     optimizer = optim.Adam(self.parameters(), lr=learning_rate)
    #     for epoch in tqdm(range(num_epochs)):
    #         for i in range(0, X.shape[0], batch_size):
    #             optimizer.zero_grad()
    #             batch_x = X[i:i + batch_size]
    #             batch_y = y[i:i + batch_size]
    #             outputs = self.forward(batch_x)
    #             loss = criterion(outputs, batch_y)
    #             loss.backward()
    #             optimizer.step()
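
To get a feel for the size of this network, the parameter count can be worked out by hand. The sketch below assumes the defaults defined earlier (2986 frames x 13 MFCC coefficients flattened, `hidden_size=128`, `num_layers=2`); `linear_params` and `ffn_param_count` are hypothetical helpers, not part of the notebook.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def ffn_param_count(input_size: int, hidden_size: int, num_classes: int, num_layers: int) -> int:
    """Mirror the FFN layout: fc_first, (num_layers - 1) hidden layers, fc_last."""
    total = linear_params(input_size, hidden_size)
    total += (num_layers - 1) * linear_params(hidden_size, hidden_size)
    total += linear_params(hidden_size, num_classes)
    return total


input_size = 2986 * 13  # flattened MFCC values per track
total_params = ffn_param_count(input_size, 128, 10, 2)
print(input_size)
print(total_params)
```

Almost all parameters sit in the first layer, which is large relative to the roughly 700 training tracks.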



In [175]:
class CNN(nn.Module):
    def __init__(self, num_channels=13, num_classes=10, out_channels=32, measurements=2986, verbose=False):
        super(CNN, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=num_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc = nn.Linear(out_channels * measurements * 3, num_classes)  # Adjust the input size based on your data

        self.measurements = measurements
        
        self.verbose = verbose
    
    def forward(self, x):
        print("input", x.shape) if self.verbose else None
        x = self.conv1(x)
        print("conv1", x.shape) if self.verbose else None
        x = self.relu(x)
        print("relu", x.shape) if self.verbose else None
        x = self.maxpool(x)
        print("maxpool", x.shape) if self.verbose else None
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        print("fc", x.shape) if self.verbose else None
        print("-"*10) if self.verbose else None
        return x
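
The factor `3` in the `fc` layer's input size follows from the pooling arithmetic. A small sketch, assuming the input is reshaped to `[batch, 1, 2986, 13]` and using the standard output-size formula for a convolution or pooling layer without dilation (`out_size` is a hypothetical helper):

```python
def out_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length along one dimension for a conv or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


measurements, features, out_channels = 2986, 13, 32

# conv1: kernel_size=1, stride=1, padding=0 leaves both spatial dimensions unchanged.
h, w = out_size(measurements, 1, 1), out_size(features, 1, 1)

# maxpool: kernel_size=2, stride=2 halves each dimension, rounding down.
h, w = out_size(h, 2, 2), out_size(w, 2, 2)

flat = out_channels * h * w
print(h, w, flat)
print(flat == out_channels * measurements * 3)
```

The 13 coefficients pool down to 6 and the 2986 frames to 1493, and 1493 * 6 equals 2986 * 3, which is why the hard-coded factor works for this particular input shape.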


In [176]:
class RNN(nn.Module):
    def __init__(self, input_size=13, hidden_size=15, num_layers=2, num_classes=10, verbose=False):
        super(RNN, self).__init__()
        self.rnn = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

        self.verbose = verbose
    
    def forward(self, x):
        print("input", (x.shape)) if self.verbose else None
        _, (hn, _) = self.rnn(x)
        print("hidden", (hn.shape)) if self.verbose else None
        x = self.fc(hn[-1, :, :])
        print("output", (x.shape)) if self.verbose else None
        return x


# Train test split

In [12]:
# Flatten each track into a single feature vector: (num_samples, num_frames * num_mfcc)
num_samples, num_frames, num_mfcc = mfcc_tensor.shape
mfcc_tensor_2d = mfcc_tensor.reshape(num_samples, num_frames * num_mfcc)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(mfcc_tensor_2d, labels, test_size=0.2, random_state=42)

# Get validation set
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42)

In [13]:
uniques, counts = np.unique(y_train, return_counts=True)
dict(zip(uniques, counts))

{'blues': 73,
 'classical': 61,
 'country': 68,
 'disco': 71,
 'hiphop': 73,
 'jazz': 71,
 'metal': 80,
 'pop': 72,
 'reggae': 77,
 'rock': 73}
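
The counts above are uneven (61 classical tracks versus 80 metal) because the split is purely random. One option, not used in this notebook, is scikit-learn's `stratify` argument, which keeps class proportions the same in each split. A minimal sketch on synthetic labels (`toy_X` and `toy_y` are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10 classes with 100 samples each, one dummy feature.
toy_y = np.repeat(np.arange(10), 100)
toy_X = np.arange(len(toy_y)).reshape(-1, 1)

# stratify=toy_y asks for the same class proportions in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=42, stratify=toy_y
)

_, train_counts = np.unique(y_tr, return_counts=True)
_, test_counts = np.unique(y_te, return_counts=True)
print(train_counts)
print(test_counts)
```

With balanced synthetic classes, every class contributes the same number of samples to each split.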

# Train methods

In [41]:
X_train.reshape(*[-1, 1, measurements, features]).shape

torch.Size([719, 1, 2986, 13])

In [91]:
def train_batch(batch, model, criterion, optimizer):
    # Get the batch of data
    batch_X, batch_y = batch
    # convert strings to ids
    batch_y = np.array([labels_to_idx[x] for x in batch_y])
    batch_X = batch_X.float()
    
    # Zero out the gradients
    optimizer.zero_grad()
    
    # Forward pass
    outputs = model(batch_X)
    loss = criterion(outputs, torch.tensor(batch_y))
    
    # Backward pass
    loss.backward()
    optimizer.step()
    
    return loss.item()  


def eval_batch(X_test, y_test, model, criterion):
     # Evaluate
    model.eval()
    with torch.no_grad():
        X_test = X_test.float()
        y_test = torch.tensor([labels_to_idx[x] for x in y_test])
        outputs = model(X_test)
        loss = criterion(outputs, y_test)
    
    return loss.item()


def train_epoch(X_train, y_train, model, criterion, optimizer, batch_size=100):
    # Shuffle the training data by drawing a random permutation of the sample indices
    indices = np.random.permutation(len(X_train))
    
    # Create batches from the shuffled indices (a final partial batch is dropped)
    num_batches = len(X_train) // batch_size
    batches = []
    for i in range(num_batches):
        batch_idx = indices[i*batch_size:(i+1)*batch_size]
        batches.append((X_train[torch.as_tensor(batch_idx, dtype=torch.long)], y_train[batch_idx]))
    
    # Train each batch
    losses = []
    for batch in batches:
        loss = train_batch(batch, model, criterion, optimizer)
        losses.append(loss)
    
    return losses


def train_model(X_train, y_train, X_test, y_test, model, criterion, optimizer, reshape=None, num_epochs=10, batch_size=100, verbose=False, n=10):
    train_losses = []
    eval_losses = []
    average_loss = []

    if reshape:
        X_train = X_train.reshape(*reshape)
        X_test = X_test.reshape(*reshape)
    
    for epoch in tqdm(range(num_epochs)):
        
        # Train
        model.train()
        losses = train_epoch(X_train, y_train, model, criterion, optimizer, batch_size=batch_size)
        train_losses.extend(losses)
        average_loss.append(np.mean(losses))
        
        # Evaluate
        model.eval()
        with torch.no_grad():
            eval_loss = eval_batch(X_test, y_test, model, criterion)
            eval_losses.append(eval_loss)
        
        if verbose or epoch % n == 0:
            print(
                'Epoch: {}'.format(epoch),
                'Train loss: {:.4f}'.format(losses[-1]),
                'Test  loss: {:.4f}'.format(eval_losses[-1])
            )
        

    return train_losses, eval_losses, average_loss


X_train.numpy().shape


(719, 38818)
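
For reference, shuffled mini-batching can also be written as a small generator that applies a random permutation and keeps the final, smaller batch. This is a sketch on NumPy arrays; `iterate_minibatches` is a hypothetical helper and not part of the notebook's training code.

```python
import numpy as np


def iterate_minibatches(X, y, batch_size, rng):
    """Yield (X_batch, y_batch) pairs in random order, including the last partial batch."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]


# Toy data with 719 samples, the size of the training split above.
rng = np.random.default_rng(0)
toy_X = np.arange(719 * 2).reshape(719, 2)
toy_y = np.arange(719)

batches = list(iterate_minibatches(toy_X, toy_y, 100, rng))
sizes = [len(xb) for xb, _ in batches]
print(sizes)
```

With 719 samples and a batch size of 100, flooring the batch count to 719 // 100 = 7 would leave 19 samples unused, whereas stepping through the permutation yields an eighth batch of 19.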

In [15]:
def plot_losses(train_losses, val_losses, model=""):
    """Plot per-epoch training and validation losses using Plotly Express"""
    # np, pd and px are imported at the top of the notebook
    df = pd.DataFrame({
        'epoch': np.arange(len(train_losses)),
        'train_loss': train_losses,
        'val_loss': val_losses
    })
    
    fig = px.line(df, x='epoch', y=['train_loss', 'val_loss'], title=f'Losses {model}')
    fig.show()

# Training

## Feed Forward 

In [114]:
input_size = 38818  # fixed
num_classes = 10    # fixed

hidden_size = 128*7 # tunable
num_layers = 2      # tunable
lr=0.00001          # tunable

In [116]:
ffn_model = FFN(input_size, hidden_size, num_classes, num_layers)
optimizer_ffn = optim.Adam(ffn_model.parameters(), lr=lr, weight_decay=1e-7)  # double check optimizer set-up

ffn_train_losses, ffn_val_losses, ffn_avg_losses = train_model(
    X_train, y_train, X_test, y_test, ffn_model, criterion, optimizer_ffn, 
    num_epochs=10, batch_size=batch_size, verbose=False
)

plot_losses(ffn_avg_losses, ffn_val_losses, "FFN")

 10%|█         | 1/10 [00:00<00:05,  1.67it/s]

Epoch: 0 Train loss: 2.2988 Test  loss: 2.2523


100%|██████████| 10/10 [00:06<00:00,  1.66it/s]
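
The first-epoch loss of about 2.30 is what an untrained 10-class classifier is expected to produce: if the model assigns equal probability to every genre, the cross-entropy equals ln(10). A quick check in plain Python:

```python
import math

num_classes = 10

# Cross-entropy of a uniform prediction: -log(1 / num_classes) = log(num_classes).
chance_loss = -math.log(1.0 / num_classes)
print(round(chance_loss, 4))
```

A starting loss far above this value may indicate very large logits, for example from unscaled inputs combined with a high learning rate.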


In [124]:
ffn_model = FFN(input_size, 128*5, num_classes)
optimizer_ffn = optim.Adam(ffn_model.parameters(), lr=0.001)  # double check optimizer set-up

ffn_train_losses, ffn_val_losses, ffn_avg_losses = train_model(
    X_train, y_train, X_test, y_test, ffn_model, criterion, optimizer_ffn, 
    num_epochs=15, batch_size=batch_size, verbose=False
)

plot_losses(ffn_avg_losses, ffn_val_losses, "FFN")

  7%|▋         | 1/15 [00:00<00:05,  2.73it/s]

Epoch: 0 Train loss: 385.2233 Test  loss: 345.3228


 73%|███████▎  | 11/15 [00:04<00:01,  2.66it/s]

Epoch: 10 Train loss: 0.1193 Test  loss: 1.9306


100%|██████████| 15/15 [00:05<00:00,  2.64it/s]


## CNN

In [18]:
# CNN parameters
num_channels = 1  # The MFCC matrix is treated as a single-channel image
num_classes = 10  # Number of output classes

In [19]:
# Initialize the CNN model and optimizer (the loss function `criterion` is defined above)
cnn_model = CNN(num_channels, num_classes, out_channels=32, measurements=2986)
optimizer_cnn = optim.Adam(cnn_model.parameters(), lr=0.0001)

cnn_train_losses, cnn_val_losses, cnn_avg_losses = train_model(
    X_train, y_train, X_test, y_test, cnn_model, criterion, optimizer_cnn, 
    reshape=[-1, 1, measurements, features],  # Conv2d expects [batch, channels, height, width]
    num_epochs=15, batch_size=batch_size, verbose=False
)

plot_losses(cnn_avg_losses, cnn_val_losses, "CNN")

  0%|          | 0/15 [00:00<?, ?it/s]

  7%|▋         | 1/15 [00:03<00:47,  3.37s/it]

Epoch: 0
Train loss: 493.1981
Val loss: 410.7944


 40%|████      | 6/15 [00:19<00:29,  3.31s/it]

Epoch: 5
Train loss: 39.7292
Val loss: 29.5560


 73%|███████▎  | 11/15 [00:36<00:13,  3.35s/it]

Epoch: 10
Train loss: 7.2381
Val loss: 13.9088


100%|██████████| 15/15 [00:50<00:00,  3.36s/it]


## RNN

In [50]:
features = 13  # Number of features
hidden_size = 13*4  # Number of hidden units in the RNN layer
num_layers = 2  # Number of RNN layers
num_classes = 10  # Number of output classes (genres)
batch_size = 100  # Number of examples in a batch

In [54]:
# Initialize the RNN model, loss function, and optimizer
rnn_model = RNN(features, hidden_size, num_layers, num_classes, verbose=False)
optimizer_rnn = optim.Adam(rnn_model.parameters(), lr=0.1)

rnn_train_losses, rnn_val_losses, rnn_avg_losses = train_model(
    X_train, y_train, X_test, y_test, rnn_model, criterion, optimizer_rnn, 
    reshape=[-1, measurements, features],
    num_epochs=10, batch_size=int(batch_size/2), verbose=False
)

plot_losses(rnn_avg_losses, rnn_val_losses, "RNN")

 10%|█         | 1/10 [00:25<03:49, 25.45s/it]

Epoch: 0
Train loss: 2.2789
Val loss: 2.1273


 60%|██████    | 6/10 [02:35<01:43, 25.96s/it]

Epoch: 5
Train loss: 2.0412
Val loss: 1.9351


100%|██████████| 10/10 [04:20<00:00, 26.02s/it]


In [31]:
# Initialize the RNN model, loss function, and optimizer
rnn_model = RNN(features, hidden_size, num_layers, num_classes, verbose=False)
optimizer_rnn = optim.Adam(rnn_model.parameters(), lr=0.1)

rnn_train_losses, rnn_val_losses, rnn_avg_losses = train_model(
    X_train, y_train, X_test, y_test, rnn_model, criterion, optimizer_rnn, 
    num_epochs=10, batch_size=int(batch_size/2), verbose=False
)

plot_losses(rnn_avg_losses, rnn_val_losses, "RNN")

  0%|          | 0/10 [00:00<?, ?it/s]

Shape of input: torch.Size([50, 38818])
Shape of input: torch.Size([50, 2986, 13])
Shape of hidden state: torch.Size([2, 50, 256])
Shape of output: torch.Size([50, 10])
...
Shape of hidden state: torch.Size([2, 200, 256])
Shape of output: torch.Size([200, 10])

 10%|█         | 1/10 [01:34<14:07, 94.14s/it]

Epoch: 0
Train loss: 4.2229
Val loss: 3.3218

 20%|██        | 2/10 [03:09<12:40, 95.07s/it]
 30%|███       | 3/10 [36:31<1:52:39, 965.60s/it]
 40%|████      | 4/10 [3:38:04<8:08:30, 4885.06s/it]
 50%|█████     | 5/10 [6:52:51<10:11:28, 7337.73s/it]
 60%|██████    | 6/10 [7:31:21<6:15:12, 5628.21s/it]

Epoch: 5
Train loss: 2.5163
Val loss: 2.4869

 70%|███████   | 7/10 [7:42:57<3:20:47, 4015.67s/it]
 80%|████████  | 8/10 [7:44:37<1:32:18, 2769.21s/it]
 90%|█████████ | 9/10 [7:46:12<32:13, 1933.38s/it]
100%|██████████| 10/10 [7:47:59<00:00, 2807.97s/it]





In [None]:
# oi, the above took 468 minutes to run on my laptop (7h 48m)

# Evaluate


In [156]:
def metrics(predictions, y_labels, verbose=False):
    """Calculate accuracy, F1 score, and confusion matrix"""
    accuracy = accuracy_score(y_labels, predictions)
    f1 = f1_score(y_labels, predictions, average='weighted', zero_division=0)
    confusion = confusion_matrix(y_labels, predictions)
    report = classification_report(y_labels, predictions, zero_division=0)

    if verbose:
        print("Accuracy:", accuracy)
        print("F1 Score:", f1) 
        if verbose > 1:
            print("Classification Report:\n", report)
    return accuracy, f1, confusion, report
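
`metrics` reports the weighted F1 score, while the grid search in the hyperparameter tuning section scores with `f1_macro`. The two can differ when classes are imbalanced, as this toy example suggests (the label lists are made up):

```python
from sklearn.metrics import f1_score

# Made-up labels: class 0 has four samples, class 1 has two.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]

# Per-class F1 here is 8/9 for class 0 and 2/3 for class 1.
macro = f1_score(y_true, y_pred, average="macro")        # unweighted mean of per-class F1
weighted = f1_score(y_true, y_pred, average="weighted")  # mean weighted by class support

print(round(macro, 4), round(weighted, 4))
```

The macro average treats every genre equally, so a poorly predicted small class pulls it down more than it pulls down the weighted average.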

In [157]:
def plot_confusion_matrix(cm, classes=list(labels_to_idx), name="", cmap=px.colors.sequential.Blues):
    """
    Plot the confusion matrix as a heatmap.
    Class names follow the sorted order of `labels_to_idx`, matching the sorted label order used by `confusion_matrix`.
    """
    fig = px.imshow(cm, x=classes, y=classes, color_continuous_scale=cmap)
    fig.update_layout(title="Confusion matrix "+name, xaxis_title="Predicted", yaxis_title="Actual")
    fig.show()


In [160]:
def evaluate(model, X_val=X_val, y_val=y_val, name="", reshape=None, plot=True, verbose=True):
    """Evaluate the model on the validation set"""
    model.eval()
    with torch.no_grad():
        X_val = X_val.reshape(*reshape).float() if reshape else X_val.float()
        # y_val = torch.tensor([labels_to_idx[x] for x in y_val])

        outputs = model(X_val)

        # ffn_preds = np.argmax(ffn_outputs, axis=1)
        _, predictions = torch.max(outputs.data, 1)
        predictions = np.array([idx_to_labels[pred.item()] for pred in predictions])
        accuracy, f1, confusion, report = metrics(predictions, y_val, verbose=verbose)

        print("(Prediction, Label): ", list(zip(predictions, y_val))) if verbose>1 else None

        plot_confusion_matrix(confusion, name=name) if plot else None
    
    return accuracy, f1, confusion, report

In [162]:
ffn_eval = evaluate(ffn_model, name="FFN", verbose=2)

Accuracy: 0.4875
F1 Score: 0.49532102314610055
Classification Report:
               precision    recall  f1-score   support

       blues       0.60      0.60      0.60         5
   classical       0.70      0.64      0.67        11
     country       0.33      0.30      0.32        10
       disco       0.00      0.00      0.00         5
      hiphop       0.57      0.57      0.57         7
        jazz       0.33      0.44      0.38         9
       metal       0.62      0.62      0.62         8
         pop       0.71      0.71      0.71         7
      reggae       0.56      0.62      0.59         8
        rock       0.60      0.30      0.40        10

    accuracy                           0.49        80
   macro avg       0.50      0.48      0.49        80
weighted avg       0.52      0.49      0.50        80

(Prediction, Label):  [('disco', 'hiphop'), ('disco', 'metal'), ('blues', 'blues'), ('country', 'jazz'), ('classical', 'country'), ('classical', 'classical'), ('jazz', 'c

In [97]:
# # # 7 layers *5 hidden size 0 weight decay
# # Accuracy: 0.625
# # F1 Score: 0.5968102918258138
# ffn_eval = evaluate(ffn_model, plot=True)


Accuracy: 0.625
F1 Score: 0.5968102918258138
Classification Report:
               precision    recall  f1-score   support

           0       0.50      0.60      0.55         5
           1       0.83      0.91      0.87        11
           2       0.33      0.10      0.15        10
           3       0.33      0.20      0.25         5
           4       0.56      0.71      0.63         7
           5       0.70      0.78      0.74         9
           6       0.67      0.75      0.71         8
           7       0.67      0.86      0.75         7
           8       0.50      0.62      0.56         8
           9       0.67      0.60      0.63        10

    accuracy                           0.62        80
   macro avg       0.58      0.61      0.58        80
weighted avg       0.59      0.62      0.60        80



In [125]:
ffn_eval = evaluate(ffn_model, plot=True)


Accuracy: 0.4875
F1 Score: 0.49532102314610055
Classification Report:
               precision    recall  f1-score   support

           0       0.60      0.60      0.60         5
           1       0.70      0.64      0.67        11
           2       0.33      0.30      0.32        10
           3       0.00      0.00      0.00         5
           4       0.57      0.57      0.57         7
           5       0.33      0.44      0.38         9
           6       0.62      0.62      0.62         8
           7       0.71      0.71      0.71         7
           8       0.56      0.62      0.59         8
           9       0.60      0.30      0.40        10

    accuracy                           0.49        80
   macro avg       0.50      0.48      0.49        80
weighted avg       0.52      0.49      0.50        80



In [36]:
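# Conv2d expects float input of shape [batch, channels, height, width]
cnn_outputs = cnn_model(X_val.reshape(-1, 1, measurements, features).float()).detach().numpy()
cnn_preds = np.argmax(cnn_outputs, axis=1)

# Integer-encode the validation labels so they match the argmax predictions
y_labels = np.array([labels_to_idx[x] for x in y_val])
cnn_acc, cnn_f1, cnn_cm, cnn_cr = metrics(cnn_preds, y_labels, verbose=True)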

plot_confusion_matrix(cnn_cm)

Accuracy: 0.425
F1 Score: 0.377282396294493
Classification Report:
               precision    recall  f1-score   support

           0       0.60      0.60      0.60         5
           1       0.80      0.73      0.76        11
           2       0.00      0.00      0.00        10
           3       0.00      0.00      0.00         5
           4       1.00      0.14      0.25         7
           5       0.50      0.33      0.40         9
           6       0.28      1.00      0.43         8
           7       0.80      0.57      0.67         7
           8       0.33      0.12      0.18         8
           9       0.29      0.60      0.39        10

    accuracy                           0.42        80
   macro avg       0.46      0.41      0.37        80
weighted avg       0.46      0.42      0.38        80



In [57]:
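rnn_outputs = rnn_model(X_val.reshape(-1, measurements, features).float()).detach().numpy()
rnn_preds = np.argmax(rnn_outputs, axis=1)

# Integer-encode the validation labels so they match the argmax predictions
y_labels = np.array([labels_to_idx[x] for x in y_val])
rnn_acc, rnn_f1, rnn_cm, rnn_cr = metrics(rnn_preds, y_labels, verbose=True)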

plot_confusion_matrix(rnn_cm)

Accuracy: 0.3
F1 Score: 0.2341564685314685
Classification Report:
               precision    recall  f1-score   support

           0       0.13      1.00      0.23         5
           1       0.82      0.82      0.82        11
           2       0.33      0.10      0.15        10
           3       0.00      0.00      0.00         5
           4       1.00      0.14      0.25         7
           5       0.00      0.00      0.00         9
           6       0.00      0.00      0.00         8
           7       0.37      1.00      0.54         7
           8       0.00      0.00      0.00         8
           9       0.33      0.10      0.15        10

    accuracy                           0.30        80
   macro avg       0.30      0.32      0.21        80
weighted avg       0.32      0.30      0.23        80



# Hyperparameter tuning


In [166]:
def get_module_name(model):
    """Get the name of the model"""
    return model.__class__.__name__

In [196]:
def grid_search(
        classifier, params, cv=5, 
        X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, 
        return_full_metrics=False, verbose=False, plot=True, name="", plot_cm=True
    ):
    # Create a GridSearchCV object with the specified parameter grid and classifier
    grid_search = GridSearchCV(estimator=classifier, param_grid=params, cv=cv, n_jobs=-1, scoring="f1_macro")  #scoring= f1 with average='weighted'?

    # Perform grid search on your data
    label_idx_train = np.array([labels_to_idx[x] for x in y_train])
    grid_search.fit(X_train, label_idx_train)

    # Print the best parameters found by the grid search
    print("Best Parameters:", grid_search.best_params_)

    # Make predictions using the best estimator
    predictions = grid_search.predict(X_test)
    predictions = np.array([idx_to_labels[pred] for pred in predictions])
    print("(Prediction, Label): ", list(zip(predictions, y_test))) if verbose>1 else None

    accuracy, f1, cm, cr = metrics(predictions, y_test, verbose=verbose)
    
    # append classifier name and params for plotting
    grid_search.name = get_module_name(classifier)
    grid_search.params = params
    grid_search.f1 = f1
    grid_search.accuracy = accuracy

    if return_full_metrics:
        return accuracy, f1, cm, cr, grid_search
    
    plot_confusion_matrix(cm, name=name) if plot_cm else None
    plot_grid_seach(grid_search, verbose=verbose) if plot else None


    return grid_search


def plot_grid_seach(gs, score_col="mean_test_score", param_cols=None, verbose=0):
    """3D surface plot of grid search results: the first two params on the x and y axes, the score on the z axis"""
    gs_df = pd.DataFrame(gs.cv_results_)

    # get score column
    if score_col is None:
        score_col = [x for x in gs_df.columns if "score" in x].pop()
        print("score_col", score_col) if verbose>1 else None

    # get param cols
    if param_cols is None:
        param_cols = [x for x in gs_df.columns if "param_" in x]
        print("param_cols", param_cols) if verbose >3 else None

    # get sizes
    x_size = len(gs_df[param_cols[0]].unique())
    y_size = len(gs_df[param_cols[1]].unique())

    # get x, y, z  # need smart way of finding df size..
    x = gs_df[param_cols[0]].values.reshape(x_size, y_size).T[0]
    y = gs_df[param_cols[1]].values.reshape(y_size, x_size)[0]
    z = gs_df[score_col].values.reshape(x_size, y_size).T

    fig = go.Figure(
        data=[go.Surface(
            x=x, y=y, z=z, 
            hovertemplate=f"{param_cols[0]}: {'%{x}'}<br>{param_cols[1]}: {'%{y}'}<br>{score_col}: {'%{z}'}<extra></extra>",
        )]
    )
    fig.update_layout(
        title=f"GridSearchCV Results for {gs.name} Classifier", 
        scene=dict(
            xaxis_title=param_cols[0], 
            yaxis_title=param_cols[1], 
            zaxis_title=score_col,
            # xaxis_type="log" if "x" in log else "linear",
            # yaxis_type="log" if "y" in log else "linear",
            # zaxis_type="log" if "z" in log else "linear",
        ),
        height=750,
    )

    if verbose > 2:
        print("x", x)
        print("y", y)
        print("z", z)

    return fig.show()
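
`plot_grid_seach` reshapes the results by hand, which only works when exactly two parameters are searched. One possible alternative is to let pandas build the grid with `pivot_table`, averaging over any remaining parameters. The sketch below uses a made-up results table in the shape of `cv_results_`; `score_grid` and the column names are hypothetical, and real `cv_results_` parameter columns may need casting to plain numeric types first.

```python
import itertools

import pandas as pd

# Made-up stand-in for pd.DataFrame(gs.cv_results_) with three searched parameters.
rows = [
    {"param_hidden": h, "param_layers": n, "param_lr": lr, "mean_test_score": 0.1 * n + lr}
    for h, n, lr in itertools.product([128, 256], [1, 3, 5], [0.001, 0.01])
]
results = pd.DataFrame(rows)


def score_grid(df, x_col, y_col, score_col="mean_test_score"):
    """Return a 2D table of mean scores: rows are y_col values, columns are x_col values."""
    return df.pivot_table(index=y_col, columns=x_col, values=score_col, aggfunc="mean")


grid = score_grid(results, "param_hidden", "param_layers")
print(grid.shape)
print(grid)
```

The resulting `grid.values` could be passed as `z` to `go.Surface`, with `grid.columns` as `x` and `grid.index` as `y`.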



## Feed Forward

In [197]:
from skorch import NeuralNetClassifier

# Initialize the FFN model, loss function, and optimizer
ffn_net = NeuralNetClassifier(
    FFN, 
    criterion=nn.CrossEntropyLoss, 
    optimizer=optim.Adam, 
    lr=0.00001, 
    max_epochs=10, 
    batch_size=100, 
    # device="cuda" if torch.cuda.is_available() else "cpu",
    verbose=0
)

In [198]:
ffn_gs = grid_search(
    ffn_net,
    params={
        "module__hidden_size": [128*3, 128*5, 128*7, 128*9],
        "module__num_layers": [1, 3, 5],
        "optimizer__lr": [0.0001, 0.001, 0.01, 0.1],
        "optimizer": [optim.Adam, optim.SGD],
    },
    name="FFN",
    verbose=2,
)


A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
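
For context, the size of this search can be counted up front. A small sketch in plain Python, using the parameter grid above with the optimizer classes replaced by their names and the default `cv=5`:

```python
from itertools import product

# Same grid as above; optimizer classes are written as strings to keep this self-contained.
param_grid = {
    "module__hidden_size": [128 * 3, 128 * 5, 128 * 7, 128 * 9],
    "module__num_layers": [1, 3, 5],
    "optimizer__lr": [0.0001, 0.001, 0.01, 0.1],
    "optimizer": ["Adam", "SGD"],
}
cv = 5

combinations = list(product(*param_grid.values()))
total_fits = len(combinations) * cv
print(len(combinations), total_fits)
```

Each of these fits trains a network on 38818-dimensional inputs, and `n_jobs=-1` runs several of them in parallel, which may explain the memory pressure behind the warning.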

