### *In this question, we will determine the optimal batch size for mini-batch gradient descent. Find the optimal batch size for mini-batch gradient descent by training the neural network and evaluating the performances for different batch sizes. Note: Use 5-fold cross-validation on the training partition to perform hyperparameter selection. You will have to reconsider the scaling of the dataset during the 5-fold cross validation.*

### *Plot mean cross-validation accuracies on the final epoch for different batch sizes as a scatter plot. Limit search space to batch sizes {64, 128, 256, 512}. Next, create a table of time taken to train the network on the last epoch against different batch sizes. Finally, select the optimal batch size and state a reason for your selection.*


This might take a while to run, so plan your time carefully.

In [1]:
import tqdm
import time
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
from torch import nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

from scipy.io import wavfile as wav

from sklearn import preprocessing
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix

from common_utils import set_seed, preprocess_dataset, preprocess, initialise_loaders, loss_fn

# setting seed
set_seed()

1. To reduce repeated code, place your

- network (MLP defined in QA1)

- torch datasets (CustomDataset defined in QA1)
- loss function (loss_fn defined in QA1)
in a separate file called common_utils.py

Import them into this file. You will not be repenalised for any error in QA1 here as the code in QA1 will not be remarked.

The following code cell will not be marked.

In [2]:
from common_utils import MLP, CustomDataset, loss_fn

2. Define different folds for different batch sizes to get a dictionary of training and validation datasets. Preprocess your datasets accordingly.

In [3]:
df = pd.read_csv('/content/drive/MyDrive/a - csv file/simplified.csv')
df.head()

Unnamed: 0,filename,tempo,total_beats,average_beats,chroma_stft_mean,chroma_stft_var,chroma_cq_mean,chroma_cq_var,chroma_cens_mean,chroma_cens_var,...,mfcc15_mean,mfcc15_var,mfcc16_mean,mfcc16_var,mfcc17_mean,mfcc17_var,mfcc18_mean,mfcc18_var,mfcc19_mean,mfcc19_var
0,app_3001_4001_phnd_neg_0000.wav,184.570312,623,69.222222,0.515281,0.093347,0.443441,0.082742,0.249143,0.021261,...,-10.669799,63.340282,1.811605,58.117188,-3.286546,54.268448,-2.719069,59.548176,-4.559987,70.774803
1,app_3001_4001_phnd_neg_0001.wav,151.999081,521,74.428571,0.487201,0.094461,0.542182,0.073359,0.274423,0.008025,...,-5.666375,90.256195,1.573594,105.070496,-0.742024,82.417496,-1.961745,119.312355,1.51366,101.014572
2,app_3001_4001_phnd_neg_0002.wav,112.347147,1614,146.727273,0.444244,0.099268,0.442014,0.083224,0.26443,0.01341,...,-5.50239,73.07975,0.202623,72.04055,-4.021009,73.844353,-5.916223,103.834824,-2.939086,113.598824
3,app_3001_4001_phnd_neg_0003.wav,107.666016,2060,158.461538,0.454156,0.100834,0.42437,0.084435,0.257672,0.016938,...,-8.812989,93.791893,-0.429413,60.002579,-4.013513,82.54454,-5.858006,84.402092,0.686969,90.126389
4,app_3001_4001_phnd_neg_0004.wav,75.99954,66,33.0,0.47878,0.1,0.414859,0.089313,0.252143,0.019757,...,-6.584204,64.973305,0.744403,68.908516,-6.354805,66.414391,-6.555534,47.85284,-4.809713,73.033966


In [4]:
df['label'] = df['filename'].str.split('_').str[-2]

X_train_scaled, y_train, X_test_scaled, y_test = preprocess(df)

  X_train_scaled = torch.tensor(X_train_scaled, dtype = torch.float32)
  X_test_scaled = torch.tensor(X_test_scaled, dtype = torch.float32)


In [5]:
X_train_scaled[0]

tensor([-0.8909,  0.2810,  0.7870, -0.0200,  0.6522,  0.0762,  1.4336,  0.6836,
        -0.6838, -0.7305, -0.3847, -1.0345,  1.2228,  0.2109,  0.4033, -0.9080,
        -0.7260, -0.0090,  0.3511,  0.6449,  2.7772,  0.1403,  0.7525,  0.1575,
         0.1575, -1.0339, -1.0339, -0.2987, -0.4455, -0.2508, -0.1117, -0.0942,
        -0.7146,  0.3838, -0.6509, -1.0404, -0.5998, -1.3200, -0.0771, -0.5295,
         0.5232,  1.1781,  0.4453, -1.2156, -0.5857, -0.2313,  0.0222, -0.1016,
        -0.2348, -0.4169, -0.3871,  0.5204, -0.0208, -0.3946,  0.1022,  0.1414,
        -0.6550,  0.2237, -0.6387, -0.5820,  0.1988,  0.4884, -0.1805, -0.0791,
         0.0422, -1.0661,  0.6090, -0.2500,  0.1916, -0.2100,  2.0314, -0.4746,
         1.3177,  0.1768,  1.0361,  0.8663,  0.7021])

In [6]:
y_train[:5]

tensor([0., 0., 0., 1., 0.])

In [7]:
torch.unique(y_train, return_counts = True)

(tensor([0., 1.]), tensor([4684, 4961]))

In [8]:
print("Training Labels Distribution:")
print(pd.Series(y_train).value_counts())

print("Validation Labels Distribution:")
print(pd.Series(y_test).value_counts())

Training Labels Distribution:
1.0    4961
0.0    4684
Name: count, dtype: int64
Validation Labels Distribution:
1.0    1241
0.0    1171
Name: count, dtype: int64


In [9]:
k = 5
batch_sizes = [64, 128, 256, 512]

def generate_cv_folds_for_batch_sizes(parameters, X_train_scaled, y_train):
    """
    returns:
    X_train_scaled_dict(dict) where X_train_scaled_dict[batch_size] is a list of the preprocessed training matrix for the different folds.
    X_val_scaled_dict(dict) where X_val_scaled_dict[batch_size] is a list of the processed validation matrix for the different folds.
    y_train_dict(dict) where y_train_dict[batch_size] is a list of labels for the different folds
    y_val_dict(dict) where y_val_dict[batch_size] is a list of labels for the different folds
    """

    X_train_scaled_dict = {}
    y_train_dict = {}
    X_val_scaled_dict = {}
    y_val_dict = {}

    kf = KFold(n_splits = k, shuffle = True, random_state = 42)

    for i in range(len(parameters)):
        X_train_scaled_dict[parameters[i]] = []
        y_train_dict[parameters[i]] = []
        X_val_scaled_dict[parameters[i]] = []
        y_val_dict[parameters[i]] = []

        for train_index, val_index in kf.split(X_train_scaled):
            X_train_splited, X_val_splited = X_train_scaled[train_index], X_train_scaled[val_index]
            y_train_splited, y_val_splited = y_train[train_index], y_train[val_index]

            X_train_splited_preprocess, X_val_splited_preprocess = preprocess_dataset(X_train_splited, X_val_splited)

            X_train_scaled_dict[parameters[i]].append(X_train_splited_preprocess)
            y_train_dict[parameters[i]].append(y_train_splited)
            X_val_scaled_dict[parameters[i]].append(X_val_splited_preprocess)
            y_val_dict[parameters[i]].append(y_val_splited)


    return X_train_scaled_dict, X_val_scaled_dict, y_train_dict, y_val_dict


X_train_scaled_dict, X_val_scaled_dict, y_train_dict, y_val_dict = generate_cv_folds_for_batch_sizes(batch_sizes, X_train_scaled, y_train)

In [10]:
len(X_train_scaled_dict)

4

In [11]:
len(X_train_scaled_dict[64])

5

In [12]:
len(X_train_scaled_dict[64][0])

7716

In [13]:
len(X_train_scaled_dict[64][0][0])

77

3. Perform hyperparameter tuning for the different batch sizes with 5-fold cross validation.

In [14]:
model = MLP(no_features = X_train_scaled.shape[1], no_hidden = 128, no_labels = 1)
optimizer = torch.optim.Adam(model.parameters(), lr = 0.001)

num_epochs = 100

In [19]:
def train_model(model, optimizer, num_epochs, X_train, y_train, X_val, y_val):
    train_dataloader, test_dataloader = initialise_loaders(X_train, y_train, X_val, y_val)

    for epoch in range(num_epochs):
        # start time
        start_time = time.time()

        model.train()

        # for loop for each batch
        for batch_index, (X_train_batch, y_train_batch) in enumerate(train_dataloader):

            optimizer.zero_grad()
            y_train_predicted = model(X_train_batch)

            # Calculate loss
            loss = loss_fn(y_train_predicted, y_train_batch.unsqueeze(1))

            loss.backward()
            optimizer.step()
        # end time
        end_time = time.time()

        model.eval()

        if epoch == num_epochs - 1:
            # Calculate accuracy after each fold of each batch size
            with torch.no_grad():
                # correct test predictions -> calculate val accuracies
                correct_test_predictions = 0
                # for loop for each batch
                for batch_index, (X_val_batch, y_val_batch) in enumerate(test_dataloader):
                    y_val_predicted = model(X_val_batch)
                    # Calculate test accuracy
                    y_val_predicted_labels = (y_val_predicted >= 0.5).float()
                    correct_test_predictions += (y_val_predicted_labels == y_val_batch).sum().item()

                test_accuracy = correct_test_predictions / len(test_dataloader.dataset)

            # Append accuracy and training time of the final epoch
            cross_validation_accuracies = test_accuracy
            cross_validation_times = end_time - start_time

    return cross_validation_accuracies, cross_validation_times

In [20]:
def find_optimal_hyperparameter(X_train_scaled_dict, X_val_scaled_dict, y_train_dict, y_val_dict, parameters):
    # Initialize cv accuracies and times dictionary
    cross_validation_accuracies = {}
    cross_validation_times = {}

    for batch_index in tqdm.tqdm(range(len(parameters)), desc = "Batch sizes"):
        batch_size = parameters[batch_index]
        cross_validation_accuracies[parameters[batch_index]] = []
        cross_validation_times[parameters[batch_index]] = []

        for fold in tqdm.tqdm(range(len(X_train_scaled_dict[parameters[batch_index]])), desc = f"Fold for batch size {batch_size}", leave = False):

            X_train = X_train_scaled_dict[parameters[batch_index]][fold].type(torch.float32)
            y_train = y_train_dict[parameters[batch_index]][fold].type(torch.float32)
            X_val = X_val_scaled_dict[parameters[batch_index]][fold].type(torch.float32)
            y_val = y_val_dict[parameters[batch_index]][fold].type(torch.float32)

            cross_validation_accuracies_item, cross_validation_times_item = train_model(model, optimizer, num_epochs,
                                                                                        X_train, y_train, X_val, y_val)

            # Append cross_validation_accuracies[batch_size], cross_validation_times[batch_size]
            cross_validation_accuracies[parameters[batch_index]].append(cross_validation_accuracies_item)
            cross_validation_times[parameters[batch_index]].append(cross_validation_times_item)

    return cross_validation_accuracies, cross_validation_times

cross_validation_accuracies, cross_validation_times = find_optimal_hyperparameter(X_train_scaled_dict, X_val_scaled_dict, y_train_dict, y_val_dict, batch_sizes)

Batch sizes:   0%|          | 0/4 [00:00<?, ?it/s]
Fold for batch size 64:   0%|          | 0/5 [00:00<?, ?it/s][A
Fold for batch size 64:  20%|██        | 1/5 [00:21<01:27, 21.81s/it][A
Fold for batch size 64:  40%|████      | 2/5 [00:45<01:08, 22.92s/it][A
Fold for batch size 64:  60%|██████    | 3/5 [01:09<00:46, 23.40s/it][A
Fold for batch size 64:  80%|████████  | 4/5 [01:33<00:23, 23.48s/it][A
Fold for batch size 64: 100%|██████████| 5/5 [01:56<00:00, 23.61s/it][A
Batch sizes:  25%|██▌       | 1/4 [01:56<05:50, 116.94s/it]
Fold for batch size 128:   0%|          | 0/5 [00:00<?, ?it/s][A
Fold for batch size 128:  20%|██        | 1/5 [00:22<01:28, 22.24s/it][A
Fold for batch size 128:  40%|████      | 2/5 [00:45<01:08, 22.75s/it][A
Fold for batch size 128:  60%|██████    | 3/5 [01:11<00:48, 24.09s/it][A
Fold for batch size 128:  80%|████████  | 4/5 [01:33<00:23, 23.26s/it][A
Fold for batch size 128: 100%|██████████| 5/5 [01:56<00:00, 23.27s/it][A
Batch sizes:  50%|█████

In [21]:
cross_validation_accuracies

{64: [64.2514256091239,
  64.10782789009849,
  64.60808709175738,
  64.80819077242094,
  64.1601866251944],
 128: [64.53654743390358,
  64.02229134266459,
  63.87713841368585,
  64.71902540176256,
  64.63297045101089],
 256: [64.35406946604458,
  64.5137376879212,
  64.20580611715916,
  64.94297563504406,
  64.20787973043028],
 512: [64.07102125453603,
  64.20580611715916,
  64.23794712286158,
  64.93157076205287,
  64.25038880248833]}

In [22]:
cross_validation_times

{64: [0.19954967498779297,
  0.19513607025146484,
  0.32150983810424805,
  0.22108912467956543,
  0.21268010139465332],
 128: [0.31194543838500977,
  0.2023317813873291,
  0.3451814651489258,
  0.20494389533996582,
  0.20014214515686035],
 256: [0.3335123062133789,
  0.2069988250732422,
  0.22620248794555664,
  0.34028053283691406,
  0.2071692943572998],
 512: [0.2003481388092041,
  0.20895648002624512,
  0.19667744636535645,
  0.32222938537597656,
  0.21345138549804688]}

4. Plot scatterplot of mean cross validation accuracies for the different batch sizes.

In [None]:
# YOUR CODE HERE

5. Create a table of time taken to train the network on the last epoch against different batch sizes. Select the optimal batch size and state a reason for your selection.

In [None]:
df = pd.DataFrame({'Batch Size':
                   'Last Epoch Time':
                  })

df

In [None]:
# YOUR CODE HERE
optimal_batch_size =
reason =