# CS7643 Final Project

This notebook walks through several models and runs experiments on our dataset, which contains 21782 training samples of different sounds.

## Group Members
Zach Halaby
Michael Marzec
Shayan Mukhtar

## Dataset

The dataset was obtained from Google's AudioSet site, which distributes it under a Creative Commons Attribution 4.0 (CC BY 4.0) license: https://research.google.com/audioset/download.html

## Colab Prep

GitHub <-> Colab instructions: https://medium.com/analytics-vidhya/how-to-use-google-colab-with-github-via-google-drive-68efb23a42d

In [None]:
### If using Google Colab
from google.colab import drive
drive.mount('/content/drive')

In [None]:
%load_ext autoreload
%autoreload 2

### Initial Cloning

In [None]:
### cd to github drive (create the folder, if it doesn't already exist)
%cd /content/drive/Othercomputers/My\ MacBook\ Pro\ \(1\)

# git clone -b branch https://{git_token}@github.com/{username}/{repository}
# See instructions file if you're unfamiliar with generating git_tokens

username = 'shayanmukhtar'
repository = 'cs7643_final'
# git_token = 

# !git clone -b UPDATE_BRANCH https://{git_token}@github.com/{username}/{repository}



### Git Commands

#### access git (via Google Drive)

In [None]:
repository = 'final'
%cd {repository}

%ls -a

#### Commit / Status / Push

In [None]:
# !git status


In [None]:
# !git add .

In [None]:
# !git config user.email "michaelmarzec11@gmail.com"
# !git commit -m "notebook rename"

In [None]:
# !git push

## Instructions
Cells with "Mandatory" in their title must be run. Cells with "Optional" in their title can be skipped.

## Mandatory - Imports

Let's start by importing the necessary packages:

In [None]:
!pip install torchmetrics
!pip install tfrecord

import os.path

import torch
import torchmetrics
import tfrecord
import numpy as np
from os import walk

from torchmetrics import ConfusionMatrix
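# Note: `F1` was renamed `F1Score` in later torchmetrics releases,
# so this import assumes an older torchmetrics version
from torchmetrics import F1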

from torch.utils.data import DataLoader

from utils import utils
from utils import dataloader
from torch import nn
from torch import optim

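# Tqdm progress bar (the top-level tqdm_notebook is deprecated, so alias
# the notebook-aware tqdm under the same name)
from tqdm.notebook import tqdm as tqdm_notebook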

## Mandatory - Load Training Data

Load the training data into memory.

In [None]:
# Figure out which device this notebook is being run from
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("You are using device: %s" % device)

# Load the training data from disk. Note this training data was created
# by converting the tfrecord files from the original dataset
training_data = utils.load_pytorch_tensor('./utils/balanced_train_data.pt')
training_label = utils.load_pytorch_tensor('./utils/balanced_train_label.pt')

# training_data = utils.load_pytorch_tensor('./utils/eval_data.pt')
# training_label = utils.load_pytorch_tensor('./utils/eval_label.pt')

# make this multi-classification problem a binary classification problem by
# selecting all labels which contain a given class and making their label
# True, and all others False. In this case, 0 means select the speech class
training_label, count = utils.convert_multiclass_to_binary(0, training_label)
print("Total of " + str(count) + " positive examples out of " + str(training_label.shape[0]) + " samples")

# convert the training data to floating point
training_data = np.float32(training_data)

# split the training data into two parts, one for training and the other for validation
data_train, label_train, data_val, label_val = \
    utils.split_data_train_val(training_data, training_label, percent_val=0.20)

# put on the right device
data_train = torch.from_numpy(data_train).to(device)
label_train = torch.from_numpy(label_train).to(device)
data_val = torch.from_numpy(data_val).to(device)
label_val = torch.from_numpy(label_val).to(device)

# Load the dataset into an iterable object from which batches can be made
train_dataset = dataloader.MusicDataset(data_train, label_train)
val_dataset = dataloader.MusicDataset(data_val, label_val)
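
The label conversion and train/validation split above rely on helpers in `utils` whose implementation is not shown in this notebook. As a rough sketch of the idea only, the NumPy code below assumes each sample carries a list of class indices, produces two-column labels (column 1 meaning the target class is present, an assumed layout), and then holds out a fraction of the shuffled samples. The names `to_binary_labels` and `split_train_val` are hypothetical and are not the project's actual functions.

```python
import numpy as np

def to_binary_labels(label_sets, target_class):
    """Hypothetical helper: two-column [negative, positive] labels per sample."""
    labels = np.zeros((len(label_sets), 2), dtype=np.float32)
    for i, classes in enumerate(label_sets):
        # Column 1 marks samples tagged with the target class (assumed layout).
        labels[i, 1 if target_class in classes else 0] = 1.0
    positives = int(labels[:, 1].sum())
    return labels, positives

def split_train_val(data, labels, percent_val=0.20, seed=0):
    """Hypothetical helper: shuffle the indices, then hold out a fraction."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    n_val = int(round(len(data) * percent_val))
    train_idx, val_idx = order[n_val:], order[:n_val]
    return data[train_idx], labels[train_idx], data[val_idx], labels[val_idx]

# Toy example: 5 clips, each assumed to be 10 frames x 128 features.
label_sets = [[0, 137], [5], [0], [72, 3], [0, 1]]
data = np.zeros((5, 10, 128), dtype=np.float32)
labels, count = to_binary_labels(label_sets, target_class=0)
data_train, label_train, data_val, label_val = split_train_val(data, labels)
```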

## Optional - Simple Linear Model

Let's get a training loop running with this simple linear model, which consists of just an input layer, a ReLU, and an output layer.
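
For orientation, a model of this shape could look like the sketch below. It is not the project's `LinearModel` (that lives in the `models` package and is not shown here); the class name `TinyLinearModel` and the flattening step are assumptions, chosen to match the `(10*128, HIDDEN_LAYER_SIZE, 2)` constructor arguments used in this section.

```python
import torch
from torch import nn

class TinyLinearModel(nn.Module):
    """Hypothetical stand-in: Linear -> ReLU -> Linear."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                       # (N, 10, 128) -> (N, 1280)
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, num_classes),
        )

    def forward(self, x):
        return self.net(x)  # raw logits, one per class

model = TinyLinearModel(10 * 128, 64, 2)
logits = model(torch.zeros(4, 10, 128))  # batch of 4 dummy clips
```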

In [None]:
# Linear Model Hyperparameters

BATCH_SIZE = 32
LEARNING_RATE = 1e-3
HIDDEN_LAYER_SIZE = 64
NUM_EPOCHS = 50

In [None]:
# Linear Model boilerplate code
from models import LinearModel

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=True)

linear_model = LinearModel.LinearModel(10*128, HIDDEN_LAYER_SIZE, 2).to(device)
optimizer = optim.Adam(linear_model.parameters(), lr=LEARNING_RATE)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
criterion = nn.CrossEntropyLoss()

In [None]:
for epoch_idx in range(NUM_EPOCHS):
    print("-----------------------------------")
    print("Epoch %d" % (epoch_idx+1))
    print("-----------------------------------")
    
    train_loss, avg_train_loss = utils.train(linear_model, train_loader, optimizer, criterion)
    scheduler.step(train_loss)

    val_loss, avg_val_loss = utils.evaluate(linear_model, val_loader, criterion)

    avg_train_loss = avg_train_loss.item()
    avg_val_loss = avg_val_loss.item()
    print("Training Loss: %.4f. Validation Loss: %.4f. " % (avg_train_loss, avg_val_loss))
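
`ReduceLROnPlateau` only lowers the learning rate after the monitored value has failed to improve for more than `patience` epochs. With the PyTorch defaults (`patience=10`, `factor=0.1`), the sketch below feeds the scheduler a constant loss and records the learning rate after each call. The dummy parameter and the flat loss are illustrative assumptions, not values taken from the training runs in this notebook.

```python
import torch
from torch import optim

# A single dummy parameter is enough to build an optimizer.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.Adam([param], lr=1e-3)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer)  # default settings

lrs = []
for epoch in range(12):
    scheduler.step(1.0)  # the loss never improves after the first call
    lrs.append(optimizer.param_groups[0]["lr"])

# 1-based index of the first call after which the learning rate is lower
first_drop = next(i + 1 for i, lr in enumerate(lrs) if lr < 1e-3)
```

Under these defaults, a run shorter than a dozen epochs, such as the 10-epoch convolution run below, would not see a reduction; passing a smaller `patience` is one way to change that. Stepping the scheduler on the validation loss instead of the training loss is also a common choice.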

## Optional - Simple Convolutional Model

Convolution is an established method for sound identification and has been used on this dataset before. The idea is to use a learnable 1-D kernel and stride it across the audio features. In this simple convolutional model, all 10 seconds of sound are flattened into one tensor, and a 1-D kernel is strided across it.
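
To make the striding concrete, the NumPy sketch below flattens one clip of assumed shape `(10, 128)` into a 1280-element vector and slides a single length-3 kernel across it. The kernel values and the stride of 1 are illustrative assumptions; the project's `SimpleConvolutionModel` learns its kernels and may use different settings.

```python
import numpy as np

def conv1d_valid(signal, kernel, stride=1):
    """Cross-correlate a 1-D signal with a kernel, without padding."""
    k = len(kernel)
    n_out = (len(signal) - k) // stride + 1
    return np.array([
        np.dot(signal[i * stride : i * stride + k], kernel)
        for i in range(n_out)
    ])

clip = np.arange(10 * 128, dtype=np.float32).reshape(10, 128)  # dummy clip
flat = clip.reshape(-1)               # all 10 seconds in one vector of 1280 values
kernel = np.array([1.0, 0.0, -1.0])   # fixed example kernel (learned in practice)
features = conv1d_valid(flat, kernel)
```

Each output element depends on only three neighbouring inputs, which is why the same small kernel can be reused along the whole clip.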

In [None]:
# Simple Convolution Model Hyperparameters

BATCH_SIZE = 16
LEARNING_RATE = 0.001
NUM_EPOCHS = 10

START_KERNEL_SIZE = 3
DROPOUT_RATE = 0.2


# Convolution Boilerplate code
from models import SimpleConvolutionModel
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=True)

convolution_model = SimpleConvolutionModel.SimpleConvolutionModel(START_KERNEL_SIZE, DROPOUT_RATE).to(device)
optimizer = optim.Adam(convolution_model.parameters(), lr=LEARNING_RATE)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
criterion = nn.CrossEntropyLoss()


for epoch_idx in range(NUM_EPOCHS):
    print("-----------------------------------")
    print("Epoch %d" % (epoch_idx+1))
    print("-----------------------------------")
    
    train_loss, avg_train_loss = utils.train(convolution_model, train_loader, optimizer, criterion)
    scheduler.step(train_loss)

    val_loss, avg_val_loss = utils.evaluate(convolution_model, val_loader, criterion)

    avg_train_loss = avg_train_loss.item()
    avg_val_loss = avg_val_loss.item()
    print("Training Loss: %.4f. Validation Loss: %.4f. " % (avg_train_loss, avg_val_loss))

In [None]:
# Convolution 2D Model Hyperparameters

BATCH_SIZE = 64
LEARNING_RATE = 0.001
NUM_EPOCHS = 50

START_KERNEL_SIZE = 2
DROPOUT_RATE = 0.2


# Convolution2D Boilerplate code
from models import ConvolutionModel2D
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=True)

convolution_model_2d = ConvolutionModel2D.ConvolutionModel(START_KERNEL_SIZE, DROPOUT_RATE).to(device)
optimizer = optim.Adam(convolution_model_2d.parameters(), lr=LEARNING_RATE)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)

# per-class weights for positive targets (floats, to match the dtype of the logits)
weights = torch.tensor([1.0, 3.0]).to(device)
criterion = nn.BCEWithLogitsLoss(pos_weight=weights)
# criterion = nn.CrossEntropyLoss()


for epoch_idx in range(NUM_EPOCHS):
    print("-----------------------------------")
    print("Epoch %d" % (epoch_idx+1))
    print("-----------------------------------")
    
    train_loss, avg_train_loss = utils.train(convolution_model_2d, train_loader, optimizer, criterion)
    scheduler.step(train_loss)

    val_loss, avg_val_loss = utils.evaluate(convolution_model_2d, val_loader, criterion)

    avg_train_loss = avg_train_loss.item()

    avg_val_loss = avg_val_loss.item()
    print("Training Loss: %.4f. Validation Loss: %.4f. " % (avg_train_loss, avg_val_loss))
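
The `pos_weight` argument of `nn.BCEWithLogitsLoss` multiplies only the positive-target term of the loss, so a weight of 3 makes a missed positive cost three times as much as it otherwise would. The pure-Python sketch below writes out that per-element formula for a single logit; the function name `weighted_bce` is hypothetical and the numbers are illustrative.

```python
import math

def weighted_bce(logit, target, pos_weight=1.0):
    """Per-element loss: -[w * y * log(s) + (1 - y) * log(1 - s)], s = sigmoid(logit)."""
    s = 1.0 / (1.0 + math.exp(-logit))
    return -(pos_weight * target * math.log(s) + (1.0 - target) * math.log(1.0 - s))

# A positive example that the model scores at logit 0 (probability 0.5).
plain = weighted_bce(0.0, 1.0)                     # unweighted loss
weighted = weighted_bce(0.0, 1.0, pos_weight=3.0)  # three times larger
negative = weighted_bce(0.0, 0.0, pos_weight=3.0)  # negative targets are unaffected
```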

In [None]:
# GT Transformer Model Hyperparameters
BATCH_SIZE = 32
LEARNING_RATE = 1e-5
NUM_EPOCHS = 50

# GTT Boilerplate code
from models import GtTransformer
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=True)

input_size = 10*128
gtt = GtTransformer.GtTransformer(input_size, 2, device).to(device)
optimizer = optim.Adam(gtt.parameters(), lr=LEARNING_RATE)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
# per-class weights for positive targets (floats, to match the dtype of the logits)
weights = torch.tensor([1.0, 3.0]).to(device)
criterion = nn.BCEWithLogitsLoss(pos_weight=weights)
# criterion = nn.CrossEntropyLoss()


for epoch_idx in range(NUM_EPOCHS):
    print("-----------------------------------")
    print("Epoch %d" % (epoch_idx+1))
    print("-----------------------------------")
    
    train_loss, avg_train_loss = utils.train(gtt, train_loader, optimizer, criterion)
    scheduler.step(train_loss)

    val_loss, avg_val_loss = utils.evaluate(gtt, val_loader, criterion)

    avg_train_loss = avg_train_loss.item()
    avg_val_loss = avg_val_loss.item()
    print("Training Loss: %.4f. Validation Loss: %.4f. " % (avg_train_loss, avg_val_loss))

## Optional - Model Evaluation

Pick a model that was trained above and push the evaluation data through it, and compute the model metrics
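
As a reminder of what the metrics report, the pure-Python sketch below builds a 2x2 confusion matrix from predicted and true class indices and derives accuracy and the F1 score of the positive class from it. The function name and the toy labels are hypothetical; the notebook itself uses `torchmetrics` for these quantities, and its F1 averaging settings may differ.

```python
def binary_metrics(preds, targets):
    """Return (confusion matrix, accuracy, F1 of class 1) for 0/1 label lists."""
    # conf[t][p] counts samples with true class t that were predicted as class p.
    conf = [[0, 0], [0, 0]]
    for p, t in zip(preds, targets):
        conf[t][p] += 1
    tn, fp = conf[0]
    fn, tp = conf[1]
    accuracy = (tp + tn) / len(targets)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return conf, accuracy, f1

preds   = [1, 0, 1, 1, 0, 0]
targets = [1, 0, 0, 1, 1, 0]
conf, accuracy, f1 = binary_metrics(preds, targets)
```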

In [None]:
# First, load the evaluation data
eval_data = utils.load_pytorch_tensor('./utils/eval_data.pt')
eval_label = utils.load_pytorch_tensor('./utils/eval_label.pt')

# make this multi-classification problem a binary classification problem
eval_label, count = utils.convert_multiclass_to_binary(0, eval_label)
print("Total of " + str(count) + " positive examples out of " + str(eval_label.shape[0]) + " samples")

eval_data = np.float32(eval_data)

# device = 'cpu'  # note: the GTT model needs to execute on GPU, for reasons not yet understood
# put the evaluation data on the selected device
eval_data = torch.from_numpy(eval_data).to(device)
eval_label = torch.from_numpy(eval_label).to(device)

# Next, pick the model you want to evaluate
# options: linear_model, convolution_model, gtt, convolution_model_2d
model = gtt 


# place the model on the same device as the evaluation data
model = model.to(device)

print(eval_data.is_cuda)
print(eval_label.is_cuda)


# push the eval data through the model
eval_dataset = dataloader.MusicDataset(eval_data, eval_label)
eval_loader = DataLoader(eval_dataset, batch_size=1024, shuffle=True)


# declare the metric keepers
conf_mat = ConfusionMatrix(num_classes=2).to(device)
f1_score = F1(num_classes=2).to(device)
total_accuracy = 0.0

total_correct_perc = 0.0
total_correct = 0
total_wrong_perc = 0.0
total_wrong = 0

with torch.no_grad():
    # Get the progress bar
    progress_bar = tqdm_notebook(eval_loader, ascii=True)
    for batch_idx, data in enumerate(progress_bar):
        input_data = data[0].to(device)
        correct_labels = data[1].to(device)

        # class probabilities for each sample
        prediction = model(input_data)
        prediction = torch.softmax(prediction, dim=1)

        # top probability and the index of the predicted class
        preds, indcs = torch.max(prediction, dim=1)

        # collapse the per-class label columns to a class index
        correct_labels = torch.argmax(correct_labels, dim=1)

        # accumulate the predicted-class probability for correct answers
        total_correct_perc += preds[indcs == correct_labels].sum()
        total_correct += len(preds[indcs == correct_labels])

        # and for wrong answers
        total_wrong_perc += preds[indcs != correct_labels].sum()
        total_wrong += len(preds[indcs != correct_labels])

        conf_mat.update(indcs, correct_labels)
        f1_score.update(indcs, correct_labels)
        accuracy = torchmetrics.functional.accuracy(indcs, correct_labels)
        total_accuracy += accuracy
        progress_bar.set_description_str(
            "Batch: %d" % (batch_idx + 1))

# avg_acc, conf_mat, f1_score = utils.evaluate_with_metrics(model, eval_loader)
avg_accuracy = total_accuracy / len(eval_loader)
avg_correct_perc = total_correct_perc / total_correct
avg_wrong_perc = total_wrong_perc / total_wrong
final_conf_mat = conf_mat.compute()
final_f1_score = f1_score.compute()

print("Model achieved an average accuracy of %.4f on evaluation data" % avg_accuracy.item())
print("Confusion Matrix:")
print(str(final_conf_mat))
print("F1 Score: %.4f" % final_f1_score.item())
print("Average score when answering correctly:   {}".format(avg_correct_perc))
print("Average score when answering incorrectly: {}".format(avg_wrong_perc))
print(model)
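
The last two printed scores are the mean softmax probability of the predicted class, split by whether the prediction was right. A minimal pure-Python sketch of that bookkeeping, using made-up logits and the hypothetical helper name `mean_confidence`, could look like this:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_confidence(batch_logits, targets):
    """Average top-class probability for correct and for incorrect predictions."""
    right, wrong = [], []
    for logits, target in zip(batch_logits, targets):
        probs = softmax(logits)
        pred = probs.index(max(probs))
        (right if pred == target else wrong).append(probs[pred])
    # Return NaN instead of dividing by zero when a group is empty.
    mean = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return mean(right), mean(wrong)

batch_logits = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.0]]  # made-up model outputs
targets = [0, 1, 1]
avg_right, avg_wrong = mean_confidence(batch_logits, targets)
```

A well-calibrated model would typically show a lower average score on its wrong answers than on its right ones.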