# CS7643 Final Project

This notebook walks through the different models and runs experiments on our dataset, which contains 21782 training samples of different sounds.

## Group Members
Zach Halaby
Michael Marzec
Shayan Mukhtar

## Dataset

The dataset is Google's AudioSet, which is released under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. It was obtained from the following site: https://research.google.com/audioset/download.html

## Instructions
Cells whose title contains "Mandatory" must be run. Cells whose title contains "Optional" may be skipped.

## Mandatory - Imports

Let's start by importing the necessary packages:

In [7]:
import os.path

import torch
import torchmetrics
import tfrecord
import numpy as np
from os import walk

from torch.utils.data import DataLoader

from models import LinearModel
from models import SimpleConvolutionModel
from utils import utils
from utils import dataloader
from torch import nn
from torch import optim

# tqdm progress bar (tqdm_notebook is deprecated in favor of tqdm.notebook.tqdm)
from tqdm.notebook import tqdm

## Mandatory - Load Training Data

Load the training data into memory

In [8]:
# Figure out which device this notebook is being run from
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("You are using device: %s" % device)

# Load the training data from disk. Note that this training data was created
# by converting the tfrecord files from the original dataset
training_data = utils.load_pytorch_tensor('./utils/balanced_train_data.pt')
training_label = utils.load_pytorch_tensor('./utils/balanced_train_label.pt')

# Turn this multi-label classification problem into a binary one by
# selecting all samples whose labels contain a given class and marking them
# True, and all others False. Here, class index 0 selects the speech class
training_label, count = utils.convert_multiclass_to_binary(0, training_label)
print("Total of " + str(count) + " positive examples out of " + str(training_label.shape[0]) + " samples")

# convert the training data to floating point
training_data = np.float32(training_data)

# split the training data into two parts, one for training and the other for validation
data_train, label_train, data_val, label_val = utils.split_data_train_val(training_data, training_label)

# Load the dataset into an iterable object from which batches can be made
train_dataset = dataloader.MusicDataset(data_train, label_train)
val_dataset = dataloader.MusicDataset(data_val, label_val)

You are using device: cpu
Total of 5668 positive examples out of 21782 samples
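Only about 26% of the samples are positive, so the two classes are imbalanced. As a point of reference for the losses reported later in this notebook, the sketch below computes the cross-entropy a model would incur if it ignored its input and always predicted the class prior. It uses only the counts printed above; the helper name `prior_cross_entropy` is introduced here for illustration and is not part of `utils`. The prior is computed on the full set, so the value for the train/validation split may differ slightly.

```python
import math

def prior_cross_entropy(num_positive, num_total):
    """Cross-entropy (in nats) of always predicting the empirical class prior."""
    p = num_positive / num_total
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

# Counts taken from the cell output above
baseline = prior_cross_entropy(5668, 21782)
print("Positive fraction: %.4f" % (5668 / 21782))
print("Prior-only cross-entropy: %.4f" % baseline)
```

A trained model whose loss sits close to this value has probably learned little beyond the class frequencies.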


## Optional - Simple Linear Model

Let's get a training loop running with this simple linear model, which is nothing more than an input layer, a ReLU, and an output layer.
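`LinearModel` is defined in the `models` package and is not shown in this notebook. A minimal sketch of such a network, assuming each sample arrives as 10 frames of 128 features, might look like the following. The class name `TwoLayerNet` and the input layout are assumptions for illustration only.

```python
import torch
from torch import nn

class TwoLayerNet(nn.Module):
    """Hypothetical stand-in for LinearModel: Linear -> ReLU -> Linear."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                         # (batch, 10, 128) -> (batch, 1280)
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, num_classes),  # raw logits for CrossEntropyLoss
        )

    def forward(self, x):
        return self.net(x)

model = TwoLayerNet(10 * 128, 64, 2)
dummy_batch = torch.randn(32, 10, 128)  # assumed layout: 10 frames of 128 features
logits = model(dummy_batch)
print(logits.shape)
```

The model returns unnormalized logits because `nn.CrossEntropyLoss` applies the softmax internally.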

In [9]:
# Linear Model Hyperparameters

BATCH_SIZE = 32
LEARNING_RATE = 1e-3
HIDDEN_LAYER_SIZE = 64
NUM_EPOCHS = 10

In [10]:
# Linear Model boilerplate code

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)  # validation order does not need shuffling

linear_model = LinearModel.LinearModel(10*128, HIDDEN_LAYER_SIZE, 2)
optimizer = optim.Adam(linear_model.parameters(), lr=LEARNING_RATE)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
criterion = nn.CrossEntropyLoss()
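`ReduceLROnPlateau` is constructed here with its default arguments, which in current PyTorch releases are `factor=0.1` and `patience=10`. The sketch below uses a dummy parameter and a loss that never improves. It suggests that with these defaults the learning rate would not change within a 10-epoch run, while a smaller `patience` (an illustrative value, not one used in this notebook) does trigger a reduction.

```python
import torch
from torch import optim

def final_lr(num_epochs, **scheduler_kwargs):
    """Step a scheduler on a flat loss and return the final learning rate."""
    param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter, never trained
    optimizer = optim.Adam([param], lr=1e-3)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, **scheduler_kwargs)
    for _ in range(num_epochs):
        scheduler.step(0.5745)                  # a loss value that never improves
    return optimizer.param_groups[0]["lr"]

lr_default = final_lr(10)               # default patience
lr_patience2 = final_lr(10, patience=2) # illustrative, more aggressive setting
print(lr_default, lr_patience2)
```

If the scheduler is meant to act during short runs, passing an explicit `patience` is one option to consider.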

In [11]:
for epoch_idx in range(NUM_EPOCHS):
    print("-----------------------------------")
    print("Epoch %d" % (epoch_idx+1))
    print("-----------------------------------")
    
    train_loss, avg_train_loss = utils.train(linear_model, train_loader, optimizer, criterion)
    scheduler.step(train_loss)

    val_loss, avg_val_loss = utils.evaluate(linear_model, val_loader, criterion)

    avg_train_loss = avg_train_loss.item()
    avg_val_loss = avg_val_loss.item()
    print("Training Loss: %.4f. Validation Loss: %.4f. " % (avg_train_loss, avg_val_loss))

-----------------------------------
Epoch 1
-----------------------------------
Training Loss: 1.9009. Validation Loss: 0.5833.
-----------------------------------
Epoch 2
-----------------------------------
Training Loss: 0.5787. Validation Loss: 0.5707.
-----------------------------------
Epoch 3
-----------------------------------
Training Loss: 0.5746. Validation Loss: 0.5698.
-----------------------------------
Epoch 4
-----------------------------------
Training Loss: 0.5745. Validation Loss: 0.5698.
-----------------------------------
Epoch 5
-----------------------------------
Training Loss: 0.5745. Validation Loss: 0.5671.
-----------------------------------
Epoch 6
-----------------------------------
Training Loss: 0.5745. Validation Loss: 0.5697.
-----------------------------------
Epoch 7
-----------------------------------
Training Loss: 0.5745. Validation Loss: 0.5684.
-----------------------------------
Epoch 8
-----------------------------------
Training Loss: 0.5744. Validation Loss: 0.5672.
-----------------------------------
Epoch 9
-----------------------------------
Training Loss: 0.5746. Validation Loss: 0.5697.
-----------------------------------
Epoch 10
-----------------------------------
Training Loss: 0.5746. Validation Loss: 0.5697.
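
`utils.train` and `utils.evaluate` are defined outside this notebook. Judging from how their return values are used above (a total loss plus a per-batch average that supports `.item()`), one epoch might look roughly like the sketch below. The names `train_one_epoch` and `evaluate_one_epoch` and the synthetic data are assumptions for illustration, not the project's actual code.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, loader, optimizer, criterion):
    """Run one optimization pass; return (total_loss, avg_loss) as tensors."""
    model.train()
    total_loss = torch.tensor(0.0)
    for data, target in loader:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
        total_loss += loss.detach()
    return total_loss, total_loss / len(loader)

@torch.no_grad()
def evaluate_one_epoch(model, loader, criterion):
    """Compute the loss without updating weights; same return convention."""
    model.eval()
    total_loss = torch.tensor(0.0)
    for data, target in loader:
        total_loss += criterion(model(data), target)
    return total_loss, total_loss / len(loader)

# Tiny synthetic stand-in for the real dataset
torch.manual_seed(0)
features = torch.randn(64, 1280)
labels = torch.randint(0, 2, (64,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32)

model = nn.Linear(1280, 2)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

train_total, train_avg = train_one_epoch(model, loader, optimizer, criterion)
val_total, val_avg = evaluate_one_epoch(model, loader, criterion)
print("Training Loss: %.4f. Validation Loss: %.4f." % (train_avg.item(), val_avg.item()))
```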


## Optional - Simple Convolutional Model

Convolution is an established method for sound identification and has been used on this dataset before. The idea is to make the learnable kernel 1-D and slide it across the audio features. In this simple convolutional model, all 10 seconds of audio are flattened into one tensor, and a 1-D kernel is strided across it.
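
To make the idea concrete, here is a minimal sketch of a 1-D convolutional classifier over the flattened input. `SimpleConvolutionModel` itself is defined outside this notebook, so the class `TinyConv1d`, its layer sizes, and the assumed input shape of 10 frames by 128 features are illustrative guesses rather than the project's architecture.

```python
import torch
from torch import nn

class TinyConv1d(nn.Module):
    """Illustrative 1-D CNN: flatten -> Conv1d -> ReLU -> pool -> dropout -> Linear."""

    def __init__(self, kernel_size=3, dropout_rate=0.2, num_classes=2):
        super().__init__()
        # padding of kernel_size // 2 preserves the length for odd kernel sizes
        self.conv = nn.Conv1d(in_channels=1, out_channels=8,
                              kernel_size=kernel_size, padding=kernel_size // 2)
        self.pool = nn.MaxPool1d(4)
        self.dropout = nn.Dropout(dropout_rate)
        self.fc = nn.Linear(8 * (10 * 128 // 4), num_classes)

    def forward(self, x):
        x = x.flatten(start_dim=1).unsqueeze(1)   # (batch, 1, 1280): a single channel
        x = torch.relu(self.conv(x))              # (batch, 8, 1280): kernel slides along the signal
        x = self.pool(x)                          # (batch, 8, 320)
        x = self.dropout(x.flatten(start_dim=1))  # (batch, 2560)
        return self.fc(x)                         # (batch, num_classes) logits

model = TinyConv1d(kernel_size=3, dropout_rate=0.2)
logits = model(torch.randn(32, 10, 128))
print(logits.shape)
```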

In [12]:
# Simple Convolution Model Hyperparameters

BATCH_SIZE = 32
LEARNING_RATE = 1e-3
NUM_EPOCHS = 10

START_KERNEL_SIZE = 3
DROPOUT_RATE = 0.2

In [13]:
# Convolution Boilerplate code
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)  # validation order does not need shuffling

convolution_model = SimpleConvolutionModel.SimpleConvolutionModel(START_KERNEL_SIZE, DROPOUT_RATE)
optimizer = optim.Adam(convolution_model.parameters(), lr=LEARNING_RATE)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
criterion = nn.CrossEntropyLoss()

In [14]:
for epoch_idx in range(NUM_EPOCHS):
    print("-----------------------------------")
    print("Epoch %d" % (epoch_idx+1))
    print("-----------------------------------")
    
    train_loss, avg_train_loss = utils.train(convolution_model, train_loader, optimizer, criterion)
    scheduler.step(train_loss)

    val_loss, avg_val_loss = utils.evaluate(convolution_model, val_loader, criterion)

    avg_train_loss = avg_train_loss.item()
    avg_val_loss = avg_val_loss.item()
    print("Training Loss: %.4f. Validation Loss: %.4f. " % (avg_train_loss, avg_val_loss))

-----------------------------------
Epoch 1
-----------------------------------
Training Loss: 100.1410. Validation Loss: 71.8802.
-----------------------------------
Epoch 2
-----------------------------------
Training Loss: 32.9818. Validation Loss: 16.7475.
-----------------------------------
Epoch 3
-----------------------------------
Training Loss: 7.5520. Validation Loss: 2.7286.
-----------------------------------
Epoch 4
-----------------------------------
Training Loss: 1.2569. Validation Loss: 0.7304.
-----------------------------------
Epoch 5
-----------------------------------
Training Loss: 0.5849. Validation Loss: 0.6116.
-----------------------------------
Epoch 6
-----------------------------------
Training Loss: 0.4816. Validation Loss: 0.4813.
-----------------------------------
Epoch 7
-----------------------------------
Training Loss: 0.4389. Validation Loss: 0.5003.
-----------------------------------
Epoch 8
-----------------------------------
Training Loss: 0.4255. Validation Loss: 0.4764.
-----------------------------------
Epoch 9
-----------------------------------
Training Loss: 0.4125. Validation Loss: 0.4716.
-----------------------------------
Epoch 10
-----------------------------------
Training Loss: 0.4371. Validation Loss: 0.5192.


## Optional - Model Evaluation

Pick one of the models trained above, push the evaluation data through it, and compute its metrics.

In [16]:
# First, load the evaluation data
eval_data = utils.load_pytorch_tensor('./utils/eval_data.pt')
eval_label = utils.load_pytorch_tensor('./utils/eval_label.pt')

# make this multi-classification problem a binary classification problem
eval_label, count = utils.convert_multiclass_to_binary(0, eval_label)
print("Total of " + str(count) + " positive examples out of " + str(eval_label.shape[0]) + " samples")

eval_data = np.float32(eval_data)

# Next, pick the model you want to evaluate
# options: linear_model, convolution_model
model = linear_model

# push the eval data through the model
eval_dataset = dataloader.MusicDataset(eval_data, eval_label)
eval_loader = DataLoader(eval_dataset, batch_size=BATCH_SIZE, shuffle=False)  # evaluation order does not need shuffling

avg_acc = utils.evaluate_with_metrics(model, eval_loader)

print("Model achieved an average accuracy of %.4f on evaluation data" % avg_acc.item())


Total of 5233 positive examples out of 19976 samples


Model achieved an average accuracy of 0.7712 on evaluation data
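
Because roughly three quarters of the evaluation samples are negative, accuracy is easiest to interpret next to a trivial baseline. The sketch below computes the accuracy of always predicting the more frequent class, using the counts printed above. `majority_baseline_accuracy` is a helper name introduced here for illustration.

```python
def majority_baseline_accuracy(num_positive, num_total):
    """Accuracy of a classifier that always predicts the more frequent class."""
    positive_fraction = num_positive / num_total
    return max(positive_fraction, 1 - positive_fraction)

# Counts and accuracy taken from the evaluation output above
baseline_acc = majority_baseline_accuracy(5233, 19976)
model_acc = 0.7712
print("Majority-class baseline accuracy: %.4f" % baseline_acc)
print("Model improvement over baseline: %.4f" % (model_acc - baseline_acc))
```

On imbalanced data like this, precision, recall, or F1 for the positive class would likely give a fuller picture than accuracy alone.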
