# Lab 2: Cats vs Dogs

In this lab, you will train a convolutional neural network to classify an image
into one of two classes: "cat" or "dog". The code for the neural networks
you train will be written for you, and you are not (yet!) expected
to understand all provided code. However, by the end of the lab,
you should be able to:

1. Understand at a high level the training loop for a machine learning model.
2. Understand the distinction between training, validation, and test data.
3. Understand the concepts of overfitting and underfitting.
4. Investigate how different hyperparameters, such as learning rate and batch size, affect the success of training.
5. Compare an ANN (aka Multi-Layer Perceptron) with a CNN.

In [None]:
import numpy as np
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torch.utils.data.sampler import SubsetRandomSampler
import torchvision.transforms as transforms

## Part 0. Helper Functions

We will be making use of the following helper functions. You will be asked to look
at and possibly modify some of these, but you are not expected to understand all of them.

You should look at the function names and read the docstrings. If you are curious, come back and explore the code *after* making some progress on the lab.

In [None]:
###############################################################################
# Data Loading

def get_relevant_indices(dataset, classes, target_classes):
    """ Return the indices for datapoints in the dataset that belong to the
    desired target classes, a subset of all possible classes.

    Args:
        dataset: Dataset object
        classes: A list of strings denoting the name of each class
        target_classes: A list of strings denoting the name of desired classes
                        Should be a subset of the 'classes'
    Returns:
        indices: list of indices that have labels corresponding to one of the
                 target classes
    """
    indices = []
    for i in range(len(dataset)):
        # Check if the label is in the target classes
        label_index = dataset[i][1] # ex: 3
        label_class = classes[label_index] # ex: 'cat'
        if label_class in target_classes:
            indices.append(i)
    return indices

def get_data_loader(target_classes, batch_size):
    """ Loads images of cats and dogs, splits the data into training, validation
    and testing datasets. Returns data loaders for the three preprocessed datasets.

    Args:
        target_classes: A list of strings denoting the name of the desired
                        classes. Should be a subset of the argument 'classes'
        batch_size: An int representing the number of samples per batch

    Returns:
        train_loader: iterable training dataset organized according to batch size
        val_loader: iterable validation dataset organized according to batch size
        test_loader: iterable testing dataset organized according to batch size
        classes: A list of strings denoting the name of each class
    """

    classes = ('plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    ########################################################################
    # The output of torchvision datasets are PILImage images of range [0, 1].
    # We transform them to Tensors of normalized range [-1, 1].
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    # Load CIFAR10 training data
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)
    # Get the list of indices to sample from
    relevant_indices = get_relevant_indices(trainset, classes, target_classes)

    # Split into train and validation
    np.random.seed(1000) # Fixed numpy random seed for reproducible shuffling
    np.random.shuffle(relevant_indices)
    split = int(len(relevant_indices) * 0.8) #split at 80%

    # split into training and validation indices
    relevant_train_indices, relevant_val_indices = relevant_indices[:split], relevant_indices[split:]
    train_sampler = SubsetRandomSampler(relevant_train_indices)
    train_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                               num_workers=1, sampler=train_sampler)
    val_sampler = SubsetRandomSampler(relevant_val_indices)
    val_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                              num_workers=1, sampler=val_sampler)
    # Load CIFAR10 testing data
    testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=True, transform=transform)
    # Get the list of indices to sample from
    relevant_test_indices = get_relevant_indices(testset, classes, target_classes)
    test_sampler = SubsetRandomSampler(relevant_test_indices)
    test_loader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                             num_workers=1, sampler=test_sampler)
    return train_loader, val_loader, test_loader, classes

###############################################################################
# Training
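def get_model_name(name, batch_size, learning_rate, epoch):
    """ Generate a name for the model consisting of all the hyperparameter values

    Args:
        name: Name of the network (e.g. "small" or "large")
        batch_size: Number of samples per batch used in training
        learning_rate: Learning rate used in training
        epoch: Epoch index of the checkpoint (0-based in train_net)
    Returns:
        path: A string with the hyperparameter name and value concatenated
    """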
    path = "model_{0}_bs{1}_lr{2}_epoch{3}".format(name,
                                                   batch_size,
                                                   learning_rate,
                                                   epoch)
    return path

def normalize_label(labels):
    """
    Given a tensor containing 2 possible values, normalize this to 0/1

    Args:
        labels: a 1D tensor containing two possible scalar values
    Returns:
        A tensor normalized to 0/1 values
    """
    max_val = torch.max(labels)
    min_val = torch.min(labels)
    norm_labels = (labels - min_val)/(max_val - min_val)
    return norm_labels

def evaluate(net, loader, criterion):
    """ Evaluate the network on the validation set.

     Args:
         net: PyTorch neural network object
         loader: PyTorch data loader for the validation set
         criterion: The loss function
     Returns:
         err: A scalar for the avg classification error over the validation set
         loss: A scalar for the average loss function over the validation set
     """
    total_loss = 0.0
    total_err = 0.0
    total_epoch = 0
    for i, data in enumerate(loader, 0):
        inputs, labels = data
        labels = normalize_label(labels)  # Convert labels to 0/1
        outputs = net(inputs)
        loss = criterion(outputs, labels.float())
        corr = (outputs > 0.0).squeeze().long() != labels
        total_err += int(corr.sum())
        total_loss += loss.item()
        total_epoch += len(labels)
    err = float(total_err) / total_epoch
    loss = float(total_loss) / (i + 1)
    return err, loss

###############################################################################
# Training Curve
def plot_training_curve(path):
    """ Plots the training curve for a model run, given the csv files
    containing the train/validation error/loss.

    Args:
        path: The base path of the csv files produced during training
    """
    import matplotlib.pyplot as plt
    train_err = np.loadtxt("{}_train_err.csv".format(path))
    val_err = np.loadtxt("{}_val_err.csv".format(path))
    train_loss = np.loadtxt("{}_train_loss.csv".format(path))
    val_loss = np.loadtxt("{}_val_loss.csv".format(path))
    plt.title("Train vs Validation Error")
    n = len(train_err) # number of epochs
    plt.plot(range(1,n+1), train_err, label="Train")
    plt.plot(range(1,n+1), val_err, label="Validation")
    plt.xlabel("Epoch")
    plt.ylabel("Error")
    plt.legend(loc='best')
    plt.show()
    plt.title("Train vs Validation Loss")
    plt.plot(range(1,n+1), train_loss, label="Train")
    plt.plot(range(1,n+1), val_loss, label="Validation")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend(loc='best')
    plt.show()
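
As a sanity check on `normalize_label`, here is a minimal pure-Python sketch of the same min-max arithmetic. The list-based `normalize_label_list` is a hypothetical stand-in for illustration, not the tensor version above. With the CIFAR-10 indices 3 (cat) and 5 (dog), a batch holding both classes maps to 0/1. A batch holding only one class has max equal to min, so the scaling is undefined, which is worth keeping in mind for very small batches.

```python
def normalize_label_list(labels):
    """Min-max scale a list of labels holding two distinct values to 0.0/1.0."""
    lo, hi = min(labels), max(labels)
    if hi == lo:
        # Assumption for this sketch: fail loudly instead of dividing by zero.
        raise ValueError("batch contains a single class; min-max scaling is undefined")
    return [(x - lo) / (hi - lo) for x in labels]

mixed_batch = [3, 5, 5, 3]  # cat, dog, dog, cat
print(normalize_label_list(mixed_batch))
```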

## Part 1. Visualizing the Data [7 pt]

We will make use of some of the CIFAR-10 data set, which consists of
colour images of size 32x32 pixels belonging to 10 categories. You can
find out more about the dataset at https://www.cs.toronto.edu/~kriz/cifar.html

For this assignment, we will only be using the cat and dog categories.
We have included code that automatically downloads the dataset the
first time that the main script is run.

In [None]:
# This will download the CIFAR-10 dataset to a folder called "data"
# the first time you run this code.
train_loader, val_loader, test_loader, classes = get_data_loader(
    target_classes=["cat", "dog"],
    batch_size=1) # One image per batch

### Part (a) -- 1 pt

Visualize some of the data by running the code below.
Include the visualization in your writeup.

(You don't need to submit anything else.)

In [None]:
import matplotlib.pyplot as plt

k = 0
for images, labels in train_loader:
    # since batch_size = 1, there is only 1 image in `images`
    image = images[0]
    # place the colour channel at the end, instead of at the beginning
    img = np.transpose(image, [1,2,0])
    # normalize pixel intensity values to [0, 1]
    img = img / 2 + 0.5
    plt.subplot(3, 5, k+1)
    plt.axis('off')
    plt.imshow(img)

    k += 1
    if k > 14:
        break
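
The `img / 2 + 0.5` step undoes the `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` transform from `get_data_loader`, which maps each channel value x in [0, 1] to (x - 0.5) / 0.5 in [-1, 1]. A small sketch of that round trip on plain floats (the helper names are illustrative, not part of the lab code):

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel normalization: [0, 1] -> [-1, 1] for mean = std = 0.5."""
    return (x - mean) / std

def unnormalize(y, mean=0.5, std=0.5):
    """Inverse mapping; for mean = std = 0.5 this is y / 2 + 0.5."""
    return y * std + mean

for pixel in (0.0, 0.25, 1.0):
    y = normalize(pixel)
    print(pixel, "->", y, "->", unnormalize(y))
```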

### Part (b) -- 3 pt

How many training examples do we have for the combined `cat` and `dog` classes?
What about validation examples?
What about test examples?

In [None]:
# len(loader) counts batches; with batch_size=1 this equals the number of examples.
print("There are",len(train_loader),"training examples for the combined cat and dog classes.")
print("There are",len(val_loader),"validation examples for the combined cat and dog classes.")
print("There are",len(test_loader),"test examples for the combined cat and dog classes.")

There are 8000 training examples for the combined cat and dog classes.
There are 2000 validation examples for the combined cat and dog classes.
There are 2000 test examples for the combined cat and dog classes.
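
These counts can be cross-checked by hand. CIFAR-10 has 5000 training and 1000 test images per class, and `get_data_loader` keeps 80% of the cat and dog training images for training and the rest for validation. The sketch below repeats that arithmetic; the constant names are made up for illustration. Note that `len(loader)` counts batches, so it only equals the number of examples because `batch_size=1` here.

```python
TRAIN_PER_CLASS = 5000   # CIFAR-10 training images per class
TEST_PER_CLASS = 1000    # CIFAR-10 test images per class
NUM_TARGET_CLASSES = 2   # cat and dog

relevant_train = TRAIN_PER_CLASS * NUM_TARGET_CLASSES
split = int(relevant_train * 0.8)   # same 80% split as in get_data_loader
n_train = split
n_val = relevant_train - split
n_test = TEST_PER_CLASS * NUM_TARGET_CLASSES

print(n_train, n_val, n_test)
```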


### Part (c) -- 3pt

Why do we need a validation set when training our model? What happens if we judge the
performance of our models using the training set loss/error instead of the validation
set loss/error?



> We need a validation set so that we can estimate how well the model
generalizes to data it was not trained on, and so that we can tune hyperparameters without touching the test data. After each round of training, the validation set loss/error is used to decide which hyperparameters to adjust to best improve the model.
Judging the performance of our model using the training set loss/error
instead of the validation set loss/error is problematic because the training
set loss/error can be very low even when the model is not ideal: a model can
overfit to (memorize) its training examples and still perform poorly on new
data. Training metrics cannot reveal this, since the model has already seen
that data. The validation set contains examples the model was not trained on,
so a low validation loss/error is better evidence that the model performs well.

## Part 2. Training [15 pt]

We define two neural networks, a `LargeNet` and `SmallNet`.
We'll be training the networks in this section.

You won't understand fully what these networks are doing until
the next few classes, and that's okay. For this assignment, please
focus on learning how to train networks, and how hyperparameters affect
training.

In [None]:
class LargeNet(nn.Module):
    def __init__(self):
        super(LargeNet, self).__init__()
        self.name = "large"
        self.conv1 = nn.Conv2d(3, 5, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(5, 10, 5)
        self.fc1 = nn.Linear(10 * 5 * 5, 32)
        self.fc2 = nn.Linear(32, 1)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 10 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = x.squeeze(1) # Flatten to [batch_size]
        return x

In [None]:
class SmallNet(nn.Module):
    def __init__(self):
        super(SmallNet, self).__init__()
        self.name = "small"
        self.conv = nn.Conv2d(3, 5, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(5 * 7 * 7, 1)

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        x = self.pool(x)
        x = x.view(-1, 5 * 7 * 7)
        x = self.fc(x)
        x = x.squeeze(1) # Flatten to [batch_size]
        return x

In [None]:
small_net = SmallNet()
large_net = LargeNet()
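
The input sizes of the fully connected layers (`5 * 7 * 7` and `10 * 5 * 5`) follow from how each convolution and pooling layer shrinks a 32x32 image. The sketch below traces the spatial size through both networks using the standard output-size formula for unpadded, stride-1 convolutions and 2x2 pooling; `conv_out` and `pool_out` are hypothetical helpers, not PyTorch functions.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling (floor division drops the odd pixel)."""
    return (size - kernel) // stride + 1

# SmallNet: conv 3x3 -> pool -> pool
small_side = pool_out(pool_out(conv_out(32, 3)))
# LargeNet: conv 5x5 -> pool -> conv 5x5 -> pool
large_side = pool_out(conv_out(pool_out(conv_out(32, 5)), 5))

print(small_side, 5 * small_side * small_side)    # side length, fc input size
print(large_side, 10 * large_side * large_side)
```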

### Part (a) -- 2pt

The methods `small_net.parameters()` and `large_net.parameters()`
produce an iterator of all the trainable parameters of the network.
These parameters are torch tensors containing many scalar values.

We haven't learned how the parameters in these high-dimensional
tensors will be used, but we should be able to count the number
of parameters. Measuring the number of parameters in a network is
one way of measuring the "size" of a network.

What is the total number of parameters in `small_net` and in
`large_net`? (Hint: how many numbers are in each tensor?)

In [None]:
for param in small_net.parameters():
    print(param.shape)

print("\nThe number of parameters in small_net is 5*3*3*3 + 5 + 1*245 + 1 = 386.\n")

for param in large_net.parameters():
    print(param.shape)

print("\nThe number of parameters in large_net is 5*3*5*5 + 5 + 10*5*5*5 + 10 + 32*250 + 32 + 1*32 + 1 = 9705.")

torch.Size([5, 3, 3, 3])
torch.Size([5])
torch.Size([1, 245])
torch.Size([1])

The number of parameters in small_net is 5*3*3*3 + 5 + 1*245 + 1 = 386.

torch.Size([5, 3, 5, 5])
torch.Size([5])
torch.Size([10, 5, 5, 5])
torch.Size([10])
torch.Size([32, 250])
torch.Size([32])
torch.Size([1, 32])
torch.Size([1])

The number of parameters in large_net is 5*3*5*5 + 5 + 10*5*5*5 + 10 + 32*250 + 32 + 1*32 + 1 = 9705.
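
The totals can also be computed rather than typed by hand. This sketch multiplies out the tensor shapes printed above and sums them; `count_parameters` is a hypothetical helper that works on plain tuples.

```python
from math import prod

# Shapes copied from the printed output above.
small_shapes = [(5, 3, 3, 3), (5,), (1, 245), (1,)]
large_shapes = [(5, 3, 5, 5), (5,), (10, 5, 5, 5), (10,),
                (32, 250), (32,), (1, 32), (1,)]

def count_parameters(shapes):
    """Sum the number of scalars across all parameter tensors."""
    return sum(prod(shape) for shape in shapes)

print(count_parameters(small_shapes))
print(count_parameters(large_shapes))
```

On the actual network objects, the same count should be obtainable with `sum(p.numel() for p in net.parameters())`.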


### The function train_net

The function `train_net` below takes an untrained neural network (like `small_net` and `large_net`) and
several other parameters. You should be able to understand how this function works.

In [None]:
def train_net(net, batch_size=64, learning_rate=0.01, num_epochs=30):
    ########################################################################
    # Train a classifier on cats vs dogs
    target_classes = ["cat", "dog"]
    ########################################################################
    # Fixed PyTorch random seed for reproducible result
    torch.manual_seed(1000)
    ########################################################################
    # Obtain the PyTorch data loader objects to load batches of the datasets
    train_loader, val_loader, test_loader, classes = get_data_loader(
            target_classes, batch_size)
    ########################################################################
    # Define the Loss function and optimizer
    # The loss function will be Binary Cross Entropy (BCE). In this case we
    # will use the BCEWithLogitsLoss which takes unnormalized output from
    # the neural network and scalar label.
    # Optimizer will be SGD with Momentum.
    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)
    ########################################################################
    # Set up some numpy arrays to store the training/validation loss/error
    train_err = np.zeros(num_epochs)
    train_loss = np.zeros(num_epochs)
    val_err = np.zeros(num_epochs)
    val_loss = np.zeros(num_epochs)
    ########################################################################
    # Train the network
    # Loop over the data iterator and sample a new batch of training data
    # Get the output from the network, and optimize our loss function.
    start_time = time.time()
    for epoch in range(num_epochs):  # loop over the dataset multiple times
        total_train_loss = 0.0
        total_train_err = 0.0
        total_epoch = 0
        for i, data in enumerate(train_loader, 0):
            # Get the inputs
            inputs, labels = data
            labels = normalize_label(labels) # Convert labels to 0/1
            # Zero the parameter gradients
            optimizer.zero_grad()
            # Forward pass, backward pass, and optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels.float())
            loss.backward()
            optimizer.step()
            # Calculate the statistics
            corr = (outputs > 0.0).squeeze().long() != labels
            total_train_err += int(corr.sum())
            total_train_loss += loss.item()
            total_epoch += len(labels)
        train_err[epoch] = float(total_train_err) / total_epoch
        train_loss[epoch] = float(total_train_loss) / (i+1)
        val_err[epoch], val_loss[epoch] = evaluate(net, val_loader, criterion)
        print(("Epoch {}: Train err: {}, Train loss: {} |"+
               "Validation err: {}, Validation loss: {}").format(
                   epoch + 1,
                   train_err[epoch],
                   train_loss[epoch],
                   val_err[epoch],
                   val_loss[epoch]))
        # Save the current model (checkpoint) to a file
        model_path = get_model_name(net.name, batch_size, learning_rate, epoch)
        torch.save(net.state_dict(), model_path)
    print('Finished Training')
    end_time = time.time()
    elapsed_time = end_time - start_time
    print("Total time elapsed: {:.2f} seconds".format(elapsed_time))
    # Write the train/validation loss/err into CSV files for plotting later
    epochs = np.arange(1, num_epochs + 1)
    np.savetxt("{}_train_err.csv".format(model_path), train_err)
    np.savetxt("{}_train_loss.csv".format(model_path), train_loss)
    np.savetxt("{}_val_err.csv".format(model_path), val_err)
    np.savetxt("{}_val_loss.csv".format(model_path), val_loss)
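
One detail in `train_net` and `evaluate` that is easy to miss: the network outputs raw scores (logits), and predictions are made with `outputs > 0.0` rather than by applying a sigmoid first. The two are equivalent, because sigmoid(z) > 0.5 exactly when z > 0. The sketch below checks this and evaluates a per-example binary cross-entropy on a few scores. It is a plain-Python illustration of the formula only; `nn.BCEWithLogitsLoss` uses a numerically more stable formulation and averages over the batch by default.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_with_logit(z, y):
    """Binary cross-entropy for one raw score z and a label y in {0, 1}."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

for z in (-2.0, -0.1, 0.0, 0.1, 2.0):
    # Thresholding the logit at 0 matches thresholding the probability at 0.5.
    assert (z > 0.0) == (sigmoid(z) > 0.5)
    print(z, round(sigmoid(z), 3), round(bce_with_logit(z, 1), 3))
```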

### Part (b) -- 1pt

The parameters to the function `train_net` are hyperparameters of our neural network.
We made these hyperparameters easy to modify so that we can tune them later on.

What are the default values of the parameters `batch_size`, `learning_rate`,
and `num_epochs`?



> The default values are `batch_size = 64`, `learning_rate = 0.01`, and `num_epochs = 30`.



### Part (c) -- 3 pt

What files are written to disk when we call `train_net` with `small_net`, and train for 5 epochs? Provide a list
of all the files written to disk, and what information the files contain.

In [None]:
train_net(small_net, 64, 0.01, 5)

The files written to disk when we call train_net with small_net, and train for 5 epochs are:
*   model_small_bs64_lr0.01_epoch0 contains model checkpoint for epoch 1
*   model_small_bs64_lr0.01_epoch1 contains model checkpoint for epoch 2
*   model_small_bs64_lr0.01_epoch2 contains model checkpoint for epoch 3
*   model_small_bs64_lr0.01_epoch3 contains model checkpoint for epoch 4
*   model_small_bs64_lr0.01_epoch4 contains model checkpoint for epoch 5
*   model_small_bs64_lr0.01_epoch4_train_err.csv contains training error per epoch
*   model_small_bs64_lr0.01_epoch4_train_loss.csv contains training loss per epoch
*   model_small_bs64_lr0.01_epoch4_val_err.csv contains validation error per epoch
*   model_small_bs64_lr0.01_epoch4_val_loss.csv contains validation loss per epoch
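
The list above can be reproduced from the naming logic alone. The sketch below copies the format string from `get_model_name` and mirrors how `train_net` saves one checkpoint per epoch plus four CSV files named after the last checkpoint; `files_written` is a hypothetical helper, not part of the lab code.

```python
def get_model_name(name, batch_size, learning_rate, epoch):
    # Same format string as the helper in Part 0.
    return "model_{0}_bs{1}_lr{2}_epoch{3}".format(
        name, batch_size, learning_rate, epoch)

def files_written(name, batch_size, learning_rate, num_epochs):
    """List the file names train_net is expected to produce."""
    checkpoints = [get_model_name(name, batch_size, learning_rate, e)
                   for e in range(num_epochs)]
    last = checkpoints[-1]  # the CSV files reuse the final checkpoint name
    csvs = ["{}_{}.csv".format(last, suffix)
            for suffix in ("train_err", "train_loss", "val_err", "val_loss")]
    return checkpoints + csvs

for filename in files_written("small", 64, 0.01, 5):
    print(filename)
```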

### Part (d) -- 2pt

Train both `small_net` and `large_net` using the function `train_net` and its default parameters.
The function will write many files to disk, including a model checkpoint (saved values of model weights)
at the end of each epoch.

If you are using Google Colab, you will need to mount Google Drive
so that the files generated by `train_net` get saved. We will be using
these files in part (e).
(See the Google Colab tutorial for more information about this.)

Report the total time elapsed when training each network. Which network took longer to train?
Why?

In [None]:
# Since the function writes files to disk, you will need to mount
# your Google Drive. If you are working on the lab locally, you
# can comment out this code.

from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
train_net(small_net)
train_net(large_net)


> The total time elapsed when training small_net was 176.97 seconds.

> The total time elapsed when training large_net was 183.91 seconds.

> large_net took longer to train because it has more parameters to update and more layers to compute in each forward and backward pass than small_net.

### Part (e) - 2pt

Use the function `plot_training_curve` to display the trajectory of the
training/validation error and the training/validation loss.
You will need to use the function `get_model_name` to generate the
argument to the `plot_training_curve` function.

Do this for both the small network and the large network. Include both plots
in your writeup.

In [None]:
model_path = get_model_name("small", batch_size=64, learning_rate=0.01, epoch=29)
plot_training_curve(model_path)

In [None]:
model_path = get_model_name("large", batch_size=64, learning_rate=0.01, epoch=29)
plot_training_curve(model_path)

### Part (f) - 5pt

Describe what you notice about the training curve.
How do the curves differ for `small_net` and `large_net`?
Identify any occurrences of underfitting and overfitting.



>  In the training curve for small_net, underfitting can be seen within the first few epochs. The small_net training error decreases more rapidly in the early epochs than that of large_net. Both training error/loss and validation error/loss decrease as the number of epochs increases. The validation error/loss curves fluctuate much more than the training error/loss curves.



> The training curve for large_net shows the training error/loss decreasing at a steadier rate as the number of epochs increases. The validation error/loss curve flattens out about halfway through. This is a sign of overfitting: after around 15 epochs, the model keeps fitting the training data more closely without improving on the validation data. Overall, the curves have far less noise/fluctuation than those of small_net.

## Part 3. Optimization Parameters [12 pt]

For this section, we will work with `large_net` only.

### Part (a) - 3pt

Train `large_net` with all default parameters, except set `learning_rate=0.001`.
Does the model take longer/shorter to train?
Plot the training curve. Describe the effect of *lowering* the learning rate.

In [None]:
# Note: When we re-construct the model, we start the training
# with *random weights*. If we omit this code, the values of
# the weights will still be the previously trained values.
large_net = LargeNet()
train_net(large_net, learning_rate=0.001)
model_path = get_model_name("large", batch_size=64, learning_rate=0.001, epoch=29)
plot_training_curve(model_path)



> The model took approximately the same amount of time to train, about 3 seconds less than the original. Lowering the learning rate to 0.001 makes each parameter update smaller. This results in a slower decrease in error/loss, so the model does not overfit even at larger numbers of epochs. The error/loss curves show only small fluctuations.


### Part (b) - 3pt

Train `large_net` with all default parameters, except set `learning_rate=0.1`.
Does the model take longer/shorter to train?
Plot the training curve. Describe the effect of *increasing* the learning rate.

In [None]:
large_net = LargeNet()
train_net(large_net, learning_rate=0.1)
model_path = get_model_name("large", batch_size=64, learning_rate=0.1, epoch=29)
plot_training_curve(model_path)



> The model took more time to train: about 9 seconds more than the original and 12 seconds more than the previous run with a learning rate of 0.001. Increasing the learning rate to 0.1 makes each parameter update larger. This results in a faster decrease in the error/loss, and the model begins to overfit at a lower number of epochs. Both error/loss curves show large fluctuations.
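
To build intuition for the step-size effect described in parts (a) and (b), here is a toy sketch of plain gradient descent on the one-dimensional function f(w) = w². This is an assumption-laden illustration: it is not the SGD-with-momentum optimizer used in `train_net`, and the learning rates are chosen for this toy problem only. Small steps converge slowly, moderate steps converge quickly, and overly large steps oscillate or diverge.

```python
def gradient_descent(lr, steps=10, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the path of w."""
    w, path = w0, [w0]
    for _ in range(steps):
        w = w - lr * 2 * w   # the gradient of w**2 is 2w
        path.append(w)
    return path

for lr in (0.01, 0.1, 0.9, 1.1):
    print(lr, [round(w, 3) for w in gradient_descent(lr, steps=5)])
```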

### Part (c) - 3pt

Train `large_net` with all default parameters, including with `learning_rate=0.01`.
Now, set `batch_size=512`. Does the model take longer/shorter to train?
Plot the training curve. Describe the effect of *increasing* the batch size.

In [None]:
large_net = LargeNet()
train_net(large_net, batch_size=512)
model_path = get_model_name("large", batch_size=512, learning_rate=0.01, epoch=29)
plot_training_curve(model_path)



> The model took less time to train: about 26 seconds less than the original, 23 seconds less than the run with a learning rate of 0.001, and 35 seconds less than the run with a learning rate of 0.1. Increasing the batch size means that fewer iterations (parameter updates) occur per epoch. This resulted in a steady decrease of training and validation error/loss as the number of epochs increased. Each update averages the gradient over more examples, so the updates are less noisy and the curves are smoother.

### Part (d) - 3pt

Train `large_net` with all default parameters, including with `learning_rate=0.01`.
Now, set `batch_size=16`. Does the model take longer/shorter to train?
Plot the training curve. Describe the effect of *decreasing* the batch size.

In [None]:
large_net = LargeNet()
train_net(large_net, batch_size=16)
model_path = get_model_name("large", batch_size=16, learning_rate=0.01, epoch=29)
plot_training_curve(model_path)



> The model took more time to train: about 87 seconds more than the original, 90 seconds more than the run with a learning rate of 0.001, 78 seconds more than the run with a learning rate of 0.1, and 113 seconds more than the run with a batch size of 512. Decreasing the batch size means that more iterations occur per epoch. This resulted in a very low training error/loss and in overfitting: the validation error plateaus and the validation loss increases as the number of epochs increases. This means the model will perform well on training data but not well on unseen data.
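
The timing differences in parts (c) and (d) line up with the number of parameter updates per epoch. Assuming the 8000 training examples from Part 1(b) and a loader that keeps the last partial batch, the counts can be sketched as follows (`updates_per_epoch` is a hypothetical helper):

```python
import math

N_TRAIN = 8000  # cat + dog training examples reported in Part 1(b)

def updates_per_epoch(batch_size, n_examples=N_TRAIN):
    """Number of batches, and therefore optimizer steps, in one epoch."""
    return math.ceil(n_examples / batch_size)

for bs in (16, 64, 256, 512):
    per_epoch = updates_per_epoch(bs)
    print(bs, per_epoch, per_epoch * 30)  # batch size, steps/epoch, steps in 30 epochs
```

A batch size of 16 performs about 31 times as many updates per epoch as a batch size of 512, which is one reason it trains more slowly per epoch yet fits the training data more aggressively.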



## Part 4. Hyperparameter Search [6 pt]

### Part (a) - 2pt

Based on the plots from above, choose another set of values for the hyperparameters (network, batch_size, learning_rate)
that you think would help you improve the validation accuracy. Justify your choice.



> Based on the plots from above I think the following values for the hyperparameters would help improve validation accuracy:


> Network: large_net, since it performed well in part 2e with little fluctuation/noise, although one has to be cautious of overfitting at a large number of epochs.


> Batch size: 256, since results from 3c showed that increasing the batch size helped to reduce overfitting at higher numbers of epochs, with a steady decrease in validation/training error/loss as the number of epochs increased. However, I am reducing the batch size from 512 to 256 to improve runtime.


> Learning rate: 0.001, since results from 3a showed that decreasing the learning rate helped to reduce overfitting at higher numbers of epochs, with a steady decrease in validation/training error/loss as the number of epochs increased.


> Epochs: 30, since overall the error was small at larger numbers of epochs. This could be increased a little, maybe by 10, but care is needed to avoid overfitting.

### Part (b) - 1pt

Train the model with the hyperparameters you chose in part(a), and include the training curve.

In [None]:
large_net = LargeNet()
train_net(large_net, batch_size=256, learning_rate=0.001)
model_path_large = get_model_name("large", batch_size=256, learning_rate=0.001, epoch=29)
plot_training_curve(model_path_large)

### Part (c) - 2pt

Based on your result from Part(a), suggest another set of hyperparameter values to try.
Justify your choice.



> Based on the results from Part (a), I will keep the network as large_net and the batch size as 256, since increasing the batch size helped with overfitting; however, it also reduced the rate at which the error decreased. To compensate for this, I will increase the learning rate to 0.005. This may cause some overfitting at a larger number of epochs, so I will reduce the number of epochs to 25.


### Part (d) - 1pt

Train the model with the hyperparameters you chose in part(c), and include the training curve.

In [None]:
large_net = LargeNet()
train_net(large_net, batch_size=256, learning_rate=0.005, num_epochs=25)
model_path_large = get_model_name("large", batch_size=256, learning_rate=0.005, epoch=24)
plot_training_curve(model_path_large)

## Part 5. Evaluating the Best Model [15 pt]


### Part (a) - 1pt

Choose the **best** model that you have so far. This means choosing the best model checkpoint,
including the choice of `small_net` vs `large_net`, the `batch_size`, `learning_rate`,
**and the epoch number**.

Modify the code below to load your chosen set of weights to the model object `net`.

In [None]:
net = LargeNet()
train_net(net, batch_size=256, learning_rate=0.005, num_epochs=25)
model_path = get_model_name(net.name, batch_size=256, learning_rate=0.005, epoch=24)
state = torch.load(model_path)
net.load_state_dict(state)
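
Since the choice of checkpoint includes the epoch number, one way to pick it is to take the epoch with the lowest validation error. The sketch below shows the idea on a made-up list of validation errors; in the notebook, the list could instead be read from the `*_val_err.csv` file with `np.loadtxt`, as `plot_training_curve` does. `best_checkpoint_epoch` is a hypothetical helper. The returned index is 0-based, matching the `epoch` suffix in the checkpoint file names.

```python
def best_checkpoint_epoch(val_err):
    """Return the 0-based epoch index with the lowest validation error."""
    return min(range(len(val_err)), key=lambda e: val_err[e])

# Made-up validation errors for illustration only (not results from this lab).
example_val_err = [0.41, 0.38, 0.35, 0.33, 0.34, 0.36]

epoch = best_checkpoint_epoch(example_val_err)
print(epoch, "model_large_bs256_lr0.005_epoch{}".format(epoch))
```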

### Part (b) - 2pt

Justify your choice of model from part (a).

> Network: large_net based on results from 2e and 2f

> Batch size: 256, from results in 3c and 4c/4d, because a large batch size reduces overfitting at a larger number of epochs and improves training time

> Learning rate: 0.005 so the error/loss steadily decreases and avoids overfitting

> Epochs: 25 since the model starts overfitting at a large number of epochs.


### Part (c) - 2pt

Using the code in Part 0, any code from lecture notes, or any code that you write,
compute and report the **test classification error** for your chosen model.

In [None]:
# If you use the `evaluate` function provided in part 0, you will need to
# set batch_size > 1
train_loader, val_loader, test_loader, classes = get_data_loader(
    target_classes=["cat", "dog"],
    batch_size=256)
test_error, test_loss = evaluate(net, test_loader, nn.BCEWithLogitsLoss())
print("The test classification error is", test_error, "and the test classification loss is", test_loss)

Files already downloaded and verified
Files already downloaded and verified
The test classification error is 0.356 and the test classification loss is 0.6410455703735352


### Part (d) - 3pt

How does the test classification error compare with the **validation error**?
Explain why you would expect the test error to be *higher* than the validation error.

In [None]:
train_loader, val_loader, test_loader, classes = get_data_loader(
    target_classes=["cat", "dog"],
    batch_size=256)
val_error, val_loss = evaluate(net, val_loader, nn.BCEWithLogitsLoss())
print("The validation error is", val_error, "and the validation loss is", val_loss)

Files already downloaded and verified
Files already downloaded and verified
The validation error is 0.372 and the validation loss is 0.638019360601902




> The test classification error (0.356) is slightly lower than the validation error (0.372), which suggests that the model performed a little better on the test dataset than on the validation dataset. This is unexpected: since I adjusted the hyperparameters with the intention of maximizing validation accuracy, the validation error is an optimistically biased estimate, while the test data played no part in training or tuning, so I would expect the test error to be higher. The gap here is small, so it is plausibly due to chance differences between the two 2000-example sets rather than a real difference in performance.
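
A rough way to judge whether a gap of this size is meaningful is to compare it with the sampling noise of an error rate measured on 2000 examples. The sketch below uses the binomial standard error and treats the two sets as independent samples; both are simplifying assumptions, and `std_error` is a hypothetical helper.

```python
import math

def std_error(err_rate, n):
    """Binomial standard error of an error rate measured on n examples."""
    return math.sqrt(err_rate * (1 - err_rate) / n)

test_err, val_err, n = 0.356, 0.372, 2000

# Standard error of the difference between two independent estimates.
se_diff = math.sqrt(std_error(test_err, n) ** 2 + std_error(val_err, n) ** 2)
gap = val_err - test_err

print(round(gap, 3), round(se_diff, 4))
```

Under these assumptions the gap is about one standard error of the difference, so it is within the range that chance alone could produce.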



### Part (e) - 2pt

Why did we only use the test data set at the very end?
Why is it important that we use the test data as little as possible?



> It is important that we put the test data away and don’t look at it so that we, as the designers, don’t develop a cognitive bias about the test data. We save the test data until the very end because it is used to give us an accurate assessment of our model's performance. Building the model and adjusting the hyperparameters based on the test data biases the model towards the test data, effectively overfitting to it, so the test error would no longer reflect performance on truly unseen data.



### Part (f) - 5pt

How does your best CNN model compare with a 2-layer ANN model (no convolutional layers) on classifying cat and dog images? You can use a 2-layer ANN architecture similar to what you used in Lab 1. You should explore different hyperparameter settings to determine how well you can do on the validation dataset. Once satisfied with the performance, you may test it out on the test data.

Hint: The ANN in lab 1 was applied on greyscale images. The cat and dog images are colour (RGB) and so you will need to flatten and concatenate all three colour layers before feeding them into an ANN.

In [None]:
class Pigeon(nn.Module):
  def __init__(self):
    super(Pigeon, self).__init__()
    self.name = "pigeon"
    self.layer1 = nn.Linear(32 * 32 * 3, 30)
    self.layer2 = nn.Linear(30, 1)

  def forward(self, img):
    flattened = img.view(-1, 32 * 32 * 3)
    activation1 = self.layer1(flattened)
    activation1 = F.relu(activation1)
    activation2 = self.layer2(activation1)
    activation2 = activation2.squeeze(1)
    return activation2

Training:

In [None]:
pigeon = Pigeon()
train_net(pigeon, batch_size=512, learning_rate=0.001)
model_path = get_model_name("pigeon", batch_size=512, learning_rate=0.001, epoch=29)
plot_training_curve(model_path)

Testing:

In [None]:
train_loader, val_loader, test_loader, classes = get_data_loader(
  target_classes=["cat", "dog"],
  batch_size=512)
criterion = nn.BCEWithLogitsLoss()
err, loss = evaluate(pigeon, test_loader, criterion)
print("The test classification error is", err, "and the test loss is", loss)

Files already downloaded and verified
Files already downloaded and verified
The test classification error is 0.3765 and the test loss is 0.6487362831830978




> The 2-layer ANN model has a slightly higher test classification error (0.3765 vs. 0.356) and test loss (0.649 vs. 0.641) than the CNN model. Thus the CNN model is somewhat better suited to classifying cats versus dogs on this data.
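
It is also worth comparing model sizes. The sketch below counts the parameters of the `Pigeon` ANN from its two `nn.Linear` layer sizes and compares the total with the 9705 parameters reported for `large_net` in Part 2(a); `linear_params` is a hypothetical helper.

```python
def linear_params(n_in, n_out):
    """Parameters in a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

pigeon_params = linear_params(32 * 32 * 3, 30) + linear_params(30, 1)
large_net_params = 9705  # total reported in Part 2(a)

print(pigeon_params, round(pigeon_params / large_net_params, 1))
```

The ANN has roughly 9.5 times as many parameters as `large_net` yet does not reach a lower test error here, which is consistent with convolutional layers making more efficient use of parameters on image data.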

