# CSE 576 (Spring 2020) Homework 4

Welcome friends, it's time for Deep Learning with PyTorch! Training the models in this homework can take a long time, so keep this in mind and start early.

PyTorch is a deep learning framework for fast, flexible experimentation. We are going to use it to train our classifiers.

For this homework you need to turn in this file, `hw4.ipynb`, after running all cells and answering the questions in-line.

**Notes**: 
 - This assignment was designed to be used with Google Colab, but feel free to set up your own environment if you wish. Just bear in mind that we cannot provide support for custom environments.
 - Feel free to create new cells as needed, but please **do not delete existing cells**.

Before you get started, we suggest you work through the [PyTorch tutorial](https://github.com/param087/Pytorch-tutorial-on-Google-colab) first.

You should at least do the 60 Minute Blitz up until "Training a Classifier".

**How to use this notebook:**
 - Each cell with a grey background is executable.
 - They can be executed by pressing the "Play" button or by hitting `Shift+Enter`
 - Cells can be executed out of order.
 - You can add new cells by clicking on the `+ Code` button in the header.
 - Check out this [Colab Introduction](https://colab.research.google.com/notebooks/intro.ipynb#scrollTo=5fCEDCU_qrC0) if you're having trouble.

## Setup

This will set up the environment without GPUs. This is the recommended setup.

In [0]:
! pip install torch==1.5.0+cpu torchvision==0.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html 
! pip install tqdm matplotlib

### With GPUs
If you're feeling adventurous, you can use GPUs to accelerate training by following the steps below. Note that GPUs might not always be available, and the course staff can't provide support for GPU-related issues, so if you're having trouble please just use the CPU runtime.

 1. Go to Runtime > Change runtime type and select 'GPU'.
 2. Restart the runtime, uncomment the commands below, and run them.
 3. Set `use_gpu = True` in the cell that follows.

In [0]:
# Install the necessary packages

# ! pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html 
# ! pip install tqdm matplotlib

In [0]:
# Set this flag to True if you are using a GPU runtime.
use_gpu = False
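If you are unsure whether a GPU was actually assigned to your runtime, an optional alternative (not part of the original setup) is to derive the flag from `torch.cuda.is_available()`, so the notebook falls back to the CPU automatically. A minimal sketch:

```python
import torch

# Optional: only use the GPU if PyTorch can actually see one.
# With the CPU-only install from the Setup cell this is always False.
use_gpu = torch.cuda.is_available()
device = torch.device("cuda" if use_gpu else "cpu")
print(f"use_gpu = {use_gpu}, device = {device}")
```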

### Check that things are working.

In [0]:
# Make sure things work.

import torch
torch.zeros(10)

## Initialize Datasets

This code defines the data loaders that will be used to train and test our networks. It also defines the default data transforms, which only normalize the images; data augmentation is added later in Section 2.3.3.

In [0]:
import torch
import torchvision
from torchvision import transforms

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

default_train_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

default_test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)), 
])


def get_train_loader(batch_size, transform=default_train_transform):
    trainset = torchvision.datasets.CIFAR10(
        root='./data', 
        train=True, 
        download=True, 
        transform=transform)
    return torch.utils.data.DataLoader(
        trainset, batch_size=batch_size, shuffle=True, num_workers=4)


def get_test_loader(batch_size, transform=default_test_transform):
    testset = torchvision.datasets.CIFAR10(
        root='./data', 
        train=False, 
        download=True, 
        transform=transform) 
    return torch.utils.data.DataLoader(
        testset, batch_size=batch_size, shuffle=False, num_workers=4)


# This downloads the datasets.
get_train_loader(1)
get_test_loader(1);

## Define code that trains and tests the network.

This code will train your model. Feel free to read the code below, but we suggest you don't modify it unless you know what you're doing.

In [0]:
from matplotlib import pyplot as plt
from tqdm.auto import tqdm
# train_network below uses nn and optim, so import them here as well.
from torch import nn, optim


def train(net, loader, optimizer, criterion, epoch, use_gpu=False):
    running_loss = 0.0
    total_loss = 0.0

    if use_gpu:
        net = net.cuda()
    else:
        net = net.cpu()
    # Put layers such as dropout and batch norm into training mode.
    net.train()

    pbar = tqdm(loader)
    for i, data in enumerate(pbar):
        # get the inputs
        inputs, labels = data
        if use_gpu:
            inputs, labels = inputs.cuda(), labels.cuda()

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        total_loss += loss.item()
        pbar.set_description(f"[epoch {epoch+1}] loss = {running_loss/(i+1):.03f}")
    
    average_loss = total_loss / (i + 1)
    tqdm.write(f"Epoch {epoch+1} summary -- loss = {average_loss:.03f}")
    
    return average_loss


def test(net, loader, tag='', use_gpu=False):
    net = net.cuda() if use_gpu else net.cpu()
    # Put layers such as dropout and batch norm into evaluation mode.
    net.eval()

    correct = 0
    total = 0
    class_correct = [0.0] * len(classes)
    class_total = [0.0] * len(classes)
    # A single pass over the data collects both overall and per-class counts.
    with torch.no_grad():
        for images, labels in loader:
            if use_gpu:
                images = images.cuda()
                labels = labels.cuda()
            outputs = net(images)
            _, predicted = torch.max(outputs, 1)
            matches = (predicted == labels)
            total += labels.size(0)
            correct += matches.sum().item()
            for label, match in zip(labels.tolist(), matches.tolist()):
                class_correct[label] += match
                class_total[label] += 1

    average_accuracy = correct / total
    tqdm.write(f'{tag} accuracy of the network: {100*average_accuracy:.02f}%')
    for i in range(len(classes)):
        tqdm.write(f'{tag} accuracy of {classes[i]} = {100*class_correct[i]/class_total[i]:.02f}%')

    return average_accuracy


def train_network(net, lr, epochs, batch_size, criterion=None,
                  train_transforms=default_train_transform, 
                  use_gpu=use_gpu): 
    optimizer = optim.SGD(net.parameters(), lr=lr)
    if criterion is None:
        # Note that CrossEntropyLoss has the Softmax built in!
        # This is good for numerical stability. 
        # Read: https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss
        criterion = nn.CrossEntropyLoss()
    # Use the requested training transforms (e.g. data augmentation).
    train_loader = get_train_loader(batch_size, transform=train_transforms)
    test_loader = get_test_loader(batch_size)
    
    train_losses = []
    train_accuracies = []
    test_accuracies = []

    for epoch in range(epochs):  # loop over the dataset multiple times
        train_loss = train(net, train_loader, optimizer, criterion, epoch, use_gpu=use_gpu)
        train_accuracy = test(net, train_loader, 'Train', use_gpu=use_gpu)
        test_accuracy = test(net, test_loader, 'Test', use_gpu=use_gpu)
        train_losses.append(train_loss)
        train_accuracies.append(train_accuracy)
        test_accuracies.append(test_accuracy)
    
    return train_losses, train_accuracies, test_accuracies
    

def plot_results(train_losses, train_accuracies, test_accuracies):
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    axes[0].plot(train_losses)
    axes[0].set_title('Training Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    
    axes[1].plot(train_accuracies)
    axes[1].set_title('Training Accuracy')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Accuracy')
    
    axes[2].plot(test_accuracies)
    axes[2].set_title('Testing Accuracy')
    axes[2].set_xlabel('Epoch')
    axes[2].set_ylabel('Accuracy')

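The accuracy bookkeeping in `test` boils down to taking the arg-max over the class dimension and comparing it with the labels. The sketch below runs the same idea on a hand-made batch of logits (made-up numbers, three classes instead of ten) and, as an alternative to the Python loop, computes per-class accuracy with `torch.bincount`:

```python
import torch

num_classes = 3  # toy example; CIFAR-10 would use 10
outputs = torch.tensor([[2.0, 0.1, 0.3],   # predicted class 0
                        [0.2, 1.5, 0.1],   # predicted class 1
                        [0.9, 0.8, 0.1],   # predicted class 0
                        [0.1, 0.2, 3.0]])  # predicted class 2
labels = torch.tensor([0, 1, 2, 2])

predicted = outputs.argmax(dim=1)          # same indices as torch.max(outputs, 1)[1]
correct = (predicted == labels)
overall = correct.sum().item() / labels.size(0)

# Per-class counts: samples of each class, and how many of them were right.
class_total = torch.bincount(labels, minlength=num_classes)
class_correct = torch.bincount(labels[correct], minlength=num_classes)
per_class = class_correct.float() / class_total.float()
print(overall, per_class.tolist())
```

Here three of the four predictions match, and the only mistake is on a class-2 sample, so class 2 gets half of its samples right.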


## 2.1. Training a classifier using only one fully connected layer

Implement a model that classifies CIFAR-10 images into ten categories using just one fully connected layer (remember that fully connected layers are called `nn.Linear` in PyTorch).

If you are new to PyTorch, you may want to check out the [neural networks tutorial](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py).

Fill in the code for LazyNet here.

**Hints:**
 - Note that `nn.CrossEntropyLoss` has the Softmax built in for numerical stability. This means that the output layer of your network should be linear and not contain a Softmax.
 - You can use the `view()` function to flatten your input images into vectors, e.g., if `x` is a `(100,3,4,4)` tensor then `x.view(-1, 3*4*4)` reshapes it into a `(100, 48)` tensor, i.e., one vector of size `48` per image.
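Both hints can be checked directly. The sketch below uses random tensors (illustrative only) to flatten a batch with `view()` and to confirm that `nn.CrossEntropyLoss` on raw logits gives the same value as `log_softmax` followed by `nll_loss`, which is why the last layer of the network should output raw scores:

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Flattening: a batch of 100 images of shape 3x4x4 becomes 100 vectors of length 48.
x = torch.randn(100, 3, 4, 4)
flat = x.view(-1, 3 * 4 * 4)
print(flat.shape)

# CrossEntropyLoss takes raw (unnormalized) logits and applies log-softmax itself.
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
ce = nn.CrossEntropyLoss()(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, manual))
```

For CIFAR-10 images the same flattening call would be `x.view(-1, 3 * 32 * 32)`.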

In [0]:
from torch import nn
from torch import optim

class LazyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # TODO: Define model here

    def forward(self, x):
        # TODO: Implement forward pass for LazyNet
        return x

net = LazyNet()
net

### Run the model for 50 epochs and report the plots and accuracies.

In [0]:
train_losses, train_accuracies, test_accuracies = train_network(
    net, 
    criterion=nn.CrossEntropyLoss(),
    lr=0.01, 
    epochs=50, 
    batch_size=1024)

In [0]:
plot_results(train_losses, train_accuracies, test_accuracies)

## 2.2. Training a classifier using multiple fully connected layers ##

Implement a model for the same classification task using multiple fully connected layers.

Start with a fully connected layer that maps the data from the image size (32 * 32 * 3) to a vector of size 120, followed by another fully connected layer that reduces the size to 84, and finally a layer that maps the vector of size 84 to the 10 classes.
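To get a feel for the size of this model: an `nn.Linear(in_features, out_features)` layer has `in_features * out_features` weights plus `out_features` biases. A minimal sketch with a hypothetical helper, `count_parameters` (our own name, not a PyTorch function), applied to the first layer only:

```python
from torch import nn

def count_parameters(module):
    """Hypothetical helper: number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

first_layer = nn.Linear(32 * 32 * 3, 120)
print(count_parameters(first_layer))  # 3072 * 120 + 120 = 368760
```

The same helper works on a whole `nn.Module`, so you can call it on your finished network to compare model sizes across sections.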

Use any activation you want.

Fill in the code for BoringNet below.

In [0]:
class BoringNet(nn.Module):
    def __init__(self):
        super().__init__()
        # TODO: Define model here

    def forward(self, x):
        # TODO: Implement forward pass for BoringNet
        return x

net = BoringNet()
net

### Run the model for 50 epochs and report the plots and accuracies.

In [0]:
train_losses, train_accuracies, test_accuracies = train_network(
    net, 
    criterion=nn.CrossEntropyLoss(),
    lr=0.01, 
    epochs=50, 
    batch_size=1024)

In [0]:
plot_results(train_losses, train_accuracies, test_accuracies)

### Question

Try training this model with and without activation functions. How do activations (such as ReLU) affect the training process, and why?


In [0]:
# Your answer here


## 2.3. Training a classifier using convolutions ##

Implement a model using convolutional, pooling and fully connected layers.

You are free to choose any parameters for these layers (we encourage you to experiment with different values).

Fill in the code for CoolNet below. Explain why you chose these layers and how they affected performance. Analyze the behavior of your model and include the plots in this notebook.
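When choosing layer parameters, it helps to track the spatial size of the feature maps. For a convolution or pooling layer with kernel size `k`, stride `s` and padding `p` (and no dilation), an input of width `W` becomes `floor((W + 2*p - k) / s) + 1`. The sketch below checks this formula against `nn.Conv2d` and `nn.MaxPool2d` on a dummy CIFAR-sized input; the channel count and kernel sizes are arbitrary example values, not a recommended CoolNet design:

```python
import torch
from torch import nn

def out_size(w, k, s=1, p=0):
    """Spatial output size of a conv/pool layer (dilation 1)."""
    return (w + 2 * p - k) // s + 1

x = torch.zeros(1, 3, 32, 32)               # one dummy CIFAR-10-sized image
conv = nn.Conv2d(3, 16, kernel_size=5)      # example values only
pool = nn.MaxPool2d(kernel_size=2, stride=2)

y = pool(conv(x))
print(y.shape, out_size(32, k=5), out_size(28, k=2, s=2))
```

The flattened size fed into the first `nn.Linear` after these layers would then be channels * height * width, here 16 * 14 * 14.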

In [0]:
class CoolNet(nn.Module):
    def __init__(self):
        super().__init__()
        # TODO: Define model here

    def forward(self, x):
        # TODO: Implement forward pass for CoolNet
        
        return x

net = CoolNet()
net

### Run the model for 50 epochs and report the plots and accuracies.

In [0]:
train_losses, train_accuracies, test_accuracies = train_network(
    net, 
    criterion=nn.CrossEntropyLoss(),
    lr=0.01, 
    epochs=50, 
    batch_size=1024)

In [0]:
plot_results(train_losses, train_accuracies, test_accuracies)

### 2.3.1. How does batch size affect training?

Try using three different values for batch size. How do these values affect training and why?
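One concrete effect of batch size is the number of parameter updates per epoch. Assuming the 50,000 CIFAR-10 training images and a `DataLoader` that keeps the last partial batch (its default, `drop_last=False`, which is what `get_train_loader` uses), a quick calculation for a few example batch sizes:

```python
import math

num_train = 50000  # CIFAR-10 training set size
batch_sizes = (64, 256, 1024)  # example values

# One optimizer step per batch, including the final partial batch.
steps_per_epoch = {bs: math.ceil(num_train / bs) for bs in batch_sizes}
for bs, steps in steps_per_epoch.items():
    print(f"batch_size={bs:5d}: {steps} optimizer steps per epoch")
```

So for the same number of epochs, a batch size of 64 takes roughly 16 times as many optimizer steps as a batch size of 1024.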

In [0]:
# train_losses, train_accuracies, test_accuracies = train_network(net, lr=0.01, epochs=50, batch_size=1024)

In [0]:
# plot_results(train_losses, train_accuracies, test_accuracies)

### 2.3.2. How does the learning rate affect training?

Choosing a good learning rate for a neural network is hard.

Train your model with different learning rates, plot the training loss, training accuracy, and test accuracy, and compare the training progress for learning rates of 10, 0.1, 0.01, and 0.0001.

Analyze the results and choose the best one. Why did you choose this value?

During training it is often useful to reduce the learning rate as training progresses (why?).
Implement an `adjust_learning_rate` function that reduces the learning rate by 10% every 50 epochs, and observe the behavior of the network over 150 epochs.
Include your plots in this notebook.
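One way to get a step schedule is PyTorch's built-in `torch.optim.lr_scheduler.StepLR`, which multiplies the learning rate by `gamma` every `step_size` epochs. The sketch below is standalone (a throwaway `nn.Linear` model, not CoolNet) and assumes "reduce by 10%" means multiplying by `gamma=0.9`; use `gamma=0.1` if you read it as dividing by 10. Since `train_network` creates its own optimizer, you would need to adapt it (or write your own loop) to step a scheduler once per epoch.

```python
from torch import nn, optim

model = nn.Linear(4, 2)  # throwaway model, just to own some parameters
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Multiply the learning rate by gamma every 50 epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.9)

lrs = []
for epoch in range(150):
    lrs.append(optimizer.param_groups[0]["lr"])
    # ... train for one epoch here (optimizer.step() is called per batch) ...
    optimizer.step()   # placeholder step so the optimizer/scheduler order is correct
    scheduler.step()   # call once per epoch, after the optimizer steps

print(lrs[0], lrs[50], lrs[100])
```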


In [0]:
# TODO: Your code here

### 2.3.3. Data Augmentation

Most of the popular computer vision datasets have tens of thousands of images. 

CIFAR-10 is a dataset of 60,000 32x32 colour images in 10 classes, which is relatively small compared to ImageNet, which has over a million images.

The more parameters a model has, the more likely it is to overfit a small dataset.
You may have already run into this while training CoolNet: after some epochs the training accuracy reaches its maximum (saturates) while the test accuracy is still relatively low.

To mitigate this, we use data augmentation to help the network avoid overfitting.

Add data transformations to the `train_transform` pipeline below and compare the results. You are free to use any type and any number of data augmentation techniques.

Just be aware that data augmentation should only be applied during the training phase.

In [0]:
train_transform = transforms.Compose([
    # TODO: Add data augmentations here
    # You can find a list of transforms here:
    #  https://pytorch.org/docs/stable/torchvision/transforms.html
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

In [0]:
net = CoolNet()  # start from freshly initialized weights for a fair comparison
train_losses, train_accuracies, test_accuracies = train_network(
    net, 
    criterion=nn.CrossEntropyLoss(),
    train_transforms=train_transform,
    lr=0.01, 
    epochs=50, 
    batch_size=1024)

In [0]:
plot_results(train_losses, train_accuracies, test_accuracies)

#### Question
How does the model trained with data augmentation compare to the model trained without it?

In [0]:
# Your answer here

### 2.3.4. Change the loss function

Try Mean Squared Error loss instead of Cross Entropy.

In [0]:
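net = CoolNet()  # start from freshly initialized weights for a fair comparison
# nn.MSELoss needs float targets shaped like the outputs, so one-hot encode the labels.
mse = nn.MSELoss()
def mse_criterion(outputs, labels):
    return mse(outputs, nn.functional.one_hot(labels, 10).float())
train_losses, train_accuracies, test_accuracies = train_network(
    net, 
    criterion=mse_criterion,
    lr=0.01, 
    epochs=50, 
    batch_size=1024)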

In [0]:
plot_results(train_losses, train_accuracies, test_accuracies)

#### Question
How does this affect the results? Explain why you think this is happening.

In [0]:
# Your answer here

## Turning In

You're done! You just need to turn in the notebook file.

Go to `File > Download .ipynb` and download the file as `hw4.ipynb`. Turn in only this file.

Make sure that you've answered all questions and all plots are correct.