<img src="img/dsci572_header.png" width="600">

# Lab 4: Transfer Learning and GANs

## Instructions
<hr>

rubric={mechanics:5}

- Follow the [general lab instructions](https://ubc-mds.github.io/resources_pages/general_lab_instructions/)

- Upload a PDF version of your lab notebook to Gradescope, in addition to the .ipynb file.

- Add a link to your GitHub repository here: https://github.ubc.ca/MDS-2022-23/DSCI_572_lab4_missarah

## Imports
<hr>

In [1]:
!pip install torchsummary

Collecting torchsummary
  Downloading torchsummary-1.5.1-py3-none-any.whl (2.8 kB)
Installing collected packages: torchsummary
Successfully installed torchsummary-1.5.1

In [2]:
import numpy as np
import pandas as pd
from collections import OrderedDict
import torch
from torch import nn, optim
import memory_profiler
from torchsummary import summary
import torchvision
from torchvision import datasets, transforms, utils, models
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML
from PIL import Image

plt.rcParams.update({'axes.grid': False})

## Getting Started with Kaggle
<hr>

We are going to run this notebook on the cloud using [Kaggle](https://www.kaggle.com). Kaggle offers 30 hours of free GPU usage per week, which should be more than enough for this lab. To get started, follow these steps:

1. Go to https://www.kaggle.com/kernels

2. Make an account if you don't have one, and verify your phone number (to get access to GPUs)
3. Select `+ New Notebook`
4. Go to `File -> Import Notebook`
5. Upload this notebook
6. On the right-hand side of your Kaggle notebook, make sure:
  
  - `Internet` is enabled.
  
  - In the `Accelerator` dropdown, choose one of the GPU options when you're ready to use it (you can turn it on/off as you need it).
    
Once you've done all your work on Kaggle, you can download the notebook from Kaggle. That way any work you did on Kaggle won't be lost.

## Exercise 1: Transfer Learning
<hr>

rubric={accuracy:15}

In this exercise you're going to practice transfer learning. We're going to develop a model that can detect the following 6 cat breeds in this Kaggle [dataset](https://www.kaggle.com/solothok/cat-breed):

1. American Short hair

2. Bengal
3. Maine Coon
4. Ragdoll
5. Scottish Fold
6. Sphinx

To use this dataset:

1. Click `+ Add data` at the top right of the notebook.

2. Search for **"cat-breed"** and click `Add`

### 1.1: CNN from Scratch

In this exercise, you should build a CNN model to classify images of cats based on their breeds.

In Kaggle, running the following cell should print out `"Using device: cuda"`, which means a GPU is available:

In [3]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device.type}")

Using device: cuda


To make use of the GPU, you should:
1. Move your model to the GPU after creating it using this syntax:

```python
model.to(device)
```

2. In your training/validation loops, each batch should be moved to the GPU using syntax like:

```python
for X, y in dataloader:
    X, y = X.to(device), y.to(device)
    ...
```

Here are some guidelines for building your multi-class (6 breeds) classification CNN from scratch:

- You may use any architecture you like.

- This is the path to the data in your notebook: `../input/cat-breed/cat-breed/`
- You should use an `IMAGE_SIZE = 200` pixels in your data loader (the raw images could be any size).
- **You must train your model for at least 20 epochs and print or plot the accuracy for each epoch on the validation data for us to see.**

>If you want to take a look at the images after making a `train_loader`, try this code:

```python
# Plot samples
sample_batch = next(iter(train_loader))
plt.figure(figsize=(10, 8)); plt.axis("off"); plt.title("Sample Training Images")
plt.imshow(np.transpose(utils.make_grid(sample_batch[0], padding=1, normalize=True),(1, 2, 0)));
```

In [11]:
TRAIN_DIR = "../input/cat-breed/cat-breed/TRAIN"
VALID_DIR = "../input/cat-breed/cat-breed/TEST"

IMAGE_SIZE = (200, 200)

data_transforms = transforms.Compose([
    transforms.Resize(IMAGE_SIZE),
    transforms.ToTensor()
])

train_dataset = torchvision.datasets.ImageFolder(root=TRAIN_DIR, transform=data_transforms)
valid_dataset = torchvision.datasets.ImageFolder(root=VALID_DIR, transform=data_transforms)

BATCH_SIZE = 64

trainloader = torch.utils.data.DataLoader(
    train_dataset,          # our raw data
    batch_size=BATCH_SIZE,  # the size of batches we want the dataloader to return
    shuffle=True,           # shuffle our data before batching
    drop_last=False         # don't drop the last batch even if it's smaller than batch_size
)

validloader = torch.utils.data.DataLoader(
    valid_dataset,          # our raw data
    batch_size=BATCH_SIZE,  # the size of batches we want the dataloader to return
    shuffle=False,          # no need to shuffle validation data
    drop_last=False         # don't drop the last batch even if it's smaller than batch_size
)

In [None]:
sample_batch = next(iter(trainloader))  # the loader defined above is named `trainloader`
plt.figure(figsize=(10, 8)); plt.axis("off"); plt.title("Sample Training Images")
plt.imshow(np.transpose(utils.make_grid(sample_batch[0], padding=1, normalize=True),(1, 2, 0)));

In [22]:
def trainer(model, criterion, optimizer, trainloader, validloader, epochs=5, verbose=True):
    """Simple training wrapper for PyTorch network."""
    
    train_loss, valid_loss, valid_accuracy = [], [], []
    for epoch in range(epochs):  # for each epoch
        train_batch_loss = 0
        valid_batch_loss = 0
        valid_batch_acc = 0
        
        # Training
        for X, y in trainloader:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()       # Zero all the gradients w.r.t. parameters
            y_hat = model(X)            # Forward pass to get output
            loss = criterion(y_hat, y)  # Calculate loss based on output
            loss.backward()             # Calculate gradients w.r.t. parameters
            optimizer.step()            # Update parameters
            train_batch_loss += loss.item()  # Add loss for this batch to running total
        train_loss.append(train_batch_loss / len(trainloader))
        
        # Validation
        with torch.no_grad():  # disable gradient tracking during validation to save memory and time
            for X, y in validloader:
                X, y = X.to(device), y.to(device)
                y_hat = model(X)
                _, y_hat_labels = torch.softmax(y_hat, dim=1).topk(1, dim=1)
                loss = criterion(y_hat, y)
                valid_batch_loss += loss.item()
                valid_batch_acc += (y_hat_labels.squeeze() == y).type(torch.float32).mean().item()
        valid_loss.append(valid_batch_loss / len(validloader))
        valid_accuracy.append(valid_batch_acc / len(validloader))  # accuracy
        
        # Print progress
        if verbose:
            print(f"Epoch {epoch + 1}:",
                  f"Train Loss: {train_loss[-1]:.3f}.",
                  f"Valid Loss: {valid_loss[-1]:.3f}.",
                  f"Valid Accuracy: {valid_accuracy[-1]:.2f}.")
    
    results = {"train_loss": train_loss,
               "valid_loss": valid_loss,
               "valid_accuracy": valid_accuracy}
    return results
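
One detail of the accuracy bookkeeping in `trainer`: it averages per-batch accuracies, so with `drop_last=False` a smaller final batch carries the same weight as a full one. The sketch below uses made-up batch counts (not results from this lab) to show how that average can differ from counting correct predictions over the whole dataset. Both function names are hypothetical.

```python
def mean_of_batch_accuracies(batches):
    """Average the per-batch accuracies, as the `trainer` above does."""
    return sum(correct / total for correct, total in batches) / len(batches)


def dataset_accuracy(batches):
    """Count correct predictions over all examples, then divide once."""
    n_correct = sum(correct for correct, _ in batches)
    n_total = sum(total for _, total in batches)
    return n_correct / n_total


# Made-up example: two full batches of 64 and a final batch of 10.
# Each tuple is (number of correct predictions, batch size).
batches = [(32, 64), (32, 64), (10, 10)]
print(f"Mean of batch accuracies: {mean_of_batch_accuracies(batches):.3f}")
print(f"Dataset accuracy:         {dataset_accuracy(batches):.3f}")
```

In this made-up case the small final batch pulls the batch average up to about 0.67, while the accuracy over all 138 examples is about 0.54. With full-sized batches the two numbers agree.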

In [40]:
class CNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.main = torch.nn.Sequential(
            
            torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(3, 3), padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d((2,2)),
            
            torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(3, 3), padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d((2,2)),
            
            torch.nn.Conv2d(in_channels=32, out_channels=8, kernel_size=(3, 3), padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d((2,2)),
            
            torch.nn.Flatten(),
            
            torch.nn.Linear(5000, 6),
        )

    def forward(self, x):
        out = self.main(x)
        return out
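
The `5000` input features of the final `Linear` layer follow from the layer shapes: a 3x3 convolution with `padding=1` keeps the 200x200 spatial size, each 2x2 max-pool halves it, and the last convolution outputs 8 channels. Here is a small sketch of that arithmetic. The helper name `conv_out` is hypothetical, and it assumes square inputs and the standard floor-division output-size formula.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a conv or pooling layer applied to a square input."""
    return (size + 2 * padding - kernel) // stride + 1


size = 200  # IMAGE_SIZE
for _ in range(3):  # three Conv2d -> ReLU -> MaxPool2d stages
    size = conv_out(size, kernel=3, padding=1)  # 3x3 conv with padding=1: size unchanged
    size = conv_out(size, kernel=2, stride=2)   # 2x2 max-pool: size halved

flat_features = 8 * size * size  # 8 channels come out of the last conv layer
print(size, flat_features)
```

If you change `IMAGE_SIZE`, the number of pooling stages, or the last layer's channel count, the first argument of `Linear` has to change accordingly.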

In [41]:
model = CNN()
model.to(device)

CNN(
  (main): Sequential(
    (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(32, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU()
    (8): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    (9): Flatten(start_dim=1, end_dim=-1)
    (10): Linear(in_features=5000, out_features=6, bias=True)
  )
)

In [42]:
summary(model, (3, 200, 200))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 16, 200, 200]             448
              ReLU-2         [-1, 16, 200, 200]               0
         MaxPool2d-3         [-1, 16, 100, 100]               0
            Conv2d-4         [-1, 32, 100, 100]           4,640
              ReLU-5         [-1, 32, 100, 100]               0
         MaxPool2d-6           [-1, 32, 50, 50]               0
            Conv2d-7            [-1, 8, 50, 50]           2,312
              ReLU-8            [-1, 8, 50, 50]               0
         MaxPool2d-9            [-1, 8, 25, 25]               0
          Flatten-10                 [-1, 5000]               0
           Linear-11                    [-1, 6]          30,006
Total params: 37,406
Trainable params: 37,406
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.46
Forward/ba

In [44]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
results = trainer(model, criterion, optimizer, trainloader, validloader, epochs=20)

Epoch 1: Train Loss: 1.079. Valid Loss: 1.496. Valid Accuracy: 0.38.
Epoch 2: Train Loss: 0.979. Valid Loss: 1.412. Valid Accuracy: 0.43.
Epoch 3: Train Loss: 0.896. Valid Loss: 1.430. Valid Accuracy: 0.43.
Epoch 4: Train Loss: 0.829. Valid Loss: 1.358. Valid Accuracy: 0.47.
Epoch 5: Train Loss: 0.766. Valid Loss: 1.584. Valid Accuracy: 0.42.
Epoch 6: Train Loss: 0.695. Valid Loss: 1.543. Valid Accuracy: 0.45.
Epoch 7: Train Loss: 0.650. Valid Loss: 1.537. Valid Accuracy: 0.43.
Epoch 8: Train Loss: 0.567. Valid Loss: 1.606. Valid Accuracy: 0.45.
Epoch 9: Train Loss: 0.498. Valid Loss: 1.646. Valid Accuracy: 0.43.
Epoch 10: Train Loss: 0.438. Valid Loss: 1.822. Valid Accuracy: 0.41.
Epoch 11: Train Loss: 0.379. Valid Loss: 1.836. Valid Accuracy: 0.44.
Epoch 12: Train Loss: 0.321. Valid Loss: 2.202. Valid Accuracy: 0.39.
Epoch 13: Train Loss: 0.322. Valid Loss: 2.109. Valid Accuracy: 0.42.
Epoch 14: Train Loss: 0.251. Valid Loss: 2.076. Valid Accuracy: 0.44.
Epoch 15: Train Loss: 0.181. 

### 1.2: Feature Extractor

In this exercise, you should leverage a pre-trained model customized with your own layer(s) on top, to build a CNN classifier that can identify various cat breeds.

- You can use any model you wish. I used `DenseNet`.

- Train your model for at least 20 epochs.

- Comment on the performance of this model compared to your "from scratch" model.
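
As a rough sketch of the feature-extractor idea (freeze the pre-trained body, train only a new head), the example below uses a tiny stand-in network rather than a real pre-trained model, so it runs without downloading any weights. The `backbone` and `head` are hypothetical; with a torchvision model you would replace its final classifier layer instead.

```python
import torch
from torch import nn, optim

# Stand-in for a pre-trained feature extractor (hypothetical, randomly initialized).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the backbone so its weights are not updated during training.
for param in backbone.parameters():
    param.requires_grad = False

# New trainable head with one output per cat breed.
head = nn.Linear(8, 6)
model = nn.Sequential(backbone, head)

# Hand only the trainable parameters to the optimizer.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.Adam(trainable_params, lr=0.001)

n_trainable = sum(p.numel() for p in trainable_params)
n_frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(f"Trainable: {n_trainable}, frozen: {n_frozen}")

# One forward/backward pass on random data: only the head receives gradients.
X = torch.randn(4, 3, 200, 200)
y = torch.randint(0, 6, (4,))
loss = nn.CrossEntropyLoss()(model(X), y)
loss.backward()
```

Passing only the trainable parameters to the optimizer is optional (frozen parameters receive no gradient and are skipped anyway), but it makes explicit which part of the network is being trained.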

In [54]:
densenet = models.densenet121(pretrained=True)
#densenet.eval();
for param in densenet.parameters():  # Freeze parameters so we don't update them
    param.requires_grad = False
    
densenet.classifier

new_layers = nn.Sequential(
    nn.Linear(1024, 500),
    nn.ReLU(),
    nn.Linear(500, 6)
)
densenet.classifier = new_layers

densenet.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(densenet.parameters(), lr=0.001)
results = trainer(densenet, criterion, optimizer, trainloader, validloader, epochs=20)

Epoch 1: Train Loss: 0.893. Valid Loss: 0.418. Valid Accuracy: 0.85.
Epoch 2: Train Loss: 0.270. Valid Loss: 0.286. Valid Accuracy: 0.89.
Epoch 3: Train Loss: 0.180. Valid Loss: 0.268. Valid Accuracy: 0.91.
Epoch 4: Train Loss: 0.147. Valid Loss: 0.271. Valid Accuracy: 0.91.
Epoch 5: Train Loss: 0.149. Valid Loss: 0.326. Valid Accuracy: 0.86.
Epoch 6: Train Loss: 0.113. Valid Loss: 0.314. Valid Accuracy: 0.88.
Epoch 7: Train Loss: 0.090. Valid Loss: 0.265. Valid Accuracy: 0.91.
Epoch 8: Train Loss: 0.071. Valid Loss: 0.254. Valid Accuracy: 0.92.
Epoch 9: Train Loss: 0.051. Valid Loss: 0.211. Valid Accuracy: 0.91.
Epoch 10: Train Loss: 0.065. Valid Loss: 0.282. Valid Accuracy: 0.90.
Epoch 11: Train Loss: 0.056. Valid Loss: 0.223. Valid Accuracy: 0.92.
Epoch 12: Train Loss: 0.059. Valid Loss: 0.297. Valid Accuracy: 0.90.
Epoch 13: Train Loss: 0.046. Valid Loss: 0.239. Valid Accuracy: 0.92.
Epoch 14: Train Loss: 0.034. Valid Loss: 0.262. Valid Accuracy: 0.90.
Epoch 15: Train Loss: 0.023. 

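<b> `DenseNet` does much better than the from-scratch model on both the training and validation sets: in the epochs shown, its validation accuracy is about 0.85 to 0.92, versus about 0.38 to 0.47 for the from-scratch CNN. The from-scratch model also overfits (its validation loss rises while its training loss keeps falling). The gain comes from the pre-trained features that `DenseNet` brings with it. </b>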

### 1.3: Fine Tuning

In this final exercise, you should fine-tune your model by updating all or some of the layers during training.

- You can fine-tune as many layers as you like: the whole model, or particular layers. Experiment with both modes of fine-tuning, and find which works better.

- Train your model for at least 20 epochs.

- Comment on the performance of this model compared to your "from scratch" and "feature extractor" models.
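
To fine-tune only particular layers, one option is to freeze everything and then unfreeze parameters by name. The sketch below does this on a toy model. `unfreeze_by_prefix`, `TinyNet`, and its layer names are all hypothetical; for a real pre-trained model, the prefixes would come from inspecting `model.named_parameters()`.

```python
import torch
from torch import nn


def unfreeze_by_prefix(model, prefixes):
    """Freeze every parameter except those whose name starts with one of `prefixes`."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(tuple(prefixes))
    return [name for name, p in model.named_parameters() if p.requires_grad]


class TinyNet(nn.Module):
    """Toy stand-in for a pre-trained network with two blocks and a classifier."""

    def __init__(self):
        super().__init__()
        self.block1 = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.block2 = nn.Conv2d(4, 8, kernel_size=3, padding=1)
        self.classifier = nn.Linear(8, 6)

    def forward(self, x):
        x = torch.relu(self.block1(x))
        x = torch.relu(self.block2(x))
        return self.classifier(x.mean(dim=(2, 3)))  # global average pool, then classify


model = TinyNet()
# Fine-tune only the last block and the classifier; block1 stays frozen.
trainable_names = unfreeze_by_prefix(model, ["block2", "classifier"])
print(trainable_names)
```

Fine-tuning the whole model corresponds to leaving every parameter with `requires_grad=True`, which is the default for a freshly loaded model.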

In [57]:
densenet = models.densenet121(pretrained=True)

# No layers are frozen: all parameters keep requires_grad=True, so the whole network is fine-tuned

new_layers = nn.Sequential(
    nn.Linear(1024, 500),
    nn.ReLU(),
    nn.Linear(500, 6)
)

densenet.classifier = new_layers


densenet.to(device);

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(densenet.parameters(), lr=0.001)
results = trainer(densenet, criterion, optimizer, trainloader, validloader, epochs=20)

Epoch 1: Train Loss: 0.615. Valid Loss: 0.460. Valid Accuracy: 0.85.
Epoch 2: Train Loss: 0.313. Valid Loss: 0.567. Valid Accuracy: 0.82.
Epoch 3: Train Loss: 0.212. Valid Loss: 0.478. Valid Accuracy: 0.85.
Epoch 4: Train Loss: 0.158. Valid Loss: 0.529. Valid Accuracy: 0.86.
Epoch 5: Train Loss: 0.108. Valid Loss: 0.388. Valid Accuracy: 0.89.
Epoch 6: Train Loss: 0.083. Valid Loss: 0.471. Valid Accuracy: 0.87.
Epoch 7: Train Loss: 0.091. Valid Loss: 0.718. Valid Accuracy: 0.82.
Epoch 8: Train Loss: 0.104. Valid Loss: 0.551. Valid Accuracy: 0.85.
Epoch 9: Train Loss: 0.089. Valid Loss: 0.561. Valid Accuracy: 0.87.
Epoch 10: Train Loss: 0.123. Valid Loss: 0.624. Valid Accuracy: 0.88.
Epoch 11: Train Loss: 0.100. Valid Loss: 0.521. Valid Accuracy: 0.87.
Epoch 12: Train Loss: 0.082. Valid Loss: 0.434. Valid Accuracy: 0.88.
Epoch 13: Train Loss: 0.090. Valid Loss: 0.529. Valid Accuracy: 0.88.
Epoch 14: Train Loss: 0.056. Valid Loss: 0.357. Valid Accuracy: 0.90.
Epoch 15: Train Loss: 0.021. 

## Exercise 2: Generative Adversarial Networks
<hr>

rubric={accuracy:15}

In this exercise you're going to practice building a generative adversarial network (GAN).

GANs are incredibly hard to train, especially with small datasets, so you may not get good results in this exercise. Don't worry about that: what matters is getting some practice and experience with these types of neural networks.

> For this exercise, you're not limited to a particular dataset, you can use any dataset you like. The `cat-breed` or any other suitable one on Kaggle is acceptable, as long as you can show the progress of your trained GAN on it.

### 2.1: Preparing the Data

In Kaggle, running the following cell should print out `"Using device: cuda"`, which means a GPU is available:

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device.type}")

To make use of the GPU, you should:
- Move your model to the GPU after creating it with the syntax:

```python
model.to(device)
```

- In your training loop, each batch should be moved to the GPU using syntax like:

```python
for X, _ in dataloader:
    X = X.to(device)
    ...
```

- Note above that we don't need the labels for training a GAN, so I ignore them by unpacking them into an underscore `_` (the usual Python convention for variables we don't need).

Okay, prepare the data by creating a `data_loader`. This is the path to the data in your notebook if you choose to use the `cat-breed` dataset: `../input/cat-breed/cat-breed/`.

>If you want to take a look at the images after making a `data_loader`, try this code:

```python
# Plot samples
sample_batch = next(iter(data_loader))
plt.figure(figsize=(10, 8)); plt.axis("off"); plt.title("Sample Training Images")
plt.imshow(np.transpose(utils.make_grid(sample_batch[0], padding=1, normalize=True),(1, 2, 0)));
```

In [None]:
...

### 2.2: Create the Generator

Now, we need to create a generator for our GAN. You can reuse/modify the code from Lecture 8, or build your own.
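
When building a generator from `ConvTranspose2d` layers, it helps to check how the spatial size grows from the latent vector to the final image. The sketch below assumes a DCGAN-style stack (4x4 kernels, a first layer with stride 1 and no padding, then layers with stride 2 and padding 1). These settings are an assumption for illustration, not necessarily the Lecture 8 architecture, and the helper name is hypothetical.

```python
def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Output side length of a transposed conv (dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) for each assumed generator layer
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # the latent vector is treated as a 1x1 "image" with LATENT_SIZE channels
sizes = []
for kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)
```

Under these assumptions each stride-2 layer doubles the side length, so the number of layers determines the output image size and has to match the image size used in your data loader.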

In [None]:
...

### 2.3: Create the Discriminator

Now, we need to create a discriminator for our GAN. You can reuse/modify the code from Lecture 8, or build your own.

In [None]:
...

### 2.4: Initialize Weights

GANs can be quite sensitive to the initial weights assigned to each layer when we instantiate the model. Instantiate your generator and discriminator and then specify their initial weights as follows:

- `Conv2d()` layers: normal distribution with `mean=0.0` and `std=0.02`

- `ConvTranspose2d()` layers: normal distribution with `mean=0.0` and `std=0.02`

- `BatchNorm2d()` layers: normal distribution with `mean=1.0` and `std=0.02` for the weights, zeroes for the biases

- Use `LATENT_SIZE = 100`
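
One possible way to apply the initialization above is a function passed to `Module.apply`, which visits every sub-module. This is a minimal sketch: `weights_init` is a hypothetical helper name, and `toy_generator` is a small stand-in used only to demonstrate the call, not a suggested architecture.

```python
import torch
from torch import nn

LATENT_SIZE = 100


def weights_init(m):
    """Initialize conv and batch-norm layers with the distributions listed above."""
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, mean=1.0, std=0.02)
        nn.init.zeros_(m.bias)


torch.manual_seed(0)  # only to make this demonstration repeatable

# Hypothetical stand-in model, just to show how the function is applied.
toy_generator = nn.Sequential(
    nn.ConvTranspose2d(LATENT_SIZE, 64, kernel_size=4, stride=1, padding=0, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1, bias=False),
    nn.Tanh(),
)
toy_generator.apply(weights_init)  # calls weights_init on every sub-module

print(f"ConvTranspose2d weight std: {toy_generator[0].weight.std().item():.4f}")
print(f"BatchNorm2d weight mean:    {toy_generator[1].weight.mean().item():.4f}")
```

The same call works for the discriminator, for example `discriminator.apply(weights_init)`, since the function handles `Conv2d` layers as well.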

In [None]:
...

### 2.5: Train your GAN

You now have all the ingredients you need to train a GAN, so give it a go!

You should track the loss of your model as epochs progress and show at least one example of an image output by your trained generator (better yet, record the evolution over time of how your generator is doing, like we did in Lecture 8). **Your results may not be great and that's perfectly okay, you should just show _something_**.

Here are some tips:

- You will likely need to train for at least `NUM_EPOCHS=100` (and maybe more).

- I find that the hardest part about training GANs is that the discriminator "overpowers" the generator, making it hard for the generator to learn how to create realistic images. There are lots of things you can do to try and balance your generator and discriminator, such as: play with the optimizer's hyperparameters, change the architectures of your models, etc.

- Here's a good set of [tips and tricks for training GANs](https://github.com/soumith/ganhacks).

- Once again, GANs are notoriously difficult to train (even more so with smaller data sets like we have here). Don't worry if you're not getting amazing results. This is all about practice.
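
For orientation, here is a minimal sketch of one GAN training step with loss tracking. It is not the Lecture 8 code. The function `gan_step`, the tiny fully connected models, the random "real" data, and the Adam settings (`lr=0.0002`, `betas=(0.5, 0.999)`) are all assumptions chosen so the example runs quickly on a CPU.

```python
import torch
from torch import nn, optim


def gan_step(generator, discriminator, real, opt_g, opt_d, criterion, latent_size):
    """Run one discriminator update and one generator update; return both losses."""
    batch_size = real.size(0)
    device = real.device
    real_labels = torch.ones(batch_size, 1, device=device)
    fake_labels = torch.zeros(batch_size, 1, device=device)

    # Discriminator: push real samples towards 1 and generated samples towards 0.
    opt_d.zero_grad()
    noise = torch.randn(batch_size, latent_size, device=device)
    fake = generator(noise)
    loss_real = criterion(discriminator(real), real_labels)
    loss_fake = criterion(discriminator(fake.detach()), fake_labels)  # detach: no generator update here
    loss_d = loss_real + loss_fake
    loss_d.backward()
    opt_d.step()

    # Generator: try to make the discriminator output 1 for generated samples.
    opt_g.zero_grad()
    loss_g = criterion(discriminator(fake), real_labels)
    loss_g.backward()
    opt_g.step()

    return loss_d.item(), loss_g.item()


torch.manual_seed(0)
TOY_LATENT_SIZE = 8  # small value for this toy example only

# Toy models on flat 16-dimensional "images" (hypothetical, for illustration).
generator = nn.Sequential(nn.Linear(TOY_LATENT_SIZE, 16), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())
criterion = nn.BCELoss()
opt_g = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_d = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

history = []  # (discriminator loss, generator loss) per step
for _ in range(3):
    real = torch.rand(32, 16) * 2 - 1  # random stand-in for a batch of real data in [-1, 1]
    history.append(gan_step(generator, discriminator, real, opt_g, opt_d, criterion, TOY_LATENT_SIZE))

print(history)
```

In a real training loop the step would run once per batch from your data loader, and the per-epoch averages of the two losses are what you would plot to show whether the discriminator is overpowering the generator.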

In [None]:
...