# Face Classification



## Setup/Importing

In [None]:
# Run this cell and follow instructions to connect this notebook to Google Drive
try:
    from google.colab import drive
    drive.mount('/content/drive')
except ImportError:
    print("Not running on Google Colab; skipping Drive mount")

In [None]:
# Change directories ("cd") to the folder containing your
# notebook and data folder by replacing the filepath below
%cd path/to/folder/in/google/drive

In [None]:
# TODO Run this cell to download the data from Amazon AWS
# TODO If needed, replace the Google Drive path (/content/drive/MyDrive/pa2b/) with a path that works for you

!wget -P /content/drive/MyDrive/pa2b/ https://cmu-dele-leaderboard-us-east-2-003014019879.s3.us-east-2.amazonaws.com/colab/pa2b/data2pb.zip

In [None]:
# TODO Run this cell to unzip the data from Amazon AWS to your local Drive
# TODO If needed, replace the Google Drive paths (/content/drive/MyDrive/pa2b/data2pb.zip and /content/drive/MyDrive/pa2b/) with paths that work for you

!unzip -u /content/drive/MyDrive/pa2b/data2pb.zip -d /content/drive/MyDrive/pa2b/

In [None]:
# Run this cell to import packages and enable autoreloading code from other imported files
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from tqdm.notebook import tqdm
import torchvision
import os

%load_ext autoreload
%autoreload 2

### Auto-detect if GPU is available

In [None]:
# Run this cell to automatically detect if GPU is available.
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(DEVICE)

# Section 1: Datasets/DataLoaders 

## Question 1.1: Initialize Training/Validation Datasets for Classification
As discussed in section 5 of the writeup, load in the **training** and **validation** datasets for classification.

You'll want to use the [ImageFolder](https://pytorch.org/vision/stable/datasets.html#torchvision.datasets.ImageFolder) object from `torchvision`. Read the documentation for it to figure out how to initialize it. You'll need the paths to the classification train/val folders.
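
As a rough illustration of the folder layout `ImageFolder` expects (one subfolder per class), the sketch below mimics how integer labels are derived from the sorted subfolder names. The folder names are made up, and this is plain Python that imitates the convention rather than calling torchvision itself.

```python
import os
import tempfile

def class_to_index(root):
    """Map each class subfolder of `root` to an integer label, in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}

# Build a throwaway directory tree with hypothetical identity folders.
with tempfile.TemporaryDirectory() as root:
    for person_id in ["id_0042", "id_0007", "id_0100"]:
        os.makedirs(os.path.join(root, person_id))
    mapping = class_to_index(root)

print(mapping)  # {'id_0007': 0, 'id_0042': 1, 'id_0100': 2}
```

The takeaway is that labels come from the directory structure, so the train and val folders need the same set of class subfolders for their labels to line up.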

In [None]:
# [Given] Method for transforming image data to a tensor, give this to ImageFolder
transform = torchvision.transforms.ToTensor() # give this to the ImageFolder object

# TODO: Initialize dataset objects for training and validation using ImageFolder from torchvision.
classification_train_dataset = None
classification_val_dataset = None

### BEGIN SOLUTION
classification_train_dataset = torchvision.datasets.ImageFolder("data/classification_train", transform=transform)
classification_val_dataset = torchvision.datasets.ImageFolder("data/classification_val", transform=transform)
### END SOLUTION

## Question 1.2: Initialize Classification DataLoaders
Now that you have the datasets initialized, give each of them to a [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader).

**DataLoader Hyperparam Notes**

Set the args for the `DataLoader` based on these notes. This assignment is really compute/memory intensive, so these will make a big difference in speeding up training.

- `batch_size`
    - See assignment 1b writeup for how to select `batch_size`.
        - `64` is a decent starting number for the train dataset.
    - Validation/test can use a larger `batch_size`, which speeds up eval.
        - In general, we want to maximize val/test batch sizes because we just need to get through them quickly, and because the eval batch size has no impact on training or the final accuracy score.
        - We can use larger batch sizes during eval than during training because no gradients are stored during eval, which leaves more memory free.
- `pin_memory`
    - Read this [page](https://pytorch.org/docs/stable/data.html#memory-pinning), set to `True`
- `num_workers`
    - Given below. We set it to the number of CPU cores you have available.
        - You can use this same value for train, val, and test.
    - For what this does, read this [page](https://pytorch.org/docs/stable/data.html#single-and-multi-process-data-loading)
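
To make the notes above concrete, here is a minimal sketch using a toy `TensorDataset` in place of the real image folders. The tensor shapes, batch sizes, and `toy_*` names are illustrative assumptions, and `num_workers=0` is used only to keep the toy example lightweight.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 10 fake RGB images of size 8x8 with integer labels.
images = torch.randn(10, 3, 8, 8)
labels = torch.arange(10)
toy_dataset = TensorDataset(images, labels)

# Pinning memory only helps when batches are copied to a GPU.
pin = torch.cuda.is_available()

# Training loader: smaller batches, reshuffled every epoch.
toy_train_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True,
                              pin_memory=pin, num_workers=0)
# Eval loader: larger batches, fixed order.
toy_val_loader = DataLoader(toy_dataset, batch_size=8, shuffle=False,
                            pin_memory=pin, num_workers=0)

train_batch_sizes = [x.shape[0] for x, _ in toy_train_loader]
val_batch_sizes = [x.shape[0] for x, _ in toy_val_loader]
print(train_batch_sizes)  # [4, 4, 2]
print(val_batch_sizes)    # [8, 2]
```

The larger eval batch size means fewer iterations to cover the same data; the last batch is simply smaller when the dataset size is not a multiple of `batch_size`.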

In [None]:
# [Given] How many "workers" (parallel data-loading subprocesses) each dataloader should have.
num_workers = os.cpu_count()
print(f"num_workers (# cpu cores): {num_workers}")

# TODO: Initialize `Dataloader` objects.
classification_train_dataloader = None
classification_val_dataloader = None

### BEGIN SOLUTION
batch_size = 64
classification_train_dataloader = torch.utils.data.DataLoader(classification_train_dataset, batch_size=batch_size, pin_memory=True, shuffle=True, num_workers=num_workers)
classification_val_dataloader = torch.utils.data.DataLoader(classification_val_dataset, batch_size=batch_size, pin_memory=True, shuffle=False, num_workers=num_workers)
### END SOLUTION

## Question 1.3: Initialize Val Verification Dataset/Dataloader

We need this because, while training on classification, we also want to see how our model is doing on verification every epoch.

We only initialize the validation set for now, because the test set is only needed at the very end of the assignment when you want to generate your submission for grading.

In [None]:
from utils import VerificationDataset 

# [Given] Custom dataset object for verification task. See utils.py for its implementation. 
verify_verification_dataset = VerificationDataset("data", "data/verification_pairs_val.txt")

# TODO: Initialize `Dataloader` object (similar to above question)
verification_val_dataloader = None

### BEGIN SOLUTION
batch_size = 64
verification_val_dataloader = torch.utils.data.DataLoader(verify_verification_dataset, batch_size=batch_size, pin_memory=True, shuffle=False, num_workers=num_workers)
### END SOLUTION

# Section 2: ResNet Model

Now let's implement your model: ResNet.

<!-- 
Note that our model is a class that extends `nn.Module`. Extending `nn.Module` means that any `nn.Module`s assigned as attributes to the class will automatically have their parameters absorbed. So when we do the following:

    model = Model()
    optimizer = MyOptimizer(model.parameters())

Any attributes of `model` that have parameters will have those parameters included in `model.parameters()`, so `optimizer` will optimize them during training. -->

## Question 2.1: `ResBlock`

<p align="center"><img src="images/resblock_downsample_false.png" width="400"/><img src="images/resblock_downsample_true.png" width="411"/></p>

Recall that ResNet is composed of these modular `ResBlock` units that are simply stacked together with differing parameters to get the final model.

Also recall that the block differs depending on whether `downsample` is `False` or `True`.

- If `downsample=False` (left-side of diagram), the output's width/height will end up unchanged from the input's.
- If `downsample=True` (right-side of diagram), the output's width/height end up smaller than the input's.

For both cases, the number of output channels depends solely on how you specify `out_channels`.

<p align="center"><img src="images/residual_sizes_downsample_false.png" width="400"/><img src="images/residual_sizes_downsample_true.png" width="411"/></p>

The above diagram shows how to specify layer sizes for both the `downsample=False` case and the `downsample=True` case.

Most important differences:

- If `downsample=False`, the shortcut should be `nn.Identity()`, and the first `Conv2d` in `self.residual` should have `stride=1`.
- If `downsample=True`, the shortcut should be the block shown on the bottom right, and the first `Conv2d` in `self.residual` should have `stride=2`.
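
To see why the residual and shortcut outputs can be added in both cases, it helps to work through the standard convolution output-size formula. The sketch below assumes a 3x3 convolution with `padding=1` in the residual path, a 1x1 convolution with no padding in the shortcut, and a hypothetical 64x64 input; none of these sizes are prescribed by the diagram text above.

```python
def conv_out_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 64  # hypothetical input height/width

# downsample=False: stride 1 keeps the spatial size unchanged.
same = conv_out_size(size, kernel_size=3, stride=1, padding=1)

# downsample=True: stride 2 halves the size in the residual path...
residual_down = conv_out_size(size, kernel_size=3, stride=2, padding=1)
# ...and the 1x1, stride-2 shortcut produces the same size, so the sum is valid.
shortcut_down = conv_out_size(size, kernel_size=1, stride=2, padding=0)

print(same, residual_down, shortcut_down)  # 64 32 32
```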

**Instructions:**

Implement `ResBlock.__init__()` and `ResBlock.forward()` below. The `# TODO` comments should walk you through what to implement.  

In [None]:
class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, downsample=False):
        super().__init__()            
        # TODO: Initialize the residual EXCLUDING the last `ReLU`.
        # Your stride for the first layer will need to depend on the `downsample` variable.
        self.residual = nn.Sequential(
            # Add appropriate layers here
        )
        
        # TODO: Implement the shortcut (right side of diagram) based on whether or not `downsample` is True or False
        if downsample:
            self.shortcut = nn.Sequential(
                # Add appropriate layers here
            ) 
        else:
            # TODO: If no downsampling, use an Identity layer
            self.shortcut = None
        
        # TODO: Implement the final ReLU activation function
        self.final_activation = None
        
        ### BEGIN SOLUTION
        self.residual = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride=2 if downsample else 1, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size, stride=1, padding=1),
            nn.BatchNorm2d(out_channels),
        )
                
        if downsample:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, 2),
                nn.BatchNorm2d(out_channels)
            )
        else:
            self.shortcut = nn.Identity()
            
        self.final_activation = nn.ReLU()
        ### END SOLUTION

    def forward(self, x):
        # TODO: pass the input through the residual (don't overwrite `x`!)
        residual_out = None

        # TODO: pass the input through the shortcut
        shortcut_out = None
        
        # TODO: add the shortcut and residual outputs, and pass them through the final activation
        final_out = None
        
        ### BEGIN SOLUTION
        shortcut_out = self.shortcut(x)
        residual_out = self.residual(x)
        final_out = self.final_activation(shortcut_out + residual_out)
        ### END SOLUTION

        return final_out

## Question 2.2: `ResNet`

Now we can implement the overall model.

First, implement `ResNet.__init__()`. Begin by implementing the recommended architecture from the table in the writeup.

You definitely can modify the widths/depths yourself later, but you'll need to carefully ensure the sizes will line up.

Second, implement `ResNet.forward()`.
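
The forward pass turns the final feature map into an embedding by averaging over the two spatial axes. As a small sketch of that step, assuming a hypothetical block output of shape `(batch, 512, 4, 4)`:

```python
import torch

# Hypothetical block output: batch of 2, 512 channels, 4x4 spatial map.
block_out = torch.randn(2, 512, 4, 4)

# Average over height and width in one call...
embedding = block_out.mean(dim=(2, 3))
# ...which matches averaging one spatial axis at a time.
embedding_two_step = block_out.mean(2).mean(2)

print(embedding.shape)  # torch.Size([2, 512])
print(torch.allclose(embedding, embedding_two_step, atol=1e-6))
```

Either form gives one 512-dimensional vector per image, which is the size the final linear layer would need as its input under this assumption.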

In [None]:
class ResNet(nn.Module):
    def __init__(self):
        super().__init__()
        # TODO: Implement the first Conv2d layer
        self.conv = None
        
        # TODO: Implement a nn.Sequential containing the ResBlocks.
        self.blocks = nn.Sequential(
            # Add appropriate layers here, make sure to set downsample appropriately.
        )
        self.linear = None
        ### BEGIN SOLUTION
        self.conv = nn.Conv2d(3, 64, 3, 1, padding=1)
        self.blocks = nn.Sequential(
            ResBlock(64, 64, 3),
            ResBlock(64, 64, 3),

            ResBlock(64, 128, 3, True),
            ResBlock(128, 128, 3),

            ResBlock(128, 256, 3, True),
            ResBlock(256, 256, 3),

            ResBlock(256, 512, 3, True),
            ResBlock(512, 512, 3),
        )

        self.linear = nn.Linear(512, 4000)
        ### END SOLUTION

    def forward(self, x):
        # TODO: Pass the input through the first convolution
        conv_out = None
        
        # TODO: Pass conv_out through the ResBlocks
        block_out = None
        
        # TODO: Average the output along the last two axes to get the embedding
        features = None
        
        # TODO: Pass the features through the final linear layer
        logits = None
        
        # this function should return (logits, embedding)
        ### BEGIN SOLUTION
        features = self.blocks(self.conv(x)).mean(2).mean(2)
        logits = self.linear(features)
        ### END SOLUTION
        
        # [Given] Return both the logits AND features; will need logits for classification, features for verification
        return logits, features

Run the cell below to initialize your model.

If there are any errors/exceptions thrown during the initialization, you likely made a mistake during implementation somewhere. But even if there are no errors thrown, you could have still made a mistake! It'll become apparent when you run your training loop for the first time.

This is a good opportunity to learn how to read the error messages and debug neural networks. Remember: the `%debug` Jupyter notebook command is really useful.

In [None]:
# Run this cell to initialize your model
model = ResNet()
model = model.to(DEVICE)

# make sure the model initializes properly
print(model)

# Section 3: Training/Validation/Prediction Routines

Now to write the training/eval routines.

Below, we've given you the overall `train()` method. It should look mostly familiar, but the main difference is that we do **two validations** every epoch:

One validation on the **classification** task, and a second for the **verification** task.

We do this to get a sense of how our model is doing for each task.

In [None]:
# [Given] ALREADY COMPLETED, SHOWN JUST FOR YOUR REFERENCE
def train(model, optimizer, classification_train_dataloader, classification_val_dataloader,
          verification_val_dataloader, num_epochs, scheduler=None, center_loss_function=None):
    """[Given] Trains and validates network for given number of epochs

    Args:
        model (nn.Module): Your initialized ResNet model.
        optimizer (optim.Optimizer): Initialized optimizer like `optim.SGD` or `optim.Adam`
        classification_train_dataloader (torch.utils.data.DataLoader): Classification train dataloader
        classification_val_dataloader (torch.utils.data.DataLoader): Classification val dataloader
        verification_val_dataloader (torch.utils.data.DataLoader): Verification val dataloader
        num_epochs (int): Number of epochs to train for
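        scheduler (optim.lr_scheduler): Initialized scheduler like `optim.lr_scheduler.ReduceLROnPlateau` (or None)
        center_loss_function (CenterLoss): Initialized CenterLoss object, passed through to `train_epoch()`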
    Returns:
        (list, list): a list of loss values per batch and a list of validation accuracies every epoch
    """
    losses = []
    val_accuracies = []
    
    for e in range(num_epochs):
        print(f"Epoch #{e}")
        epoch_losses = train_epoch(model, optimizer, classification_train_dataloader, \
                                   scheduler, center_loss_function)
        losses.extend(epoch_losses)
        
        # Eval on classification validation set
        val_accuracy = eval_classification(model, classification_val_dataloader)
        print("Classification Val Accuracy:", 100 * val_accuracy)
        
        # Eval on verification validation set
        verification_accuracy = eval_verification(model, verification_val_dataloader)
        print("Verification Val ROC AUC:", 100 * verification_accuracy)
        val_accuracies.append(val_accuracy)


    return losses, val_accuracies

Also given below is the evaluation method for classification.  It's very similar to the previous assignments', so we figured we'd just give it to you.

In [None]:
# [Given] ALREADY COMPLETED, SHOWN JUST FOR YOUR REFERENCE
def eval_classification(model, dataloader):
    """[Given] Evaluates network and calculates accuracy for a full validation dataset.

    Args:
        model (nn.Module): Your initialized ResNet model.
        dataloader (torch.utils.data.DataLoader): Initialized validation dataloader

    Returns:
        float: Accuracy rate for entire val set.
    """
    model.eval()
    total_correct = 0
    # Run faster by ignoring information necessary for calculating gradients
    with torch.no_grad():
        for i, (data, labels) in tqdm(enumerate(dataloader), total=len(dataloader)):
            # Put tensors on specified device
            data, labels = data.to(DEVICE), labels.to(DEVICE)
            
            # Get the two outputs of ResNet.forward()
            logits, features = model(data) # Note that we don't need `features` so we just ignore it.
            
            # Get integer predictions by taking the max index along the `logits`. Then compare against the labels. 
            num_correct = (logits.argmax(axis=1) == labels).sum()
            
            total_correct += num_correct.item()

    return total_correct / len(dataloader.dataset)

## Question 3.1: `eval_verification()`

Now you'll implement evaluation on the **verification** task.

**Overview**

Instead of calculating accuracy, this method calculates the [Area Under the Curve (AUC) of the ROC curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve), which is a common metric for evaluating binary classifiers. Remember, this problem is binary classification: are these two images of the same person or not?

To do this, we first need to calculate the predicted cosine similarity scores for each pair of images in the verification val set. Once we have those, we can calculate the AUC ROC and return it.

**Methodology**
- We iterate through the `verification_val_dataloader` to get our pairs of images.
    - Each iteration, we get two batches of images. For example, if `batch_size=5`, we get two batches that each contain 5 images.
    - We give the model each batch, one at a time. 
        - Doesn't matter which batch you provide first
    - We grab the `features` matrix the model outputs for each batch, and give both matrices to our initialized `nn.CosineSimilarity` module.
    - This should return all of the cosine similarity scores for the batch. `extend` the given `similarities` list with these scores.
- After calculating all of the cosine similarity scores, we give these and the true binary labels to the `roc_auc_score` function
    - The function is imported for you in the next cell
- The output of `roc_auc_score` is a single float, representing your ROC AUC score for the entire verification val dataset.
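
To build intuition for the two quantities involved, here is a hand-rolled sketch in plain Python. It is not how `nn.CosineSimilarity` or `roc_auc_score` are implemented; it uses the fact that ROC AUC equals the fraction of (positive, negative) pairs in which the positive pair gets the higher score. The 2-D embeddings and the `toy_*` names are made up for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical embeddings for four image pairs (label 1 = same person).
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),   # similarity 1.0
    ([1.0, 0.0], [0.0, 1.0]),   # similarity 0.0
    ([1.0, 1.0], [1.0, 0.0]),   # similarity about 0.707
    ([1.0, 0.0], [-1.0, 0.0]),  # similarity -1.0
]
toy_labels = [1, 1, 0, 0]

toy_scores = [cosine_similarity(a, b) for a, b in toy_pairs]
toy_auc = pairwise_auc(toy_labels, toy_scores)
print(toy_auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

Note that AUC depends only on the ordering of the scores, so no similarity threshold needs to be chosen.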

In [None]:
from sklearn.metrics import roc_auc_score

def eval_verification(model, verification_val_dataloader):
    """[Given] Evaluates network and calculates ROC AUC for a full validation dataset.

    Args:
        model (nn.Module): Self-explanatory!
        verification_val_dataloader (torch.utils.data.DataLoader): Initialized verification val dataloader

    Returns:
        float: ROC AUC for entire val set.
    """
    model.eval()
    
    # TODO: Initialize the nn.CosineSimilarity object from torch
    cosine_similarity = None
    ### BEGIN SOLUTION
    cosine_similarity = nn.CosineSimilarity()
    ### END SOLUTION

    # [Given] Cosine similarity scores go here
    similarities = []
    
    with torch.no_grad():
        for (batch_1, batch_2, _) in tqdm(verification_val_dataloader, total=len(verification_val_dataloader)):
            # [Given] Put batches on GPU
            batch_1, batch_2 = batch_1.to(DEVICE), batch_2.to(DEVICE)

            # TODO: Give each batch to the model, store the feature output of each
            ### BEGIN SOLUTION
            _, features_1 = model(batch_1)
            _, features_2 = model(batch_2)
            ### END SOLUTION

            # TODO: Give the feature vectors to cosine_similarity, get the scores
            batch_similarities = None
            ### BEGIN SOLUTION
            batch_similarities = cosine_similarity(features_1, features_2)
            ### END SOLUTION

            # [Given] Store the similarity scores
            similarities.extend(batch_similarities.tolist())

    # [Given] List of binary labels for the entire dataset
    labels = verification_val_dataloader.dataset.get_labels()

    # TODO: Give the labels and the similarities to the roc_auc_score function, return the final score
    auc_score = None
    ### BEGIN SOLUTION
    auc_score = roc_auc_score(labels, similarities)
    ### END SOLUTION

    return auc_score

## Question 3.2: `train_epoch()`

Now to write the training routine of a single epoch.

**Note: If you haven't already, read section 6 of the writeup for explanations of mixed precision and center loss.**

The pseudocode below reflects the usage of both techniques. Note that we're giving you parts of this because it's pretty complicated and the main point of this exercise is the concepts.

```
def train_epoch():
    set_model_to_train_mode()
    alpha = 0.005
    grad_scaler = create_amp_gradient_scaler()
    for (data, labels) in tqdm(dataloader):
        data, labels = put_tensors_on_appropriate_device(DEVICE, data, labels) # See val method for how to do this
        reset_gradients_to_zero()
        with autocast:
            logits, features = forward_pass_through_model(model, data)
            center_loss = center_loss_function(features, labels) * alpha
            loss = cross_entropy_loss_function(logits, labels) + center_loss
        backprop_with_amp_scaled_gradients() 
        update_center_loss_params()
        update_model_with_amp_scaled_gradients()
        update_scaler()
        store_loss_value(loss)

    return loss_values
```

**CenterLoss Notes:**
- We're using the "second method" described in the [readme](https://github.com/KaiyangZhou/pytorch-center-loss), with one optimizer for model and loss function parameters.
- Notice that `CenterLoss` has **trainable parameters**
    - If you rerun training, you should make sure to use the same object without reinitializing it. If you initialize it again, it'll lose the centers it was training.
- We're using `alpha=0.005` for center loss.
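
The gradient rescaling for the center loss parameters in `train_epoch()` can look mysterious, so here is the arithmetic behind it as a minimal sketch. It assumes a plain SGD update (`center -= lr * grad`); with an adaptive optimizer such as Adam the effective step will not match this exactly. The gradient value and the model learning rate are hypothetical.

```python
alpha = 0.005     # center-loss weight, as in the notes above
lr = 1e-3         # hypothetical model learning rate
lr_cent = 1e-1    # the constant in the rescaling factor used by train_epoch()

raw_center_grad = 2.0                    # hypothetical d(center_loss)/d(center)
weighted_grad = alpha * raw_center_grad  # gradient after weighting the loss by alpha

# Rescale as train_epoch() does: multiply by lr_cent / (alpha * lr).
rescaled_grad = weighted_grad * (lr_cent / (alpha * lr))

# Under a plain SGD update, the step applied to the center would be:
sgd_step = lr * rescaled_grad
print(sgd_step, lr_cent * raw_center_grad)  # both approximately 0.2
```

In other words, under this assumption the rescaling cancels both `alpha` and the model learning rate, so the centers move as if they had their own learning rate of `1e-1`.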

In [None]:
def train_epoch(model, optimizer, dataloader, scheduler=None, center_loss_function=None):
    """Train model for one epoch.

    Args:
        model (nn.Module): Initialized ResNet network.
        optimizer (optim.Optimizer): Initialized optimizer like `optim.SGD` or `optim.Adam`
        dataloader (torch.utils.data.DataLoader): Initialized training dataloader
        center_loss_function (CenterLoss): Initialized CenterLoss object (imported from `center_loss.py`).
        scheduler (optim.lr_scheduler): Optional scheduler if you want it

    Returns:
        list: Loss value of each batch for this epoch.
    """
    # [Given] Make sure you're aware of what's here
    loss_per_batch = []
    loss_function = nn.CrossEntropyLoss()
    alpha = 0.005

    # TODO: set model to train mode
        
    # TODO: initialize your AMP gradient scaler 
    amp_grad_scaler = None

    ### BEGIN SOLUTION
    model.train()
    amp_grad_scaler = torch.cuda.amp.GradScaler()
    ### END SOLUTION

    # Run loop with `tqdm` progress bar
    for i, (data, labels) in tqdm(enumerate(dataloader), total=len(dataloader)):
        # TODO: Complete code based on pseudocode
        ### BEGIN SOLUTION
        data, labels = data.to(DEVICE), labels.to(DEVICE)
        optimizer.zero_grad()

        with torch.cuda.amp.autocast():
            logits, features = model(data)
            center_loss = center_loss_function(features, labels) * alpha
            loss = loss_function(logits, labels) + center_loss
        ### END SOLUTION
        
        # [Given] Run backprop on scaled gradient with AMP
        amp_grad_scaler.scale(loss).backward()
        
        # [Given] Rescale the gradients of the center loss parameters to undo the `alpha` weighting (see the center loss readme)
        lr = optimizer.param_groups[0]['lr']
        for param in center_loss_function.parameters():
            param.grad.data *= (1e-1 /(alpha * lr))

        # [Given] Update model parameters (equivalent to doing `optimizer.step()` without AMP)
        amp_grad_scaler.step(optimizer)
        amp_grad_scaler.update()

        # [Given] Store the loss value for the batch
        loss_per_batch.append(loss.item())

    # TODO (Optional): step the scheduler if it is not None
    ### BEGIN SOLUTION
    if scheduler is not None:
        scheduler.step(sum(loss_per_batch))
    ### END SOLUTION
    return loss_per_batch

# Section 4: Train Model!

Almost there!

## Question 4.1: Initialize objects

Time to initialize objects. If things break, you may need to figure out what the error messages mean and backtrack to fix the appropriate thing.

Google is your friend. Post questions if you're particularly lost.

Here's a link to the [CenterLoss documentation](https://github.com/KaiyangZhou/pytorch-center-loss), and a description of [LR schedulers](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate). A good scheduler is ReduceLROnPlateau.
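
As a small sketch of how `ReduceLROnPlateau` reacts to a stalled metric, the example below feeds it a made-up sequence of per-epoch losses with `patience=0` and the default reduction factor. The single dummy parameter and the `toy_*` names are placeholders, not part of the assignment.

```python
import torch

# A single dummy parameter so the optimizer has something to manage.
toy_param = torch.nn.Parameter(torch.zeros(1))
toy_optimizer = torch.optim.SGD([toy_param], lr=1e-3)
toy_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(toy_optimizer, patience=0)

lr_history = []
for epoch_loss in [1.0, 0.8, 0.8, 0.5]:  # hypothetical per-epoch losses
    toy_optimizer.step()                 # a real loop would compute gradients first
    toy_scheduler.step(epoch_loss)       # the scheduler needs the monitored metric
    lr_history.append(toy_optimizer.param_groups[0]["lr"])

print(lr_history)  # roughly [1e-3, 1e-3, 1e-4, 1e-4]
```

With `patience=0`, the learning rate is cut the first time the loss fails to improve (the third epoch here); a larger `patience` would tolerate a few flat epochs before reducing.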

Note that instead of using one optimizer for center loss and a different one for the model, as suggested in the center loss docs, we can turn both parameter sets into lists and concatenate them so that a single optimizer handles both.
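
Here is a minimal sketch of the single-optimizer idea, with tiny stand-in modules instead of the real model and `CenterLoss` object. The `toy_*` names, the shapes, and the choice of SGD are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a tiny "model" and a module holding trainable class centers.
toy_model = nn.Linear(4, 3)
toy_centers = nn.ParameterList([nn.Parameter(torch.randn(3, 4))])

# Concatenate both parameter lists so one optimizer updates everything.
combined = list(toy_model.parameters()) + list(toy_centers.parameters())
toy_optimizer = torch.optim.SGD(combined, lr=0.1)

num_optimized = sum(len(group["params"]) for group in toy_optimizer.param_groups)
print(num_optimized)  # 3: the linear layer's weight and bias, plus the centers
```

Because both sets live in one parameter group, they share the same learning rate, which is why the training loop rescales the center gradients separately.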

In [None]:
from center_loss import CenterLoss

# Note: you already should have initialized your model a few cells above, but you can reinitialize it here if you want.

# TODO: Initialize center loss function (see the readme and/or center_loss.py)
center_loss_function = None

# TODO: Concatenate your model and center loss function parameters so that you can just use one optimizer;
#       this will look like list(model.parameters()) + list(center_loss_function.parameters())
parameters = None

# TODO: Initialize optimizer with combined parameters.
optimizer = None

# TODO: (Optional) Initialize a scheduler if you want it!
scheduler = None

### BEGIN SOLUTION
center_loss_function = CenterLoss(num_classes=4000, feat_dim=512, device=DEVICE)
parameters = list(model.parameters()) + list(center_loss_function.parameters())
optimizer = torch.optim.Adam(parameters, lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=0)
### END SOLUTION

## Question 4.2: Run training!

Now to run training! Begin by training for 8 epochs and see how it does. We encourage you to generate test results for section 5 as soon and as often as possible, so you can guarantee you'll at least earn some points before the final deadline!

In [None]:
# TODO: Call your training routine for some epochs
num_epochs = 8
# losses, val_accuracies = train() # TODO: Finish me!

### BEGIN SOLUTION
losses, val_accuracies = train(model, optimizer, classification_train_dataloader, \
                               classification_val_dataloader, verification_val_dataloader, \
                               num_epochs, scheduler, center_loss_function)
### END SOLUTION

# Section 5: Generating Test Predictions for Submitting!

Now to submit! Run these cells below and look for the output file generated in the `submissions/` folder (it will be time-stamped).

**NOTE:** The first two rows of the CSV should look like this:

`Id,Category
verification_data/00020839.jpg verification_data/00035322.jpg,[some number]`

**Please remember to edit the second entry of the first row ('Category') to the name you want to appear on the leaderboard.** 

In [None]:
from utils import export_predictions_to_csv, generate_predictions

test_verification_dataset = VerificationDataset("./data/", "./data/verification_pairs_test.txt", True)
test_verification_dataloader = torch.utils.data.DataLoader(test_verification_dataset, batch_size=batch_size,
                                                           pin_memory=True, shuffle=False,
                                                           num_workers = num_workers)
similarities = generate_predictions(model, test_verification_dataloader, DEVICE)
export_predictions_to_csv("data/verification_pairs_test.txt", similarities)