# ICS-504 Deep learning Assignment 02

Face recognition can be categorized into face classification and face verification. Given an image of a person’s face, the task of classifying the ID of the face is known as face classification, which is a closed-set problem. The task of determining whether two face images are of the same person is known as face verification, which is an open-set problem1.

In this assignment, you will use Convolutional Neural Networks (CNNs) to design an end-to-end system for face classification/identification. Your system will be given an image as input and will output the ID/name of the person shown in that image.


You will train your model on a dataset with a few thousand of images (i.e., a set of images, each labeled by an ID that uniquely identifies the person). Use Jupyter notebook to show each of the following steps:

## Part 01 [80 points]

1. Prepare the data by

    1.1.Preprocessing the data by zero-centering it 
    
    1.2.Dividing the train data into train (80%) and validation (20%)


2. Implement the following model in Pytorch. The specifications of each layer are given under it.

In [1]:
# import libraries 
import os
import numpy as np
import matplotlib.pyplot as plt

import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms
import torch.nn as nn

from tqdm.auto import tqdm

# Loading the Dataset
load the data using ImageFolder and transforming and resize to 64x64

In [2]:
# Write transform for image
data_transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),# Resize the images to 64x64
    transforms.ToTensor() # Turn the image into a torch.Tensor
])

# loading training data
train_data = datasets.ImageFolder(root='./FaceDataset/train', # target folder of images
                                  transform=data_transform, # transforms to perform on data (images)
                                  target_transform=None) # transforms to perform on labels (if necessary)

# loading test data
test_data = datasets.ImageFolder(root='./FaceDataset/test', 
                                 transform=data_transform)

print(f"Train data:\n{train_data}\nTest data:\n{test_data}")

Train data:
Dataset ImageFolder
    Number of datapoints: 4298
    Root location: ./FaceDataset/train
    StandardTransform
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=None)
               ToTensor()
           )
Test data:
Dataset ImageFolder
    Number of datapoints: 100
    Root location: ./FaceDataset/test
    StandardTransform
Transform: Compose(
               Resize(size=(64, 64), interpolation=bilinear, max_size=None, antialias=None)
               ToTensor()
           )


we can see that we have 4298 training data and 100 test data

In [3]:
# Get class names as a list
class_dict = train_data.class_to_idx
class_dict

{'n000003': 0,
 'n000010': 1,
 'n000011': 2,
 'n000013': 3,
 'n000015': 4,
 'n000017': 5,
 'n000018': 6,
 'n000020': 7,
 'n000023': 8,
 'n000024': 9,
 'n000025': 10,
 'n000028': 11,
 'n000032': 12,
 'n000033': 13,
 'n000034': 14,
 'n000035': 15,
 'n000038': 16,
 'n000041': 17,
 'n000042': 18,
 'n000046': 19,
 'n000048': 20,
 'n000049': 21,
 'n000051': 22,
 'n000053': 23,
 'n000054': 24,
 'n000057': 25,
 'n000059': 26,
 'n000062': 27,
 'n000064': 28,
 'n000065': 29,
 'n000070': 30,
 'n000072': 31,
 'n000074': 32,
 'n000075': 33,
 'n000077': 34,
 'n000086': 35,
 'n000088': 36,
 'n000089': 37,
 'n000093': 38,
 'n000101': 39,
 'n000108': 40,
 'n000110': 41,
 'n000111': 42,
 'n000113': 43,
 'n000115': 44,
 'n000118': 45,
 'n000119': 46,
 'n000120': 47,
 'n000121': 48,
 'n000125': 49}

we can see that we have 50 classes

# Zero centering
1. 1st we need to calculate the mean and the standar deviation of the training dataset
2. reload the data using the mean and std

In [4]:
### train data
mean = 0.
std = 0.
for images, _ in train_data:
    mean += images.mean(dim=(1,2))
    std += images.std(dim=(1,2))

# Compute the mean and standard deviation of the pixel values for each channel
mean /= len(train_data)
std /= len(train_data)

print("train mean", mean)
print('train std', std)

# train mean tensor([0.4811, 0.4057, 0.3730])
# train std tensor([0.2601, 0.2350, 0.2283])

train mean tensor([0.4811, 0.4057, 0.3730])
train std tensor([0.2601, 0.2350, 0.2283])


In [5]:
data_transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),# Resize the images to 64x64
    transforms.ToTensor(), # Turn the image into a torch.Tensor
    transforms.Normalize(
        mean=mean,
        std=std
    )
])

# loading training data
train_data = datasets.ImageFolder(root='./FaceDataset/train', # target folder of images
                                  transform=data_transform, # transforms to perform on data (images)
                                  target_transform=None) # transforms to perform on labels (if necessary)

# loading test data
test_data = datasets.ImageFolder(root='./FaceDataset/test', transform=data_transform)

# Dividing the train data into train (80%) and validation (20%)
split the data and then load them into a dataloader

In [6]:
# Split the dataset into training and validation sets
train_subdata, val_subdata = random_split(train_data, [int(len(train_data)*0.8), len(train_data)-int(len(train_data)*0.8)])

# Setup batch size and number of workers 
BATCH_SIZE = 32
NUM_WORKERS = os.cpu_count()
print(f"Creating DataLoader's with batch size {BATCH_SIZE} and {NUM_WORKERS} workers.")

# Create DataLoader's
train_dataloader = DataLoader(train_subdata, 
                                     batch_size=BATCH_SIZE, 
                                     shuffle=True, 
                                     num_workers=NUM_WORKERS)

val_dataloader = DataLoader(val_subdata, 
                                     batch_size=BATCH_SIZE, 
                                     shuffle=True, 
                                     num_workers=NUM_WORKERS)

test_dataloader = DataLoader(test_data, 
                                    batch_size=BATCH_SIZE, 
                                    shuffle=False, 
                                    num_workers=NUM_WORKERS)

train_dataloader, test_dataloader, val_dataloader

Creating DataLoader's with batch size 32 and 8 workers.


(<torch.utils.data.dataloader.DataLoader at 0x21033eb1370>,
 <torch.utils.data.dataloader.DataLoader at 0x21033ecc310>,
 <torch.utils.data.dataloader.DataLoader at 0x21033ecc610>)

# Build the model


In [8]:
class CNNmodel(nn.Module):
    """
    Model architecture copying TinyVGG from: 
    https://poloclub.github.io/cnn-explainer/
    """
    def __init__(self, input_shape: int, output_shape: int) -> None:
        super().__init__()
        #1st block CNN with 32 filters size 7x7 strid 1 no padding with relu activation function
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape,out_channels=32,kernel_size=7, stride=1, padding=0),
            nn.ReLU()
        )
        
        #2nd block CNN with 16 filters size 7x7 strid 1 no padding with relu activation function
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(in_channels=32,out_channels=16,kernel_size=7, stride=1, padding=0),
            nn.ReLU()
        )
        #3dr block pooling 2x2 and 2 stride
        self.conv_block_3 = nn.Sequential(
            nn.MaxPool2d(kernel_size=2,stride=2)
        )
        #4th block CNN 16 filters 5x5 stride 1 padding 1 with relu activation function
        self.conv_block_4 = nn.Sequential(
            nn.Conv2d(in_channels=16,out_channels=16,kernel_size=5, stride=1, padding=1),
            nn.ReLU()
        )
        #5th block dropout p=0.5
        self.conv_block_5 = nn.Sequential(
            nn.Dropout2d(p=0.5)
        )
        
        #6th block CNN 16 filters 5x5 strid 1 padding 0 with relu activation function
        self.conv_block_6 = nn.Sequential(
            nn.Conv2d(in_channels=16,out_channels=16,kernel_size=5, stride=1, padding=0),
            nn.ReLU()
        )
        
        #6th block batch normalization
        self.conv_block_7 = nn.Sequential(
            nn.BatchNorm2d(16)
        )
        
        #7th block fully connected NN 
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # Where did this in_features shape come from? 
            # It's because each layer of our network compresses and changes the shape of our inputs data.
            nn.Linear(in_features=6400,out_features=1024),
            nn.ReLU(),
            nn.Linear(in_features=1024,out_features=output_shape)
        )
        
        #8th block is included in the cross entropy 
    
    def forward(self, x: torch.Tensor):
        x = self.conv_block_1(x)
        # print(x.shape)
        x = self.conv_block_2(x)
        # print(x.shape)
        x = self.conv_block_3(x)
        # print(x.shape)
        x = self.conv_block_4(x)
        # print(x.shape)
        x = self.conv_block_5(x)
        # print(x.shape)
        x = self.conv_block_6(x)
        # print(x.shape)
        x = self.classifier(x)
        # print(x.shape)
        return x
        


torch.manual_seed(42)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CNNmodel(input_shape=3, # number of color channels (3 for RGB) 
                  output_shape=len(train_data.classes)).to(device)
model

CNNmodel(
  (conv_block_1): Sequential(
    (0): Conv2d(3, 32, kernel_size=(7, 7), stride=(1, 1))
    (1): ReLU()
  )
  (conv_block_2): Sequential(
    (0): Conv2d(32, 16, kernel_size=(7, 7), stride=(1, 1))
    (1): ReLU()
  )
  (conv_block_3): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv_block_4): Sequential(
    (0): Conv2d(16, 16, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
  )
  (conv_block_5): Sequential(
    (0): Dropout2d(p=0.5, inplace=False)
  )
  (conv_block_6): Sequential(
    (0): Conv2d(16, 16, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU()
  )
  (conv_block_7): Sequential(
    (0): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=6400, out_features=1024, bias=True)
    (2): ReLU()
    (3): Linear(in_features=1024, out_features=50, bias=True)
  )
)

## testing the model using singal image

In [9]:
# 1. Get a batch of images and labels from the DataLoader
img_batch, label_batch = next(iter(train_dataloader))

# 2. Get a single image from the batch and unsqueeze the image so its shape fits the model
img_single, label_single = img_batch[0].unsqueeze(dim=0), label_batch[0]
print(f"Single image shape: {img_single.shape}\n")

# 3. Perform a forward pass on a single image
model.eval()
with torch.inference_mode():
    pred = model(img_single.to(device))
    
# 4. Print out what's happening and convert model logits -> pred probs -> pred label
print(f"Output logits:\n{pred}\n")
print(f"Output prediction probabilities:\n{torch.softmax(pred, dim=1)}\n")
print(f"Output prediction label:\n{torch.argmax(torch.softmax(pred, dim=1), dim=1)}\n")
print(f"Actual label:\n{label_single}")

Single image shape: torch.Size([1, 3, 64, 64])

Output logits:
tensor([[-0.0491, -0.0155, -0.0406,  0.0256,  0.0178,  0.0076, -0.0121, -0.0138,
         -0.0032, -0.0183, -0.0100, -0.0030,  0.0226,  0.0338,  0.0060,  0.0187,
         -0.0204,  0.0063, -0.0081, -0.0262,  0.0133,  0.0255, -0.0254, -0.0411,
         -0.0019, -0.0282,  0.0033,  0.0263,  0.0025,  0.0329,  0.0096, -0.0201,
          0.0206, -0.0208, -0.0350,  0.0014, -0.0232, -0.0104,  0.0148, -0.0030,
         -0.0093,  0.0148, -0.0082,  0.0350,  0.0224,  0.0181, -0.0396, -0.0071,
          0.0233,  0.0006]])

Output prediction probabilities:
tensor([[0.0191, 0.0197, 0.0192, 0.0206, 0.0204, 0.0202, 0.0198, 0.0198, 0.0200,
         0.0197, 0.0198, 0.0200, 0.0205, 0.0207, 0.0202, 0.0204, 0.0196, 0.0202,
         0.0199, 0.0195, 0.0203, 0.0205, 0.0195, 0.0192, 0.0200, 0.0195, 0.0201,
         0.0206, 0.0201, 0.0207, 0.0202, 0.0196, 0.0204, 0.0196, 0.0193, 0.0201,
         0.0196, 0.0198, 0.0203, 0.0200, 0.0198, 0.0203, 0.0199,

# training function

In [10]:
def train_step(model: torch.nn.Module, 
               dataloader: torch.utils.data.DataLoader, 
               loss_fn: torch.nn.Module, 
               optimizer: torch.optim.Optimizer):
    # Put model in train mode
    model.train()
    
    # Setup train loss and train accuracy values
    train_loss, train_acc = 0, 0
    
    # Loop through data loader data batches
    for batch, (X, y) in enumerate(dataloader):
        # Send data to target device
        X, y = X.to(device), y.to(device)

        # 1. Forward pass
        y_pred = model(X)

        # 2. Calculate  and accumulate loss
        loss = loss_fn(y_pred, y)
        train_loss += loss.item() 

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()

        # 5. Optimizer step
        optimizer.step()

        # Calculate and accumulate accuracy metric across all batches
        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item()/len(y_pred)

    # Adjust metrics to get average loss and accuracy per batch 
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    return train_loss, train_acc

In [11]:
def test_step(model: torch.nn.Module, 
              dataloader: torch.utils.data.DataLoader, 
              loss_fn: torch.nn.Module):
    # Put model in eval mode
    model.eval() 
    
    # Setup test loss and test accuracy values
    test_loss, test_acc = 0, 0
    
    # Turn on inference context manager
    with torch.inference_mode():
        # Loop through DataLoader batches
        for batch, (X, y) in enumerate(dataloader):
            # Send data to target device
            X, y = X.to(device), y.to(device)
    
            # 1. Forward pass
            test_pred_logits = model(X)

            # 2. Calculate and accumulate loss
            loss = loss_fn(test_pred_logits, y)
            test_loss += loss.item()
            
            # Calculate and accumulate accuracy
            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += ((test_pred_labels == y).sum().item()/len(test_pred_labels))
            
    # Adjust metrics to get average loss and accuracy per batch 
    test_loss = test_loss / len(dataloader)
    test_acc = test_acc / len(dataloader)
    return test_loss, test_acc

In [17]:
# 1. Take in various parameters required for training and test steps
def train(model: torch.nn.Module, 
          train_dataloader: torch.utils.data.DataLoader, 
          test_dataloader: torch.utils.data.DataLoader, 
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module = nn.CrossEntropyLoss(),
          epochs: int = 5):
    
    # 2. Create empty results dictionary
    results = {"train_loss": [],
        "train_acc": [],
        "test_loss": [],
        "test_acc": []
    }
    
    # 3. Loop through training and testing steps for a number of epochs
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer)
        test_loss, test_acc = test_step(model=model,
            dataloader=test_dataloader,
            loss_fn=loss_fn)
        
        # 4. Print out what's happening
        print(
            f"Epoch: {epoch+1} | "
            f"train_loss: {train_loss:.4f} | "
            f"train_acc: {train_acc:.4f} | "
            f"test_loss: {test_loss:.4f} | "
            f"test_acc: {test_acc:.4f}"
        )

        # 5. Update results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)

    # 6. Return the filled results at the end of the epochs
    return results

In [None]:
# Set random seeds
torch.manual_seed(42) 
torch.cuda.manual_seed(42)

# Set number of epochs
NUM_EPOCHS = 60

# Recreate an instance of TinyVGG
model = CNNmodel(input_shape=3, # number of color channels (3 for RGB) 
                  output_shape=len(train_data.classes)).to(device)

# Setup loss function and optimizer
loss_fn = nn.CrossEntropyLoss() #include softmax
optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)

# Start the timer
from timeit import default_timer as timer 
start_time = timer()

# Train model_0 
model_results = train(model=model, 
                        train_dataloader=train_dataloader,
                        test_dataloader=test_dataloader,
                        optimizer=optimizer,
                        loss_fn=loss_fn, 
                        epochs=NUM_EPOCHS)

# End the timer and print out how long it took
end_time = timer()
print(f"Total training time: {end_time-start_time:.3f} seconds")

model_results

  0%|          | 0/60 [00:00<?, ?it/s]

Epoch: 1 | train_loss: 3.8573 | train_acc: 0.0379 | test_loss: 3.7794 | test_acc: 0.0312
