# **Facial Expression Recognition Competition (15%)**
For this competition, we will use the a facial classification(https://cloudstor.aarnet.edu.au/plus/s/8J44RsLu7uyRzhd) dataset. The data consists of 48x48 pixel grayscale images of faces. The faces have been automatically registered so that the face is more or less centred and occupies about the same amount of space in each image. You can download the CSV from this link (https://drive.google.com/file/d/1B_3ABybPrJKSkGJNSSJwctQijYOHcJZu/view)

The task is to categorize each face based on the emotion shown in the facial expression into one of seven categories (0: Angry, 1: Disgust, 2: Fear, 3: Happy, 4: Sad, 5: Surprise, 6: Neutral). The training set consists of 28,709 examples and the public test set consists of 3,589 examples.

We provide baseline code that includes the following features:

*   Loding and Analysing the FER-2013 dataset using torchvision.
*   Defining a simple convolutional neural network.
*   How to use existing loss function for the model learning.
*   Train the network on the training data.
*   Test the trained network on the testing data.
*   Generate prediction for the random test image(s).

The following changes could be considered:
-------
1. Change of advanced training parameters: Learning Rate, Optimizer, Batch-size, Number of Max Epochs, and Drop-out.
2. Use of a new loss function.
3. Data augmentation
4. Architectural Changes: Batch Normalization, Residual layers, Attention Block, and other varients.

Marking Rules:
-------
We will mark the competition based on the final test accuracy on testing images and your report.

Final mark (out of 50) = acc_mark + efficiency mark + report mark
###Acc_mark 10:

We will rank all the submission results based on their test accuracy. Zero improvement over the baseline yields 0 marks. Maximum improvement over the baseline will yield 10 marks. There will be a sliding scale applied in between.

###Efficiency mark 10:

Efficiency considers not only the accuracy, but the computational cost of running the model (flops: https://en.wikipedia.org/wiki/FLOPS). Efficiency for our purposes is defined to be the ratio of accuracy (in %) to Gflops. Please report the computational cost for your final model and include the efficiency calculation in your report. Maximum improvement over the baseline will yield 10 marks. Zero improvement over the baseline yields zero marks, with a sliding scale in between.

###Report mark 30:
Your report should comprise:
1. An introduction showing your understanding of the task and of the baseline model: [10 marks]

2. A description of how you have modified aspects of the system to improve performance. [10 marks]

A recommended way to present a summary of this is via an "ablation study" table, eg:

|Method1|Method2|Method3|Accuracy|
|---|---|---|---|
|N|N|N|60%|
|Y|N|N|65%|
|Y|Y|N|77%|
|Y|Y|Y|82%|

3. Explanation of the methods for reducing the computational cost and/or improve the trade-off between accuracy and cost: [5 marks]

4. Limitations/Conclusions: [5 marks]


In [1]:
##################################################################################################################################
### Subject: Computer Vision
### Year: 2023
### Student Name: Hugh, Adrian
### Student ID: a1829716, a1777746
### Comptetion Name: Facial Expression Recognition/Classification
### Final Results:
### ACC:         FLOPs:
##################################################################################################################################

In [2]:
# Importing libraries.

import torch
import torchvision
import tarfile
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# To avoid non-essential warnings
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline
from tqdm import tqdm
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor
from torchvision.utils import make_grid
from torch.utils.data import Dataset, DataLoader

In [3]:
# Mounting G-Drive to get your dataset.
# To access Google Colab GPU; Go To: Edit >>> Network Settings >>> Hardware Accelarator: Select GPU.
# Reference: https://towardsdatascience.com/google-colab-import-and-export-datasets-eccf801e2971
from google.colab import drive
drive.mount('/content/drive')




Mounted at /content/drive


In [5]:
improt os
# # Dataset path.
# adrians_dir ='/content/drive//MyDrive/Colab Notebooks/cv-assignment-4/fer2013.csv'
# data_directory ='/content/drive/MyDrive/Collab Notebooks/cv-assignment-4/fer2013.csv'
# # Dataset path exists
# os.path.exists('/content/drive/MyDrive/Colab Notebooks/cv-assignment-4/fer2013.csv')

NameError: ignored

In [None]:
# Reading the dataset file using Pandas read_csv function and print the first
# 5 samples.
#
# Reference: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
# data_df = pd.read_csv(adrians_dir)
data_df = pd.read_csv(data_directory)
data_df.head(4)

# Mapping of the Facial Expression Labels.
Labels = {
    0:'Angry',
    1:'Disgust',
    2:'Fear',
    3:'Happy',
    4:'Sad',
    5:'Surprise',
    6:'Neutral'
}

In [None]:
# Categorizing the dataset to three categories.
# Training: To train the model.
# PrivateTest: To test the train model; commonly known as Validation.
# PublicTest: To test the final model on Test set to check how your model perfomed. Do not use this data as your validation data.
train_df = data_df[data_df['Usage']=='Training']
valid_df = data_df[data_df['Usage']=='PublicTest']
test_df = data_df[data_df['Usage']=='PrivateTest']
print(train_df.head())
print(valid_df.head(-1))

In [None]:
# Test-check to see wether usage labels have been allocated to the dataset/not.
valid_df = valid_df.reset_index(drop=True)
test_df = test_df.reset_index(drop=True)
print(test_df.head())
print('   -----   -------    -------    --------     -----    -------')
print(valid_df.head())

In [None]:
# Preview of the training sample and associated labels.
def show_example(df, num):
    expression_index = int(df.loc[num, ['emotion']])
    print(expression_index)

    print('expression: ', Labels[expression_index])
    image = np.array([[int(i) for i in x.split()] for x in df.loc[num, ['pixels']]])
    image = image.reshape(48,48)
    plt.imshow(image, interpolation='nearest', cmap='gray')
    plt.show()

In [None]:
show_example(train_df, 107)
show_example(train_df, 343)

In [None]:
import random
import PIL

# Normalization of the train and validation data.
class expressions(Dataset):
    def __init__(self, df, transforms=None, augment_transforms=None, augment_size = 0.2):
        self.df = df.copy()
        self.df['pixels'] = self.df['pixels'].apply(lambda x: np.array(x.split(), dtype=np.uint8))

        self.augment_transforms = augment_transforms
        self.transforms = transforms

        # augment the dataset if specified
        if augment_transforms is not None:
            num_rows = len(self.df)
            num_rows_to_loop = int(augment_size * num_rows)  # % of the rows based on augment size
            row_indices = random.sample(range(num_rows), num_rows_to_loop)
            print('shape before augmentation:', self.df['pixels'].shape)
            augmented_data = []
            for index in row_indices:
                row = self.df.loc[index]
                image = np.array(row['pixels'], dtype=np.uint8).reshape(48, 48, 1)
                augmented_image = augment_transforms(PIL.Image.fromarray(image.squeeze(), mode='L'))
                augmented_image_np = np.array(augmented_image)
                augmented_row = {
                    'emotion': row['emotion'],
                    'pixels': augmented_image_np,
                    'Usage': row['Usage']
                }
                augmented_data.append(augmented_row)
            augmented_df = pd.DataFrame(augmented_data)
            # Concatenate the original and augmented_df
            self.df = pd.concat([self.df, augmented_df], ignore_index=True)
            print('shape after augmentation:', self.df['pixels'].shape)

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        row = self.df.loc[index]
        image = np.array(row['pixels'])
        label = row['emotion']
        image = np.asarray(image).reshape(48, 48, 1)

        # transform image as it is retreived
        if self.transforms:
            image = self.transforms(image)

        return image, label


In [None]:
#import albumentations as A
stats = ([0.5],[0.5])

In [None]:
train_tsfm = T.Compose([
    T.ToPILImage(),
    T.Grayscale(num_output_channels=1),
    T.ToTensor(),
    T.Normalize(*stats,inplace=True),
])
valid_tsfm = T.Compose([
    T.ToPILImage(),
    T.Grayscale(num_output_channels=1),
    T.ToTensor(),
    T.Normalize(*stats,inplace=True)
])

# NOTE: augmentations defined here
augment_tsfm = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomRotation(10),
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)), #  affine transormation includes rotation, translation, scaling, and shearing operations; degrees = 0 does not apply a rotation
    # T.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1) # randomly adjusts the brightness, contrast, saturation, and hue of the image.
])


In [None]:
valid_ds = expressions(valid_df, valid_tsfm)
test_ds = expressions(test_df, valid_tsfm)
train_ds = expressions(train_df, train_tsfm)

In [None]:
import math
def show_df_pixels(df, sample_size=10):
    images = df.sample(sample_size)['pixels'].values
    num_images = len(images)
    num_cols = 10
    num_rows = math.ceil(num_images / num_cols)

    fig, axs = plt.subplots(num_rows, num_cols, figsize=(12, 6))
    fig.subplots_adjust(hspace=0, wspace=0)

    for i, ax in enumerate(axs.flat):
        ax.imshow(np.array(images[i]).reshape(48, 48), cmap='gray')
        ax.axis('off')
        if i >= num_images - 1:
            break

    plt.show()

In [None]:

print('train set size', train_ds.df.shape)
print('augmented validation set size', valid_ds.df.shape)

print('non augmented df')
show_df_pixels(train_ds.df.tail(10),10)
print('augmented df')
show_df_pixels(train_ds_augmented.df.tail(10), 10)

In [None]:
# Evaluation metric - Accuracy in this case.
import torch.nn.functional as F
input_size = 48*48
output_size = len(Labels)

def accuracy(output, labels):
    predictions, preds = torch.max(output, dim=1)
    return torch.tensor(torch.sum(preds==labels).item()/len(preds))

In [7]:
# Expression model class for training and validation purpose.

class expression_model(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        acc = accuracy(out, labels)
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()
        batch_acc = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_acc).mean()
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch[{}], val_loss: {:.4f}, val_acc: {:.4f}".format(epoch, result['val_loss'], result['val_acc']))

In [8]:
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

device = get_default_device()
print(f'Running on device {device}') # cuda = gpu

Running on device cuda


In [9]:
def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

In [None]:
# Basic model - 1 layer
simple_model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(2, 2)
)

In [10]:
batch_size = 400
train_dataset = DataLoader(train_ds, batch_size, shuffle=True, num_workers=2)
train_dataset_augmented = DataLoader(train_ds_augmented, batch_size, shuffle=True, num_workers=2)
valid_dataset = DataLoader(valid_ds, batch_size, num_workers=2)
test_dataset = DataLoader(test_ds, batch_size, num_workers=2)

train_dl = DeviceDataLoader(train_dataset, device)
valid_dl = DeviceDataLoader(valid_dataset, device)
test_dl = DeviceDataLoader(test_dataset, device)

NameError: ignored

In [None]:
for images, labels in train_dl:
    print('images.shape:', images.shape)
    out = simple_model.to(device)(images)
    print('out.shape:', out.shape)
    break

In [11]:
# Model - 7 layer
class expression(expression_model):
    def __init__(self,classes=7):
        super().__init__()
        self.num_classes = classes
        self.network = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),  #(input channels, output channels)
            nn.ReLU(),
            nn.Conv2d(8, 32, kernel_size=3, padding=1),  #(input channels, output channels)
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2), # output: 64 x 24 x 24

            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2), # output: 128 x 12 x 12

            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2), # output: 256 x 6 x 6

            nn.Flatten(),
            nn.Linear(256*6*6, 2304),
            nn.ReLU(),
            nn.Linear(2304, 1152),
            nn.ReLU(),
            nn.Linear(1152, 576),
            nn.ReLU(),
            nn.Linear(576,288),
            nn.ReLU(),
            nn.Linear(288,144),
            nn.ReLU(),
            nn.Linear(144,self.num_classes))
            # 144 to 6


    def forward(self, xb):
        return self.network(xb)

In [16]:
model = expression()
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total trainable parameters: {total_params}")

total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params}")

Total trainable parameters: 25891863
Total parameters: 25891863


In [15]:
# !wget -c https://cloudstor.aarnet.edu.au/plus/s/hXo1dK9SZqiEVn9/download
# !mv download FLOPs_counter.py
# !rm -rf download
from FLOPs_counter import print_model_parm_flops
# default_input_size = torch.randn(1, 1, 48, 48) # model input size (N, C, H, W)
def computeFlops(model, input):
  print_model_parm_flops(model, input, detail=False)

ModuleNotFoundError: ignored

In [None]:
# Plots for accuracy and loss during training period.
def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs')
    plt.show()

def plot_losses(history):
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs')
    plt.show()

In [None]:
"""This function is responsible for evaluating the model's performance on a validation dataset"""
@torch.no_grad()
def evaluate(model, valid_dl):
    model.eval() # disable any regularization techniques that might affect the evaluation
    outputs = [model.validation_step(batch) for batch in valid_dl]
    return model.validation_epoch_end(outputs)

def train(epochs, lr, model, train_dl, valid_dl, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase
        model.train()
        train_losses = []
        for batch in train_dl:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, valid_dl)
        print(f'epoch {epoch} result {result}')
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)
    return history

In [None]:
from torch.cuda.amp import GradScaler, autocast
from torch.optim.lr_scheduler import StepLR
""" improved training function with optional parameters """
# train_gpu(num_epochs, lr, model_3, train_dl, valid_dl, opt_fn, stop_on_convergence=False, lr_scheduler=True, display_running_loss=True, convergence_threshold=0.001)
def train_gpu(epochs, lr, model, train_dl, valid_dl, opt_func, stop_on_convergence=True, convergence_threshold=0.001, display_running_loss=True):
    history = []
    optimizer = opt_func(model.parameters(), 0.001) if lr is None else opt_func(model.parameters(), lr)
    last_loss = None
    scaler = GradScaler()
    for epoch in range(epochs):
        model.train()
        running_loss = 0
        train_losses = []
        epoch_quarter = len(train_dl) // 4
        if lr is None:
          scheduler = StepLR(optimizer, step_size=7, gamma=0.1) # LR scheduler

        for i, batch in enumerate(train_dl):
            # Move batch to GPU
            inputs, targets = batch
            inputs = inputs.to('cuda')
            targets = targets.to('cuda')

            # Forward and backward pass with mixed precision
            optimizer.zero_grad()
            with torch.cuda.amp.autocast():
                loss = model.training_step((inputs, targets))
                scaler.scale(loss).backward()

            scaler.step(optimizer)
            scaler.update()

            running_loss += loss.item()
            train_losses.append(loss.detach())

            if display_running_loss and (i+1) % epoch_quarter == 0:  # print culmulative sum of losses every quarter epoch
              print(f'Running loss: {running_loss}')
              running_loss = 0

        # learning rate scheduling
        if lr is None:
          scheduler.step()

        # Validation
        result = evaluate(model, valid_dl)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)

         # Stop condition
        if stop_on_convergence and last_loss is not None and abs(last_loss - result['val_loss']) < convergence_threshold:
            print(f'stopping training as convergence threshold reached after {epoch} epochs with loss {last_loss}')
            break
        last_loss = result['val_loss']
    return history

In [None]:
def getDataLoaders(train_ds, valid_ds, test_ds, batch_size):
  train_dataset = DataLoader(train_ds, batch_size, shuffle=True, num_workers=2)
  valid_dataset = DataLoader(valid_ds, batch_size, num_workers=2)
  test_dataset = DataLoader(test_ds, batch_size, num_workers=2)

  train_dl = DeviceDataLoader(train_dataset, device)
  valid_dl = DeviceDataLoader(valid_dataset, device)
  test_dl = DeviceDataLoader(test_dataset, device)
  return train_dl, valid_dl, test_dl

In [None]:
def evaluateModel(model_,history_, test_dl, batch_size):
  res = evaluate(model_, test_dl) # final result
  print(f"test accuracy: {res['val_acc']}")
  input = torch.randn(batch_size, 1, 48, 48)
  input = input.to(device)
  computeFlops(model_, input)
  plot_accuracies(history_)
  plot_losses(history_)

### Changes to consider
- [x] Learning Rate
- [x] Optimizer (Nadam is used but could trial other ones)
- [x] Batch-size
- [x] Number of Max Epochs
- [x] Dropout
- [ ] Use of a new loss function (requires creating new evaluate function for new loss function)
- [x] Data augmentation
- [x] Architectural Changes:
  - [x] Batch Normalization
  - [x] Residual layers
- [ ] Use Pretrained Models: Transfer learning

In [None]:
# Train 1: find best learning rates
opt_fn=torch.optim.Adam
batch_size = 256
train_dl, valid_dl, test_dl = getDataLoaders(train_ds, valid_ds, test_ds, batch_size)
num_epochs = 15

learning_rates = [None, 0.001, 0.01]
train_0_histories = []

for idx, lr in enumerate(learning_rates):
    model_name = f'model_{idx}'
    print(f'training {model_name} with lr {lr}')
    model = to_device(expression(classes = 7), device)
    model_history = train_gpu(num_epochs, lr, model, train_dl, valid_dl, opt_fn, stop_on_convergence=True)
    train_0_histories.append([model.to(device), model_history])

In [None]:
for i, lr in enumerate(learning_rates):
    print(f'evaluating model {i} with lr {lr}')
    model  = train_0_histories[i][0]
    history = train_0_histories[i][1]
    evaluateModel(model, history, test_dl, batch_size)

### Model 0: tuning Learning Rate
Learning rate scheduling scored the highest accuracy at 56%

efficiency = accuracy % / Gflops
# Learning rate summary
|model|lr|epochs needed to converge|test_accuracy|flops|efficiency|
|--------|---------|--|------|-------|----|
| model0 |scheduler|14|0.5636|13.28G |4.25|
| model1 |0.001    |8 |0.5443|13.28G |4.09|
| model2 |0.01     |7 |0.2550|13.28G |1.92|



In [None]:
# Train 2: find best batch size
opt_fn=torch.optim.Adam
batch_size = 256
num_epochs = 15

learning_rate = None # lr scheduling
batch_sizes = [64,128,256]
train_1_histories = []

for idx, batch_size in enumerate(batch_sizes):
    model_name = f'model_{idx}'
    print(f'training {model_name} with batch size {batch_size}')
    train_dl, valid_dl, test_dl = getDataLoaders(train_ds, valid_ds, test_ds, batch_size)
    model = to_device(expression(classes = 7), device)
    model_history = train_gpu(num_epochs,learning_rate, model, train_dl, valid_dl, opt_fn, stop_on_convergence=True)
    train_1_histories.append([model.to(device), model_history])

In [None]:
for i, bs in enumerate(batch_sizes):
    print(f'evaluating model {i} with batch size {bs}')
    model  = train_1_histories[i][0]
    history = train_1_histories[i][1]
    evaluateModel(model, history, test_dl, bs)

### Train 2: tuning batch size
A batch size of 128 seems  strikes a good balance in terms of accuracy

|model|batch size|epochs|test_accuracy|flops|efficiency|
|--------|---|--|------|-------|-----|
| model0 |64 |15|0.5592|3.77G  |14.83|
| model1 |128|15|0.5908|6.94G  |08.50|
| model2 |256|15|0.5530|13.28G |04.16|



In [None]:
# Train 3: choosing best loss function
num_epochs = 15
learning_rate = None
loss_functions = [torch.optim.Adam, torch.optim.SGD, torch.optim.RMSprop]
train_3_histories = []

for idx, loss_fn in enumerate(loss_functions):
    model_name = f'model_{idx}'
    print(f'training {model_name} with loss function {loss_fn}')
    train_dl, valid_dl, test_dl = getDataLoaders(train_ds, valid_ds, test_ds, batch_size)
    model = to_device(expression(classes = 7), device)
    model_history = train_gpu(num_epochs,learning_rate, model, train_dl, valid_dl, opt_fn, stop_on_convergence=True)
    train_3_histories.append([model.to(device), model_history])

In [None]:
for i, loss_fn in enumerate(loss_functions):
    print(f'evaluating model {i} with loss function {loss_fn}')
    model  = train_3_histories[i][0]
    history = train_3_histories[i][1]
    evaluateModel(model, history, test_dl, batch_size)

### Train 3: choosing best optimizer
Both SGD and adam provide best optimization with similar performance, however Adam appears to be more stable as it has less fluctuations and will be used

|model|optimizer|epochs|test_accuracy|flops|efficiency|
|--------|---|--|------|-------|-----|
| model0 |Adam   |15|0.5700|6.94G  |8.22|
| model1 |SGD    |15|0.5751|6.94G  |8.29|
| model2 |RMSprop|15|0.5488|6.94G |7.92|



# changing neural network architecture
model has been altered to include Batch Normalization after each convolution and Dropout after each max pooling. The dropout rate of 0.5 means that approximately half of the neurons in the layer will be "turned off" and outputs are set to 0during each forward pass. This should help prevent overfitting and improve generalisation

In [17]:
class expressionsModified(expression_model):
    def __init__(self, classes=7):
        super().__init__()
        self.num_classes = classes
        self.network = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), #(input channels, output channels)
            nn.BatchNorm2d(8),
            nn.ReLU(),
            nn.Conv2d(8, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2), # output: 64 x 24 x 24
            nn.Dropout(0.25),

            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2), # output: 128 x 12 x 12
            nn.Dropout(0.25),

            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, 2), # output: 256 x 6 x 6
            nn.Dropout(0.25),

            nn.Flatten(),
            nn.Linear(256*6*6, 2304),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(2304, 1152),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(1152, 576),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(576,288),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(288,144),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(144,self.num_classes)
        )

    def forward(self, xb):
        return self.network(xb)

In [18]:
model = expressionsModified()
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total trainable parameters: {total_params}")

total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params}")

Total trainable parameters: 25893607
Total parameters: 25893607


In [None]:
# Train 4: training on modified arhictecture with dropout and batch normalization applied
num_epochs = 40
batch_size = 128
learning_rate = None
loss_fn = torch.optim.Adam
train_dl, valid_dl, test_dl = getDataLoaders(train_ds, valid_ds, test_ds, batch_size)
model_4 = to_device(expressionsModified(classes = 7), device)
model_4_history = train_gpu(num_epochs,learning_rate, model_4, train_dl, valid_dl, opt_fn, stop_on_convergence=True)

In [None]:
evaluateModel(model_4, model_4_history, test_dl, batch_size)

In [None]:
# Train 5: training best performing model using data augmentation with of size 150% of the training dataset
train_ds_augmented = expressions(train_df, train_tsfm, augment_tsfm, augment_size = 0.5)
num_epochs = 60
batch_size = 128
learning_rate = None
loss_fn = torch.optim.Adam
train_dl, valid_dl, test_dl = getDataLoaders(train_ds_augmented, valid_ds, test_ds, batch_size)
model_5 = to_device(expressions(classes = 7), device)
model_5_history = train_gpu(num_epochs,learning_rate, model_5, train_dl, valid_dl, opt_fn, stop_on_convergence=True)

In [None]:
evaluateModel(model_5, model_5_history, test_dl, batch_size)

In [None]:
# Train 6: training best performing model using data augmentation with of size 150% of the training dataset
num_epochs = 20
batch_size = 128
learning_rate = None
loss_fn = torch.optim.Adam
train_dl, valid_dl, test_dl = getDataLoaders(train_ds_augmented, valid_ds, test_ds, batch_size)
model_6 = to_device(expressionsModified(classes = 7), device)
model_6_history = train_gpu(num_epochs,learning_rate, model_6, train_dl, valid_dl, opt_fn, stop_on_convergence=True)

In [None]:
evaluateModel(model_6, model_6_history, test_dl, batch_size)

Best_model = model_5 (model with the highest accuracy)

| Model   | Architecture | Epochs | Train Accuracy | FLOPs | Efficiency | Augmented Dataset |
|---------|--------------|--------|----------------|-------|------------|-------------------|
| model4  | Modified     | 40     | 0.3858         | 6.94G | 5.56       | N                 |
| model5  | Baseline     | 17     | 0.6057         | 6.94G | 8.73       | Y                 |
| model6  | Modified     | 25     | 0.5554         | 6.94G | 8.00       | Y                 |


In [None]:
# Prediction function to evaluate the model.
def predict_image(img, model):
    xb = img.unsqueeze(0)
    yb = model(xb)
    _, preds  = torch.max(yb, dim=1)
    return Labels[preds[0].item()]

In [None]:

best_model = model_5
img, label = test_ds[0]
plt.imshow(img[0], interpolation='nearest', cmap='gray')
img = img.to(device)
print('Label:', Labels[label], ', Predicted:', predict_image(img, best_model))

In [None]:
# making 10 predictions
img_index = 110
for i in range(10):
  img, label = test_ds[img_index + i]
  plt.imshow(img[0], interpolation='nearest', cmap='gray')
  plt.show()  # Display the image
  img = img.to(device)
  print('Label:', Labels[label], ', Predicted:', predict_image(img, best_model), '\n')

In [None]:
!sudo apt-get install texlive-xetex texlive-fonts-recommended
!jupyter nbconvert --to pdf --no-input --TagRemovePreprocessor.remove_cell_tags 'remove-cell' '/content/drive/MyDrive/Collab Notebooks/cv-assignment-4/Assignment4.ipynb'
!jupyter nbconvert  --to pdf '/content/drive/MyDrive/Collab Notebooks/cv-assignment-3/Assignment3.ipynb'