## 0. Notebook description

Based on the findings of the previous notebooks, this notebook uses 6 convolutional layers. It also uses **average pooling** in all layers except the first. No changes were made to any other parameters.

## 1. Importing libraries and loading the data

In [21]:
# Import Libraries
import os
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import transforms
from torch.utils.tensorboard import SummaryWriter
from sklearn.metrics import classification_report
from utils.preprocessing import load_data, ReshapeAndScale, create_dataloaders
from utils.fer2013_dataset import Fer2013Dataset

First, we load the training and validation data into pandas DataFrames.

In [22]:
train_df = load_data("data/oversampled_train.csv")
val_df = load_data("data/validation.csv")

print("Training Data")
print(train_df.head(10))

print("\n\nValidation Data")
print(val_df.head(10))

   emotion                                             pixels
0        0  70 80 82 72 58 58 60 63 54 58 60 48 89 115 121...
1        0  151 150 147 155 148 133 111 140 170 174 182 15...
2        2  231 212 156 164 174 138 161 173 182 200 106 38...
3        4  24 32 36 30 32 23 19 20 30 41 21 22 32 34 21 1...
4        6  4 0 0 0 0 0 0 0 0 0 0 0 3 15 23 28 48 50 58 84...
5        2  55 55 55 55 55 54 60 68 54 85 151 163 170 179 ...
6        4  20 17 19 21 25 38 42 42 46 54 56 62 63 66 82 1...
7        3  77 78 79 79 78 75 60 55 47 48 58 73 77 79 57 5...
8        3  85 84 90 121 101 102 133 153 153 169 177 189 1...
9        2  255 254 255 254 254 179 122 107 95 124 149 150...


In [23]:
print("Training data shape:", train_df.shape)
print("Validation data shape:", val_df.shape)

(28709, 2)

## 2. Define a custom dataset
We define a custom PyTorch dataset class, `Fer2013Dataset`, for handling the FER2013 data. The dataset is designed to load images (stored as pixel strings) and their corresponding emotion labels. It also supports optional transformations to preprocess the images during training. This setup makes it easy to integrate the dataset with PyTorch DataLoaders.

The class is defined in the `utils/fer2013_dataset` module.

In [26]:
# Define a default transformation pipeline
transform = transforms.Compose([
    ReshapeAndScale(n_rows=48, n_cols=48),
    transforms.Normalize(mean=[0.5], std=[0.5])  # Normalize to [-1, 1]
])
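
To make the effect of this pipeline concrete, the sketch below reproduces its arithmetic in plain Python on a tiny 2x2 image. It assumes that `ReshapeAndScale` parses the space-separated pixel string, divides by 255, and reshapes to `n_rows` by `n_cols`; that behaviour is not shown in this notebook, and the helper names `reshape_and_scale` and `normalize` are hypothetical stand-ins. `transforms.Normalize(mean=[0.5], std=[0.5])` computes `(x - 0.5) / 0.5`, which maps the range [0, 1] to [-1, 1].

```python
def reshape_and_scale(pixels, n_rows, n_cols):
    """Hypothetical stand-in for ReshapeAndScale: parse, scale to [0, 1], reshape."""
    values = [int(p) / 255.0 for p in pixels.split()]
    if len(values) != n_rows * n_cols:
        raise ValueError(f"expected {n_rows * n_cols} pixels, got {len(values)}")
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def normalize(image, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize for a single channel."""
    return [[(v - mean) / std for v in row] for row in image]


# Tiny 2x2 example instead of a full 48x48 image
image = normalize(reshape_and_scale("0 255 51 204", n_rows=2, n_cols=2))
print(image)
```

A pixel value of 0 ends up at -1 and a value of 255 at 1, so the inputs are centred around zero before they reach the first convolution.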

In [27]:
train_dataset = Fer2013Dataset(train_df, train_df['emotion'], transform=transform)
val_dataset = Fer2013Dataset(val_df, val_df['emotion'], transform=transform)

### Create DataLoaders

We create DataLoaders for both subsets to enable batch processing. The training DataLoader shuffles the data for better learning, while the validation DataLoader does not. Finally, we print the shapes of the batches to verify that everything works correctly.
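
As a rough check on what the loaders should yield, the number of batches follows directly from the dataset size and the batch size. The sketch below assumes that `create_dataloaders` keeps the last partial batch (the PyTorch `DataLoader` default, `drop_last=False`); the helper `batch_sizes` is hypothetical, and 10101 is the total validation support shown in the classification report in section 6.

```python
import math


def batch_sizes(n_samples, batch_size):
    """Sizes of the batches a loader yields when the last partial batch is kept."""
    n_batches = math.ceil(n_samples / batch_size)
    sizes = [batch_size] * n_batches
    if n_samples % batch_size:
        sizes[-1] = n_samples % batch_size  # final, smaller batch
    return sizes


sizes = batch_sizes(10101, 32)
print(len(sizes), sizes[-1])
```

Under that assumption, the final batch is smaller than 32, so code that consumes the loader should not hard-code the batch dimension.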

In [28]:
batch_size = 32
train_loader, val_loader = create_dataloaders(train_dataset, val_dataset, batch_size)

Train Batch Shape: torch.Size([32, 1, 48, 48]) Train Labels Shape: torch.Size([32])

## 3. Define the CNN model

We define a custom Convolutional Neural Network (CNN) for emotion recognition. The model consists of an initial convolutional layer followed by two residual blocks, with batch normalization, dropout for regularization, average pooling for downsampling, and fully connected layers for classification.

The network dynamically calculates the flattened size needed for the fully connected layers based on the input size (48x48 grayscale images). Finally, we instantiate the model, move it to the available device (CPU or GPU), and print its architecture for verification.
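
The flattened size can also be derived by hand from the layer settings in the `CNN` class, which is a useful cross-check on the dummy forward pass. The sketch below applies the standard output-size formula `floor((n + 2p - k) / s) + 1` to each stage; the helper `out_size` is introduced here for illustration only.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output side length of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


size = 48
size = out_size(size, kernel=3)             # conv1, padding='valid' -> 46
size = out_size(size, kernel=3, padding=1)  # res_block1 convs keep the size -> 46
size = out_size(size, kernel=2, stride=2)   # pool1 -> 23
size = out_size(size, kernel=3, padding=1)  # res_block2 convs keep the size -> 23
size = out_size(size, kernel=2, stride=2)   # pool2 -> 11 (23 is odd, last row/col dropped)

flatten_size = 128 * size * size            # 128 channels after res_block2
print(size, flatten_size)  # 11 15488
```

With these settings `fc1` receives 15488 input features, which is what `_get_flatten_size` should return for a 48x48 input.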


In [29]:
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Shortcut connection
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        residual = self.shortcut(x)
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.bn2(self.conv2(x))
        x += residual
        return F.relu(x)

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()

        # 1st Conv Layer
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding='valid')
        self.bn1 = nn.BatchNorm2d(32)
        self.dropout1 = nn.Dropout(0.25)

        # 2nd stage: first residual block, followed by average pooling
        self.res_block1 = ResidualBlock(32, 64, stride=1)
        self.pool1 = nn.AvgPool2d(kernel_size=2, stride=2)

        # 3rd stage: second residual block, followed by average pooling
        self.res_block2 = ResidualBlock(64, 128, stride=1)
        self.pool2 = nn.AvgPool2d(kernel_size=2, stride=2)

        # Calculate the flatten size dynamically
        self.flatten_size = self._get_flatten_size()

        # Fully Connected Layers
        self.fc1 = nn.Linear(self.flatten_size, 250)
        self.dropout_fc = nn.Dropout(0.5)
        self.fc2 = nn.Linear(250, 7)  # 7 classes for emotion recognition

    def _get_flatten_size(self):
        dummy_input = torch.zeros(1, 1, 48, 48)
        dummy_output = self._forward_conv_layers(dummy_input)
        return dummy_output.numel()

    def _forward_conv_layers(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.dropout1(x)

        x = self.res_block1(x)
        x = self.pool1(x)

        x = self.res_block2(x)
        x = self.pool2(x)

        return x

    def forward(self, x):
        x = self._forward_conv_layers(x)
        x = x.view(x.size(0), -1)  # Flatten
        x = F.relu(self.fc1(x))
        x = self.dropout_fc(x)
        x = self.fc2(x)
        return x

# Instantiate the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CNN()
model = model.to(device)

# Print model to verify layers
print(model)

CNN(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=valid)
  (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout1): Dropout(p=0.25, inplace=False)
  (res_block1): ResidualBlock(
    (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (shortcut): Sequential(
      (0): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (pool1): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (res_block2): ResidualBlock(
    (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affi

## 4. Define Loss Function and Optimizer

In this cell, we define the loss function and optimizer for training the model:
- **Loss Function**: `CrossEntropyLoss` is used, which is well-suited for multi-class classification tasks like emotion recognition.
- **Optimizer**: The Adam optimizer is initialized with a learning rate of `0.0001` to update the model parameters during training.
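
For reference, the quantity that `CrossEntropyLoss` computes for a single sample is `-log(softmax(logits)[target])`; the PyTorch module takes raw logits and, by default, averages this value over the batch. The following is a minimal plain-Python sketch of the per-sample formula, not the library implementation, and the function name `cross_entropy` is chosen here for illustration.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Seven classes, as in FER2013
uniform = cross_entropy([0.0] * 7, target=3)
confident = cross_entropy([0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0], target=3)
print(f"{uniform:.4f}")
print(f"{confident:.4f}")
```

A model that assigns equal scores to all seven classes has a loss of ln 7, roughly 1.946, which is a handy baseline for judging the first values printed during training.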

In [30]:
criterion = nn.CrossEntropyLoss()  # Loss function for classification
optimizer = optim.Adam(model.parameters(), lr=0.0001)  # Adam optimizer, learning rate 1e-4

## 5. Train the Model

In this cell, we define the training loop for the CNN:
- **Number of Epochs**: The model is trained for 35 epochs.
- **Training Process**:
  - The model is set to training mode.
  - For each batch, we move inputs and labels to the appropriate device, clear the gradients, perform forward and backward passes, and update the model's parameters using the optimizer.
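  - The running loss is averaged, printed, and logged to TensorBoard every 100 batches for monitoring.
  - A checkpoint with the model and optimizer state is saved every 10 epochs.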
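
The running-loss bookkeeping can be isolated from the training loop to see exactly which batches are reported. The sketch below is plain Python with a made-up constant loss; `report_points` is a hypothetical helper, and 1263 is the number of steps per epoch shown in the training log.

```python
def report_points(batch_losses, log_every=100):
    """Return (step, average loss) pairs, one per completed window of batches."""
    points, running = [], 0.0
    for i, loss in enumerate(batch_losses):
        running += loss
        if (i + 1) % log_every == 0:
            points.append((i + 1, running / log_every))
            running = 0.0
    return points, running  # `running` holds the sum of the unreported tail


# Constant loss of 1.5 for illustration only
points, leftover = report_points([1.5] * 1263)
print(len(points), points[-1][0], leftover)
```

With 1263 batches, the last 63 batches of every epoch never reach a reporting boundary, so the value left in the accumulator at the end of an epoch is a partial sum rather than an average.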

In [31]:
# Initialize TensorBoard writer
writer = SummaryWriter(log_dir='facial-expression-detection/10_Group17_DLProject')

checkpoint_path = 'model-checkpoints/10_Group17_DLProject'
os.makedirs(checkpoint_path, exist_ok=True)

num_epochs = 35
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader):
        inputs, labels = inputs.to(device), labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if (i + 1) % 100 == 0:  # Print every 100 batches
            avg_loss = running_loss / 100
            print(f"Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{len(train_loader)}], Loss: {avg_loss:.4f}")
            running_loss = 0.0

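            # Log loss to TensorBoard with a global batch step,
            # so points within the same epoch do not share one x-value
            writer.add_scalar('Loss/train', avg_loss, epoch * len(train_loader) + i + 1)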

    # Save checkpoint every 10 epochs
    if (epoch + 1) % 10 == 0:
        checkpoint = {
            'epoch': epoch + 1,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss.item(),  # loss of the last batch; running_loss is only a partial sum here
        }
        torch.save(checkpoint, f"{checkpoint_path}/checkpoint_epoch_{epoch + 1}.pth")
        print(f"Checkpoint saved at epoch {epoch + 1}")

# Close the writer after training
writer.close()

Epoch [1/35], Step [100/1263], Loss: 1.9440
Epoch [1/35], Step [200/1263], Loss: 1.8463
Epoch [1/35], Step [300/1263], Loss: 1.7925
Epoch [1/35], Step [400/1263], Loss: 1.7204
Epoch [1/35], Step [500/1263], Loss: 1.6912
Epoch [1/35], Step [600/1263], Loss: 1.6227
Epoch [1/35], Step [700/1263], Loss: 1.6159
Epoch [1/35], Step [800/1263], Loss: 1.5517
Epoch [1/35], Step [900/1263], Loss: 1.5237
Epoch [1/35], Step [1000/1263], Loss: 1.5067
Epoch [1/35], Step [1100/1263], Loss: 1.4825
Epoch [1/35], Step [1200/1263], Loss: 1.4540
Epoch [2/35], Step [100/1263], Loss: 1.4060
Epoch [2/35], Step [200/1263], Loss: 1.3897
Epoch [2/35], Step [300/1263], Loss: 1.3666
Epoch [2/35], Step [400/1263], Loss: 1.3628
Epoch [2/35], Step [500/1263], Loss: 1.3433
Epoch [2/35], Step [600/1263], Loss: 1.3362
Epoch [2/35], Step [700/1263], Loss: 1.3292
Epoch [2/35], Step [800/1263], Loss: 1.3171
Epoch [2/35], Step [900/1263], Loss: 1.3280
Epoch [2/35], Step [1000/1263], Loss: 1.3077
Epoch [2/35], Step [1100/126

KeyboardInterrupt: 

## 6. Evaluate the Model

In this cell, we evaluate the trained model using the validation dataset:
- The model is set to evaluation mode, and gradient computation is disabled.
- For each batch, we perform a forward pass and predict the class labels.
- Ground truth labels and predictions are stored and used to generate a classification report using `sklearn`. This report provides precision, recall, and F1-scores for each emotion class.
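
To clarify how the numbers in the report are derived, the sketch below computes per-class precision, recall, and F1 from two label lists in plain Python. It is an illustration of the definitions, not a replacement for `sklearn.metrics.classification_report`; the function name `per_class_report` and the toy labels are made up for this example.

```python
from collections import Counter


def per_class_report(y_true, y_pred, labels):
    """Precision, recall and F1 per class from two label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    report = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    return report


# Toy labels for three classes
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
report = per_class_report(toy_true, toy_pred, labels=[0, 1, 2])
for c, (p, r, f) in report.items():
    print(f"class {c}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Precision answers how many predictions of a class were correct, recall answers how many true samples of that class were found, and F1 is their harmonic mean.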

In [12]:
# Switch model to evaluation mode
model.eval()

# Initialize lists to store ground truth and predictions
y_true, y_pred = [], []

# Disable gradient computation
with torch.no_grad():
    for inputs, labels in val_loader:
        # Move inputs and labels to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)

        # Forward pass
        outputs = model(inputs)

        # Get the predicted class
        _, predicted = torch.max(outputs, 1)

        # Append ground truth and predictions to respective lists
        y_true.extend(labels.cpu().numpy())  # Convert tensors to numpy
        y_pred.extend(predicted.cpu().numpy())

# Generate the classification report
emotion_labels = {0:'Angry', 1:'Disgust', 2:'Fear', 3:'Happy', 4: 'Sad', 5: 'Surprise', 6: 'Neutral'}
print(classification_report(y_true, y_pred, target_names=list(emotion_labels.values())))


              precision    recall  f1-score   support

       Angry       0.81      0.84      0.82      1483
     Disgust       0.99      1.00      0.99      1430
        Fear       0.80      0.82      0.81      1413
       Happy       0.87      0.78      0.82      1432
         Sad       0.75      0.75      0.75      1463
    Surprise       0.91      0.94      0.93      1468
     Neutral       0.77      0.76      0.76      1412

    accuracy                           0.84     10101
   macro avg       0.84      0.84      0.84     10101
weighted avg       0.84      0.84      0.84     10101



Using six convolutional layers with average pooling in all but the first layer gives a slight improvement, reaching a validation accuracy of 0.84 (macro-averaged F1 of 0.84). This configuration outperforms both the five- and seven-layer models from the previous notebooks. Disgust (F1 0.99) and Surprise (F1 0.93) are recognized best, while Sad (F1 0.75) and Neutral (F1 0.76) remain the hardest classes. The results suggest that six layers with average pooling improve generalization without adding unnecessary complexity.