In [1]:
# TODO: Change documentation after training: Conclusion

## 0. Notebook description

In this project, the standard convolutional layers were replaced with Residual Blocks. Residual Blocks help mitigate the vanishing gradient problem by allowing gradients to flow through shortcut connections directly. This architecture enables the training of deeper networks, improving the model's ability to learn complex features and enhancing overall performance.

Residual Blocks introduce shortcut connections that bypass one or more layers. These connections add the input of the block directly to the output. In a typical Residual Block, the input is passed through a series of convolutional layers, and the output of these layers is added to the original input.
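
In symbols, the block described above can be written as follows. The names $\mathcal{F}$ (the convolutional path) and $\mathcal{S}$ (the shortcut) are notation introduced here for illustration; they do not appear in the code.

$$
\mathbf{y} = \mathrm{ReLU}\big(\mathcal{F}(\mathbf{x}) + \mathcal{S}(\mathbf{x})\big)
$$

When the input and output shapes match, $\mathcal{S}$ is the identity, and the derivative of the sum before the final activation is

$$
\frac{\partial}{\partial \mathbf{x}}\big(\mathcal{F}(\mathbf{x}) + \mathbf{x}\big) = \frac{\partial \mathcal{F}}{\partial \mathbf{x}} + \mathbf{I}
$$

The identity term $\mathbf{I}$ is the direct gradient path mentioned above. When the channel count changes, as in the `ResidualBlock` class defined later in this notebook, $\mathcal{S}$ is a 1x1 convolution followed by batch normalization rather than the identity.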

## 1. Importing libraries and loading the data

In [2]:
# Import Libraries
import os
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import transforms
from torch.utils.tensorboard import SummaryWriter
from sklearn.metrics import classification_report
from utils.preprocessing import load_data, ReshapeAndScale, create_dataloaders
from utils.fer2013_dataset import Fer2013Dataset

First, we load the training and validation data into pandas DataFrames.

In [3]:
train_df = load_data("data/oversampled_train.csv")
val_df = load_data("data/validation.csv")

print("Training Data")
print(train_df.head(10))

print("\n\nValidation Data")
print(val_df.head(10))

Training Data
                                              pixels  emotion
0  208 209 210 209 209 208 206 215 162 47 44 47 5...        3
1  164 166 170 168 168 169 171 172 173 172 172 17...        2
2  246 248 248 248 244 124 45 37 45 98 141 160 16...        0
3  1 0 0 1 1 1 2 3 2 3 3 9 10 7 6 7 9 10 16 16 22...        4
4  208 201 211 209 196 192 177 177 187 186 193 19...        0
5  4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 ...        2
6  255 255 255 255 255 255 255 255 253 252 221 12...        3
7  93 100 115 90 93 70 84 66 64 76 76 87 92 98 98...        3
8  219 206 211 219 217 222 204 197 198 205 192 13...        6
9  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 ...        0


Validation Data
   emotion                                             pixels
0        3  254 253 253 254 254 254 252 252 253 252 253 25...
1        4  92 70 64 65 62 64 91 139 167 181 186 187 191 1...
2        5  20 19 16 23 24 25 34 24 25 60 113 123 139 149 ...
3        4  79 82 83 84 85 89 92 90 93

In [4]:
print("Training data shape:", train_df.shape)
print("Validation data shape", val_df.shape)

Training data shape: (40404, 2)
Validation data shape (5742, 2)


## 2. Define a custom dataset
We define a custom PyTorch dataset class, `Fer2013Dataset`, for handling the FER2013 data. The dataset is designed to load images (stored as pixel strings) and their corresponding emotion labels. It also supports optional transformations to preprocess the images during training. This setup makes it easy to integrate the dataset with PyTorch DataLoaders.

The class is contained in the file `utils/fer2013_dataset.py`.

We also apply the following preprocessing steps to convert the data into the desired format we can further work with:
1. **Reshaping**:
   - Convert the pixel string into a `48x48` matrix for visualization and processing.
2. **Scaling**:
   - Scale pixel values to the range `[0, 1]` by dividing the pixel values by 255.
3. **Normalization**:
   - Normalize pixel values to the range `[-1, 1]` by subtracting the mean (0.5) and dividing by the standard deviation (0.5).


Reshaping and scaling are handled by the class `ReshapeAndScale`, which is available in `utils/preprocessing.py`; normalization is applied afterwards with `transforms.Normalize`.
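
The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's implementation: the function name `reshape_scale_normalize` is hypothetical, and the real pipeline presumably works on tensors rather than nested lists. The order of scaling and reshaping does not affect the resulting values.

```python
def reshape_scale_normalize(pixel_string, n_rows=48, n_cols=48, mean=0.5, std=0.5):
    """Turn a FER2013 pixel string into a normalized n_rows x n_cols nested list."""
    values = [int(v) for v in pixel_string.split()]
    if len(values) != n_rows * n_cols:
        raise ValueError(f"expected {n_rows * n_cols} pixels, got {len(values)}")
    # Scale to [0, 1], then normalize: (x - mean) / std maps [0, 1] to [-1, 1]
    normalized = [((v / 255.0) - mean) / std for v in values]
    # Reshape the flat list into rows of n_cols values each
    return [normalized[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Synthetic image that alternates between the darkest and brightest pixel value
pixel_string = " ".join(["0", "255"] * (48 * 48 // 2))
image = reshape_scale_normalize(pixel_string)
print(len(image), len(image[0]), image[0][0], image[0][1])  # 48 48 -1.0 1.0
```

With a mean and standard deviation of 0.5, a pixel value of 0 maps to -1 and a value of 255 maps to 1, which is the range stated above.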

In [5]:
# Define a default transformation pipeline
transform = transforms.Compose([
    ReshapeAndScale(n_rows=48, n_cols=48),
    transforms.Normalize(mean=[0.5], std=[0.5])  # Normalize to [-1, 1]
])

In [6]:
train_dataset = Fer2013Dataset(train_df, train_df['emotion'], transform=transform)
val_dataset = Fer2013Dataset(val_df, val_df['emotion'], transform=transform)

### Create DataLoaders

Here, we wrap the two datasets in DataLoaders to enable batch processing. The training and validation sets come from separate files (`data/oversampled_train.csv` and `data/validation.csv`), so no additional split is performed in this notebook. The training DataLoader shuffles the data for better learning, while the validation DataLoader does not.


In [7]:
batch_size = 32
train_loader, val_loader = create_dataloaders(train_dataset, val_dataset, batch_size)
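
The body of `create_dataloaders` lives in `utils/preprocessing.py` and is not shown in this notebook. The sketch below is an assumption about what such a helper might look like, based only on the description above (shuffled training loader, unshuffled validation loader). The name `create_dataloaders_sketch` and the zero-filled stand-in datasets are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def create_dataloaders_sketch(train_dataset, val_dataset, batch_size):
    """Hypothetical stand-in for utils.preprocessing.create_dataloaders."""
    # Shuffle only the training data; keep validation order fixed
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
    return train_loader, val_loader


# Zero-filled stand-in data with the same shape as 48x48 grayscale images
train_ds = TensorDataset(torch.zeros(100, 1, 48, 48), torch.zeros(100, dtype=torch.long))
val_ds = TensorDataset(torch.zeros(40, 1, 48, 48), torch.zeros(40, dtype=torch.long))

train_loader_demo, val_loader_demo = create_dataloaders_sketch(train_ds, val_ds, batch_size=32)

# Inspect one batch to check the shapes the model will receive
images, labels = next(iter(train_loader_demo))
print(images.shape, labels.shape)
```

Inspecting a single batch like this is a quick way to confirm that the loaders deliver tensors of shape `(batch_size, 1, 48, 48)` before training starts.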

## 3. Define the CNN model

We define a custom Convolutional Neural Network (CNN) for emotion recognition. The model consists of an initial convolutional layer followed by two Residual Blocks with batch normalization, dropout for regularization, average pooling for downsampling, and fully connected layers for classification.

The network dynamically calculates the flattened size needed for the fully connected layers based on the input size (48x48 grayscale images). Finally, the model is instantiated, moved to the available device (CPU or GPU), and its architecture is printed for verification.
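
The flattened size that the network computes with a dummy forward pass can also be derived by hand. The sketch below applies the standard output-size formula for convolution and pooling layers to the layer settings used in the model; the helper name `conv_output_size` is introduced here for illustration.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 48                                                # input height and width
size = conv_output_size(size, kernel_size=3)             # conv1, padding='valid' -> 46
size = conv_output_size(size, kernel_size=3, padding=1)  # res_block1 keeps the size -> 46
size = conv_output_size(size, kernel_size=2, stride=2)   # pool1 -> 23
size = conv_output_size(size, kernel_size=3, padding=1)  # res_block2 keeps the size -> 23
size = conv_output_size(size, kernel_size=2, stride=2)   # pool2 -> 11

flatten_size = 128 * size * size  # 128 output channels of res_block2
print(flatten_size)  # 15488
```

Computing the size dynamically, as the model does, avoids having to redo this arithmetic whenever a layer setting changes.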


In [8]:
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Shortcut connection
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        residual = self.shortcut(x)
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.bn2(self.conv2(x))
        x += residual
        return F.relu(x)

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()

        # 1st Conv Layer
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding='valid')
        self.bn1 = nn.BatchNorm2d(32)
        self.dropout1 = nn.Dropout(0.25)

        # 1st Residual Block (32 -> 64 channels)
        self.res_block1 = ResidualBlock(32, 64, stride=1)
        self.pool1 = nn.AvgPool2d(kernel_size=2, stride=2)

        # 2nd Residual Block (64 -> 128 channels)
        self.res_block2 = ResidualBlock(64, 128, stride=1)
        self.pool2 = nn.AvgPool2d(kernel_size=2, stride=2)

        # Calculate the flatten size dynamically
        self.flatten_size = self._get_flatten_size()

        # Fully Connected Layers
        self.fc1 = nn.Linear(self.flatten_size, 250)
        self.dropout_fc = nn.Dropout(0.5)
        self.fc2 = nn.Linear(250, 7)  # 7 classes for emotion recognition

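    def _get_flatten_size(self):
        # Push a dummy 48x48 grayscale image through the conv stack to measure its output size
        with torch.no_grad():
            dummy_input = torch.zeros(1, 1, 48, 48)
            dummy_output = self._forward_conv_layers(dummy_input)
        return dummy_output.numel()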

    def _forward_conv_layers(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.dropout1(x)

        x = self.res_block1(x)
        x = self.pool1(x)

        x = self.res_block2(x)
        x = self.pool2(x)

        return x

    def forward(self, x):
        x = self._forward_conv_layers(x)
        x = x.view(x.size(0), -1)  # Flatten
        x = F.relu(self.fc1(x))
        x = self.dropout_fc(x)
        x = self.fc2(x)
        return x

# Instantiate the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CNN()
model = model.to(device)

# Print model to verify layers
print(model)

CNN(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=valid)
  (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout1): Dropout(p=0.25, inplace=False)
  (res_block1): ResidualBlock(
    (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (shortcut): Sequential(
      (0): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (pool1): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (res_block2): ResidualBlock(
    (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affi

## 4. Define Loss Function and Optimizer

In this cell, we define the loss function and optimizer for training the model:
- **Loss Function**: `CrossEntropyLoss` is used, which is well-suited for multi-class classification tasks like emotion recognition.
- **Optimizer**: The Adam optimizer is initialized with a learning rate of `0.0001` to update the model parameters during training.
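
To make the loss value easier to interpret, the sketch below computes the cross-entropy for a single sample in plain Python: the negative log of the softmax probability assigned to the true class. The function name `cross_entropy` is introduced here for illustration; `nn.CrossEntropyLoss` applies the same formula to raw logits and, by default, averages over the batch.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    max_logit = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


# A model that assigns equal scores to all 7 emotion classes
uniform_loss = cross_entropy([0.0] * 7, target=3)
print(round(uniform_loss, 4))  # 1.9459
```

A model that guesses uniformly over 7 classes has a loss of ln(7), about 1.946, so a training loss close to that value at the very start of training indicates near-random predictions.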

In [9]:
criterion = nn.CrossEntropyLoss()  # Loss function for classification
optimizer = optim.Adam(model.parameters(), lr=0.0001)  # call optimizer

## 5. Train the Model

In this cell, we define the training loop for the CNN:
- **Number of Epochs**: The model is trained for 35 epochs.
- **Training Process**:
  - The model is set to training mode.
  - For each batch, we move inputs and labels to the appropriate device, clear the gradients, perform forward and backward passes, and update the model's parameters using the optimizer.
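  - The running loss is tracked and printed every 100 batches for monitoring, and the same averaged value is logged to TensorBoard.
- **Saving**: A checkpoint (model and optimizer state) is written every 10 epochs, and the final model weights are saved once training finishes.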
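
The step counts that appear in the training log follow directly from the dataset size and the batch size. The short calculation below assumes the training DataLoader keeps the last, smaller batch (`drop_last=False`, the PyTorch default); the variable names are chosen here for illustration.

```python
import math

num_train_samples = 40404  # rows in the training DataFrame
batch_size = 32
num_epochs = 35
log_every = 100

steps_per_epoch = math.ceil(num_train_samples / batch_size)  # assumes drop_last=False
reports_per_epoch = steps_per_epoch // log_every    # loss lines printed per epoch
unreported_batches = steps_per_epoch % log_every    # trailing batches never printed
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, reports_per_epoch, unreported_batches, total_steps)
# 1263 12 63 44205
```

Each epoch therefore prints 12 loss values, and the last 63 batches of every epoch contribute to the weight updates but not to any printed average.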

In [10]:
# Initialize TensorBoard writer
writer = SummaryWriter(log_dir='tensorboard-runs/10_Group17_DLProject')

checkpoint_path = 'model-checkpoints/10_Group17_DLProject'
os.makedirs(checkpoint_path, exist_ok=True)

model_save_path = 'models'
os.makedirs(model_save_path, exist_ok=True)

num_epochs = 35
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader):
        inputs, labels = inputs.to(device), labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if (i + 1) % 100 == 0:  # Print every 100 batches
            avg_loss = running_loss / 100
            print(f"Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{len(train_loader)}], Loss: {avg_loss:.4f}")
            running_loss = 0.0

            # Log loss to TensorBoard, using the global batch count as the step
            writer.add_scalar('Loss/train', avg_loss, epoch * len(train_loader) + i + 1)

    # Save checkpoint every 10 epochs
    if (epoch + 1) % 10 == 0:
        checkpoint = {
            'epoch': epoch + 1,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss.item(),  # loss of the last batch in this epoch
        }
        torch.save(checkpoint, f"{checkpoint_path}/checkpoint_epoch_{epoch + 1}.pth")
        print(f"Checkpoint saved at epoch {epoch + 1}")

# Save the final model after training
final_model_path = os.path.join(model_save_path, '10_Group17_DLProject.pth')
torch.save(model.state_dict(), final_model_path)
print(f"Final model saved at {final_model_path}")

# Close the writer after training
writer.close()

Epoch [1/35], Step [100/1263], Loss: 1.9230
Epoch [1/35], Step [200/1263], Loss: 1.8306
Epoch [1/35], Step [300/1263], Loss: 1.7811
Epoch [1/35], Step [400/1263], Loss: 1.7017
Epoch [1/35], Step [500/1263], Loss: 1.6253
Epoch [1/35], Step [600/1263], Loss: 1.5966
Epoch [1/35], Step [700/1263], Loss: 1.5648
Epoch [1/35], Step [800/1263], Loss: 1.5390
Epoch [1/35], Step [900/1263], Loss: 1.4994
Epoch [1/35], Step [1000/1263], Loss: 1.4578
Epoch [1/35], Step [1100/1263], Loss: 1.4405
Epoch [1/35], Step [1200/1263], Loss: 1.4166
Epoch [2/35], Step [100/1263], Loss: 1.3363
Epoch [2/35], Step [200/1263], Loss: 1.3444
Epoch [2/35], Step [300/1263], Loss: 1.3592
Epoch [2/35], Step [400/1263], Loss: 1.2947
Epoch [2/35], Step [500/1263], Loss: 1.2827
Epoch [2/35], Step [600/1263], Loss: 1.2776
Epoch [2/35], Step [700/1263], Loss: 1.2905
Epoch [2/35], Step [800/1263], Loss: 1.2753
Epoch [2/35], Step [900/1263], Loss: 1.2548
Epoch [2/35], Step [1000/1263], Loss: 1.2643
Epoch [2/35], Step [1100/126
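
The checkpoints written during training are not loaded anywhere in this notebook. The sketch below shows one way they could be used to resume training, assuming the dictionary keys from the training cell (`epoch`, `model_state_dict`, `optimizer_state_dict`). The helper `load_checkpoint` and the tiny `nn.Linear` stand-in model are hypothetical and only serve to keep the example self-contained.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


def load_checkpoint(path, model, optimizer, device):
    """Restore model and optimizer state; return the epoch stored in the checkpoint."""
    checkpoint = torch.load(path, map_location=device, weights_only=True)
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    return checkpoint['epoch']


# Demo with a tiny stand-in model (not the notebook's CNN)
demo_device = torch.device("cpu")
demo_model = nn.Linear(4, 2)
demo_optimizer = optim.Adam(demo_model.parameters(), lr=0.0001)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint_epoch_10.pth")
    torch.save({
        'epoch': 10,
        'model_state_dict': demo_model.state_dict(),
        'optimizer_state_dict': demo_optimizer.state_dict(),
        'loss': 0.0,
    }, path)

    # A freshly initialized model and optimizer receive the saved state
    restored_model = nn.Linear(4, 2)
    restored_optimizer = optim.Adam(restored_model.parameters(), lr=0.0001)
    start_epoch = load_checkpoint(path, restored_model, restored_optimizer, demo_device)

print(start_epoch)  # 10
```

Because the stored value is `epoch + 1`, training could continue with `for epoch in range(start_epoch, num_epochs)` without repeating a completed epoch.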

## 6. Evaluate the Model

In this cell, we evaluate the trained model using the validation dataset:
- The model is set to evaluation mode, and gradient computation is disabled.
- For each batch, we perform a forward pass and predict the class labels.
- Ground truth labels and predictions are stored and used to generate a classification report using `sklearn`. This report provides precision, recall, and F1-scores for each emotion class.
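
As a reminder of what the report contains, the sketch below computes precision, recall, and F1-score for a single class from two label lists in plain Python. The function name `per_class_metrics` and the toy labels are made up for illustration; `classification_report` performs the equivalent computation for every class.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1-score for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with three classes
toy_true = [0, 0, 1, 1, 1, 2]
toy_pred = [0, 1, 1, 1, 2, 2]
print(per_class_metrics(toy_true, toy_pred, label=1))
```

Precision answers "of all samples predicted as this class, how many were correct", while recall answers "of all samples of this class, how many were found".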

In [11]:
model = CNN()
model = model.to(device)
model_path = 'models/10_Group17_DLProject.pth'
model.load_state_dict(torch.load(model_path, map_location=device, weights_only=True))
model.eval()

# Initialize lists to store ground truth and predictions
y_true, y_pred = [], []

# Disable gradient computation
with torch.no_grad():
    for inputs, labels in val_loader:
        # Move inputs and labels to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)

        # Forward pass
        outputs = model(inputs)

        # Get the predicted class
        _, predicted = torch.max(outputs, 1)

        # Append ground truth and predictions to respective lists
        y_true.extend(labels.cpu().numpy())  # Convert tensors to numpy
        y_pred.extend(predicted.cpu().numpy())

# Generate the classification report
emotion_labels = {0:'Angry', 1:'Disgust', 2:'Fear', 3:'Happy', 4: 'Sad', 5: 'Surprise', 6: 'Neutral'}
print(classification_report(y_true, y_pred, target_names=list(emotion_labels.values())))


              precision    recall  f1-score   support

       Angry       0.55      0.45      0.49       799
     Disgust       0.71      0.53      0.61        87
        Fear       0.40      0.50      0.44       820
       Happy       0.80      0.81      0.80      1443
         Sad       0.45      0.50      0.47       966
    Surprise       0.79      0.72      0.75       634
     Neutral       0.54      0.50      0.52       993

    accuracy                           0.59      5742
   macro avg       0.61      0.57      0.58      5742
weighted avg       0.60      0.59      0.60      5742
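
The two average rows of the report differ in how they treat class size. The sketch below recomputes both from the per-class values printed above. Because those inputs are already rounded to two decimals, a recomputed value can differ from the report in the last digit, so it is an approximation of the report rather than a replacement for it.

```python
# Per-class values copied from the report above
# (order: Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral)
precision = [0.55, 0.71, 0.40, 0.80, 0.45, 0.79, 0.54]
recall = [0.45, 0.53, 0.50, 0.81, 0.50, 0.72, 0.50]
f1 = [0.49, 0.61, 0.44, 0.80, 0.47, 0.75, 0.52]
support = [799, 87, 820, 1443, 966, 634, 993]

total_support = sum(support)

# Macro average: every class counts equally, regardless of its size
macro_precision = sum(precision) / len(precision)
macro_recall = sum(recall) / len(recall)
macro_f1 = sum(f1) / len(f1)

# Weighted average: each class is weighted by its number of samples
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / total_support

print(total_support, round(macro_precision, 2), round(macro_recall, 2), round(macro_f1, 2))
# 5742 0.61 0.57 0.58
```

The macro average gives the small Disgust class (87 samples) the same weight as Happy (1443 samples), whereas the weighted average is dominated by the larger classes.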



### Conclusion

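Using Residual Blocks in the convolutional neural network resulted in a validation accuracy of 0.59 (macro F1-score 0.58), which is lower than that of the previous architecture. Performance varies strongly by class: Happy (F1 0.80) and Surprise (F1 0.75) are recognized best, while Fear (F1 0.44) and Sad (F1 0.47) are recognized worst. This suggests that, for this specific emotion recognition task, the introduction of Residual Blocks did not improve the model's performance and may even have hindered it.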