<img src = "https://github.com/exponentialR/DL4CV/blob/main/media/BMC_Summer_Course_Deep_Learning_for_Computer_Vision.jpg?raw=true" alt='BMC Summer Course' width='300'/>

### BMC Summer Course: Deep Learning for Computer Vision, Transfer Learning Example

Author: Samuel A.

### Introduction

This notebook will guide you through the process of building a Facial Expression Recognition (FER) model using 
a pre-trained ResNet50 model. We will be leveraging transfer learning, a powerful technique that allows us to use 
a model trained on a large dataset and adapt it to our specific task with a smaller dataset.

We'll use the RAF-FACE dataset, which contains images of faces annotated with one of seven emotions: 
Surprise, Fear, Disgust, Happiness, Sadness, Anger, and Neutral. The goal is to train a model that can 
accurately classify these emotions from new images.



### Import necessary libraries

Before we start, let's import all the necessary libraries. These libraries will help us load and preprocess 
the data, build and train our model, and visualize the results.


In [34]:
import numpy as np
import pandas as pd
import os
import cv2
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import models
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
from torch.utils.tensorboard import SummaryWriter


In [36]:
# TensorBoard SummaryWriter
writer = SummaryWriter('runs/rafface-transfer-learning')

### Dataset Overview and Preparation


The RAF-FACE dataset consists of thousands of facial images annotated with emotions. 
We'll start by loading the image filenames and their corresponding labels from a provided text file.


In [15]:
# Set the directory paths
image_dir = 'data/face'  # Update this with the actual path where your images are stored
label_file = 'data/label.txt'  # Update this if your label file is in a different location

# Load the labels
labels_df = pd.read_csv(label_file, sep=" ", header=None, names=["filename", "label"])


In [16]:
print(f'Length of Dataset: {len(os.listdir(image_dir))}')

Length of Dataset: 15339


In [5]:
# Map the numerical labels to corresponding emotions
emotion_mapping = {
    1: "Surprise",
    2: "Fear",
    3: "Disgust",
    4: "Happiness",
    5: "Sadness",
    6: "Anger",
    7: "Neutral"
}

# Apply the mapping to the labels
labels_df['emotion'] = labels_df['label'].map(emotion_mapping)

# Preview the data
print("Data Preview:")
print(labels_df.head())


Data Preview:
          filename  label    emotion
0  train_00001.jpg      5    Sadness
1  train_00002.jpg      5    Sadness
2  train_00003.jpg      4  Happiness
3  train_00004.jpg      4  Happiness
4  train_00005.jpg      5    Sadness


The labels dataframe now contains the image filenames, the corresponding numeric labels, 
and the mapped emotion labels. This will help us organize and process our data effectively.
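
Before going further, it can help to check how many samples each emotion has, in case the classes are imbalanced. The sketch below uses a small hand-made DataFrame (`toy_df`, with hypothetical values rather than the real label file) that has the same columns as `labels_df`; `emotion_names` simply mirrors `emotion_mapping`.

```python
import pandas as pd

# Hypothetical stand-in for labels_df; the rows are illustrative only
toy_df = pd.DataFrame({
    "filename": ["train_00001.jpg", "train_00002.jpg", "train_00003.jpg", "test_0001.jpg"],
    "label": [5, 5, 4, 1],
})
emotion_names = {1: "Surprise", 2: "Fear", 3: "Disgust", 4: "Happiness",
                 5: "Sadness", 6: "Anger", 7: "Neutral"}
toy_df["emotion"] = toy_df["label"].map(emotion_names)

# Count samples per emotion (most frequent first)
class_counts = toy_df["emotion"].value_counts()
print(class_counts)

# Labels missing from the mapping would appear as NaN in the 'emotion' column
n_unmapped = int(toy_df["emotion"].isna().sum())
print(f"Unmapped labels: {n_unmapped}")
```

Running the same two checks on the real `labels_df` should reveal any label values that the mapping does not cover.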

### Custom Dataset Class 
In PyTorch, a custom `Dataset` class is the usual way to handle data loading. Here, we'll define one that reads images from disk, applies transformations, and returns each image together with its label.

In [28]:
class RAFDataset(Dataset):
    def __init__(self, labels_df, img_dir, transform=None):
        self.labels_df = labels_df
        self.img_dir = img_dir
        self.transform = transform
    
    def __len__(self):
        return len(self.labels_df)
    
    def __getitem__(self, idx):
        # Modify the filename to include '_aligned'
        base_filename = self.labels_df.iloc[idx, 0]
        aligned_filename = base_filename.replace('.jpg', '_aligned.jpg')
        
        img_name = os.path.join(self.img_dir, aligned_filename)
        image = cv2.imread(img_name)
        
        # Check if the image was loaded successfully
        if image is None:
            raise ValueError(f"Image at path {img_name} could not be loaded. Please check if the file exists and is accessible.")
        
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Convert BGR to RGB
        label = self.labels_df.iloc[idx, 1] - 1  # Adjust label to be 0-indexed
        
        if self.transform:
            image = self.transform(image)
        
        return image, label


The RAFDataset class will load an image and its corresponding label given an index. 
It also applies any specified transformations, which we'll define shortly.
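
Two small conventions are hidden inside `__getitem__`: the image files on disk carry an `_aligned` suffix, and the labels are shifted from 1..7 to 0..6. As a minimal sketch, they can be written as standalone helpers. The names `aligned_name` and `to_zero_indexed` are hypothetical, and `os.path.splitext` is used here instead of the class's `str.replace` so that only the extension is touched.

```python
import os

def aligned_name(filename):
    """Insert '_aligned' before the file extension."""
    stem, ext = os.path.splitext(filename)
    return f"{stem}_aligned{ext}"

def to_zero_indexed(label):
    """Shift a 1..7 label to the 0..6 range that PyTorch loss functions expect."""
    return int(label) - 1

print(aligned_name("train_00001.jpg"))
print(to_zero_indexed(5))
```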

### Data Transformations and Splitting the Dataset 
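We'll now define the transformations to be applied to the images. 
They resize each image to 224×224, convert it to a tensor, and normalize it with the ImageNet channel means and standard deviations that the pre-trained ResNet50 expects. No data augmentation is applied in this version. 
We'll then split the dataset into training, validation, and test sets.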

In [29]:
# Define the transformations
transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
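
To make the normalization step concrete: for each channel, `Normalize` computes `(x - mean) / std`. The NumPy sketch below reproduces that arithmetic on a random stand-in image and then inverts it, which is handy when you want to display a normalized tensor. The array `img` and its 4×4 size are illustrative assumptions.

```python
import numpy as np

# ImageNet channel statistics, reshaped to broadcast over (C, H, W)
mean = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
std = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

# Stand-in for a ToTensor() output: channel-first, values in [0, 1)
rng = np.random.default_rng(0)
img = rng.random((3, 4, 4))

normalized = (img - mean) / std      # what Normalize computes per channel
restored = normalized * std + mean   # inverse, e.g. before plotting

print(np.allclose(restored, img))
```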

In [27]:

# Separate the dataset into training and test sets based on filenames
train_labels_df = labels_df[labels_df['filename'].str.startswith('train')]
test_labels_df = labels_df[labels_df['filename'].str.startswith('test')]

# Create the dataset objects
train_dataset = RAFDataset(train_labels_df, image_dir, transform=transform)
test_dataset = RAFDataset(test_labels_df, image_dir, transform=transform)

# If you want to create a validation set from the training data, you can split the train_dataset
val_size = int(0.2 * len(train_dataset))  # e.g., 20% of the training data for validation
train_size = len(train_dataset) - val_size
train_dataset, val_dataset = random_split(train_dataset, [train_size, val_size])

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

At this point, we have defined our data transformations and created data loaders 
for the training, validation, and test sets. The data loaders will be used 
to feed the data into our model during training and evaluation.
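
Note that `random_split` draws a fresh random permutation on every run, so the validation set changes between runs unless a seed is fixed. The sketch below shows one way to make the split reproducible by passing a seeded `torch.Generator`. The `toy_dataset`, the `split` helper, and the seed value 42 are assumptions for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset with 100 samples standing in for the training set
toy_dataset = TensorDataset(
    torch.arange(100).float().unsqueeze(1),
    torch.zeros(100, dtype=torch.long),
)

val_size = int(0.2 * len(toy_dataset))
train_size = len(toy_dataset) - val_size

def split(seed):
    # A seeded generator makes the random permutation repeatable
    generator = torch.Generator().manual_seed(seed)
    return random_split(toy_dataset, [train_size, val_size], generator=generator)

train_a, val_a = split(42)
train_b, val_b = split(42)

print(len(train_a), len(val_a))
print(val_a.indices == val_b.indices)
```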

### Using a Pretrained Model (ResNet50)
Now, we'll load a pre-trained ResNet50 model. This model has been trained on the ImageNet dataset, 
which contains millions of images across thousands of classes. We'll leverage the features 
it has learned and adapt them to our facial expression recognition task.


In [30]:


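# Load the ResNet50 model with pre-trained ImageNet weights
# (torchvision >= 0.13 uses `weights=`; the older `pretrained=True` is deprecated)
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)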

# Freeze the layers of the ResNet50 model to retain the pre-trained weights
for param in resnet.parameters():
    param.requires_grad = False



Freezing the layers ensures that the pre-trained features are not modified during training. 
We'll only train the new layers that we'll add for our specific task.
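
A quick way to confirm what is frozen is to count parameters by their `requires_grad` flag. The sketch below does this on a tiny stand-in network (`toy_model`, an assumption used here to avoid downloading ResNet50): everything is frozen first, then the last layer is replaced, and the new layer is trainable by default.

```python
import torch.nn as nn

toy_model = nn.Sequential(
    nn.Linear(8, 4),   # stand-in for the backbone: 8*4 + 4 = 36 parameters
    nn.ReLU(),
    nn.Linear(4, 2),   # stand-in for the original head: 4*2 + 2 = 10 parameters
)

# Freeze every existing parameter
for param in toy_model.parameters():
    param.requires_grad = False

# Newly created modules have requires_grad=True by default
toy_model[2] = nn.Linear(4, 2)

trainable = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in toy_model.parameters() if not p.requires_grad)
print(f"Trainable: {trainable}, frozen: {frozen}")
```

The same two `sum(...)` lines can be run on `resnet` after its head has been replaced.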

### Building the Final Model
We need to modify the final layer of the ResNet50 model to output `7` classes 
instead of the original `1000` classes (from ImageNet). 
We'll replace the final fully connected layer with a new one that has 7 output units.

In [31]:
# Modify the final layer to match the number of emotion classes
num_features = resnet.fc.in_features
resnet.fc = nn.Sequential(
    nn.Linear(num_features, 512),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(512, 7),
    nn.LogSoftmax(dim=1)
)

The new final layer consists of a fully connected layer with 512 units, 
a ReLU activation function, a dropout layer for regularization, 
and a final fully connected layer with 7 output units and a LogSoftmax activation function.
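
As a sanity check, the head can be exercised on its own with random features. The sketch below rebuilds the same stack of layers as a standalone `head` and assumes 2048 input features, which is `fc.in_features` for ResNet50. The batch size of 4 and the `dummy_features` tensor are illustrative.

```python
import torch
import torch.nn as nn

num_features = 2048  # fc.in_features of ResNet50
head = nn.Sequential(
    nn.Linear(num_features, 512),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(512, 7),
    nn.LogSoftmax(dim=1),
)
head.eval()  # disable dropout for a deterministic forward pass

dummy_features = torch.randn(4, num_features)
with torch.no_grad():
    log_probs = head(dummy_features)

print(log_probs.shape)

# Exponentiating log-probabilities gives probabilities that sum to 1 per row
prob_sums = log_probs.exp().sum(dim=1)
print(prob_sums)
```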


### Setting Up Training
Before training the model, we'll define the loss function, optimizer, 
and a function for training and validating the model.


In [32]:
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(resnet.fc.parameters(), lr=0.001)

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
resnet.to(device)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

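The model will use CrossEntropyLoss as the loss function and Adam as the optimizer; only the parameters of the new head (`resnet.fc`) are passed to the optimizer. 
Note that CrossEntropyLoss applies log-softmax internally, so the `LogSoftmax` layer at the end of the head is redundant here. It does not change the loss value, but the conventional pairings are raw logits with CrossEntropyLoss, or `LogSoftmax` with `nn.NLLLoss`. 
We also move the model to a GPU if one is available.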
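
Because the head ends in `LogSoftmax`, it is worth checking how that interacts with CrossEntropyLoss. The sketch below compares three pairings on random values: CrossEntropyLoss on raw logits, `NLLLoss` on log-probabilities, and CrossEntropyLoss on log-probabilities (the combination used in this notebook). The `logits` and `targets` tensors are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(5, 7)                 # 5 samples, 7 classes
targets = torch.tensor([0, 3, 6, 2, 1])

log_probs = nn.LogSoftmax(dim=1)(logits)

ce_on_logits = nn.CrossEntropyLoss()(logits, targets)
nll_on_log_probs = nn.NLLLoss()(log_probs, targets)
# Applying log-softmax to values that are already log-probabilities
# leaves them unchanged, so this matches the other two
ce_on_log_probs = nn.CrossEntropyLoss()(log_probs, targets)

print(ce_on_logits.item(), nll_on_log_probs.item(), ce_on_log_probs.item())
```

All three values agree up to floating-point error, so the extra `LogSoftmax` costs a little computation but does not alter training.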

In [37]:
# TensorBoard - Add model graph to TensorBoard
images, labels = next(iter(train_loader))
images, labels = images.to(device), labels.to(device)
writer.add_graph(resnet, images)

### Training the Model
We will now define the functions for training and validating the model.

In [33]:
# Training and Validation Functions



def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=10):
    train_losses, val_losses = [], []
    
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item() * inputs.size(0)
        
        epoch_loss = running_loss / len(train_loader.dataset)
        train_losses.append(epoch_loss)
        
        # Validate the model
        model.eval()
        val_loss = 0.0
        correct = 0
        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.item() * inputs.size(0)
                
                _, preds = torch.max(outputs, 1)
                correct += torch.sum(preds == labels.data)
        
        val_loss = val_loss / len(val_loader.dataset)
        val_losses.append(val_loss)
        val_acc = correct.double() / len(val_loader.dataset)
        
        print(f"Epoch {epoch+1}/{num_epochs}, Training Loss: {epoch_loss:.4f}, Validation Loss: {val_loss:.4f}, Validation Accuracy: {val_acc:.4f}")
        
        # Log the losses and accuracy to TensorBoard
        writer.add_scalar('Loss/train', epoch_loss, epoch)
        writer.add_scalar('Loss/val', val_loss, epoch)
        writer.add_scalar('Accuracy/val', val_acc, epoch)
    
    return train_losses, val_losses






In [None]:
# Load the TensorBoard extension
%load_ext tensorboard

# Launch TensorBoard
%tensorboard --logdir=runs


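The train_model function handles both training and validation for each epoch. 
After each epoch, it prints the training loss, validation loss, and validation accuracy, and logs the same three values to TensorBoard.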
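
One detail in `train_model` is easy to miss: each batch loss is multiplied by the batch size before it is accumulated, and the total is divided by the dataset size. Since the loss function returns a per-batch mean by default, this gives a correct per-sample average even when the last batch is smaller. A minimal sketch with made-up numbers (`batch_losses` and `batch_sizes` are hypothetical):

```python
# Hypothetical per-batch mean losses; the last batch is smaller
batch_losses = [0.9, 0.7, 0.5]
batch_sizes = [32, 32, 8]

running_loss = 0.0
for loss, size in zip(batch_losses, batch_sizes):
    running_loss += loss * size  # turn the batch mean back into a batch sum

# Sample-weighted average, as computed in train_model
epoch_loss = running_loss / sum(batch_sizes)

# Averaging the batch means directly over-weights the small last batch
naive_mean = sum(batch_losses) / len(batch_losses)

print(f"weighted: {epoch_loss:.4f}, naive: {naive_mean:.4f}")
```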


In [None]:

# Train the model
train_losses, val_losses = train_model(resnet, train_loader, val_loader, criterion, optimizer, num_epochs=10)

# Flush pending events to disk; the writer is still needed for evaluation
writer.flush()

### Evaluating the Model
Once training is complete, we'll evaluate the model on the test set to see how well it generalizes to unseen data.

The evaluate_model function will compute the test loss and accuracy. 
It also stores all the predictions and true labels for further analysis.

In [None]:
def evaluate_model(model, test_loader, criterion):
    model.eval()
    test_loss = 0.0
    correct = 0
    
    all_preds = []
    all_labels = []
    
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            test_loss += loss.item() * inputs.size(0)
            
            _, preds = torch.max(outputs, 1)
            correct += torch.sum(preds == labels.data)
            
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
    
    test_loss = test_loss / len(test_loader.dataset)
    accuracy = correct.double() / len(test_loader.dataset)
    
    print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {accuracy:.4f}")
    
    # Log the test loss and accuracy to TensorBoard
    writer.add_scalar('Loss/test', test_loss)
    writer.add_scalar('Accuracy/test', accuracy)
    
    return all_labels, all_preds


In [None]:
# Evaluate the model and log the results
all_labels, all_preds = evaluate_model(resnet, test_loader, criterion)


### Performance Analysis

To understand the model's performance in more detail, we'll generate a classification report 
and a confusion matrix. These will show us how well the model distinguishes between different emotions.

In [None]:
# Classification Report
print("Classification Report:")
print(classification_report(all_labels, all_preds, target_names=[emotion_mapping[i+1] for i in range(7)]))

# Confusion Matrix
cm = confusion_matrix(all_labels, all_preds)

# Log the confusion matrix as an image to TensorBoard
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=[emotion_mapping[i+1] for i in range(7)],
            yticklabels=[emotion_mapping[i+1] for i in range(7)])
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')

# Convert Matplotlib plot to a tensor and add it to TensorBoard
plt.savefig('confusion_matrix.png')
image = plt.imread('confusion_matrix.png')
writer.add_image('Confusion Matrix', image, 0, dataformats='HWC')
plt.show()

# Close the TensorBoard writer
writer.close()

The classification report provides precision, recall, and F1-score for each emotion class. 
The confusion matrix visualizes the true vs. predicted labels, helping us identify any common misclassifications.
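
The per-class numbers in the classification report can be recovered from the confusion matrix itself, which makes the link between the two explicit. The sketch below uses a hypothetical 3-class matrix (`toy_cm`) laid out the way scikit-learn produces it, with rows as true labels and columns as predictions.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class
toy_cm = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
])

true_positives = np.diag(toy_cm)
recall = true_positives / toy_cm.sum(axis=1)     # correct / all samples of that true class
precision = true_positives / toy_cm.sum(axis=0)  # correct / all predictions of that class
f1 = 2 * precision * recall / (precision + recall)

print("recall:   ", recall)
print("precision:", precision)
print("f1:       ", f1)
```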



### Conclusion

In this notebook, we built a facial expression recognition model using transfer learning with PyTorch. 
By reusing a pre-trained ResNet50 as a frozen feature extractor, we only had to train a small classification head on the RAF-FACE data.

There are several ways we could further improve this model:
- Experiment with different pre-trained models like VGGFace.
- Fine-tune more layers of the pre-trained model instead of just the final layers.
- Apply more advanced data augmentation techniques.
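
As a rough sketch of the second suggestion, fine-tuning more layers usually means unfreezing a later block and giving it a smaller learning rate than the new head. The example below shows the mechanics with optimizer parameter groups on a toy network. `ToyNet`, its layer names, and the two learning rates are assumptions for illustration, not tuned values.

```python
import torch.nn as nn
import torch.optim as optim

class ToyNet(nn.Module):
    """Tiny stand-in for a backbone with an early block, a late block, and a head."""
    def __init__(self):
        super().__init__()
        self.early = nn.Linear(16, 16)
        self.late = nn.Linear(16, 8)
        self.fc = nn.Linear(8, 7)

    def forward(self, x):
        return self.fc(self.late(self.early(x)))

toy = ToyNet()

# Freeze everything, then unfreeze only the late block and the head
for param in toy.parameters():
    param.requires_grad = False
for module in (toy.late, toy.fc):
    for param in module.parameters():
        param.requires_grad = True

# One parameter group per unfrozen part, each with its own learning rate
toy_optimizer = optim.Adam([
    {"params": toy.late.parameters(), "lr": 1e-4},  # gentle updates for pre-trained weights
    {"params": toy.fc.parameters(), "lr": 1e-3},    # larger steps for the new head
])

group_lrs = [group["lr"] for group in toy_optimizer.param_groups]
print(group_lrs)
```

For the ResNet50 used here, the analogous choice would be to unfreeze `resnet.layer4` alongside `resnet.fc`.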

Feel free to explore these options and see how they affect the model's performance!

