## Facial Emotion Recognition using Transfer Learning with Inception V3

This notebook demonstrates how to build, train and evaluate a Convolutional Neural Network (CNN) for facial emotion recognition by leveraging transfer learning with the pre-trained Inception V3 architecture.

The model will learn to classify facial expressions into different emotion categories like happy, sad, angry, neutral, etc. using the power of deep learning and computer vision.

### Setup Instructions

#### For NVIDIA GPU / CPU Users
By default, the next cell installs the required dependencies for CPU and CUDA users. If you are running this notebook on ROCm, comment out the first line and uncomment the second line.

In [None]:
!pip install -r requirements.txt
#!pip install -r requirements.rocm.txt

### [Optional] Use wandb to monitor your training process
If you have a Weights & Biases (wandb) account, you can use it to monitor your training process. Install wandb using pip:
```
pip install wandb
```
and configure your wandb account using the following command:
```
wandb login
```
After this, the notebook will automatically log your training process to wandb.

In [9]:
try:
    import wandb
    WANDB_AVAILABLE = True
except ImportError:
    WANDB_AVAILABLE = False

def init_wandb(project_name, config):
    # Start a wandb run only when the package is installed
    if WANDB_AVAILABLE:
        return wandb.init(project=project_name, config=config)
    return None

def log_metrics(metrics, step=None):
    # Silently skip logging when wandb is not installed
    if WANDB_AVAILABLE:
        wandb.log(metrics, step=step)

### Dataset Preparation for Emotion Recognition

The `setup_data_loaders` function in the next cell sets up data loading and preprocessing for the emotion recognition model:

- Uses PyTorch's ImageFolder to load emotion images
- Applies a different transform pipeline depending on the split:
  - Training: Includes augmentation (flips, rotations, color adjustments)
  - Testing/Validation: Basic resizing and normalization only
- Splits dataset into:
  - 70% training
  - 15% validation
  - 15% testing
- Creates DataLoaders with a default batch size of 64
- All images are processed to 299x299 (Inception V3 size)
- Uses ImageNet normalization values
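
The split sizes follow from simple arithmetic. The sketch below assumes a total of 1225 images, which is the sum of the three split sizes this notebook prints later (857, 183, and 185). `split_sizes` is a hypothetical helper written for illustration; it mirrors the truncation with `int()` and the "remainder goes to test" rule used in the data loading code.

```python
def split_sizes(n_total, train_frac=0.7, val_frac=0.15):
    """Return (train, val, test) sizes; the test split absorbs the rounding remainder."""
    train_size = int(train_frac * n_total)  # int() truncates toward zero
    val_size = int(val_frac * n_total)
    test_size = n_total - train_size - val_size
    return train_size, val_size, test_size


# Assumed total: the sum of the split sizes printed by this notebook
n_total = 857 + 183 + 185
sizes = split_sizes(n_total)
print(sizes)
```

Because both fractions are truncated, the test split ends up slightly larger than the validation split even though both are nominally 15%.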

In [10]:
import numpy as np
import torch
import torchvision.transforms as transforms
from torchvision import datasets
from torch.utils.data import random_split, DataLoader

def setup_data_loaders(batch_size=64):
    train_transform = transforms.Compose([
        transforms.Resize(299),
        transforms.CenterCrop(299),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(10),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    # Validation/Testing Transformations (no augmentation but with resizing)
    test_transform = transforms.Compose([
        transforms.Resize(299),
        transforms.CenterCrop(299),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], # ImageNet mean and std
                            std=[0.229, 0.224, 0.225])
    ])

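    # random_split returns subsets that share one underlying dataset, so setting
    # `.dataset.transform` on each subset would leave all three splits with whichever
    # transform was assigned last. Load the folder twice instead, once per transform.
    data_root = 'emotions_dataset/emotions_dataset_cropped_faces'
    full_dataset = datasets.ImageFolder(root=data_root, transform=train_transform)
    eval_dataset = datasets.ImageFolder(root=data_root, transform=test_transform)

    # Split the dataset into training, validation, and testing sets
    train_size = int(0.7 * len(full_dataset))
    val_size = int(0.15 * len(full_dataset))
    test_size = len(full_dataset) - train_size - val_size

    # Shuffle the indices once so the splits are disjoint, then build each split
    # from the dataset view that carries the matching transform
    indices = torch.randperm(len(full_dataset)).tolist()
    train_dataset = torch.utils.data.Subset(full_dataset, indices[:train_size])
    val_dataset = torch.utils.data.Subset(eval_dataset, indices[train_size:train_size + val_size])
    test_dataset = torch.utils.data.Subset(eval_dataset, indices[train_size + val_size:])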

    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    # Print dataset sizes for confirmation
    print(f"Training samples: {len(train_dataset)}")
    print(f"Validation samples: {len(val_dataset)}")
    print(f"Testing samples: {len(test_dataset)}")

    return train_loader, val_loader, test_loader, full_dataset.classes
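
One detail worth keeping in mind when combining `random_split` with per-split transforms: every subset it returns points at the same underlying dataset object, so a transform assigned through one subset's `.dataset` attribute applies to all of them. The sketch below illustrates this with `ToyDataset`, a hypothetical stand-in for `ImageFolder` that returns integers instead of images.

```python
import torch
from torch.utils.data import Dataset, random_split


class ToyDataset(Dataset):
    """Hypothetical stand-in for ImageFolder: items are integers, transform is optional."""

    def __init__(self, n, transform=None):
        self.n = n
        self.transform = transform

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return self.transform(idx) if self.transform else idx


full = ToyDataset(10)
train_subset, val_subset = random_split(
    full, [7, 3], generator=torch.Generator().manual_seed(0)
)

# Try to give each subset its own transform through the shared dataset
train_subset.dataset.transform = lambda x: x + 100  # stands in for augmentation
val_subset.dataset.transform = lambda x: x          # stands in for plain preprocessing

shares_dataset = train_subset.dataset is val_subset.dataset
train_values = [train_subset[i] for i in range(len(train_subset))]

print(shares_dataset)                       # both subsets wrap the same object
print(all(v < 100 for v in train_values))   # the "augmenting" transform was overwritten
```

Giving each split its own dataset object, or wrapping each subset in a small dataset that applies the transform itself, avoids this overwrite.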

### Training the model
The next cell defines `train_model`, which runs a training phase and a validation phase in every epoch. The training phase:

- Processes batches of images through the model
- Calculates losses and performs backpropagation
- Updates model weights using the optimizer

The validation phase evaluates the model's performance by:

- Computing validation loss
- Calculating prediction accuracy
- Tracking metrics like train loss, validation loss, and validation accuracy
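
The two validation metrics can be sketched without any framework. The example below uses made-up logits and labels; `argmax`, `accuracy_percent`, and `mean_of_batch_losses` are hypothetical helpers that mimic, respectively, taking the index of the highest score per sample, computing `100 * correct / total`, and dividing the accumulated loss by the number of batches.

```python
def argmax(row):
    """Index of the largest score in one row of logits."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_percent(logits, labels):
    """Share of samples whose highest-scoring class equals the label, in percent."""
    correct = sum(argmax(row) == label for row, label in zip(logits, labels))
    return 100 * correct / len(labels)


def mean_of_batch_losses(batch_losses):
    """Epoch loss as the mean of per-batch losses (each batch counts equally)."""
    return sum(batch_losses) / len(batch_losses)


# Made-up scores for 4 samples and 3 classes
logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 0.9, 0.4], [1.2, 1.1, 0.0]]
labels = [0, 2, 2, 0]

accuracy = accuracy_percent(logits, labels)
epoch_loss = mean_of_batch_losses([1.5, 1.0, 0.5])
print(accuracy, epoch_loss)
```

Note that averaging per-batch losses gives a smaller final batch the same weight as a full one, so the reported epoch loss is a close approximation of the per-sample mean rather than an exact value.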

In [11]:
import torch.nn as nn
import torchvision.models as models

def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs, device):
    model.to(device)
    for epoch in range(num_epochs):
        # Training phase
        model.train()
        train_loss = 0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)

            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)

            # Backward pass and optimization
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            train_loss += loss.item()

        # Validation phase
        model.eval()
        val_loss = 0
        correct = 0
        total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                outputs = model(images)
                loss = criterion(outputs, labels)
                val_loss += loss.item()

                # Calculate accuracy
                _, predicted = torch.max(outputs, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        train_loss /= len(train_loader)
        val_loss /= len(val_loader)
        val_accuracy = 100 * correct / total

        log_metrics({"train_loss": train_loss, "val_loss": val_loss, "val_accuracy": val_accuracy})
        print(f"Epoch [{epoch + 1}/{num_epochs}], "
              f"Train Loss: {train_loss:.4f}, "
              f"Val Loss: {val_loss:.4f}, "
              f"Val Accuracy: {val_accuracy:.2f}%")

    return model

### Import Inception V3
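Inception V3 is loaded with weights pre-trained on the ImageNet dataset. The function below loads the model, replaces its final fully connected layer with one sized for our emotion classes, and returns it. We set `aux_logits` to `False` so that the model returns only the main logits in training mode rather than an additional auxiliary output, which the training loop above does not use.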

In [12]:
# Import Inception V3
from torchvision.models import inception_v3, Inception_V3_Weights

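# Load the pre-trained model and adapt it to our classes
def get_inception_model(num_classes):
    model = inception_v3(weights=Inception_V3_Weights.DEFAULT)
    # Return only the main logits in training mode (no auxiliary output)
    model.aux_logits = False
    # Drop the auxiliary head so it is neither computed nor passed to the optimizer
    model.AuxLogits = None
    # Replace the ImageNet classification head with one sized for our classes
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model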

### Configure the model
A few parameters can be configured to change the behavior of the model:
- num_epochs: number of epochs to train the model
- batch_size: number of samples per batch
- learning_rate: learning rate for the optimizer
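
To see how `batch_size` and `num_epochs` translate into optimizer steps, the sketch below counts batches per epoch. It assumes the split sizes reported in this notebook's output (857 training and 183 validation samples) and a loader that keeps the last, smaller batch, which is the `DataLoader` default. `batches_per_epoch` is a hypothetical helper.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of batches when the final partial batch is kept."""
    return math.ceil(n_samples / batch_size)


batch_size = 64
num_epochs = 30

train_batches = batches_per_epoch(857, batch_size)
val_batches = batches_per_epoch(183, batch_size)
total_steps = train_batches * num_epochs  # one optimizer step per training batch

print(train_batches, val_batches, total_steps)
```

With a dataset this small, each epoch is only a handful of weight updates, which is one reason a low learning rate is paired with a relatively large number of epochs here.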

In [13]:
batch_size = 64
num_epochs = 30
learning_rate = 0.000025

# Use the loaders defined above to prepare the data for training
train_loader, val_loader, test_loader, classes = setup_data_loaders(batch_size=batch_size)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device} for model training")
num_classes = len(classes)

# Model, Loss, Optimizer
model = get_inception_model(num_classes)
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)


Training samples: 857
Validation samples: 183
Testing samples: 185
Using device: cuda for model training


### Start training
We are now ready to start training our model. If wandb is installed, it will automatically log the training metrics to wandb.

In [14]:
# Attempt to init wandb
init_wandb("convomo", config={
    "learning_rate": learning_rate,
    "architecture": "CNN",
    "dataset": "yousefmohamed20/sentiment-images-classifier",
    "epochs": num_epochs,
})

# Train the model
trained_model = train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs, device)

Epoch [1/30], Train Loss: 1.7352, Val Loss: 1.7058, Val Accuracy: 27.87%
Epoch [2/30], Train Loss: 1.5247, Val Loss: 1.6002, Val Accuracy: 38.25%
Epoch [3/30], Train Loss: 1.3263, Val Loss: 1.5105, Val Accuracy: 39.34%
Epoch [4/30], Train Loss: 1.1541, Val Loss: 1.4084, Val Accuracy: 48.63%
Epoch [5/30], Train Loss: 0.9924, Val Loss: 1.3165, Val Accuracy: 52.46%
Epoch [6/30], Train Loss: 0.8285, Val Loss: 1.2370, Val Accuracy: 54.64%
Epoch [7/30], Train Loss: 0.6683, Val Loss: 1.1768, Val Accuracy: 53.55%
Epoch [8/30], Train Loss: 0.5473, Val Loss: 1.1316, Val Accuracy: 55.74%
Epoch [9/30], Train Loss: 0.4324, Val Loss: 1.0952, Val Accuracy: 56.28%
Epoch [10/30], Train Loss: 0.3433, Val Loss: 1.0758, Val Accuracy: 59.02%
Epoch [11/30], Train Loss: 0.2778, Val Loss: 1.0552, Val Accuracy: 61.20%
Epoch [12/30], Train Loss: 0.2159, Val Loss: 1.0515, Val Accuracy: 59.02%
Epoch [13/30], Train Loss: 0.1640, Val Loss: 1.0327, Val Accuracy: 61.75%
Epoch [14/30], Train Loss: 0.1387, Val Loss: 1.

In [15]:
def evaluate_model(model, test_loader, device):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f"Test Accuracy: {accuracy:.2f}%")


In [16]:
# Test the model
evaluate_model(trained_model, test_loader, device)
# Close the wandb run only if wandb was imported successfully
if WANDB_AVAILABLE:
    wandb.finish()

Test Accuracy: 64.32%


Run history:
train_loss,██▇▆▅▅▄▃▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val_accuracy,▁▁▃▃▅▆▆▆▆▆▇▇▇▇▇▇▇█▇██▇█████████
val_loss,██▇▆▅▄▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

Run summary:
train_loss,0.02234
val_accuracy,63.93443
val_loss,1.02186
