# Finetuning of a Faster RCNN for object detection

In this task, a [Faster RCNN](https://arxiv.org/abs/1506.01497) for object detection is re-trained on a new dataset.

This notebook makes intensive use of [PyTorch](https://pytorch.org/): among other things, it loads a pre-trained network and fine-tunes it on new data.

**Note:** The notebook was developed on Linux and in Google Colab. It may behave unpredictably under Windows, so running it in Colab is recommended.


When using Google Colab set the flag `USE_COLAB` to `True`.

In [None]:
USE_COLAB = True
TARGET_SIZE_IM = (256,256)
BATCH_SIZE = 32
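
As an alternative to setting the flag by hand, the environment could be detected at runtime. The sketch below relies on the common idiom that the `google.colab` module is already loaded inside a Colab kernel; the helper name `running_in_colab` is hypothetical and not part of this notebook.

```python
import sys


def running_in_colab():
    """Return True when the code appears to run inside a Google Colab kernel.

    Assumption: Colab preloads the `google.colab` module, so its presence
    in sys.modules is used as the signal.
    """
    return "google.colab" in sys.modules


# Hypothetical replacement for the manually set flag
USE_COLAB = running_in_colab()
print("Running in Colab:", USE_COLAB)
```

Outside Colab the flag is `False`, so the call to `drive.mount` is skipped.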

When using Google Colab, set the path to the dataset archive and the working directory here.

In [None]:
zip_file_path = '/content/drive/MyDrive/Capstone Project/Data/FoodLogoDet-1500.zip'
curren_dir = '/content/drive/MyDrive/Colab Notebooks/'

Import all the libraries used.
Make sure the path to the exercise folder is set correctly!

In [None]:
import sys
if USE_COLAB:
    from google.colab import drive
    drive.mount('/content/drive/')
    sys.path.append(zip_file_path)
else:
    sys.path.append('')

import numpy as np
import cv2
import os
from datetime import datetime
import zipfile
from PIL import Image
from io import BytesIO
import xml.etree.ElementTree as ET
import re

import torch
import torchvision

from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torch.utils.data import Dataset, DataLoader, Subset
import torchvision.transforms.functional as F

from sklearn.model_selection import train_test_split
from tqdm import tqdm

from matplotlib import pyplot as plt
import matplotlib.patches as patches
import pickle

import pandas as pd
import glob
import random

NotImplementedError: Mounting drive is unsupported in this environment. Use PyDrive instead. See examples at https://colab.research.google.com/notebooks/io.ipynb#scrollTo=7taylj9wpsA2.

## 1. Detection with a pre-trained model

Select the device (CPU or GPU) on which to run inference and training.
This notebook is set up to use the GPU under Google Colab.

Download the **Faster R-CNN** with a ResNet-50 backbone, pre-trained on the COCO dataset.

In [None]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print("We use the following device: ", device)
# load a model; pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights='COCO_V1').to(device)

We use the following device:  cuda


Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%|██████████| 160M/160M [00:00<00:00, 168MB/s]


Function that draws the BBoxes, scores, and labels on the image.


In [73]:
import matplotlib.patches as patches
def plot_image(img, boxes, scores, labels, dataset, save_path=None):
  '''
  Function that draws the BBoxes, scores, and labels on the image.

  inputs:
    img: input-image as numpy.array (shape: [H, W, C])
    boxes: list of bounding boxes (Format [N, 4] => N times [xmin, ymin, xmax, ymax])
    scores: list of conf-scores (Format [N] => N times confidence-score between 0 and 1)
    labels: list of class predictions (Format [N] => N times a number between 0 and num_classes-1)
    dataset: list of all classes e.g. ["background", "class1", "class2", ..., "classN"] => Format [N_classes]
  '''

  cmap = plt.get_cmap("tab20b")
  class_labels = np.array(dataset)
  colors = [cmap(i) for i in np.linspace(0, 1, len(class_labels))]
  height, width, _ = img.shape
  # Create figure and axes
  fig, ax = plt.subplots(1, figsize=(10, 6))
  # Display the image
  ax.imshow(img)
  for i, box in enumerate(boxes):
    class_pred = labels[i]
    conf = scores[i]
    width = box[2] - box[0]
    height = box[3] - box[1]
    rect = patches.Rectangle(
        (box[0], box[1]),
        width,
        height,
        linewidth=2,
        edgecolor=colors[int(class_pred)],
        facecolor="none",
    )
    # Add the patch to the Axes
    ax.add_patch(rect)
    plt.text(
        box[0], box[1],
        s=class_labels[int(class_pred)] + " " + str(int(100*conf)) + "%",
        color="white",
        verticalalignment="top",
        bbox={"color": colors[int(class_pred)], "pad": 0},
    )

  # Hide the axes before saving so the saved figure matches the displayed one
  plt.axis("off")
  # Used to save inference phase results
  if save_path is not None:
    plt.savefig(save_path)
  plt.show()

Image transformation and execution of the inference.

## Collect the data frame with the bounding boxes

In [None]:
excel_file_path = '/content/drive/MyDrive/Capstone Project/Data/Food_boxes_224x224.xlsx'

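df_labels_resized = pd.read_excel(excel_file_path, dtype={'filename': str})
# Later cells refer to this frame as `df_labels` (assumed to be the same data), so keep an alias
df_labels = df_labels_resized
df_labels_resized.head()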

In [None]:
mila = int(df_labels['filename'].max())
print(mila)

99768


## 2. Finetuning of a pre-trained model with new data

### Dataset creation

Subclass PyTorch's `torch.utils.data.Dataset` to create the dataset class `vehicleDataset` with which the Faster R-CNN can be trained (see https://pytorch.org/vision/main/models/faster_rcnn.html).

**Note 1:** Some images have no labels because they show an 'empty' scene; these images should be filtered out.

**Note 2:** There are incorrect bounding boxes in the dataset (e.g. `xmax=xmin`).
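
The two notes above could be handled before training with a small cleaning step. The following is a minimal sketch, assuming the annotations are stored per file as rows of `[xmin, ymin, xmax, ymax, label]`; the helper names and the toy data are made up for illustration.

```python
import numpy as np


def drop_degenerate_boxes(annotations):
    """Keep only rows [xmin, ymin, xmax, ymax, label] with positive width and height."""
    annotations = np.asarray(annotations, dtype=np.float32).reshape(-1, 5)
    widths = annotations[:, 2] - annotations[:, 0]
    heights = annotations[:, 3] - annotations[:, 1]
    return annotations[(widths > 0) & (heights > 0)]


def clean_annotations(labels_by_file):
    """Drop invalid boxes, then drop images that are left without any box."""
    cleaned = {}
    for filename, annotations in labels_by_file.items():
        valid = drop_degenerate_boxes(annotations)
        if len(valid) > 0:
            cleaned[filename] = valid
    return cleaned


# Toy annotations (made up for illustration)
toy = {
    "000001": [[10, 20, 50, 80, 1], [30, 30, 30, 60, 1]],  # second box has xmax == xmin
    "000002": [[5, 5, 5, 5, 1]],                            # only a degenerate box
    "000003": [],                                           # 'empty' scene
}
cleaned = clean_annotations(toy)
print(sorted(cleaned))           # ['000001']
print(cleaned["000001"].shape)   # (1, 5)
```

Only the first image survives, and it keeps just its valid box.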

**Transformations (augmentations)**

### Collect the resized images from the folder

Collect the paths to all the images in one list.

In [None]:
import os
import numpy as np
from sklearn.model_selection import train_test_split

image_resized_folder = '/content/drive/MyDrive/Capstone Project/Data/FoodDet_ResizedImages_224x224'
NUM_FILES = 10000

image_files = [None] * mila # tot_num_files
current_count = 0

# Traverse the directory and subdirectories to collect image paths
for root, dirs, files in os.walk(image_resized_folder):
    print('root', root)
    for f in files:
        if f.endswith('.jpg'):
            # if current_count < NUM_FILES:
            image_files[current_count] = os.path.join(root, f)
            current_count += 1
            # else:
            #     break  # Stop if we've reached the limit

image_files = image_files[:NUM_FILES]

# Create a list with the index for the train set and the validation set
image_indices = np.arange(len(image_files))
train_indices, val_indices = train_test_split(image_indices, test_size=0.2, random_state=42)

# Optionally print the number of images found and train/val splits
print(f"Total images found: {len(image_files)}")
print(f"Training set size: {len(train_indices)}, Validation set size: {len(val_indices)}")


root /content/drive/MyDrive/Capstone Project/Data/FoodDet_ResizedImages_224x224
root /content/drive/MyDrive/Capstone Project/Data/FoodDet_ResizedImages_224x224/folder_0
root /content/drive/MyDrive/Capstone Project/Data/FoodDet_ResizedImages_224x224/folder_1
root /content/drive/MyDrive/Capstone Project/Data/FoodDet_ResizedImages_224x224/folder_2
root /content/drive/MyDrive/Capstone Project/Data/FoodDet_ResizedImages_224x224/folder_3
root /content/drive/MyDrive/Capstone Project/Data/FoodDet_ResizedImages_224x224/folder_4
root /content/drive/MyDrive/Capstone Project/Data/FoodDet_ResizedImages_224x224/folder_5
root /content/drive/MyDrive/Capstone Project/Data/FoodDet_ResizedImages_224x224/folder_6
root /content/drive/MyDrive/Capstone Project/Data/FoodDet_ResizedImages_224x224/folder_7
root /content/drive/MyDrive/Capstone Project/Data/FoodDet_ResizedImages_224x224/folder_8
root /content/drive/MyDrive/Capstone Project/Data/FoodDet_ResizedImages_224x224/folder_9
Total images found: 10000
Trai
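
The cell above preallocates a list of length `mila` and then truncates it to `NUM_FILES`, so trailing `None` entries remain whenever fewer `.jpg` files are found than requested. A possible alternative is to grow the list while walking the directory and to sort it, since `os.walk` does not guarantee a fixed order. The helper name `collect_image_paths` is hypothetical, and the demo runs on a temporary directory rather than on the real dataset.

```python
import os
import tempfile


def collect_image_paths(folder, extension=".jpg", limit=None):
    """Return the sorted paths of all files under `folder` ending in `extension`."""
    paths = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name.lower().endswith(extension):
                paths.append(os.path.join(root, name))
    paths.sort()  # Fixed order, so the train/validation split is reproducible
    return paths if limit is None else paths[:limit]


# Demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    for sub, name in [("folder_0", "b.jpg"), ("folder_0", "a.jpg"), ("folder_1", "c.txt")]:
        os.makedirs(os.path.join(tmp, sub), exist_ok=True)
        open(os.path.join(tmp, sub, name), "w").close()
    demo_paths = [os.path.relpath(p, tmp) for p in collect_image_paths(tmp)]

print(demo_paths)
```

With the real data, the call would be `collect_image_paths(image_resized_folder, limit=NUM_FILES)`.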

## Train the model

In [None]:
# Create a data loader for the images and the corresponding boxes and labels

def custom_data_loader(folder_path, image_files, indices, labels_dict, batch_size, target_size):
    # Note: target_size is currently unused; the images are expected to be resized already
    while True:  # Infinite loop for continuous data loading
        for start in range(0, len(indices), batch_size):
            end = min(start + batch_size, len(indices))
            batch_images = []
            batch_targets = []

            for i in range(start, end):
                # Get the filename without the extension or path
                image_filename = os.path.splitext(os.path.basename(image_files[indices[i]]))[0]

                # Retrieve boxes and labels from the dictionary
                image_annotations = labels_dict.get(image_filename)
                if image_annotations is None:
                    continue  # Skip if no annotations for the image

                # Open the image first, so that an unreadable image
                # does not leave a target without a matching image
                image_path = os.path.join(folder_path, image_files[indices[i]])
                try:
                    with Image.open(image_path) as img:
                        img_data = np.array(img.convert('RGB'))
                except OSError as e:
                    print(f"Warning: Could not process image {image_files[indices[i]]}. Error: {e}")
                    continue

                # Separate boxes and labels
                boxes = image_annotations[:, :4]  # xmin, ymin, xmax, ymax
                labels = image_annotations[:, 4]  # label column

                # Create the target dictionary
                target = {
                    'boxes': torch.tensor(boxes, dtype=torch.float32),
                    'labels': torch.tensor(labels, dtype=torch.long)
                }
                batch_images.append(img_data)
                batch_targets.append(target)

            # Convert images to tensor and normalize to [0,1]
            batch_images = torch.tensor(np.array(batch_images), dtype=torch.float32).permute(0, 3, 1, 2) / 255.0

            yield batch_images, batch_targets
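
Because this generator never stops, the caller decides how many `next()` calls make up one epoch. One full pass over `indices` yields `ceil(len(indices) / batch_size)` batches, and the last one may be smaller. The arithmetic below is a small worked example; the training-set size of 8,000 is an assumption (80% of the 10,000 collected images).

```python
import math


def batches_per_pass(num_samples, batch_size):
    """Number of batches the generator yields in one full pass over the indices."""
    return math.ceil(num_samples / batch_size)


num_train = 8000   # Assumption: 80% of the 10,000 collected images
batch_size = 12

full_batches, remainder = divmod(num_train, batch_size)
print(full_batches, remainder)                   # 666 8
print(batches_per_pass(num_train, batch_size))   # 667
```

A loop that runs `num_train // batch_size` steps therefore stops one batch short of a full pass, and the leftover batch of 8 images is consumed at the start of the next epoch.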



In [None]:
labels_dict = {
    filename: group[['xmin', 'ymin', 'xmax', 'ymax', 'label']].values
    for filename, group in df_labels.groupby('filename')
}

### Define the Faster R-CNN model

(run only the cell of the model you want to choose)

In [None]:
import torch.optim as optim
from torchvision.models.detection import FasterRCNN
from torchvision.models import resnet18
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.models.detection import fasterrcnn_mobilenet_v3_large_fpn


In [None]:
# Backbone model: ResNet50 (nb params= 41,299,161)
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print('device used:', device )
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights='FasterRCNN_ResNet50_FPN_Weights.DEFAULT')
num_classes = 2  # Logo + background
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model.to(device)

# Optimizer
params = [p for p in model.parameters() if p.requires_grad]
total_params = sum(p.numel() for p in model.parameters())
print(f"Total number of parameters: {total_params}")
optimizer = optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)


device used: cuda
Total number of parameters: 41299161


In [None]:
#  Backbone model: ResNet-18 (nb params= 40,325,781)
backbone = resnet18(weights='IMAGENET1K_V1')
backbone = torch.nn.Sequential(*list(backbone.children())[:-2])  # Remove the final layers
backbone.out_channels = 512  # This is the final number of channels in ResNet-18 before the fully connected layers

# Define the anchor generator: this backbone returns a single feature map,
# so one tuple of sizes and one tuple of aspect ratios are needed
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# Create the model
model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator)
model.to(device)

# Optimizer
params = [p for p in model.parameters() if p.requires_grad]
total_params = sum(p.numel() for p in model.parameters())
optimizer = optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
print(f"Total number of parameters: {total_params}")

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 126MB/s]


Total number of parameters: 40325781


In [None]:
# Backbone model: MobileNetV3 (nb params= 18,930,229)
model = fasterrcnn_mobilenet_v3_large_fpn(weights='FasterRCNN_MobileNet_V3_Large_FPN_Weights.DEFAULT')
model.roi_heads.box_predictor = FastRCNNPredictor(model.roi_heads.box_predictor.cls_score.in_features, num_classes=2)
model.to(device)

# Optimizer
params = [p for p in model.parameters() if p.requires_grad]
total_params = sum(p.numel() for p in model.parameters())
optimizer = optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
print(f"Total number of parameters: {total_params}")


Total number of parameters: 18930229


In [None]:
# Training loop
BATCH_SIZE = 12
data_loader = custom_data_loader(image_resized_folder, image_files, train_indices, labels_dict, BATCH_SIZE, TARGET_SIZE_IM)

num_epochs = 1
epoch_losses = []
for epoch in range(num_epochs):
    model.train()
    batch_losses = []
    for batch_idx in tqdm(range(len(train_indices) // BATCH_SIZE)):
        images, targets = next(data_loader)  # Get the next batch

        # Move images and targets to the appropriate device (CPU or GPU)
        images = images.to(device)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        loss_dict = model(images, targets)
        # The model returns a dictionary of losses, sum them
        total_loss = sum(loss for loss in loss_dict.values())

        # Backward pass and optimization
        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()
        batch_losses.append(total_loss.item())

    # Calculate the average loss for this epoch
    avg_epoch_loss = sum(batch_losses) / len(batch_losses)
    epoch_losses.append(avg_epoch_loss)  # Store the epoch loss

    print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {avg_epoch_loss:.4f}")

 24%|██▎       | 157/666 [26:36<1:26:15, 10.17s/it]


KeyboardInterrupt: 

In [None]:
# Save the model

current_dir = curren_dir if USE_COLAB else os.getcwd()
# Fetch current date and time
now = datetime.now()
dt_string = now.strftime("%d-%m-%Y-%H-%M-%S")
output_dir_name = "output-" + dt_string

OUTPUT_DIR = os.path.join(current_dir, output_dir_name)
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Save the final model and optimizer state once training has finished
ckpt_file_name = f"{OUTPUT_DIR}/trained_FastRCNN_model.pth"
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': total_loss.item(),
}, ckpt_file_name)
print(f"Final model saved to {ckpt_file_name}")


Final model saved to /content/drive/MyDrive/Colab Notebooks/output-18-10-2024-21-24-52/trained_FastRCNN_model.pth
Final model saved to /content/drive/MyDrive/Colab Notebooks/output-18-10-2024-21-24-56/trained_FastRCNN_model.pth
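
A checkpoint saved in this format can be restored by rebuilding the same architecture and optimizer and loading both state dictionaries. The sketch below uses a tiny `nn.Linear` stand-in so that it runs without the detection weights; the names `toy_model`, `restored_model` and the file name are hypothetical.

```python
import os
import tempfile

import torch
from torch import nn, optim

# Tiny stand-in model (assumption: the real code would rebuild the Faster R-CNN instead)
toy_model = nn.Linear(4, 2)
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)

with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "checkpoint.pth")  # Hypothetical file name
    torch.save({
        'model_state_dict': toy_model.state_dict(),
        'optimizer_state_dict': toy_optimizer.state_dict(),
        'loss': 0.123,  # Placeholder value
    }, ckpt_path)

    # Re-create the same architecture and optimizer, then restore their states
    restored_model = nn.Linear(4, 2)
    restored_optimizer = optim.SGD(restored_model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)
    checkpoint = torch.load(ckpt_path, map_location='cpu')
    restored_model.load_state_dict(checkpoint['model_state_dict'])
    restored_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

print(torch.equal(toy_model.weight, restored_model.weight))
```

For the detector, the architecture has to be rebuilt with the same `num_classes` before `load_state_dict` is called, otherwise the predictor shapes do not match.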


In [70]:
def calculate_iou(box1, box2):
    """
    Calculate Intersection over Union (IoU) between two bounding boxes.
    box1, box2: [xmin, ymin, xmax, ymax]
    Returns IoU value.
    """
    # Determine the coordinates of the intersection rectangle
    x1_inter = max(box1[0], box2[0])
    y1_inter = max(box1[1], box2[1])
    x2_inter = min(box1[2], box2[2])
    y2_inter = min(box1[3], box2[3])

    # Compute area of intersection
    inter_area = max(0, x2_inter - x1_inter) * max(0, y2_inter - y1_inter)

    # Compute areas of both the bounding boxes
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])

    # Compute the union area
    union_area = box1_area + box2_area - inter_area

    # Compute the IoU
    iou = inter_area / union_area if union_area != 0 else 0
    return iou

def evaluate_iou_for_batch(predictions, targets, iou_threshold=0.5):
    """
    Evaluate IoU for a batch of predictions and targets.
    predictions: List of predicted bounding boxes
    targets: List of ground truth boxes
    iou_threshold: IoU threshold to consider a prediction correct
    Returns the IoU accuracy.
    """
    correct_boxes = 0
    total_boxes = 0

    for pred, target in zip(predictions, targets):
        pred_boxes = pred['boxes'].cpu().numpy()  # Predicted boxes
        true_boxes = target['boxes'].cpu().numpy()  # Ground truth boxes

        for t_box in true_boxes:
            total_boxes += 1
            # Find the predicted box with the highest IoU for this true box
            iou_max = 0
            for p_box in pred_boxes:
                iou = calculate_iou(t_box, p_box)
                if iou > iou_max:
                    iou_max = iou

            # Count as correct if IoU exceeds threshold
            if iou_max >= iou_threshold:
                correct_boxes += 1

    iou_accuracy = correct_boxes / total_boxes if total_boxes > 0 else 0
    return iou_accuracy
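
The nested Python loops above compare every ground-truth box with every predicted box one pair at a time. As a sketch of an alternative, the same IoU values can be computed for all pairs at once with NumPy broadcasting; the function name `pairwise_iou` and the toy boxes are made up for illustration.

```python
import numpy as np


def pairwise_iou(boxes_a, boxes_b):
    """IoU matrix of shape [len(boxes_a), len(boxes_b)] for [xmin, ymin, xmax, ymax] boxes."""
    a = np.asarray(boxes_a, dtype=np.float64).reshape(-1, 4)
    b = np.asarray(boxes_b, dtype=np.float64).reshape(-1, 4)
    # Broadcast a as [N, 1, 2] against b as [1, M, 2] to get all intersections
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    # Return 0 where the union is empty (degenerate boxes)
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


true_boxes = [[0, 0, 2, 2]]
pred_boxes = [[1, 1, 3, 3], [0, 0, 2, 2], [5, 5, 6, 6]]
iou = pairwise_iou(true_boxes, pred_boxes)
print(iou)  # IoU values 1/7, 1 and 0 for the three predictions
```

For the first pair the intersection is 1 and the union is 4 + 4 - 1 = 7, which gives an IoU of 1/7. Taking `iou.max(axis=1)` then gives the best match per ground-truth box.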


In [None]:
model.eval()  # Switch to evaluation mode
BATCH_SIZE = 12
# Create the custom data loader for the validation images
data_loader = custom_data_loader(image_resized_folder, image_files, val_indices, labels_dict, BATCH_SIZE, TARGET_SIZE_IM)

with torch.no_grad():
    iou_accuracies = []
    for batch_idx in tqdm(range(len(val_indices) // BATCH_SIZE)):  # Loop through validation data
        images, targets = next(data_loader)  # Get the next batch

        # Move to appropriate device
        images = images.to(device)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # Forward pass to get predictions
        predictions = model(images)

        # Calculate IoU for this batch
        iou_accuracy = evaluate_iou_for_batch(predictions, targets)
        iou_accuracies.append(iou_accuracy)

    # Average IoU accuracy over all batches
    mean_iou_accuracy = sum(iou_accuracies) / len(iou_accuracies)
    print(f"Mean IoU Accuracy: {mean_iou_accuracy:.4f}")


100%|██████████| 166/166 [08:17<00:00,  2.99s/it]

Mean IoU Accuracy: 0.1739
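
The value reported here is the fraction of ground-truth boxes matched by at least one prediction with IoU of at least 0.5, averaged over batches. Averaging per-batch ratios gives every batch the same weight regardless of how many boxes it contains. The sketch below contrasts this with pooling the counts over all batches; the counts are made up for illustration.

```python
def mean_of_batch_ratios(batch_counts):
    """Average of per-batch ratios, as done in the evaluation loop above."""
    ratios = [correct / total if total > 0 else 0 for correct, total in batch_counts]
    return sum(ratios) / len(ratios) if ratios else 0.0


def pooled_ratio(batch_counts):
    """Ratio of all matched boxes to all ground-truth boxes across batches."""
    correct = sum(c for c, _ in batch_counts)
    total = sum(t for _, t in batch_counts)
    return correct / total if total > 0 else 0.0


# Made-up (correct_boxes, total_boxes) counts for two batches
batch_counts = [(9, 10), (0, 2)]
print(mean_of_batch_ratios(batch_counts))  # 0.45
print(pooled_ratio(batch_counts))          # 0.75
```

The two numbers agree only when every batch holds the same number of ground-truth boxes.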





In [None]:
crowdai = ["Background", "Logo"]

# Set model to evaluation mode
model.eval()

# Number of batches to visualize (each batch holds up to BATCH_SIZE images)
num_images = 1

# Create the custom data loader over the training images
BATCH_SIZE = 12
data_loader = custom_data_loader(image_resized_folder, image_files, train_indices, labels_dict, BATCH_SIZE, TARGET_SIZE_IM)

# Disable gradient calculations for evaluation
with torch.no_grad():
    for _ in range(num_images):  # Loop through the desired number of batches
        # Get a batch of images and targets from the data loader
        images, targets = next(data_loader)

        # Move images to the correct device
        images = images.to(device)

        # Get model predictions (outputs are lists of dictionaries)
        predictions = model(images)

        # Loop over each image in the batch
        for i in range(len(images)):
            # Move image back to CPU and convert to numpy for plotting
            img = images[i].cpu().permute(1, 2, 0).numpy()

            # Extract predicted boxes, labels, and scores
            pred_boxes = predictions[i]['boxes'].cpu().numpy()  # Predicted bounding boxes
            pred_labels = predictions[i]['labels'].cpu().numpy()  # Predicted labels
            pred_scores = predictions[i]['scores'].cpu().numpy()  # Confidence scores for each box

            # Skip images for which the model returned no detections
            if len(pred_scores) == 0:
                continue

            max_index = pred_scores.argmax()  # Get the index of the highest score
            best_box = pred_boxes[max_index].reshape(1, -1)  # The best bounding box
            best_label = pred_labels[max_index].reshape(1)  # The best label
            best_score = pred_scores[max_index].reshape(1)  # The highest confidence score


            # Visualize the image with the best predicted box
            plot_image(img, best_box, best_score, best_label, crowdai)


Output hidden; open in https://colab.research.google.com to view.
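
The cell above draws only the single highest-scoring box per image. Since `plot_image` accepts arrays of N boxes, another option is to keep every detection above a confidence threshold. The following is a minimal sketch; the helper name `select_detections`, the threshold of 0.5 and the toy values are assumptions for illustration.

```python
import numpy as np


def select_detections(boxes, labels, scores, score_threshold=0.5):
    """Keep the detections whose confidence score reaches the threshold."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    labels = np.asarray(labels, dtype=np.int64).reshape(-1)
    scores = np.asarray(scores, dtype=np.float32).reshape(-1)
    keep = scores >= score_threshold
    return boxes[keep], labels[keep], scores[keep]


# Toy detections (made up for illustration)
toy_boxes = [[10, 10, 50, 50], [20, 20, 60, 60], [0, 0, 5, 5]]
toy_labels = [1, 1, 1]
toy_scores = [0.9, 0.6, 0.2]

kept_boxes, kept_labels, kept_scores = select_detections(toy_boxes, toy_labels, toy_scores)
print(len(kept_boxes))  # 2

# An image without detections simply yields empty arrays
empty_boxes, empty_labels, empty_scores = select_detections([], [], [])
print(empty_boxes.shape)  # (0, 4)
```

The three returned arrays could then be passed to `plot_image` in place of `best_box`, `best_score` and `best_label`.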