# Finetuning Faster RCNN for Door Detection

In this notebook, the [Faster RCNN](https://arxiv.org/abs/1506.01497) model is re-trained on the CubiCasa5k dataset to detect door bounding boxes. An intensive use of [PyTorch](https://pytorch.org/)'s functionalities has been used. Among other things, I used a pre-trained network and fine-tune a network with new data.

**Note:** The Notbook was developed on a Linux environment and using CUDA 12.1


In [1]:
!python --version # confirm the version is 3.10.12

Python 3.10.12


Set the current working directory here

In [3]:
curren_dir = '/home/hudab/door_model/'

Import all the libraries used.
Please note to set the system path to the exercise folder correctly!

In [4]:
import sys
sys.path.append('') # make sure path is correct here

import math
import numpy as np
import cv2
import os
from datetime import datetime

from PIL import Image

import torch
import torchvision

from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torch.utils.data import Dataset, DataLoader, Subset
import torchvision.transforms.functional as F

from torchmetrics.detection import MeanAveragePrecision

from sklearn.model_selection import train_test_split
from tqdm import tqdm

from matplotlib import pyplot as plt
import pickle

import pandas as pd
import glob as glob
import random
import gdown

## 1. Detection with a pre-trained model

Selecting the device (CPU or GPU) on which to run the infernce and training.
This notebook is set to use the GPU under Google Colab.

Download the pre-trained **Faster R-CNN** with ResNet50 Backbone, on the COCO dataset.

In [6]:
print(torch.__version__) #current version
print(torch.cuda.is_available()) #check if GPU is available
print(":)")

2.1.0+cu121
True
:)


In [7]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print("We use the following device: ", device)
# load a model; pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights='COCO_V1').to(device)

We use the following device:  cuda


Transformation of a sample input for infernce.

In [8]:
def img_transform(img):
  img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)
  img /= 255.0
  img = torch.from_numpy(img).permute(2,0,1)
  return img

  Inference of a single input image.

In [9]:
def inference(img, model, detection_threshold=0.00):
  '''
  Infernece of a single input image

  inputs:
    img: input-image as torch.tensor (shape: [C, H, W])
    model: model for infernce (torch.nn.Module)
    detection_threshold: Confidence-threshold for NMS (default=0.7)

  returns:
    boxes: bounding boxes (Format [N, 4] => N times [xmin, ymin, xmax, ymax])
    labels: class-prediction (Format [N] => N times an number between 0 and _num_classes-1)
    scores: confidence-score (Format [N] => N times confidence-score between 0 and 1)
  '''
  model.eval()

  img = img.to(device)
  outputs = model([img])
  # print(outputs)
  
  boxes = outputs[0]['boxes'].data.cpu().numpy()
  scores = outputs[0]['scores'].data.cpu().numpy()
  labels = outputs[0]['labels'].data.cpu().numpy()

  boxes = boxes[scores >= detection_threshold].astype(np.int32)
  labels = labels[scores >= detection_threshold]
  scores = scores[scores >= detection_threshold]

  return boxes, scores, labels

Function that draws the BBoxes, scores, and labels on the image.


In [10]:
import matplotlib.patches as patches

def plot_image(img, boxes, scores, labels, dataset, save_path=None):
  '''
  Function that draws the BBoxes, scores, and labels on the image.

  inputs:
    img: input-image as numpy.array (shape: [H, W, C])
    boxes: list of bounding boxes (Format [N, 4] => N times [xmin, ymin, xmax, ymax])
    scores: list of conf-scores (Format [N] => N times confidence-score between 0 and 1)
    labels: list of class-prediction (Format [N] => N times an number between 0 and _num_classes-1)
    dataset: list of all classes e.g. ["background", "class1", "class2", ..., "classN"] => Format [N_classes]
  '''

  cmap = plt.get_cmap("tab20b")
  class_labels = np.array(dataset)
  colors = [cmap(i) for i in np.linspace(0, 1, len(class_labels))]
  height, width, _ = img.shape
  # Create figure and axes
  fig, ax = plt.subplots(1, figsize=(16, 8))
  # Display the image
  ax.imshow(img)
  for i, box in enumerate(boxes):
    class_pred = labels[i]
    conf = scores[i]
    width = box[2] - box[0]
    height = box[3] - box[1]
    rect = patches.Rectangle(
        (box[0], box[1]),
        width,
        height,
        linewidth=2,
        edgecolor=colors[int(class_pred)],
        facecolor="none",
    )
    # Add the patch to the Axes
    ax.add_patch(rect)
    plt.text(
        box[0], box[1],
        s=class_labels[int(class_pred)] + " " + str(int(100*conf)) + "%",
        color="white",
        verticalalignment="top",
        bbox={"color": colors[int(class_pred)], "pad": 0},
    )

  # Used to save inference phase results
  if save_path is not None:
    plt.savefig(save_path)

  plt.show()

## 2. Prepping Door Dataset

In [None]:
from bs4 import BeautifulSoup
import sys
import pandas as pd

### Download the CubiCasa5k Dataset

In [None]:
!gdown --fuzzy https://drive.google.com/file/d/1Wm3q2vyEeFL-gGPEsbWVFX31rgtMyywp/view?usp=sharing -O cubicasa5k.tar

In [None]:
!tar -xf cubicasa5k.tar

### Collate file of all items

Collect all samples from train, test and validation into one file

In [None]:
train = open("cubicasa5k/train.txt", "r")
test = open("cubicasa5k/test.txt", "r")
validation = open("cubicasa5k/val.txt", "r")

all = open("cubicasa5k/all.txt", "w")
all.write(train.read() + "\n" + test.read() + "\n" + validation.read() + "\n")
all.close()

Here, we create a smaller image subset to be used for any debugging for the door bounding boxes extraction. The subset details are stored under the file "cubicasa5k/sample.txt".

In [None]:
file = open("cubicasa5k/all.txt", "r")
sample = open("cubicasa5k/sample.txt", "w")

section = file.readlines()[:10]
sample.write("".join(section))

file.close()
sample.close()

### Extract Door Bounding Boxes

Functions used to extract door bounding boxes from svg files

In [None]:
"""
get_svg: string -> string
REQUIRES: input string is a valid system path to an SVG file
ENSURES: return value is the contents of the SVG pointed to by input path
"""
def get_svg(path):
    file = open(path, "r")
    svg = file.read()

    file.close()
    return svg

"""
get_door_tags: string -> bs4.element.ResultSet
REQUIRES: string is XML of a SVG file
ENSURES: returns a ResultSet of all tags with id = 'Door' inthe given SVG string
"""
def get_door_tags(svg):
    soup = BeautifulSoup(svg, 'xml')
    return soup.find_all(attrs={'id':'Door'})

def str2coord(s):
    (x, y) = s.split(",")
    return int(float(x)), int(float(y))

"""
get_door_thresh: bs4.element -> string
REQUIRES: given element has id = 'Door' and has a polygon tag with attribute
          points, depicting a rectangle with coordinate arrangement bottom left,
          bottom right, top right, top left
ENSURES: returns a string of the first two coordinate pairs of the points
         attribute, separated by a space
"""
def get_door_thresh(elem):
    polygon = elem.find("polygon")
    coord = polygon.attrs["points"].split(" ")[:-1]
    x = [str2coord(s)[0] for s in coord]
    y = [str2coord(s)[1] for s in coord]

    return x, y

def get_curve_end(elem):
    path = elem.find("path")

    coord = path.attrs["d"].split(" ")
    coord = [s.strip(" MmQqLl\n\r\t") for s in coord]

    x = [str2coord(s)[0] for s in (coord[::2])]
    y = [str2coord(s)[1] for s in (coord[::2])]

    return sum(x), sum(y)

Open file that has locations of floor plans

In [None]:
folder = "cubicasa5k"
paths = open("cubicasa5k/all.txt", "r")

Go through each SVG one by one to get its bounding boxes

In [None]:
df=pd.DataFrame(columns=['xmin', 'ymin', 'xmax', 'ymax', 'Frame', 'Label'])

# read each dataset path
for line in paths:
    path = line.strip() # path to output
    svg = get_svg(folder + path + "model.svg")

    door_elements = get_door_tags(svg)
    i = 0

    # write output value for each door
    for elem in door_elements:
        all_x, all_y = get_door_thresh(elem)
        try:
          x, y = get_curve_end(elem)
        except:
          continue

        all_x.append(x)
        all_y.append(y)

        # print(min(all_x), min(all_y), max(all_x), max(all_y))
        df.loc[len(df.index)] = [min(all_x), min(all_y), max(all_x), max(all_y),
                                 path.strip("/") + "/F1_scaled.png", "Door"]

        i += 1

### Save resulting dataset

In [None]:
df

Check all bounding boxes are valid and then save the data

In [None]:
df = df.query("xmin < xmax and ymin < ymax")

In [None]:
df.to_csv("doors.csv", index=False)

## 3. Finetuning of a pre-trained model with new data

Here, we finetune the pre-trained model from above. For this purpose, we use the CubiCasa5k floor plan images along with the bounding boxes extracted above.

In [11]:
path_csv = 'doors.csv'
img_dir = 'cubicasa5k/'

### Dataset creation

Use the default dataset of PyTorch to create the dataset class `imageDataset` with which the FasterRCNN can be trained. (https://pytorch.org/vision/main/models/faster_rcnn.html)

**Transformations (augmentations)**

In [12]:
'''
Class that holds all the augmentation related attributes
'''
class Transformation():
    # This provides a random probability of the augmentation to be applied or not
    def get_probability(self):
        return np.random.choice([False, True], replace=False, p=[0.5, 0.5])

    # Increases the contrast by a factor of 2
    def random_adjust_contrast(self, image, enable=None):
        enable = self.get_probability() if enable is None else enable
        return F.adjust_contrast(image, 2) if enable else image

    # Increaes the brightness by a factor of 2
    def random_adjust_brightness(self, image, enable=None):
        enable = enable = self.get_probability() if enable is None else enable
        return F.adjust_brightness(image,2) if enable else image

    # Horizontal flip
    def random_hflip(self, image, boxes, enable=None):
        enable = enable = self.get_probability() if enable is None else enable
        if enable:
          #flip image
          new_image = F.hflip(image)

          #flip boxes
          new_boxes = boxes.clone()
          new_boxes[:, 0] = image.shape[2] - boxes[:, 0]  # image width - xmin
          new_boxes[:, 2] = image.shape[2] - boxes[:, 2]  # image_width - xmax
          new_boxes = new_boxes[:, [2, 1, 0, 3]]          # Interchange the xmin and xmax due to mirroring
          return new_image, new_boxes
        else:
          return image, boxes

**Custom Dataset for Image Data**

In [13]:
class imageDataset(Dataset):
  def __init__(self, img_path, label_path, classes, transforms=None):
    super().__init__()
    print("Preparing the dataset...")

    self.image_dir = img_path

    self.gt_info = pd.read_csv(label_path)

    self.classes = classes
    self.transforms = transforms

    # Create a list of image file names (in sorted order - this is optional)
    self.image_paths = [f"{img_path}{row['Frame']}" for i, row in self.gt_info.iterrows()]
    self.all_images = list(self.gt_info['Frame'])

    # Map Label (str) --> Label (int)
    for i in range(len(self.gt_info)):
      label = self.gt_info.loc[i, 'Label']
      self.gt_info.loc[i, 'Label'] = self.classes.index(label)

    # Filter the dataset based on given conditions:
    self.filter_dataset()
    print("Dataset prepared")

  def __getitem__(self, idx):
    target = {}

    # Read input image
    image_name = self.all_images[idx]
    image_path = self.image_paths[idx]
    image = cv2.imread(image_path)
    try:
      image = img_transform(image)
    except:
      raise Exception(f"libpng error found on image name: {image_name}, image path: {image_path}\n")

    # Fetch GT infos for given image
    gt_info = self.gt_info[self.gt_info['Frame'] == image_name]

    boxes = torch.Tensor(gt_info[['xmin', 'ymin', 'xmax', 'ymax']].values.astype('float')).float()
    labels = torch.LongTensor(gt_info['Label'].values.tolist())

    if self.transforms:
        image = self.transforms.random_adjust_contrast(image, enable=True)
        image = self.transforms.random_adjust_brightness(image, enable=True)
        image, boxes = self.transforms.random_hflip(image, boxes, enable=True)

    target["boxes"] = boxes     # Hint: Shape -> [N, 4] with N = Number of Boxes
    target["labels"] = labels   # Hint: Shape -> [N] with N = Number of Boxes

    return image, target

  def __len__(self):
    return len(self.all_images)

  '''
  Filter the dataset by removing images with no labels and incorrect bounding boxes
  '''
  def filter_dataset(self):
    print("Filtering the dataset...")
    remove_images = []

    # There are no labels for some images because they show an 'empty' scene → these images should be filtered out.
    for image_file in self.all_images.copy():
      if image_file not in self.gt_info['Frame'].values:
        remove_images.append(image_file)
        self.all_images.remove(image_file)
    print("Images removed with no labels: ", len(remove_images))

    self.gt_info = self.gt_info.query("xmin < xmax and ymin < ymax")

### Testing the created dataset class by running an infernce on random images of the dataset.

In [None]:
# crowdai = ["Background","Door"]
crowdai = ["Background", "Car", "Truck", "Pedestrian"]
dataset = imageDataset(img_dir, path_csv, crowdai, transforms=Transformation()) # here your arguments
num_images = 5

for i in range(num_images):
  x = random.randint(0, (dataset.__len__()-1))
  img, target = dataset.__getitem__(x)
  img = img.cpu().permute(1,2,0).numpy()
  boxes = target['boxes'].numpy()
  labels = target['labels'].numpy()
  scores = [1]*len(labels)
  print("Image index: ", x)
  plot_image(img, boxes, scores, labels, crowdai)

In [None]:
dataset.gt_info.iloc[2365]

In [None]:
img, target = dataset.__getitem__(2365)
img = img.cpu().permute(1,2,0).numpy()
boxes = target['boxes'].numpy()
labels = target['labels'].numpy()
scores = [1]*len(labels)
print("Image index: ", 2365)
plot_image(img, boxes, scores, labels, crowdai)

### Fine-Tune

The Faster RCNN model is now to be trained for 5 epochs on the new dataset. Several tasks have to be done for this.

In [14]:
# Hyperparameters

SEED = 42

TEST_SIZE = 0.2

NUM_EPOCHS = 34

LR = 0.005
LR_MOMENTUM=0.9
LR_DECAY_RATE=0.0005

LR_SCHED_STEP_SIZE = 5
LR_SCHED_GAMMA = 0.1

BATCH_SIZE = 16

NUM_TEST_IMAGES = 5
NMS_THRESH = 0.01

Create an output directory to store checkpoints and plots

In [15]:
current_dir = ""

# Fetch current date and time
now = datetime.now()
dt_string = now.strftime("%d-%m-%Y")
output_dir_name = "output-" + dt_string

OUTPUT_DIR = os.path.join(current_dir, output_dir_name)
os.makedirs(OUTPUT_DIR, exist_ok=True)

In [30]:
crowdai = ["Background", "Door"]

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(min_size=300, max_size=480, weights=True)
num_classes = 2  # 1 classes + background
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model.to(device)

num_epochs = NUM_EPOCHS

# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=LR, momentum=LR_MOMENTUM, weight_decay=LR_DECAY_RATE)

# create a train- and validation-dataset with our imageDataset
# split the dataset in train and test set
dataset = imageDataset(img_dir, path_csv, crowdai, transforms=Transformation())      # here your arguments
dataset_test = imageDataset(img_dir, path_csv, crowdai, transforms=None)             # here your arguments

torch.manual_seed(SEED)
indices = torch.randperm(len(dataset)).tolist()

test_size = int(len(dataset) * TEST_SIZE)
dataset = torch.utils.data.Subset(dataset, indices[:-test_size])
dataset_test = torch.utils.data.Subset(dataset_test, indices[-test_size:])

# create a learning rate scheduler
# TODO: step size to be tuned !
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, \
                                                step_size=LR_SCHED_STEP_SIZE, \
                                                gamma=LR_SCHED_GAMMA)

def collate_fn(batch):
  return tuple(zip(*batch))

# define training and validation data loaders
data_loader = torch.utils.data.DataLoader(
  dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2,
  collate_fn=collate_fn)

data_loader_test = torch.utils.data.DataLoader(
  dataset_test, batch_size=1, shuffle=False, num_workers=2,
  collate_fn=collate_fn)



Preparing the dataset...
Filtering the dataset...
Images removed with no labels:  0
Dataset prepared
Preparing the dataset...
Filtering the dataset...
Images removed with no labels:  0
Dataset prepared


In [31]:
print(len(data_loader), len(data_loader_test))

2317 9265


In [32]:
print(len(dataset), len(dataset_test))

37063 9265


In [33]:
num_classes

2

Train per epoch, validation loop and plotting function

In [34]:
'''
Function to train the model over one epoch.
'''
def train_one_epoch(model, optimizer, data_loader, device, epoch):
  train_loss_list = []

  tqdm_bar = tqdm(data_loader, total=len(data_loader))
  for idx, data in enumerate(tqdm_bar):
    optimizer.zero_grad()
    images, targets = data

    images = list(image.to(device) for image in images)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]  # targets = {'boxes'=tensor, 'labels'=tensor}

    losses = model(images, targets)

    loss = sum(loss for loss in losses.values())
    loss_val = loss.item()
    train_loss_list.append(loss.detach().cpu().numpy())

    loss.backward()
    optimizer.step()

    tqdm_bar.set_description(desc=f"Training Loss: {loss:.3f}")

  return train_loss_list

'''
Function to validate the model
'''
def evaluate(model, data_loader_test, device, m_1, m_5):
    val_loss_list = []

    target = dict(boxes=torch.Tensor(),labels=torch.Tensor(),)
    preds = dict(boxes=torch.Tensor(),scores=torch.Tensor(),labels=torch.Tensor(),)
    
    tqdm_bar = tqdm(data_loader_test, total=len(data_loader_test))
    for i, data in enumerate(tqdm_bar):
        imges, tgt = data

        images = list(image.to(device) for image in imges)
        targets = [{k: v.to(device) for k, v in t.items()} for t in tgt]

        with torch.no_grad():
            losses = model(images, targets)

        loss = sum(loss for loss in losses.values())
        loss_val = loss.item()
        val_loss_list.append(loss_val)
        
        box, score, label = inference(imges[0], model)
        model.train()

        target = [dict(boxes=tgt[0]["boxes"].to("cpu"),labels=tgt[0]["labels"].to("cpu"),)]
        preds = [dict(boxes=torch.tensor(box),scores=torch.tensor(score),labels=torch.tensor(label),)]
        
        tqdm_bar.set_description(desc=f"Validation Loss: {loss:.4f}")
        
        m_1.update(preds, target)
        m_5.update(preds, target)
    return val_loss_list

'''
Function to plot training and valdiation losses and save them in `output_dir'
'''
def plot_loss(train_loss, valid_loss):
    figure_1, train_ax = plt.subplots()
    figure_2, valid_ax = plt.subplots()

    train_ax.plot(train_loss, color='blue')
    train_ax.set_xlabel('Iteration')
    train_ax.set_ylabel('Training Loss')

    valid_ax.plot(valid_loss, color='red')
    valid_ax.set_xlabel('Iteration')
    valid_ax.set_ylabel('Validation loss')

    figure_1.savefig(f"{OUTPUT_DIR}/train_loss.png")
    figure_2.savefig(f"{OUTPUT_DIR}/valid_loss.png")

    plt.close("all")

In [21]:
OUTPUT_DIR

'output-25-03-2024'

In [24]:
iou_custom = [i / 100 for i in range(1, 11, 1)]
iou_custom

[0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1]

In [28]:
from pprint import pprint

Main train loop over all epochs

In [35]:
# find latest saved chcekpoint
checkpoint_files = [f for f in os.listdir(f"/home/hudab/door_model/output-16-03-2024/") if f.endswith('.pth')]
if checkpoint_files:
    # Last ckpt file
    print("checkpoint found!")
    checkpoint_files.sort()
    latest_checkpoint = os.path.join(OUTPUT_DIR, checkpoint_files[-1])

    # Load the ckpt
    checkpoint = torch.load(f"/home/hudab/door_model/output-16-03-2024/epoch_32_model.pth")
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    start_epoch = checkpoint['epoch']
    loss_dict = checkpoint['loss_dict']
else:
    start_epoch = 0
    loss_dict = {'train_loss': [], 'valid_loss': []}

'''
Train the model over all epochs
'''
for epoch in range(start_epoch, num_epochs):
    m_1 = MeanAveragePrecision(iou_type="bbox", iou_thresholds=iou_custom)
    m_5 = MeanAveragePrecision(iou_type="bbox")
    print("----------Epoch {}----------".format(epoch+1))
    
    # Train the model for one epoch
    train_loss_list = train_one_epoch(model, optimizer, data_loader, device, epoch)
    loss_dict['train_loss'].extend(train_loss_list)
    
    lr_scheduler.step()
    
    # Run evaluation
    valid_loss_list = evaluate(model, data_loader_test, device, m_1, m_5)
    loss_dict['valid_loss'].extend(valid_loss_list)
    
    # Svae the model ckpt after every epoch
    ckpt_file_name = f"{OUTPUT_DIR}/epoch_{epoch+1}_model.pth"
    torch.save({
    'epoch': epoch+1,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss_dict': loss_dict
    }, ckpt_file_name)
    
    # NOTE: The losses are accumulated over all iterations
    plot_loss(loss_dict['train_loss'], loss_dict['valid_loss'])
    map1 = m_1.compute()
    map2 = m_5.compute()

    map1 = {k: (v.item()) for k, v in map1.items()}
    map2 = {k: (v.item()) for k, v in map2.items()}

    print("mAP@[0.01:0.1]: ")
    pprint(map1)
    print("mAP@[0.5:0.95]: ")
    pprint(map2)

# Store the losses after the training in a pickle
with open(f"loss_dict.pkl", "wb") as file:
    pickle.dump(loss_dict, file)

print("Training Finished !")

checkpoint found!
----------Epoch 33----------


Training Loss: 0.098: 100%|█████████████████████████████████████████| 2317/2317 [1:00:20<00:00,  1.56s/it]
Validation Loss: 0.2902: 100%|████████████████████████████████████████| 9265/9265 [12:45<00:00, 12.11it/s]


mAP@[0.01:0.1]: 
{'classes': 1,
 'map': 0.7967436909675598,
 'map_50': -1.0,
 'map_75': -1.0,
 'map_large': 0.7767582535743713,
 'map_medium': 0.805787205696106,
 'map_per_class': -1.0,
 'map_small': 0.1382978856563568,
 'mar_1': 0.07661794871091843,
 'mar_10': 0.6348279714584351,
 'mar_100': 0.8271962404251099,
 'mar_100_per_class': -1.0,
 'mar_large': 0.8187423944473267,
 'mar_medium': 0.8337820172309875,
 'mar_small': 0.188043475151062}
mAP@[0.5:0.95]: 
{'classes': 1,
 'map': 0.4703085422515869,
 'map_50': 0.7478157877922058,
 'map_75': 0.5338945984840393,
 'map_large': 0.43276548385620117,
 'map_medium': 0.4828597605228424,
 'map_per_class': -1.0,
 'map_small': 0.0,
 'mar_1': 0.05661710724234581,
 'mar_10': 0.44582992792129517,
 'mar_100': 0.5507619380950928,
 'mar_100_per_class': -1.0,
 'mar_large': 0.5346775054931641,
 'mar_medium': 0.5555819272994995,
 'mar_small': 0.0}
----------Epoch 34----------


Training Loss: 0.103: 100%|███████████████████████████████████████████| 2317/2317 [51:08<00:00,  1.32s/it]
Validation Loss: 0.3034: 100%|████████████████████████████████████████| 9265/9265 [12:24<00:00, 12.45it/s]


mAP@[0.01:0.1]: 
{'classes': 1,
 'map': 0.7967552542686462,
 'map_50': -1.0,
 'map_75': -1.0,
 'map_large': 0.7767672538757324,
 'map_medium': 0.8058081269264221,
 'map_per_class': -1.0,
 'map_small': 0.1382978856563568,
 'mar_1': 0.07661794871091843,
 'mar_10': 0.6348220705986023,
 'mar_100': 0.8271549344062805,
 'mar_100_per_class': -1.0,
 'mar_large': 0.8187423944473267,
 'mar_medium': 0.8337299227714539,
 'mar_small': 0.188043475151062}
mAP@[0.5:0.95]: 
{'classes': 1,
 'map': 0.4702816903591156,
 'map_50': 0.7478196620941162,
 'map_75': 0.5338643193244934,
 'map_large': 0.4327622652053833,
 'map_medium': 0.4828207790851593,
 'map_per_class': -1.0,
 'map_small': 0.0,
 'mar_1': 0.05662974342703819,
 'mar_10': 0.4458189904689789,
 'mar_100': 0.5507248640060425,
 'mar_100_per_class': -1.0,
 'mar_large': 0.5346531271934509,
 'mar_medium': 0.555541455745697,
 'mar_small': 0.0}
Training Finished !


### Test Fine-Tuned Model

Test the finetuned FasterRCNN by running an infernce on random images of the validation-dataset.

In [None]:
dataset_test.dataset.__getitem__(0)

In [None]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# Load last checkpoint
# CHANGE THE OUTPUT_DIR IF CKPT IS STORED ELSEWHERE
checkpoint_dir = f"output-16-03-2024/epoch_32_model.pth"
checkpoint = torch.load(checkpoint_dir, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])

In [None]:
random.seed(SEED)
crowdai = ["Background", "Door"]
num_images = NUM_TEST_IMAGES

for _ in range(num_images):
  x = random.randint(0, (dataset_test.__len__()-1))
  img, target = dataset_test.dataset.__getitem__(x)
  img = img.to(device)

  # Load last checkpoint
  # CHANGE THE OUTPUT_DIR IF CKPT IS STORED ELSEWHERE
  checkpoint_dir = f"output-16-03-2024/epoch_32_model.pth"
  checkpoint = torch.load(checkpoint_dir, map_location=device)
  model.load_state_dict(checkpoint['model_state_dict'])

  print(dataset_test.dataset.gt_info.iloc[x])

  boxes, scores, labels = inference(img, model)
  # print(target, boxes, scores, labels)

  img = img.squeeze(0).cpu().permute(1,2,0).numpy()
  plot_image(img, boxes, scores, labels, crowdai, save_path=f"inference_{x}.png")