# Read This before getting started

## Overview

This notebook (Colab) was created to simplify the process of training a Faster R-CNN model on a custom object detection dataset. It allows users to quickly train and evaluate the model on their dataset, focusing on generating test metrics like mAP scores. This notebook is designed to work with datasets in COCO format and provides an easy workflow for training a Faster R-CNN model.

The goal is to help users avoid the complexities of setting up the environment, data preprocessing, model configuration, and evaluation steps, all in a single notebook.

## Why This Notebook Was Created

- **Simplified Workflow:** This notebook offers a streamlined way to train and evaluate Faster R-CNN on custom datasets.
- **COCO Format Support:** It is built to work with datasets in the COCO format, one of the most popular formats for object detection tasks.
- **Metrics Calculation:** The notebook automatically computes the mean Average Precision (mAP) score during testing, allowing users to assess model performance.

## Dataset Format Guidelines

### 1. **COCO Format**
   - The dataset should be in the COCO format, which includes two main components: **Images** and **Annotations**.
   - The images should be stored in a directory, and the annotations should be in a JSON file that links the images to their respective object annotations.
   - Important: Make sure to include the full absolute path for all images in the JSON file (whether the images are stored locally or on a drive).

### 2. **JSON File Structure Example**

The JSON file should follow this format for object detection:

```json
{
  "images": [
    {
      "id": 1,
      "file_name": "/path/to/image1.jpg",  // Full absolute path to the image
      "height": 720,
      "width": 1280
    },
    {
      "id": 2,
      "file_name": "/path/to/image2.jpg",  // Full absolute path to the image
      "height": 720,
      "width": 1280
    }
  ],
  "annotations": [
    {
      "image_id": 1,
      "category_id": 1,
      "bbox": [x_min, y_min, width, height],
      "area": area,
      "iscrowd": 0
    },
    {
      "image_id": 1,
      "category_id": 2,
      "bbox": [x_min, y_min, width, height],
      "area": area,
      "iscrowd": 0
    }
  ],
  "categories": [
    {
      "id": 1,
      "name": "class1"
    },
    {
      "id": 2,
      "name": "class2"
    }
  ]
}

```
## 3. Important Notes

### Category IDs:
Ensure the class IDs start from 1 (not 0). The ID `0` is reserved for the background class.

### Bounding Boxes:
Annotations should include bounding box coordinates in the form `[x_min, y_min, width, height]`.

### Category Names:
List all the object classes in the `categories` section of the JSON.

## 4. Example Directory Structure

```
dataset/
├── images/
│   ├── image1.jpg
│   ├── image2.jpg
└── annotations/
    └── ann.json

```
## Training Guidelines

### Batch Size Consideration:
The batch size is an important parameter that influences GPU memory usage (VRAM).
If Colab crashes due to insufficient VRAM, try reducing the batch size.
For instance, reduce the batch size from 8 to 4 or 2 in the `create_dataloader` function.

### Colab Crashes:
If the Colab session crashes due to memory limits, restart the session.
Lower the batch size and retry the training. This will help the model train without running out of memory.

### Training Time:
Depending on your dataset size and batch size, training could take several hours. Be patient or consider using a more powerful environment if needed.

### Metrics Calculation:
After each epoch, the notebook will evaluate the model and display the mean Average Precision (mAP) score for the validation dataset.
Pay attention to these metrics to evaluate the performance of your model.

### Adjusting Hyperparameters:
If you wish to fine-tune the model, feel free to adjust hyperparameters like learning rate (`lr`), weight decay, and step size for the learning rate scheduler.

## Final Notes

This notebook was created by [**ady-cf**](https://github.com/ady-cf).
Feel free to fork this notebook and adapt it for your own use.
If you face any issues or need further assistance, please create an issue on the GitHub repository or reach out to [**ady-cf**](https://github.com/ady-cf).

Enjoy training your Faster R-CNN model for object detection!

# Implementation

## Mount Drive

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Download and Import Libraries

### Install libs

In [2]:
!pip install torchmetrics[detection]

Collecting torchmetrics[detection]
  Downloading torchmetrics-1.6.2-py3-none-any.whl.metadata (20 kB)
Collecting lightning-utilities>=0.8.0 (from torchmetrics[detection])
  Downloading lightning_utilities-0.14.0-py3-none-any.whl.metadata (5.6 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=2.0.0->torchmetrics[detection])
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=2.0.0->torchmetrics[detection])
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=2.0.0->torchmetrics[detection])
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=2.0.0->torchmetrics[detection])
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting 

### Download Helper Scripts for Object Detection

This cell downloads essential helper scripts from the PyTorch Vision repository, specifically for object detection tasks. These scripts provide functionalities for training, evaluation, and data transformation, particularly when working with the COCO dataset.

For a detailed tutorial on object detection with PyTorch and Torchvision, refer to this link: [PyTorch Object Detection Tutorial](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html)

**Scripts Downloaded:**

* **`engine.py`:** Contains functions for training and evaluation loops.
* **`utils.py`:** Provides general utility functions used in object detection.
* **`coco_utils.py`:** Contains utility functions specific to the COCO dataset.
* **`coco_eval.py`:** Implements evaluation metrics for COCO-style datasets.
* **`transforms.py`:** Defines data transformations for image preprocessing.

**Usage:**

Run this cell to download the scripts directly into your Colab environment. These scripts are then available for import and use in subsequent cells of your notebook.

**Note:**

* These scripts are directly from the `main` branch of the PyTorch Vision repository. Ensure you're using a compatible version of PyTorch and Torchvision.
* After running this cell, you can import functions from these scripts like `from engine import train_one_epoch, evaluate` or `from utils import ...`.
* These scripts are useful when working with object detection tasks, especially when your data uses the COCO annotation format.


In [3]:
import os
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/engine.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/utils.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_utils.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_eval.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/transforms.py")

0

In [4]:
import os
import io
import time
import errno
import math
import random
import copy
import json
import datetime
import contextlib
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple, Union
from contextlib import redirect_stdout

import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
import cv2

# PyTorch and torchvision
import torch
import torch.nn as nn
import torch.utils.data
import torch.distributed as dist
from torch import Tensor
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import random_split, DataLoader

import torchvision
from torchvision import ops, transforms as T
from torchvision.io import read_image
from torchvision.datasets import CocoDetection
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import mask_rcnn
from torchvision.transforms.v2 import functional as F, InterpolationMode
from torchvision.tv_tensors import Image, BoundingBoxes

# Pycocotools
from pycocotools import mask as coco_mask
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import pycocotools.mask as mask_util

# Torchmetrics
import torchmetrics
from torchmetrics.detection.mean_ap import MeanAveragePrecision


import sys
sys.path.append("/content")
import utils
from coco_eval import CocoEvaluator, convert_to_xywh
from coco_eval import CocoEvaluator
from coco_utils import get_coco_api_from_dataset

## Faster-RCNN

### Custom Training Functions

This cell contains custom modifications of functions from `engine.py` used in PyTorch's object detection framework.

## Functions

1. **train_one_epoch**:
   - This function trains the model for one epoch.
   - It logs metrics, computes losses, and updates the optimizer.
   - Supports automatic mixed precision (AMP) with `torch.amp.autocast` if a scaler is provided.
   - This function is a modified version of the `train_one_epoch` function from `engine.py` for easier integration and use.

2. **_get_iou_types**:
   - This function retrieves the IoU (Intersection over Union) types used for evaluating the model.
   - The function checks if the model is of type `MaskRCNN` or `KeypointRCNN` to include segmentations and keypoints, respectively.

## Usage

- This code assumes that you have a PyTorch object detection model, a dataset, and an optimizer set up.
- The `train_one_epoch` function can be used to train the model for a single epoch. It requires the following parameters:
  - `model`: The model to train (e.g., Faster R-CNN, Mask R-CNN).
  - `optimizer`: The optimizer used for training.
  - `lr_scheduler`: The learning rate scheduler (optional).
  - `data_loader`: DataLoader for the training dataset.
  - `device`: The device (CPU or GPU) where the model should be trained.
  - `epoch`: The current epoch number.
  - `print_freq`: Frequency at which to print progress.
  - `scaler`: A `GradScaler` for mixed precision training (optional).

## Modifications

- These functions have been modified from the original `engine.py` to simplify their usage in a Colab environment.

In [5]:
def train_one_epoch(model, optimizer,lr_scheduler,data_loader, device, epoch, print_freq, scaler=None):
    model.train()
    metric_logger = utils.MetricLogger(delimiter="  ")
    metric_logger.add_meter("lr", utils.SmoothedValue(window_size=1, fmt="{value:.6f}"))
    header = f"Epoch: [{epoch}]"

    for images, targets in metric_logger.log_every(data_loader, print_freq, header):
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in t.items()} for t in targets]
        with torch.amp.autocast('cuda',enabled=scaler is not None):
            loss_dict = model(images, targets)
            losses = sum(loss for loss in loss_dict.values())

        # reduce losses over all GPUs for logging purposes
        loss_dict_reduced = utils.reduce_dict(loss_dict)
        losses_reduced = sum(loss for loss in loss_dict_reduced.values())

        loss_value = losses_reduced.item()

        if not math.isfinite(loss_value):
            print(f"Loss is {loss_value}, stopping training")
            print(loss_dict_reduced)
            sys.exit(1)

        optimizer.zero_grad()
        if scaler is not None:
            scaler.scale(losses).backward()
            scaler.step(optimizer)
            scaler.update()
        else:
            losses.backward()
            optimizer.step()

        if lr_scheduler is not None:
            lr_scheduler.step()

        metric_logger.update(loss=losses_reduced, **loss_dict_reduced)
        metric_logger.update(lr=optimizer.param_groups[0]["lr"])

    return metric_logger


def _get_iou_types(model):
    model_without_ddp = model
    if isinstance(model, torch.nn.parallel.DistributedDataParallel):
        model_without_ddp = model.module
    iou_types = ["bbox"]
    if isinstance(model_without_ddp, torchvision.models.detection.MaskRCNN):
        iou_types.append("segm")
    if isinstance(model_without_ddp, torchvision.models.detection.KeypointRCNN):
        iou_types.append("keypoints")
    return iou_types

### Custom Evaluation Functions

This cell contains custom modifications for evaluating an object detection model in PyTorch, particularly for calculating mean Average Precision (mAP) scores.

## Functions

1. **evaluate**:
   - This function evaluates the model on a given dataset and computes the mean Average Precision (mAP) at different IoU thresholds (0.5 and 0.5:0.95).
   - It uses the `MeanAveragePrecision` evaluator from `torchmetrics` to calculate mAP scores.
   - The function runs the model in inference mode, performs non-maximum suppression on the predictions, and compares them with ground truth boxes to compute the precision metrics.
   - Supports both GPU and CPU-based inference.

2. **print_metrics**:
   - This function prints the evaluation results.
   - It displays the overall mAP for both mAP@50 and mAP@50-95, as well as per-class mAP for each IoU threshold.
   - It provides insights into the model's performance across different classes.

## Usage

- The `evaluate` function requires the following parameters:
  - `model`: The trained model to evaluate (e.g., Faster R-CNN).
  - `data_loader`: The DataLoader for the test/validation dataset.
  - `device`: The device (CPU or GPU) where the model should be evaluated.
  - `verbose`: A boolean flag for printing more detailed evaluation logs (default is `False`).

- The `print_metrics` function is called after evaluating the model with `evaluate` to display the mAP scores.
  - `results_map50`: The mAP@50 evaluation results.
  - `results_map50_95`: The mAP@50-95 evaluation results.
  - `CLASSES`: A list of class names corresponding to the labels in the dataset.

## Modifications

- These functions are based on the original `evaluate` and evaluation routines from  `engine.py`, with custom adjustments to improve usability and reporting.


In [6]:
@torch.inference_mode()
def evaluate(model, data_loader, device, verbose=False):
    n_threads = torch.get_num_threads()
    torch.set_num_threads(1)

    device = torch.device(device)
    model.to(device)
    model.eval()

    metric_logger = utils.MetricLogger(delimiter="  ")
    header = "Test:"

    evaluator_map50 = MeanAveragePrecision(iou_thresholds=[0.5], class_metrics=True, extended_summary=True)
    evaluator_map50_95 = MeanAveragePrecision(class_metrics=True, extended_summary=True)

    for images, target_batch in tqdm(metric_logger.log_every(data_loader, 100, header), total=len(data_loader)):
        images = [img.to(device) for img in images]

        if torch.cuda.is_available():
            torch.cuda.synchronize()

        with torch.no_grad():
            outputs = model(images)

        for i in range(len(images)):
            gt_boxes = target_batch[i]["boxes"]
            gt_labels = target_batch[i]["labels"]

            valid_gt_idx = gt_labels != 0
            gt_boxes = gt_boxes[valid_gt_idx]
            gt_labels = gt_labels[valid_gt_idx]

            pred_boxes = outputs[i]["boxes"].detach().cpu()
            pred_scores = outputs[i]["scores"].detach().cpu()
            pred_labels = outputs[i]["labels"].detach().cpu()

            valid_pred_idx = (pred_labels != 0)
            pred_boxes = pred_boxes[valid_pred_idx]
            pred_scores = pred_scores[valid_pred_idx]
            pred_labels = pred_labels[valid_pred_idx]

            evaluator_map50.update(
                preds=[{"boxes": pred_boxes, "scores": pred_scores, "labels": pred_labels}],
                target=[{"boxes": gt_boxes, "labels": gt_labels}]
            )

            evaluator_map50_95.update(
                preds=[{"boxes": pred_boxes, "scores": pred_scores, "labels": pred_labels}],
                target=[{"boxes": gt_boxes, "labels": gt_labels}]
            )

    torch.set_num_threads(n_threads)

    results_map50 = evaluator_map50.compute()
    results_map50_95 = evaluator_map50_95.compute()

    return results_map50, results_map50_95

def print_metrics(results_map50, results_map50_95, CLASSES):

    print("\n\n==== Validation Results (mAP@50-95) ====")
    print(f"mAP50-95: {results_map50_95['map']:.3f}")

    print("\nPer Class mAP (0.50:0.95 IoU threshold):")

    map_per_class_50_95 = results_map50_95["map_per_class"].cpu().to_dense().numpy()
    for i, map_value in enumerate(map_per_class_50_95):
        class_name = CLASSES[i]
        print(f"{class_name}: {map_value:.3f}")

    print("\n\n==== Validation Results (mAP@50) ====")
    print(f"mAP@50: {results_map50['map_50']:.3f}")

    print("\nPer Class mAP (50% IoU threshold):")

    map_per_class_50 = results_map50["map_per_class"].cpu().to_dense().numpy()
    for i, map_value in enumerate(map_per_class_50):
        class_name = CLASSES[i]
        print(f"{class_name}: {map_value:.3f}")

    print("\n")

### Custom COCO Dataset and Training Pipeline

This cell defines a custom dataset class and the necessary functions for training a Faster R-CNN model using a COCO-style dataset.

## Functions

1. **CustomCocoDataset**:
   - This is a custom dataset class that inherits from `torchvision.datasets.CocoDetection` to handle loading and processing of images and annotations.
   - It extracts bounding box information, labels, areas, and crowd annotations for each object in an image.
   - The class also handles transformations on the image and target, if provided.
   - `__getitem__` returns the image and corresponding target dictionary, which includes the bounding boxes, labels, and other annotations.

2. **get_coco_dataset**:
   - This function initializes the `CustomCocoDataset` by passing the image directory and annotation file.
   - It can be used to load the dataset for both training and testing.

3. **collate_fn**:
   - This function is used to combine a list of samples into a batch during data loading. It is passed as the `collate_fn` parameter when creating the `DataLoader`.

4. **split_dataset**:
   - This function splits the dataset into training and testing subsets based on the `train_fraction` parameter.
   - By default, 90% of the data is used for training, and 10% for testing.
   - You can change the `train_fraction` parameter to adjust the split ratio.
   - The random split is done using a fixed seed (`SEED`) for reproducibility, but you can modify it as needed.

5. **create_dataloader**:
   - This function creates and returns the `DataLoader` instances for both training and testing datasets.
   - The `batch_size_train` and `batch_size_test` parameters control the batch sizes for each dataset.

6. **get_model**:
   - This function initializes a Faster R-CNN model pre-trained on the COCO dataset and replaces its classifier head with a new one suitable for a custom number of classes.
   - The `num_classes` parameter specifies the number of output classes (including the background).

7. **setup_optimizer_scheduler**:
   - This function sets up the optimizer (Adam) and learning rate scheduler (StepLR) for training.
   - It returns the optimizer and scheduler, which can be used during the training loop.

8. **train_and_evaluate**:
   - This function handles the entire training and evaluation process for a specified number of epochs.
   - It calls the `train_one_epoch` function for each epoch and evaluates the model on the test dataset using the `evaluate` function.
   - It prints out evaluation metrics after each epoch using the `print_metrics` function.

## How to Use

1. To change the train/test split, adjust the `train_fraction` parameter in the `split_dataset` function. The default is 0.9 (90% for training and 10% for testing). To change this:
   - Set a different value for `train_fraction` (e.g., `train_fraction=0.8` for 80% training and 20% testing).

2. The training loop will automatically use the specified split for training and testing.

3. Use the `train_and_evaluate` function to run the training and evaluation pipeline, where you can specify the number of epochs, device (CPU/GPU), and the list of class names (`CLASSES`).

4. Modify `batch_size_train` and `batch_size_test` to adjust the batch sizes as needed.



In [7]:
SEED = 42
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
np.random.seed(SEED)
random.seed(SEED)

class CustomCocoDataset(torchvision.datasets.CocoDetection):
    def __init__(self, img_folder, ann_file, transforms=None):
        super().__init__(img_folder, ann_file)
        self.transforms = transforms

    def __getitem__(self, idx):
        image, target = super().__getitem__(idx)
        image_info = self.coco.loadImgs(self.ids[idx])[0]
        image_path = os.path.join(self.root, image_info["file_name"])

        boxes, labels, areas, iscrowd = [], [], [], []
        for obj in target:
            x_min, y_min, width, height = obj["bbox"]
            x_max, y_max = x_min + width, y_min + height
            boxes.append([x_min, y_min, x_max, y_max])
            labels.append(obj["category_id"])
            areas.append(width * height)
            iscrowd.append(obj.get("iscrowd", 0))

        if not boxes:
            boxes = [[0, 0, 1, 1]]
            labels = [0]
            areas = [1.0]
            iscrowd = [0]

        image = T.ToTensor()(image)
        target = {
            "boxes": torch.tensor(boxes, dtype=torch.float32),
            "labels": torch.tensor(labels, dtype=torch.int64),
            "image_id": torch.tensor([idx], dtype=torch.int64),
            "area": torch.tensor(areas, dtype=torch.float32),
            "iscrowd": torch.tensor(iscrowd, dtype=torch.uint8),
            "image_path": image_path,
        }

        if self.transforms:
            image, target = self.transforms(image, target)

        return image, target

    def __len__(self):
        return len(self.ids)

def get_coco_dataset(img_dir, ann_file):
    return CustomCocoDataset(img_dir, ann_file, transforms=None)

def collate_fn(batch):
    return tuple(zip(*batch))

def split_dataset(dataset, train_fraction=0.9, seed=SEED):
    train_size = int(train_fraction * len(dataset))
    test_size = len(dataset) - train_size
    return random_split(dataset, [train_size, test_size], generator=torch.Generator().manual_seed(seed))

def create_dataloader(train_dataset, test_dataset, batch_size_train=8, batch_size_test=4):
    train_loader = DataLoader(train_dataset, batch_size=batch_size_train, shuffle=True, collate_fn=collate_fn)
    test_loader = DataLoader(test_dataset, batch_size=batch_size_test, shuffle=False, collate_fn=collate_fn)
    return train_loader, test_loader

def get_model(num_classes):
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="COCO_V1")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

def setup_optimizer_scheduler(model, lr=0.0001, weight_decay=0.0005, step_size=5, gamma=0.5):
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr, weight_decay=weight_decay)
    lr_scheduler = StepLR(optimizer, step_size=step_size, gamma=gamma)
    return optimizer, lr_scheduler

def train_and_evaluate(model, train_loader, test_loader, optimizer, lr_scheduler, num_epochs, device, CLASSES):
    for epoch in range(num_epochs):
        train_one_epoch(model, optimizer, lr_scheduler, train_loader, device, epoch, print_freq=len(train_loader))
        o1, o2 = evaluate(model, test_loader, device, verbose=True)
        print_metrics(o1, o2, CLASSES)

### Object Detection Model Training and Evaluation

This cell contains the code for training a Faster R-CNN model on a custom dataset for object detection. The dataset is expected to be in the COCO format, and the code includes steps for loading the dataset, training the model, and evaluating the results.

Steps

1. Dataset Paths
   - The dataset consists of images and annotations stored in COCO format.
   - The images are located in a specified directory, and the annotations are provided in a corresponding JSON file.
   
   Example Path:
   - Image folder: "/path/to/images"
   - Annotation file: "/path/to/annotations.json"

2. Dataset and DataLoader
   - The dataset is loaded using the `get_coco_dataset` function.
   - The dataset is then split into training and testing subsets using the `split_dataset` function.
   - Data is loaded in batches via the `create_dataloader` function, with configurable batch sizes for both training and testing.

3. Model Setup
   - A Faster R-CNN model pre-trained on a standard dataset (like COCO) is loaded via the `get_model` function.
   - The model's final classifier is adjusted to output the correct number of classes based on the `num_classes` parameter (including the background class).
   - The model is moved to the device (GPU if available, otherwise CPU).

4. Optimizer and Scheduler
   - The Adam optimizer is initialized with a learning rate and weight decay.
   - A step learning rate scheduler (`StepLR`) is set up to decay the learning rate at specified intervals during training.

5. Class Labels
   - The classes to be detected (excluding the background) are specified in the `CLASSES` list.
   - Update this list with the actual class names (e.g., "Object1", "Object2", "Object3", etc.).
   
   Example:
   - `CLASSES = ["Object1", "Object2", "Object3"]`

6. Training and Evaluation
   - The `train_and_evaluate` function trains the model for a specified number of epochs and evaluates the model on the test dataset after each epoch.
   - The model's performance metrics (e.g., mAP) are reported after each epoch.

7. Training Complete
   - After the training process completes, a message "Training complete!" is printed to indicate the process has finished.


Modifications

1. Dataset Split:
   - The dataset is split into training and testing sets by default using a 90%/10% ratio. You can change the split by modifying the `train_fraction` parameter in the `split_dataset` function.
   - Example: To use a 80%/20% split: `split_dataset(full_dataset, train_fraction=0.8)`.

2. Class Names:
   - Update the `CLASSES` list to match the classes in your dataset (excluding the background class).

3. Number of Epochs:
   - By default, the model will train for 10 epochs. You can adjust the number of epochs by modifying the `num_epochs` parameter in the `train_and_evaluate` function.

4. Evaluation Metrics:
   - During training, the evaluation metrics (e.g., mean Average Precision or mAP) are printed after each epoch. You can add more evaluation metrics if required.

5. Optimizer and Scheduler:
   - You can modify the learning rate (`lr`), weight decay, step size, and gamma values in the `setup_optimizer_scheduler` function to tune training.



In [10]:
img_dir = ""
ann_file = ""
full_dataset = get_coco_dataset(img_dir, ann_file)
train_dataset, test_dataset = split_dataset(full_dataset)
train_loader, test_loader = create_dataloader(train_dataset, test_dataset)

num_classes = 0 # background + num of classes
model = get_model(num_classes)
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)

optimizer, lr_scheduler = setup_optimizer_scheduler(model)

CLASSES = []  # Dont include background class

train_and_evaluate(model, test_loader, test_loader, optimizer, lr_scheduler, num_epochs=10, device=device, CLASSES=CLASSES)

print("Training complete!")

loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
Epoch: [0]  [0/2]  eta: 0:00:04  lr: 0.000100  loss: 2.4693 (2.4693)  loss_classifier: 1.6049 (1.6049)  loss_box_reg: 0.0837 (0.0837)  loss_objectness: 0.4705 (0.4705)  loss_rpn_box_reg: 0.3102 (0.3102)  time: 2.1650  data: 0.9534  max mem: 5599
Epoch: [0]  [1/2]  eta: 0:00:02  lr: 0.000100  loss: 1.5236 (1.9964)  loss_classifier: 0.4927 (1.0488)  loss_box_reg: 0.0837 (0.0895)  loss_objectness: 0.3907 (0.4306)  loss_rpn_box_reg: 0.3102 (0.4275)  time: 2.2352  data: 1.0272  max mem: 5599
Epoch: [0] Total time: 0:00:04 (2.2364 s / it)


 50%|█████     | 1/2 [00:01<00:01,  1.90s/it]

Test:  [0/2]  eta: 0:00:03    time: 1.8985  data: 1.3272  max mem: 5599


100%|██████████| 2/2 [00:03<00:00,  1.77s/it]


Test:  [1/2]  eta: 0:00:01    time: 1.7719  data: 1.1882  max mem: 5599
Test: Total time: 0:00:03 (1.7727 s / it)


==== Validation Results (mAP@50-95) ====
mAP50-95: 0.008

Per Class mAP (0.50:0.95 IoU threshold):
class1: 0.000
class2: 0.000
class3: 0.033
class4: 0.000


==== Validation Results (mAP@50) ====
mAP@50: 0.034

Per Class mAP (50% IoU threshold):
class1: 0.000
class2: 0.000
class3: 0.135
class4: 0.000


Epoch: [1]  [0/2]  eta: 0:00:04  lr: 0.000100  loss: 0.8315 (0.8315)  loss_classifier: 0.3176 (0.3176)  loss_box_reg: 0.2305 (0.2305)  loss_objectness: 0.0166 (0.0166)  loss_rpn_box_reg: 0.2668 (0.2668)  time: 2.1545  data: 0.9711  max mem: 5599
Epoch: [1]  [1/2]  eta: 0:00:02  lr: 0.000100  loss: 0.7191 (0.7753)  loss_classifier: 0.1761 (0.2469)  loss_box_reg: 0.0744 (0.1524)  loss_objectness: 0.0166 (0.0217)  loss_rpn_box_reg: 0.2668 (0.3543)  time: 2.2051  data: 1.0152  max mem: 5600
Epoch: [1] Total time: 0:00:04 (2.2062 s / it)


 50%|█████     | 1/2 [00:01<00:01,  1.53s/it]

Test:  [0/2]  eta: 0:00:03    time: 1.5280  data: 0.9694  max mem: 5600


100%|██████████| 2/2 [00:03<00:00,  1.58s/it]


Test:  [1/2]  eta: 0:00:01    time: 1.5773  data: 0.9998  max mem: 5600
Test: Total time: 0:00:03 (1.5783 s / it)


==== Validation Results (mAP@50-95) ====
mAP50-95: 0.000

Per Class mAP (0.50:0.95 IoU threshold):
class1: 0.000
class2: 0.000
class3: 0.000
class4: 0.000


==== Validation Results (mAP@50) ====
mAP@50: 0.000

Per Class mAP (50% IoU threshold):
class1: 0.000
class2: 0.000
class3: 0.000
class4: 0.000


Epoch: [2]  [0/2]  eta: 0:00:04  lr: 0.000050  loss: 0.7468 (0.7468)  loss_classifier: 0.3511 (0.3511)  loss_box_reg: 0.2020 (0.2020)  loss_objectness: 0.0074 (0.0074)  loss_rpn_box_reg: 0.1863 (0.1863)  time: 2.3925  data: 1.1922  max mem: 5600
Epoch: [2]  [1/2]  eta: 0:00:02  lr: 0.000050  loss: 0.6804 (0.7136)  loss_classifier: 0.2710 (0.3110)  loss_box_reg: 0.1298 (0.1659)  loss_objectness: 0.0074 (0.0285)  loss_rpn_box_reg: 0.1863 (0.2081)  time: 2.4691  data: 1.2675  max mem: 5600
Epoch: [2] Total time: 0:00:04 (2.4703 s / it)


 50%|█████     | 1/2 [00:01<00:01,  1.49s/it]

Test:  [0/2]  eta: 0:00:02    time: 1.4921  data: 0.9278  max mem: 5600


100%|██████████| 2/2 [00:03<00:00,  1.56s/it]


Test:  [1/2]  eta: 0:00:01    time: 1.5633  data: 0.9822  max mem: 5600
Test: Total time: 0:00:03 (1.5646 s / it)


==== Validation Results (mAP@50-95) ====
mAP50-95: 0.114

Per Class mAP (0.50:0.95 IoU threshold):
class1: 0.000
class2: 0.065
class3: 0.392
class4: 0.000


==== Validation Results (mAP@50) ====
mAP@50: 0.218

Per Class mAP (50% IoU threshold):
class1: 0.000
class2: 0.072
class3: 0.799
class4: 0.000


Epoch: [3]  [0/2]  eta: 0:00:04  lr: 0.000050  loss: 0.4579 (0.4579)  loss_classifier: 0.1525 (0.1525)  loss_box_reg: 0.1391 (0.1391)  loss_objectness: 0.0730 (0.0730)  loss_rpn_box_reg: 0.0934 (0.0934)  time: 2.1572  data: 0.9573  max mem: 5600
Epoch: [3]  [1/2]  eta: 0:00:02  lr: 0.000050  loss: 0.4579 (0.4734)  loss_classifier: 0.1525 (0.1532)  loss_box_reg: 0.1269 (0.1330)  loss_objectness: 0.0635 (0.0682)  loss_rpn_box_reg: 0.0934 (0.1190)  time: 2.2096  data: 1.0039  max mem: 5600
Epoch: [3] Total time: 0:00:04 (2.2108 s / it)


 50%|█████     | 1/2 [00:01<00:01,  1.66s/it]

Test:  [0/2]  eta: 0:00:03    time: 1.6585  data: 1.0783  max mem: 5600


100%|██████████| 2/2 [00:03<00:00,  1.88s/it]


Test:  [1/2]  eta: 0:00:01    time: 1.8778  data: 1.2685  max mem: 5600
Test: Total time: 0:00:03 (1.8788 s / it)


==== Validation Results (mAP@50-95) ====
mAP50-95: 0.163

Per Class mAP (0.50:0.95 IoU threshold):
class1: 0.057
class2: 0.113
class3: 0.362
class4: 0.119


==== Validation Results (mAP@50) ====
mAP@50: 0.338

Per Class mAP (50% IoU threshold):
class1: 0.143
class2: 0.154
class3: 0.780
class4: 0.277


Epoch: [4]  [0/2]  eta: 0:00:05  lr: 0.000050  loss: 0.4751 (0.4751)  loss_classifier: 0.1551 (0.1551)  loss_box_reg: 0.1553 (0.1553)  loss_objectness: 0.0892 (0.0892)  loss_rpn_box_reg: 0.0755 (0.0755)  time: 2.6340  data: 1.3855  max mem: 5600
Epoch: [4]  [1/2]  eta: 0:00:02  lr: 0.000025  loss: 0.4751 (0.5021)  loss_classifier: 0.1551 (0.1581)  loss_box_reg: 0.1553 (0.1614)  loss_objectness: 0.0449 (0.0671)  loss_rpn_box_reg: 0.0755 (0.1156)  time: 2.4710  data: 1.2324  max mem: 5600
Epoch: [4] Total time: 0:00:04 (2.4719 s / it)


 50%|█████     | 1/2 [00:01<00:01,  1.55s/it]

Test:  [0/2]  eta: 0:00:03    time: 1.5488  data: 0.9773  max mem: 5600


100%|██████████| 2/2 [00:03<00:00,  1.60s/it]


Test:  [1/2]  eta: 0:00:01    time: 1.5982  data: 1.0094  max mem: 5600
Test: Total time: 0:00:03 (1.5993 s / it)


==== Validation Results (mAP@50-95) ====
mAP50-95: 0.185

Per Class mAP (0.50:0.95 IoU threshold):
class1: 0.038
class2: 0.145
class3: 0.321
class4: 0.235


==== Validation Results (mAP@50) ====
mAP@50: 0.371

Per Class mAP (50% IoU threshold):
class1: 0.125
class2: 0.182
class3: 0.712
class4: 0.467


Epoch: [5]  [0/2]  eta: 0:00:04  lr: 0.000025  loss: 0.4741 (0.4741)  loss_classifier: 0.1584 (0.1584)  loss_box_reg: 0.1924 (0.1924)  loss_objectness: 0.0329 (0.0329)  loss_rpn_box_reg: 0.0904 (0.0904)  time: 2.1511  data: 0.9475  max mem: 5600
Epoch: [5]  [1/2]  eta: 0:00:02  lr: 0.000025  loss: 0.4741 (0.5080)  loss_classifier: 0.1576 (0.1580)  loss_box_reg: 0.1924 (0.1939)  loss_objectness: 0.0329 (0.0375)  loss_rpn_box_reg: 0.0904 (0.1186)  time: 2.4655  data: 1.2360  max mem: 5600
Epoch: [5] Total time: 0:00:04 (2.4668 s / it)


 50%|█████     | 1/2 [00:01<00:01,  1.53s/it]

Test:  [0/2]  eta: 0:00:03    time: 1.5325  data: 0.9622  max mem: 5600


100%|██████████| 2/2 [00:03<00:00,  1.58s/it]


Test:  [1/2]  eta: 0:00:01    time: 1.5818  data: 0.9953  max mem: 5600
Test: Total time: 0:00:03 (1.5826 s / it)


==== Validation Results (mAP@50-95) ====
mAP50-95: 0.204

Per Class mAP (0.50:0.95 IoU threshold):
class1: 0.018
class2: 0.206
class3: 0.372
class4: 0.219


==== Validation Results (mAP@50) ====
mAP@50: 0.348

Per Class mAP (50% IoU threshold):
class1: 0.091
class2: 0.250
class3: 0.724
class4: 0.329


Epoch: [6]  [0/2]  eta: 0:00:04  lr: 0.000025  loss: 0.4597 (0.4597)  loss_classifier: 0.1637 (0.1637)  loss_box_reg: 0.2140 (0.2140)  loss_objectness: 0.0049 (0.0049)  loss_rpn_box_reg: 0.0771 (0.0771)  time: 2.1683  data: 0.9572  max mem: 5600
Epoch: [6]  [1/2]  eta: 0:00:02  lr: 0.000025  loss: 0.4597 (0.4843)  loss_classifier: 0.1637 (0.1666)  loss_box_reg: 0.2140 (0.2188)  loss_objectness: 0.0046 (0.0048)  loss_rpn_box_reg: 0.0771 (0.0942)  time: 2.2237  data: 1.0060  max mem: 5600
Epoch: [6] Total time: 0:00:04 (2.2248 s / it)


 50%|█████     | 1/2 [00:01<00:01,  1.57s/it]

Test:  [0/2]  eta: 0:00:03    time: 1.5687  data: 0.9958  max mem: 5600


100%|██████████| 2/2 [00:03<00:00,  1.73s/it]


Test:  [1/2]  eta: 0:00:01    time: 1.7313  data: 1.1253  max mem: 5600
Test: Total time: 0:00:03 (1.7322 s / it)


==== Validation Results (mAP@50-95) ====
mAP50-95: 0.204

Per Class mAP (0.50:0.95 IoU threshold):
class1: 0.033
class2: 0.188
class3: 0.348
class4: 0.247


==== Validation Results (mAP@50) ====
mAP@50: 0.384

Per Class mAP (50% IoU threshold):
class1: 0.083
class2: 0.250
class3: 0.794
class4: 0.408


Epoch: [7]  [0/2]  eta: 0:00:05  lr: 0.000013  loss: 0.4742 (0.4742)  loss_classifier: 0.1702 (0.1702)  loss_box_reg: 0.2426 (0.2426)  loss_objectness: 0.0051 (0.0051)  loss_rpn_box_reg: 0.0563 (0.0563)  time: 2.5737  data: 1.3454  max mem: 5600
Epoch: [7]  [1/2]  eta: 0:00:02  lr: 0.000013  loss: 0.4742 (0.4817)  loss_classifier: 0.1612 (0.1657)  loss_box_reg: 0.2161 (0.2294)  loss_objectness: 0.0033 (0.0042)  loss_rpn_box_reg: 0.0563 (0.0824)  time: 2.4058  data: 1.1755  max mem: 5600
Epoch: [7] Total time: 0:00:04 (2.4067 s / it)


 50%|█████     | 1/2 [00:01<00:01,  1.51s/it]

Test:  [0/2]  eta: 0:00:03    time: 1.5131  data: 0.9372  max mem: 5600


100%|██████████| 2/2 [00:03<00:00,  1.58s/it]


Test:  [1/2]  eta: 0:00:01    time: 1.5762  data: 0.9840  max mem: 5600
Test: Total time: 0:00:03 (1.5771 s / it)


==== Validation Results (mAP@50-95) ====
mAP50-95: 0.227

Per Class mAP (0.50:0.95 IoU threshold):
class1: 0.015
class2: 0.214
class3: 0.418
class4: 0.262


==== Validation Results (mAP@50) ====
mAP@50: 0.398

Per Class mAP (50% IoU threshold):
class1: 0.083
class2: 0.286
class3: 0.768
class4: 0.454


Epoch: [8]  [0/2]  eta: 0:00:04  lr: 0.000013  loss: 0.4787 (0.4787)  loss_classifier: 0.1731 (0.1731)  loss_box_reg: 0.2488 (0.2488)  loss_objectness: 0.0029 (0.0029)  loss_rpn_box_reg: 0.0539 (0.0539)  time: 2.1656  data: 0.9484  max mem: 5600
Epoch: [8]  [1/2]  eta: 0:00:02  lr: 0.000013  loss: 0.4787 (0.4893)  loss_classifier: 0.1601 (0.1666)  loss_box_reg: 0.2262 (0.2375)  loss_objectness: 0.0024 (0.0027)  loss_rpn_box_reg: 0.0539 (0.0826)  time: 2.2305  data: 0.9991  max mem: 5600
Epoch: [8] Total time: 0:00:04 (2.2317 s / it)


 50%|█████     | 1/2 [00:01<00:01,  1.92s/it]

Test:  [0/2]  eta: 0:00:03    time: 1.9241  data: 1.3405  max mem: 5600


100%|██████████| 2/2 [00:03<00:00,  1.75s/it]


Test:  [1/2]  eta: 0:00:01    time: 1.7530  data: 1.1572  max mem: 5600
Test: Total time: 0:00:03 (1.7543 s / it)


==== Validation Results (mAP@50-95) ====
mAP50-95: 0.195

Per Class mAP (0.50:0.95 IoU threshold):
class1: 0.020
class2: 0.175
class3: 0.328
class4: 0.256


==== Validation Results (mAP@50) ====
mAP@50: 0.417

Per Class mAP (50% IoU threshold):
class1: 0.083
class2: 0.333
class3: 0.745
class4: 0.508


Epoch: [9]  [0/2]  eta: 0:00:04  lr: 0.000013  loss: 0.4812 (0.4812)  loss_classifier: 0.1780 (0.1780)  loss_box_reg: 0.2499 (0.2499)  loss_objectness: 0.0009 (0.0009)  loss_rpn_box_reg: 0.0524 (0.0524)  time: 2.1808  data: 0.9704  max mem: 5600
Epoch: [9]  [1/2]  eta: 0:00:02  lr: 0.000006  loss: 0.4812 (0.4879)  loss_classifier: 0.1574 (0.1677)  loss_box_reg: 0.2278 (0.2389)  loss_objectness: 0.0009 (0.0023)  loss_rpn_box_reg: 0.0524 (0.0791)  time: 2.2292  data: 1.0105  max mem: 5600
Epoch: [9] Total time: 0:00:04 (2.2303 s / it)


 50%|█████     | 1/2 [00:01<00:01,  1.52s/it]

Test:  [0/2]  eta: 0:00:03    time: 1.5167  data: 0.9456  max mem: 5600


100%|██████████| 2/2 [00:03<00:00,  1.55s/it]

Test:  [1/2]  eta: 0:00:01    time: 1.5514  data: 0.9623  max mem: 5600
Test: Total time: 0:00:03 (1.5528 s / it)


==== Validation Results (mAP@50-95) ====
mAP50-95: 0.222

Per Class mAP (0.50:0.95 IoU threshold):
class1: 0.020
class2: 0.210
class3: 0.315
class4: 0.343


==== Validation Results (mAP@50) ====
mAP@50: 0.482

Per Class mAP (50% IoU threshold):
class1: 0.067
class2: 0.400
class3: 0.708
class4: 0.751


Dummy Training complete!



