
# 1. What is Detectron2 and how does it differ from previous object detection frameworks?
  - Detectron2 is an open-source computer vision library from Facebook AI Research (FAIR) for object detection and segmentation. It is a ground-up rewrite of the original Detectron, which was built on Caffe2, and is implemented in PyTorch. Compared with its predecessor, it offers a more modular and customizable design, faster training, a richer model zoo that adds models such as Cascade R-CNN, Panoptic FPN, and TensorMask, and support for exporting models for production deployment.

# 2. Explain the process and importance of data annotation when working with Detectron2.
   - Data annotation for Detectron2 means labeling images with bounding boxes, segmentation masks, or keypoints and storing the labels in a format such as COCO JSON or Pascal VOC, so the framework can learn from these examples during training. Annotation quality directly limits model quality: accurate, consistent labels let the model recognize and classify objects reliably in real-world scenarios, while missing, loose, or inconsistent labels lead to poor predictions or bias. This makes data annotation the foundation of robust model development and deployment.
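
To make the annotation format concrete, here is a minimal sketch of a COCO-style dictionary for one image with one labeled object, plus a small consistency check. The file name, ids, and coordinates are made up for illustration, and `find_annotation_problems` is a hypothetical helper, not part of Detectron2 or the COCO tooling.

```python
# Minimal COCO-style annotation data for one image with one labeled object.
# All ids, file names, and coordinates are illustrative assumptions.
coco = {
    "images": [{"id": 1, "file_name": "deer_001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "deer", "supercategory": "animal"}],
    "annotations": [{
        "id": 1,
        "image_id": 1,
        "category_id": 1,
        "bbox": [100.0, 120.0, 200.0, 150.0],  # [x_min, y_min, width, height] in pixels
        "segmentation": [[100.0, 120.0, 300.0, 120.0, 300.0, 270.0, 100.0, 270.0]],
        "area": 30000.0,
        "iscrowd": 0,
    }],
}


def find_annotation_problems(coco_dict):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    images = {img["id"]: img for img in coco_dict["images"]}
    category_ids = {cat["id"] for cat in coco_dict["categories"]}
    for ann in coco_dict["annotations"]:
        if ann["image_id"] not in images:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
            continue
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        x, y, w, h = ann["bbox"]
        img = images[ann["image_id"]]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: empty box")
        elif x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append(f"annotation {ann['id']}: box outside image")
    return problems


print(find_annotation_problems(coco))  # [] because the example is consistent
```

Running a check like this before training is a cheap way to catch the inconsistent annotations described above.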

# 3. Describe the steps involved in training a custom object detection model using Detectron2.
  - Training a custom object detection model with Detectron2 comes down to five stages: install dependencies, prepare and register the dataset (usually in COCO JSON format), configure the model and training parameters, run training, and evaluate the trained model. The result is a model tailored to your own dataset and ready for inference and deployment.

## Steps Overview
1. Install Detectron2 and its dependencies (PyTorch, plus CUDA for GPU training).
2. Annotate and prepare images in a compatible format (commonly COCO).
3. Register the dataset with Detectron2's API.
4. Visualize sample images to verify that the data loads correctly.
5. Select and configure the detection model using a provided config file.
6. Start the training loop with the chosen hyperparameters (such as batch size, learning rate, and number of iterations; Detectron2's solver counts iterations rather than epochs).
7. Evaluate model performance on validation data using mAP (mean average precision) metrics.
8. Use the trained model for inference on new data or integrate it into applications.
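
One detail of the hyperparameter step is worth a worked example: Detectron2's solver is configured by iteration count (`cfg.SOLVER.MAX_ITER`), with `cfg.SOLVER.IMS_PER_BATCH` images per iteration, rather than by epochs. The sketch below converts an epoch budget into iterations. `epochs_to_iterations` is a hypothetical helper and the dataset size is an assumed figure.

```python
import math


def epochs_to_iterations(num_images, ims_per_batch, epochs):
    """Number of solver iterations needed to pass over the dataset `epochs` times."""
    if num_images <= 0 or ims_per_batch <= 0 or epochs <= 0:
        raise ValueError("all arguments must be positive")
    iterations_per_epoch = math.ceil(num_images / ims_per_batch)
    return iterations_per_epoch * epochs


# Assumed setup: 1,000 training images, 2 images per batch, 10 epochs.
max_iter = epochs_to_iterations(num_images=1000, ims_per_batch=2, epochs=10)
print(max_iter)  # 5000
```

The resulting number is what you would assign to `cfg.SOLVER.MAX_ITER`.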

# 4. What are evaluation curves in Detectron2, and how are metrics like mAP and IoU interpreted?
  - Evaluation curves, especially precision-recall curves, visualize how well a Detectron2 model separates true objects from false detections. They show the trade-off between precision (the fraction of predictions that are correct) and recall (the fraction of ground-truth objects that are found) as the confidence threshold varies. Detectron2's built-in evaluators (for example `COCOEvaluator`) report the summary metrics derived from these curves, such as AP at several IoU thresholds.

## Interpreting IoU and mAP
IoU (Intersection over Union): This metric measures how well a predicted bounding box overlaps with the ground-truth box. It is calculated as:

$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$

An IoU of 1 means perfect overlap; lower values mean less accurate localization.
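
The IoU definition can be sketched in a few lines of plain Python. This assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`; `box_iou` is an illustrative helper, not a Detectron2 function.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)


ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 15, 10)
# Overlap is 5 x 10 = 50, union is 100 + 100 - 50 = 150, so IoU = 1/3.
print(box_iou(ground_truth, prediction))
```

At the common threshold of 0.5, this prediction would not count as a match even though it covers half of the ground-truth box.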

mAP (Mean Average Precision):

  - Average Precision (AP) is the area under the precision-recall curve for one class at a given IoU threshold (e.g., 0.5 or 0.75).
  - mAP averages the AP scores across all classes and, in COCO-style evaluation, across several IoU thresholds: "AP50" uses an IoU threshold of 0.5, while "AP@[.5:.95]" averages AP over thresholds from 0.5 to 0.95 in steps of 0.05.
  - A higher mAP indicates better overall detection and localization performance.
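
As a rough illustration of how AP and mAP are derived from ranked detections, the sketch below computes the area under an interpolated precision-recall curve for one class and then averages over classes. It uses all-point interpolation, which is a simplification; COCO evaluation samples the curve at fixed recall levels, so numbers from `COCOEvaluator` will not match this exactly. The detection scores and match flags are invented inputs.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth box
        at the chosen IoU threshold.
    num_ground_truth: number of ground-truth boxes of this class.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # Make precision non-increasing by taking a running maximum from the right.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum the rectangle areas wherever recall increases.
    ap, previous_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


# Class A: four detections, three correct, four ground-truth objects.
ap_a = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], 4)
# Class B: one detection, correct, one ground-truth object.
ap_b = average_precision([0.9], [True], 1)
mean_ap = (ap_a + ap_b) / 2
print(ap_a, ap_b, mean_ap)  # 0.625 1.0 0.8125
```

Class A loses AP both from the false positive ranked second and from the one object that was never detected, which caps recall at 0.75.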

# 5. Compare Detectron2 and TFOD2 in terms of features, performance, and ease of use.
  - Detectron2 and TFOD2 (TensorFlow Object Detection API v2) are both widely used frameworks for object detection, but they differ in architecture, features, and user experience.

## Features Comparison

| Feature | Detectron2 | TFOD2 (TensorFlow Object Detection API) |
| --- | --- | --- |
| Backend | PyTorch | TensorFlow |
| Supported Tasks | Object detection, instance/semantic/panoptic segmentation, keypoint detection, DensePose | Object detection, instance segmentation, keypoint detection |
| Model Zoo | Extensive, including Cascade R-CNN, Mask R-CNN, and Panoptic FPN | Rich, including SSD, EfficientDet, Faster R-CNN, and Mask R-CNN |
| Dataset Format | COCO, Pascal VOC, custom registration | TFRecord (typically converted from CSV, COCO, or Pascal VOC annotations) |
| Modularity | Highly modular and customizable | Modular, but less plug-and-play flexibility |
| Pretrained Models | Yes | Yes |

## Performance
Detectron2 generally delivers strong benchmark accuracy, especially on segmentation tasks and with advanced detection models.

TFOD2 is efficient and optimized for deployment; models such as EfficientDet are well suited to mobile and edge environments and offer a favorable speed-to-accuracy trade-off.

Detectron2 tends to be more resource-intensive and is better suited to research and heavy customization; TFOD2 is often preferred for production because of its broad hardware support and lightweight model options.

## Ease of Use
Detectron2's PyTorch foundation makes it accessible to researchers and developers who already know PyTorch, with intuitive APIs, modular configs, and straightforward customization.

TFOD2 is easier for TensorFlow users and comes with extensive documentation, tutorials, and Colab integration. Its setup can be tricky because of dependency management, but its dataset handling and deployment pipelines are streamlined for Google Cloud and TFLite.

In short, Detectron2 is usually the more approachable choice for research and model tweaking, while TFOD2 is stronger for rapid prototyping and for scaling applications to production and mobile targets.

# 6. Write Python code to install Detectron2 and verify the installation.
(Include your Python code and output in the code box below.)
# Python code to install Detectron2 and verify installation

# Step 1: Install PyTorch and torchvision (ensure cuda version matches your GPU or use cpu-only)
# You can select the right command from https://pytorch.org/get-started/locally/
# Here is an example for CUDA 11.7 (adjust if needed)
!pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117

# Step 2: Install Detectron2
# The official installation guide recommends building from source against the PyTorch
# installed above; prebuilt wheels are only published for a few older PyTorch/CUDA pairs.
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

# Step 3: Verify installation by importing detectron2 and printing version
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

print("Detectron2 version:", detectron2.__version__)

# Additional: check if CUDA is available via torch
import torch
print("CUDA available:", torch.cuda.is_available())
print("CUDA device count:", torch.cuda.device_count())
if torch.cuda.is_available():
    # These calls raise an error on CPU-only machines, so guard them
    print("Current CUDA device:", torch.cuda.current_device())
    print("CUDA device name:", torch.cuda.get_device_name(torch.cuda.current_device()))

output (from the author's environment; the version string and GPU name depend on your build and hardware):
Detectron2 version: 0.6+cu117
CUDA available: True
CUDA device count: 1
Current CUDA device: 0
CUDA device name: NVIDIA GeForce RTX 3080



# 7. Annotate a dataset using any tool of your choice and convert the annotations to COCO format for Detectron2.
(Include your Python code and output in the code box below.)
# This example demonstrates how to convert annotations to COCO format for Detectron2
# Assume annotations are done using LabelMe tool, and saved in JSON format per image.
# We convert these LabelMe annotations to COCO format for Detectron2 training.

import json
import os
from PIL import Image  # only used to read the image size

# Paths (Example paths, adjust as needed)
labelme_json_dir = "./labelme_annotations"
output_coco_json = "./dataset_coco_format.json"
image_dir = "./images"

# COCO format skeleton structure
coco_output = {
    "info": {},
    "licenses": [],
    "images": [],
    "annotations": [],
    "categories": []
}

# Define categories manually for dataset (example)
categories = [
    {"id": 1, "name": "object", "supercategory": "none"}  # Add all your categories here with their ids
]
coco_output["categories"] = categories

annotation_id = 1
image_id = 1

for json_file in sorted(os.listdir(labelme_json_dir)):  # sorted so image ids are reproducible
    if json_file.endswith(".json"):
        # Load LabelMe annotation file
        json_path = os.path.join(labelme_json_dir, json_file)
        with open(json_path) as f:
            labelme_data = json.load(f)

        # Add image info to COCO
        image_filename = labelme_data["imagePath"]
        image_path = os.path.join(image_dir, image_filename)
        image = Image.open(image_path)
        width, height = image.size
        coco_output["images"].append({
            "id": image_id,
            "width": width,
            "height": height,
            "file_name": image_filename
        })

        # Process shapes (assumed polygons for object segmentation)
        for shape in labelme_data["shapes"]:
            points = shape["points"]
            category_name = shape["label"]

            # Find category ID
            category_id = None
            for cat in categories:
                if cat["name"] == category_name:
                    category_id = cat["id"]
                    break
            if category_id is None:
                continue  # skip unrecognized categories

            # COCO polygons need at least 3 points; skip other shapes (e.g. 2-point rectangles)
            if len(points) < 3:
                continue

            # Flatten points list for COCO segmentation
            segmentation = [p for point in points for p in point]

            # Calculate bounding box [x_min, y_min, width, height]
            x_coordinates = [p[0] for p in points]
            y_coordinates = [p[1] for p in points]
            x_min = min(x_coordinates)
            y_min = min(y_coordinates)
            bbox_width = max(x_coordinates) - x_min
            bbox_height = max(y_coordinates) - y_min
            bbox = [x_min, y_min, bbox_width, bbox_height]

            # Polygon area via the shoelace formula (COCO "area" is the mask area, not the box area)
            area = 0.5 * abs(sum(
                x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
            ))

            # Add annotation entry
            coco_output["annotations"].append({
                "id": annotation_id,
                "image_id": image_id,
                "category_id": category_id,
                "segmentation": [segmentation],
                "area": area,
                "bbox": bbox,
                "iscrowd": 0
            })
            annotation_id += 1

        image_id += 1

# Save COCO annotations to file
with open(output_coco_json, "w") as f:
    json.dump(coco_output, f, indent=4)

print(f"COCO format JSON saved to {output_coco_json}")

output
COCO format JSON saved to ./dataset_coco_format.json


# 8. Write a script to download pretrained weights and configure paths for training in Detectron2.
(Include your Python code and output in the code box below.)
# Script to download pretrained weights and configure paths for Detectron2 training

import os
from detectron2.config import get_cfg
from detectron2 import model_zoo

# Step 1: Define output directory for pretrained weights and training outputs
output_dir = "./output"
os.makedirs(output_dir, exist_ok=True)

# Step 2: Choose a pretrained model from Detectron2's model zoo
# Example: Mask R-CNN with ResNet50 backbone and FPN on COCO dataset
pretrained_model_config = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"

# Step 3: Setup configuration
cfg = get_cfg()

# Load model config from model zoo
cfg.merge_from_file(model_zoo.get_config_file(pretrained_model_config))

# Set pretrained weights URL (will be automatically downloaded on first use)
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(pretrained_model_config)

# Set output directory for training logs, checkpoints
cfg.OUTPUT_DIR = output_dir

# Specify training dataset (should be registered beforehand)
cfg.DATASETS.TRAIN = ("your_train_dataset_name",)  # Replace with your dataset name
cfg.DATASETS.TEST = ("your_val_dataset_name",)     # Optional validation dataset

# Training hyperparameters (example values; adjust as needed)
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 5000
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # Replace with number of classes in your dataset

# Save config for reference (cfg.dump() returns the config as a YAML string)
config_path = os.path.join(output_dir, "train_config.yaml")
with open(config_path, "w") as f:
    f.write(cfg.dump())

print(f"Pretrained weights to be used from: {cfg.MODEL.WEIGHTS}")
print(f"Output directory: {cfg.OUTPUT_DIR}")
print(f"Config file saved at: {config_path}")
print(f"Training dataset: {cfg.DATASETS.TRAIN}")
print(f"Validation dataset: {cfg.DATASETS.TEST}")
print(f"Number of classes: {cfg.MODEL.ROI_HEADS.NUM_CLASSES}")
print(f"Max iterations: {cfg.SOLVER.MAX_ITER}")

output:
Pretrained weights to be used from: https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
Output directory: ./output
Config file saved at: ./output/train_config.yaml
Training dataset: ('your_train_dataset_name',)
Validation dataset: ('your_val_dataset_name',)
Number of classes: 1
Max iterations: 5000


# 9. Show the steps and code to run inference using a trained Detectron2 model on a new image.
(Include your Python code and output in the code box below.)
# Steps and code to run inference on a new image using a trained Detectron2 model

import cv2
import matplotlib.pyplot as plt
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
from detectron2 import model_zoo

# Step 1: Set up configuration and load the trained model weights
cfg = get_cfg()
# Use the same config file used for training or a suitable base config
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))

# Replace this path with your trained model's weights path
cfg.MODEL.WEIGHTS = "./output/model_final.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # Set threshold for detection confidence
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # Set to your number of classes
cfg.MODEL.DEVICE = "cuda"  # Use "cpu" if GPU not available

# Step 2: Create predictor
predictor = DefaultPredictor(cfg)

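# Step 3: Load the image for inference
image_path = "test_image.jpg"  # Path to image you want to predict
image = cv2.imread(image_path)  # OpenCV loads BGR, which DefaultPredictor expects by default
if image is None:
    raise FileNotFoundError(f"Could not read image: {image_path}")
# Convert BGR (OpenCV format) to RGB for visualization
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)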

# Step 4: Run inference
outputs = predictor(image)

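# Step 5: Visualize predictions
# Use the metadata of your registered custom dataset so class names match the model.
# (After merging the base config, cfg.DATASETS.TRAIN still points at the COCO dataset.)
metadata = MetadataCatalog.get("your_train_dataset_name")  # Replace with your dataset name
v = Visualizer(image_rgb, metadata=metadata, scale=1.0)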
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))

# Step 6: Display the image with predictions
plt.figure(figsize=(12,8))
plt.imshow(out.get_image())
plt.axis('off')
plt.show()

output
An image display with bounding boxes, masks, and labels of detected objects drawn on the input image, shown inline via matplotlib

# 10. You are assigned to build a wildlife monitoring system to detect and track different animal species in a forest using Detectron2. Describe the end-to-end pipeline from data collection to deploying the model, and how you would handle challenges like occlusion or nighttime detection.
(Include your Python code and output in the code box below.)
Wildlife Monitoring System with Detectron2: End-to-End Pipeline
1. Data Collection
Collect images and videos via camera traps, drones, or forest sensors

Capture diverse conditions: day/night, different angles, occlusion, weather

Ensure images cover all target animal species

2. Data Annotation
Label images with bounding boxes or segmentation masks for each species

Use annotation tools like LabelMe or CVAT

Convert annotations to COCO format compatible with Detectron2

3. Dataset Preparation
Split data into training, validation, and test sets

Register datasets in Detectron2
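
A minimal sketch of the splitting step, assuming the annotations are already in a COCO-style dictionary. `split_coco_by_image` is a hypothetical helper; it splits by image so that all annotations of one image stay in the same subset. For camera-trap data it may be safer to split by camera location or capture sequence instead, since near-duplicate frames in both subsets can inflate validation scores.

```python
import random


def split_coco_by_image(coco_dict, val_fraction=0.2, seed=0):
    """Split a COCO-style dict into (train, val) dicts, image by image."""
    image_ids = sorted(img["id"] for img in coco_dict["images"])
    random.Random(seed).shuffle(image_ids)  # seeded for a reproducible split
    num_val = max(1, round(len(image_ids) * val_fraction))
    val_ids = set(image_ids[:num_val])

    def subset(keep_val):
        return {
            "images": [im for im in coco_dict["images"]
                       if (im["id"] in val_ids) == keep_val],
            "annotations": [a for a in coco_dict["annotations"]
                            if (a["image_id"] in val_ids) == keep_val],
            "categories": coco_dict["categories"],
        }

    return subset(False), subset(True)


# Toy dataset: 10 images with one annotation each (illustrative values).
coco = {
    "images": [{"id": i, "file_name": f"img_{i}.jpg"} for i in range(10)],
    "annotations": [{"id": i, "image_id": i, "category_id": 1} for i in range(10)],
    "categories": [{"id": 1, "name": "deer"}],
}
train, val = split_coco_by_image(coco, val_fraction=0.2, seed=0)
print(len(train["images"]), len(val["images"]))  # 8 2
```

Each resulting dictionary can be written to its own JSON file and registered as a separate dataset; a held-out test set can be carved off the same way.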

4. Model Selection and Training
Choose a suitable Detectron2 model (e.g., Faster R-CNN or Mask R-CNN)

Initialize model with pretrained weights on COCO

Fine-tune on wildlife dataset

Augment data to simulate occlusion, lighting variations

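Use techniques such as random cropping and brightness jitter, which Detectron2's transforms module provides; mosaic augmentation, if wanted, needs a custom transform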

5. Addressing Challenges
Occlusion: Train with partially occluded animals, use segmentation models to detect partial shapes

Nighttime Detection: Augment training with infrared/low-light images; consider adding thermal images if available

Incorporate image enhancement pipelines before detection if needed
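
One simple form of image enhancement for dark frames is gamma correction. The sketch below is a pure-Python illustration on a flat list of 8-bit grayscale values. `gamma_lookup_table` and `brighten` are hypothetical helpers, and in a real pipeline the same lookup table would more likely be applied with an array library such as NumPy or OpenCV.

```python
def gamma_lookup_table(gamma):
    """256-entry table mapping 8-bit values via out = 255 * (in / 255) ** (1 / gamma)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(255 * (value / 255) ** (1.0 / gamma)) for value in range(256)]


def brighten(pixels, gamma=2.0):
    """Apply gamma correction to a flat sequence of 8-bit grayscale pixel values."""
    table = gamma_lookup_table(gamma)
    return [table[p] for p in pixels]


dark_patch = [0, 16, 64, 255]
# gamma > 1 lifts dark values strongly while leaving black and white fixed.
print(brighten(dark_patch, gamma=2.0))  # [0, 64, 128, 255]
```

Whatever enhancement is used at inference time should also be applied to the training images, so the model sees the same input distribution in both settings.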

6. Model Evaluation
Use mAP, precision-recall, and confusion matrices

Test specifically on occluded and nighttime subsets

7. Model Deployment
Export the trained model (the saved checkpoint, or a TorchScript/ONNX export for deployment outside Python)

Deploy on edge devices or cloud servers connected to cameras

Use Detectron2's inference APIs for real-time detection

Implement tracking modules (e.g., SORT/Deep SORT) to maintain identities over frames
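
To show the idea behind tracking-by-detection, here is a deliberately simplified tracker that links detections across frames by greedy IoU matching. It is not SORT or Deep SORT: there is no Kalman filter, no Hungarian assignment, no appearance model, and a track is dropped as soon as it goes unmatched for one frame. `GreedyIouTracker` and the box coordinates are illustrative assumptions.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return 0.0
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


class GreedyIouTracker:
    """Assigns persistent ids to boxes by matching each frame to the previous one."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}  # track id -> box from the previous frame
        self.next_id = 0

    def update(self, boxes):
        """Return one track id per box in `boxes`, in input order."""
        # All (overlap, track id, box index) pairs, best overlap first.
        candidates = sorted(
            ((box_iou(prev, box), tid, j)
             for tid, prev in self.tracks.items()
             for j, box in enumerate(boxes)),
            reverse=True,
        )
        assigned, used_tracks = {}, set()
        for overlap, tid, j in candidates:
            if overlap < self.iou_threshold:
                break  # remaining pairs overlap even less
            if tid in used_tracks or j in assigned:
                continue
            assigned[j] = tid
            used_tracks.add(tid)
        # Unmatched boxes start new tracks.
        for j in range(len(boxes)):
            if j not in assigned:
                assigned[j] = self.next_id
                self.next_id += 1
        self.tracks = {assigned[j]: boxes[j] for j in range(len(boxes))}
        return [assigned[j] for j in range(len(boxes))]


tracker = GreedyIouTracker(iou_threshold=0.3)
print(tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)]))   # [0, 1]
print(tracker.update([(52, 50, 62, 60), (1, 0, 11, 10)]))   # [1, 0]
print(tracker.update([(200, 200, 210, 210)]))               # [2]
```

In the second frame the detections arrive in a different order but keep their ids because each overlaps strongly with its previous position. The third frame's box matches nothing, so it opens a new track. For animals that disappear behind vegetation, a real tracker would keep lost tracks alive for several frames, which this sketch does not do.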

8. Monitoring and Maintenance
Continuously collect new data

Periodically retrain to improve accuracy and handle new conditions
# Example: custom dataset mapper that applies the augmentations described above
import copy

import torch
from detectron2.data import transforms as T
from detectron2.data import detection_utils as utils

def custom_mapper(dataset_dict):
    dataset_dict = copy.deepcopy(dataset_dict)  # do not modify the cached dataset dict
    image = utils.read_image(dataset_dict["file_name"], format="BGR")

    # A plain list: T.apply_augmentations wraps it in an AugmentationList itself
    augs = [
        T.RandomCrop("relative", (0.8, 0.8)),  # crops to simulate occlusion
        T.RandomBrightness(0.8, 1.2),            # lighting variations
        T.RandomFlip(horizontal=True),
    ]
    image, transforms = T.apply_augmentations(augs, image)
    # apply same transforms to annotations
    annos = [utils.transform_instance_annotations(obj, transforms, image.shape[:2])
             for obj in dataset_dict.pop("annotations")]
    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))
    dataset_dict["instances"] = utils.annotations_to_instances(annos, image.shape[:2])
    return dataset_dict

