1. What is Detectron2 and how does it differ from previous object detection
frameworks?

ans :

## What is Detectron2?

Detectron2 is an open-source computer vision framework developed by Facebook AI Research (FAIR). It provides state-of-the-art implementations of modern vision models for various tasks, including:

- **Object Detection**: Faster R-CNN, RetinaNet  
- **Instance Segmentation**: Mask R-CNN  
- **Semantic and Panoptic Segmentation**: DeepLab, Panoptic FPN  
- **Keypoint Detection**: Keypoint R-CNN (human pose estimation)  

Due to its clean design, strong performance, and flexibility, Detectron2 is widely used in research, industry applications, and academic projects.

---

## How Detectron2 Differs from Previous Object Detection Frameworks

### 1. Built on PyTorch (Dynamic Graphs)
Detectron2 is built using PyTorch, unlike Detectron (v1), which was based on Caffe2. PyTorch’s dynamic computation graphs make debugging easier, allow flexible model customization, and enable faster research experimentation.

### 2. Highly Modular and Extensible Design
Detectron2 follows a modular design where components such as backbones, heads, datasets, loss functions, and training pipelines are independent. This allows researchers to easily swap architectures, add new datasets, and implement custom losses or heads, which was difficult in earlier frameworks.

### 3. Cleaner and More Readable Codebase
The framework provides well-structured APIs, better documentation, and consistent coding practices. Older frameworks such as Darknet and early TensorFlow pipelines were harder to maintain and extend.

### 4. State-of-the-Art Performance
Detectron2 includes optimized implementations of modern architectures and supports mixed precision training, multi-GPU training, and distributed learning. This results in faster training and inference compared to many earlier frameworks.

### 5. Unified Support for Multiple Vision Tasks
Detectron2 supports object detection, instance segmentation, semantic segmentation, and panoptic segmentation within a single unified framework, whereas earlier frameworks often focused on only one task.

### 6. Better Dataset Handling and Evaluation
The framework provides built-in support for popular datasets such as COCO, Pascal VOC, and LVIS. It also offers easy dataset registration and standardized evaluation metrics.


---

---

2.  Explain the process and importance of data annotation when working with
Detectron2.

ans :    

## Process and Importance of Data Annotation in Detectron2

### Process of Data Annotation in Detectron2

Data annotation is the process of labeling images with ground-truth information required to train detection and segmentation models. When working with Detectron2, the annotation process generally involves the following steps:

**1. Data Collection**  
Raw images relevant to the target task such as object detection, instance segmentation, or keypoint detection are collected.

**2. Annotation**  
Annotation tools such as LabelImg (bounding boxes), LabelMe (polygons), or CVAT (boxes, polygons, and keypoints) are used to label objects with bounding boxes, segmentation masks, or keypoints.

**3. Dataset Formatting**  
The annotations are converted into formats supported by Detectron2, most commonly the COCO JSON format, which contains image metadata, category labels, bounding box coordinates, and segmentation masks.

**4. Dataset Registration**  
The annotated dataset is registered in Detectron2 using dataset registration APIs, allowing the framework to access the images and annotations during training and evaluation.

**5. Validation and Testing**  
The dataset is split into training, validation, and testing sets to ensure accurate performance evaluation.
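
As a rough illustration of step 3, the sketch below builds a minimal COCO-style annotation dictionary for a single image and serializes it to JSON. The file name, image size, and the `cat` category are made-up placeholders, and `make_coco_annotation` is a hypothetical helper; the key names follow the COCO instances layout (`images`, `annotations`, `categories`), with boxes stored as `[x, y, width, height]`.

```python
import json

def make_coco_annotation(ann_id, image_id, category_id, box_xywh, polygon=None):
    """Build one COCO-style annotation; box_xywh is [x, y, width, height] in pixels."""
    x, y, w, h = box_xywh
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, w, h],
        "area": w * h,  # box area; use the mask area instead when masks exist
        "segmentation": [polygon] if polygon else [],
        "iscrowd": 0,
    }

# Hypothetical example values, not taken from a real dataset.
coco = {
    "images": [{"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "cat"}],
    "annotations": [make_coco_annotation(1, 1, 1, [100, 120, 200, 150])],
}

coco_json = json.dumps(coco, indent=2)
print(coco_json)
```

A file with this structure is what the dataset registration step would point to.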

---

### Importance of Data Annotation in Detectron2

Data annotation plays a critical role in the performance of Detectron2 models:

- Accurate annotations provide reliable ground truth, enabling the model to learn correct object features.
- High-quality labels improve detection accuracy, segmentation quality, and model generalization.
- Poor or inconsistent annotations can lead to incorrect predictions and degraded model performance.
- Proper annotation is essential for fair evaluation using standard metrics such as mean Average Precision (mAP) and Intersection over Union (IoU).
- Task-specific annotations (bounding boxes, masks, or keypoints) allow Detectron2 to support multiple computer vision tasks effectively.
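
One way to catch inconsistent labels early is an automated sanity check before training. The sketch below assumes a COCO-style dictionary and reports boxes that fall outside the image, have non-positive size, or reference an unknown image or category. The function name `find_annotation_problems` and the sample data are illustrative, not part of Detectron2.

```python
def find_annotation_problems(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    images = {img["id"]: img for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    problems = []
    for ann in coco["annotations"]:
        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
            continue
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: non-positive box size")
        elif x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append(f"annotation {ann['id']}: box extends outside the image")
    return problems

# Hypothetical sample with one valid and two faulty annotations.
sample = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 100, "height": 80}],
    "categories": [{"id": 1, "name": "deer"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 10, 30, 20]},  # valid
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [90, 10, 30, 20]},  # too wide
        {"id": 3, "image_id": 1, "category_id": 7, "bbox": [5, 5, 10, 10]},    # bad category
    ],
}
problems = find_annotation_problems(sample)
for problem in problems:
    print(problem)
```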


---
---

3.  Describe the steps involved in training a custom object detection model
using Detectron2.

ans :    
## Steps Involved in Training a Custom Object Detection Model Using Detectron2

Training a custom object detection model using Detectron2 involves a sequence of well-defined steps, from data preparation to model evaluation.

### 1. Dataset Preparation
Collect images relevant to the detection task and annotate them using tools such as LabelImg or CVAT. The annotations are converted into a format supported by Detectron2, most commonly the COCO JSON format, which contains image details, category labels, and bounding box information.

### 2. Dataset Registration
Register the custom dataset in Detectron2 using dataset registration APIs. This step allows Detectron2 to load the images and annotations during training and evaluation.

### 3. Environment Setup
Install Detectron2 and its dependencies in the working environment (such as Google Colab). Import the required libraries and verify GPU availability for faster training.

### 4. Model Configuration
Select a pre-trained object detection model from the Detectron2 Model Zoo. Modify the configuration file to match the custom dataset, including the number of classes, dataset names, batch size, learning rate, and number of training iterations.
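
The numeric settings in this step usually have to be derived from the dataset size. The helper below is a small sketch of two common calculations: converting a target number of epochs into iterations, and scaling the learning rate linearly with batch size. The reference values of 16 images per batch and a learning rate of 0.02 match the defaults in Detectron2's base R-CNN FPN configs, but the linear scaling rule itself is a heuristic, so treat the result as a starting point. The function names and dataset size are hypothetical.

```python
import math

def iterations_for_epochs(num_images, ims_per_batch, epochs):
    """Number of solver iterations needed to see every image `epochs` times."""
    return math.ceil(num_images * epochs / ims_per_batch)

def scaled_learning_rate(ims_per_batch, ref_batch=16, ref_lr=0.02):
    """Linear scaling rule: change the LR in proportion to the batch size."""
    return ref_lr * ims_per_batch / ref_batch

# Hypothetical dataset: 500 training images, 2 images per batch, 20 epochs.
max_iter = iterations_for_epochs(num_images=500, ims_per_batch=2, epochs=20)
base_lr = scaled_learning_rate(ims_per_batch=2)
print(max_iter, base_lr)
```

The two results would be candidates for `cfg.SOLVER.MAX_ITER` and `cfg.SOLVER.BASE_LR`; small fine-tuning datasets often work better with a lower learning rate than the scaled value.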

### 5. Training the Model
Initialize the trainer using the configured settings and start the training process. Detectron2 fine-tunes the pre-trained model on the custom dataset, updating model weights to learn task-specific features.

### 6. Model Evaluation
Evaluate the trained model using validation or test datasets. Detectron2 provides built-in evaluators that compute standard metrics such as mean Average Precision (mAP).

### 7. Inference and Visualization
Use the trained model to perform inference on new images. Visualize the predicted bounding boxes and confidence scores to assess model performance qualitatively.

---

### Conclusion
By following these steps, Detectron2 enables efficient training of custom object detection models with minimal code while maintaining high accuracy and scalability.




---



---



4. What are evaluation curves in Detectron2, and how are metrics like mAP
and IoU interpreted?

ans :    

## Evaluation Curves and Metrics in Detectron2

### Evaluation Curves in Detectron2
Evaluation curves in Detectron2 are graphical representations used to analyze the performance of object detection and segmentation models. These curves help in understanding how well a model predicts objects across different confidence thresholds.

Common evaluation curves include **Precision-Recall (PR) curves**, which show the relationship between precision and recall as the confidence threshold varies. Precision is the fraction of predicted objects that are correct, while recall is the fraction of ground-truth objects that are detected. A well-performing model produces a PR curve that stays close to the top-right corner, indicating high precision and high recall.
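
To make the construction concrete, the sketch below computes the points of a PR curve from a list of detections for one class. Each detection is a `(confidence, is_true_positive)` pair, where the true-positive flag is assumed to have been decided beforehand by IoU matching against the ground truth. The function name and the sample data are illustrative.

```python
def precision_recall_points(detections, num_ground_truth):
    """Return (precisions, recalls) obtained by sweeping the confidence threshold.

    detections: iterable of (confidence, is_true_positive) pairs for one class.
    num_ground_truth: number of ground-truth objects of that class.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    true_pos = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        true_pos += int(is_tp)
        precisions.append(true_pos / rank)           # correct / predicted so far
        recalls.append(true_pos / num_ground_truth)  # correct / all real objects
    return precisions, recalls

# Hypothetical detections: 5 predictions, 4 ground-truth objects.
dets = [(0.95, True), (0.90, True), (0.80, False), (0.60, True), (0.30, False)]
precisions, recalls = precision_recall_points(dets, num_ground_truth=4)
for p, r in zip(precisions, recalls):
    print(f"precision={p:.2f} recall={r:.2f}")
```

Lowering the threshold admits more detections, so recall can only grow while precision typically falls, which gives the curve its characteristic shape.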

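Detectron2 also logs training losses over iterations (by default to `metrics.json` and TensorBoard event files in the output directory), which can be visualized as learning curves to monitor convergence. Validation loss is not logged by default and needs a custom hook; periodic evaluation on a held-out set can be scheduled with `cfg.TEST.EVAL_PERIOD` when an evaluator is configured, which helps detect overfitting or underfitting.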

---

### Interpretation of IoU (Intersection over Union)
Intersection over Union (IoU) is a metric used to measure the overlap between a predicted bounding box and the ground-truth bounding box. It is calculated as the area of overlap divided by the area of union of the two boxes.

- IoU values range from 0 to 1.
- A higher IoU indicates better localization accuracy.
- Predictions are considered correct only if the IoU exceeds a predefined threshold (e.g., 0.5 or 0.75).
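
The definition above translates directly into a few lines of code. This is a minimal sketch assuming boxes in `(x1, y1, x2, y2)` pixel format; the function name `box_iou` and the two example boxes are illustrative.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0

prediction = (50, 50, 150, 150)
ground_truth = (100, 100, 200, 200)
iou = box_iou(prediction, ground_truth)
print(f"IoU = {iou:.3f}")           # IoU = 0.143
print("match at 0.5:", iou >= 0.5)  # match at 0.5: False
```

Here the overlap is 2,500 square pixels and the union is 17,500, so the prediction would not count as correct at a 0.5 threshold. For batches of boxes, Detectron2 provides `pairwise_iou` in `detectron2.structures`.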

---

### Interpretation of mAP (mean Average Precision)
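Mean Average Precision (mAP) is a standard metric used to evaluate object detection performance in Detectron2. It is computed by averaging the **Average Precision (AP)** across all object classes. In COCO-style evaluation, which Detectron2’s `COCOEvaluator` reports, AP is additionally averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05, while AP50 and AP75 each use a single threshold.

- AP is calculated as the area under the Precision-Recall curve for a class.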
- mAP summarizes detection accuracy, localization quality, and classification performance.
- Higher mAP values indicate better overall model performance.
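
The following sketch shows one way to compute AP and mAP from precision and recall values. It uses all-point interpolation at a single IoU threshold, which is a simplification: COCO-style evaluation instead samples precision at 101 recall levels and also averages over IoU thresholds. The function names, class names, and numbers are illustrative.

```python
def average_precision(precisions, recalls):
    """Area under the PR curve using all-point interpolation.

    Inputs are ordered by decreasing confidence, so recall is non-decreasing.
    """
    # Pad so the curve spans recall 0 to 1 and drops to precision 0 at the end.
    rec = [0.0] + list(recalls) + [1.0]
    prec = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(prec) - 2, -1, -1):
        prec[i] = max(prec[i], prec[i + 1])
    # Sum rectangle areas over the segments where recall increases.
    return sum((rec[i] - rec[i - 1]) * prec[i] for i in range(1, len(rec)))

def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Hypothetical PR points for two classes.
ap_per_class = {
    "deer": average_precision([1.0, 1.0, 2 / 3, 0.75, 0.6], [0.25, 0.5, 0.5, 0.75, 0.75]),
    "fox": average_precision([1.0, 0.5], [0.5, 0.5]),
}
map_value = mean_average_precision(ap_per_class)
print(ap_per_class, map_value)
```

In this example the `deer` class never reaches full recall, so the area beyond recall 0.75 contributes nothing to its AP.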

---

### Summary
Evaluation curves and metrics such as IoU and mAP are essential for analyzing the accuracy, robustness, and generalization ability of Detectron2 models. They provide both quantitative and visual insights into detection and segmentation performance.




---



---

5. Compare Detectron2 and TFOD2 in terms of features, performance, and
ease of use.

ans  :    

## Comparison of Detectron2 and TensorFlow Object Detection API (TFOD2)

Detectron2 and TensorFlow Object Detection API (TFOD2) are two widely used frameworks for building object detection and segmentation models. They differ in terms of features, performance, and ease of use.

---

### 1. Features

**Detectron2**
- Built on **PyTorch** with dynamic computation graphs.
- Supports multiple vision tasks including **object detection, instance segmentation, semantic segmentation, panoptic segmentation, and keypoint detection**.
- Provides state-of-the-art models such as **Faster R-CNN, Mask R-CNN, RetinaNet, and Panoptic FPN**.
- Highly modular architecture allowing easy customization of backbones, heads, and losses.

**TFOD2**
- Built on **TensorFlow 2** and Keras.
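- Primarily focused on **object detection** tasks, with more limited support for instance segmentation (Mask R-CNN) and keypoint detection (CenterNet).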
- Supports popular models such as **SSD, Faster R-CNN, EfficientDet, and CenterNet**.
- Integrates well with TensorFlow ecosystem tools like TensorBoard and TF Serving.

---

### 2. Performance

**Detectron2**
- Optimized for high performance and research-grade experiments.
- Provides faster training and inference for complex models, especially segmentation tasks.
- Strong benchmark results on datasets like COCO due to optimized implementations.

**TFOD2**
- Provides stable and scalable performance, especially for production deployment.
- Efficient models such as SSD and EfficientDet perform well on resource-constrained devices.
- Performance is highly dependent on TensorFlow graph optimizations and hardware accelerators.

---

### 3. Ease of Use

**Detectron2**
- Easier to debug and experiment due to PyTorch’s dynamic graph support.
- Clean and well-documented codebase, but requires familiarity with PyTorch.
- Dataset registration and customization are straightforward for research use.

**TFOD2**
- More complex initial setup and configuration process.
- Requires working with configuration files and TensorFlow pipelines.
- Better suited for large-scale production environments once configured.

---

### Summary Table

| Aspect | Detectron2 | TFOD2 |
|------|-----------|-------|
| Framework | PyTorch | TensorFlow 2 |
| Supported Tasks | Detection, Segmentation, Keypoints | Mainly Detection |
| Customization | High | Moderate |
| Performance | High (research-focused) | Stable (production-focused) |
| Ease of Use | Easier for research | Steeper learning curve |
| Production Deployment | Moderate | Strong |

---

### Conclusion
Detectron2 is preferred for research and experimentation due to its flexibility, modularity, and strong performance in advanced vision tasks. TFOD2 is more suitable for production environments that require tight integration with the TensorFlow ecosystem and scalable deployment solutions.





---



---

6. Write Python code to install Detectron2 and verify the installation.



In [None]:
!pip install -U pip setuptools wheel
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
!pip install opencv-python pycocotools


In [None]:
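# The official prebuilt wheels only cover older PyTorch releases, so build from source
# (requires a C++ compiler, which Colab provides).
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'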


In [None]:
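import torch
import detectron2
print("detectron2:", detectron2.__version__)
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

from detectron2.utils.logger import setup_logger
setup_logger()

from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()           # creating a default config exercises the core package
cfg.MODEL.DEVICE = "cpu"  # this environment installs the CPU-only PyTorch build
print("Detectron2 imported and configured (CPU mode)")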


7. Annotate a dataset using any tool of your choice and convert the
annotations to COCO format for Detectron2.


In [None]:
# Install detectron2
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

# PyYAML (>= 5.1) is pulled in as a Detectron2 dependency, so no separate pin is needed
import torch, detectron2
!nvcc --version
TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)

In [None]:
import os
import json
import cv2
import random
from google.colab.patches import cv2_imshow

from detectron2.data.datasets import register_coco_instances
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.utils.visualizer import Visualizer
from detectron2.config import get_cfg
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor

# --- STEP 1: Download Sample Data (Skip if using your own) ---
!wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip
!unzip -q balloon_dataset.zip

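# --- STEP 2: Convert the annotations to COCO JSON and register the dataset ---
# The balloon dataset ships VIA-format JSON, which register_coco_instances cannot read.
# If your tool (e.g. CVAT or Roboflow) already exports COCO JSON, skip the conversion.
def via_to_coco(img_dir, out_json, category_name="balloon"):
    """Convert a VIA polygon export (via_region_data.json) into a COCO instances file."""
    with open(os.path.join(img_dir, "via_region_data.json")) as f:
        via = json.load(f)
    images, annotations = [], []
    for img_id, v in enumerate(via.values(), start=1):
        height, width = cv2.imread(os.path.join(img_dir, v["filename"])).shape[:2]
        images.append({"id": img_id, "file_name": v["filename"], "height": height, "width": width})
        regions = v["regions"]
        # "regions" is a dict in older VIA exports and a list in newer ones
        regions = regions.values() if isinstance(regions, dict) else regions
        for region in regions:
            px = region["shape_attributes"]["all_points_x"]
            py = region["shape_attributes"]["all_points_y"]
            x0, y0, x1, y1 = min(px), min(py), max(px), max(py)
            annotations.append({
                "id": len(annotations) + 1, "image_id": img_id, "category_id": 1,
                "bbox": [x0, y0, x1 - x0, y1 - y0],  # COCO boxes are [x, y, width, height]
                "area": (x1 - x0) * (y1 - y0),       # box area used as an approximation
                "segmentation": [[c for xy in zip(px, py) for c in xy]],
                "iscrowd": 0,
            })
    coco = {"images": images, "annotations": annotations,
            "categories": [{"id": 1, "name": category_name}]}
    with open(out_json, "w") as f:
        json.dump(coco, f)

# Format: register_coco_instances(name, metadata, json_file, image_root)
for split in ["train", "val"]:
    via_to_coco(f"balloon/{split}", f"balloon/{split}/coco.json")
    register_coco_instances(f"my_dataset_{split}", {}, f"balloon/{split}/coco.json", f"balloon/{split}")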

# --- STEP 3: Verify Registration ---
dataset_dicts = DatasetCatalog.get("my_dataset_train")
balloon_metadata = MetadataCatalog.get("my_dataset_train")

# Visualize 3 random samples to ensure annotations are correct
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=balloon_metadata, scale=0.5)
    out = visualizer.draw_dataset_dict(d)
    cv2_imshow(out.get_image()[:, :, ::-1])

In [None]:
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
# Load a model config (Instance Segmentation in this case)
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("my_dataset_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 300    # 300 iterations is enough for a tiny dataset; adjust for yours
cfg.SOLVER.STEPS = []        # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # ONLY 1 class (balloon). Change this to match your dataset!

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

In [None]:
# Inference should use the config with newly trained weights
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom testing threshold
predictor = DefaultPredictor(cfg)

from detectron2.utils.visualizer import ColorMode

dataset_dicts = DatasetCatalog.get("my_dataset_val")
for d in random.sample(dataset_dicts, 3):
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)  # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
    v = Visualizer(im[:, :, ::-1],
                   metadata=balloon_metadata,
                   scale=0.5,
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels. Useful to confirm segmentation
    )
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2_imshow(out.get_image()[:, :, ::-1])

8. Write a script to download pretrained weights and configure paths for
training in Detectron2.


In [None]:
# setup_detectron2_training.py

import os
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer
from detectron2 import model_zoo
from detectron2.data.datasets import register_coco_instances

def main():
    # ---------- 1. Configure dataset paths & register ----------
    DATASET_ROOT = "/path/to/your/dataset"          # <- change this
    TRAIN_JSON   = os.path.join(DATASET_ROOT, "annotations/train.json")
    VAL_JSON     = os.path.join(DATASET_ROOT, "annotations/val.json")
    TRAIN_IMG    = os.path.join(DATASET_ROOT, "train")
    VAL_IMG      = os.path.join(DATASET_ROOT, "val")

    register_coco_instances("my_dataset_train", {}, TRAIN_JSON, TRAIN_IMG)
    register_coco_instances("my_dataset_val",   {}, VAL_JSON,   VAL_IMG)

    # ---------- 2. Build config and download pretrained weights ----------
    cfg = get_cfg()

    # Base config from Detectron2 model zoo
    config_file = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
    cfg.merge_from_file(model_zoo.get_config_file(config_file))

    # This URL points to pretrained COCO weights and will be auto-downloaded
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_file)

    # ---------- 3. Set training-related paths and options ----------
    cfg.DATASETS.TRAIN = ("my_dataset_train",)
    cfg.DATASETS.TEST  = ("my_dataset_val",)

    cfg.DATALOADER.NUM_WORKERS = 4
    cfg.SOLVER.IMS_PER_BATCH   = 2
    cfg.SOLVER.BASE_LR         = 0.00025
    cfg.SOLVER.MAX_ITER        = 10000

    # Number of classes in your dataset (exclude background)
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1

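    # Output directory for logs and checkpoints
    cfg.OUTPUT_DIR = "./output/mask_rcnn_training"
    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

    # DefaultTrainer does not write the config itself, so save it next to the
    # checkpoints; it can then be reloaded later for inference.
    with open(os.path.join(cfg.OUTPUT_DIR, "config.yaml"), "w") as f:
        f.write(cfg.dump())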

    # ---------- 4. Start training ----------
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()

if __name__ == "__main__":
    main()

9. Show the steps and code to run inference using a trained Detectron2
model on a new image.

In [None]:
import os
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer, ColorMode
from detectron2.data import MetadataCatalog


def setup_cfg(trained_output_dir, score_thresh=0.5, device="cuda"):
    """
    Load config and trained weights from an output directory.
    """
    cfg = get_cfg()

    # Load the config you used for training
    config_path = os.path.join(trained_output_dir, "config.yaml")
    cfg.merge_from_file(config_path)

    # Use the trained weights
    cfg.MODEL.WEIGHTS = os.path.join(trained_output_dir, "model_final.pth")

    # Set score threshold for this model
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = score_thresh

    # Choose device: "cuda" or "cpu"
    cfg.MODEL.DEVICE = device

    return cfg


def run_inference(
    trained_output_dir,
    input_image_path,
    output_image_path=None,
    metadata_dataset_name="my_dataset_train",
):
    # 1. Build config and predictor
    cfg = setup_cfg(trained_output_dir)
    predictor = DefaultPredictor(cfg)

    # 2. Read image
    image = cv2.imread(input_image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {input_image_path}")

    # 3. Run prediction
    outputs = predictor(image)
    # outputs["instances"] has fields: pred_boxes, scores, pred_classes, (and masks for instance seg)

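    # 4. Visualization
    # Class names are drawn only if this dataset was registered (or its
    # thing_classes set) in the current process; otherwise class indices are shown.
    metadata = MetadataCatalog.get(metadata_dataset_name)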
    v = Visualizer(
        image[:, :, ::-1],  # convert BGR to RGB for Vis
        metadata=metadata,
        scale=1.0,
        instance_mode=ColorMode.IMAGE_BW,  # background grayscale
    )
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))

    result_image = out.get_image()[:, :, ::-1]  # back to BGR for cv2

    # 5. Save or display
    if output_image_path is not None:
        out_dir = os.path.dirname(output_image_path)
        if out_dir:  # dirname is empty when saving into the current directory
            os.makedirs(out_dir, exist_ok=True)
        cv2.imwrite(output_image_path, result_image)
        print(f"Saved result to {output_image_path}")
    else:
        # Show on screen (press a key to close)
        cv2.imshow("Result", result_image)
        cv2.waitKey(0)
        cv2.destroyAllWindows()


if __name__ == "__main__":
    TRAINED_OUTPUT_DIR = "./output/mask_rcnn_training"  # where config.yaml & model_final.pth are
    INPUT_IMAGE = "path/to/your/image.jpg"
    OUTPUT_IMAGE = "output/inference_result.jpg"  # or None to just display

    run_inference(
        trained_output_dir=TRAINED_OUTPUT_DIR,
        input_image_path=INPUT_IMAGE,
        output_image_path=OUTPUT_IMAGE,
        metadata_dataset_name="my_dataset_train",  # same as training dataset name
    )

10. You are assigned to build a wildlife monitoring system to detect and track
different animal species in a forest using Detectron2. Describe the end-to-end pipeline
from data collection to deploying the model, and how you would handle challenges like
occlusion or nighttime detection.


1. **Data collection**  
   - Install camera traps in the forest; record videos/images in different seasons, locations, and times (day + night, RGB + IR/thermal).

2. **Annotation & dataset**  
   - Extract frames, annotate animals with bounding boxes and species labels using a tool like CVAT.  
   - Export in COCO format and split into train/val/test.

3. **Setup in Google Colab (Detectron2)**  
   - Install Detectron2, mount Google Drive, and point to your `images/` and `annotations/` folders.  
   - Register datasets with `register_coco_instances("wildlife_train", ...)`.

4. **Model training with Detectron2**  
   - Start from a COCO-pretrained model (e.g., Faster R-CNN R50-FPN for box-only annotations, or Mask R-CNN R50-FPN if masks are also annotated).  
   - Set `cfg.MODEL.ROI_HEADS.NUM_CLASSES` to the number of species, adjust learning rate/iterations, and train with `DefaultTrainer`.  
   - Use basic augmentations (flip, scale, brightness) to improve robustness.

5. **Inference + tracking**  
   - Use `DefaultPredictor` to run the trained model on each frame and get species + bounding boxes.  
   - Feed detections into a multi‑object tracker (e.g., DeepSORT/ByteTrack) to assign track IDs and follow each animal across frames.

6. **Deployment**  
   - Edge: run the model + tracker on a Jetson or similar device near the camera and send only detection summaries.  
   - Server: stream video to a central machine that runs inference, tracking, and stores results (counts, tracks, timestamps).

7. **Handling occlusion**  
   - Include partially visible animals in training data; use augmentations like random erasing/CutOut.  
   - Use a tracker that tolerates a few missed frames so animals aren’t “lost” when briefly behind trees or other animals.

8. **Handling nighttime detection**  
   - Use IR or thermal cameras; include nighttime images in training.  
   - Optionally train separate day and night models, or one mixed model with low‑light augmentations (brightness/contrast changes, noise).  
   - Simple preprocessing (e.g., contrast enhancement) can help for very dark images.
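
The idea of a tracker that tolerates a few missed frames (step 7) can be sketched with a greedy IoU matcher. This is a deliberately simplified stand-in, not how DeepSORT or ByteTrack work internally: those add motion models, appearance features, and better assignment. The class name `GreedyIouTracker`, its parameters, and the sample frames are all hypothetical; detections are assumed to be `(box, label)` pairs with boxes in `(x1, y1, x2, y2)` format.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Keeps a track alive for up to `max_missed` frames without a matching detection."""

    def __init__(self, iou_threshold=0.3, max_missed=5):
        self.iou_threshold = iou_threshold
        self.max_missed = max_missed
        self.tracks = {}  # track_id -> {"box", "label", "missed"}
        self._next_id = 1

    def update(self, detections):
        """detections: list of (box, label). Returns a list of (track_id, box, label)."""
        unmatched = set(self.tracks)  # tracks that existed before this frame
        results = []
        for box, label in detections:
            best_id, best_iou = None, self.iou_threshold
            for track_id in unmatched:
                track = self.tracks[track_id]
                if track["label"] != label:
                    continue
                iou = box_iou(track["box"], box)
                if iou >= best_iou:
                    best_id, best_iou = track_id, iou
            if best_id is None:  # no sufficiently overlapping track: start a new one
                best_id = self._next_id
                self._next_id += 1
            else:
                unmatched.discard(best_id)
            self.tracks[best_id] = {"box": box, "label": label, "missed": 0}
            results.append((best_id, box, label))
        # Age tracks that found no detection; drop them once they exceed the limit.
        for track_id in unmatched:
            self.tracks[track_id]["missed"] += 1
            if self.tracks[track_id]["missed"] > self.max_missed:
                del self.tracks[track_id]
        return results


# Hypothetical sequence: the animal is hidden in the third frame.
frames = [
    [((10, 10, 50, 50), "deer")],
    [((12, 10, 52, 50), "deer")],
    [],
    [((16, 10, 56, 50), "deer")],
]
tracker = GreedyIouTracker(iou_threshold=0.3, max_missed=5)
history = [tracker.update(frame) for frame in frames]
for frame_index, tracked in enumerate(history):
    print(frame_index, tracked)
```

With `max_missed=5` the deer keeps the same track ID after the empty frame, whereas `max_missed=0` would assign a new ID when it reappears.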