1. What is Detectron2 and how does it differ from previous object detection
frameworks?
   - Detectron2 is a modern, open source object detection and computer vision framework developed by Facebook AI Research that makes it easier and faster to build high performance models for tasks like object detection, instance segmentation, keypoint detection, and panoptic segmentation. What sets Detectron2 apart from earlier object detection frameworks is its clean, modular design built on PyTorch, which allows researchers and developers to customize models, experiment with new ideas, and debug more easily. Unlike older frameworks that were often rigid, harder to extend, or based on static computation graphs, Detectron2 supports dynamic graphs, provides state of the art pre trained models, offers better training speed and scalability, and integrates smoothly with modern deep learning workflows, making it more flexible, user friendly, and suitable for both research and production use

2. Explain the process and importance of data annotation when working with
Detectron2.
    - Data annotation is the process of labeling raw images with meaningful information such as bounding boxes, class labels, segmentation masks, or keypoints so that Detectron2 models can learn to recognize and localize objects correctly. When working with Detectron2, annotated data is usually prepared in standard formats like COCO, where each object in an image is precisely marked and described, allowing the framework to understand what to learn and how to evaluate predictions. This process is extremely important because the quality of annotations directly affects model performance accurate, consistent labels help the model learn correct patterns, while poor or inconsistent annotations can lead to incorrect detections and weak generalization. In short, good data annotation acts as the foundation for training reliable and high performing Detectron2 models, making it just as critical as choosing the right algorithm or architecture.

3. Describe the steps involved in training a custom object detection model
using Detectron2.
   - Training a custom object detection model using Detectron2 involves a clear sequence of steps that turn raw images into a working, accurate model. First, the dataset is collected and carefully annotated with labels such as bounding boxes or masks, usually in COCO format, so Detectron2 can read it correctly. Next, the dataset is registered within Detectron2, where class names and paths to images and annotations are defined. After that, a suitable pre trained model and configuration file are selected to benefit from transfer learning, which helps the model learn faster and perform better with limited data. The configuration is then customized by setting parameters like the number of classes, learning rate, batch size, and training iterations. Once everything is set, the model is trained using Detectron2’s training engine, during which it learns object features from the annotated data. Finally, the trained model is evaluated on validation data and fine tuned if needed, ensuring it performs well before being used for real world object detection tasks.

4. What are evaluation curves in Detectron2, and how are metrics like mAP
and IoU interpreted?
   - Evaluation curves in Detectron2 are visual and numerical tools used to measure how well an object detection model is performing during and after training. These curves and metrics help compare predicted results with ground truth annotations to understand the model’s accuracy and reliability. One of the most important metrics is Intersection over Union , which measures how much the predicted bounding box overlaps with the actual object box, with higher IoU values indicating better localization accuracy. Mean Average Precision builds on IoU by combining precision and recall across different confidence thresholds and object classes, giving a single score that summarizes overall detection performance. In simple terms, higher mAP values mean the model is detecting objects more accurately and consistently, while evaluation curves allow developers to track improvements, spot overfitting, and make informed decisions about model tuning.

5. Compare Detectron2 and TFOD2 in terms of features, performance, and
ease of use.
   - Detectron2 and TensorFlow Object Detection API (TFOD2) are both powerful frameworks for object detection, but they differ in design philosophy, performance, and ease of use. Detectron2, built on PyTorch, is highly modular and flexible, making it ideal for research and rapid experimentation with advanced models like Faster RCNN, Mask RCNN, and panoptic segmentation, and it generally offers strong performance with clean, readable code and easier debugging due to PyTorch’s dynamic computation graph. TFOD2, built on TensorFlow, provides a wide range of pre trained models and strong deployment support through TensorFlow Serving, TensorFlow Lite, and TensorFlow.js, which makes it more suitable for production and mobile or edge applications. In terms of ease of use, Detectron2 is often considered simpler and more intuitive for developers familiar with PyTorch, while TFOD2 can feel more complex because of its configuration heavy setup but offers better integration with TensorFlow’s deployment ecosystem, so the choice depends on whether the focus is research flexibility or large scale production deployment.

In [None]:
!pip install -U torch torchvision torchaudio
!pip install -U 'git+https://github.com/facebookresearch/detectron2.git'


Collecting torch
  Downloading torch-2.10.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (31 kB)
Collecting torchvision
  Downloading torchvision-0.25.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (5.4 kB)
Collecting torchaudio
  Downloading torchaudio-2.10.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (6.9 kB)
Collecting cuda-bindings==12.9.4 (from torch)
  Downloading cuda_bindings-12.9.4-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (2.6 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.8.93 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cuda-runtime-cu12==12.8.90 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-cuda-cupti-cu12==12.8.90 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata 

In [None]:
#  6 Write Python code to install Detectron2 and verify the installation.
# (Include your Python code and output in the code box below.)

# !pip install detectron2

import detectron2
from detectron2.utils.logger import setup_logger

setup_logger()
print("Detectron2 version:", detectron2.__version__)
print("Detectron2 installed successfully!")


Detectron2 version: 0.6
Detectron2 installed successfully!


In [22]:
# 7. Annotate a dataset using any tool of your choice and convert the
# annotations to COCO format for Detectron2.
# (Include your Python code and output in the code box below.)


import os
import json
import xml.etree.ElementTree as ET

image_dir = "sample_data"
annotation_dir = "sample_data"

coco = {
    "images": [],
    "annotations": [],
    "categories": [{"id": 1, "name": "person"}]
}

annotation_id = 1
image_id = 1

for file in os.listdir(annotation_dir):
    if not file.endswith(".xml"):
        continue

    tree = ET.parse(os.path.join(annotation_dir, file))
    root = tree.getroot()

    filename = root.find("filename").text
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)

    coco["images"].append({
        "id": image_id,
        "file_name": filename,
        "width": width,
        "height": height
    })

    for obj in root.findall("object"):
        bbox = obj.find("bndbox")
        xmin = int(bbox.find("xmin").text)
        ymin = int(bbox.find("ymin").text)
        xmax = int(bbox.find("xmax").text)
        ymax = int(bbox.find("ymax").text)

        coco["annotations"].append({
            "id": annotation_id,
            "image_id": image_id,
            "category_id": 1,
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],
            "area": (xmax - xmin) * (ymax - ymin),
            "iscrowd": 0
        })

        annotation_id += 1

    image_id += 1

with open("/content/annotations_coco.json", "w") as f:
    json.dump(coco, f, indent=4)

print("COCO annotation file created successfully")




COCO annotation file created successfully


In [None]:
# 8 Write a script to download pretrained weights and configure paths for
# training in Detectron2.
# (Include your Python code and output in the code box below.)

from detectron2.config import get_cfg
from detectron2 import model_zoo
import os

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
)

cfg.DATASETS.TRAIN = ("custom_train",)
cfg.DATASETS.TEST = ("custom_val",)

cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
)

cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 1000

cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1

cfg.OUTPUT_DIR = "./output"
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

print("Pretrained weights path:")
print(cfg.MODEL.WEIGHTS)
print("Output directory configured at:", cfg.OUTPUT_DIR)


Pretrained weights path:
https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl
Output directory configured at: ./output


In [24]:
# 9. Show the steps and code to run inference using a trained Detectron2
# model on a new image.
# (Include your Python code and output in the code box below.)

from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2 import model_zoo
import cv2
from google.colab import files

uploaded = files.upload()
image_path = list(uploaded.keys())[0]

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
)

cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
)
cfg.MODEL.SCORE_THRESH_TEST = 0.5
cfg.MODEL.DEVICE = "cpu"

predictor = DefaultPredictor(cfg)

image = cv2.imread(image_path)
outputs = predictor(image)

print("Detected instances:", len(outputs["instances"]))
print("Predicted boxes:\n", outputs["instances"].pred_boxes)


Saving d.jpg to d (1).jpg
[01/23 13:57:12 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl ...


model_final_280758.pkl: 167MB [00:01, 111MB/s]                           
roi_heads.box_predictor.bbox_pred.{bias, weight}
roi_heads.box_predictor.cls_score.{bias, weight}
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
W0123 13:57:25.939000 670 torch/fx/_symbolic_trace.py:53] is_fx_tracing will return true for both fx.symbolic_trace and torch.export. Please use is_fx_tracing_symbolic_tracing() for specifically fx.symbolic_trace or torch.compiler.is_compiling() for specifically torch.export/compile.


Detected instances: 100
Predicted boxes:
 Boxes(tensor([[1.8256e+02, 3.8094e+01, 1.9801e+02, 8.7197e+01],
        [1.4638e+02, 7.5613e+01, 1.6041e+02, 1.1970e+02],
        [0.0000e+00, 1.0489e+02, 8.7567e+01, 1.7500e+02],
        [1.6307e+02, 3.7212e+01, 1.7951e+02, 8.4322e+01],
        [3.0131e+01, 1.0021e+02, 7.3456e+01, 1.7493e+02],
        [0.0000e+00, 4.7488e-02, 2.2956e+01, 5.9849e+01],
        [5.2355e+01, 4.0150e+01, 7.1891e+01, 9.3929e+01],
        [6.0921e+01, 3.9693e+01, 7.9495e+01, 9.5589e+01],
        [1.6729e+02, 3.3939e+01, 1.9000e+02, 7.8109e+01],
        [3.2971e+01, 1.1344e+02, 5.7306e+01, 1.6758e+02],
        [0.0000e+00, 7.8529e+01, 2.2337e+01, 1.7500e+02],
        [1.5546e+02, 6.8899e+01, 1.7329e+02, 1.0915e+02],
        [4.6342e+01, 7.1582e+01, 1.1626e+02, 8.9090e+01],
        [1.4843e+02, 4.6800e+01, 1.6796e+02, 9.7646e+01],
        [1.7631e+02, 3.8655e+01, 1.9321e+02, 8.0230e+01],
        [4.0936e+01, 6.5768e+01, 1.3047e+02, 7.8361e+01],
        [1.3427e+02, 7.8

10. You are assigned to build a wildlife monitoring system to detect and track
different animal species in a forest using Detectron2. Describe the end-to-end pipeline
from data collection to deploying the model, and how you would handle challenges like
occlusion or nighttime detection.
(Include your Python code and output in the code box below.)


    - Building a wildlife monitoring system with Detectron2 involves an end-to-end pipeline that connects real world forest data to a deployed, intelligent detection system. The process starts with data collection using camera traps, drones, or fixed surveillance cameras placed across different forest locations to capture animals under varied conditions such as daylight, nighttime, rain, and dense vegetation. These images and videos are then carefully annotated using tools like CVAT or LabelImg, labeling animal species with bounding boxes or segmentation masks and converting them into COCO format for Detectron2. Next, the dataset is registered and a suitable pre-trained model is fine tuned using Detectron2, allowing the model to learn species specific features while benefiting from transfer learning. During training, challenges like occlusion are handled by using diverse data with partial visibility, data augmentation, and instance segmentation models that better separate overlapping animals, while nighttime detection is improved by including infrared images, low light samples, and brightness/contrast augmentation. After training and evaluation using metrics like MAP and IOU, the optimized model is deployed on edge devices or servers, where it performs real time inference and tracking, logs detections, and supports conservation decisions through dashboards or alerts.

In [None]:
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2 import model_zoo
import cv2

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
)

cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3
cfg.MODEL.WEIGHTS = "./model_final.pth"
cfg.MODEL.SCORE_THRESH_TEST = 0.6
cfg.MODEL.DEVICE = "cpu"

predictor = DefaultPredictor(cfg)

image = cv2.imread("forest_night.jpg")
outputs = predictor(image)

print("Detected animals:", len(outputs["instances"]))
print("Predicted classes:", outputs["instances"].pred_classes.tolist())
