<a href="https://colab.research.google.com/github/subhashpolisetti/AutoGluon_ML_End-to-End_Implementations_Part-2/blob/main/8_Image_Object_Detection_with_AutoGluon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Image Object Detection with AutoGluon**

This notebook demonstrates how to perform **object detection** using **AutoGluon’s MultiModalPredictor**. It provides an end-to-end guide for loading data, training models, evaluating performance, generating predictions, and visualizing results.

---

## **Key Objectives**
1. Train an object detection model on the **Tiny Motorbike Dataset**.
2. Evaluate the model using standard metrics.
3. Generate object detection predictions for images.
4. Visualize detection results with confidence filtering.

---

## **Steps Covered**

### **1. Installation and Setup**
- Install the required libraries: `autogluon.multimodal`, a CUDA 11.8 build of `torch`/`torchvision`/`torchaudio`, and the OpenMMLab packages `mmcv` and `mmdet` (MMDetection), which provide the detection models.

### **2. Dataset Preparation**
- **Tiny Motorbike Dataset**:
  - Download and extract the dataset in COCO format.
  - Locate the COCO-format annotation files for the training (`trainval`) and test splits.

### **3. Model Training**
- Configure the **AutoGluon MultiModalPredictor** for object detection.
- Train the model on the training split using the `medium_quality` preset.

### **4. Model Evaluation**
- Evaluate the trained model on the test split with standard object detection metrics such as mean average precision (mAP).
- Measure the time taken for training and evaluation.

### **5. Generate Predictions**
- Predict bounding boxes for objects in the test dataset.
- Save the prediction results for further analysis.

### **6. Visualization**
- Use AutoGluon’s **ObjectDetectionVisualizer** to visualize detection results.
- Filter detections based on a confidence threshold for better clarity.

### **7. Inference on New Images**
- Perform object detection on custom input images using the trained model.
- Display predictions with bounding boxes and confidence scores.

---

## **Key Features**
- Demonstrates object detection using AutoGluon’s multimodal capabilities.
- Utilizes COCO-format datasets for compatibility with popular object detection frameworks.
- Shows how to save, reload, and reuse a trained predictor.
- Includes visualization tools for analyzing detection results.

---

## **Example Applications**
- **Surveillance Systems**: Detect specific objects in security footage.
- **Retail Automation**: Identify products or inventory in store images.
- **Self-Driving Cars**: Detect vehicles, pedestrians, and obstacles in real time.

---

This notebook is an excellent starting point for learning and implementing object detection tasks using **AutoGluon**.


In [None]:
!pip install autogluon.multimodal


In [None]:
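# Install a CUDA 11.8 build of PyTorch 2.0.0; mim then fetches a prebuilt mmcv 2.1.0 wheel built for cu118/torch2.0.0.
# Note: autogluon.multimodal 1.1.1 declares torch>=2.2,<2.4, so pip may warn about a dependency conflict here.
!pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.0+cu118 --index-url https://download.pytorch.org/whl/cu118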


In [None]:
!mim install "mmcv==2.1.0"
!pip install "mmdet==3.2.0"


In [None]:
from autogluon.multimodal import MultiModalPredictor



In [None]:
import os
import time

from autogluon.core.utils.loaders import load_zip
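Because the install cells pin specific builds, it can help to confirm what actually ended up installed before training. The sketch below uses only the standard library's `importlib.metadata`; the `EXPECTED` table simply copies the pins from the install commands, and `check_versions` is a hypothetical helper, not part of AutoGluon.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip/mim install cells of this notebook.
EXPECTED = {
    "torch": "2.0.0+cu118",
    "torchvision": "0.15.1+cu118",
    "mmcv": "2.1.0",
    "mmdet": "3.2.0",
}


def check_versions(expected, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch.

    `get_version` defaults to importlib.metadata.version; it is a parameter only
    so the comparison logic can be exercised without the packages installed.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            have = get_version(name)
        except PackageNotFoundError:
            have = None  # package is not installed at all
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


print(check_versions(EXPECTED) or "All pinned versions are installed.")
```

An empty result means every pin matched; a `None` in the output means the package is missing entirely.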

In [None]:
zip_file = "https://automl-mm-bench.s3.amazonaws.com/object_detection_dataset/tiny_motorbike_coco.zip"
download_dir = "./tiny_motorbike_coco"

load_zip.unzip(zip_file, unzip_dir=download_dir)
data_dir = os.path.join(download_dir, "tiny_motorbike")
train_path = os.path.join(data_dir, "Annotations", "trainval_cocoformat.json")
test_path = os.path.join(data_dir, "Annotations", "test_cocoformat.json")

Downloading ./tiny_motorbike_coco/file.zip from https://automl-mm-bench.s3.amazonaws.com/object_detection_dataset/tiny_motorbike_coco.zip...


100%|██████████| 21.8M/21.8M [00:00<00:00, 37.9MiB/s]
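The annotation files located above are COCO-format JSON. A quick summary of how many images and boxes each split contains can catch a wrong path or an empty split before training starts. This is a stdlib-only sketch; `summarize_coco` and `summarize_coco_file` are hypothetical helpers, and only the standard COCO keys (`images`, `annotations`, `categories`, `category_id`) are assumed.

```python
import json
from collections import Counter


def summarize_coco(coco):
    """Summarize a COCO-format annotation dict: image count, box count, boxes per class."""
    # COCO lists class names under "categories"; each annotation points to one via "category_id".
    id_to_name = {c["id"]: c["name"] for c in coco.get("categories", [])}
    per_class = Counter(
        id_to_name.get(a["category_id"], str(a["category_id"]))
        for a in coco.get("annotations", [])
    )
    return {
        "num_images": len(coco.get("images", [])),
        "num_boxes": sum(per_class.values()),
        "boxes_per_class": dict(per_class),
    }


def summarize_coco_file(path):
    """Load a COCO JSON file from disk and summarize it."""
    with open(path) as f:
        return summarize_coco(json.load(f))


# Usage in this notebook (requires the dataset downloaded above):
# print(summarize_coco_file(train_path))
# print(summarize_coco_file(test_path))
```

Note that COCO annotation boxes are stored as `[x, y, width, height]`, not as corner coordinates.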


In [None]:
presets = "medium_quality"

In [None]:
# Init predictor
import uuid

model_path = f"./tmp/{uuid.uuid4().hex}-quick_start_tutorial_temp_save"

predictor = MultiModalPredictor(
    problem_type="object_detection",
    sample_data_path=train_path,
    presets=presets,
    path=model_path,
)

In [None]:
start = time.time()
predictor.fit(train_path)  # Fit
train_end = time.time()

AutoGluon Version:  1.1.1
Python Version:     3.10.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
CPU Count:          2
Pytorch Version:    2.0.0+cu118
CUDA Version:       CUDA is not available
Memory Avail:       6.55 GB / 12.67 GB (51.7%)
Disk Space Avail:   56.83 GB / 107.72 GB (52.8%)
A new predictor save path is created. This is to prevent you to overwrite previous predictor saved here. You could check current save path at predictor._save_path. If you still want to use this path, set resume=True
No path specified. Models will be saved in: "AutogluonModels/ag-20240925_235921"
Using default root folder: ./tiny_motorbike_coco/tiny_motorbike/Annotations/... Specify `root=...` if you feel it is wrong...

AutoMM starts to create your model. ✨✨✨


loading annotations into memory...
Done (t=0.00s)
creating index...
index created!


INFO: GPU available: False, used: False
INFO: TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO: `Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
INFO: 
  | Name              | Type                             | Params | Mode 
-------------------------------------------------------------------------------
0 | model             | MMDetAutoModelForObjectDetection | 54.2 M | train
1 | validation_metric | MeanAveragePrecision             | 0      | train
-------------------------------------------------------------------------------
54.2 M    Trainable params
0         Non-trainable params
54.2 M    Total params
216.620   Total estimated model params size (MB)



INFO: Epoch 2, global step 15: 'val_map' reached 0.44171 (best 0.44171), saving model to '/content/AutogluonModels/ag-20240925_235921/epoch=2-step=15.ckpt' as top 1
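The `val_map` value in the log is a mean average precision (mAP) score. mAP decides whether a predicted box matches a ground-truth box using intersection over union (IoU). Below is a minimal sketch of IoU for boxes in corner format `[x1, y1, x2, y2]`, plus a converter from COCO's `[x, y, width, height]` layout; the function names are hypothetical, and the exact matching rules used by AutoGluon's metric (IoU thresholds, averaging) are not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2] (x2 > x1, y2 > y1)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width/height clamp to 0 when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def coco_xywh_to_xyxy(box):
    """Convert a COCO [x, y, width, height] box to corner format [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # overlap 25 / union 175, i.e. 1/7
```

A detection typically counts as a true positive only when its IoU with an unmatched ground-truth box of the same class exceeds a threshold such as 0.5.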


In [None]:
print(f"Fine-tuning took {train_end - start:.2f} seconds.")
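The training cell brackets `predictor.fit` with two `time.time()` calls. A small context manager keeps that bookkeeping in one place and uses `time.perf_counter()`, the standard-library clock intended for measuring short durations (`time.time()` reports wall-clock time, which can jump if the system clock is adjusted). `timed` is a hypothetical helper, sketched here with a `time.sleep` stand-in.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the elapsed seconds of the `with` body in results[label], even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("sleep", timings):
    time.sleep(0.05)
print(f"sleep took {timings['sleep']:.2f} seconds")

# Hypothetical usage in this notebook:
# with timed("fit", timings):
#     predictor.fit(train_path)
```

Recording inside `finally` means a failed `fit` still leaves a timing entry behind, which can help when diagnosing long-running errors.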

In [None]:
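eval_start = time.time()
scores = predictor.evaluate(test_path)  # store the metrics; only a cell's last expression is displayed
eval_end = time.time()
print(scores)

In [None]:
print(f"Evaluation took {eval_end - eval_start:.2f} seconds.")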
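Mean average precision is the mean over classes of per-class average precision (AP); COCO-style evaluators additionally average over IoU thresholds from 0.50 to 0.95 and sample the precision-recall curve at 101 recall levels. The simplified single-class sketch below illustrates the core idea: detections are ranked by confidence, each is already flagged as a true or false positive (for example by IoU matching), and AP is the area under the precision-recall curve with all-point interpolation. It is not the exact implementation behind `predictor.evaluate`, and `average_precision` is a hypothetical name.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class.

    scores: confidence of each detection.
    is_true_positive: True if that detection matched a not-yet-matched
        ground-truth box (e.g. IoU >= 0.5), False otherwise.
    num_ground_truth: number of ground-truth boxes of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolation: make precision non-increasing when read from right to left.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum precision times the recall gained at each rank.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_ground_truth=2))  # 5/6
```

Ground-truth boxes that no detection ever matches cap the reachable recall below 1, which lowers AP even when every detection is correct.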

In [None]:
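# Reload from the directory the fitted predictor actually used; the training log above shows
# that AutoGluon may choose a new save path instead of `model_path`.
new_predictor = MultiModalPredictor.load(predictor.path)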

In [None]:
new_predictor.evaluate(test_path)

In [None]:
pred = predictor.predict(test_path)
print(pred)

In [None]:
pred = predictor.predict(test_path, save_results=True)
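`save_results=True` asks AutoGluon to persist the predictions itself. For a flat, one-row-per-box table that opens directly in a spreadsheet or pandas, a stdlib sketch follows. The input layout is an assumption: each image is paired with a list of box dicts holding `class`, `score`, and `bbox` as `[x1, y1, x2, y2]`. Check `print(pred)` for the actual column names in your AutoGluon version. `flatten_detections` and `write_detections_csv` are hypothetical helpers.

```python
import csv

HEADER = ["image", "class", "score", "x1", "y1", "x2", "y2"]


def flatten_detections(rows):
    """Turn (image_path, boxes) pairs into one flat record per detected box.

    ASSUMPTION: each box is a dict with "class", "score" and "bbox" = [x1, y1, x2, y2].
    """
    return [
        [image_path, box["class"], box["score"], *box["bbox"]]
        for image_path, boxes in rows
        for box in boxes
    ]


def write_detections_csv(rows, out_path):
    """Write the flattened detections with a header row; returns the number of boxes written."""
    records = flatten_detections(rows)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(records)
    return len(records)


# Hypothetical usage, assuming `pred` has "image" and "bboxes" columns:
# write_detections_csv(zip(pred["image"], pred["bboxes"]), "detections.csv")
```

Images with no detections contribute no rows, so count distinct `image` values rather than rows if you need per-image statistics.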

In [None]:
!pip install opencv-python

In [None]:
from IPython.display import display
from PIL import Image

from autogluon.multimodal.utils import ObjectDetectionVisualizer

conf_threshold = 0.4  # Specify a confidence threshold to filter out unwanted boxes
image_result = pred.iloc[30]  # Select an image (row 30 of the predictions) to visualize

img_path = image_result.image  # Path of the selected image

visualizer = ObjectDetectionVisualizer(img_path)  # Initialize the visualizer
out = visualizer.draw_instance_predictions(image_result, conf_threshold=conf_threshold)  # Draw detections
visualized = out.get_image()  # Get the visualized image as an array

img = Image.fromarray(visualized, "RGB")
display(img)
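`conf_threshold` tells the visualizer to skip low-confidence boxes when drawing. Applying the same kind of filter to the prediction data itself makes it easy to see how many detections survive at different thresholds before picking one. The sketch assumes each detection is a dict with a `score` key and keeps scores at or above the threshold; whether the visualizer's own comparison is inclusive is not covered here. The helper names and the toy detections are hypothetical.

```python
def filter_by_confidence(boxes, threshold):
    """Keep detections whose score is at least `threshold` (assumes a "score" key)."""
    return [b for b in boxes if b["score"] >= threshold]


def count_by_threshold(boxes, thresholds):
    """Number of detections that survive each candidate threshold."""
    return {t: len(filter_by_confidence(boxes, t)) for t in thresholds}


# Toy detections for illustration only.
example = [
    {"class": "motorbike", "score": 0.92},
    {"class": "person", "score": 0.55},
    {"class": "car", "score": 0.12},
]
print(count_by_threshold(example, [0.1, 0.4, 0.9]))  # {0.1: 3, 0.4: 2, 0.9: 1}
```

A higher threshold gives a cleaner picture at the cost of hiding correct but less confident detections.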

In [None]:
from autogluon.multimodal import download
image_url = "https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/detection/street_small.jpg"
test_image = download(image_url)

In [None]:
import json

# Create a COCO-style input file listing the image to run detection on
data = {"images": [{"id": 0, "width": -1, "height": -1, "file_name": test_image}], "categories": []}
os.makedirs("input_data_for_demo", exist_ok=True)  # does not fail if the cell is rerun
input_file = "input_data_for_demo/demo_annotation.json"
with open(input_file, "w") as f:
    json.dump(data, f)

pred_test_image = predictor.predict(input_file)
print(pred_test_image)
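The demo cell writes an annotation file for a single image. To run detection on several new images through the same annotation-file route, the structure generalizes to a list of paths. This is a sketch with hypothetical helper names; it reproduces only the fields the demo cell uses (sequential `id`, `width`/`height` of -1, empty `categories`), and whether AutoGluon accepts anything beyond that is not covered here.

```python
import json
import os


def build_inference_coco(image_paths):
    """COCO-style dict with one "images" entry per path and no categories, as in the demo cell."""
    return {
        "images": [
            {"id": i, "width": -1, "height": -1, "file_name": path}
            for i, path in enumerate(image_paths)
        ],
        "categories": [],
    }


def write_inference_coco(image_paths, out_file):
    """Write build_inference_coco(image_paths) to `out_file`, creating its folder if needed."""
    data = build_inference_coco(image_paths)
    parent = os.path.dirname(out_file)
    if parent:
        os.makedirs(parent, exist_ok=True)  # safe to rerun
    with open(out_file, "w") as f:
        json.dump(data, f)
    return out_file


# Hypothetical usage with the predictor trained above:
# demo_file = write_inference_coco([test_image], "input_data_for_demo/demo_annotation.json")
# print(predictor.predict(demo_file))
```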

In [None]:
# Alternatively, pass a list of image paths directly, without writing an annotation file
pred_test_image = predictor.predict([test_image])
print(pred_test_image)