
Object Detection with Transformers : DETR, Conditional DETR, Deformable DETR, Dynamic Head




Introduction

This project uses the Detectron2 and Hugging Face frameworks to perform object detection with transformer-based architectures.

Object Detection with Transformers using Hugging Face

With the Hugging Face framework, the following transformer-based architectures are used:

  • DETR (Detection Transformer)
  • Conditional DETR
  • Deformable DETR
  • YOLOS

Dependencies

  • Install Hugging Face Transformers by following the steps mentioned in the link.
  • pip install pytorch-lightning
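
To sanity-check the installation, a minimal snippet (illustrative only, not part of the repository) simply imports both packages and prints their versions:

    # Quick sanity check that both dependencies are importable.
    import transformers
    import pytorch_lightning as pl

    print("transformers:", transformers.__version__)
    print("pytorch-lightning:", pl.__version__)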

Dataset Preparation

The Balloon dataset is converted to COCO format and is present inside the custom_balloon folder.
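
For reference, a COCO-format folder such as custom_balloon can be inspected with torchvision's CocoDetection. The image folder and annotation file names below are assumptions; adjust them to the actual layout of custom_balloon.

    # Minimal sketch: load a COCO-format dataset with torchvision.
    # The image folder and annotation file names are assumptions.
    from torchvision.datasets import CocoDetection

    train_dataset = CocoDetection(
        root="custom_balloon/train2017",                          # image folder (assumed name)
        annFile="custom_balloon/annotations/custom_train.json",   # COCO annotations (assumed name)
    )

    image, annotations = train_dataset[0]
    print(len(train_dataset), "training images;", len(annotations), "annotations on the first image")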

Usage

Training:

Currently, Hugging Face only supports the following transformer-based object detection algorithms:

  • DETR
  • Conditional DETR
  • Deformable DETR
  • YOLOS

Run the command below for training:

  • python3.8 training.py --arch [detr|cond-detr|yolos|def-detr] --path model_output/[detr|cond-detr|yolos|def-detr] --epochs 5000 --profile True

     --path model_output: use a different output folder for each architecture.
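
The general Hugging Face + PyTorch Lightning pattern used by training.py looks roughly like the sketch below. The class name, learning rate, and data handling are illustrative assumptions; the real script also handles the other architectures and the --profile flag.

    # Rough sketch of DETR fine-tuning with Hugging Face Transformers + PyTorch Lightning.
    import torch
    import pytorch_lightning as pl
    from transformers import DetrForObjectDetection

    class DetrFineTuner(pl.LightningModule):
        def __init__(self, lr=1e-4, num_labels=1):
            super().__init__()
            # Start from COCO-pretrained weights and replace the classification
            # head for the single "balloon" class.
            self.model = DetrForObjectDetection.from_pretrained(
                "facebook/detr-resnet-50",
                num_labels=num_labels,
                ignore_mismatched_sizes=True,
            )
            self.lr = lr

        def training_step(self, batch, batch_idx):
            # With labels supplied, the model returns the combined DETR loss
            # (classification + L1 + GIoU).
            outputs = self.model(pixel_values=batch["pixel_values"],
                                 pixel_mask=batch["pixel_mask"],
                                 labels=batch["labels"])
            self.log("train_loss", outputs.loss)
            return outputs.loss

        def configure_optimizers(self):
            return torch.optim.AdamW(self.parameters(), lr=self.lr)

    # Assumed usage: train_loader yields dicts with pixel_values, pixel_mask and labels.
    # trainer = pl.Trainer(max_epochs=5000)
    # trainer.fit(DetrFineTuner(), train_loader)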
    

Profiling

[image: obj-detec-params]

Inference:

Run the command below:

python3.8 inference.py --model model_out/detr --arch [detr|cond-detr|yolos|def-detr]
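
A hedged sketch of what the DETR branch of inference.py roughly does, including a simple timing measurement comparable to the numbers below; the image path, checkpoint directory, and score threshold are assumptions.

    # Minimal sketch of DETR inference with Hugging Face, with a rough timing measurement.
    import time
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, DetrForObjectDetection

    processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50")
    model = DetrForObjectDetection.from_pretrained("model_out/detr")   # fine-tuned checkpoint (assumed to be saved with save_pretrained)
    model.eval()

    image = Image.open("custom_balloon/val2017/sample.jpg")            # any validation image (assumed path)
    inputs = processor(images=image, return_tensors="pt")

    start = time.time()
    with torch.no_grad():
        outputs = model(**inputs)
    print("Evaluation Time:", (time.time() - start) * 1000, "ms")

    # Convert raw logits/boxes into thresholded (score, label, box) detections.
    target_sizes = torch.tensor([image.size[::-1]])                    # (height, width)
    results = processor.post_process_object_detection(
        outputs, threshold=0.9, target_sizes=target_sizes)[0]
    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        print(f"label={label.item()} score={score.item():.2f} box={[round(v, 1) for v in box.tolist()]}")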

Evaluation time with the different models is as follows:

  • Evaluation Time for arch: detr is 762.916088104248 ms.
  • Evaluation Time for arch: yolos is 384.78732109069824 ms.
  • Evaluation Time for arch: cond-detr is 776.5250205993652 ms.
  • Evaluation Time for arch: def-detr is 2585.845708847046 ms.

Output:

Original Image:

[image: original]

DETR Output:

[image: output_detr]

Cond-DETR Output:

[image: output_cond-detr]

Deformable DETR Output:

[image: output_def-detr]

YOLOS Output:

[image: output_yolos]

Summary

In the original image, only 7 balloons are present, and all of them were detected correctly by Cond-DETR and Def-DETR.

The DETR model predicts only 6 balloons, missing 1 detection; YOLOS predicts only 5 balloons, missing 2.

However, YOLOS is the fastest architecture of the four, whereas Def-DETR takes longer than the others. (Note: all the models were trained for 500 epochs.)

So there is a clear trade-off between accuracy and speed; please check the profiling data mentioned above.

Accuracy can be improved by fine-tuning the hyperparameters or with more training.

But the clear winner in terms of speed is YOLOS, and in terms of accuracy it's Cond-DETR and Def-DETR.

Object Detection with Transformers Using Detectron2

Dependencies

  1. Install Detectron2 (prefer to use the Conda version).

  2. Install DyHead by following the steps in the DynamicHead repository.

    You may face some CUDA-related build issues in DynamicHead/dyhead/csrc/cuda/{deform_conv_cuda.cu, SigmoidFocalLoss_cuda.cu}. Try to fix them; otherwise, let me know what error you are facing.

Dataset Preparation

The Balloon dataset is converted to COCO format and is present inside the custom_balloon folder.

If you want to convert the Balloon dataset into COCO format yourself and use it in Detectron2, follow the steps below:

-   Download the balloon dataset from https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip.
-   git clone https://github.com/woctezuma/VIA2COCO
-   cd VIA2COCO/
-   git checkout fixes
-   run convert_coco.py
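
Once the data is in COCO format, it can be registered with Detectron2 before training. A minimal sketch follows; the dataset names and file paths are assumptions, so adjust them to the actual custom_balloon layout.

    # Minimal sketch: register the COCO-format balloon dataset with Detectron2.
    from detectron2.data.datasets import register_coco_instances

    register_coco_instances(
        "balloon_train", {},
        "custom_balloon/annotations/custom_train.json",   # annotation file (assumed name)
        "custom_balloon/train2017",                        # image folder (assumed name)
    )
    register_coco_instances(
        "balloon_val", {},
        "custom_balloon/annotations/custom_val.json",
        "custom_balloon/val2017",
    )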

Usage

Environment Setup

Training

  • For DyHead with FPN backbone:

    python3.8 training.py --outdir [where model will be saved] --arch dyhead-fpn --config [file path] --weight [file path] --epochs [no of epochs]
    
    For Example,
        python3.8 training.py --outdir out_dyhead_fpn/ --arch dyhead-fpn --config configs/dyhead_r50_atss_fpn_1x.yaml --weight pretrained_model/dyhead_r50_atss_fpn_1x.pth --epochs 5000
    
  • For DyHead with Swin-T transformer backbone:

    python3.8 training.py --outdir [where model will be saved] --arch dyhead-swint --config [file path] --weight [file path] --epochs [no of epochs]
    
    For Example,
        python3.8 training.py --outdir out_dyhead_swint/ --arch dyhead-swint --config configs/dyhead_swint_atss_fpn_2x_ms.yaml --weight pretrained_model/dyhead_swint_atss_fpn_2x_ms.pth --epochs 5000
    
  • For DETR:

    python3.8 training.py --outdir [where model will be saved] --arch detr --config [file path] --weight [file path] --epochs [no of epochs]
    
    For Example,
        python3.8 training.py --outdir out_test/ --arch detr --config configs/detr_256_6_6_torchvision.yaml --weight pretrained_model/converted_model.pth --epochs 5000
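
training.py builds a Detectron2 config from the given --config and --weight files and launches a trainer; a rough sketch of that pattern is shown below. The dataset names are assumptions, and the DETR/DyHead yaml files define custom config keys, so the corresponding add-config helpers from those projects must be applied before merge_from_file.

    # Rough sketch of Detectron2 training from a config file plus pretrained weights.
    import os
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultTrainer

    cfg = get_cfg()
    # The DETR/DyHead configs add custom keys; apply their add-config helpers here first.
    cfg.merge_from_file("configs/detr_256_6_6_torchvision.yaml")   # --config
    cfg.MODEL.WEIGHTS = "pretrained_model/converted_model.pth"     # --weight
    cfg.DATASETS.TRAIN = ("balloon_train",)                        # assumed dataset name
    cfg.DATASETS.TEST = ("balloon_val",)                           # assumed dataset name
    cfg.SOLVER.MAX_ITER = 5000                                     # --epochs (iterations in Detectron2 terms)
    cfg.OUTPUT_DIR = "out_test/"                                   # --outdir
    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()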
    

Inference:

  • For DyHead with FPN backbone:

    python3.8 inference.py --outdir out_dyhead_fpn/ --arch dyhead-fpn --config configs/dyhead_r50_atss_fpn_1x.yaml --save True

    Inference Time:

      Evaluation Time: 108.9015007019043 ms
      Evaluation Time: 103.93381118774414 ms
    

    [image: dyhead_output2]

  • For DyHead with Swin-T transformer backbone:

    python3.8 inference.py --outdir out_dyhead_swint/ --arch dyhead-swint --config configs/dyhead_swint_atss_fpn_2x_ms.yaml --save True

    Inference Time:

      Evaluation Time: 157.5005054473877 ms
      Evaluation Time: 153.02109718322754 ms

  • For DETR:

    python3.8 inference.py --outdir out_detr/ --arch detr --config configs/detr_256_6_6_torchvision.yaml --save True

    Inference Time:

      Evaluation Time: 71.02847099304199 ms
      Evaluation Time: 92.53978729248047 ms
    

[image: detr_output2]
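
inference.py essentially rebuilds the same config, points MODEL.WEIGHTS at the trained checkpoint, and runs a predictor. A hedged sketch with DefaultPredictor follows; the paths and checkpoint name are assumptions, and the same custom-config caveat as in training applies.

    # Minimal sketch of Detectron2 inference with a simple timing measurement.
    import time
    import cv2
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    cfg = get_cfg()
    # Apply the project-specific add-config helpers before merging the DETR/DyHead yaml.
    cfg.merge_from_file("configs/detr_256_6_6_torchvision.yaml")   # same config used for training
    cfg.MODEL.WEIGHTS = "out_detr/model_final.pth"                 # trained checkpoint (assumed file name)

    predictor = DefaultPredictor(cfg)
    image = cv2.imread("custom_balloon/val2017/sample.jpg")        # any validation image (assumed path)

    start = time.time()
    outputs = predictor(image)
    print("Evaluation Time:", (time.time() - start) * 1000, "ms")

    # Predicted boxes and scores for the detected balloons.
    instances = outputs["instances"].to("cpu")
    print(instances.pred_boxes, instances.scores)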

Summary:

As you can see from the output, DETR is slightly faster than DyHead. However, DETR is not as accurate as DyHead at predicting all the balloons.

Please check the output above.

We can try other DyHead configs, such as dyhead_swint_atss_fpn_2x_ms.yaml, and check the output.

The idea here is to demonstrate how to do transformer-based object detection with the Detectron2 framework. Please feel free to share your feedback.

Reach me @

LinkedIn GitHub Medium
