
Object Detection with Transformers : DETR, Conditional DETR, Deformable DETR, Dynamic Head




Introduction

This project uses the Detectron2 and Hugging Face frameworks to perform object detection with transformer-based architectures.

Object Detection with Transformers using Hugging Face

With the Hugging Face framework, the following transformer-based architectures are used:

  • DETR (Detection Transformer)
  • Conditional DETR
  • Deformable DETR
  • YOLOS

Dependencies

  • Install Hugging Face Transformers by following the steps mentioned in the link.
  • pip install pytorch-lightning
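
To sanity-check the installation, a minimal snippet (illustrative only, not part of the repository) simply imports both packages and prints their versions:

    # Quick sanity check that both dependencies are importable.
    import transformers
    import pytorch_lightning as pl

    print("transformers:", transformers.__version__)
    print("pytorch-lightning:", pl.__version__)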

Dataset Preparation

The Balloon dataset is converted to COCO format and is present inside the custom_balloon folder.
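
For reference, a COCO-format folder such as custom_balloon can be inspected with torchvision's CocoDetection. The image folder and annotation file names below are assumptions; adjust them to the actual layout of custom_balloon.

    # Minimal sketch: load a COCO-format dataset with torchvision.
    # The image folder and annotation file names are assumptions.
    from torchvision.datasets import CocoDetection

    train_dataset = CocoDetection(
        root="custom_balloon/train2017",                          # image folder (assumed name)
        annFile="custom_balloon/annotations/custom_train.json",   # COCO annotations (assumed name)
    )

    image, annotations = train_dataset[0]
    print(len(train_dataset), "training images;", len(annotations), "annotations on the first image")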

Usage

Training:

Currently, Hugging Face only supports the following transformer-based object detection algorithms:

  • DETR
  • Conditional DETR
  • Deformable DETR
  • YOLOS

Run the command below for training:

  • python3.8 training.py --arch [detr|cond-detr|yolos|def-detr] --path model_output/[detr|cond-detr|yolos|def-detr] --epochs 5000 --profile True

     --path model_output: use a different output folder for each architecture.
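
The general Hugging Face + PyTorch Lightning pattern used by training.py looks roughly like the sketch below. The class name, learning rate, and data handling are illustrative assumptions; the real script also handles the other architectures and the --profile flag.

    # Rough sketch of DETR fine-tuning with Hugging Face Transformers + PyTorch Lightning.
    import torch
    import pytorch_lightning as pl
    from transformers import DetrForObjectDetection

    class DetrFineTuner(pl.LightningModule):
        def __init__(self, lr=1e-4, num_labels=1):
            super().__init__()
            # Start from COCO-pretrained weights and replace the classification
            # head for the single "balloon" class.
            self.model = DetrForObjectDetection.from_pretrained(
                "facebook/detr-resnet-50",
                num_labels=num_labels,
                ignore_mismatched_sizes=True,
            )
            self.lr = lr

        def training_step(self, batch, batch_idx):
            # With labels supplied, the model returns the combined DETR loss
            # (classification + L1 + GIoU).
            outputs = self.model(pixel_values=batch["pixel_values"],
                                 pixel_mask=batch["pixel_mask"],
                                 labels=batch["labels"])
            self.log("train_loss", outputs.loss)
            return outputs.loss

        def configure_optimizers(self):
            return torch.optim.AdamW(self.parameters(), lr=self.lr)

    # Assumed usage: train_loader yields dicts with pixel_values, pixel_mask and labels.
    # trainer = pl.Trainer(max_epochs=5000)
    # trainer.fit(DetrFineTuner(), train_loader)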
    

Profiling

[image: obj-detec-params]

Inference:

Run the command below:

python3.8 inference.py --model model_out/detr --arch [detr|cond-detr|yolos|def-detr]
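
A hedged sketch of what the DETR branch of inference.py roughly does, including a simple timing measurement comparable to the numbers below; the image path, checkpoint directory, and score threshold are assumptions.

    # Minimal sketch of DETR inference with Hugging Face, with a rough timing measurement.
    import time
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, DetrForObjectDetection

    processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50")
    model = DetrForObjectDetection.from_pretrained("model_out/detr")   # fine-tuned checkpoint (assumed to be saved with save_pretrained)
    model.eval()

    image = Image.open("custom_balloon/val2017/sample.jpg")            # any validation image (assumed path)
    inputs = processor(images=image, return_tensors="pt")

    start = time.time()
    with torch.no_grad():
        outputs = model(**inputs)
    print("Evaluation Time:", (time.time() - start) * 1000, "ms")

    # Convert raw logits/boxes into thresholded (score, label, box) detections.
    target_sizes = torch.tensor([image.size[::-1]])                    # (height, width)
    results = processor.post_process_object_detection(
        outputs, threshold=0.9, target_sizes=target_sizes)[0]
    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        print(f"label={label.item()} score={score.item():.2f} box={[round(v, 1) for v in box.tolist()]}")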

Evaluation time with the different models is as follows:

  • Evaluation Time for arch: detr is 762.916088104248 ms.
  • Evaluation Time for arch: yolos is 384.78732109069824 ms.
  • Evaluation Time for arch: cond-detr is 776.5250205993652 ms.
  • Evaluation Time for arch: def-detr is 2585.845708847046 ms.

Output:

Original Image:

[image: original]

DETR Output:

[image: output_detr]

Cond-DETR Output:

[image: output_cond-detr]

Deformable DETR Output:

[image: output_def-detr]

YOLOS Output:

[image: output_yolos]

Summary

In the original image, only 7 balloons are present, and all of them were detected correctly by Cond-DETR and Def-DETR.

The DETR model predicts only 6 balloons, missing 1 detection; YOLOS predicts only 5 balloons, missing 2.

However, YOLOS is the fastest architecture of the four, whereas Def-DETR takes longer than the others. (Note: all the models were trained for 500 epochs.)

So there is a clear trade-off between accuracy and speed; please check the profiling data mentioned above.

Accuracy can be improved by fine-tuning the hyperparameters or with more training.

But the clear winner in terms of speed is YOLOS, and in terms of accuracy it's Cond-DETR and Def-DETR.

Object Detection with Transformers Using Detectron2

Dependencies

  1. Install Detectron2 (prefer to use the Conda version).

  2. Install DyHead by following the steps in the DynamicHead repository.

    You may face some CUDA-related build issues in DynamicHead/dyhead/csrc/cuda/{deform_conv_cuda.cu, SigmoidFocalLoss_cuda.cu}. Try to fix them; otherwise, let me know what error you are facing.

Dataset Preparation

The Balloon dataset is converted to COCO format and is present inside the custom_balloon folder.

If you want to convert the Balloon dataset into COCO format yourself and use it in Detectron2, follow the steps below:

-   Download the balloon dataset from https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip.
-   git clone https://github.com/woctezuma/VIA2COCO
-   cd VIA2COCO/
-   git checkout fixes
-   run convert_coco.py
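
Once the data is in COCO format, it can be registered with Detectron2 before training. A minimal sketch follows; the dataset names and file paths are assumptions, so adjust them to the actual custom_balloon layout.

    # Minimal sketch: register the COCO-format balloon dataset with Detectron2.
    from detectron2.data.datasets import register_coco_instances

    register_coco_instances(
        "balloon_train", {},
        "custom_balloon/annotations/custom_train.json",   # annotation file (assumed name)
        "custom_balloon/train2017",                        # image folder (assumed name)
    )
    register_coco_instances(
        "balloon_val", {},
        "custom_balloon/annotations/custom_val.json",
        "custom_balloon/val2017",
    )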

Usage

Environment Setup

Training

  • For DyHead with FPN backbone:

    python3.8 training.py --outdir [where model will be saved] --arch dyhead-fpn --config [file path] --weight [file path] --epochs [no of epochs]
    
    For Example,
        python3.8 training.py --outdir out_dyhead_fpn/ --arch dyhead-fpn --config configs/dyhead_r50_atss_fpn_1x.yaml --weight pretrained_model/dyhead_r50_atss_fpn_1x.pth --epochs 5000
    
  • For DyHead with Swin-T transformer backbone:

    python3.8 training.py --outdir [where model will be saved] --arch dyhead-swint --config [file path] --weight [file path] --epochs [no of epochs]
    
    For Example,
        python3.8 training.py --outdir out_dyhead_swint/ --arch dyhead-swint --config configs/dyhead_swint_atss_fpn_2x_ms.yaml --weight pretrained_model/dyhead_swint_atss_fpn_2x_ms.pth --epochs 5000
    
  • For DETR:

    python3.8 training.py --outdir [where model will be saved] --arch detr --config [file path] --weight [file path] --epochs [no of epochs]
    
    For Example,
        python3.8 training.py --outdir out_test/ --arch detr --config configs/detr_256_6_6_torchvision.yaml --weight pretrained_model/converted_model.pth --epochs 5000
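
training.py builds a Detectron2 config from the given --config and --weight files and launches a trainer; a rough sketch of that pattern is shown below. The dataset names are assumptions, and the DETR/DyHead yaml files define custom config keys, so the corresponding add-config helpers from those projects must be applied before merge_from_file.

    # Rough sketch of Detectron2 training from a config file plus pretrained weights.
    import os
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultTrainer

    cfg = get_cfg()
    # The DETR/DyHead configs add custom keys; apply their add-config helpers here first.
    cfg.merge_from_file("configs/detr_256_6_6_torchvision.yaml")   # --config
    cfg.MODEL.WEIGHTS = "pretrained_model/converted_model.pth"     # --weight
    cfg.DATASETS.TRAIN = ("balloon_train",)                        # assumed dataset name
    cfg.DATASETS.TEST = ("balloon_val",)                           # assumed dataset name
    cfg.SOLVER.MAX_ITER = 5000                                     # --epochs (iterations in Detectron2 terms)
    cfg.OUTPUT_DIR = "out_test/"                                   # --outdir
    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()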
    

Inference:

  • For DyHead with FPN backbone:

    python3.8 inference.py --outdir out_dyhead_fpn/ --arch dyhead-fpn --config configs/dyhead_r50_atss_fpn_1x.yaml --save True

    Inference Time:

      Evaluation Time: 108.9015007019043 ms
      Evaluation Time: 103.93381118774414 ms
    

    [image: dyhead_output2]

  • For DyHead with Swin-T transformer backbone:

    python3.8 inference.py --outdir out_dyhead_swint/ --arch dyhead-swint --config configs/dyhead_swint_atss_fpn_2x_ms.yaml --save True

    Inference Time:

      Evaluation Time: 157.5005054473877 ms
      Evaluation Time: 153.02109718322754 ms

  • For DETR:

    python3.8 inference.py --outdir out_detr/ --arch detr --config configs/detr_256_6_6_torchvision.yaml --save True

    Inference Time:

      Evaluation Time: 71.02847099304199 ms
      Evaluation Time: 92.53978729248047 ms
    

[image: detr_output2]
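
inference.py essentially rebuilds the same config, points MODEL.WEIGHTS at the trained checkpoint, and runs a predictor. A hedged sketch with DefaultPredictor follows; the paths and checkpoint name are assumptions, and the same custom-config caveat as in training applies.

    # Minimal sketch of Detectron2 inference with a simple timing measurement.
    import time
    import cv2
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    cfg = get_cfg()
    # Apply the project-specific add-config helpers before merging the DETR/DyHead yaml.
    cfg.merge_from_file("configs/detr_256_6_6_torchvision.yaml")   # same config used for training
    cfg.MODEL.WEIGHTS = "out_detr/model_final.pth"                 # trained checkpoint (assumed file name)

    predictor = DefaultPredictor(cfg)
    image = cv2.imread("custom_balloon/val2017/sample.jpg")        # any validation image (assumed path)

    start = time.time()
    outputs = predictor(image)
    print("Evaluation Time:", (time.time() - start) * 1000, "ms")

    # Predicted boxes and scores for the detected balloons.
    instances = outputs["instances"].to("cpu")
    print(instances.pred_boxes, instances.scores)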

Summary:

As you can see from the output, DETR is slightly faster than DyHead. However, DETR is not as accurate as DyHead at predicting all the balloons.

Please check the output above.

We can try other DyHead configs, such as dyhead_swint_atss_fpn_2x_ms.yaml, and check the output.

The idea here is to demonstrate how to do transformer-based object detection with the Detectron2 framework. Please feel free to share your feedback.

Reach me @

LinkedIn GitHub Medium
