ENACT: Entropy-based Attention Clustering for Detection Transformers

This is the official implementation of the paper ENACT: Entropy-based Clustering of Attention Input for Improving the Computational Performance of Object Detection Transformers.

ENACT is a plug-in module that clusters the input of Detection Transformers based on its entropy, which is learnable. In its current state, it can be plugged only into Detection Transformers that have a Multi-Head Self-Attention module in their encoder. In this repository, we plug ENACT into three such models: DETR, Conditional DETR, and Anchor DETR.
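To illustrate the general idea, here is a minimal, hypothetical sketch of entropy-based token clustering. It is not the official ENACT implementation; the module name, the softmax-based token distribution, and the equal-size entropy-rank bucketing are illustrative assumptions only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EntropyClusterSketch(nn.Module):
    # Hypothetical sketch, NOT the official ENACT module: it merges tokens
    # with similar (learned) entropy to shorten the encoder's input sequence.
    def __init__(self, dim, n_clusters=100):
        super().__init__()
        self.proj = nn.Linear(dim, dim)  # learnable map defining a per-token distribution
        self.n_clusters = n_clusters

    def forward(self, x):
        # x: (L, B, C), sequence-first, as fed to a DETR-style encoder; assumes L >= n_clusters
        L, B, C = x.shape
        p = F.softmax(self.proj(x), dim=-1)          # per-token channel distribution
        h = -(p * p.clamp_min(1e-8).log()).sum(-1)   # Shannon entropy per token, shape (L, B)
        order = h.argsort(dim=0)                     # rank tokens by entropy, per batch element
        out = x.new_zeros(self.n_clusters, B, C)
        for b in range(B):
            # merge each contiguous entropy-rank bucket into a single token
            for j, idx in enumerate(torch.tensor_split(order[:, b], self.n_clusters)):
                out[j, b] = x[idx, b].mean(dim=0)
        return out                                   # (n_clusters, B, C): a shorter sequence for self-attention

For instance, EntropyClusterSketch(256)(torch.randn(950, 2, 256)) compresses 950 tokens to 100; shrinking the sequence seen by self-attention is where the memory and time savings in the tables below come from.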

We provide comparisons of GPU memory usage and of training and inference times (in seconds per image) between detection transformer models with and without ENACT.

| model | backbone | epochs | batch size | GPU memory (GB) | train time (s/img) | inference time (s/img) |
|---|---|---|---|---|---|---|
| DETR-C5 | R50 | 300 | 8 | 36.5 | 0.0541 | 0.0482 |
| DETR-C5 + ENACT | R50 | 300 | 8 | 23.5 | 0.0488 | 0.0472 |
| Conditional DETR-C5 | R101 | 50 | 8 | 46.6 | 0.0826 | 0.0637 |
| Conditional DETR-C5 + ENACT | R101 | 50 | 8 | 36.7 | 0.0779 | 0.0605 |
| Anchor DETR-DC5 | R50 | 50 | 4 | 29.7 | 0.0999 | 0.0712 |
| Anchor DETR-DC5 + ENACT | R50 | 50 | 4 | 17.7 | 0.0845 | 0.0608 |
All experiments were done on the COCO 2017 dataset, using the train split (118k images) for training and the val split (5k images) for validation. The precisions below are computed on the validation set. We also provide logs and checkpoints for the models trained with ENACT.
| model | AP | AP50 | APS | APM | APL | url |
|---|---|---|---|---|---|---|
| DETR-C5 | 40.6 | 61.6 | 19.9 | 44.3 | 60.2 | - |
| DETR-C5 + ENACT | 39.0 | 59.1 | 18.3 | 42.2 | 57.0 | model \| log |
| Conditional DETR-C5 | 42.8 | 63.7 | 21.7 | 46.6 | 60.9 | - |
| Conditional DETR-C5 + ENACT | 41.5 | 62.2 | 21.3 | 45.5 | 59.3 | model \| log |
| Anchor DETR-DC5 | 44.3 | 64.9 | 25.1 | 48.1 | 61.1 | - |
| Anchor DETR-DC5 + ENACT | 42.9 | 63.5 | 25.0 | 46.8 | 58.5 | model \| log |

Setup instructions

First, clone the repository:

git clone https://github.com/GSavathrakis/ENACT.git
cd ENACT

Download the data

Download the MS COCO dataset; this module was trained on COCO 2017. The structure of the downloaded files should be the following:

path_to_coco/
├── train2017/
├── val2017/
└── annotations/
	├── instances_train2017.json
	└── instances_val2017.json
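
Optionally, you can sanity-check the layout with a few lines of Python (a convenience snippet, not part of the repo; the root path is a placeholder):

import sys
from pathlib import Path

root = Path("path_to_coco")  # placeholder: your actual COCO root
expected = [
    root / "train2017",
    root / "val2017",
    root / "annotations" / "instances_train2017.json",
    root / "annotations" / "instances_val2017.json",
]
missing = [p for p in expected if not p.exists()]
if missing:
    sys.exit(f"Missing COCO entries: {missing}")
print("COCO layout looks correct.")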

Install using conda

Subsequently, set up an Anaconda environment. This repo was tested on Python 3.10 with CUDA 11.7.

conda create -n "env name" python="3.10 or above"
conda activate "env name"

Next, install CUDA in your conda environment along with the additional packages:

conda install nvidia/label/cuda-11.7.0::cuda
pip install torch==2.0.0 torchvision cython scipy pycocotools tqdm numpy==1.23 opencv-python
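
As an optional check that the environment works, confirm that PyTorch can see the CUDA runtime:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

This should print the installed torch version and True on a machine with a working GPU setup.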

Install using Docker

Alternatively, you can create a Docker container using the provided Dockerfile and .yml files:

docker compose build
docker compose up

Training

To train one of the detection transformers with the ENACT module, run:

python "Path to one of the DETR variants models"/main.py --coco_path "Path to COCO dataset" --output_dir "Path to the directory where you want to save checkpoints"

For example, to train the Anchor DETR model with ENACT, run:

python Anchor-DETR-ENACT/main.py --coco_path "Path to COCO dataset" --output_dir "Path to the directory where you want to save checkpoints"
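
For multi-GPU training, the upstream DETR repositories use the torch.distributed launcher; assuming these forks keep that entry point (an assumption, not verified here), an 8-GPU run would look like:

python -m torch.distributed.launch --nproc_per_node=8 --use_env Anchor-DETR-ENACT/main.py --coco_path "Path to COCO dataset" --output_dir "Path to the directory where you want to save checkpoints"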

Inference

You can also evaluate the ENACT module on the three models using the pretrained checkpoints that can be downloaded from the links in the second table.
For example, to evaluate the DETR model with ENACT, run:

python DETR-ENACT/main.py --coco_path "Path to COCO dataset" --output_dir "Path to the directory where you want to save checkpoints" --resume "Path to DETR-ENACT checkpoint" --eval

Citation

If you find this work useful for your research, please cite:

@misc{savathrakis2024enactentropybasedclusteringattention,
      title={ENACT: Entropy-based Clustering of Attention Input for Improving the Computational Performance of Object Detection Transformers}, 
      author={Giorgos Savathrakis and Antonis Argyros},
      year={2024},
      eprint={2409.07541},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.07541}, 
}

as well as the papers introducing the transformer networks used:

@InProceedings{10.1007/978-3-030-58452-8_13,
  author="Carion, Nicolas and Massa, Francisco and Synnaeve, Gabriel and Usunier, Nicolas and Kirillov, Alexander and Zagoruyko, Sergey",
  editor="Vedaldi, Andrea and Bischof, Horst and Brox, Thomas and Frahm, Jan-Michael",
  title="End-to-End Object Detection with Transformers",
  booktitle="Computer Vision -- ECCV 2020",
  year="2020",
  publisher="Springer International Publishing",
  address="Cham",
  pages="213--229",
  isbn="978-3-030-58452-8"
}

@inproceedings{wang2022anchor,
  title={Anchor detr: Query design for transformer-based detector},
  author={Wang, Yingming and Zhang, Xiangyu and Yang, Tong and Sun, Jian},
  booktitle={Proceedings of the AAAI conference on artificial intelligence},
  volume={36},
  pages={2567--2575},
  year={2022}
}

@inproceedings{meng2021conditional,
  title={Conditional detr for fast training convergence},
  author={Meng, Depu and Chen, Xiaokang and Fan, Zejia and Zeng, Gang and Li, Houqiang and Yuan, Yuhui and Sun, Lei and Wang, Jingdong},
  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
  pages={3651--3660},
  year={2021}
}

Acknowledgements

