Official PyTorch implementation of our paper:
- Title: DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding
- Authors: Ting Liu, Xuyang Liu, Siteng Huang, Honggang Chen, Quanjun Yin, Long Qin, Donglin Wang, Yue Hu
- Institutes: National University of Defense Technology, Sichuan University, and Westlake University
In this paper, we explore applying parameter-efficient transfer learning (PETL) to efficiently transfer pre-trained vision-language knowledge to visual grounding (VG). Specifically, we propose DARA, a novel PETL method comprising Domain-aware Adapters (DA Adapters) and Relation-aware Adapters (RA Adapters) for VG. DA Adapters first adapt intra-modality representations to be more fine-grained for the VG domain. Then, RA Adapters share weights to bridge the relation between the two modalities, improving spatial reasoning. Empirical results on widely used benchmarks demonstrate that DARA achieves the best accuracy while updating far fewer parameters than full fine-tuning and other PETL methods. Notably, with only 2.13% tunable backbone parameters, DARA improves average accuracy by 0.81% across the three benchmarks compared to the baseline model.
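To make the adapter idea concrete before the code release, here is a minimal PyTorch sketch of a bottleneck adapter, with per-modality instances (in the spirit of DA Adapters) and one weight-shared instance applied to both modalities (in the spirit of RA Adapters). The class name `BottleneckAdapter`, the placement, and all dimensions are illustrative assumptions for exposition, not the released DARA implementation.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project, nonlinearity, up-project,
    plus a residual connection. Purely illustrative; not the DARA code."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)
        # Zero-init the up-projection so the adapter starts as an identity map.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Hypothetical usage: one adapter per modality branch (domain-aware),
# and a single adapter whose weights are shared across both branches
# (relation-aware). Feature shapes below are dummy placeholders.
visual_da = BottleneckAdapter(dim=256)   # intra-modality: vision
text_da   = BottleneckAdapter(dim=256)   # intra-modality: language
ra_shared = BottleneckAdapter(dim=256)   # shared across modalities

vis_feat = torch.randn(2, 100, 256)      # (batch, visual tokens, dim)
txt_feat = torch.randn(2, 20, 256)       # (batch, text tokens, dim)

vis_out = ra_shared(visual_da(vis_feat)) # same RA weights applied to both
txt_out = ra_shared(text_da(txt_feat))
```

Weight sharing in the relation-aware step means the two modalities are projected through a common transformation, which is one simple way to encourage aligned cross-modal representations; consult the paper for the actual design.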
📌 The code and implementation details will be released by June. Thank you for your patience.
If our findings help your research, please consider citing our paper:
```bibtex
@misc{liu2024dara,
      title={{DARA}: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding},
      author={Ting Liu and Xuyang Liu and Siteng Huang and Honggang Chen and Quanjun Yin and Long Qin and Donglin Wang and Yue Hu},
      year={2024},
      eprint={2405.06217},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
For any questions about our paper or code, please contact Ting Liu or Xuyang Liu.