UniTR: A Unified TRansformer-based Framework for Co-object and Multi-modal Saliency Detection [TMM 2024]

Created by Ruohao Guo, Xianghua Ying*, Yanyu Qi, Liao Qu

This repository contains the PyTorch implementation of the paper "UniTR: A Unified TRansformer-based Framework for Co-object and Multi-modal Saliency Detection".

In this paper, we develop a Unified TRansformer-based framework, namely UniTR, aiming to tackle co-object and multi-modal saliency detection tasks with a single unified architecture. Specifically, a transformer module (CoFormer) is introduced to learn the consistency of relevant objects or the complementarity of different modalities. To generate high-quality segmentation maps, we adopt a dual-stream decoding paradigm that allows the extracted consistent or complementary information to better guide mask prediction. Moreover, a feature fusion module (ZoomFormer) is designed to enhance backbone features and capture multi-granularity and multi-semantic information. Extensive experiments show that UniTR performs well on 17 benchmarks and surpasses existing state-of-the-art approaches.
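
The common mechanism behind CoFormer, cross-attention between two token streams, can be sketched in a few lines. The following PyTorch fragment is a minimal illustration of that idea only, assuming generic token shapes; the class name, dimensions, and wiring are expository assumptions, not the actual CoFormer or ZoomFormer modules in this repository.

import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    # One stream attends to the other: queries from x, keys/values from y.
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x, y):
        q, kv = self.norm_q(x), self.norm_kv(y)
        x = x + self.attn(q, kv, kv, need_weights=False)[0]
        x = x + self.mlp(self.norm_mlp(x))
        return x

# Two streams of backbone tokens: e.g. two images sharing an object class
# (co-object tasks) or an aligned RGB/depth pair (multi-modal tasks).
x = torch.randn(2, 196, 256)   # (batch, tokens, channels)
y = torch.randn(2, 196, 256)
block = CrossAttentionBlock()
x_fused = block(x, y)          # x enriched with information from y
y_fused = block(y, x)          # and vice versa
print(x_fused.shape)           # torch.Size([2, 196, 256])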



Usage

Installation

conda create -n unitr python=3.8 -y
conda activate unitr
pip install torch==1.11.0 torchvision==0.12.0
pip install timm opencv-python einops
pip install tensorboardX pycocotools imageio scipy moviepy thop
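
A quick sanity check (a throwaway snippet, not part of the repository) confirms that the pinned versions installed correctly and that a GPU is visible:

import torch
import torchvision

print(torch.__version__)          # expected: 1.11.0 (possibly with a CUDA suffix, e.g. 1.11.0+cu113)
print(torchvision.__version__)    # expected: 0.12.0
print(torch.cuda.is_available())  # True if a usable GPU and driver are present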

Co-object Saliency Detection

Training

  • co-segmentation and co-salient object detection (training data: COCO2017):
cd ./co_object_saliency_detection
python main.py
cd ./co_object_saliency_detection
python finetune.py

Inference

  • co-segmentation:
cd ./co_object_saliency_detection
python generate_maps_cos.py
  • co-salient object detection:
cd ./co_object_saliency_detection
python generate_maps_cosod.py
  • video salient object detection:
cd ./co_object_saliency_detection
python generate_maps_vsod.py

Evaluation

  • co-salient object detection:
cd ./co_object_saliency_detection/eval
sh eval_cosod.sh
  • video salient object detection:
cd ./co_object_saliency_detection/eval
sh eval_vsod.sh

Multi-modal Saliency Detection

Training

  • RGB-T salient object detection (training data: VT5000):
cd ./multi_modal_saliency_detection/train
python train_rgbt.py
  • RGB-D salient object detection (training data: NLPR_NJUD):
cd ./multi_modal_saliency_detection/train
python train_rgbd.py
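
Both trainers consume spatially aligned image pairs (RGB plus a thermal or depth map). As a rough illustration of that input format, the snippet below loads and normalizes one such pair; the directory layout, filenames, and the 352x352 input size are illustrative assumptions, not the repository's actual data pipeline.

import os
import cv2
import torch

def load_pair(rgb_dir, aux_dir, name, size=(352, 352)):
    # Read the RGB image and its aligned auxiliary (thermal/depth) image.
    rgb = cv2.cvtColor(cv2.imread(os.path.join(rgb_dir, name)), cv2.COLOR_BGR2RGB)
    aux = cv2.cvtColor(cv2.imread(os.path.join(aux_dir, name)), cv2.COLOR_BGR2RGB)
    rgb, aux = cv2.resize(rgb, size), cv2.resize(aux, size)
    # HWC uint8 -> CHW float tensors in [0, 1], ready for a two-stream encoder.
    to_tensor = lambda a: torch.from_numpy(a).permute(2, 0, 1).float() / 255.0
    return to_tensor(rgb), to_tensor(aux)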

Inference

  • RGB-T salient object detection:
cd ./multi_modal_saliency_detection/test
python generate_maps_rgbt.py
  • RGB-D salient object detection:
cd ./multi_modal_saliency_detection/test
python generate_maps_rgbd.py
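
The generate_maps_* scripts write per-image saliency predictions, typically as grayscale images. A snippet like the following (the path is a placeholder, not one produced by the repository) is a quick way to inspect one:

import imageio
import numpy as np

pred = imageio.imread("path/to/some_prediction.png")  # grayscale map, values in [0, 255]
print(pred.shape, pred.dtype, pred.min(), pred.max())
binary = pred.astype(np.float32) / 255.0 > 0.5        # crude fixed-threshold binarization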

Evaluation

  • RGB-T salient object detection:
cd ./multi_modal_saliency_detection/eval
python eval_rgbt.py
  • RGB-D salient object detection:
cd ./multi_modal_saliency_detection/eval
python eval_rgbd.py
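
The evaluation scripts report standard saliency metrics such as MAE, F-measure, and S-measure. The simplest of these, mean absolute error, is easy to state inline; the NumPy version below is illustrative only, and the repository's eval scripts remain the authoritative implementation:

import numpy as np

def mae(pred, gt):
    # Mean absolute error between a predicted saliency map and its
    # ground-truth mask, both given as uint8 images in [0, 255].
    pred = pred.astype(np.float32) / 255.0
    gt = gt.astype(np.float32) / 255.0
    return float(np.abs(pred - gt).mean())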

FAQ

If you have suggestions for improving usability or any other advice, please feel free to contact us directly (ruohguo@foxmail.com).

Acknowledgement

Thanks to SSNM, Swin, UFO, and SwinNet for their contributions to the community!

Citation

Please consider citing our paper in your publications if this project helps your research. The BibTeX reference is as follows.

@ARTICLE{10444934,
  author={Guo, Ruohao and Ying, Xianghua and Qi, Yanyu and Qu, Liao},
  journal={IEEE Transactions on Multimedia}, 
  title={UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection}, 
  year={2024},
  volume={26},
  pages={7622-7635},
  doi={10.1109/TMM.2024.3369922}}
