Cross-Modal Fusion and Progressive Decoding Network For RGB-D Salient Object Detection

This paper was accepted by the International Journal of Computer Vision on January 11, 2024. Paper link: Link

CPNet

Most existing RGB-D salient object detection (SOD) methods pursue higher performance by integrating additional modules, such as feature enhancement and edge generation. However, these modules inevitably introduce feature redundancy and degrade performance. To address this, we design a cross-modal fusion and progressive decoding network (CPNet) for the RGB-D SOD task. The network consists of only three indispensable parts: feature encoding, feature fusion, and feature decoding. Specifically, in the feature encoding part, we adopt a two-stream Swin Transformer encoder to extract multi-level, multi-scale features from the RGB and depth images respectively, modeling global information. In the feature fusion part, we design a cross-modal attention fusion module that leverages the attention mechanism to fuse multi-modality and multi-level features. In the feature decoding part, we design a progressive decoder that gradually fuses low-level features and filters out noise to accurately predict salient objects. Extensive experiments on six benchmarks demonstrate that our network surpasses 12 state-of-the-art methods on four metrics. In addition, we verify that, for the RGB-D SOD task, adding a feature enhancement module or an edge generation module does not improve detection performance under this framework, which provides new insights into salient object detection. Our code is available at https://github.com/hu-xh/CPNet.
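To make the fusion idea above more concrete, here is a minimal PyTorch sketch of attention-based fusion of an RGB feature map and a depth feature map. It is an illustration only and not the cross-modal attention fusion module from the paper: the class name `ToyCrossModalFusion`, the channel-gating design, the `reduction` parameter, and the tensor sizes are all assumptions made for this example. See the `models` folder for the actual implementation.

```python
import torch
import torch.nn as nn


class ToyCrossModalFusion(nn.Module):
    """Hypothetical illustration, NOT the CPNet module.

    Each modality is re-weighted by channel attention computed from the
    other modality, then both streams are merged with a 1x1 convolution.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)

        def gate() -> nn.Sequential:
            # Global average pool -> bottleneck of 1x1 convs -> sigmoid weights
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, hidden, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, channels, kernel_size=1),
                nn.Sigmoid(),
            )

        self.rgb_gate = gate()    # weights computed from RGB, applied to depth
        self.depth_gate = gate()  # weights computed from depth, applied to RGB
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) features scaled by (B, C, 1, 1) attention weights
        rgb_att = rgb * self.depth_gate(depth)
        depth_att = depth * self.rgb_gate(rgb)
        # Residual sums keep the original features alongside the gated ones
        fused = torch.cat([rgb + rgb_att, depth + depth_att], dim=1)
        return self.merge(fused)


if __name__ == "__main__":
    # Dummy feature maps standing in for one encoder level (sizes are arbitrary)
    rgb_feat = torch.randn(2, 128, 12, 12)
    depth_feat = torch.randn(2, 128, 12, 12)
    fusion = ToyCrossModalFusion(channels=128)
    print(fusion(rgb_feat, depth_feat).shape)
```

The output keeps the input channel count and spatial size, so a block like this could in principle be applied at every encoder level before decoding.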

Network Architecture

Results and Saliency maps

We perform quantitative and qualitative comparisons with 12 RGB-D SOD methods on six RGB-D datasets.
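This README does not name the four metrics used in the quantitative comparison. As an illustration of how a predicted saliency map is scored against a ground-truth mask, the sketch below computes mean absolute error (MAE), a metric commonly reported in SOD work. Its use here is an assumption, and the function name and toy values are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(pred: Sequence[Sequence[float]],
                        gt: Sequence[Sequence[float]]) -> float:
    """MAE between a saliency map and a ground-truth mask, both scaled to [0, 1]."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground truth must have the same shape")
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)  # per-pixel absolute difference
            count += 1
    if count == 0:
        raise ValueError("maps must not be empty")
    return total / count


if __name__ == "__main__":
    # Toy 2x2 maps, values chosen only for illustration
    prediction = [[0.9, 0.1], [0.8, 0.0]]
    ground_truth = [[1.0, 0.0], [1.0, 0.0]]
    print(round(mean_absolute_error(prediction, ground_truth), 4))
```

Lower MAE means the predicted map is closer to the mask; a dataset-level score is typically the average of this value over all test images.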

Prerequisites

Python 3.6
PyTorch 1.10.2
torchvision 0.11.3
NumPy 1.19.2
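A quick way to confirm that an environment matches the package versions listed above is a small check script such as the sketch below. The helper names `installed_version` and `check_environment` are made up for this example and are not part of the repository. The script only compares the three package versions and does not check the Python version.

```python
import importlib
from typing import Dict, Optional

# Versions listed under Prerequisites; keys are the import names.
PINNED = {"torch": "1.10.2", "torchvision": "0.11.3", "numpy": "1.19.2"}


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__ without a local build suffix, or None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    version = getattr(module, "__version__", None)
    # Drop a local suffix such as "+cu113" before comparing
    return version.split("+")[0] if version else None


def check_environment(pinned: Dict[str, str] = PINNED) -> Dict[str, str]:
    """Compare installed versions with the pinned ones and return a report."""
    report = {}
    for name, wanted in pinned.items():
        found = installed_version(name)
        if found is None:
            report[name] = "missing (want {})".format(wanted)
        elif found == wanted:
            report[name] = "ok"
        else:
            report[name] = "found {}, want {}".format(found, wanted)
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print("{}: {}".format(package, status))
```

A version mismatch does not necessarily mean the code will fail, but matching the listed versions is the safest starting point for reproducing the results.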

Pretrained Model

Download the following .pth file and put it into the main folder of the repository:

Swin-B with the fetch code: ja95.

Datasets

Training datasets with the fetch code: 1234.
Test datasets with the fetch code: 1234.

Results

You can download the tested saliency maps from the [Baidu Pan link](https://pan.baidu.com/s/1PlmqAvlAwSzsH2YGR4VzKQ) with the fetch code: dq2w.

You can download the resulting .pth file from the [Baidu Pan link](https://pan.baidu.com/s/1x6wQf-RceapsZanH4PfbGg) with the fetch code: 50lu.

Contact

Feel free to email me at 1558239392@qq.com.
