
Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion


ISFM is a novel Mamba-based interactive spatial-frequency fusion framework designed for Multi-Modal Image Fusion (MMIF). It aims to fully exploit the complementarity of domain-specific characteristics by incorporating frequency information into the spatial fusion process and leveraging Mamba to capture long-range dependencies. Specifically, we propose a Multi-scale Frequency Fusion that adaptively integrates the low-frequency and high-frequency components of different modalities at multiple scales. To further explore the complementarity of domain-specific characteristics, we propose an Interactive Spatial-Frequency Fusion comprising a Frequency-Guided Mamba and a Frequency-Guided Gate. By combining these modules, ISFM comprehensively integrates complementary information in the spatial and frequency domains. Extensive experiments on six MMIF datasets demonstrate that our method outperforms other state-of-the-art methods.

News

Exciting news! Our paper has been accepted by TIP 2026! 🎉🎉 Paper

Table of Contents

  • Introduction
  • Contributions
  • Results
  • Visualizations
  • Reproduction
  • Citation

Introduction

ISFM is a Mamba-based interactive spatial-frequency fusion framework for Multi-Modal Image Fusion (MMIF). This repository provides the training and testing code, along with pretrained weights for reproducing the results in our paper.

Contributions

  • We introduce a novel Interactive Spatial-Frequency Fusion Mamba (ISFM) framework for MMIF. It provides a distinct perspective for spatial-frequency fusion.
  • We propose a Multi-scale Frequency Fusion (MFF) to effectively fuse frequency information across multiple scales. In addition, we propose an Interactive Spatial-Frequency Fusion (ISF) to fully exploit the complementarity of spatial-frequency information.
  • Extensive experiments on IVIF and MIF tasks validate the effectiveness of our method. We further validate that our method benefits high-level computer vision tasks.

Results

Quantitative Comparison

Evaluation of Downstream Tasks

Visualizations

Qualitative Comparison

Comparison with state-of-the-art methods on MMIF datasets.

Feature Map Visualization

To validate the effectiveness of the proposed modules, we visualize the extracted features of different modules.

Frequency Domain Decomposition

To visually validate the effectiveness of our frequency domain fusion mechanism, we conduct two kinds of visualization experiments. First, we show the DWT decomposition of the source images and the corresponding features fused by the proposed MFF.

Second, we visualize the effect of the high-frequency enhancement operation.
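
For readers unfamiliar with this decomposition, below is a minimal sketch of a single-level 2D DWT split into low- and high-frequency sub-bands, together with a common baseline fusion rule (average the lows, keep the max-magnitude highs). It uses PyWavelets and is purely illustrative; it is not the proposed MFF, which fuses these components adaptively and at multiple scales.

# Baseline DWT fusion sketch (NOT the proposed MFF).
import numpy as np
import pywt

def naive_frequency_fusion(vi, ir, wavelet="haar"):
    """Average the low-frequency (LL) bands and keep the max-magnitude
    coefficients in each high-frequency band (LH, HL, HH), then invert."""
    ll_v, highs_v = pywt.dwt2(vi, wavelet)   # single-level 2D DWT
    ll_i, highs_i = pywt.dwt2(ir, wavelet)
    ll_f = 0.5 * (ll_v + ll_i)               # fuse low frequencies
    highs_f = tuple(                         # fuse high frequencies
        np.where(np.abs(hv) >= np.abs(hi), hv, hi)
        for hv, hi in zip(highs_v, highs_i)
    )
    return pywt.idwt2((ll_f, highs_f), wavelet)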

Evaluation of Downstream Tasks

We further evaluate the effectiveness of our method on two downstream tasks, i.e., object detection and semantic segmentation.

Reproduction

Requirements

  • Python 3.8
  • PyTorch 2.0.1
  • CUDA 11.7
  • mamba-ssm 1.2.0

Installation

# Create a virtual environment
conda create -n ISFM python=3.8 -y
conda activate ISFM

# Install dependencies
pip install -r requirements.txt
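
After installation, a quick sanity check that the environment matches the versions listed above can help; mamba-ssm builds custom CUDA kernels, so the PyTorch/CUDA pairing matters (expected versions are from the Requirements list):

# Verify the installed PyTorch/CUDA versions (run in Python).
import torch

print("torch :", torch.__version__)        # expect 2.0.1
print("cuda  :", torch.version.cuda)       # expect 11.7
print("gpu ok:", torch.cuda.is_available())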

Datasets

We use the following datasets. Please organize the files following the dataset directory structure.

Datasets Download link
MSRS Download here
RoadScene Download here
FMB Download here
Harvard Download here

The dataset directory structure is organized as follows. Please open your configuration file and modify INPUT.ROOT_DIR to point to the path of your downloaded dataset:

data/
├── train/
│   ├── vi/   # Visible image
│   └── ir/   # Infrared image
└── test/
    ├── vi/
    └── ir/
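
Paired images are assumed to share the same filename in vi/ and ir/ (common for these datasets). A quick, hypothetical sanity check for unpaired files:

# List filenames present in one modality folder but missing from the other.
from pathlib import Path

def unmatched_pairs(root="data/train"):
    vi = {p.name for p in Path(root, "vi").iterdir()}
    ir = {p.name for p in Path(root, "ir").iterdir()}
    return sorted(vi ^ ir)  # symmetric difference = unpaired files

print(unmatched_pairs("data/train"))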

Usage

The configuration is defined in .yaml files (e.g., configs/train.yaml). Before running the code, please modify the paths to match your local environment.
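
The KEY value command-line override style shown in the Test section below suggests a yacs-style configuration. Here is a minimal sketch of loading and overriding such a config; yacs is an assumption, and all keys except INPUT.ROOT_DIR and TEST.CHECKPOINT_PATH are placeholders, so the repo's actual loader may differ:

# Hypothetical yacs-style config loader mirroring the keys in this README.
from yacs.config import CfgNode as CN

_C = CN()
_C.INPUT = CN()
_C.INPUT.ROOT_DIR = "data/"       # dataset root (see Datasets above)
_C.TEST = CN()
_C.TEST.CHECKPOINT_PATH = ""      # pretrained weights for testing

def load_config(yaml_path, overrides=None):
    cfg = _C.clone()
    cfg.merge_from_file(yaml_path)      # merge configs/*.yaml
    if overrides:
        # e.g. ["TEST.CHECKPOINT_PATH", "checkpoints/best.pth"]
        cfg.merge_from_list(overrides)
    cfg.freeze()
    return cfg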

Train

To train the ISFM model from scratch, run:

python train.py --config configs/train.yaml

The training logs and model checkpoints will be automatically saved in output/exp_name/.

Test

Pre-trained weights are included in best/checkpoints/. To evaluate a specific model, modify TEST.CHECKPOINT_PATH to point to your pretrained weights, then run:

python test.py --config configs/test.yaml

Note: You can also override config options directly from the command line without modifying the YAML file:

python test.py --config configs/test.yaml TEST.CHECKPOINT_PATH "checkpoints/best.pth"

The testing process produces the following outputs:

  • Fusion Results: The fused images will be saved in the output directory.
  • Evaluation Logs: The quantitative metrics (e.g., EN, SSIM, VIF) will be recorded in a .log file within the output folder.

If you already have the fused images and only want to calculate the metrics (or evaluate results from other methods), you can run the evaluation script:

# Calculate metrics for existing images
python eval/test_metric.py
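
For reference, EN (entropy) is typically computed from the grayscale histogram of the fused image. A minimal sketch assuming 8-bit grayscale inputs (illustrative; not necessarily identical to eval/test_metric.py):

# Shannon entropy of an 8-bit grayscale image: EN = -sum(p_i * log2(p_i)).
import numpy as np

def entropy(img):
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist.astype(np.float64) / hist.sum()
    p = p[p > 0]                  # skip empty histogram bins
    return float(-(p * np.log2(p)).sum())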

Citation

If you find ISFM useful in your research, please consider citing:

@article{zhu2026isfm,
      title={Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion}, 
      author={Zhu, Yixin and Lv, Long and Zhang, Pingping and Liu, Xuehu and Tang, Tongdan and Tian, Feng and Sun, Weibing and Lu, Huchuan},
      journal={arXiv preprint arXiv:2602.04405},
      year={2026},
}
