ISFM is a novel Mamba-based interactive spatial-frequency fusion framework designed for Multi-Modal Image Fusion (MMIF). It aims to fully exploit the complementarity of domain-specific characteristics by incorporating frequency information into the spatial fusion process and leveraging Mamba to capture long-range dependencies. Specifically, we propose a Multi-scale Frequency Fusion that adaptively integrates the low-frequency and high-frequency components of different modalities at multiple scales. To fully explore the complementarity of domain-specific characteristics, we propose an Interactive Spatial-Frequency Fusion comprising a Frequency-Guided Mamba and a Frequency-Guided Gate. By combining these modules, our ISFM comprehensively integrates complementary information in the spatial and frequency domains. Extensive experiments on six MMIF datasets demonstrate that our method achieves better performance than other state-of-the-art methods.
Exciting news! Our paper has been accepted by TIP 2026! 🎉🎉 Paper
ISFM is a Mamba-based interactive spatial-frequency fusion framework for Multi-Modal Image Fusion (MMIF). This repository provides the training and testing code, along with pretrained weights for reproducing the results in our paper.
- We introduce a novel Interactive Spatial-Frequency Fusion Mamba (ISFM) framework for MMIF. It provides a distinct perspective for spatial-frequency fusion.
- We propose a Multi-scale Frequency Fusion (MFF) to effectively fuse frequency information across multiple scales. In addition, we propose an Interactive Spatial Frequency Fusion (ISF) to fully exploit the complementarity of spatial-frequency information.
- Extensive experiments on IVIF and MIF tasks validate the effectiveness of our method. We also show that our method benefits high-level computer vision tasks.
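To give an intuition for the low/high-frequency fusion idea behind MFF, here is a minimal NumPy sketch of our own (not the paper's implementation, which uses DWT decomposition and learned fusion): each modality is split into low- and high-frequency parts with an FFT low-pass mask, low frequencies are averaged to preserve global structure, and the stronger high-frequency response is kept to preserve edges and texture.

```python
import numpy as np

def freq_split(img, radius=8):
    # Split a 2-D image into low/high-frequency parts using a
    # circular low-pass mask in the centered FFT spectrum.
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(F * mask)).real
    high = img - low          # residual carries edges/texture
    return low, high

def fuse(ir, vi, radius=8):
    # Toy two-modality fusion: average the low-frequency (base)
    # components, keep the per-pixel stronger high-frequency detail.
    low_a, high_a = freq_split(ir, radius)
    low_b, high_b = freq_split(vi, radius)
    low = 0.5 * (low_a + low_b)
    high = np.where(np.abs(high_a) >= np.abs(high_b), high_a, high_b)
    return low + high
```

This hand-crafted rule is only illustrative; in ISFM the corresponding fusion weights are learned and applied at multiple scales.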
Comparison with state-of-the-art methods on MMIF datasets.
To validate the effectiveness of the proposed modules, we visualize the extracted features of different modules.
To visually validate the effectiveness of our frequency domain fusion mechanism, we conduct two kinds of visualization experiments. First, we show the DWT decomposition of the source images and the corresponding features fused by the proposed MFF.
Second, we visualize the effect of the high-frequency enhancement operation. We further evaluate the effectiveness of our method on two downstream tasks, i.e., object detection and semantic segmentation.
- Python 3.8
- PyTorch 2.0.1
- CUDA 11.7
- mamba-ssm 1.2.0
# Create a virtual environment
conda create -n ISFM python=3.8 -y
conda activate ISFM
# Install dependencies
pip install -r requirements.txt
We use the following datasets. Please organize the files following the dataset directory structure.
| Datasets | Download link |
|---|---|
| MSRS | Download here |
| RoadScene | Download here |
| FMB | Download here |
| Harvard | Download here |
The dataset directory structure is organized as follows. Please open your configuration file and modify INPUT.ROOT_DIR to point to the path of your downloaded dataset:
data/
├── train/
│ ├── vi/ # Visible image
│ └── ir/ # Infrared image
└── test/
├── vi/
    └── ir/

The configuration is defined in the .yaml files (e.g., configs/train.yaml). Before running the code, please modify the paths to match your local environment.
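For reference, a minimal config sketch might look like the following. Only INPUT.ROOT_DIR and TEST.CHECKPOINT_PATH are named in this README; the layout and any other keys shown here are illustrative assumptions, so follow the actual configs/train.yaml shipped with the repo.

```yaml
# Hypothetical sketch of configs/train.yaml — check the real file for exact keys.
INPUT:
  ROOT_DIR: /path/to/data          # point this at your dataset root
TEST:
  CHECKPOINT_PATH: checkpoints/best.pth
```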
To train the ISFM model from scratch, run:
python train.py --config configs/train.yaml

The training logs and model checkpoints will be automatically saved in output/exp_name/.
Pre-trained weights are included in best/checkpoints/. To evaluate a specific model, modify TEST.CHECKPOINT_PATH to point to your pretrained weights, then run:
python test.py --config configs/test.yaml

Note: You can also override the config options directly from the command line without modifying the yaml file:
python test.py --config configs/test.yaml TEST.CHECKPOINT_PATH "checkpoints/best.pth"

The testing process produces the following outputs:
- Fusion Results: The fused images will be saved in the output directory.
- Evaluation Logs: The quantitative metrics (e.g., EN, SSIM, VIF) will be recorded in a .log file within the output folder.
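As a quick illustration of what these metrics measure, the entropy (EN) metric is conventionally the Shannon entropy of an image's gray-level histogram. The sketch below is our own simplified version, not the repo's evaluation code in eval/test_metric.py:

```python
import numpy as np

def entropy(img):
    # Shannon entropy (in bits) of an 8-bit image's gray-level histogram.
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]              # drop empty bins: 0 * log(0) := 0
    return -np.sum(p * np.log2(p))
```

A constant image yields an entropy of 0, while richer gray-level distributions score higher, which is why a larger EN is read as the fused image carrying more information.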
If you already have the fused images and only want to calculate the metrics (or evaluate results from other methods), you can run the evaluation script:
# Calculate metrics for existing images
python eval/test_metric.py

If you find ISFM useful in your research, please consider citing:
@article{zhu2026isfm,
title={Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion},
author={Zhu, Yixin and Lv, Long and Zhang, Pingping and Liu, Xuehu and Tang, Tongdan and Tian, Feng and Sun, Weibing and Lu, Huchuan},
journal={arXiv preprint arXiv:2602.04405},
year={2026},
}