ISFM is a novel Mamba-based interactive spatial-frequency fusion framework designed for Multi-Modal Image Fusion (MMIF). It aims to fully exploit the complementarity of domain-specific characteristics by incorporating frequency information into the spatial fusion process and leveraging Mamba to capture long-range dependencies. Specifically, we propose a Multi-scale Frequency Fusion that adaptively integrates the low-frequency and high-frequency components of different modalities at multiple scales. To fully explore the complementarity of domain-specific characteristics, we propose an Interactive Spatial-Frequency Fusion comprising a Frequency-Guided Mamba and a Frequency-Guided Gate. By combining these modules, our ISFM comprehensively integrates complementary information in the spatial and frequency domains. Extensive experiments on six MMIF datasets demonstrate that our method achieves better performance than other state-of-the-art methods.
Exciting news! Our paper has been accepted by TIP 2026! 🎉🎉 Paper
ISFM is a Mamba-based interactive spatial-frequency fusion framework for Multi-Modal Image Fusion (MMIF). This repository provides the training and testing code, along with pretrained weights for reproducing the results in our paper.
- We introduce a novel Interactive Spatial-Frequency Fusion Mamba (ISFM) framework for MMIF. It provides a distinct perspective for spatial-frequency fusion.
- We propose a Multi-scale Frequency Fusion (MFF) to effectively fuse frequency information across multiple scales. In addition, we propose an Interactive Spatial Frequency Fusion (ISF) to fully exploit the complementarity of spatial-frequency information.
- Extensive experiments on IVIF and MIF tasks validate the effectiveness of our method. We also show that our method benefits high-level computer vision tasks.
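To give an intuition for the low/high-frequency fusion idea behind MFF, here is a minimal NumPy sketch of our own (not the paper's implementation, which uses DWT decomposition and learned fusion): each modality is split into low- and high-frequency parts with an FFT low-pass mask, low frequencies are averaged to preserve global structure, and the stronger high-frequency response is kept to preserve edges and texture.

```python
import numpy as np

def freq_split(img, radius=8):
    # Split a 2-D image into low/high-frequency parts using a
    # circular low-pass mask in the centered FFT spectrum.
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(F * mask)).real
    high = img - low          # residual carries edges/texture
    return low, high

def fuse(ir, vi, radius=8):
    # Toy two-modality fusion: average the low-frequency (base)
    # components, keep the per-pixel stronger high-frequency detail.
    low_a, high_a = freq_split(ir, radius)
    low_b, high_b = freq_split(vi, radius)
    low = 0.5 * (low_a + low_b)
    high = np.where(np.abs(high_a) >= np.abs(high_b), high_a, high_b)
    return low + high
```

This hand-crafted rule is only illustrative; in ISFM the corresponding fusion weights are learned and applied at multiple scales.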
Comparison with state-of-the-art methods on MMIF datasets.
To validate the effectiveness of the proposed modules, we visualize the extracted features of different modules.
To visually validate the effectiveness of our frequency domain fusion mechanism, we conduct two kinds of visualization experiments. First, we show the DWT decomposition of the source images and the corresponding features fused by the proposed MFF.
Second, we visualize the effect of the high-frequency enhancement operation. We further evaluate the effectiveness of our method on two downstream tasks, i.e., object detection and semantic segmentation.
- Python 3.8
- PyTorch 2.0.1
- CUDA 11.7
- mamba-ssm 1.2.0
# Create a virtual environment
conda create -n ISFM python=3.8 -y
conda activate ISFM
# Install dependencies
pip install -r requirements.txt
We use the following datasets. Please organize the files following the dataset directory structure.
| Datasets | Download link |
|---|---|
| MSRS | Download here |
| RoadScene | Download here |
| FMB | Download here |
| Harvard | Download here |
The dataset directory structure is organized as follows. Please open your configuration file and modify INPUT.ROOT_DIR to point to the path of your downloaded dataset:
data/
├── train/
│ ├── vi/ # Visible image
│ └── ir/ # Infrared image
└── test/
├── vi/
    └── ir/

The configuration is defined in the .yaml files (e.g., configs/train.yaml). Before running the code, please modify the paths to match your local environment.
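For reference, a minimal config sketch might look like the following. Only INPUT.ROOT_DIR and TEST.CHECKPOINT_PATH are named in this README; the layout and any other keys shown here are illustrative assumptions, so follow the actual configs/train.yaml shipped with the repo.

```yaml
# Hypothetical sketch of configs/train.yaml — check the real file for exact keys.
INPUT:
  ROOT_DIR: /path/to/data          # point this at your dataset root
TEST:
  CHECKPOINT_PATH: checkpoints/best.pth
```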
To train the ISFM model from scratch, run:
python train.py --config configs/train.yaml

The training logs and model checkpoints will be automatically saved in output/exp_name/.
Pre-trained weights are included in best/checkpoints/. To evaluate a specific model, modify TEST.CHECKPOINT_PATH to point to your pretrained weights, then run:
python test.py --config configs/test.yaml

Note: You can also override the config options directly from the command line without modifying the yaml file:
python test.py --config configs/test.yaml TEST.CHECKPOINT_PATH "checkpoints/best.pth"

The testing process produces the following outputs:
- Fusion Results: The fused images will be saved in the output directory.
- Evaluation Logs: The quantitative metrics (e.g., EN, SSIM, VIF) will be recorded in a .log file within the output folder.
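As a quick illustration of what these metrics measure, the entropy (EN) metric is conventionally the Shannon entropy of an image's gray-level histogram. The sketch below is our own simplified version, not the repo's evaluation code in eval/test_metric.py:

```python
import numpy as np

def entropy(img):
    # Shannon entropy (in bits) of an 8-bit image's gray-level histogram.
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]              # drop empty bins: 0 * log(0) := 0
    return -np.sum(p * np.log2(p))
```

A constant image yields an entropy of 0, while richer gray-level distributions score higher, which is why a larger EN is read as the fused image carrying more information.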
If you already have the fused images and only want to calculate the metrics (or evaluate results from other methods), you can run the evaluation script:
# Calculate metrics for existing images
python eval/test_metric.py

If you find ISFM useful in your research, please consider citing:
@article{zhu2026isfm,
title={Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion},
author={Zhu, Yixin and Lv, Long and Zhang, Pingping and Liu, Xuehu and Tang, Tongdan and Tian, Feng and Sun, Weibing and Lu, Huchuan},
journal={arXiv preprint arXiv:2602.04405},
year={2026},
}