[ICLRW 2024] Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment

HarMA

This repo is the official implementation of "Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment" (ICLRW 2024).

Introduction 🌍

This paper proposes HarMA (Harmonized Transfer Learning and Modality Alignment), a framework for efficient remote sensing that addresses key challenges in remote sensing image-text retrieval. HarMA takes a unified view of multimodal transfer learning that jointly considers task performance, cross-modal alignment, and uniform alignment within each modality. Its core innovation is a hierarchical multimodal adapter, inspired by information processing in the human brain, which integrates shared mini-adapters to improve fine-grained semantic alignment. Through parameter-efficient fine-tuning, HarMA significantly reduces training overhead while achieving state-of-the-art performance on popular multimodal retrieval tasks, without relying on external data or additional tricks. With only minimal parameter updates, it outperforms fully fine-tuned models, making it a versatile and resource-efficient solution for remote sensing applications. Experiments validate the effectiveness of HarMA and showcase its potential to enhance vision and language representations in remote sensing tasks.

(Figure: HarMA pipeline overview)
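To make the core idea concrete, here is a minimal sketch of a bottleneck mini-adapter shared across the image and text branches, in the spirit of the hierarchical multimodal adapter described above. This is an illustrative toy in NumPy, not the authors' implementation: all function names, dimensions, and the GELU bottleneck design are assumptions; the one faithful element is that a single set of adapter parameters is reused by both modalities to encourage fine-grained alignment.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_adapter(dim, bottleneck):
    """Bottleneck adapter: down-projection and up-projection weights.

    Illustrative only -- HarMA's actual adapter design may differ.
    """
    return {
        "down": rng.normal(scale=0.02, size=(dim, bottleneck)),
        "up": rng.normal(scale=0.02, size=(bottleneck, dim)),
    }

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def adapter_forward(x, adapter):
    # down-project -> nonlinearity -> up-project, with a residual connection
    return x + gelu(x @ adapter["down"]) @ adapter["up"]

dim, bottleneck = 16, 4
# One shared mini-adapter applied to BOTH modality branches: sharing
# parameters is the mechanism that couples the two representations.
shared = make_adapter(dim, bottleneck)

image_feat = rng.normal(size=(2, dim))  # stand-in for visual tokens
text_feat = rng.normal(size=(2, dim))   # stand-in for text tokens

img_out = adapter_forward(image_feat, shared)
txt_out = adapter_forward(text_feat, shared)
print(img_out.shape, txt_out.shape)  # (2, 16) (2, 16)
```

Because the adapter is a small residual branch, the frozen backbone's features pass through unchanged at initialization, which is what makes this style of parameter-efficient fine-tuning cheap to train.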

Implementation 💻

Environments 🌐

Set up the environment by running:

pip install -r requirements.txt

Datasets 📚

All experiments are based on the RSITMD and RSICD datasets.

Download the images from Baidu Disk or Google Drive, then set the image root in the corresponding yaml file under configs/:

image_root: './images/datasets_name/'

The annotation files for the datasets are located in the data/finetune directory.
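For reference, the dataset-related part of a config might look like the excerpt below. Only the image_root and if_evaluation keys appear in this README; the file layout, comments, and the concrete dataset name in the path are illustrative assumptions.

```yaml
# configs/Retrieval_rsitmd_vit.yaml (illustrative excerpt)
image_root: './images/rsitmd/'   # directory holding the downloaded images
if_evaluation: False             # set to True for evaluation (see Testing)
```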

Training 📈

Download the GeoRSCLIP pre-trained model from this link and place it in the models/pretrain/ directory.

If you run into environment-specific launch issues, you can modify the get_dist_launch function in run.py. For example, for a 2-GPU setup:

elif args.dist == 'f2':
    return "CUDA_VISIBLE_DEVICES=0,1 WORLD_SIZE=2 /root/miniconda3/bin/python " \
           "-W ignore -m torch.distributed.launch --master_port 9999 " \
           "--nproc_per_node=2 --nnodes=1 "

Start training with:

python run.py --task 'itr_rsitmd_vit' --dist "f2" --config 'configs/Retrieval_rsitmd_vit.yaml' --output_dir './checkpoints/HARMA/full_rsitmd_vit'

python run.py --task 'itr_rsicd_vit' --dist "f2" --config 'configs/Retrieval_rsicd_vit.yaml' --output_dir './checkpoints/HARMA/full_rsicd_vit'

python run.py --task 'itr_rsitmd_geo' --dist "f2" --config 'configs/Retrieval_rsitmd_geo.yaml' --output_dir './checkpoints/HARMA/full_rsitmd_geo'

python run.py --task 'itr_rsicd_geo' --dist "f2" --config 'configs/Retrieval_rsicd_geo.yaml' --output_dir './checkpoints/HARMA/full_rsicd_geo'

Testing 🧪

To evaluate a model, set if_evaluation to True in the corresponding yaml file under configs/, then run:

python run.py --task 'itr_rsitmd_vit' --dist "f2" --config 'configs/Retrieval_rsitmd_vit.yaml' --output_dir './checkpoints/HARMA/test' --checkpoint './checkpoints/HARMA/full_rsitmd_vit/checkpoint_best.pth' --evaluate

python run.py --task 'itr_rsicd_vit' --dist "f2" --config 'configs/Retrieval_rsicd_vit.yaml' --output_dir './checkpoints/HARMA/test' --checkpoint './checkpoints/HARMA/full_rsicd_vit/checkpoint_best.pth' --evaluate

python run.py --task 'itr_rsitmd_geo' --dist "f2" --config 'configs/Retrieval_rsitmd_geo.yaml' --output_dir './checkpoints/HARMA/test' --checkpoint './checkpoints/HARMA/full_rsitmd_geo/checkpoint_best.pth' --evaluate

python run.py --task 'itr_rsicd_geo' --dist "f2" --config 'configs/Retrieval_rsicd_geo.yaml' --output_dir './checkpoints/HARMA/test' --checkpoint './checkpoints/HARMA/full_rsicd_geo/checkpoint_best.pth' --evaluate

Note: We provide a Jupyter notebook for direct execution. Please refer to the begin.ipynb file. If you want to test or use the pre-trained models directly, you can download the checkpoints from Checkpoints-v1.0.0.

Citation 📜

If you find this paper or repository useful for your work, please give it a star ⭐ and cite it as follows:

@article{huang2024efficient,
  title={Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment},
  author={Huang, Tengjun},
  journal={arXiv preprint arXiv:2404.18253},
  year={2024}
}

Acknowledgement 🙏

This code builds upon the excellent work of PIR by Pan et al.
