This repository is the official implementation of "Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment" (ICLRW 2024).
This paper proposes HarMA, a framework for efficient remote sensing built on Harmonized Transfer Learning and Modality Alignment, which addresses key challenges in remote sensing image-text retrieval. HarMA adopts a unified perspective on multimodal transfer learning to jointly improve task performance, modality alignment, and single-modality uniform alignment. Its core component is a hierarchical multimodal adapter, inspired by how the human brain processes information, which integrates shared mini-adapters to improve fine-grained semantic alignment. Through parameter-efficient fine-tuning, HarMA significantly reduces training overhead while achieving state-of-the-art performance on popular multimodal retrieval tasks without relying on external data or other tricks. With only minimal parameter updates, it outperforms fully fine-tuned models, making it a versatile and resource-efficient solution for remote sensing applications. Experiments validate the effectiveness of HarMA and showcase its potential to enhance vision and language representations in remote sensing tasks.
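For intuition, here is a minimal, hypothetical PyTorch sketch of the shared mini-adapter idea: a small bottleneck block whose weights are reused across layers and across the image/text branches. This is not the repository's actual implementation; all class names, dimensions, and scaling choices below are illustrative assumptions.

```python
# Illustrative sketch only -- see the repository code for the real HarMA modules.
import torch
import torch.nn as nn


class MiniAdapter(nn.Module):
    """Small bottleneck block intended to be shared across layers/modalities."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x)))


class HierarchicalAdapter(nn.Module):
    """Per-layer adapter that wraps a shared MiniAdapter with a residual path."""

    def __init__(self, dim: int, shared: MiniAdapter, scale: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.shared = shared  # the same instance can be reused by other layers/branches
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * self.shared(self.norm(x))


# Usage: one shared mini-adapter plugged into both modality branches.
shared = MiniAdapter(dim=768)
img_adapter = HierarchicalAdapter(dim=768, shared=shared)
txt_adapter = HierarchicalAdapter(dim=768, shared=shared)
img_feat = img_adapter(torch.randn(2, 197, 768))  # ViT-style image tokens
txt_feat = txt_adapter(torch.randn(2, 32, 768))   # text tokens
```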
Set up the environment by running:

```bash
pip install -r requirements.txt
```
All experiments are based on the RSITMD and RSICD datasets.
Download the images from Baidu Disk or Google Drive and set the image path in the corresponding YAML file under `configs/`:

```yaml
image_root: './images/datasets_name/'
```

The annotation files for the datasets are located in the `data/finetune` directory.
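For example, an RSITMD config might contain entries along these lines. Only `image_root` is documented above; the annotation-file keys and file names are illustrative assumptions, so check the actual YAML for the exact field names.

```yaml
# Illustrative excerpt; key names other than image_root are assumptions.
image_root: './images/rsitmd/'
train_file: ['data/finetune/rsitmd_train.json']
test_file: 'data/finetune/rsitmd_test.json'
```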
Download the GeoRSCLIP pre-trained model from this link and place it in the `models/pretrain/` directory.
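For instance (the checkpoint file name below is a placeholder; use whatever name the download provides):

```bash
mkdir -p models/pretrain
mv /path/to/georsclip_checkpoint.pt models/pretrain/
```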
If you encounter environment issues, you can modify the `get_dist_launch` function in `run.py`. For example, for a 2-GPU setup:

```python
elif args.dist == 'f2':
    # Adjust the interpreter path, visible devices, and master port to match your machine.
    return "CUDA_VISIBLE_DEVICES=0,1 WORLD_SIZE=2 /root/miniconda3/bin/python -W ignore -m torch.distributed.launch --master_port 9999 --nproc_per_node=2 " \
           "--nnodes=1 "
```
Start training with:

```bash
python run.py --task 'itr_rsitmd_vit' --dist "f2" --config 'configs/Retrieval_rsitmd_vit.yaml' --output_dir './checkpoints/HARMA/full_rsitmd_vit'
python run.py --task 'itr_rsicd_vit' --dist "f2" --config 'configs/Retrieval_rsicd_vit.yaml' --output_dir './checkpoints/HARMA/full_rsicd_vit'
python run.py --task 'itr_rsitmd_geo' --dist "f2" --config 'configs/Retrieval_rsitmd_geo.yaml' --output_dir './checkpoints/HARMA/full_rsitmd_geo'
python run.py --task 'itr_rsicd_geo' --dist "f2" --config 'configs/Retrieval_rsicd_geo.yaml' --output_dir './checkpoints/HARMA/full_rsicd_geo'
```
To evaluate a trained model, set `if_evaluation` to `True` in the corresponding YAML file under `configs/`, then run:

```bash
python run.py --task 'itr_rsitmd_vit' --dist "f2" --config 'configs/Retrieval_rsitmd_vit.yaml' --output_dir './checkpoints/HARMA/test' --checkpoint './checkpoints/HARMA/full_rsitmd_vit/checkpoint_best.pth' --evaluate
python run.py --task 'itr_rsicd_vit' --dist "f2" --config 'configs/Retrieval_rsicd_vit.yaml' --output_dir './checkpoints/HARMA/test' --checkpoint './checkpoints/HARMA/full_rsicd_vit/checkpoint_best.pth' --evaluate
python run.py --task 'itr_rsitmd_geo' --dist "f2" --config 'configs/Retrieval_rsitmd_geo.yaml' --output_dir './checkpoints/HARMA/test' --checkpoint './checkpoints/HARMA/full_rsitmd_geo/checkpoint_best.pth' --evaluate
python run.py --task 'itr_rsicd_geo' --dist "f2" --config 'configs/Retrieval_rsicd_geo.yaml' --output_dir './checkpoints/HARMA/test' --checkpoint './checkpoints/HARMA/full_rsicd_geo/checkpoint_best.pth' --evaluate
```
Note: We provide a Jupyter notebook for direct execution; please refer to the `begin.ipynb` file. If you want to test or use the pre-trained models directly, you can download the checkpoints from Checkpoints-v1.0.0.
If you find this paper or repository useful for your work, please give it a star ⭐ and cite it as follows:
```bibtex
@article{huang2024efficient,
  title={Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment},
  author={Huang, Tengjun},
  journal={arXiv preprint arXiv:2404.18253},
  year={2024}
}
```
This code builds upon the excellent work of PIR by Pan et al.