[MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrieval]

Official PyTorch implementation and pretrained models of paper: MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrieval.

Run SimMIM with distributed training:

python run_beit3_finetuning.py --model ‘beit3_large_patch16_384’ --input_size 224 --task ‘coco_retrieval’ --batch_size 128 --layer_decay 0.65 --lr 2e-4 --epochs 30 --warmup_epochs 3 --drop_path 0.2 --sentencepiece_model ‘beit3.spm’ --data_path ‘path/to/your/dataset’ --output_dir 'coco_retrieval_output/' --log_dir '/coco_retrieval_log/' --weight_decay 0.05  --save_ckpt_freq 1 --finetune 'beit3_large_itc_patch16_224.pth' --num_max_bpe_tokens 64 --ffn_adapt --ffn_mode both --ffn_num 64 --ffn_adapter_scalar 0.6 --ffn_enhance

model specifics the name of model we use in this experiments.
log_dir is the folder dir that stores the ouput log.
task specifics using coco or flickr30k dataset
data_path is the folder dir that stores the datasets.
finetune specifics the dir to pretrained weight of BEiT-3 model.
ffn_adapt specifics to use the Multiway-Adapter
ffn_num specifics the mid dimension of the NewKnowledge Extractor
ffn_enhance specifics to use the Alignment Enhancer

Citation

If you find this repository useful, please consider citing works:

@article{long2023multiway,
  title={MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrieval},
  author={Long, Zijun and Killick, George and McCreadie, Richard and Camarasa, Gerardo Aragon},
  journal={arXiv preprint arXiv:2309.01516},
  year={2023}
}

Acknowledgement

This repository is built using the BEiT-3 repository and the timm library.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
adapter.py		adapter.py
adapter_multiway.py		adapter_multiway.py
beit3.spm		beit3.spm
config.py		config.py
custom_modules.py		custom_modules.py
datasets.py		datasets.py
engine_for_finetuning.py		engine_for_finetuning.py
glossary.py		glossary.py
modeling_finetune.py		modeling_finetune.py
modeling_utils.py		modeling_utils.py
multiway_encoder.py		multiway_encoder.py
optim_factory.py		optim_factory.py
randaug.py		randaug.py
requirements.txt		requirements.txt
run_beit3_finetuning.py		run_beit3_finetuning.py
utils.py		utils.py

longkukuhi/MultiWay-Adapter

Folders and files

Latest commit

History

Repository files navigation

[MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrieval]

Run SimMIM with distributed training:

Citation

Acknowledgement

About

Resources

Stars

Watchers

Forks

Languages