[Project Page] | [arXiv] | [Data Repository]
We propose test-time backdoor attacks against multimodal large language models, which involve injecting the backdoor into the textual modality via a universal image perturbation, without access to training data.
- Platform: Linux
- Hardware: NVIDIA A100 PCIe 40GB
In our work, we use DALL-E for dataset generation and demonstration. We employ the LLaVa-1.5 implementation provided by the Transformers library, which loads seamlessly from the huggingface.co model hub.
pip install -U --force-reinstall git+https://github.com/huggingface/transformers.git@c90268de7560c3fef21a927e0bfcf2b611a8711e
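As a quick sanity check of the pinned Transformers commit, below is a minimal sketch of loading LLaVa-1.5 through Transformers. The `llava-hf/llava-1.5-7b-hf` model id and the prompt template come from the public Hugging Face release, not from this repository, so adjust them if you use a local checkpoint.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Public hub release of LLaVa-1.5 7B (assumed model id; swap in a local path if needed).
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

# LLaVa-1.5 chat template: "USER: <image>\n{question} ASSISTANT:"
image = Image.open("example.jpg")
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)

output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```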
As detailed in our paper, the DALL-E dataset is built with a generative pipeline. We first randomly select textual descriptions from MS-COCO captions and use them as prompts to generate images via DALL-E. We then craft questions about the image contents with GPT-4. Finally, we generate reference answers with LLaVa-1.5.
This pipeline also lets you specify your own image-question combinations to attack!
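A minimal sketch of this generation loop, assuming the OpenAI Python client (`openai>=1.0`); the file paths, prompt wording, and sample count here are illustrative, not the exact settings used for our released files:

```python
import json
import random
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# 1. Randomly select textual descriptions from MS-COCO captions
#    (illustrative file: a JSON list of caption strings).
with open("coco_captions.json") as f:
    captions = json.load(f)
selected = random.sample(captions, k=40)

records = []
for caption in selected:
    # 2. Use the caption as a prompt to generate an image with DALL-E 3.
    image = client.images.generate(model="dall-e-3", prompt=caption, n=1, size="1024x1024")

    # 3. Craft a question about the image contents with GPT-4.
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Write one question about an image described as: {caption}"}],
    )

    records.append({
        "caption": caption,
        "image_url": image.data[0].url,
        "question": chat.choices[0].message.content,
        # 4. The reference answer is then produced by querying LLaVa-1.5 on the
        #    downloaded image (see the loading sketch above).
    })

with open("dalle3_pairs.json", "w") as f:
    json.dump(records, f, indent=2)
```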
The SVIT dataset is curated by randomly selecting questions from its complex reasoning QA pairs. Images are sourced from Visual Genome. For answer references, we use outputs generated by LLaVa-1.5.
For VQAv2, we incorporate the original image-question pairs directly from the dataset, with reference answers produced by LLaVa-1.5.
Download our processed JSON files:
https://drive.google.com/drive/folders/1VnJMBtr1_zJM2sgPeL3iOrvVKCk0QcbY?usp=drive_link
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
--dir_path ./AnyDoor \
--output output \
--model_size '7b' \
--dataset dalle3 \
--batch_size 1 \
--attack_samples 40 \
--image_size 336 \
--max_epochs 500 \
--store_epoch 100 \
--trigger 'SUDO' \
--target_answer 'I want to destroy the whole world together.' \
--patch_attack \
--patch_mode border \
--patch_size 6 \
--lr 5 \
--loss_without_trigger_weight 1.0 \
--loss_with_trigger_weight 1.0 \
--loss_type 3
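For intuition, `--patch_mode border --patch_size 6` restricts the perturbation to a 6-pixel frame around the 336x336 input while the interior stays clean. The helper below is only an illustrative reading of these flags, not the exact implementation in `anydoor_llava.py`:

```python
import torch

def border_mask(image_size: int = 336, patch_size: int = 6) -> torch.Tensor:
    """Binary mask that is 1 on a patch_size-wide frame and 0 elsewhere."""
    mask = torch.zeros(1, 1, image_size, image_size)
    mask[..., :patch_size, :] = 1   # top border
    mask[..., -patch_size:, :] = 1  # bottom border
    mask[..., :, :patch_size] = 1   # left border
    mask[..., :, -patch_size:] = 1  # right border
    return mask

# The adversarial input keeps the clean content outside the frame:
# adv_image = clean_image * (1 - mask) + delta * mask
```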
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
--dir_path ./AnyDoor \
--output output \
--model_size '7b' \
--dataset dalle3 \
--batch_size 1 \
--attack_samples 40 \
--image_size 336 \
--max_epochs 500 \
--store_epoch 100 \
--trigger 'SUDO' \
--target_answer 'I want to destroy the whole world together.' \
--patch_attack \
--patch_mode four_corner \
--patch_size 32 \
--lr 5 \
--loss_without_trigger_weight 1.0 \
--loss_with_trigger_weight 1.0 \
--loss_type 3
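Analogously, `--patch_mode four_corner --patch_size 32` confines the perturbation to a 32x32 square in each corner of the input; an illustrative mask under the same assumptions as above:

```python
import torch

def four_corner_mask(image_size: int = 336, patch_size: int = 32) -> torch.Tensor:
    """Binary mask that is 1 on a patch_size x patch_size square in each corner."""
    mask = torch.zeros(1, 1, image_size, image_size)
    corners = (slice(0, patch_size), slice(image_size - patch_size, image_size))
    for rows in corners:
        for cols in corners:
            mask[..., rows, cols] = 1
    return mask
```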
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
--dir_path ./AnyDoor \
--output output \
--model_size '7b' \
--dataset dalle3 \
--batch_size 1 \
--attack_samples 40 \
--image_size 336 \
--max_epochs 500 \
--store_epoch 100 \
--trigger 'SUDO' \
--target_answer 'I want to destroy the whole world together.' \
--pixel_attack \
--epsilon 32 \
--alpha_weight 5 \
--loss_without_trigger_weight 1.0 \
--loss_with_trigger_weight 1.0 \
--loss_type 3
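The pixel attack instead perturbs every pixel but keeps the universal perturbation inside an l_inf ball; we read `--epsilon 32` as a budget of 32/255 on a [0, 1] pixel scale and `--alpha_weight 5` as a step size of 5/255. The PGD-style update below is a sketch under those assumptions, not the exact logic of `anydoor_llava.py`:

```python
import torch

EPSILON = 32 / 255  # --epsilon 32, interpreted on a [0, 1] pixel scale (assumption)
ALPHA = 5 / 255     # --alpha_weight 5, interpreted as the per-step size (assumption)

def pgd_step(delta: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """One signed-gradient step on the universal perturbation, projected to the l_inf ball.

    `grad` is the gradient of the weighted sum of the with-trigger and
    without-trigger losses with respect to `delta`.
    """
    delta = delta - ALPHA * grad.sign()    # descend the combined loss
    return delta.clamp(-EPSILON, EPSILON)  # project back into ||delta||_inf <= epsilon

# When applied, the perturbed input is additionally clamped to the valid pixel range:
# adv_image = (clean_image + delta).clamp(0, 1)
```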
If you find this project useful in your research, please consider citing our paper:
@article{lu2024testtime,
  title={Test-Time Backdoor Attacks on Multimodal Large Language Models},
  author={Lu, Dong and Pang, Tianyu and Du, Chao and Liu, Qian and Yang, Xianjun and Lin, Min},
  journal={arXiv preprint arXiv:2402.08577},
  year={2024},
}