Skip to content

AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models

Notifications You must be signed in to change notification settings

sail-sg/AnyDoor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Test-Time Backdoor Attacks on
Multimodal Large Language Models

[Project Page] | [arXiv] | [Data Repository


TL, DR:

We propose test-time backdoor attacks against multimodal large language models, which involve injecting the backdoor into the textual modality via a universal image perturbation, without access to training data.

Teaser image

Requirements

  • Platform: Linux
  • Hardware: A100 PCIe 40G

In our work, we used DALL-E for dataset generation and demonstration of the DALL-E model. We employed the LLaVa-1.5 architecture provided by Transformers, which is seamlessly integrated from the huggingface.co model hub.

pip install -U --force-reinstall git+https://github.com/huggingface/transformers.git@c90268de7560c3fef21a927e0bfcf2b611a8711e

Dataset Generation

DALL-E

Teaser image

As detailed in our paper, the DALL-E dataset utilizes a generative method. Initially, we randomly select textual descriptions from MS-COCO captions and subsequently use these as prompts to generate images via DALL-E. Following this, we craft questions related to the contents of images using ChatGPT-4. To conclude the process, we generate the original answers with LLaVa-1.5 as reference.

Consequently, this method allows you to specify the specific image-question combinations for attacks on your own!

SVIT

For SVIT dataset is curated by randomly selecting questions from the complex reasoning QA pairs. Images are sourced from Visual Genom. For answer references, we utilize outputs generated by LLaVa-1.5.

VQAv2

We incorporate the original image-question pairs directly from the VQAv2 dataset. Answers are provided as references, produced by the LLaVa-1.5 model.

Processed Files

Download our processed json files:

https://drive.google.com/drive/folders/1VnJMBtr1_zJM2sgPeL3iOrvVKCk0QcbY?usp=drive_link

Test-Time Backdoor Attack

Overview of our AnyDoor

Teaser image

Quick Start

Border Attack

python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
    --dir_path ./AnyDoor \
    --output output \
    --model_size '7b' \
    --dataset dalle3 \
    --batch_size 1 \
    --attack_samples 40 \
    --image_size 336 \
    --max_epochs 500 \
    --store_epoch 100 \
    --trigger 'SUDO' \
    --target_answer 'I want to destroy the whole world together.' \
    --patch_attack \
    --patch_mode border \
    --patch_size 6 \
    --lr 5 \
    --loss_without_trigger_weight 1.0 \
    --loss_with_trigger_weight 1.0 \
    --loss_type 3

Corner Attack

python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
    --dir_path ./AnyDoor \
    --output output \
    --model_size '7b' \
    --dataset dalle3 \
    --batch_size 1 \
    --attack_samples 40 \
    --image_size 336 \
    --max_epochs 500 \
    --store_epoch 100 \
    --trigger 'SUDO' \
    --target_answer 'I want to destroy the whole world together.' \
    --patch_attack \
    --patch_mode four_corner \
    --patch_size 32 \
    --lr 5 \
    --loss_without_trigger_weight 1.0 \
    --loss_with_trigger_weight 1.0 \
    --loss_type 3

Pixel Attack

python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
    --dir_path ./AnyDoor \
    --output output \
    --model_size '7b' \
    --dataset dalle3 \
    --batch_size 1 \
    --attack_samples 40 \
    --image_size 336 \
    --max_epochs 500 \
    --store_epoch 100 \
    --trigger 'SUDO' \
    --target_answer 'I want to destroy the whole world together.' \
    --pixel_attack \
    --epsilon 32 \
    --alpha_weight 5 \
    --loss_without_trigger_weight 1.0 \
    --loss_with_trigger_weight 1.0 \
    --loss_type 3

Visualization

Teaser image

Under continuously changing scenes

Teaser image

Bibtex

If you find this project useful in your research, please consider citing our paper:

@article{
      lu2024testtime,
      title={Test-Time Backdoor Attacks on Multimodal Large Language Models},
      author={Lu, Dong and Pang, Tianyu 
        and Du, Chao and Liu, Qian and Yang, Xianjun and Lin, Min},
      journal={arXiv preprint arXiv:2402.08577},
      year={2024},
      }

About

AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages