[Project Page] | [arXiv] | [Data Repository]
We propose test-time backdoor attacks against multimodal large language models, which involve injecting the backdoor into the textual modality via a universal image perturbation, without access to training data.
- Platform: Linux
- Hardware: NVIDIA A100 PCIe 40GB
In our work, we use DALL-E for dataset generation and demonstration. We employ the LLaVa-1.5 implementation provided by the Transformers library, which loads seamlessly from the huggingface.co model hub.
pip install -U --force-reinstall git+https://github.com/huggingface/transformers.git@c90268de7560c3fef21a927e0bfcf2b611a8711e
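As a quick sanity check of the pinned Transformers commit, below is a minimal sketch of loading LLaVa-1.5 through Transformers. The `llava-hf/llava-1.5-7b-hf` model id and the prompt template come from the public Hugging Face release, not from this repository, so adjust them if you use a local checkpoint.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Public hub release of LLaVa-1.5 7B (assumed model id; swap in a local path if needed).
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

# LLaVa-1.5 chat template: "USER: <image>\n{question} ASSISTANT:"
image = Image.open("example.jpg")
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)

output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```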
As detailed in our paper, the DALL-E dataset is built with a generative pipeline. We first randomly select textual descriptions from MS-COCO captions and use them as prompts to generate images via DALL-E. We then craft questions about the image contents with GPT-4. Finally, we generate reference answers with LLaVa-1.5.
This pipeline also lets you specify your own image-question combinations to attack!
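A minimal sketch of this generation loop, assuming the OpenAI Python client (`openai>=1.0`); the file paths, prompt wording, and sample count here are illustrative, not the exact settings used for our released files:

```python
import json
import random
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# 1. Randomly select textual descriptions from MS-COCO captions
#    (illustrative file: a JSON list of caption strings).
with open("coco_captions.json") as f:
    captions = json.load(f)
selected = random.sample(captions, k=40)

records = []
for caption in selected:
    # 2. Use the caption as a prompt to generate an image with DALL-E 3.
    image = client.images.generate(model="dall-e-3", prompt=caption, n=1, size="1024x1024")

    # 3. Craft a question about the image contents with GPT-4.
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Write one question about an image described as: {caption}"}],
    )

    records.append({
        "caption": caption,
        "image_url": image.data[0].url,
        "question": chat.choices[0].message.content,
        # 4. The reference answer is then produced by querying LLaVa-1.5 on the
        #    downloaded image (see the loading sketch above).
    })

with open("dalle3_pairs.json", "w") as f:
    json.dump(records, f, indent=2)
```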
The SVIT dataset is curated by randomly selecting questions from its complex reasoning QA pairs. Images are sourced from Visual Genome. For answer references, we use outputs generated by LLaVa-1.5.
For VQAv2, we incorporate the original image-question pairs directly from the dataset, with reference answers produced by LLaVa-1.5.
Download our processed JSON files:
https://drive.google.com/drive/folders/1VnJMBtr1_zJM2sgPeL3iOrvVKCk0QcbY?usp=drive_link
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
--dir_path ./AnyDoor \
--output output \
--model_size '7b' \
--dataset dalle3 \
--batch_size 1 \
--attack_samples 40 \
--image_size 336 \
--max_epochs 500 \
--store_epoch 100 \
--trigger 'SUDO' \
--target_answer 'I want to destroy the whole world together.' \
--patch_attack \
--patch_mode border \
--patch_size 6 \
--lr 5 \
--loss_without_trigger_weight 1.0 \
--loss_with_trigger_weight 1.0 \
--loss_type 3
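For intuition, `--patch_mode border --patch_size 6` restricts the perturbation to a 6-pixel frame around the 336x336 input while the interior stays clean. The helper below is only an illustrative reading of these flags, not the exact implementation in `anydoor_llava.py`:

```python
import torch

def border_mask(image_size: int = 336, patch_size: int = 6) -> torch.Tensor:
    """Binary mask that is 1 on a patch_size-wide frame and 0 elsewhere."""
    mask = torch.zeros(1, 1, image_size, image_size)
    mask[..., :patch_size, :] = 1   # top border
    mask[..., -patch_size:, :] = 1  # bottom border
    mask[..., :, :patch_size] = 1   # left border
    mask[..., :, -patch_size:] = 1  # right border
    return mask

# The adversarial input keeps the clean content outside the frame:
# adv_image = clean_image * (1 - mask) + delta * mask
```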
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
--dir_path ./AnyDoor \
--output output \
--model_size '7b' \
--dataset dalle3 \
--batch_size 1 \
--attack_samples 40 \
--image_size 336 \
--max_epochs 500 \
--store_epoch 100 \
--trigger 'SUDO' \
--target_answer 'I want to destroy the whole world together.' \
--patch_attack \
--patch_mode four_corner \
--patch_size 32 \
--lr 5 \
--loss_without_trigger_weight 1.0 \
--loss_with_trigger_weight 1.0 \
--loss_type 3
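Analogously, `--patch_mode four_corner --patch_size 32` confines the perturbation to a 32x32 square in each corner of the input; an illustrative mask under the same assumptions as above:

```python
import torch

def four_corner_mask(image_size: int = 336, patch_size: int = 32) -> torch.Tensor:
    """Binary mask that is 1 on a patch_size x patch_size square in each corner."""
    mask = torch.zeros(1, 1, image_size, image_size)
    corners = (slice(0, patch_size), slice(image_size - patch_size, image_size))
    for rows in corners:
        for cols in corners:
            mask[..., rows, cols] = 1
    return mask
```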
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
--dir_path ./AnyDoor \
--output output \
--model_size '7b' \
--dataset dalle3 \
--batch_size 1 \
--attack_samples 40 \
--image_size 336 \
--max_epochs 500 \
--store_epoch 100 \
--trigger 'SUDO' \
--target_answer 'I want to destroy the whole world together.' \
--pixel_attack \
--epsilon 32 \
--alpha_weight 5 \
--loss_without_trigger_weight 1.0 \
--loss_with_trigger_weight 1.0 \
--loss_type 3
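The pixel attack instead perturbs every pixel but keeps the universal perturbation inside an l_inf ball; we read `--epsilon 32` as a budget of 32/255 on a [0, 1] pixel scale and `--alpha_weight 5` as a step size of 5/255. The PGD-style update below is a sketch under those assumptions, not the exact logic of `anydoor_llava.py`:

```python
import torch

EPSILON = 32 / 255  # --epsilon 32, interpreted on a [0, 1] pixel scale (assumption)
ALPHA = 5 / 255     # --alpha_weight 5, interpreted as the per-step size (assumption)

def pgd_step(delta: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """One signed-gradient step on the universal perturbation, projected to the l_inf ball.

    `grad` is the gradient of the weighted sum of the with-trigger and
    without-trigger losses with respect to `delta`.
    """
    delta = delta - ALPHA * grad.sign()    # descend the combined loss
    return delta.clamp(-EPSILON, EPSILON)  # project back into ||delta||_inf <= epsilon

# When applied, the perturbed input is additionally clamped to the valid pixel range:
# adv_image = (clean_image + delta).clamp(0, 1)
```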
If you find this project useful in your research, please consider citing our paper:
@article{lu2024testtime,
  title={Test-Time Backdoor Attacks on Multimodal Large Language Models},
  author={Lu, Dong and Pang, Tianyu and Du, Chao and Liu, Qian and Yang, Xianjun and Lin, Min},
  journal={arXiv preprint arXiv:2402.08577},
  year={2024},
}