Official repo of USENIX Security 2025 paper: PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models.
The first knowledge database corruption attack against Retrievals-Augmented Generation (RAG) system.
🎉 Jun 20, 2024: PoisonedRAG gets accepted to USENIX Security 2025!
🔥 Apr 20, 2024: If you have any question or need other code or data, feel free to open an issue or email us!
conda create -n PoisonedRAG python=3.10
conda activate PoisonedRAG
pip install beir openai google-generativeai
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
pip install --upgrade charset-normalizer
pip3 install "fschat[model_worker,webui]"
When running our code, the datasets will be automatically downloaded and saved in datasets
. You could also run this line to manually download datasets.
python prepare_dataset.py
If you want to use PaLM 2, GPT-3.5, GPT-4 or LLaMA-2, please enter your api key in model_configs folder.
For LLaMA-2, the api key is your HuggingFace Access Tokens. You could visit LLaMA-2's HuggingFace Page first if you don't have the access token.
Here is an example:
"api_key_info":{
"api_keys":[
"Your api key here"
],
"api_key_use": 0
},
There are some hyperparameters in run.py such as LLMs and datasets:
Note: Currently we provide default setting for main results in our paper. We will update and complete other settings later.
test_params = {
# beir_info
'eval_model_code': "contriever",
'eval_dataset': "nq", # nq, hotpotqa, msmarco
'split': "test",
'query_results_dir': 'main',
# LLM setting
'model_name': 'palm2', # palm2, gpt3.5, gpt4, llama(7b|13b), vicuna(7b|13b|33b)
'use_truth': False,
'top_k': 5,
'gpu_id': 0,
# attack
'attack_method': 'LM_targeted', # LM_targeted (black-box), hotflip (white-box)
'adv_per_query': 5,
'score_function': 'dot',
'repeat_times': 10,
'M': 10,
'seed': 12,
'note': None
}
Execute run.py to reproduce experiments.
python run.py
If you want to perform experiments on your own dataset, you could refer to evaluate_beir.py, prepare your dataset as the beir format and use this file to compute the retrieval scores.
- Our code used the implementation of corpus-poisoning.
- The model part of our code is from Open-Prompt-Injection.
- Our code used beir benchmark.
- Our code used contriever for retrieval augmented generation (RAG).
If you use this code, please cite the following paper:
@misc{zou2024poisonedrag,
title={PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models},
author={Wei Zou and Runpeng Geng and Binghui Wang and Jinyuan Jia},
year={2024},
eprint={2402.07867},
archivePrefix={arXiv},
primaryClass={cs.CR}
}