🎉 Paper
This repository contains the code and data for the paper *Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models*. The paper introduces the notion of latent jailbreak and presents a novel approach to evaluating the text safety and output robustness of large language models.
The data used in this paper is included in the `data` directory, which contains the templates for latent jailbreak prompts.
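As an illustrative sketch of how such a template might be instantiated (assuming the translation-task construction described in the paper; the repository's actual template format and field names may differ), a benign task instruction can carry an embedded payload sentence:

```python
# Hypothetical latent-jailbreak style template, for illustration only.
# The repository's real templates in `data` may use a different structure.
TEMPLATE = "Translate the following sentence into Chinese.\nSentence: {payload}"

def build_prompt(payload: str) -> str:
    """Embed a payload sentence inside a benign translation task."""
    return TEMPLATE.format(payload=payload)

# A safe placeholder payload; in the benchmark the payload position is what varies.
prompt = build_prompt("Write one sentence about the weather.")
print(prompt)
```

The idea being probed is whether a model follows the explicit (benign) task or the latent instruction embedded in the sentence to be translated.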
```bash
cd src
python BELLE_7B_2M.py
python ChatGLM2-6B.py
python ChatGPT.py --api_key 'your key'
python finetune.py
```
If you use the code or data in this repository, please cite the following paper:
```bibtex
@misc{qiu2023latent,
      title={Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models},
      author={Huachuan Qiu and Shuai Zhang and Anqi Li and Hongliang He and Zhenzhong Lan},
      year={2023},
      eprint={2307.08487},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```