Harmful fine-tuning poses significant safety challenges for fine-tuning-as-a-service in large language models. Existing alignment-stage defenses, e.g., Vaccine, RepNoise, Booster, and T-Vaccine, mitigate harmful fine-tuning by enhancing the model's robustness during the alignment phase. However, these methods often overlook a critical upstream factor: the role of the original safety-alignment data. We observe that their defense performance and computational efficiency remain constrained by the quality and composition of the alignment dataset. To address this limitation, we propose Pharmacist, a safety alignment data curation solution that enhances defense against harmful fine-tuning by selecting a high-quality, safety-critical core subset from the original alignment data. The core idea of Pharmacist is to train an alignment data selector that ranks alignment data: it up-ranks high-quality, safety-critical examples and down-ranks low-quality, non-safety-critical ones. Empirical results show that models trained on datasets selected by Pharmacist outperform those trained on datasets selected by existing selection methods in both defense and inference performance. In addition, Pharmacist can be effectively integrated with mainstream alignment-stage defense methods. For example, when applied to RepNoise and T-Vaccine, using the dataset selected by Pharmacist instead of the full dataset improves defense performance by 2.60% and 3.30%, respectively, and inference performance by 3.50% and 1.10%, while reducing training time by 56.83% and 57.63%.
Pharmacist is a safety alignment data curation solution that enhances defense against harmful fine-tuning while significantly reducing training cost by selecting a high-quality and safety-critical core subset from the original alignment data. It can be seamlessly integrated with existing alignment-stage defense methods such as Vaccine, RepNoise, Booster, and T-Vaccine.
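As a rough illustration of the core idea (a minimal sketch, not the actual Pharmacist implementation; select_core_subset, score_fn, and keep_ratio are hypothetical names), selection reduces to scoring each alignment example with a trained selector and keeping the top-ranked core subset:

# Hypothetical sketch of the selection idea: rank alignment examples by a
# learned quality/safety score, then keep the top-ranked fraction.
from typing import Callable

def select_core_subset(
    examples: list[dict],
    score_fn: Callable[[dict], float],  # trained selector; assumed given
    keep_ratio: float = 0.3,            # fraction to keep (an assumption)
) -> list[dict]:
    # Rank so that high-quality, safety-critical examples come first.
    ranked = sorted(examples, key=score_fn, reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]

if __name__ == "__main__":
    # Toy pool with made-up scores standing in for selector outputs.
    pool = [{"text": f"example {i}", "score": s}
            for i, s in enumerate([0.9, 0.1, 0.7, 0.4])]
    core = select_core_subset(pool, score_fn=lambda ex: ex["score"],
                              keep_ratio=0.5)
    print([ex["text"] for ex in core])  # the two highest-scoring examples

In the actual pipeline, the selector is trained rather than hand-specified; see the data selection scripts below.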
The package requirements are listed in environment.yml. Run the following command to install them with Anaconda.
conda env create -f environment.yml
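After creating the environment, activate it before running any scripts. The environment name is defined in environment.yml; pharmacist below is only a placeholder.

conda activate pharmacist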
For the fine-tuning tasks, we first need to run the following scripts to prepare the supervised fine-tuning data.
cd sst2
python build_dataset.py
cd ../gsm8k
python build_dataset.py
cd ../ag_news
python build_dataset.py
cd ..
Llama2-7B is a gated repo; access requires a formal request. Check out https://huggingface.co/meta-llama/Llama-2-7b-hf.
After Meta approves your request, you should be able to access the model, but you first need to put your Hugging Face access token in the file huggingface_token.txt.
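For example, with hf_xxx standing in for your actual token:

echo "hf_xxx" > huggingface_token.txt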
We provide scripts for reproducing all the experiments in the paper.
The following example uses the BeaverTails alignment dataset. We first run the data selection script to select a high-quality, safety-critical core subset from the original dataset.
cd script/data_selection
bash train_selector_dataset_bt_all.sh
Then, we perform alignment on the selected core subset.
cd script/alignment
bash sft_dataset_bt_all.sh
Then we fine-tune the aligned model on 1000 samples in total, mixing 10% harmful data with samples from the GSM8K dataset (a rough sketch of this mixture follows the commands below).
cd script/finetune
bash sft_ep_dataset_bt_gsm8k.sh
cd ../..
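The mixing logic lives in the finetune scripts; as a rough, hypothetical sketch of the mixture described above (build_mixture and the toy pools are illustrative, not the repository's code):

# Hypothetical sketch of the harmful fine-tuning mixture: 1000 samples
# total, 10% drawn from a harmful pool and 90% from GSM8K.
import random

def build_mixture(benign: list, harmful: list,
                  total: int = 1000, poison_ratio: float = 0.1) -> list:
    n_harmful = int(total * poison_ratio)  # 100 harmful samples
    n_benign = total - n_harmful           # 900 benign (GSM8K) samples
    mixture = random.sample(benign, n_benign) + random.sample(harmful, n_harmful)
    random.shuffle(mixture)
    return mixture

if __name__ == "__main__":
    benign = [f"gsm8k-{i}" for i in range(2000)]    # stand-in GSM8K pool
    harmful = [f"harmful-{i}" for i in range(500)]  # stand-in harmful pool
    mix = build_mixture(benign, harmful)
    print(len(mix), sum(s.startswith("harmful") for s in mix))  # 1000 100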
If you find our project useful, please cite our paper with the following BibTeX.
@misc{liu2025pharmacistsafetyalignmentdata,
title={Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning},
author={Guozhi Liu and Qi Mu and Tiansheng Huang and Xinhua Wang and Li Shen and Weiwei Lin and Zhang Li},
year={2025},
eprint={2510.10085},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2510.10085},
}
