We propose SelfCI, a complementary self-distillation framework that decouples information suppression from task resolution for Contextual Integrity in LLMs. This directory contains the training and evaluation scripts for SelfCI and for CI-RL, the GRPO baseline.
- Python >=3.12
- A CUDA-capable Linux environment suitable for flash-attn and vllm
- uv for dependency management
Install dependencies:
uv sync

Create a local environment file:
cp .env.example .env

Update .env with the credentials you need:
- HF_TOKEN for gated Hugging Face models or datasets
- WANDB_API_KEY and optionally WANDB_PROJECT for Weights & Biases logging
All launchers load .env automatically.
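Before launching a long run, it can help to confirm which credentials are actually set. The sketch below is a hypothetical helper, not part of this repository: it assumes .env holds plain KEY=VALUE lines, and the function names `parse_env_text` and `unset_credentials` are illustrative. How the launchers themselves read .env is not shown here.

```python
# Hypothetical pre-flight check; assumes .env contains simple KEY=VALUE lines.

def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_credentials(values, names=("HF_TOKEN", "WANDB_API_KEY")):
    """Return the names that are absent or empty in the parsed values."""
    return [name for name in names if not values.get(name)]
```

A typical call would be `unset_credentials(parse_env_text(open(".env").read()))`. An empty list means both variables are set. HF_TOKEN is only needed for gated models or datasets, so a missing value is not necessarily an error.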
This section covers the full SelfCI workflow: generating feedback for a pre-existing dataset, training the self-distilled model, and running the CI-RL (GRPO) baseline.
generate_feedback_all.py is a crucial preprocessing step for SelfCI. It generates allowed_feedbacks and disallowed_feedbacks for a pre-existing contextual-integrity dataset, then saves the enriched dataset locally or pushes it to the Hugging Face Hub.
Because SelfCI is a self-distillation setup, the model used here should match the model used later in SelfCI training.
Basic usage:
uv run python generate_feedback_all.py \
--model_path Qwen/Qwen2.5-7B-Instruct

The output dataset is augmented with allowed_feedbacks and disallowed_feedbacks, which are required by the SelfCI training pipeline.
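Since training depends on both feedback fields, a quick sanity check on the augmented records may catch a failed preprocessing run early. This is a minimal sketch that treats each example as a plain dict; only the two column names come from this README, while the helper names and the record layout are assumptions.

```python
# Illustrative check; only the two column names are taken from this README.
REQUIRED_FEEDBACK_COLUMNS = ("allowed_feedbacks", "disallowed_feedbacks")


def missing_feedback_columns(record):
    """Return the required feedback columns that are absent from one record."""
    return [name for name in REQUIRED_FEEDBACK_COLUMNS if name not in record]


def records_missing_feedback(records):
    """Map record index -> missing columns, for records that lack any."""
    problems = {}
    for index, record in enumerate(records):
        missing = missing_feedback_columns(record)
        if missing:
            problems[index] = missing
    return problems
```

An empty result from `records_missing_feedback` means every record carries both fields. It does not validate their contents.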
run_selfci.sh launches the SelfCI training pipeline with dual feedback (allowed and disallowed feedback branches) and a weighted KL objective.
Basic usage:
./run_selfci.sh
./run_selfci.sh Qwen/Qwen2.5-7B-Instruct Qwen/Qwen2.5-7B-Instruct --epochs 2

Common overrides:
GPU_ID=0 \
DATASET_PATH=/path/to/feedback_augmented_dataset_or_hf_repo \
./run_selfci.sh \
--backward-weight 0.5 \
--forward-weight 0.5 \
--teacher-update ema \
--epochs 2

Notes:
- Use DATASET_PATH to override the feedback-augmented dataset repo or local path.
- Use BACKWARD_WEIGHT and FORWARD_WEIGHT to control the weights of the KL divergence from each teacher.
- By default, training outputs are written under ./output/.
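To make the roles of the two weights and the ema teacher update concrete, here is a toy sketch over small discrete distributions. It is not the repository's training code: the direction of each KL term, the way the two teachers are combined, and the EMA decay value are all assumptions chosen for illustration.

```python
import math

# Toy illustration only; the actual objective lives in the training code.


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def weighted_dual_kl(student, teacher_a, teacher_b, weight_a=0.5, weight_b=0.5):
    """Weighted sum of the KL from each teacher to the student.

    Assumption: each term is KL(teacher || student); the real direction
    and pairing with the backward/forward weights may differ.
    """
    return (
        weight_a * kl_divergence(teacher_a, student)
        + weight_b * kl_divergence(teacher_b, student)
    )


def ema_update(teacher_params, student_params, decay=0.99):
    """Move each teacher parameter toward the student (hypothetical decay)."""
    return [
        decay * t + (1.0 - decay) * s
        for t, s in zip(teacher_params, student_params)
    ]
```

With equal weights of 0.5, both teachers pull on the student equally. Raising one weight makes the student track that teacher's distribution more closely.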
For the complete interface:
./run_selfci.sh --help

run_grpo.sh launches CI-RL (GRPO), one of the baselines used for comparison.
Basic usage:
./run_grpo.sh
./run_grpo.sh meta-llama/Llama-3.1-8B-Instruct --epochs 2

Common overrides:
GPU_ID=0 \
./run_grpo.sh \
Qwen/Qwen2.5-7B-Instruct \
--batch-size 8 \
--num-generations 8 \
--mini-batch-size 64 \
--epochs 2

For the complete interface:
./run_grpo.sh --help

run_test.sh evaluates a model or checkpoint on the contextual-integrity test split.
Basic usage:
./run_test.sh /path/to/model_or_checkpoint
./run_test.sh Qwen/Qwen2.5-7B-Instruct --num-runs 3 --output-dir ./output/eval_qwen

Notes:
- GPU_ID selects the CUDA device used for evaluation.
- If the target path contains a LoRA adapter, the evaluator can merge it into a temporary model directory before running the test.
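To tell in advance whether a checkpoint directory would be treated as a LoRA adapter, one option is to look for the files that PEFT writes when saving an adapter. This is a sketch under that assumption; the evaluator's own detection logic is not shown here, and the helper names are hypothetical.

```python
import os

# PEFT-saved adapters include adapter_config.json next to the adapter weights.
ADAPTER_CONFIG = "adapter_config.json"
ADAPTER_WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")


def looks_like_lora_adapter(filenames):
    """Return True if a file listing contains an adapter config and weights."""
    names = set(filenames)
    return ADAPTER_CONFIG in names and any(w in names for w in ADAPTER_WEIGHTS)


def path_has_lora_adapter(path):
    """Apply the same check to a local directory; False if it does not exist."""
    if not os.path.isdir(path):
        return False
    return looks_like_lora_adapter(os.listdir(path))
```

A Hub model ID such as Qwen/Qwen2.5-7B-Instruct is not a local directory, so `path_has_lora_adapter` returns False for it unless a folder with that name exists locally.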
For all evaluation options:
./run_test.sh /path/to/model_or_checkpoint --help

If you find our work useful, please consider citing:
@article{
title={It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs},
author={Park, Sangwoo and Yeo, Woongyeong and Lee, Seanie and Choi, Yumin and Lee, Hyomin and Kim, Kangsan and Baek, Jinheon and Oh, Seong Joon and Hwang, Sung Ju},
journal={arXiv preprint arXiv:2605.20258},
year={2026}
}
