Official code for Context-Aware Hierarchical Learning (CAHL): A Two-Step Paradigm towards Safer LLMs.
CAHL implements a safety training and evaluation pipeline for tool-use LLM scenarios, with three model variants:
- StruQ
- ISE
- CAHL
The code is mainly used for:
- Supervised fine-tuning on TCA datasets
- Inference-time evaluation on benign and attack test sets
- Attack-related metric reporting, with optional AlpacaEval capability evaluation
The tca/ includes Tool-Completion Attack and benchmark components:
The src/ folder contains additional experimental code for attacks, training, and model on StruQ baseline pipeline.
Python 3.10+ and CUDA are recommended.
cd tca
pip install -r requirements.txtNotes:
- The training pipeline uses
torch+transformers+trl
Example (tca pipeline):
cd tca/train
python train_tool.py --cfg cahl.yamlExample (src pipeline):
cd src
python train/train.py train/training2.yamlEvaluation entry: tca/test/tcb_test.py.
Edit tca/test/tcb_test.yaml and set at least:
model_name_or_path: path to the model to evaluateresult_path: output directory for evaluation resultsmodel_type:struq/ise/cahl
Run:
cd tca/test
python tcb_test.py --cfg tcb_test.yamlExample (src attack/evaluation script):
cd src
python attack/test_ICAseqQformer.py \
-m <MODEL_PATH> \
-a none completion_real \This script supports multiple attack modes (e.g., none, naive, ignore, escape_separation, completion_real).
@inproceedings{ma2025contextaware,
title={Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer {LLM}s},
author={Tengyun Ma and Jiaqi Yao and Daojing He and Shihao Peng and YU LI and Shaohui Liu and Zhuotao Tian},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://arxiv.org/abs/2512.03720}
}