Our code is mainly based on verl. To prepare the environment, please follow these steps:
conda create -n delta python==3.12
conda activate delta
pip install torch==2.9.1
pip install flash_attn==2.8.3
pip install sglang==0.5.6
cd verl-DelTA
pip install -e .
pip install math-verify

We provide an example for DelTA training in the script verl-DelTA/recipe/dapo/srcs/run_DelTA.sh.
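
As a minimal sketch, the example script could be launched as shown here. It assumes the `delta` conda environment is active and that the command runs from the directory containing `verl-DelTA`. The `launch_delta` helper is a hypothetical wrapper, not part of the repository. Any settings inside the script itself (model paths, data paths, hardware options) are not covered here and may need editing first.

```shell
#!/usr/bin/env bash
# Script path as given above, relative to the directory that contains verl-DelTA.
# Assumption: adjust this if you are already inside verl-DelTA.
SCRIPT="verl-DelTA/recipe/dapo/srcs/run_DelTA.sh"

# Hypothetical helper: run the example script only if it exists at $SCRIPT.
launch_delta() {
    if [ ! -f "$SCRIPT" ]; then
        echo "Not found: $SCRIPT (are you in the directory that contains verl-DelTA?)" >&2
        return 1
    fi
    bash "$SCRIPT"
}

# Start training, or report that nothing was started.
launch_delta || echo "DelTA training was not started" >&2
```

The existence check avoids an opaque shell error when the command is run from the wrong directory, which is easy to do after the `cd verl-DelTA` step in the setup.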
If you find our work helpful, please cite it as:
@misc{zhang2026deltadiscriminativetokencredit,
      title={DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards},
      author={Kaiyi Zhang and Wei Wu and Yankai Lin},
      year={2026},
      eprint={2605.21467},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.21467},
}