This repo contains the framework used by K2V to train models. We developed this framework based on verl.
We recommend to use a fresh new conda environment to install verl and its dependencies.
conda create --name verl python=3.11 -y
conda activate verlInstall the necessary dependencies.
git clone https://github.com/superfarther/verl.git
pip install -r requirements_K2V.txtInstall the verl from source.
pip install --no-deps -e .K2V uses vLLM as the inference framework. Notice that vLLM often strictly limit your pytorch version and will directly override your installed pytorch. As a countermeasure, it is recommended to install vLLM first with the pytorch they needed. Overall, we need to ensure that the versions of the following dependencies are consistent with those specified in requirements_K2V.txt.
- torch and torch series
- vLLM
- pyarrow
- tensordict
- nvidia-cudnn-cu12
-
Deploy a judge model using vLLM to verify the model's reasoning process. For example, we can use Qwen2.5-7B-Instruct as the judge model.
CUDA_VISIBLE_DEVICES=4,5,6,7 vllm serve Qwen/Qwen2.5-7B-Instruct--tensor-parallel-size 4 --gpu_memory_utilization 0.7
-
We provide example data, which is stored in the
K2V-example/data. Additionally, a example configuration file is available atK2V-example/config.sh. Before starting the training, you need to fill in the relevant paths in the configuration file.- train_files: Path of training data
- val_files: Path of validation data
- rollout_data_dir: Rollout data generated during training will be saved to this directory.
- validation_data_dir: Validation result will be saved to this directory.
- default_local_dir: Checkpoint will be saved to this directory.
- log_file: Path of log file
- checklist_judge_model_url: Service endpoint for the judge model deployed with vLLM.
-
Start training
bash K2V-example/config.sh