vLLM rollout generates incorrect tokens during GKD #6478

@mccatec

Description

Describe the bug

[Screenshot: rollout output consisting entirely of repeated '!' tokens]

The student model always generates '!' during rollout.

I tested the same prompt with swift infer; the output seems normal.

[Screenshot: normal output from swift infer on the same prompt]

Your hardware and system info
CUDA: 12.4.1
System: CentOS 7, kernel 5.10.0
GPU: H20 (141 GB)
torch: 2.8.0

Additional context
Training script:

# CUDA_VISIBLE_DEVICES=0 swift rollout --model /home/xx/models/huggingface.co/Qwen/Qwen3-VL-4B-Instruct

IMAGE_MAX_TOKEN_NUM=1536 \
NPROC_PER_NODE=4 \
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
CUDA_VISIBLE_DEVICES=1,2,3,4 \
swift rlhf \
    --rlhf_type gkd \
    --model /home/xx/models/huggingface.co/Qwen/Qwen3-VL-4B-Instruct \
    --teacher_model /home/xx/models/huggingface.co/Qwen/Qwen3-VL-32B-Instruct \
    --system system.txt \
    --train_type full \
    --freeze_vit true \
    --freeze_aligner true \
    --dataset data/shop_dinein.jsonl \
    --seq_kd false \
    --lmbda 1 \
    --beta 1 \
    --torch_dtype bfloat16 \
    --num_train_epochs 2 \
    --per_device_train_batch_size 4 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --save_steps 250 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --max_length 9000 \
    --max_completion_length 4096 \
    --output_dir output/qwen3vl-4b \
    --warmup_ratio 0.05 \
    --save_only_model true \
    --dataloader_num_workers 8 \
    --dataset_num_proc 2 \
    --deepspeed zero2 \
    --teacher_deepspeed zero3 \
    --attn_impl flash_attn \
    --use_vllm true \
    --vllm_mode server \
    --vllm_server_host 127.0.0.1 \
    --vllm_server_port 8000

Infer test script:

NPROC_PER_NODE=1 \
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --model /home/xx/models/huggingface.co/Qwen/Qwen3-VL-4B-Instruct \
    --infer_backend vllm \
    --val_dataset data/shop_dinein.jsonl \
    --vllm_gpu_memory_utilization 0.9 \
    --vllm_max_model_len 9000 \
    --max_new_tokens 2048 \
    --write_batch_size 2
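One hedged diagnostic note (an assumption on my part, not confirmed by any logs): in GPT-2-style BPE vocabularies, which Qwen's tokenizer is based on, token id 0 decodes to "!". A completion made of nothing but "!" therefore often means every sampling step is effectively returning token id 0 — for example because the rollout worker's weights are uninitialized or out of sync with the trainer, leaving the logits all equal (or NaN). A minimal sketch of why argmax over such degenerate logits collapses to id 0 (`greedy_pick` is a hypothetical stand-in for the sampler's argmax, not swift or vLLM code):

```python
import math

def greedy_pick(logits):
    # Hypothetical greedy sampler: argmax with left-to-right tie-breaking,
    # mirroring how torch.argmax resolves ties among equal values.
    best_i, best_v = 0, -math.inf
    for i, v in enumerate(logits):
        if v > best_v:
            best_i, best_v = i, v
    return best_i

# All-equal logits (e.g. weights never synced) -> token id 0 every step.
assert greedy_pick([0.0] * 8) == 0

# All-NaN logits -> every comparison is False, so id 0 again.
assert greedy_pick([float("nan")] * 8) == 0
```

If this hypothesis holds, the bug would be in the trainer-to-rollout weight synchronization path rather than in generation itself, which would also explain why standalone swift infer on the same checkpoint looks normal.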

Labels: bug (Something isn't working)
