vLLM rollout generates incorrect tokens during GKD #6478

@mccatec

Description

Describe the bug

[Screenshot: rollout output consisting entirely of repeated '!' tokens]

The student model always generates '!' during rollout.

I tested the same prompt with swift infer; the output seems normal.

[Screenshot: normal output from swift infer on the same prompt]

Your hardware and system info
CUDA: 12.4.1
System: CentOS 7, kernel 5.10.0
GPU: H20 (141 GB)
torch: 2.8.0

Additional context
Training script:

# CUDA_VISIBLE_DEVICES=0 swift rollout --model /home/xx/models/huggingface.co/Qwen/Qwen3-VL-4B-Instruct

IMAGE_MAX_TOKEN_NUM=1536 \
NPROC_PER_NODE=4 \
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
CUDA_VISIBLE_DEVICES=1,2,3,4 \
swift rlhf \
    --rlhf_type gkd \
    --model /home/xx/models/huggingface.co/Qwen/Qwen3-VL-4B-Instruct \
    --teacher_model /home/xx/models/huggingface.co/Qwen/Qwen3-VL-32B-Instruct \
    --system system.txt \
    --train_type full \
    --freeze_vit true \
    --freeze_aligner true \
    --dataset data/shop_dinein.jsonl \
    --seq_kd false \
    --lmbda 1 \
    --beta 1 \
    --torch_dtype bfloat16 \
    --num_train_epochs 2 \
    --per_device_train_batch_size 4 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --save_steps 250 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --max_length 9000 \
    --max_completion_length 4096 \
    --output_dir output/qwen3vl-4b \
    --warmup_ratio 0.05 \
    --save_only_model true \
    --dataloader_num_workers 8 \
    --dataset_num_proc 2 \
    --deepspeed zero2 \
    --teacher_deepspeed zero3 \
    --attn_impl flash_attn \
    --use_vllm true \
    --vllm_mode server \
    --vllm_server_host 127.0.0.1 \
    --vllm_server_port 8000

Infer test script:

NPROC_PER_NODE=1 \
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --model /home/xx/models/huggingface.co/Qwen/Qwen3-VL-4B-Instruct \
    --infer_backend vllm \
    --val_dataset data/shop_dinein.jsonl \
    --vllm_gpu_memory_utilization 0.9 \
    --vllm_max_model_len 9000 \
    --max_new_tokens 2048 \
    --write_batch_size 2
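One hedged diagnostic note (an assumption on my part, not confirmed by any logs): in GPT-2-style BPE vocabularies, which Qwen's tokenizer is based on, token id 0 decodes to "!". A completion made of nothing but "!" therefore often means every sampling step is effectively returning token id 0 — for example because the rollout worker's weights are uninitialized or out of sync with the trainer, leaving the logits all equal (or NaN). A minimal sketch of why argmax over such degenerate logits collapses to id 0 (`greedy_pick` is a hypothetical stand-in for the sampler's argmax, not swift or vLLM code):

```python
import math

def greedy_pick(logits):
    # Hypothetical greedy sampler: argmax with left-to-right tie-breaking,
    # mirroring how torch.argmax resolves ties among equal values.
    best_i, best_v = 0, -math.inf
    for i, v in enumerate(logits):
        if v > best_v:
            best_i, best_v = i, v
    return best_i

# All-equal logits (e.g. weights never synced) -> token id 0 every step.
assert greedy_pick([0.0] * 8) == 0

# All-NaN logits -> every comparison is False, so id 0 again.
assert greedy_pick([float("nan")] * 8) == 0
```

If this hypothesis holds, the bug would be in the trainer-to-rollout weight synchronization path rather than in generation itself, which would also explain why standalone swift infer on the same checkpoint looks normal.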

Labels: bug (Something isn't working)
