Description
Hi guys,
I am using 8 * A100 80G GPUs to run GRPO training on Qwen2.5-VL 72B, but right after startup the whole process hangs.
Do you have any ideas on how to solve this problem?
Or any best practices for using Qwen2.5-VL 72B in GRPO training?
Here is my sh command:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
MAX_PIXELS=640000 \
swift rlhf \
    --rlhf_type grpo \
    --model /mnt2/models/Qwen__Qwen2.5-VL-72B-Instruct \
    --train_type lora \
    --dataset /ossfs/workspace/data_process/xx.json \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --eval_steps 1000 \
    --save_steps 2000 \
    --learning_rate 1e-6 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --output_dir /mnt2/xx \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --max_completion_length 2048 \
    --external_plugins examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_ui_acc uiformat \
    --num_generations 4 \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.3 \
    --vllm_max_model_len 2048 \
    --deepspeed zero3_offload \
    --temperature 1.1 \
    --top_p 1.0 \
    --top_k 80 \
    --log_completions true \
    --num_infer_workers 8 \
    --tensor_parallel_size 8 \
    --async_generate false \
    --offload_optimizer true \
    --offload_model true \
    --gc_collect_after_offload true \
    --move_model_batches 16 \
    --sleep_level 1 \
    --report_to swanlab
```
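To narrow this down I can rerun with extra debug logging and dump the stacks of the hanging ranks. Below is only a minimal sketch of that check (py-spy is an assumption on my side, not part of the launch above; NCCL_DEBUG and TORCH_DISTRIBUTED_DEBUG are the standard NCCL/PyTorch debug switches):

```bash
# Rerun the launch above with verbose distributed/NCCL logging
export NCCL_DEBUG=INFO
export TORCH_DISTRIBUTED_DEBUG=DETAIL

# While the job is hanging, dump the Python stack of every rank
# to see which call is blocking (may need sudo depending on ptrace permissions)
pip install py-spy
for pid in $(pgrep -f "swift rlhf"); do
    echo "=== PID $pid ==="
    py-spy dump --pid "$pid"
done
```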
Here are the related library versions:
```
vllm          0.7.3
trl           0.16.0.dev0
transformers  4.49.0
torch         2.5.1+cu121
peft          0.14.0
```
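For reference, this listing can be reproduced with a quick filter over `pip list` (a minimal sketch; the pattern just anchors on these five package names):

```bash
pip list | grep -E "^(vllm|trl|transformers|torch|peft) "
```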
