
Qwen2.5-VL 72B GRPO training (LoRA) hangs for no apparent reason #3592


Description

@sys-reasoner

Hi all,

I am running GRPO training (LoRA) of Qwen2.5-VL-72B on 8 × A100 80 GB GPUs. Right after startup, the whole process hangs.

[Screenshot: the training processes hang at startup]

Do you have any ideas on how to solve this problem?

Or any best practices for running GRPO training with Qwen2.5-VL-72B?
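If stack traces would help, here is a rough sketch of how I would collect a per-rank Python stack dump with py-spy while the job is hung (assuming the workers show up as "swift rlhf" processes; the pgrep pattern may need adjusting for a different launcher):

# Dump the Python stack of every training worker to see which call each rank is blocked in.
# Requires py-spy (pip install py-spy) and usually root or CAP_SYS_PTRACE.
for pid in $(pgrep -f "swift rlhf"); do
    echo "===== PID $pid ====="
    py-spy dump --pid "$pid"
done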

Here is my launch command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
MAX_PIXELS=640000 \
swift rlhf \
    --rlhf_type grpo \
    --model /mnt2/models/Qwen__Qwen2.5-VL-72B-Instruct \
    --train_type lora \
    --dataset /ossfs/workspace/data_process/xx.json \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --eval_steps 1000 \
    --save_steps 2000 \
    --learning_rate 1e-6 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --output_dir /mnt2/xx \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --max_completion_length 2048 \
    --external_plugins examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_ui_acc uiformat \
    --num_generations 4 \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.3 \
    --vllm_max_model_len 2048 \
    --deepspeed zero3_offload \
    --temperature 1.1 \
    --top_p 1.0 \
    --top_k 80 \
    --log_completions true \
    --num_infer_workers 8 \
    --tensor_parallel_size 8 \
    --async_generate false \
    --offload_optimizer true \
    --offload_model true \
    --gc_collect_after_offload true \
    --move_model_batches 16 \
    --sleep_level 1 \
    --report_to swanlab
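
If it helps, I can rerun with extra NCCL / torch.distributed logging enabled before launching the same command. These are standard PyTorch / NCCL environment variables, not swift-specific options; a rough sketch:

# Enable verbose distributed logging, then launch the same swift rlhf command as above.
export NCCL_DEBUG=INFO                 # per-rank NCCL init and collective activity
export NCCL_DEBUG_SUBSYS=INIT,COLL     # restrict the log volume to init and collective calls
export TORCH_DISTRIBUTED_DEBUG=DETAIL  # extra consistency checks in torch.distributed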

Here are the relevant library versions:
vllm 0.7.3
trl 0.16.0.dev0
transformers 4.49.0
torch 2.5.1+cu121
peft 0.14.0
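
In case the exact environment matters, the list above can be regenerated with a quick pip filter (a rough one-liner; the name pattern just covers the packages listed here):

# Print only the packages relevant to this issue from the current environment.
pip list | grep -E "^(vllm|trl|transformers|torch|peft) "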
