Description
Hi guys,
I am using 8 * A100 80G GPUs to run GRPO training on Qwen2.5-VL 72B, but right after startup the whole process hangs.
Do you have any ideas on how to solve this problem?
Or any best practices for using Qwen2.5-VL 72B in GRPO training?
Here is my sh command:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
MAX_PIXELS=640000 \
swift rlhf \
    --rlhf_type grpo \
    --model /mnt2/models/Qwen__Qwen2.5-VL-72B-Instruct \
    --train_type lora \
    --dataset /ossfs/workspace/data_process/xx.json \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --eval_steps 1000 \
    --save_steps 2000 \
    --learning_rate 1e-6 \
    --save_total_limit 2 \
    --logging_steps 1 \
    --output_dir /mnt2/xx \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --max_completion_length 2048 \
    --external_plugins examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_ui_acc uiformat \
    --num_generations 4 \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.3 \
    --vllm_max_model_len 2048 \
    --deepspeed zero3_offload \
    --temperature 1.1 \
    --top_p 1.0 \
    --top_k 80 \
    --log_completions true \
    --num_infer_workers 8 \
    --tensor_parallel_size 8 \
    --async_generate false \
    --offload_optimizer true \
    --offload_model true \
    --gc_collect_after_offload true \
    --move_model_batches 16 \
    --sleep_level 1 \
    --report_to swanlab
```
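To narrow this down I can rerun with extra debug logging and dump the stacks of the hanging ranks. Below is only a minimal sketch of that check (py-spy is an assumption on my side, not part of the launch above; NCCL_DEBUG and TORCH_DISTRIBUTED_DEBUG are the standard NCCL/PyTorch debug switches):

```bash
# Rerun the launch above with verbose distributed/NCCL logging
export NCCL_DEBUG=INFO
export TORCH_DISTRIBUTED_DEBUG=DETAIL

# While the job is hanging, dump the Python stack of every rank
# to see which call is blocking (may need sudo depending on ptrace permissions)
pip install py-spy
for pid in $(pgrep -f "swift rlhf"); do
    echo "=== PID $pid ==="
    py-spy dump --pid "$pid"
done
```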
Here are the related library versions:
```
vllm          0.7.3
trl           0.16.0.dev0
transformers  4.49.0
torch         2.5.1+cu121
peft          0.14.0
```
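For reference, this listing can be reproduced with a quick filter over `pip list` (a minimal sketch; the pattern just anchors on these five package names):

```bash
pip list | grep -E "^(vllm|trl|transformers|torch|peft) "
```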
