Conversation

Jintao-Huang (Collaborator)

No description provided.

Jintao-Huang linked an issue on Feb 12, 2025 that may be closed by this pull request.
Jintao-Huang merged commit 48a59cb into modelscope:main on Feb 12, 2025; 2 checks passed.
yangqiancheng-yuan commented Feb 12, 2025

The NaN problem still exists for InternVL2.5-8B when fine-tuning with GRPO. Here is the training script:

nproc_per_node=7

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=$nproc_per_node \
swift rlhf \
    --rlhf_type grpo \
    --model InternVL2_5-8B \
    --reward_funcs accuracy format \
    --use_vllm true \
    --vllm_device auto \
    --vllm_gpu_memory_utilization 0.7 \
    --vllm_max_model_len 8192 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset 'AI-MO/NuminaMath-TIR#5000' \
    --max_completion_length 2048 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 2 \
    --eval_steps 200 \
    --save_steps 200 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 4096 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 7 \
    --temperature 0.7 \
    --system 'examples/train/grpo/prompt.txt' \
    --deepspeed zero2 \
    --sequence_parallel_size 1

The model is trained on 8*A100.

Could you please help with this problem? @Jintao-Huang
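
For reference, a minimal PyTorch sketch (not part of the original report, and independent of ms-swift) that can help narrow down which module first produces non-finite activations during a run like the one above; it assumes direct access to the underlying torch.nn.Module:

```python
import torch

def install_nan_hooks(model: torch.nn.Module):
    """Attach forward hooks that report the first module whose output
    contains NaN/Inf values, to help localize where training blows up."""
    def make_hook(name):
        def hook(module, inputs, output):
            tensors = output if isinstance(output, (tuple, list)) else (output,)
            for t in tensors:
                if torch.is_tensor(t) and t.is_floating_point() and not torch.isfinite(t).all():
                    print(f"[nan-check] non-finite output detected in module: {name}")
                    break
        return hook

    # Keep the handles so the hooks can be removed later via handle.remove().
    return [m.register_forward_hook(make_hook(n)) for n, m in model.named_modules()]
```

Attaching these hooks to the model before training starts (for example, to the trainer's wrapped model) can indicate whether the non-finite values originate in the vision tower, the language model, or only in the loss/advantage computation.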


Development

Successfully merging this pull request may close these issues.

GRPO potential bug
