Conversation

Jintao-Huang (Collaborator)

No description provided.

Jintao-Huang linked an issue on Feb 12, 2025 that may be closed by this pull request.
Jintao-Huang merged commit 48a59cb into modelscope:main on Feb 12, 2025; 2 checks passed.
yangqiancheng-yuan commented Feb 12, 2025

The NaN problem still exists for InternVL2.5-8B when fine-tuning with GRPO. Here is the training script:

nproc_per_node=7

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=$nproc_per_node \
swift rlhf \
    --rlhf_type grpo \
    --model InternVL2_5-8B \
    --reward_funcs accuracy format \
    --use_vllm true \
    --vllm_device auto \
    --vllm_gpu_memory_utilization 0.7 \
    --vllm_max_model_len 8192 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset 'AI-MO/NuminaMath-TIR#5000' \
    --max_completion_length 2048 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 2 \
    --eval_steps 200 \
    --save_steps 200 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 4096 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 7 \
    --temperature 0.7 \
    --system 'examples/train/grpo/prompt.txt' \
    --deepspeed zero2 \
    --sequence_parallel_size 1

The model is trained on 8*A100.

Could you please help with this problem? @Jintao-Huang
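
For reference, a minimal PyTorch sketch (not part of the original report, and independent of ms-swift) that can help narrow down which module first produces non-finite activations during a run like the one above; it assumes direct access to the underlying torch.nn.Module:

```python
import torch

def install_nan_hooks(model: torch.nn.Module):
    """Attach forward hooks that report the first module whose output
    contains NaN/Inf values, to help localize where training blows up."""
    def make_hook(name):
        def hook(module, inputs, output):
            tensors = output if isinstance(output, (tuple, list)) else (output,)
            for t in tensors:
                if torch.is_tensor(t) and t.is_floating_point() and not torch.isfinite(t).all():
                    print(f"[nan-check] non-finite output detected in module: {name}")
                    break
        return hook

    # Keep the handles so the hooks can be removed later via handle.remove().
    return [m.register_forward_hook(make_hook(n)) for n, m in model.named_modules()]
```

Attaching these hooks to the model before training starts (for example, to the trainer's wrapped model) can indicate whether the non-finite values originate in the vision tower, the language model, or only in the loss/advantage computation.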


Development

Successfully merging this pull request may close these issues.

GRPO potential bug
