fix：fix bugs in cosine reward of GRPO #3358

youyc22 · 2025-03-03T13:39:09Z

PR type

[√] Bug Fix
[ ] New Feature
[ ] Document Updates
[ ] More Models or Datasets Support

PR information

Use the model's tokenizer to calculate the output length in the cosine reward.
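For readers unfamiliar with the idea, here is a minimal sketch of a cosine length reward that measures the completion in tokens. It is an illustration only, not the code changed in this PR: the function names, the `tokenize` callable, and the default reward values are all assumptions made for the example, and `str.split` stands in for a real model tokenizer.

```python
import math
from typing import Callable, Sequence


def cosine_interp(t: float, T: float, start: float, end: float) -> float:
    """Move from `start` at t=0 to `end` at t=T along a half cosine."""
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * t / T))


def cosine_length_reward(
    completion: str,
    is_correct: bool,
    tokenize: Callable[[str], Sequence],  # hypothetical: any callable returning a token sequence
    max_len: int,
    short_correct: float = 1.0,   # illustrative values, not confirmed defaults
    long_correct: float = 0.5,
    short_wrong: float = -0.5,
    long_wrong: float = 0.0,
) -> float:
    """Length-aware reward: the length is counted in tokens, not characters."""
    # Count tokens with the supplied tokenizer and clamp to the length budget.
    gen_len = min(len(tokenize(completion)), max_len)
    if is_correct:
        start, end = short_correct, long_correct
    else:
        start, end = short_wrong, long_wrong
    return cosine_interp(gen_len, max_len, start, end)


# Whitespace splitting is only a stand-in for a model tokenizer.
reward = cosine_length_reward("hello world", True, str.split, max_len=4)
```

In this sketch, "hello world" counts as 2 tokens rather than 11 characters, so with `max_len=4` it sits halfway along the schedule. Counting characters would have clamped it to the maximum length and produced the long-answer reward instead.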

Experiment results

Paste your experiment result here(if needed).

hjh0119 · 2025-03-03T14:57:50Z

thanks! LGTM

…m_mp2

* commit '7fd5e12b6e87b77e140ea93bd2938f754e3a9504':
  Support vllm LLMEngine (modelscope#3370)
  fix swift app format (modelscope#3367)
  add grpo openr1 multimodal experiment (modelscope#3368)
  update docs (modelscope#3365)
  Support the <video> token for Ovis2 models (modelscope#3364)
  update docs (modelscope#3349)
  Remove entry including invalid `ROADMAP` link from English & Chinese documentation (modelscope#3357)
  fix：fix bugs in cosine reward of GRPO (modelscope#3358)
  support phi4-multimodal (modelscope#3350)
  fix max_memory (modelscope#3347)

# Conflicts:
#   swift/llm/infer/infer_engine/utils.py

youyc22 added 2 commits March 3, 2025 21:36

Update grpo_trainer.py

8fecfbe

Update orm.py

9554a20

youyc22 mentioned this pull request Mar 3, 2025

cosine reward in GRPO is inconsistent with expectations #3353

Closed

hjh0119 approved these changes Mar 3, 2025

hjh0119 merged commit 70a8cb8 into modelscope:main Mar 3, 2025

Jintao-Huang linked an issue Mar 4, 2025 that may be closed by this pull request

cosine reward in GRPO is inconsistent with expectations #3353

Closed
