youyc22
Contributor

@youyc22 youyc22 commented Mar 3, 2025

PR type

  • [x] Bug Fix
  • [ ] New Feature
  • [ ] Document Updates
  • [ ] More Models or Datasets Support

PR information

Use the model's tokenizer to calculate the output length.
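For readers unfamiliar with the distinction, here is a minimal sketch of why the length source matters for a length-dependent reward. It is not the code from this PR: `WhitespaceTokenizer`, `token_length`, and `cosine_length_reward` are hypothetical names, the stub tokenizer stands in for a real model tokenizer exposing `encode`, and the cosine schedule shown is only one common form of a length-scaled reward.

```python
import math


class WhitespaceTokenizer:
    """Hypothetical stand-in for a model tokenizer (assumption, not from the PR).

    Only mimics the `encode(text) -> list of ids` shape of a real tokenizer.
    """

    def encode(self, text):
        # One fake id per whitespace-separated piece.
        return list(range(len(text.split())))


def token_length(tokenizer, text):
    """Length of `text` measured in tokens rather than characters."""
    return len(tokenizer.encode(text))


def cosine_length_reward(gen_len, max_len, min_value, max_value):
    """Illustrative cosine schedule: max_value at length 0, min_value at max_len."""
    progress = min(gen_len, max_len) / max_len
    return min_value + 0.5 * (max_value - min_value) * (1.0 + math.cos(math.pi * progress))


completion = "the answer is forty two"
tokenizer = WhitespaceTokenizer()

char_len = len(completion)                      # character count
tok_len = token_length(tokenizer, completion)   # token count

# With a budget expressed in tokens, the two measures give different rewards.
reward_from_chars = cosine_length_reward(char_len, 10, 0.0, 1.0)
reward_from_tokens = cosine_length_reward(tok_len, 10, 0.0, 1.0)
```

If the length budget is expressed in tokens, as generation limits usually are, measuring the completion in characters overstates its length and pushes the reward toward the wrong end of the schedule.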

@hjh0119
Collaborator

hjh0119 commented Mar 3, 2025

Thanks! LGTM

@hjh0119 hjh0119 merged commit 70a8cb8 into modelscope:main Mar 3, 2025
@Jintao-Huang Jintao-Huang linked an issue Mar 4, 2025 that may be closed by this pull request
tastelikefeet added a commit to tastelikefeet/swift that referenced this pull request Mar 4, 2025
…m_mp2

* commit '7fd5e12b6e87b77e140ea93bd2938f754e3a9504':
  Support vllm LLMEngine (modelscope#3370)
  fix swift app format (modelscope#3367)
  add grpo openr1 multimodal experiment (modelscope#3368)
  update docs (modelscope#3365)
  Support the <video> token for Ovis2 models (modelscope#3364)
  update docs (modelscope#3349)
  Remove entry including invalid `ROADMAP` link from English & Chinese documentation  (modelscope#3357)
  fix:fix bugs in cosine reward of GRPO (modelscope#3358)
  support phi4-multimodal (modelscope#3350)
  fix max_memory (modelscope#3347)

# Conflicts:
#	swift/llm/infer/infer_engine/utils.py
Development

Successfully merging this pull request may close these issues.

cosine reward in GRPO is inconsistent with expectations

2 participants