
@hjh0119 (Collaborator) commented Feb 12, 2025

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Adds support for the cosine reward and the repetition penalty reward described in https://arxiv.org/abs/2502.03373.
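The PR description names the two rewards but does not give their formulas. Below is a minimal, hypothetical sketch of each. It assumes the general shape described in the linked paper: a cosine schedule that interpolates the reward between its value at length 0 and its value at a maximum length, so shorter correct answers earn more and longer wrong answers are penalized less. The second function is a word-level n-gram repetition penalty. The function names, parameter names, default values, and the choice of word 3-grams are illustrative assumptions. They are not the API merged in this PR and not the exact values from the paper.

```python
import math


def cosine_length_reward(
    is_correct: bool,
    gen_len: int,
    max_len: int,
    correct_short: float = 2.0,     # assumed: reward for a correct answer of length 0
    correct_long: float = 1.0,      # assumed: reward for a correct answer near max_len
    wrong_short: float = -10.0,     # assumed: reward for a wrong answer of length 0
    wrong_long: float = 0.0,        # assumed: reward for a wrong answer near max_len
    exceed_penalty: float = -10.0,  # assumed: reward when generation hits max_len
) -> float:
    """Hypothetical cosine length-scaled reward (sketch, not the merged API)."""
    if gen_len >= max_len:
        # Truncated generations get a flat penalty instead of the schedule.
        return exceed_penalty
    r0, r_max = (correct_short, correct_long) if is_correct else (wrong_short, wrong_long)
    # cos(pi * t / T) falls from 1 at t = 0 to -1 at t = T,
    # so the reward moves smoothly from r0 toward r_max as length grows.
    return r_max + 0.5 * (r0 - r_max) * (1.0 + math.cos(math.pi * gen_len / max_len))


def ngram_repetition_penalty(text: str, n: int = 3, max_penalty: float = -1.0) -> float:
    """Hypothetical repetition penalty in [max_penalty, 0]; 0 when all n-grams are unique."""
    words = text.lower().split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    # Fraction of n-gram occurrences that repeat an earlier n-gram.
    repeated_fraction = 1.0 - len(set(ngrams)) / len(ngrams)
    return repeated_fraction * max_penalty
```

Under these assumed defaults, a correct 10-token answer with `max_len=100` scores higher than a correct 90-token one, and the reverse holds for wrong answers. "a b c a b c" has four 3-grams, one of them repeated, so it is penalized by 0.25 × `max_penalty`. The cosine shape keeps the reward nearly flat at both ends of the length range and steepest in the middle.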


@hjh0119 changed the title "add Cosine reward funtion for GRPO" to "cosine reward for GRPO" on Feb 12, 2025
@hjh0119 changed the title "cosine reward for GRPO" to "cosine and repetition reward for GRPO" on Feb 12, 2025
@hjh0119 merged commit 4671df7 into modelscope:main on Feb 13, 2025
1 of 2 checks passed
@hjh0119 deleted the consine_reward branch on February 13, 2025 at 11:54
tastelikefeet added a commit to tastelikefeet/swift that referenced this pull request Feb 13, 2025
…soth_fast_grpo

* commit '69cd9fdc135e1b7ae47357fc78d66aa15e426e74':
  fix get_device (modelscope#3097)
  cosine and repetition reward for GRPO (modelscope#3079)
  Feat: Eval custom dataset (modelscope#3093)
  support grpo vllm lora (modelscope#3095)
  fix grpo temperature 0.7->0.9 (modelscope#3091)
