
oscillating reward with the default qwen3_1_7b.yaml recipe #560

@sfc-gh-kganesan

🐛 Describe the bug

Hi Team,
Great work with torchforge!
I am trying to get a reasonable reward curve for the GSM8K recipe and am running into a reward oscillation issue.
[Image: reward graph]

python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
With the default recipe, the reward oscillates. It would be great if you could share the hyperparameters you used, along with any other setup details that need to be taken care of.

Versions

No response
