🐛 Describe the bug
Hi Team,
Great work with torchforge!
I am trying to get a reasonable reward curve for the GSM8K recipe, but I am running into a reward oscillation issue.

python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
Using the default recipe, the reward oscillates. It would be great if you could share the hyperparameters used, or any other setup details that need attention.
Versions
No response