v0.2.7
What's Changed
- update: always compute and report `grad_norm` by @jcao-ai in #193
- feat: add fully on-policy support for GRPO by @jcao-ai in #196
- VAPO: Positive example lm loss by @xlu451 in #195
- Bump python version to 3.12 in Docker by @bddppq in #198
- minor update: rename `fully_on_policy` -> `on_policy` by @jcao-ai in #200
- feat: support hf model revision by @jcao-ai in #199
- print env preset for users by @foreverlms in #201
- optimize: GRPO logits memory by @jcao-ai in #203
- fix: Fix minor issue in slurm launch by @lfengad in #205
- [Under Test] support dp balanced token-mean GRPO loss by @jcao-ai in #204
- feat: support lr decay scheduler by @jcao-ai in #206
Full Changelog: v0.2.6...v0.2.7