v0.2.7

jcao-ai released this 08 Aug 10:26

b22643a

What's Changed

update: always compute and report grad_norm by @jcao-ai in #193
feat: add fully on-policy support for GRPO by @jcao-ai in #196
VAPO: Positive example lm loss by @xlu451 in #195
Bump python version to 3.12 in Docker by @bddppq in #198
minor update: rename fully_on_policy->on_policy by @jcao-ai in #200
feat: support hf model revision by @jcao-ai in #199
print env preset for users by @foreverlms in #201
optimize: GRPO logits memory by @jcao-ai in #203
fix: Fix minor issue in slurm launch by @lfengad in #205
[Under Test] support dp balanced token-mean GRPO loss by @jcao-ai in #204
feat: support lr decay scheduler by @jcao-ai in #206
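
Note on the token-mean loss in #204: a token-mean loss divides the summed per-token loss by the total number of tokens. When data-parallel ranks hold different token counts, averaging each rank's local mean gives a different value than the global token mean. The sketch below only illustrates that difference in plain Python. It is not taken from the PR, and the function names and loss values are hypothetical.

```python
def token_mean_loss(per_rank_token_losses):
    """Global token mean: sum of all token losses / total number of tokens."""
    total_loss = sum(sum(tokens) for tokens in per_rank_token_losses)
    total_tokens = sum(len(tokens) for tokens in per_rank_token_losses)
    return total_loss / total_tokens


def mean_of_rank_means(per_rank_token_losses):
    """Naive alternative: each rank averages locally, then ranks are averaged."""
    rank_means = [sum(tokens) / len(tokens) for tokens in per_rank_token_losses]
    return sum(rank_means) / len(rank_means)


# Hypothetical per-token losses on two data-parallel ranks
# with unequal token counts (4 tokens vs. 2 tokens).
losses = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0]]

balanced = token_mean_loss(losses)   # (4.0 + 6.0) / 6 tokens
naive = mean_of_rank_means(losses)   # (1.0 + 3.0) / 2 ranks

print(f"global token mean:  {balanced:.4f}")
print(f"mean of rank means: {naive:.4f}")
```

In the naive version, the rank with fewer tokens carries as much weight as the rank with more, so each of its tokens counts for more than in the global token mean.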

Full Changelog: v0.2.6...v0.2.7

Contributors

jcao-ai, bddppq, and 3 other contributors
