v0.3.7
What's Changed
- Fix DeepSeek HF conversion by @yy-code-nv in #415
- Fix: pass norm_topk_prob for qwen3_vl_moe and intern_vl by @kane-vln in #414
- Fix named buffer init by @yy-code-nv in #411
- Fix MoE implementation by @yy-code-nv in #416
- Customize build_model with extra hf_config_args by @kane-vln in #418
- RFC: refactor trainer for better customization by @foreverlms in #412
- rfc: colocated mode by @lfengad in #413
- feat: full custom case example with readme by @lfengad in #421
- Add hooks for SFT validation by @foreverlms in #419
- feat: refine custom example by @lfengad in #422
- Update DeepSeek weight mapping for GRPO (vLLM >= 0.10.0) by @kane-vln in #410
- feat: update interface handling for non-text rollout cases by @lfengad in #425
- feat: unbiased KL estimate by @xlu451 in #423
- feat: control weight version in DAPO case by @lfengad in #426
- fix: correct and refine data type specification by @lfengad in #429
- Fix Qwen3 MoE weight export by @foreverlms in #428
- fix(controller): guard zero division when no policy replicas registered by @xlu451 in #435
Full Changelog: v0.3.6...v0.3.7