Release v0.3.7 · nvidia-cosmos/cosmos-rl

What's Changed

Fix DeepSeek (dpsk) HF conversion by @yy-code-nv in #415
Fix: pass norm_topk_prob for qwen3_vl_moe and intern_vl by @kane-vln in #414
Fix named buffer init by @yy-code-nv in #411
Fix MoE implementation by @yy-code-nv in #416
Customize build_model with extra hf_config_args by @kane-vln in #418
RFC: refactor trainer for better customization by @foreverlms in #412
rfc: colocated mode by @lfengad in #413
feat: full custom case example with README by @lfengad in #421
Add hooks for SFT validation by @foreverlms in #419
feat: refine custom example by @lfengad in #422
Update DeepSeek weight mapping for GRPO (vllm >= 0.10.0) by @kane-vln in #410
feat: update interface handling for non-text rollout cases by @lfengad in #425
feat: unbiased KL estimate by @xlu451 in #423
feat: control weight version in DAPO case by @lfengad in #426
fix: correct and refine data type specification by @lfengad in #429
Fix Qwen3 MoE weight export by @foreverlms in #428
fix(controller): guard zero division when no policy replicas registered by @xlu451 in #435
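
For readers unfamiliar with the kind of fix described in #435, the following is a minimal sketch of a zero-division guard, not the code from that PR. The function name `samples_per_replica` and its parameters are hypothetical and chosen only for illustration; the actual controller logic in cosmos-rl may differ.

```python
def samples_per_replica(total_samples: int, num_policy_replicas: int) -> float:
    """Hypothetical helper: average number of samples per policy replica.

    Returns 0.0 instead of raising ZeroDivisionError when no policy
    replicas have registered yet.
    """
    if num_policy_replicas <= 0:
        # No replicas registered: nothing to divide across.
        return 0.0
    return total_samples / num_policy_replicas
```

Returning a neutral value is one possible design choice; a caller could instead skip the computation entirely or raise a descriptive error, depending on what the surrounding code expects.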

Full Changelog: v0.3.6...v0.3.7