Release v0.4.0 · nvidia-cosmos/cosmos-rl

What's Changed

Support DiffusionNFT by @Dinghow in #465
Fix OOM when prompt is very long. by @foreverlms in #498
Fix qwen3_vl not supported in rollout. by @foreverlms in #501
Always init distribution. by @foreverlms in #500
few fix for nemotron-nano-v3 MoE support by @jcao-ai in #502
Fix training hang when resuming from ckpt. by @foreverlms in #503
fix: Resume reproducibility of shuffle case by @lfengad in #504
support MoE aux-free loadbalancing by @jcao-ai in #506
Remove a lot of log of loading weight by @foreverlms in #507
feat: Mesh aware dispatch for avla requirement by @lfengad in #497
Fix: control of eps by @littlespray in #511
fix: Add config check and force. by @lfengad in #512
Fix: step control for sft-trainer by @littlespray in #510
fix: simple fix for rollout mesh info set by @lfengad in #514
[vla] support pi05 on b1k by @fwd4 in #515
Support diffusers-based Cosmos-Predict2.5 by @Dinghow in #499
fix: resume sft train step missing fix and max_keep -1 support by @lfengad in #517
Update doc for diffusers-based Cosmos-Predict2.5 by @Dinghow in #518
Update docs for DDRL by @Dinghow in #521
fix: pin transformers < 5.0.0 by @lfengad in #519
Enable no upper limit retry for fetching prompts by default. by @foreverlms in #516
[vla] add doc by @fwd4 in #520
feat: support sft multiple replica control by @lfengad in #509
Make vLLM an optional dependency by @foreverlms in #525
Optimize reward service by @Dinghow in #527
fix: ci flaky 20260130 by @xlu451 in #529
Add data packer factory mode by @YurongYou in #528
Update batch_sampler factory API by @YurongYou in #524
Colocated but separated processes for Policy and Rollout by @foreverlms in #508
fix: Enhance prompt fetch efficiency by @lfengad in #532
Fix pickscore by @Dinghow in #536
feat: support batch group reward calc by @lfengad in #538
Update the remote reward for DDRL by @Dinghow in #539
Raise FA not installed error at runtime, not at initialization time. by @foreverlms in #540
Update DDRL configs by @Dinghow in #541
feat: qwen3-vl user define version with i4 vlm dataloader trainable by @lfengad in #533
fix: sft-trainer weight upload at final step by @littlespray in #542
Dynamic enqueue timeout for remote reward by @Dinghow in #545
Support non-equal world size for policy and rollout in colocated-separated mode by @foreverlms in #544
Check max_keep across all previous runs; log best ckpt across runs by @YurongYou in #546
Revert changes in config from_dict function by @YurongYou in #549
feat: support pi05 sft by @littlespray in #488
fix: Fix launch multi node by @lfengad in #553
Support moe aux_loss by @kane-vln in #543
Robust ckpt saving and loading by @YurongYou in #550
Fix: checkpoint manager init error in datafetcher by @YurongYou in #557
Support async ckpt delete in checkpoint manager by @YurongYou in #555
Fix: SFT failed to start if no ckpt to resume by @YurongYou in #556
Update pyproject.toml by @lfengad in #558

New Contributors

@littlespray made their first contribution in #511
@YurongYou made their first contribution in #528

Full Changelog: v0.3.9...v0.4.0
