Release v0.4.1 · nvidia-cosmos/cosmos-rl

What's Changed

Upstream core change sync from vlm project by @jcao-ai in #554
Fix potential teacher model weight loading error. by @foreverlms in #560
Fix checkpoint offload and loading to CPU by @YurongYou in #562
Support uncentralized mode for colocated training by @Dinghow in #561
fix: Fix HF model tie case handling by @lfengad in #565
Support set primary adapter in multi-lora by @Dinghow in #566
fix upload issue for sft trainer by @littlespray in #564
feat: save safetensor async by @lfengad in #569
fix: Refine custom loader usage by @lfengad in #571
Add micro_batch & mini_batch for Diffusion RL by @Dinghow in #570
Support Load-Balanced Dynamic Batching by @kane-vln in #522
slurm launcher - improve the discovery of cosmos_rl path by @vinjn in #573
Fix WFM remote reward async mode by @Dinghow in #574
[vla] support robotwin by @fwd4 in #575
More metrics for Diffusion RL by @Dinghow in #572
fix: try simplify for large scale communication by @lfengad in #576
Feat/refactor pi05 by @littlespray in #563
fix: more option for reduce on cpu by @lfengad in #579
Fix DiffusionNFT beta config by @Dinghow in #581
Fix the image/video range for reward computing by @Dinghow in #580
Feat: support comet dataset for pi05 by @littlespray in #567
Remove unused metrics for Diffusion RL by @Dinghow in #582
Update controller to support datapacker factory mode by @YurongYou in #583
feat: dataloader broadcast for non dp ranks by @lfengad in #585
fix: resume issue regarding validation by @lfengad in #587
fix: separate_model_parts in HF-Model by @jcao-ai in #588
Fixes for multi-lr optimizer and model resuming by @jcao-ai in #590
Remove dp_replicate from dp_cp_tp mesh dimension names by @jcao-ai in #591
Added --slurm-job-time argument in dispatch_job.py by @vinjn in #589
fix moe weight loading by @jcao-ai in #593
fix moe weight loading by @jcao-ai in #594
fix: accurate weight version control & cumem of nccl default off by @lfengad in #592
Reuse max_num_steps for load_balanced_max_steps by @kane-vln in #595
fix: check ckpt loaded info for extra info validation by @lfengad in #596
add qwen3-vl merger support by @jcao-ai in #598
fix: Validation hang when n_generation > 1 by @lfengad in #597
fix: Ignore version control when policy scaling by @lfengad in #599
Add WAN2.2 support for reward service by @yufanhuangNV in #602
fix: unstable issue for nccl group usage by @lfengad in #603
Support SANA for DiffusionNFT by @Dinghow in #586
Fix reward media_type & WFM local reward service issue by @Dinghow in #604
Fuse TP/EP with DP for non-moe layer by @kane-vln in #600
Support batched GenEval reward by @Dinghow in #605
Add Qwen3VL merged visual forward and FSDP pure-text dummy pass by @Xuanmeng-Zhang in #606
Support UnifiedReward by @Dinghow in #608
Update slurm launch script by @YurongYou in #578
Add LR scheduler for diffusion RL by @Dinghow in #610
Correct DDRL preset configs & Add docs for parallelism by @Dinghow in #613
Fix the media_type for video latency benchmark by @Dinghow in #612
Fix bugs on batch sampler by @YurongYou in #609
Add example for SANA post-training by @Dinghow in #611
fix: RL batch sampler epoch set by @lfengad in #614
fix: Automatically import models in policy.model by @lfengad in #615
Sync updates from VLM branch by @jcao-ai in #616
Fix the configuration page import error by @Dinghow in #618
feat: on-policy distill slurm support and doc by @lfengad in #619
fix: upgrade lint ruff to 0.12.7 in pre-commit for compatibility with i4 by @lfengad in #620

New Contributors

@Xuanmeng-Zhang made their first contribution in #606

Full Changelog: v0.4.0...v0.4.1
