Release v0.4.2 · nvidia-cosmos/cosmos-rl

What's Changed

Fix SkippingSampler bug by @YurongYou in #621
feat: tests combined to script by @lfengad in #622
[vla] support cosmos-policy by @fwd4 in #617
fix: lint for docs also (compatible with i4 integration) by @lfengad in #623
Support Qwen3-5 SFT by @kane-vln in #607
Rename check_transformers_version to is_transformers_version_compatible by @kane-vln in #624
Saving ckpt when receiving signals by @foreverlms in #601
Update Wan2pt2 server config & client example by @Dinghow in #629
[vla] support robotwin env setup and test in ci by @fwd4 in #627
Support multi-reward training for diffusion RL by @Dinghow in #630
Support export_safetensors for diffusion models by @Dinghow in #626
feat: gb200 container setup by @lfengad in #633
feat: Slurm more option for mount by @lfengad in #632
[vla] fix pi05 compatibility issues on libero by @littlespray in #625
Fix: pack visual_pos_masks for qwen3_vl_moe when seq_pack enabled by @kane-vln in #635
Enable mixed precision training for diffusion RL by @Dinghow in #634
fix: docs and check for profiler by @lfengad in #637
fix: RL part resume epoch setting with tests added by @lfengad in #636
Support Sequence Packing for Qwen3.5 by @kane-vln in #639
fix: RL version compatible basically runnable for vllm 0.17 by @lfengad in #641
Add tutorial for diffusion SFT & RL by @Dinghow in #644
Add e2e test for diffusion RL by @Dinghow in #643
Sync changes from nemotron branch by @jcao-ai in #645
fix: slurm stability enhancement by @lfengad in #646
fix: support _StridedShard DTensor placements for weight sync by @kane-vln in #650
Add support for FA3 from internal flash_attn_3_nv. Also fix the flash_attn_varlen_func for FA3 by @yufanhuangNV in #648
Set attention implementation to flash_attention_2 by default for HFModel by @kane-vln in #651
fix: compatible with flash-attn-3 tuple output by @lfengad in #653
Support validation for remote reward by @Dinghow in #655
Fix SFT checkpointing barrier for multi-replica by @Dinghow in #657
[vla] support maniskill env by @fwd4 in #656
Fix: relax import assert by @lfengad in #658
feat: Hook for ckpt handling by @lfengad in #659
Sync changes from dev/nemotron by @jcao-ai in #660
Support batched remote reward computation by @Dinghow in #661
Add compute_default_rope_parameters for default RoPE when using Transformers ≥ 5.0 by @kane-vln in #654
Support GRPO for Qwen3.5 by @kane-vln in #647
Sync changes from VLM by @jcao-ai in #662

Full Changelog: v0.4.1...v0.4.2
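
For readers wondering what "Saving ckpt when receiving signals" (#601) might look like in practice, here is a minimal sketch of the general pattern, not the cosmos-rl implementation. The names `CheckpointOnSignal`, `train`, and `save_fn` are hypothetical and exist only for illustration; the sketch assumes handlers are installed from the main thread and that the save is done at a safe point in the loop rather than inside the handler.

```python
import signal


class CheckpointOnSignal:
    """Hypothetical helper: records a termination signal for the training loop."""

    def __init__(self, signals=(signal.SIGTERM, signal.SIGINT)):
        self.received = None
        for sig in signals:
            # signal.signal must be called from the main thread.
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        # Only record the signal here; saving happens later at a safe point.
        self.received = signum


def train(num_steps, save_fn, guard):
    """Toy loop: once a signal was recorded, call save_fn(step) and stop."""
    for step in range(num_steps):
        # ... one training step would run here ...
        if guard.received is not None:
            save_fn(step)
            return step
    return num_steps
```

Deferring the save to the loop keeps the handler trivial, which is one common way to avoid writing a checkpoint while a step is only partly applied.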
