v0.4.2
What's Changed
- Fix SkippingSampler bug by @YurongYou in #621
- feat: tests combined to script by @lfengad in #622
- [vla] support cosmos-policy by @fwd4 in #617
- fix: lint for docs also (compatible with i4 integration) by @lfengad in #623
- Support Qwen3.5 SFT by @kane-vln in #607
- Rename check_transformers_version to is_transformers_version_compatible by @kane-vln in #624
- Saving ckpt when receiving signals by @foreverlms in #601
- Update Wan2pt2 server config & client example by @Dinghow in #629
- [vla] support robotwin env setup and test in ci by @fwd4 in #627
- Support multi-reward training for diffusion RL by @Dinghow in #630
- Support export_safetensors for diffusion models by @Dinghow in #626
- feat: gb200 container setup by @lfengad in #633
- feat: more Slurm options for mount by @lfengad in #632
- [vla] fix pi05 compatibility issues on libero by @littlespray in #625
- Fix: pack visual_pos_masks for qwen3_vl_moe when seq_pack enabled by @kane-vln in #635
- Enable mixed precision training for diffusion RL by @Dinghow in #634
- fix: docs and check for profiler by @lfengad in #637
- fix: RL part resume epoch setting with tests added by @lfengad in #636
- Support Sequence Packing for Qwen3.5 by @kane-vln in #639
- fix: RL version compatibility, basically runnable with vllm 0.17 by @lfengad in #641
- Add tutorial for diffusion SFT & RL by @Dinghow in #644
- Add e2e test for diffusion RL by @Dinghow in #643
- Sync changes from nemotron branch by @jcao-ai in #645
- fix: slurm stability enhancement by @lfengad in #646
- fix: support _StridedShard DTensor placements for weight sync by @kane-vln in #650
- Add support for FA3 from internal flash_attn_3_nv. Also fix the flash_attn_varlen_func for FA3 by @yufanhuangNV in #648
- Set attention implementation to flash_attention_2 by default for HFModel by @kane-vln in #651
- fix: compatible with flash-attn-3 tuple output by @lfengad in #653
- Support validation for remote reward by @Dinghow in #655
- Fix SFT checkpointing barrier for multi-replica by @Dinghow in #657
- [vla] support maniskill env by @fwd4 in #656
- Fix: relax import assert by @lfengad in #658
- feat: Hook for ckpt handling by @lfengad in #659
- Sync changes from dev/nemotron by @jcao-ai in #660
- Support batched remote reward computation by @Dinghow in #661
- Add compute_default_rope_parameters for default RoPE when using Transformers ≥ 5.0 by @kane-vln in #654
- Support GRPO for Qwen3.5 by @kane-vln in #647
- Sync changes from VLM by @jcao-ai in #662
Full Changelog: v0.4.1...v0.4.2