v0.4.2

@lfengad released this 31 Mar 12:44
· 32 commits to main since this release
72e9227

What's Changed

  • Fix SkippingSampler bug by @YurongYou in #621
  • feat: tests combined to script by @lfengad in #622
  • [vla] support cosmos-policy by @fwd4 in #617
  • fix: lint for docs also (compatible with i4 integration) by @lfengad in #623
  • Support Qwen3-5 SFT by @kane-vln in #607
  • Rename check_transformers_version to is_transformers_version_compatible by @kane-vln in #624
  • Saving ckpt when receiving signals by @foreverlms in #601
  • Update Wan2pt2 server config & client example by @Dinghow in #629
  • [vla] support robotwin env setup and test in ci by @fwd4 in #627
  • Support multi-reward training for diffusion RL by @Dinghow in #630
  • Support export_safetensors for diffusion models by @Dinghow in #626
  • feat: gb200 container setup by @lfengad in #633
  • feat: Slurm more option for mount by @lfengad in #632
  • [vla] fix pi05 compatibility issues on libero by @littlespray in #625
  • Fix: pack visual_pos_masks for qwen3_vl_moe when seq_pack enabled by @kane-vln in #635
  • Enable mixed precision training for diffusion RL by @Dinghow in #634
  • fix: docs and check for profiler by @lfengad in #637
  • fix: RL part resume epoch setting with tests added by @lfengad in #636
  • Support Sequence Packing for Qwen3.5 by @kane-vln in #639
  • fix: RL version compatibility, basically runnable with vllm 0.17 by @lfengad in #641
  • Add tutorial for diffusion SFT & RL by @Dinghow in #644
  • Add e2e test for diffusion RL by @Dinghow in #643
  • Sync changes from nemotron branch by @jcao-ai in #645
  • fix: slurm stability enhancement by @lfengad in #646
  • fix: support _StridedShard DTensor placements for weight sync by @kane-vln in #650
  • Add support for FA3 from internal flash_attn_3_nv. Also fix the flash_attn_varlen_func for FA3 by @yufanhuangNV in #648
  • Set attention implementation to flash_attention_2 by default for HFModel by @kane-vln in #651
  • fix: compatible with flash-attn-3 tuple output by @lfengad in #653
  • Support validation for remote reward by @Dinghow in #655
  • Fix SFT checkpointing barrier for multi-replica by @Dinghow in #657
  • [vla] support maniskill env by @fwd4 in #656
  • Fix: relax import assert by @lfengad in #658
  • feat: Hook for ckpt handling by @lfengad in #659
  • Sync changes from dev/nemotron by @jcao-ai in #660
  • Support batched remote reward computation by @Dinghow in #661
  • Add compute_default_rope_parameters for default RoPE when using Transformers ≥ 5.0 by @kane-vln in #654
  • Support GRPO for Qwen3.5 by @kane-vln in #647
  • Sync changes from VLM by @jcao-ai in #662

Full Changelog: v0.4.1...v0.4.2