v0.4.1
What's Changed
- Upstream core change sync from vlm project by @jcao-ai in #554
- Fix potential teacher model weight loading error. by @foreverlms in #560
- Fix checkpoint offload and loading to CPU by @YurongYou in #562
- Support uncentralized mode for colocated training by @Dinghow in #561
- fix: Fix HF model tie case handling by @lfengad in #565
- Support set primary adapter in multi-lora by @Dinghow in #566
- fix upload issue for sft trainer by @littlespray in #564
- feat: save safetensor async by @lfengad in #569
- fix: Refine custom loader usage by @lfengad in #571
- Add micro_batch & mini_batch for Diffusion RL by @Dinghow in #570
- Support Load-Balanced Dynamic Batching by @kane-vln in #522
- slurm launcher - improve the discovery of cosmos_rl path by @vinjn in #573
- Fix WFM remote reward async mode by @Dinghow in #574
- [vla] support robotwin by @fwd4 in #575
- More metrics for Diffusion RL by @Dinghow in #572
- fix: try simplify for large scale communication by @lfengad in #576
- Feat/refactor pi05 by @littlespray in #563
- fix: more option for reduce on cpu by @lfengad in #579
- Fix DiffusionNFT beta config by @Dinghow in #581
- Fix the image/video range for reward computing by @Dinghow in #580
- Feat: support comet dataset for pi05 by @littlespray in #567
- Remove unused metrics for Diffusion RL by @Dinghow in #582
- Update controller to support datapacker factory mode by @YurongYou in #583
- feat: dataloader broadcast for non dp ranks by @lfengad in #585
- fix: resume issue regarding validation by @lfengad in #587
- fix: separate_model_parts in HF-Model by @jcao-ai in #588
- Fixes for multi-lr optimizer and model resuming by @jcao-ai in #590
- Remove dp_replicate from dp_cp_tp mesh dimension names by @jcao-ai in #591
- Added --slurm-job-time argument in dispatch_job.py by @vinjn in #589
- fix moe weight loading by @jcao-ai in #593
- fix moe weight loading by @jcao-ai in #594
- fix: accurate weight version control & cumem of nccl default off by @lfengad in #592
- Reuse max_num_steps for load_balanced_max_steps by @kane-vln in #595
- fix: check ckpt loaded info for extra info validation by @lfengad in #596
- add qwen3-vl merger support by @jcao-ai in #598
- fix: Validation hang when n_generation > 1 by @lfengad in #597
- fix: Ignore version control when policy scaling by @lfengad in #599
- Add WAN2.2 support for reward service by @yufanhuangNV in #602
- fix: unstable issue for nccl group usage by @lfengad in #603
- Support SANA for DiffusionNFT by @Dinghow in #586
- Fix reward media_type & WFM local reward service issue by @Dinghow in #604
- Fuse TP/EP with DP for non-moe layer by @kane-vln in #600
- Support batched GenEval reward by @Dinghow in #605
- Add Qwen3VL merged visual forward and FSDP pure-text dummy pass by @Xuanmeng-Zhang in #606
- Support UnifiedReward by @Dinghow in #608
- Update slurm launch script by @YurongYou in #578
- Add LR scheduler for diffusion RL by @Dinghow in #610
- Correct DDRL preset configs & Add docs for parallelism by @Dinghow in #613
- Fix the media_type for video latency benchmark by @Dinghow in #612
- Fix bugs on batch sampler by @YurongYou in #609
- Add example for SANA post-training by @Dinghow in #611
- fix: RL batch sampler epoch set by @lfengad in #614
- fix: Automatically import models in policy.model by @lfengad in #615
- Sync updates from VLM branch by @jcao-ai in #616
- Fix the configuration page import error by @Dinghow in #618
- feat: on-policy distill slurm support and doc by @lfengad in #619
- fix: upgrade lint ruff to 0.12.7 in pre-commit for compatibility with i4 by @lfengad in #620
New Contributors
- @Xuanmeng-Zhang made their first contribution in #606
Full Changelog: v0.4.0...v0.4.1