v0.4.0
What's Changed
- Support DiffusionNFT by @Dinghow in #465
- Fix OOM when prompt is very long. by @foreverlms in #498
- Fix qwen3_vl not supported in rollout. by @foreverlms in #501
- Always init distribution. by @foreverlms in #500
- few fix for nemotron-nano-v3 MoE support by @jcao-ai in #502
- Fix training hang when resuming from ckpt. by @foreverlms in #503
- fix: Resume reproducible of shuffle case by @lfengad in #504
- support MoE aux-free loadbalancing by @jcao-ai in #506
- Remove a lot of log of loading weight by @foreverlms in #507
- feat: Mesh aware dispatch for avla requirement by @lfengad in #497
- Fix: control of eps by @littlespray in #511
- fix: Add config check and force. by @lfengad in #512
- Fix: step control for sft-trainer by @littlespray in #510
- fix: simple fix for rollout mesh info set by @lfengad in #514
- [vla] support pi05 on b1k by @fwd4 in #515
- Support diffusers-based Cosmos-Predict2.5 by @Dinghow in #499
- fix: resume sft train step missing fix and max_keep -1 support by @lfengad in #517
- Update doc for diffusers-based Cosmos-Predict2.5 by @Dinghow in #518
- Update docs for DDRL by @Dinghow in #521
- fix: pin transformers < 5.0.0 by @lfengad in #519
- Enable no upper limit retry for fetching prompts by default. by @foreverlms in #516
- [vla] add doc by @fwd4 in #520
- feat: support sft multiple replica control by @lfengad in #509
- Make vLLM as an optional dependency by @foreverlms in #525
- Optimize reward service by @Dinghow in #527
- fix: ci flaky 20260130 by @xlu451 in #529
- Add data packer factory mode by @YurongYou in #528
- Update batch_sampler factory API by @YurongYou in #524
- Colocated but separated processes for Policy and Rollout by @foreverlms in #508
- fix: Enhance prompt fetch efficiency by @lfengad in #532
- Fix pickscore by @Dinghow in #536
- feat: support batch group reward calc by @lfengad in #538
- Update the remote reward for DDRL by @Dinghow in #539
- Raise FA not installed error in runtime, not initialized time. by @foreverlms in #540
- Update DDRL configs by @Dinghow in #541
- feat: qwen3-vl user define version with i4 vlm dataloader trainable by @lfengad in #533
- fix: sft-trainer weight upload at final step by @littlespray in #542
- Dynamic enqueue timeout for remote reward by @Dinghow in #545
- Support non-equal worldsize for policy and rollout in colocated-separated mode by @foreverlms in #544
- Check max_keep across all previous runs; log best ckpt across runs by @YurongYou in #546
- Revert changes in config from_dict function by @YurongYou in #549
- feat: support pi05 sft by @littlespray in #488
- fix: Fix launch multi node by @lfengad in #553
- Support moe aux_loss by @kane-vln in #543
- Robust ckpt saving and loading by @YurongYou in #550
- Fix: checkpoint manager init error in datafetcher by @YurongYou in #557
- Support async ckpt delete in checkpoint manager by @YurongYou in #555
- Fix: SFT failed to start if no ckpt to resume by @YurongYou in #556
- Update pyproject.toml by @lfengad in #558
New Contributors
- @littlespray made their first contribution in #511
- @YurongYou made their first contribution in #528
Full Changelog: v0.3.9...v0.4.0