v0.3.9
What's Changed
- Support RL for world foundational model by @Dinghow in #432
- Optimize weights loading by @kane-vln in #427
- Change the default value of num_rollout for wfm rl by @Dinghow in #446
- Support VLA RL with physics simulator by @fwd4 in #444
- Update document for world foundational model by @Dinghow in #448
- Fix HSDP for WFM RL by @Dinghow in #453
- Update parallelism settings for WFM RL by @Dinghow in #455
- [vla] support vla in colocated mode by @fwd4 in #457
- Support on-policy distillation by @lfengad in #452
- Don't upload completion for validation by @foreverlms in #458
- feat: image rewards by @xlu451 in #456
- Fix potential risks of sharing same empty list object in list by @foreverlms in #459
- [vla] fix 2 issues by @fwd4 in #460
- fix: Fault tolerance for distill by @lfengad in #461
- Fix WFM parallelism & align DDRL experiments settings by @Dinghow in #463
- Update doc for DDRL by @Dinghow in #464
- [vla] refactor vla simulator by @fwd4 in #466
- feat: Pattern-based parameter freezing by @xlu451 in #462
- Update example for DDRL by @Dinghow in #469
- [vla] fix validation problem by @fwd4 in #468
- Add diffusers SFT for SanaVideo by @yy-code-nv in #451
- [vla] support b1k simulation by @fwd4 in #472
- feat: further refine distillation by @lfengad in #467
- Fix: pin nvidia-nvshmem-cu12 version to 3.4.5 by @lfengad in #474
- [vla] continuous simulation by @fwd4 in #473
- Fix dependencies for image rewards by @Dinghow in #477
- Fix video rewards by @Dinghow in #479
- Update clients and doc for rewards by @Dinghow in #481
- Fix the tensor export method for Qwen3VL-MoE and Deepseek-V3 MoE after enabling DeepEP by @yufanhuangNV in #478
- Fix xformers compatibility by @yy-code-nv in #484
- feat: support async rollout engine by @jingxu9x in #382
- feat: add prompt&image validation for image reward services by @xlu451 in #486
- [vla] support openpi + libero sim by @fwd4 in #485
- feat(reward-service): PickScore service by @xlu451 in #487
- Remove ambiguous fake video tensor by @Dinghow in #489
- feat(reward-service): hpsv3 by @xlu451 in #490
- docs: hpsv3 pickscore request example by @xlu451 in #493
- flag for drop_last of DataLoader by @foreverlms in #491
- feat: distillation support topk and jsd in KL calculation by @lfengad in #480
- Fix for attention backend for HF qwen3-vl by @foreverlms in #494
New Contributors
Full Changelog: v0.3.8...v0.3.9