v0.3.8
What's Changed
- fix: refactor weight mapper so it no longer needs an unsplit map specification by @lfengad in #436
- Remove unsupported fields from Rollout parallelism config. by @foreverlms in #437
- fix: add support for reasoning vla / avla usage by @lfengad in #439
- Disable DeepEP for architectures older than Hopper by @bastefaniak in #441
- Fix n_local_experts computation in DeepseekV3 and Qwen3 MoE by @bastefaniak in #440
- feat: off policy sequence masking by @xlu451 in #431
- fix: support DP replicas in Slurm launch by @lfengad in #443
- fix: fix DeepEP usage affected by a synchronization issue by @lfengad in #445
New Contributors
- @bastefaniak made their first contribution in #441
Full Changelog: v0.3.7...v0.3.8