
v0.3.8


foreverlms released this on 12 Dec 07:39
ad97196

What's Changed

  • fix: refactor the weight mapper so it no longer needs an unsplit map specification by @lfengad in #436
  • Remove unsupported fields from Rollout parallelism config. by @foreverlms in #437
  • fix: add support for reasoning vla / avla usage by @lfengad in #439
  • Disable DeepEP for architectures older than Hopper by @bastefaniak in #441
  • Fix n_local_experts computation in DeepseekV3 and Qwen3 MoE by @bastefaniak in #440
  • feat: off-policy sequence masking by @xlu451 in #431
  • fix: support DP replicas when launching with Slurm by @lfengad in #443
  • fix: DeepEP usage broken by a synchronization issue by @lfengad in #445

New Contributors

Full Changelog: v0.3.7...v0.3.8