v1.4.0

@Jintao-Huang Jintao-Huang released this 17 May 15:50
· 33 commits to main since this release

New Features

  1. Added model_type support for bailing_moe and qwen3_asr.
  2. Added support for running Qwen3-Next with Mcore-GDN (now the default), which enables sequence packing, FP8, and context parallelism (CP).
  3. Refactored transformer_block / transformer_layer with an inheritable design to simplify the integration of new models.
  4. Added compatibility with Python 3.13.
  5. Added support for saving and loading LoRA weights for MoE models whose experts are organized in grouped form in transformers. (Note: these LoRA weights cannot be loaded directly via transformers, but they can be loaded via Megatron to continue training.)
  6. Added padding_mask support, fixing an issue where moe_aux_loss incorrectly computed routing loss on padding tokens when padding_free=False.
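
To illustrate item 6, here is a minimal sketch of why padding tokens skew an MoE auxiliary loss and how a mask avoids it. It is not the code from this release: the function name `load_balancing_loss`, the Switch-style formula (num_experts * sum of f_e * p_e), and the convention that True marks a padding token are all assumptions made for the example.

```python
from typing import List, Optional


def load_balancing_loss(
    router_probs: List[List[float]],
    top_k: int,
    padding_mask: Optional[List[bool]] = None,
) -> float:
    """Switch-style auxiliary loss: num_experts * sum_e(f_e * p_e).

    router_probs: per-token softmax probabilities over experts.
    padding_mask: per-token flags; True marks a padding token (assumed convention).
    """
    num_experts = len(router_probs[0])
    if padding_mask is None:
        padding_mask = [False] * len(router_probs)

    # Keep only real tokens so padding cannot influence the statistics.
    kept = [p for p, is_pad in zip(router_probs, padding_mask) if not is_pad]
    if not kept:
        return 0.0

    counts = [0] * num_experts        # how often each expert is in a token's top-k
    prob_sums = [0.0] * num_experts   # summed router probability per expert
    for probs in kept:
        top = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in top:
            counts[e] += 1
        for e in range(num_experts):
            prob_sums[e] += probs[e]

    n = len(kept)
    f = [c / (n * top_k) for c in counts]   # fraction of routed slots per expert
    p = [s / n for s in prob_sums]          # mean router probability per expert
    return num_experts * sum(fe * pe for fe, pe in zip(f, p))


# Two real tokens routed evenly, followed by two padding tokens that all
# happen to favor expert 0 (typical when padding embeddings are identical).
probs = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.1]]
mask = [False, False, True, True]

unmasked = load_balancing_loss(probs, top_k=1)
masked = load_balancing_loss(probs, top_k=1, padding_mask=mask)
```

In this toy case the real tokens are perfectly balanced, so the masked loss sits at its balanced value, while the unmasked loss is inflated by the padding tokens and would penalize the router for an imbalance that does not exist in the real data.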

What's Changed

Full Changelog: v1.3.0...v1.4.0