v1.4.0
English Version
New Features
- Added `model_type` support for `bailing_moe` and `qwen3_asr`.
- Support running Qwen3-Next with Mcore-GDN (default), enabling sequence packing, FP8, and CP.
- Refactored `transformer_block`/`transformer_layer` with an inheritable design to simplify the integration of new models.
- Added compatibility with Python 3.13.
- Support LoRA weight saving and loading for MoE models whose experts are organized in grouped mode in transformers. (Note: these LoRA weights cannot be loaded directly via transformers, but can be loaded via Megatron for continued training.)
- Added `padding_mask` support, fixing an issue where `moe_aux_loss` incorrectly computed routing loss on padding tokens when `padding_free=False`.
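
To illustrate why the `padding_mask` fix matters, here is a minimal, framework-free sketch of a Switch-style load-balancing loss with top-1 routing. It is not the code from this release: the function name `load_balancing_loss`, the flat token layout, and the convention that `True` marks a padding token are all assumptions made for illustration.

```python
from typing import List, Optional


def load_balancing_loss(
    router_probs: List[List[float]],
    padding_mask: Optional[List[bool]] = None,
) -> float:
    """Switch-style auxiliary loss with top-1 routing over a flat token list.

    router_probs[t][e] is the router probability of expert e for token t.
    padding_mask[t] is True when token t is padding (assumed convention).
    """
    num_experts = len(router_probs[0])
    if padding_mask is None:
        padding_mask = [False] * len(router_probs)

    # Keep only real tokens; padding must not contribute to either statistic.
    valid = [p for p, is_pad in zip(router_probs, padding_mask) if not is_pad]
    if not valid:
        return 0.0

    n = len(valid)
    tokens_per_expert = [0.0] * num_experts  # fraction of tokens sent to each expert
    mean_prob = [0.0] * num_experts          # mean router probability per expert
    for probs in valid:
        top1 = max(range(num_experts), key=lambda e: probs[e])
        tokens_per_expert[top1] += 1.0 / n
        for e in range(num_experts):
            mean_prob[e] += probs[e] / n

    return num_experts * sum(f * p for f, p in zip(tokens_per_expert, mean_prob))


# Two real tokens routed evenly, followed by two padding tokens that all
# favor expert 0 (as identical padding embeddings typically would).
probs = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.1]]
mask = [False, False, True, True]

masked = load_balancing_loss(probs, mask)
unmasked = load_balancing_loss(probs)
print(f"masked={masked:.3f} unmasked={unmasked:.3f}")
```

In this toy batch the real tokens are perfectly balanced, yet the unmasked loss is higher because the padding tokens skew both the routing fractions and the mean probabilities toward expert 0. Excluding them removes that spurious penalty.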
What's Changed
- [bugfix] fix MTP & mcore 0.15 (NPU) by @Jintao-Huang in #67
- compat python 3.13 by @Jintao-Huang in #68
- compat lint py313 by @Jintao-Huang in #69
- compat lint py3.13 by @Jintao-Huang in #70
- [model] support bailing by @Jintao-Huang in #55
- update gpt_model by @Jintao-Huang in #71
- refactor transformer_block by @Jintao-Huang in #72
- [bugfix] fix tie_word_embeddings by @Jintao-Huang in #74
- [bugfix] fix qwen3_vl by @Jintao-Huang in #73
- remove hf_grouped lora error by @Jintao-Huang in #75
- [model] support qwen3_next gdn by @Jintao-Huang in #76
- compat megatron.core 0.18 by @Jintao-Huang in #77
- [model] support qwen3_asr by @Jintao-Huang in #78
- Support padding mask by @Jintao-Huang in #79
- compat peft 0.19 by @Jintao-Huang in #80
- [readme] Update readme by @Jintao-Huang in #81
- [docs] update readme by @Jintao-Huang in #82
- [bugfix] fix minimax qk_norm sp by @Jintao-Huang in #83
Full Changelog: v1.3.0...v1.4.0