modify NPU Qwen3.5 Megatron practice by hazelduan · Pull Request #9382 · modelscope/ms-swift

hazelduan · 2026-05-19T09:57:59Z

PR type

Document Updates

PR information

Add `USE_MCORE_GDN=0` to the Qwen3.5 practice guide. The main reason is that MindSpeed has not yet completed its adaptation for GDN.

gemini-code-assist

Code Review

This pull request updates the NPU support documentation for Qwen3.5, introducing the USE_MCORE_GDN=0 environment variable and detailing specific version constraints and functional limitations when using Megatron-SWIFT. Feedback focuses on improving clarity by marking the GDN disablement as mandatory for Qwen3.5 and suggesting explicit branch-switching instructions in the installation guide to avoid known mask issues.

gemini-code-assist · 2026-05-19T10:00:24Z

 export PYTHONPATH=$PYTHONPATH:<your_local_megatron_lm_path>
 export MEGATRON_LM_PATH=<your_local_megatron_lm_path>
+
+# 5. If you need to roll back to the transformers GatedDeltaNet implementation, you can disable Megatron GDN


The comment describes setting USE_MCORE_GDN=0 as optional ('If you need to roll back...'), but the explanation in the Qwen3.5 section (line 267) indicates it is mandatory for the current NPU environment because MindSpeed hasn't adapted the native GDN kernel yet. Updating the comment to reflect this requirement would be clearer for users.

Suggested change

# 5. If you need to roll back to the transformers GatedDeltaNet implementation, you can disable Megatron GDN

# 5. Disable Megatron GDN (required for Qwen3.5 in the current NPU environment)

gemini-code-assist · 2026-05-19T10:00:24Z

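+1. The MindSpeed training combination currently specified in the NPU documentation is `Megatron-LM v0.15.3 + MindSpeed core_r0.15.3`. This version of `megatron-core` does not yet include the native GDN kernel `core.ssm.gated_delta_net`, which was only introduced in `0.16`.
+2. `ms-swift>=4.1.0` uses Megatron's native GDN by default (`USE_MCORE_GDN=1`), and this path requires `megatron-core>=0.16`. In the current NPU environment you therefore need to set `USE_MCORE_GDN=0` explicitly, which switches GDN back to the native transformers implementation wrapped by `mcore-bridge`. Combined with the Qwen3.5 FLA NPU patch built into ms-swift, this redirects `chunk_gated_delta_rule` to the MindSpeed Triton operator.
+3. The known cost of this fallback path is that the transformers GDN supports neither packing nor TP/CP for GDN.
+4. In addition, the transformers GDN has a known mask-path issue with the NPU + flash-attn combination: when `padding_free=False`, GDN reads the trainer-processed `attention_mask` instead of the `attention_mask_2d` it actually needs, which triggers an asynchronous `aclnnFlashAttentionScore` error. This issue has been fixed in the `qwen3_5_npu` branch of `mcore-bridge`, and NPU users need to use a version that includes the fix.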


The text mentions that a fix for the mask issue is available in the qwen3_5_npu branch of mcore-bridge. However, the installation instructions in section 3 (line 215) default to the main branch. Users following the guide sequentially might encounter the aclnnFlashAttentionScore error. It would be helpful to explicitly mention that users need to switch to this branch (e.g., git checkout qwen3_5_npu) during installation.
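One way to catch this before training is to check which branch the local `mcore-bridge` checkout is on. The following is a minimal sketch, assuming `mcore-bridge` was installed from a local git clone. The helper names `current_branch` and `check_mcore_bridge_branch` are hypothetical; only the branch name `qwen3_5_npu` comes from the PR text.

```shell
# Hypothetical helper: print the branch currently checked out in a git clone.
current_branch() {
    git -C "$1" symbolic-ref --short HEAD
}

# Hypothetical pre-flight check: return non-zero if the clone at "$1"
# is not on the branch that carries the mask fix.
check_mcore_bridge_branch() {
    repo="$1"
    branch="$(current_branch "$repo")" || return 1
    if [ "$branch" != "qwen3_5_npu" ]; then
        echo "mcore-bridge is on '$branch', expected 'qwen3_5_npu'" >&2
        return 1
    fi
}
```

A launch script could call `check_mcore_bridge_branch <your_local_mcore_bridge_path> || exit 1` before starting training. If the fix is later merged into another branch, the expected branch name in this check would need to change.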

Jintao-Huang · 2026-05-19T11:05:59Z

 export PYTHONPATH=$PYTHONPATH:<your_local_megatron_lm_path>
 export MEGATRON_LM_PATH=<your_local_megatron_lm_path>
+
+# 5. If you need to roll back to the transformers GatedDeltaNet implementation, you can disable Megatron GDN


Please update the English documentation as well.

Will submit that in the next commit.

jiaqiw09 · 2026-05-19T12:45:12Z

 export MEGATRON_LM_PATH=<your_local_megatron_lm_path>
+
+# 5. If you need to roll back to the transformers GatedDeltaNet implementation, you can disable Megatron GDN
+export USE_MCORE_GDN=0


Please add the English documentation as well.

jiaqiw09 · 2026-05-19T12:49:51Z

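+2. `ms-swift>=4.1.0` uses Megatron's native GDN by default (`USE_MCORE_GDN=1`), and this path requires `megatron-core>=0.16`. In the current NPU environment you therefore need to set `USE_MCORE_GDN=0` explicitly, which switches GDN back to the native transformers implementation wrapped by `mcore-bridge`. Combined with the Qwen3.5 FLA NPU patch built into ms-swift, this redirects `chunk_gated_delta_rule` to the MindSpeed Triton operator.
+3. The known cost of this fallback path is that the transformers GDN supports neither packing nor TP/CP for GDN.
+4. In addition, the transformers GDN has a known mask-path issue with the NPU + flash-attn combination: when `padding_free=False`, GDN reads the trainer-processed `attention_mask` instead of the `attention_mask_2d` it actually needs, which triggers an asynchronous `aclnnFlashAttentionScore` error. This issue has been fixed in the `qwen3_5_npu` branch of `mcore-bridge`, and NPU users need to use a version that includes the fix.
+5. If MindSpeed later provides a `core_r0.16.x` adaptation branch, both constraints above (`USE_MCORE_GDN=0` and the limited functionality of the transformers GDN) can be lifted together.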


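This could be tightened up. Please state the following points explicitly:

The currently supported Megatron version is 0.15.3, which uses the native transformers implementation underneath mcore-bridge
a. TP/CP for GDN is not supported
b. The attention mask issue

Once Ascend NPU support moves to 0.16, the environment variable no longer needs to be set and Megatron's native GDN can be used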
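The version rule summarized above can be sketched as a small launch-time guard. This is a minimal sketch and not ms-swift's own logic: the helper name `mcore_gdn_flag` is hypothetical, and it assumes the installed `megatron-core` version is passed in as a plain `major.minor.patch` string. Only the `USE_MCORE_GDN` variable and the `0.16` threshold come from the discussion.

```shell
# Hypothetical helper (not part of ms-swift): print the USE_MCORE_GDN value
# implied by a megatron-core version string such as "0.15.3".
# Prints 0 for versions below 0.16 (fall back to the transformers GDN)
# and 1 for 0.16 or newer (Megatron native GDN).
mcore_gdn_flag() {
    version="$1"
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # text before the next dot
    if [ "$major" -eq 0 ] && [ "$minor" -lt 16 ]; then
        echo 0
    else
        echo 1
    fi
}

# Current NPU combination from this PR: Megatron-LM v0.15.3.
USE_MCORE_GDN="$(mcore_gdn_flag "0.15.3")"
export USE_MCORE_GDN
```

With the version hard-coded to `0.15.3` the guard exports `USE_MCORE_GDN=0`, matching the current documentation. Passing a `0.16.x` version would yield `1`, which corresponds to the future case where the variable no longer needs to be set.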

hazel added 2 commits May 19, 2026 17:35

Document USE_MCORE_GDN fallback on NPU

0f2eb10

Document Qwen3.5 NPU GDN constraints

1f6aa48

gemini-code-assist Bot reviewed May 19, 2026

Jintao-Huang reviewed May 19, 2026

Jintao-Huang approved these changes May 19, 2026

jiaqiw09 reviewed May 19, 2026

hazel added 2 commits May 19, 2026 22:21

en version

ef79b49

modify format

5ae7995

Jintao-Huang merged commit cbb0afb into modelscope:main May 19, 2026
1 check passed

Jintao-Huang pushed a commit that referenced this pull request May 24, 2026

[docs] modify NPU Qwen3.5 Megatron practice (#9382)

47dfb2b

llan-ml mentioned this pull request May 25, 2026

【NPU】Qwen3.5 & MindSpeed 0.16.0: GKD training raises AssertionError #9410

Open

1 task

Jintao-Huang merged 4 commits into modelscope:main from hazelduan:main

3 participants