
Document Qwen3.5 FLA patch for NPU support #9237

Merged
addsubmuldiv merged 2 commits into modelscope:main from hazelduan:patch-1 on Apr 30, 2026

Conversation

@hazelduan
Contributor

Added detailed explanation of the Qwen3.5 FLA patch for NPU support, including its functionality and impact on the transformers library.

PR type

  • [npu patcher] Document Updates

PR information

Verified versions: transformers==5.2.0, triton-ascend==3.2.0, flash-linear-attention==0.4.1, torch==2.7.1
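As a quick environment sanity check, a minimal Python sketch is shown below; it assumes the distribution names match the pins above and simply compares installed versions against the verified set:

```python
# Minimal sketch: compare installed versions against the set verified in this PR.
# Distribution names are assumed to match the pins in the description above.
from importlib.metadata import PackageNotFoundError, version

VERIFIED = {
    "transformers": "5.2.0",
    "triton-ascend": "3.2.0",
    "flash-linear-attention": "0.4.1",
    "torch": "2.7.1",
}

for pkg, expected in VERIFIED.items():
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: not installed (verified with {expected})")
        continue
    status = "OK" if installed == expected else f"differs from verified {expected}"
    print(f"{pkg}: {installed} ({status})")
```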


@gemini-code-assist (Bot) left a comment


Code Review

This pull request updates the documentation to include a detailed explanation of the Qwen3.5 linear attention patch for NPU support, outlining how it redirects Triton operators to MindSpeed implementations. A review comment identified a version mismatch for the 'flash-linear-attention' package in the documentation, which has been addressed with a correction.

- The patch mainly covers the gated-delta-rule path of Qwen3.5 linear attention;
- it is not equivalent to replacing the entire fla package with MindSpeed;
- for this path to take effect, make sure MindSpeed can be imported correctly in the current environment.
- Accuracy-alignment verified versions: torch 2.7.1 + MindSpeed 0.12.1 + flash-linear-attention 4.1.0 + triton-ascend 3.2.0 + transformers 5.2.0
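For readers unfamiliar with this kind of operator redirect, the sketch below illustrates the general pattern summarized in the review above (rebinding a Triton-backed fla entry point to an NPU implementation). The module paths and function names are hypothetical placeholders, not the actual fla or MindSpeed APIs:

```python
# Illustrative sketch only: the module paths and attribute names below are
# hypothetical placeholders, not the real fla / MindSpeed APIs.
import importlib


def patch_gated_delta_rule_for_npu() -> None:
    """Redirect a Triton-backed fla operator to an NPU (MindSpeed) implementation.

    General pattern: import the module that exposes the Triton kernel wrapper,
    then rebind its public entry point to an equivalent NPU-backed function so
    that callers in transformers' Qwen3.5 linear-attention path transparently
    pick up the NPU kernel.
    """
    fla_ops = importlib.import_module("fla.ops.gated_delta_rule")  # assumed path
    try:
        npu_ops = importlib.import_module("mindspeed.ops.gated_delta_rule")  # assumed path
    except ImportError as exc:
        # The documentation above notes MindSpeed must be importable for this path.
        raise RuntimeError("MindSpeed is required for the NPU linear-attention path") from exc

    # Rebind the Triton entry point to the NPU implementation (assumed name).
    fla_ops.chunk_gated_delta_rule = npu_ops.chunk_gated_delta_rule
```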


Severity: medium

The version for flash-linear-attention is listed as 4.1.0, but the pull request description specifies 0.4.1. This discrepancy should be corrected to ensure accuracy for users setting up the environment.

Suggested change
- Accuracy-alignment verified versions: torch 2.7.1 + MindSpeed 0.12.1 + flash-linear-attention 4.1.0 + triton-ascend 3.2.0 + transformers 5.2.0
- Accuracy-alignment verified versions: torch 2.7.1 + MindSpeed 0.12.1 + flash-linear-attention 0.4.1 + triton-ascend 3.2.0 + transformers 5.2.0

@Jintao-Huang
Collaborator

Please also update the English documentation accordingly, thanks.

@addsubmuldiv addsubmuldiv merged commit ae0a9be into modelscope:main Apr 30, 2026
1 check passed


3 participants