NPU patch FLA#9195
Conversation
Code Review
This pull request implements the chunk_gated_delta_rule using Triton kernels and integrates it into the NPU patching logic for Qwen 3.5 models. It adds optimized NPU implementations for RMSNorm and Rotary Position Embeddings, along with a file-locking mechanism to safely handle concurrent Triton compilation on Ascend hardware. Feedback focuses on enhancing the robustness of the gradient calculation by casting to float32, cleaning up unused imports, fixing typographical and style issues in error messages, and ensuring the compilation lock file is user-specific to prevent permission conflicts.
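The float32 point is worth illustrating. Below is a minimal sketch of the casting pattern the review asks for, using a hypothetical RMSNorm weight-gradient reduction (the PR's actual kernels are Triton; the function name and shapes here are illustrative only):

```python
import torch


def rmsnorm_weight_grad(grad_output: torch.Tensor, normed: torch.Tensor) -> torch.Tensor:
    """Hypothetical weight-gradient reduction, accumulated in float32.

    Summing many bf16/fp16 products directly can lose precision; casting to
    float32 first and casting back afterwards keeps the reduction stable.
    """
    grad_w = (grad_output.float() * normed.float()).sum(dim=tuple(range(grad_output.dim() - 1)))
    return grad_w.to(grad_output.dtype)
```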
```python
from typing import Optional

import torch
import copy
```
```python
@wraps(make_launcher_stub)
def _locked_make_npu_launcher_stub(*args, **kwargs):
    lock_path = os.environ.get('SWIFT_TRITON_ASCEND_LAUNCHER_LOCK',
                               _DEFAULT_TRITON_ASCEND_LAUNCHER_LOCK_PATH)
    with open(lock_path, 'w') as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return make_launcher_stub(*args, **kwargs)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


_locked_make_npu_launcher_stub._swift_compile_lock = True
ascend_driver.make_npu_launcher_stub = _locked_make_npu_launcher_stub
logger.info(
    'Patched Ascend Triton launcher compilation with file lock: %s.',
    os.environ.get('SWIFT_TRITON_ASCEND_LAUNCHER_LOCK', _DEFAULT_TRITON_ASCEND_LAUNCHER_LOCK_PATH),
)
```
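One detail worth noting: the `_swift_compile_lock = True` marker makes the patch detectable, so a guard like the following sketch (an assumed usage, not shown in the diff) can keep the function from being wrapped twice:

```python
# Assumed idempotency guard built on the marker attribute set above.
if not getattr(ascend_driver.make_npu_launcher_stub, '_swift_compile_lock', False):
    ascend_driver.make_npu_launcher_stub = _locked_make_npu_launcher_stub
```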
The hardcoded lock file path in /tmp can lead to a PermissionError in multi-user environments if the file is created by one user and later opened by another. It is better to use a user-specific lock path. Additionally, the logic can be simplified by defining the lock path once and opening the file in 'a' (append) mode to avoid unnecessary truncation.
Suggested change:

```python
@wraps(make_launcher_stub)
def _locked_make_npu_launcher_stub(*args, **kwargs):
    lock_path = os.environ.get('SWIFT_TRITON_ASCEND_LAUNCHER_LOCK',
                               _DEFAULT_TRITON_ASCEND_LAUNCHER_LOCK_PATH)
    with open(lock_path, 'w') as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return make_launcher_stub(*args, **kwargs)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


_locked_make_npu_launcher_stub._swift_compile_lock = True
ascend_driver.make_npu_launcher_stub = _locked_make_npu_launcher_stub
logger.info(
    'Patched Ascend Triton launcher compilation with file lock: %s.',
    os.environ.get('SWIFT_TRITON_ASCEND_LAUNCHER_LOCK', _DEFAULT_TRITON_ASCEND_LAUNCHER_LOCK_PATH),
)
```

becomes:

```python
lock_path = os.environ.get('SWIFT_TRITON_ASCEND_LAUNCHER_LOCK')
if not lock_path:
    lock_path = _DEFAULT_TRITON_ASCEND_LAUNCHER_LOCK_PATH
    if hasattr(os, 'getuid'):
        lock_path += f'.{os.getuid()}'


@wraps(make_launcher_stub)
def _locked_make_npu_launcher_stub(*args, **kwargs):
    with open(lock_path, 'a') as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return make_launcher_stub(*args, **kwargs)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


_locked_make_npu_launcher_stub._swift_compile_lock = True
ascend_driver.make_npu_launcher_stub = _locked_make_npu_launcher_stub
logger.info('Patched Ascend Triton launcher compilation with file lock: %s.', lock_path)
```
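For intuition, here is a small standalone sketch of the same flock pattern: two processes contend for the lock and their critical sections serialize. The path and sleep are placeholders, and `flock` is advisory, so exclusion only holds for code that actually takes the lock:

```python
import fcntl
import multiprocessing
import os
import time

LOCK_PATH = f'/tmp/launcher_demo.lock.{os.getuid()}'  # placeholder, user-specific


def compile_once(tag: str) -> None:
    with open(LOCK_PATH, 'a') as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the other worker releases
        try:
            print(f'{tag}: compiling')
            time.sleep(0.5)  # stand-in for make_launcher_stub(...)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == '__main__':
    workers = [multiprocessing.Process(target=compile_once, args=(f'worker-{i}',)) for i in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```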
```python
_QWEN3_5_PATCH_TABLE = ((modeling_qwen3_5, {
    'Qwen3_5RMSNorm': NpuQwen3_5RMSNorm,
    'apply_rotary_pos_emb': npu_apply_rotary_pos_emb_qwen3_5,
    'Qwen3_5MLP.forward': npu_swiglu_forward,
```
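The dotted keys (e.g. `'Qwen3_5MLP.forward'`) suggest the table patches both module-level symbols and class methods. A minimal sketch of an apply loop consistent with that convention (the PR's actual loop is not shown in this excerpt):

```python
def apply_patch_table(table):
    """Hypothetical apply loop for (module, {name: replacement}) patch tables."""
    for module, patches in table:
        for name, replacement in patches.items():
            if '.' in name:
                # 'Qwen3_5MLP.forward' -> replace a method on a class in the module.
                cls_name, attr = name.split('.', 1)
                setattr(getattr(module, cls_name), attr, replacement)
            else:
                # 'Qwen3_5RMSNorm' -> replace a module-level symbol.
                setattr(module, name, replacement)
```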
Move `from transformers import qwen3_5` here and wrap it in a try/except, then extend the table accordingly; the same approach works for qwen3-next.
Apply the flash-linear patch here as well, with the corresponding if/else branching. This approach can continue to be used as the code evolves, to guarantee compatibility across transformers versions.
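A minimal sketch of that suggestion, assuming a transformers layout like `transformers.models.qwen3_5` (the exact module path may differ by version; the NPU replacement symbols are the ones this PR defines):

```python
try:
    from transformers.models.qwen3_5 import modeling_qwen3_5
except ImportError:  # older transformers without Qwen 3.5 support
    modeling_qwen3_5 = None

_QWEN3_5_PATCH_TABLE = ()
if modeling_qwen3_5 is not None:
    # Extend the table only when the module actually exists, so the patcher
    # stays importable across transformers versions.
    _QWEN3_5_PATCH_TABLE += ((modeling_qwen3_5, {
        'Qwen3_5RMSNorm': NpuQwen3_5RMSNorm,
        'apply_rotary_pos_emb': npu_apply_rotary_pos_emb_qwen3_5,
    }),)
```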
format change

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
PR type
New Feature -- Add NPU patch FLA to improve performance.
PR information
Add an NPU patcher for the Qwen 3.5 dense model. After patching, throughput on the NPU improved by 1.59x, and the single-step time decreased from 7.945 s/it to 5.006 s/it. Accuracy comparisons with GPUs were performed on Transformers 5.2.0; for reference, the unpatched single-step steady-state time on the GPU is 6.959 s/it.
Experiment results
NPU patch vs. no patch comparison
