
Npu patcher refactor #9223

Merged

addsubmuldiv merged 6 commits into modelscope:main from addsubmuldiv:npu_patcher_refactor on Apr 28, 2026

Conversation

@addsubmuldiv (Collaborator) commented Apr 28, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Summary

This PR refactors the NPU patch logic into swift/model/npu_patch/, keeps swift/model/npu_patcher.py as a backward-compatible import entry point, and adds a switch to disable only the model-level NPU patches for debugging.

Changes

  • Split NPU patches by responsibility: env, FSDP, MindSpeed, model, and shared utils.
  • Reorganized model patches by model family in model.py for easier maintenance.
  • Kept existing import-time behavior: importing swift.model on NPU still applies patches by default.
  • Added --enable_npu_model_patch false to skip only the model-related patches (see the usage example after this list).
  • Updated the NPU and command-line parameter docs.
  • Fixed some patch errors on transformers 5.x.
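For example (illustrative invocation; the subcommand and model are placeholders, only the --enable_npu_model_patch flag comes from this PR):

    swift sft --model Qwen/Qwen3-8B --enable_npu_model_patch false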

Copilot AI review requested due to automatic review settings on April 28, 2026, 01:45
@gemini-code-assist (Bot) left a comment


Code Review

This pull request refactors NPU-specific monkey patches into a structured npu_patch package and introduces a new command-line argument, --enable_npu_model_patch, to control model-level patches. The update includes specific compatibility and performance patches for Qwen series models (Qwen2, Qwen3, Qwen3.5, and MoE variants) on Ascend NPU. Feedback focuses on improving library compatibility by moving top-level imports of newer Transformers models into the patching logic, optimizing token counting with torch.bincount, and reducing host-device synchronization in MoE forward passes.

Comment on lines +9 to +11
from transformers.models.qwen3 import modeling_qwen3
from transformers.models.qwen3_moe import modeling_qwen3_moe
from transformers.models.qwen3_vl_moe import modeling_qwen3_vl_moe

Severity: high

These top-level imports from transformers.models.qwen3* will cause an ImportError if the user is using an older version of the transformers library (e.g., < 4.48). Since these models are relatively new, it is safer to use the import_optional_module helper inside the apply_patch function, similar to how qwen3_5 is handled at line 516.
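A minimal sketch of the guarded import being suggested, with import_optional_module stubbed for self-containment (the real helper lives in swift/model/npu_patch/utils.py) and a hypothetical patch hook:

    import importlib

    def import_optional_module(name):
        # Stand-in for the PR's helper: return the module, or None when the
        # installed transformers version does not ship it.
        try:
            return importlib.import_module(name)
        except ImportError:
            return None

    def apply_patch():
        modeling_qwen3_vl_moe = import_optional_module(
            'transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe')
        if modeling_qwen3_vl_moe is None:
            return  # older transformers: skip this model family's patches
        _patch_qwen3_vl_moe(modeling_qwen3_vl_moe)  # hypothetical patch hook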

Comment on lines +157 to +158
tokens_per_expert = torch.histc(
    router_indices.to(torch.float), bins=self.num_experts, min=0, max=self.num_experts).to(torch.int64)

Severity: medium

Using torch.histc for counting integer occurrences is generally less efficient and more complex than torch.bincount on NPU. torch.bincount avoids the need for casting to float and specifying min/max ranges.

    tokens_per_expert = torch.bincount(router_indices.view(-1), minlength=self.num_experts)
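A self-contained toy check of the claimed equivalence (tensor values and shapes are made up):

    import torch

    router_indices = torch.tensor([[0, 2], [1, 2]])  # per-token expert ids
    num_experts = 4

    # histc needs a float input and an explicit value range
    via_histc = torch.histc(
        router_indices.to(torch.float), bins=num_experts, min=0, max=num_experts).to(torch.int64)

    # bincount counts integer occurrences directly
    via_bincount = torch.bincount(router_indices.view(-1), minlength=num_experts)

    assert torch.equal(via_histc, via_bincount)  # tensor([1, 1, 2, 0])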

Comment on lines +327 to +329
cpu_group_list = group_list.to('cpu', non_blocking=False)
cpu_group_list = [0] + cpu_group_list.tolist()
split_size = [cpu_group_list[i + 1] - cpu_group_list[i] for i in range(len(cpu_group_list) - 1)]

Severity: medium

This logic for calculating split_size is redundant and involves an unnecessary host-device synchronization. tokens_per_experts (calculated at line 324) already contains the counts needed for split_size. You can simply convert it to a list. Note that any .tolist() or .item() call on a tensor causes a synchronization point which can impact performance in the forward pass.

Suggested change:

    - cpu_group_list = group_list.to('cpu', non_blocking=False)
    - cpu_group_list = [0] + cpu_group_list.tolist()
    - split_size = [cpu_group_list[i + 1] - cpu_group_list[i] for i in range(len(cpu_group_list) - 1)]
    + split_size = tokens_per_experts.tolist()
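To see the equivalence, assume group_list is the cumulative sum of tokens_per_experts over experts (consistent with the description above); a toy check:

    import torch

    tokens_per_experts = torch.tensor([3, 0, 5, 2])        # made-up counts
    group_list = torch.cumsum(tokens_per_experts, dim=0)   # assumed relation

    # Original path: extra host round-trip through the cumulative sums
    cpu_group_list = [0] + group_list.tolist()
    old = [cpu_group_list[i + 1] - cpu_group_list[i] for i in range(len(cpu_group_list) - 1)]

    # Suggested path: the per-expert counts are already the split sizes
    new = tokens_per_experts.tolist()

    assert old == new  # [3, 0, 5, 2]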

Copilot AI left a comment


Pull request overview

This PR refactors Ascend NPU monkey-patch logic into a dedicated swift/model/npu_patch/ package while preserving the existing import entrypoint (swift/model/npu_patcher.py) and introducing a startup flag intended to disable only model-level NPU patches for debugging.

Changes:

  • Moved patch implementations into modular files (env.py, fsdp.py, mindspeed.py, model.py, utils.py) under swift/model/npu_patch/.
  • Kept backwards compatibility by retaining swift/model/npu_patcher.py as the import entry and applying patches on import.
  • Added a new CLI argument --enable_npu_model_patch (documented in EN/ZH) intended to skip only model-level patches.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Summary per file:

  • swift/model/npu_patcher.py: Compatibility entrypoint that imports the new package and applies patches on import
  • swift/model/npu_patch/__init__.py: Central patch application plus the new argv-based switch for model patches
  • swift/model/npu_patch/env.py: Sets a default HCCL_CONNECT_TIMEOUT on NPU
  • swift/model/npu_patch/fsdp.py: Refactors the Accelerate FSDP2 fp32-cast patch into its own module
  • swift/model/npu_patch/mindspeed.py: Extracts the MindSpeed TE CP compatibility patch
  • swift/model/npu_patch/model.py: Consolidates model-family-specific NPU patches (Qwen2/3/3.5 and MoE variants)
  • swift/model/npu_patch/utils.py: Shared helpers for optional imports and patch-map application
  • swift/arguments/base_args/base_args.py: Adds the enable_npu_model_patch argument to BaseArguments
  • docs/source_en/Instruction/Command-line-parameters.md: Documents enable_npu_model_patch (EN)
  • docs/source_en/BestPractices/NPU-support.md: Adds a best-practice section describing the model patch switch (EN)
  • docs/source/Instruction/Command-line-parameters.md: Documents enable_npu_model_patch (ZH)
  • docs/source/BestPractices/NPU-support.md: Adds a best-practice section describing the model patch switch (ZH)


Comment on lines +22 to +31
for i, arg in enumerate(sys.argv):
    if arg in _ENABLE_NPU_MODEL_PATCH_ARGS:
        if i + 1 >= len(sys.argv) or sys.argv[i + 1].startswith('--'):
            raise ValueError('--enable_npu_model_patch requires a value: true or false.')
        return _parse_model_patch_enabled(sys.argv[i + 1])
    if any(arg.startswith(f'{name}=') for name in _ENABLE_NPU_MODEL_PATCH_ARGS):
        value = arg.split('=', 1)[1]
        return _parse_model_patch_enabled(value)
return True
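Design note: in addition to the BaseArguments field, the package scans sys.argv directly so the switch can take effect at import time, before argument parsing runs; the scan above accepts both the space-separated form (--enable_npu_model_patch false) and the equals form (--enable_npu_model_patch=false).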

@Jintao-Huang (Collaborator)

thanks!

addsubmuldiv merged commit 7313772 into modelscope:main on Apr 28, 2026
2 of 3 checks passed