Npu doc update by addsubmuldiv · Pull Request #9245 · modelscope/ms-swift

addsubmuldiv · 2026-04-29T16:26:52Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

- Restructure the NPU support guide to make support scope, usage paths, and environment setup easier to find.
- Add container/local installation guidance, NPU availability checks, and optional MindSpeed/Megatron-SWIFT setup.
- Add an end-to-end quick start using ModelScope model and dataset IDs.
- Improve training, LoRA save/merge/resume, inference, deployment, evaluation, and release sections.
- Expand the NPU FAQ with common troubleshooting notes.
- Sync the English document with the Chinese version so both versions map one-to-one.

Co-authored-by: Copilot <copilot@github.com>

Copilot

Pull request overview

Restructures the Ascend NPU support guide to make the supported scope, recommended setup paths, and end-to-end usage (train → merge → infer → deploy → troubleshoot) easier to follow, and keeps the English/Chinese docs aligned.

Changes:

- Reorganizes the guide with “support scope”, “usage path”, and clearer environment preparation (container + local) plus an NPU availability check.
- Adds an end-to-end ModelScope quick start and expands training/LoRA merge-resume/inference/deployment/evaluation/release sections.
- Expands the NPU FAQ with practical troubleshooting guidance and syncs EN/ZH structure.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| docs/source_en/BestPractices/NPU-support.md | Major restructure + quick start + expanded sections/FAQ for NPU workflows (EN). |
| docs/source/BestPractices/NPU-support.md | Mirrors the same restructure/content improvements to keep parity with EN (ZH). |

Copilot · 2026-04-29T16:31:05Z

-| **DPO**   | Qwen3-8B            | deepspeed | vllm-ascend    | Atlas 900 A2 PODc |
-| **PPO**   | Qwen2.5-7B-Instruct | deepspeed | vllm-ascend    | Atlas 900 A2 PODc |
-| **PPO**   | Qwen3-8B            | deepspeed | vllm-ascend    | Atlas 900 A2 PODc |
+If you need to publish NPU-trained checkpoints, merged models, or quantized models to ModelScope/HuggingFace, use the push capability in `swift export`. For complete arguments and examples, see [Export and Push](../Instruction/Export-and-push.md#push-models).


The link target ../Instruction/Export-and-push.md#push-models appears to be a broken anchor. The referenced doc uses the heading "Push Model" (anchor is likely #push-model), so this link won’t jump to the intended section. Please update the fragment to match the actual heading/anchor (or drop the fragment and link to the doc section without an anchor).

Suggested change

-If you need to publish NPU-trained checkpoints, merged models, or quantized models to ModelScope/HuggingFace, use the push capability in `swift export`. For complete arguments and examples, see [Export and Push](../Instruction/Export-and-push.md#push-models).
+If you need to publish NPU-trained checkpoints, merged models, or quantized models to ModelScope/HuggingFace, use the push capability in `swift export`. For complete arguments and examples, see [Export and Push](../Instruction/Export-and-push.md#push-model).
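
One quick way to sanity-check a fragment like this is to derive the slug from the heading text. The sketch below assumes the common GitHub-style rule (lowercase, drop punctuation, spaces become hyphens) and ASCII-only headings. `slugify_heading` is a hypothetical helper name, and the docs site's renderer may generate anchors differently, so treat the result as a hint rather than proof.

```shell
# Hypothetical helper: approximate a GitHub-style anchor for an ASCII heading.
slugify_heading() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9 _-]//g' -e 's/ /-/g'   # drop punctuation, then spaces -> hyphens
}

slugify_heading "Push Model"   # prints: push-model
```

Under that assumption the heading "Push Model" maps to `#push-model`, which matches the fragment in the suggested change.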

gemini-code-assist

Code Review

This pull request significantly enhances the NPU support documentation in both Chinese and English. Key additions include a comprehensive support scope matrix, detailed instructions for container-based environment setup, a quick-start guide for a full LoRA training and deployment workflow, and an extensive FAQ section for troubleshooting common NPU issues. The review feedback identifies several critical improvements: correcting the arguments for the image building script, verifying potentially incorrect version numbers for vLLM packages, and ensuring SOC versions and environment variables are configured for maximum compatibility and safety.

gemini-code-assist · 2026-04-29T16:31:56Z

+```shell
+git clone https://github.com/modelscope/modelscope.git
+cd modelscope
+DOCKER_REGISTRY=ms-swift python docker/build_image.py \


The build_image.py script in the ModelScope repository uses the --repository argument to set the image name, rather than the DOCKER_REGISTRY environment variable. Setting the environment variable will likely be ignored, resulting in the default repository name being used, which will cause subsequent steps (like docker run) to fail because the expected image name won't exist.

Suggested change

-DOCKER_REGISTRY=ms-swift python docker/build_image.py \
+python docker/build_image.py --repository ms-swift \

gemini-code-assist · 2026-04-29T16:31:56Z

 pip install vllm==0.14.0
 pip install vllm-ascend==0.14.0rc1


The version numbers 0.14.0 for vllm and 0.14.0rc1 for vllm-ascend appear to be incorrect or refer to non-public packages. Upstream vllm is currently at version 0.6.x, and vllm-ascend is typically at 0.1.x on PyPI. If these versions are hosted on a specific index, please provide the --index-url in the command; otherwise, users will encounter "No matching distribution found" errors.
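
A claim like this can be checked against the package index before changing the docs. The sketch below assumes a pip release that ships the experimental `pip index versions` subcommand and that its output contains a line starting with `Available versions:`. `has_version` is a hypothetical helper, and the live commands are left as comments because they need network access.

```shell
# Hypothetical helper: succeed if version $2 is listed in the
# "Available versions:" line of the `pip index versions` output passed as $1.
has_version() {
  printf '%s\n' "$1" \
    | sed -n 's/^Available versions: //p' \
    | tr -d ' ' \
    | tr ',' '\n' \
    | grep -Fxq -- "$2"
}

# Live usage (assumes network access; --pre is needed to list release candidates):
#   out=$(pip index versions vllm 2>/dev/null)
#   has_version "$out" 0.14.0 && echo "vllm 0.14.0 is published"
#   out=$(pip index versions --pre vllm-ascend 2>/dev/null)
#   has_version "$out" 0.14.0rc1 && echo "vllm-ascend 0.14.0rc1 is published"
```

If a pinned version is missing from the default index, that is the point at which an explicit `--index-url` in the install command becomes necessary.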

gemini-code-assist · 2026-04-29T16:31:57Z

+DOCKER_REGISTRY=ms-swift python docker/build_image.py \
+  --image_type ascend \
+  --python_version 3.11.11 \
+  --soc_version ascend910b1 \


The soc_version is set to ascend910b1, but the document states that the test environment uses 910B3 (line 26). It is recommended to use the specific SOC version matching the hardware for better compatibility and performance.

Suggested change

---soc_version ascend910b1 \
+--soc_version ascend910b3 \
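
To avoid hard-coding a mismatched value, the flag can be derived from the chip name reported on the host. This is a sketch only: it assumes the `ascend<lowercased chip name>` convention implied by the comment above (`910B3` becomes `ascend910b3`), and `soc_version_from_chip` is a hypothetical helper. Whether `build_image.py` accepts every derived value is not confirmed here.

```shell
# Hypothetical helper: map a chip name such as "910B3" to "ascend910b3".
soc_version_from_chip() {
  printf 'ascend%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# The chip name can be read from `npu-smi info` on the target machine
# (inspect its output manually; the exact layout varies by driver version).
soc_version_from_chip 910B3   # prints: ascend910b3
```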

gemini-code-assist · 2026-04-29T16:31:57Z

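+Before running Megatron-SWIFT, `PYTHONPATH` and `MEGATRON_LM_PATH` must point to the same Megatron-LM source tree. Otherwise Python may start, but what actually gets imported is a different Megatron/MindSpeed combination, and the later errors will look very much like model or argument problems.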
+
+```shell
+export PYTHONPATH=$PYTHONPATH:<your_local_megatron_lm_path>


It is safer to prepend the Megatron-LM path to PYTHONPATH to ensure it takes precedence over any existing installations in the environment.

Suggested change

-export PYTHONPATH=$PYTHONPATH:<your_local_megatron_lm_path>
+export PYTHONPATH=<your_local_megatron_lm_path>:$PYTHONPATH
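
A slightly more defensive form of the prepend is sketched below. It is not part of the PR. `prepend_pythonpath` is a hypothetical helper that uses the `${VAR:+...}` expansion so that no dangling colon is left behind when `PYTHONPATH` starts out empty or unset.

```shell
# Hypothetical helper: put directory $1 at the front of PYTHONPATH.
prepend_pythonpath() {
  # ${PYTHONPATH:+:$PYTHONPATH} expands to ":<old value>" only when
  # PYTHONPATH is already set and non-empty; otherwise it expands to nothing.
  export PYTHONPATH="$1${PYTHONPATH:+:$PYTHONPATH}"
}

# Usage, with the same placeholder path as in the docs:
#   prepend_pythonpath "<your_local_megatron_lm_path>"
```

Because Python searches `PYTHONPATH` entries in order, the prepended directory is consulted before any entries that were already present.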

gemini-code-assist · 2026-04-29T16:31:57Z

+Before running Megatron-SWIFT, `PYTHONPATH` and `MEGATRON_LM_PATH` must point to the same Megatron-LM source tree. Otherwise Python may start successfully while importing a different Megatron/MindSpeed combination, which can make later errors look like model or argument issues.
+
+```shell
+export PYTHONPATH=$PYTHONPATH:<your_local_megatron_lm_path>


It is safer to prepend the Megatron-LM path to PYTHONPATH to ensure it takes precedence over any existing installations in the environment.

Suggested change

-export PYTHONPATH=$PYTHONPATH:<your_local_megatron_lm_path>
+export PYTHONPATH=<your_local_megatron_lm_path>:$PYTHONPATH
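
The "same source tree" requirement in the quoted paragraph can be checked before launching a job. The sketch below is a minimal pre-flight check, assuming both variables hold plain directory paths without symlinks. `megatron_path_in_pythonpath` is a hypothetical helper, not something ms-swift provides.

```shell
# Hypothetical check: succeed when $MEGATRON_LM_PATH is one of the
# colon-separated entries of $PYTHONPATH (a trailing slash is tolerated).
megatron_path_in_pythonpath() {
  [ -n "${MEGATRON_LM_PATH:-}" ] || return 1
  case ":${PYTHONPATH:-}:" in
    *":${MEGATRON_LM_PATH%/}:"* | *":${MEGATRON_LM_PATH%/}/:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   megatron_path_in_pythonpath || echo "MEGATRON_LM_PATH is not on PYTHONPATH" >&2
# To see which copy Python actually imports (assumes the package is named `megatron`):
#   python -c "import megatron; print(megatron.__file__)"
```

A string comparison like this does not resolve symlinks or relative paths, so the import check in the last comment is the more reliable confirmation.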

addsubmuldiv and others added 2 commits April 28, 2026 20:07

update npu doc

d87cf19

update docs

debb326

Co-authored-by: Copilot <copilot@github.com>

Copilot AI review requested due to automatic review settings April 29, 2026 16:26

Copilot started reviewing on behalf of addsubmuldiv April 29, 2026 16:27

Copilot AI reviewed Apr 29, 2026

gemini-code-assist Bot reviewed Apr 29, 2026

Jintao-Huang approved these changes Apr 30, 2026

Merge branch 'main' into npu_doc_update

547f195

addsubmuldiv merged commit 76aaf5a into modelscope:main May 6, 2026
1 check passed

Npu doc update #9245
addsubmuldiv merged 3 commits into modelscope:main from addsubmuldiv:npu_doc_update

3 participants
