
Npu doc update #9245

Merged

addsubmuldiv merged 3 commits into modelscope:main from addsubmuldiv:npu_doc_update on May 6, 2026

Conversation

@addsubmuldiv (Collaborator) commented Apr 29, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

  • Restructure the NPU support guide to make support scope, usage paths, and environment setup easier to find.
  • Add container/local installation guidance, NPU availability checks, and optional MindSpeed/Megatron-SWIFT setup.
  • Add an end-to-end quick start using ModelScope model and dataset IDs.
  • Improve training, LoRA save/merge/resume, inference, deployment, evaluation, and release sections.
  • Expand the NPU FAQ with common troubleshooting notes.
  • Sync the English document with the Chinese version so both versions map one-to-one.

addsubmuldiv and others added 2 commits April 28, 2026 20:07
Co-authored-by: Copilot <copilot@github.com>
Copilot AI review requested due to automatic review settings April 29, 2026 16:26

Copilot AI left a comment


Pull request overview

Restructures the Ascend NPU support guide to make the supported scope, recommended setup paths, and end-to-end usage (train → merge → infer → deploy → troubleshoot) easier to follow, and keeps the English/Chinese docs aligned.

Changes:

  • Reorganizes the guide with “support scope”, “usage path”, and clearer environment preparation (container + local) plus an NPU availability check.
  • Adds an end-to-end ModelScope quick start and expands training/LoRA merge-resume/inference/deployment/evaluation/release sections.
  • Expands the NPU FAQ with practical troubleshooting guidance and syncs EN/ZH structure.
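The NPU availability check mentioned in the overview can be sketched roughly as follows (a minimal probe, assuming the guide's check goes through `torch_npu`; this is not code taken from the PR itself):

```python
# Minimal sketch of an NPU availability probe (assumption: the guide's
# check relies on torch_npu registering the 'npu' device with PyTorch).
def npu_available() -> bool:
    try:
        import torch
        import torch_npu  # noqa: F401  -- side effect: enables torch.npu
    except ImportError:
        # torch / torch_npu not installed: no NPU stack in this environment
        return False
    return bool(torch.npu.is_available())

print("NPU available:", npu_available())
```

Running this in the target environment distinguishes "driver/stack missing" (ImportError path) from "stack installed but device not visible" (`is_available()` returns False).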

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| docs/source_en/BestPractices/NPU-support.md | Major restructure + quick start + expanded sections/FAQ for NPU workflows (EN). |
| docs/source/BestPractices/NPU-support.md | Mirrors the same restructure/content improvements to keep parity with the EN version (ZH). |


| **DPO** | Qwen3-8B | deepspeed | vllm-ascend | Atlas 900 A2 PODc |
| **PPO** | Qwen2.5-7B-Instruct | deepspeed | vllm-ascend | Atlas 900 A2 PODc |
| **PPO** | Qwen3-8B | deepspeed | vllm-ascend | Atlas 900 A2 PODc |
If you need to publish NPU-trained checkpoints, merged models, or quantized models to ModelScope/HuggingFace, use the push capability in `swift export`. For complete arguments and examples, see [Export and Push](../Instruction/Export-and-push.md#push-models).

Copilot AI Apr 29, 2026


The link target ../Instruction/Export-and-push.md#push-models appears to be a broken anchor. The referenced doc uses the heading "Push Model" (anchor is likely #push-model), so this link won’t jump to the intended section. Please update the fragment to match the actual heading/anchor (or drop the fragment and link to the doc section without an anchor).

Suggested change
If you need to publish NPU-trained checkpoints, merged models, or quantized models to ModelScope/HuggingFace, use the push capability in `swift export`. For complete arguments and examples, see [Export and Push](../Instruction/Export-and-push.md#push-models).
If you need to publish NPU-trained checkpoints, merged models, or quantized models to ModelScope/HuggingFace, use the push capability in `swift export`. For complete arguments and examples, see [Export and Push](../Instruction/Export-and-push.md#push-model).
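For context, the push flow referenced in that doc line can be sketched as follows (hedged: the argument names follow ms-swift's `swift export` conventions as I understand them, and the checkpoint path, model ID, and token are placeholders, not values from this PR):

```shell
# Hypothetical example: merge a LoRA checkpoint and push it to the hub.
# All paths/IDs below are placeholders -- consult Export-and-push.md for
# the authoritative argument list.
swift export \
    --adapters output/checkpoint-xxx \
    --merge_lora true \
    --push_to_hub true \
    --hub_model_id your-org/your-model \
    --hub_token <your_token>
```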

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request significantly enhances the NPU support documentation in both Chinese and English. Key additions include a comprehensive support scope matrix, detailed instructions for container-based environment setup, a quick-start guide for a full LoRA training and deployment workflow, and an extensive FAQ section for troubleshooting common NPU issues. The review feedback identifies several critical improvements: correcting the arguments for the image building script, verifying potentially incorrect version numbers for vLLM packages, and ensuring SOC versions and environment variables are configured for maximum compatibility and safety.

```shell
git clone https://github.com/modelscope/modelscope.git
cd modelscope
DOCKER_REGISTRY=ms-swift python docker/build_image.py \
```
Contributor


high

The build_image.py script in the ModelScope repository uses the --repository argument to set the image name, rather than the DOCKER_REGISTRY environment variable. Setting the environment variable will likely be ignored, resulting in the default repository name being used, which will cause subsequent steps (like docker run) to fail because the expected image name won't exist.

Suggested change
DOCKER_REGISTRY=ms-swift python docker/build_image.py \
python docker/build_image.py --repository ms-swift \
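Once the image is built, launching it typically requires mounting the host's Ascend driver and device nodes into the container. A sketch of the standard Ascend `docker run` invocation (the image tag is a placeholder, and device indices depend on the host):

```shell
# Standard Ascend device/driver mounts; add more --device /dev/davinciN
# entries for multi-card hosts. Image tag is a placeholder.
docker run -it --rm \
    --device /dev/davinci0 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    ms-swift:<tag> \
    bash
```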

Comment on lines 179 to 180
```shell
pip install vllm==0.14.0
pip install vllm-ascend==0.14.0rc1
```
Contributor


high

The version numbers 0.14.0 for vllm and 0.14.0rc1 for vllm-ascend appear to be incorrect or refer to non-public packages. Upstream vllm is currently at version 0.6.x, and vllm-ascend is typically at 0.1.x on PyPI. If these versions are hosted on a specific index, please provide the --index-url in the command; otherwise, users will encounter "No matching distribution found" errors.
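Whichever pins end up in the doc, readers can verify that a pinned version actually exists on their index before installing (a sketch; `pip index` is available in pip >= 21.2, and the index URL is a placeholder):

```shell
# Sanity-check the available versions before pinning:
pip index versions vllm
pip index versions vllm-ascend
# If the versions live on a private index, pass it explicitly, e.g.:
# pip install vllm==<version> --extra-index-url <your_index_url>
```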

```shell
git clone https://github.com/modelscope/modelscope.git
cd modelscope
DOCKER_REGISTRY=ms-swift python docker/build_image.py \
```
Contributor


high

The build_image.py script in the ModelScope repository uses the --repository argument to set the image name, rather than the DOCKER_REGISTRY environment variable. Setting the environment variable will likely be ignored, resulting in the default repository name being used, which will cause subsequent steps (like docker run) to fail because the expected image name won't exist.

Suggested change
DOCKER_REGISTRY=ms-swift python docker/build_image.py \
python docker/build_image.py --repository ms-swift \

Comment on lines 181 to 182
```shell
pip install vllm==0.14.0
pip install vllm-ascend==0.14.0rc1
```
Contributor


high

The version numbers 0.14.0 for vllm and 0.14.0rc1 for vllm-ascend appear to be incorrect or refer to non-public packages. Upstream vllm is currently at version 0.6.x, and vllm-ascend is typically at 0.1.x on PyPI. If these versions are hosted on a specific index, please provide the --index-url in the command; otherwise, users will encounter "No matching distribution found" errors.

```shell
DOCKER_REGISTRY=ms-swift python docker/build_image.py \
    --image_type ascend \
    --python_version 3.11.11 \
    --soc_version ascend910b1 \
```
Contributor


medium

The soc_version is set to ascend910b1, but the document states that the test environment uses 910B3 (line 26). It is recommended to use the specific SOC version matching the hardware for better compatibility and performance.

Suggested change
--soc_version ascend910b1 \
--soc_version ascend910b3 \
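To pick the right value, readers can query the chip actually present on the host (a sketch; `npu-smi` ships with the Ascend driver and its exact output format may vary by driver version):

```shell
# Query the installed NPUs; the Name column (e.g. 910B1 / 910B3)
# indicates which ascend910bX value to pass as --soc_version.
npu-smi info
```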

Before running Megatron-SWIFT, `PYTHONPATH` and `MEGATRON_LM_PATH` must point to the same Megatron-LM source tree. Otherwise Python may start successfully while importing a different Megatron/MindSpeed combination, which can make later errors look like model or argument issues.

```shell
export PYTHONPATH=$PYTHONPATH:<your_local_megatron_lm_path>
```
Contributor


medium

It is safer to prepend the Megatron-LM path to PYTHONPATH to ensure it takes precedence over any existing installations in the environment.

Suggested change
export PYTHONPATH=$PYTHONPATH:<your_local_megatron_lm_path>
export PYTHONPATH=<your_local_megatron_lm_path>:$PYTHONPATH
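The precedence point can be illustrated with a small helper (a sketch; `prepend_path` is a hypothetical name, not part of the repo):

```python
import os

def prepend_path(var_value: str, new_path: str) -> str:
    """Return a PYTHONPATH-style string with new_path first.

    Prepending (rather than appending) guarantees that `import megatron`
    resolves to the intended source tree, not a previously installed copy.
    """
    parts = [p for p in var_value.split(os.pathsep) if p and p != new_path]
    return os.pathsep.join([new_path] + parts)

# To verify which copy actually wins in your shell:
#   python -c "import megatron; print(megatron.__file__)"
```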

```shell
DOCKER_REGISTRY=ms-swift python docker/build_image.py \
    --image_type ascend \
    --python_version 3.11.11 \
    --soc_version ascend910b1 \
```
Contributor


medium

The soc_version is set to ascend910b1, but the document states that the test environment uses 910B3 (line 26). It is recommended to use the specific SOC version matching the hardware for better compatibility and performance.

Suggested change
--soc_version ascend910b1 \
--soc_version ascend910b3 \

Before running Megatron-SWIFT, `PYTHONPATH` and `MEGATRON_LM_PATH` must point to the same Megatron-LM source tree. Otherwise Python may start successfully while importing a different Megatron/MindSpeed combination, which can make later errors look like model or argument issues.

```shell
export PYTHONPATH=$PYTHONPATH:<your_local_megatron_lm_path>
```
Contributor


medium

It is safer to prepend the Megatron-LM path to PYTHONPATH to ensure it takes precedence over any existing installations in the environment.

Suggested change
export PYTHONPATH=$PYTHONPATH:<your_local_megatron_lm_path>
export PYTHONPATH=<your_local_megatron_lm_path>:$PYTHONPATH

@addsubmuldiv addsubmuldiv merged commit 76aaf5a into modelscope:main May 6, 2026
1 check passed