
[fix] adapt megatron and mindspeed for npu#8121

Merged
addsubmuldiv merged 3 commits into modelscope:main from jiaqiw09:main
Feb 27, 2026

Conversation

@jiaqiw09
Contributor

@jiaqiw09 jiaqiw09 commented Feb 26, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

1. Fix MindSpeed Argument Compatibility Issue

MindSpeed applies a patch to TransformerConfig. Currently, MegatronModelConfig inherits from TransformerConfig, and after calling super().__init__(), the patch becomes ineffective. As a result, the corresponding MindSpeed-specific arguments are missing, which causes MindSpeed to fail during parameter validation.

This issue has been identified and is being actively fixed. As a temporary workaround, we explicitly read the relevant arguments and assign default values to ensure compatibility.

Additionally, if Ascend hardware is not detected at runtime, the logic will return early to ensure proper isolation and decoupling.
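The workaround described above can be sketched as follows. This is a hedged illustration only: the function name `augment_mindspeed_defaults`, the `npu_available` parameter, and the specific default fields are assumptions, not MindSpeed's actual argument list or the merged code.

```python
from types import SimpleNamespace


def augment_mindspeed_defaults(config, npu_available: bool) -> None:
    """Re-apply MindSpeed-specific defaults that are lost after
    super().__init__() overwrites the patched TransformerConfig.

    Hypothetical sketch; the field names below are illustrative.
    """
    if not npu_available:
        # Early return keeps NPU-specific logic isolated from other backends.
        return
    mindspeed_defaults = {
        'use_flash_attn': True,   # assumed example field
        'noop_layers': None,      # assumed example field
    }
    for name, value in mindspeed_defaults.items():
        # Fill in only the fields the patched config would have provided,
        # so explicitly configured values are never overwritten.
        if not hasattr(config, name):
            setattr(config, name, value)


config = SimpleNamespace()
augment_mindspeed_defaults(config, npu_available=True)
print(config.use_flash_attn)  # → True
```

On non-Ascend hardware the function is a no-op, which gives the isolation and decoupling mentioned above.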

2. Fix NPU Not Entering TEDotProductAttention

For NPU to use TEDotProductAttention, use_flash_attn must be set to True.

Two modifications are required:

  • swift/megatron/pipelines/train/sft.py: Add the use_flash_attn argument and inject it directly into args, avoiding the need to introduce a new argument in megatron_arguments.py.

  • swift/megatron/model/model_config.py: Add setattr(config, 'use_flash_attn', True). This is required because of the MindSpeed compatibility issue; the additional explicit assignment ensures the execution path works correctly.
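The two modifications above can be sketched together in one place. The attribute names (use_flash_attn, attention_backend) mirror the PR description; the function name and control flow are assumptions, not the exact merged code.

```python
from types import SimpleNamespace


def enable_npu_flash_attn(args, config) -> None:
    """Hedged sketch of the two changes described above."""
    # sft.py side: inject use_flash_attn directly into args, so no new
    # field needs to be declared in megatron_arguments.py.
    if getattr(args, 'attention_backend', None) != 'local':
        args.use_flash_attn = True
    # model_config.py side: the explicit assignment guarantees the
    # TEDotProductAttention path is taken despite the MindSpeed
    # compatibility issue.
    setattr(config, 'use_flash_attn', getattr(args, 'use_flash_attn', False))


args = SimpleNamespace(attention_backend='flash')
config = SimpleNamespace()
enable_npu_flash_attn(args, config)
print(config.use_flash_attn)  # → True
```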

3. Add Ascend Megatron Example

Add an example for running Megatron on Ascend hardware.
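A rough sketch of what such a launch script could look like, for orientation only: the flag names below follow common swift/Megatron conventions and are assumptions, not the contents of the merged examples/ascend/megatron/train_sft_full.sh.

```
# Hypothetical sketch; verify flags against the merged script.
NPROC_PER_NODE=8 \
megatron sft \
    --model Qwen/Qwen3-4B \
    --tensor_model_parallel_size 2 \
    --pipeline_model_parallel_size 1 \
    --micro_batch_size 1 \
    --global_batch_size 16 \
    --attention_backend flash
```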

Experiment results

For Qwen3-4B, using examples/ascend/megatron/train_sft_full.sh for a loss comparison, the results look good.

[image: loss comparison for Qwen3-4B]

@jiaqiw09 jiaqiw09 changed the title [WIP] adapt megatron and mindspeed for npu [WIP] [fix] adapt megatron and mindspeed for npu Feb 26, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @jiaqiw09, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing compatibility and performance for Megatron models on NPU (Ascend) hardware. It addresses a critical issue where MindSpeed-specific arguments were not being correctly applied, and ensures that flash attention is properly enabled for NPU operations. Additionally, a new example script is provided to facilitate the use of Megatron on Ascend, streamlining the development and deployment process for NPU-based training.

Highlights

  • MindSpeed Argument Compatibility: Implemented a temporary workaround to address missing MindSpeed-specific arguments in MegatronModelConfig by explicitly reading and assigning default values. This ensures compatibility and prevents validation failures when running on NPU.
  • NPU Flash Attention Activation: Ensured that TEDotProductAttention is correctly utilized on NPU by explicitly setting use_flash_attn to True in both the SFT pipeline arguments and the Megatron model configuration when an NPU is detected and attention_backend is not 'local'.
  • New Ascend Megatron Example: Added a new shell script example (train_sft_full.sh) to demonstrate how to run Megatron SFT on Ascend hardware, including configurations for model parallelism and flash attention.


Changelog
  • examples/ascend/megatron/train_sft_full.sh
    • Added a new shell script to run Megatron SFT on Ascend NPU, configuring various training parameters including model and pipeline parallelism, and flash attention.
  • swift/megatron/model/model_config.py
    • Introduced a new method _augment_mindspeed_defaults to dynamically load and apply MindSpeed-specific default arguments to MegatronModelConfig if an NPU is available, resolving compatibility issues.
    • Modified the _check_npu method to include a check for num_experts being non-zero before comparing it against MAX_NPU_EXPERTS_PER_EP.
    • Added logic in get_mcore_model_config to explicitly set use_flash_attn to True for NPU when the attention backend is not 'local'.
  • swift/megatron/pipelines/train/sft.py
    • Adjusted the SFT pipeline initialization to ensure use_flash_attn is correctly set and passed to the repatch function when running on NPU with a non-local attention backend.
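The _check_npu change in the changelog above amounts to guarding the MoE comparison so dense models (where num_experts is None or 0) skip it. A hedged sketch; the constant's value and the function shape are illustrative assumptions:

```python
MAX_NPU_EXPERTS_PER_EP = 64  # illustrative value, not the real constant


def check_npu_experts(num_experts, expert_model_parallel_size: int) -> None:
    # num_experts is None/0 for dense (non-MoE) models; only compare
    # against the NPU limit when MoE is actually enabled.
    if not num_experts:
        return
    per_ep = num_experts // expert_model_parallel_size
    if per_ep > MAX_NPU_EXPERTS_PER_EP:
        raise ValueError(
            f'{per_ep} experts per EP rank exceeds the NPU limit '
            f'of {MAX_NPU_EXPERTS_PER_EP}')


check_npu_experts(None, 1)  # dense model: no-op instead of a TypeError
check_npu_experts(8, 2)     # 4 experts per EP rank: within the limit
```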
Activity
  • The pull request is marked as Work In Progress (WIP).
  • The author has requested an initial review of the logic, as experiment results are still pending.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces fixes to adapt Megatron and MindSpeed for NPU, along with a new example script for Ascend hardware. The changes are generally well-implemented, but I've identified a few areas for improvement. In swift/megatron/model/model_config.py, there's a debug print statement that should be removed. More critically, a broad except Exception clause could mask important errors, and an implicit reliance on sys.argv might cause unpredictable behavior. I've provided suggestions to address these. Additionally, in the new example script, there's a minor inconsistency in the output directory path that could be confusing for users.
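The remediation the review asks for can be illustrated in isolation. The function names below are hypothetical; the point is the pattern: a narrow except clause that only masks the expected failure, and an explicit argv parameter instead of an implicit reliance on sys.argv.

```python
import sys
from typing import Optional, Sequence


def parse_extra_args(argv: Optional[Sequence[str]] = None) -> dict:
    """Take argv explicitly; fall back to sys.argv only by default."""
    argv = list(sys.argv[1:] if argv is None else argv)
    parsed, it = {}, iter(argv)
    for token in it:
        if token.startswith('--'):
            # '--flag value' pairs; a bare trailing flag maps to True.
            parsed[token[2:]] = next(it, True)
    return parsed


def import_mindspeed_or_none():
    """Catch only ImportError, not a broad Exception that masks bugs."""
    try:
        import mindspeed  # noqa: F401
    except ImportError:  # narrow: "not installed", nothing else hidden
        return None
    return mindspeed


print(parse_extra_args(['--use-flash-attn', 'true']))
# → {'use-flash-attn': 'true'}
```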

@Jintao-Huang
Collaborator

Thank you. I don't have any questions. Just fix the suggestions from Gemini and it should be good.

@Jintao-Huang
Collaborator

hello! please pass the lint tests.

pip install pre-commit
pre-commit run --all-files

@jiaqiw09 jiaqiw09 changed the title [WIP] [fix] adapt megatron and mindspeed for npu [fix] adapt megatron and mindspeed for npu Feb 27, 2026
@addsubmuldiv addsubmuldiv merged commit 2fbb485 into modelscope:main Feb 27, 2026
2 of 5 checks passed