
[fix] adapt megatron and mindspeed for npu#8121

Merged
addsubmuldiv merged 3 commits into modelscope:main from jiaqiw09:main
Feb 27, 2026

Conversation

@jiaqiw09
Contributor

@jiaqiw09 jiaqiw09 commented Feb 26, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

1. Fix MindSpeed Argument Compatibility Issue

MindSpeed applies a patch to TransformerConfig. Currently, MegatronModelConfig inherits from TransformerConfig, and after calling super().__init__(), the patch becomes ineffective. As a result, the corresponding MindSpeed-specific arguments are missing, which causes MindSpeed to fail during parameter validation.

This issue has been identified and is being actively fixed. As a temporary workaround, we explicitly read the relevant arguments and assign default values to ensure compatibility.

Additionally, if Ascend hardware is not detected at runtime, the logic will return early to ensure proper isolation and decoupling.
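The workaround described above can be sketched as follows. This is a hedged illustration only: the function name `augment_mindspeed_defaults`, the `npu_available` parameter, and the specific default fields are assumptions, not MindSpeed's actual argument list or the merged code.

```python
from types import SimpleNamespace


def augment_mindspeed_defaults(config, npu_available: bool) -> None:
    """Re-apply MindSpeed-specific defaults that are lost after
    super().__init__() overwrites the patched TransformerConfig.

    Hypothetical sketch; the field names below are illustrative.
    """
    if not npu_available:
        # Early return keeps NPU-specific logic isolated from other backends.
        return
    mindspeed_defaults = {
        'use_flash_attn': True,   # assumed example field
        'noop_layers': None,      # assumed example field
    }
    for name, value in mindspeed_defaults.items():
        # Fill in only the fields the patched config would have provided,
        # so explicitly configured values are never overwritten.
        if not hasattr(config, name):
            setattr(config, name, value)


config = SimpleNamespace()
augment_mindspeed_defaults(config, npu_available=True)
print(config.use_flash_attn)  # → True
```

On non-Ascend hardware the function is a no-op, which gives the isolation and decoupling mentioned above.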

2. Fix NPU Not Entering TEDotProductAttention

For NPU to use TEDotProductAttention, use_flash_attn must be set to True.

Two modifications are required:

  • swift/megatron/pipelines/train/sft.py: Add the use_flash_attn argument and inject it directly into args, avoiding the need to introduce a new argument in megatron_arguments.py.

  • swift/megatron/model/model_config.py: Add setattr(config, 'use_flash_attn', True). This is required because of the MindSpeed compatibility issue; the additional explicit assignment ensures the execution path works correctly.
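The two modifications above can be sketched together in one place. The attribute names (use_flash_attn, attention_backend) mirror the PR description; the function name and control flow are assumptions, not the exact merged code.

```python
from types import SimpleNamespace


def enable_npu_flash_attn(args, config) -> None:
    """Hedged sketch of the two changes described above."""
    # sft.py side: inject use_flash_attn directly into args, so no new
    # field needs to be declared in megatron_arguments.py.
    if getattr(args, 'attention_backend', None) != 'local':
        args.use_flash_attn = True
    # model_config.py side: the explicit assignment guarantees the
    # TEDotProductAttention path is taken despite the MindSpeed
    # compatibility issue.
    setattr(config, 'use_flash_attn', getattr(args, 'use_flash_attn', False))


args = SimpleNamespace(attention_backend='flash')
config = SimpleNamespace()
enable_npu_flash_attn(args, config)
print(config.use_flash_attn)  # → True
```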

3. Add Ascend Megatron Example

Add an example for running Megatron on Ascend hardware.
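A rough sketch of what such a launch script could look like, for orientation only: the flag names below follow common swift/Megatron conventions and are assumptions, not the contents of the merged examples/ascend/megatron/train_sft_full.sh.

```
# Hypothetical sketch; verify flags against the merged script.
NPROC_PER_NODE=8 \
megatron sft \
    --model Qwen/Qwen3-4B \
    --tensor_model_parallel_size 2 \
    --pipeline_model_parallel_size 1 \
    --micro_batch_size 1 \
    --global_batch_size 16 \
    --attention_backend flash
```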

Experiment results

For Qwen3-4B, using examples/ascend/megatron/train_sft_full.sh for a loss comparison, the results look good.

[image: loss comparison for Qwen3-4B]

@jiaqiw09 jiaqiw09 changed the title [WIP] adapt megatron and mindspeed for npu [WIP] [fix] adapt megatron and mindspeed for npu Feb 26, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @jiaqiw09, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing compatibility and performance for Megatron models on NPU (Ascend) hardware. It addresses a critical issue where MindSpeed-specific arguments were not being correctly applied, and ensures that flash attention is properly enabled for NPU operations. Additionally, a new example script is provided to facilitate the use of Megatron on Ascend, streamlining the development and deployment process for NPU-based training.

Highlights

  • MindSpeed Argument Compatibility: Implemented a temporary workaround to address missing MindSpeed-specific arguments in MegatronModelConfig by explicitly reading and assigning default values. This ensures compatibility and prevents validation failures when running on NPU.
  • NPU Flash Attention Activation: Ensured that TEDotProductAttention is correctly utilized on NPU by explicitly setting use_flash_attn to True in both the SFT pipeline arguments and the Megatron model configuration when an NPU is detected and attention_backend is not 'local'.
  • New Ascend Megatron Example: Added a new shell script example (train_sft_full.sh) to demonstrate how to run Megatron SFT on Ascend hardware, including configurations for model parallelism and flash attention.


Changelog
  • examples/ascend/megatron/train_sft_full.sh
    • Added a new shell script to run Megatron SFT on Ascend NPU, configuring various training parameters including model and pipeline parallelism, and flash attention.
  • swift/megatron/model/model_config.py
    • Introduced a new method _augment_mindspeed_defaults to dynamically load and apply MindSpeed-specific default arguments to MegatronModelConfig if an NPU is available, resolving compatibility issues.
    • Modified the _check_npu method to include a check for num_experts being non-zero before comparing it against MAX_NPU_EXPERTS_PER_EP.
    • Added logic in get_mcore_model_config to explicitly set use_flash_attn to True for NPU when the attention backend is not 'local'.
  • swift/megatron/pipelines/train/sft.py
    • Adjusted the SFT pipeline initialization to ensure use_flash_attn is correctly set and passed to the repatch function when running on NPU with a non-local attention backend.
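The _check_npu change in the changelog above amounts to guarding the MoE comparison so dense models (where num_experts is None or 0) skip it. A hedged sketch; the constant's value and the function shape are illustrative assumptions:

```python
MAX_NPU_EXPERTS_PER_EP = 64  # illustrative value, not the real constant


def check_npu_experts(num_experts, expert_model_parallel_size: int) -> None:
    # num_experts is None/0 for dense (non-MoE) models; only compare
    # against the NPU limit when MoE is actually enabled.
    if not num_experts:
        return
    per_ep = num_experts // expert_model_parallel_size
    if per_ep > MAX_NPU_EXPERTS_PER_EP:
        raise ValueError(
            f'{per_ep} experts per EP rank exceeds the NPU limit '
            f'of {MAX_NPU_EXPERTS_PER_EP}')


check_npu_experts(None, 1)  # dense model: no-op instead of a TypeError
check_npu_experts(8, 2)     # 4 experts per EP rank: within the limit
```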
Activity
  • The pull request is marked as Work In Progress (WIP).
  • The author has requested an initial review of the logic, as experiment results are still pending.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces fixes to adapt Megatron and MindSpeed for NPU, along with a new example script for Ascend hardware. The changes are generally well-implemented, but I've identified a few areas for improvement. In swift/megatron/model/model_config.py, there's a debug print statement that should be removed. More critically, a broad except Exception clause could mask important errors, and an implicit reliance on sys.argv might cause unpredictable behavior. I've provided suggestions to address these. Additionally, in the new example script, there's a minor inconsistency in the output directory path that could be confusing for users.
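The remediation the review asks for can be illustrated in isolation. The function names below are hypothetical; the point is the pattern: a narrow except clause that only masks the expected failure, and an explicit argv parameter instead of an implicit reliance on sys.argv.

```python
import sys
from typing import Optional, Sequence


def parse_extra_args(argv: Optional[Sequence[str]] = None) -> dict:
    """Take argv explicitly; fall back to sys.argv only by default."""
    argv = list(sys.argv[1:] if argv is None else argv)
    parsed, it = {}, iter(argv)
    for token in it:
        if token.startswith('--'):
            # '--flag value' pairs; a bare trailing flag maps to True.
            parsed[token[2:]] = next(it, True)
    return parsed


def import_mindspeed_or_none():
    """Catch only ImportError, not a broad Exception that masks bugs."""
    try:
        import mindspeed  # noqa: F401
    except ImportError:  # narrow: "not installed", nothing else hidden
        return None
    return mindspeed


print(parse_extra_args(['--use-flash-attn', 'true']))
# → {'use-flash-attn': 'true'}
```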

@Jintao-Huang
Collaborator

Thank you. I don't have any questions. Just fix the suggestions from Gemini and it should be good.

@Jintao-Huang
Collaborator

hello! please pass the lint tests.

pip install pre-commit
pre-commit run --all-files

@jiaqiw09 jiaqiw09 changed the title [WIP] [fix] adapt megatron and mindspeed for npu [fix] adapt megatron and mindspeed for npu Feb 27, 2026
@addsubmuldiv addsubmuldiv merged commit 2fbb485 into modelscope:main Feb 27, 2026
2 of 5 checks passed