[bugfix] compat deepseek-v3 mcore 0.13.0 #6510
Conversation
Code Review
This pull request introduces compatibility fixes for megatron-core 0.13.0 and improves support for the deepseek-v3 model, particularly its Multi-head Latent Attention (MLA) mechanism when used with LoRA. A version check for megatron-core is added so that tensor parallelism for the down-projection layers is handled correctly across versions, and the weight mapping for q_a_layernorm is now handled correctly for MLA with LoRA. The code is clean and the changes are logical; overall, this is a good bugfix that improves compatibility and model support.
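The version gate the review describes could look something like the sketch below. This is a minimal illustration, not the PR's actual code: the helper name is hypothetical, and it assumes megatron-core is installed under its PyPI distribution name. The PR appears to expose the result as self.megatron_core_014.

```python
# A minimal sketch, assuming megatron-core is installed as the
# "megatron-core" distribution; the helper name is hypothetical.
from importlib.metadata import version as installed_version

from packaging import version


def is_mcore_at_least(target: str) -> bool:
    # Compare the installed megatron-core version against a threshold,
    # mirroring the kind of gate that the megatron_core_014 flag implies.
    return version.parse(installed_version('megatron-core')) >= version.parse(target)


megatron_core_014 = is_mcore_at_least('0.14.0')
```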
/gemini review
Code Review
This pull request introduces compatibility with megatron-core 0.13.0 and support for DeepSeek-V3 models. The changes add new command-line arguments for MoE routing (moe_router_num_groups, moe_router_group_topk), update their documentation, adjust configuration mappings, and introduce version-aware logic in gpt_bridge.py to handle differences between megatron-core versions.
The changes are well structured and consistent across the documentation, argument parsing, and model-bridging logic. I've found one potential issue in gpt_bridge.py where the tensor-parallel splitting logic for newer megatron-core versions seems incomplete; my review includes a suggestion to fix it.
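As a rough sketch of how two such routing flags might be registered, the snippet below uses plain argparse; the defaults and help strings are illustrative assumptions, not taken from the PR:

```python
import argparse

parser = argparse.ArgumentParser()
# Group-limited routing: experts are partitioned into groups, a top-k of
# groups is selected per token, then expert top-k runs within those groups.
parser.add_argument('--moe_router_num_groups', type=int, default=None,
                    help='Number of expert groups for group-limited routing.')
parser.add_argument('--moe_router_group_topk', type=int, default=None,
                    help='Number of groups selected per token.')

args = parser.parse_args(['--moe_router_num_groups', '8',
                          '--moe_router_group_topk', '4'])
print(args.moe_router_num_groups, args.moe_router_group_topk)  # 8 4
```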
```python
if not self.megatron_core_014:
    # https://github.com/NVIDIA/Megatron-LM/commit/720c8b40d8e7e2de1dd303d792f29093101c5e72
    dim0_keys.update({'linear_q_down_proj', 'linear_kv_down_proj'})
# RowLinear
dim1_keys = {'linear_proj', 'linear_fc2'}
```
For megatron-core >= 0.14.0, linear_q_down_proj and linear_kv_down_proj are RowParallelLinear, which should be split along dimension 1. You've correctly handled older versions by adding them to dim0_keys; however, for newer versions they should be added to dim1_keys to ensure correct tensor-parallel splitting. Without this, they won't be split correctly.
```python
# RowLinear
dim1_keys = {'linear_proj', 'linear_fc2'}
if not self.megatron_core_014:
    # https://github.com/NVIDIA/Megatron-LM/commit/720c8b40d8e7e2de1dd303d792f29093101c5e72
    dim0_keys.update({'linear_q_down_proj', 'linear_kv_down_proj'})
else:
    dim1_keys.update({'linear_q_down_proj', 'linear_kv_down_proj'})
```
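To make the dim-0 vs. dim-1 distinction concrete, here is a small standalone illustration (not code from the PR) of how a full weight matrix is partitioned across tensor-parallel ranks for the two linear types:

```python
import torch

tp_size = 2
weight = torch.randn(8, 16)  # [out_features, in_features]

# ColumnParallelLinear shards the output dimension, so the full weight
# is split along dim 0 (each rank holds a slice of the output rows).
dim0_shards = torch.chunk(weight, tp_size, dim=0)

# RowParallelLinear shards the input dimension, so the full weight
# is split along dim 1 (each rank holds a slice of the input columns).
dim1_shards = torch.chunk(weight, tp_size, dim=1)

assert dim0_shards[0].shape == (4, 16)
assert dim1_shards[0].shape == (8, 8)
```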