Conversation

@Jintao-Huang (Collaborator)

No description provided.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @Jintao-Huang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements compatibility fixes in the gpt_bridge module so that deepseek-v3 models integrate cleanly across different versions of megatron.core. It introduces version-aware logic to handle a change in tensor-parallelism split dimensions and refines the assignment of layer-normalization weights for LoRA configurations, improving the robustness of the model-bridging mechanism.

Highlights

  • Megatron-Core Version Compatibility: Introduced a version check for megatron.core to dynamically adapt behavior based on whether the installed version is 0.14.0rc0 or newer, ensuring broader compatibility (see the sketch after this list).
  • Tensor Parallelism Split Dimension Fix: Adjusted the tensor parallelism split dimensions for linear_q_down_proj and linear_kv_down_proj to maintain compatibility with older megatron.core versions (prior to 0.14.0rc0), addressing a specific change in the Megatron-LM API.
  • LoRA QK-Layernorm Support: Added logic to correctly set layer normalization weights for linear_q_up_proj when q_lora_rank is specified and QK-Layernorm is enabled, enhancing LoRA compatibility.
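
A minimal sketch of how the version gate above might be implemented; the flag name megatron_core_014 comes from the diff excerpt further down, while the distribution name 'megatron-core' and the rest of the snippet are assumptions for illustration, not the PR's actual code:

from importlib.metadata import version

from packaging.version import Version

# True when the installed megatron-core is 0.14.0rc0 or newer; downstream
# code can branch on this single flag instead of scattering version checks.
megatron_core_014 = Version(version('megatron-core')) >= Version('0.14.0rc0')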

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request introduces compatibility fixes for megatron-core version 0.13.0 and enhances support for the deepseek-v3 model, particularly its Multi-Latent Attention (MLA) mechanism when using LoRA. The changes are well-implemented and address the intended goals. A version check for megatron-core is added to correctly handle tensor parallelism for down-projection layers across different versions. Additionally, the weight mapping for q_a_layernorm is correctly handled for MLA with LoRA. The code is clean and the changes are logical. Overall, this is a good bugfix that improves compatibility and model support.
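
For concreteness, a hypothetical sketch of the q_a_layernorm weight mapping the review refers to; the attribute paths on the Hugging Face (hf_attn) and Megatron (mg_attn) sides are assumptions, not code from this PR:

import torch

def assign_qk_layernorm(mg_attn, hf_attn, q_lora_rank, qk_layernorm):
    # Assumed layout: with MLA + low-rank Q (q_lora_rank set) and
    # qk-layernorm enabled, Megatron fuses the norm into linear_q_up_proj,
    # so the HF q_a_layernorm weight is copied there rather than into a
    # standalone layernorm module.
    if q_lora_rank is not None and qk_layernorm:
        with torch.no_grad():
            mg_attn.linear_q_up_proj.layer_norm_weight.copy_(hf_attn.q_a_layernorm.weight)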

@Jintao-Huang (Collaborator, Author)

/gemini review

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request introduces compatibility for DeepSeek-V3 models and megatron-core version 0.13.0. The changes include adding new command-line arguments for MoE routing (moe_router_num_groups, moe_router_group_topk), updating their documentation, and adjusting configuration mappings. The code also introduces version-aware logic in gpt_bridge.py to handle differences between megatron-core versions.

The changes are well-structured and consistent across the documentation, argument parsing, and model bridging logic. I've found one potential issue in gpt_bridge.py where the tensor parallel splitting logic for newer megatron-core versions seems incomplete. My review includes a suggestion to fix this.
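
As a rough illustration of the new arguments, they might be registered along these lines; only the flag names come from the review above, while types, defaults, and help strings are assumptions:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--moe_router_num_groups', type=int, default=None,
                    help='Number of groups to divide experts into for group-limited routing.')
parser.add_argument('--moe_router_group_topk', type=int, default=None,
                    help='Number of groups selected per token for group-limited routing.')

# Example: DeepSeek-V3-style routing uses 8 groups with top-4 group selection.
args = parser.parse_args(['--moe_router_num_groups', '8', '--moe_router_group_topk', '4'])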

Comment on lines +83 to 87
if not self.megatron_core_014:
# https://github.com/NVIDIA/Megatron-LM/commit/720c8b40d8e7e2de1dd303d792f29093101c5e72
dim0_keys.update({'linear_q_down_proj', 'linear_kv_down_proj'})
# RowLinear
dim1_keys = {'linear_proj', 'linear_fc2'}
@gemini-code-assist (Contributor)

Severity: high

For megatron-core>=0.14.0, linear_q_down_proj and linear_kv_down_proj are RowParallelLinear, which should be split along dimension 1. You've correctly handled the case for older versions by adding them to dim0_keys. However, for newer versions, they should be added to dim1_keys to ensure correct tensor parallelism splitting. Without this, they won't be split correctly.

        # RowLinear
        dim1_keys = {'linear_proj', 'linear_fc2'}
        if not self.megatron_core_014:
            # https://github.com/NVIDIA/Megatron-LM/commit/720c8b40d8e7e2de1dd303d792f29093101c5e72
            dim0_keys.update({'linear_q_down_proj', 'linear_kv_down_proj'})
        else:
            dim1_keys.update({'linear_q_down_proj', 'linear_kv_down_proj'})
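
To make the dim0/dim1 distinction concrete, here is a hypothetical helper showing how such key sets could drive the actual weight split; split_for_tp, tp_size, and tp_rank are illustrative names, not identifiers from gpt_bridge.py:

import torch

def split_for_tp(name, weight, tp_size, tp_rank, dim0_keys, dim1_keys):
    # ColumnParallelLinear: shard the output-feature dimension (dim 0).
    if any(key in name for key in dim0_keys):
        return torch.chunk(weight, tp_size, dim=0)[tp_rank]
    # RowParallelLinear: shard the input-feature dimension (dim 1).
    if any(key in name for key in dim1_keys):
        return torch.chunk(weight, tp_size, dim=1)[tp_rank]
    # Everything else (e.g. layernorm weights) is replicated across ranks.
    return weight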

@Jintao-Huang merged commit 10d9096 into modelscope:main on Nov 9, 2025. 1 of 2 checks passed.