[bugfix] compat deepseek-v3 mcore 0.13.0 #6510
Conversation
Code Review
This pull request introduces compatibility fixes for megatron-core 0.13.0 and improves support for the deepseek-v3 model, particularly its Multi-head Latent Attention (MLA) mechanism when used with LoRA. A version check for megatron-core is added so that tensor parallelism for the down-projection layers is handled correctly across versions, and the weight mapping for q_a_layernorm is now handled correctly for MLA with LoRA. The code is clean and the changes are logical; overall, this is a good bugfix that improves compatibility and model support.
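The version gate the review describes could look something like the sketch below. This is a minimal illustration, not the PR's actual code: the helper name is hypothetical, and it assumes megatron-core is installed under its PyPI distribution name. The PR appears to expose the result as self.megatron_core_014.

```python
# A minimal sketch, assuming megatron-core is installed as the
# "megatron-core" distribution; the helper name is hypothetical.
from importlib.metadata import version as installed_version

from packaging import version


def is_mcore_at_least(target: str) -> bool:
    # Compare the installed megatron-core version against a threshold,
    # mirroring the kind of gate that the megatron_core_014 flag implies.
    return version.parse(installed_version('megatron-core')) >= version.parse(target)


megatron_core_014 = is_mcore_at_least('0.14.0')
```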
/gemini review
Code Review
This pull request introduces compatibility with megatron-core 0.13.0 and support for DeepSeek-V3 models. The changes add new command-line arguments for MoE routing (moe_router_num_groups, moe_router_group_topk), update their documentation, adjust configuration mappings, and introduce version-aware logic in gpt_bridge.py to handle differences between megatron-core versions.
The changes are well structured and consistent across the documentation, argument parsing, and model-bridging logic. I've found one potential issue in gpt_bridge.py where the tensor-parallel splitting logic for newer megatron-core versions seems incomplete; my review includes a suggestion to fix it.
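As a rough sketch of how two such routing flags might be registered, the snippet below uses plain argparse; the defaults and help strings are illustrative assumptions, not taken from the PR:

```python
import argparse

parser = argparse.ArgumentParser()
# Group-limited routing: experts are partitioned into groups, a top-k of
# groups is selected per token, then expert top-k runs within those groups.
parser.add_argument('--moe_router_num_groups', type=int, default=None,
                    help='Number of expert groups for group-limited routing.')
parser.add_argument('--moe_router_group_topk', type=int, default=None,
                    help='Number of groups selected per token.')

args = parser.parse_args(['--moe_router_num_groups', '8',
                          '--moe_router_group_topk', '4'])
print(args.moe_router_num_groups, args.moe_router_group_topk)  # 8 4
```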
```python
if not self.megatron_core_014:
    # https://github.com/NVIDIA/Megatron-LM/commit/720c8b40d8e7e2de1dd303d792f29093101c5e72
    dim0_keys.update({'linear_q_down_proj', 'linear_kv_down_proj'})
# RowLinear
dim1_keys = {'linear_proj', 'linear_fc2'}
```
For megatron-core >= 0.14.0, linear_q_down_proj and linear_kv_down_proj are RowParallelLinear, which should be split along dimension 1. You've correctly handled older versions by adding them to dim0_keys; however, for newer versions they should be added to dim1_keys to ensure correct tensor-parallel splitting. Without this, they won't be split correctly.
```python
# RowLinear
dim1_keys = {'linear_proj', 'linear_fc2'}
if not self.megatron_core_014:
    # https://github.com/NVIDIA/Megatron-LM/commit/720c8b40d8e7e2de1dd303d792f29093101c5e72
    dim0_keys.update({'linear_q_down_proj', 'linear_kv_down_proj'})
else:
    dim1_keys.update({'linear_q_down_proj', 'linear_kv_down_proj'})
```
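To make the dim-0 vs. dim-1 distinction concrete, here is a small standalone illustration (not code from the PR) of how a full weight matrix is partitioned across tensor-parallel ranks for the two linear types:

```python
import torch

tp_size = 2
weight = torch.randn(8, 16)  # [out_features, in_features]

# ColumnParallelLinear shards the output dimension, so the full weight
# is split along dim 0 (each rank holds a slice of the output rows).
dim0_shards = torch.chunk(weight, tp_size, dim=0)

# RowParallelLinear shards the input dimension, so the full weight
# is split along dim 1 (each rank holds a slice of the input columns).
dim1_shards = torch.chunk(weight, tp_size, dim=1)

assert dim0_shards[0].shape == (4, 16)
assert dim1_shards[0].shape == (8, 8)
```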