[bugfix] fix minimax qk_norm sp by Jintao-Huang · Pull Request #83 · modelscope/mcore-bridge

Jintao-Huang · 2026-05-16T17:40:24Z

No description provided.

gemini-code-assist

Code Review

This pull request standardizes the use of 'duplicated' parallel mode for linear layers and refines sequence parallel gathering logic across the bridge. In gpt_model.py, the gathering logic for sequence classification and embedding tasks was moved from hidden_states to logits. In minimax_m2.py, normalization parameters are now explicitly excluded from sequence parallel gradient reduction to prevent gradient amplification. Feedback highlights a critical issue in gpt_model.py where the new gathering logic may perform a redundant AllGather during dynamic batching inference if sequence_parallel_override is active, potentially leading to incorrect output shapes.

Jintao-Huang · 2026-05-16T17:55:23Z

/gemini review

gemini-code-assist

Code Review

This pull request modifies the GPT model and LoRA tuners to refine sequence parallelism (SP) and tensor parallelism (TP) handling, including changes to parallel_mode settings and the placement of sequence parallel gathering. Critical feedback was provided regarding potential weight divergence in the output layer and LoRA adapters when SP is active, as well as a potential double-gather issue during inference in the _postprocess method.

Jintao-Huang · 2026-05-17T06:02:04Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors model parallelism configurations, specifically setting parallel_mode to 'duplicated' for output layers and LoRA linear layers. It also adjusts the sequence parallel gathering logic in GPTModel and disables sequence parallelism for specific normalization layers in MinimaxM2 to prevent gradient amplification. Additionally, the deepcopy utility was updated to preserve parameter metadata. Feedback suggests ensuring that all parameter attributes are overwritten during deep copying to correctly mirror the source state, even if default values exist in the target.

gemini-code-assist · 2026-05-17T06:03:37Z

+                if not hasattr(res_param, k):
+                    setattr(res_param, k, v)


The check if not hasattr(res_param, k) prevents copying attributes that might have been initialized with default values in the constructor of the new module. For example, if a parameter is initialized with sequence_parallel=False by default, it will not be updated even if the source parameter has it set to True. Since __dict__ on an nn.Parameter typically only contains custom metadata added by the user or the framework (such as Megatron's parallelism flags), it is safer to overwrite these attributes to ensure the state is correctly mirrored in the deep-copied module.

Suggested change

if not hasattr(res_param, k):

setattr(res_param, k, v)

for k, v in param.__dict__.items():

setattr(res_param, k, v)

Jintao-Huang · 2026-05-17T06:48:43Z

/gemini review

gemini-code-assist

Code Review

This pull request refines model parallelism and LoRA adapter logic, specifically moving sequence parallel gathering to the logit stage in gpt_model.py, adjusting gradient averaging for normalization layers in minimax_m2.py, and disabling communication overlap for certain LoRA configurations. It also improves the deepcopy utility to preserve parameter attributes. Feedback highlights a potential double-gather issue in gpt_model.py during inference for sequence classification tasks, suggesting a conditional check and explicit process group specification to ensure correct tensor shapes.

fix minimax qk_norm sp

ae7d0f9

gemini-code-assist Bot reviewed May 16, 2026

View reviewed changes

Comment thread src/mcore_bridge/model/gpt_model.py

fix

ca6a3c7

gemini-code-assist Bot reviewed May 16, 2026

View reviewed changes

Comment thread src/mcore_bridge/model/gpt_model.py

Comment thread src/mcore_bridge/model/gpt_model.py

Comment thread src/mcore_bridge/tuners/lora.py

Jintao-Huang added 5 commits May 17, 2026 02:07

fix

c5fd70a

fix

6a5836a

fix

b132204

fix

269555a

update

5cdb4ae

gemini-code-assist Bot reviewed May 17, 2026

View reviewed changes

update

d4e7a1f

gemini-code-assist Bot reviewed May 17, 2026

View reviewed changes

Comment thread src/mcore_bridge/model/gpt_model.py

hjh0119 approved these changes May 17, 2026

View reviewed changes

Jintao-Huang merged commit e746e89 into modelscope:main May 17, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[bugfix] fix minimax qk_norm sp#83

[bugfix] fix minimax qk_norm sp#83
Jintao-Huang merged 8 commits into
modelscope:mainfrom
Jintao-Huang:fix_minimax_qk_norm

Jintao-Huang commented May 16, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Jintao-Huang commented May 16, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Jintao-Huang commented May 17, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 17, 2026

Uh oh!

Jintao-Huang commented May 17, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Jintao-Huang commented May 16, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Jintao-Huang commented May 16, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Jintao-Huang commented May 17, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

Jintao-Huang commented May 17, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants