
[model] support bailing #55

Merged
Jintao-Huang merged 8 commits into modelscope:main from Jintao-Huang:support_bailing
May 8, 2026
Conversation

@Jintao-Huang
Collaborator

No description provided.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for the bailing_moe model by adding it to the model constants, exporting it in the GPTs module, and implementing the BailingMoeBridge class. Additionally, the .gitignore was updated and the configuration parser now includes score_function in its mapping. Feedback indicates that the bailing_moe model type should be explicitly handled in the configuration conversion logic to ensure that qk_layernorm is enabled and the router score function is correctly set to sigmoid.
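To illustrate what adding `score_function` to the mapping could mean in practice, here is a minimal sketch of an alias table that resolves several source config keys to one target key. The names `CONFIG_MAPPING` and `map_config` are hypothetical and are not taken from the repository; the real parser may also post-process values, which this sketch does not attempt.

```python
from typing import Any, Dict, List

# Hypothetical alias table: target key -> candidate source keys, in priority order.
CONFIG_MAPPING: Dict[str, List[str]] = {
    "q_lora_rank": ["q_lora_rank"],
    "kv_lora_rank": ["kv_lora_rank"],
    "moe_router_score_function": [
        "scoring_func",
        "moe_router_use_sigmoid",
        "score_function",
    ],
}


def map_config(hf_config: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the first alias found in `hf_config` to its target key."""
    out: Dict[str, Any] = {}
    for target, aliases in CONFIG_MAPPING.items():
        for alias in aliases:
            if alias in hf_config:
                out[target] = hf_config[alias]
                break  # the first matching alias wins
    return out


if __name__ == "__main__":
    # A config that only uses the newly added alias is now picked up.
    print(map_config({"score_function": "sigmoid"}))
```

With a table like this, a model whose config exposes only `score_function` would no longer fall through silently, which appears to be the point of the change.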

  'q_lora_rank': ['q_lora_rank'],
  'kv_lora_rank': ['kv_lora_rank'],
- 'moe_router_score_function': ['scoring_func', 'moe_router_use_sigmoid'],
+ 'moe_router_score_function': ['scoring_func', 'moe_router_use_sigmoid', 'score_function'],

medium

While adding score_function to the config_mapping is correct, the bailing_moe model type should also be handled explicitly in the hf_to_mcore_config function (around lines 120 and 164) to ensure that qk_layernorm is enabled and the router score function is set to sigmoid. The bridge definition in bailing_moe.py includes QK normalization keys and expert bias, which strongly suggests that these settings are required for the model to function correctly in Megatron-Core.
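The kind of explicit handling suggested above could look roughly like the following sketch. It is not the actual body of `hf_to_mcore_config`; the helper name `apply_model_type_overrides` is hypothetical, and only the keys `qk_layernorm` and `moe_router_score_function` and the value `sigmoid` come from the review comment.

```python
from typing import Any, Dict


def apply_model_type_overrides(model_type: str, mcore_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `mcore_kwargs` with model-specific settings forced on.

    Hypothetical helper: it only illustrates the per-model branch the review
    asks for and does not mirror the repository's code.
    """
    out = dict(mcore_kwargs)  # do not mutate the caller's dict
    if model_type == "bailing_moe":
        # Settings the review says the model needs.
        out["qk_layernorm"] = True
        out["moe_router_score_function"] = "sigmoid"
    return out


if __name__ == "__main__":
    print(apply_model_type_overrides("bailing_moe", {"num_layers": 2}))
```

Forcing the values per model type, rather than relying only on the alias mapping, would guard against checkpoints whose config omits these fields.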

@Jintao-Huang
Collaborator Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for the Bailing MoE model, including a specialized attention implementation to handle its specific QKV weight layout, updates to the configuration parser for MoE parameters, and the addition of integration tests. The review feedback identifies missing bias handling in the Bailing MoE attention layer and state dictionary mapping, which could lead to incorrect model behavior. It also points out that an existing test case was accidentally disabled.
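To make the bias concern concrete, here is a sketch of one common kind of fused-QKV reordering, under stated assumptions. It assumes the source checkpoint stores the projection as contiguous `[Q | K | V]` blocks and the target expects per-group `[q..., k, v]` blocks; neither layout is confirmed by this PR, and `regroup_qkv` is a hypothetical name. The point is only that any permutation applied to the weight's output rows must also be applied to the bias.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def regroup_qkv(rows: Sequence[T], num_heads: int, num_kv_groups: int, head_dim: int) -> List[T]:
    """Reorder output rows from [Q | K | V] blocks to per-group [q..., k, v] blocks.

    `rows` is indexed along the output dimension, so the same function applies
    to a weight matrix (a list of rows) and to a bias vector (a list of scalars).
    The assumed layouts are illustrative, not taken from the repository.
    """
    if num_heads % num_kv_groups != 0:
        raise ValueError("num_heads must be divisible by num_kv_groups")
    q_size = num_heads * head_dim
    kv_size = num_kv_groups * head_dim
    if len(rows) != q_size + 2 * kv_size:
        raise ValueError("unexpected fused QKV size")

    q = rows[:q_size]
    k = rows[q_size:q_size + kv_size]
    v = rows[q_size + kv_size:]
    q_per_group = (num_heads // num_kv_groups) * head_dim

    out: List[T] = []
    for g in range(num_kv_groups):
        out.extend(q[g * q_per_group:(g + 1) * q_per_group])
        out.extend(k[g * head_dim:(g + 1) * head_dim])
        out.extend(v[g * head_dim:(g + 1) * head_dim])
    return out


if __name__ == "__main__":
    labels = ["q0", "q1", "q2", "q3", "k0", "k1", "v0", "v1"]
    print(regroup_qkv(labels, num_heads=4, num_kv_groups=2, head_dim=1))
```

If a state-dict mapping reorders only the weight and copies or drops the bias unchanged, the bias terms end up attached to the wrong heads, which is the sort of incorrect behavior the review warns about.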

Comment thread src/mcore_bridge/model/gpts/bailing_moe.py
Comment thread src/mcore_bridge/model/gpts/bailing_moe.py
Comment thread tests/test_llm.py
@Jintao-Huang Jintao-Huang merged commit 62e0100 into modelscope:main May 8, 2026
1 check passed
2 participants