
[model] support deepseek v4 mtp #93

Merged
Jintao-Huang merged 7 commits into modelscope:main from Jintao-Huang:support_deepseek_v4_mtp on May 26, 2026

Conversation

@Jintao-Huang
Collaborator

No description provided.


@gemini-code-assist (bot) left a comment


Code Review

This pull request adds support for mHC (manifold-constrained hyper-connections) in combination with multi-token prediction (MTP). Key changes include refactoring the embedding conversion logic into a dedicated method, updating the GPT model's forward pass to handle multi-stream decoder outputs, and adjusting the MTP layer and the patcher to support the mHC-specific data flow.

Review feedback identified two critical issues: a missing call to the newly defined _convert_mtp_embeds method, which would leave weights out of the exported model, and the use of an undefined variable layer_idx in the patcher's forward loop.
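To see why a single missed call is enough to drop weights on export, here is a toy sketch with plain dicts standing in for tensors. Every function name, key layout, and the include_mtp switch are hypothetical illustrations, not mcore_bridge's actual code. The point is only that once MTP embedding conversion lives in its own helper, the export path must invoke it explicitly, and a simple key-completeness check catches the omission.

```python
# Hypothetical sketch: function names and key layouts are illustrative,
# not the mcore_bridge API. Lists of floats stand in for tensors.
from typing import Dict, List

StateDict = Dict[str, List[float]]


def convert_embeds(mcore: StateDict, src_prefix: str, dst_prefix: str) -> StateDict:
    """Rename one embedding weight from a Megatron-style key to an HF-style key."""
    return {f"{dst_prefix}.embed_tokens.weight": mcore[f"{src_prefix}.word_embeddings.weight"]}


def convert_mtp_embeds(mcore: StateDict, num_mtp_layers: int) -> StateDict:
    """Convert the embedding stored for each MTP layer (assumed key layout)."""
    out: StateDict = {}
    for i in range(num_mtp_layers):
        out.update(convert_embeds(mcore, f"mtp.layers.{i}.embedding", f"model.mtp.{i}"))
    return out


def export(mcore: StateDict, num_mtp_layers: int, include_mtp: bool = True) -> StateDict:
    """Build the exported state dict; include_mtp=False simulates forgetting the call."""
    hf = convert_embeds(mcore, "embedding", "model")
    if include_mtp:
        # Without this call the MTP embeddings never reach the exported checkpoint.
        hf.update(convert_mtp_embeds(mcore, num_mtp_layers))
    return hf


def missing_keys(exported: StateDict, expected: List[str]) -> List[str]:
    """Return every expected key that the export failed to produce."""
    return [k for k in expected if k not in exported]


if __name__ == "__main__":
    mcore = {
        "embedding.word_embeddings.weight": [0.1, 0.2],
        "mtp.layers.0.embedding.word_embeddings.weight": [0.3, 0.4],
    }
    expected = ["model.embed_tokens.weight", "model.mtp.0.embed_tokens.weight"]
    print(missing_keys(export(mcore, 1), expected))  # []
    print(missing_keys(export(mcore, 1, include_mtp=False), expected))  # ['model.mtp.0.embed_tokens.weight']
```

A strict completeness check like missing_keys (comparable in spirit to loading with strict key matching) turns a silently incomplete export into an immediate, visible failure.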

Comment thread: src/mcore_bridge/bridge/gpt_bridge.py
Comment thread: src/mcore_bridge/patcher.py (outdated)
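The review flagged an undefined layer_idx in the patcher's forward loop. Referencing a loop index that was never bound raises NameError at runtime; the idiomatic fix is to let enumerate bind the index alongside each layer. The generic sketch below assumes each layer takes its index as an argument; the make_layer helper and what the index is used for are hypothetical, not taken from the patched mcore_bridge code.

```python
# Hypothetical sketch: layers and their use of the index are illustrative only.
from typing import Callable, List

Layer = Callable[[List[float], int], List[float]]


def make_layer(scale: float) -> Layer:
    """Build a toy layer that scales its input and adds its own index."""
    def layer(hidden: List[float], layer_idx: int) -> List[float]:
        # A real layer might use layer_idx to select per-layer state or a cache slot.
        return [h * scale + layer_idx for h in hidden]
    return layer


def forward(layers: List[Layer], hidden: List[float]) -> List[float]:
    # enumerate binds layer_idx on every iteration, so it is always defined
    # when passed to the layer (unlike a bare `for layer in layers:` loop).
    for layer_idx, layer in enumerate(layers):
        hidden = layer(hidden, layer_idx)
    return hidden


if __name__ == "__main__":
    print(forward([make_layer(2.0), make_layer(1.0)], [1.0, 2.0]))  # [3.0, 5.0]
```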
@Jintao-Huang
Collaborator, Author

/gemini review

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Jintao-Huang merged commit 0bf81ce into modelscope:main on May 26, 2026
1 check passed

Labels: None yet
Projects: None yet


3 participants