[model] support deepseek v4 mtp #93
Code Review
This pull request adds support for manifold-constrained hyper-connections (mHC) in combination with multi-token prediction (MTP). The key changes move the embedding conversion logic into a dedicated method, update the GPT model's forward pass to handle the multi-stream decoder outputs, and adapt the MTP layer and the patcher to the mHC-specific data flow. The review identified two critical issues. First, the newly defined _convert_mtp_embeds method is never called, so the MTP embedding weights would be missing from the exported model. Second, the patcher's forward loop references a variable, layer_idx, that is never defined.
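The second issue is a common Python pitfall: a loop body uses an index that the loop never binds, which raises NameError the first time the loop runs. The following is a minimal sketch of the usual fix. The function forward_layers and its signature are hypothetical illustrations and are not taken from the PR's patcher code.

```python
from typing import Callable, List

# Hypothetical stand-in for a patcher's forward loop; not the PR's actual code.
# Each layer receives the running hidden value and its own position in the stack.
Layer = Callable[[float, int], float]


def forward_layers(layers: List[Layer], x: float) -> float:
    # Buggy form: `for layer in layers: x = layer(x, layer_idx)` fails with
    # NameError because layer_idx is never assigned.
    # enumerate() binds the index explicitly alongside each layer.
    for layer_idx, layer in enumerate(layers):
        x = layer(x, layer_idx)
    return x


if __name__ == "__main__":
    # Three layers that each add their own index: 0 + 0 + 1 + 2 = 3.
    add_index: Layer = lambda h, i: h + i
    print(forward_layers([add_index] * 3, 0.0))
```

Using enumerate keeps the index and the layer in sync without maintaining a separate counter by hand.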
/gemini review
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
No description provided.