
add support for MiMo-V2-Flash #1718

Merged: n1ck-guo merged 5 commits into main from hengguo/support_mimo on Apr 27, 2026

Conversation

@n1ck-guo
Contributor

Description

Please briefly describe your main changes and the motivation.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Copilot AI review requested due to automatic review settings April 22, 2026 05:53

Copilot AI left a comment


Pull request overview

Adds compatibility patches to support loading/running MiMo-V2-Flash (and some legacy remote-code RoPE behaviors) under newer transformers, plus improves FP8 block dequant handling when scale tensors are over-provisioned.

Changes:

  • Relax FP8 block scale shape assumptions by rejecting undersized scales and padding weights when scales are over-provisioned (see the sketch after this list).
  • Apply model-instance monkey patches immediately after llm_load_model() loads/evals the model.
  • Add transformers compatibility shims for legacy RoPE default init and MiMo-V2-Flash attention helper call signatures.
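
A minimal sketch of the FP8 handling the first bullet describes: undersized scale grids are rejected, while over-provisioned ones are absorbed by padding the weight up to the scale grid and cropping after dequantization. The helper name dequant_fp8_block, its signature, and the block_size default of 128 are assumptions for illustration, not the actual code in auto_round/utils/weight_handler.py.

```python
import math

import torch
import torch.nn.functional as F


def dequant_fp8_block(
    weight: torch.Tensor, scale: torch.Tensor, block_size: int = 128
) -> torch.Tensor:
    """Hypothetical block-wise dequant that tolerates an over-provisioned scale grid."""
    out_features, in_features = weight.shape
    need_rows = math.ceil(out_features / block_size)
    need_cols = math.ceil(in_features / block_size)
    rows, cols = scale.shape

    # Undersized scales cannot be repaired by padding, so fail early.
    if rows < need_rows or cols < need_cols:
        raise ValueError(
            f"scale shape {tuple(scale.shape)} is too small for weight "
            f"{tuple(weight.shape)} with block_size={block_size}"
        )

    # Over-provisioned scales: pad the weight so every scale entry owns a full tile.
    w = weight.to(torch.float32)
    pad_rows = rows * block_size - out_features
    pad_cols = cols * block_size - in_features
    if pad_rows or pad_cols:
        w = F.pad(w, (0, pad_cols, 0, pad_rows))

    # Expand each per-block scale to a block_size x block_size tile, apply it, crop back.
    expanded = scale.to(torch.float32).repeat_interleave(block_size, dim=0)
    expanded = expanded.repeat_interleave(block_size, dim=1)
    return (w * expanded)[:out_features, :in_features]
```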

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File-by-file summary:

  • auto_round/utils/weight_handler.py: Accept over-provisioned FP8 scale tensors by padding weights; raise early for undersized scales.
  • auto_round/utils/model.py: Invoke monkey_patch_model(model) after model load to apply instance-level compatibility patches (see the first sketch below).
  • auto_round/utils/common.py: Add RoPE default-init compatibility, patch _init_weights for legacy RotaryEmbedding, and patch the MiMo attention helper (see the second sketch below).
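
To make the auto_round/utils/model.py row concrete: instance-level patches can only be attached once the model object exists, so monkey_patch_model(model) runs immediately after llm_load_model() returns. A minimal sketch, assuming a wrapper named prepare_model, a (model, tokenizer) return shape, and the import paths shown; only llm_load_model and monkey_patch_model are taken from the PR summary above.

```python
# Hypothetical import paths; the PR summary only says which files define these helpers.
from auto_round.utils.common import monkey_patch_model
from auto_round.utils.model import llm_load_model


def prepare_model(model_name: str, **load_kwargs):
    # llm_load_model() loads the checkpoint and puts the model in eval mode.
    model, tokenizer = llm_load_model(model_name, **load_kwargs)

    # Apply instance-level compatibility patches (e.g. for MiMo-V2-Flash's
    # remote-code attention helpers) right after loading, before any
    # calibration or quantization code runs a forward pass.
    monkey_patch_model(model)
    return model, tokenizer
```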

Comment thread auto_round/utils/common.py
Comment thread auto_round/utils/weight_handler.py
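
For the auto_round/utils/common.py change, the general shape of a transformers compatibility shim for legacy remote-code rotary embeddings looks roughly like the following. This sketches only the _init_weights half; the attribute it checks (rope_init_fn), the wrapper name, and the class-name suffix test are assumptions rather than the PR's implementation.

```python
import functools

from transformers import PreTrainedModel


def patch_init_weights_for_legacy_rope(model_cls: type[PreTrainedModel]) -> None:
    """Hypothetical shim: skip rotary-embedding re-init for legacy remote-code models."""
    orig_init_weights = model_cls._init_weights

    @functools.wraps(orig_init_weights)
    def _init_weights(self, module):
        # Newer transformers releases may touch attributes such as rope_init_fn when
        # re-initializing a *RotaryEmbedding module; legacy remote-code classes never
        # define them, so treat those modules as already initialized.
        if type(module).__name__.endswith("RotaryEmbedding") and not hasattr(module, "rope_init_fn"):
            return
        return orig_init_weights(self, module)

    model_cls._init_weights = _init_weights
```
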
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

n1ck-guo requested a review from xin3he on April 23, 2026 06:38
@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Contributor

yiliu30 left a comment


overall lgtm

Comment thread auto_round/utils/common.py
Comment thread auto_round/utils/common.py
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

n1ck-guo merged commit e62d29d into main on Apr 27, 2026
42 checks passed
n1ck-guo deleted the hengguo/support_mimo branch on April 27, 2026 05:18
lvliang-intel pushed a commit that referenced this pull request on May 12, 2026
Signed-off-by: n1ck-guo <heng.guo@intel.com>