feat: Add Mimo v2.5 model support #22493

Open

AesSedai wants to merge 3 commits into ggml-org:master from AesSedai:mimo-v2.5

Conversation

@AesSedai (Contributor) commented Apr 29, 2026

Overview

This PR adds support for MiMo V2.5 (+ Pro) for text-to-text inference. The non-Pro MiMo V2.5 has audio and vision components that are not included in this PR.

Additional information

I haven't re-tested the Pro model, but I think it should still convert and quantize correctly; I'll follow up on that once I finish the non-Pro model quantizations.

convert_hf_to_gguf.py now dequantizes the FP8 safetensors correctly. MiMo uses an oddly packed, TP-aware sharding for its weights, in addition to fusing the attention_qkv. To maintain compatibility with the existing MiMo V2 Flash path, I've opted to un-fuse the attention_qkv and use the existing modeling code.
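Roughly, the two steps look like the sketch below (illustrative only, not the exact code in this PR; the helper names, parameter names, and the 128-wide scale blocks are assumptions):

```python
import torch

def dequant_fp8_blockwise(weight: torch.Tensor, scale_inv: torch.Tensor,
                          block: int = 128) -> torch.Tensor:
    # Upcast the FP8 weight, then multiply each (block x block) tile by its
    # per-block scale from scale_inv. The 128x128 block size is an assumption.
    w = weight.to(torch.float32)
    s = scale_inv.repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)
    return w * s[: w.shape[0], : w.shape[1]]

def unfuse_qkv(qkv: torch.Tensor, n_head: int, n_kv_head: int, head_dim: int):
    # Split a fused [q; k; v] projection back into separate Q, K, V tensors so
    # the existing un-fused MiMo V2 modeling path can be reused.
    q_rows = n_head * head_dim
    kv_rows = n_kv_head * head_dim
    return qkv.split([q_rows, kv_rows, kv_rows], dim=0)
```

The TP-aware part (undoing the per-rank packing before the split) is the fiddly bit and is not shown here.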

One small tweak to note: the MiMo V2 and V2.5 models have an attention_value_scale that is provided in config.json but was not being used. I've plumbed that through, which should bring the model closer to parity with the transformers implementation.
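The conversion-side plumbing is roughly a one-liner along these lines (sketch only; the exact config.json key name and hparams access are assumptions based on the description above, and add_attn_value_scale is the writer method added in this PR):

```python
# In the MiMo model class in convert_hf_to_gguf.py (illustrative):
def set_gguf_parameters(self):
    super().set_gguf_parameters()
    # Key name taken from the description above; treat as an assumption.
    v_scale = self.hparams.get("attention_value_scale")
    if v_scale is not None:
        self.gguf_writer.add_attn_value_scale(v_scale)
```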

MiMo-V2.5-Q8_0-KLD.txt

====== Perplexity statistics ======
Mean PPL(Q)                   :   5.135221 ±   0.030263
Mean PPL(base)                :   5.128919 ±   0.030176
Cor(ln(PPL(Q)), ln(PPL(base))):  99.65%
Mean ln(PPL(Q)/PPL(base))     :   0.001228 ±   0.000494
Mean PPL(Q)/PPL(base)         :   1.001229 ±   0.000495
Mean PPL(Q)-PPL(base)         :   0.006302 ±   0.002539

====== KL divergence statistics ======
Mean    KLD:   0.012455 ±   0.000173
Maximum KLD:  10.765786
99.9%   KLD:   0.548270
99.0%   KLD:   0.125446
95.0%   KLD:   0.043128
90.0%   KLD:   0.025489
Median  KLD:   0.004163
10.0%   KLD:   0.000084
 5.0%   KLD:   0.000021
 1.0%   KLD:   0.000002
 0.1%   KLD:  -0.000002
Minimum KLD:  -0.000098

Requirements

  • I have read and agree with the contributing guidelines: Yes
  • AI usage disclosure: Yes, used to implement the TP-aware FP8 dequantization

@AesSedai requested review from CISC and ggerganov as code owners April 29, 2026 01:19
@github-actions bot added the model (Model specific) and python (python script changes) labels Apr 29, 2026
@AesSedai (Contributor, Author) commented

also cc @ngxson for review

@sayap (Contributor) commented Apr 29, 2026

Getting this error when converting:

AttributeError: 'GGUFWriter' object has no attribute 'add_attn_value_scale'. Did you mean: 'add_attn_output_scale'?

Need to include changes to gguf-py?

@segmond commented Apr 29, 2026

I'm going to find some disk and download and give this a go!

@AesSedai (Contributor, Author) commented

@segmond oops, forgot to include that in the commit. I've pushed it now, give it another shot?

@segmond commented Apr 29, 2026

> @segmond oops, forgot to include that in the commit. I've pushed it now, give it another shot?

I'm downloading the Q8; at the rate it's going, it will take about 9 hours if there are no issues. I'll just pull and rebuild when I get up in the morning before I try it.

@AesSedai (Contributor, Author) commented

Ah, I meant to ping @sayap about the convert issue; my eyes are crossed :P

I pushed the commit that added the writer and constant updates.
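For anyone else hitting @sayap's AttributeError above, those updates follow the usual gguf-py pattern, roughly like this (sketch only; the exact key string here is an assumption):

```python
# gguf-py/gguf/constants.py (sketch)
class Attention:
    VALUE_SCALE = "{arch}.attention.value_scale"

# gguf-py/gguf/gguf_writer.py (sketch)
def add_attn_value_scale(self, value: float) -> None:
    self.add_float32(Keys.Attention.VALUE_SCALE.format(arch=self.arch), value)
```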

@AesSedai (Contributor, Author) commented

I've just tried converting the MiMo V2.5 Pro version and the conversion fails at the TP dequant; I'll look into it.
