feat: Add Mimo v2.5 model support #22493
Conversation
also cc @ngxson for review
Getting this error when converting. Do we need to include changes to gguf-py?
I'm going to find some disk and download and give this a go! |
|
@segmond oops, forgot to include that in the commit. I've pushed it now, give it another shot? |
I'm downloading the q8; at the rate it's going, it will take about 9 hours if there are no issues. I'll just pull down and rebuild when I get up in the morning before I try it.
Ah I meant to ping @sayap about the convert issue, my eyes are crossed :P I pushed the commit that added the writer and constant updates. |
I've just tried converting the MiMo V2.5 Pro version and the conversion fails during the TP dequant; will look into it.
Overview
This PR adds support for MiMo V2.5 (+ Pro) for text-to-text inference. The non-Pro MiMo V2.5 has audio and vision components that are not included in this PR.
Additional information
I haven't re-tested the Pro model, but I think it should still convert and quantize correctly; I will follow up on that once I finish the non-Pro model quantizations.
The `convert_hf_to_gguf.py` script now dequantizes the FP8 safetensors correctly. MiMo has an oddly packed TP-aware sharding for its weights, in addition to fusing the attention QKV. To maintain compatibility with the existing MiMo V2 Flash path, I've opted to un-fuse the attention QKV and use the existing modeling code.

One small tweak to note: the MiMo V2 and V2.5 models have an `attention_value_scale` that was provided in the config.json but not being used. I've plumbed that through, which should bring the model closer to parity with the transformers implementation.

MiMo-V2.5-Q8_0-KLD.txt
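For readers following along, here is a minimal NumPy sketch of the two conversion steps mentioned above: dequantizing an FP8 tensor using its per-block scales, and slicing a fused QKV projection back into separate Q/K/V tensors. The function names, the 128-wide block size, and the head layout are illustrative assumptions, not the actual `convert_hf_to_gguf.py` code.

```python
# Hedged sketch of the conversion steps; names and shapes are assumptions,
# not the real convert_hf_to_gguf.py implementation.
import numpy as np

def dequant_fp8(weight_fp8: np.ndarray, scale_inv: np.ndarray, block: int = 128) -> np.ndarray:
    """Expand per-block inverse scales to the weight's shape and multiply.

    weight_fp8 is shown here as a float array for simplicity; a real
    converter would first reinterpret the raw FP8 bytes.
    """
    scales = np.repeat(np.repeat(scale_inv, block, axis=0), block, axis=1)
    rows, cols = weight_fp8.shape
    return weight_fp8.astype(np.float32) * scales[:rows, :cols]

def unfuse_qkv(qkv: np.ndarray, n_head: int, n_kv_head: int, head_dim: int):
    """Slice a fused [q; k; v] projection (stacked along dim 0) apart."""
    q_rows = n_head * head_dim
    kv_rows = n_kv_head * head_dim
    q = qkv[:q_rows]
    k = qkv[q_rows : q_rows + kv_rows]
    v = qkv[q_rows + kv_rows :]
    return q, k, v
```

Un-fusing at convert time, as sketched, lets the existing MiMo V2 Flash graph be reused unchanged instead of adding a fused-QKV code path.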
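To show where an `attention_value_scale` from config.json could enter the computation, here is a toy single-head attention step that multiplies the attended values by that scale. This is a hedged illustration of the idea, not the llama.cpp or transformers code; the placement of the scale is an assumption for demonstration.

```python
# Toy illustration (not the actual llama.cpp code): applying an
# attention_value_scale read from config.json to the attention output.
import numpy as np

def attention_with_value_scale(q, k, v, attention_value_scale: float = 1.0):
    head_dim = q.shape[-1]
    scores = q @ k.T / np.sqrt(head_dim)          # scaled dot-product scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)    # softmax over keys
    return (probs @ v) * attention_value_scale    # scale the attended values
```

With a scale of 1.0 this reduces to plain scaled dot-product attention, which is why leaving the config value unused went unnoticed until the parity comparison.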
Requirements