Add GGUF support for MiniMax-M2.1 model #44526
Merged
SunMarc merged 1 commit into huggingface:main on Mar 18, 2026
Conversation
ArthurZucker approved these changes on Mar 17, 2026.

ArthurZucker (Collaborator) left a comment:
Well, looks great overall! If you have time, could you investigate the conversion mapping? Maybe we need to replace the `.block_sparse_moe.`, removing the dots?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Contributor (Author)
@ArthurZucker Thanks for the review! Removed `_checkpoint_conversion_mapping`; you're right that `conversion_mapping.py` already handles `block_sparse_moe` → `mlp` via the inherited Mixtral mapping.
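For context, the Mixtral-style key rename the reply refers to can be sketched as follows. The mapping dict and helper below are illustrative only, not the actual transformers implementation:

```python
import re

# Illustrative sketch of a regex-based checkpoint key rename, assuming the
# inherited Mixtral mapping rewrites ".block_sparse_moe." to ".mlp.".
# The variable and function names here are hypothetical.
_conversion_mapping = {r"\.block_sparse_moe\.": ".mlp."}

def rename_key(key: str) -> str:
    # Apply each regex rename in turn to a checkpoint weight name.
    for pattern, replacement in _conversion_mapping.items():
        key = re.sub(pattern, replacement, key)
    return key

print(rename_key("model.layers.0.block_sparse_moe.gate.weight"))
# -> model.layers.0.mlp.gate.weight
```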
Contributor

[For maintainers] Suggested jobs to run (before merge): run-slow: minimax_m2
What does this PR do?
Add GGUF loading support for MiniMax-M2.1 (456B MoE) model.
MiniMax-M2.1 is a large Mixture-of-Experts model with 456B total parameters (45.9B active), 256 experts, and 8 experts per token. This PR enables loading its GGUF-quantized checkpoints (e.g. unsloth/MiniMax-M2.1-GGUF) via `from_pretrained(..., gguf_file=...)`.

Changes
`src/transformers/integrations/ggml.py`

- Added a `"minimax_m2"` entry to `GGUF_CONFIG_MAPPING` with model-specific config fields (including the MoE fields `expert_count`, `expert_used_count`, `expert_feed_forward_length`, `expert_gating_func`).
- Mapped the `expert_gating_func` integer (from GGUF metadata) to the `scoring_func` string (`{0: "none", 1: "softmax", 2: "sigmoid"}`).
- Added a `"minimax_m2"` entry to `GGUF_CONFIG_DEFAULTS_MAPPING` with `"use_routing_bias": True`. This is needed because GGUF metadata does not include a field for the routing bias, but the model weights contain `e_score_correction_bias` tensors that require `use_routing_bias=True` to be loaded correctly.
- Reused `GGUFQwen2Converter` for `minimax_m2` in `GGUF_TO_FAST_CONVERTERS` (the tokenizer is compatible with Qwen2).

`src/transformers/modeling_gguf_pytorch_utils.py`

- Added a `MiniMaxM2TensorProcessor` class following the `TensorProcessor` API introduced in #42854 (Qwen2/3 MoE + GGUF model support, restored), using the same pattern as `Qwen2MoeTensorProcessor`:
  - `preprocess_name()`: strips per-expert indices from HF weight names so that multiple experts can map to one fused GGUF tensor.
  - `perform_fallback_tensor_mapping()`: maps the merged `gate_up_proj` to both the `ffn_gate_exps` and `ffn_up_exps` GGUF tensors; maps `e_score_correction_bias` to `exp_probs_b.bias`.
  - `process()`: matches GGUF MoE expert tensors and merges gate + up into `gate_up_proj` `[num_experts, 2 * intermediate, hidden]`.
  - `_set_moe_expert_tensor()`: merges gate and up weights into the fused `gate_up_proj` tensor; passes down-projection weights through directly.
- Added the model type and architecture mappings to `TENSOR_PROCESSORS`.

Testing
Due to the model size (456B parameters, 227GB for Q8_0 GGUF), no CI-compatible unit tests are included. This is consistent with other large MoE models (e.g., Qwen3-30B-A3B in #42854).
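Although the real checkpoint is too large for CI, the tensor-processing logic itself can be illustrated at toy scale. All names below (`strip_expert_index`, `fuse_gate_up`, `GGUF_EXPERT_GATING_FUNC`) are hypothetical stand-ins for the actual code, and the gate/up concatenation order is an assumption:

```python
import re
import numpy as np

# expert_gating_func integer (GGUF metadata) -> scoring_func config string
GGUF_EXPERT_GATING_FUNC = {0: "none", 1: "softmax", 2: "sigmoid"}

def strip_expert_index(name: str) -> str:
    # Like preprocess_name(): drop the per-expert index so every expert
    # resolves to the same fused GGUF tensor name.
    return re.sub(r"\.experts\.\d+\.", ".experts.", name)

def fuse_gate_up(gate: np.ndarray, up: np.ndarray) -> np.ndarray:
    # Like _set_moe_expert_tensor(): merge per-expert gate and up weights
    # into a fused gate_up_proj of shape [num_experts, 2 * intermediate, hidden].
    return np.concatenate([gate, up], axis=1)

num_experts, intermediate, hidden = 4, 3, 2  # toy sizes (real model: 256 experts)
gate = np.zeros((num_experts, intermediate, hidden))
up = np.ones((num_experts, intermediate, hidden))
fused = fuse_gate_up(gate, up)

print(GGUF_EXPERT_GATING_FUNC[2])                                        # sigmoid
print(strip_expert_index("model.layers.0.mlp.experts.17.up_proj.weight"))
print(fused.shape)                                                       # (4, 6, 2)
```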
Verified end-to-end via vLLM serving the Q8_0 GGUF checkpoint on two GPU platforms:
8×AMD W7900D (48GB each)
AMD MI350X (288GB each)
Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@SunMarc @MekkCyber @ArthurZucker