Add GGUF support for MiniMax-M2.1 model #44526
Merged
SunMarc merged 1 commit into huggingface:main on Mar 18, 2026
Conversation
ArthurZucker approved these changes on Mar 17, 2026.

ArthurZucker (Collaborator) left a comment:
Well, looks great overall! If you have time, could you investigate the conversion mapping? Maybe we need to replace the `.block_sparse_moe.`, removing the dots?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Contributor (Author)
@ArthurZucker Thanks for the review! Removed `_checkpoint_conversion_mapping`; you're right that `conversion_mapping.py` already handles `block_sparse_moe` → `mlp` via the inherited Mixtral mapping.
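For context, the Mixtral-style key rename the reply refers to can be sketched as follows. The mapping dict and helper below are illustrative only, not the actual transformers implementation:

```python
import re

# Illustrative sketch of a regex-based checkpoint key rename, assuming the
# inherited Mixtral mapping rewrites ".block_sparse_moe." to ".mlp.".
# The variable and function names here are hypothetical.
_conversion_mapping = {r"\.block_sparse_moe\.": ".mlp."}

def rename_key(key: str) -> str:
    # Apply each regex rename in turn to a checkpoint weight name.
    for pattern, replacement in _conversion_mapping.items():
        key = re.sub(pattern, replacement, key)
    return key

print(rename_key("model.layers.0.block_sparse_moe.gate.weight"))
# -> model.layers.0.mlp.gate.weight
```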
Contributor

[For maintainers] Suggested jobs to run (before merge): run-slow: minimax_m2
What does this PR do?
Add GGUF loading support for MiniMax-M2.1 (456B MoE) model.
MiniMax-M2.1 is a large Mixture-of-Experts model with 456B total parameters (45.9B active), 256 experts, and 8 experts per token. This PR enables loading its GGUF-quantized checkpoints (e.g. unsloth/MiniMax-M2.1-GGUF) via `from_pretrained(..., gguf_file=...)`.

Changes
`src/transformers/integrations/ggml.py`

- Added a `"minimax_m2"` entry to `GGUF_CONFIG_MAPPING` with model-specific config fields (including the MoE fields `expert_count`, `expert_used_count`, `expert_feed_forward_length`, `expert_gating_func`).
- Mapped the `expert_gating_func` integer (from GGUF metadata) to the `scoring_func` string (`{0: "none", 1: "softmax", 2: "sigmoid"}`).
- Added a `"minimax_m2"` entry to `GGUF_CONFIG_DEFAULTS_MAPPING` with `"use_routing_bias": True`. This is needed because GGUF metadata does not include a field for the routing bias, but the model weights contain `e_score_correction_bias` tensors that require `use_routing_bias=True` to be loaded correctly.
- Reused `GGUFQwen2Converter` for `minimax_m2` in `GGUF_TO_FAST_CONVERTERS` (the tokenizer is compatible with Qwen2).

`src/transformers/modeling_gguf_pytorch_utils.py`

- Added a `MiniMaxM2TensorProcessor` class following the `TensorProcessor` API introduced in #42854 (Qwen2/3 MoE + GGUF model support, restored), using the same pattern as `Qwen2MoeTensorProcessor`:
  - `preprocess_name()`: strips per-expert indices from HF weight names so that multiple experts can map to one fused GGUF tensor.
  - `perform_fallback_tensor_mapping()`: maps the merged `gate_up_proj` to both the `ffn_gate_exps` and `ffn_up_exps` GGUF tensors; maps `e_score_correction_bias` to `exp_probs_b.bias`.
  - `process()`: matches GGUF MoE expert tensors and merges gate + up into `gate_up_proj` `[num_experts, 2 * intermediate, hidden]`.
  - `_set_moe_expert_tensor()`: merges gate and up weights into the fused `gate_up_proj` tensor; passes down-projection weights through directly.
- Added the model type and architecture mappings to `TENSOR_PROCESSORS`.

Testing
Due to the model size (456B parameters, 227GB for Q8_0 GGUF), no CI-compatible unit tests are included. This is consistent with other large MoE models (e.g., Qwen3-30B-A3B in #42854).
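Although the real checkpoint is too large for CI, the tensor-processing logic itself can be illustrated at toy scale. All names below (`strip_expert_index`, `fuse_gate_up`, `GGUF_EXPERT_GATING_FUNC`) are hypothetical stand-ins for the actual code, and the gate/up concatenation order is an assumption:

```python
import re
import numpy as np

# expert_gating_func integer (GGUF metadata) -> scoring_func config string
GGUF_EXPERT_GATING_FUNC = {0: "none", 1: "softmax", 2: "sigmoid"}

def strip_expert_index(name: str) -> str:
    # Like preprocess_name(): drop the per-expert index so every expert
    # resolves to the same fused GGUF tensor name.
    return re.sub(r"\.experts\.\d+\.", ".experts.", name)

def fuse_gate_up(gate: np.ndarray, up: np.ndarray) -> np.ndarray:
    # Like _set_moe_expert_tensor(): merge per-expert gate and up weights
    # into a fused gate_up_proj of shape [num_experts, 2 * intermediate, hidden].
    return np.concatenate([gate, up], axis=1)

num_experts, intermediate, hidden = 4, 3, 2  # toy sizes (real model: 256 experts)
gate = np.zeros((num_experts, intermediate, hidden))
up = np.ones((num_experts, intermediate, hidden))
fused = fuse_gate_up(gate, up)

print(GGUF_EXPERT_GATING_FUNC[2])                                        # sigmoid
print(strip_expert_index("model.layers.0.mlp.experts.17.up_proj.weight"))
print(fused.shape)                                                       # (4, 6, 2)
```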
Verified end-to-end via vLLM serving the Q8_0 GGUF checkpoint on two GPU platforms:
8×AMD W7900D (48GB each)
AMD MI350X (288GB each)
Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@SunMarc @MekkCyber @ArthurZucker