model: mistral small 4 support by ngxson · Pull Request #20649 · ggml-org/llama.cpp

ngxson · 2026-03-16T17:20:37Z

Ref upstream PR: huggingface/transformers#44760

The model is the same as Mistral Large 3 (deepseek2 arch with llama4 scaling), but I'm moving it to a new arch, mistral4, to align with the transformers code.

Disclosure: this PR is made possible with the help from Mistral team. Kudos to @juliendenize for the coordination!

TODO:

Requires FA kernel specialization for MLA, HSK = 320, HSV = 256
- Metal (metal : add FA specialization for HSK = 320, HSV = 256 #20549)
- CUDA (cc @ggml-org/ggml-cuda)
- Vulkan (cc @ggml-org/ggml-vulkan)
- etc.

ngxson · 2026-03-16T17:21:24Z

convert_hf_to_gguf.py

+    model_arch = gguf.MODEL_ARCH.MISTRAL3 # unused
+    impl: TextModel
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        if self.hparams.get("model_type") == "mistral4":
+            self.impl = Mistral3Model.Mistral4Model(*args, **kwargs)
+        else:
+            self.impl = Mistral3Model.Ministral3Model(*args, **kwargs)
+


@CISC note that the model will be released with Mistral3ForConditionalGeneration as the main arch (to support vision input); only the text model uses the new mistral4 value in the model_type field. Hence the hack introduced in the conversion script.

We should find a way to remove my hack in a follow-up PR.
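
For readers skimming the thread, the dispatch described above can be sketched in isolation. This is a minimal illustration, not the code from the PR: `Ministral3Impl`, `Mistral4Impl`, and `Mistral3Wrapper` are hypothetical stand-ins, and the sketch assumes `hparams` is a plain dict in which `model_type` is directly visible.

```python
from typing import Any


class Ministral3Impl:
    """Hypothetical stand-in for the existing text-model converter."""

    arch = "mistral3"

    def __init__(self, hparams: dict[str, Any]) -> None:
        self.hparams = hparams


class Mistral4Impl:
    """Hypothetical stand-in for the new mistral4 converter."""

    arch = "mistral4"

    def __init__(self, hparams: dict[str, Any]) -> None:
        self.hparams = hparams


class Mistral3Wrapper:
    """Thin wrapper that picks an implementation from `model_type`."""

    def __init__(self, hparams: dict[str, Any]) -> None:
        # .get() is used so a missing key falls back to the older path.
        if hparams.get("model_type") == "mistral4":
            self.impl = Mistral4Impl(hparams)
        else:
            self.impl = Ministral3Impl(hparams)

    @property
    def arch(self) -> str:
        # Delegate to whichever implementation was selected.
        return self.impl.arch
```

The wrapper keeps a single entry point registered for the outer architecture name while the actual work is delegated, which is why it reads as a hack: the wrapper's own `model_arch` ends up unused.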

Pento95 · 2026-03-16T17:52:19Z

i know this is spam, but.. how do you do that? i mean, HF Transformers PR has been opened 1 hour before you opened this PR! Spotting the PR, understanding that, porting it, testing and opening a PR in 1 hour? man... we love you.

ngxson · 2026-03-16T18:04:25Z

@Pento95 I included a disclosure on the PR description (which I forgot to add initially). That should answer your question.

ggerganov · 2026-03-16T20:54:01Z

Added a note in the description that currently we are missing kernel specializations in some backends.

0cc4m · 2026-03-16T21:03:24Z

Is Vulkan actually missing? We don't operate with hardcoded FA sizes, they get generated on demand when loading the shader.

ggerganov · 2026-03-16T21:17:30Z

> Is Vulkan actually missing? We don't operate with hardcoded FA sizes, they get generated on demand when loading the shader.

Nice. It works.
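
The difference between the two approaches discussed here can be shown with a toy sketch. Everything in it is an assumption made for illustration: the `PREBUILT` table, its entries, and the generated kernel names do not come from any ggml backend.

```python
from functools import lru_cache

# Hypothetical fixed table of (HSK, HSV) pairs that have a prebuilt kernel.
# The entries are illustrative only.
PREBUILT = {(64, 64), (128, 128), (256, 256)}


def has_prebuilt_kernel(hsk: int, hsv: int) -> bool:
    """Fixed-table approach: a new head-size pair needs a new table entry."""
    return (hsk, hsv) in PREBUILT


@lru_cache(maxsize=None)
def build_kernel_on_demand(hsk: int, hsv: int) -> str:
    """On-demand approach: the kernel is generated the first time a size pair
    is requested, then cached. The returned string stands in for a compiled
    shader or pipeline handle."""
    return f"flash_attn_hsk{hsk}_hsv{hsv}"
```

With a fixed table, HSK = 320 and HSV = 256 would be rejected until a specialization is added, while the on-demand path handles the pair without any code change.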

JohannesGaessler

The C++ changes look correct to me, I'm not familiar with the Python code.

eauchs · 2026-03-16T21:47:03Z

@ngxson The Metal CI failure appears to be due to the branch not yet including #20549 (b30a5fd — metal: add FA specialization for HSK=320, HSV=256), which is already merged in master. A rebase on master should resolve ggml-ci-mac-metal. Tested locally on M3 Max 128GB, confirmed the specialization is present in master but missing from xsn/mistral-small-4.

eauchs · 2026-03-16T21:54:27Z

@ngxson opened a small fix for the flake8 CI: ngxson/llama.cpp#91

ngxson · 2026-03-16T22:07:54Z

seems like there is a glitch somehow, github reports that I replaced the entire convert_hf_to_gguf.py, while locally it renders correctly the changed lines

CISC · 2026-03-16T22:09:14Z

> seems like there is a glitch somehow, github reports that I replaced the entire convert_hf_to_gguf.py, while locally it renders correctly the changed lines

Did it change newlines? That happened in another PR as well, though only on the changes, not the whole file...

ngxson · 2026-03-16T22:11:17Z

hmm ok the newline is changed to windows style somehow

ngxson · 2026-03-16T22:14:06Z

it's fine now, master branch uses linux style, so I suspect something wrong with github when I apply the changes directly from web interface

CISC · 2026-03-16T22:14:36Z

> it's fine now, master branch uses linux style, so I suspect something wrong with github when I apply the changes directly from web interface

Yeah, it happened once before, really weird.
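
A whole-file diff like the one described above is what a line-ending change typically looks like: every line differs once its terminator changes from LF to CRLF. A small sketch for checking this locally, assuming the file contents have already been read as bytes; `count_line_endings` and `to_lf` are hypothetical helper names, not part of the repository:

```python
def count_line_endings(data: bytes) -> dict[str, int]:
    """Count Windows (CRLF), Unix (LF), and bare CR line terminators."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains one LF and one CR, so subtract those.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf, "cr": cr}


def to_lf(data: bytes) -> bytes:
    """Convert CRLF terminators to LF, leaving everything else untouched."""
    return data.replace(b"\r\n", b"\n")
```

A non-zero `crlf` count on a file that is LF-only on master would be consistent with the glitch discussed here.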

model: mistral small 4 support

382b30b

ngxson requested a review from CISC as a code owner March 16, 2026 17:20

fix test

b1d262b

ngxson requested a review from JohannesGaessler as a code owner March 16, 2026 17:59

fix test (2)

723efe9

github-actions bot added the testing (Everything test related) and python (python script changes) labels Mar 16, 2026

CISC approved these changes Mar 16, 2026

CISC reviewed Mar 16, 2026

JohannesGaessler reviewed Mar 16, 2026

eauchs mentioned this pull request Mar 16, 2026

convert_hf_to_gguf: fix flake8 E301 in Mistral4Model ngxson/llama.cpp#91

Closed

ngxson and others added 2 commits March 16, 2026 23:01

Apply suggestions from code review

a7fc24b

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Update convert_hf_to_gguf.py

03deac5

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

CISC reviewed Mar 16, 2026

Merge branch 'master' into xsn/mistral-small-4

dddb642

change newline

c10eb9b

ngxson merged commit d34ff7e into ggml-org:master Mar 16, 2026
50 of 52 checks passed
