model: mistral small 4 support #20649

Merged
ngxson merged 7 commits into ggml-org:master from ngxson:xsn/mistral-small-4
Mar 16, 2026
@ngxson
Contributor

@ngxson ngxson commented Mar 16, 2026

Ref upstream PR: huggingface/transformers#44760

The model is the same as Mistral Large 3 (deepseek2 arch with llama4 scaling), but I'm moving it to a new arch, mistral4, to align with the transformers code.

Disclosure: this PR was made possible with help from the Mistral team. Kudos to @juliendenize for the coordination!

TODO:

@ngxson ngxson requested a review from CISC as a code owner March 16, 2026 17:20
Comment on lines +8489 to +8497
    model_arch = gguf.MODEL_ARCH.MISTRAL3 # unused
    impl: TextModel
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if self.hparams.get("model_type") == "mistral4":
            self.impl = Mistral3Model.Mistral4Model(*args, **kwargs)
        else:
            self.impl = Mistral3Model.Ministral3Model(*args, **kwargs)

Contributor Author

@ngxson ngxson Mar 16, 2026

@CISC note that the model will be released with Mistral3ForConditionalGeneration as the main arch (to support vision input), only the text model uses the new mistral4 in the model_type field. Hence the hack introduced in the conversion script.

We should find a way to remove my hack in a follow-up PR.
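For readers unfamiliar with the pattern, the dispatch described in this comment can be sketched in isolation. This is a minimal illustration, not the conversion script itself: the `*Stub` classes, `Mistral3Wrapper`, and `pick` are hypothetical stand-ins, and only the check on `model_type == "mistral4"` is taken from the diff above.

```python
class TextModelStub:
    """Hypothetical stand-in for the converter's TextModel base class."""

    def __init__(self, hparams: dict):
        self.hparams = hparams

    def describe(self) -> str:
        # Report which concrete implementation is handling the conversion.
        return type(self).__name__


class Mistral4Stub(TextModelStub):
    """Hypothetical stand-in for the new mistral4 text model converter."""


class Ministral3Stub(TextModelStub):
    """Hypothetical stand-in for the existing Ministral 3 converter."""


class Mistral3Wrapper:
    """Selects an implementation from hparams["model_type"], as in the diff."""

    def __init__(self, hparams: dict):
        self.hparams = hparams
        if self.hparams.get("model_type") == "mistral4":
            self.impl = Mistral4Stub(hparams)
        else:
            self.impl = Ministral3Stub(hparams)

    def describe(self) -> str:
        # Forward the work to whichever implementation was selected.
        return self.impl.describe()


def pick(model_type=None) -> str:
    """Helper for the example: build a wrapper and name the chosen class."""
    hparams = {} if model_type is None else {"model_type": model_type}
    return Mistral3Wrapper(hparams).describe()
```

Under these assumptions, `pick("mistral4")` selects the mistral4 stand-in, and any other value (or a missing field) falls back to the Ministral 3 stand-in. The wrapper's own architecture tag is never used, which is why the diff marks `model_arch` as unused.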

@Pento95

Pento95 commented Mar 16, 2026

i know this is spam, but.. how do you do that? i mean, HF Transformers PR has been opened 1 hour before you opened this PR! Spotting the PR, understanding that, porting it, testing and opening a PR in 1 hour? man... we love you.

@ngxson
Contributor Author

ngxson commented Mar 16, 2026

@Pento95 I included a disclosure on the PR description (which I forgot to add initially). That should answer your question.

@github-actions github-actions bot added the labels testing (Everything test related) and python (python script changes) on Mar 16, 2026
@ggerganov
Member

Added a note in the description that currently we are missing kernel specializations in some backends.

@0cc4m
Contributor

0cc4m commented Mar 16, 2026

Is Vulkan actually missing? We don't operate with hardcoded FA sizes, they get generated on demand when loading the shader.

@ggerganov
Member

> Is Vulkan actually missing? We don't operate with hardcoded FA sizes, they get generated on demand when loading the shader.

Nice. It works.

Contributor

@JohannesGaessler JohannesGaessler left a comment

The C++ changes look correct to me, I'm not familiar with the Python code.

@eauchs

eauchs commented Mar 16, 2026

@ngxson The Metal CI failure appears to be due to the branch not yet including #20549 (b30a5fd — metal: add FA specialization for HSK=320, HSV=256), which is already merged in master. A rebase on master should resolve ggml-ci-mac-metal. Tested locally on M3 Max 128GB, confirmed the specialization is present in master but missing from xsn/mistral-small-4.
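The difference between the two backend approaches mentioned in this thread (a fixed set of FA specializations versus kernels generated on demand) can be illustrated with a toy model. This is a conceptual sketch only and does not reflect ggml's actual code: the tables, function names, and the non-320/256 head sizes are hypothetical; only the HSK=320, HSV=256 pair comes from the comment above.

```python
from functools import lru_cache

# Hypothetical tables of precompiled (HSK, HSV) head-size pairs.
# BRANCH models a tree without the specialization; MASTER models one with it.
BRANCH_SPECIALIZATIONS = frozenset({(64, 64), (128, 128)})
MASTER_SPECIALIZATIONS = BRANCH_SPECIALIZATIONS | {(320, 256)}


def has_specialization(table, hsk: int, hsv: int) -> bool:
    """Fixed-table approach: a head-size pair works only if it was compiled in."""
    return (hsk, hsv) in table


@lru_cache(maxsize=None)
def build_on_demand(hsk: int, hsv: int) -> str:
    """On-demand approach: generate (and cache) a kernel for any requested pair.

    The returned string is a placeholder for a compiled kernel handle.
    """
    return f"fa_kernel_hsk{hsk}_hsv{hsv}"
```

In this toy model, a lookup for (320, 256) fails against the branch table and succeeds against the master table, which mirrors why a rebase is suggested. The on-demand variant has no table to update, which matches the remark that a backend generating kernels at shader load time needs no new specialization.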

@eauchs

eauchs commented Mar 16, 2026

@ngxson opened a small fix for the flake8 CI — eauchs/llama.cpp#91

ngxson and others added 2 commits March 16, 2026 23:01
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@ngxson
Contributor Author

ngxson commented Mar 16, 2026

seems like there is a glitch somehow, github reports that I replaced the entire convert_hf_to_gguf.py, while locally it renders correctly the changed lines

@CISC
Member

CISC commented Mar 16, 2026

> seems like there is a glitch somehow, github reports that I replaced the entire convert_hf_to_gguf.py, while locally it renders correctly the changed lines

Did it change newlines? That happened in another PR as well, though only on the changes, not the whole file...

@ngxson
Contributor Author

ngxson commented Mar 16, 2026

hmm ok the newline is changed to windows style somehow

@ngxson
Contributor Author

ngxson commented Mar 16, 2026

it's fine now, master branch uses linux style, so I suspect something wrong with github when I apply the changes directly from web interface

@CISC
Member

CISC commented Mar 16, 2026

> it's fine now, master branch uses linux style, so I suspect something wrong with github when I apply the changes directly from web interface

Yeah, it happened once before, really weird.
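A whole-file diff like the one described above is what you would expect when every line ending flips from LF to CRLF, since each line then differs byte-for-byte. A minimal sketch for checking a file's line endings locally follows; the helper names are hypothetical and not part of the repository.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows-style (CRLF) and Unix-style (bare LF) line endings."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to get bare LFs.
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": lf}


def to_lf(data: bytes) -> bytes:
    """Normalize CRLF line endings to LF, leaving bare LFs untouched."""
    return data.replace(b"\r\n", b"\n")
```

To use it, read the file in binary mode, for example `open("convert_hf_to_gguf.py", "rb").read()`, so that Python does not translate newlines before they are counted.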

@ngxson ngxson merged commit d34ff7e into ggml-org:master Mar 16, 2026
50 of 52 checks passed