model: mistral small 4 support #20649
Conversation
```python
model_arch = gguf.MODEL_ARCH.MISTRAL3  # unused
impl: TextModel

def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    if self.hparams.get("model_type") == "mistral4":
        self.impl = Mistral3Model.Mistral4Model(*args, **kwargs)
    else:
        self.impl = Mistral3Model.Ministral3Model(*args, **kwargs)
```
@CISC note that the model will be released with Mistral3ForConditionalGeneration as the main arch (to support vision input), only the text model uses the new mistral4 in the model_type field. Hence the hack introduced in the conversion script.
We should find a way to remove my hack in a follow-up PR.
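The hack described above boils down to a wrapper class that inspects a hyperparameter field and delegates to one of two concrete implementations. Here is a minimal, self-contained sketch of that dispatch pattern; the class names are illustrative stand-ins, not the actual llama.cpp conversion-script classes.

```python
# Illustrative sketch of the model_type dispatch hack (names are
# hypothetical, not the real llama.cpp classes).

class Mistral4Impl:
    arch = "mistral4"

class Ministral3Impl:
    arch = "ministral3"

class Mistral3Wrapper:
    def __init__(self, hparams: dict):
        # Only the text model advertises "mistral4" in model_type;
        # the checkpoint's top-level arch stays Mistral3 for vision.
        if hparams.get("model_type") == "mistral4":
            self.impl = Mistral4Impl()
        else:
            self.impl = Ministral3Impl()

print(Mistral3Wrapper({"model_type": "mistral4"}).impl.arch)  # mistral4
print(Mistral3Wrapper({}).impl.arch)                          # ministral3
```

Removing the hack in a follow-up would presumably mean registering the text model's arch directly instead of branching inside the wrapper's constructor.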
I know this is spam, but... how do you do that? I mean, the HF Transformers PR was opened one hour before you opened this PR! Spotting the PR, understanding it, porting it, testing it, and opening a PR in one hour? Man... we love you.
@Pento95 I included a disclosure on the PR description (which I forgot to add initially). That should answer your question.
Added a note in the description that currently we are missing kernel specializations in some backends.
Is Vulkan actually missing? We don't operate with hardcoded FA sizes, they get generated on demand when loading the shader.
Nice. It works.
JohannesGaessler left a comment
The C++ changes look correct to me, I'm not familiar with the Python code.
@ngxson The Metal CI failure appears to be due to the branch not yet including #20549 (b30a5fd — metal: add FA specialization for HSK=320, HSV=256), which is already merged in master. A rebase on master should resolve ggml-ci-mac-metal. Tested locally on M3 Max 128GB, confirmed the specialization is present in master but missing from xsn/mistral-small-4.
@ngxson opened a small fix for the flake8 CI: eauchs/llama.cpp#91
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Seems like there is a glitch somehow; GitHub reports that I replaced the entire file.
Did it change newlines? That happened in another PR as well, though only on the changes, not the whole file...
Hmm, ok, the newlines were changed to Windows style somehow.
It's fine now. The master branch uses Linux-style newlines, so I suspect something went wrong with GitHub when I applied the changes directly from the web interface.
Yeah, it happened once before, really weird.
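For reference, here is a small generic Python helper (not part of this PR or llama.cpp) that checks which newline style a byte buffer uses, which is handy for diagnosing the CRLF/LF glitch discussed above:

```python
# Detect whether a byte buffer uses Windows (CRLF) or Unix (LF) line
# endings. Generic diagnostic helper; function name is hypothetical.
def newline_style(data: bytes) -> str:
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # bare LFs not part of a CRLF pair
    if crlf and not lf:
        return "windows"
    if lf and not crlf:
        return "unix"
    return "mixed" if (crlf and lf) else "none"

print(newline_style(b"a\r\nb\r\n"))  # windows
print(newline_style(b"a\nb\n"))      # unix
```

Running something like this over the file before and after applying a suggestion from the web interface would confirm whether GitHub rewrote the line endings.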
Ref upstream PR: huggingface/transformers#44760
The model is the same as Mistral Large 3 (deepseek2 arch with llama4 scaling), but I'm moving it to a new arch mistral4 to be aligned with the transformers code.
Disclosure: this PR was made possible with help from the Mistral team. Kudos to @juliendenize for the coordination!
TODO:
MLA, HSK = 320, HSV = 256