

@ngxson ngxson commented Dec 1, 2025

Ref upstream PR: huggingface/transformers#42498

Disclosure: This PR was made in collaboration with Mistral. Huge thanks to @juliendenize for coordination!

Note: The model weights are not yet released

PPL results: for the 14B model (-Instruct variant, f16, ctx=32000, batch=8192), the final estimate is PPL = 5.5389 +/- 0.03163

 @ModelBase.register("Mistral3ForConditionalGeneration")
 class Mistral3Model(LlamaModel):
-    model_arch = gguf.MODEL_ARCH.LLAMA
+    model_arch = gguf.MODEL_ARCH.MISTRAL3

Note for maintainers: while ministral3 and the old mistral models have almost the same cgraph, the hparams handling in llama_model::load_hparams is considerably more complicated for ministral3. Therefore, it's better to separate the 2 archs to keep the code readable.

This also makes the code more future-proof, in case future mistral models become significantly more complicated than the traditional llama arch.

Comment on lines 2821 to 2827
# for compatibility, we use LLAMA arch for older models
# TODO: remove this once everyone has migrated to newer version of llama.cpp
if self.hparams.get("model_type") != "ministral3":
    self.model_arch = gguf.MODEL_ARCH.LLAMA
    self.gguf_writer.arch = str(self.model_arch)
    self.gguf_writer.add_architecture()
    self.tensor_map = gguf.get_tensor_name_map(self.model_arch, self.block_count)

@ngxson ngxson Dec 1, 2025


I think ~1 week is a reasonable time frame before removing this.

This covers the case where users run the new version of the script (e.g. via gguf-my-repo) to convert old models while their local llama.cpp is probably not yet up to date.
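
For illustration, the fallback rule can be sketched as a standalone function. This is only a sketch of the selection logic, not the converter code itself: `select_arch` and the two string constants are hypothetical stand-ins for the `gguf.MODEL_ARCH` members used in the snippet under review.

```python
# Hypothetical stand-ins for gguf.MODEL_ARCH.LLAMA / gguf.MODEL_ARCH.MISTRAL3;
# the exact strings are an assumption made for this sketch.
ARCH_LLAMA = "llama"
ARCH_MISTRAL3 = "mistral3"


def select_arch(hparams: dict) -> str:
    """Pick the architecture name to record for a converted model."""
    # Only configs that declare model_type == "ministral3" get the new arch.
    if hparams.get("model_type") == "ministral3":
        return ARCH_MISTRAL3
    # Everything else, including configs with no model_type at all,
    # keeps the legacy arch so that older llama.cpp builds can still load it.
    return ARCH_LLAMA
```

With this rule, an old model converted with the new script still gets the legacy arch, which is what keeps it loadable on a llama.cpp build that predates this PR.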

@ngxson ngxson marked this pull request as ready for review December 1, 2025 10:05
@ngxson ngxson requested review from CISC and ggerganov as code owners December 1, 2025 10:05
ngxson and others added 2 commits December 1, 2025 11:44
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@github-actions github-actions bot added the model (Model specific) and python (python script changes) labels Dec 1, 2025
@ngxson ngxson merged commit cd3c118 into ggml-org:master Dec 1, 2025
67 of 69 checks passed
