model: support Ministral3 #17644
Conversation
```diff
 @ModelBase.register("Mistral3ForConditionalGeneration")
 class Mistral3Model(LlamaModel):
-    model_arch = gguf.MODEL_ARCH.LLAMA
+    model_arch = gguf.MODEL_ARCH.MISTRAL3
```
Note for maintainers: while ministral3 and the old Mistral models have almost the same cgraph, the hparams handling in llama_model::load_hparams is considerably more complicated. Therefore, it's better to separate the two archs to keep the code readable.
This also makes the code more future-proof, in case future Mistral models become significantly more complicated than the traditional llama arch.
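For reference, a rough sketch of what registering the new architecture on the Python (gguf-py) side could look like; the enum layout and name strings below are illustrative assumptions, not the exact upstream code:

```python
from enum import IntEnum, auto

# Hypothetical excerpt in the spirit of gguf-py/gguf/constants.py:
# MISTRAL3 gets its own entry even though its cgraph is close to LLAMA,
# so that llama_model::load_hparams can branch on the arch cleanly.
class MODEL_ARCH(IntEnum):
    LLAMA = auto()
    MISTRAL3 = auto()

MODEL_ARCH_NAMES: dict[MODEL_ARCH, str] = {
    MODEL_ARCH.LLAMA: "llama",
    MODEL_ARCH.MISTRAL3: "mistral3",
}
```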
```python
# for compatibility, we use LLAMA arch for older models
# TODO: remove this once everyone has migrated to newer version of llama.cpp
if self.hparams.get("model_type") != "ministral3":
    self.model_arch = gguf.MODEL_ARCH.LLAMA
    self.gguf_writer.arch = str(self.model_arch)
    self.gguf_writer.add_architecture()
    self.tensor_map = gguf.get_tensor_name_map(self.model_arch, self.block_count)
```
I think a time frame of ~1 week could be a reasonable timeline to remove this.
This covers the case where users run a newer version of the conversion script (e.g. via gguf-my-repo) to convert old models while their local llama.cpp build is not yet up to date.
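As an illustration of what the fallback means for downstream files, one way to check which architecture a converted GGUF ended up with is the gguf-py reader (a rough sketch; the filename and the string-decoding detail are assumptions and may differ across gguf-py versions):

```python
from gguf import GGUFReader  # gguf-py package shipped with llama.cpp

# Hypothetical output file from a conversion run.
reader = GGUFReader("ministral3-14b-instruct-f16.gguf")

# Older models keep "llama" here for compatibility; ministral3 checkpoints
# converted with the new script should report "mistral3".
arch_field = reader.fields["general.architecture"]
# String fields keep their payload in the last part (assumption based on how
# gguf-py's own dump script decodes them).
print(bytes(arch_field.parts[-1]).decode("utf-8"))
```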
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Ref upstream PR: huggingface/transformers#42498
Disclosure: This PR is made with collaboration from Mistral. Huge thanks to @juliendenize for coordination!
Note: The model weights are not yet released
PPL results for the 14B model (`-Instruct` variant, f16, ctx=32000, batch=8192): `Final estimate: PPL = 5.5389 +/- 0.03163`