@ngxson
Collaborator

@ngxson ngxson commented Dec 3, 2025

WIP: the code is quite ugly for now, but I just want to get it working first.

Remember to convert with the --mistral-format argument, as the weights are not yet transformers-compatible.

The output F16 weights are 1.35 TB and the Q8_0 weights are 716 GB, and I don't have enough hardware to test them.
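
A rough sanity check on these two sizes, offered as a sketch rather than anything taken from the PR itself: Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, which works out to 8.5 bits per weight versus 16 for F16. The function name below is made up for illustration, and the estimate assumes every tensor is quantized and ignores metadata, so it is only approximate.

```python
# Back-of-the-envelope check of the F16 vs Q8_0 sizes quoted in this PR.
# Assumption: every tensor is quantized and file metadata is negligible.

F16_BITS_PER_WEIGHT = 16.0
Q8_0_BLOCK_WEIGHTS = 32
Q8_0_BLOCK_BYTES = 32 + 2  # 32 int8 quants + one 2-byte fp16 scale
Q8_0_BITS_PER_WEIGHT = Q8_0_BLOCK_BYTES * 8 / Q8_0_BLOCK_WEIGHTS


def estimate_q8_0_size_gb(f16_size_gb: float) -> float:
    """Scale an F16 file size to the approximate Q8_0 size (hypothetical helper)."""
    return f16_size_gb * Q8_0_BITS_PER_WEIGHT / F16_BITS_PER_WEIGHT


if __name__ == "__main__":
    f16_size_gb = 1350.0  # 1.35 TB, as reported above
    print(f"Q8_0 bits per weight: {Q8_0_BITS_PER_WEIGHT}")
    print(f"estimated Q8_0 size: {estimate_q8_0_size_gb(f16_size_gb):.0f} GB")
```

The estimate lands within about a gigabyte of the reported 716 GB, which suggests the two figures are consistent with each other.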

Edit: thanks @bartowski1182 for testing it!

Disclaimer: unlike the Ministral release, this PR is not affiliated with the Mistral team.


NOTE: this PR only covers the conversion to GGUF. The C++ code still needs Llama 4 scaling in order to work, but that will be done in another PR.

@ngxson changed the title from "convert: support Mistral 3 Large MoE" to "convert: support Mistral 3 Large MoE (need help for testing)" on Dec 3, 2025
@bartowski1182
Contributor

So far so good with this; in a couple of hours I'll be able to test generation.

@github-actions bot added the "python" (python script changes) label on Dec 3, 2025
@bartowski1182
Contributor

Seems to work and produces coherent results!

@ngxson ngxson marked this pull request as ready for review December 3, 2025 16:54
@ngxson ngxson requested a review from CISC as a code owner December 3, 2025 16:54
@ngxson ngxson marked this pull request as draft December 3, 2025 16:55
@ngxson
Collaborator Author

ngxson commented Dec 3, 2025

This PR still needs to be cleaned up before it is ready for review 😅

@ngxson ngxson marked this pull request as ready for review December 3, 2025 18:10
@ngxson changed the title from "convert: support Mistral 3 Large MoE (need help for testing)" to "convert: support Mistral 3 Large MoE" on Dec 3, 2025
Comment on lines +9941 to +9942
# remap hparams from Mistral MoE format to DeepseekV2 format
# we do this way to be able to reuse DeepseekV2Model set_gguf_parameters logic
Collaborator

Somewhat ugly but an acceptable trade-off.
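
The remapping approach discussed in this review thread can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `remap_hparams` and the key names in the mapping are hypothetical and chosen only to show the pattern of renaming keys up front so that an existing parent-class routine can read them unchanged.

```python
from typing import Any

# Hypothetical key mapping, for illustration only. The actual key names
# handled by the PR are not shown in this thread.
MISTRAL_TO_DEEPSEEK_KEYS = {
    "dim": "hidden_size",
    "n_layers": "num_hidden_layers",
    "n_heads": "num_attention_heads",
    "norm_eps": "rms_norm_eps",
}


def remap_hparams(hparams: dict[str, Any], key_map: dict[str, str]) -> dict[str, Any]:
    """Return a copy of hparams with source keys renamed to target keys.

    Keys that do not appear in key_map are kept unchanged.
    """
    remapped = {}
    for key, value in hparams.items():
        remapped[key_map.get(key, key)] = value
    return remapped


if __name__ == "__main__":
    mistral_hparams = {"dim": 4096, "n_layers": 32, "vocab_size": 32000}
    print(remap_hparams(mistral_hparams, MISTRAL_TO_DEEPSEEK_KEYS))
```

Renaming once at load time keeps the shared parameter-writing logic in a single place, at the cost of an extra layer of indirection, which is the trade-off noted in the comment above.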