[New Model] Add MiniCPM3 support #45613
Open
aliyevaladddin wants to merge 8 commits into huggingface:main
- Updated auto mappings to include MiniCPM3Config and associated model classes.
- Introduced MiniCPM3 configuration file with detailed parameters for model architecture.
- Implemented MiniCPM3 modeling classes including attention, MLP, and decoder layers.
- Added support for causal language modeling and sequence classification with MiniCPM3.
- Created modular structure for MiniCPM3 to facilitate future enhancements and maintainability.
Author
Add MiniCPM3 model support with configuration and modeling classes
Member
There's an existing PR at #41116, but it's quite stale at this point
Author
Thanks @Rocketknight1! Since #41116 has been stale since October 2025, I'd like to take over
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
[For maintainers] Suggested jobs to run (before merge)
run-slow: auto, minicpm3
What does this PR do?
Adds native support for the MiniCPM3 model architecture (openbmb/MiniCPM3-4B).

MiniCPM3 uses Multi-head Latent Attention (MLA) from DeepSeek-V2 combined with a standard dense MLP (no MoE), plus three scaling mechanisms:

- scale_emb: embedding scaling
- scale_depth / sqrt(num_hidden_layers): residual connection scaling
- hidden_size / dim_model_base: logit scaling before the LM head

The implementation uses the modular model pattern, inheriting from Llama (config, MLP, model structure) and DeepSeek-V2 (MLA attention, rotary embeddings).
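
For reviewers, here is a minimal sketch of how the three scaling mechanisms might be applied. It is plain Python on lists, not the code in this PR. `ScalingConfig` and the helper names are hypothetical, the default numbers are illustrative rather than read from the checkpoint, and the direction of each operation (multiply embeddings, scale the branch output before the residual add, divide hidden states before the LM head) is an assumption based on the description above.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingConfig:
    # Hypothetical stand-in for the relevant config fields;
    # values are illustrative, not read from a checkpoint.
    hidden_size: int = 2560
    num_hidden_layers: int = 62
    dim_model_base: int = 256
    scale_emb: float = 12.0
    scale_depth: float = 1.4


def scale_embeddings(embeds, cfg):
    # Embedding scaling: multiply token embeddings by scale_emb.
    return [x * cfg.scale_emb for x in embeds]


def residual_add(residual, branch_out, cfg):
    # Residual connection scaling: the attention/MLP output is scaled by
    # scale_depth / sqrt(num_hidden_layers) before being added back.
    factor = cfg.scale_depth / math.sqrt(cfg.num_hidden_layers)
    return [r + factor * b for r, b in zip(residual, branch_out)]


def scale_before_lm_head(hidden, cfg):
    # Logit scaling (assumed direction): divide hidden states by
    # hidden_size / dim_model_base before the LM head projection.
    factor = cfg.hidden_size / cfg.dim_model_base
    return [h / factor for h in hidden]
```

With a toy config such as `ScalingConfig(hidden_size=8, num_hidden_layers=4, dim_model_base=2, scale_emb=2.0, scale_depth=1.0)`, the residual factor is 0.5 and the pre-head divisor is 4, which makes the three effects easy to check by hand.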
Fixes #41115
Code Agent Policy
Before submitting
- … checks if that's the case).
- … transformers/blob/main/CONTRIBUTING.md#create-a-pull-request), Pull Request section?
- … forum? Please add a link to it if that's the case: Add Model Architecture for MiniCPM3 #41115
Who can review?
@ArthurZucker @Cyrilvallez