
[New Model] Add MiniCPM3 support #45613

Open
aliyevaladddin wants to merge 8 commits into huggingface:main from
aliyevaladddin:main

Conversation

@aliyevaladddin

What does this PR do?

Adds native support for the MiniCPM3 model architecture
(openbmb/MiniCPM3-4B).

MiniCPM3 uses Multi-head Latent Attention (MLA) from DeepSeek-V2 combined with
a standard dense MLP (no MoE), plus three scaling mechanisms:

  • scale_emb — embedding scaling
  • scale_depth / sqrt(num_hidden_layers) — residual connection scaling
  • hidden_size / dim_model_base — logit scaling before the LM head

The implementation uses the modular model pattern, inheriting from Llama
(config, MLP, model structure) and DeepSeek-V2 (MLA attention, rotary
embeddings).
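The three scaling mechanisms described above can be sketched in plain Python. The config values below are illustrative assumptions for the sketch, not the actual MiniCPM3-4B settings; the real values live in the model's config.json, and only the factor formulas follow the PR description:

```python
import math

# Hypothetical config values, for illustration only (assumed, not the
# shipped MiniCPM3-4B configuration).
scale_emb = 12.0          # embedding scaling factor
scale_depth = 1.4         # residual scaling numerator
num_hidden_layers = 62    # decoder layer count
hidden_size = 2560        # model width
dim_model_base = 256      # base width used for logit scaling

def scale_embeddings(x: float) -> float:
    # Token embeddings are multiplied by scale_emb before the first layer.
    return x * scale_emb

# Each attention/MLP block output is damped by this factor before being
# added back to the residual stream.
residual_scale = scale_depth / math.sqrt(num_hidden_layers)

def add_residual(hidden: float, block_output: float) -> float:
    return hidden + block_output * residual_scale

# Hidden states are divided by hidden_size / dim_model_base before the
# LM head produces logits.
logit_scale = 1.0 / (hidden_size / dim_model_base)

def scale_hidden_for_logits(h: float) -> float:
    return h * logit_scale
```

With these assumed values, the residual contribution of each block is scaled by roughly 1.4 / sqrt(62) ≈ 0.178, and logits are scaled by 256 / 2560 = 0.1.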

Fixes #41115

Code Agent Policy

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other
    checks if that's the case).
  • Did you read the [contributor guideline](https://github.com/huggingface/
    transformers/blob/main/CONTRIBUTING.md#create-a-pull-request),
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a
    link to it if that's the case. — Add Model Architecture for MiniCPM3 #41115
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker @Cyrilvallez

- Updated auto mappings to include MiniCPM3Config and associated model classes.
- Introduced MiniCPM3 configuration file with detailed parameters for model architecture.
- Implemented MiniCPM3 modeling classes including attention, MLP, and decoder layers.
- Added support for causal language modeling and sequence classification with MiniCPM3.
- Created modular structure for MiniCPM3 to facilitate future enhancements and maintainability.
@aliyevaladddin
Author

Add MiniCPM3 model support with configuration and modeling classes

  • Implement MiniCPM3 using modular pattern (inherits Llama + DeepSeek-V2 MLA
    attention)
  • Add embedding scaling (scale_emb), residual scaling (scale_depth), and logit
    scaling (dim_model_base)
  • Register model in AutoConfig, AutoModel, AutoModelForCausalLM,
    AutoModelForSequenceClassification
  • Verified forward pass with small config

@Rocketknight1
Member

There's an existing PR at #41116, but it's quite stale at this point

@aliyevaladddin
Author

Thanks @Rocketknight1! Since #41116 has been stale since October 2025, I'd like to take over
and submit a fresh PR with MiniCPM3 support using the current modular pattern. I have a working
implementation with tests ready — will open a PR shortly.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, minicpm3

