
[New Model] Add MiniCPM3 support #45613

Open
aliyevaladddin wants to merge 8 commits into huggingface:main from
aliyevaladddin:main

Conversation

@aliyevaladddin

What does this PR do?

Adds native support for the MiniCPM3 model architecture
(openbmb/MiniCPM3-4B).

MiniCPM3 uses Multi-head Latent Attention (MLA) from DeepSeek-V2 combined with
a standard dense MLP (no MoE), plus three scaling mechanisms:

  • scale_emb — embedding scaling
  • scale_depth / sqrt(num_hidden_layers) — residual connection scaling
  • hidden_size / dim_model_base — logit scaling before the LM head

The implementation uses the modular model pattern, inheriting from Llama
(config, MLP, model structure) and DeepSeek-V2 (MLA attention, rotary
embeddings).
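The three scaling mechanisms described above can be sketched in plain Python. The config values below are illustrative assumptions for the sketch, not the actual MiniCPM3-4B settings; the real values live in the model's config.json, and only the factor formulas follow the PR description:

```python
import math

# Hypothetical config values, for illustration only (assumed, not the
# shipped MiniCPM3-4B configuration).
scale_emb = 12.0          # embedding scaling factor
scale_depth = 1.4         # residual scaling numerator
num_hidden_layers = 62    # decoder layer count
hidden_size = 2560        # model width
dim_model_base = 256      # base width used for logit scaling

def scale_embeddings(x: float) -> float:
    # Token embeddings are multiplied by scale_emb before the first layer.
    return x * scale_emb

# Each attention/MLP block output is damped by this factor before being
# added back to the residual stream.
residual_scale = scale_depth / math.sqrt(num_hidden_layers)

def add_residual(hidden: float, block_output: float) -> float:
    return hidden + block_output * residual_scale

# Hidden states are divided by hidden_size / dim_model_base before the
# LM head produces logits.
logit_scale = 1.0 / (hidden_size / dim_model_base)

def scale_hidden_for_logits(h: float) -> float:
    return h * logit_scale
```

With these assumed values, the residual contribution of each block is scaled by roughly 1.4 / sqrt(62) ≈ 0.178, and logits are scaled by 256 / 2560 = 0.1.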

Fixes #41115

Code Agent Policy

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other
    checks if that's the case).
  • Did you read the [contributor guideline](https://github.com/huggingface/
    transformers/blob/main/CONTRIBUTING.md#create-a-pull-request),
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a
    link to it if that's the case. — Add Model Architecture for MiniCPM3 #41115
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker @Cyrilvallez

- Updated auto mappings to include MiniCPM3Config and associated model classes.
- Introduced MiniCPM3 configuration file with detailed parameters for model architecture.
- Implemented MiniCPM3 modeling classes including attention, MLP, and decoder layers.
- Added support for causal language modeling and sequence classification with MiniCPM3.
- Created modular structure for MiniCPM3 to facilitate future enhancements and maintainability.
@aliyevaladddin
Author

Add MiniCPM3 model support with configuration and modeling classes

  • Implement MiniCPM3 using modular pattern (inherits Llama + DeepSeek-V2 MLA
    attention)
  • Add embedding scaling (scale_emb), residual scaling (scale_depth), and logit
    scaling (dim_model_base)
  • Register model in AutoConfig, AutoModel, AutoModelForCausalLM,
    AutoModelForSequenceClassification
  • Verified forward pass with small config

@Rocketknight1
Member

There's an existing PR at #41116, but it's quite stale at this point

@aliyevaladddin
Author

Thanks @Rocketknight1! Since #41116 has been stale since October 2025, I'd like to take over
and submit a fresh PR with MiniCPM3 support using the current modular pattern. I have a working
implementation with tests ready — will open a PR shortly.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, minicpm3

