
Fix missing rms_norm_eps in DeepseekV3 MLA layernorms#44585

Open
mvanhorn wants to merge 1 commit into huggingface:main from mvanhorn:osc/44261-fix-deepseek-v3-layernorm-eps

Conversation

@mvanhorn
Contributor

What does this PR do?

Passes eps=config.rms_norm_eps to both q_a_layernorm and kv_a_layernorm in the DeepseekV3 MLA attention module. Without this, these layernorms default to eps=1e-5 instead of the config value (1e-6), causing precision differences compared to vLLM and SGLang implementations.
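To make the numerical impact concrete, here is a minimal standalone sketch of the RMSNorm formula (a toy re-implementation with unit weight, not the actual `DeepseekV3RMSNorm` class), assuming the `1e-5` default described above. For small activations the epsilon term dominates the variance, so the two epsilon values diverge measurably:

```python
import math

def rms_norm(xs, eps=1e-5):
    # Toy RMSNorm with unit weight: x * rsqrt(mean(x^2) + eps).
    variance = sum(v * v for v in xs) / len(xs)
    scale = 1.0 / math.sqrt(variance + eps)
    return [v * scale for v in xs]

# Small activations make the epsilon term dominant relative to the
# variance, so the default and configured epsilons visibly disagree.
xs = [1e-3] * 8
out_default = rms_norm(xs, eps=1e-5)  # unpatched: falls back to the default
out_config = rms_norm(xs, eps=1e-6)   # patched: config.rms_norm_eps
max_diff = max(abs(a - b) for a, b in zip(out_default, out_config))
```

With these inputs the per-element outputs differ by roughly 0.4, which is exactly the kind of silent divergence from vLLM/SGLang the PR describes.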

The fix was applied to modular_deepseek_v3.py and propagated to generated modeling files (deepseek_v3, glm4_moe_lite, longcat_flash, youtu) via make fix-repo.

Note: DeepseekV2 has the same issue but is left for a separate PR to keep this focused.

Fixes #44261


Who can review?

@ArthurZucker @Cyrilvallez (text models, attention)

This contribution was developed with AI assistance (Claude Code).

Pass `eps=config.rms_norm_eps` to both `q_a_layernorm` and
`kv_a_layernorm` in DeepseekV3 attention. Without this, these
layernorms use the default eps (1e-5) instead of the config value
(1e-6), causing precision errors vs vLLM/SGLang implementations.

Edit applied to modular_deepseek_v3.py; generated modeling files
(deepseek_v3, glm4_moe_lite, longcat_flash, youtu) updated via
`make fix-repo`.

Fixes huggingface#44261

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: deepseek_v3, glm4_moe_lite, longcat_flash, youtu


@alvinttang alvinttang left a comment


The fix is correct — without passing eps=config.rms_norm_eps, the MLA layernorms for both q_a_layernorm and kv_a_layernorm would silently use the RMSNorm default epsilon (typically 1e-6) instead of the model-configured value, which would cause subtle numerical divergence from reference implementations without any error.

It is worth noting this same fix is applied consistently across all derived models (glm4_moe_lite, longcat_flash, youtu) and the modular source, which is the right approach.

One open question: does the DeepseekV3RMSNorm default epsilon happen to match the typical config value, masking this bug in practice? If so, a unit test asserting the epsilon is correctly propagated would prevent future regressions of this class.
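A regression test along the lines suggested could look like the sketch below. It uses hypothetical stand-in classes so it is self-contained; a real test would instead construct a small DeepseekV3Config and DeepseekV3Attention from transformers and assert on their layernorm attributes:

```python
# Stand-in classes (illustrative only; not the real transformers classes).
class StubConfig:
    rms_norm_eps = 1e-6  # the model-configured value

class StubRMSNorm:
    def __init__(self, hidden_size, eps=1e-5):  # default the bug fell back to
        self.hidden_size = hidden_size
        self.eps = eps

class StubAttention:
    def __init__(self, config):
        # The fixed wiring: epsilon comes from the config, not the default.
        self.q_a_layernorm = StubRMSNorm(128, eps=config.rms_norm_eps)
        self.kv_a_layernorm = StubRMSNorm(64, eps=config.rms_norm_eps)

def test_mla_layernorm_eps_propagated():
    attn = StubAttention(StubConfig())
    assert attn.q_a_layernorm.eps == StubConfig.rms_norm_eps
    assert attn.kv_a_layernorm.eps == StubConfig.rms_norm_eps

test_mla_layernorm_eps_propagated()
```

Such a test would fail against the unpatched wiring (where the eps keyword is omitted) and pass with this PR, guarding against the same omission reappearing in generated modeling files.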

@Cyrilvallez
Member

cc @ArthurZucker, do you know if the norms in the Attention are supposed to use the config value as well or not?

@mvanhorn
Contributor Author

The MLA attention layernorms (q_a_layernorm and kv_a_layernorm) are the only layernorms in DeepseekV3 that weren't getting config.rms_norm_eps passed through - all other norms in the model (input, post-attention, MLP) already use it. This PR just makes them consistent.

The same pattern is applied across the derived models (glm4_moe_lite, longcat_flash, youtu) and the modular source, so it stays consistent everywhere.



Development

Successfully merging this pull request may close these issues.

[Bug/Discussion] MLA q_a_layernorm Missing config.rms_norm_eps, Causing 1e-5/1e-6 Precision Error
