fix(deepseek): pass rms_norm_eps to MLA q/kv layernorms by sxu75374 · Pull Request #44317 · huggingface/transformers

sxu75374 · 2026-02-27T04:48:08Z

What does this PR do?

Passes config.rms_norm_eps explicitly to q_a_layernorm and kv_a_layernorm in both DeepSeek V2 and V3 MLA attention.

Currently these two norms are constructed without eps, falling back to the RMSNorm class default (1e-6). Every other RMSNorm in these models correctly uses config.rms_norm_eps.
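A minimal sketch of the fallback described above, in plain Python rather than the real modeling code. `ToyConfig` and `ToyRMSNorm` are hypothetical stand-ins, not the transformers classes; the only assumption carried over from this PR is that the norm's constructor defaults to `eps=1e-6`.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the model config; only the fields used here.
    rms_norm_eps: float = 1e-5
    q_lora_rank: int = 4


class ToyRMSNorm:
    """Plain-Python RMSNorm with unit weights and an eps default of 1e-6."""

    def __init__(self, hidden_size, eps=1e-6):
        self.hidden_size = hidden_size
        self.variance_epsilon = eps

    def __call__(self, x):
        # Mean of squares, then scale by 1 / sqrt(variance + eps).
        variance = sum(v * v for v in x) / len(x)
        scale = 1.0 / math.sqrt(variance + self.variance_epsilon)
        return [v * scale for v in x]


config = ToyConfig(rms_norm_eps=1e-5)

# Before: eps is omitted, so the class default wins regardless of the config.
norm_before = ToyRMSNorm(config.q_lora_rank)

# After: eps is passed explicitly, so the config value is honoured.
norm_after = ToyRMSNorm(config.q_lora_rank, eps=config.rms_norm_eps)
```

With a config whose `rms_norm_eps` differs from the class default, `norm_before` and `norm_after` end up with different epsilons even though both were built from the same config object.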

Why it matters: While DeepSeek's published configs happen to set rms_norm_eps=1e-6, omitting the explicit parameter is a latent bug — any fine-tuned checkpoint or future variant with a different epsilon would silently use the wrong value for these two norms, causing precision drift that's extremely hard to track down.
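To give a rough sense of scale, the sketch below compares the RMSNorm scaling factor under the two epsilons named in the linked issue title (1e-6 and 1e-5). It is a toy calculation on hand-picked inputs, not a measurement on a DeepSeek checkpoint, and `rms_scale` and `relative_gap` are hypothetical helper names.

```python
import math


def rms_scale(x, eps):
    """Return the RMSNorm scaling factor 1 / sqrt(mean(x^2) + eps)."""
    variance = sum(v * v for v in x) / len(x)
    return 1.0 / math.sqrt(variance + eps)


def relative_gap(x, eps_default=1e-6, eps_config=1e-5):
    """Relative difference between the scale actually used and the intended one."""
    used = rms_scale(x, eps_default)      # what the norm computes when eps is omitted
    intended = rms_scale(x, eps_config)   # what the config asked for
    return abs(used - intended) / intended


# Unit-scale activations: variance dominates eps, so the gap is tiny.
gap_unit = relative_gap([1.0, 1.0, 1.0, 1.0])

# Small activations: variance is comparable to eps, so the gap grows.
gap_small = relative_gap([0.01, 0.01, 0.01, 0.01])
```

In this toy setup the gap is negligible for unit-scale inputs and reaches a few percent once the input variance is close to the epsilon itself, which is the regime where a mismatched epsilon would be most visible.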

This aligns transformers with vLLM and SGLang, which both pass config.rms_norm_eps explicitly.

Fixes #44261

Changes

modeling_deepseek_v2.py: pass eps=config.rms_norm_eps to q_a_layernorm and kv_a_layernorm
modeling_deepseek_v3.py: same fix

4 lines changed across 2 files — minimal, zero-risk.

Before submitting

Did you read the contributor guideline?
Was this discussed/approved via a GitHub issue? See "[Bug/Discussion] MLA q_a_layernorm Missing config.rms_norm_eps, Causing 1e-5/1e-6 Precision Error" (#44261).
Did you write any new necessary tests?

No new tests — this is a parameter-passing fix. Existing model tests cover the affected code paths.

Who can review?

@ArthurZucker @Cyrilvallez

The q_a_layernorm and kv_a_layernorm in DeepSeek V2/V3 MLA attention are constructed without passing eps, falling back to the class default (1e-6). All other RMSNorm instances in these models correctly use config.rms_norm_eps.

While DeepSeek's published configs happen to use eps=1e-6 today, omitting the explicit parameter is a latent bug: any fine-tuned checkpoint or future model variant with a different rms_norm_eps would silently use the wrong epsilon, causing precision drift.

Aligns transformers with vLLM and SGLang, which both pass config.rms_norm_eps explicitly.

Fixes huggingface#44261

Signed-off-by: sxu75374 <imshuaixu@gmail.com>

github-actions · 2026-02-27T04:49:10Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: deepseek_v2, deepseek_v3

Rocketknight1 · 2026-02-27T14:29:54Z

I'm closing this PR: It's not just you — it's me.

Rocketknight1 closed this Feb 27, 2026

Rocketknight1 added the Code agent slop label Feb 27, 2026

sxu75374 wants to merge 1 commit into huggingface:main from sxu75374:fix/deepseek-mla-layernorm-eps

2 participants
