fix(deepseek): pass rms_norm_eps to MLA q/kv layernorms #44317

Closed
sxu75374 wants to merge 1 commit into huggingface:main from sxu75374:fix/deepseek-mla-layernorm-eps

Conversation

@sxu75374

What does this PR do?

Passes config.rms_norm_eps explicitly to q_a_layernorm and kv_a_layernorm in both DeepSeek V2 and V3 MLA attention.

Currently these two norms are constructed without eps, falling back to the RMSNorm class default (1e-6). Every other RMSNorm in these models correctly uses config.rms_norm_eps.

Why it matters: While DeepSeek's published configs happen to set rms_norm_eps=1e-6, omitting the explicit parameter is a latent bug — any fine-tuned checkpoint or future variant with a different epsilon would silently use the wrong value for these two norms, causing precision drift that's extremely hard to track down.
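
To make the drift concrete, here is a minimal pure-Python sketch. It assumes a hypothetical checkpoint whose `rms_norm_eps` is 1e-5; `ToyRMSNorm` and `ToyConfig` are illustrative stand-ins, not the transformers classes, and the input values are invented for the demonstration.

```python
import math


class ToyRMSNorm:
    """Illustrative RMSNorm stand-in (not the transformers implementation)."""

    def __init__(self, hidden_size, eps=1e-6):
        self.weight = [1.0] * hidden_size
        self.eps = eps

    def __call__(self, x):
        # Mean of squares over the hidden dimension, then rescale.
        variance = sum(v * v for v in x) / len(x)
        scale = 1.0 / math.sqrt(variance + self.eps)
        return [w * v * scale for w, v in zip(self.weight, x)]


class ToyConfig:
    q_lora_rank = 4
    rms_norm_eps = 1e-5  # hypothetical value that differs from the class default


config = ToyConfig()

# Without an explicit eps the norm silently keeps the class default of 1e-6.
norm_default = ToyRMSNorm(config.q_lora_rank)
# With the explicit argument the norm follows the config.
norm_explicit = ToyRMSNorm(config.q_lora_rank, eps=config.rms_norm_eps)

# Small activations make the epsilon term matter most.
x = [1e-3, -2e-3, 3e-3, -4e-3]
drift = max(abs(a - b) for a, b in zip(norm_default(x), norm_explicit(x)))
print(f"max abs difference between the two norms: {drift:.4f}")
```

The gap grows as the activation variance approaches the size of epsilon, which is why the mismatch can go unnoticed on typical inputs.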

This aligns transformers with vLLM and SGLang, which both pass config.rms_norm_eps explicitly.

Fixes #44261

Changes

  • modeling_deepseek_v2.py: pass eps=config.rms_norm_eps to q_a_layernorm and kv_a_layernorm
  • modeling_deepseek_v3.py: same fix
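
The shape of the change can be sketched with toy classes. Everything prefixed `Toy` is hypothetical and only mirrors the attribute names mentioned in this PR; the constructor signature and the `variance_epsilon` attribute name are assumptions, not a copy of the real modeling code.

```python
class ToyRMSNorm:
    """Stand-in norm that only records its configuration."""

    def __init__(self, hidden_size, eps=1e-6):
        self.hidden_size = hidden_size
        self.variance_epsilon = eps


class ToyConfig:
    def __init__(self, q_lora_rank=8, kv_lora_rank=4, rms_norm_eps=1e-6):
        self.q_lora_rank = q_lora_rank
        self.kv_lora_rank = kv_lora_rank
        self.rms_norm_eps = rms_norm_eps


class ToyMLAAttention:
    def __init__(self, config):
        # Before the fix (sketch): ToyRMSNorm(config.q_lora_rank)
        # which leaves eps at the class default regardless of the config.
        self.q_a_layernorm = ToyRMSNorm(config.q_lora_rank, eps=config.rms_norm_eps)
        self.kv_a_layernorm = ToyRMSNorm(config.kv_lora_rank, eps=config.rms_norm_eps)


# A config with a non-default epsilon now reaches both norms.
attn = ToyMLAAttention(ToyConfig(rms_norm_eps=1e-5))
print(attn.q_a_layernorm.variance_epsilon, attn.kv_a_layernorm.variance_epsilon)
```

A check along these lines, run against a config with a non-default epsilon, is one way a regression test could catch the omission.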

4 lines changed across 2 files — minimal, zero-risk.

Before submitting

No new tests — this is a parameter-passing fix. Existing model tests cover the affected code paths.

Who can review?

@ArthurZucker @Cyrilvallez

The q_a_layernorm and kv_a_layernorm in DeepSeek V2/V3 MLA attention
are constructed without passing eps, falling back to the class default
(1e-6). All other RMSNorm instances in these models correctly use
config.rms_norm_eps.

While DeepSeek's published configs happen to use eps=1e-6 today,
omitting the explicit parameter is a latent bug: any fine-tuned
checkpoint or future model variant with a different rms_norm_eps would
silently use the wrong epsilon, causing precision drift.

Aligns transformers with vLLM and SGLang, which both pass
config.rms_norm_eps explicitly.

Fixes huggingface#44261

Signed-off-by: sxu75374 <imshuaixu@gmail.com>
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: deepseek_v2, deepseek_v3

@Rocketknight1
Member

I'm closing this PR: It's not just you — it's me.

Development

Successfully merging this pull request may close these issues.

[Bug/Discussion] MLA q_a_layernorm Missing config.rms_norm_eps, Causing 1e-5/1e-6 Precision Error