[deepseek_v4] keep hc_head / sinks / position_bias in fp32 by ArthurZucker · Pull Request #46198 · huggingface/transformers

ArthurZucker · 2026-05-25T10:05:34Z

Fixes #46167: adds `hc_head`, `sinks`, and `position_bias` to `_keep_in_fp32_modules_strict` so the remaining 112 fp32 tensors stop being silently downcast to bf16.

Issue #46167: 112 of the 417 fp32 plumbing tensors get downcast to bf16 because `_keep_in_fp32_modules_strict` was missing entries for `hc_head` (top-level + MTP), `sinks` (per-attention sink token), and `position_bias` (compressor and indexer compressor). This PR adds the three patterns so `save_pretrained` preserves the source dtype for the full set of 417 tensors instead of 305.
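To make the effect of the three new patterns concrete, here is a minimal sketch of how a keep-in-fp32 list could decide the dtype per parameter. Only the three pattern names come from this PR; the matching rule (a pattern must equal a whole dotted component of the name), the helper names, and the example parameter names are assumptions for illustration and may differ from what transformers does internally.

```python
# Hypothetical illustration. The three patterns are the names added in this PR;
# the matching rule and the parameter names below are assumptions.
KEEP_IN_FP32_PATTERNS = ["hc_head", "sinks", "position_bias"]


def keeps_fp32(param_name, patterns=KEEP_IN_FP32_PATTERNS):
    """Return True if any pattern equals a whole dotted component of the name."""
    components = param_name.split(".")
    return any(pattern in components for pattern in patterns)


def plan_dtypes(param_names, target_dtype="bfloat16"):
    """Map each parameter name to the dtype it would end up with."""
    return {
        name: "float32" if keeps_fp32(name) else target_dtype
        for name in param_names
    }


# Made-up parameter names, one per pattern plus one ordinary weight.
example_names = [
    "model.hc_head.weight",
    "model.layers.0.self_attn.sinks",
    "model.layers.0.self_attn.compressor.position_bias",
    "model.layers.0.self_attn.q_proj.weight",
]
plan = plan_dtypes(example_names)
```

In this sketch, a name that matches none of the patterns falls through to the target dtype, which is the silent downcast described in the issue.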

github-actions · 2026-05-25T10:06:46Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: deepseek_v4

HuggingFaceDocBuilderDev · 2026-05-25T10:18:53Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Cyrilvallez

All right

vasqu

That's quite a lot of small params being strictly fp32. Is the remote code the same, or what are they doing?

Anyways, trusting you and seems reasonable to me :D

vasqu · 2026-05-25T13:11:10Z

Ah, seems like it only happens on save? Do we maybe have something silent like `nn.Parameter(..., dtype=torch.float32)`? That might cause the discrepancy between loading and saving.

ArthurZucker · 2026-05-27T09:44:56Z

No no I think it happens on load as well, but I'll check the original parameter dtypes to be sure!
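One way to check the original parameter dtypes without loading any weights is to read the header of a safetensors shard. The sketch below relies on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header that records a `dtype` string such as `F32` or `BF16` per tensor). The function names and the substring filter are hypothetical, and the shard path is left to the reader.

```python
import json
import struct
from collections import Counter

# Substrings taken from this PR; substring matching is a simplification.
WATCHED = ("hc_head", "sinks", "position_bias")


def dtypes_from_safetensors_bytes(raw):
    """Parse the JSON header at the start of a safetensors file into {name: dtype}."""
    (header_len,) = struct.unpack("<Q", raw[:8])
    header = json.loads(raw[8 : 8 + header_len])
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: meta["dtype"]
        for name, meta in header.items()
        if name != "__metadata__"
    }


def read_safetensors_dtypes(path):
    """Read only the header of a shard on disk, never the tensor data."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return dtypes_from_safetensors_bytes(prefix + f.read(header_len))


def count_watched(dtypes, watched=WATCHED):
    """Count dtypes among tensors whose name contains a watched substring."""
    return Counter(
        dtype
        for name, dtype in dtypes.items()
        if any(fragment in name for fragment in watched)
    )
```

Running `count_watched(read_safetensors_dtypes(path))` on a source shard and on a shard written by `save_pretrained` would show whether the watched tensors change from `F32` to `BF16` between the two.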


ArthurZucker requested review from Cyrilvallez and vasqu May 25, 2026 10:21

Cyrilvallez approved these changes May 25, 2026


vasqu approved these changes May 25, 2026


ArthurZucker merged commit 9ded3db into main May 27, 2026
43 checks passed

ArthurZucker deleted the fix-deepseek-v4-keep-in-fp32 branch May 27, 2026 09:49
