Ensure e_score_correction_bias dtype of DeepSeek-V3/R1 is FP32 #42580

Merged
ArthurZucker merged 3 commits into huggingface:main from xin3he:fp32_e_score_correction_bias
Dec 10, 2025

Conversation

@xin3he (Contributor) commented Dec 3, 2025

What does this PR do?

Fixes # (issue)

Hi, we found that the loading method below does not preserve the dtype that e_score_correction_bias has in the safetensors checkpoint, so we raised this PR to fix it.

Please let me know if this is not the expected change. Thanks.

intel/auto-round#845

from transformers import AutoModelForCausalLM

model_name = "deepseek-ai/DeepSeek-V3.1-Terminus"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

print(model.model.layers[3].mlp.gate.weight)  # BF16
print(model.model.layers[3].mlp.gate.e_score_correction_bias)  # FP32
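
To make the intended behavior concrete, here is a minimal pure-Python sketch of name-based dtype selection at load time. It is not the transformers implementation and not the change made in this PR. The helper `resolve_dtypes`, the dtype strings, and the `*.e_score_correction_bias` pattern are hypothetical, and the sketch assumes the loader would otherwise apply one target dtype to every tensor.

```python
from fnmatch import fnmatchcase


def resolve_dtypes(tensor_names, target_dtype, keep_fp32_patterns=()):
    """Hypothetical helper: pick the dtype each tensor ends up with.

    Tensors whose name matches one of keep_fp32_patterns stay in float32;
    every other tensor is cast to target_dtype.
    """
    resolved = {}
    for name in tensor_names:
        if any(fnmatchcase(name, pattern) for pattern in keep_fp32_patterns):
            resolved[name] = "float32"
        else:
            resolved[name] = target_dtype
    return resolved


# Tensor names taken from the snippet above.
names = [
    "model.layers.3.mlp.gate.weight",
    "model.layers.3.mlp.gate.e_score_correction_bias",
]

# Without an exception list, the bias follows the model-wide dtype.
without_keep = resolve_dtypes(names, "bfloat16")

# With an exception list, the bias stays in float32.
with_keep = resolve_dtypes(names, "bfloat16", ["*.e_score_correction_bias"])

for name in names:
    print(name, without_keep[name], "->", with_keep[name])
```

Under these assumptions only the correction bias changes between the two runs, which mirrors the BF16 weight and FP32 bias annotated in the snippet above.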

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@Cyrilvallez @ArthurZucker

@xin3he xin3he force-pushed the fp32_e_score_correction_bias branch 2 times, most recently from 83219c0 to b5d1fea Compare December 8, 2025 08:28
Signed-off-by: He, Xin3 <xin3.he@intel.com>
@github-actions (bot, Contributor) commented Dec 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: deepseek_v3, dots1, glm4_moe, glm4v_moe

@ArthurZucker (Collaborator) left a comment

Perfect TY!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker merged commit 2e29a9a into huggingface:main Dec 10, 2025
17 checks passed
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
…ngface#42580)

* Ensure e_score_correction_bias dtype of DeepSeek-V3/R1 is FP32

* fix CI

Signed-off-by: He, Xin3 <xin3.he@intel.com>

---------

Signed-off-by: He, Xin3 <xin3.he@intel.com>