
fix zero3 init config #44236

Merged
SunMarc merged 4 commits into main from fix-dp-config-init
Feb 27, 2026

Conversation


@SunMarc SunMarc commented Feb 23, 2026

What does this PR do?

Supersedes #43847

When using ZeRO-3 with `from_config()`, the model was incorrectly initialized because we were not gathering the params before running init. A test is added as well.

cc @tohtana

SunMarc and others added 3 commits February 23, 2026 17:17
When using `from_config()` with DeepSpeed ZeRO-3, `_init_weights()` silently
operated on partitioned empty tensors, making custom initialization a no-op.
Parameters retained PyTorch's default kaiming_uniform_ instead of the intended
initialization, causing abnormally large gradients and loss.

The fix suppresses init during construction via `no_init_weights()`, then
re-initializes module-by-module using `GatheredParameters` so each module's
parameters are gathered before init runs.
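The commit message above describes the control flow: suppress init during construction, then gather each module's parameters before running `_init_weights()`, so the init actually touches full tensors rather than empty ZeRO-3 shards. A minimal pure-Python sketch of that pattern, where `FakeParam`, `gathered_parameters`, and `init_weights` are hypothetical stand-ins for DeepSpeed's partitioned parameters, `deepspeed.zero.GatheredParameters`, and a model's `_init_weights()` (this is not the actual transformers/DeepSpeed code):

```python
from contextlib import contextmanager

class FakeParam:
    """Stand-in for a ZeRO-3 partitioned parameter: the local storage is
    empty on this rank; the full tensor only exists once gathered."""
    def __init__(self):
        self.data = []           # partitioned: empty locally
        self.full = [0.0] * 4    # the full, sharded-away values

@contextmanager
def gathered_parameters(params):
    """Mimics deepspeed.zero.GatheredParameters: materialize the full
    params, run the body, then re-partition, scattering modifications."""
    for p in params:
        p.data = p.full          # gather
    try:
        yield
    finally:
        for p in params:
            p.full = p.data      # scatter modifications back
            p.data = []          # re-partition

def init_weights(param, value=0.5):
    # _init_weights-style in-place init; silently a no-op on an empty shard.
    param.data = [value] * len(param.data)

p = FakeParam()

# Without gathering, init does nothing (the reported bug):
init_weights(p)
assert p.full == [0.0] * 4

# With gathering (the fix), init reaches the full tensor:
with gathered_parameters([p]):
    init_weights(p)
assert p.full == [0.5] * 4
```

The sketch reproduces the failure mode (init over an empty shard is a silent no-op) and why wrapping the per-module init in a gather context fixes it.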

Co-Authored-By: Masahiro Tanaka <tohtana@users.noreply.github.com>
@SunMarc SunMarc requested a review from Cyrilvallez February 23, 2026 17:23

SunMarc commented Feb 23, 2026

@bot /style


github-actions bot commented Feb 23, 2026

Style fix bot fixed some files and pushed the changes.


@SunMarc SunMarc merged commit a264509 into main Feb 27, 2026
26 checks passed
@SunMarc SunMarc deleted the fix-dp-config-init branch February 27, 2026 11:36
zvik pushed a commit to zvik/transformers that referenced this pull request Mar 1, 2026
* fix zero3 init config

* Fix ZeRO-3 weight initialization in `_from_config()`

Co-Authored-By: Masahiro Tanaka <tohtana@users.noreply.github.com>
* Apply repo consistency fixes

---------

Co-authored-by: Masahiro Tanaka <tohtana@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

3 participants