from_pretrained orchestration + distributed save/load #45409
Open · 3outeille wants to merge 2 commits into moe-sequence-parallel from
Conversation
- Add gather_full_state_dict() for DTensor → full tensor saving
- Add convert_strided_to_shard() / restore_strided_from_shard() for DCP
- Add _redistribute_dtensor() helper
- Full distributed_config integration in from_pretrained / save_pretrained
- Rename apply_fsdp2 → apply_fully_shard_data_parallel
- save_optimizer() / load_optimizer() in distributed/utils
- Trainer integration with distributed_config
- Updated FSDP and TP tests for the new orchestration API
- DTensor shard-on-read test updates
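gather_full_state_dict() itself is not shown in this excerpt; below is a minimal sketch of what a per-tensor DTensor → full-tensor gather generally looks like. The function name, rank-0 gating, and CPU offload are assumptions for illustration, not the PR's actual code.

```python
import torch
import torch.distributed as dist
from torch.distributed.tensor import DTensor


def gather_full_state_dict_sketch(model: torch.nn.Module) -> dict[str, torch.Tensor]:
    """Collect a full (unsharded) state dict; only rank 0 keeps the tensors."""
    full_sd: dict[str, torch.Tensor] = {}
    for name, tensor in model.state_dict().items():
        if isinstance(tensor, DTensor):
            # full_tensor() all-gathers the shards across the device mesh on every rank.
            tensor = tensor.full_tensor()
        if not dist.is_initialized() or dist.get_rank() == 0:
            # Move to CPU so rank 0 can torch.save() without holding extra GPU memory.
            full_sd[name] = tensor.detach().cpu()
    return full_sd
```

Gathering tensor by tensor (rather than materializing the whole model at once) keeps peak memory bounded, which matches the "streaming" wording in the summary.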
3outeille commented on Apr 14, 2026
| "ALL_PARALLEL_STYLES", | ||
| "translate_to_torch_parallel_style", | ||
| ] | ||
| _import_structure["tensor_parallel"] = [] |
Member
Author
Why is it empty? Do we still need it?
def get_correct_experts_implementation(self, requested_experts: str | None) -> str:
    applicable_experts = "grouped_mm" if requested_experts is None else requested_experts
-   if applicable_experts not in ["eager", "grouped_mm", "batched_mm", "deepgemm"]:
+   if applicable_experts not in ["eager", "grouped_mm", "batched_mm"]:
Member
Author
Why was deepgemm removed?
# so that custom models calling .item() during __init__ (e.g. drop-path
# schedules) don't crash on meta tensors.
init_contexts.extend([torch.device("meta"), init.meta_device_safe_creation_ops()])
init_contexts.append(torch.device("meta"))
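For context on the meta-device lines above, a standalone illustration of meta-device construction (not taken from this diff): parameters get shapes and dtypes but no storage, and any data-dependent call such as .item() fails, which is the situation the inline comment describes.

```python
import torch
import torch.nn as nn

# Build a module without allocating real storage: shapes and dtypes exist, data does not.
with torch.device("meta"):
    layer = nn.Linear(4096, 4096)

print(layer.weight.device)  # meta
print(layer.weight.shape)   # torch.Size([4096, 4096])

# Anything that needs actual values fails on meta tensors, e.g. a drop-path schedule
# calling .item() in __init__ -- the case the comment in the diff above refers to.
try:
    torch.zeros((), device="meta").item()
except (RuntimeError, NotImplementedError) as exc:
    print(f"meta tensors have no data: {exc}")
```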
if device_mesh is not None:
    shard_and_distribute_module(
        self, value, param, key, None, False, device_mesh.get_local_rank(), device_mesh
from torch.distributed.tensor import DTensor
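shard_and_distribute_module above is transformers' own helper; the sketch below only shows the torch.distributed.tensor primitives such a per-parameter sharding step typically builds on (that relationship, the mesh shape, and the placements are assumptions). It expects to be launched with torchrun.

```python
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
mesh = init_device_mesh("cuda", (dist.get_world_size(),), mesh_dim_names=("tp",))

weight = torch.randn(4096, 4096)
# Shard dim 0 across the mesh: each rank holds a 4096 // world_size slice,
# wrapped in a DTensor that records the mesh and placement.
dweight = distribute_tensor(weight, mesh, placements=[Shard(0)])
print(dweight.placements, dweight.to_local().shape)
```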
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
e04c7d9 to 24ca327

# Conflicts:
#	src/transformers/distributed/utils.py

815b5b2 to 7361deb
Contributor
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45409&sha=7361de
Summary
- distributed_config integration in from_pretrained(): mesh creation, apply TP + FSDP, attach model.device_mesh
- gather_full_state_dict() for streaming DTensor → full tensor saving (rank 0 only)
- convert_strided_to_shard() / restore_strided_from_shard() for DCP compatibility with _StridedShard
- save_optimizer() / load_optimizer() in distributed/utils.py
- apply_fsdp2 → apply_fully_shard_data_parallel
- Trainer integration with distributed_config

Part of the distributed training API chain: #44989
Chain:
main ← #44989 ← #44083 ← #44974 ← #45028 ← #45408 ← this PR
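The DCP helpers named in the summary (convert_strided_to_shard() / restore_strided_from_shard()) are not shown in this excerpt; for orientation, this is roughly how a sharded save/load with torch.distributed.checkpoint looks. It is a sketch of the surrounding workflow, not of those helpers, and assumes an already-sharded model.

```python
import torch
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint.state_dict import get_model_state_dict, set_model_state_dict


def dcp_save(model: torch.nn.Module, path: str) -> None:
    # Each rank writes only its own shards; DTensor placements go into the checkpoint
    # metadata, which is where _StridedShard (FSDP2 + TP) compatibility matters.
    dcp.save(get_model_state_dict(model), checkpoint_id=path)


def dcp_load(model: torch.nn.Module, path: str) -> None:
    state_dict = get_model_state_dict(model)
    dcp.load(state_dict, checkpoint_id=path)  # shards are read back onto the current mesh
    set_model_state_dict(model, state_dict)
```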
Review question
Does from_pretrained wire things up in the right order? Is the save/load round trip correct?
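On the ordering question, here is a rough sketch of the mesh → TP → FSDP → attach-mesh sequence the summary describes, written against plain torch APIs with made-up sizes and layer names; it is not the distributed_config interface this PR adds.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard  # torch >= 2.6; older releases expose it under torch.distributed._composable.fsdp
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel, parallelize_module

# Launch with torchrun; assumes dp_size * tp_size == world_size.
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
tp_size = 2
dp_size = dist.get_world_size() // tp_size

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()

# 1. Create the 2D mesh (data-parallel x tensor-parallel).
mesh = init_device_mesh("cuda", (dp_size, tp_size), mesh_dim_names=("dp", "tp"))
# 2. Apply tensor parallelism on the TP sub-mesh.
parallelize_module(model, mesh["tp"], {"0": ColwiseParallel(), "2": RowwiseParallel()})
# 3. Fully shard (FSDP2) on the DP sub-mesh.
fully_shard(model, mesh=mesh["dp"])
# 4. Attach the mesh so later save/load helpers can find it.
model.device_mesh = mesh
```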
Test plan
- from_pretrained with distributed_config
- gather_full_state_dict() round-trip verification
- save_optimizer() / load_optimizer() round-trip
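For the last test-plan item, an optimizer-state round trip can be sketched with torch.distributed.checkpoint's state-dict helpers; save_optimizer() / load_optimizer() from this PR are not used here, and the key comparison at the end is only a shallow sanity check.

```python
import torch
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint.state_dict import (
    get_optimizer_state_dict,
    set_optimizer_state_dict,
)


def optimizer_roundtrip(model: torch.nn.Module, optim: torch.optim.Optimizer, path: str) -> None:
    """Save the (possibly sharded) optimizer state with DCP, reload it, and sanity-check keys."""
    before = get_optimizer_state_dict(model, optim)
    dcp.save(before, checkpoint_id=path)

    after = get_optimizer_state_dict(model, optim)
    dcp.load(after, checkpoint_id=path)
    set_optimizer_state_dict(model, optim, optim_state_dict=after)

    assert before.keys() == after.keys()
```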