[DeviceMesh] Implement a device mesh concatenate API for submesh and SPMD use case #163358
Conversation
…or submesh" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
…or submesh" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
torch/distributed/device_mesh.py (outdated diff):
```python
    get_world_size(),
)

for mesh_nd in pg_ranks_by_dim:
```
?!?! Why do you need to do it for every mesh_nd? Is this because you're triggering comms to initialize PGs?
So, long story short: we need all ranks to call `new_group` (which is hidden very deep in the stack) to initialize the PGs; otherwise the code will hang.
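For context, `torch.distributed.new_group` is collective over the default group: every rank must call it for every subgroup, in the same order, even if that rank is not a member of the subgroup, or the member ranks will block forever. A minimal sketch of the pattern (the helper name and the shape of `pg_ranks_by_dim` are assumptions for illustration, not this PR's code):

```python
import torch.distributed as dist

def init_pgs_for_all_submeshes(pg_ranks_by_dim):
    """pg_ranks_by_dim: iterable of rank tensors/lists, one per subgroup."""
    my_rank = dist.get_rank()
    my_pg = None
    for mesh_nd in pg_ranks_by_dim:
        ranks = mesh_nd.tolist()
        # Every rank calls new_group() for every subgroup -- skipping the
        # call on non-member ranks would leave member ranks waiting (hang).
        pg = dist.new_group(ranks=ranks)
        if my_rank in ranks:
            my_pg = pg  # only keep the group this rank belongs to
    return my_pg
```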
…or submesh" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
…or submesh" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
…or submesh" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
…or submesh" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
…or submesh" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
…or submesh" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
…or submesh" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
…or submesh" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
…ubmesh and SPMD use case" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
…ubmesh and SPMD use case" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
…ubmesh and SPMD use case" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
…ubmesh and SPMD use case" Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta ezyang msaroufim dcci [ghstack-poisoned]
Stack from ghstack (oldest at bottom):
Today FSDP needs to slice the SPMD mesh out of the root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, what users want is to concatenate some submeshes into a bigger mesh and use it as the SPMD mesh. This PR tentatively implements that API for users.

One thing to note: all submeshes need to be sliced, flattened, or unflattened from the same root mesh; otherwise the indices make no sense for mesh indexing and device allocation.
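For a sense of the intended usage, here is a minimal sketch. It assumes an 8-rank job (e.g. launched under `torchrun`); the concatenate entry point itself (`DeviceMesh._concatenate` below) is a placeholder name for illustration, since the final API surface isn't shown in this thread.

```python
from torch.distributed.device_mesh import DeviceMesh, init_device_mesh

# Root mesh over 8 ranks: 2 (dp_replicate) x 2 (dp_shard) x 2 (tp).
root = init_device_mesh(
    "cuda", (2, 2, 2), mesh_dim_names=("dp_replicate", "dp_shard", "tp")
)

# Submeshes derived from the *same* root mesh -- a hard requirement here,
# since indices taken from different roots would be meaningless when
# combined for mesh indexing and device allocation.
dp_mesh = root["dp_replicate", "dp_shard"]._flatten("dp")
tp_mesh = root["tp"]

# Hypothetical concatenate call: stitch the submeshes back into one mesh
# and use it as the SPMD mesh, instead of re-slicing it out of the root
# mesh the way FSDP does today.
spmd_mesh = DeviceMesh._concatenate([dp_mesh, tp_mesh])  # placeholder name
```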
cc @H-Huang @awgu @wanchaol @fegin @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @dcci