[DTensor] implement dist_split as a sharding prop rule #93306
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/93306
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit e1cc5d5.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 17951d93ac33b0c1c58caf83c2767cb91c121494 Pull Request resolved: #93306
[ghstack-poisoned]
ghstack-source-id: 737cb0f8c42183cc069c23d2bb219422fe0541e1 Pull Request resolved: #93306
[ghstack-poisoned]
ghstack-source-id: 95deefbd55f41ae0d2985935c086b577eb7b8021 Pull Request resolved: #93306
Almost ready! Have some nits and suggestions inline.
```python
# TODO: just like slice op, split replicates before splitting
# on a sharded dimension
# TODO: shall we consider partial???
```
We should consider partial (maybe we can add this later), and because the dtensor_ops test does not generate partial inputs, we also need to add partial inputs to the op db.
```python
placements=unshard_tensor_dim(input_spec.placements, dim=dim),
shape=input_spec.shape,
ndim=input_spec.ndim,
)
```
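For context, here is a minimal sketch of what a helper like `unshard_tensor_dim` could do, assuming DTensor's `Shard`/`Replicate` placement types; the real helper's implementation is not shown in this excerpt, and the module path is an assumption based on DTensor internals of that era:

```python
from typing import Sequence, Tuple

from torch.distributed._tensor.placement_types import Placement, Replicate, Shard

def unshard_tensor_dim(
    placements: Sequence[Placement], dim: int
) -> Tuple[Placement, ...]:
    # Replace any Shard(dim) with Replicate(), leaving other placements alone,
    # so the split can run on a replicated view of that tensor dim.
    return tuple(
        Replicate() if isinstance(p, Shard) and p.dim == dim else p
        for p in placements
    )
```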
nit: add a check for a partial input_spec and raise NotImplementedError, so we know to implement this later?
sounds good!
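A minimal sketch of the agreed-upon guard, assuming the `_Partial` placement type from DTensor internals of that era (the exact check landed in the PR may differ):

```python
from torch.distributed._tensor.placement_types import _Partial

# Assumed guard: fail loudly on partial inputs until they are supported.
if any(isinstance(p, _Partial) for p in input_spec.placements):
    raise NotImplementedError(
        "split on a DTensor with _Partial placements is not implemented yet"
    )
```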
[ghstack-poisoned]
ghstack-source-id: fb8f5360b3a521480329ed2fa2a817c2de018161 Pull Request resolved: #93306
@pytorchmergebot merge -g
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
```python
need_reshard = False
if is_tensor_dim_sharded(input_spec, dim=dim):
```
This somehow broke TP's code logic: a common technique is to shard the DTensor on the last dim and then call split on that last dim as well. We still want the result to be sharded on dim=-1.
Currently, after the split we get a replicated DTensor.
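A hypothetical illustration of the TP pattern being discussed; the mesh size, tensor shapes, and import paths are assumptions for illustration, not code from this PR:

```python
import torch
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

# Assumed 4-GPU mesh; in tensor parallelism, weights are commonly
# sharded on the last dim.
mesh = DeviceMesh("cuda", list(range(4)))
w = distribute_tensor(torch.randn(16, 64), mesh, [Shard(-1)])

# Splitting along the already-sharded last dim: users expect each chunk
# to stay Shard(-1), but with a replicate-before-split rule the outputs
# come back as Replicate instead.
q, k = torch.split(w, 32, dim=-1)
```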
Stack from ghstack (oldest at bottom):