
[DTensor] implement dist_split as a sharding prop rule #93306

Closed
wants to merge 4 commits

Conversation

pytorch-bot bot commented Jan 30, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/93306

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e1cc5d5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

XilunWu added a commit that referenced this pull request Jan 30, 2023
ghstack-source-id: 17951d93ac33b0c1c58caf83c2767cb91c121494
Pull Request resolved: #93306
XilunWu marked this pull request as draft on January 30, 2023 21:36
XilunWu changed the title from "[DTensor] implement dist_split as a sharding prop rule" to "[WIP] [DTensor] implement dist_split as a sharding prop rule" on Jan 30, 2023
XilunWu added the release notes: distributed (dtensor) label on Jan 30, 2023
XilunWu added a commit that referenced this pull request Feb 1, 2023
ghstack-source-id: 737cb0f8c42183cc069c23d2bb219422fe0541e1
Pull Request resolved: #93306
XilunWu marked this pull request as ready for review on February 1, 2023 08:28
XilunWu changed the title from "[WIP] [DTensor] implement dist_split as a sharding prop rule" to "[DTensor] implement dist_split as a sharding prop rule" on Feb 1, 2023
XilunWu added a commit that referenced this pull request Feb 1, 2023
ghstack-source-id: 95deefbd55f41ae0d2985935c086b577eb7b8021
Pull Request resolved: #93306
@wanchaol (Contributor) left a comment

Almost ready! Have some nits and suggestions inline.

torch/distributed/_tensor/ops/tensor_ops.py (outdated, resolved)

# TODO: just like slice op, split replicates before splitting
# on a sharded dimension
# TODO: shall we consider partial???
Contributor

We should consider partial (maybe we can add this later). Because the dtensor_ops test does not generate partial inputs, we also need to add partial inputs to the op db.

torch/distributed/_tensor/ops/tensor_ops.py (outdated, resolved)
torch/distributed/_tensor/ops/tensor_ops.py (resolved)
placements=unshard_tensor_dim(input_spec.placements, dim=dim),
shape=input_spec.shape,
ndim=input_spec.ndim,
)
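
For context, the unshard_tensor_dim helper used above presumably maps every Shard(dim) placement to Replicate while leaving other placements alone. A minimal sketch (illustrative only, assuming the placement_types module layout of this era, not the exact helper in this PR):

from typing import Sequence, Tuple

from torch.distributed._tensor.placement_types import Placement, Replicate, Shard


def unshard_tensor_dim(
    placements: Sequence[Placement], dim: int
) -> Tuple[Placement, ...]:
    # Replace Shard(dim) with Replicate(); keep every other placement as-is.
    return tuple(
        Replicate() if isinstance(p, Shard) and p.dim == dim else p
        for p in placements
    )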
Contributor

nit: add a check for a partial input_spec and raise NotImplementedError so we know to implement this later?

Contributor Author

sounds good!
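
A hypothetical version of the guard discussed in this thread (the helper name _check_no_partial and the exact message are assumptions, not the code merged here):

from torch.distributed._tensor.placement_types import _Partial


def _check_no_partial(input_spec) -> None:
    # Fail loudly on partial inputs until the partial case is implemented.
    if any(isinstance(p, _Partial) for p in input_spec.placements):
        raise NotImplementedError(
            "split on a DTensor with _Partial placements is not implemented yet; "
            f"got placements={input_spec.placements}"
        )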

XilunWu added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Feb 2, 2023
XilunWu added a commit that referenced this pull request Feb 2, 2023
ghstack-source-id: fb8f5360b3a521480329ed2fa2a817c2de018161
Pull Request resolved: #93306

XilunWu commented Feb 2, 2023

@pytorchmergebot merge -g

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


XilunWu deleted the gh/XilunWu/14/head branch on April 11, 2023 21:40
Comment on lines +628 to +629
need_reshard = False
if is_tensor_dim_sharded(input_spec, dim=dim):
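
For context, the rule around these lines presumably continues by replicating the split dim before splitting, stitching together the two fragments quoted in this review (a sketch, not necessarily the exact merged code):

need_reshard = False
if is_tensor_dim_sharded(input_spec, dim=dim):
    # The split dim is sharded: replicate that dim first, then split locally.
    need_reshard = True
    input_spec = DTensorSpec(
        mesh=input_spec.mesh,
        placements=unshard_tensor_dim(input_spec.placements, dim=dim),
        shape=input_spec.shape,
        ndim=input_spec.ndim,
    )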
Contributor

This somehow broke TP's code logic. A common technique is to have the DTensor sharded on the last dim and call split on the last dim too; we still want the result to be sharded on dim=-1.

Contributor

Currently, after split, we get a Replicate DTensor.
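
A sketch of the tensor-parallel pattern described above (assumes a process group is already initialized, e.g. via torchrun, and uses the torch.distributed._tensor API names of this era):

import torch
import torch.distributed as dist
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

mesh = DeviceMesh("cpu", list(range(dist.get_world_size())))
x = distribute_tensor(torch.randn(8, 16), mesh, placements=[Shard(1)])

# Split along the dim that is already sharded on the mesh.
chunks = x.split(8, dim=-1)

# TP code expects each chunk to stay sharded on dim=-1; with the rule in this
# PR the input is replicated before splitting, so the chunks come back as
# Replicate instead.
print([chunk.placements for chunk in chunks])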

Labels: ciflow/trunk · Merged · release notes: distributed (dtensor)