
Conversation

@wanchaol (Collaborator) commented Apr 24, 2024

Stack from ghstack (oldest at bottom):

As titled: 1. pad/unpad is a general util, not specific to the Shard placement; 2. for the purpose of the next PR, move these two out of the Shard placement itself and give them an additional pad_dim argument.
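For illustration, a minimal sketch of what such standalone helpers can look like. The names pad_tensor/unpad_tensor, the module they live in, and the use of F.pad/Tensor.narrow are assumptions for this sketch, not necessarily the exact code landed in this PR:

```python
# Minimal sketch only: names and implementation details are assumptions,
# not necessarily the exact code in this PR.
import torch
import torch.nn.functional as F


def pad_tensor(tensor: torch.Tensor, pad_dim: int, pad_size: int) -> torch.Tensor:
    """Append `pad_size` zeros to `tensor` along `pad_dim`."""
    if pad_size == 0:
        return tensor
    # F.pad lists pad widths for the last dimensions first, as
    # (last_dim_low, last_dim_high, second_to_last_low, ...), so build a
    # list that reaches back to `pad_dim` and pad only its high side.
    pad = [0, 0] * (tensor.ndim - pad_dim)
    pad[-1] = pad_size
    return F.pad(tensor, pad)


def unpad_tensor(tensor: torch.Tensor, pad_dim: int, pad_size: int) -> torch.Tensor:
    """Drop the last `pad_size` elements of `tensor` along `pad_dim`."""
    if pad_size == 0:
        return tensor
    return tensor.narrow(pad_dim, 0, tensor.size(pad_dim) - pad_size)
```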

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k


pytorch-bot bot commented Apr 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124871

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ba282a6 with merge base e592a60:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@awgu (Collaborator) commented Apr 24, 2024

Looks like we need to migrate test_dtensor.py:

shard_placement._unpad_tensor(tensor, pad_sizes[i])

@wanchaol (Collaborator, Author) commented:

> Looks like we need to migrate test_dtensor.py:
>
> shard_placement._unpad_tensor(tensor, pad_sizes[i])

good catch! updated
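For reference, the call-site migration can look roughly like the sketch below; the import path and argument order are assumptions for illustration, not necessarily the exact API after this PR:

```python
# Rough sketch of the call-site change in test_dtensor.py; the import path
# and argument order below are assumptions, not the exact landed API.

# Before: the helper lived on the Shard placement, which knew its own dim:
#     full_tensor = shard_placement._unpad_tensor(tensor, pad_sizes[i])

# After: a standalone util that takes the padded dim explicitly:
from torch.distributed._tensor._collective_utils import unpad_tensor

full_tensor = unpad_tensor(tensor, shard_placement.dim, pad_sizes[i])
```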

@wanchaol added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Apr 24, 2024
@wanchaol (Collaborator, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 4 jobs have failed, first few of them are: linux-binary-libtorch-cxx11-abi, trunk, linux-binary-manywheel, linux-binary-libtorch-pre-cxx11

Details for Dev Infra team: raised by workflow job.

@wz337 (Contributor) left a comment:


LGTM

@wanchaol (Collaborator, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 4 jobs have failed, first few of them are: linux-binary-libtorch-cxx11-abi, trunk, linux-binary-manywheel, linux-binary-libtorch-pre-cxx11

Details for Dev Infra team: raised by workflow job.

@wanchaol (Collaborator, Author) commented:

@pytorchbot merge -i

@jeanschmidt (Contributor) commented:

@pytorchbot revert -m "Broke internal tests, see D56587991 for more details" -c ignoredsignal

@pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Apr 26, 2024
This reverts commit 0b0eea2.

Reverted #124871 on behalf of https://github.com/jeanschmidt due to Broke internal tests, see D56587991 for more details
@pytorchmergebot (Collaborator) commented:

@wanchaol your PR has been successfully reverted.

alat-rights pushed a commit to alat-rights/pytorch that referenced this pull request Apr 26, 2024
as titled, 1. pad/unpad is a general util not specific to the Shard
placement, 2. for the purpose of the next PR, move these two out of the Shard
placement itself, and give an additional pad_dim argument

Pull Request resolved: pytorch#124871
Approved by: https://github.com/awgu, https://github.com/wz337
pytorchmergebot pushed a commit that referenced this pull request Apr 29, 2024
as titled, we implement a dedicated communication op to allow efficient
sharding dimension change using alltoall, to replace our previous
allgather + local chunk

Pull Request resolved: #124872
Approved by: https://github.com/XilunWu, https://github.com/yifuwang
ghstack dependencies: #124871
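Conceptually, changing the sharding dim with a single all-to-all (instead of an allgather followed by a local chunk) looks like the sketch below; this is a hand-written illustration assuming evenly divisible shapes, not the dedicated op added in #124872:

```python
# Conceptual sketch of a shard-dim change via all-to-all; not the dedicated
# op added in #124872. Assumes both shard dims divide evenly by world size.
import torch
import torch.distributed as dist


def shard_dim_alltoall(local_shard: torch.Tensor, old_dim: int, new_dim: int,
                       group=None) -> torch.Tensor:
    world_size = dist.get_world_size(group)
    # Split this rank's shard along the new sharding dim, one piece per peer.
    send_chunks = [c.contiguous() for c in local_shard.chunk(world_size, dim=new_dim)]
    recv_chunks = [torch.empty_like(c) for c in send_chunks]
    # One collective exchanges all pieces; no full tensor is ever
    # materialized, unlike allgather + local chunk.
    dist.all_to_all(recv_chunks, send_chunks, group=group)
    # Pieces arrive in rank order along the old sharding dim; concatenating
    # them yields this rank's new shard, now split along `new_dim` globally.
    return torch.cat(recv_chunks, dim=old_dim)
```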
pytorchmergebot pushed a commit that referenced this pull request Apr 29, 2024
as titled, as we have a dedicated comm op, this is not needed anymore

Pull Request resolved: #124879
Approved by: https://github.com/XilunWu, https://github.com/wz337
ghstack dependencies: #124871, #124872
pytorchmergebot pushed a commit that referenced this pull request Apr 30, 2024
as titled, we implement a dedicated communication op to allow efficient
sharding dimension change using alltoall, to replace our previous
allgather + local chunk

Pull Request resolved: #124872
Approved by: https://github.com/XilunWu, https://github.com/yifuwang
ghstack dependencies: #124871
pytorchmergebot pushed a commit that referenced this pull request Apr 30, 2024
as titled, as we have a dedicated comm op, this is not needed anymore

Pull Request resolved: #124879
Approved by: https://github.com/XilunWu, https://github.com/wz337
ghstack dependencies: #124871, #124872
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
as titled, 1. pad/unpad is a general util not specific to the Shard
placement, 2. for the purpose of the next PR, move these two out of the Shard
placement itself, and give an additional pad_dim argument

Pull Request resolved: #124871
Approved by: https://github.com/awgu, https://github.com/wz337
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
This reverts commit 0b0eea2.

Reverted #124871 on behalf of https://github.com/jeanschmidt due to Broke internal tests, see D56587991 for more details
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
as titled, 1. pad/unpad is a general util not specific to the Shard
placement, 2. for the purpose of the next PR, move these two out of the Shard
placement itself, and give an additional pad_dim argument

Pull Request resolved: pytorch#124871
Approved by: https://github.com/awgu, https://github.com/wz337, https://github.com/XilunWu
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
as titled, we implement a dedicated communication op to allow efficient
sharding dimension change using alltoall, to replace our previous
allgather + local chunk

Pull Request resolved: #124872
Approved by: https://github.com/XilunWu, https://github.com/yifuwang
ghstack dependencies: #124871
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
as titled, as we have a dedicated comm op, this is not needed anymore

Pull Request resolved: pytorch#124879
Approved by: https://github.com/XilunWu, https://github.com/wz337
ghstack dependencies: pytorch#124871, pytorch#124872
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
as titled, we implement a dedicated communication op to allow efficient
sharding dimension change using alltoall, to replace our previous
allgather + local chunk

Pull Request resolved: pytorch#124872
Approved by: https://github.com/XilunWu, https://github.com/yifuwang
ghstack dependencies: pytorch#124871
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
as titled, as we have a dedicated comm op, this is not needed anymore

Pull Request resolved: #124879
Approved by: https://github.com/XilunWu, https://github.com/wz337
ghstack dependencies: #124871, #124872
@github-actions bot deleted the gh/wanchaol/456/head branch on June 4, 2024 at 01:57

Labels

ci-td-distributed, ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), Merged, oncall: distributed (Add this issue/PR to distributed oncall triage queue), release notes: distributed (dtensor), Reverted
