
Conversation

@wanchaol (Collaborator) commented Apr 24, 2024

Stack from ghstack (oldest at bottom):

As titled: 1. pad/unpad is a general util, not specific to the Shard placement; 2. for the purpose of the next PR, move these two out of the Shard placement itself and give them an additional pad_dim argument.
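For illustration, a minimal sketch of what such standalone helpers can look like. The names pad_tensor/unpad_tensor, the module they live in, and the use of F.pad/Tensor.narrow are assumptions for this sketch, not necessarily the exact code landed in this PR:

```python
# Minimal sketch only: names and implementation details are assumptions,
# not necessarily the exact code in this PR.
import torch
import torch.nn.functional as F


def pad_tensor(tensor: torch.Tensor, pad_dim: int, pad_size: int) -> torch.Tensor:
    """Append `pad_size` zeros to `tensor` along `pad_dim`."""
    if pad_size == 0:
        return tensor
    # F.pad lists pad widths for the last dimensions first, as
    # (last_dim_low, last_dim_high, second_to_last_low, ...), so build a
    # list that reaches back to `pad_dim` and pad only its high side.
    pad = [0, 0] * (tensor.ndim - pad_dim)
    pad[-1] = pad_size
    return F.pad(tensor, pad)


def unpad_tensor(tensor: torch.Tensor, pad_dim: int, pad_size: int) -> torch.Tensor:
    """Drop the last `pad_size` elements of `tensor` along `pad_dim`."""
    if pad_size == 0:
        return tensor
    return tensor.narrow(pad_dim, 0, tensor.size(pad_dim) - pad_size)
```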

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k


pytorch-bot bot commented Apr 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124871

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ba282a6 with merge base e592a60:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@awgu (Collaborator) commented Apr 24, 2024

Looks like we need to migrate test_dtensor.py:

shard_placement._unpad_tensor(tensor, pad_sizes[i])

@wanchaol (Collaborator, Author) commented:

> Looks like we need to migrate test_dtensor.py:
>
> shard_placement._unpad_tensor(tensor, pad_sizes[i])

good catch! updated
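For reference, the call-site migration can look roughly like the sketch below; the import path and argument order are assumptions for illustration, not necessarily the exact API after this PR:

```python
# Rough sketch of the call-site change in test_dtensor.py; the import path
# and argument order below are assumptions, not the exact landed API.

# Before: the helper lived on the Shard placement, which knew its own dim:
#     full_tensor = shard_placement._unpad_tensor(tensor, pad_sizes[i])

# After: a standalone util that takes the padded dim explicitly:
from torch.distributed._tensor._collective_utils import unpad_tensor

full_tensor = unpad_tensor(tensor, shard_placement.dim, pad_sizes[i])
```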

@wanchaol added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Apr 24, 2024
@wanchaol (Collaborator, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 4 jobs have failed, first few of them are: linux-binary-libtorch-cxx11-abi, trunk, linux-binary-manywheel, linux-binary-libtorch-pre-cxx11

Details for Dev Infra team: raised by workflow job.

@wz337 (Contributor) left a comment:


LGTM

@wanchaol (Collaborator, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 4 jobs have failed, first few of them are: linux-binary-libtorch-cxx11-abi, trunk, linux-binary-manywheel, linux-binary-libtorch-pre-cxx11

Details for Dev Infra team: raised by workflow job.

@wanchaol (Collaborator, Author) commented:

@pytorchbot merge -i

@jeanschmidt (Contributor) commented:

@pytorchbot revert -m "Broke internal tests, see D56587991 for more details" -c ignoredsignal

@pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Apr 26, 2024
This reverts commit 0b0eea2.

Reverted #124871 on behalf of https://github.com/jeanschmidt due to Broke internal tests, see D56587991 for more details
@pytorchmergebot (Collaborator) commented:

@wanchaol your PR has been successfully reverted.

alat-rights pushed a commit to alat-rights/pytorch that referenced this pull request Apr 26, 2024
as titled, 1. pad/unpad is a general util not specific to the Shard
placement, 2. for the purpose of the next PR, move these two out of the Shard
placement itself, and give an additional pad_dim argument

Pull Request resolved: pytorch#124871
Approved by: https://github.com/awgu, https://github.com/wz337
pytorchmergebot pushed a commit that referenced this pull request Apr 29, 2024
as titled, we implement a dedicated communication op to allow efficient
sharding dimension change using alltoall, to replace our previous
allgather + local chunk

Pull Request resolved: #124872
Approved by: https://github.com/XilunWu, https://github.com/yifuwang
ghstack dependencies: #124871
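Conceptually, changing the sharding dim with a single all-to-all (instead of an allgather followed by a local chunk) looks like the sketch below; this is a hand-written illustration assuming evenly divisible shapes, not the dedicated op added in #124872:

```python
# Conceptual sketch of a shard-dim change via all-to-all; not the dedicated
# op added in #124872. Assumes both shard dims divide evenly by world size.
import torch
import torch.distributed as dist


def shard_dim_alltoall(local_shard: torch.Tensor, old_dim: int, new_dim: int,
                       group=None) -> torch.Tensor:
    world_size = dist.get_world_size(group)
    # Split this rank's shard along the new sharding dim, one piece per peer.
    send_chunks = [c.contiguous() for c in local_shard.chunk(world_size, dim=new_dim)]
    recv_chunks = [torch.empty_like(c) for c in send_chunks]
    # One collective exchanges all pieces; no full tensor is ever
    # materialized, unlike allgather + local chunk.
    dist.all_to_all(recv_chunks, send_chunks, group=group)
    # Pieces arrive in rank order along the old sharding dim; concatenating
    # them yields this rank's new shard, now split along `new_dim` globally.
    return torch.cat(recv_chunks, dim=old_dim)
```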
pytorchmergebot pushed a commit that referenced this pull request Apr 29, 2024
as titled, as we have a dedicated comm op, this is not needed anymore

Pull Request resolved: #124879
Approved by: https://github.com/XilunWu, https://github.com/wz337
ghstack dependencies: #124871, #124872
pytorchmergebot pushed a commit that referenced this pull request Apr 30, 2024
as titled, we implement a dedicated communication op to allow efficient
sharding dimension change using alltoall, to replace our previous
allgather + local chunk

Pull Request resolved: #124872
Approved by: https://github.com/XilunWu, https://github.com/yifuwang
ghstack dependencies: #124871
pytorchmergebot pushed a commit that referenced this pull request Apr 30, 2024
as titled, as we have a dedicated comm op, this is not needed anymore

Pull Request resolved: #124879
Approved by: https://github.com/XilunWu, https://github.com/wz337
ghstack dependencies: #124871, #124872
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
as titled, 1. pad/unpad is a general util not specific to the Shard
placement, 2. for the purpose of the next PR, move these two out of the Shard
placement itself, and give an additional pad_dim argument

Pull Request resolved: #124871
Approved by: https://github.com/awgu, https://github.com/wz337
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
This reverts commit 0b0eea2.

Reverted #124871 on behalf of https://github.com/jeanschmidt due to Broke internal tests, see D56587991 for more details
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
as titled, 1. pad/unpad is a general util not specific to the Shard
placement, 2. for the purpose of the next PR, move these two out of the Shard
placement itself, and give an additional pad_dim argument

Pull Request resolved: pytorch#124871
Approved by: https://github.com/awgu, https://github.com/wz337, https://github.com/XilunWu
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
as titled, we implement a dedicated communication op to allow efficient
sharding dimension change using alltoall, to replace our previous
allgather + local chunk

Pull Request resolved: #124872
Approved by: https://github.com/XilunWu, https://github.com/yifuwang
ghstack dependencies: #124871
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
as titled, as we have a dedicated comm op, this is not needed anymore

Pull Request resolved: pytorch#124879
Approved by: https://github.com/XilunWu, https://github.com/wz337
ghstack dependencies: pytorch#124871, pytorch#124872
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
as titled, we implement a dedicated communication op to allow efficient
sharding dimension change using alltoall, to replace our previous
allgather + local chunk

Pull Request resolved: pytorch#124872
Approved by: https://github.com/XilunWu, https://github.com/yifuwang
ghstack dependencies: pytorch#124871
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
as titled, as we have a dedicated comm op, this is not needed anymore

Pull Request resolved: #124879
Approved by: https://github.com/XilunWu, https://github.com/wz337
ghstack dependencies: #124871, #124872
@github-actions bot deleted the gh/wanchaol/456/head branch on June 4, 2024 at 01:57

Labels

ci-td-distributed, ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), Merged, oncall: distributed (Add this issue/PR to distributed oncall triage queue), release notes: distributed (dtensor), Reverted
