
Conversation

@XilunWu (Contributor) commented Mar 27, 2024

Stack from ghstack (oldest at bottom):

Summary
Always wrap the local tensor into a `LocalShardsWrapper`. This is for uniformity and makes it easier to adopt DTensor as a wrapper for representing local shard(s). To support more tensor ops on `LocalShardsWrapper`, users need to extend its `__torch_dispatch__`.
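
As a rough illustration of that extension point, here is a minimal sketch of a shard-holding tensor subclass routing ops through `__torch_dispatch__`. This is not the actual `LocalShardsWrapper` implementation; the class name, the handled ops, and the use of the private `_make_wrapper_subclass` helper are assumptions made only for the sketch.

```
import torch

class MyLocalShards(torch.Tensor):  # hypothetical stand-in, not LocalShardsWrapper
    @staticmethod
    def __new__(cls, shards):
        # Report the row-concatenated size as the wrapper's logical size
        # (assumes 2-D shards stacked along dim 0).
        rows = sum(s.shape[0] for s in shards)
        wrapper = torch.Tensor._make_wrapper_subclass(
            cls, (rows, shards[0].shape[1]), dtype=shards[0].dtype
        )
        wrapper._shards = list(shards)
        return wrapper

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Each op the wrapper should support gets an explicit handler here.
        if func is torch.ops.aten.detach.default:
            (self_,) = args
            return MyLocalShards([s.detach() for s in self_._shards])
        if func is torch.ops.aten.mul.Tensor:
            self_, other = args  # assumes `other` is a plain scalar or tensor
            return MyLocalShards([s * other for s in self_._shards])
        raise NotImplementedError(f"{func} is not supported by this wrapper yet")
```

Supporting an additional op would mean adding one more branch to this dispatch table, which is the kind of extension the summary refers to.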

Test
torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/torchrec_sharding_example.py -e row-wise-even

Result

Row-wise even sharding example in DTensor
         Col 0-15
-------  ----------
Row 0-1  cuda:0
Row 2-3  cuda:1
Row 4-5  cuda:2
Row 6-7  cuda:3
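
For reference, here is a hedged sketch of how a row-wise (dim 0) sharded DTensor matching the output above can be built with the public DTensor APIs. The 8x16 global tensor and the 1-D mesh of 4 GPUs are assumptions read off the printed table; this is not the example script's actual code.

```
# Run under: torchrun --standalone --nnodes=1 --nproc-per-node=4 <this_file>.py
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed._tensor import distribute_tensor, Shard

mesh = init_device_mesh("cuda", (4,))          # 1-D mesh over 4 GPUs
global_tensor = torch.arange(8 * 16, dtype=torch.float32).reshape(8, 16)

# Shard(0) splits rows evenly: rank 0 holds rows 0-1, rank 1 rows 2-3, and so on,
# matching the table printed by the row-wise-even example.
dtensor = distribute_tensor(global_tensor, mesh, placements=[Shard(0)])

local = dtensor.to_local()                     # the 2x16 shard owned by this rank
print(f"rank {mesh.get_rank()}: local shape {tuple(local.shape)} on {local.device}")
```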

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang


pytorch-bot bot commented Mar 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122843

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit cc5525a with merge base e70bf23:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@wanchaol (Collaborator) left a comment


Please address the previous PR comment as it might change how this PR is implemented, and re-request review once done :)

…dsWrapper"



**Summary**
Always wrap the local tensor into a `LocalShardsWrapper`. This is for uniformity and makes it easier to adopt DTensor as a wrapper for representing local shard(s). To support more tensor ops on `LocalShardsWrapper`, users need to extend its `__torch_dispatch__`.

**Test**
`torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/torchrec_sharding_example.py -e row-wise-even`

**Result**
```
Row-wise even sharding example in DTensor
         Col 0-15
-------  ----------
Row 0-1  cuda:0
Row 2-3  cuda:1
Row 4-5  cuda:2
Row 6-7  cuda:3
```
cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang

[ghstack-poisoned]
@wz337 (Contributor) left a comment


LGTM!

@XilunWu added the topic: not user facing label Apr 16, 2024
pytorchmergebot pushed a commit that referenced this pull request Apr 16, 2024
…ticipating ranks only (#122853)

**Summary**
We wrap DTensor's local tensor in `LocalShardsWrapper` for torchrec's table-wise sharding. The exception is non-participating ranks: there, the local tensor stays an empty `torch.Tensor` object. This design avoids the complexity of supporting the empty-tensor case in `LocalShardsWrapper`.
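
A minimal sketch of that rule, using a hypothetical stand-in class rather than the real `LocalShardsWrapper` so the snippet stays self-contained; the helper name and the single-owner-rank example are assumptions.

```
import torch

class ShardsWrapperStandIn:
    """Stand-in for LocalShardsWrapper, only to keep this sketch self-contained."""
    def __init__(self, shards):
        self.shards = shards

def local_tensor_for(rank, participating_ranks, shard):
    # Ranks that own (part of) the table wrap their local shard(s); all other
    # ranks keep a plain empty tensor, so the wrapper never needs an
    # empty-shard code path.
    if rank in participating_ranks:
        return ShardsWrapperStandIn([shard])
    return torch.empty(0)

# e.g. a table placed only on rank 0 of a 4-rank job:
for r in range(4):
    t = local_tensor_for(r, {0}, torch.randn(8, 16))
    print(r, type(t).__name__)
```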

**Test**
`torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/torchrec_sharding_example.py -e table-wise`

Pull Request resolved: #122853
Approved by: https://github.com/wz337
ghstack dependencies: #120265, #121392, #122843
sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request Apr 22, 2024
…ytorch#122843)


Pull Request resolved: pytorch#122843
Approved by: https://github.com/wz337
ghstack dependencies: pytorch#120265, pytorch#121392
sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request Apr 22, 2024
…ticipating ranks only (pytorch#122853)

petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
…ytorch#122843)

petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
…ticipating ranks only (pytorch#122853)

github-actions bot deleted the gh/XilunWu/74/head branch May 31, 2024 01:54

Labels: ciflow/inductor, Merged, oncall: distributed, topic: not user facing
