[dtensor] fix pointwise op linearity with strategy #112107
Conversation
This PR fixes the pointwise op strategy linearity and switches the linear pointwise ops to use the strategy-based propagation. It also adds tests showing that with the new approach we can enable fully sharded placements such as (S(0), S(0)).

Why is this useful? For 2-D parallel patterns the named parameters may be fully sharded on all devices, so placements like [S(0), S(0)] or [S(1), S(0)] need to work. Since we no longer use the sharding rules, this is now possible.

cc @awgu

[ghstack-poisoned]
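To make the fully sharded case concrete, below is a minimal sketch (not part of this PR) of a pointwise op on a DTensor placed as [Shard(0), Shard(0)] on a 2-D mesh. It assumes a 4-GPU run launched with torchrun, the NCCL backend, and the `torch.distributed._tensor` API; the mesh shape, tensor sizes, and names are illustrative only.

```python
# Minimal sketch: a pointwise op on a DTensor that is fully sharded on
# both mesh dimensions, i.e. placements [Shard(0), Shard(0)].
# Assumes: 4 GPUs, launched with `torchrun --nproc-per-node=4 demo.py`.
import torch
import torch.distributed as dist
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
torch.manual_seed(0)  # same global tensor on every rank

# 2x2 device mesh, e.g. mesh dim 0 = data parallel, dim 1 = tensor parallel.
mesh = DeviceMesh("cuda", torch.arange(4).reshape(2, 2))

# Shard dim 0 across mesh dim 0, then shard the resulting local piece's
# dim 0 again across mesh dim 1 -> a fully sharded parameter.
param = distribute_tensor(torch.randn(8, 4), mesh, [Shard(0), Shard(0)])

# A linear pointwise op; with the strategy-based pointwise propagation the
# output keeps the [Shard(0), Shard(0)] placements without redistribution.
out = param + param
print(out.placements)  # expected: (Shard(dim=0), Shard(dim=0))

dist.destroy_process_group()
```

The `placements` check at the end is only an illustration of the behavior the new tests in this PR exercise: the doubly sharded input no longer has to be redistributed before a linear pointwise op.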
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112107
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit b6fff25 with merge base bf01a7b. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Can we land this 🙇🏼
Noice!!! LGTM!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes the pointwise op strategy linearity and switches the linear pointwise ops to use the strategy-based propagation, with tests covering fully sharded placements such as (S(0), S(0)). cc @awgu

Pull Request resolved: pytorch#112107
Approved by: https://github.com/wz337