[dtensor] fix pointwise op linearity with strategy #112107
Conversation
This PR fixes the pointwise op strategy linearity and switches the linear pointwise ops to use the strategy-based propagation. It also adds tests showing that with the new approach we can enable fully sharded placements such as (S(0), S(0)).

Why is this useful? For 2-D parallel patterns the named parameters may be fully sharded on all devices, so placements like [S(0), S(0)] or [S(1), S(0)] need to work. Since we no longer use the sharding rules, this is now possible.

cc @awgu

[ghstack-poisoned]
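To make the fully sharded case concrete, below is a minimal sketch (not part of this PR) of a pointwise op on a DTensor placed as [Shard(0), Shard(0)] on a 2-D mesh. It assumes a 4-GPU run launched with torchrun, the NCCL backend, and the `torch.distributed._tensor` API; the mesh shape, tensor sizes, and names are illustrative only.

```python
# Minimal sketch: a pointwise op on a DTensor that is fully sharded on
# both mesh dimensions, i.e. placements [Shard(0), Shard(0)].
# Assumes: 4 GPUs, launched with `torchrun --nproc-per-node=4 demo.py`.
import torch
import torch.distributed as dist
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
torch.manual_seed(0)  # same global tensor on every rank

# 2x2 device mesh, e.g. mesh dim 0 = data parallel, dim 1 = tensor parallel.
mesh = DeviceMesh("cuda", torch.arange(4).reshape(2, 2))

# Shard dim 0 across mesh dim 0, then shard the resulting local piece's
# dim 0 again across mesh dim 1 -> a fully sharded parameter.
param = distribute_tensor(torch.randn(8, 4), mesh, [Shard(0), Shard(0)])

# A linear pointwise op; with the strategy-based pointwise propagation the
# output keeps the [Shard(0), Shard(0)] placements without redistribution.
out = param + param
print(out.placements)  # expected: (Shard(dim=0), Shard(dim=0))

dist.destroy_process_group()
```

The `placements` check at the end is only an illustration of the behavior the new tests in this PR exercise: the doubly sharded input no longer has to be redistributed before a linear pointwise op.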
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112107
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit b6fff25 with merge base bf01a7b. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Can we land this 🙇🏼
Noice!!! LGTM!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes the pointwise op strategy linearity and switches the linear pointwise ops to use the strategy-based propagation, with tests covering fully sharded placements such as (S(0), S(0)). cc @awgu

Pull Request resolved: pytorch#112107
Approved by: https://github.com/wz337