
[dtensor] fix pointwise op linearity with strategy #112107

Closed
wants to merge 1 commit

Conversation

@wanchaol (Contributor) commented Oct 26, 2023

Stack from ghstack (oldest at bottom):

This PR fixes the pointwise op strategy linearity and switches the
linear pointwise ops to use strategy-based sharding propagation. It also
adds tests showing that with the new approach we can enable fully
sharded operations like (S(0), S(0)).

Why is this useful? For 2-D parallel-like patterns where the named
parameters may be fully sharded across all devices, placements such as
[S(0), S(0)] or [S(1), S(0)] need to work. Since we no longer use the
sharding rules, this is now possible.

@awgu


[ghstack-poisoned]
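For illustration only (this sketch is not from the PR or its tests), here is roughly what the fully sharded pointwise case looks like with the torch.distributed._tensor APIs (DeviceMesh, Shard, distribute_tensor), assuming four ranks launched via torchrun; the mesh shape and tensor sizes are made up:

```python
import torch
import torch.distributed as dist
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

# Assumes 4 ranks, e.g. `torchrun --nproc_per_node=4 this_script.py`.
dist.init_process_group("gloo")

# A 2-D device mesh (2 x 2) over the 4 ranks.
mesh = DeviceMesh("cpu", torch.arange(4).reshape(2, 2))

# Fully shard the same tensor dim on both mesh dims: placements [S(0), S(0)].
param = distribute_tensor(torch.randn(8, 4), mesh, [Shard(0), Shard(0)])

# A linear pointwise op should now propagate the [Shard(0), Shard(0)]
# placements directly instead of failing or forcing a redistribute.
out = param * 2.0
print(out.placements)  # expect both mesh dims to remain sharded on dim 0

dist.destroy_process_group()
```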
@pytorch-bot bot commented Oct 26, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112107

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b6fff25 with merge base bf01a7b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@awgu (Contributor) commented Oct 26, 2023

Can we land this 🙇🏼

@wz337 (Contributor) left a comment

Noice!!! LGTM!

@wanchaol (Contributor, Author)

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 27, 2023
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@facebook-github-bot facebook-github-bot deleted the gh/wanchaol/380/head branch October 30, 2023 14:24
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
Pull Request resolved: pytorch#112107
Approved by: https://github.com/wz337
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
Pull Request resolved: pytorch#112107
Approved by: https://github.com/wz337
andreigh pushed a commit to andreigh/pytorch that referenced this pull request Nov 19, 2023
Pull Request resolved: pytorch#112107
Approved by: https://github.com/wz337
Labels: ciflow/trunk, Merged, release notes: distributed (dtensor)