
[dtensor][8/N] Introduce cost model for sharding #109145

Closed · wants to merge 10 commits

Conversation

wanchaol (Contributor) commented Sep 12, 2023

This PR adds a basic comm cost model for sharding prop. Why do we need this? Operators can generate multiple placement strategies; for example, for matmul we have at least 4 possible shardings:

1. `Shard(0), R`
2. `R, Shard(1)`
3. `Shard(1), Shard(0)`
4. `R, R`

We need to be able to choose one of these options at runtime and perform resharding with a reasonable choice, which is why we are building a cost model for sharding here. In this PR we associate each possible sharding strategy with redistribute costs. In eager mode, since we run ops eagerly, we simply perform a min-cost selection. One can imagine that with some global information the strategy selection would become more intelligent.
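To make the min-cost rule concrete, here is a minimal sketch of eager-mode strategy selection. The `PlacementStrategy`/`redistribute_costs` names mirror the structures this stack introduces, but the helper `pick_min_cost_strategy`, the flat cost layout, and all cost numbers are illustrative assumptions, not the PR's actual code:

```python
# Minimal sketch (assumptions, not the PR's actual code): each candidate
# strategy carries, per input, the estimated comm cost of redistributing
# that input into the placements the strategy requires. Eager mode picks
# the candidate with the smallest total cost.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PlacementStrategy:
    output_placements: Tuple[str, ...]  # e.g. ("Shard(0)", "R")
    redistribute_costs: List[float] = field(default_factory=list)


def pick_min_cost_strategy(strategies: List[PlacementStrategy]) -> PlacementStrategy:
    """Greedy eager-mode selection: minimize the summed redistribute cost."""
    return min(strategies, key=lambda s: sum(s.redistribute_costs))


# The four matmul candidates from the description, with made-up costs
# (inputs assumed to already sit in the Shard(0), R layout, so option 1
# needs no communication at all):
candidates = [
    PlacementStrategy(("Shard(0)", "R"), [0.0, 0.0]),
    PlacementStrategy(("R", "Shard(1)"), [2.0, 1.0]),
    PlacementStrategy(("Shard(1)", "Shard(0)"), [2.0, 2.0]),
    PlacementStrategy(("R", "R"), [2.0, 1.0]),
]
print(pick_min_cost_strategy(candidates).output_placements)  # ('Shard(0)', 'R')
```

Because this choice only looks at one op at a time, it can be globally suboptimal across a chain of ops; that is the "global information" caveat above.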
pytorch-bot (bot) commented Sep 12, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109145

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2192bdb with merge base 898482f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wanchaol added a commit that referenced this pull request Sep 15, 2023
wanchaol added a commit that referenced this pull request Oct 2, 2023
wanchaol added the release notes: distributed (dtensor) label Oct 6, 2023
fduwjj (Contributor) left a comment

Do we want to add at least one simple UT for the cost modeling?

fduwjj (Contributor) left a comment

LGTM

pytorchmergebot pushed a commit that referenced this pull request Oct 15, 2023
This PR switches matrix ops to generate sharding strategies; with the cost selection algorithm introduced in the previous PR, this enables these and more ops to leverage strategy-based sharding prop.

This also fixes a number of corner cases that the existing propagation does not cover, resulting in full coverage for baddbmm.
Pull Request resolved: #110717
Approved by: https://github.com/fduwjj
ghstack dependencies: #109145
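As a hedged smoke test of the baddbmm coverage claim, the sketch below distributes the batch inputs with `Shard(0)` and lets sharding prop pick the output placements. It assumes the 2023-era `torch.distributed._tensor` module path and needs a multi-rank launch (e.g. `torchrun --nproc_per_node=4`) to actually run:

```python
# Assumed module path for the 2023-era DTensor API; run under torchrun
# with 4 ranks (swap "cuda" for "cpu" with a gloo backend if needed).
import torch
from torch.distributed._tensor import DeviceMesh, Replicate, Shard, distribute_tensor

mesh = DeviceMesh("cuda", list(range(4)))  # 1-D mesh over 4 ranks

inp = distribute_tensor(torch.randn(8, 4, 6), mesh, [Replicate()])
b1 = distribute_tensor(torch.randn(8, 4, 5), mesh, [Shard(0)])  # shard batch dim
b2 = distribute_tensor(torch.randn(8, 5, 6), mesh, [Shard(0)])

# Strategy-based prop chooses among candidate shardings by redistribute
# cost; with both batch inputs already Shard(0), keeping the batch dim
# sharded (and resharding only `inp`) is the cheap choice.
out = torch.baddbmm(inp, b1, b2)
print(out.placements)
```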
pytorchmergebot pushed a commit that referenced this pull request Oct 15, 2023
As titled, this also correctly handles placements like [Shard(0), Shard(0)] for pointwise ops, which previously errored out.
Pull Request resolved: #111234
Approved by: https://github.com/fduwjj
ghstack dependencies: #109145, #110717
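For context, `[Shard(0), Shard(0)]` on a 2-D mesh means tensor dim 0 is sharded across both mesh dimensions. A sketch under the same module-path assumption as above (needs a 4-rank launch):

```python
import torch
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

mesh = DeviceMesh("cuda", torch.arange(4).reshape(2, 2))  # 2x2 mesh

# Dim 0 is sharded over BOTH mesh dims: each rank holds a (2, 8) slice
# of the (8, 8) global tensor.
x = distribute_tensor(torch.randn(8, 8), mesh, [Shard(0), Shard(0)])

# A pointwise op should simply propagate the input placements; before
# #111234 this combination errored out during sharding prop.
y = torch.sigmoid(x)
assert y.placements == x.placements
```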
pytorchmergebot pushed a commit that referenced this pull request Oct 15, 2023
This adds __str__ to the op schema and DTensor spec for ease of reading.
Pull Request resolved: #111278
Approved by: https://github.com/fduwjj
ghstack dependencies: #109145, #110717, #111234
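To illustrate the readability goal only (not the actual implementation), a hypothetical one-line rendering that a `__str__` like this might produce:

```python
# Hypothetical mock, only to show the kind of compact rendering #111278
# is after; the real DTensorSpec has a different shape.
class FakeSpec:
    def __init__(self, placements, mesh_shape):
        self.placements = placements  # e.g. ["S(0)", "R"]
        self.mesh_shape = mesh_shape  # e.g. (2, 2)

    def __str__(self) -> str:
        return f"Spec({','.join(self.placements)} on mesh {self.mesh_shape})"


print(FakeSpec(["S(0)", "R"], (2, 2)))  # Spec(S(0),R on mesh (2, 2))
```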
yeounoh pushed commits to yeounoh/pytorch that referenced this pull request Oct 16, 2023
facebook-github-bot deleted the gh/wanchaol/356/head branch October 19, 2023 14:24