[PT/ShardedTensor] Allow zero size local shard #65007
Conversation
💊 CI failures summary and remediations: as of commit 0e4080f, 💚 Looks good so far! There are no failures yet. (This comment was automatically generated by Dr. CI.)
This pull request was exported from Phabricator. Differential Revision: D30926566
Codecov Report
@@           Coverage Diff           @@
##           master   #65007   +/-   ##
=======================================
  Coverage   66.37%   66.37%
=======================================
  Files         738      738
  Lines       94170    94170
=======================================
+ Hits        62508    62509       +1
+ Misses      31662    31661       -1
Thanks Xing
Summary:
Pull Request resolved: pytorch#65007

Relax the shard size check in ShardMetadata to allow zero-size local shards. When sharding a tensor across N ranks, some ranks may be allocated an empty shard. Since we assume SPMD, the ranks with an empty shard still need to participate in all collectives, so ShardMetadata must permit this.

Test Plan: Unit tests and CLI

Reviewed By: jiaqizhai, wanchaol

Differential Revision: D30926566

fbshipit-source-id: b95247e616057ad41ccea6206653a3238eb26e30
This pull request has been merged in 600df80.
Summary:
Relax the shard size check in ShardMetadata to allow zero-size local shards.
When sharding a tensor across N ranks, some ranks may be allocated an empty shard. Since we assume SPMD, the ranks with an empty shard still need to participate in all collectives, so ShardMetadata must permit this.
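A minimal sketch of what this enables, assuming the PyTorch 1.10-era prototype API in `torch.distributed._sharding_spec` (the module path and field names changed in later releases, e.g. `shard_lengths` was later renamed to `shard_sizes`); the tensor shape, rank layout, and device placements below are illustrative only:

```python
# Sketch only: assumes ShardMetadata(shard_offsets, shard_lengths, placement)
# as exposed by torch.distributed._sharding_spec at the time of this PR.
from torch.distributed._sharding_spec import ShardMetadata

# Shards of a hypothetical [10, 4] tensor laid out across 3 ranks.
shards = [
    ShardMetadata(shard_offsets=[0, 0], shard_lengths=[5, 4], placement="rank:0/cuda:0"),
    ShardMetadata(shard_offsets=[5, 0], shard_lengths=[5, 4], placement="rank:1/cuda:1"),
    # Rank 2 holds no data. Before this change, the zero entry in
    # shard_lengths was rejected by ShardMetadata's validation; after it,
    # the empty shard is valid and rank 2 can still join every collective.
    ShardMetadata(shard_offsets=[10, 0], shard_lengths=[0, 4], placement="rank:2/cuda:2"),
]
```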
Test Plan: Unit tests and CLI
Reviewed By: wanchaol
Differential Revision: D30926566
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang @cbalioglu @gcramer23