@xing-liu xing-liu commented Sep 14, 2021

Summary:
Relax the shard size check in ShardMetadata to allow zero-size local shards.

When sharding a tensor across N ranks, some ranks may be allocated an empty shard. Because we assume SPMD, ranks with an empty shard must still participate in all collectives, so ShardMetadata needs to accept zero-size shards.
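
To illustrate the relaxed check, here is a minimal Python sketch. It is not the PyTorch implementation: `ShardMetadataSketch` and `chunk_shards` are hypothetical names made up for this example, and the field names `shard_offsets` and `shard_lengths` are assumptions. It shows how chunking a dimension of size 5 over 4 ranks leaves the last rank with a zero-length shard, which a strict "length must be positive" check would reject.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ShardMetadataSketch:
    """Hypothetical stand-in for ShardMetadata; not the PyTorch class."""

    shard_offsets: List[int]
    shard_lengths: List[int]

    def __post_init__(self) -> None:
        if len(self.shard_offsets) != len(self.shard_lengths):
            raise ValueError("shard_offsets and shard_lengths must have the same rank")
        for offset, length in zip(self.shard_offsets, self.shard_lengths):
            if offset < 0:
                raise ValueError("shard_offsets must be >= 0")
            # Relaxed check (assumed form): zero is allowed, only negatives are rejected.
            if length < 0:
                raise ValueError("shard_lengths must be >= 0")


def chunk_shards(dim_size: int, world_size: int) -> List[ShardMetadataSketch]:
    """Split one dimension into ceil-sized chunks, one metadata entry per rank."""
    chunk = math.ceil(dim_size / world_size)
    shards = []
    for rank in range(world_size):
        # Clamp to the end of the dimension so trailing ranks get empty shards.
        start = min(rank * chunk, dim_size)
        end = min(start + chunk, dim_size)
        shards.append(ShardMetadataSketch([start], [end - start]))
    return shards


shards = chunk_shards(dim_size=5, world_size=4)
lengths = [s.shard_lengths[0] for s in shards]
print(lengths)
```

Every rank still gets a metadata entry, including the one whose shard is empty, so all ranks can take part in the same collectives.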

Test Plan: Unit tests and CLI

Reviewed By: wanchaol

Differential Revision: D30926566

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang @cbalioglu @gcramer23

facebook-github-bot commented Sep 14, 2021

💊 CI failures summary and remediations

As of commit 0e4080f (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D30926566

codecov bot commented Sep 14, 2021

Codecov Report

Merging #65007 (0e4080f) into master (880098a) will increase coverage by 0.00%.
The diff coverage is 50.00%.

@@           Coverage Diff           @@
##           master   #65007   +/-   ##
=======================================
  Coverage   66.37%   66.37%           
=======================================
  Files         738      738           
  Lines       94170    94170           
=======================================
+ Hits        62508    62509    +1     
+ Misses      31662    31661    -1     

@bowangbj bowangbj left a comment

Thanks Xing

Summary:
Pull Request resolved: pytorch#65007

Relax the shard size check in ShardMetadata to allow zero-size local shards.

When sharding a tensor across N ranks, some ranks may be allocated an empty shard. Because we assume SPMD, ranks with an empty shard must still participate in all collectives, so ShardMetadata needs to accept zero-size shards.

Test Plan: Unit tests and CLI

Reviewed By: jiaqizhai, wanchaol

Differential Revision: D30926566

fbshipit-source-id: b95247e616057ad41ccea6206653a3238eb26e30

@facebook-github-bot

This pull request has been merged in 600df80.

Labels

cla signed, fb-exported, Merged, oncall: distributed
