[DTensor] require DeviceMesh size equals world size #91801
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91801
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 387458b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Two meshes created over world in |
Let's not always run this check; do it only when we initialize world_pg.
@@ -143,6 +143,13 @@ def __init__(
                f"Mesh should not be bigger than default world size, but found {self.mesh.numel()} ranks!"
            )

        # TODO: we will support mesh on a subset of WORLD in future
        if self.mesh.numel() < world_size:
This check should not run every time. It should only run when no default pg exists and we want to help the user create a world_pg, so I think it should live only inside get_or_create_group.
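A minimal sketch of the placement suggested here, in plain Python so it runs without a distributed setup. The function name `resolve_default_group`, its parameters, and the `"reuse"`/`"create"` return values are hypothetical and are not the actual DeviceMesh code; the sketch only assumes that the upper-bound check stays unconditional while the equality requirement moves to the path that creates the world group for the user.

```python
from typing import Sequence


def resolve_default_group(
    mesh_ranks: Sequence[int],
    world_size: int,
    default_pg_exists: bool,
) -> str:
    """Hypothetical stand-in for the default-group lookup in DeviceMesh."""
    mesh_size = len(mesh_ranks)

    # The upper bound always applies: a mesh cannot hold more ranks
    # than the world does.
    if mesh_size > world_size:
        raise ValueError(
            f"Mesh should not be bigger than default world size, "
            f"but found {mesh_size} ranks!"
        )

    if default_pg_exists:
        # The user already initialized the default group, so a mesh
        # over a subset of the world is allowed.
        return "reuse"

    # Assumption: when the world group is created on the user's behalf,
    # the mesh must cover the whole world.
    if mesh_size < world_size:
        raise ValueError(
            f"Mesh must cover all {world_size} ranks when the default "
            f"group is created automatically, but found {mesh_size} ranks."
        )
    return "create"
```

With this placement, `resolve_default_group([0, 1], 4, default_pg_exists=True)` is accepted, while the same mesh with `default_pg_exists=False` is rejected.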
Makes sense, because IIRC we can define, for example, mesh A on ranks 0, 1 and mesh B on ranks 2, 3. The example we discussed last time was actually about a mesh defined on ranks 0, 1 with no mesh defined on ranks 2, 3. Right?
Yeah, it's possible to create sub-meshes; that's what 2-D currently does, so we should still allow such behavior.
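To make the two layouts from this thread concrete, here is a small plain-Python sketch. The helpers `split_world` and `mesh_of` are hypothetical illustrations and not part of DTensor; they only model which ranks belong to which sub-mesh.

```python
from typing import List, Optional, Sequence


def split_world(world_size: int, mesh_size: int) -> List[List[int]]:
    """Partition ranks 0..world_size-1 into consecutive sub-meshes."""
    if mesh_size <= 0 or world_size % mesh_size != 0:
        raise ValueError("mesh_size must be positive and divide world_size")
    return [
        list(range(start, start + mesh_size))
        for start in range(0, world_size, mesh_size)
    ]


def mesh_of(rank: int, meshes: Sequence[Sequence[int]]) -> Optional[List[int]]:
    """Return the sub-mesh that contains `rank`, or None if no mesh covers it."""
    for mesh in meshes:
        if rank in mesh:
            return list(mesh)
    return None


# Layout 1: mesh A on ranks 0, 1 and mesh B on ranks 2, 3.
mesh_a, mesh_b = split_world(world_size=4, mesh_size=2)

# Layout 2: a mesh on ranks 0, 1 only; ranks 2, 3 belong to no mesh.
only_a = [mesh_a]
```

In the first layout every rank maps to exactly one sub-mesh. In the second, `mesh_of(2, only_a)` returns `None`, which is the case an unconditional `numel() < world_size` check would reject.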
lgtm
@pytorchmergebot merge -g
Merge started. Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):