-
Notifications
You must be signed in to change notification settings - Fork 25.5k
[DeviceMesh] Move DeviceMesh out from torch.distributed._tensor #112364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112364
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure)As of commit df6792f with merge base b91fcdf ( FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
9f7556d
to
b02e839
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong, we shouldn't move device mesh like this. I think it should be:
torch.distributed._tensor.device_mesh -> torch.distributed._device_mesh
test/distributed/_tensor/test_device_mesh -> test/distributed/test_device_mesh
No need to put deprecation warnings in _tensor.device_mesh
, that file can just import the device mesh from torch.distributed, then we can fix the imports as a separate PR.
525d318
to
bf61fe6
Compare
bf61fe6
to
2e7963c
Compare
torch/_dynamo/skipfiles.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep this line and can avoid confusion when compiling the model that still depend on this path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
fd55567
to
0c45d4d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, please make sure CI passes :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we add a warning here telling users that the path has been moved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually "we will make torch.distributed._device_mesh public." is not accurate because with _device_mesh
it is still private? IIUC. But it's great that we decouple it from DTensor.
0c45d4d
to
df6792f
Compare
Ye. We will in following PRs. Updated the summary as well. |
I think it should be fine, as the previous path is still importing from the new path so nothing would be broken. |
I am not talking about BC breaking here. I mean for new users, they import from the new path right? |
Once the migration is done, we will add doc to the .rst file and make it public. |
@pytorchmergebot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…rch#112364) Move DeviceMesh out as a standalone module. Once we make sure everything is migrated and doc is ready, we will make `torch.distributed._device_mesh` public in follow-up PRs. Pull Request resolved: pytorch#112364 Approved by: https://github.com/wanchaol, https://github.com/fegin, https://github.com/fduwjj
…rch#112364) Move DeviceMesh out as a standalone module. Once we make sure everything is migrated and doc is ready, we will make `torch.distributed._device_mesh` public in follow-up PRs. Pull Request resolved: pytorch#112364 Approved by: https://github.com/wanchaol, https://github.com/fegin, https://github.com/fduwjj
Move DeviceMesh out as a standalone module. Once we make sure everything is migrated and doc is ready, we will make
torch.distributed._device_mesh
public in follow-up PRs.cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov @kadeng