Make FakeProcessGroup traceable #113314
Conversation
This PR mimics what we have done to trace ProcessGroup. Differential Revision: [D51136463](https://our.internmc.facebook.com/intern/diff/D51136463/)
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113314. Note: links to docs will display an error until the docs builds have completed. ✅ No failures as of commit 858ef91 with merge base 376217c. This comment was automatically generated by Dr. CI and updates every 15 minutes.
torch/_dynamo/variables/builder.py (outdated)

```diff
@@ -660,6 +661,12 @@ def index_source(key):
                 source=self.source,
                 guards=self.make_guards(GuardBuilder.ID_MATCH),
             )
+        elif FakeProcessGroupVariable.is_process_group(value):
```
Could we make this check directly inside ProcessGroupVariable? I think FakePG is just another type of PG, so we should try to make that work directly.
Good point. I guess we don't actually do anything special for FakePG. Let me change it.
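A torch-free sketch of the direction the reviewer suggests: rather than adding a dedicated branch for the fake class in the builder, fold it into the existing type check so one code path covers both. The class and function names below are illustrative stand-ins, not the actual dynamo internals.

```python
# Stand-in classes: in PyTorch these would be the real
# torch.distributed.ProcessGroup and the test-only FakeProcessGroup.
class ProcessGroup:
    pass

class FakeProcessGroup:
    """Mimics ProcessGroup's interface without real communication."""

def is_process_group(value) -> bool:
    # One check covers both the real and the fake implementation, so the
    # variable builder needs no separate branch for the fake class.
    return isinstance(value, (ProcessGroup, FakeProcessGroup))

assert is_process_group(ProcessGroup())
assert is_process_group(FakeProcessGroup())
assert not is_process_group(object())
```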
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#113314. Approved by: https://github.com/wanchaol
Stack from ghstack (oldest at bottom):
This PR mimics what we have done to trace ProcessGroup. This allows us to use FakeProcessGroup with torch.compile. FakeProcessGroup lets us use world_size > 1 without creating multiple processes, which enables using PDB to debug the bucketing of DDP allreduce in Inductor. We could theoretically use GLOO with world_size == 1 to achieve the same goal; however, `wait()` seems to be optimized away when the world_size is 1.

Differential Revision: D51136463
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov @kadeng