Make FakeProcessGroup traceable #113314

Closed · wants to merge 2 commits

Conversation

@fegin fegin (Contributor) commented Nov 8, 2023

Stack from ghstack (oldest at bottom):

This PR mimics what we have done to trace ProcessGroup. It allows us to use FakeProcessGroup with torch.compile. FakeProcessGroup lets us run with world_size > 1 without creating multiple processes, which makes it possible to use PDB to debug the bucketing of DDP allreduce in Inductor. In theory we could use Gloo with world_size == 1 to achieve the same goal, but wait() seems to be optimized away when world_size is 1.

Differential Revision: D51136463

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov @kadeng

This PR mimics what we have done to trace ProcessGroup.

Differential Revision: [D51136463](https://our.internmc.facebook.com/intern/diff/D51136463/)

[ghstack-poisoned]
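For context, a minimal sketch of the single-process debugging workflow the description refers to. It assumes the test-internal `fake` backend and `FakeStore` from `torch.testing._internal.distributed.fake_pg`, and that the collective traces cleanly under `torch.compile` in this build; the entry points below are illustrative, not part of this PR.

```python
# Minimal sketch, assuming the test-internal "fake" backend registration.
import torch
import torch.distributed as dist
from torch.testing._internal.distributed.fake_pg import FakeStore  # importing registers "fake"

# One process, but the group reports world_size=4, so no workers are spawned
# and pdb.set_trace() can be used anywhere, including inside Inductor passes.
dist.init_process_group(backend="fake", rank=0, world_size=4, store=FakeStore())

def allreduce_fn(x):
    dist.all_reduce(x)  # FakePG makes this a no-op that still returns a work object
    return x * 2

compiled = torch.compile(allreduce_fn)
out = compiled(torch.ones(8))
dist.destroy_process_group()
```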

pytorch-bot bot commented Nov 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113314

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 858ef91 with merge base 376217c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

fegin added a commit that referenced this pull request Nov 8, 2023
This PR mimics what we have done to trace ProcessGroup.

Differential Revision: [D51136463](https://our.internmc.facebook.com/intern/diff/D51136463/)

ghstack-source-id: 206942938
Pull Request resolved: #113314
@fegin fegin marked this pull request as draft November 8, 2023 23:33
@fegin fegin added the ciflow/trunk label Nov 8, 2023
```diff
@@ -660,6 +661,12 @@ def index_source(key):
                 source=self.source,
                 guards=self.make_guards(GuardBuilder.ID_MATCH),
             )
+        elif FakeProcessGroupVariable.is_process_group(value):
```
A Contributor commented on this line:
Could we make this check directly inside ProcessGroupVariable? I think FakePG is just another type of PG, so we should try to make that work directly.

@fegin (Author) replied:

Good point. I guess we won't actually do anything special for FakePG. Let me change it.
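For illustration, a hedged sketch of the direction agreed on above: folding the FakePG check into ProcessGroupVariable rather than keeping a separate FakeProcessGroupVariable branch. The class stub and import guard below are assumptions, not the merged diff.

```python
# Hedged sketch: treat FakePG as just another kind of ProcessGroup.
class ProcessGroupVariable:  # stub standing in for the dynamo variable class
    @staticmethod
    def is_process_group(value):
        from torch._C._distributed_c10d import ProcessGroup
        # Assumption: FakeProcessGroup lives in the test-internal module;
        # guard the import since distributed may not always be built.
        try:
            from torch.testing._internal.distributed.fake_pg import FakeProcessGroup
        except ImportError:
            return isinstance(value, ProcessGroup)
        return isinstance(value, (ProcessGroup, FakeProcessGroup))
```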

(PR description updated; summary, Differential Revision link, and cc list unchanged from above.)

[ghstack-poisoned]
fegin added a commit that referenced this pull request Nov 9, 2023
Pull Request resolved: #113314

(commit message matches the PR description above.)

ghstack-source-id: 206958093
@exported-using-ghexport

Differential Revision: [D51136463](https://our.internmc.facebook.com/intern/diff/D51136463/)
@fegin fegin marked this pull request as ready for review November 9, 2023 20:15
@fegin fegin requested a review from wanchaol November 10, 2023 00:10
@wanchaol wanchaol (Contributor) left a comment:

lgtm

@facebook-github-bot (Contributor) commented:
@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot (Collaborator) commented:
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
(commit message matches the PR description above.)

Differential Revision: [D51136463](https://our.internmc.facebook.com/intern/diff/D51136463/)

Pull Request resolved: pytorch#113314
Approved by: https://github.com/wanchaol
@facebook-github-bot facebook-github-bot deleted the gh/fegin/180/head branch November 14, 2023 15:28
Labels: ciflow/inductor, ciflow/trunk, Merged, module: dynamo