Make FakeProcessGroup traceable #113314

Closed, 2 commits

Commits on Nov 8, 2023

  1. Make FakeProcessGroup traceable

    This PR mimics what we have done to trace ProcessGroup.
    
    Differential Revision: [D51136463](https://our.internmc.facebook.com/intern/diff/D51136463/)
    
    [ghstack-poisoned]
    fegin committed Nov 8, 2023
    38a8ce0

Commits on Nov 9, 2023

  1. Update on "Make FakeProcessGroup traceable"

    This PR mimics what we have done to trace ProcessGroup, which allows us to use FakeProcessGroup with torch.compile. FakeProcessGroup lets us run with world_size > 1 without creating multiple processes, which makes it possible to use PDB to debug DDP allreduce bucketing in Inductor. In theory, GLOO with world_size == 1 could achieve the same goal. However, the `wait()` call seems to be optimized away when world_size is 1.
    
    Differential Revision: [D51136463](https://our.internmc.facebook.com/intern/diff/D51136463/)
    
    cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng
    
    [ghstack-poisoned]
    fegin committed Nov 9, 2023
    858ef91