Distribute GPUs in round robin mode for distributed_test #46389

Closed

Commits on Oct 15, 2020

  1. Distribute GPUs in round robin mode for distributed_test

    The ProcessGroupNCCL::barrier implementation assumes that, when
    one GPU per rank is used, the GPU index equals the rank. Due to
    NCCL communicator reuse, rank 0 then keeps using the (effectively
    temporary) communicator created by barrier, while the other
    processes may be on different GPUs; they therefore try to create
    a new communicator and wait for rank 0 until it creates a new
    (potentially unrelated) one. The test now assigns GPUs to ranks
    in round-robin order (see the sketch after this commit entry).
    
    See pytorch#46248 for details
    Flamefire committed Oct 15, 2020
    Commit 4ee880e
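
For context, a round-robin GPU assignment in a PyTorch distributed test typically looks like the sketch below. This is only an illustration under stated assumptions: the helper name `init_rank` and the rendezvous address are made up here and are not the actual `distributed_test` code changed by this commit.

```python
# Minimal sketch of a round-robin GPU assignment for a multi-process test.
# The helper name and rendezvous address are illustrative only.
import torch
import torch.distributed as dist


def init_rank(rank: int, world_size: int) -> None:
    # Choose the device round robin instead of leaving it implicit,
    # so each rank ends up on a deterministic GPU (rank % device_count).
    device_id = rank % torch.cuda.device_count()
    torch.cuda.set_device(device_id)

    dist.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:29500",  # illustrative rendezvous address
        rank=rank,
        world_size=world_size,
    )

    # With one process per GPU, rank i maps to GPU i, which matches the
    # "GPU index equals rank" assumption made by ProcessGroupNCCL::barrier.
    dist.barrier()
    dist.destroy_process_group()
```

Each test process would call `init_rank(rank, world_size)` (for example via `torch.multiprocessing.spawn`), so that rank i deterministically uses GPU i % device_count.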