[Docs] Minor doc fixes for init_process_group #47644

Closed
wants to merge 3 commits into from
4 changes: 2 additions & 2 deletions torch/distributed/distributed_c10d.py
@@ -401,11 +401,11 @@ def init_process_group(backend,
 asynchronously and the process will crash. ``NCCL_BLOCKING_WAIT``
 will provide errors to the user which can be caught and handled,
 but due to its blocking nature, it has a performance overhead. On
-the other hand, ``NCCL_ASYNC_ERROR_HANDLING`` has little
+the other hand, ``NCCL_ASYNC_ERROR_HANDLING`` has very little
 performance overhead, but crashes the process on errors. This is
 done since CUDA execution is async and it is no longer safe to
 continue executing user code since failed async NCCL operations
-might result in subsequent CUDA operations to run on corrupted
+might result in subsequent CUDA operations running on corrupted
 data. Only one of these two environment variables should be set.
 group_name (str, optional, deprecated): Group name.

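The docstring's rule that only one of ``NCCL_BLOCKING_WAIT`` and ``NCCL_ASYNC_ERROR_HANDLING`` should be set could be checked before calling ``init_process_group``. The following is a minimal sketch, not part of this PR or of ``torch.distributed``: the helper name ``check_nccl_error_env`` is hypothetical, and it assumes a variable counts as "set" when its value is the string ``"1"``.

```python
import os

# The two environment variables named in the docstring.
NCCL_ERROR_VARS = ("NCCL_BLOCKING_WAIT", "NCCL_ASYNC_ERROR_HANDLING")


def check_nccl_error_env(env=None):
    """Hypothetical helper: return the enabled variable's name, or None.

    Assumption: a variable is treated as enabled when its value is "1".
    Raises ValueError if both variables are enabled at once.
    """
    env = os.environ if env is None else env
    enabled = [name for name in NCCL_ERROR_VARS if env.get(name) == "1"]
    if len(enabled) > 1:
        raise ValueError(
            "Only one of these should be set: " + ", ".join(enabled)
        )
    return enabled[0] if enabled else None
```

Calling ``check_nccl_error_env()`` with no argument inspects ``os.environ``; passing a plain ``dict`` makes the check easy to exercise without touching the real environment.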