Description
📚 The doc issue
I found the following environment variables in the PyTorch source code. Is there any documentation that describes when each of them should be used?
TORCH_NCCL_BLOCKING_WAIT
TORCH_NCCL_ASYNC_ERROR_HANDLING
TORCH_NCCL_DUMP_ON_TIMEOUT
TORCH_NCCL_DESYNC_DEBUG
TORCH_NCCL_ENABLE_TIMING
TORCH_NCCL_ENABLE_MONITORING
TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC
TORCH_NCCL_TRACE_BUFFER_SIZE
TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC
TORCH_NCCL_COORD_CHECK_MILSEC
TORCH_NCCL_ABORT_IN_DESTROY_PG
TORCH_NCCL_AVOID_RECORD_STREAMS
TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK
TORCH_NCCL_DEBUG_INFO_PIPE_FILE
TORCH_NCCL_DEBUG_INFO_TEMP_FILE
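For context, here is a minimal sketch of how one might check which of the variables listed above are set in a given process. It only reads the environment and makes no claim about how PyTorch interprets each value; the helper name `nccl_env_snapshot` is hypothetical and not part of PyTorch.

```python
import os

# Variable names copied from the list above.
TORCH_NCCL_VARS = [
    "TORCH_NCCL_BLOCKING_WAIT",
    "TORCH_NCCL_ASYNC_ERROR_HANDLING",
    "TORCH_NCCL_DUMP_ON_TIMEOUT",
    "TORCH_NCCL_DESYNC_DEBUG",
    "TORCH_NCCL_ENABLE_TIMING",
    "TORCH_NCCL_ENABLE_MONITORING",
    "TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC",
    "TORCH_NCCL_TRACE_BUFFER_SIZE",
    "TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC",
    "TORCH_NCCL_COORD_CHECK_MILSEC",
    "TORCH_NCCL_ABORT_IN_DESTROY_PG",
    "TORCH_NCCL_AVOID_RECORD_STREAMS",
    "TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK",
    "TORCH_NCCL_DEBUG_INFO_PIPE_FILE",
    "TORCH_NCCL_DEBUG_INFO_TEMP_FILE",
]


def nccl_env_snapshot(environ=None):
    """Hypothetical helper: map each listed variable to its value, or None if unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in TORCH_NCCL_VARS}


if __name__ == "__main__":
    # Print the current state of every listed variable for this process.
    for name, value in nccl_env_snapshot().items():
        print(f"{name}={'<unset>' if value is None else value}")
```

Running this in the same shell or launcher environment as the training job shows which settings that job would inherit.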
Suggest a potential alternative/fix
No response
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k