NCCL backend in DeepSpeed not yet implemented #615
Comments
As far as I know, DeepSpeed does not implement its own NCCL backend yet, but it uses torch.distributed, which is initialized with the NCCL backend.
So the warning has no effect on the use of NCCL in DeepSpeed?
No. Please try it.
These warning messages should be gone after microsoft/DeepSpeed#4009. Please reopen and tag me if you are still seeing this problem. Thanks!
Upgrading DeepSpeed to the latest version solved the problem for me.
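For anyone who wants to confirm from inside a training script that NCCL is in use despite the warning, here is a minimal sketch. It assumes PyTorch is installed; the helper name `describe_backend` is hypothetical and is not part of DeepSpeed. It only relies on `torch.distributed.is_available`, `is_initialized`, and `get_backend`.

```python
try:
    import torch.distributed as dist
except ImportError:  # PyTorch is not installed in this environment
    dist = None


def describe_backend() -> str:
    """Report which torch.distributed backend the current process uses.

    Hypothetical helper for illustration; not a DeepSpeed API.
    """
    if dist is None:
        return "torch is not installed"
    if not dist.is_available():
        return "torch.distributed is not available in this build"
    if not dist.is_initialized():
        return "torch.distributed is not initialized yet"
    # get_backend() returns the backend name of the default process group.
    return str(dist.get_backend())


if __name__ == "__main__":
    print(describe_backend())
```

Calling this after the distributed setup has run (for example after `deepspeed.init_distributed()`) should report the backend that torch.distributed was initialized with. On a CUDA setup like the one in the log, that is expected to be `nccl`.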
Hi, when I run on a single node with 4 GPUs, I get a warning that the NCCL backend is not implemented. How can I use NCCL?
[2023-06-30 11:01:34,361] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 11:01:34,795] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [1, 2, 3, 4]}
[2023-06-30 11:01:34,795] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=4, node_rank=0
[2023-06-30 11:01:34,795] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2023-06-30 11:01:34,795] [INFO] [launch.py:163:main] dist_world_size=4
[2023-06-30 11:01:34,795] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=1,2,3,4
[2023-06-30 11:01:36,125] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 11:01:36,129] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 11:01:36,139] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 11:01:36,152] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 11:01:39,534] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 11:01:39,535] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 11:01:39,556] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 11:01:39,556] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 11:01:39,556] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-06-30 11:01:39,568] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 11:01:39,568] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 11:01:39,576] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 11:01:39,576] [INFO] [comm.py:594:init_distributed] cdb=None
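The log above already shows which backend is actually used: the `Initializing TorchBackend in DeepSpeed with backend nccl` line. A small sketch for pulling that out of a saved log file follows. The function names and the `SAMPLE` excerpt are illustrative only, and the regular expression assumes the message format shown in this log, which may differ in other DeepSpeed versions.

```python
import re

# Excerpt copied from the log in this issue.
SAMPLE = """\
[2023-06-30 11:01:39,556] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 11:01:39,556] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 11:01:39,556] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-06-30 11:01:39,568] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
"""

BACKEND_RE = re.compile(r"Initializing TorchBackend in DeepSpeed with backend (\w+)")
WARNING_TEXT = "NCCL backend in DeepSpeed not yet implemented"


def torch_backend_from_log(log_text):
    """Return the backend named in the TorchBackend init line, or None."""
    match = BACKEND_RE.search(log_text)
    return match.group(1) if match else None


def count_not_implemented_warnings(log_text):
    """Count WARNING lines that carry the 'not yet implemented' message."""
    return sum(
        1
        for line in log_text.splitlines()
        if "[WARNING]" in line and WARNING_TEXT in line
    )


if __name__ == "__main__":
    print("backend:", torch_backend_from_log(SAMPLE))
    print("warnings:", count_not_implemented_warnings(SAMPLE))
```

If the backend comes back as `nccl`, the warnings are informational: communication goes through torch.distributed with NCCL, as described in the replies.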