NCCL backend in DeepSpeed not yet implemented #615

Closed
David-Lee-1990 opened this issue Jun 30, 2023 · 5 comments

@David-Lee-1990

Hi, when I run on a single node with 4 GPUs, the log warns that the NCCL backend is not implemented. How can I use NCCL?

[2023-06-30 11:01:34,361] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 11:01:34,795] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [1, 2, 3, 4]}
[2023-06-30 11:01:34,795] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=4, node_rank=0
[2023-06-30 11:01:34,795] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2023-06-30 11:01:34,795] [INFO] [launch.py:163:main] dist_world_size=4
[2023-06-30 11:01:34,795] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=1,2,3,4
[2023-06-30 11:01:36,125] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 11:01:36,129] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 11:01:36,139] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 11:01:36,152] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-06-30 11:01:39,534] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 11:01:39,535] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 11:01:39,556] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 11:01:39,556] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 11:01:39,556] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-06-30 11:01:39,568] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 11:01:39,568] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 11:01:39,576] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 11:01:39,576] [INFO] [comm.py:594:init_distributed] cdb=None

@hipudding

As far as I know, DeepSpeed does not yet implement its own NCCL backend, but it uses torch.distributed, which is initialized with NCCL.
In short, DeepSpeed does not implement NCCL, but torch does.
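The log in the original post seems to support this: the WARNING lines come from `init_deepspeed_backend`, while a separate INFO line reports `Initializing TorchBackend in DeepSpeed with backend nccl`. A minimal sketch of pulling those two facts out of a log is below. The helper name `summarize_backend` is hypothetical, and the line format is assumed from the log lines quoted in this issue only.

```python
import re

# Assumed log shape, taken from the lines in this issue:
# [timestamp] [LEVEL] [file:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)
BACKEND_MSG = re.compile(r"Initializing TorchBackend in DeepSpeed with backend (\w+)")


def summarize_backend(log_text):
    """Count the 'not yet implemented' warnings and report the torch backend, if logged."""
    warnings = 0
    backend = None
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a log line in the assumed format
        msg = match.group("msg")
        if match.group("level") == "WARNING" and "not yet implemented" in msg:
            warnings += 1
        found = BACKEND_MSG.search(msg)
        if found:
            backend = found.group(1)
    return {"warnings": warnings, "torch_backend": backend}


# Three lines copied from the log in the original post.
sample = """\
[2023-06-30 11:01:39,534] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-30 11:01:39,535] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-30 11:01:39,556] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
"""

print(summarize_backend(sample))
```

If `torch_backend` comes back as `nccl`, the warning is only about DeepSpeed's own backend, not about NCCL being unused.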

@Crazybean-lwb

So this has no effect on using NCCL with DeepSpeed?

@hipudding
No, it has no effect. Please try it.
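One way to try it is to ask torch.distributed directly. The sketch below is an assumption about how one might check, not something taken from DeepSpeed itself. `nccl_status` is a hypothetical helper, and it only reports the active backend if the process group has already been initialized, for example after DeepSpeed's distributed init has run in that process.

```python
def nccl_status():
    """Report whether torch.distributed can use NCCL in this Python environment."""
    try:
        import torch.distributed as dist
    except ImportError:
        return "torch not installed"
    if not dist.is_available():
        return "torch.distributed unavailable"
    if dist.is_initialized():
        # Inside an initialized job this returns the backend of the default group.
        return "active backend: " + str(dist.get_backend())
    if dist.is_nccl_available():
        return "nccl available"
    return "nccl not available"


print(nccl_status())
```

Called from inside a training process after initialization, this should report the backend actually in use.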

@mrwyattii
Contributor

mrwyattii commented Aug 18, 2023

These warning messages should be gone after microsoft/DeepSpeed#4009

Please reopen and tag me if you are still seeing this problem. Thanks!

@mohbattharani

Upgrading DeepSpeed to the latest version solved the problem for me.
