
ProcessGroupNCCL does not support scatter #144

Closed
aoranwu opened this issue Dec 7, 2020 · 6 comments

Comments

aoranwu commented Dec 7, 2020

When trying to run DLRM distributed with mpirun using NCCL as the backend, I get a runtime error: "ProcessGroupNCCL does not support scatter". Is there any way to solve this?

Thanks!

YazhiGao (Contributor) commented Dec 7, 2020

Can you paste the complete trace?

aoranwu (Author) commented Dec 7, 2020

The trace is as follows:
Traceback (most recent call last):
File "dlrm_s_pytorch.py", line 1056, in
Z = dlrm_wrap(X, lS_o, lS_i, use_gpu, device)
File "dlrm_s_pytorch.py", line 915, in dlrm_wrap
lS_i
File "/home/wisr/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "dlrm_s_pytorch.py", line 331, in forward
return self.distributed_forward(dense_x, lS_o, lS_i)
File "dlrm_s_pytorch.py", line 401, in distributed_forward
a2a_req = ext_dist.alltoall(ly, self.n_emb_per_rank)
File "/home/wisr/aoranwu/dlrm/extend_distributed.py", line 439, in alltoall
output = All2All_Scatter_Req.apply(a2ai, *inputs)
File "/home/wisr/aoranwu/dlrm/extend_distributed.py", line 225, in forward
req = dist.scatter(out_tensor, scatter_list if i == my_rank else [], src=i, async_op=True)
File "/home/wisr/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1380, in scatter
work = _default_pg.scatter(output_tensors, input_tensors, opts)
RuntimeError: ProcessGroupNCCL does not support scatter
(The same traceback is printed by the second rank.)

YazhiGao (Contributor) commented Dec 9, 2020

(quoting the All2All_Scatter_Req traceback above)

I believe this is because NCCL itself does not support scatter. If you update your PyTorch to the latest version, the code should use the all-to-all primitive directly. For scatter, you would have to use the MPI backend, I guess. But we will update the documentation for better debugging.
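(A minimal sketch of the advice above, assuming a recent PyTorch install: the native all-to-all primitive can be feature-detected before falling back to a scatter-based emulation. `has_native_alltoall` is a hypothetical helper for illustration, not part of DLRM's extend_distributed.py.)

```python
import torch
import torch.distributed as dist

# The scatter-based all-to-all emulation fails under NCCL because
# ProcessGroupNCCL rejects scatter; newer PyTorch exposes
# dist.all_to_all / dist.all_to_all_single, which NCCL implements
# directly, so the emulation path can be skipped entirely.
def has_native_alltoall() -> bool:
    # Hypothetical helper: feature-detect rather than parse version strings.
    return hasattr(dist, "all_to_all_single")

print(torch.__version__, has_native_alltoall())
```

If this prints `False`, the installed PyTorch predates the native primitive and upgrading is the fix suggested above.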

YazhiGao (Contributor) commented:

see pytorch/pytorch#47291

mnaumovfb (Contributor) commented:

It looks like this is resolved.

LukeLIN-web commented:

[Screenshot: backend support table from the PyTorch distributed docs]

See the collective-support table at https://pytorch.org/docs/stable/distributed.html
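(A sketch demonstrating the point of that table: scatter is implemented by the gloo and mpi backends but, at the time of this issue, not by nccl. A single-process gloo group is enough to show the call shape; `MASTER_PORT` 29511 is an arbitrary free port chosen for this example.)

```python
import os
import torch
import torch.distributed as dist

# Single-process gloo group: scatter works here, whereas the same call
# on a ProcessGroupNCCL raised the RuntimeError reported in this issue.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29511")
dist.init_process_group("gloo", rank=0, world_size=1)

out = torch.zeros(4)
# scatter_list holds one tensor per rank; only the src rank supplies it.
dist.scatter(out, scatter_list=[torch.arange(4.0)], src=0)
print(out)  # tensor([0., 1., 2., 3.])
dist.destroy_process_group()
```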
