
ProcessGroupNCCL does not support scatter #144

Closed
aoranwu opened this issue Dec 7, 2020 · 6 comments

Comments

aoranwu commented Dec 7, 2020

When trying to run DLRM distributed with mpirun using NCCL as the backend, I get a runtime error: "ProcessGroupNCCL does not support scatter". Is there any way to solve this?

Thanks!

YazhiGao (Contributor) commented Dec 7, 2020

Can you paste the complete trace?

aoranwu (Author) commented Dec 7, 2020

The trace is as follows:
Traceback (most recent call last):
File "dlrm_s_pytorch.py", line 1056, in
Z = dlrm_wrap(X, lS_o, lS_i, use_gpu, device)
File "dlrm_s_pytorch.py", line 915, in dlrm_wrap
lS_i
File "/home/wisr/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "dlrm_s_pytorch.py", line 331, in forward
return self.distributed_forward(dense_x, lS_o, lS_i)
File "dlrm_s_pytorch.py", line 401, in distributed_forward
a2a_req = ext_dist.alltoall(ly, self.n_emb_per_rank)
File "/home/wisr/aoranwu/dlrm/extend_distributed.py", line 439, in alltoall
output = All2All_Scatter_Req.apply(a2ai, *inputs)
File "/home/wisr/aoranwu/dlrm/extend_distributed.py", line 225, in forward
req = dist.scatter(out_tensor, scatter_list if i == my_rank else [], src=i, async_op=True)
File "/home/wisr/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1380, in scatter
work = _default_pg.scatter(output_tensors, input_tensors, opts)
RuntimeError: ProcessGroupNCCL does not support scatter
(The same traceback is printed by the second rank.)

YazhiGao (Contributor) commented Dec 9, 2020

(quoting the All2All_Scatter_Req traceback above)

I believe this is because NCCL itself does not support scatter. If you update your PyTorch to the latest version, the code should use the all-to-all primitive directly. For scatter, you would have to use the MPI backend, I guess. But we will update the documentation for better debugging.
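(A minimal sketch of the advice above, assuming a recent PyTorch install: the native all-to-all primitive can be feature-detected before falling back to a scatter-based emulation. `has_native_alltoall` is a hypothetical helper for illustration, not part of DLRM's extend_distributed.py.)

```python
import torch
import torch.distributed as dist

# The scatter-based all-to-all emulation fails under NCCL because
# ProcessGroupNCCL rejects scatter; newer PyTorch exposes
# dist.all_to_all / dist.all_to_all_single, which NCCL implements
# directly, so the emulation path can be skipped entirely.
def has_native_alltoall() -> bool:
    # Hypothetical helper: feature-detect rather than parse version strings.
    return hasattr(dist, "all_to_all_single")

print(torch.__version__, has_native_alltoall())
```

If this prints `False`, the installed PyTorch predates the native primitive and upgrading is the fix suggested above.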

YazhiGao (Contributor) commented:

see pytorch/pytorch#47291

mnaumovfb (Contributor) commented:

It looks like this is resolved.

LukeLIN-web commented:

[Screenshot: backend support table from the PyTorch distributed docs]

See the collective-support table at https://pytorch.org/docs/stable/distributed.html
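(A sketch demonstrating the point of that table: scatter is implemented by the gloo and mpi backends but, at the time of this issue, not by nccl. A single-process gloo group is enough to show the call shape; `MASTER_PORT` 29511 is an arbitrary free port chosen for this example.)

```python
import os
import torch
import torch.distributed as dist

# Single-process gloo group: scatter works here, whereas the same call
# on a ProcessGroupNCCL raised the RuntimeError reported in this issue.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29511")
dist.init_process_group("gloo", rank=0, world_size=1)

out = torch.zeros(4)
# scatter_list holds one tensor per rank; only the src rank supplies it.
dist.scatter(out, scatter_list=[torch.arange(4.0)], src=0)
print(out)  # tensor([0., 1., 2., 3.])
dist.destroy_process_group()
```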
