Broadcasting does not work for Quantization aware training with multiple GPUs #37270

raghuramank100 · 2020-04-25T00:44:07Z

Repro code and error info are at:
https://discuss.pytorch.org/t/quantization-awareness-training-multi-gpu-suport/66106

Snippet of the error:
Traceback (most recent call last):
  File "train_quantization.py", line 258, in <module>
    main(args)
  File "train_quantization.py", line 77, in main
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
  File "xxx/.conda/envs/pytorch1.3/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 298, in __init__
    self.broadcast_bucket_size)
  File "xxx/.conda/envs/pytorch1.3/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 480, in _distributed_broadcast_coalesced
    dist._broadcast_coalesced(self.process_group, tensors, buffer_size)
TypeError: _broadcast_coalesced(): incompatible function arguments. The following argument types are supported:

    (process_group: torch.distributed.ProcessGroup, tensors: List[at::Tensor], buffer_size: int) -> None

Invoked with: <torch.distributed.ProcessGroupNCCL object at 0x7f943f78dd18>, [tensor([[[[ 1.3185e-02, -4.3213e-03, 1.4823e-02],
…

Note that the problem is also present in PyTorch 1.5.
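The TypeError above says `_broadcast_coalesced` expects `tensors: List[at::Tensor]`, so one way to narrow the failure down is to look for entries in the broadcast list that are not plain tensors. The following is a minimal diagnostic sketch, not the fix that was eventually merged. It assumes the list DDP broadcasts is built from the values of `model.state_dict()`; the helper name `find_unexpected_entries` and the entry names `fq.scale` and `fq.zero_point` are hypothetical, and the thread does not confirm which entry was at fault.

```python
from typing import Any, Iterable, List, Tuple


def find_unexpected_entries(
    entries: Iterable[Tuple[str, Any]], expected_type: type
) -> List[Tuple[str, str]]:
    """Return (name, type name) for each value that is not an `expected_type` instance."""
    return [
        (name, type(value).__name__)
        for name, value in entries
        if not isinstance(value, expected_type)
    ]


# Stand-in data so the sketch runs without PyTorch: floats play the role of
# tensors, and the None / int values play the role of offending entries.
# The entry names are hypothetical and not taken from the issue.
fake_state = {"conv.weight": 0.5, "fq.scale": None, "fq.zero_point": 0}
print(find_unexpected_entries(fake_state.items(), float))
```

With PyTorch installed, the equivalent call would be `find_unexpected_entries(model.state_dict().items(), torch.Tensor)` on the QAT-prepared model, run before it is wrapped in `DistributedDataParallel`. An empty result would suggest the cause lies elsewhere.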

cc @jerryzh168 @jianyuh @dzhulgakov @raghuramank100 @jamesr66a

vkuzo · 2020-05-07T00:26:39Z

pytorch/vision#2191 updates the tutorial to work better with QAT+DDP. There is still work to do in verifying BN correctness, which will be in a separate PR.

vkuzo · 2020-07-08T02:02:29Z

#38587, #39031, #38368, #38478 fixed this issue.

raghuramank100 assigned vkuzo Apr 25, 2020

raghuramank100 added the oncall: quantization and triaged labels Apr 25, 2020

vkuzo closed this as completed Jul 8, 2020
