Broadcasting does not work for Quantization aware training with multiple GPUs #37270
Labels
oncall: quantization
Quantization support in PyTorch
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Repro code and error info are at:
https://discuss.pytorch.org/t/quantization-awareness-training-multi-gpu-suport/66106
Snippet of the error:
Traceback (most recent call last):
  File "train_quantization.py", line 258, in <module>
    main(args)
  File "train_quantization.py", line 77, in main
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
  File "xxx/.conda/envs/pytorch1.3/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 298, in __init__
    self.broadcast_bucket_size)
  File "xxx/.conda/envs/pytorch1.3/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 480, in _distributed_broadcast_coalesced
    dist._broadcast_coalesced(self.process_group, tensors, buffer_size)
TypeError: _broadcast_coalesced(): incompatible function arguments. The following argument types are supported:
Invoked with: <torch.distributed.ProcessGroupNCCL object at 0x7f943f78dd18>, [tensor([[[[ 1.3185e-02, -4.3213e-03, 1.4823e-02],
…
Note that the problem is also present in PyTorch 1.5.
cc @jerryzh168 @jianyuh @dzhulgakov @raghuramank100 @jamesr66a