
Bug: can not use pretrained BERT on multiple GPUs with DataParallel (PyTorch 1.5.0) #4189

Closed
erikchwang opened this issue May 7, 2020 · 3 comments

erikchwang commented May 7, 2020

Python: 3.6.10
PyTorch: 1.5.0
Transformers: 2.8.0 and 2.9.0

In the following code, I wrap the pretrained BERT model in DataParallel so that it can run on multiple GPUs:

import torch, transformers

# Load pretrained multilingual BERT and wrap it for multi-GPU execution.
model = transformers.AutoModel.from_pretrained("bert-base-multilingual-cased")
model = torch.nn.DataParallel(model)
model = model.cuda()

# Dummy batch of input IDs: batch size 16, sequence length 10.
input = torch.ones([16, 10], dtype=torch.long)
input = input.cuda()
model(input)

But I got the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/anaconda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/anaconda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/anaconda/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/anaconda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda/lib/python3.6/site-packages/transformers/modeling_bert.py", line 734, in forward
    extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype)  # fp16 compatibility
StopIteration

However, the same code works if I remove the DataParallel wrapper.
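
The inner traceback points at next(self.parameters()): under PyTorch 1.5.0 the replicas that DataParallel creates no longer expose their parameters through parameters(), so the iterator is empty and next() raises StopIteration. Below is a minimal sketch of just that failure mode (added here as an illustration, not part of the original report); the bare torch.nn.Module only stands in for such a replica.

import torch

# Sketch: a module whose parameters() iterator is empty, standing in for a
# DataParallel replica under PyTorch 1.5.0. The pattern used in
# modeling_bert.py, next(self.parameters()).dtype, then raises StopIteration.
empty_module = torch.nn.Module()  # no registered parameters
try:
    dtype = next(empty_module.parameters()).dtype
except StopIteration:
    print("next(self.parameters()) raised StopIteration on an empty iterator")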

erikchwang (Author) commented:

By the way, when I downgrade PyTorch from 1.5.0 to 1.4.0, the error disappears.
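
One alternative, sketched here as a hedged example rather than something confirmed in this thread, is DistributedDataParallel with one process per GPU: it does not replicate the module the way DataParallel does, so next(self.parameters()) inside the model still sees its parameters. A minimal outline, assuming two visible GPUs, a free local port 29500, and the same checkpoint as above:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import transformers

def run(rank, world_size):
    # One process per GPU; NCCL backend handles GPU-to-GPU communication.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"  # assumption: this port is free
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = transformers.AutoModel.from_pretrained("bert-base-multilingual-cased").cuda(rank)
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])

    # Each process handles its own shard of the batch (8 of the 16 rows above).
    input_ids = torch.ones([8, 10], dtype=torch.long).cuda(rank)
    with torch.no_grad():
        model(input_ids)

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2  # assumption: two GPUs are available
    mp.spawn(run, args=(world_size,), nprocs=world_size)

In this sketch each process feeds its own shard of the batch, so the splitting that DataParallel would normally do happens on the data side instead.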

@erikchwang erikchwang changed the title How to use pretrained BERT on multiple GPUs with DataParallel? Bug: can not use pretrained BERT on multiple GPUs with DataParallel May 7, 2020
@erikchwang erikchwang changed the title Bug: can not use pretrained BERT on multiple GPUs with DataParallel Bug: can not use pretrained BERT on multiple GPUs with DataParallel (PyTorch 1.5.0) May 7, 2020
erikchwang (Author) commented:

This is the same issue as #3936.

julien-c (Member) commented:

Closing in favor of #3936
