
Bug: can not use pretrained BERT on multiple GPUs with DataParallel (PyTorch 1.5.0) #4189

Closed
erikchwang opened this issue May 7, 2020 · 3 comments

erikchwang commented May 7, 2020

Python: 3.6.10
PyTorch: 1.5.0
Transformers: 2.8.0 and 2.9.0

In the following code, I wrap the pretrained BERT model in DataParallel so that it can run on multiple GPUs:

import torch, transformers

# Load pretrained multilingual BERT and wrap it for multi-GPU execution.
model = transformers.AutoModel.from_pretrained("bert-base-multilingual-cased")
model = torch.nn.DataParallel(model)
model = model.cuda()

# Dummy batch of input IDs: batch size 16, sequence length 10.
input = torch.ones([16, 10], dtype=torch.long)
input = input.cuda()
model(input)

But I got the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/anaconda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/anaconda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/anaconda/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/anaconda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda/lib/python3.6/site-packages/transformers/modeling_bert.py", line 734, in forward
    extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype)  # fp16 compatibility
StopIteration

However, the same code works if I remove the DataParallel wrapper.
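
The inner traceback points at next(self.parameters()): under PyTorch 1.5.0 the replicas that DataParallel creates no longer expose their parameters through parameters(), so the iterator is empty and next() raises StopIteration. Below is a minimal sketch of just that failure mode (added here as an illustration, not part of the original report); the bare torch.nn.Module only stands in for such a replica.

import torch

# Sketch: a module whose parameters() iterator is empty, standing in for a
# DataParallel replica under PyTorch 1.5.0. The pattern used in
# modeling_bert.py, next(self.parameters()).dtype, then raises StopIteration.
empty_module = torch.nn.Module()  # no registered parameters
try:
    dtype = next(empty_module.parameters()).dtype
except StopIteration:
    print("next(self.parameters()) raised StopIteration on an empty iterator")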

erikchwang (Author) commented:

By the way, when I downgrade PyTorch from 1.5.0 to 1.4.0, the error disappears.
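
One alternative, sketched here as a hedged example rather than something confirmed in this thread, is DistributedDataParallel with one process per GPU: it does not replicate the module the way DataParallel does, so next(self.parameters()) inside the model still sees its parameters. A minimal outline, assuming two visible GPUs, a free local port 29500, and the same checkpoint as above:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import transformers

def run(rank, world_size):
    # One process per GPU; NCCL backend handles GPU-to-GPU communication.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"  # assumption: this port is free
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = transformers.AutoModel.from_pretrained("bert-base-multilingual-cased").cuda(rank)
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])

    # Each process handles its own shard of the batch (8 of the 16 rows above).
    input_ids = torch.ones([8, 10], dtype=torch.long).cuda(rank)
    with torch.no_grad():
        model(input_ids)

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2  # assumption: two GPUs are available
    mp.spawn(run, args=(world_size,), nprocs=world_size)

In this sketch each process feeds its own shard of the batch, so the splitting that DataParallel would normally do happens on the data side instead.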

@erikchwang erikchwang changed the title How to use pretrained BERT on multiple GPUs with DataParallel? Bug: can not use pretrained BERT on multiple GPUs with DataParallel May 7, 2020
@erikchwang erikchwang changed the title Bug: can not use pretrained BERT on multiple GPUs with DataParallel Bug: can not use pretrained BERT on multiple GPUs with DataParallel (PyTorch 1.5.0) May 7, 2020
erikchwang (Author) commented:

This is the same issue as #3936.

julien-c (Member) commented:

Closing in favor of #3936
