This repository has been archived by the owner on Feb 16, 2022. It is now read-only.

Caught StopIteration in replica 0 on device 0. #57

Open
mamingyang112 opened this issue Aug 31, 2020 · 3 comments

Comments

@mamingyang112

When I train on multiple GPUs, the following error occurs. How can I solve this problem?

File "vilbert-multi-task/train_cls.py", line 535, in <module>
main()
File "vilbert-multi-task/train_cls.py", line 407, in main
task_losses,
File "..../vilbert-multi-task/vilbert/task_utils.py", line 327, in ForwardModelsTrain
task_tokens,
File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "..../vilbert-multi-task/vilbert/vilbert.py", line 1662, in forward
output_all_attention_masks=output_all_attention_masks,
File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "...../vilbert-multi-task/vilbert/vilbert.py", line 1351, in forward
dtype=next(self.parameters()).dtype
StopIteration

@youngfly11

Hi, did you find a solution? I ran into the same issue.

@youngfly11

This can be solved by downgrading PyTorch to 1.4.0.

@adamsvystun

adamsvystun commented Dec 10, 2020

Yep, downgrading works. This is a known issue with PyTorch >= 1.5.0: pytorch/pytorch#40457
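
If downgrading is not an option, a code-side workaround may also be possible. As far as I understand, since PyTorch 1.5 the replicas created by `nn.DataParallel` no longer expose anything through `parameters()`, so `next(self.parameters())` in the traceback above raises `StopIteration`. Below is a minimal sketch of guarding that call with a default. The helper name `module_dtype` and the stand-in objects are hypothetical, used only so the snippet runs without PyTorch or GPUs. It has not been tested against this repository.

```python
from types import SimpleNamespace


def module_dtype(module, fallback_tensor):
    """Return the dtype of the module's first parameter.

    If the module exposes no parameters, fall back to the dtype of a
    tensor that is already at hand (for example, one of the inputs).
    next(iterator, default) returns the default instead of raising
    StopIteration when the iterator is empty.
    """
    first_param = next(iter(module.parameters()), None)
    if first_param is None:
        return fallback_tensor.dtype
    return first_param.dtype


# Hypothetical stand-ins, not real PyTorch objects:
# a replica whose parameters() yields nothing, a module with one parameter,
# and an input tensor with a known dtype.
replica = SimpleNamespace(parameters=lambda: iter(()))
normal = SimpleNamespace(
    parameters=lambda: iter([SimpleNamespace(dtype="float16")])
)
attention_mask = SimpleNamespace(dtype="float32")

print(module_dtype(replica, attention_mask))  # float32 (fallback used)
print(module_dtype(normal, attention_mask))   # float16 (first parameter used)
```

In `vilbert.py`, the equivalent change would presumably be to replace `dtype=next(self.parameters()).dtype` with a dtype taken from an input tensor or from a specific weight. Which tensor is appropriate there is an assumption that would need checking, especially for mixed-precision training.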
