Thank you for the awesome and interesting research and project. I was wondering if anyone has encountered the following error when using multiple GPUs. I have four Titan V GPUs, and to use them I've set the local rank to -1, but a problem seems to occur during the forward pass.
Traceback (most recent call last):
  File "style_paraphrase/run_lm_finetuning.py", line 505, in <module>
    main()
  File "style_paraphrase/run_lm_finetuning.py", line 422, in main
    global_step, tr_loss = train(args, gpt2_model, train_dataset, tokenizer)
  File "style_paraphrase/run_lm_finetuning.py", line 228, in train
    loss = gpt2_model(batch)
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/workspace/style-transformer/style-transfer-paraphrase/style_paraphrase/utils.py", line 87, in forward
    labels=labels
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 511, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1059, in forward
    return_dict=return_dict,
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 832, in forward
    inputs_embeds = self.wte(input_ids)
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/root/miniconda3/envs/style-venv/lib/python3.7/site-packages/torch/nn/functional.py", line 1814, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorIndex.cu:403
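
For context, this is roughly the multi-GPU setup I believe the script falls into when the local rank is left at -1: a single process that wraps the model in torch.nn.DataParallel, as opposed to the one-process-per-GPU DistributedDataParallel path. This is only a simplified sketch of the usual HuggingFace-style pattern (the function and variable names here are mine, not the repo's exact code), included to clarify what I mean by "setting the local rank to -1":

```python
# Simplified sketch of the usual HuggingFace-style multi-GPU setup;
# names are illustrative, not the repo's exact code.
import torch


def setup_model(model, local_rank):
    if local_rank == -1:
        # Single-process path (what I am using): DataParallel replicates the
        # model on every visible GPU and scatters each batch along dim 0.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model.to(device)
        if torch.cuda.device_count() > 1:
            model = torch.nn.DataParallel(model)
    else:
        # Distributed path: one process per GPU, typically launched with
        # `python -m torch.distributed.launch --nproc_per_node=4 ...`,
        # which passes a distinct local_rank to each process.
        torch.cuda.set_device(local_rank)
        torch.distributed.init_process_group(backend="nccl")
        model.to(torch.device("cuda", local_rank))
        model = torch.nn.parallel.DistributedDataParallel(
            model, device_ids=[local_rank], output_device=local_rank
        )
    return model
```

The error itself is raised inside DataParallel's parallel_apply on replica 1 (device 1), as shown in the traceback above.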