I got this error message after following the GPT2 tutorial and applying it to the BERT code.
Can you let me know what I missed?
The error message is "RuntimeError: expected scalar type Float but found Half (data at /home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/include/ATen/core/TensorMethods.h:1821)", and the full trace is below.
Traceback (most recent call last):
File "/home/soojeong/forked/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 617, in <module>
main()
File "/home/soojeong/forked/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 595, in main
timers, args)
File "/home/soojeong/forked/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 354, in train
args, timers)
File "/home/soojeong/forked/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 304, in train_step
args, timers)
File "/home/soojeong/forked/DeepSpeed/DeepSpeedExamples/Megatron-LM/pretrain_bert.py", line 232, in forward_step
checkpoint_activations=args.checkpoint_activations)
File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/deepspeed/pt/deepspeed_light.py", line 613, in forward
loss = self.module(*inputs, **kwargs)
File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/soojeong/forked/DeepSpeed/DeepSpeedExamples/Megatron-LM/model/distributed.py", line 78, in forward
return self.module(*inputs, **kwargs)
File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/soojeong/forked/DeepSpeed/DeepSpeedExamples/Megatron-LM/fp16/fp16.py", line 65, in forward
return fp16_to_fp32(self.module(*(fp32_to_fp16(inputs)), **kwargs))
File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/soojeong/forked/DeepSpeed/DeepSpeedExamples/Megatron-LM/model/model.py", line 82, in forward
checkpoint_activations=checkpoint_activations)
File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/soojeong/forked/DeepSpeed/DeepSpeedExamples/Megatron-LM/model/modeling.py", line 944, in forward
output_all_encoded_layers=False, checkpoint_activations=checkpoint_activations)
File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/soojeong/forked/DeepSpeed/DeepSpeedExamples/Megatron-LM/model/modeling.py", line 869, in forward
embedding_output = self.embeddings(input_ids, token_type_ids)
File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/soojeong/forked/DeepSpeed/DeepSpeedExamples/Megatron-LM/model/modeling.py", line 300, in forward
embeddings = self.LayerNorm(embeddings)
File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/apex/normalization/fused_layer_norm.py", line 159, in forward
input, self.weight, self.bias, self.normalized_shape,self.eps)
File "/home/soojeong/deepspeed_venv/lib/python3.6/site-packages/apex/normalization/fused_layer_norm.py", line 25, in forward
input_, ctx.normalized_shape, weight_, bias_, ctx.eps)
RuntimeError: expected scalar type Float but found Half (data at /home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/include/ATen/core/TensorMethods.h:1821)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fcc922f5273 in /home/soojeong/deepspeed_venv/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: float* at::Tensor::data<float>() const + 0x449 (0x7fc8843aa5e9 in /home/soojeong/deepspeed_venv/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: cuda_layer_norm(at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x725 (0x7fc8843a76c5 in /home/soojeong/deepspeed_venv/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: layer_norm_affine(at::Tensor, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x2a4 (0x7fc884394ca4 in /home/soojeong/deepspeed_venv/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x1e254 (0x7fc8843a5254 in /home/soojeong/deepspeed_venv/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x1a8e0 (0x7fc8843a18e0 in /home/soojeong/deepspeed_venv/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
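The trace ends inside apex's fused LayerNorm CUDA extension, where a tensor was read as float32 but was actually float16. This most likely means the embedding output and the LayerNorm weight/bias reached the kernel with different dtypes; which one is half is not visible from the trace. A minimal diagnostic sketch, assuming PyTorch is installed: register a forward pre-hook on every LayerNorm-like module that compares the first positional input's dtype with the module's `weight` dtype and raises with the module's name before the kernel runs. `add_dtype_check_hooks` is a hypothetical helper, not part of DeepSpeed, Megatron-LM or apex. To cover apex's module you would pass `module_types=(FusedLayerNorm,)`, assuming it exposes `weight` the same way `nn.LayerNorm` does.

```python
import torch
import torch.nn as nn


def add_dtype_check_hooks(model, module_types=(nn.LayerNorm,)):
    """Hypothetical helper: fail fast, with the module name, when a
    LayerNorm-like module receives an input whose dtype differs from
    its weight dtype. Returns the hook handles so they can be removed."""
    handles = []

    def make_check(name):
        def check(module, inputs):
            # Only the first positional argument is inspected.
            x = inputs[0] if inputs else None
            w = getattr(module, "weight", None)
            if torch.is_tensor(x) and w is not None and x.dtype != w.dtype:
                raise TypeError(
                    f"{name or '<root>'} ({type(module).__name__}): "
                    f"input dtype {x.dtype} does not match weight dtype {w.dtype}"
                )
        return check

    for name, module in model.named_modules():
        if isinstance(module, module_types):
            handles.append(module.register_forward_pre_hook(make_check(name)))
    return handles


# CPU-only demo: the LayerNorm is cast to half, the Linear output stays float32.
mixed = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
mixed[1].half()
mixed_handles = add_dtype_check_hooks(mixed)
try:
    mixed(torch.randn(2, 4))
    mixed_message = ""
except TypeError as err:
    mixed_message = str(err)

# Matching dtypes: the hook stays silent and the forward pass runs normally.
matched = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))
matched_handles = add_dtype_check_hooks(matched)
matched_out = matched(torch.randn(2, 4))

for h in mixed_handles + matched_handles:
    h.remove()
```

Calling the helper on the wrapped model before the first `forward_step` should turn the opaque extension error into a message naming the offending LayerNorm and both dtypes, which narrows down whether the fp16 conversion or the LayerNorm's own cast is responsible.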