
[SinusoidalPositionalEmbedding] incorrect dtype when resizing in forward #13665

Merged · 1 commit merged into huggingface:master on Sep 21, 2021

Conversation

@stas00 (Contributor) commented on Sep 21, 2021

This PR fixes a potential performance issue in general, and a hard failure under DeepSpeed, when the following models are used under mixed precision and their positional embeddings get resized at forward time:

  • speech_to_text
  • m2m_100
  • fsmt

Currently, when SinusoidalPositionalEmbedding.forward resizes the embeddings, it ignores the original (correct) dtype and rebuilds the embeddings in fp32, so the inputs end up in fp32 as well.
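
As an aside, here is a minimal, self-contained sketch of the dtype-preserving resize pattern, just to make the problem concrete. It is illustrative only, not the diff in this PR: the class and method names merely mirror the style of the existing modules, and the actual fix lives inside the three models listed above.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPositionalEmbeddingSketch(nn.Module):
    """Illustrative sketch only; shows the dtype-preserving resize, not the actual transformers code."""

    def __init__(self, num_positions, embedding_dim):
        super().__init__()
        self.embedding_dim = embedding_dim
        self.make_weights(num_positions, embedding_dim)

    @staticmethod
    def get_embedding(num_embeddings, embedding_dim):
        # standard sinusoidal table, always built in fp32 (assumes an even embedding_dim)
        position = torch.arange(num_embeddings, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, embedding_dim, 2, dtype=torch.float) * (-math.log(10000.0) / embedding_dim)
        )
        emb = torch.zeros(num_embeddings, embedding_dim)
        emb[:, 0::2] = torch.sin(position * div_term)
        emb[:, 1::2] = torch.cos(position * div_term)
        return emb

    def make_weights(self, num_embeddings, embedding_dim):
        emb_weights = self.get_embedding(num_embeddings, embedding_dim)
        if hasattr(self, "weights"):
            # the crux of the fix: when the table is rebuilt at forward time,
            # keep the dtype/device the buffer was already converted to (e.g. fp16)
            emb_weights = emb_weights.to(dtype=self.weights.dtype, device=self.weights.device)
        self.register_buffer("weights", emb_weights, persistent=False)

    def forward(self, position_ids):
        max_pos = int(position_ids.max()) + 1
        if max_pos > self.weights.size(0):
            # dynamic resize: without the cast above this silently reverts the buffer to fp32
            self.make_weights(max_pos, self.embedding_dim)
        return self.weights[position_ids].detach()


# after .half() the buffer is fp16; a resize past 16 positions must stay fp16
emb = SinusoidalPositionalEmbeddingSketch(16, 8).half()
print(emb(torch.arange(32).unsqueeze(0)).dtype)  # torch.float16
```

With the `.to(dtype=self.weights.dtype, ...)` cast in place, a model that has been converted to fp16 keeps producing fp16 positional embeddings even when the table is rebuilt mid-forward.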

I detected the issue with DeepSpeed, which doesn't use AMP but casts the whole model to fp16, so of course if an input is in the wrong dtype we get:

deepspeed  examples/pytorch/translation/run_translation.py --train_file tests/fixtures/tests_samples/wmt_en_ro/train.json --source_lang en --target_lang ro --model_name_or_path hf-internal-testing/tiny-random-m2m_100 --do_train --max_train_samples 4 --per_device_train_batch_size 2 --num_train_epochs 1 --fp16 --report_to none --overwrite_output_dir --deepspeed tests/deepspeed/ds_config_zero2.json --output_dir /tmp/tmpi4k4wz8s --save_steps 1
[...]
  File "/mnt/nvme1/code/huggingface/transformers-ds-model-zoo-2/src/transformers/models/m2m_100/modeling_m2m_100.py", line 393, in forward
    hidden_states = self.final_layer_norm(hidden_states)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 173, in forward
    return F.layer_norm(
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/nn/functional.py", line 2346, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half

So hidden_states ends up being fp32 instead of fp16 because the pos_emb is fp32.
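Here is a tiny standalone repro of that promotion (illustrative, not code from this PR); the exact error message the downstream LayerNorm raises can depend on the PyTorch version, but the silent upcast is the root cause:

```python
import torch

hidden_states = torch.rand(2, 4, 8, dtype=torch.float16)  # fp16 activations under --fp16
pos_emb = torch.rand(4, 8, dtype=torch.float32)           # positional embedding buffer that was rebuilt in fp32

hidden_states = hidden_states + pos_emb  # type promotion: fp16 + fp32 -> fp32
print(hidden_states.dtype)  # torch.float32 - this is what the fp16 final_layer_norm chokes on
```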

I checked all models that use SinusoidalPositionalEmbedding; the others, which aren't modified by this PR, don't do dynamic resizing at run time.

I haven't checked the non-SinusoidalPositionalEmbedding implementations - perhaps those have the same issue too.

The test will be in #12695 as soon as this PR gets merged.

@patil-suraj, @LysandreJik, @sgugger

@stas00 changed the title from "[SinusoidalPositionalEmbedding] incorrect dtype when make_weights in forward" to "[SinusoidalPositionalEmbedding] incorrect dtype when resizing in forward" on Sep 21, 2021
@patil-suraj (Contributor) left a comment


Great catch! Thanks a lot for fixing this.

@sgugger (Collaborator) left a comment


LGTM, thanks for fixing!

@stas00 merged commit a722c30 into huggingface:master on Sep 21, 2021
@stas00 deleted the fix-pos-emb-dtype branch on Sep 21, 2021 at 16:05
Narsil pushed a commit to Narsil/transformers that referenced this pull request Sep 25, 2021
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 13, 2022