
[SinusoidalPositionalEmbedding] incorrect dtype when resizing in forward #13665

Merged · 1 commit merged into huggingface:master on Sep 21, 2021

Conversation

@stas00 (Contributor) commented on Sep 21, 2021

This PR fixes a potential performance issue in general, and a hard failure under DeepSpeed, when the following models are used under mixed precision and their positional embeddings get resized at forward time:

  • speech_to_text
  • m2m_100
  • fsmt

Currently, when SinusoidalPositionalEmbedding.forward resizes the embeddings, it ignores the original (correct) dtype and rebuilds the embeddings in fp32, so the inputs end up in fp32 as well.
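
As an aside, here is a minimal, self-contained sketch of the dtype-preserving resize pattern, just to make the problem concrete. It is illustrative only, not the diff in this PR: the class and method names merely mirror the style of the existing modules, and the actual fix lives inside the three models listed above.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPositionalEmbeddingSketch(nn.Module):
    """Illustrative sketch only; shows the dtype-preserving resize, not the actual transformers code."""

    def __init__(self, num_positions, embedding_dim):
        super().__init__()
        self.embedding_dim = embedding_dim
        self.make_weights(num_positions, embedding_dim)

    @staticmethod
    def get_embedding(num_embeddings, embedding_dim):
        # standard sinusoidal table, always built in fp32 (assumes an even embedding_dim)
        position = torch.arange(num_embeddings, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, embedding_dim, 2, dtype=torch.float) * (-math.log(10000.0) / embedding_dim)
        )
        emb = torch.zeros(num_embeddings, embedding_dim)
        emb[:, 0::2] = torch.sin(position * div_term)
        emb[:, 1::2] = torch.cos(position * div_term)
        return emb

    def make_weights(self, num_embeddings, embedding_dim):
        emb_weights = self.get_embedding(num_embeddings, embedding_dim)
        if hasattr(self, "weights"):
            # the crux of the fix: when the table is rebuilt at forward time,
            # keep the dtype/device the buffer was already converted to (e.g. fp16)
            emb_weights = emb_weights.to(dtype=self.weights.dtype, device=self.weights.device)
        self.register_buffer("weights", emb_weights, persistent=False)

    def forward(self, position_ids):
        max_pos = int(position_ids.max()) + 1
        if max_pos > self.weights.size(0):
            # dynamic resize: without the cast above this silently reverts the buffer to fp32
            self.make_weights(max_pos, self.embedding_dim)
        return self.weights[position_ids].detach()


# after .half() the buffer is fp16; a resize past 16 positions must stay fp16
emb = SinusoidalPositionalEmbeddingSketch(16, 8).half()
print(emb(torch.arange(32).unsqueeze(0)).dtype)  # torch.float16
```

With the `.to(dtype=self.weights.dtype, ...)` cast in place, a model that has been converted to fp16 keeps producing fp16 positional embeddings even when the table is rebuilt mid-forward.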

I detected the issue with DeepSpeed, which doesn't use AMP but casts the whole model to fp16, so of course if an input is in the wrong dtype we get:

deepspeed  examples/pytorch/translation/run_translation.py --train_file tests/fixtures/tests_samples/wmt_en_ro/train.json --source_lang en --target_lang ro --model_name_or_path hf-internal-testing/tiny-random-m2m_100 --do_train --max_train_samples 4 --per_device_train_batch_size 2 --num_train_epochs 1 --fp16 --report_to none --overwrite_output_dir --deepspeed tests/deepspeed/ds_config_zero2.json --output_dir /tmp/tmpi4k4wz8s --save_steps 1
[...]
  File "/mnt/nvme1/code/huggingface/transformers-ds-model-zoo-2/src/transformers/models/m2m_100/modeling_m2m_100.py", line 393, in forward
    hidden_states = self.final_layer_norm(hidden_states)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 173, in forward
    return F.layer_norm(
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/nn/functional.py", line 2346, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half

So hidden_states ends up being fp32 instead of fp16 because the pos_emb is fp32.
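Here is a tiny standalone repro of that promotion (illustrative, not code from this PR); the exact error message the downstream LayerNorm raises can depend on the PyTorch version, but the silent upcast is the root cause:

```python
import torch

hidden_states = torch.rand(2, 4, 8, dtype=torch.float16)  # fp16 activations under --fp16
pos_emb = torch.rand(4, 8, dtype=torch.float32)           # positional embedding buffer that was rebuilt in fp32

hidden_states = hidden_states + pos_emb  # type promotion: fp16 + fp32 -> fp32
print(hidden_states.dtype)  # torch.float32 - this is what the fp16 final_layer_norm chokes on
```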

I checked all models that use SinusoidalPositionalEmbedding; the others, which aren't modified by this PR, don't do dynamic resizing at run time.

I haven't checked the non-SinusoidalPositionalEmbedding implementations - perhaps those have the same issue too.

The test will be in #12695 as soon as this PR gets merged.

@patil-suraj, @LysandreJik, @sgugger

@stas00 changed the title from "[SinusoidalPositionalEmbedding] incorrect dtype when make_weights in forward" to "[SinusoidalPositionalEmbedding] incorrect dtype when resizing in forward" on Sep 21, 2021
@patil-suraj (Contributor) left a comment


Great catch! Thanks a lot for fixing this.

@sgugger (Collaborator) left a comment


LGTM, thanks for fixing!

@stas00 merged commit a722c30 into huggingface:master on Sep 21, 2021
@stas00 deleted the fix-pos-emb-dtype branch on Sep 21, 2021 at 16:05
Narsil pushed a commit to Narsil/transformers that referenced this pull request Sep 25, 2021
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 13, 2022