[SinusoidalPositionalEmbedding] incorrect dtype when resizing in forward
#13665
This PR fixes a potential performance issue in general, and a failure under DeepSpeed, when the affected models are used under mixed precision with positional embedding resizing at forward time.

Currently, when `SinusoidalPositionalEmbedding.forward` is called and it resizes the embeddings, it ignores the original (correct) dtype and forces the embeddings into fp32, so the inputs end up in fp32 as well.

I detected the issue with DeepSpeed, which doesn't use `amp` but instead forces the whole model into fp16, so if the input is in the wrong dtype we get a failure: `hidden_states` ends up being fp32 instead of fp16 because the positional embedding is fp32.

I checked all models using `SinusoidalPositionalEmbedding`, and the others that aren't modified by this PR don't do dynamic resizing at run time. I haven't checked non-`SinusoidalPositionalEmbedding` implementations; perhaps those have the same issue.

The test will be in #12695 as soon as this PR gets merged.
@patil-suraj, @LysandreJik, @sgugger