Fix comparison in ReinitializeTensor #16294
Closed
Summary:
In `ReinitializeTensor`, we compare `tensor->GetDevice()` and `options.device()`. At the call site, however, we only provide options with a `device_type`, so the `device_id` of `options` is always the default (-1). The tensor is also passed a `device` with the default `device_id`, but once its data is allocated, the `device` of the `tensor` is the `device` of its `Storage`, which is the `device` of the underlying `DataPtr`, which is the same as the `device` of the operator's `Context`, and that has a non-default `device_id`.

Therefore, every time we call `ReinitializeTensor`, the `device` does not match, and it still does not match after the call returns. As a result, we allocate a new Tensor on every call, which causes perf regressions for ops that use `ReinitializeTensor` on multiple GPUs.

Reviewed By: BIT-silence
Differential Revision: D13795635
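
To illustrate the mismatch described in the summary, here is a minimal, self-contained sketch. The `Device` struct and both comparison helpers are toy stand-ins invented for this example; they are not the real caffe2/c10 types, and the relaxed comparison is only one plausible way to avoid the problem, not necessarily what this PR implements.

```cpp
// Toy stand-ins for illustration only; NOT the real caffe2/c10 types.
enum class DeviceType { CPU, CUDA };

struct Device {
  DeviceType type;
  int index;  // -1 models the "default device_id" mentioned in the summary
};

// Strict comparison: both the type and the index must match.
// With requested.index == -1 and an allocated tensor on a concrete GPU,
// this never succeeds, so a caller would reallocate every time.
constexpr bool SameDeviceStrict(const Device& actual, const Device& requested) {
  return actual.type == requested.type && actual.index == requested.index;
}

// Relaxed comparison (hypothetical): when the request leaves the index
// unspecified (-1), only the device type is compared.
constexpr bool SameDeviceRelaxed(const Device& actual, const Device& requested) {
  return actual.type == requested.type &&
         (requested.index == -1 || actual.index == requested.index);
}
```

With a tensor living on `Device{DeviceType::CUDA, 1}` and options that request `Device{DeviceType::CUDA, -1}`, the strict check reports a mismatch on every call, while the relaxed check treats the two as compatible and still rejects a different device type or a different explicit index.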