Conversation

jerryzh168
Contributor

Summary:
In `ReinitializeTensor`, we compare `tensor->GetDevice()` against `options.device()`. At the call site, however, the option is constructed with only a `device_type`, so its `device_id` is always the default (-1). The tensor, although initially given a device with the default `device_id`, takes its actual device from its `Storage` once data is allocated; that device comes from the underlying `DataPtr`, which matches the device of the operator's `Context` and therefore carries a non-default `device_id`.

As a result, every call to `ReinitializeTensor` sees a device mismatch, and the mismatch persists after the call. A new Tensor is therefore allocated on every call, causing performance regressions for ops that use `ReinitializeTensor` on multiple GPUs.

Reviewed By: BIT-silence

Differential Revision: D13795635

fbshipit-source-id: d82f0e6e07e26e819010607ff3025e58556365c4
@xiaomengy
Contributor

Thanks for this fix!
