Skip to content

Handle Duplicated tensors in sync list and interop tensors#2559

Merged
JackCaoG merged 1 commit into
masterfrom
lazy_tensor_fix
Oct 21, 2020
Merged

Handle Duplicated tensors in sync list and interop tensors#2559
JackCaoG merged 1 commit into
masterfrom
lazy_tensor_fix

Conversation

@JackCaoG
Copy link
Copy Markdown
Collaborator

This pr is to address the corner case brought up by Alex in here, note that add lowering is removed. I used a slightly different example here

import torch
import torch_xla

torch.manual_seed(42)

device = 'xla'
x = torch.tensor([1,2,3], device=device)
y = x + x
z = torch.masked_select(y, y.eq(0))

There are two problems in this code

  1. in y = x + x, we will try to sync the value of x twice (since default implemantion of add is used in this case), this is unnecessary. We can use a unordered_set in sync and a unordered_map in GatherTensorsXlaData to avoid execute the same ir_graph twice.

  2. during torch.masked_select(y, y.eq(0)), y has ir_value and tensor_data but does not have xla_data.
    Here is the debugging output when z = torch.masked_select(y, y.eq(0)) is executed

Length of the tensors to be sync = 2

tensorID: 3
ir value: s64[3]{0} xla::device_data, location=<module>@example.py:20, device=CPU:0
with xla data: False
tensor data: 
 2
 4
 6
[ CPULongType{3} ]

tensorID: 4
ir value, pred[3]{0} aten::eq, location=<module>@example.py:20
with xla data: False
with tensor data: False

Currently in CollectSyncTensors we only check ir_value

if (ir_value) {

but in fact we should only sync the tensor if it

  1. does not have xla_data
  2. does not have tensor_data
  3. has ir_value

@JackCaoG JackCaoG requested a review from davidel October 16, 2020 02:36
Copy link
Copy Markdown
Collaborator

@davidel davidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JackCaoG !
I'd add a test ... given you already have one 😉

Comment thread torch_xla/csrc/tensor.cpp Outdated
Comment thread torch_xla/csrc/tensor.cpp
@JackCaoG JackCaoG force-pushed the lazy_tensor_fix branch 2 times, most recently from b9cc30a to a100598 Compare October 20, 2020 00:26
Comment thread torch_xla/csrc/tensor.cpp
Copy link
Copy Markdown
Collaborator

@davidel davidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JackCaoG !

@JackCaoG JackCaoG merged commit be999bd into master Oct 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants