
Conversation

@JackCaoG (Collaborator) commented Nov 22, 2022

This is to fix #4189. Even if a tensor has xla_data, we might still have to update it if there is an active view associated with the tensor.

FYI @Liyang90 @alanwaketan
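
For context, a minimal sketch of the scenario this change addresses, assuming an XLA device is available; the tensor name, shape, and values are illustrative, not taken from the actual test:

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()

# A tensor that outlives a single mark_step(), e.g. a running-average buffer.
t1 = torch.zeros(50, device=device)
xm.mark_step()   # t1 now has backing xla_data on the device

# The in-place slice update goes through a view on top of t1.
t1[10] = 0.5
xm.mark_step()   # with this fix, the pending update is executed here even
                 # though t1 already had xla_data before the step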

@JackCaoG (Collaborator Author)

@wonjoolee95 let's revert this change once the functionalization pass migration is completed; we need to make sure the newly added test still works, though.

@wonjoo-wj (Collaborator)

Thanks for the heads up. Are the CI test failures expected?

@JackCaoG (Collaborator Author)

@wonjoolee95 lol I need to rebase, let me do that.

@JackCaoG force-pushed the jackcao/fix_view_not_update branch from 75451ef to 119155e on November 22, 2022 18:40
// A tensor's xla_data might not be up to date if there is a view
// associated with it. Make sure to sync those tensors here too.
(tensors[i]->CurrentXlaData() == nullptr ||
tensors[i]->data()->view != nullptr)) {
Contributor

Awesome, @JackCaoG

xm.mark_step()
t1[12] = 1123
xm.mark_step()
self.assertNotIn('update_slice',
Contributor

Wouldn't the graph be trimmed anyway after mark_step()? I just tried on my branch without this patch, and it doesn't have the update_slice after. But maybe I am not following / understanding 🥟

Nit: it might be helpful to add a brief comment explaining what this assertion is about -- especially for future readers who haven't been following the original issue.

Collaborator Author

Hmm, without this change, the test failed on my end.

Collaborator Author

The bug is that the second mark_step won't execute the IR for t1. Can you verify?

Contributor

Ugh, it failed -- sorry for the false alarm. I can verify that the graph wasn't trimmed after the second mark_step.
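
For reference, a sketch of the kind of check the new test performs, assuming the internal debug helper torch_xla._XLAC._get_xla_tensors_text is available to dump the pending IR; the exact assertion in the test is truncated in the snippet above, so this is illustrative only:

import torch
import torch_xla
import torch_xla.core.xla_model as xm

device = xm.xla_device()
t1 = torch.randn(128, device=device)
xm.mark_step()

t1[12] = 1123
xm.mark_step()

# After the second mark_step() the in-place update should already have been
# executed, so the pending IR for t1 should no longer contain update_slice.
ir_text = torch_xla._XLAC._get_xla_tensors_text([t1])
assert 'update_slice' not in ir_text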

@yeounoh (Contributor) left a comment

One nit comment

@JackCaoG (Collaborator Author)

Weird that the test is still failing, trying to repro on my end.

@yeounoh (Contributor) left a comment

LGTM -- with a nit comment. Feel free to merge once you resolve the test failures.

@JackCaoG (Collaborator Author)

OK, the error is real and it is a regression; I will look into it a bit.

@JackCaoG (Collaborator Author)

Seems like I need to make this check smarter:

[ScheduleSyncTensorsGraph]
TensorsGraphInfo:
  mark_step (/pytorch/xla/torch_xla/core/xla_model.py:953)
  optimized_mod (/pytorch/torch/_dynamo/optimizations/torchxla_integration.py:186)
  _fn (/pytorch/torch/_dynamo/eval_frame.py:194)
  run_model_with_dynamo (test/dynamo/test_dynamo.py:25)
  _fn (/pytorch/torch/_dynamo/eval_frame.py:194)
  test_resnet18 (test/dynamo/test_dynamo.py:62)
  run (/root/anaconda3/envs/pytorch/lib/python3.7/unittest/case.py:628)
  __call__ (/root/anaconda3/envs/pytorch/lib/python3.7/unittest/case.py:676)
  run (/root/anaconda3/envs/pytorch/lib/python3.7/unittest/suite.py:122)
  __call__ (/root/anaconda3/envs/pytorch/lib/python3.7/unittest/suite.py:84)
  run (/root/anaconda3/envs/pytorch/lib/python3.7/unittest/suite.py:122)
  __call__ (/root/anaconda3/envs/pytorch/lib/python3.7/unittest/suite.py:84)
  run (/root/anaconda3/envs/pytorch/lib/python3.7/unittest/runner.py:176)
  runTests (/root/anaconda3/envs/pytorch/lib/python3.7/unittest/main.py:271)
  __init__ (/root/anaconda3/envs/pytorch/lib/python3.7/unittest/main.py:101)
  <module> (test/dynamo/test_dynamo.py:76)

Hashes: (91e5b939b215944892e246b99ed1d826)

## BEGIN_GRAPH
IR {
  %0 = f32[1000,512]{1,0} xla::device_data(), location=forward@linear.py:114, device=CPU:0, ROOT=0
}

Currently it will execute a graph that only contains a single device_data node.

@JackCaoG merged commit ba9f0df into master on Nov 23, 2022
@miladm (Collaborator) commented Dec 13, 2022

This is an interesting bug/fix.
@JackCaoG did we run any tests to determine the perf impact of this change?

@JackCaoG (Collaborator Author)

Not really, this should be a rare case since most tensors should not outlive the scope of a single mark_step. This happened in Lightning because they used a tensor list to record some running averages.

Linked issue: Behavior of in-place updated xla tensor at mark_step()