[dynamo] handle setting .data on a tensor #113080
Conversation
cleanup fix tests
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113080
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 374d303 with merge base ec124b9.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Need to make a few changes, hold up.
We don't want setattr in the graph. Setting .data has interesting implications on both aliasing and on the autograd engine. The safe recipe is:

1) Disable grad
2) Call set_()
3) Manually lower the version counter on the object to hide it from the autograd engine

This is effectively the same exact thing as setting .data, and it composes properly with aot_autograd and inductor.
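A minimal sketch of that recipe in plain eager mode (the names `param` and `new_data` are made up for illustration; the VC handling leans on `torch.autograd._unsafe_preserve_version_counter`, the same context manager suggested further down in this review):

```python
import torch

param = torch.zeros(6, requires_grad=True)   # stand-in for a parameter / graph input
new_data = torch.ones(3)                     # replacement with a different shape

# 1) disable grad, 2) call set_(), 3) keep the version counter where autograd expects it.
with torch.no_grad(), torch.autograd._unsafe_preserve_version_counter(param):
    param.set_(new_data)                     # param now aliases new_data's storage

print(param.shape)  # expected: torch.Size([3]), same end state as `param.data = new_data`
```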
**aot_autograd**

For aot_autograd, there's another snag. Specifically, when we invoke aot_autograd, we call `fake_mode.from_tensor()`, relying on memo to get the right tensor out. For .data mutations, this doesn't work, because the memoized fake_tensor is in the state it will be in at the end of the trace, not at the beginning. This means that the .data call is already applied, and the tensor shape (as in the case of these tests) mismatches. aot_autograd produces an invalid graph, with illegal calls like `torch.ops.aten.view.default(primals_2, [0])` where primals is actually sized `([6])` on input.

The plan/RFC here is to force re-creating the fake_tensor using the passed-in input, so that aot_autograd does not choke on the graph it makes at runtime.
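To illustrate the memoization this paragraph leans on (a sketch based on the description above, not code from this PR): asking the same fake mode to convert the same real tensor twice is expected to hand back the one memoized FakeTensor, which is why a mid-trace .data mutation leaves the memo reflecting the end-of-trace state.

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

fake_mode = FakeTensorMode()
real = torch.randn(6)

f1 = fake_mode.from_tensor(real)
f2 = fake_mode.from_tensor(real)

# Assumed behavior per the description above: the second call hits the per-tensor
# memo and returns the already-converted fake rather than building a fresh one.
print(f1 is f2)   # expected: True
print(f1.shape)   # torch.Size([6])
```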
torch/_subclasses/meta_utils.py
Outdated
if force_fresh:
    del self.tensor_memo[WeakIdRef(t)]
@ezyang this might have weird implications for dynamic shapes, we should discuss. Will ping you.
torch/_functorch/aot_autograd.py
Outdated
try:
    original_inpt.set_(updated_inpt)
except RuntimeError:
    original_inpt.data = updated_inpt
@bdhirsh this passes, but is garbage, let's figure out a better way to do this? Maybe what we need here is to track if the original_inpt set_() was due to a .data change, since that bypasses size safety.
May I ask what error is raised? Why does set_ check size safety?
Is the error due to checking size when it consists of dynamic shapes?
Hmm, what about this:

from contextlib import nullcontext

# We can technically run under no_grad and avoid the VC bump to avoid any issues at runtime.
# For full correctness, we should probably make sure that we **still** bump the VC,
# if the eager mode code also would have bumped the VC.
ctx = nullcontext()
if detected_vc_did_not_change_at_compile_time:  # placeholder condition
    ctx = torch.autograd._unsafe_preserve_version_counter(original_inpt)
with torch.no_grad(), ctx:
    original_inpt.set_(updated_inpt)

If we're saying "the only forms of .data we will support in torch.compile are those that can be emulated through set_() + VC setting", then it should be safe to assume that if we got here at runtime, we should be able to use those same APIs?
As a follow-up / corner case, could you also check if […] require the same handling? I think […]. Not sure what […].
# new .data value size and shape differences will cause
# tracked fakes to produce incorrect guards. This is sound because the TensorVariable
# coming out of set_() below will be a new one, and get
# installed in tracked fakes.
Hmm, I'm a little unclear on what `tracked_fakes` is used for, and why we still need this extra handling now that we have the "immutable `FakeTensorMode`" infra. Just leaving the comment, but I'll read the `tracked_fakes` code some more.
Oh, at first I was going to say "`set_()` is an inplace op, so it will just return the same (mutated) FakeTensor as the input". But I see that `tracked_fakes` is a List, not a Set. So I think this makes sense to me. One question I have is: does that mean that we'll end up with duplicate FakeTensors in the `tracked_fakes` list whenever we have an ordinary inplace op? And I guess that's fine, because that list is only really used by `ShapeEnv`, which is only using it to generate guards on the FakeTensor symbols.
Potentially? We should test for this.
torch/_dynamo/variables/builtin.py
Outdated
    (out.as_proxy(),),
    {},
)
obj.as_proxy().node.meta['example_value'] = _lower_version_count_by_1(obj.as_proxy().node.meta['example_value'])
Doesn't really matter, but no need to re-assign (we're just directly mutating the `FakeTensor`'s VC).
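For context on why the version counter needs explicit handling at all (an illustrative eager-mode aside, not code from this PR): in-place ops bump a tensor's `_version` even under `no_grad`, so hiding a `.data`-style mutation from the autograd engine takes a deliberate extra step.

```python
import torch

x = torch.zeros(3)
x.add_(1)
print(x._version)   # expected: 1 (the in-place op bumped the version counter)

with torch.no_grad():
    x.add_(1)
print(x._version)   # expected: 2 (no_grad alone does not suppress the VC bump)
```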
cleanup, fix tests (ghstack-source-id: a87bc01; Pull Request resolved: #113080)
[dynamo] Not working yet - .add_() w/ aot_autograd (missing synthetic_base_info) (Pull Request resolved: #113081)
aot_autograd fixes; Refactor; Revert "[dynamo] Graph break on `setattr(Tensor, "data", Tensor)` (#113043)" (reverts commit ddfe572; Pull Request resolved: #113205)
cache to shape env (Pull Request resolved: #113295)
fake policy; Builder fix; lint; bad tests
Alternate impl (Pull Request resolved: #113576)
Fixes; lint, test fix
lgtm
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#113080
Approved by: https://github.com/bdhirsh
Stack from ghstack (oldest at bottom):
**Dynamo**

We don't want setattr in the graph. Setting data has interesting implications on both aliasing and on the autograd engine.

The safe recipe is:

1) Disable grad
2) Call set_()
3) Manually lower the version counter on the object to hide it from the autograd engine

This is effectively the same exact thing as setting .data, and it composes properly with aot_autograd and inductor.

**aot_autograd**

For aot_autograd, there's another snag. Specifically, when we invoke aot_autograd, we call `fake_mode.from_tensor()`, relying on memo to get the right tensor out. For .data mutations, this doesn't work, because the memoized fake_tensor is in the state it will be in at the end of the trace, not at the beginning. This means that the .data call is already applied, and the tensor shape (as in the case of these tests) mismatches. aot_autograd produces an invalid graph, with illegal calls like `torch.ops.aten.view.default(primals_2, [0])` where primals is actually sized `([6])` on input.

The new plan here is to:

1) Record tensor fakification policy in dynamo
2) Provide a fresh fake mode to all backends
3) Invoke from_tensor with the stored policy to get fresh new fake tensors in aot_autograd

cc @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov @kadeng
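A purely illustrative sketch of how that plan might hang together; `InputFakificationPolicy`, `record_policy`, and `refakify` are hypothetical names invented for this example and are not the APIs this PR actually adds. The only real call assumed is `FakeTensorMode.from_tensor`.

```python
from dataclasses import dataclass

import torch
from torch._subclasses.fake_tensor import FakeTensorMode


@dataclass
class InputFakificationPolicy:
    # Hypothetical record of how dynamo saw an input at the *start* of tracing,
    # before any .data mutation was traced.
    shape: tuple
    dtype: torch.dtype
    requires_grad: bool


def record_policy(t: torch.Tensor) -> InputFakificationPolicy:
    # Dynamo side: remember the pre-mutation metadata for each graph input.
    return InputFakificationPolicy(tuple(t.shape), t.dtype, t.requires_grad)


def refakify(runtime_input: torch.Tensor, policy: InputFakificationPolicy) -> torch.Tensor:
    # Backend side: fakify the actual runtime input with a fresh fake mode, instead of
    # trusting a memoized fake that already reflects the end-of-trace state.
    if tuple(runtime_input.shape) != policy.shape:
        raise RuntimeError("runtime input no longer matches the recorded policy")
    fresh_mode = FakeTensorMode()
    return fresh_mode.from_tensor(runtime_input)
```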