dynamo: mutations on .data should be invisible to autograd #131403
Closed
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/131403
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 9a55056 with merge base 2ff98bc.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This was referenced Jul 23, 2024
cc @anijain2305 / @williamwen42 for review
anijain2305 approved these changes on Jul 24, 2024
zou3519 approved these changes on Jul 25, 2024
pytorchmergebot pushed a commit that referenced this pull request on Jul 26, 2024
#105290 The problem in the original flow is that:
(1) the user calls `torch.mul(complex_tensor, complex_scalar)`
(2) the python arg parser wraps the complex scalar in a `scalar_tensor`, and dispatches to `aten.mul.Tensor(self, scalar_other)`
(3) autograd sees `aten.mul.Tensor` and calls `scalar_other.conj()` [here](https://github.com/pytorch/pytorch/blob/main/torch/csrc/autograd/FunctionsManual.cpp#L597)
(4) during proxy tensor tracing, this gets dispatched to `aten._conj(scalar_tensor)`
(5) when we hit `__torch_dispatch__`, the scalar_tensor is converted back into a plain python scalar
(6) we error during tracing, because in `FunctionalTensorMode.__torch_dispatch__` we try to redispatch on `aten._conj.default(plain_python_scalar)`, and this overload does not accept python scalars.

My attempted fix in this PR is to update `TensorBase::conj()` to check if the current tensor is a scalar tensor (wrapped number), and if so, manually:
(1) convert the scalar tensor back into a scalar
(2) call `scalar.conj()` directly
(3) convert the result back into a wrapped tensor

This avoids having to go through python entirely in the tracing case (which is fine, because these scalar tensors are constants that we can const-prop during tracing anyway). Notably, I did **not** add e.g. a new `aten._conj.Scalar` overload. This would not actually fix the problem, since the bug is that we call `aten._conj.default(python_scalar)` directly. We would also need to muck with all `__torch_dispatch__` call sites to know to convert python scalars back into tensors directly.

Pull Request resolved: #131482
Approved by: https://github.com/zou3519, https://github.com/ezyang
ghstack dependencies: #131403
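A hypothetical minimal repro in the spirit of the linked issue (#105290); the backend choice and exact call sequence below are assumptions for illustration, not taken from the PR:

```python
import torch

# Multiplying a complex tensor by a complex Python scalar: the arg parser wraps the
# scalar into a scalar_tensor, and autograd's backward formula for mul calls .conj()
# on it, which previously tripped FunctionalTensorMode during AOTDispatcher tracing.
@torch.compile(backend="aot_eager")
def f(x):
    return torch.mul(x, 2 + 3j)

x = torch.randn(4, dtype=torch.complex64, requires_grad=True)
out = f(x)                    # previously could error while tracing aten._conj on the wrapped scalar
out.abs().sum().backward()    # real-valued loss, so backward() needs no explicit gradient
```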
pytorchmergebot pushed a commit that referenced this pull request on Jul 26, 2024
…ference graphs (#131665)
This ensures that in an inference setting, we properly bump the VC of mutated graph inputs. Previously, we would only properly bump the VC for training graphs.
Pull Request resolved: #131665
Approved by: https://github.com/ezyang, https://github.com/zou3519
ghstack dependencies: #131403, #131482
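A rough sketch of the behavior this commit targets (my own example, not the PR's test; version counters are observed through the internal `_version` attribute):

```python
import torch

# Input mutation inside a compiled region, run in an "inference" setting (no_grad):
# per the commit message, the mutated graph input's version counter should be bumped,
# matching eager behavior, instead of only being bumped for training graphs.
@torch.compile(backend="aot_eager")
def f(x):
    x.add_(1)      # mutation of a graph input
    return x * 2

x = torch.randn(3)
before = x._version
with torch.no_grad():
    f(x)
print(x._version > before)  # expected True after this fix (VC bumped in inference graphs)
```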
pytorchmergebot pushed a commit that referenced this pull request on Jul 26, 2024
…exist (#131554)
Pull Request resolved: #131554
Approved by: https://github.com/ezyang, https://github.com/zou3519
ghstack dependencies: #131403, #131482, #131665
Labels: ciflow/inductor, ciflow/trunk, Merged, module: dynamo, release notes: dynamo
Fixes #121353

Our handling of `.data` in dynamo today basically just converts `y = x.data` into `y = x.detach()`. The semantics of these two ops are not quite the same, because:
(1) any future mutations on `x.data` will be fully ignored by autograd
(2) any mutations on `x.detach()` will bump x's version counter

The linked model does a `.data` mutation that is hidden from autograd in eager, but ends up erroring during AOTDispatcher tracing.
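For illustration, a minimal eager sketch of the difference described above (this example is mine, not from the PR; `_version` is the tensor's internal version counter):

```python
import torch

x = torch.randn(3, requires_grad=True)
v0 = x._version

# (1) Mutating through .data is invisible to autograd: x's version counter is unchanged,
# since .data hands back a shallow copy that carries its own version counter.
x.data.mul_(2)
assert x._version == v0

# (2) Mutating through .detach() shares x's version counter, so autograd sees the bump
# (and would raise if x's original value were later needed in backward).
y = x.detach()
y.mul_(2)
assert x._version == v0 + 1
```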
I updated dynamo's handling so that:
(1) when dynamo sees a call to `getattr(tensor, "data")` and calls `.detach()`, we set a flag on the returned `TensorVariable` indicating it came from `.data`
(2) on any tensor method that we call with an input `TensorVariable` with this flag turned on, we proxy autograd's `preserve_version_counter` logic into the graph, to properly reset the VC after the op is run

One thing to note is that I don't actually do this on every op that we pass the tensor to: I only do it for tensor methods that appear to be mutations (by checking for a trailing underscore), as sketched below. My thought was that:
(1) I didn't want to do this for **every** op that you pass `y` into, since that would e.g. triple the number of nodes in the graph, and could cause compile time regressions if you use `.data`
(2) this situation is pretty rare in general, and I'm hoping that "tensor method mutations" cover most reasonable mutation cases. If we manage to miss a case, you will get a loud error during tracing anyway, so there is not a safety issue.
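For concreteness, a rough Python-level sketch of the trailing-underscore heuristic and where the version-counter handling hooks in (illustrative only; the function names are hypothetical, and the real logic is proxied into the FX graph by dynamo rather than run in eager):

```python
import torch

def _looks_like_inplace_method(name: str) -> bool:
    # Heuristic from this PR: tensor methods with a trailing underscore
    # (mul_, add_, copy_, ...) are treated as mutations.
    return name.endswith("_") and not name.startswith("__")

def call_method_on_dot_data_tensor(t: torch.Tensor, name: str, *args, **kwargs):
    # Illustrative stand-in: `t` plays the role of a TensorVariable flagged as coming
    # from `.data`. For mutation-looking methods, dynamo wraps the call in the traced
    # graph with autograd's preserve_version_counter logic.
    if _looks_like_inplace_method(name):
        saved_version = t._version                 # save the VC before the op
        result = getattr(t, name)(*args, **kwargs)
        # ... in the traced graph, t's version counter is then reset to saved_version,
        # so the mutation stays invisible to autograd, matching eager .data semantics.
        return result
    return getattr(t, name)(*args, **kwargs)
```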
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames