fix inference mode / PyDispatcher / Functionalize interaction #103275
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103275.
Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 8871c77. This comment was automatically generated by Dr. CI and updates every 15 minutes.
OHHH this is a good catch
torch/_ops.py
Outdated
```python
# to use it, instead of the C++ decomp. We can't though, because Functionalize
# isn't part of the CompositeImplicitAutograd alias set.
# (open question: will we eventually need to do this for functorch transform keys too?)
self.py_kernels[torch._C.DispatchKey.Functionalize] = fn
```
A less hacky version of this would be to modify our py_impl CompositeImplicitAutograd sites to call some higher level function which takes care of doing both registrations. This would be good because the hack as written violates the invariant that one py_impl does one registration.
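For instance, something like this hypothetical wrapper (the name `py_impl_composite_implicit` is invented for illustration; it is not an actual `torch._ops` API):

```python
import torch

# Hypothetical convenience decorator (not actual PyTorch API): one call site
# performs both registrations explicitly, so each underlying py_impl call
# still does exactly one registration.
def py_impl_composite_implicit(op):
    def inner(fn):
        op.py_impl(torch._C.DispatchKey.CompositeImplicitAutograd)(fn)
        # Functionalize isn't in the CompositeImplicitAutograd alias set,
        # so mirror the registration there explicitly.
        op.py_impl(torch._C.DispatchKey.Functionalize)(fn)
        return fn
    return inner
```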
Hmm, the only thing I'm worried about is that people might forget to use that new API and continue to register future ops with `py_impl(CompositeImplicitAutograd)` (which will now always be subtly wrong). If you're not worried about that, though, then I can change it (or... maybe just have `py_impl()` give you a nice error message if you pass in that key?)
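The error-message variant could be as simple as a guard at the top of `py_impl` (a sketch only; the function name and the `allow_raw_composite` escape hatch are invented):

```python
import torch

def _reject_raw_composite_key(k, allow_raw_composite=False):
    # Hypothetical guard: nudge people toward the dual-registration helper
    # instead of a bare py_impl(CompositeImplicitAutograd), which would
    # silently miss the matching Functionalize registration.
    if (k == torch._C.DispatchKey.CompositeImplicitAutograd
            and not allow_raw_composite):
        raise RuntimeError(
            "Register CompositeImplicitAutograd python decomps through the "
            "dual-registration helper so Functionalize is not skipped."
        )
```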
torch/_ops.py
Outdated
```diff
@@ -122,6 +122,14 @@ def inner(fn):
         f"Trying to override a python impl for {k} on operator {self.name()}"
     )
     self.py_kernels[k] = fn
+    if k == torch._C.DispatchKey.CompositeImplicitAutograd and torch._C.DispatchKey.Functionalize not in self.py_kernels:
```
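Putting this fragment together with the snippet higher up, the registration path presumably reads roughly like this (a sketch reconstructed from the two diff fragments, not the verbatim source):

```python
import torch

class _OpRegistrySketch:
    """Minimal sketch of the relevant bit of torch/_ops.py."""

    def __init__(self, name):
        self._name = name
        self.py_kernels = {}

    def py_impl(self, k):
        def inner(fn):
            if k in self.py_kernels:
                raise RuntimeError(
                    f"Trying to override a python impl for {k} on operator {self._name}"
                )
            self.py_kernels[k] = fn
            # Functionalization codegen registers C++ CompositeImplicit decomps
            # directly to the Functionalize key, which isn't part of the
            # CompositeImplicitAutograd alias set, so mirror the python
            # registration to keep the python decomp winning.
            if (k == torch._C.DispatchKey.CompositeImplicitAutograd
                    and torch._C.DispatchKey.Functionalize not in self.py_kernels):
                self.py_kernels[torch._C.DispatchKey.Functionalize] = fn
            return fn
        return inner
```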
I'm trying to remember now why we didn't just add `Functionalize` (and all of the other functorch transform keys) directly to the `CompositeImplicitAutograd` alias keyset, but I can't remember (@zou3519 any chance you remember?).

This PR is good as-is, but if we want to be less hacky we should just make `Functionalize` a part of the `CompositeImplicitAutograd` alias keyset (and maybe we should rename it, since it is now about more than just autograd?)
Not all CompositeImplicitAutograd operations work with vmap (their decompositions don't always preserve Tensor subclass-ness), which is why vmap isn't in that alias set. Although we've fixed most of these cases, I am wary of actually adding vmap to the CompositeImplicitAutograd set because it is unclear what percentage of aten ops our OpInfos actually cover.
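As a toy illustration of that failure mode (an invented example, not a real aten decomp):

```python
import torch
from torch.func import vmap

# A CompositeImplicitAutograd-style decomposition that allocates a plain
# tensor and mutates it in place. torch.empty(x.shape) knows nothing about
# the hidden batch dimension vmap threads through, so the in-place copy
# can't preserve the batched "subclass-ness" of the input.
def clone_via_empty(x):
    out = torch.empty(x.shape)  # plain tensor, no batch dim
    out.copy_(x)                # vmap raises here: `out` is unbatched
    return out

clone_via_empty(torch.randn(3))             # fine eagerly
# vmap(clone_via_empty)(torch.randn(4, 3))  # RuntimeError under vmap
```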
This suggestion, if it works, is better!
Agreed - I'll try it and see if there's fallout.

There was actually a comment about it (hooray!) here. At the time, `at::ones()` and friends were all `CompositeImplicitAutograd`, and decomposed into `empty()` + `fill_()`. I think we didn't want functionalization to decompose `ones()`, since it would result in a bunch of unnecessary functionalization logic running. But this shouldn't really matter anymore, since those factory functions all got changed to be `CompositeExplicitAutograd`.
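Concretely, the old decomposition amounted to something like this (schematic; not the actual codegen output):

```python
import torch

# Schematic of the old CompositeImplicitAutograd-era decomposition of
# at::ones(): the in-place fill_() is exactly the mutation that
# functionalization would have had to intercept and rewrite, for no benefit.
def ones_decomp(size):
    out = torch.empty(size)
    out.fill_(1)  # mutation -> extra functionalization work if decomposed
    return out

assert torch.equal(ones_decomp((2, 3)), torch.ones(2, 3))
```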
…ion" Fixes #103132 This is kind of annoying: Functionalization (and also vmap, I think?) manually figures out which ops have C++ CompositeImplicit decomps, and directly registers them to the Functionalize key. This is a problem for the PyDispatcher: We normally want the PyDispatcher to take precedence over the regular dispatcher. But in this case, we have a python decomp registered to `CompositeImplicitAutograd`, and a C++ decomp registered *directly* to the `Functionalize` key, so the C++ decomp gets precedence over the python decomp. The way this showed up was that a model was running `matmul()` under inference mode, so we never hit the autograd dispatch key, and go straight to the functionalize dispatch key. Matmul has both a python decomp and a c++ decomp, but we were running the C++ decomp. That C++ decomp isn't meant to be used with dynamic shapes, so we were failing with the "tried to call `.sizes()` on a tensor with dynamic shapes" error. For now, I had the PyDispatcher mimic the behavior of functionalization codegen: when you register a python decomp to the `CompositeImplicitAutograd` key, this PR just automatically registers that decomp to the `Functionalize` key at the same time. I'm trying to remember now why we didn't just add `Functionalize` (and all of the other functorch transform keys) directly to the `CompositeImplicitAutograd` alias keyset, but I couldn't remember (zou3519 any chance you remember?). cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy aakhundov [ghstack-poisoned]
ghstack-source-id: 834bb63738908aee2ad4cf0471d73d76c8dd9a7f Pull Request resolved: #103275
…ion" Fixes #103132 This is kind of annoying: Functionalization (and also vmap, I think?) manually figures out which ops have C++ CompositeImplicit decomps, and directly registers them to the Functionalize key. This is a problem for the PyDispatcher: We normally want the PyDispatcher to take precedence over the regular dispatcher. But in this case, we have a python decomp registered to `CompositeImplicitAutograd`, and a C++ decomp registered *directly* to the `Functionalize` key, so the C++ decomp gets precedence over the python decomp. The way this showed up was that a model was running `matmul()` under inference mode, so we never hit the autograd dispatch key, and go straight to the functionalize dispatch key. Matmul has both a python decomp and a c++ decomp, but we were running the C++ decomp. That C++ decomp isn't meant to be used with dynamic shapes, so we were failing with the "tried to call `.sizes()` on a tensor with dynamic shapes" error. For now, I had the PyDispatcher mimic the behavior of functionalization codegen: when you register a python decomp to the `CompositeImplicitAutograd` key, this PR just automatically registers that decomp to the `Functionalize` key at the same time. I'm trying to remember now why we didn't just add `Functionalize` (and all of the other functorch transform keys) directly to the `CompositeImplicitAutograd` alias keyset, but I couldn't remember (zou3519 any chance you remember?). cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy aakhundov [ghstack-poisoned]
ghstack-source-id: cf06aa9bd5ce15c2d6ebd36aea239514c5a38e96 Pull Request resolved: #103275
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
…ion" Fixes #103132 This is kind of annoying: Functionalization (and also vmap, I think?) manually figures out which ops have C++ CompositeImplicit decomps, and directly registers them to the Functionalize key. This is a problem for the PyDispatcher: We normally want the PyDispatcher to take precedence over the regular dispatcher. But in this case, we have a python decomp registered to `CompositeImplicitAutograd`, and a C++ decomp registered *directly* to the `Functionalize` key, so the C++ decomp gets precedence over the python decomp. The way this showed up was that a model was running `matmul()` under inference mode, so we never hit the autograd dispatch key, and go straight to the functionalize dispatch key. Matmul has both a python decomp and a c++ decomp, but we were running the C++ decomp. That C++ decomp isn't meant to be used with dynamic shapes, so we were failing with the "tried to call `.sizes()` on a tensor with dynamic shapes" error. For now, I had the PyDispatcher mimic the behavior of functionalization codegen: when you register a python decomp to the `CompositeImplicitAutograd` key, this PR just automatically registers that decomp to the `Functionalize` key at the same time. I'm trying to remember now why we didn't just add `Functionalize` (and all of the other functorch transform keys) directly to the `CompositeImplicitAutograd` alias keyset, but I couldn't remember (zou3519 any chance you remember?). cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy chenyang78 aakhundov [ghstack-poisoned]
Successfully rebased.
@pytorchbot merge
Merge failed. Reason: This PR needs a `release notes:` label. If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`. If not, please add the `topic: not user facing` label. To add a label, you can comment to pytorchbot, for example `@pytorchbot label "topic: not user facing"`. For more information, see the wiki. (Details for Dev Infra team: raised by workflow job.)
@pytorchbot merge
Merge failed. Reason: This PR needs a `release notes:` label. If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`. If not, please add the `topic: not user facing` label. To add a label, you can comment to pytorchbot, for example `@pytorchbot label "topic: not user facing"`. For more information, see the wiki. (Details for Dev Infra team: raised by workflow job.)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #103132

This is kind of annoying: Functionalization (and also vmap, I think?) manually figures out which ops have C++ CompositeImplicit decomps, and directly registers them to the Functionalize key. This is a problem for the PyDispatcher: we normally want the PyDispatcher to take precedence over the regular dispatcher. But in this case, we have a python decomp registered to `CompositeImplicitAutograd` and a C++ decomp registered directly to the `Functionalize` key, so the C++ decomp gets precedence over the python decomp.

The way this showed up was that a model was running `matmul()` under inference mode, so we never hit the autograd dispatch key and go straight to the functionalize dispatch key. Matmul has both a python decomp and a C++ decomp, but we were running the C++ decomp. That C++ decomp isn't meant to be used with dynamic shapes, so we were failing with the "tried to call `.sizes()` on a tensor with dynamic shapes" error.

For now, I had the PyDispatcher mimic the behavior of functionalization codegen: when you register a python decomp to the `CompositeImplicitAutograd` key, this PR just automatically registers that decomp to the `Functionalize` key at the same time.

I'm trying to remember now why we didn't just add `Functionalize` (and all of the other functorch transform keys) directly to the `CompositeImplicitAutograd` alias keyset, but I can't remember (@zou3519 any chance you remember?).
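For context, a sketch of the kind of call pattern that hit this (an assumed repro; the actual model in #103132 may differ):

```python
import torch

def f(x, y):
    # matmul has both a python decomp and a C++ CompositeImplicit decomp.
    # Under inference mode we skip the Autograd keys and hit Functionalize
    # directly, where (before this fix) the C++ decomp ran and then failed
    # on dynamic shapes.
    return torch.matmul(x, y)

with torch.inference_mode():
    compiled = torch.compile(f, dynamic=True)
    out = compiled(torch.randn(4, 8), torch.randn(8, 4))
```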
cc @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @aakhundov