Register DefaultBackend implementations for functional/inplace structured operators #53037
Conversation
…ured operators

As remarked in #52277 it is easy to give an (inefficient, due to extra redispatches) DefaultBackend implementation of foo and foo_ in terms of foo_out. This patch enables code generation for DefaultBackend in these cases by default for all structured kernels. You can see the payoff in the MSNPU extension: it only has to register a kernel for add.out, and it gets add and add_ kernels automatically.

The actual code changes are very modest:

- When DefaultBackend, call the dispatched (not direct native::) functions to allocate tensors, change device guard, etc.
- Don't call impl() for DefaultBackend (as it doesn't exist); instead, directly generate a call to at::foo_out to do the actual work.
- Do NOT generate a DefaultBackend implementation for foo_out. Actually, there is a case to be made for this being a good idea with more infra; see comments inside.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
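To make the shape of the generated kernels concrete, here is a rough Python model of the scheme the description outlines. The real output of the codegen is C++, and every name here is illustrative only: the functional and inplace variants are thin wrappers that allocate (or reuse) an output and redispatch to the out kernel, which is the only kernel a backend must register.

```python
# Rough Python model of the generated DefaultBackend wrappers; the real
# codegen emits C++, and all names here are illustrative, not actual symbols.

class Tensor:
    def __init__(self, data):
        self.data = list(data)

def add_out(self, other, out):
    # The only kernel a backend has to register for a structured op.
    out.data = [a + b for a, b in zip(self.data, other.data)]
    return out

def add(self, other):
    # DefaultBackend functional variant: allocate an output, then
    # redispatch to the out kernel (the source of the extra redispatch).
    out = Tensor([0] * len(self.data))
    return add_out(self, other, out)

def add_(self, other):
    # DefaultBackend inplace variant: reuse self as the output buffer.
    return add_out(self, other, self)
```

A backend that registers a faster specialized `add` can still do so; this wrapper is only the fallback.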
```python
if self.dispatch_key == DispatchKey.DefaultBackend and f.func.kind() is SchemaKind.out:
    # Never generate a default implementation for out, that's what you
    # have to define as a backend implementor
    return None
```
Maybe I'm missing something: doesn't the previous assert prevent you from hitting this case? You have an assert near the top of the file to check that an `out` variant native_function that's marked `structured` doesn't have a `DefaultBackend` key registered.
They're asserting different things.
```python
assert self.dispatch_key not in g.out.dispatch, \
    "Do not explicitly specify DefaultBackend dispatch key on structured " \
    "functions, they will be automatically generated for you"
```

This asserts that in the dispatch table (in `native_functions.yaml`) you're not allowed to have a DefaultBackend entry. But even if you DON'T have a DefaultBackend entry, we will still call into codegen (something similar happens for Meta, where you never write Meta in the dispatch table, but Meta always gets generated). We don't iterate over the dispatch dictionary to figure out what to codegen!
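The point can be modeled in a few lines of Python. This is a hypothetical sketch, not the actual codegen (which lives in PyTorch's `tools/codegen` and is more involved): for structured ops, the set of dispatch keys to generate is not just whatever appears in the dispatch dict; alias keys like DefaultBackend and Meta are added unconditionally.

```python
# Hypothetical sketch of the key-selection behavior described above;
# function name and shape are illustrative, not the real codegen API.

def keys_to_generate(dispatch, structured):
    keys = set(dispatch)
    if structured:
        # Generated even though these never appear in native_functions.yaml.
        keys |= {"Meta", "DefaultBackend"}
    return keys

# add.out's dispatch dict has no DefaultBackend entry...
add_dispatch = {"CPU": "add_out", "CUDA": "add_out"}
# ...but for a structured op, codegen emits Meta and DefaultBackend anyway.
generated = keys_to_generate(add_dispatch, structured=True)
```

An unstructured op, by contrast, only gets kernels for the keys it explicitly lists.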
ohhh yep, that's right. Thanks!
```python
# I got in trouble because that meant I needed a DispatchKeySet
# in the wrapper function, which meant I needed a DispatchKeySet
# in the DispatchKeyFunctions declarations, but the defined API
# there does NOT permit a dispatch key set.  I think you can
```
Hmm, I might take a quick look after the PR lands to see what it would take to get the wrapper to take in a DispatchKeySet. If we can get these functions to take that argument, the dispatcher should just plumb the keyset through for free.
I briefly attempted it and I mostly concluded that I really wanted DispatchKeySet to be in the argument list, so that translate could handle it automatically.
```diff
@@ -257,6 +261,9 @@ def gen_class_set_output_body(self, k: SchemaKind) -> str:
         elif self.dispatch_key == DispatchKey.CUDA:
             empty_impl = "at::native::empty_cuda"
             empty_strided_impl = "at::native::empty_strided_cuda"
+        elif self.dispatch_key == DispatchKey.DefaultBackend:
+            empty_impl = "at::empty"
```
Totally agreed that this makes the most sense to unblock native functions with `DefaultBackend` functional/inplace variants for structured kernels.

I was wondering if we have a more long-term view on what backend guarantees we want to make for functional/inplace ops. It seems a little arbitrary that some unary ops have `DefaultBackend` functional/inplace variants, while most other ops give the same backend guarantees to all of their variants (either they're all Math, or they all have backend-specific implementations).

So right now, a custom backend would need to implement their own version of `add`/`add_`/`add_out` to get full `add` functionality, but only `sin_out` to get everything for `sin`. That also has the minor downside of making those variants just a bit slower by requiring an extra trip through the dispatcher.

It doesn't seem like one is obviously better than the other, but do you think we'd ever want to introduce some amount of uniformity in what external backends can expect across ops? The current way of doing that is that backends need to look at each entry of `native_functions.yaml` individually and implement the variants not listed as `Math`/`DefaultBackend`. That's also a perfectly fine state to be in, so I don't think this is super important; mostly just curious.
> Totally agreed that this makes the most sense to unblock native functions with DefaultBackend functional/inplace variants for structured kernels.

This is a misunderstanding of what the patch does. Although there is a test for the DefaultBackend dispatch key here, this code is exercised even if a structured kernel doesn't have any DefaultBackend entry in the dispatch dictionary in native_functions.yaml. An easy way to see this is that add, which is exercised by the tests in this PR, doesn't have DefaultBackend in its dispatch dictionary:
```yaml
- func: add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!)
  structured: True
  structured_inherits: TensorIteratorBase
  dispatch:
    CPU, CUDA: add_out
    SparseCPU: add_out_sparse_cpu
    SparseCUDA: add_out_sparse_cuda
    MkldnnCPU: mkldnn_add_out
```
> So right now, a custom backend would need to implement their own version of add/add_/add_out, to get full add functionality, but only sin_out to get everything for sin.

This is factually incorrect. Both add and sin (after @hameerabbasi's PR) are structured, and in fact in both cases you only need to implement `add_out` and `sin_out` to get full functionality. If you register a specialized implementation you can get it a little faster (from skipped redispatches), but it is not necessary.
It is true that whether or not something is structured is a bit arbitrary from the perspective of what you need to implement. The fix for that is to port more stuff to structured.

Let me know if there is something else you were getting at here that I missed. I think there is generally a question about "what should we expect backends to implement", and in some cases (e.g., XLA) I don't think it is an improvement to only have to implement the out version; they'd rather implement the functional version. But we can only pick one direction to do the defaulting the way the dispatcher is currently set up; we'd need some more fancy gear otherwise.
Thanks for writing that out! Staring at the codegen output from this PR also helped a little. Just to lay it out, what I had in my head was that we had to choose between two options (taking `sin` as an example):

- Register a specialized version of `sin` that skips redispatches, but requires separate implementations for other backends
- Register a backend-agnostic version of `sin`. Backends don't need to implement their own version, but it requires extra dispatches

The structured kernel codegen now just picks both choices, and uses the fact that `DefaultBackend` is an alias key to ensure that "the right thing" happens. If another backend registers its own specialized implementation, the dispatcher will give that implementation precedence. Otherwise, we default to the `DefaultBackend` implementation.
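That precedence rule can be sketched as a toy lookup. This is an assumption-laden model, not the real dispatcher (which resolves keys in C++ with a fixed priority order); the kernel strings are placeholders.

```python
# Toy model of alias-key precedence: a backend-specific registration
# wins over the DefaultBackend (alias) entry; otherwise we fall back.

def resolve_kernel(table, backend):
    # Hypothetical resolution helper; real dispatch is far more involved.
    return table.get(backend, table.get("DefaultBackend"))

sin_table = {"DefaultBackend": "generated sin -> calls sin_out"}
sin_table["XLA"] = "hand-written XLA sin"  # specialized registration

specialized = resolve_kernel(sin_table, "XLA")    # the XLA kernel wins
fallback = resolve_kernel(sin_table, "MSNPU")     # falls back to the alias
```

The point is just that registering both a specialized and a backend-agnostic kernel is safe: the dispatcher picks the more specific one when it exists.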
You probably already had this in mind with this issue, but we can probably do something similar with overloads, potentially also for overloads that don't necessarily call each other for performance reasons. For example, I started a port of `pow`, and it has an overload `pow.Tensor_Scalar(Tensor self, Scalar exponent) -> Tensor` that's implemented directly as a unary op, even though it could technically be written a little less efficiently to call out to the more general Tensor overload. I could imagine doing something similar to here, where we keep the specialized implementation of the overload for perf reasons, but also codegen a slower, backend-agnostic variant of it that calls the more general Tensor overload.
@ezyang @bdhirsh just to check my understanding, I think this PR narrows the BC-breakage window substantially but doesn't entirely close it. (Not that it should, just want to make sure I'm thinking about all the moving parts properly.)
E.g. if we have

1. an unstructured op with dispatch configured a la `sin` in `native_functions.yaml`:

   ```yaml
   - func: foo ...
     dispatch:
       DefaultBackend: foo
   - func: foo_ ...
     dispatch:
       DefaultBackend: foo_
   - func: foo.out ...
     dispatch:
       CPU, CUDA: foo_out
   ```

2. where the `foo` and `foo_` kernels really do work for all backends,
3. and an out-of-tree backend provider relies on this support for `foo` and `foo_`,
4. and the provider doesn't care about, and has not registered a kernel for, `foo.out`,

then before this PR, moving `foo`/`foo_`/`foo.out` to structured kernels would suddenly oblige the backend provider to register kernels for `foo` and `foo_` to keep the same level of support it had previously gotten for free.

After this PR, the generated DefaultBackend support means this is no longer necessary (though still allowed), but IIUC the backend provider will still need to have registered a kernel for `foo.out` for the op to actually run, correct?
@bhosmer I don't know how to square (4) with (3).
> and the provider doesn't care about, and has not registered a kernel for, foo.out

OK, so the provider hasn't registered foo.out. So... how exactly were the DefaultBackend `foo` and `foo_` implemented? It can't be by calling `foo.out`, because if they actually did that then it would be broken when `foo.out` isn't registered. So you're saying that `foo` is implemented in some other way that doesn't use out... to which I say, "Don't do that, and structured kernels will make you not do that."
Right, I'm just confirming that this PR assumes that dependency between unstructured variants, and that it's possible for an unstructured op to break the assumption while still coloring inside the lines, registration-wise. I don't think we should try to handle this case - I have zero ideas for real-world ops that would actually use it - but wanted to confirm my understanding of exactly which configurations this PR guarantees BC for.
Just left a few comments but none that should require code changes. LGTM!
…ured operators (pytorch#53037)

Summary: Pull Request resolved: pytorch#53037

As remarked in pytorch#52277 it is easy to give an (inefficient, due to extra redispatches) DefaultBackend implementation of foo and foo_ in terms of foo_out. This patch enables code generation for DefaultBackend in these cases by default for all structured kernels. You can see the payoff in the MSNPU extension: it only has to register a kernel for add.out, and it gets add and add_ kernels automatically.

The actual code changes are very modest:

- When DefaultBackend, call the dispatched (not direct native::) functions to allocate tensors, change device guard, etc.
- Don't call impl() for DefaultBackend (as it doesn't exist); instead, directly generate a call to at::foo_out to do the actual work.
- Do NOT generate a DefaultBackend implementation for foo_out. Actually, there is a case to be made for this being a good idea with more infra; see comments inside.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D26731225
Pulled By: ezyang
fbshipit-source-id: 939da7cb69f694722ec293e5e42e74a755dd0985
Stack from ghstack:
As remarked in #52277 it is easy to give an (inefficient, due to extra
redispatches) DefaultBackend implementation of foo and foo_ in terms of
foo_out. This patch enables code generation for DefaultBackend in these
cases by default for all structured kernels. You can see the payoff
in MSNPU extension: it only has to register a kernel for add.out, and it
gets add and add_ kernels automatically.
The actual code changes are very modest:

- When DefaultBackend, call the dispatched (not direct native::) functions to allocate tensors, change device guard, etc.
- Don't call impl() for DefaultBackend (as it doesn't exist); instead, directly generate a call to at::foo_out to do the actual work.
- Do NOT generate a DefaultBackend implementation for foo_out. Actually, there is a case to be made for this being a good idea with more infra; see comments inside.
Signed-off-by: Edward Z. Yang ezyang@fb.com
Differential Revision: D26731225