
ATen registrable operator list #38974

Open
fengyuan14 opened this issue May 25, 2020 · 12 comments
Labels
module: internals Related to internal abstractions in c10 and ATen triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@fengyuan14
Collaborator

fengyuan14 commented May 25, 2020

🚀 Feature

Expose ATen registrable operator information through the ATen operator API

Motivation

When implementing a new out-of-tree ATen backend extension for PyTorch, we find it is not easy to determine which ATen operators should be registered and implemented. Extension developers expect to work only against the PyTorch ATen operator API, but in practice we have to distinguish operators in the two cases below, which depend on PyTorch's implementation:

  1. Whether the operator is supported by the sparse backend. If yes, we need to register the operator not only for the dense backend but also for the sparse backend.
  2. Whether the operator's backends are bypassed at runtime. If bypassed, we can ignore the operator; otherwise we need to register and implement it.
    To tell whether an operator supports sparse, we have to check native_functions.yaml.
    To tell whether it is bypassed, we have to check derivatives.yaml and the generated file VariableTypeEverything.cpp.

Pitch

If PyTorch could expose two operator lists as part of the API to show "sparse or not" and "bypass or not", the case-by-case analysis of ATen ops could be avoided when registering ops for a new device. One intuitive idea is to mark them in an existing API, for instance in the detailed ATen op schemas in RegistrationDeclarations.h:

{"schema": "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor", "compound": "false", "bypass": "false", "sparse": "true"}
{"schema": "aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!)", "compound": "false", "bypass": "false", "sparse": "true"}
{"schema": "aten::add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!)", "compound": "false", "bypass": "false", "sparse": "true"}
{"schema": "aten::batch_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps, bool cudnn_enabled) -> Tensor", "compound": "false", "bypass": “true", "sparse": “false"}

@ezyang ezyang added module: internals Related to internal abstractions in c10 and ATen triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels May 26, 2020
@ezyang
Contributor

ezyang commented May 26, 2020

It would be a relatively simple matter to update the codegen/dispatcher to also print out when an operator gets or doesn't get a "bypass" (we call these composite ops internally) so that you know you don't have to implement them. However, I'd prefer not to commit to a specific machine-consumable format for this output, since how composites work in PyTorch is something we've been actively looking into changing. If you had a spreadsheet that listed everything you needed to change, similar to https://docs.google.com/spreadsheets/d/1dROnZUuR81PAvFLhvAKz23G0ZVK3pSMsBy7XDPSiQjA/edit#gid=1262884632 , would that be sufficient?

By the way, @ailzhang has pointed out to us as part of the XLA work that it is sometimes useful to be able to override composite operators, because the backend may have a fused version. In these situations you also have to write a custom backward for the operator. It would be good if the API made it easy for you to do the right thing here (it doesn't, right now).

@fengyuan14
Collaborator Author

Thanks for your explanation, @ezyang; let me check the spreadsheet first. It seems I still have no access, though I received your invitation. Please approve my request on Google Docs.

I understand your consideration regarding composites. BTW, based on PyTorch v1.5, we find some non-composite ops are configured as non-bypass ops, like tanh, clamp, argmax, argmin... We also find some non-composite ops are still bypassed, like __and__.Scalar, __and__.Tensor, __iand__.Scalar... So do you have a plan to make all non-composite ops not bypassed gradually?

If all non-composite ops were not bypassed, it would be a big help in reducing our effort of studying ops case by case in PyTorch. Please refer to what we did when rebasing from PyTorch v1.4 to v1.5:

| schema | dispatch strategy change on PyTorch 1.5 | reaction |
| --- | --- | --- |
| `__and__.Scalar` | non-bypass -> bypass | remove registration and implementation |
| `__and__.Tensor` | non-bypass -> bypass | remove registration and implementation |
| `__iand__.Scalar` | non-bypass -> bypass | remove registration and implementation |
| `__iand__.Tensor` | non-bypass -> bypass | remove registration and implementation |
| `__or__.Scalar` | non-bypass -> bypass | remove registration and implementation |
| `__or__.Tensor` | non-bypass -> bypass | remove registration and implementation |
| `__ior__.Scalar` | non-bypass -> bypass | remove registration and implementation |
| `__ior__.Tensor` | non-bypass -> bypass | remove registration and implementation |
| `tanh` | bypass -> non-bypass | register and implement op |
| `clamp` | bypass -> non-bypass | register and implement op |
| `argmax` | bypass -> non-bypass | register and implement op |
| `argmin` | bypass -> non-bypass | register and implement op |

@ezyang
Contributor

ezyang commented May 27, 2020

> So do you have a plan to make all non-composite ops not bypassed gradually?

I am not sure I understand the question. A non-composite op doesn't bypass, period. If you look above at operations where things have swapped between bypass/non-bypass, this is because we have changed internal implementation details. For example, the argmin/argmax change is from #32961 and is just us fixing a bug in the codebase. The clamp change is from #37646, where we migrated something from TH to ATen, which caused it to stop bypassing (so you can stop implementing _th_clamp). I'm not sure what happened with the logical operators, but it was probably some other refactor.

As you can see, "what is composite" is an implementation detail, and we often change these under the assumption that it is not BC-breaking (which is true for regular users, but not for you). To stabilize things for extensions, we would have to think about what it means to actually provide a stable API here. If I have a previously composite op and I write a fused implementation for it, does that mean an extension has to now also fuse it? As @ailzhang has pointed out in private correspondence, absolutely not! We should continue to maintain the composite implementation so that it is optional for backends to implement it.

These are all good ideas we should do. But we are not really ready for it right now, which is why the experience has been bad.

@ailzhang
Contributor

@arthuryuan1987 To add a little more context, the compound field in RegistrationDeclarations.h is what you should look at to decide whether an op goes through DeviceDispatch or TypeDefault dispatch (note it's not bypassed, but dispatched to different places).
I wonder what bypass means in your context (and why is it different from compound)?
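
For illustration, a minimal sketch of extracting the non-compound ops from RegistrationDeclarations.h, assuming each declaration line ends with a trailing `// {...}` JSON comment carrying the schema and compound fields (the exact layout of the generated file may differ between PyTorch versions):

```python
import json
import re

def ops_needing_device_kernels(path="RegistrationDeclarations.h"):
    schemas = []
    with open(path) as f:
        for line in f:
            # Pull the JSON blob out of the trailing comment, if any.
            m = re.search(r"//\s*(\{.*\})\s*$", line)
            if not m:
                continue
            meta = json.loads(m.group(1))
            # compound == "true" goes through TypeDefault; only
            # non-compound ops need a backend (device dispatch) kernel.
            if meta.get("compound", "").lower() == "false":
                schemas.append(meta["schema"])
    return schemas
```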

@fengyuan14
Collaborator Author

By bypass, I mean at::xxx is invoked in VariableType::xxx rather than in TypeDefault::xxx. So, in my understanding, bypass means that DeviceDispatch is bypassed (bypassing the device, or backend, from our point of view).

And per your explanation, TypeDefault is another kind of dispatch, so no ops are bypassed.

compound looks good to us, if it can decide whether an op goes through DeviceDispatch or not.

Thanks for your explanation, @ezyang, @ailzhang .

BTW, we also need to distinguish "dense" and "sparse" to register the corresponding backends, DenseBackend and SparseBackend. Do you have any suggestions on that?

@ailzhang
Contributor

@arthuryuan1987 Cool, glad to hear that compound is also useful in your case. Btw if you build pytorch/xla from source, you'll see how XLA uses this file and registers ops to device dispatch (it's codegened and not well documented...).
W.r.t. sparse/dense, I feel https://github.com/pytorch/pytorch/blob/master/c10/core/DispatchKey.h#L97-L98 might be helpful: you can register to the sparse dispatch keys. But since XLA doesn't support sparse, I haven't tried it myself. @ezyang might know more about it.

@ezyang
Contributor

ezyang commented May 28, 2020

Yeah I think sparse is kind of orthogonal. Some ops support sparse. You need to know which ones they are and register them. TBH, native_functions.yaml works pretty well for this case.
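
For illustration, a minimal sketch of pulling the sparse-capable ops out of native_functions.yaml, assuming the layout used around PyTorch 1.5, where each entry has a func schema and an optional dispatch mapping keyed by backend name:

```python
import yaml  # PyYAML

def sparse_capable_ops(path="aten/src/ATen/native/native_functions.yaml"):
    with open(path) as f:
        entries = yaml.safe_load(f)
    ops = []
    for entry in entries:
        dispatch = entry.get("dispatch")
        # Keep ops that declare a kernel for any Sparse* backend
        # (e.g. SparseCPU, SparseCUDA).
        if isinstance(dispatch, dict) and any("Sparse" in k for k in dispatch):
            ops.append(entry["func"].split("(")[0])  # e.g. "add.Tensor"
    return ops
```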

@fengyuan14
Collaborator Author

Thanks, @ailzhang, we will also add a compound check in our codegen.

Thanks, @ezyang. For sure, native_functions.yaml is clear enough for the sparse case, but it is not part of the API, is it? Our practice (building from source) is to generate code based on the header files (the API) in the PyTorch installation directory. So we would expect sparse to be another annotation in RegistrationDeclarations.h, letting us generate code automatically for the ops supported by the sparse backend instead of manually checking native_functions.yaml in the PyTorch source directory. Do you have any ideas?

@ezyang
Contributor

ezyang commented Jun 1, 2020

RegistrationDeclarations.h is a hyperspecialized API that was built just for XLA, which may at least partially explain why it doesn't say anything about sparse. I agree that in general, which ops support sparse is part of the public API, as far as backend extensions are concerned.

@dzhulgakov
Collaborator

Btw, just pointing out a relevant link on how to dump all schemas without relying on codegen: https://github.com/pytorch/pytorch/blob/master/test/backward_compatibility/dump_all_function_schemas.py (should be easily extendable with other information like the presence of "sparse" dispatch).

This is more about technical details; we still need to agree on what part of this information is the actual API contract.
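
For illustration, a minimal sketch in the spirit of dump_all_function_schemas.py, enumerating registered schemas at runtime instead of relying on codegen output (note that _jit_get_all_schemas is an internal, unstable API):

```python
import torch

schemas = torch._C._jit_get_all_schemas()
aten_schemas = [s for s in schemas if s.name.startswith("aten::")]
for schema in sorted(aten_schemas, key=lambda s: s.name):
    # Each FunctionSchema prints like "aten::add.Tensor(...) -> Tensor";
    # extra information (e.g. sparse dispatch) would have to be joined
    # in from another source such as native_functions.yaml.
    print(schema)
```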

@ezyang
Contributor

ezyang commented Jun 3, 2020

@pbelevich has reminded me that there is a subtle nuance to what I described above:

> It would be a relatively simple matter to update the codegen/dispatcher to also print out when an operator gets or doesn't get a "bypass" (we call these composite ops internally) so that you know you don't have to implement them.

Composite != "doesn't have a dispatch entry in native_functions.yaml". If you have a dispatch entry, it is DEFINITELY not composite, but the converse is not true. In particular, some operators have no dispatch entry in native_functions.yaml, but they have an entry in derivatives.yaml, and that is also sufficient to make them non-composite. They won't get bypassed and you can still override them.
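
For illustration, a minimal sketch of the rule above, assuming the yaml layouts of that era: an op counts as non-composite if it either has a per-backend dispatch entry in native_functions.yaml or has its own entry in derivatives.yaml.

```python
import yaml  # PyYAML

def non_composite_ops(native_yaml, derivatives_yaml):
    ops = set()
    with open(native_yaml) as f:
        for entry in yaml.safe_load(f):
            # A per-backend dispatch mapping makes the op non-composite.
            if isinstance(entry.get("dispatch"), dict):
                ops.add(entry["func"].split("(")[0])
    with open(derivatives_yaml) as f:
        for entry in yaml.safe_load(f):
            # derivatives.yaml keys the forward schema under "name";
            # having a derivative entry also makes the op non-composite.
            ops.add(entry["name"].split("(")[0])
    return ops
```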

@ailzhang
Contributor

ailzhang commented Jun 3, 2020

@ezyang Yup, compound in RegistrationDeclarations.h was created for that purpose: compound=false means we can override the op, while it goes through TypeDefault when compound=true.
