
ATen registrable operator list #38974

Open
fengyuan14 opened this issue May 25, 2020 · 12 comments
Labels
module: internals Related to internal abstractions in c10 and ATen triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@fengyuan14
Collaborator

fengyuan14 commented May 25, 2020

🚀 Feature

Expose ATen registrable operator information through the ATen operator API

Motivation

When implementing a new out-of-tree ATen backend extension for PyTorch, we find it is not easy to determine which ATen operators should be registered and implemented. Extension developers expect to work only against the PyTorch ATen operator API, but in practice we have to distinguish operators in the two cases below, which depend on PyTorch's implementation:

  1. Whether the operator is supported by the sparse backend. If yes, we need to register the operator not only for the dense backend but also for the sparse backend.
  2. Whether the operator's backends are bypassed at runtime. If bypassed, we can ignore the operator; otherwise we need to register and implement it.
    To tell whether an operator supports sparse, we have to check native_functions.yaml.
    To tell whether it is bypassed, we have to check derivatives.yaml and the generated file VariableTypeEverything.cpp.

Pitch

If PyTorch could expose two operator lists as part of the API to show "sparse or not" and "bypass or not", the case-by-case analysis of ATen ops could be avoided when registering ops for a new device. One intuitive idea is to mark them in an existing API, for instance in the detailed ATen op schemas in RegistrationDeclarations.h:

{"schema": "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor", "compound": "false", "bypass": "false", "sparse": "true"}
{"schema": "aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!)", "compound": "false", "bypass": "false", "sparse": "true"}
{"schema": "aten::add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!)", "compound": "false", "bypass": "false", "sparse": "true"}
{"schema": "aten::batch_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps, bool cudnn_enabled) -> Tensor", "compound": "false", "bypass": “true", "sparse": “false"}

@ezyang ezyang added module: internals Related to internal abstractions in c10 and ATen triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels May 26, 2020
@ezyang
Contributor

ezyang commented May 26, 2020

It would be a relatively simple matter to update the codegen/dispatcher to also print out when an operator gets or doesn't get a "bypass" (we call these composite ops internally) so that you know you don't have to implement them. However, I'd prefer not to commit to a specific machine-consumable format for this output, since how composites work in PyTorch is something we've been actively looking into changing. If you had a spreadsheet that listed everything you needed to change, similar to https://docs.google.com/spreadsheets/d/1dROnZUuR81PAvFLhvAKz23G0ZVK3pSMsBy7XDPSiQjA/edit#gid=1262884632 , would that be sufficient?

By the way, @ailzhang has pointed out to us as part of the XLA work that it is sometimes useful to be able to override composite operators, because the backend may have a fused version. In these situations you also have to write a custom backward for the operator. It would be good if the API made it easy for you to do the right thing here (it doesn't, right now).

@fengyuan14
Collaborator Author

Thanks for your explanation, @ezyang; let me check the spreadsheet first. It seems I still have no access, though I received your invitation. Please approve my request on Google Docs.

I understand your consideration regarding composites. BTW, based on PyTorch v1.5, we find some non-composite ops are configured as non-bypass ops, like tanh, clamp, argmax, argmin... We also find some non-composite ops are still bypassed, like __and__.Scalar, __and__.Tensor, __iand__.Scalar... So do you have a plan to make all non-composite ops not bypassed gradually?

If all non-composite ops were not bypassed, it would be a big help in reducing our effort of studying ops case by case in PyTorch. Please refer to what we did when rebasing from PyTorch v1.4 to v1.5:

| schema | dispatch strategy change on PyTorch 1.5 | reaction |
| --- | --- | --- |
| `__and__.Scalar` | non-bypass -> bypass | remove registration and implementation |
| `__and__.Tensor` | non-bypass -> bypass | remove registration and implementation |
| `__iand__.Scalar` | non-bypass -> bypass | remove registration and implementation |
| `__iand__.Tensor` | non-bypass -> bypass | remove registration and implementation |
| `__or__.Scalar` | non-bypass -> bypass | remove registration and implementation |
| `__or__.Tensor` | non-bypass -> bypass | remove registration and implementation |
| `__ior__.Scalar` | non-bypass -> bypass | remove registration and implementation |
| `__ior__.Tensor` | non-bypass -> bypass | remove registration and implementation |
| `tanh` | bypass -> non-bypass | register and implement op |
| `clamp` | bypass -> non-bypass | register and implement op |
| `argmax` | bypass -> non-bypass | register and implement op |
| `argmin` | bypass -> non-bypass | register and implement op |

@ezyang
Contributor

ezyang commented May 27, 2020

> So do you have a plan to make all non-composite ops not bypassed gradually?

I am not sure I understand the question. A non-composite op doesn't bypass, period. If you look above at operations where things have swapped between bypass/non-bypass, this is because we have changed internal implementation details. For example, the argmin/argmax change is from #32961 and is just us fixing a bug in the codebase. The clamp change is from #37646, where we migrated something from TH to ATen, which caused it to stop bypassing (so you can stop implementing _th_clamp). I'm not sure what happened with the logical operators, but it was probably some other refactor.

As you can see, "what is composite" is an implementation detail, and we often change these under the assumption that it is not BC-breaking (which is true for regular users, but not for you). To stabilize things for extensions, we would have to think about what it means to actually provide a stable API here. If I have a previously composite op and I write a fused implementation for it, does that mean an extension has to now also fuse it? As @ailzhang has pointed out in private correspondence, absolutely not! We should continue to maintain the composite implementation so that it is optional for backends to implement it.

These are all good ideas we should do. But we are not really ready for it right now, which is why the experience has been bad.

@ailzhang
Contributor

@arthuryuan1987 To add a little more context, the compound field in RegistrationDeclarations.h is what you should look at to decide whether an op goes through DeviceDispatch or TypeDefault dispatch (note it's not bypassed, but dispatched to different places).
I wonder what bypass means in your context (and why is it different from compound)?
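
For illustration, a minimal sketch of extracting the non-compound ops from RegistrationDeclarations.h, assuming each declaration line ends with a trailing `// {...}` JSON comment carrying the schema and compound fields (the exact layout of the generated file may differ between PyTorch versions):

```python
import json
import re

def ops_needing_device_kernels(path="RegistrationDeclarations.h"):
    schemas = []
    with open(path) as f:
        for line in f:
            # Pull the JSON blob out of the trailing comment, if any.
            m = re.search(r"//\s*(\{.*\})\s*$", line)
            if not m:
                continue
            meta = json.loads(m.group(1))
            # compound == "true" goes through TypeDefault; only
            # non-compound ops need a backend (device dispatch) kernel.
            if meta.get("compound", "").lower() == "false":
                schemas.append(meta["schema"])
    return schemas
```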

@fengyuan14
Collaborator Author

By bypass, I mean at::xxx is invoked in VariableType::xxx rather than in TypeDefault::xxx. So, in my understanding, bypass means that DeviceDispatch is bypassed (bypassing the device, or backend, from our point of view).

And per your explanation, TypeDefault is another kind of dispatch, so no ops are bypassed.

compound looks good to us, if it can decide whether an op goes through DeviceDispatch or not.

Thanks for your explanation, @ezyang, @ailzhang .

BTW, we also need to distinguish "dense" and "sparse" to register the corresponding backends, DenseBackend and SparseBackend. Do you have any suggestions on that?

@ailzhang
Contributor

@arthuryuan1987 Cool, glad to hear that compound is also useful in your case. Btw if you build pytorch/xla from source, you'll see how XLA uses this file and registers ops to device dispatch (it's codegened and not well documented...).
W.r.t. sparse/dense, I feel https://github.com/pytorch/pytorch/blob/master/c10/core/DispatchKey.h#L97-L98 might be helpful: you can register to the sparse dispatch keys. But since XLA doesn't support sparse, I haven't tried it myself. @ezyang might know more about it.

@ezyang
Contributor

ezyang commented May 28, 2020

Yeah I think sparse is kind of orthogonal. Some ops support sparse. You need to know which ones they are and register them. TBH, native_functions.yaml works pretty well for this case.
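
For illustration, a minimal sketch of pulling the sparse-capable ops out of native_functions.yaml, assuming the layout used around PyTorch 1.5, where each entry has a func schema and an optional dispatch mapping keyed by backend name:

```python
import yaml  # PyYAML

def sparse_capable_ops(path="aten/src/ATen/native/native_functions.yaml"):
    with open(path) as f:
        entries = yaml.safe_load(f)
    ops = []
    for entry in entries:
        dispatch = entry.get("dispatch")
        # Keep ops that declare a kernel for any Sparse* backend
        # (e.g. SparseCPU, SparseCUDA).
        if isinstance(dispatch, dict) and any("Sparse" in k for k in dispatch):
            ops.append(entry["func"].split("(")[0])  # e.g. "add.Tensor"
    return ops
```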

@fengyuan14
Collaborator Author

Thanks, @ailzhang, we will also add a compound check in our codegen.

Thanks, @ezyang. For sure, native_functions.yaml is clear enough for the sparse case, but it is not part of the API, is it? Our practice (building from source) is to generate code based on the header files (the API) in the PyTorch installation directory. So we would expect sparse to be another annotation in RegistrationDeclarations.h, letting us generate code automatically for the ops supported by the sparse backend instead of manually checking native_functions.yaml in the PyTorch source directory. Do you have any ideas?

@ezyang
Contributor

ezyang commented Jun 1, 2020

RegistrationDeclarations.h is a hyperspecialized API that was built just for XLA, which may at least partially explain why it doesn't say anything about sparse. I agree that in general, which ops support sparse is part of the public API, as far as backend extensions are concerned.

@dzhulgakov
Collaborator

Btw, just pointing out a relevant link on how to dump all schemas without relying on codegen: https://github.com/pytorch/pytorch/blob/master/test/backward_compatibility/dump_all_function_schemas.py (should be easily extendable with other information like the presence of "sparse" dispatch).

This is more about technical details; we still need to agree on what part of this information is the actual API contract.
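
For illustration, a minimal sketch in the spirit of dump_all_function_schemas.py, enumerating registered schemas at runtime instead of relying on codegen output (note that _jit_get_all_schemas is an internal, unstable API):

```python
import torch

schemas = torch._C._jit_get_all_schemas()
aten_schemas = [s for s in schemas if s.name.startswith("aten::")]
for schema in sorted(aten_schemas, key=lambda s: s.name):
    # Each FunctionSchema prints like "aten::add.Tensor(...) -> Tensor";
    # extra information (e.g. sparse dispatch) would have to be joined
    # in from another source such as native_functions.yaml.
    print(schema)
```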

@ezyang
Contributor

ezyang commented Jun 3, 2020

@pbelevich has reminded me that there is a subtle nuance to what I described above:

> It would be a relatively simple matter to update the codegen/dispatcher to also print out when an operator gets or doesn't get a "bypass" (we call these composite ops internally) so that you know you don't have to implement them.

Composite != "doesn't have a dispatch entry in native_functions.yaml". If you have a dispatch entry, it is DEFINITELY not composite, but the converse is not true. In particular, some operators have no dispatch entry in native_functions.yaml, but they have an entry in derivatives.yaml, and that is also sufficient to make them non-composite. They won't get bypassed and you can still override them.
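
For illustration, a minimal sketch of the rule above, assuming the yaml layouts of that era: an op counts as non-composite if it either has a per-backend dispatch entry in native_functions.yaml or has its own entry in derivatives.yaml.

```python
import yaml  # PyYAML

def non_composite_ops(native_yaml, derivatives_yaml):
    ops = set()
    with open(native_yaml) as f:
        for entry in yaml.safe_load(f):
            # A per-backend dispatch mapping makes the op non-composite.
            if isinstance(entry.get("dispatch"), dict):
                ops.add(entry["func"].split("(")[0])
    with open(derivatives_yaml) as f:
        for entry in yaml.safe_load(f):
            # derivatives.yaml keys the forward schema under "name";
            # having a derivative entry also makes the op non-composite.
            ops.add(entry["name"].split("(")[0])
    return ops
```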

@ailzhang
Contributor

ailzhang commented Jun 3, 2020

@ezyang Yup, compound in RegistrationDeclarations.h was created for that purpose: compound=false means we can override the op, while it goes through TypeDefault when compound=true.
