Add overload names to native_functions.yaml #23532
We need this to be able to register them with the c10 dispatcher. The overload names are based on one-letter-per-argument-type. Script used to change native_functions.yaml and derivatives.yaml: P75630718 Differential Revision: [D16553437](https://our.internmc.facebook.com/intern/diff/D16553437/)
I don't like the idea of having overload names when we do not actually have more than one implementation of a method. Very few operators are overloaded. If everything were overloaded, we would just mangle the types automatically as this patch semi-manually does. The intention is for the overload name to be semantically meaningful to distinguish it from other overloads. A string like TTiTTTTTTTiiibfbbiTTb doesn't help with understanding.
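For readers unfamiliar with the scheme under discussion, here is a minimal sketch of what "one letter per argument type" mangling could look like. The letter table and the `mangle` helper are assumptions made for illustration; the actual script (P75630718) is not shown in this thread and may differ.

```python
# Hypothetical letter table; the codes used by the real script are not shown here.
TYPE_LETTERS = {
    "Tensor": "T",
    "int": "i",
    "bool": "b",
    "float": "f",
}


def mangle(arg_types):
    """Build an overload name with one letter per argument type."""
    return "".join(TYPE_LETTERS[t] for t in arg_types)


# A made-up four-argument signature, not a real ATen schema.
print(mangle(["Tensor", "Tensor", "int", "bool"]))  # TTib
```

The result is unique per signature but says nothing about what distinguishes one overload from another, which is the objection raised above.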
We need this to be able to register them with the c10 dispatcher. The overload names are based on one-letter-per-argument-type. Script used to change native_functions.yaml and derivatives.yaml: P75630718 Pull Request resolved: #23532 ghstack-source-id: 87387510 Differential Revision: [D16553437](https://our.internmc.facebook.com/intern/diff/D16553437/)
@zdevito Ok, I changed it to only add overload names for the ops that actually have overloads. That's about 1,400 ops, which is >50%. It's not feasible to find semantic names for all of these, so I'd stick with the generated names for now.
@smessmer - I think the vast majority of ops with more than one overload can be dealt with by diagnosing the _out version of the op. Another big bucket is the Tensor vs Scalar argument type. Also, do we allow a "default" overload without an overload name? P.S. Can you remind me why we wanted to do it? I recall it was about the way of registering the ops and verifying changes in schema. That applies less to native_functions.yaml, and we could probably add the mangling you have directly to the codegen script. But if the names are more meaningful, it's nice to include them here.
I agree about the "out" variant. From looking at it, almost all the overloads come from that. I would also prefer that the suffix include only the arguments that differ across the overloads. I think with that change it should become feasible to give the rest semantically meaningful names.
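One way to read the suggestion above is sketched below: build the suffix only from the arguments that are not shared by every overload. The `differing_suffix` helper and the simplified signatures are hypothetical and only illustrate the idea; this is not the logic the PR's script ended up using.

```python
def differing_suffix(overloads):
    """Map each overload label to a suffix built from its non-shared arguments.

    `overloads` maps a label to a list of (arg_name, arg_type) pairs.
    """
    # Arguments that appear, with the same name and type, in every overload.
    common = set.intersection(*(set(args) for args in overloads.values()))
    return {
        label: "_".join(t for (name, t) in args if (name, t) not in common)
        for label, args in overloads.items()
    }


# Simplified, illustrative signatures for two overloads of one operator.
overloads = {
    "add(Tensor, Tensor)": [("self", "Tensor"), ("other", "Tensor")],
    "add(Tensor, Scalar)": [("self", "Tensor"), ("other", "Scalar")],
}
print(differing_suffix(overloads))
```

For this input the suffixes come out as `Tensor` and `Scalar`, which already read as meaningful overload names.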
Yes, the reason is for verifying schema changes in registration and it is less important for native_functions.yaml. We could generate the mangling in the codegen, but that would mean that we introduce a difference between the function schema in native_functions.yaml and the one used when it's registered in jit. I think that's the wrong direction - we want to unify these schemas, not make them different. What do you mean about diagnosing the out variant? Give them an "inplace" overload name?
We need this to be able to register them with the c10 dispatcher. The overload names are based on one-letter-per-argument-type. Script used to change native_functions.yaml and derivatives.yaml: P75659751 Differential Revision: [D16553437](https://our.internmc.facebook.com/intern/diff/D16553437/)
After taking care of the out variants and simple Tensor/Scalar differences, there are now 357 mangled overloads left.
I think I was expecting that overload names could have semantic meaning for each overload. This is possible for Tensor/Scalar and out variants (btw, "inplace" is the wrong word to use for those: add_ is inplace because it modifies self, whereas the out variants do not modify self). Tagging anything with a mangled name is going to be a problem in the future. We can't change the overload name without breaking BC, but we will want to add additional default arguments. Once we do this, the overload name is not even going to match the mangling scheme. So we either need to:
There are other ways we can catch typos. For instance, if a user registers an op and marks it an overload, we could raise an error if it isn't overloading something loaded from aten.
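The backward-compatibility concern in this comment can be illustrated with a small sketch. The letter table, the `mangle` helper, and the before/after signatures are all made up for the example: the point is only that a mangled name frozen at serialization time stops matching the scheme once a defaulted argument is added.

```python
# Hypothetical letter table, for illustration only.
LETTERS = {"Tensor": "T", "int": "i", "bool": "b"}


def mangle(arg_types):
    """One letter per argument type."""
    return "".join(LETTERS[t] for t in arg_types)


# A made-up operator signature before and after gaining a defaulted argument.
before = ["Tensor", "int"]
after = ["Tensor", "int", "bool"]  # e.g. a new `bool flag=False`

frozen_name = mangle(before)   # the name that was already serialized
regenerated = mangle(after)    # what the scheme would produce today

print(frozen_name, regenerated, frozen_name == regenerated)
```

Keeping `frozen_name` preserves BC but leaves a name that no longer follows the mangling rule; switching to `regenerated` follows the rule but breaks anything that stored the old name.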
We need this to be able to register them with the c10 dispatcher. The overload names are based on one-letter-per-argument-type. Script used to change native_functions.yaml and derivatives.yaml: P75896844 Differential Revision: [D16553437](https://our.internmc.facebook.com/intern/diff/D16553437/)
We need this to be able to register them with the c10 dispatcher. The overload names are based on one-letter-per-argument-type. Script used to change native_functions.yaml and derivatives.yaml: P75630718 Pull Request resolved: #23532 ghstack-source-id: 87458034 Differential Revision: [D16553437](https://our.internmc.facebook.com/intern/diff/D16553437/)
ok new approach: The script now (P75976311):
This seems to produce relatively good semantic names for the overloads. @zdevito @dzhulgakov please take another look.
Why can't we just do the |
@apaszke We need overload names for several things. Error checking is one of them, but not the only one. Another reason is that we don't want mobile to have to resolve overloads, so we need to serialize a model that has its overloads already resolved. I think there have been a few other reasons, but the decision for overload names happened some time ago and I don't remember everything. The functionality for overload names has been part of the system for some time and there are things relying on it, but we hadn't actually added names for the overloads of ATen ops before. This PR now catches up with the truth and adds overload names to existing ATen ops.
This looks a lot better now, thanks! I have a few minor things we should fix that I mentioned inline, but the overload names are now much more descriptive about the differences between the operators.
Why can't we use full signatures to uniquely identify those ops then?
We need this to be able to register them with the c10 dispatcher. The overload names are based on one-letter-per-argument-type. Script used to change native_functions.yaml and derivatives.yaml: P76270106 Differential Revision: [D16553437](https://our.internmc.facebook.com/intern/diff/D16553437/)
This pull request has been merged in 02f794b.
This should not have been merged. native_functions is something our OSS contributors frequently touch, and this modifies the behavior without any explanation or documentation that someone not looking at this PR can find. See https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md.
A few other issues:
Updating our understanding of why we are doing the changes is necessary; we can't ever get rid of things if we don't remember why we did them in the first place.
I added documentation: #23844

Backends registering a new kernel for an overload want to use a shorthand syntax that doesn't require them to specify the full schema:
This looks identical to a backend adding a new overload:
The only difference being that in one case the overload was already registered before, in the other it wasn't. Without overload names, there would be no way for us to know if they intended to add a new overload or intended to add a new kernel to an existing one, so we would have to hope the behavior is correct and couldn't error out. Marking overloads like @zdevito proposed above with an API like this:
or
would be able to differentiate between these, but would make the API harder to use. It also doesn't actually work, because we don't know which static initializers are going to run first: C++ doesn't guarantee that the registration adding the overload runs before the registrations adding kernels for it.

Also, as mentioned, the decision to go with overload names was made when the c10 dispatcher was designed some months ago, in design discussions with @zdevito, @dzhulgakov and many other people. Changing that now would require changing how the c10 dispatcher works. Let's avoid boiling the ocean.

Error checking is implemented in the c10 dispatcher. I'm working on a stack of PRs that registers all ATen ops with c10, and c10 will balk if there's an operator with non-unique overload names. This is actually the reason why this PR came now, long after we decided to go with overload names in c10: before, these ops weren't in c10, and soon they're going to be.

Operators that don't have overloads don't need overload names. Nobody is expected to run my script. If, between now and the time I am able to add the c10 registration, people manage to add overloads and forget about overload names, I will fix them.
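The uniqueness check described in this comment can be sketched with a toy registry keyed on the pair (operator name, overload name). `ToyRegistry` and the schema strings are hypothetical and are not the c10 API; the sketch only shows why a repeated overload name with a different schema can be rejected as an error.

```python
class ToyRegistry:
    """Toy stand-in for a dispatcher table keyed on (name, overload_name)."""

    def __init__(self):
        self._schemas = {}

    def register(self, name, overload_name, schema):
        key = (name, overload_name)
        existing = self._schemas.get(key)
        if existing is not None and existing != schema:
            # Same operator and overload name, but a different schema.
            raise ValueError(
                f"{name}.{overload_name} already registered as {existing!r}"
            )
        self._schemas[key] = schema


registry = ToyRegistry()
# Made-up schema strings for an illustrative operator "my::op".
registry.register("my::op", "Tensor", "my::op(Tensor a) -> Tensor")
registry.register("my::op", "Scalar", "my::op(Scalar a) -> Tensor")

try:
    registry.register("my::op", "Tensor", "my::op(Tensor a, int b) -> Tensor")
except ValueError as err:
    print("rejected:", err)
```

Registering the same schema again under the same key is accepted here, which mirrors a backend adding a kernel to an existing overload; only a conflicting schema is treated as a mistake.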
Differential Revision: [D16629907](https://our.internmc.facebook.com/intern/diff/D16629907/) Extending changes from #23532
If you registered both functions and only said that they're an implementation of "my::op", then how would the dispatch system pick one over the other? This seems highly ambiguous to me and I don't understand the example.
This doesn't seem like a very strong argument. In particular, you don't have to do the checking immediately when the overloads are registered; you can put it off until a later time (e.g. after all stdlib operators are loaded or when an op is used for the first time).
We should address the documentation concerns for having overload names, but generally I am mildly in favor of having semantically named overloads. I think error checking with overload names is really a secondary justification for why having overload names is good. The primary reason for having them is so that once we have resolved the overload (either dynamically in python, or statically in TorchScript), we can provide a unique name for the result. This makes a few things easier:
That said, in both cases, we can instead refer to the full schema, with arguments, to unambiguously name overloads. This seems more fragile to me but not in ways I completely understand yet.
Summary: Pull Request resolved: pytorch#23748 This extends the changes from pytorch#23532 ghstack-source-id: 88157704 Differential Revision: D16629907 fbshipit-source-id: ffcf937ec34a798a971e7d28ad85afb3b646d1fe
Summary: Pull Request resolved: pytorch/pytorch#23748 This extends the changes from pytorch/pytorch#23532 ghstack-source-id: 88157704 Differential Revision: D16629907 fbshipit-source-id: ffcf937ec34a798a971e7d28ad85afb3b646d1fe
This is a manual form of type mangling with zero restrictions on symbol choice. It doesn't seem maintainable long term.

Alternate solutions

There might be some less invasive ways to solve some of the problems I found mentioned in this conversation:
totally specifies the schema if implemented to do so.
This seems like it would be well solved by a mangling scheme, which would be maintained in the same way the variant type IValue is maintained. This would elide the need for introducing changes to schemas and manually writing custom tags for only certain ops.
I'm honestly not clear on what the issue is here. It seems like there may be perf concerns doing these lookups? I think we should probably benchmark that and, if worst comes to worst, do something like interning the schema strings to speed things up.

Other issues

With respect to this actual change, I think there are some issues that might arise long term.
Stack from ghstack:
We need this to be able to register them with the c10 dispatcher.
The overload names are based on one-letter-per-argument-type.
Script used to change native_functions.yaml and derivatives.yaml: https://gist.github.com/dzhulgakov/e64b03ed38c7b530c65992a8318e7332
Differential Revision: D16553437