Dispatch-less structured wrapper / composite / alias kernels #50953
Comments
Getting it to C++ would be easy, but I'm not sure that's a better solution. You'd have to specify the interface they'd have to be using. An example could be to ask them to construct a struct that contains references to all the arguments and has a method for each target argument:

```cpp
struct UpsampleNearest1d {
  UpsampleNearest1d(const Tensor& input, const c10::optional<IntArrayRef>& output_size, c10::optional<ArrayRef<double>> scale_factors) {...}
  std::vector<int64_t> output_size() const {
    return compute_output_size(input_.sizes(), output_size_, scale_factors_);
  }
  c10::optional<double> scales() const {
    return get_scale_value(scale_factors_, 0);
  }
 private:
  const Tensor& input_;
  const c10::optional<IntArrayRef>& output_size_;  // reference makes sure we don't copy it; likely unnecessary for ArrayRef, but nice for other types
  c10::optional<ArrayRef<double>> scale_factors_;
};
```
@smessmer yeah, I'm trying hard not to put the arguments into the struct itself; we know the compiler is bad at optimizing this case (you're forcing it to actually allocate and construct memory for the arguments).
A simpler case of this is alias kernels, which don't require any C++ code.
Another interesting situation is pow_Scalar (maybe not truly related to this issue, though).
I was thinking about this issue, and wondered: instead of device namespaces, why not make them into structs? Then we could easily replace dispatches with automatic code generation via templates. For example, consider the following kernel. Correct me if I'm missing anything, but the problem is the redispatch in the last line:

```cpp
struct cpu {
  static Tensor upsample_nearest1d(const Tensor& self, c10::optional<IntArrayRef> output_size, c10::optional<ArrayRef<double>> scale_factors);
};
struct cuda {
  static Tensor upsample_nearest1d(const Tensor& self, c10::optional<IntArrayRef> output_size, c10::optional<ArrayRef<double>> scale_factors);
};
```

Then, templates could help us solve this problem by making:

```cpp
template <class DEV>
Tensor upsample_nearest1d(const Tensor& self, c10::optional<IntArrayRef> output_size, c10::optional<ArrayRef<double>> scale_factors) {
  auto osize = compute_output_size(self.sizes(), output_size, scale_factors);
  auto scale_w = get_scale_value(scale_factors, 0);
  return DEV::upsample_nearest1d(self, osize, scale_w);
}
```

As a plus, that would preserve the way of explicitly calling kernels. What do you think?
This might be a case of "perfect being the enemy of good", but I'm kind of not that keen on a template based approach because (1) templates suck (e.g., you get no typechecking until the template gets instantiated) and (2) there are a bunch of auxiliary issues (such as overloading the meaning of integers, running this code in the Python interpreter, handling out and functional simultaneously) that can't be easily solved with templates. I agree that a template style approach would be relatively easy to implement and would solve the immediate problem.
It's worth elaborating on the auxiliary issues. A good start is looking at this post https://dev-discuss.pytorch.org/t/where-we-are-headed-and-why-it-looks-a-lot-like-julia-but-not-exactly-like-julia/276 which lays out at a high level some of the challenges we're facing. If my constraint were ONLY that I wanted to avoid extra dispatch, I think templates would probably be the right way to go (indeed, it's basically the only way to do it). But I also (eventually) want to be able to trace through the code in this setting symbolically, overloading the meaning of all types (not just Tensor, which is the only type we can overload in C++). What I kind of want, but haven't convinced myself is the right thing to do yet, is to build some sort of mini-DSL (Python-like, ofc) for writing composite kernels which can compile to C++, but can also be directly run by the Python interpreter. I don't exactly know how it should work in the terminal state, but I know that at least for small examples it should be feasible.
Based on discussions with @mruberry, we're nuking the Python mini-DSL, so I think templates are the way to go now. |
OK, so here's a proposal for how to do this. It's actually a pair of proposals: one that is simple but a bit boilerplatey, and another that takes more advantage of structured kernels. **Non-structured composites.** We'll start off with @ysiraichi's proposed template syntax for writing these composites. These will go in headers like
Yukio originally suggested that we put the operator namespaces into structs so we can pass them directly. I don't want to do this, because @peterbell10 has been doing a lot of good work reducing the amount of recompilation we have to do when unrelated function signatures change, and passing a record with ALL of the operators would go against this goal. But remember, we're in codegen world: we can just generate a struct on the fly with exactly the functions we need, and trust the inliner to remove the indirections. This implies we must explicitly list the operators the composite depends on in native_functions.yaml:
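The YAML listing from the original comment wasn't preserved here; as a hypothetical sketch (the `composite` key follows the PR later attached to this issue, and the operator name is illustrative), an entry might look like:

```yaml
- func: upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> Tensor
  composite:
    - upsample_nearest1d
```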
This will give us the following structs (in separate files) which we can now instantiate the template with:
Each instantiation gets registered to the CPU/CUDA/Meta/CompositeImplicitAutograd keys respectively. The operators are generated using CppSignatureGroup so they exactly match the corresponding at::cpu/at::cuda/at::meta APIs. That's it. @ysiraichi, are you interested in implementing any of this? (structured in next comment)
**Structured composites.** A big downside of the formulation above is you have to write functional/inplace/out composites. So I was wondering what it might look like to mash up this feature with structured kernels. Here are some things that aren't quite right: define only the functional composite or the out composite, and try to derive the other from it.
One idea is to pass in separate structs for the non-tail and tail positions. Sub would look something like this:
OUT is no longer a struct of static methods; it is an actual object which we will use to pass out the return result if we are functional. Now we have two variants of OUT for functional and out:
The generated inplace may not be optimal; if there are multiple ops involved, it might have been better to do multiple inplace operations, but we don't consider this for now. One downside to this proposal is that the resulting kernels are not "really" structured kernels; e.g., if you want to write a traditional structured kernel, you still need to write a TORCH_META_FUNC; we cannot derive it from the composite. (Why not? Well, we could write a basic one, but meta funcs can also compute intermediate values and set up auxiliary structures like TensorIterator, and it's not clear whether you would have wanted those for your traditional structured kernel.) Another possibility for structured kernels is to make it possible to directly call TORCH_IMPL_FUNCs. This can be "simulated" with DispatchStub, which is how I implemented sub in terms of add in #65851
Instead of requiring a stub, we could instead let a structured kernel inherit from the other structured kernel that it wants to invoke:
requiring you to be intimately familiar with how the target structured kernel is written, but maybe that is not too much to ask. This formulation works best if you only want to call one function; if you want to call multiple (a true composite), we would once again need to template this function over OPS to let you directly call other implementations without routing through the dispatcher.
It seems useful to try to figure out the "multiple ops decomposition" case (if it's not too hard), since there are probably a lot more aten ops that fit that pattern.
I guess this would generalize to multiple ops pretty easily (you tell the yaml up front what all of the ops are in the decomposition, and we fill the struct with all of their dispatcher-less implementations). That could be kind of cumbersome, but one really nice benefit is that it would start to give us a real source of truth for
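For the multi-op case, the YAML could simply list every operator in the decomposition. A hypothetical sketch (the `composite` key follows the PR attached to this issue; the op and its dependencies are illustrative):

```yaml
- func: addcmul(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1) -> Tensor
  composite:
    - mul.Tensor
    - add.Tensor
```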
Definitely!
Fix #50953

This PR introduces changes in the codegen for generating dispatch-less composite kernels. Summarizing, the idea is to make use of templates as the namespace source for tensor operations. Then, we only have to generate a struct (the namespace) with the required operations.

Here's a summary of the main changes in this PR:

- `RegisterDispatchKey.cpp`
  - Add `composite_headers` as a code template variable
- `model.py`
  - Add `composite` as a key in `native_functions.yaml`
    - Set of `OperatorName`, indicating the dependent operations
  - Add `CompositeGraph` type alias
  - Make `BackendIndex.get_kernel` return a new `BackendMetadata` for generated dispatch-less composite kernels
  - Add `BackendIndex.has_registered_kernel` to check whether a given tuple of `NativeFunction` and `NativeFunctionsGroup` is registered to the given dispatch key
    - Shortcut for dealing with both structured and unstructured kernels
  - Add methods for generating the struct and kernel name of the generated dispatch-less composite kernel
    - Needed for returning the `BackendMetadata`
- `gen.py`
  - Build the composite graph with `get_composite_graph`
    - Representation of dependency for dispatch-less composite kernels
    - Mapping of `OperatorName` to a list of tuples (one for each dependent kernel) of `NativeFunction` and `NativeFunctionsGroup`
  - Collect a set of `#include <ATen/native/composite/op.h>` headers with `get_composite_headers`
- `register_dispatch_key.py`
  - Add `composite_graph` as a field of the `RegisterDispatchKey` class
    - Generate the struct for that dispatch key
- `native_functions.py`
  - Skip the generation of the `op_native.h` header for dispatch-less composite kernels
    - They are already defined in their respective `ATen/native/composite/op.h` header

ghstack-source-id: 1bec053b9c4af9c568a8ec4feb42fb8ac10ed925
Pull Request resolved: #77484
…osite kernels." Fix #50953 This PR introduces changes in the codegen for generating dispatch-less composite kernels. Summarizing, the idea is to make use of templates as the namespace source for tensor operations. Then, we only have to generate a struct (the namespace) with the required operations. Here's a summary with the main changes in this PR: - `RegisterDispatchKey.cpp` - Add `composite_headers` as code template variable - `model.py` - Add `composite` as a key in `native_functions.yaml` - Set of `OperatorName`, indicating the dependent operations - Add `CompositeGraph` type alias - Make `BackendIndex.get_kernel` return a new `BackendMetadata` for generated dispatch-less composite kernels - Add `BackendIndex.has_registered_kernel` to check whether a given tuple of `NativeFunction` and `NativeFunctionsGroup` is registered to the given dispatch key - Shortcut for dealing with both structured and unstructured kernels - Add methods for generating the struct and kernel name of the generated dispatch-less composite kernel - Needed for returning the `BackendMetada` - `gen.py` - Build the composite graph with `get_composite_graph` - Representation of dependency for dispatch-less composite kernels - Mapping of `OperatorName` with a list of tuples (one for each dependent kernel) of `NativeFunction` and `NativeFunctionGroup` - Collect a set of `#include <ATen/native/composite/op.h>` headers with `get_composite_headers` - `register_dispatch_key.py` - Add `composite_graph` as a field of `RegisterDispatchKey` class - Generate the struct for that dispatch key - `native_functions.py` - Skip the generation of `op_native.h` header for dispatch-less composite kernels - They are already defined in their respective `ATen/native/composite/op.h` header Differential Revision: [D36934643](https://our.internmc.facebook.com/intern/diff/D36934643) [ghstack-poisoned]
Fix #50953 This PR introduces changes in the codegen for generating dispatch-less composite kernels. Summarizing, the idea is to make use of templates as the namespace source for tensor operations. Then, we only have to generate a struct (the namespace) with the required operations. Here's a summary with the main changes in this PR: - `RegisterDispatchKey.cpp` - Add `composite_headers` as code template variable - `model.py` - Add `composite` as a key in `native_functions.yaml` - Set of `OperatorName`, indicating the dependent operations - Add `CompositeGraph` type alias - Make `BackendIndex.get_kernel` return a new `BackendMetadata` for generated dispatch-less composite kernels - Add `BackendIndex.has_registered_kernel` to check whether a given tuple of `NativeFunction` and `NativeFunctionsGroup` is registered to the given dispatch key - Shortcut for dealing with both structured and unstructured kernels - Add methods for generating the struct and kernel name of the generated dispatch-less composite kernel - Needed for returning the `BackendMetada` - `gen.py` - Build the composite graph with `get_composite_graph` - Representation of dependency for dispatch-less composite kernels - Mapping of `OperatorName` with a list of tuples (one for each dependent kernel) of `NativeFunction` and `NativeFunctionGroup` - Collect a set of `#include <ATen/native/composite/op.h>` headers with `get_composite_headers` - `register_dispatch_key.py` - Add `composite_graph` as a field of `RegisterDispatchKey` class - Generate the struct for that dispatch key - `native_functions.py` - Skip the generation of `op_native.h` header for dispatch-less composite kernels - They are already defined in their respective `ATen/native/composite/op.h` header Differential Revision: [D36934643](https://our.internmc.facebook.com/intern/diff/D36934643) [ghstack-poisoned]
…osite kernels." Fix #50953 This PR introduces changes in the codegen for generating dispatch-less composite kernels. Summarizing, the idea is to make use of templates as the namespace source for tensor operations. Then, we only have to generate a struct (the namespace) with the required operations. Here's a summary with the main changes in this PR: - `RegisterDispatchKey.cpp` - Add `composite_headers` as code template variable - `model.py` - Add `composite` as a key in `native_functions.yaml` - Set of `OperatorName`, indicating the dependent operations - Add `CompositeGraph` type alias - Make `BackendIndex.get_kernel` return a new `BackendMetadata` for generated dispatch-less composite kernels - Add `BackendIndex.has_registered_kernel` to check whether a given tuple of `NativeFunction` and `NativeFunctionsGroup` is registered to the given dispatch key - Shortcut for dealing with both structured and unstructured kernels - Add methods for generating the struct and kernel name of the generated dispatch-less composite kernel - Needed for returning the `BackendMetada` - `gen.py` - Build the composite graph with `get_composite_graph` - Representation of dependency for dispatch-less composite kernels - Mapping of `OperatorName` with a list of tuples (one for each dependent kernel) of `NativeFunction` and `NativeFunctionGroup` - Collect a set of `#include <ATen/native/composite/op.h>` headers with `get_composite_headers` - `register_dispatch_key.py` - Add `composite_graph` as a field of `RegisterDispatchKey` class - Generate the struct for that dispatch key - `native_functions.py` - Skip the generation of `op_native.h` header for dispatch-less composite kernels - They are already defined in their respective `ATen/native/composite/op.h` header Differential Revision: [D36934643](https://our.internmc.facebook.com/intern/diff/D36934643) [ghstack-poisoned]
Fix #50953 This PR introduces changes in the codegen for generating dispatch-less composite kernels. Summarizing, the idea is to make use of templates as the namespace source for tensor operations. Then, we only have to generate a struct (the namespace) with the required operations. Here's a summary with the main changes in this PR: - `RegisterDispatchKey.cpp` - Add `composite_headers` as code template variable - `model.py` - Add `composite` as a key in `native_functions.yaml` - Set of `OperatorName`, indicating the dependent operations - Add `CompositeGraph` type alias - Make `BackendIndex.get_kernel` return a new `BackendMetadata` for generated dispatch-less composite kernels - Add `BackendIndex.has_registered_kernel` to check whether a given tuple of `NativeFunction` and `NativeFunctionsGroup` is registered to the given dispatch key - Shortcut for dealing with both structured and unstructured kernels - Add methods for generating the struct and kernel name of the generated dispatch-less composite kernel - Needed for returning the `BackendMetada` - `gen.py` - Build the composite graph with `get_composite_graph` - Representation of dependency for dispatch-less composite kernels - Mapping of `OperatorName` with a list of tuples (one for each dependent kernel) of `NativeFunction` and `NativeFunctionGroup` - Collect a set of `#include <ATen/native/composite/op.h>` headers with `get_composite_headers` - `register_dispatch_key.py` - Add `composite_graph` as a field of `RegisterDispatchKey` class - Generate the struct for that dispatch key - `native_functions.py` - Skip the generation of `op_native.h` header for dispatch-less composite kernels - They are already defined in their respective `ATen/native/composite/op.h` header Differential Revision: [D36934643](https://our.internmc.facebook.com/intern/diff/D36934643) [ghstack-poisoned]
Fix #50953 This PR introduces changes in the codegen for generating dispatch-less composite kernels. Summarizing, the idea is to make use of templates as the namespace source for tensor operations. Then, we only have to generate a struct (the namespace) with the required operations. Here's a summary with the main changes in this PR: - `RegisterDispatchKey.cpp` - Add `composite_headers` as code template variable - `model.py` - Add `composite` as a key in `native_functions.yaml` - Set of `OperatorName`, indicating the dependent operations - Add `CompositeGraph` type alias - Make `BackendIndex.get_kernel` return a new `BackendMetadata` for generated dispatch-less composite kernels - Add `BackendIndex.has_registered_kernel` to check whether a given tuple of `NativeFunction` and `NativeFunctionsGroup` is registered to the given dispatch key - Shortcut for dealing with both structured and unstructured kernels - Add methods for generating the struct and kernel name of the generated dispatch-less composite kernel - Needed for returning the `BackendMetada` - `gen.py` - Build the composite graph with `get_composite_graph` - Representation of dependency for dispatch-less composite kernels - Mapping of `OperatorName` with a list of tuples (one for each dependent kernel) of `NativeFunction` and `NativeFunctionGroup` - Collect a set of `#include <ATen/native/composite/op.h>` headers with `get_composite_headers` - `register_dispatch_key.py` - Add `composite_graph` as a field of `RegisterDispatchKey` class - Generate the struct for that dispatch key - `native_functions.py` - Skip the generation of `op_native.h` header for dispatch-less composite kernels - They are already defined in their respective `ATen/native/composite/op.h` header ghstack-source-id: 1269877693a62cadadee6812578f404d5f39fc8c Pull Request resolved: #77484
…osite kernels." Fix #50953 This PR introduces changes in the codegen for generating dispatch-less composite kernels. Summarizing, the idea is to make use of templates as the namespace source for tensor operations. Then, we only have to generate a struct (the namespace) with the required operations. Here's a summary with the main changes in this PR: - `RegisterDispatchKey.cpp` - Add `composite_headers` as code template variable - `model.py` - Add `composite` as a key in `native_functions.yaml` - Set of `OperatorName`, indicating the dependent operations - Add `CompositeGraph` type alias - Make `BackendIndex.get_kernel` return a new `BackendMetadata` for generated dispatch-less composite kernels - Add `BackendIndex.has_registered_kernel` to check whether a given tuple of `NativeFunction` and `NativeFunctionsGroup` is registered to the given dispatch key - Shortcut for dealing with both structured and unstructured kernels - Add methods for generating the struct and kernel name of the generated dispatch-less composite kernel - Needed for returning the `BackendMetada` - `gen.py` - Build the composite graph with `get_composite_graph` - Representation of dependency for dispatch-less composite kernels - Mapping of `OperatorName` with a list of tuples (one for each dependent kernel) of `NativeFunction` and `NativeFunctionGroup` - Collect a set of `#include <ATen/native/composite/op.h>` headers with `get_composite_headers` - `register_dispatch_key.py` - Add `composite_graph` as a field of `RegisterDispatchKey` class - Generate the struct for that dispatch key - `native_functions.py` - Skip the generation of `op_native.h` header for dispatch-less composite kernels - They are already defined in their respective `ATen/native/composite/op.h` header Differential Revision: [D36934643](https://our.internmc.facebook.com/intern/diff/D36934643) [ghstack-poisoned]
Fix #50953 This PR introduces changes in the codegen for generating dispatch-less composite kernels. Summarizing, the idea is to make use of templates as the namespace source for tensor operations. Then, we only have to generate a struct (the namespace) with the required operations. Here's a summary with the main changes in this PR: - `RegisterDispatchKey.cpp` - Add `composite_headers` as code template variable - `model.py` - Add `composite` as a key in `native_functions.yaml` - Set of `OperatorName`, indicating the dependent operations - Add `CompositeGraph` type alias - Make `BackendIndex.get_kernel` return a new `BackendMetadata` for generated dispatch-less composite kernels - Add `BackendIndex.has_registered_kernel` to check whether a given tuple of `NativeFunction` and `NativeFunctionsGroup` is registered to the given dispatch key - Shortcut for dealing with both structured and unstructured kernels - Add methods for generating the struct and kernel name of the generated dispatch-less composite kernel - Needed for returning the `BackendMetada` - `gen.py` - Build the composite graph with `get_composite_graph` - Representation of dependency for dispatch-less composite kernels - Mapping of `OperatorName` with a list of tuples (one for each dependent kernel) of `NativeFunction` and `NativeFunctionGroup` - Collect a set of `#include <ATen/native/composite/op.h>` headers with `get_composite_headers` - `register_dispatch_key.py` - Add `composite_graph` as a field of `RegisterDispatchKey` class - Generate the struct for that dispatch key - `native_functions.py` - Skip the generation of `op_native.h` header for dispatch-less composite kernels - They are already defined in their respective `ATen/native/composite/op.h` header Differential Revision: [D36934643](https://our.internmc.facebook.com/intern/diff/D36934643) [ghstack-poisoned]
…osite kernels." Fix #50953 This PR introduces changes in the codegen for generating dispatch-less composite kernels. Summarizing, the idea is to make use of templates as the namespace source for tensor operations. Then, we only have to generate a struct (the namespace) with the required operations. Here's a summary with the main changes in this PR: - `RegisterDispatchKey.cpp` - Add `composite_headers` as code template variable - `model.py` - Add `composite` as a key in `native_functions.yaml` - Set of `OperatorName`, indicating the dependent operations - Add `CompositeGraph` type alias - Make `BackendIndex.get_kernel` return a new `BackendMetadata` for generated dispatch-less composite kernels - Add `BackendIndex.has_registered_kernel` to check whether a given tuple of `NativeFunction` and `NativeFunctionsGroup` is registered to the given dispatch key - Shortcut for dealing with both structured and unstructured kernels - Add methods for generating the struct and kernel name of the generated dispatch-less composite kernel - Needed for returning the `BackendMetada` - `gen.py` - Build the composite graph with `get_composite_graph` - Representation of dependency for dispatch-less composite kernels - Mapping of `OperatorName` with a list of tuples (one for each dependent kernel) of `NativeFunction` and `NativeFunctionGroup` - Collect a set of `#include <ATen/native/composite/op.h>` headers with `get_composite_headers` - `register_dispatch_key.py` - Add `composite_graph` as a field of `RegisterDispatchKey` class - Generate the struct for that dispatch key - `native_functions.py` - Skip the generation of `op_native.h` header for dispatch-less composite kernels - They are already defined in their respective `ATen/native/composite/op.h` header Differential Revision: [D36934643](https://our.internmc.facebook.com/intern/diff/D36934643) [ghstack-poisoned]
Fix #50953 This PR introduces changes in the codegen for generating dispatch-less composite kernels. Summarizing, the idea is to make use of templates as the namespace source for tensor operations. Then, we only have to generate a struct (the namespace) with the required operations. Here's a summary with the main changes in this PR: - `RegisterDispatchKey.cpp` - Add `composite_headers` as code template variable - `model.py` - Add `composite` as a key in `native_functions.yaml` - Set of `OperatorName`, indicating the dependent operations - Add `CompositeGraph` type alias - Make `BackendIndex.get_kernel` return a new `BackendMetadata` for generated dispatch-less composite kernels - Add `BackendIndex.has_registered_kernel` to check whether a given tuple of `NativeFunction` and `NativeFunctionsGroup` is registered to the given dispatch key - Shortcut for dealing with both structured and unstructured kernels - Add methods for generating the struct and kernel name of the generated dispatch-less composite kernel - Needed for returning the `BackendMetada` - `gen.py` - Build the composite graph with `get_composite_graph` - Representation of dependency for dispatch-less composite kernels - Mapping of `OperatorName` with a list of tuples (one for each dependent kernel) of `NativeFunction` and `NativeFunctionGroup` - Collect a set of `#include <ATen/native/composite/op.h>` headers with `get_composite_headers` - `register_dispatch_key.py` - Add `composite_graph` as a field of `RegisterDispatchKey` class - Generate the struct for that dispatch key - `native_functions.py` - Skip the generation of `op_native.h` header for dispatch-less composite kernels - They are already defined in their respective `ATen/native/composite/op.h` header Differential Revision: [D36934643](https://our.internmc.facebook.com/intern/diff/D36934643) [ghstack-poisoned]
Fix #50953 This PR introduces changes in the codegen for generating dispatch-less composite kernels. Summarizing, the idea is to make use of templates as the namespace source for tensor operations. Then, we only have to generate a struct (the namespace) with the required operations. Here's a summary with the main changes in this PR: - `RegisterDispatchKey.cpp` - Add `composite_headers` as code template variable - `model.py` - Add `composite` as a key in `native_functions.yaml` - Set of `OperatorName`, indicating the dependent operations - Add `CompositeGraph` type alias - Make `BackendIndex.get_kernel` return a new `BackendMetadata` for generated dispatch-less composite kernels - Add `BackendIndex.has_registered_kernel` to check whether a given tuple of `NativeFunction` and `NativeFunctionsGroup` is registered to the given dispatch key - Shortcut for dealing with both structured and unstructured kernels - Add methods for generating the struct and kernel name of the generated dispatch-less composite kernel - Needed for returning the `BackendMetada` - `gen.py` - Build the composite graph with `get_composite_graph` - Representation of dependency for dispatch-less composite kernels - Mapping of `OperatorName` with a list of tuples (one for each dependent kernel) of `NativeFunction` and `NativeFunctionGroup` - Collect a set of `#include <ATen/native/composite/op.h>` headers with `get_composite_headers` - `register_dispatch_key.py` - Add `composite_graph` as a field of `RegisterDispatchKey` class - Generate the struct for that dispatch key - `native_functions.py` - Skip the generation of `op_native.h` header for dispatch-less composite kernels - They are already defined in their respective `ATen/native/composite/op.h` header ghstack-source-id: e9f581aa83f8be538cbf42ff1b423ea96ab3ff45 Pull Request resolved: #77484
…osite kernels." Fix #50953 This PR introduces changes in the codegen for generating dispatch-less composite kernels. Summarizing, the idea is to make use of templates as the namespace source for tensor operations. Then, we only have to generate a struct (the namespace) with the required operations. Here's a summary with the main changes in this PR: - `RegisterDispatchKey.cpp` - Add `composite_headers` as code template variable - `model.py` - Add `composite` as a key in `native_functions.yaml` - Set of `OperatorName`, indicating the dependent operations - Add `CompositeGraph` type alias - Make `BackendIndex.get_kernel` return a new `BackendMetadata` for generated dispatch-less composite kernels - Add `BackendIndex.has_registered_kernel` to check whether a given tuple of `NativeFunction` and `NativeFunctionsGroup` is registered to the given dispatch key - Shortcut for dealing with both structured and unstructured kernels - Add methods for generating the struct and kernel name of the generated dispatch-less composite kernel - Needed for returning the `BackendMetada` - `gen.py` - Build the composite graph with `get_composite_graph` - Representation of dependency for dispatch-less composite kernels - Mapping of `OperatorName` with a list of tuples (one for each dependent kernel) of `NativeFunction` and `NativeFunctionGroup` - Collect a set of `#include <ATen/native/composite/op.h>` headers with `get_composite_headers` - `register_dispatch_key.py` - Add `composite_graph` as a field of `RegisterDispatchKey` class - Generate the struct for that dispatch key - `native_functions.py` - Skip the generation of `op_native.h` header for dispatch-less composite kernels - They are already defined in their respective `ATen/native/composite/op.h` header Differential Revision: [D36934643](https://our.internmc.facebook.com/intern/diff/D36934643) [ghstack-poisoned]
@bdhirsh @ezyang
namespace dispatchless {
template <DeviceEnum DEV> // (1)
Tensor add(Tensor self, Tensor other, Scalar alpha) {
return at::add(self, other, alpha); // fallback is the dispatcher call.
}
// (2)
template <>
Tensor add<DeviceEnum::CPU>(Tensor self, Tensor other, Scalar alpha) {
return at::cpu::add(self, other, alpha); // dispatch-less call to the CPU kernel.
}
template <>
Tensor add<DeviceEnum::Meta>(Tensor self, Tensor other, Scalar alpha) {
return at::meta::add(self, other, alpha); // dispatch-less call to the Meta kernel.
}
} // namespace dispatchless
namespace native {
// (3)
template <DeviceEnum DEV>
Tensor add_one(Tensor self) {
return at::dispatchless::add<DEV>(self, at::dispatchless::ones<DEV>(self.sizes(), self.options()), /*alpha=*/1); // (4)
}
// (5)
template Tensor add_one<DeviceEnum::CPU>(Tensor self);
template Tensor add_one<DeviceEnum::CUDA>(Tensor self);
template Tensor add_one<DeviceEnum::Meta>(Tensor self);
} // namespace native
Let me know what you think. Edit: we would also need to tweak the codegen a bit to accept templated functions as native kernels. |
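To make the proposal concrete outside of ATen, here is a minimal standalone sketch of the same template-specialization pattern, with toy stand-ins for the device enum and kernels (all names and return values here are illustrative, not real ATen APIs):

```cpp
#include <cassert>
#include <string>

// Toy stand-in for the proposed DeviceEnum.
enum class DeviceEnum { CPU, CUDA, Meta };

// (1) Generic fallback: in the real proposal this would be the
//     dispatcher call (at::add).
template <DeviceEnum DEV>
std::string add() { return "dispatcher"; }

// (2) Explicit specializations bypass the dispatcher, analogous to
//     at::cpu::add / at::meta::add.
template <>
std::string add<DeviceEnum::CPU>() { return "cpu-kernel"; }

template <>
std::string add<DeviceEnum::Meta>() { return "meta-kernel"; }

// (3) A composite kernel is written once and instantiated per device;
//     the inner call resolves at compile time, with no dispatch.
template <DeviceEnum DEV>
std::string add_one() { return add<DEV>(); }
```

Note that a device with no specialization (CUDA above) silently falls back to the generic, dispatching path, which matches the fallback behavior described in the proposal.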
Side note - sorry for the delay in landing the existing version of the PR. The internal failures should finally be cleaned up. On the suggestion: not having to manually specify which ops are in the decomposition seems better, especially since it doesn't look like `native_functions.yaml` will be the source of truth for decomp info in the long run (lots of our decomps are getting written in Python). The extra template boilerplate seems minimal (especially since we can macro-ify the template instantiations like you mentioned), so I agree this feels like a net improvement. Curious what Ed thinks, though |
If we're pivoting off of the original plan, I'd like to quash this issue and the associated PRs entirely. @ysiraichi, as I've discussed with you, incremental improvement of the C++ implementations is really not aligned with the current direction (which is PrimTorch implementations in Python, with an overhead-reduction backend). I would rather we spend our time and effort on the overhead-reduction backend for dynamo instead. |
Yes! I would argue that those decompositions serve a different purpose and are not to be used by anyone but the codegen (i.e., no one should look at them)
Got it. That idea came up in my dispatch-less kernels presentation. I just thought it was an interesting way to avoid declaring dependent operations in `native_functions.yaml`. I agree with Ed that we should "spend our time and effort on the overhead reduction backend for dynamo instead". So, I would say we could leave this idea as an improvement for dispatch-less kernels if it is needed later; i.e., I will keep focused on the overhead-reduction backend for dynamo and, if necessary, we can come back to this afterwards. |
A common pattern in PyTorch is to have two implementations of a function which have different signatures:
Typically, one of these functions is implemented in terms of the other:
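The code snippets from the original issue did not survive extraction here. As a hedged standalone reconstruction of the pattern (toy types and simplified signatures, not the real ATen declarations): the "vec" overload computes the concrete output size from its optional arguments, then forwards to the real kernel, incurring a second trip through the dispatcher.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy stand-in for IntArrayRef / the tensor's shape.
using Sizes = std::vector<int64_t>;

int dispatch_count = 0; // counts simulated dispatcher hops

// The "real" kernel: takes a fully concrete output_size.
Sizes upsample_nearest1d(const Sizes& /*input*/, const Sizes& output_size) {
    ++dispatch_count;
    return output_size;
}

// The wrapper overload: resolves optional arguments, then redispatches.
Sizes upsample_nearest1d_vec(const Sizes& input,
                             std::optional<Sizes> output_size,
                             std::optional<double> scale_factor) {
    ++dispatch_count;
    Sizes out = output_size
        ? *output_size
        : Sizes{static_cast<int64_t>(input[0] * *scale_factor)};
    return upsample_nearest1d(input, out); // the second dispatch happens here
}
```

Calling the wrapper once therefore pays for two dispatches, which is the inefficiency the issue is about.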
Now, there is a very irritating problem with `upsample_nearest1d` as it is written here, which is that it necessitates two dispatches: once to the wrapper function (shown), and then once again when we call `at::upsample_nearest1d`. Alternately, we could write multiple copies of the wrapper function and bypass the second dispatch (using #49505). But this is irritating, and in the worst-case scenario needs to be done per backend (CPU, CUDA) and per variant (out, functional, inplace). Oof!
What you would like to do, instead, is describe how to transform the (functional) input arguments from the wrapper function to the real function, and then automatically generate all of the variants.
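As a rough standalone sketch of that idea (types and names invented for illustration, not real codegen output): the argument transformation is written exactly once, and each generated per-backend variant reuses it and calls its backend's kernel directly, with no second dispatch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical adapter: describes how the wrapper's arguments map to
// the real kernel's arguments. This is the only hand-written piece.
struct UpsampleArgs {
    std::vector<int64_t> input_sizes;
    double scale;

    // The transformation lives in one place...
    std::vector<int64_t> output_size() const {
        std::vector<int64_t> out;
        for (auto s : input_sizes)
            out.push_back(static_cast<int64_t>(s * scale));
        return out;
    }
};

// ...and each generated per-backend variant applies it before calling
// that backend's real kernel directly (stubbed out here).
std::vector<int64_t> upsample_cpu(const UpsampleArgs& a)  { return a.output_size(); }
std::vector<int64_t> upsample_meta(const UpsampleArgs& a) { return a.output_size(); }
```

The point of the sketch is that the out, functional, and inplace variants for every backend could all be stamped out mechanically from the one adapter, instead of being written by hand.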
It's a little uncertain to me what the parameters of this transformation should be. The easiest way to implement the transformation is to insert C++ code directly into `native_functions.yaml`, and then generate the multiple copies directly based on this code. This would set a new precedent that it is OK to put C++ code inside `native_functions.yaml`. Maybe you do not like it, and would like the conversion code to live in C++. Unfortunately, I'm not too sure how to do this: recall that the class hierarchy looks like:
There is no logical place to interpose an adapter in the class hierarchy here.
cc @ezyang @bhosmer @smessmer @ljk53 @bdhirsh @ailzhang