Make c10 dispatcher use boxed kernel function pointers #16051
Conversation
```cpp
namespace c10 {

// TODO Use folly::Function for perf
using KernelFunction = std::function<IValue(ArrayRef<IValue>)>;
```
Err, I think neither folly::Function nor std::function should be used here. These functions shouldn't be closures, right? They don't have any state or closed-over variables; we pass in state as an explicit argument.
OK, I'm looking at your implementation, and it seems you are making a closure for KernelFunction. But that seems unnecessary to me, because you're closing over a function pointer which is known at compile time when you call kernel(). So it seems to me that it should be made some sort of compile-time template argument. Then you won't have a closure at all and can just take the address of the template-instantiated lambda.
That sounds like a cool idea, thanks. Will try it out in a separate diff. Getting rid of the closure would avoid some of the perf risks here.
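To make the suggestion concrete, here is a minimal, self-contained sketch of the no-closure approach. Every name here is a stand-in for illustration (this is not the actual c10 API): the unboxed kernel becomes a non-type template parameter, so the boxing wrapper is an ordinary stateless function whose address can be stored directly.

```cpp
#include <cstdint>
#include <vector>

struct IValue { int64_t v; };  // stand-in for c10::IValue

// Boxed calling convention: a plain function pointer, no captured state.
using BoxedKernel = IValue (*)(const std::vector<IValue>&);

// An unboxed kernel whose address is known at compile time.
int64_t add_kernel(int64_t a, int64_t b) { return a + b; }

// Instantiated once per kernel; &boxed_wrapper<&add_kernel> is an ordinary
// function pointer, so no closure is ever created.
template <int64_t (*Kernel)(int64_t, int64_t)>
IValue boxed_wrapper(const std::vector<IValue>& args) {
  return IValue{Kernel(args[0].v, args[1].v)};
}

int main() {
  BoxedKernel k = &boxed_wrapper<&add_kernel>;
  return k({IValue{2}, IValue{3}}).v == 5 ? 0 : 1;  // exit code 0 on success
}
```

Because the wrapper is a template instantiation rather than a capturing lambda, the dispatch table can hold raw function pointers instead of std::function objects, which is exactly the perf concern the TODO comment raises.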
```cpp
inline TensorParameterDispatchKey tensor_to_dispatch_key(const at::Tensor& tensor) {
  return TensorParameterDispatchKey{
    to_device_type_id(tensor.impl()->device_type()),
    to_device_type_id(tensor.getIntrusivePtr()->device_type()),
```
Now that you're using at::Tensor, there's no need to go through the TensorImpl to get the device type; just say `tensor.device().type()`.
```cpp
}

template<size_t index, size_t offset, class ParameterTypes, class Enable = void> struct get_ith_tensor_arg_ {
  //static_assert(!std::is_same<ParameterTypes, ParameterTypes>::value, "Index out of bounds");
```
?
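For context on the commented-out assertion (my gloss, not part of the patch): `!std::is_same<T, T>::value` is the usual trick for a static_assert that is always false but depends on a template parameter, so it only fires if the primary template is actually instantiated, i.e. when the index walks past the end of the parameter list. A stripped-down, self-contained sketch of the idiom:

```cpp
#include <cstddef>
#include <type_traits>

// Primary template: only instantiated when no specialization matches.
// The assertion is always false, but because it depends on T, it fires
// only on instantiation, producing the "Index out of bounds" diagnostic.
template <std::size_t index, class T>
struct get_ith_ {
  static_assert(!std::is_same<T, T>::value, "Index out of bounds");
};

// In-bounds case handled by a specialization (illustrative only).
template <class T>
struct get_ith_<0, T> {
  using type = T;
};

static_assert(std::is_same<get_ith_<0, int>::type, int>::value, "in bounds: OK");
// get_ith_<1, int> would trip the static_assert at compile time.
```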
```cpp
Tensor C(C_);
Tensor A{C10Tensor(A_)};
Tensor B{C10Tensor(B_)};
Tensor C{C10Tensor(C_)};
```
I've suggested we should make some free functions to_caffe2_tensor and to_at_tensor to do this conversion. This is a really good reason to do so.
Agreed. Currently, C10Tensor is the only way we have of converting tensors, though, so I'll keep it here. Removing C10Tensor and adding a different way to convert is a separate workstream.
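For illustration only, the suggested free functions might be shaped like this. The sketch uses stand-in types rather than the real at::Tensor / caffe2::Tensor / C10Tensor, and to_at_tensor / to_caffe2_tensor are hypothetical (they don't exist in this PR); the point is the call-site ergonomics of one obvious function per direction instead of spelling out the C10Tensor round-trip everywhere.

```cpp
#include <memory>

struct TensorImpl {};                                     // stand-in shared implementation
struct C10Tensor { std::shared_ptr<TensorImpl> impl; };   // stand-in interchange type
struct AtTensor  { std::shared_ptr<TensorImpl> impl; };   // stand-in for at::Tensor
struct C2Tensor  { std::shared_ptr<TensorImpl> impl; };   // stand-in for caffe2::Tensor

inline AtTensor to_at_tensor(const C2Tensor& t) {
  return AtTensor{C10Tensor{t.impl}.impl};   // round-trip stays internal
}

inline C2Tensor to_caffe2_tensor(const AtTensor& t) {
  return C2Tensor{C10Tensor{t.impl}.impl};
}

int main() {
  C2Tensor c{std::make_shared<TensorImpl>()};
  AtTensor a = to_at_tensor(c);              // replaces Tensor A{C10Tensor(A_)}
  return a.impl == c.impl ? 0 : 1;           // same impl either way: zero-copy
}
```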
```cpp
at::ArrayRef<C10Tensor> inputs,
const C10Tensor& output_,
const C10Tensor& split_,
intrusive_ptr<ivalue::TensorList> inputs,
```
This looks weird to me.

Based on our discussion, we said there were two plausible ways operators could be written:

- Taking a `Stack` as argument, in which case they feed in and out IValues
- The "native" format, which "looks something like how ATen functions are written"

What we have here is some weird bastard format. It isn't Stack. But it isn't native either, because you're taking in an ivalue::TensorList. No one should actually write a kernel this way; they should just take ArrayRef<Tensor>. What's the plan?
Can you remind me where the test for these experimental ops lives?
Good point, I should change this to ArrayRef<Tensor>. There are currently only tests for layer_norm; the other ops are only there to make sure certain use cases still compile (e.g. operators with/without state, operators with/without attributes, operators with/without TensorList as input, single/multiple outputs, ...), but they don't have test cases, so they only check that it's syntactically possible.
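To spell out the two sanctioned formats from the discussion, here is a stand-in sketch; none of these are the real c10/JIT declarations.

```cpp
#include <cstdint>
#include <vector>

struct IValue { int64_t v; };           // stand-in for c10::IValue
using Stack = std::vector<IValue>;      // stand-in for the JIT stack
struct Tensor { int64_t id; };          // stand-in for at::Tensor

// Format 1, boxed: pop inputs off the stack, push outputs back on;
// everything crosses the boundary as IValues.
void sum_boxed(Stack* stack) {
  int64_t total = 0;
  for (const IValue& arg : *stack) total += arg.v;
  stack->clear();
  stack->push_back(IValue{total});
}

// Format 2, native ("like ATen functions are written"): typed arguments,
// with list inputs as ArrayRef<Tensor>, never ivalue::TensorList.
// (std::vector stands in for at::ArrayRef here.)
Tensor sum_native(const std::vector<Tensor>& inputs) {
  int64_t total = 0;
  for (const Tensor& t : inputs) total += t.id;
  return Tensor{total};
}
```

The hunk above mixes the two: typed parameters like `const C10Tensor&` next to a boxed `intrusive_ptr<ivalue::TensorList>`, which is what draws the "bastard format" objection.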
```cpp
};
}
};
template<class... ParamTypes, class FuncType> struct _wrapKernel<void, guts::typelist::typelist<ParamTypes...>, FuncType> {
```
Once you actually make it take a Stack, you won't need two overloads, right?
That's correct.
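A stand-in sketch of why the current return-value convention forces the extra specialization, and why a Stack-based convention wouldn't (hypothetical names, not the real _wrapKernel code):

```cpp
#include <cstdint>
#include <vector>

struct IValue { int64_t v; };
using Stack = std::vector<IValue>;

// Return-value convention: `return IValue{Kernel()}` is ill-formed when
// the kernel returns void, so a second specialization must exist that
// calls Kernel() and returns an empty IValue instead.
template <class Result, Result (*Kernel)()>
IValue wrap() {
  return IValue{Kernel()};  // would not compile for Result = void
}

// Stack convention: outputs are pushed onto the stack and the wrapper
// itself always returns void, so kernels with zero outputs need no
// special case and a single wrapper suffices.
template <void (*Kernel)(Stack*)>
void wrap_stack(Stack* stack) {
  Kernel(stack);  // kernel pushes zero or more outputs; uniform either way
}

int64_t answer() { return 42; }

int main() {
  return wrap<int64_t, &answer>().v == 42 ? 0 : 1;
}
```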
Approved with comments.
Summary: Pull Request resolved: pytorch/pytorch#16051

This changes the kernels stored in the c10 dispatcher from plain C function pointers to IValue-based `KernelFunction*`. Note that `KernelFunction` currently takes an `ArrayRef<IValue>` as arguments; a later diff will change it to take a `Stack*`.

Reviewed By: ezyang
Differential Revision: D13684518
fbshipit-source-id: 1fa54f60cec2e967b92a4a043d6e3ac1627ed991
Stack:
- :black_circle: #16051 Make c10 dispatcher use boxed kernel function pointers 💚 (this PR)
- :white_circle: #16065 Pass IValue from c10 dispatcher to caffe2 operator 💚
- :white_circle: #16066 Pass IValues from JIT to c10 dispatcher 💛
- :white_circle: #16165 Avoid closure around kernel 💛
- :white_circle: #16166 Make kernel registration constexpr again 💛
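To close, a toy before/after sketch of the calling-convention change this PR makes; the types here are stand-ins, not the real c10 declarations.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

struct IValue { int64_t v; };                               // ~ c10::IValue
template <class T> using ArrayRef = const std::vector<T>&;  // ~ at::ArrayRef

// Before: each operator registered a kernel-specific C function pointer,
// so the dispatcher needed per-operator signature knowledge to call it.
using AddFn = int64_t (*)(int64_t, int64_t);

// After: one boxed signature for every operator; the dispatcher calls any
// kernel uniformly. (Per the summary, ArrayRef<IValue> later becomes Stack*.)
using KernelFunction = std::function<IValue(ArrayRef<IValue>)>;

int64_t add(int64_t a, int64_t b) { return a + b; }

int main() {
  KernelFunction boxed = [](ArrayRef<IValue> args) {
    return IValue{add(args[0].v, args[1].v)};  // unbox, call, rebox
  };
  std::vector<IValue> args{IValue{2}, IValue{3}};
  return boxed(args).v == 5 ? 0 : 1;
}
```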