
Conversation

@smessmer smessmer commented Jan 15, 2019

Stack:
    ● #16051 Make c10 dispatcher use boxed kernel function pointers  💚
    ○ #16065 Pass IValue from c10 dispatcher to caffe2 operator  💚
    ○ #16066 Pass IValues from JIT to c10 dispatcher  💛
    ○ #16165 Avoid closure around kernel  💛
    ○ #16166 Make kernel registration constexpr again  💛

This changes the kernels stored in the c10 dispatcher from plain C function pointers to IValue-based KernelFunction*.

Note that KernelFunction currently takes an ArrayRef<IValue> as arguments. A later diff will change it to take a Stack* instead.
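
To make the shape of the change concrete, here is a minimal standalone sketch of the before/after kernel signatures (toy stand-ins for IValue and ArrayRef, not the actual c10 types):

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for c10::IValue and at::ArrayRef, for illustration only.
struct IValue {
  double d;
  double toDouble() const { return d; }
};
template <class T>
using ArrayRef = const std::vector<T>&;

// Before: an unboxed kernel is a plain C function pointer with a
// signature specific to one operator.
double add_unboxed(double a, double b) { return a + b; }

// After: a boxed kernel has one uniform signature for all operators;
// arguments arrive as IValues.
using KernelFunction = IValue(ArrayRef<IValue>);

IValue add_boxed(ArrayRef<IValue> args) {
  assert(args.size() == 2);
  return IValue{args[0].toDouble() + args[1].toDouble()};
}

int main() {
  KernelFunction* kernel = &add_boxed;  // what the dispatcher stores now
  std::vector<IValue> args{{2.0}, {3.0}};
  assert(kernel(args).toDouble() == add_unboxed(2.0, 3.0));
}
```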

Differential Revision: D13684518

@facebook-github-bot added the oncall: jit label Jan 15, 2019
@smessmer changed the base branch from export-D13684517 to export-D13684761 January 16, 2019 00:09
@smessmer changed the base branch from export-D13684761 to master January 18, 2019 02:05
@smessmer changed the title from "[wip] Make c10 dispatcher use boxed kernel function pointers" to "Make c10 dispatcher use boxed kernel function pointers" Jan 18, 2019
namespace c10 {

// TODO Use folly::Function for perf
using KernelFunction = std::function<IValue(ArrayRef<IValue>)>;
Contributor:
Err, I think neither folly::Function nor std::function should be used here. These functions shouldn't be closures, right? They don't have any state or closed-over variables; we pass in state as an explicit argument.

Contributor:

OK, I'm looking at your implementation, and it seems you are making a closure for KernelFunction. But it seems unnecessary to me, because you're closing over a function pointer which is known at compile time when you call kernel(). So it seems to me that it should be made some sort of compile-time template argument. Then you won't have a closure at all and can just take the address of the template-instantiated lambda.

@smessmer (author):

That sounds like a cool idea, thanks. I'll try it out in a separate diff. Getting rid of the closure would avoid some of the perf risks here.
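
A minimal sketch of that template-argument idea, with toy types (hypothetical names, not the dispatcher's real code): since the kernel's function pointer is known at compile time, it can be baked in as a template argument, making the wrapper a stateless function whose plain address can be registered:

```cpp
#include <vector>

struct IValue { int i; };  // toy stand-in
using Args = std::vector<IValue>;
using KernelFunction = IValue(const Args&);

// A concrete kernel, known at compile time.
IValue increment_kernel(const Args& args) { return IValue{args[0].i + 1}; }

// Because the wrapped function pointer is a *template* argument,
// wrap<&increment_kernel> is a distinct stateless function; its address is a
// plain function pointer -- no closure, no std::function, no allocation.
template <KernelFunction* kernel>
IValue wrap(const Args& args) {
  // a real wrapper would also do boxing/unboxing around this call
  return kernel(args);
}

int main() {
  KernelFunction* stored = &wrap<&increment_kernel>;  // what gets registered
  Args args{IValue{41}};
  return stored(args).i == 42 ? 0 : 1;
}
```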

 inline TensorParameterDispatchKey tensor_to_dispatch_key(const at::Tensor& tensor) {
   return TensorParameterDispatchKey{
-    to_device_type_id(tensor.impl()->device_type()),
+    to_device_type_id(tensor.getIntrusivePtr()->device_type()),
Contributor:

Now that you're using at::Tensor, there's no need to go through the TensorImpl to get the device type; just say `tensor.device().type()`.

}

template<size_t index, size_t offset, class ParameterTypes, class Enable = void> struct get_ith_tensor_arg_ {
//static_assert(!std::is_same<ParameterTypes, ParameterTypes>::value, "Index out of bounds");
Contributor:

?

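For context on the commented-out line: it is the dependent-false static_assert idiom. `!std::is_same<T, T>::value` is always false, but because it depends on a template parameter, the assert only fires if the fallback specialization is actually instantiated. A minimal standalone sketch (hypothetical `get_arg`, simplified from the `get_ith_tensor_arg_` pattern above):

```cpp
#include <type_traits>

// Fallback: instantiating it fails via a dependent-false static_assert.
template <class T, class Enable = void>
struct get_arg {
  static_assert(!std::is_same<T, T>::value, "Index out of bounds");
};

// Valid case: integral types are handled.
template <class T>
struct get_arg<T, typename std::enable_if<std::is_integral<T>::value>::type> {
  using type = T;
};

int main() {
  get_arg<int>::type x = 0;   // compiles; the valid specialization is chosen
  // get_arg<float>::type y;  // would fire the static_assert at compile time
  return x;
}
```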
-  Tensor C(C_);
+  Tensor A{C10Tensor(A_)};
+  Tensor B{C10Tensor(B_)};
+  Tensor C{C10Tensor(C_)};
Contributor:

I've suggested we should make some free functions to_caffe2_tensor and to_at_tensor to do this conversion. This is a really good reason to do so.

@smessmer (author):

Agreed. Currently, C10Tensor is the only way we have of converting tensors though, so I'll keep it here. Removing C10Tensor and adding a different conversion mechanism is a separate workstream.
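
A minimal sketch of what the suggested free functions might look like (the `to_caffe2_tensor`/`to_at_tensor` names come from the comment above; the tensor types here are toy stand-ins, and a real implementation would share the underlying TensorImpl rather than copy a handle):

```cpp
// Toy stand-ins for the real tensor types, for illustration only.
struct TensorImplHandle { int id; };
struct Caffe2Tensor { TensorImplHandle impl; };
struct AtTensor { TensorImplHandle impl; };

// Suggested free conversion functions, instead of routing every call site
// through C10Tensor constructors.
inline AtTensor to_at_tensor(const Caffe2Tensor& t) { return AtTensor{t.impl}; }
inline Caffe2Tensor to_caffe2_tensor(const AtTensor& t) { return Caffe2Tensor{t.impl}; }

int main() {
  Caffe2Tensor c2{{7}};
  AtTensor at = to_at_tensor(c2);            // caffe2 -> ATen
  Caffe2Tensor back = to_caffe2_tensor(at);  // ATen -> caffe2
  return back.impl.id == 7 ? 0 : 1;
}
```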

-    at::ArrayRef<C10Tensor> inputs,
     const C10Tensor& output_,
     const C10Tensor& split_,
+    intrusive_ptr<ivalue::TensorList> inputs,
Contributor:

This looks weird to me.

Based on our discussion, we said there were two plausible ways operators could be written:

  • Taking a Stack as argument, in which case they feed in and out ivalues
  • The "native" format, which "looks something like how ATen functions are written."

What we have here is some weird bastard format. It isn't Stack. But it isn't native either, because you're taking in an ivalue::TensorList. No one should actually write a kernel this way; they should just take ArrayRef<Tensor>. What's the plan?

Contributor:

Can you remind me where the test for these experimental ops lives?

@smessmer (author):

Good point, I should change this to ArrayRef<Tensor>. There are currently only tests for layer_norm; the other ops are only there to make sure certain use cases still compile (e.g. operators with/without state, operators with/without attributes, operators with/without TensorList as input, single/multiple outputs, ...), but they don't have test cases, so they only check that these signatures are syntactically possible.
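
For context, a minimal sketch contrasting the two formats discussed above (toy types, not the real JIT/ATen signatures):

```cpp
#include <vector>

// Toy stand-ins for Tensor, IValue, Stack, and ArrayRef.
struct Tensor { int v; };
struct IValue { Tensor t; };
using Stack = std::vector<IValue>;
template <class T>
using ArrayRef = const std::vector<T>&;

// Format 1: a Stack-based kernel; it pops its inputs and pushes its
// outputs as IValues itself.
void sum_stack_kernel(Stack& stack) {
  int sum = 0;
  for (const IValue& iv : stack) sum += iv.t.v;
  stack.clear();
  stack.push_back(IValue{Tensor{sum}});
}

// Format 2: the "native" format -- typed arguments like an ATen function,
// with ArrayRef<Tensor> (not ivalue::TensorList) for tensor-list inputs.
Tensor sum_native_kernel(ArrayRef<Tensor> inputs) {
  int sum = 0;
  for (const Tensor& t : inputs) sum += t.v;
  return Tensor{sum};
}

int main() {
  Stack stack{IValue{Tensor{1}}, IValue{Tensor{2}}};
  sum_stack_kernel(stack);
  std::vector<Tensor> inputs{Tensor{1}, Tensor{2}};
  return stack[0].t.v == sum_native_kernel(inputs).v ? 0 : 1;
}
```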

};
}
};
template<class... ParamTypes, class FuncType> struct _wrapKernel<void, guts::typelist::typelist<ParamTypes...>, FuncType> {
Contributor:

Once you actually make it take a Stack, you won't need two overloads, right?

@smessmer (author):

That's correct.
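
A minimal sketch of why a Stack-based signature needs only one wrapper (toy types; with the current `ArrayRef<IValue>` -> `IValue` form, void-returning kernels need their own specialization):

```cpp
#include <vector>

struct IValue { int i; };  // toy stand-in
using Stack = std::vector<IValue>;

// With a Stack-based signature, results are pushed rather than returned, so
// kernels producing zero, one, or many outputs all share one signature --
// the separate void specialization goes away.
using KernelFunction = void(Stack&);

void no_output(Stack& s)   { (void)s; /* pushes nothing */ }
void one_output(Stack& s)  { s.push_back(IValue{1}); }
void two_outputs(Stack& s) { s.push_back(IValue{2}); s.push_back(IValue{3}); }

int main() {
  KernelFunction* kernels[] = {&no_output, &one_output, &two_outputs};
  Stack stack;
  for (KernelFunction* k : kernels) k(stack);  // one uniform invocation path
  return stack.size() == 3 ? 0 : 1;
}
```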

@ezyang (Contributor) left a comment:

approved with comments

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jan 19, 2019
Summary:
Pull Request resolved: pytorch/pytorch#16051
This changes the kernels stored in the c10 dispatcher from plain C function pointers to IValue-based KernelFunction*.

Note that KernelFunction currently takes an `ArrayRef<IValue>` as arguments. A later diff will change it to take a `Stack*` instead.

Reviewed By: ezyang

Differential Revision: D13684518

fbshipit-source-id: 1fa54f60cec2e967b92a4a043d6e3ac1627ed991
@soumith deleted the export-D13684518 branch February 21, 2019 23:30
@ezyang added the merged label Jun 25, 2019