Make c10 dispatcher use boxed kernel function pointers #16051
Conversation
```cpp
namespace c10 {

// TODO Use folly::Function for perf
using KernelFunction = std::function<IValue(ArrayRef<IValue>)>;
```
Err, I think neither folly::Function nor std::function should be used here. These functions shouldn't be closures, right? They don't have any state or closed-over variables; we pass in state as an explicit argument.
OK, I'm looking at your implementation, and it seems you are making a closure for KernelFunction. But that seems unnecessary to me, because you're closing over a function pointer which is known at compile time when you call kernel(). So it seems to me that it should be made some sort of compile-time template argument. Then you won't have a closure at all and can just take the address of the template-instantiated lambda.
That sounds like a cool idea, thanks. Will try it out in a separate diff. Getting rid of the closure would avoid some of the perf risks here.
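To make the suggestion concrete, here is a minimal, self-contained sketch of the no-closure approach. Every name here is a stand-in for illustration (this is not the actual c10 API): the unboxed kernel becomes a non-type template parameter, so the boxing wrapper is an ordinary stateless function whose address can be stored directly.

```cpp
#include <cstdint>
#include <vector>

struct IValue { int64_t v; };  // stand-in for c10::IValue

// Boxed calling convention: a plain function pointer, no captured state.
using BoxedKernel = IValue (*)(const std::vector<IValue>&);

// An unboxed kernel whose address is known at compile time.
int64_t add_kernel(int64_t a, int64_t b) { return a + b; }

// Instantiated once per kernel; &boxed_wrapper<&add_kernel> is an ordinary
// function pointer, so no closure is ever created.
template <int64_t (*Kernel)(int64_t, int64_t)>
IValue boxed_wrapper(const std::vector<IValue>& args) {
  return IValue{Kernel(args[0].v, args[1].v)};
}

int main() {
  BoxedKernel k = &boxed_wrapper<&add_kernel>;
  return k({IValue{2}, IValue{3}}).v == 5 ? 0 : 1;  // exit code 0 on success
}
```

Because the wrapper is a template instantiation rather than a capturing lambda, the dispatch table can hold raw function pointers instead of std::function objects, which is exactly the perf concern the TODO comment raises.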
```cpp
inline TensorParameterDispatchKey tensor_to_dispatch_key(const at::Tensor& tensor) {
  return TensorParameterDispatchKey{
    to_device_type_id(tensor.impl()->device_type()),
    to_device_type_id(tensor.getIntrusivePtr()->device_type()),
```
Now that you're using at::Tensor, there's no need to go through the TensorImpl to get the device type; just say `tensor.device().type()`.
```cpp
}

template<size_t index, size_t offset, class ParameterTypes, class Enable = void> struct get_ith_tensor_arg_ {
  //static_assert(!std::is_same<ParameterTypes, ParameterTypes>::value, "Index out of bounds");
```
?
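For context on the commented-out assertion (my gloss, not part of the patch): `!std::is_same<T, T>::value` is the usual trick for a static_assert that is always false but depends on a template parameter, so it only fires if the primary template is actually instantiated, i.e. when the index walks past the end of the parameter list. A stripped-down, self-contained sketch of the idiom:

```cpp
#include <cstddef>
#include <type_traits>

// Primary template: only instantiated when no specialization matches.
// The assertion is always false, but because it depends on T, it fires
// only on instantiation, producing the "Index out of bounds" diagnostic.
template <std::size_t index, class T>
struct get_ith_ {
  static_assert(!std::is_same<T, T>::value, "Index out of bounds");
};

// In-bounds case handled by a specialization (illustrative only).
template <class T>
struct get_ith_<0, T> {
  using type = T;
};

static_assert(std::is_same<get_ith_<0, int>::type, int>::value, "in bounds: OK");
// get_ith_<1, int> would trip the static_assert at compile time.
```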
```cpp
Tensor C(C_);
Tensor A{C10Tensor(A_)};
Tensor B{C10Tensor(B_)};
Tensor C{C10Tensor(C_)};
```
I've suggested we should make some free functions to_caffe2_tensor and to_at_tensor to do this conversion. This is a really good reason to do so.
Agreed. Currently, C10Tensor is the only way we have of converting tensors, though, so I'll keep it here. Removing C10Tensor and adding a different way to convert is a separate workstream.
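For illustration only, the suggested free functions might be shaped like this. The sketch uses stand-in types rather than the real at::Tensor / caffe2::Tensor / C10Tensor, and to_at_tensor / to_caffe2_tensor are hypothetical (they don't exist in this PR); the point is the call-site ergonomics of one obvious function per direction instead of spelling out the C10Tensor round-trip everywhere.

```cpp
#include <memory>

struct TensorImpl {};                                     // stand-in shared implementation
struct C10Tensor { std::shared_ptr<TensorImpl> impl; };   // stand-in interchange type
struct AtTensor  { std::shared_ptr<TensorImpl> impl; };   // stand-in for at::Tensor
struct C2Tensor  { std::shared_ptr<TensorImpl> impl; };   // stand-in for caffe2::Tensor

inline AtTensor to_at_tensor(const C2Tensor& t) {
  return AtTensor{C10Tensor{t.impl}.impl};   // round-trip stays internal
}

inline C2Tensor to_caffe2_tensor(const AtTensor& t) {
  return C2Tensor{C10Tensor{t.impl}.impl};
}

int main() {
  C2Tensor c{std::make_shared<TensorImpl>()};
  AtTensor a = to_at_tensor(c);              // replaces Tensor A{C10Tensor(A_)}
  return a.impl == c.impl ? 0 : 1;           // same impl either way: zero-copy
}
```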
```cpp
at::ArrayRef<C10Tensor> inputs,
const C10Tensor& output_,
const C10Tensor& split_,
intrusive_ptr<ivalue::TensorList> inputs,
```
This looks weird to me.

Based on our discussion, we said there were two plausible ways operators could be written:

- Taking a `Stack` as argument, in which case they feed in and out IValues
- The "native" format, which "looks something like how ATen functions are written"

What we have here is some weird bastard format. It isn't Stack. But it isn't native either, because you're taking in an ivalue::TensorList. No one should actually write a kernel this way; they should just take ArrayRef<Tensor>. What's the plan?
Can you remind me where the test for these experimental ops lives?
Good point, I should change this to ArrayRef<Tensor>. There are currently only tests for layer_norm; the other ops are only there to make sure certain use cases still compile (e.g. operators with/without state, operators with/without attributes, operators with/without TensorList as input, single/multiple outputs, ...), but they don't have test cases, so they only check that it's syntactically possible.
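To spell out the two sanctioned formats from the discussion, here is a stand-in sketch; none of these are the real c10/JIT declarations.

```cpp
#include <cstdint>
#include <vector>

struct IValue { int64_t v; };           // stand-in for c10::IValue
using Stack = std::vector<IValue>;      // stand-in for the JIT stack
struct Tensor { int64_t id; };          // stand-in for at::Tensor

// Format 1, boxed: pop inputs off the stack, push outputs back on;
// everything crosses the boundary as IValues.
void sum_boxed(Stack* stack) {
  int64_t total = 0;
  for (const IValue& arg : *stack) total += arg.v;
  stack->clear();
  stack->push_back(IValue{total});
}

// Format 2, native ("like ATen functions are written"): typed arguments,
// with list inputs as ArrayRef<Tensor>, never ivalue::TensorList.
// (std::vector stands in for at::ArrayRef here.)
Tensor sum_native(const std::vector<Tensor>& inputs) {
  int64_t total = 0;
  for (const Tensor& t : inputs) total += t.id;
  return Tensor{total};
}
```

The hunk above mixes the two: typed parameters like `const C10Tensor&` next to a boxed `intrusive_ptr<ivalue::TensorList>`, which is what draws the "bastard format" objection.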
```cpp
};
}
};
template<class... ParamTypes, class FuncType> struct _wrapKernel<void, guts::typelist::typelist<ParamTypes...>, FuncType> {
```
Once you actually make it take a Stack, you won't need two overloads, right?
That's correct.
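A stand-in sketch of why the current return-value convention forces the extra specialization, and why a Stack-based convention wouldn't (hypothetical names, not the real _wrapKernel code):

```cpp
#include <cstdint>
#include <vector>

struct IValue { int64_t v; };
using Stack = std::vector<IValue>;

// Return-value convention: `return IValue{Kernel()}` is ill-formed when
// the kernel returns void, so a second specialization must exist that
// calls Kernel() and returns an empty IValue instead.
template <class Result, Result (*Kernel)()>
IValue wrap() {
  return IValue{Kernel()};  // would not compile for Result = void
}

// Stack convention: outputs are pushed onto the stack and the wrapper
// itself always returns void, so kernels with zero outputs need no
// special case and a single wrapper suffices.
template <void (*Kernel)(Stack*)>
void wrap_stack(Stack* stack) {
  Kernel(stack);  // kernel pushes zero or more outputs; uniform either way
}

int64_t answer() { return 42; }

int main() {
  return wrap<int64_t, &answer>().v == 42 ? 0 : 1;
}
```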
Approved with comments.
Summary: Pull Request resolved: pytorch/pytorch#16051

This changes the kernels stored in the c10 dispatcher from plain C function pointers to IValue-based `KernelFunction*`. Note that `KernelFunction` currently takes an `ArrayRef<IValue>` as arguments; a later diff will change it to take a `Stack*`.

Reviewed By: ezyang
Differential Revision: D13684518
fbshipit-source-id: 1fa54f60cec2e967b92a4a043d6e3ac1627ed991
Stack:
- :black_circle: #16051 Make c10 dispatcher use boxed kernel function pointers 💚 (this PR)
- :white_circle: #16065 Pass IValue from c10 dispatcher to caffe2 operator 💚
- :white_circle: #16066 Pass IValues from JIT to c10 dispatcher 💛
- :white_circle: #16165 Avoid closure around kernel 💛
- :white_circle: #16166 Make kernel registration constexpr again 💛
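To close, a toy before/after sketch of the calling-convention change this PR makes; the types here are stand-ins, not the real c10 declarations.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

struct IValue { int64_t v; };                               // ~ c10::IValue
template <class T> using ArrayRef = const std::vector<T>&;  // ~ at::ArrayRef

// Before: each operator registered a kernel-specific C function pointer,
// so the dispatcher needed per-operator signature knowledge to call it.
using AddFn = int64_t (*)(int64_t, int64_t);

// After: one boxed signature for every operator; the dispatcher calls any
// kernel uniformly. (Per the summary, ArrayRef<IValue> later becomes Stack*.)
using KernelFunction = std::function<IValue(ArrayRef<IValue>)>;

int64_t add(int64_t a, int64_t b) { return a + b; }

int main() {
  KernelFunction boxed = [](ArrayRef<IValue> args) {
    return IValue{add(args[0].v, args[1].v)};  // unbox, call, rebox
  };
  std::vector<IValue> args{IValue{2}, IValue{3}};
  return boxed(args).v == 5 ? 0 : 1;
}
```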