Single Operator Execution Interface #4453
Conversation
The current design implements a new ExecutionFrame that is used to execute the op. This is less than optimal; I will attempt to change this in the future. The API will also have to be extended to support providers other than CPU.
With this change, the ExecutableKernelContextImpl is initialized at kernel creation rather than at compute time, which should remove some overhead. This allows multiple calls with different data to be made using the same kernel. Furthermore, the main graph of the different op kernels is now shared through OrtKernelSession.
* Reuse providers across kernels
* Support CUDA providers
This looks interesting to expose in the Java API, but is there a reason why the input and output arguments are specified separately from the call to compute? In session.run they are supplied to that call, and I feel like that maps a little more naturally.
I'm not particularly tied to this exact API; if exposed to the Java or Python API, it could be implemented similarly to
/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,MacOS CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline
Pull request contains merge conflicts.
/azp run orttraining-linux-ci-pipeline,orttraining-mac-ci-pipeline,orttraining-linux-gpu-ci-pipeline,centos7_cpu,Linux OpenVINO CI Pipeline
Pull request contains merge conflicts.
# Conflicts:
#	include/onnxruntime/core/session/onnxruntime_c_api.h
#	onnxruntime/core/session/onnxruntime_c_api.cc
#	onnxruntime/core/session/ort_apis.h
/azp run Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, centos7_cpu, centos7_cpu (linux_centos_ci Debug), centos7_cpu (linux_centos_ci Release), orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline
/azp run Linux CPU CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
Some CUDA ops static_cast the context to OpKernelContextInternal. Switching to this required initializing a session state.
Hi @jywu-msft / @orausch, either we get traction on this PR or we close it. Can you please drive this to closure? It has been outstanding for a while.
Thanks for following up @codemzs. I think a good next step would be to get a review in from someone on the ORT team. Let me know if there is any other way I can help drive this forward.
@orausch I believe Pranav from the ORT team will be looking at this.
+1
@pranavsharma @jywu-msft @RyanUnderhill Can one of you (or another ORT team member) review this? Looking forward to this getting in and then exposed in the Java API ( @Craigacp ). Thanks in advance.
Thanks for your contribution.
The number of APIs required to execute one kernel seems quite a lot. I wonder if we can reduce this number and simplify it. Need some more thought on this.
@@ -69,6 +69,8 @@ class IExecutionFrame {

  Status ReleaseMLValue(int ort_value_idx);

  Status SetOrtValue(OrtValue &value, int ort_value_idx);
value can be passed by const-ref.
@@ -0,0 +1,161 @@
#pragma once
Needs license header.
@@ -0,0 +1,648 @@
// Licensed under the MIT License.
Full license header needed here.
/*model_functions=*/std::initializer_list<ONNX_NAMESPACE::FunctionProto>{},
/*logger=*/logging::LoggingManager::DefaultLogger());

KernelSessionImpl *session = new KernelSessionImpl(std::move(model));
Consider using unique_ptr to avoid the potential of a mem leak.
ORT_API2_STATUS(CreateExecutableKernel,
                _Inout_ OrtKernelSession* session,
                _In_ OrtExecutableKernelContext* context,
                size_t provider_id,
How will the user know which id corresponds to which provider?
ORT_ENFORCE(provider_id < session->provider_list.size(),
            "provider_id (" + std::to_string(provider_id) + ") must be less than the provider list size (" + std::to_string(session->provider_list.size()) + ").");

SingleKernelExecutionFrame* frame;
Another potential for mem leak.
std::unique_ptr<NodeIndexInfo> node_index_info_;

std::vector<int> input_index_to_mlvalue_map_;
these don't look like maps
std::vector<int> fetches_mlvalue_idxs_;
std::vector<OrtValue> fetches_;
std::vector<int> feed_mlvalue_idxs_;
std::vector<OrtValue> feeds_;
why do we need to store the feeds and fetches?
}

// create the context info
std::unique_ptr<SingleKernelExecutionFrame::Info> info = onnxruntime::make_unique<SingleKernelExecutionFrame::Info>(
Do we need to expose this Info object outside? Looks like this will require an unnecessary heap allocation.
ORT_API_STATUS_IMPL(OrtApis::ExecutableKernel_Compute,
                    _Inout_ OrtExecutableKernel *kernel_) {
  API_IMPL_BEGIN
  SingleKernelExecutionFrame* kernel = reinterpret_cast<SingleKernelExecutionFrame*>(kernel_);
This is a bit confusing. The distinction between a kernel and a frame is lost here.
Thanks @pranavsharma.
@EmergentOrder, this is still on my radar, but I'll likely only get around to this in April.
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Bumping to prevent auto-closure
checking in, @orausch could you take another pass at this?
Hey, I discussed this with @souptc and it seems like the "most mergeable" way forward is to instead expose the ORT eager mode (the one that is already committed) as a C API. While this will have more overhead than the solution proposed here, some caching of the constructed graph should hopefully bring latency down far enough to be useful for performance-oriented use cases. This work will be done in new PRs, and is largely unrelated to the solution presented here.
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Description: This PR adds an interface to the C ABI that enables the execution of single ONNX nodes without the overhead of graph construction and memory allocation.
Motivation and Context
The alternative way to execute single operators/nodes is to create an ONNX graph containing a single node only. However, this (understandably) adds a lot of overhead, as can be seen in the plot below.
Here is an example of how the API can be used (UPDATE: new API for adding attributes):
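The PR's actual example was not captured in this thread; the following is a reconstructed sketch based only on the function names visible in the diff. `CreateExecutableKernel` and `ExecutableKernel_Compute` appear in the diff; everything else (the `api` handle, how the session and context are created, the binding calls elided in comments) is an assumption and may not match the PR:

```cpp
// Reconstructed sketch -- not the PR's original example.
OrtKernelSession* session = /* created from a model via the new API */;
OrtExecutableKernelContext* context = /* describes the op type, attributes,
                                         inputs, and outputs */;

// provider_id selects an execution provider from the session's provider
// list (0 assumed to be CPU here).
OrtExecutableKernel* kernel = nullptr;
api->CreateExecutableKernel(session, context, /*provider_id=*/0, &kernel);

// Inputs/outputs are bound to the kernel up front, so Compute can then be
// called repeatedly with different data without rebuilding the kernel.
api->ExecutableKernel_Compute(kernel);
```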
It has been tested with the CPU and CUDA execution providers.