ONNX Interface for Framework Integration (ONNXIFI): API Proposal

Background

Leading hardware and systems vendors offer highly optimized software to run neural network graphs. Such software can deliver order-of-magnitude speedups over generic implementations, but integrating it with deep learning frameworks and applications is complicated by the wide variety of vendor-specific interfaces and by subtle incompatibilities with the software stacks of high-level applications.

So far, the ONNX format has targeted offline conversion of neural network models between different high-level frameworks and vendor-specific libraries. In this proposal, we suggest that the ONNX ecosystem be enriched to enable runtime discovery and selection of high-performance graph execution backends, and online (at runtime) conversion of ONNX graphs to the internal representations of these backends.

Ultimate Goal

We should strive for consensus on a library API to interface with optimized backends and offload parts of ONNX graphs to these high-performance hardware and software implementations. The API should enable wide interoperability between high-level deep learning frameworks, software implementations of optimized graph runtimes, and existing and upcoming neural network acceleration hardware.

The standardized API should reduce friction in deploying neural network models for all involved parties:

  • Applications would be able to ship only one version of a neural network model (either in ONNX format, or in the format of their deep learning framework, converted on the fly to ONNX).
  • Deep learning frameworks would be able to integrate with many hardware vendors by using only a single interface.
  • Hardware vendors would be able to implement only one interface and get integration with many deep learning frameworks.

Design Choices

  • The interface must use only highly portable aspects of the C ABI.
  • Neural network graphs are passed as serialized ONNX ModelProto messages. To avoid serialization overhead, weights can be passed as raw memory blobs.
  • Input and output tensors are allocated by the caller and use NCHW layout (see the sketch after this list).
  • Intermediate tensors are allocated by the vendor implementation, and can use any layout.
  • Backends (software implementations and hardware accelerators) are discovered, selected, and initialized on-demand in run-time. Multiple backends can be used in the same application simultaneously.
  • There is no minimal set of ONNX operators to implement. The implementer and the user (a deep learning framework) of the API decide which operators can and will be offloaded at runtime.
  • The proposal includes the minimal functionality to let deep learning frameworks and vendor libraries work together. Several extension mechanisms can be used for more efficient vendor- or platform-specific functionality.
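
To make the I/O convention above concrete, here is a minimal sketch of a caller-allocated NCHW input tensor. The descriptor struct, its field names, and the input name "data" are hypothetical illustrations; the proposal does not define concrete descriptor types here.

```c
#include <stdint.h>

/* Hypothetical descriptor for a caller-allocated tensor in NCHW layout.
 * Struct and field names are illustrative only; the proposal does not
 * specify the concrete descriptor type. */
typedef struct {
  const char* name;       /* graph input/output name from the ModelProto */
  uint32_t dimensions;    /* number of dimensions; 4 for NCHW */
  const uint64_t* shape;  /* {batch, channels, height, width} */
  void* buffer;           /* memory owned and allocated by the caller */
} example_tensor_descriptor;

/* Example: the caller allocates a 1x3x224x224 float input buffer. */
static const uint64_t input_shape[4] = {1, 3, 224, 224};
static float input_data[1 * 3 * 224 * 224];

static const example_tensor_descriptor model_input = {
  .name = "data",         /* "data" is an assumed input name */
  .dimensions = 4,
  .shape = input_shape,
  .buffer = input_data,
};
```

Weights could be passed in a similar fashion, as raw memory blobs referenced alongside the serialized ModelProto, which avoids a round-trip through protobuf serialization for large parameter tensors.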

Proposed Interface

We propose a small C-based API that includes the following functionality (a usage sketch follows the list):

  • Discover (onnxGetNumBackends) and query information (onnxGetBackendInfo) about high-performance backends
  • Initialize (onnxInitBackend) and deinitialize (onnxReleaseBackend) high-performance backends
  • Query if a backend supports an ONNX operator with particular parameters and input shapes (onnxGetBackendCompatibility)
  • Convert an ONNX graph to a backend's opaque vendor-specific representation (onnxInitGraph)
  • Specify memory locations and metadata about graph inputs and outputs (onnxSetGraphIO)
  • Run an ONNX graph that has been converted to the vendor-specific representation (onnxRunGraph)
  • Release the vendor-specific representation of a graph and associated resources (onnxReleaseGraph)
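
The sketch below shows how a framework might drive these calls end to end: discover, initialize, check compatibility, convert, bind I/O, run, release. Only the function names come from this proposal; every type, signature, and status code is a hypothetical stand-in (reusing the example_tensor_descriptor struct from the earlier sketch), since the concrete interface is not fixed here.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t onnxStatus;                    /* assumed status type */
#define ONNX_STATUS_SUCCESS 0                  /* assumed success code */
typedef struct onnxBackendOpaque* onnxBackend; /* assumed opaque handle */
typedef struct onnxGraphOpaque* onnxGraph;     /* assumed opaque handle */

/* Assumed prototypes; a real implementation would ship these in a header
 * with the signatures agreed on by the working group. */
size_t     onnxGetNumBackends(void);
onnxStatus onnxInitBackend(size_t index, onnxBackend* backend);
onnxStatus onnxGetBackendCompatibility(onnxBackend backend,
                                       const void* subgraph, size_t size);
onnxStatus onnxInitGraph(onnxBackend backend, const void* model,
                         size_t model_size, onnxGraph* graph);
onnxStatus onnxSetGraphIO(onnxGraph graph,
                          const example_tensor_descriptor* inputs,
                          size_t num_inputs,
                          const example_tensor_descriptor* outputs,
                          size_t num_outputs);
onnxStatus onnxRunGraph(onnxGraph graph);
onnxStatus onnxReleaseGraph(onnxGraph graph);
onnxStatus onnxReleaseBackend(onnxBackend backend);

/* Offload a serialized ModelProto to the first available backend. */
onnxStatus offload_model(const void* model, size_t model_size,
                         const example_tensor_descriptor* inputs,
                         size_t num_inputs,
                         const example_tensor_descriptor* outputs,
                         size_t num_outputs) {
  if (onnxGetNumBackends() == 0) {
    return -1; /* no high-performance backend available */
  }

  onnxBackend backend;
  onnxStatus status = onnxInitBackend(0, &backend);
  if (status != ONNX_STATUS_SUCCESS) return status;

  /* Check operator/shape support before committing to offload. */
  status = onnxGetBackendCompatibility(backend, model, model_size);
  if (status == ONNX_STATUS_SUCCESS) {
    onnxGraph graph;
    /* Convert the ONNX graph to the backend's opaque representation. */
    status = onnxInitGraph(backend, model, model_size, &graph);
    if (status == ONNX_STATUS_SUCCESS) {
      /* Bind caller-allocated NCHW buffers, then execute. */
      status = onnxSetGraphIO(graph, inputs, num_inputs,
                              outputs, num_outputs);
      if (status == ONNX_STATUS_SUCCESS) {
        status = onnxRunGraph(graph);
      }
      onnxReleaseGraph(graph);
    }
  }
  onnxReleaseBackend(backend);
  return status;
}
```

Because compatibility can be queried per operator, a framework can partition a model at runtime, offloading supported subgraphs through this flow while executing the remaining operators itself.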