Inference Protocols and APIs

Clients can communicate with Triton using either the HTTP/REST or GRPC protocol, or through the C API.

HTTP/REST and GRPC Protocols

Triton exposes both HTTP/REST and GRPC endpoints based on the standard inference protocols proposed by the KServe project. To fully enable all of its capabilities, Triton also implements a number of HTTP/REST and GRPC extensions to the KServe inference protocol.

The HTTP/REST and GRPC protocols provide endpoints to check server and model health, metadata, and statistics. Additional endpoints allow model loading, unloading, and inferencing. See the KServe and extension documentation for details.
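To make the endpoint categories above more concrete, here is a minimal Python sketch that builds the method and URL for a few HTTP/REST endpoints. The paths reflect the KServe v2 protocol and Triton's statistics and model repository extensions as commonly documented. The base URL `http://localhost:8000`, the model name `my_model`, and the helper names are assumptions for illustration, so verify the paths against the KServe and extension documentation for your Triton version.

```python
from urllib.parse import quote

# Assumed base URL for the HTTP/REST endpoint; adjust for your deployment.
BASE_URL = "http://localhost:8000"

# Server-level endpoints (no model name needed).
SERVER_PATHS = {
    "server_live": ("GET", "/v2/health/live"),
    "server_ready": ("GET", "/v2/health/ready"),
    "server_metadata": ("GET", "/v2"),
}

# Model-level endpoints; "{name}" is replaced with the URL-quoted model name.
MODEL_PATHS = {
    "model_ready": ("GET", "/v2/models/{name}/ready"),
    "model_metadata": ("GET", "/v2/models/{name}"),
    "model_stats": ("GET", "/v2/models/{name}/stats"),  # statistics extension
    "model_infer": ("POST", "/v2/models/{name}/infer"),
    "model_load": ("POST", "/v2/repository/models/{name}/load"),  # repository extension
    "model_unload": ("POST", "/v2/repository/models/{name}/unload"),  # repository extension
}


def endpoint(kind, model=None, base_url=BASE_URL):
    """Return (method, url) for one of the endpoints listed above."""
    if kind in SERVER_PATHS:
        method, path = SERVER_PATHS[kind]
        return method, base_url + path
    if kind in MODEL_PATHS:
        if model is None:
            raise ValueError(f"{kind!r} requires a model name")
        method, path = MODEL_PATHS[kind]
        return method, base_url + path.format(name=quote(model, safe=""))
    raise KeyError(f"unknown endpoint kind: {kind!r}")


if __name__ == "__main__":
    for kind in SERVER_PATHS:
        print(*endpoint(kind))
    for kind in MODEL_PATHS:
        print(*endpoint(kind, "my_model"))
```

The sketch only constructs URLs and does not contact a server. In practice the Triton client libraries wrap these endpoints for you.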

HTTP Options

Triton provides the following configuration options for server-client network transactions over the HTTP protocol.

Compression

Triton allows on-wire compression of HTTP requests and responses through its clients. See HTTP Compression for more details.
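As a rough idea of what on-wire compression means at the HTTP level, the sketch below uses only the Python standard library to build (but not send) an inference request whose JSON body is gzip-compressed. It is not the Triton client API. The URL, the tensor name `INPUT0`, and the helper name are hypothetical, and whether a given server build accepts the `Content-Encoding` shown here is an assumption to confirm against the HTTP Compression documentation.

```python
import gzip
import json
import urllib.request


def build_compressed_infer_request(url, payload):
    """Build (without sending) a POST request with a gzip-compressed JSON body."""
    raw = json.dumps(payload).encode("utf-8")
    body = gzip.compress(raw)
    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",  # the request body is gzip-compressed
        "Accept-Encoding": "gzip",   # ask the server for a compressed response
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical KServe v2 style inference payload with one FP32 input tensor.
payload = {
    "inputs": [
        {"name": "INPUT0", "shape": [1, 1024], "datatype": "FP32", "data": [0.0] * 1024}
    ]
}

request = build_compressed_infer_request(
    "http://localhost:8000/v2/models/my_model/infer", payload
)
uncompressed_size = len(json.dumps(payload).encode("utf-8"))
print(f"uncompressed: {uncompressed_size} bytes, compressed: {len(request.data)} bytes")
```

Highly repetitive tensors like this one compress well. Dense floating-point data with little redundancy usually gains much less, so compression is a trade of CPU time for bandwidth.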

GRPC Options

Triton exposes various GRPC parameters for configuring the server-client network transactions. For usage of these options, refer to the output from tritonserver --help.

SSL/TLS

These options can be used to configure a secured channel for communication. The server-side options include:

  • --grpc-use-ssl
  • --grpc-use-ssl-mutual
  • --grpc-server-cert
  • --grpc-server-key
  • --grpc-root-cert

For client-side documentation, see Client-Side GRPC SSL/TLS.

For an overview of authentication in gRPC, refer to the gRPC authentication documentation.
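The following Python sketch shows one way the server-side SSL/TLS flags listed above might be assembled into a command line. The flag names come from the list above. The `=1` boolean form, the certificate file names, and the rule that mutual TLS needs a root certificate are assumptions made for this example, so confirm the exact value formats with `tritonserver --help`.

```python
def grpc_ssl_args(server_cert, server_key, root_cert=None, mutual=False):
    """Return tritonserver arguments enabling SSL/TLS on the GRPC endpoint.

    Assumption: with mutual TLS the server needs a root certificate to
    verify client certificates, so this sketch rejects mutual=True without one.
    """
    if mutual and root_cert is None:
        raise ValueError("mutual TLS requires root_cert in this sketch")
    args = [
        "--grpc-use-ssl=1",  # boolean written as 1 (assumed accepted form)
        f"--grpc-server-cert={server_cert}",
        f"--grpc-server-key={server_key}",
    ]
    if root_cert is not None:
        args.append(f"--grpc-root-cert={root_cert}")
    if mutual:
        args.append("--grpc-use-ssl-mutual=1")
    return args


# Hypothetical file names and model repository path, for illustration only.
command = ["tritonserver", "--model-repository=/models"] + grpc_ssl_args(
    "server.crt", "server.key", root_cert="ca.crt", mutual=True
)
print(" ".join(command))
```

Building the argument list in code keeps the certificate paths in one place and makes it harder to enable mutual authentication without supplying the certificate it depends on.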

Compression

Triton allows on-wire compression of request/response messages by exposing the following option on the server side:

  • --grpc-infer-response-compression-level

For client-side documentation, see Client-Side GRPC Compression.

Compression can be used to reduce the amount of bandwidth used in server-client communication. For more details, see gRPC Compression.

GRPC KeepAlive

Triton exposes GRPC KeepAlive parameters. The default values for both client and server are described here.

These options can be used to configure the KeepAlive settings:

  • --grpc-keepalive-time
  • --grpc-keepalive-timeout
  • --grpc-keepalive-permit-without-calls
  • --grpc-http2-max-pings-without-data
  • --grpc-http2-min-recv-ping-interval-without-data
  • --grpc-http2-max-ping-strikes

For client-side documentation, see Client-Side GRPC KeepAlive.
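The sketch below illustrates how the server-side KeepAlive flags listed above could be generated from a settings dictionary, along with a simplified check of whether a client's ping interval is compatible with the server's minimum accepted interval. The flag names come from the list above. The numeric values are illustrative rather than Triton's defaults, the millisecond units and `0`/`1` boolean form are assumptions to confirm with `tritonserver --help`, and the helper names are hypothetical.

```python
# Flag names from the list above, without the leading "--".
KEEPALIVE_FLAGS = (
    "grpc-keepalive-time",
    "grpc-keepalive-timeout",
    "grpc-keepalive-permit-without-calls",
    "grpc-http2-max-pings-without-data",
    "grpc-http2-min-recv-ping-interval-without-data",
    "grpc-http2-max-ping-strikes",
)


def keepalive_args(settings):
    """Turn a {flag-name: value} dict into a list of tritonserver arguments."""
    unknown = set(settings) - set(KEEPALIVE_FLAGS)
    if unknown:
        raise KeyError(f"unknown KeepAlive option(s): {sorted(unknown)}")
    # Follow the order of KEEPALIVE_FLAGS so the output is deterministic.
    # int() writes booleans as 0 or 1 (assumed accepted form).
    return [
        f"--{name}={int(settings[name])}"
        for name in KEEPALIVE_FLAGS
        if name in settings
    ]


def ping_is_tolerated(client_ping_interval_ms, server_min_recv_interval_ms):
    """Simplified check for an idle connection: pings that arrive more often
    than the server's minimum interval are counted as bad pings (strikes)."""
    return client_ping_interval_ms >= server_min_recv_interval_ms


# Illustrative values only (assumed to be in milliseconds where applicable).
server_settings = {
    "grpc-keepalive-time": 60000,
    "grpc-keepalive-timeout": 10000,
    "grpc-keepalive-permit-without-calls": True,
    "grpc-http2-min-recv-ping-interval-without-data": 30000,
}
print(keepalive_args(server_settings))
print(ping_is_tolerated(10000, 30000))
```

The second helper reflects a common pitfall: if a client pings more aggressively than the server permits, the server accumulates ping strikes and may eventually close the connection, so client and server settings are best chosen together.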

C API

The Triton Inference Server provides a backwards-compatible C API that allows Triton to be linked directly into a C/C++ application. The API is documented in tritonserver.h.

A simple example using the C API can be found in simple.cc. A more complicated example can be found in the source that implements the HTTP/REST and GRPC endpoints for Triton. These endpoints use the C API to communicate with the core of Triton. The primary source files for the endpoints are grpc_server.cc and http_server.cc.