
OpenVINO™ Model Server 2022.2

Released by @atobiszei on 22 Sep 14:06 · commit d3d28c8

The 2022.2 version is a major release introducing the new OpenVINO backend API (Application Programming Interface).

New features

KServe gRPC API

Besides the TensorFlow Serving API, it is now possible to call the OpenVINO Model Server using the KServe API. The following gRPC methods are implemented: ModelInfer, ModelMetadata, ModelReady, ServerLive, ServerReady, and ServerMetadata.
Inference execution supports input in both the raw_input_contents format and InferTensorContents.

The same clients can be used to connect to the OpenVINO Model Server as to other KServe-compatible model servers. Check the samples using the Triton client library in Python; a minimal sketch follows below.
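Below is a minimal sketch of such a gRPC call using the Triton client library (`pip install tritonclient[grpc]`); the server address, model name, and tensor names are assumptions for illustration:

```python
# Minimal sketch of a KServe gRPC ModelInfer call via the Triton client.
# "localhost:9000", "my_model", and the tensor names are hypothetical.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:9000")

data = np.zeros((1, 3, 224, 224), dtype=np.float32)  # dummy input
inputs = [grpcclient.InferInput("input", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=inputs)
print(result.as_numpy("output"))  # hypothetical output tensor name
```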

KServe REST API – feature preview

In addition to the TensorFlow Serving REST API, the KServe REST API is now also implemented. The following endpoints are functional:

v2
v2/health/live
v2/health/ready
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]/ready
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]/infer

Besides the standard tensor_data input format, the binary extension compatible with the Triton Inference Server is also implemented.

This way, data can be sent as arrays in JSON or as JPEG- or PNG-encoded content.

Check how to connect to the KServe API in the samples using the Triton client library in Python; a minimal REST sketch is shown below.
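As an illustration, here is a minimal sketch of a KServe v2 REST inference request using the standard tensor_data format; the port, model name, and tensor name are assumptions:

```python
# Minimal sketch of a KServe v2 REST infer call; the endpoint details are
# hypothetical and must match your deployment and model metadata.
import requests

payload = {
    "inputs": [
        {
            "name": "input",     # hypothetical input tensor name
            "shape": [1, 10],
            "datatype": "FP32",
            "data": [0.0] * 10,  # flat list of tensor values
        }
    ]
}
resp = requests.post("http://localhost:8000/v2/models/my_model/infer", json=payload)
print(resp.json())
```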

Execution metrics – feature preview

OpenVINO Model Server can now expose metrics in the Prometheus format. Metrics can be enabled in the server configuration file or with a command line parameter.
The following metrics are now available:

ovms_streams 
ovms_current_requests 
ovms_requests_success 
ovms_requests_fail 
ovms_request_time_us 
ovms_inference_time_us 
ovms_wait_for_infer_req_time_us 
ovms_infer_req_queue_size 
ovms_infer_req_active 

Metrics can be integrated with Grafana dashboards or with a horizontal autoscaler.

Learn more about using metrics.
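As a quick illustration, metrics are served in the Prometheus text format on the REST port under /metrics. A minimal sketch of reading them (port 8000 is an assumption, and metrics must be enabled at startup, e.g. with the --metrics_enable parameter):

```python
# Minimal sketch: scrape the Prometheus metrics endpoint and print the
# ovms_requests_success counters. Port 8000 is an assumption.
import requests

text = requests.get("http://localhost:8000/metrics").text
for line in text.splitlines():
    if line.startswith("ovms_requests_success"):
        print(line)
```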

Direct support for PaddlePaddle models

OpenVINO Model Server now includes a PaddlePaddle model importer. Models trained in the PaddlePaddle framework can be deployed directly into the model repository.
Check the demo showing how to deploy and use the ocrnet-hrnet-w48-paddle segmentation model in PaddlePaddle format.
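As a sketch, a PaddlePaddle model placed in the model repository follows the usual name/version directory layout; the model and file names below are assumptions:

```
models/
└── my_paddle_model/              # model name used in requests (hypothetical)
    └── 1/                        # numeric version directory
        ├── inference.pdmodel     # PaddlePaddle graph definition
        └── inference.pdiparams   # PaddlePaddle weights
```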

Performance improvements in DAG execution

In several scenarios, pipeline execution was improved to reduce data copy operations. This results in lower latency and higher overall throughput.

Exemplary custom nodes are included in the OpenVINO Model Server public Docker image.

Previously, deploying pipelines based on the exemplary custom nodes required compiling each custom node and mounting it into the container during deployment. Now, those libraries are added to the public Docker image. Demos including custom nodes now offer an option to use the precompiled version from the image or to build them from source. Check the demo of the horizontal text detection pipeline.

Breaking changes

Changed the startup sequence of the REST/gRPC endpoints relative to the initial loading of the models.

With this version, the model server starts the gRPC and REST endpoints (if enabled) before the models are loaded. Before this change, an active network interface acted as the readiness indicator. Now, server readiness and model readiness can be checked using the dedicated endpoints of the KServe API:

v2/health/ready 
v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/ready 

This makes it easier to monitor the state of models during the initialization phase; a minimal readiness-check sketch follows below.
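For example, a minimal sketch of polling these endpoints over REST during startup (host, port, and model name are assumptions):

```python
# Minimal sketch: poll the KServe readiness endpoints until the server and
# a given model report ready. Port 8000 and "my_model" are hypothetical.
import time
import requests

base = "http://localhost:8000"
while True:
    server = requests.get(f"{base}/v2/health/ready")
    model = requests.get(f"{base}/v2/models/my_model/ready")
    if server.status_code == 200 and model.status_code == 200:
        print("server and model are ready")
        break
    time.sleep(1)
```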

Updated the OpenCV version used in the model server to 4.6.0

This impacts custom node compatibility. Any custom nodes using OpenCV for custom image transformations should be recompiled. Check the recommended process for building custom nodes in a Docker container in our examples.

Bug fixes

  • Minor fixes in logging
  • Fixed configuring warning log level
  • Fixes in documentation
  • Security fixes

You can pull the OpenVINO Model Server public Docker image based on Ubuntu using one of the following commands:
docker pull openvino/model_server:2022.2
docker pull openvino/model_server:2022.2-gpu