This is an end-to-end workflow in which the individual microservices are built as Docker images, stitched together, and deployed through Helm.
This workflow provides the following capabilities:
- Quantization Aware Training using Optimum-Intel® [openvino, nncf]
- Inference using Hugging Face Transformers APIs with Optimum-Intel®
- Inference using Hugging Face Transformers APIs with Optimum-ONNX Runtime (OpenVINO™ Execution Provider)
- Inference using ONNX Runtime APIs with OpenVINO™ Execution Provider
- Inference using OpenVINO™ Model Server
This is the main release; it is built using the following components:
- Intel® Distribution of OpenVINO™ Toolkit v2022.2
- Optimum-Intel® and Optimum-ONNX Runtime
This workflow is tested using the `bert-large-uncased-whole-word-masking-finetuned-squad` model for the Question Answering use case.