
Quantization Aware Training and Inference using OpenVINO™ toolkit v1.1

@avidiyal released this on 22 Dec 05:57

This is an end-to-end workflow, stitched together and deployed through Helm using microservices/Docker images that can be built.

This workflow provides the following capabilities (an illustrative sketch of each follows the list):

  1. Quantization Aware Training using Optimum-Intel* [`openvino`, `nncf`]
  2. Inference using Hugging Face Transformers APIs with Optimum-Intel*
  3. Inference using Hugging Face Transformers APIs with Optimum-ONNX Runtime* (OpenVINO™ Execution Provider)
  4. Inference using ONNX Runtime APIs with OpenVINO™ Execution Provider
  5. Inference using OpenVINO™ Model Server
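
For capability 1, a minimal sketch of how QAT can be driven through Optimum-Intel's `OVTrainer` (assuming the `openvino` and `nncf` extras of Optimum are installed). The one-example dataset and dummy span labels are illustrative stand-ins for a real SQuAD preprocessing pipeline, not the workflow's actual configuration:

```python
# QAT sketch with Optimum-Intel's OVTrainer (NNCF under the hood).
from datasets import Dataset
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, default_data_collator
from optimum.intel import OVConfig, OVTrainer, OVTrainingArguments

model_id = "bert-large-uncased-whole-word-masking-finetuned-squad"
model = AutoModelForQuestionAnswering.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tiny placeholder dataset; real QAT would use SQuAD with proper span labeling.
raw = Dataset.from_dict({
    "question": ["What does Intel develop?"],
    "context": ["Intel develops the OpenVINO toolkit."],
})

def preprocess(batch):
    enc = tokenizer(batch["question"], batch["context"],
                    truncation=True, padding="max_length", max_length=64)
    enc["start_positions"] = [0] * len(batch["question"])  # dummy labels for the sketch
    enc["end_positions"] = [0] * len(batch["question"])
    return enc

train_dataset = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

trainer = OVTrainer(
    model=model,
    args=OVTrainingArguments("qat-bert-squad", num_train_epochs=1, do_train=True),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
    ov_config=OVConfig(),  # default NNCF quantization configuration
    task="question-answering",
)
trainer.train()
trainer.save_model()  # saves the quantized model in OpenVINO IR format
```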
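For capability 2, Optimum-Intel's `OVModelForQuestionAnswering` slots directly into the standard Transformers `pipeline`. Depending on the Optimum-Intel version, the conversion flag may be `from_transformers=True` rather than `export=True`; the question and context below are illustrative:

```python
# Transformers-style inference through Optimum-Intel's OpenVINO model class.
from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForQuestionAnswering

model_id = "bert-large-uncased-whole-word-masking-finetuned-squad"
model = OVModelForQuestionAnswering.from_pretrained(model_id, export=True)  # convert to OpenVINO IR
tokenizer = AutoTokenizer.from_pretrained(model_id)

qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(qa(question="What does QAT stand for?",
         context="Quantization Aware Training (QAT) simulates quantization during training."))
```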
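For capability 3, the same Transformers-style API works through Optimum's ONNX Runtime backend with the OpenVINO™ Execution Provider selected (this requires the `onnxruntime-openvino` package; as above, older Optimum versions use `from_transformers=True` instead of `export=True`):

```python
# Optimum-ONNX Runtime inference with the OpenVINO Execution Provider.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

model_id = "bert-large-uncased-whole-word-masking-finetuned-squad"
model = ORTModelForQuestionAnswering.from_pretrained(
    model_id,
    export=True,  # convert the PyTorch checkpoint to ONNX
    provider="OpenVINOExecutionProvider",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(qa(question="Which execution provider is used?",
         context="The OpenVINO Execution Provider accelerates ONNX Runtime on Intel hardware."))
```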
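For capability 4, ONNX Runtime can be used directly, requesting the OpenVINO™ Execution Provider when creating the session. The `model.onnx` path is an illustrative placeholder for a previously exported model:

```python
# Raw ONNX Runtime inference with the OpenVINO Execution Provider.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

session = ort.InferenceSession(
    "model.onnx",  # placeholder path to an exported BERT QA model
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{"device_type": "CPU_FP32"}],
)
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
enc = tokenizer("What is OpenVINO?", "OpenVINO is Intel's inference toolkit.", return_tensors="np")

# Feed only the inputs the exported graph actually declares.
input_names = {i.name for i in session.get_inputs()}
outputs = session.run(None, {k: v for k, v in enc.items() if k in input_names})

# outputs[0] / outputs[1] are the start / end logits for the answer span.
start, end = int(np.argmax(outputs[0])), int(np.argmax(outputs[1]))
print(tokenizer.decode(enc["input_ids"][0][start:end + 1]))
```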
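For capability 5, a deployed OpenVINO™ Model Server instance can be queried over gRPC with the `ovmsclient` package. The server address, port, and model name below are deployment-specific assumptions, not values from this release:

```python
# Querying OpenVINO Model Server over gRPC with ovmsclient.
import numpy as np
from ovmsclient import make_grpc_client
from transformers import AutoTokenizer

client = make_grpc_client("localhost:9000")  # assumed address of the deployed server
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
enc = tokenizer("What is served here?", "An OpenVINO Model Server hosts the QA model.",
                return_tensors="np")

inputs = {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
    "token_type_ids": enc["token_type_ids"].astype(np.int64),
}
results = client.predict(inputs=inputs, model_name="bert-qa")  # "bert-qa" is a placeholder name
print(results)
```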

This is the main release, and it is built using the following components:

  1. Intel* Distribution of OpenVINO™ Toolkit v2022.2
  2. Optimum-Intel* and Optimum-ONNX Runtime

This workflow is tested using the bert-large-uncased-whole-word-masking-finetuned-squad model for the Question Answering use case.
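
As a quick sanity check of the use case, the same checkpoint can be exercised with the stock Transformers pipeline before running it through any of the optimized paths above; the question and context here are illustrative:

```python
# Baseline Question Answering with the tested checkpoint, using plain Transformers.
from transformers import pipeline

qa = pipeline("question-answering",
              model="bert-large-uncased-whole-word-masking-finetuned-squad")
result = qa(question="What is quantization aware training?",
            context="Quantization Aware Training inserts fake-quantization operations "
                    "during fine-tuning so the model stays accurate under INT8 inference.")
print(result["answer"], result["score"])
```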