This is an end-to-end workflow in which the individual microservices are built as Docker images, stitched together, and deployed through Helm.
This workflow provides the following capabilities:
- Quantization Aware Training using Optimum-Intel® [openvino, nncf]
- Inference using Hugging Face Transformers APIs with Optimum-Intel®
- Inference using Hugging Face Transformers APIs with Optimum-ONNX Runtime (OpenVINO™ Execution Provider)
- Inference using ONNX Runtime APIs with OpenVINO™ Execution Provider
- Inference using OpenVINO™ Model Server
This is the main release; it is built using the following components:
- Intel® Distribution of OpenVINO™ Toolkit v2022.2
- Optimum-Intel® and Optimum-ONNX Runtime
This workflow is tested using the `bert-large-uncased-whole-word-masking-finetuned-squad` model for the Question Answering use case.