# Quantize a Hugging Face Question-Answering Model with OpenVINO

This notebook shows how to quantize a question-answering model with OpenVINO's Neural Network Compression Framework (NNCF). Question-answering models can understand and answer questions based on a given context, such as a paragraph of text or a document.

OpenVINO is a toolkit for accelerated inference on Intel hardware, including CPUs and integrated GPUs. This allows developers to deploy Hugging Face's NLP models in a wide range of scenarios, from small edge devices to large cloud environments.

To install the requirements for this notebook, run `pip install optimum[openvino,nncf] datasets evaluate[evaluator]`.

In [1]:
import os
import random
import sys
import time
import warnings
from functools import partial
from pathlib import Path

import datasets
import evaluate
import numpy as np
import pandas as pd
import torch
import transformers
from evaluate import evaluator
from optimum.intel.openvino import OVConfig, OVModelForQuestionAnswering, OVQuantizer
from transformers import (
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    EvalPrediction,
    TrainingArguments,
    default_data_collator,
    pipeline,
)

from utils.trainer_qa import QuestionAnsweringTrainer
from utils.utils_qa import (
    post_processing_function_qa,
    prepare_train_features,
    prepare_validation_features,
)

transformers.logging.set_verbosity_error()
datasets.logging.set_verbosity_error()



## Settings

We define `MODEL_ID` and `DATASET_NAME`, and the paths for the quantized model files. Set `VERSION_2_WITH_NEGATIVE` to `True` if you use SQuAD v2, which includes questions that do not have an answer.

For this tutorial, we use the [Stanford Question Answering Dataset (SQuAD)](https://huggingface.co/datasets/squad), a reading comprehension dataset, consisting of questions on a set of Wikipedia articles, where the answer to every question is a segment of text from a given context.

In [2]:
MODEL_ID = "csarron/bert-base-uncased-squad-v1"
# When using a different dataset than SQuAD v1, please edit the constants at the top of utils/utils_qa.py
DATASET_NAME = "squad"
VERSION_2_WITH_NEGATIVE = False

base_model_path = Path(f"models/{MODEL_ID}")
fp32_model_path = base_model_path.with_name(base_model_path.name + "_FP32")
int8_ptq_model_path = base_model_path.with_name(base_model_path.name + "_INT8_PTQ")
int8_qat_model_path = base_model_path.with_name(base_model_path.name + "_INT8_QAT")

## Load Model and Tokenizer

We load the model from the Hugging Face Hub. The model will be automatically downloaded if it has not been downloaded before, or loaded from the cache otherwise.

We also load the tokenizer, which converts the questions and contexts from the dataset to tokens: numerical values in the format the model expects.

In [3]:
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# See how the tokenizer converts input text to model input values
print(tokenizer("hello world!"))

{'input_ids': [101, 7592, 2088, 999, 102], 'token_type_ids': [0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1]}


## Preview the Dataset

The `datasets` library makes it easy to load datasets. Common datasets can be loaded from the Hugging Face Hub by providing the name of the dataset; see https://github.com/huggingface/datasets. We load the SQuAD dataset with `load_dataset` and show one item from the training split. Every item in the SQuAD dataset has a unique id, a title that denotes the category, a context, a question, and answers. Each answer is a span of the context; both the text of the answer and its start position in the context (`answer_start`) are returned.

In [4]:
dataset = datasets.load_dataset(DATASET_NAME)
dataset["train"][31415]

{'id': '570e53690b85d914000d7e3c',
 'title': 'Melbourne',
 'context': "Melbourne is experiencing high population growth, generating high demand for housing. This housing boom has increased house prices and rents, as well as the availability of all types of housing. Subdivision regularly occurs in the outer areas of Melbourne, with numerous developers offering house and land packages. However, after 10 years[when?] of planning policies to encourage medium-density and high-density development in existing areas with greater access to public transport and other services, Melbourne's middle and outer-ring suburbs have seen significant brownfields redevelopment.",
 'question': 'What effect has the housing boom had on house prices and rents?',
 'answers': {'text': ['increased'], 'answer_start': [108]}}
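
The relation between `context`, `text` and `answer_start` can be checked with plain string slicing. The sketch below uses the first sentences of the item shown above; the helper name `answer_spans` is made up for this example.

```python
# First sentences of the context shown above (shortened for brevity).
context = (
    "Melbourne is experiencing high population growth, generating high demand "
    "for housing. This housing boom has increased house prices and rents, as "
    "well as the availability of all types of housing."
)
answers = {"text": ["increased"], "answer_start": [108]}


def answer_spans(context, answers):
    """Return the context substring that each (text, answer_start) pair points to."""
    return [
        context[start : start + len(text)]
        for text, start in zip(answers["text"], answers["answer_start"])
    ]


spans = answer_spans(context, answers)
print(spans)
```

`answer_start` is a character offset, not a token index, which is why the preprocessing functions need the tokenizer's offset mapping to convert it to token positions.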

## Post Training Quantization

For post-training quantization (PTQ) we start with a Hugging Face AutoModel, in this case AutoModelForQuestionAnswering. The quantizer also needs a dataset. 

To quantize a model with post-training quantization, we define an `OVQuantizer`, attach a dataset, and call the `quantize` method. That's all!

### Prepare the Dataset

We need a representative dataset to quantize the model. The model was trained on SQuAD, a large dataset with a wide variety of questions and answers, and it generalizes pretty well to questions and contexts it has never seen before. For production use, you would fine-tune the model with questions and contexts specific to your domain. In this notebook, we use a subset of the SQuAD dataset for demonstration purposes. We chose the _Super Bowl 50_ category from the validation subset of SQuAD because it has a large number of questions.

Post-training quantization does not need a training and validation dataset, but we define these splits here to allow doing quantization-aware training later in this notebook, and to make sure we're using the same dataset split for validation.

In [5]:
def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["question"], examples["context"], padding=True, truncation=True, max_length=384)

In [6]:
NUM_TRAIN_ITEMS = 600
filtered_examples = dataset["validation"].filter(lambda x: x["title"].startswith("Super_Bowl_50"))
train_examples = filtered_examples.select(range(0, NUM_TRAIN_ITEMS))
train_dataset = train_examples.map(lambda x: preprocess_fn(x, tokenizer), batched=True)

validation_examples = filtered_examples.select(range(NUM_TRAIN_ITEMS, len(filtered_examples)))
validation_dataset = validation_examples.map(lambda x: preprocess_fn(x, tokenizer), batched=True)

### Quantize the Model with Post Training Quantization

In [7]:
# Hide PyTorch warnings about missing shape inference
warnings.simplefilter("ignore")

# Quantize the model
quantizer = OVQuantizer.from_pretrained(model)
quantizer.quantize(calibration_dataset=train_dataset, save_directory=int8_ptq_model_path)
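
To give an intuition for what quantization does to the values in a model, the sketch below maps floating-point numbers to 8-bit integers with a single symmetric scale derived from calibration values. It is a simplified illustration in plain Python, not the algorithm that NNCF or `OVQuantizer` applies; the helper names and the sample values are made up for this example.

```python
def symmetric_scale(calibration_values, num_bits=8):
    """Scale that maps the largest calibration magnitude to the largest integer level."""
    max_abs = max(abs(v) for v in calibration_values)
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    return max_abs / q_max


def quantize(values, scale, num_bits=8):
    """Round each value to the nearest integer level and clip to the int8 range."""
    q_max = 2 ** (num_bits - 1) - 1
    return [max(-q_max, min(q_max, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer levels back to approximate floating-point values."""
    return [q * scale for q in q_values]


# Values observed on "calibration" data determine the representable range.
calibration = [-1.27, -0.4, 0.03, 0.6, 1.0]
scale = symmetric_scale(calibration)

weights = [0.1234, -0.5678, 0.9]
q = quantize(weights, scale)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(f"scale={scale:.4f}, max rounding error={max_error:.4f}")
```

For values inside the calibrated range, the rounding error is at most half a quantization step. This is why a representative calibration dataset matters: values outside the observed range are clipped.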

### Show accuracy difference

We load the quantized model and the original FP32 model, and compare the metrics on both models. The [evaluate](https://github.com/huggingface/evaluate) library makes it very easy to evaluate models on a given dataset, with a given metric. For the SQuAD dataset, an F1 score and exact_match metric are returned.

For loading the quantized model with OpenVINO, we use `OVModelForQuestionAnswering`. It can be used in the same way as [`AutoModelForQuestionAnswering`](https://huggingface.co/docs/transformers/main/model_doc/auto).

The evaluator is called with a [Pipeline](https://huggingface.co/docs/transformers/main/en/pipeline_tutorial) which we will also use later on to show inference.

In [8]:
quantized_model_ptq = OVModelForQuestionAnswering.from_pretrained(int8_ptq_model_path)
original_model = AutoModelForQuestionAnswering.from_pretrained(MODEL_ID)
ov_qa_pipeline_ptq = pipeline("question-answering", model=quantized_model_ptq, tokenizer=tokenizer)
hf_qa_pipeline = pipeline("question-answering", model=original_model, tokenizer=tokenizer)

squad_eval = evaluator("question-answering")

ov_eval_results = squad_eval.compute(
    model_or_pipeline=ov_qa_pipeline_ptq,
    data=validation_examples,
    metric="squad",
    squad_v2_format=VERSION_2_WITH_NEGATIVE,
)

hf_eval_results = squad_eval.compute(
    model_or_pipeline=hf_qa_pipeline,
    data=validation_examples,
    metric="squad",
    squad_v2_format=VERSION_2_WITH_NEGATIVE,
)
pd.DataFrame.from_records(
    [hf_eval_results, ov_eval_results], columns=["exact_match", "f1"], index=["FP32", "INT8 PTQ"]
).round(2)

| | exact_match | f1 |
|---|---|---|
| FP32 | 82.86 | 86.33 |
| INT8 PTQ | 82.86 | 87.42 |


### Compare model size

Quantization reduces the size of the model by up to four times. We save the FP32 PyTorch model and define a function to show the model size for the PyTorch and OpenVINO models.

In [9]:
def get_model_size(model_folder, framework):
    """Return OpenVINO or PyTorch model size in MB"""
    if framework == "openvino":
        # An OpenVINO IR model consists of an .xml (topology) and a .bin (weights) file
        model_path = Path(model_folder) / "openvino_model.xml"
        model_size = model_path.stat().st_size + model_path.with_suffix(".bin").stat().st_size
    elif framework == "pytorch":
        model_path = Path(model_folder) / "pytorch_model.bin"
        model_size = model_path.stat().st_size
    else:
        raise ValueError(f"Unknown framework: {framework}")
    model_size /= 1024 * 1024
    return model_size

In [10]:
model.save_pretrained(fp32_model_path)
get_model_size(fp32_model_path, "pytorch") / get_model_size(int8_ptq_model_path, "openvino")

2.3905629046695402
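
The ratio is well below four. A rough back-of-the-envelope estimate suggests why. The sketch below counts the parameters of a BERT-base question-answering model from its configuration values (`hidden_size` 768, `intermediate_size` 3072, 12 layers, `vocab_size` 30522) and assumes, purely for illustration, that the embedding tables stay in FP32 (4 bytes per value) while all other parameters are stored as INT8 (1 byte per value). That split is an assumption, not something measured from the saved files, and the estimate ignores quantization parameters and the size of the IR `.xml` file.

```python
# BERT-base dimensions from the model configuration.
VOCAB, HIDDEN, MAX_POS, TYPE_VOCAB = 30522, 768, 512, 2
LAYERS, INTERMEDIATE = 12, 3072


def linear(n_in, n_out):
    """Number of parameters in a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


# Word, position and token-type embeddings plus one LayerNorm (scale and shift).
embeddings = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN + 2 * HIDDEN

per_layer = (
    4 * linear(HIDDEN, HIDDEN)      # query, key, value and attention output
    + 2 * HIDDEN                    # LayerNorm after attention
    + linear(HIDDEN, INTERMEDIATE)  # feed-forward up-projection
    + linear(INTERMEDIATE, HIDDEN)  # feed-forward down-projection
    + 2 * HIDDEN                    # LayerNorm after feed-forward
)
encoder = LAYERS * per_layer
qa_head = linear(HIDDEN, 2)         # start and end logits

total = embeddings + encoder + qa_head
fp32_bytes = 4 * total
# Assumption: embeddings stay FP32, everything else becomes INT8.
mixed_bytes = 4 * embeddings + 1 * (encoder + qa_head)

print(f"parameters: {total:,}")
print(f"estimated size ratio: {fp32_bytes / mixed_bytes:.2f}")
```

Under these assumptions the estimate lands close to the measured ratio, because the embedding tables account for roughly a fifth of the parameters and are not compressed.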

## Quantization Aware Training

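Post-training quantization worked well on this validation subset: exact_match was unchanged compared to the FP32 model. On other models or datasets, it can cause an accuracy drop of a few percentage points. Quantization-aware training (QAT) integrates quantization in the training loop: quantization is simulated during the forward pass, so the loss reflects the quantization error and the weights can adapt to it, which can reduce the accuracy drop in the resulting model.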
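
A minimal sketch of how training can account for quantization: the forward pass uses a weight that has been snapped to the quantization grid ("fake quantization"), so the loss reflects the quantization error, while the update is applied to the underlying floating-point weight as if rounding were the identity function (a straight-through estimator). This toy example fits a single weight in plain Python. It is an illustration of the general idea, not NNCF's implementation, and the grid step, data and learning rate are arbitrary choices.

```python
def fake_quantize(w, step=0.25):
    """Snap w to the nearest multiple of `step`, keeping it a float."""
    return round(w / step) * step


# Toy regression data: y = 0.37 * x, so the ideal weight is not on the grid.
xs = [1.0, 2.0, 3.0]
ys = [0.37 * x for x in xs]


def loss_and_grad(w_q):
    """Mean squared error and its gradient with respect to the quantized weight."""
    n = len(xs)
    loss = sum((w_q * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * (w_q * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad


w = 0.0  # latent floating-point weight that is actually trained
lr = 0.05
initial_loss, _ = loss_and_grad(fake_quantize(w))

for _ in range(50):
    w_q = fake_quantize(w)           # forward pass sees the quantized weight
    loss, grad = loss_and_grad(w_q)
    w -= lr * grad                   # straight-through: gradient applied to w

final_w_q = fake_quantize(w)
final_loss, _ = loss_and_grad(final_w_q)
print(f"quantized weight: {final_w_q}, loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The latent weight ends up hovering around the rounding boundary between the two grid points nearest to 0.37, so the quantized weight is one of those two neighbours. The model learns the best it can do under the quantization constraint instead of being rounded after the fact.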



### Prepare data for QuestionAnsweringTrainer

The QuestionAnsweringTrainer expects the data to be formatted in a specific way. We use the train and validation examples from the post training quantization example, and map them to a dataset formatted for quantization aware training.

The `prepare_train_features`, `prepare_validation_features` and `post_processing_function_qa` functions were adapted from the [Question Answering example script](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino/question-answering). Please check out the [utils_qa.py](utils/utils_qa.py) file to see how they are defined.

In [11]:
train_dataset_qat = train_examples.map(
    lambda x: prepare_train_features(x, tokenizer, True),
    batched=True,
    remove_columns=train_examples.column_names,
    load_from_cache_file=True,  # not data_args.overwrite_cache,
    desc="Running tokenizer on train dataset",
)

validation_dataset_qat = validation_examples.map(
    lambda x: prepare_validation_features(x, tokenizer, True),
    batched=True,
    remove_columns=validation_examples.column_names,
    load_from_cache_file=True,  # not data_args.overwrite_cache,
    desc="Running tokenizer on validation dataset",
)

### Quantize the Model with Quantization Aware Training

For quantization aware training, we create a QuestionAnsweringTrainer. This Trainer is defined in [trainer_qa.py](utils/trainer_qa.py) and taken from the [Question Answering example](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino/question-answering). It is a modified version of the standard Hugging Face [QuestionAnsweringTrainer](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) that adds quantization aware training with [NNCF](https://github.com/openvinotoolkit/nncf/). See the Hugging Face [Trainer documentation](https://huggingface.co/docs/transformers/main_classes/trainer) for more information on the Trainer class.

Apart from the standard training arguments, the QuestionAnsweringTrainer for NNCF requires an `ov_config` parameter with quantization settings. The default `OVConfig()` settings should work well for many cases. For more information about modifying the settings, or to understand what the settings mean, please refer to the [NNCF quantization documentation](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).

In [12]:
# Show the quantization configuration
OVConfig().compression

{'algorithm': 'quantization',
 'preset': 'mixed',
 'overflow_fix': 'disable',
 'initializer': {'range': {'num_init_samples': 300, 'type': 'mean_min_max'},
  'batchnorm_adaptation': {'num_bn_adaptation_samples': 0}},
 'scope_overrides': {'activations': {'{re}.*matmul_0': {'mode': 'symmetric'}}},
 'ignored_scopes': ['{re}.*Embeddings.*',
  '{re}.*__add___[0-1]',
  '{re}.*layer_norm_0',
  '{re}.*matmul_1',
  '{re}.*__truediv__*'],
 'export_to_onnx_standard_ops': False}
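
Entries in `ignored_scopes` that start with `{re}` are regular expressions that select operations by name; matching operations are left unquantized. The sketch below shows the idea with Python's `re` module. The scope names are hypothetical, shortened examples, and the use of `re.fullmatch` is an assumption about the matching rule rather than a statement about NNCF internals.

```python
import re

ignored_scopes = [
    "{re}.*Embeddings.*",
    "{re}.*__add___[0-1]",
    "{re}.*layer_norm_0",
    "{re}.*matmul_1",
    "{re}.*__truediv__*",
]


def is_ignored(scope_name, patterns):
    """Return True if scope_name matches any pattern (regex if prefixed with {re})."""
    prefix = "{re}"
    for pattern in patterns:
        if pattern.startswith(prefix):
            if re.fullmatch(pattern[len(prefix):], scope_name):
                return True
        elif pattern == scope_name:
            return True
    return False


# Hypothetical scope names, for illustration only.
examples = [
    "BertModel[bert]/BertEmbeddings[embeddings]/embedding_0",
    "BertSelfAttention[self]/matmul_0",
    "BertSelfAttention[self]/matmul_1",
    "BertSelfOutput[output]/__add___0",
]
decisions = {name: is_ignored(name, ignored_scopes) for name in examples}
for name, ignored in decisions.items():
    print(f"{'skip    ' if ignored else 'quantize'}  {name}")
```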

In [13]:
def compute_metrics(p: EvalPrediction):
    """Helper function for metric computation in training loop"""
    return metric.compute(predictions=p.predictions, references=p.label_ids)


metric = evaluate.load("squad_v2" if VERSION_2_WITH_NEGATIVE else "squad")
ov_config = OVConfig()

model = AutoModelForQuestionAnswering.from_pretrained(MODEL_ID)


trainer = QuestionAnsweringTrainer(
    model=model,
    ov_config=ov_config,
    feature="question-answering",
    args=TrainingArguments(int8_qat_model_path, num_train_epochs=2.0, do_train=True, do_eval=True),
    train_dataset=train_dataset_qat,
    eval_dataset=validation_dataset_qat,
    eval_examples=validation_examples,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    post_process_function=post_processing_function_qa,
    data_collator=default_data_collator,
)
train_result = trainer.train()
trainer.save_model()

Configuration saved in models/csarron/bert-base-uncased-squad-v1_INT8_QAT/config.json


{'train_runtime': 391.9701, 'train_samples_per_second': 3.164, 'train_steps_per_second': 0.398, 'train_loss': 0.7960657951159354, 'epoch': 2.0}


Model weights saved in models/csarron/bert-base-uncased-squad-v1_INT8_QAT/pytorch_model.bin
tokenizer config file saved in models/csarron/bert-base-uncased-squad-v1_INT8_QAT/tokenizer_config.json
Special tokens file saved in models/csarron/bert-base-uncased-squad-v1_INT8_QAT/special_tokens_map.json


### Show accuracy difference

We use the same evaluator as we did for post-training quantization, and show the results of all three models (FP32, INT8 PTQ and INT8 QAT).

In [14]:
squad_eval = evaluator("question-answering")

quantized_model_qat = OVModelForQuestionAnswering.from_pretrained(int8_qat_model_path)
ov_qa_pipeline_qat = pipeline("question-answering", model=quantized_model_qat, tokenizer=tokenizer)
ov_eval_results_qat = squad_eval.compute(
    model_or_pipeline=ov_qa_pipeline_qat,
    data=validation_examples,
    metric="squad",
    squad_v2_format=VERSION_2_WITH_NEGATIVE,
)
eval_results = [hf_eval_results, ov_eval_results, ov_eval_results_qat]
df = pd.DataFrame.from_records(eval_results, columns=["exact_match", "f1"], index=["FP32", "INT8 PTQ", "INT8 QAT"])
df.round(2)

loading configuration file models/csarron/bert-base-uncased-squad-v1_INT8_QAT/config.json
Model config BertConfig {
  "_name_or_path": "models/csarron/bert-base-uncased-squad-v1_INT8_QAT/config.json",
  "architectures": [
    "NNCFNetwork"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.25.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}



| | exact_match | f1 |
|---|---|---|
| FP32 | 82.86 | 86.33 |
| INT8 PTQ | 82.86 | 87.42 |
| INT8 QAT | 79.05 | 83.4 |


## Show inference

Hugging Face pipelines simplify inference on a model. A pipeline is created by passing a task, model and tokenizer to the `pipeline` function. Inference is then as simple as `qa_pipeline("question", "context")`.

We created three pipelines earlier in this notebook: `hf_qa_pipeline`, `ov_qa_pipeline_ptq` and `ov_qa_pipeline_qat` for the FP32 Hugging Face model and the INT8 PTQ and QAT models.

In [15]:
context = validation_examples[200]["context"]
question = "Who won the game?"
print(context)

Super Bowl 50 featured numerous records from individuals and teams. Denver won despite being massively outgained in total yards (315 to 194) and first downs (21 to 11). Their 194 yards and 11 first downs were both the lowest totals ever by a Super Bowl winning team. The previous record was 244 yards by the Baltimore Ravens in Super Bowl XXXV. Only seven other teams had ever gained less than 200 yards in a Super Bowl, and all of them had lost. The Broncos' seven sacks tied a Super Bowl record set by the Chicago Bears in Super Bowl XX. Kony Ealy tied a Super Bowl record with three sacks. Jordan Norwood's 61-yard punt return set a new record, surpassing the old record of 45 yards set by John Taylor in Super Bowl XXIII. Denver was just 1-of-14 on third down, while Carolina was barely better at 3-of-15. The two teams' combined third down conversion percentage of 13.8 was a Super Bowl low. Manning and Newton had quarterback passer ratings of 56.6 and 55.4, respectively, and their added total

In [16]:
hf_qa_pipeline(question, context)["answer"]

'Denver'

In [17]:
ov_qa_pipeline_ptq(question, context)["answer"]

'Denver'

In [18]:
ov_qa_pipeline_qat(question, context)["answer"]

'Denver'

## Compare Inference of FP32 and INT8 models

Metrics like exact match and F1 score summarize model quality, but it is also useful to look at individual predictions. In the next cell, we go over the items in the validation set and display those where the F1 score of the FP32 prediction differs from that of the INT8 prediction. In this example we compare the FP32 model with the QAT model; it can also be insightful to compare the PTQ model and the QAT model.

The results show that for some predictions, the FP32 model is better, but for others, the INT8 model is.

In [19]:
results = []
for item in validation_examples:
    id, title, context, question, answers = item.values()
    fp32_answer = hf_qa_pipeline(question, context)["answer"]
    int8_answer = ov_qa_pipeline_qat(question, context)["answer"]

    references = [{"id": id, "answers": answers}]
    fp32_predictions = [{"id": id, "prediction_text": fp32_answer}]
    int8_predictions = [{"id": id, "prediction_text": int8_answer}]

    fp32_score = round(metric.compute(references=references, predictions=fp32_predictions)["f1"], 2)
    int8_score = round(metric.compute(references=references, predictions=int8_predictions)["f1"], 2)

    if int8_score != fp32_score:
        results.append((question, answers["text"], fp32_answer, fp32_score, int8_answer, int8_score))

pd.set_option("display.max_colwidth", None)
pd.DataFrame(
    results,
    columns=["Question", "Answer", "FP32 answer", "FP32 F1", "INT8 answer", "INT8 F1"],
)

| | Question | Answer | FP32 answer | FP32 F1 | INT8 answer | INT8 F1 |
|---|---|---|---|---|---|---|
| 0 | Which company won a contest to have their ad shown for free during Super Bowl 50? | [Death Wish Coffee, Death Wish Coffee, Death Wish Coffee] | QuickBooks | 0.0 | Death Wish Coffee | 100.0 |
| 1 | What company paid for a Super Bowl 50 ad to show a trailer of X-Men: Apocalypse? | [Fox, Fox, Disney] | 20th Century Fox, Lionsgate | 40.0 | Paramount Pictures, Universal Studios and Walt Disney Studios | 22.22 |
| 2 | Who handled the play-by-play for WBT? | [Mick Mixon, Mick Mixon, Mick Mixon] | Mick Mixon | 100.0 | Dave Logan | 0.0 |
| 3 | How many of the prior Super Bowl MVPs appeared together at the pregame show? | [39, 39, 39] | 39 | 100.0 | 39 of the 43 | 50.0 |
| 4 | At what Super Bowl did Beyoncé headline the halftime show? | [Super Bowl XLVII, Super Bowl XLVII, XLVII] | Super Bowl XLVII | 100.0 | Super Bowl XLVII halftime show | 75.0 |
| 5 | What previous Super Bowl halftime show did Bruno Mars headline? | [Super Bowl XLVIII, Super Bowl XLVIII, XLVIII] | Super Bowl XLVIII | 100.0 | Super Bowl XLVIII halftime show | 75.0 |
| 6 | Who was at the receiving end of a 22-yard pass from Peyton Manning? | [Andre Caldwell, Andre Caldwell, Caldwell] | Andre Caldwell | 100.0 | Owen Daniels | 0.0 |
| 7 | How many yards was the pass on the first drive? | [18, 18, 22-yard] | 18 | 100.0 | 20 | 0.0 |
| 8 | What year was the last time a fumble return touchdown like this occurred? | [1993, 1993, 1993] | 1993 | 100.0 | 1993 season. | 66.67 |
| 9 | How many passing yards did Cam Newton get for his 4 of 4 passes? | [51, 51, 51] | 51 | 100.0 | 51 yards | 66.67 |
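
The partial F1 scores in the table above follow from how the SQuAD metric compares answers: both strings are normalized (lowercased, punctuation and articles removed), and F1 is computed over the overlapping tokens, taking the best score over all reference answers. The sketch below reimplements that logic in plain Python for illustration; it follows the commonly used SQuAD v1.1 evaluation approach, but it is not the code that `evaluate` runs, and the function names are our own.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction, reference):
    """Token-level F1 between a predicted answer and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    num_same = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction, reference):
    """True if the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(reference)


def best_f1(prediction, references):
    """Score against every reference answer and keep the best result, in percent."""
    return round(100 * max(f1_score(prediction, ref) for ref in references), 2)


print(best_f1("39 of the 43", ["39", "39", "39"]))
print(best_f1("51 yards", ["51", "51", "51"]))
print(best_f1("Super Bowl XLVII halftime show", ["Super Bowl XLVII", "Super Bowl XLVII", "XLVII"]))
```

An answer such as "51 yards" contains the correct number but one extra token, so recall is perfect while precision is one half. That explains why several INT8 answers in the table score below 100 even though they are arguably correct.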


## Benchmark the Quantized and Original Model

Compare the inference speed of the quantized OpenVINO IR model with that of the original PyTorch model.

OpenVINO models can optionally be used with static shapes, which can improve inference speed.

In [20]:
def benchmark(model, static_shapes=True):
    """ """
    transformers.logging.set_verbosity_error()

    kwargs = {}
    if static_shapes:
        if model.base_model_prefix == "openvino":
            model.reshape(1, 256)
            model.compile()
        kwargs = {"max_seq_len": 256, "padding": "max_length", "truncation": True}

    qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer, **kwargs)

    ds = datasets.load_dataset("squad", split="validation[:300]")
    latencies = []
    for i, item in enumerate(ds):
        start_time = time.perf_counter()
        results = qa_pipeline({"question": item["question"], "context": item["context"]})
        end_time = time.perf_counter()
        latencies.append(end_time - start_time)

    return np.median(latencies) * 1000


quantized_model = OVModelForQuestionAnswering.from_pretrained(int8_qat_model_path)
original_model = AutoModelForQuestionAnswering.from_pretrained(MODEL_ID)
# original_model = OVModelForQuestionAnswering.from_pretrained("bert-base-fp32")


original_latency = benchmark(original_model, static_shapes=True)
quantized_latency = benchmark(quantized_model, static_shapes=True)

print(f"Latency of original FP32 model: {original_latency:.2f} ms")
print(f"Latency of quantized model: {quantized_latency:.2f} ms")
print(f"Speedup: {(original_latency/quantized_latency):.2f}x")

loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--csarron--bert-base-uncased-squad-v1/snapshots/a39235c4e278ec8b420b46ee11a1ed1a432a7

Latency of original FP32 model: 38.71 ms
Latency of quantized model: 17.01 ms
Speedup: 2.28x
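
The benchmark reports the median latency instead of the mean. The sketch below, using only the standard library, shows why: a single slow outlier (for example a first call that includes compilation or cache warm-up) shifts the mean a lot but barely affects the median. The helper names, the warm-up parameter and the stand-in workload are assumptions made for this example and are not part of the notebook's `benchmark` function.

```python
import statistics
import time


def measure_latencies_ms(fn, inputs, warmup=2):
    """Time fn on every input after a few untimed warm-up calls."""
    for item in inputs[:warmup]:
        fn(item)
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def summarize(latencies):
    """Median and mean of a list of latencies."""
    return {"median": statistics.median(latencies), "mean": statistics.mean(latencies)}


# One outlier of 200 ms among otherwise similar measurements.
sample = [10.0, 11.0, 12.0, 11.0, 200.0]
summary = summarize(sample)
print(summary)

# Stand-in workload; in the notebook this would be a call to the QA pipeline.
measured = measure_latencies_ms(lambda n: sum(range(n)), [10_000] * 20)
print(f"median of stand-in workload: {statistics.median(measured):.3f} ms")
```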


In [21]:
original_latency = benchmark(original_model, static_shapes=False)
quantized_latency = benchmark(quantized_model, static_shapes=False)

print(f"Latency of original FP32 model: {original_latency:.2f} ms")
print(f"Latency of quantized model: {quantized_latency:.2f} ms")
print(f"Speedup: {(original_latency/quantized_latency):.2f}x")

Latency of original FP32 model: 27.69 ms
Latency of quantized model: 23.70 ms
Speedup: 1.17x
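
With `static_shapes=True`, the benchmark reshapes the OpenVINO model to a fixed input of 1 x 256 tokens and asks the pipeline to pad or truncate every input to that length. The sketch below illustrates what padding to a fixed length means for the model inputs. It is a simplified stand-in for what the tokenizer does (for example, real truncation keeps the final separator token), and the function name is made up for this example.

```python
def pad_or_truncate(input_ids, max_length, pad_token_id=0):
    """Return input_ids and attention_mask with exactly max_length entries."""
    ids = input_ids[:max_length]
    num_padding = max_length - len(ids)
    return {
        "input_ids": ids + [pad_token_id] * num_padding,
        # 1 marks real tokens, 0 marks padding that the model should ignore.
        "attention_mask": [1] * len(ids) + [0] * num_padding,
    }


# Token ids of "hello world!" as printed by the tokenizer earlier in this notebook.
encoded = pad_or_truncate([101, 7592, 2088, 999, 102], max_length=8)
print(encoded)
```

Every input then costs as much as a full-length input, but the runtime can optimize for a single known shape. In the two runs above, fixed-length inputs made the PyTorch model slower (38.71 ms versus 27.69 ms), while the quantized OpenVINO model became faster (17.01 ms versus 23.70 ms).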


In [22]:
ov_config.compression["overflow_fix"]

'disable'

In [23]:
import nncf
from openvino.runtime import get_version
nncf.__version__, get_version()

('2.3.0', '2022.3.0-8831-4f0b846d1a5')

In [24]:
%pip show nncf

Name: nncf
Version: 2.3.0.dev0+f1d8c26
Summary: Neural Networks Compression Framework
Home-page: https://github.com/openvinotoolkit/nncf
Author: Intel
Author-email: alexander.kozlov@intel.com
License: UNKNOWN
Location: /home/ubuntu/venvs/optimum_wiml_env/lib/python3.8/site-packages
Requires: addict, jsonschema, jstyleson, matplotlib, natsort, networkx, ninja, numpy, openvino-telemetry, pandas, pillow, pydot, pymoo, pyparsing, scikit-learn, scipy, texttable, tqdm, wheel
Required-by: 
Note: you may need to restart the kernel to use updated packages.


In [25]:
%pip show openvino

Name: openvino
Version: 2022.3.0.dev20221125
Summary: OpenVINO(TM) Runtime
Home-page: https://docs.openvino.ai/nightly/index.html
Author: Intel Corporation
Author-email: openvino_pushbot@intel.com
License: OSI Approved :: Apache Software License
Location: /home/ubuntu/venvs/optimum_wiml_env/lib/python3.8/site-packages
Requires: numpy
Required-by: 
Note: you may need to restart the kernel to use updated packages.
