## OpenVINO optimizations for the fill-mask (masked language modeling) task


In [29]:
# Install openvino-optimum if not installed already
! pip install openvino-optimum

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
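The warning above names an environment variable that silences it. A minimal sketch, assuming the setting should apply to the whole notebook process and that `"false"` is the value you want (parallel tokenization is then disabled):

```python
import os

# Set this before `tokenizers`/`transformers` are used, so the value is
# already in place when the notebook forks to run `!` shell commands.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Placing this in the first cell of the notebook, ahead of the imports, should keep the message from appearing in later cell outputs.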


## Import the packages needed for successful execution

In [30]:
from transformers import AutoConfig, AutoTokenizer
from optimum.intel.openvino import OVAutoModelWithLMHead



### Instructions on conversion to OpenVINO
We will use the OpenVINO™ Integration with Optimum module to convert the BERT-Base, Multilingual Uncased model to an OpenVINO model object.

In [31]:
model_name = 'bert-base-multilingual-uncased'
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
ov_model = OVAutoModelWithLMHead.from_pretrained(model_name, config=config, from_pt=True)
ov_model.save_pretrained('bert-base-multilingual_OV_IR')



Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model from PT file, inputs: None


Exception raised from index_select_out_cpu_ at ../aten/src/ATen/native/TensorAdvancedIndexing.cpp:758 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fc46f94c302 in /home/dkarkada/miniconda3/envs/optimumtests/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: at::native::index_select_out_cpu_(at::Tensor const&, long, at::Tensor const&, at::Tensor&) + 0x2a9 (0x7fc45e2adb89 in /home/dkarkada/miniconda3/envs/optimumtests/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #2: at::native::index_select_cpu_(at::Tensor const&, long, at::Tensor const&) + 0x60 (0x7fc45e2b05e0 in /home/dkarkada/miniconda3/envs/optimumtests/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x19acf62 (0x7fc45ea25f62 in /home/dkarkada/miniconda3/envs/optimumtests/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: at::redispatch::index_select(c10::DispatchKeySet, at::Tensor const&, long, at::Tensor 

510 (0x561117179eb0 in /home/dkarkada/miniconda3/envs/optimumtests/bin/python)
frame #23: _PyEval_EvalCodeWithName + 0x260 (0x56111716e600 in /home/dkarkada/miniconda3/envs/optimumtests/bin/python)
frame #24: _PyFunction_Vectorcall + 0x534 (0x56111716fb64 in /home/dkarkada/miniconda3/envs/optimumtests/bin/python)
frame #25: _PyEval_EvalFrameDefault + 0x4f83 (0x56111717d923 in /home/dkarkada/miniconda3/envs/optimumtests/bin/python)
frame #26: _PyEval_EvalCodeWithName + 0x260 (0x56111716e600 in /home/dkarkada/miniconda3/envs/optimumtests/bin/python)
frame #27: _PyFunction_Vectorcall + 0x594 (0x56111716fbc4 in /home/dkarkada/miniconda3/envs/optimumtests/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x1510 (0x561117179eb0 in /home/dkarkada/miniconda3/envs/optimumtests/bin/python)
frame #29: _PyEval_EvalCodeWithName + 0x260 (0x56111716e600 in /home/dkarkada/miniconda3/envs/optimumtests/bin/python)
frame #30: _PyFunction_Vectorcall + 0x534 (0x56111716fb64 in /home/dkarkada/miniconda3/env
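Since the conversion output above includes a PyTorch trace, a quick check that the IR files were actually written may be useful before moving on. The sketch below is not part of the original notebook. The helper name `ir_files` is hypothetical, and it assumes the usual OpenVINO IR layout in which the weights file sits next to the XML file with the same base name and a `.bin` suffix.

```python
from pathlib import Path


def ir_files(xml_path):
    """Return (xml, bin) paths of an OpenVINO IR, or raise if either is missing."""
    xml = Path(xml_path)
    # Assumption: the weights file shares the XML file's base name.
    weights = xml.with_suffix(".bin")
    missing = [str(p) for p in (xml, weights) if not p.is_file()]
    if missing:
        raise FileNotFoundError("Missing IR files: " + ", ".join(missing))
    return xml, weights
```

After `save_pretrained` has run, calling `ir_files('bert-base-multilingual_OV_IR/ov_model.xml')` should return both paths, or raise `FileNotFoundError` if the export did not complete.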

### Evaluate the model by comparing to the results on the HF model card: https://huggingface.co/bert-base-multilingual-uncased

In [32]:
from transformers import pipeline
unmasker = pipeline('fill-mask', model=ov_model, tokenizer=tokenizer)
unmasker("Hello I'm a [MASK] model.")


[{'score': 0.1507755070924759,
  'token': 11397,
  'token_str': 'top',
  'sequence': "hello i'm a top model."},
 {'score': 0.1307542622089386,
  'token': 23589,
  'token_str': 'fashion',
  'sequence': "hello i'm a fashion model."},
 {'score': 0.03627277538180351,
  'token': 12050,
  'token_str': 'good',
  'sequence': "hello i'm a good model."},
 {'score': 0.035954684019088745,
  'token': 10246,
  'token_str': 'new',
  'sequence': "hello i'm a new model."},
 {'score': 0.028643080964684486,
  'token': 11838,
  'token_str': 'great',
  'sequence': "hello i'm a great model."}]
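To make the comparison with the model card less manual, the two prediction lists can be checked programmatically. The helper below is a sketch and not part of the original notebook. `compare_predictions` and its `score_tol` parameter are hypothetical names, and it assumes both inputs are non-empty lists in the fill-mask output format shown above, sorted by descending score.

```python
def compare_predictions(candidate, reference, score_tol=1e-2):
    """Compare two fill-mask outputs (lists of dicts with 'token_str' and 'score')."""
    cand = {p["token_str"]: p["score"] for p in candidate}
    ref = {p["token_str"]: p["score"] for p in reference}
    # Tokens predicted by both models, and the largest score gap among them.
    shared = sorted(cand.keys() & ref.keys())
    max_diff = max((abs(cand[t] - ref[t]) for t in shared), default=0.0)
    return {
        "same_top1": candidate[0]["token_str"] == reference[0]["token_str"],
        "shared_tokens": shared,
        "max_score_diff": max_diff,
        # True only if every reference token was found and all scores are close.
        "within_tol": len(shared) == len(ref) and max_diff <= score_tol,
    }
```

The reference list could be copied from the model card, or produced by running the same prompt through a standard `transformers` pipeline built with `pipeline('fill-mask', model=model_name)`. The default tolerance is an arbitrary illustration, not a recommended threshold.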

### Benchmark the converted model using the benchmark app
The OpenVINO toolkit provides a benchmarking application that measures the platform-specific runtime performance a given model can achieve under optimal configuration parameters. For more details, refer to: https://docs.openvino.ai/latest/openvino_inference_engine_tools_benchmark_tool_README.html
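Outside a notebook, where the `!` shell syntax is unavailable, the same invocation can be assembled in plain Python. The sketch below only builds the argument list, using the flags from the notebook's own command. The helper name `benchmark_cmd` and its defaults are hypothetical.

```python
import shlex


def benchmark_cmd(model_xml, seq_len, device="CPU", seconds=10, batch=1):
    """Build the benchmark_app argument list used in this notebook."""
    return [
        "benchmark_app",
        "-m", str(model_xml),
        "-d", device,
        "-api", "async",
        "-t", str(seconds),
        "-hint", "latency",
        # One shape without an input name, as in the notebook's command.
        "-shape", f"[{batch},{seq_len}]",
    ]


cmd = benchmark_cmd("bert-base-multilingual_OV_IR/ov_model.xml", seq_len=128)
print(shlex.join(cmd))  # shell-quoted form, for copy and paste
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the brackets in the `-shape` value.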

In [33]:
base_model_name = 'bert-base-multilingual_OV_IR/ov_model.xml'

# Set the sequence length for benchmarking
seq_len = 128

print('Benchmark OpenVINO model using the benchmark app')
! benchmark_app -m "$base_model_name" -d CPU -api async -t 10 -hint latency -shape [1,"$seq_len"]

Benchmark OpenVINO model using the benchmark app
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading OpenVINO
[ INFO ] OpenVINO:
         API version............. 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.1
         Build................... 2022.1.0-7019-cdb9bec7210-releases/2022/1

[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 629.16 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Reshaping model: 'input_ids': {1,256}, 'attention_mask': {1,256