# INT8 acceleration of NLP models from HuggingFace transformers with NNCF for OpenVINO

This notebook walks through enabling [NNCF](https://github.com/openvinotoolkit/nncf) in an NLP pipeline that fine-tunes BERT-base on MRPC, a text classification task.
> NOTE: _For this notebook to function, please make sure that your Python environment has the `openvino`, `openvino-dev` and `nncf[torch]` packages installed._
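
A quick way to check the prerequisites from the note is sketched below. It is a minimal sketch under two assumptions that the note itself does not state: that the packages are importable as `openvino`, `nncf` and `torch`, and that the `mo`, `accuracy_check` and `benchmark_app` commands used later in this notebook are the ones installed by `openvino-dev`. The helper name `find_missing` is hypothetical.

```python
import importlib.util
import shutil

# Import names assumed to correspond to the pip packages named in the note.
REQUIRED_MODULES = ["openvino", "nncf", "torch"]
# Command-line tools invoked later in this notebook (assumed to ship with openvino-dev).
REQUIRED_TOOLS = ["mo", "accuracy_check", "benchmark_app"]


def find_missing(modules, tools):
    """Return the module names and tool names that cannot be found."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [t for t in tools if shutil.which(t) is None]
    return missing_modules, missing_tools


missing_modules, missing_tools = find_missing(REQUIRED_MODULES, REQUIRED_TOOLS)
print("Missing Python modules:", missing_modules or "none")
print("Missing command-line tools:", missing_tools or "none")
```

If either list is non-empty, install the corresponding package before running the cells below.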

In [None]:
import sys
python = sys.executable
%ls

Clone the original transformers repository at the v4.12.3 release tag and apply a patch that allows exporting the target model to ONNX, so that OpenVINO can ingest it and baseline accuracy and performance numbers can be obtained:

In [None]:
!git clone https://github.com/huggingface/transformers --branch v4.12.3 --single-branch
%cd transformers
!pip install -e . torch==1.9.1
!pip install -r examples/pytorch/text-classification/requirements.txt
!patch -p1 < ../0001-Allow-ONNX-export-for-GLUE.patch

### Obtaining the uncompressed (FP32) performance and accuracy baselines


Evaluate the baseline FP32 pre-trained BERT-base-cased model for MRPC in PyTorch and produce an ONNX file for later OpenVINO ingestion:

In [None]:
!export CUDA_VISIBLE_DEVICES=0; $python examples/pytorch/text-classification/run_glue.py --model_name_or_path bert-base-cased-finetuned-mrpc --task_name mrpc --do_eval --max_seq_length 128 --per_device_eval_batch_size 1 --output_dir bert_mrpc_fp32 --to_onnx bert_mrpc_fp32.onnx

Evaluate the accuracy and performance of the FP32 model in OpenVINO. First, convert the ONNX file to the intermediate representation (IR) using the [Model Optimizer](https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html):

In [None]:
!mo --input_model bert_mrpc_fp32.onnx

Download the MRPC dev split as an explicit `dev.tsv` file so that it can be supplied to Accuracy Checker (see below). This ensures that the OpenVINO accuracy measurement uses the same subset of data that was used for validation in PyTorch:

In [None]:
!$python ../download_mrpc_dev_tsv.py
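
The contents of `download_mrpc_dev_tsv.py` are not shown in this notebook. As a rough illustration of the file it is expected to produce, the sketch below writes records in a tab-separated layout. The column header is assumed to follow the usual GLUE MRPC `dev.tsv` convention, the helper name `write_mrpc_tsv` is hypothetical, and the two records are made-up toy sentences rather than real MRPC data.

```python
import io

# Header assumed to match the GLUE MRPC dev.tsv layout.
MRPC_HEADER = ["Quality", "#1 ID", "#2 ID", "#1 String", "#2 String"]


def write_mrpc_tsv(records, stream):
    """Write (label, id1, id2, sentence1, sentence2) tuples as tab-separated rows."""
    stream.write("\t".join(MRPC_HEADER) + "\n")
    for record in records:
        # Tabs or newlines inside a field would break the column layout,
        # so collapse all whitespace runs into single spaces.
        clean = [" ".join(str(field).split()) for field in record]
        stream.write("\t".join(clean) + "\n")


# Toy records for illustration only (not taken from the dataset).
toy_records = [
    (1, 101, 102, "The cat sat on the mat.", "A cat was sitting on the mat."),
    (0, 201, 202, "It rained all day.", "The meeting was postponed."),
]

buffer = io.StringIO()
write_mrpc_tsv(toy_records, buffer)
print(buffer.getvalue())
```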

Measure the classification accuracy on the target dataset using the Accuracy Checker tool (part of the `openvino-dev` package) with the prepared .yml specification of the dataset:

In [None]:
!accuracy_check -c ../bert_mrpc_fp32.yml
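
MRPC is a binary paraphrase-detection task, and GLUE conventionally reports accuracy and F1 for it. Which metric Accuracy Checker reports here depends on `bert_mrpc_fp32.yml`, which is not reproduced in this notebook. The sketch below only illustrates how the two metrics are defined, on toy labels rather than real predictions; the function name `accuracy_and_f1` is hypothetical.

```python
def accuracy_and_f1(labels, predictions):
    """Compute accuracy and F1 (positive class = 1) for binary labels."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equally long")
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)
    accuracy = sum(1 for y, p in pairs if y == p) / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denominator = 2 * true_pos + false_pos + false_neg
    f1 = 2 * true_pos / denominator if denominator else 0.0
    return accuracy, f1


# Toy labels and predictions for illustration only.
toy_labels = [1, 0, 1, 1, 0, 1]
toy_predictions = [1, 0, 0, 1, 1, 1]
accuracy, f1 = accuracy_and_f1(toy_labels, toy_predictions)
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```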

To measure performance, we use the [Benchmark Tool](https://docs.openvinotoolkit.org/latest/openvino_inference_engine_tools_benchmark_tool_README.html), OpenVINO's inference performance measurement tool.

In [None]:
!benchmark_app -m bert_mrpc_fp32.xml

### Integrating NNCF for INT8 quantization

The line below applies the patch that allows producing NNCF-compressed INT8 models. Several modifications (excluding import statements) and a simple .json config are enough for this integration. Note, however, that the integration presented here is limited and covers only INT8 quantization with NNCF for MRPC specifically. For a more complete patch offering a broader scope of algorithms, models and quality-of-life improvements, refer to the complete integration patch at https://github.com/openvinotoolkit/nncf/tree/develop/third_party_integration/huggingface_transformers

In [None]:
!patch -p1 < ../0002-Use-NNCF.patch
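
To give an idea of what a simple NNCF .json config for quantization contains, the sketch below builds a minimal one in Python and prints it. It is an illustration only: the `nncf_bert_config_mrpc.json` file used by this notebook is not reproduced here and may contain additional settings. The assumption that the model takes three integer inputs (token IDs, attention mask and token type IDs) of shape `[1, 128]` is ours, as is the helper name `make_quantization_config`.

```python
import json

# Matches the --max_seq_length value used in the FP32 evaluation command above.
SEQ_LEN = 128


def make_quantization_config(seq_len, num_inputs=3):
    """Build a minimal NNCF config dict that requests INT8 quantization."""
    return {
        # One entry per model input; BERT inputs are integer tensors.
        "input_info": [
            {"sample_size": [1, seq_len], "type": "long"} for _ in range(num_inputs)
        ],
        # The compression algorithm that NNCF should apply.
        "compression": {"algorithm": "quantization"},
    }


config = make_quantization_config(SEQ_LEN)
print(json.dumps(config, indent=4))
```

The printed JSON is only shown, not written to disk, so that it cannot overwrite the config file that the patch provides.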

Perform compression-aware fine-tuning with NNCF for 5 epochs, starting from the pre-trained `bert-base-cased-finetuned-mrpc` model evaluated above, and export the resulting model to an INT8 ONNX file (`bert_mrpc_int8.onnx`). The training takes about 10 minutes on a single NVIDIA RTX 2080 Ti GPU.

In [None]:
!export CUDA_VISIBLE_DEVICES=0; $python examples/pytorch/text-classification/run_glue.py --model_name_or_path bert-base-cased-finetuned-mrpc --task_name mrpc --do_train --do_eval --num_train_epochs 5.0 --per_device_eval_batch_size 1 --output_dir bert_mrpc_int8 --overwrite_output_dir --evaluation_strategy epoch --save_strategy epoch --nncf_config nncf_bert_config_mrpc.json --to_onnx bert_mrpc_int8.onnx

Convert the NNCF-INT8 ONNX file into the NNCF-INT8 IR for OpenVINO ingestion:

In [None]:
!mo --input_model bert_mrpc_int8.onnx

Evaluate the NNCF-INT8 model in OpenVINO in terms of both accuracy and performance:

In [None]:
!accuracy_check -c ../bert_mrpc_int8.yml

In [None]:
!benchmark_app -m bert_mrpc_int8.xml
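
Once both models have been evaluated, the FP32 and INT8 results can be compared side by side. The helper below is a minimal sketch of that comparison; the function name `compare_runs` is hypothetical, and the numbers passed to it are placeholders, not measurements. Replace them with the accuracy values printed by `accuracy_check` and the throughput values printed by `benchmark_app`.

```python
def compare_runs(fp32_accuracy, int8_accuracy, fp32_throughput, int8_throughput):
    """Summarize the accuracy change and speedup of INT8 relative to FP32."""
    if fp32_throughput <= 0:
        raise ValueError("fp32_throughput must be positive")
    return {
        # Positive value means the INT8 model is less accurate than FP32.
        "accuracy_drop": fp32_accuracy - int8_accuracy,
        # Values above 1.0 mean the INT8 model is faster than FP32.
        "speedup": int8_throughput / fp32_throughput,
    }


# Placeholder inputs for illustration only (not measured results).
summary = compare_runs(
    fp32_accuracy=0.5,
    int8_accuracy=0.5,
    fp32_throughput=1.0,
    int8_throughput=1.0,
)
print(summary)
```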