<a href="https://colab.research.google.com/github/rajatkrishna/google-summer-of-code/blob/main/notebooks/Export_RoBERTa_HuggingFace.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

RoBERTa is a variant of BERT that modifies key hyperparameters, removes the next-sentence prediction pretraining objective, and trains with larger mini-batches and learning rates. It has the same model architecture as BERT, but delivers state-of-the-art performance through better training, more powerful computing, and more data.

# Dependencies

We need to install the HuggingFace `transformers` library and Optimum to download the model and save it in the ONNX format.

Note that these dependencies are only needed to export the model. We do not need them when importing the saved model in Spark NLP.

In [None]:
!pip install -q --upgrade transformers[onnx]==4.27.4 optimum

# Model

In this example, we will use the [roberta-base](https://huggingface.co/roberta-base) model from HuggingFace, which was pretrained using a masked language modeling (MLM) objective.

In [None]:
from optimum.onnxruntime import ORTModelForFeatureExtraction

MODEL_NAME = "roberta-base"
EXPORT_PATH = f"onnx_models/{MODEL_NAME}"

ort_model = ORTModelForFeatureExtraction.from_pretrained(MODEL_NAME, export=True)

# Export

Passing `export=True` above converts the model to ONNX when it is loaded. Save the exported model, its weights, and the tokenizer files to disk using the `save_pretrained` function offered by HuggingFace.

In [3]:
ort_model.save_pretrained(EXPORT_PATH)

The saved model can be found in the following folder.

In [4]:
!ls -lh onnx_models/{MODEL_NAME}

total 477M
-rw-r--r-- 1 root root  644 Aug 28 03:36 config.json
-rw-r--r-- 1 root root 446K Aug 28 03:36 merges.txt
-rw-r--r-- 1 root root 474M Aug 28 03:36 model.onnx
-rw-r--r-- 1 root root  280 Aug 28 03:36 special_tokens_map.json
-rw-r--r-- 1 root root  346 Aug 28 03:36 tokenizer_config.json
-rw-r--r-- 1 root root 2.1M Aug 28 03:36 tokenizer.json
-rw-r--r-- 1 root root 780K Aug 28 03:36 vocab.json


The `model.onnx` file is the exported model. Spark NLP needs two tokenizer assets alongside it: convert `vocab.json` to `vocab.txt`, then copy `vocab.txt` and the tokenizer's `merges.txt` into an `assets` directory inside the saved model directory.

In [5]:
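!mkdir -p {EXPORT_PATH}/assets

# Get the tokenizer vocabulary (token -> id) and order the tokens by id
vocabs = ort_model.preprocessors[0].get_vocab()
vocabs = sorted(vocabs, key=vocabs.get)

# Write one token per line; UTF-8 keeps the non-ASCII byte-level BPE symbols intact
with open(f'{EXPORT_PATH}/vocab.txt', 'w', encoding='utf-8') as f:
    for item in vocabs:
        f.write("%s\n" % item)

!cp {EXPORT_PATH}/vocab.txt {EXPORT_PATH}/assets
!cp {EXPORT_PATH}/merges.txt {EXPORT_PATH}/assets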
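
The tokens are sorted by ID before writing so that a token's position in `vocab.txt` matches its ID. The sketch below shows that ordering on a tiny made-up vocabulary; `vocab_to_lines` and `toy_vocab` are illustrative names, not part of any library.

```python
def vocab_to_lines(vocab):
    """Return the tokens ordered by their integer ID, one per output line."""
    return sorted(vocab, key=vocab.get)


# Hypothetical toy vocabulary (token -> ID); the real one is far larger.
toy_vocab = {"<pad>": 1, "<s>": 0, "hello": 3, "</s>": 2}
lines = vocab_to_lines(toy_vocab)

# After sorting, the index of each token in the list equals its ID.
assert all(toy_vocab[token] == i for i, token in enumerate(lines))
print(lines)
```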

To import this model in Spark NLP, all you need is the `onnx_models/{MODEL_NAME}` directory.

## OpenVINO™ Intermediate Format

To maximize the benefits of OpenVINO Tools, models from various frameworks can be converted into the OpenVINO Intermediate Representation (IR) format, a proprietary model format of OpenVINO. The resulting files can be loaded later using the OpenVINO Runtime. The saved files include:
- **saved_model.xml**: An XML file that describes the model topology
- **saved_model.bin**: A file containing the model weights and binary data
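
Because the topology file is plain XML, it can be inspected with standard tools. The following is a rough sketch that counts operation types in a hand-written, heavily trimmed sample. The element and attribute names used here (`net`, `layers`, `layer`, `type`) are assumptions about the IR layout and should be checked against an actual `saved_model.xml`; `count_layer_types` and `SAMPLE_IR` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily trimmed IR document; a real file is much larger.
SAMPLE_IR = """<net name="toy" version="11">
  <layers>
    <layer id="0" name="input_ids" type="Parameter" version="opset1"/>
    <layer id="1" name="weights" type="Const" version="opset1"/>
    <layer id="2" name="matmul" type="MatMul" version="opset1"/>
    <layer id="3" name="output" type="Result" version="opset1"/>
  </layers>
  <edges>
    <edge from-layer="0" from-port="0" to-layer="2" to-port="0"/>
  </edges>
</net>
"""


def count_layer_types(xml_text):
    """Count how many layers of each operation type the topology declares."""
    root = ET.fromstring(xml_text)
    return Counter(layer.get("type") for layer in root.iter("layer"))


print(count_layer_types(SAMPLE_IR))
```

To try this on a real export, read the XML file's text and pass it to the same function.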

The model conversion API is exposed through the `convert_model()` method in the `openvino.tools.mo` namespace. It is included in the OpenVINO Development Tools package, a set of utilities that enable users to easily prepare and optimize models for OpenVINO.

First, we install the OpenVINO Runtime and Development Tools packages via `pip`. Since the source model is ONNX-based, specify the `onnx` extra (`openvino-dev[onnx]`) to automatically install and configure the dependencies needed for working with ONNX models.

In [None]:
!pip install -q openvino==2023.0.1 openvino-dev[onnx]==2023.0.1

With the dependencies set up, we first convert the ONNX model into the OpenVINO **ov.Model** format using `convert_model`. During conversion, we can optionally set the `compress_to_fp16` parameter to compress the constants (for example, weights for matrix multiplications) to the **FP16** data type. Half-precision floating point numbers (FP16) have a smaller range and lower precision than FP32, but take half the storage and can result in better performance in cases where half precision is enough.

In [7]:
from openvino.tools.mo import convert_model
from openvino.runtime import serialize

ONNX_MODEL_PATH="{}/model.onnx".format(EXPORT_PATH)
ov_model = convert_model(ONNX_MODEL_PATH, compress_to_fp16=True)
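
To get a feel for what half precision trades away, the sketch below round-trips a few Python floats through the IEEE 754 half-precision format using only the standard library. It illustrates the number format itself, not how OpenVINO performs the compression internally; `to_fp16` and `fits_in_fp16` are hypothetical helper names.

```python
import struct


def to_fp16(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fits_in_fp16(value):
    """Return True if the value lies within the finite FP16 range."""
    try:
        struct.pack("<e", value)
        return True
    except OverflowError:
        return False


print(to_fp16(0.1))           # close to 0.1, but not exactly equal
print(fits_in_fp16(65504.0))  # 65504 is the largest finite FP16 value
print(fits_in_fp16(1e5))      # beyond the FP16 range
```

Weights of a trained network usually sit well inside this range, which is why compressing them is often acceptable, but it is worth validating model outputs after conversion.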

To export the converted model into the OpenVINO IR format, the Runtime API provides a `serialize` method that takes the model in **ov.Model** format and the target path of the resulting XML file. The accompanying binary file is written to the same directory.

First, we create the directory to save the resulting model files to.

In [8]:
!mkdir {MODEL_NAME}_ir

Export the converted model.

In [9]:
serialize(ov_model, xml_path="{}_ir/saved_model.xml".format(MODEL_NAME))

Let us take a look at the exported model files.

In [10]:
!ls -lhR {MODEL_NAME}_ir

roberta-base_ir:
total 238M
-rw-r--r-- 1 root root 237M Aug 28 03:38 saved_model.bin
-rw-r--r-- 1 root root 494K Aug 28 03:38 saved_model.xml


We can see that the converted model now occupies about half the disk space of the original (237M versus 474M), which is the effect of compressing the weights to FP16. In addition to these files, we need to copy the `vocab.txt` and `merges.txt` files from the tokenizer to an `assets` directory inside the saved model directory. These are assets needed for tokenization inside Spark NLP.

In [11]:
!mkdir {MODEL_NAME}_ir/assets
!cp {EXPORT_PATH}/vocab.txt {MODEL_NAME}_ir/assets
!cp {EXPORT_PATH}/merges.txt {MODEL_NAME}_ir/assets
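
Before importing into Spark NLP, it can help to confirm that the directory has the layout described above. The helper below is a small sketch for that check; `missing_files` and `IR_LAYOUT` are illustrative names, and the relative paths are the ones produced by the preceding steps.

```python
from pathlib import Path


def missing_files(model_dir, required):
    """Return the required relative paths that do not exist under model_dir."""
    base = Path(model_dir)
    return [rel for rel in required if not (base / rel).is_file()]


# Relative paths produced by the steps above.
IR_LAYOUT = [
    "saved_model.xml",
    "saved_model.bin",
    "assets/vocab.txt",
    "assets/merges.txt",
]

# An empty list means every expected file is in place.
print(missing_files("roberta-base_ir", IR_LAYOUT))
```

The same function can be pointed at the ONNX export by passing `onnx_models/roberta-base` with `model.onnx` and the two asset paths.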