<a href="https://colab.research.google.com/github/rajatkrishna/google-summer-of-code/blob/main/notebooks/Export_XLM_RoBERTa_HuggingFace.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

XLM-RoBERTa is a large multilingual language model proposed in [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) that achieves state-of-the-art results on multiple cross-lingual benchmarks.

# Dependencies

We need to install the HuggingFace `transformers` library and Optimum to download the model and save it in the ONNX format. In addition, the XLM-RoBERTa model needs the `SentencePiece` library.

Note that these dependencies are only needed to export the model. We do not need them when importing the saved model in Spark NLP.

In [None]:
!pip install -q transformers[onnx]==4.27.4 optimum sentencepiece
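As a quick sanity check after the install, the resolved package versions can be printed. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the package names are assumed to match the pip distribution names used in the cell above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names taken from the pip command above.
for name in ("transformers", "optimum", "sentencepiece"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If `transformers` does not report `4.27.4`, the pin in the install cell was probably overridden by another package's requirements.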

# Model

In this example, we will use the [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) model from HuggingFace.

In [None]:
from optimum.onnxruntime import ORTModelForFeatureExtraction

MODEL_NAME = 'xlm-roberta-base'
EXPORT_PATH = f"onnx_models/{MODEL_NAME}"

ort_model = ORTModelForFeatureExtraction.from_pretrained(MODEL_NAME, export=True)

# Export in ONNX format

Export the model and weights in the ONNX format using the `save_pretrained` function offered by HuggingFace.

In [3]:
ort_model.save_pretrained(EXPORT_PATH)

The saved model can be found in the following folder.

In [4]:
!ls -lh onnx_models/{MODEL_NAME}

total 1.1G
-rw-r--r-- 1 root root  679 Aug 28 03:40 config.json
-rw-r--r-- 1 root root 1.1G Aug 28 03:40 model.onnx
-rw-r--r-- 1 root root 4.9M Aug 28 03:40 sentencepiece.bpe.model
-rw-r--r-- 1 root root  280 Aug 28 03:40 special_tokens_map.json
-rw-r--r-- 1 root root  413 Aug 28 03:40 tokenizer_config.json
-rw-r--r-- 1 root root  17M Aug 28 03:40 tokenizer.json


The resulting `model.onnx` can be imported and run in Spark NLP using either ONNX Runtime (supported from Spark NLP 5.0) or the OpenVINO Runtime. We will also need the `sentencepiece.bpe.model` file, so we copy it into an `assets` directory.

In [5]:
!mkdir {EXPORT_PATH}/assets
!cp {EXPORT_PATH}/sentencepiece.bpe.model {EXPORT_PATH}/assets
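The two shell commands above fail if the cell is run a second time, because `assets` already exists. A rough Python equivalent that can be rerun safely might look like the sketch below; `copy_to_assets` is a hypothetical helper, and the demo runs on a throwaway directory with a placeholder file rather than the real export.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_assets(export_dir, filename="sentencepiece.bpe.model"):
    """Copy `filename` from export_dir into export_dir/assets and return the new path."""
    export_dir = Path(export_dir)
    assets_dir = export_dir / "assets"
    assets_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return Path(shutil.copy2(export_dir / filename, assets_dir / filename))


# Demo on a temporary directory so no real model files are touched.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sentencepiece.bpe.model").write_bytes(b"placeholder")
    copied = copy_to_assets(tmp)
    copy_to_assets(tmp)  # a second call overwrites the copy instead of failing
    relative = copied.relative_to(tmp)
    content = copied.read_bytes()
print(relative)
```

In the notebook itself, the call would be `copy_to_assets(EXPORT_PATH)`.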

# OpenVINO™ Intermediate Format

To maximize the benefits of OpenVINO Tools, models from various frameworks can be converted into the OpenVINO Intermediate Representation (IR) format, a proprietary model format of OpenVINO. The resulting files can be loaded later using the OpenVINO Runtime. The saved files include:
- **saved_model.xml**: A file in XML format that describes the model topology
- **saved_model.bin**: A file containing the model weights and binary data
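Because an IR model is only usable when both files are present, it can help to check for the pair before handing the directory to another tool. The sketch below assumes the `.bin` file sits next to the `.xml` file with the same stem, as in this notebook; `missing_ir_files` is a hypothetical helper, and the demo uses empty placeholder files.

```python
import tempfile
from pathlib import Path


def missing_ir_files(xml_path):
    """Return the names of the IR files (xml, and bin with the same stem) that are missing."""
    xml_path = Path(xml_path)
    expected = [xml_path, xml_path.with_suffix(".bin")]
    return [p.name for p in expected if not p.is_file()]


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    xml = Path(tmp) / "saved_model.xml"
    xml.write_text("<net/>")                   # placeholder topology file
    only_xml = missing_ir_files(xml)           # weights file not written yet
    xml.with_suffix(".bin").write_bytes(b"")   # placeholder weights file
    both = missing_ir_files(xml)
print(only_xml, both)
```

An empty list means both halves of the IR model were found.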

The model conversion API is represented by the `convert_model()` method in the `openvino.tools.mo` namespace. It is included as part of the OpenVINO Development Tools package, a set of utilities that enable users to easily prepare and optimize models for OpenVINO.

First, we install the OpenVINO Runtime and Development Tools packages via `pip`. Since the source model is ONNX-based, pass the `onnx` argument to automatically install and configure the necessary dependencies for working with ONNX models.

In [None]:
!pip install -q openvino==2023.0.1 openvino-dev[onnx]==2023.0.1

With the dependencies set up, we first convert the ONNX model into the OpenVINO **ov.Model** format using `convert_model`. During conversion, we can optionally set the `compress_to_fp16` parameter to compress constants (for example, weights for matrix multiplications) to the **FP16** data type. Half-precision floating-point numbers (FP16) have a smaller range and lower precision than FP32, but they take half the storage and can result in better performance in cases where half precision is enough.

In [7]:
from openvino.tools.mo import convert_model
from openvino.runtime import serialize

ONNX_MODEL_PATH="{}/model.onnx".format(EXPORT_PATH)
ov_model = convert_model(ONNX_MODEL_PATH, compress_to_fp16=True)
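To get a feel for what FP16 compression gives up, the standard library's `struct` module can round a Python float to half precision. This is only an illustration of the number format, not of what `convert_model` does internally; `to_fp16` is a hypothetical helper.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


rounded = to_fp16(0.1)       # 0.1 is not exactly representable in FP16
largest = to_fp16(65504.0)   # largest finite FP16 value, round-trips exactly
try:
    to_fp16(1e5)
    overflowed = False
except OverflowError:        # struct refuses values outside the FP16 range
    overflowed = True
print(rounded, largest, overflowed)
```

Values near `0.1` pick up a small rounding error, and anything well above `65504` cannot be stored at all, which is why FP16 is only appropriate when the weights stay within that range and the reduced precision is acceptable.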

To export the converted model into the OpenVINO IR format, the Runtime API provides a `serialize` method that takes in the model in **ov.Model** format and the target path to the resulting model xml file. The accompanying binary file can be found in the same directory.

First, we create the directory to save the resulting model files to.

In [8]:
!mkdir {MODEL_NAME}_ir

Export the converted model:

In [9]:
serialize(ov_model, xml_path="{}_ir/saved_model.xml".format(MODEL_NAME))

Let us take a look at the exported model files.

In [10]:
!ls -lhR {MODEL_NAME}_ir

xlm-roberta-base_ir:
total 530M
-rw-r--r-- 1 root root 530M Aug 28 03:42 saved_model.bin
-rw-r--r-- 1 root root 494K Aug 28 03:42 saved_model.xml


We can see that the converted model now occupies about half the disk space of the original. Finally, copy the `sentencepiece.bpe.model` file into the assets directory.

In [13]:
!mkdir {MODEL_NAME}_ir/assets
!cp {EXPORT_PATH}/sentencepiece.bpe.model {MODEL_NAME}_ir/assets
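The size reduction noted above can also be checked programmatically instead of reading the `ls` output. The sketch below is one way to do it with the standard library; `size_ratio` is a hypothetical helper, and the demo uses small placeholder files in place of the real model files.

```python
import tempfile
from pathlib import Path


def size_ratio(original, converted):
    """Return size(converted) / size(original) for two files on disk."""
    return Path(converted).stat().st_size / Path(original).stat().st_size


# Demo with placeholder files standing in for model.onnx and saved_model.bin.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "model.onnx"
    converted = Path(tmp) / "saved_model.bin"
    original.write_bytes(b"\0" * 4000)   # stands in for the FP32 weights
    converted.write_bytes(b"\0" * 2000)  # stands in for the FP16 weights
    ratio = size_ratio(original, converted)
print(f"converted / original = {ratio:.2f}")
```

In the notebook, the call would be `size_ratio(f"{EXPORT_PATH}/model.onnx", f"{MODEL_NAME}_ir/saved_model.bin")`, which should come out close to one half when `compress_to_fp16=True` was used.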