<a href="https://colab.research.google.com/github/rajatkrishna/google-summer-of-code/blob/main/notebooks/Export_BERT_HuggingFace.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art open-source deep learning model widely used for a range of NLP tasks, including Question Answering, Named Entity Recognition and Sequence Prediction. Conventional language models read text sequentially, either left-to-right or right-to-left. BERT instead uses the encoder of the Transformer architecture, whose self-attention lets every token draw on the words both before and after it, which is what makes the model bidirectional.
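One way to picture the difference is as an attention mask, where a 1 at row i, column j means token j is visible while encoding token i. The plain-Python sketch below (the helper names are illustrative, not part of any library) contrasts a left-to-right mask with BERT's all-ones mask. It is a simplification: real models apply such masks inside self-attention over learned vectors.

```python
def causal_mask(n):
    """Left-to-right language model: token i can see only tokens 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """BERT-style encoder: every token can see every token in the sequence."""
    return [[1] * n for _ in range(n)]


tokens = ["John", "works", "at", "the", "bank"]
for label, mask in [("left-to-right", causal_mask(len(tokens))),
                    ("bidirectional", bidirectional_mask(len(tokens)))]:
    print(label)
    for token, row in zip(tokens, mask):
        print(f"  {token:>5}: {row}")
```

Under the left-to-right mask, "works" cannot see "bank"; under the bidirectional mask, the representation of every word can depend on the whole sentence.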

BERT has been pretrained with two objectives:
- Masked Language Modeling (MLM), in which a fraction of the words in each input sentence is masked and the model has to predict the masked words.
- Next Sentence Prediction (NSP), in which the model is given two sentences and learns to predict whether the second one actually follows the first in the original text.
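The MLM corruption step can be sketched as follows. The split used here (select about 15% of positions; replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged) is the recipe described in the original BERT paper. The per-token coin flip, the helper name `mask_tokens` and the toy vocabulary are simplifications for illustration: the reference implementation picks a fixed number of positions per sequence and never masks special tokens such as `[CLS]` and `[SEP]`.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Simplified BERT-style MLM corruption.

    Each position is selected with probability mask_prob. A selected token is
    replaced by [MASK] 80% of the time, by a random vocabulary token 10% of the
    time, and kept unchanged 10% of the time. Returns the corrupted tokens and
    a {position: original token} dict of prediction targets.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = {}
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)
        # else: leave the token unchanged
    return corrupted, labels


sentence = "John works at the bank .".split()
toy_vocab = ["the", "river", "money", "works", "at"]
print(mask_tokens(sentence, toy_vocab, seed=0))
```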

Through these objectives, the model learns an internal representation of the English language that can then be used to produce embeddings. A key advantage of BERT is that these embeddings are contextual: the vector for a word changes depending on how it is used in a sentence. For instance, BERT produces different embeddings for the word "bank" in the following sentences, based on the surrounding words:

```
John works at the bank.
```
```
Robin had to bank on her friend for support.
```

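In this notebook, we will export a BERT model from [HuggingFace](https://huggingface.co/models) in the TensorFlow SavedModel format using HuggingFace's `transformers` library, and then convert it to the OpenVINO Intermediate Representation (IR) format. Either result can be imported into Spark NLP.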

# Dependencies

We need to install the HuggingFace `transformers` library and TensorFlow in order to download and save the model.

Note that these dependencies are only needed to export the model. We do not need them when importing the saved model in Spark NLP.

In [None]:
!pip install -q transformers[tf-cpu]==4.31.0

# TFBert Model

Download the BERT model and tokenizer from HuggingFace. Any of the BERT models in the `Fill Mask` category can be used; models trained or fine-tuned on a specific downstream task, such as Token Classification, cannot. In this example, we will use the [bert-base-cased](https://huggingface.co/bert-base-cased) model from HuggingFace.

In [None]:
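from transformers import TFBertModel, BertTokenizer
import tensorflow as tf

MODEL_NAME = 'bert-base-cased'

# Download the tokenizer and save its files (including vocab.txt) locally.
# save_pretrained() returns file paths, so keep the tokenizer object separate.
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
tokenizer.save_pretrained('./{}_tokenizer/'.format(MODEL_NAME))

# Download the TensorFlow version of the model
model = TFBertModel.from_pretrained(MODEL_NAME)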

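Define the serving signature before exporting. Each input is an `int32` tensor of shape `(None, None)`, so both the batch size and the sequence length can vary at inference time.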

In [3]:
@tf.function(
  input_signature=[
      {
          "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"),
          "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"),
          "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"),
      }
  ]
)
def serving_fn(inputs):
    # Forward pass; returns the model outputs (last_hidden_state, pooler_output)
    return model(inputs)
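For reference, here is a plain-Python sketch of how a sentence pair is packed into the three inputs named in the signature: `[CLS]` and `[SEP]` markers around the segments, `token_type_ids` distinguishing the first segment from the second (the mechanism NSP relies on), and `attention_mask` marking padding. The helper `build_features` and the token IDs used below are hypothetical placeholders; real IDs come from the tokenizer's `vocab.txt`.

```python
def build_features(pairs, cls_id, sep_id, pad_id):
    """Pack (ids_a, ids_b) pairs into the inputs expected by the serving signature.

    Row layout: [CLS] a... [SEP] b... [SEP], right-padded to the longest row.
    token_type_ids is 0 for [CLS], segment a and its [SEP], and 1 for segment b
    and its [SEP]; attention_mask is 1 for real tokens and 0 for padding.
    """
    rows = []
    for ids_a, ids_b in pairs:
        input_ids = [cls_id] + list(ids_a) + [sep_id]
        token_type_ids = [0] * len(input_ids)
        if ids_b:  # a single sentence has no second segment
            input_ids += list(ids_b) + [sep_id]
            token_type_ids += [1] * (len(ids_b) + 1)
        rows.append((input_ids, token_type_ids))

    max_len = max(len(ids) for ids, _ in rows)
    features = {"input_ids": [], "attention_mask": [], "token_type_ids": []}
    for input_ids, token_type_ids in rows:
        pad = max_len - len(input_ids)
        features["input_ids"].append(input_ids + [pad_id] * pad)
        features["attention_mask"].append([1] * len(input_ids) + [0] * pad)
        features["token_type_ids"].append(token_type_ids + [0] * pad)
    return features


# Placeholder IDs: 1 = [CLS], 2 = [SEP], 0 = [PAD]
features = build_features([([7, 8], [9]), ([7], [])], cls_id=1, sep_id=2, pad_id=0)
for key, rows in features.items():
    print(key, rows)
```

With TensorFlow loaded, such a dict could be passed to the serving function as `serving_fn({k: tf.constant(v, dtype=tf.int32) for k, v in features.items()})`. In practice, the HuggingFace tokenizer produces the same three keys directly.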

# Export in SavedModel format

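Export the model and its weights in the SavedModel format using the `save_pretrained` method offered by HuggingFace. Passing `saved_model=True` writes a SavedModel in addition to the regular checkpoint, and `signatures` attaches `serving_fn` as the `serving_default` signature.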

In [4]:
model.save_pretrained(save_directory=MODEL_NAME, saved_model=True, signatures={"serving_default": serving_fn})

The SavedModel is written to the versioned subdirectory `{MODEL_NAME}/saved_model/1`:

In [5]:
!ls -lhR {MODEL_NAME}/saved_model/1

bert-base-cased/saved_model/1:
total 6.9M
drwxr-xr-x 2 root root 4.0K Aug 28 03:28 assets
-rw-r--r-- 1 root root   56 Aug 28 03:28 fingerprint.pb
-rw-r--r-- 1 root root 162K Aug 28 03:28 keras_metadata.pb
-rw-r--r-- 1 root root 6.7M Aug 28 03:28 saved_model.pb
drwxr-xr-x 2 root root 4.0K Aug 28 03:28 variables

bert-base-cased/saved_model/1/assets:
total 0

bert-base-cased/saved_model/1/variables:
total 414M
-rw-r--r-- 1 root root 414M Aug 28 03:28 variables.data-00000-of-00001
-rw-r--r-- 1 root root  12K Aug 28 03:28 variables.index


Copy the `vocab.txt` file from the tokenizer to the `assets` directory of the saved model. Spark NLP needs this vocabulary to tokenize input text.

In [6]:
!cp {MODEL_NAME}_tokenizer/vocab.txt {MODEL_NAME}/saved_model/1/assets/
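`vocab.txt` lists one token per line, and a token's 0-based line number is its ID. BERT tokenizers split words into subword pieces from this vocabulary using WordPiece, a greedy longest-match-first lookup in which continuation pieces carry a `##` prefix. The sketch below illustrates that lookup on a tiny, hypothetical in-memory vocabulary; it is not Spark NLP's tokenizer code, and `load_vocab` and `wordpiece` are illustrative helper names.

```python
import io


def load_vocab(lines):
    """Map each token to its ID, which is its 0-based line number in vocab.txt."""
    return {line.rstrip("\n"): idx for idx, line in enumerate(lines)}


def wordpiece(word, vocab, unk="[UNK]", max_chars=100):
    """Greedy longest-match-first WordPiece split of a single word."""
    if len(word) > max_chars:
        return [unk]
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:  # try the longest remaining substring first
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # nothing matched: the whole word maps to [UNK]
        pieces.append(piece)
        start = end
    return pieces


vocab = load_vocab(io.StringIO("[PAD]\n[UNK]\nbank\n##ing\nplay\n##ed\n"))
print([wordpiece(w, vocab) for w in ["banking", "played", "xyz"]])
# [['bank', '##ing'], ['play', '##ed'], ['[UNK]']]
print([vocab[p] for p in wordpiece("banking", vocab)])
# [2, 3]
```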

To import this model in Spark NLP, all you need is the `{MODEL_NAME}/saved_model/1` directory.
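Before importing, a quick check can confirm that the directory contains the pieces produced above. The helper `missing_files` and the `REQUIRED` tuple are illustrative assumptions based on the directory listing and the copied `vocab.txt`, not a Spark NLP API; the check only confirms that the files exist, not that they are valid.

```python
from pathlib import Path

# Assumed layout, based on the `ls` output above plus the copied vocab.txt
REQUIRED = (
    "saved_model.pb",
    "variables/variables.index",
    "assets/vocab.txt",
)


def missing_files(export_dir, required=REQUIRED):
    """Return the required relative paths that are not regular files under export_dir."""
    root = Path(export_dir)
    return [rel for rel in required if not (root / rel).is_file()]


missing = missing_files("bert-base-cased/saved_model/1")
print("ready to import" if not missing else f"missing: {missing}")
```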

# OpenVINO™ Intermediate Format

To get the most out of the OpenVINO toolkit, models from various frameworks can be converted into the OpenVINO Intermediate Representation (IR) format, OpenVINO's own model format. The resulting files can later be loaded with the OpenVINO Runtime. The saved files are:
- **saved_model.xml**: An XML file that describes the model topology
- **saved_model.bin**: A binary file containing the model weights and other binary data

The model conversion API is exposed through the `convert_model()` function in the `openvino.tools.mo` namespace. It is included in the OpenVINO Development Tools package, a set of utilities that help users prepare and optimize models for OpenVINO.

First, we install the OpenVINO Runtime and Development Tools packages via `pip`. Since the source model is TensorFlow-based, we request the `tensorflow2` extra, which installs and configures the dependencies needed to work with TensorFlow 1.x and 2.x models.

In [None]:
!pip install -q openvino==2023.0.1 openvino-dev[tensorflow2]==2023.0.1

With the dependencies set up, we convert the TensorFlow model into the OpenVINO **ov.Model** format using `convert_model`. During conversion, we can optionally set the `compress_to_fp16` parameter to compress constants (for example, the weights of matrix multiplications) to the **FP16** data type. Half-precision floating-point numbers (FP16) take two bytes instead of the four used by FP32, at the cost of a smaller range and lower precision, and can result in better performance in cases where half precision is accurate enough.
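The trade-off can be seen with Python's `struct` module, whose `'e'` format packs IEEE 754 half-precision values. This stdlib sketch only illustrates the number format; it is not how OpenVINO performs the compression, and `to_fp16` and `fits_fp16` are illustrative helper names.

```python
import struct

# Bytes per stored value: FP32 ('f') versus IEEE 754 half precision ('e')
FP32_BYTES = struct.calcsize("<f")
FP16_BYTES = struct.calcsize("<e")
print(FP32_BYTES, FP16_BYTES)  # 4 2


def to_fp16(x):
    """Round-trip a Python float through FP16 to see the precision lost."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_fp16(x):
    """True if x is within the finite FP16 range; struct raises OverflowError otherwise."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


print(to_fp16(0.1))                        # 0.0999755859375
print(fits_fp16(65504.0), fits_fp16(1e5))  # True False (65504 is the largest finite FP16 value)

# Halving the bytes per weight roughly halves the weight file: the 414M of
# SavedModel variables listed earlier correspond to about 207M in FP16.
print(414 * FP16_BYTES / FP32_BYTES)       # 207.0
```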

In [8]:
from openvino.tools.mo import convert_model
from openvino.runtime import serialize

# Convert the SavedModel to an in-memory ov.Model, storing weights as FP16
TF_MODEL_PATH = "{}/saved_model/1".format(MODEL_NAME)
ov_model = convert_model(TF_MODEL_PATH, compress_to_fp16=True)

To export the converted model into the OpenVINO IR format, the Runtime API provides a `serialize` function that takes the model in **ov.Model** format and the target path of the resulting model XML file. The accompanying binary file is written to the same directory, with the same base name.

First, we create the directory to save the resulting model files to.

In [9]:
!mkdir {MODEL_NAME}_ir

Export the converted model:

In [10]:
serialize(ov_model, xml_path="{}_ir/saved_model.xml".format(MODEL_NAME))

Let us take a look at the exported model files.

In [11]:
!ls -lhR {MODEL_NAME}_ir

bert-base-cased_ir:
total 208M
-rw-r--r-- 1 root root 207M Aug 28 03:32 saved_model.bin
-rw-r--r-- 1 root root 875K Aug 28 03:32 saved_model.xml


The converted model occupies roughly half the disk space of the original (207M for `saved_model.bin` versus 414M for the SavedModel variables), because its weights were compressed to FP16. In addition to these files, Spark NLP also needs the vocabulary file `vocab.txt` in an `assets` directory for tokenization.

In [12]:
!mkdir {MODEL_NAME}_ir/assets
!cp {MODEL_NAME}_tokenizer/vocab.txt {MODEL_NAME}_ir/assets/

The resulting `{MODEL_NAME}_ir` directory is now ready to be imported into Spark NLP.