In [88]:
# https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/huggingface-bert.ipynb

# Converting a Huggingface model to ONNX with tf2onnx

This is a simple example of how to convert a [huggingface](https://huggingface.co/) model to ONNX using [tf2onnx](https://github.com/onnx/tensorflow-onnx).

We use the [TFBertForSequenceClassification](https://huggingface.co/transformers/model_doc/bert.html#tfbertforsequenceclassification) model from huggingface.

Other models work similarly. You'll find additional examples for other models in our unit tests [here](https://github.com/onnx/tensorflow-onnx/blob/master/tests/huggingface.py).

## Install dependencies

In [89]:
!pip install tensorflow transformers tf2onnx onnxruntime



## The Keras code

In [90]:
import os
# Hide all GPUs so TensorFlow runs on the CPU; this must be set before TensorFlow is imported
os.environ['CUDA_VISIBLE_DEVICES'] = ""

import warnings
warnings.filterwarnings('ignore')

import numpy as np
import onnxruntime as rt
import tensorflow as tf
import tf2onnx
from tokenizers import BertWordPieceTokenizer
from transformers import TFBertForSequenceClassification

In [91]:
! wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt


--2021-12-06 05:55:14--  https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.201.104
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.201.104|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 231508 (226K) [text/plain]
Saving to: ‘bert-base-uncased-vocab.txt.1’


2021-12-06 05:55:15 (4.07 MB/s) - ‘bert-base-uncased-vocab.txt.1’ saved [231508/231508]



In [92]:
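max_length = 512
# Note: this loads the *uncased* vocabulary without lowercasing, while the model below is
# 'bert-base-cased'; for real use, the vocabulary should match the checkpoint.
tokenizer = BertWordPieceTokenizer('bert-base-uncased-vocab.txt', lowercase=False)
# Truncate and pad every input to exactly max_length tokens
tokenizer.enable_truncation(max_length=max_length)
tokenizer.enable_padding(length=max_length)

# num_labels=1 adds a single-output classification head on top of BERT
model = TFBertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=1)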

def fast_tokenize(texts, chunk_size=512):
    """Tokenize texts in chunks; returns (input_ids, attention_mask, token_type_ids) tensors."""
    all_ids, all_attentions, all_type_ids = [], [], []
    for i in range(0, len(texts), chunk_size):
        text_chunk = texts[i : i + chunk_size]
        encs = tokenizer.encode_batch(text_chunk)
        all_ids.extend([enc.ids for enc in encs])
        all_attentions.extend([enc.attention_mask for enc in encs])
        all_type_ids.extend([enc.type_ids for enc in encs])
    return (
        tf.convert_to_tensor(all_ids),
        tf.convert_to_tensor(all_attentions),
        tf.convert_to_tensor(all_type_ids),
    )




All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
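
To make the effect of `enable_truncation` and `enable_padding` concrete, here is a minimal pure-Python sketch of fixed-length padding and the attention mask that goes with it. The helper name `pad_or_truncate` and the pad id of 0 are assumptions for illustration only; the real tokenizer does this internally and also handles special tokens such as `[CLS]` and `[SEP]`, which this sketch ignores.

```python
def pad_or_truncate(ids, max_length, pad_id=0):
    """Return (ids, attention_mask), both exactly max_length long."""
    ids = list(ids)[:max_length]              # truncate long inputs
    mask = [1] * len(ids)                     # 1 marks a real token
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 marks padding


short_ids, short_mask = pad_or_truncate([101, 2054, 102], max_length=5)
long_ids, long_mask = pad_or_truncate(range(10), max_length=5)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Because every row ends up with the same length, `tf.convert_to_tensor` can stack the rows into a rectangular tensor.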


In [93]:
texts = ["""
"WHAT A GREAT PLACE FOR YOUR BEACH VACATION! \n\nMy 2 bedroom/2 bath condo is directly across from the Beach Club, the beach. The condo has been completely remodeled with new ceramic flooring and countertops in kitchen and baths, new carpet, new lighting fixtures and appliances. \n\nThe relaxing top floor view from the rear screened porch (where smoking is permitted) is of the marsh and tidal creek and is a great place to start your day with coffee. There is a large deck overhanging the canal for crabbing and fishing and it is a great place to have a glass of wine and grill a steak at the end of the day.\nYou are likely coming to Fripp for 3 miles of ocean front white sand beach across the street. There are numerous bike trails and 2 adult bicycles are furnished. A golf cart is furnished and this is the way guests prefer to see the island.. Fripp Island is a wildlife preserve. No where else will you find over 300 docile deer, allowing many photo opportunities. In addition there is various wildlife, birds and water foul and alligators. Walk the educational Audubon Trail and visit Hunting Island State Park across the bridge from Fripp. Hunting Island has many bike trails and a light house children enjoy climbing. Dont miss the opportunity to Visit Bay Street in historic Beaufort where great food, relaxing views and shopping are yours to enjoy.\n\n\nKeywords: Condominium"	
"""]


In [114]:
# %%timeit
tokenized_text = fast_tokenize(texts)
tf_results = model.predict(tokenized_text, verbose=1)
# With num_labels=1 the output holds a single raw logit per text
tf_results



TFSequenceClassifierOutput([('logits', array([[0.48259255]], dtype=float32))])

## Convert to ONNX

In [96]:
# describe the inputs
input_spec = (
    tf.TensorSpec((None,  None), tf.int32, name="input_ids"),
    tf.TensorSpec((None,  None), tf.int32, name="token_type_ids"),
    tf.TensorSpec((None,  None), tf.int32, name="attention_mask")
)

# and convert
_, _ = tf2onnx.convert.from_keras(model, input_signature=input_spec, opset=13, output_path="bert.onnx")
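
Both dimensions in `input_spec` are `None`, so the exported model should accept any batch size and any sequence length (within BERT's limit of 512 positions). The matching rule can be sketched in plain Python; the helper `shape_matches` is hypothetical and only mirrors the idea, it is not part of TensorFlow or tf2onnx.

```python
def shape_matches(spec_shape, actual_shape):
    """True if actual_shape fits spec_shape, where None means 'any size'."""
    if len(spec_shape) != len(actual_shape):
        return False  # the rank must still agree
    return all(s is None or s == a for s, a in zip(spec_shape, actual_shape))


spec = (None, None)                      # (batch, sequence_length)
print(shape_matches(spec, (1, 512)))     # one text padded to 512 tokens
print(shape_matches(spec, (8, 128)))     # another batch size and length
print(shape_matches(spec, (512,)))       # rank 1 instead of rank 2
```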

In [97]:
!du -hs /content/bert.onnx

414M	/content/bert.onnx
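
As a rough sanity check on the file size, the number of float32 weights can be estimated from the model configuration. The sketch below assumes the usual `bert-base-cased` configuration values (vocabulary of 28996, hidden size 768, 12 layers, intermediate size 3072); these numbers are not stated in this notebook and are taken as assumptions.

```python
# Assumed bert-base-cased configuration values (not stated in this notebook)
vocab, hidden, layers, intermediate = 28996, 768, 12, 3072
max_positions, type_vocab, num_labels = 512, 2, 1

embeddings = (vocab + max_positions + type_vocab) * hidden + 2 * hidden  # + LayerNorm
attention = 4 * (hidden * hidden + hidden)        # query, key, value, output projections
feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
per_layer = attention + feed_forward + 2 * (2 * hidden)  # two LayerNorms per layer
pooler = hidden * hidden + hidden
classifier = hidden * num_labels + num_labels

total_params = embeddings + layers * per_layer + pooler + classifier
size_mib = total_params * 4 / 2**20               # float32 = 4 bytes per weight
print(total_params, round(size_mib, 1))
```

Under these assumptions the weights alone come to roughly 413 MiB, which is consistent with the 414M reported by `du`: the file also stores the graph structure, and `du -h` rounds up.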


## Test the ONNX model with onnxruntime

In [99]:
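opt = rt.SessionOptions()
# CUDA_VISIBLE_DEVICES is empty above, so request the CPU provider explicitly
ort_session = rt.InferenceSession("bert.onnx", opt, providers=["CPUExecutionProvider"])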

In [100]:
# %%timeit
tokenized_text = fast_tokenize(texts)
input_ids=tokenized_text[0]
attention_masks = tokenized_text[1]
all_tokens = tokenized_text[2]


input_dict = {"input_ids" : input_ids.numpy() , "attention_mask" : attention_masks.numpy() , "token_type_ids": all_tokens.numpy() }
onnx_results = ort_session.run(None, input_dict)
onnx_results

[array([[0.48259258]], dtype=float32)]
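
onnxruntime looks up inputs by name, so a misspelled or missing key only fails at `run` time. A small sketch of building the feed from the model's own input names follows; `build_feed` is a hypothetical helper, and the name list is a stand-in for what `ort_session.get_inputs()` would report.

```python
def build_feed(input_names, arrays):
    """Pick the entry of `arrays` for each model input, failing early on a mismatch."""
    missing = [name for name in input_names if name not in arrays]
    if missing:
        raise KeyError(f"no array provided for model inputs: {missing}")
    return {name: arrays[name] for name in input_names}


# Stand-ins for illustration; with a real session the names could be read via
# [i.name for i in ort_session.get_inputs()]
names = ["input_ids", "token_type_ids", "attention_mask"]
arrays = {"input_ids": [[101, 102]], "attention_mask": [[1, 1]], "token_type_ids": [[0, 0]]}
feed = build_feed(names, arrays)
print(list(feed))
```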

## Make sure TensorFlow and onnxruntime results are the same

In [113]:
# outputs are matching
np.testing.assert_allclose(tf_results['logits'], onnx_results[0], rtol=1e-5, atol=1e-5)
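
For reference, `assert_allclose` passes when `|actual - desired| <= atol + rtol * |desired|` holds for every element. A scalar sketch of that criterion, applied to the two logits printed above (`is_close` is a hypothetical name used here for illustration):

```python
def is_close(actual, desired, rtol=1e-5, atol=1e-5):
    """Scalar version of the check: |actual - desired| <= atol + rtol * |desired|."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


tf_logit, onnx_logit = 0.48259255, 0.48259258   # values printed above
print(abs(tf_logit - onnx_logit))                # difference, around 3e-8
print(is_close(tf_logit, onnx_logit))            # within tolerance
print(is_close(0.4825, 0.4830))                  # a 5e-4 gap would fail
```

The observed difference is in the last float32 digit, far below the tolerance of about 1.5e-5 that these settings allow for a value near 0.48.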