## Creating a TF Lite model with RETVec

Using RETVec with TF Lite requires `tensorflow_text>=2.13` and `tensorflow>=2.13`. To upgrade TensorFlow, follow the [installation instructions](https://www.tensorflow.org/install/pip).

This notebook shows how to create, save, and run a TF Lite compatible model which uses the RETVec tokenizer.
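
The version requirement above can be checked before running the rest of the notebook. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are hypothetical, and it assumes the packages are installed under the distribution names `tensorflow` and `tensorflow-text` (variant builds may use other names).

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Keep the leading integer of each dotted piece, e.g. '2.13.0rc1' -> (2, 13, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first piece with no leading digits (e.g. 'dev1')
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric parts only; pre-release ordering is deliberately ignored."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so (2, 13) and (2, 13, 0) compare equal
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums: dict) -> dict:
    """Map each distribution to True, False, or None (None means not installed)."""
    report = {}
    for dist, minimum in minimums.items():
        try:
            report[dist] = meets_minimum(metadata.version(dist), minimum)
        except metadata.PackageNotFoundError:
            report[dist] = None
    return report


# distribution names are an assumption; adjust them for your environment
print(check_requirements({"tensorflow": "2.13", "tensorflow-text": "2.13"}))
```

A `False` or `None` entry in the printed report suggests upgrading or installing that package before continuing.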

In [1]:
# install retvec and tensorflow-text if needed
try:
    import retvec
except ImportError:
    !pip install retvec

try:
    import tensorflow_text
except ImportError:
    !pip install tensorflow-text



In [2]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'  # silence TF INFO messages
import tensorflow as tf
import numpy as np
from tensorflow.keras import layers

# import the RETVec tokenizer layer
from retvec.tf import RETVecTokenizer

The only change RETVec needs is setting `use_tf_lite_compatible_ops=True`. With this flag, the layer uses `tensorflow_text.utf8_binarize` and whitespace splitting to break text into words, both of which are supported in TF Lite.

Note that in this example we use a simple dense model to show conversion, since LSTM layers [require additional effort to convert to TF Lite](https://www.tensorflow.org/lite/models/convert/rnn).
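
To make the whitespace splitting and binarization step concrete, here is a rough pure-Python sketch of the idea. It is an illustration under stated assumptions, not the RETVec or TF Text implementation: the function names are hypothetical, the defaults (`sequence_length=128`, `word_length=16`, `bits_per_char=24`) are assumed values, and the least-significant-bit-first order reflects my understanding of `utf8_binarize`. The real layer additionally passes these binary features through the `retvec-v1` embedding model, which this sketch omits.

```python
def binarize_word(word: str, word_length: int = 16, bits_per_char: int = 24) -> list:
    """Encode the low bits of each character's code point, least-significant bit first."""
    bits = []
    for ch in word[:word_length]:  # characters beyond word_length are dropped
        code_point = ord(ch)
        for i in range(bits_per_char):
            bits.append(float((code_point >> i) & 1))
    # zero-pad short words to a fixed width of word_length * bits_per_char
    bits.extend([0.0] * (word_length * bits_per_char - len(bits)))
    return bits


def sketch_tokenize(text: str, sequence_length: int = 128,
                    word_length: int = 16, bits_per_char: int = 24) -> list:
    """Split on whitespace, binarize each word, and pad to sequence_length rows."""
    words = text.split()[:sequence_length]
    rows = [binarize_word(w, word_length, bits_per_char) for w in words]
    width = word_length * bits_per_char
    rows.extend([0.0] * width for _ in range(sequence_length - len(rows)))
    return rows


rows = sketch_tokenize("This is an example text")
print(len(rows), len(rows[0]))  # 128 384
print(rows[0][:8])              # low 8 bits of 'T' (code point 84)
```

In this sketch, every position after the fifth word is an all-zero padding row, which would be consistent with a model producing identical outputs for those positions.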

In [3]:
# raw string inputs require shape (1,) and dtype tf.string
inputs = layers.Input(shape=(1,), name="input", dtype=tf.string)

# add RETVec tokenizer layer with `use_tf_lite_compatible_ops`
x = RETVecTokenizer(model='retvec-v1', use_tf_lite_compatible_ops=True)(inputs)

# build the rest of the model as usual
x = layers.Dense(256, activation='relu')(x)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(4, activation='sigmoid', name="output")(x)
model = tf.keras.Model(inputs, outputs)

model.summary()


Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input (InputLayer)          [(None, 1)]               0         
                                                                 
 ret_vec_tokenizer (RETVecT  (None, 128, 256)          230144    
 okenizer)                                                       
                                                                 
 dense (Dense)               (None, 128, 256)          65792     
                                                                 
 dense_1 (Dense)             (None, 128, 64)           16448     
                                                                 
 output (Dense)              (None, 128, 4)            260       
                                                                 
Total params: 312644 (1.19 MB)
Trainable params: 82500 (322.27 KB)
Non-trainable params: 230144 (899.00 KB)
_________________________________________________________________

In [4]:
# save the model
save_path = "./demo_models/tf_lite_retvec"
model.save(save_path)

INFO:tensorflow:Assets written to: ./demo_models/tf_lite_retvec/assets


### Convert the model and run inference in TF Lite

We can now convert the model to a TF Lite model following the [instructions](https://www.tensorflow.org/lite/models/convert). For more information on how to use TensorFlow Lite, please see the [guide](https://www.tensorflow.org/lite/guide).

In [5]:
# convert the SavedModel to a TF Lite flatbuffer
converter = tf.lite.TFLiteConverter.from_saved_model(save_path)  # path to the SavedModel directory
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite builtin ops
]
# allow ops outside the builtin set; the TF Text ops are registered with the interpreter at inference time
converter.allow_custom_ops = True
tflite_model = converter.convert()
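
The converted model is returned as a `bytes` object, so it can be written to disk for deployment. The following is a minimal sketch; the helper `write_tflite_model` and the file names are hypothetical, and the demo uses placeholder bytes in a temporary directory rather than a real model.

```python
import tempfile
from pathlib import Path


def write_tflite_model(model_bytes: bytes, path: str) -> int:
    """Write the serialized model to `path`, creating parent folders; return bytes written."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target.write_bytes(model_bytes)


# demo with placeholder bytes; in the notebook you would pass `tflite_model` instead,
# e.g. write_tflite_model(tflite_model, "./demo_models/tf_lite_retvec.tflite")  # hypothetical path
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "nested" / "demo.tflite"
    written = write_tflite_model(b"placeholder-model-bytes", str(demo_path))
    round_trip = demo_path.read_bytes()

print(written, round_trip == b"placeholder-model-bytes")
```

A model saved this way can later be loaded by path instead of from memory; `tf.lite.Interpreter` accepts a `model_path` argument for that purpose.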


In [6]:
from tensorflow.lite.python import interpreter
import tensorflow_text as tf_text

# create a TF Lite interpreter with the TF Text ops registered
interp = interpreter.InterpreterWithCustomOps(
    model_content=tflite_model,
    custom_op_registerers=tf_text.tflite_registrar.SELECT_TFTEXT_OPS)
interp.allocate_tensors()

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.


In [7]:
# run inference with our model
input_data = np.array(['This is an example text'])

# the signature runner takes keyword arguments named after the model inputs
predict = interp.get_signature_runner('serving_default')
output = predict(input=input_data)
print('TensorFlow Lite result = ', output['output'])

TensorFlow Lite result =  [[[0.46520743 0.5190651  0.3716683  0.43701836]
  [0.6337548  0.42784083 0.5022397  0.55659497]
  [0.53433377 0.53684425 0.42378557 0.4369351 ]
  [0.406101   0.5063563  0.41558668 0.31651068]
  [0.55106455 0.54234135 0.49299878 0.23038922]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  0.53117436 0.36593238 0.43153659]
  [0.4838743  