# Run TF-TRT from Arachne

TensorFlow with TensorRT (TF-TRT) is a TensorFlow integration that optimizes TensorFlow models and executes them with TensorRT.

## Prepare a Model

First, we prepare the model used in this tutorial.
Here, we train a ResNet-50 v2 model on the `tf_flowers` dataset.

In [1]:

import tensorflow as tf
import tensorflow_datasets as tfds

# Initialize a model
model = tf.keras.applications.resnet_v2.ResNet50V2(weights=None, classes=5)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=["accuracy"])
model.summary()

# Load the tf_flowers dataset
train_dataset, val_dataset = tfds.load(
    "tf_flowers", split=["train[:90%]", "train[90%:]"], as_supervised=True
)

# Preprocess the datasets
def preprocess_dataset(is_training=True):
    def _pp(image, label):
        if is_training:
            image = tf.image.resize(image, (280, 280))
            image = tf.image.random_crop(image, (224, 224, 3))
            image = tf.image.random_flip_left_right(image)
        else:
            image = tf.image.resize(image, (224, 224))
        image = tf.keras.applications.imagenet_utils.preprocess_input(x=image, mode='tf')
        label = tf.one_hot(label, depth=5)
        return image, label
    return _pp


def prepare_dataset(dataset, is_training=True):
    dataset = dataset.map(preprocess_dataset(is_training), num_parallel_calls=tf.data.AUTOTUNE)
    return dataset.batch(16).prefetch(tf.data.AUTOTUNE)

train_dataset = prepare_dataset(train_dataset, True)
val_dataset = prepare_dataset(val_dataset, False)

# Training
model.fit(train_dataset, validation_data=val_dataset, epochs=5)

model.evaluate(val_dataset)

model.save("/tmp/saved_model")



Model: "resnet50v2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D)       (None, 230, 230, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1_conv (Conv2D)             (None, 112, 112, 64) 9472        conv1_pad[0][0]                  
__________________________________________________________________________________________________
pool1_pad (ZeroPadding2D)       (None, 114, 114, 64) 0           conv1_conv[0][0]                 
_________________________________________________________________________________________


Dataset tf_flowers downloaded and prepared to /home/developer/tensorflow_datasets/tf_flowers/3.0.1. Subsequent calls will reuse this data.
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
INFO:tensorflow:Assets written to: /tmp/saved_model/assets




## Run TF-TRT from Arachne CLI

Now, let's optimize the model with TF-TRT through Arachne.
To use TF-TRT, pass `+tools=tftrt` to `arachne.driver.cli`.
Add `--help` to see the available options.

In [1]:
%%bash

python -m arachne.driver.cli +tools=tftrt --help

cli is powered by Hydra.

== Configuration groups ==
Compose your configuration from those groups (group=option)

tools: onnx_simplifier, onnx_tf, openvino2tf, openvino_mo, tflite_converter, tftrt, torch2onnx, torch2trt, tvm
tvm_target: dgx-1, dgx-s, jetson-nano, jetson-xavier-nx, rasp4b64


== Config ==
Override anything in the config (foo.bar=value)

input: ???
input_spec: null
output: ???
tools:
  tftrt:
    max_workspace_size_bytes: 1073741824
    precision_mode: FP32
    minimum_segment_size: 3
    maximum_cached_engines: 1
    use_calibration: true
    allow_build_at_runtime: true
    representative_dataset: null


Powered by Hydra (https://hydra.cc)
Use --hydra-help to view Hydra specific help
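
Each entry under `tools.tftrt` in the help output can be overridden on the command line as `tools.tftrt.<name>=<value>`. The following is a minimal sketch of composing such a command from Python. The helper `build_tftrt_command` is a hypothetical name introduced here for illustration and is not part of Arachne; only the option names shown in the help output are used.

```python
import shlex


def build_tftrt_command(input_path, output_path, **tftrt_options):
    """Return the argv list for an Arachne TF-TRT run (hypothetical helper).

    Keyword arguments become Hydra overrides of the form
    ``tools.tftrt.<name>=<value>``.
    """
    argv = [
        "python", "-m", "arachne.driver.cli",
        "+tools=tftrt",
        f"input={input_path}",
        f"output={output_path}",
    ]
    for name, value in tftrt_options.items():
        # Write booleans in lowercase, matching the help output (true/false).
        if isinstance(value, bool):
            value = str(value).lower()
        argv.append(f"tools.tftrt.{name}={value}")
    return argv


cmd = build_tftrt_command(
    "/tmp/saved_model",
    "/tmp/output.tar",
    precision_mode="FP16",
    # The default 1073741824 bytes is 1 GiB; this requests 2 GiB instead.
    max_workspace_size_bytes=2 * 1024**3,
)
print(shlex.join(cmd))

# To actually run the conversion (requires Arachne and a GPU environment):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list instead of a single string avoids shell quoting issues if a path contains spaces.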




### Optimize with FP32 Precision

We start with the simplest case.
You can optimize a TF model with FP32 precision using the following command.

In [13]:
%%bash

python -m arachne.driver.cli +tools=tftrt input=/tmp/saved_model output=/tmp/output.tar 

2022-03-23 06:06:14.579121: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-23 06:06:15.333341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30554 MB memory:  -> device: 0, name: NVIDIA Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0
INFO:tensorflow:Linked TensorRT version: (8, 0, 1)
INFO:tensorflow:Loaded TensorRT version: (8, 0, 1)
2022-03-23 06:06:44.358628: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2022-03-23 06:06:44.359020: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2022-03-23 06:06
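
The command writes its result to the tar file given by `output`. The following is a small sketch for checking what such an archive contains, using only the standard library. It makes no assumption about what Arachne stores inside the archive: the helper `list_archive_members` is a hypothetical name, and the block builds a stand-in archive so that it runs without a converted model.

```python
import os
import tarfile
import tempfile


def list_archive_members(tar_path):
    """Return the member names stored in a tar archive."""
    with tarfile.open(tar_path) as tar:
        return tar.getnames()


# Stand-in archive so the sketch runs anywhere; after a real conversion,
# call list_archive_members("/tmp/output.tar") instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    payload = os.path.join(tmp_dir, "placeholder.txt")
    with open(payload, "w") as f:
        f.write("demo")
    demo_tar = os.path.join(tmp_dir, "demo.tar")
    with tarfile.open(demo_tar, "w") as tar:
        tar.add(payload, arcname="placeholder.txt")
    members = list_archive_members(demo_tar)

print(members)
```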

### Optimize with FP16 Precision

To optimize with FP16 precision, set the `tools.tftrt.precision_mode` option to `FP16`.

In [14]:
%%bash

python -m arachne.driver.cli +tools=tftrt input=/tmp/saved_model output=/tmp/output.tar tools.tftrt.precision_mode=FP16

2022-03-23 06:09:53.544406: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-23 06:09:54.248283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30554 MB memory:  -> device: 0, name: NVIDIA Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0
INFO:tensorflow:Linked TensorRT version: (8, 0, 1)
INFO:tensorflow:Loaded TensorRT version: (8, 0, 1)
2022-03-23 06:10:20.648041: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2022-03-23 06:10:20.648272: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2022-03-23 06:10
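
FP16 trades numeric precision for speed and memory. As a rough illustration of the rounding involved, the sketch below converts a few Python floats to IEEE 754 half precision and back using the standard `struct` module. This shows a general property of the number format only; it does not reproduce how TensorRT selects or runs FP16 kernels, and `to_float16` is a hypothetical helper name.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


for x in (0.1, 1.0, 1000.123):
    rounded = to_float16(x)
    print(f"{x!r:>10} -> {rounded!r} (abs error {abs(rounded - x):.3e})")
```

Values that are exactly representable, such as `1.0`, survive unchanged, while others pick up a small rounding error that grows with magnitude. It is therefore worth re-checking model accuracy after an FP16 conversion.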

### Optimize with INT8 Precision

To convert with INT8 precision, we need to calibrate the model, that is, estimate the range of every floating-point tensor in it.
We provide an interface to feed the dataset used for calibration.
First, we have to prepare an NPY file that contains a list of `np.ndarray` objects to serve as the calibration dataset.

In [15]:
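import numpy as np

# Collect 100 single-image batches from the validation set for calibration
calib_dataset = []
for image, _ in val_dataset.unbatch().batch(1).take(100):
    calib_dataset.append(image.numpy())

# Stored as a single array of shape (100, 1, 224, 224, 3)
np.save("/tmp/calib_dataset.npy", calib_dataset)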

Next, set the `tools.tftrt.precision_mode` option to `INT8` and pass the NPY file to the `tools.tftrt.representative_dataset` option.

In [16]:
%%bash

python -m arachne.driver.cli +tools=tftrt input=/tmp/saved_model output=/tmp/output.tar \
    tools.tftrt.precision_mode=INT8 tools.tftrt.representative_dataset=/tmp/calib_dataset.npy

2022-03-23 06:13:58.969113: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-23 06:13:59.689324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30554 MB memory:  -> device: 0, name: NVIDIA Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0
INFO:tensorflow:Linked TensorRT version: (8, 0, 1)
INFO:tensorflow:Loaded TensorRT version: (8, 0, 1)
2022-03-23 06:14:28.266197: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2022-03-23 06:14:28.266419: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2022-03-23 06:14
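
For intuition about why the INT8 path needs a representative dataset, the sketch below estimates a tensor's range from sample values and uses it to map floats to 8-bit integer codes. It is a deliberately simplified max-abs scheme written in plain Python under stated assumptions: the function names and sample values are hypothetical, and TensorRT's own calibrators are more sophisticated than this.

```python
def calibrate_scale(samples):
    """Derive a symmetric INT8 scale from the largest magnitude seen."""
    max_abs = max(abs(v) for sample in samples for v in sample)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(value, scale):
    """Map a float to an INT8 code, clamping to [-127, 127]."""
    return max(-127, min(127, round(value / scale)))


def dequantize(code, scale):
    """Map an INT8 code back to an approximate float."""
    return code * scale


# Hypothetical activation values standing in for calibration batches.
calibration_samples = [[-0.8, 0.1, 0.5], [0.9, -1.0, 0.3]]
scale = calibrate_scale(calibration_samples)

for value in (0.5, -1.0, 2.0):  # 2.0 lies outside the calibrated range
    code = quantize(value, scale)
    print(value, code, dequantize(code, scale))
```

Values inside the calibrated range are recovered to within one quantization step, while a value outside it is clamped. This is why the calibration data should resemble the inputs the model will see at inference time.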

## Run TF-TRT from Arachne Python Interface

The following code shows an example of using the tool from the Arachne Python interface.

In [2]:
from arachne.data import Model
from arachne.utils.model_utils import get_model_spec, save_model
from arachne.tools.tftrt import TFTRT, TFTRTConfig

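model_file_path = "/tmp/saved_model"
# Wrap the SavedModel path together with its model spec
input_model = Model(path=model_file_path, spec=get_model_spec(model_file_path))

# Default configuration (FP32)
cfg = TFTRTConfig()

# Uncomment to convert with FP16 precision instead
# cfg.precision_mode = "FP16"

output = TFTRT.run(input_model, cfg)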

save_model(model=output, output_path="/tmp/output.tar")

INFO:tensorflow:Linked TensorRT version: (8, 0, 1)


INFO:tensorflow:Loaded TensorRT version: (8, 0, 1)


INFO:tensorflow:Could not find TRTEngineOp_0_0 in TF-TRT cache. This can happen if build() is not called, which means TensorRT engines will be built and cached at runtime.


INFO:tensorflow:Assets written to: tftrt-0-saved_model/assets


