# TensorFlow or Keras Model to TensorRT Using ONNX

This notebook shows the workflow for optimizing a TensorFlow or Keras model with ONNX and TensorRT. Please refer to [this tutorial from NVIDIA](https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/) for more information.

The steps needed to optimize a TensorFlow/Keras model with ONNX and TensorRT:
1. Convert the TensorFlow/Keras model to a .pb file.
2. Convert the .pb file to the ONNX format.
3. Create a TensorRT engine. 
4. Run inference from the TensorRT engine.


## Step 1: Convert the TensorFlow/Keras model to a .pb file.

This step freezes the graph and saves it in .pb format.

`keras_to_pb()` takes 3 arguments:

- `model`: The Keras model.
- `output_filename`: The output .pb file name.
- `output_node_names`: The output nodes of the network. If `None`, the function uses the name of the last layer as the output node.

It returns the input node name and the list of output node names, which Step 2 uses.
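
For readers who want to see what freezing involves, here is a minimal sketch under stated assumptions. It is not the code of `keras_to_pb`: the log output further down suggests that function uses the TF1-style freezing utilities, while this sketch swaps in the TF2 route through `convert_variables_to_constants_v2`. The names `freeze_keras_model` and `op_name` are hypothetical, and the sketch assumes a model with a single input.

```python
import os


def op_name(tensor_name):
    """Strip the output index from a tensor name: 'input_1:0' -> 'input_1'."""
    return tensor_name.split(":")[0]


def freeze_keras_model(model, output_filename):
    """Freeze `model` and write it to `output_filename` as a binary .pb file."""
    # TensorFlow is imported lazily so op_name() stays usable without it.
    import tensorflow as tf
    from tensorflow.python.framework.convert_to_constants import (
        convert_variables_to_constants_v2,
    )

    # Trace the model into a concrete function with a fixed input signature.
    spec = tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype)
    concrete_fn = tf.function(lambda x: model(x)).get_concrete_function(spec)

    # Replace every variable with a constant holding its current value.
    frozen_fn = convert_variables_to_constants_v2(concrete_fn)

    # Serialize the frozen GraphDef in binary form.
    out_dir, file_name = os.path.split(output_filename)
    tf.io.write_graph(
        frozen_fn.graph.as_graph_def(), out_dir or ".", file_name, as_text=False
    )

    # Return op names without the ':0' suffix, since Step 2 appends it itself.
    input_name = op_name(frozen_fn.inputs[0].name)
    output_names = [op_name(t.name) for t in frozen_fn.outputs]
    return input_name, output_names
```

The node names produced this way may differ from the ones `keras_to_pb` reports, so use whichever names match the .pb file you actually pass to Step 2.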

In [5]:
%load_ext autoreload
%autoreload 2

from keras_to_pb_tf2 import keras_to_pb
from keras.models import load_model

# User-defined values
# Input file path
MODEL_PATH = '/Users/riotu/Anas/tensorrt/tf2trt_with_onnx/facenet_keras_128.h5'
# Output file paths
PB_FILE_PATH = '/Users/riotu/Anas/tensorrt/tf2trt_with_onnx/facenet_freezed.pb'
ONNX_FILE_PATH = '/home/jetson-nx/tf2trt_wtih_onnx/facenet_onnx.onnx'
TRT_ENGINE_PATH = '/home/jetson-nx/tf2trt_wtih_onnx/facenet_engine.plan'
# End of user-defined values



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [6]:
model = load_model(MODEL_PATH)
input_name, output_node_names = keras_to_pb(model, PB_FILE_PATH, None)

Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Froze 490 variables.
INFO:tensorflow:Converted 490 variables to const ops.


## Step 2: Convert the .pb file to the ONNX format.

The second step converts the .pb file to the ONNX format using [tf2onnx](https://github.com/onnx/tensorflow-onnx).
To install `tf2onnx`, use this command: `pip install -U tf2onnx`

The conversion may take more than 10 minutes to finish.  
If the command crashes, try running it in a terminal after closing the Jupyter notebook and all other applications.  

```
python -m tf2onnx.convert --input /home/jetson-tx2/code/onnx/models/facenet.pb --inputs input_1:0[1,160,160,3] --outputs Bottleneck_BatchNorm/batchnorm_1/add_1:0 --output facenet.onnx
```
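
The same command can also be assembled in Python and run with `subprocess`, which avoids shell quoting problems with the square brackets in `--inputs`. The sketch below uses only the flags shown in the command above; the helper name `build_tf2onnx_command` is hypothetical.

```python
import subprocess
import sys


def build_tf2onnx_command(pb_path, onnx_path, input_name, input_shape, output_name):
    """Return the argument list for `python -m tf2onnx.convert`."""
    shape = ",".join(str(dim) for dim in input_shape)
    return [
        sys.executable, "-m", "tf2onnx.convert",
        "--input", pb_path,
        # Tensor names carry the ':0' output index; the shape follows in brackets.
        "--inputs", f"{input_name}:0[{shape}]",
        "--outputs", f"{output_name}:0",
        "--output", onnx_path,
    ]


def convert_pb_to_onnx(*args, **kwargs):
    """Run the conversion and raise CalledProcessError if tf2onnx fails."""
    subprocess.run(build_tf2onnx_command(*args, **kwargs), check=True)
```

For the FaceNet model in this notebook, the call would look like `convert_pb_to_onnx(PB_FILE_PATH, ONNX_FILE_PATH, input_name, (1, 160, 160, 3), output_node_names[0])`.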

In [None]:
pip install -U tf2onnx

In [6]:
!python -m tf2onnx.convert --input {PB_FILE_PATH} --inputs {input_name}:0[1,160,160,3] --outputs {output_node_names[0]}:0 --output {ONNX_FILE_PATH}

2020-10-07 11:09:37.047193: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2


2020-10-07 11:09:43.461665: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-10-07 11:09:43.467704: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-10-07 11:09:43.467927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.109
pciBusID: 0000:00:00.0
2020-10-07 11:09:43.468006: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-10-07 11:09:43.474882: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-07 11:09:43.485458: I tensorflow/stream_executor/platform/default/dso_

2020-10-07 11:09:59.514260: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-10-07 11:09:59.514482: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-10-07 11:09:59.514808: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-10-07 11:09:59.517431: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] ARM64 does not support NUMA - returning NUMA node zero
2020-10-07 11:09:59.517739: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1634] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.109
pciBusID: 0000:00:00.0
2020-10-07 11:09:59.518001: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-10-07 11:09:59.518093: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynami

## Step 3: Create a TensorRT engine from ONNX

In [10]:
from onnx_to_trt import create_engine

create_engine(ONNX_FILE_PATH, TRT_ENGINE_PATH)

ONNX model laoded...
Creating engine from this onnx file,  /home/jetson-nx/tf2trt_wtih_onnx/facenet_onnx.onnx
TRT engine created and saved at,  /home/jetson-nx/tf2trt_wtih_onnx/facenet_engine.plan
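
The source of `create_engine` is not shown in this notebook. As a rough sketch of what building an engine from an ONNX file typically involves, the code below parses the ONNX model into a TensorRT network and serializes the built engine to disk. It assumes the TensorRT 7.x/8.x Python API (`build_engine`, `max_workspace_size`); later releases deprecate or remove some of these calls. The function names and the 256 MiB workspace default are illustrative assumptions, not values from the original script.

```python
def workspace_bytes(mib):
    """Convert a workspace size in MiB to bytes."""
    return mib << 20


def build_engine_from_onnx(onnx_path, plan_path, workspace_mib=256):
    """Parse an ONNX file, build a TensorRT engine, and save it to plan_path."""
    # Imported lazily: TensorRT is only available on machines with the SDK.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    # The ONNX parser requires an explicit-batch network definition.
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

    builder = trt.Builder(logger)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed: " + "; ".join(errors))

    config = builder.create_builder_config()
    config.max_workspace_size = workspace_bytes(workspace_mib)  # TensorRT 7/8 attribute

    engine = builder.build_engine(network, config)  # TensorRT 7/8 call
    if engine is None:
        raise RuntimeError("TensorRT failed to build the engine")

    # Write the serialized engine (the .plan file) to disk.
    with open(plan_path, "wb") as f:
        f.write(engine.serialize())
    return engine
```

An engine is specific to the GPU and TensorRT version it was built with, so it is normally built on the device that will run inference.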


## Step 4: Run inference from the TensorRT engine

Running inference with the TensorRT engine follows this workflow:

1. Allocate buffers for inputs and outputs on the GPU.
2. Copy data from the host to the allocated input buffers on the GPU.
3. Run inference on the GPU.
4. Copy the results from the GPU to the host.
5. Reshape the results as necessary.

Note: the following cell contains only the code needed for inference. To test FacenetTRT with a real image, see the script file `test_facenet_trt.py`.


In [None]:
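import tensorrt as trt

import engine as eng  # assumed: helper module providing load_engine(), as in the NVIDIA tutorial
import inference as inf

TRT_LOGGER = trt.Logger(trt.Logger.INTERNAL_ERROR)
trt_runtime = trt.Runtime(TRT_LOGGER)

# Load the serialized engine created in Step 3.
engine = eng.load_engine(trt_runtime, TRT_ENGINE_PATH)
print('Engine loaded successfully...')

# `samples` must hold the preprocessed input batch (one 160x160 image); it is not defined in this notebook.
h_input, d_input, h_output, d_output, stream = inf.allocate_buffers(engine, 1, trt.float32)
out = inf.do_inference(engine, samples, h_input, d_input, h_output, d_output, stream, 1, 160, 160)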
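
The helpers `inf.allocate_buffers` and `inf.do_inference` are not listed here. The sketch below shows one way the five workflow steps of this section could be written directly with PyCUDA. It assumes the TensorRT 7.x/8.x Python API, an engine with fixed shapes, and exactly one input (binding 0) and one output (binding 1). The names `run_inference` and `embedding_shape` are hypothetical.

```python
def embedding_shape(flat_length, batch_size):
    """Shape used to reshape a flat output buffer into one row per sample."""
    return (batch_size, flat_length // batch_size)


def run_inference(engine, batch):
    """Run one batch through a TensorRT engine and return a 2-D NumPy array."""
    # Imported lazily: these packages need a CUDA-capable machine.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    import pycuda.driver as cuda
    import tensorrt as trt

    in_dtype = trt.nptype(engine.get_binding_dtype(0))
    out_dtype = trt.nptype(engine.get_binding_dtype(1))

    # 1. Allocate page-locked host buffers and matching device buffers.
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=in_dtype)
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=out_dtype)
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()

    # 2. Copy the input from the host to the device.
    np.copyto(h_input, np.asarray(batch, dtype=in_dtype).ravel())
    cuda.memcpy_htod_async(d_input, h_input, stream)

    # 3. Run inference on the GPU.
    context = engine.create_execution_context()
    context.execute_async_v2(
        bindings=[int(d_input), int(d_output)], stream_handle=stream.handle
    )

    # 4. Copy the results back to the host and wait for the stream to finish.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()

    # 5. Reshape the flat output into (batch_size, features).
    batch_size = engine.get_binding_shape(0)[0]
    return np.array(h_output).reshape(embedding_shape(h_output.size, batch_size))
```

With the FaceNet engine from Step 3 and a batch of one image, the result would be a single row of embedding values.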

