# TensorRT benchmarking with InceptionV3
In this tutorial, we use TensorRT (through TF-TRT, TensorFlow's TensorRT integration) to benchmark inference on the InceptionV3 model.

This tutorial assumes that you are running on the [AWS Ubuntu DLAMI](https://aws.amazon.com/marketplace/pp/B07Y43P7X5).

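The steps are:

1. Export the Keras model as a TensorFlow SavedModel.
1. Freeze the SavedModel graph, turning its variables into constants (`TrtGraphConverter` does this during conversion, as the `Froze 378 variables` log line shows).
1. Convert the frozen model to TensorRT in two precisions: FP32 and FP16 (FP16 pays off on GPUs with fast half-precision support, such as the V100).
1. Benchmark each model with a batch size of 1 (BZ=1) by running inference repeatedly for 1 minute.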


In [1]:
import os
import shutil

import keras
import numpy as np
import tensorflow as tf

Using TensorFlow backend.

In [2]:
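print("Loading Model inception_v3... ")
model = keras.applications.inception_v3.InceptionV3(
    include_top=True, weights='imagenet', input_tensor=None,
    input_shape=None, pooling=None, classes=1000)
print("Model loaded successfully")

# Reuse the Keras session so the export below sees the loaded weights.
# Do not run tf.global_variables_initializer() here: it would reset every
# variable, overwriting the ImageNet weights with fresh random values.
sess = keras.backend.get_session()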

Loading Model inception_v3... 
Model loaded successfully


In [3]:
output_directory = "/home/ubuntu/models/inceptionv3/output"
print("Freezing the graph.")
keras.backend.set_learning_phase(0)

signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={'input': model.input}, outputs={'output': model.output})


# Start from an empty export directory: SavedModelBuilder refuses to write into an existing one.
if os.path.isdir(output_directory):
    print(output_directory, "exists already. Deleting the folder")
    shutil.rmtree(output_directory)

builder = tf.saved_model.builder.SavedModelBuilder(output_directory)
builder.add_meta_graph_and_variables(sess=sess,
                                     tags=[tf.saved_model.tag_constants.SERVING],
                                     signature_def_map={
                                         tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:signature
                                     })
builder.save()
print("TensorFlow protobuf version of model is saved in:", output_directory)

print("Model input name = ", model.input.op.name)
print("Model input shape = ", model.input.shape)
print("Model output name = ", model.output.op.name)
print("Model output shape = ", model.output.shape)

Freezing the graph.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
/home/ubuntu/models/inceptionv3/output exists already. Deleting the folder
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: /home/ubuntu/models/inceptionv3/output/saved_model.pb
TensorFlow protobuf version of model is saved in: /home/ubuntu/models/inceptionv3/output
Model input name =  input_1
Model input shape =  (?, 299, 299, 3)
Model output name =  predictions/Softmax
Model output shape =  (?, 1000)
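The export reports an input of shape (?, 299, 299, 3). The benchmark cells below feed random integers, which is fine for timing because the pixel values do not change the amount of computation in this convolutional network; if you also want to sanity-check the predictions, inputs need InceptionV3's scaling. A minimal sketch, assuming the standard InceptionV3 preprocessing that maps pixels to [-1, 1] (the scaling `keras.applications.inception_v3.preprocess_input` applies); `make_random_batch` is a hypothetical helper, not part of the original notebook:

```python
import numpy as np


def make_random_batch(batch_size=1, height=299, width=299, channels=3, seed=0):
    """Random NHWC batch scaled like InceptionV3 inputs (hypothetical helper).

    Pixels are drawn from the full 8-bit range 0..255 and then mapped to
    [-1, 1] with x / 127.5 - 1.
    """
    rng = np.random.RandomState(seed)
    # randint's upper bound is exclusive, so 256 is needed to include 255.
    pixels = rng.randint(0, 256, size=(batch_size, height, width, channels))
    return pixels.astype(np.float32) / 127.5 - 1.0


batch = make_random_batch()
print(batch.shape, batch.dtype)  # (1, 299, 299, 3) float32
```

A fixed seed keeps the input identical across the FP32 and FP16 runs, so any difference in outputs comes from the model rather than the data.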


In [5]:
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

print("Converting the graph to TensorRT.")
input_saved_model_dir = output_directory
output_saved_model_dir = "/home/ubuntu/models/inceptionv3/trt-output/"

# If the output directory exists, delete it so the converter can write a fresh SavedModel.
if os.path.isdir(output_saved_model_dir):
    print(output_saved_model_dir, "exists already. Deleting the folder")
    shutil.rmtree(output_saved_model_dir)

converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_saved_model_dir,
    precision_mode="FP32",
    maximum_cached_engines=100)

_ = converter.convert()
_ = converter.save(output_saved_model_dir)


print("Done. Converting the graph to TensorRT.")

Converting the graph to TensorRT.
/home/ubuntu/models/inceptionv3/trt-output/ exists already. Deleting the folder
INFO:tensorflow:Linked TensorRT version: (0, 0, 0)
INFO:tensorflow:Loaded TensorRT version: (0, 0, 0)
INFO:tensorflow:Running against TensorRT version 0.0.0
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
INFO:tensorflow:Restoring parameters from /home/ubuntu/models/inceptionv3/output/variables/variables
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
INFO:tensorflow:Froze 378 variables.
INFO:tensorflow:Converted 378 variables to const ops.
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: /home/ubuntu/mo

In [6]:
# Copy the variables directory: the frozen graph has no variables of its own,
# and loading the SavedModel without one raises an error when serving.
!cp -pra /home/ubuntu/models/inceptionv3/output/variables/ /home/ubuntu/models/inceptionv3/trt-output/
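The `!cp -pra` shell escape only works inside Jupyter on a Unix host. A portable sketch of the same copy in plain Python, assuming both directories follow the standard SavedModel layout (`saved_model.pb` plus a `variables/` subdirectory); `copy_variables` is a hypothetical helper:

```python
import os
import shutil


def copy_variables(src_model_dir, dst_model_dir):
    """Copy <src>/variables into <dst>/variables, similar to `cp -pra`.

    Hypothetical helper; assumes the standard SavedModel directory layout.
    """
    src = os.path.join(src_model_dir, "variables")
    dst = os.path.join(dst_model_dir, "variables")
    if not os.path.isdir(src):
        raise FileNotFoundError("no variables directory in " + src_model_dir)
    if os.path.isdir(dst):
        # Replace any stale copy so the result matches the source exactly.
        shutil.rmtree(dst)
    # copytree uses shutil.copy2, which also copies timestamps and permissions.
    shutil.copytree(src, dst)
    return dst
```

For this notebook the call would be `copy_variables("/home/ubuntu/models/inceptionv3/output", "/home/ubuntu/models/inceptionv3/trt-output/")`, and the same call with the `trt-output-fp16/` directory for the FP16 model.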

In [7]:
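# Benchmark on one sample
import time
output_saved_model_dir = "/home/ubuntu/models/inceptionv3/trt-output/"
output_tensor = 'predictions/Softmax:0'
input_tensor = 'input_1:0'
# Random pixels in 0..255; randint's upper bound is exclusive, hence 256.
input_data = np.random.randint(0, 256, size=(1, 299, 299, 3)).astype(np.float32)

# Load into a fresh graph so that tensor names such as 'input_1:0' cannot
# collide with the Keras model already built in the notebook's default graph.
with tf.Session(graph=tf.Graph()) as sess:
    # First load the SavedModel into the session
    tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING],
        output_saved_model_dir)
    # This first call includes one-time setup cost, so it is not steady-state latency.
    start_time = time.time()
    output = sess.run([output_tensor], feed_dict={input_tensor: input_data})
    delta = (time.time() - start_time)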


print("\nModel: {}, Input shape: {} , Output shape: {} \nCompleted Inference with one sample in {:.3f} sec,"
      .format(output_saved_model_dir, input_data.shape, output[0].shape, delta))

INFO:tensorflow:Restoring parameters from /home/ubuntu/models/inceptionv3/trt-output/variables/variables

Model: /home/ubuntu/models/inceptionv3/trt-output/, Input shape: (1, 299, 299, 3) , Output shape: (1, 1000) 
Completed Inference with one sample in 4.003 sec,
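A single call like this one (4.003 s here) mostly measures one-time setup such as GPU memory allocation and kernel initialization, and any TensorRT engine building, rather than per-image latency; compare it with the roughly 27 ms per call measured in the 1-minute benchmark. A minimal sketch that separates the first call from steady state, assuming only that one inference can be wrapped in a zero-argument callable; `first_vs_steady` is a hypothetical helper:

```python
import statistics
import time


def first_vs_steady(run_once, repeats=20):
    """Return (first_call_sec, median_later_call_sec) for a callable.

    Hypothetical helper: run_once performs exactly one inference.
    """
    start = time.perf_counter()
    run_once()  # includes one-time setup cost
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        later.append(time.perf_counter() - start)
    # The median is robust to occasional slow outliers.
    return first, statistics.median(later)
```

Inside the `with tf.Session(...)` block above, a call could look like `first_vs_steady(lambda: sess.run([output_tensor], feed_dict={input_tensor: input_data}))`. `time.perf_counter` is used instead of `time.time` because it is a monotonic, high-resolution clock intended for measuring intervals.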


In [None]:
## Benchmark
def benchmark_1min(output_saved_model_dir):
    import time
    output_tensor = 'predictions/Softmax:0'
    input_tensor = 'input_1:0'
    input_data = np.random.randint(0, 256, size=(1, 299, 299, 3)).astype(np.float32)

    tf_config = tf.ConfigProto()
    tf_config.gpu_options.allow_growth = True

    # Use a fresh graph (and close the session when done) so repeated calls do
    # not pile models into the default graph or collide with the Keras model.
    with tf.Session(graph=tf.Graph(), config=tf_config) as tf_sess:
        # Load the SavedModel into the session. Do not run
        # tf.global_variables_initializer() afterwards: it would reset any
        # restored variables to their initial values.
        tf.saved_model.loader.load(
            tf_sess, [tf.saved_model.tag_constants.SERVING],
            output_saved_model_dir)

        times = []
        # Run inference for 1 min.
        end = time.time() + 60
        print(time.strftime("%H:%M:%S"))
        print("Running inference for 1 min, with BZ=1")
        print("Model: {}, Input shape: {}".format(output_saved_model_dir, input_data.shape))
        while time.time() < end:
            start_time = time.time()
            output = tf_sess.run([output_tensor], feed_dict={input_tensor: input_data})
            times.append(time.time() - start_time)

    mean_delta = np.array(times).mean()
    fps = 1 / mean_delta  # with BZ=1, one image per mean latency
    print('Output Shape: {}, \naverage(sec):{:.3f} , average(msec):{:.2f} , fps:{:.2f}'
          .format(output[0].shape, mean_delta, mean_delta*1000, fps))
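`benchmark_1min` averages every call, including the first one, which can carry one-time setup cost, and reports only the mean. A framework-agnostic sketch of the same duration-bounded loop with warm-up runs and tail latency added, assuming one inference can be wrapped in a zero-argument callable; `benchmark_for` and its parameters are hypothetical, not part of the original notebook:

```python
import math
import statistics
import time


def benchmark_for(run_once, duration_sec=60.0, warmup_runs=5, batch_size=1):
    """Time repeated calls to run_once for about duration_sec seconds.

    Hypothetical helper. The warm-up calls are not timed, and at least one
    timed call is always made.
    """
    for _ in range(warmup_runs):
        run_once()  # warm-up: excluded from the statistics

    times = []
    deadline = time.perf_counter() + duration_sec
    while not times or time.perf_counter() < deadline:
        start = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - start)

    times.sort()
    mean = statistics.mean(times)
    # Nearest-rank 99th percentile: smallest sample with >= 99% of samples at or below it.
    p99 = times[max(0, math.ceil(0.99 * len(times)) - 1)]
    return {
        "runs": len(times),
        "mean_ms": mean * 1000,
        "p50_ms": statistics.median(times) * 1000,
        "p99_ms": p99 * 1000,
        # Throughput in samples/s: batch_size samples per mean latency.
        "fps": batch_size / mean,
    }
```

With a loaded session, a hypothetical call would be `benchmark_for(lambda: tf_sess.run([output_tensor], feed_dict={input_tensor: input_data}))`. Reporting p50 and p99 alongside the mean shows whether a few slow calls are pulling the average up.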

In [16]:
benchmark_1min("/home/ubuntu/models/inceptionv3/trt-output/")

INFO:tensorflow:Restoring parameters from /home/ubuntu/models/inceptionv3/trt-output/variables/variables
23:37:51
Running inference for 1 min, with BZ=1
Model: /home/ubuntu/models/inceptionv3/trt-output/, Input shape: (1, 299, 299, 3) , Output shape: (1, 1000) 
average(sec):0.027 , average(msec):26.62 , fps:37.57
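With BZ=1, the `fps` value printed by `benchmark_1min` is simply the reciprocal of the mean latency. Writing $B$ for the batch size and $\bar{t}$ for the mean per-call latency (symbols introduced here for illustration), the FP32 result checks out:

$$
\text{fps} = \frac{B}{\bar{t}} = \frac{1}{0.02662\ \text{s}} \approx 37.57\ \text{images/s}
$$

Because the printed mean is rounded, recomputing fps from it can differ from the printed value in the last digit.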


## Convert and benchmark FP16

In [17]:
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

print("Converting the graph to TensorRT.")
input_saved_model_dir = "/home/ubuntu/models/inceptionv3/output"
output_saved_model_dir = "/home/ubuntu/models/inceptionv3/trt-output-fp16/"

# If the output directory exists, delete it so the converter can write a fresh SavedModel.
if os.path.isdir(output_saved_model_dir):
    print(output_saved_model_dir, "exists already. Deleting the folder")
    shutil.rmtree(output_saved_model_dir)

converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_saved_model_dir,
    precision_mode="FP16",
    maximum_cached_engines=100)

_ = converter.convert()
_ = converter.save(output_saved_model_dir)


print("Done. Converting the graph to TensorRT-FP16.")

Converting the graph to TensorRT.
/home/ubuntu/models/inceptionv3/trt-output-fp16/ exists already. Deleting the folder
INFO:tensorflow:Linked TensorRT version: (0, 0, 0)
INFO:tensorflow:Loaded TensorRT version: (0, 0, 0)
INFO:tensorflow:Running against TensorRT version 0.0.0
INFO:tensorflow:Restoring parameters from /home/ubuntu/models/inceptionv3/output/variables/variables
INFO:tensorflow:Froze 378 variables.
INFO:tensorflow:Converted 378 variables to const ops.
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: /home/ubuntu/models/inceptionv3/trt-output-fp16/saved_model.pb
Done. Converting the graph to TensorRT-FP16.


In [18]:
# Copy the variables directory: the frozen graph has no variables of its own,
# and loading the SavedModel without one raises an error when serving.
!cp -pra /home/ubuntu/models/inceptionv3/output/variables/ /home/ubuntu/models/inceptionv3/trt-output-fp16/

In [19]:
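# Benchmark on one sample
import time
output_saved_model_dir = "/home/ubuntu/models/inceptionv3/trt-output-fp16/"
output_tensor = 'predictions/Softmax:0'
input_tensor = 'input_1:0'
# Random pixels in 0..255; randint's upper bound is exclusive, hence 256.
input_data = np.random.randint(0, 256, size=(1, 299, 299, 3)).astype(np.float32)

# Load into a fresh graph so that tensor names such as 'input_1:0' cannot
# collide with the Keras model already built in the notebook's default graph.
with tf.Session(graph=tf.Graph()) as sess:
    # First load the SavedModel into the session
    tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING],
        output_saved_model_dir)
    # This first call includes one-time setup cost, so it is not steady-state latency.
    start_time = time.time()
    output = sess.run([output_tensor], feed_dict={input_tensor: input_data})
    delta = (time.time() - start_time)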


print("\nModel: {}, Input shape: {} , Output shape: {} \nCompleted Inference with one sample in {:.3f} sec,"
      .format(output_saved_model_dir, input_data.shape, output[0].shape, delta))

INFO:tensorflow:Restoring parameters from /home/ubuntu/models/inceptionv3/trt-output-fp16/variables/variables

Model: /home/ubuntu/models/inceptionv3/trt-output-fp16/, Input shape: (1, 299, 299, 3) , Output shape: (1, 1000) 
Completed Inference with one sample in 2.768 sec,


In [20]:
## Benchmark
benchmark_1min("/home/ubuntu/models/inceptionv3/trt-output-fp16/")

INFO:tensorflow:Restoring parameters from /home/ubuntu/models/inceptionv3/trt-output-fp16/variables/variables
23:40:47
Running inference for 1 min, with BZ=1
Model: /home/ubuntu/models/inceptionv3/trt-output-fp16/, Input shape: (1, 299, 299, 3) , Output shape: (1, 1000) 
average(sec):0.027 , average(msec):26.77 , fps:37.35
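The FP16 model is no faster than FP32 here (26.77 ms vs 26.62 ms). Two details in this notebook may explain that: the converter logs report TensorRT version 0.0.0, which suggests this TensorFlow build is not linked against TensorRT, so no TensorRT engines may have been created at all; and `nvidia-smi` reports a Tesla M60, which lacks the fast half-precision arithmetic of a V100. One way to check the first point is to count `TRTEngineOp` nodes in the converted graph. A minimal sketch, assuming TF 1.x, where `tf.saved_model.loader.load` returns the loaded `MetaGraphDef` and TF-TRT replaces each converted subgraph with a `TRTEngineOp` node; the helper names are hypothetical:

```python
from collections import Counter


def count_ops(graph_def):
    """Count node types in a GraphDef-like object (anything with .node[i].op)."""
    return Counter(node.op for node in graph_def.node)


def trt_engine_count(graph_def):
    # Each subgraph that TF-TRT converted becomes one TRTEngineOp node;
    # zero means the model still runs entirely as ordinary TensorFlow ops.
    return count_ops(graph_def)["TRTEngineOp"]


# Hypothetical usage with TF 1.x:
# with tf.Session(graph=tf.Graph()) as sess:
#     meta_graph = tf.saved_model.loader.load(
#         sess, [tf.saved_model.tag_constants.SERVING],
#         "/home/ubuntu/models/inceptionv3/trt-output-fp16/")
#     print(trt_engine_count(meta_graph.graph_def))
```

If the count is zero for both models, the FP32 and FP16 numbers above are measuring the same unconverted TensorFlow graph.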


In [1]:
!nvidia-smi

Fri Mar 27 00:11:47 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla M60           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   30C    P8    23W / 150W |      0MiB /  7618MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru

In [3]:
!lscpu

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:            1
CPU MHz:             2008.228
CPU max MHz:         3000.0000
CPU min MHz:         1200.0000
BogoMIPS:            4600.08
Hypervisor vendor:   Xen
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            46080K
NUMA node0 CPU(s):   0-15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_d