# Example of Running TensorRT by Converting a Frozen Inference Model from TensorFlow

This notebook shows how to run inference with TensorRT starting from a frozen TensorFlow model. The Mobilenet_V1_1.0_224 model (the version that matches the file and node names used below) is used for the conversion because it is small and needs little computation, while still giving reasonably accurate results in a short running time.

## Prerequisites
1. Install DeepStream and TensorRT as described at the top level of this repository.
2. `$ pip install Pillow pycuda numpy`
3. Docker running on your local machine

## Docker Guideline
After finishing the TensorRT installation, start the Docker container with the following command:

`$ docker run --gpus all -it -v path_to_local_host:path_to_docker -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 /bin/bash`


If Docker fails to start with the command above after the computer has been restarted or shut down, run the following two commands:

`$ sudo systemctl daemon-reload`

`$ sudo systemctl restart docker`


Then try to run Docker again.

After the container has launched, use the following command to convert the TensorFlow frozen inference model into the UFF format, which TensorRT reads through its UFF parser.

`$ convert-to-uff mobilenet_v1_1.0_224_frozen.pb -o mobilenet_v1_1.0_224.uff`

Import the necessary libraries, in particular `tensorrt` (imported as `trt`), which builds a TensorRT engine from the converted TensorFlow frozen graph.

In [54]:
#!/usr/bin/env python3

import numpy as np
import tensorrt as trt
import time

from PIL import Image

import pycuda.driver as cuda
import pycuda.autoinit  # initializes CUDA and creates a context on import

Define the batch size. This notebook only tests TensorRT, so a batch size of 1 is enough. The precision mode is 32-bit floating point (FP32).

In [47]:
MAX_BATCH_SIZE = 1
MAX_WORKSPACE_SIZE = 1 << 30  # 1 GiB of scratch memory for the builder

LOGGER = trt.Logger(trt.Logger.WARNING)
DTYPE = trt.float32

Common model configuration: the model file name, the input and output properties, the label file, the number of inference loops, and the number of top classification results to show.

In [48]:
MODEL_FILE = 'mobilenet_v1_1.0_224.uff'
INPUT_NAME = 'input'
INPUT_SHAPE = (3, 224, 224)
OUTPUT_NAME = 'MobilenetV1/Predictions/Reshape_1'

LABELS = 'class_labels.txt'

LOOP_TIMES = 10
TOP_N = 5

Pre-allocate buffers: page-locked host memory and device memory for the input and the output.

In [49]:
def allocate_buffers(engine):
    print('allocate buffers')
    
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(DTYPE))
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype(DTYPE))
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    
    return h_input, d_input, h_output, d_output
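
To make the buffer sizes concrete, the sketch below reproduces the arithmetic behind `trt.volume` and `nbytes` in plain Python, without TensorRT or a GPU. The helper names `volume` and `buffer_nbytes` are hypothetical, and the output shape of 1001 classes is an assumption about this MobileNet checkpoint, not a value read from the engine.

```python
import math

def volume(shape):
    """Number of elements in a tensor of the given shape (the product of its dimensions)."""
    return math.prod(shape)

def buffer_nbytes(shape, itemsize=4):
    """Bytes needed for one tensor; itemsize=4 corresponds to FP32."""
    return volume(shape) * itemsize

INPUT_SHAPE = (3, 224, 224)   # same value as in the notebook configuration
OUTPUT_SHAPE = (1001,)        # assumption: 1000 ImageNet classes plus a background class

print(volume(INPUT_SHAPE), buffer_nbytes(INPUT_SHAPE))
print(volume(OUTPUT_SHAPE), buffer_nbytes(OUTPUT_SHAPE))
```

Under these assumptions the input buffer is a little under 0.6 MiB and the output buffer is a few kilobytes, so both are small next to the 1 GiB builder workspace.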

Build the engine by registering the input and output properties with the parser and parsing the model into the network.

In [50]:
def build_engine(model_file):
    print('build engine...')

    with trt.Builder(LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
        builder.max_workspace_size = MAX_WORKSPACE_SIZE
        builder.max_batch_size = MAX_BATCH_SIZE
        if DTYPE == trt.float16:
            builder.fp16_mode = True
        parser.register_input(INPUT_NAME, INPUT_SHAPE, trt.UffInputOrder.NCHW)
        parser.register_output(OUTPUT_NAME)
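        # parse() returns False when the UFF file cannot be parsed
        if not parser.parse(model_file, network, DTYPE):
            raise RuntimeError('failed to parse {}'.format(model_file))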
        
        return builder.build_cuda_engine(network)

Load the input image and convert it into a flattened array in CHW order.

In [51]:
def load_input(img_path, host_buffer):
    print('load input')
    
    with Image.open(img_path) as img:
        c, h, w = INPUT_SHAPE
        dtype = trt.nptype(DTYPE)
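        # force 3 channels so grayscale or RGBA files do not break the HWC -> CHW transpose
        img_array = np.asarray(img.convert('RGB').resize((w, h), Image.BILINEAR)).transpose([2, 0, 1]).astype(dtype).ravel()
        # preprocess for mobilenet: scale pixel values from [0, 255] to [-1, 1]
        img_array = img_array / 127.5 - 1.0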
        
    np.copyto(host_buffer, img_array)
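
The layout change and scaling in `load_input` can be checked on a synthetic image, without Pillow or a GPU. This is a minimal sketch; the helper name `preprocess` and the 2x2 test image are made up for illustration.

```python
import numpy as np

def preprocess(hwc_uint8):
    """Convert an HWC uint8 image to a flat CHW float32 array scaled to [-1, 1]."""
    chw = hwc_uint8.transpose([2, 0, 1]).astype(np.float32)
    return (chw / 127.5 - 1.0).ravel()

# 2x2 RGB image: the red channel is 255 everywhere, green and blue are 0
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[..., 0] = 255

flat = preprocess(img)
print(flat.shape)           # 12 values: 4 red, then 4 green, then 4 blue
print(flat[:4], flat[4:])   # red plane maps to 1.0, the other planes to -1.0
```

Because the array is flattened after the transpose, all values of one channel are contiguous, which is the layout registered with `trt.UffInputOrder.NCHW`.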


Run inference on the test image and print the inference time to the terminal. The input is copied from host to device before execution, and the output is copied from device to host afterwards.

In [52]:
def do_inference(n, context, h_input, d_input, h_output, d_output):
    cuda.memcpy_htod(d_input, h_input)
    
    st = time.time()
    context.execute(batch_size=1, bindings=[int(d_input), int(d_output)])
    print('Inference time {}: {} [msec]'.format(n, (time.time() - st)*1000))

    cuda.memcpy_dtoh(h_output, d_output)
    
    return h_output
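
Single timings vary between iterations, and the first call can be slower than the rest. The sketch below shows one way to collect and summarize such measurements, using a stand-in workload so it runs without a GPU. `time_calls` and `fake_inference` are hypothetical names; in the notebook the timed call would be `context.execute(...)`.

```python
import time

def time_calls(fn, loops=10, warmup=1):
    """Call fn repeatedly and return per-call latencies in milliseconds, skipping warm-up calls."""
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(loops):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

def fake_inference():
    # stand-in workload; replace with the real inference call
    sum(range(10000))

latencies = time_calls(fake_inference, loops=10, warmup=1)
print('mean {:.4f} ms, min {:.4f} ms, max {:.4f} ms'.format(
    sum(latencies) / len(latencies), min(latencies), max(latencies)))
```

`time.perf_counter` is used here instead of `time.time` because it is a monotonic, high-resolution clock intended for measuring short intervals.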

The main function runs the test: it feeds the input image through the functions above and prints the top 5 classes of the classification result.

In [53]:
def main():
    # use a forward slash: in 'testing_imgs\testimg-07.jpg' the '\t' is read as a tab character
    args = 'testing_imgs/testimg-07.jpg'

    with open(LABELS) as f:
        labels = f.read().split('\n')
        
    with build_engine(MODEL_FILE) as engine:
        h_input, d_input, h_output, d_output = allocate_buffers(engine)
        load_input(args, h_input)
        
        with engine.create_execution_context() as context:
            for i in range(LOOP_TIMES):
                output = do_inference(i, context, h_input, d_input, h_output, d_output)

    pred_idx = np.argsort(output)[::-1]
    pred_prob = np.sort(output)[::-1]

    print('\nClassification Result:')
    for i in range(TOP_N):
        print('{} {} {}'.format(i + 1, labels[pred_idx[i]], pred_prob[i]))

                
if __name__ == '__main__':
    main()

build engine...
allocate buffers
load input
Inference time 0: 0.6520748138427734 [msec]
Inference time 1: 0.579833984375 [msec]
Inference time 2: 0.5779266357421875 [msec]
Inference time 3: 0.5776882171630859 [msec]
Inference time 4: 0.5779266357421875 [msec]
Inference time 5: 0.5795955657958984 [msec]
Inference time 6: 0.5795955657958984 [msec]
Inference time 7: 0.5772113800048828 [msec]
Inference time 8: 0.5784034729003906 [msec]
Inference time 9: 0.5786418914794922 [msec]

Classification Result:
1 water buffalo 0.5712370276451111
2 oxygen mask 0.22553090751171112
3 bison 0.07660061866044998
4 wild boar 0.044601891189813614
5 thunder snake 0.018330469727516174
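
The ranking step in `main` sorts the whole output twice, once for the indices and once for the scores. A minimal alternative sketch, using made-up probabilities and labels, sorts the indices once and reuses them to look up the scores; `top_n` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def top_n(probs, labels, n):
    """Return the n highest-scoring (label, probability) pairs, best first."""
    order = np.argsort(probs)[::-1][:n]
    return [(labels[i], float(probs[i])) for i in order]

# made-up values for illustration only
probs = np.array([0.1, 0.6, 0.05, 0.25], dtype=np.float32)
labels = ['cat', 'dog', 'fish', 'bird']

for rank, (label, p) in enumerate(top_n(probs, labels, 2), start=1):
    print(rank, label, round(p, 2))
```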


### Example of testing image
I took this image in Nan province. Images of Thai buffalo may never have appeared in the pre-training dataset, but the model's prediction is still acceptable.


Classification Result:
1. water buffalo 0.5712370276451111
2. oxygen mask 0.22553090751171112
3. bison 0.07660061866044998
4. wild boar 0.044601891189813614
5. thunder snake 0.018330469727516174

<img src="testing_imgs/testimg-07.jpg" />