# Optimize TensorFlow pre-trained model for inference
To get good inference performance from a pre-trained model, some inference optimizations are required.   
This tutorial shows how to optimize a pre-trained model for better inference performance, and how to 
analyze the model pb files before and after the inference optimizations.  
These optimizations include:  
* Converting variables to constants.
* Removing training-only operations like checkpoint saving.
* Stripping out parts of the graph that are never reached.
* Removing debug operations like CheckNumerics.
* Folding batch normalization ops into the pre-calculated weights.
* Fusing common operations into unified versions.
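
To make the batch normalization folding item concrete, here is a minimal pure-Python sketch. It is not the code used by LPOT or TensorFlow. It assumes a toy per-channel linear layer, and the helper names (`fold_batch_norm`, `linear`, `batch_norm`), the `eps` default, and all numeric values are made up for illustration.

```python
import math


def linear(weights, bias, x):
    """Toy layer: one dot product plus bias per output channel."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b
            for w, b in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, variance, eps=1e-3):
    """Inference-time batch normalization applied per channel."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, variance)]


def fold_batch_norm(weights, bias, gamma, beta, mean, variance, eps=1e-3):
    """Return (weights, bias) that reproduce layer + batch norm in one step."""
    folded_weights, folded_bias = [], []
    for w, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, variance):
        scale = g / math.sqrt(v + eps)
        folded_weights.append([wi * scale for wi in w])
        folded_bias.append((b - m) * scale + bt)
    return folded_weights, folded_bias


# Illustrative values only (two output channels, two inputs).
weights = [[0.5, -1.0], [2.0, 0.25]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.05, -0.1]
mean, variance = [0.3, 1.2], [0.9, 2.5]
x = [1.0, 2.0]

reference = batch_norm(linear(weights, bias, x), gamma, beta, mean, variance)
folded_w, folded_b = fold_batch_norm(weights, bias, gamma, beta, mean, variance)
folded = linear(folded_w, folded_b, x)
print(reference)
print(folded)
```

After folding, only a multiply and a bias addition remain at inference time, which is why a separate batch normalization op is no longer needed in the optimized graph.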

## Prerequisites

#### Get a benchmark script
In this tutorial, we reuse a benchmark script from the Intel LPOT project, so we need to download the script first.

In [None]:
!wget https://raw.githubusercontent.com/intel/lpot/master/examples/tensorflow/oob_models/tf_savemodel_benchmark.py -P scripts

#### Get a pre-trained model "ssd_inception_v2" from download.tensorflow.org
This pre-trained model is a pb file without inference optimizations.

In [None]:
!wget http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2018_01_28.tar.gz -P pre-trained-models ; tar zxvf pre-trained-models/ssd_inception_v2_coco_2018_01_28.tar.gz -C pre-trained-models

#### Dump all related output into log.txt
This tutorial analyzes the log from the benchmark script, so we write all runtime log messages to the log.txt file.

In [None]:
"""
In jupyter notebook simple logging to console and file:
"""
import logging
import sys

logging.basicConfig(
    level=logging.INFO, 
    format='[{%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(filename='log.txt'),
        logging.StreamHandler(sys.stdout)
    ]
)

#### Patch the benchmark script to output the performance number into the log.txt file
log.txt captures all INFO-level log messages, so we patch the script to output throughput as an INFO log message.

In [None]:
!cd scripts;patch < enable_log.patch;cd ..

## Run benchmarks

### 1. Run original pre-trained model
Let us run the pre-trained model downloaded in the previous section, without any optimization, by using the benchmark script.

In [None]:
run scripts/tf_savemodel_benchmark.py --model_path pre-trained-models/ssd_inception_v2_coco_2018_01_28/saved_model --num_iter 200 --num_warmup 10 --disable_optimize 

#### 1.1 Parse the logfile for performance number
We parse out the performance number from log.txt and save it for later performance comparison.

In [None]:
from scripts.profile_utils import PerfPresenter

perfp = PerfPresenter()
# Collects throughput numbers: original model first, optimized model second.
Thoughput_list = []
print("get throughput")
val = 'Throughput'  # keyword used to find the throughput line in log.txt
index = 4
line = perfp.read_throughput('log.txt', keyword=val, index=index)
if line is not None:
    throughput = line
    print(throughput)
    Thoughput_list.append(float(throughput))
else:
    print("ERROR! Can't find the performance number in the log. Please check the log for runtime issues.")
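
The `read_throughput` helper used above lives in scripts/profile_utils.py, which is not shown in this tutorial. As a rough sketch of what such a parser could do, the hypothetical function below returns the whitespace-separated token at a given index from the last log line that contains a keyword. The function name, the sample log lines, and the "last match wins" behavior are all assumptions, not the actual implementation.

```python
import os
import tempfile


def read_last_value(log_path, keyword, index):
    """Return token `index` of the last line containing `keyword`, or None."""
    value = None
    with open(log_path) as log_file:
        for line in log_file:
            if keyword in line:
                tokens = line.split()
                if len(tokens) > index:
                    value = tokens[index]  # later matches overwrite earlier ones
    return value


# Hypothetical log content that mimics the logging format configured earlier.
sample = (
    "[{bench.py:10} INFO - Warmup done\n"
    "[{bench.py:42} INFO - Throughput: 12.5 fps\n"
    "[{bench.py:42} INFO - Throughput: 20.0 fps\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

last_throughput = read_last_value(sample_path, "Throughput", 4)
missing = read_last_value(sample_path, "Latency", 4)
os.remove(sample_path)
print(last_throughput, missing)
```

Taking the last match matters in this sketch because log.txt accumulates the output of every benchmark run, so an earlier run's number would otherwise be picked up again.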

### 2. Optimize the pre-trained model


#### 2.1 Optimize the model by using the Intel® Low Precision Optimization Tool (Intel® LPOT)

The Intel® Low Precision Optimization Tool (Intel® LPOT) is an open-source Python library that delivers a unified low-precision inference interface across multiple Intel-optimized Deep Learning (DL) frameworks on both CPUs and GPUs.
LPOT also provides graph optimizations for fp32 pre-trained models, with more optimizations (such as common subexpression elimination) than the TensorFlow optimization tool [optimize_for_inference](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/optimize_for_inference.py). See [fp32 optimization](https://github.com/intel/lpot/blob/master/docs/graph_optimization.md#1-fp32-optimization) for more details.

In [None]:
from lpot.experimental import Graph_Optimization
graph_opt = Graph_Optimization()
graph_opt.model = 'pre-trained-models/ssd_inception_v2_coco_2018_01_28/saved_model'   # the path to saved_model dir
output = graph_opt()
output.save('pre-trained-models/ssd_inception_v2_coco_2018_01_28/optimized_model_lpot')


#### 2.2 Optimize the model by using TensorFlow Tools

This tutorial includes a freeze_optimize_v2.py script for inference optimization, and Intel is working on upstreaming this script to the TensorFlow GitHub repository.  
The input of this script is the directory of the original saved model, and the output is the directory of the optimized model.  
You don't need to change the command below for this tutorial, but for other pre-trained models you need to pass the corresponding directories to "--input_saved_model_dir" and "--output_saved_model_dir". 

In [None]:
run scripts/freeze_optimize_v2.py --input_saved_model_dir=pre-trained-models/ssd_inception_v2_coco_2018_01_28/saved_model --output_saved_model_dir=pre-trained-models/ssd_inception_v2_coco_2018_01_28/optimized_model_tools

### 3. Run optimized model
Let us run the optimized model by using the benchmark script.

First, pick one of the optimized models from the previous section.

In [None]:
!ln -sf optimized_model_lpot pre-trained-models/ssd_inception_v2_coco_2018_01_28/optimized_model

In [None]:
run scripts/tf_savemodel_benchmark.py --model_path pre-trained-models/ssd_inception_v2_coco_2018_01_28/optimized_model --num_iter 200 --num_warmup 10 --disable_optimize 

#### 3.1 Parse the logfile for performance number
We parse out the performance number from log.txt and save it for later performance comparison.

In [None]:
from scripts.profile_utils import PerfPresenter

perfp = PerfPresenter()
print("get throughput")
val = 'Throughput'  # keyword used to find the throughput line in log.txt
index = 4
line = perfp.read_throughput('log.txt', keyword=val, index=index)
if line is not None:
    throughput = line
    print(throughput)
    Thoughput_list.append(float(throughput))
else:
    print("ERROR! Can't find the performance number in the log. Please check the log for runtime issues.")

### 4. Performance Comparison
We compare the performance of the original saved model and the optimized model, and show a speedup diagram accordingly.

In [None]:
import pandas as pd
print(Thoughput_list)
speedup = float(Thoughput_list[1])/float(Thoughput_list[0])
print("Speedup : ", speedup)
df = pd.DataFrame({'pretrained_model':['saved model', 'optimized model'], 'Speedup':[1, speedup]})
ax = df.plot.bar( x='pretrained_model', y='Speedup', rot=0)
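
The comparison above assumes that both benchmark runs were parsed successfully. If either parse step failed, the list holds fewer than two values and the indexing raises an error. A small defensive sketch is shown here. The function name `compute_speedup` and the sample throughput values are hypothetical.

```python
def compute_speedup(throughputs):
    """Return optimized / baseline for a [baseline, optimized] list."""
    if len(throughputs) != 2:
        raise ValueError(
            f"expected 2 throughput values, got {len(throughputs)}")
    baseline, optimized = (float(t) for t in throughputs)
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return optimized / baseline


# Illustrative numbers only, not measured results.
example_speedup = compute_speedup([50.0, 75.0])
print("Speedup : ", example_speedup)
```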

## Analyze the pre-trained model PB files
In this tutorial, we use tf_pb_utils.py to parse a pb file.  


### (Optional) 1. Understand the structure of a PB file
This section is optional for this tutorial because the structure of these pb files is already known.  
If you investigate a new pb file, you might need to go through the section below to understand its structure. 

The tf_pb_utils.py script parses op.type and op.name from a graph_def into a CSV file.  

op.name might contain a layer structure.  

Below is an example.  
ex : FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_2/Conv2d_0b_3x3/Conv2D  
The first layer is FeatureExtractor, and the second layer is InceptionV2. 
The last layer is Conv2D.  

Here is another example.  
ex : BoxPredictor_4/BoxEncodingPredictor/Conv2D  
Even though the last layer is Conv2D, the first and second layers are different.  
Moreover, this Conv2D is not related to the InceptionV2 layer, so we don't want to count it as an InceptionV2 op.  


Therefore, we still need the layer information to focus on the ops that matter to us.  

We parse op.type and op.name into a CSV file "out.csv". The table below maps the CSV columns to op.type and op.name, where op.name[i] represents the i-th layer of the op.name.   


|op_type|op_name|op1|op2|op3|op4|op5|op6|
|:-----|:----|:-----|:-----|:-----|:-----|:-----|:-----|
|op.type| op.name[-1] |op.name[0] | op.name[1] | op.name[2] |op.name[3] |op.name[4] |op.name[5] |  
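
The mapping in the table can be sketched in a few lines of Python. This is not the code of tf_pb_utils.py. It assumes TensorFlow's `/`-separated op names, and the helper name `op_to_row` and the `"0"` filler for missing layers are assumptions made for illustration.

```python
import csv
import io

MAX_LAYERS = 6  # columns op1..op6
HEADER = ["op_type", "op_name"] + [f"op{i}" for i in range(1, MAX_LAYERS + 1)]


def op_to_row(op_type, op_name, filler="0"):
    """Split an op name into the CSV columns described in the table."""
    parts = op_name.split("/")
    layers = parts[:MAX_LAYERS]
    layers += [filler] * (MAX_LAYERS - len(layers))  # pad short names
    return [op_type, parts[-1]] + layers


rows = [
    op_to_row("Conv2D",
              "FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/"
              "Branch_2/Conv2d_0b_3x3/Conv2D"),
    op_to_row("Conv2D", "BoxPredictor_4/BoxEncodingPredictor/Conv2D"),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(HEADER)
writer.writerows(rows)
print(buffer.getvalue())
```

In this sketch both ops share the same op_type and op_name columns, and only the op1 and op2 columns tell the InceptionV2 convolution apart from the box predictor one.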


The following two subsections show how to focus on the op.type of a layer of interest, such as InceptionV2.


#### (Optional) Find the column that contains the layer of interest, such as InceptionV2
The command below groups the rows of the out.csv file by the values in the selected column.
Check which column contains the layer of interest.  
Below is column 3 for the ssd_inception_v2 case; it contains InceptionV2 in the second row.

     == Dump column : 3 ==  
    op2  
    BatchMultiClassNonMaxSuppression    5307  
    InceptionV2                         1036  
    0                                    263  
    map                                   63  
    Decode                                63  
    ClassPredictor                        36  
    BoxEncodingPredictor                  36  
    Meshgrid_14                           34  
    Meshgrid_1                            34  
    Meshgrid_10                           34  
    dtype: int64  



Both column and row indexes start from 0. 
Therefore, we can access the second row by index 1.  
By using column index 3 and row index 1, we can access the InceptionV2-related op.name values.  
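
The column and row selection described above can be sketched with the standard library. This is a hypothetical stand-in for what the `-c` and `-r` options appear to do, not the code of tf_pb_utils.py. The function names and the toy rows (limited to the columns op_type, op_name, op1, op2) are invented for illustration.

```python
from collections import Counter


def group_column(rows, column):
    """Count rows per distinct value in `column`, most frequent first."""
    return Counter(row[column] for row in rows).most_common()


def select_rows(rows, column, row_index):
    """Keep the rows whose `column` value is the `row_index`-th group."""
    value, _count = group_column(rows, column)[row_index]
    return [row for row in rows if row[column] == value]


# Toy rows: op_type, op_name, op1, op2
toy_rows = [
    ["Conv2D", "Conv2D", "FeatureExtractor", "InceptionV2"],
    ["Relu6", "Relu6", "FeatureExtractor", "InceptionV2"],
    ["Gather", "Gather", "Postprocessor", "BatchMultiClassNonMaxSuppression"],
    ["Slice", "Slice", "Postprocessor", "BatchMultiClassNonMaxSuppression"],
    ["Pad", "Pad", "Postprocessor", "BatchMultiClassNonMaxSuppression"],
    ["Conv2D", "Conv2D", "BoxPredictor_4", "BoxEncodingPredictor"],
]

groups = group_column(toy_rows, 3)      # group by column index 3 (op2)
selected = select_rows(toy_rows, 3, 1)  # second group, row index 1
print(groups)
print([row[0] for row in selected])
```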


In [None]:
run scripts/tf_pb_utils.py pre-trained-models/ssd_inception_v2_coco_2018_01_28/saved_model/saved_model.pb

### 2. Analyze the original model PB file
From the previous section, we know that we can find the InceptionV2-related ops by accessing column index 3 and row index 1.  
Therefore, we append "-c 3 -r 1" to the end of the command below.

In [None]:
run scripts/tf_pb_utils.py pre-trained-models/ssd_inception_v2_coco_2018_01_28/saved_model/saved_model.pb -c 3 -r 1

### 3. Analyze the optimized model PB file
From the previous section, we know that we can find the InceptionV2-related ops by accessing column index 3 and row index 1.  
Therefore, we append "-c 3 -r 1" to the end of the command below.  

>By comparing the diagrams, you can see that FusedBatchNorm ops are replaced by BiasAdd ops after inference optimization, because inference optimization folds batch normalization ops into the pre-calculated weights.  

In [None]:
run scripts/tf_pb_utils.py pre-trained-models/ssd_inception_v2_coco_2018_01_28/optimized_model/saved_model.pb -c 3 -r 1
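
Besides comparing the diagrams by eye, the change in op types can be summarized as a count difference. The sketch below is only an illustration: the function name `op_type_delta` is hypothetical, and the two op-type lists are toy data rather than values read from the actual pb files.

```python
from collections import Counter


def op_type_delta(before, after):
    """Return {op_type: count_after - count_before} for changed op types."""
    before_counts, after_counts = Counter(before), Counter(after)
    all_ops = sorted(set(before_counts) | set(after_counts))
    return {op: after_counts[op] - before_counts[op]
            for op in all_ops
            if after_counts[op] != before_counts[op]}


# Toy op-type lists, not taken from the real models.
original_ops = ["Conv2D", "FusedBatchNorm", "Relu6",
                "Conv2D", "FusedBatchNorm", "Relu6"]
optimized_ops = ["Conv2D", "BiasAdd", "Relu6",
                 "Conv2D", "BiasAdd", "Relu6"]

delta = op_type_delta(original_ops, optimized_ops)
print(delta)
```

A negative count marks an op type that became less frequent after optimization, and a positive count marks one that was introduced or became more frequent.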

In [None]:
print("[CODE_SAMPLE_COMPLETED_SUCCESFULLY]")