# Compare the original and compiled models

## Start by downloading both models

In [None]:
# Restore the S3 URIs of both model artifacts, saved earlier with %store
%store -r model_optimized
%store -r model_original

In [None]:
!aws s3 cp {model_optimized} ./

In [None]:
!aws s3 cp {model_original} ./

In [None]:
!mkdir -p original && tar -xzvf model.tar.gz -C original

In [None]:
!mkdir -p compiled && tar -xzvf model-ml_m4.tar.gz -C compiled
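The shell cells above create each target directory and unpack an archive into it. If you prefer to stay in Python, a minimal sketch with the standard-library `tarfile` module could look like this. The helper name `extract_to` is hypothetical; the archive and directory names in the usage comments are the ones used in this notebook.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_to(archive: str, dest: str) -> List[str]:
    """Create `dest` if needed, unpack the gzipped tarball `archive` into it,
    and return the names of the archive members."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`: no error if it exists
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=str(dest_dir))  # only extract archives you trust
    return names


# Usage with the archives downloaded above:
# extract_to("model.tar.gz", "original")
# extract_to("model-ml_m4.tar.gz", "compiled")
```

Creating the directory before opening the archive guarantees that it exists when extraction starts.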

## Local inference - original model

We will upgrade to TensorFlow 2.0 to demonstrate how you can load a SavedModel exported with an older version (in this case, 1.18.0).

In [None]:
!pip install --upgrade pip

In [None]:
!conda uninstall wrapt -y
!pip install tensorflow==2.0.0
!pip install opencv-python

In [None]:
import tensorflow as tf
import cv2
print(tf.__version__)
tf.get_logger().setLevel('ERROR')
tf.executing_eagerly()

### Load model and serving signature

In [None]:
import os

# Locate the exported saved_model.pb and keep only its directory
path = !find ./original/ -type f -name "saved_model.pb"
path = os.path.dirname(path[0])
print(path)
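The cell above locates the directory that holds the exported `saved_model.pb`, which is the directory `tf.saved_model.load` expects. A pure-Python sketch of the same lookup with `pathlib` follows; the helper name `find_saved_model_dir` is hypothetical, and it assumes the model file is named exactly `saved_model.pb`.

```python
from pathlib import Path
from typing import Optional


def find_saved_model_dir(root: str) -> Optional[str]:
    """Return the directory containing the first `saved_model.pb` found
    anywhere under `root`, or None if there is none."""
    matches = sorted(Path(root).rglob("saved_model.pb"))  # sorted for a deterministic pick
    if not matches:
        return None
    return str(matches[0].parent)


# Usage:
# path = find_saved_model_dir("./original")
```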

In [None]:
loaded = tf.saved_model.load(path)

In [None]:
!saved_model_cli show --dir {path} --tag_set serve --signature_def serving_default

In [None]:
print(list(loaded.signatures.keys())) 

In [None]:
infer = loaded.signatures["serving_default"]

Load an example image and resize it to the model's 32x32 input:

In [None]:
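image = cv2.imread("data/cat.png", 1)  # 1 = cv2.IMREAD_COLOR (3-channel BGR)
print(image.shape)
# Resize: the model expects 32x32 images.
image = cv2.resize(image, (32, 32))
# Add a batch dimension; convert_image_dtype also scales uint8 pixels to float32 in [0, 1]
i = tf.image.convert_image_dtype(image.reshape(-1, 32, 32, 3), tf.float32)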

Time a single inference:

In [None]:
%%time
infer(i)['probabilities']

Measure the mean inference time over 25 runs of 25 loops each:

In [None]:
time_original = %timeit -n25 -r25 -o infer(i)['probabilities']
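`%timeit -n25 -r25 -o` calls the statement 25 times per run (loops) and repeats that 25 times (runs); with `-o` it returns a `TimeitResult` whose `average` is the mean time per loop, in seconds. The sketch below reproduces that measurement with the standard-library `timeit.repeat`. The helper `mean_latency_ms` and the `workload` placeholder are hypothetical stand-ins for `infer(i)` or `model.run(input_data)`.

```python
import statistics
import timeit
from typing import Callable, Tuple


def mean_latency_ms(fn: Callable[[], object], loops: int = 25, runs: int = 25) -> Tuple[float, float]:
    """Time `fn` the way `%timeit -n{loops} -r{runs}` does.

    Each of the `runs` measurements calls `fn` `loops` times; dividing by
    `loops` gives a per-call time. Returns (mean, population std dev) in ms.
    """
    totals = timeit.repeat(fn, number=loops, repeat=runs)  # seconds per run
    per_call_ms = [t / loops * 1000.0 for t in totals]
    return statistics.mean(per_call_ms), statistics.pstdev(per_call_ms)


def workload() -> int:
    # Placeholder; in this notebook you would time e.g. lambda: infer(i)['probabilities']
    return sum(range(10_000))


mean_ms, std_ms = mean_latency_ms(workload)
print(f"{mean_ms:.3f} ms +/- {std_ms:.3f} ms per call")
```

Because both means come back as numbers in the same unit, comparing two models is a plain division, with no need to parse the printed summary string.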

## Local inference - compiled model

DLR (Deep Learning Runtime, https://github.com/neo-ai/neo-ai-dlr) is part of the Neo-AI project. It is a compact, common runtime for deep learning models and decision tree models compiled by Amazon SageMaker Neo, TVM, or Treelite. DLR uses the TVM runtime, the Treelite runtime, and NVIDIA TensorRT™, and can include other hardware-specific runtimes. It provides unified Python/C++ APIs for loading and running compiled models on various devices. At the time of writing, DLR supports platforms from Intel, NVIDIA, and ARM, with support for Xilinx, Cadence, and Qualcomm announced as coming soon.

In [None]:
!pip install dlr

In [None]:
from dlr import DLRModel
import numpy

# Load the Neo-compiled artifact extracted into ./compiled and run it on the CPU
model = DLRModel(model_path='compiled', dev_type='cpu')

In [None]:
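image = cv2.imread("data/cat.png", 1)  # 1 = cv2.IMREAD_COLOR (3-channel BGR)
print(image.shape)
# Resize: the model expects 32x32 images.
image = cv2.resize(image, (32, 32))

# Match the TensorFlow input: a batch of one, float32, pixels scaled to [0, 1]
input_data = {'Placeholder': (image.reshape(1, 32, 32, 3) / 255.0).astype(numpy.float32)}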

Time a single inference:

In [None]:
%%time
model.run(input_data)

Measure the mean inference time over 25 runs of 25 loops each:

In [None]:
time_compiled = %timeit -n25 -r25 -o model.run(input_data)

In [None]:
# TimeitResult.average is the mean time per loop in seconds; convert to ms
o1 = time_compiled.average * 1000

In [None]:
o2 = time_original.average * 1000

In [None]:
'{:.2f} vs {:.2f} ms ... {:.1f}x speedup!'.format(o2, o1, o2 / o1)

# Thank you!