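This notebook shows how to optimize the inference latency of the ResNetV2101 model from the `resnet-export.ipynb` notebook using [ONNX](https://onnx.ai/). We convert the model to ONNX with `tf2onnx` and run it with [ONNX Runtime](https://onnxruntime.ai/).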

## Installations

In [1]:
!pip install onnxruntime==1.11.0 numpy==1.21.0 tf2onnx -q

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tfx-bsl 1.9.0 requires google-api-python-client<2,>=1.7.11, but you have google-api-python-client 2.52.0 which is incompatible.
tfx-bsl 1.9.0 requires pyarrow<6,>=1, but you have pyarrow 8.0.0 which is incompatible.
tensorflow-transform 1.9.0 requires pyarrow<6,>=1, but you have pyarrow 8.0.0 which is incompatible.
apache-beam 2.40.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.4 which is incompatible.
apache-beam 2.40.0 requires pyarrow<8.0.0,>=0.15.1, but you have pyarrow 8.0.0 which is incompatible.

## Imports

In [1]:
import tensorflow as tf 
import tensorflow_hub as hub

import onnx
import timeit
import tf2onnx
import numpy as np
import onnxruntime as ort

## Constant

In [2]:
IMG_SIZE = 224

## Load the ResNetV2101 model

In [3]:
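# Wrap the TF Hub classifier in a Keras Sequential model so that it can be
# passed to tf2onnx.convert.from_keras below.
tfhub_model = tf.keras.Sequential(
    [hub.KerasLayer("https://tfhub.dev/google/imagenet/resnet_v2_101/classification/5")]
)

# Build with a dynamic batch dimension and 224x224 RGB (NHWC) inputs.
tfhub_model.build([None, IMG_SIZE, IMG_SIZE, 3])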

## Convert to ONNX

In [4]:
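# Dynamic batch size and NHWC float32 images, matching the shape passed to build().
input_signature = [
    tf.TensorSpec([None, IMG_SIZE, IMG_SIZE, 3], tf.float32)
]
# from_keras returns (ModelProto, external_tensor_storage); the second value is
# only populated for large models (large_model=True), so it is discarded here.
onnx_model, _ = tf2onnx.convert.from_keras(tfhub_model, input_signature, opset=15)
onnx_model_path = "resnetv2101.onnx"
onnx.save(onnx_model, onnx_model_path)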

Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`



## Ensure the ONNX and TF Hub model outputs match

In [5]:
dummy_inputs = tf.random.normal((1, IMG_SIZE, IMG_SIZE, 3))
dummy_inputs_numpy = dummy_inputs.numpy()
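If you would rather create the test input directly in NumPy (for example, to fix a random seed), keep the dtype in mind: ONNX Runtime checks input dtypes strictly, and the exported graph expects `float32` because of the `tf.float32` `TensorSpec`, whereas NumPy defaults to `float64`. A minimal sketch, assuming only NumPy; the seed value is arbitrary:

```python
import numpy as np

IMG_SIZE = 224  # same constant as in the notebook

# A seeded generator makes the TF-vs-ONNX comparison reproducible across runs.
rng = np.random.default_rng(seed=0)

# Request float32 explicitly; a float64 array would be rejected by ONNX Runtime
# for a graph whose input is declared as float32.
dummy_inputs_numpy = rng.standard_normal((1, IMG_SIZE, IMG_SIZE, 3), dtype=np.float32)

print(dummy_inputs_numpy.shape, dummy_inputs_numpy.dtype)
```

The same array can be passed to the Keras model as is, or wrapped with `tf.convert_to_tensor` if a `tf.Tensor` is preferred.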

In [6]:
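tf_outputs = tfhub_model(dummy_inputs, training=False)

# Run on CPU explicitly. "args_0" is the input name tf2onnx assigned to the
# unnamed TensorSpec (it can be checked with sess.get_inputs()[0].name).
sess = ort.InferenceSession(onnx_model_path, providers=["CPUExecutionProvider"])
# sess.run returns a list with one array per model output; take the first.
ort_outputs = sess.run(None, {"args_0": dummy_inputs_numpy})[0]

np.allclose(tf_outputs.numpy(), ort_outputs, rtol=1e-5, atol=1e-5)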

True
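`np.allclose(a, b, rtol, atol)` returns `True` when every element satisfies |a - b| <= atol + rtol * |b|, with `b` acting as the reference. Note also that `sess.run(None, ...)` returns a list with one array per output, so comparing against the raw list relies on NumPy broadcasting a `(1, 1, N)` stack against `(1, N)`; indexing the first element is more explicit. The toy sketch below uses made-up values (not real model outputs) to write the tolerance check out by hand:

```python
import numpy as np

# Hypothetical stand-ins for the two model outputs (not real logits).
tf_out = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)             # shape (1, 3)
ort_out = [np.array([[1.0, 2.0 + 1e-6, 3.0]], dtype=np.float32)]   # list, like sess.run

rtol, atol = 1e-5, 1e-5

# Take the single output array instead of relying on broadcasting the list.
ort_arr = ort_out[0]

# The elementwise test np.allclose performs: |a - b| <= atol + rtol * |b|.
manual = np.all(np.abs(tf_out - ort_arr) <= atol + rtol * np.abs(ort_arr))
builtin = np.allclose(tf_out, ort_arr, rtol=rtol, atol=atol)
print(manual, builtin)

# The largest absolute difference is often more informative than a single bool.
max_abs_diff = np.max(np.abs(tf_out - ort_arr))
print(max_abs_diff)
```

Printing the maximum absolute difference alongside the boolean makes it easier to judge how much headroom the chosen tolerances leave.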

## Benchmark the latency of both models

In [7]:
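print("Benchmarking TF model...")
# Warm-up calls, excluded from timing (the first calls pay one-off setup costs).
for _ in range(2):
    _ = tfhub_model(dummy_inputs, training=False)

# Time 25 individual calls (number=1, so each entry is one call's wall time).
tf_runtimes = timeit.repeat(
    lambda: tfhub_model(dummy_inputs, training=False), number=1, repeat=25
)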
print(f"Average latency (seconds): {np.mean(tf_runtimes)}.")

Benchmarking TF model...
Average latency (seconds): 0.206272908560004.
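The warm-up-then-`timeit.repeat` pattern in the benchmark cells can be factored into a small reusable helper. `benchmark` below is a hypothetical name, and this is a sketch using only the standard library; it also reports the median and standard deviation, since a mean over 25 runs can be skewed by a few slow outliers:

```python
import statistics
import timeit


def benchmark(fn, warmup=2, repeat=25):
    """Hypothetical helper: time single calls of `fn` after `warmup` untimed calls."""
    # Untimed warm-up calls absorb one-off costs such as tracing or allocation.
    for _ in range(warmup):
        fn()
    # number=1, so each entry in `runtimes` is the wall time of a single call.
    runtimes = timeit.repeat(fn, number=1, repeat=repeat)
    return {
        "mean": statistics.mean(runtimes),
        "median": statistics.median(runtimes),
        "stdev": statistics.stdev(runtimes),  # needs repeat >= 2
        "min": min(runtimes),
    }


# Toy workload standing in for the model call.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
print({name: f"{value:.6f}" for name, value in stats.items()})
```

In the notebook, the workload would be `lambda: tfhub_model(dummy_inputs, training=False)` or `lambda: sess.run(None, {"args_0": dummy_inputs_numpy})`.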


In [8]:
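print("Benchmarking ONNX model...")
# Warm-up calls, excluded from timing (the first calls pay one-off setup costs).
for _ in range(2):
    _ = sess.run(None, {"args_0": dummy_inputs_numpy})

# Time 25 individual calls (number=1, so each entry is one call's wall time).
onnx_runtimes = timeit.repeat(
    lambda: sess.run(None, {"args_0": dummy_inputs_numpy}), number=1, repeat=25
)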
print(f"Average latency (seconds): {np.mean(onnx_runtimes)}.")

Benchmarking ONNX model...
Average latency (seconds): 0.06843939660000614.
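The two averages above translate directly into a speedup figure. The values below are copied from this run's printed output (batch size 1); they depend on the hardware and will differ on another machine. In the notebook itself, `np.mean(tf_runtimes)` and `np.mean(onnx_runtimes)` could be used instead of the hard-coded numbers:

```python
# Average latencies printed above, in seconds per call (batch size 1).
tf_latency = 0.206272908560004
onnx_latency = 0.06843939660000614

speedup = tf_latency / onnx_latency
reduction = 1 - onnx_latency / tf_latency

print(f"ONNX Runtime speedup: {speedup:.2f}x")   # → ONNX Runtime speedup: 3.01x
print(f"Latency reduction: {reduction:.0%}")     # → Latency reduction: 67%
```

On this run, the ONNX Runtime model was roughly three times faster than the TF Hub model for single-image inference.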
