![nebullvm nebuly AI accelerate inference optimize DeepLearning](https://user-images.githubusercontent.com/38586138/201391643-a80407e5-2c28-409c-90c9-327795cd27e8.png)

# Accelerate TensorFlow ResNet50 with Speedster

Hi and welcome 👋

In this notebook we will see how, in just a few steps, you can speed up deep learning model inference using the Speedster app from the open-source library `nebullvm`.

We will:
1. Install Speedster and the deep learning compilers used by the library.
2. Speed up a TensorFlow ResNet50 without any loss of accuracy.
3. Achieve greater acceleration on the same model by applying more aggressive optimization techniques (e.g. pruning, quantization) while sacrificing at most 2% accuracy.

Let's jump to the code.

In [None]:
%env CUDA_VISIBLE_DEVICES=0

### Installation

Install Speedster:

In [None]:
!pip install speedster

Install deep learning compilers:

In [None]:
!python -m nebullvm.installers.auto_installer  --backends tensorflow-full --compilers all

The next step is optional. Run it if you want to contribute to the continuous improvement of `nebullvm` by sharing the performance you achieve with it. You can find full details in the [docs](https://nebuly.gitbook.io/nebuly/nebullvm/how-nebullvm-works/fostering-continuous-improvement#sharing-feedback-to-improve-nebullvm).

In [None]:
import json
from pathlib import Path

# Opt in to sharing performance feedback
json_feedback = {
    "allow_feedback_collection": True
}

# Write the setting to ~/.nebullvm/collect.json
(Path.home() / ".nebullvm").mkdir(exist_ok=True)
with open(Path.home() / ".nebullvm/collect.json", "w") as f:
    json.dump(json_feedback, f)

## Optimization example with TensorFlow

In the following example we will optimize a standard ResNet50 loaded directly from Keras.

Speedster can accelerate neural networks without any loss in a user-defined precision metric, e.g. accuracy. It can also achieve greater acceleration by applying more aggressive optimization techniques, such as pruning and quantization, that may have a negative impact on the selected metric. The maximum acceptable loss in the metric is set by the `metric_drop_ths` parameter. Read more in the [docs](https://nebuly.gitbook.io/nebuly/nebullvm/get-started).

Let's first test the optimization without accuracy loss (`metric_drop_ths=0`, the default value), and then accelerate the model further under the constraint of losing at most 2% accuracy (`metric="accuracy"`, `metric_drop_ths=0.02`).
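
To make the role of the threshold concrete, here is a minimal sketch of the kind of check a metric-drop constraint implies. The helper `within_metric_drop` and the accuracy numbers are hypothetical and only for illustration; this is not Speedster's internal logic, and we assume the drop is measured as an absolute difference.

```python
def within_metric_drop(baseline_metric: float,
                       optimized_metric: float,
                       metric_drop_ths: float) -> bool:
    """Return True if the optimized model loses at most `metric_drop_ths`.

    Hypothetical helper: assumes the drop is the absolute difference
    between the baseline metric and the optimized metric.
    """
    drop = baseline_metric - optimized_metric
    # Small tolerance so floating-point rounding does not reject exact ties
    return drop <= metric_drop_ths + 1e-12


# Made-up accuracies, purely illustrative (not measured results)
baseline_accuracy = 0.76
candidates = {"compiled fp32": 0.76, "fp16": 0.755, "int8": 0.70}

accepted = [
    name
    for name, accuracy in candidates.items()
    if within_metric_drop(baseline_accuracy, accuracy, metric_drop_ths=0.02)
]
print(accepted)
```

With a threshold of 0, only candidates that match the baseline metric would pass, which is the situation in Scenario 1.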

### Scenario 1 - No accuracy drop

First we load the model and optimize it using the Speedster API:

In [None]:
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50
from speedster import optimize_model

# Load a resnet as example
model = ResNet50()

# Provide input data for the model as a list of ((inputs,), label) tuples
input_data = [((tf.random.normal([1, 224, 224, 3]),), tf.constant([0]))]

# Run Speedster optimization
optimized_model = optimize_model(
  model, input_data=input_data, optimization_time="unconstrained"
)

# Try the optimized model
x = tf.random.normal([1, 224, 224, 3])
res_original = model.predict(x)
res_optimized = optimized_model.predict(x)[0]

We can print the type of the optimized model to see which compiler was the fastest:

In [None]:
optimized_model

In our case, the optimized model type was `TensorflowNvidiaInferenceLearner`, which means that TensorRT was the fastest compiler.

After the optimization step, we can compare the optimized model with the baseline one to verify that the output is the same and to measure the speed improvement.

First of all, let's print the results:

In [None]:
res_original

In [None]:
res_optimized
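
Comparing two printed arrays by eye is error-prone, so a numeric check can help. The sketch below uses two hypothetical helpers, `max_abs_diff` and `same_top1`, on plain Python lists with made-up scores. How to flatten `res_original` and `res_optimized` into lists depends on their concrete types, so that step is left out here.

```python
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two score vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return max(abs(x - y) for x, y in zip(a, b))


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def same_top1(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if both score vectors predict the same top-1 class."""
    return argmax(a) == argmax(b)


# Made-up class scores, purely illustrative
scores_original = [0.10, 0.70, 0.20]
scores_optimized = [0.11, 0.68, 0.21]

print("max abs diff:", max_abs_diff(scores_original, scores_optimized))
print("same top-1 class:", same_top1(scores_original, scores_optimized))
```

Small differences are expected after compilation, so checking that the top-1 class agrees is often more informative than requiring exact equality.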

Then, let's compute the average latency of the baseline model:

In [None]:
import time

In [None]:
num_iters = 100

# Warmup
for i in range(10):
  model.predict(x)

start = time.time()
for i in range(num_iters):
  model.predict(x)
stop = time.time()

print("Average latency original model: {:.4f} seconds".format((stop - start) / num_iters))

Finally, we compute the average latency of the optimized model:

In [None]:
# Warmup
for i in range(10):
  optimized_model.predict(x)

start = time.time()
for i in range(num_iters):
  optimized_model.predict(x)
stop = time.time()

print("Average latency optimized model: {:.4f} seconds".format((stop - start) / num_iters))
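
The two timing cells above share the same warmup-then-measure pattern, which can be factored into one function. This is a minimal sketch: `measure_latencies` is a hypothetical helper, not part of Speedster, and it uses `time.perf_counter`, which is better suited to short intervals than `time.time`. The workload is a stand-in so that the block runs on its own.

```python
import statistics
import time
from typing import Callable, List


def measure_latencies(fn: Callable[[], object],
                      num_iters: int = 100,
                      warmup: int = 10) -> List[float]:
    """Call `fn` `warmup` times untimed, then return `num_iters` timings in seconds."""
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(num_iters):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


# Stand-in workload; replace with a call to the model under test
latencies = measure_latencies(lambda: sum(range(1000)), num_iters=50, warmup=5)
print("Mean latency: {:.6f} seconds".format(statistics.mean(latencies)))
print("Median latency: {:.6f} seconds".format(statistics.median(latencies)))
```

In this notebook the callable would be something like `lambda: model.predict(x)` or `lambda: optimized_model.predict(x)`. Keeping per-call timings also lets us report the median, which is less sensitive to occasional slow iterations than the mean.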

### Scenario 2 - Accuracy drop

In this scenario, we set the maximum allowed accuracy drop to 2%.

In [None]:
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50
from speedster import optimize_model

# Load a resnet as example
model = ResNet50()

# Provide input data for the model as a list of ((inputs,), label) tuples
# Note that in this case we should provide at least 100 data samples
input_data = [((tf.random.normal([1, 224, 224, 3]),), tf.constant([0])) for _ in range(100)]

# Run Speedster optimization
optimized_model = optimize_model(
  model,
  input_data=input_data,
  optimization_time="unconstrained",
  metric="accuracy",
  metric_drop_ths=0.02,
)

# Try the optimized model
x = tf.random.normal([1, 224, 224, 3])
res_original = model.predict(x)
res_optimized = optimized_model.predict(x)[0]

Here we compute the average latency of the baseline model:

In [None]:
num_iters = 100

# Warmup
for i in range(10):
  model.predict(x)

start = time.time()
for i in range(num_iters):
  model.predict(x)
stop = time.time()

print("Average latency original model: {:.4f} seconds".format((stop - start) / num_iters))

Here we compute the average latency of the optimized model:

In [None]:
# Warmup
for i in range(10):
  optimized_model.predict(x)

start = time.time()
for i in range(num_iters):
  optimized_model.predict(x)
stop = time.time()

print("Average latency optimized model: {:.4f} seconds".format((stop - start) / num_iters))
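
Once both average latencies are known, the speedup and the throughput follow directly. The sketch below uses two hypothetical helpers and placeholder latency values that are not measurements from this notebook; substitute the numbers printed by the cells above.

```python
def throughput(avg_latency_s: float, batch_size: int = 1) -> float:
    """Samples processed per second for a given average latency per batch."""
    if avg_latency_s <= 0:
        raise ValueError("latency must be positive")
    return batch_size / avg_latency_s


def speedup(baseline_latency_s: float, optimized_latency_s: float) -> float:
    """How many times faster the optimized model is than the baseline."""
    if baseline_latency_s <= 0 or optimized_latency_s <= 0:
        raise ValueError("latencies must be positive")
    return baseline_latency_s / optimized_latency_s


# Placeholder values, purely illustrative
baseline_latency = 0.040
optimized_latency = 0.010

print("Speedup: {:.2f}x".format(speedup(baseline_latency, optimized_latency)))
print("Baseline throughput: {:.1f} samples/s".format(throughput(baseline_latency)))
print("Optimized throughput: {:.1f} samples/s".format(throughput(optimized_latency)))
```

With a batch size of 1, as used throughout this notebook, throughput is simply the inverse of the average latency.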

<center> 
    <a href="https://discord.com/invite/RbeQMu886J" target="_blank" style="text-decoration: none;"> Join the community </a> |
    <a href="https://nebuly.gitbook.io/nebuly/welcome/questions-and-contributions" target="_blank" style="text-decoration: none;"> Contribute to the library </a>
</center>

<center> 
    <a href="https://github.com/nebuly-ai/nebullvm#how-it-works" target="_blank" style="text-decoration: none;"> How nebullvm works </a> •
    <a href="https://github.com/nebuly-ai/nebullvm#documentation" target="_blank" style="text-decoration: none;"> Documentation </a> •
    <a href="https://github.com/nebuly-ai/nebullvm#api-quick-view" target="_blank" style="text-decoration: none;"> API quick view </a> 
</center>