![nebullvm nebuly AI accelerate inference optimize DeepLearning](https://user-images.githubusercontent.com/38586138/201391643-a80407e5-2c28-409c-90c9-327795cd27e8.png)

# Accelerate PyTorch VisionTransformer with Speedster

Hi and welcome 👋

In this notebook we will discover how, in just a few steps, you can speed up deep learning model inference using Speedster, an app from the open-source library `nebullvm`.

We will
1. Install Speedster and the deep learning compilers used by the library.
2. Speed up a PyTorch ViT without any loss of accuracy.
3. Achieve a greater speedup on the same model by applying more aggressive optimization techniques (e.g. pruning, quantization), while allowing an accuracy loss of up to 2%.

Let's jump to the code.

In [None]:
%env CUDA_VISIBLE_DEVICES=0

### Installation

In [None]:
!pip install speedster

Let's now install the deep learning compilers used by Speedster that are not yet installed on the machine.

The installation of the compilers may take a few minutes.

In [None]:
!python -m nebullvm.installers.auto_installer --frameworks torch --compilers all

## Optimization example with PyTorch

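In the following example we will optimize a ViT model loaded directly from the `vit_pytorch` library (if it is not already available in your environment, it can be installed with `pip install vit-pytorch`).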

Speedster can accelerate neural networks without any loss in a user-defined precision metric, e.g. accuracy. It can also achieve a greater speedup by applying more aggressive optimization techniques, such as pruning and quantization, that may have a negative impact on the selected metric. The maximum acceptable loss is set by the `metric_drop_ths` parameter. Read more in the [docs](https://docs.nebuly.com/modules/speedster/getting-started).

Let's first test the optimization without any loss in accuracy (`metric_drop_ths=0`, which is the default value), and then attempt to accelerate the model further while limiting the loss of accuracy to a maximum of 2% (`metric="accuracy"`, `metric_drop_ths=0.02`).
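
To make the role of `metric_drop_ths` more concrete, the sketch below shows one way such an acceptance test could be written. It is not Speedster's implementation: the helper names `accuracy` and `within_drop_threshold` are hypothetical, and the sketch assumes the drop is measured as the absolute difference between the baseline and optimized accuracy, which may differ from how the library defines it.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that equal the corresponding label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def within_drop_threshold(baseline_acc, optimized_acc, metric_drop_ths=0.0):
    """Return True if the accuracy loss does not exceed the threshold.

    Assumption: the drop is the absolute difference in accuracy.
    """
    drop = baseline_acc - optimized_acc
    # Small tolerance so float rounding does not reject a borderline case.
    return drop <= metric_drop_ths + 1e-12


# Toy predictions standing in for the outputs of two models.
labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
baseline_preds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # 10 of 10 correct
optimized_preds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 0]  # 9 of 10 correct

baseline_acc = accuracy(baseline_preds, labels)
optimized_acc = accuracy(optimized_preds, labels)

# A 10% drop is rejected with a 2% budget but accepted with a 10% budget.
print(within_drop_threshold(baseline_acc, optimized_acc, 0.02))
print(within_drop_threshold(baseline_acc, optimized_acc, 0.10))
```

With a threshold of 0, only candidates that match the baseline metric would pass this check; raising the threshold widens the set of acceptable optimizations.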

### Scenario 1 - No accuracy drop

First we load the model and optimize it using the Speedster API:

In [None]:
import torch
from vit_pytorch import ViT
from speedster import optimize_model, save_model, load_model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a ViT model
model = ViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048,
    dropout = 0.1,
    emb_dropout = 0.1
).to(device)

# Provide input data for the model
input_data = [((torch.randn(1, 3, 256, 256), ), torch.tensor([0]))]

# Run Speedster optimization
optimized_model = optimize_model(
  model, input_data=input_data, optimization_time="unconstrained"
)

# Try the optimized model
x = torch.randn(1, 3, 256, 256).to(device)
model.to(device).eval()
res_optimized = optimized_model(x)
res_original = model(x)

We can print the type of the optimized model to see which compiler was the fastest:

In [None]:
optimized_model

In our case, the optimized model type was `TorchScriptInferenceLearner`, which means that `TorchScriptCompiler` was the fastest compiler.

After the optimization step, we can compare the optimized model with the baseline to verify that the outputs match and to measure the speed improvement.

First, let's print the results:

In [None]:
res_original

In [None]:
res_optimized
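
Small differences are hard to spot when the two outputs are only printed side by side. As a rough illustration, the helper below compares two flat lists of floats element-wise and reports the largest absolute difference. The name `outputs_match` and the tolerances are hypothetical choices, and the sketch assumes each output has first been converted to a flat list of floats (for a single PyTorch tensor `t`, for example, with `t.flatten().tolist()`).

```python
import math


def outputs_match(reference, candidate, rel_tol=1e-3, abs_tol=1e-4):
    """Compare two equally long sequences of floats element-wise.

    Returns a tuple (all_close, max_abs_diff).
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    max_abs_diff = max(
        (abs(r - c) for r, c in zip(reference, candidate)), default=0.0
    )
    all_close = all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )
    return all_close, max_abs_diff


# Toy values standing in for flattened model outputs.
reference = [0.12, -1.50, 3.25]
close_candidate = [0.12001, -1.50002, 3.25003]
far_candidate = [0.12, -1.50, 3.75]

print(outputs_match(reference, close_candidate))
print(outputs_match(reference, far_candidate))
```

Compiled or lower-precision models rarely reproduce the baseline bit for bit, so a tolerance-based check is usually more informative than exact equality.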

Then, let's compare the performance:

In [3]:
from nebullvm.tools.benchmark import benchmark

In [None]:
# Set the model to eval mode and move it to the available device

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.eval()
model.to(device)

Here we compute the average throughput for the baseline model:

In [None]:
benchmark(model, input_data)

Here we compute the average throughput for the optimized model:



In [None]:
benchmark(optimized_model, input_data)
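
For intuition about what these numbers mean, here is a minimal sketch of a hand-rolled latency and throughput measurement. It is not the code behind `nebullvm.tools.benchmark`; the function `measure_throughput`, its parameters, and the stand-in workload `toy_model` are hypothetical. The sketch warms the callable up, times repeated calls with `time.perf_counter`, and reports the average latency and the number of samples processed per second.

```python
import time


def measure_throughput(fn, sample, n_warmup=5, n_runs=50, batch_size=1):
    """Return (average latency in seconds, throughput in samples/second)."""
    # Warm-up runs are excluded so one-off setup costs do not skew the average.
    for _ in range(n_warmup):
        fn(sample)

    start = time.perf_counter()
    for _ in range(n_runs):
        fn(sample)
    elapsed = time.perf_counter() - start

    avg_latency = elapsed / n_runs
    throughput = batch_size / avg_latency if avg_latency > 0 else float("inf")
    return avg_latency, throughput


def toy_model(x):
    """Stand-in workload; replace with a call to a real model."""
    return sum(v * v for v in x)


sample = list(range(10_000))
latency, throughput = measure_throughput(toy_model, sample)
print(f"average latency: {latency * 1e3:.3f} ms, throughput: {throughput:.1f} samples/s")
```

When timing a model on a GPU, kernel launches are asynchronous, so a real measurement would also need to synchronize the device (for example with `torch.cuda.synchronize()`) before reading the clock.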

### Scenario 2 - Accuracy drop

In this scenario, we set the maximum allowed accuracy drop to 2%.

In [None]:
import torch
from vit_pytorch import ViT
from speedster import optimize_model

# Load a ViT model
model = ViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048,
    dropout = 0.1,
    emb_dropout = 0.1
).to(device)

# Provide 100 random input samples for the model
input_data = [((torch.randn(1, 3, 256, 256), ), torch.tensor([0])) for _ in range(100)]

# Run Speedster optimization
optimized_model = optimize_model(
  model, input_data=input_data, optimization_time="unconstrained", metric="accuracy", metric_drop_ths=0.02
)

# Try the optimized model
x = torch.randn(1, 3, 256, 256).to(device)
res = optimized_model(x)

In [None]:
# Set the model to eval mode and move it to the available device

model.eval()
model.to(device)

Here we compute the average throughput for the baseline model:

In [None]:
benchmark(model, input_data)

Here we compute the average throughput for the optimized model:

In [None]:
benchmark(optimized_model, input_data)

## Save and reload the optimized model

We can easily save the optimized model to disk with the following line:

In [13]:
save_model(optimized_model, "model_save_path")

We can then load the model again:

In [14]:
optimized_model = load_model("model_save_path")

<center> 
    <a href="https://discord.com/invite/RbeQMu886J" target="_blank" style="text-decoration: none;"> Join the community </a> |
    <a href="https://nebuly.gitbook.io/nebuly/welcome/questions-and-contributions" target="_blank" style="text-decoration: none;"> Contribute to the library </a>
</center>

<center> 
    <a href="https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#key-concepts" target="_blank" style="text-decoration: none;"> How speedster works </a> •
    <a href="https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#documentation" target="_blank" style="text-decoration: none;"> Documentation </a> •
    <a href="https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#quick-start" target="_blank" style="text-decoration: none;"> Quick start </a> 
</center>