![nebullvm nebuly AI accelerate inference optimize DeepLearning](https://user-images.githubusercontent.com/38586138/201391643-a80407e5-2c28-409c-90c9-327795cd27e8.png)

# Accelerate PyTorch YOLOv5 with Speedster



Hi and welcome 👋

In this notebook, we will see how, in just a few steps, you can speed up the inference time of a deep learning model using Speedster, an app from the open-source library nebullvm.

With Speedster's latest API, you can speed up models up to 10 times without any loss of accuracy (option A), or accelerate them up to 20-30 times by setting a maximum amount of accuracy/precision that you are willing to trade off for an even lower response time (option B). To accelerate your model, Speedster takes advantage of various optimization techniques, such as deep learning compilers (in both option A and option B), as well as quantization, half precision, and so on (option B only).
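To make the accuracy trade-off of option B more concrete, the snippet below applies two generic precision-reducing techniques (a float16 cast and symmetric per-tensor int8 quantization) to random stand-in weights and measures the error they introduce. This is only a NumPy illustration of the idea under simple assumptions, not Speedster's actual quantization code.

```python
import numpy as np

# Synthetic "weights" standing in for a layer's parameters (illustrative only).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=10_000).astype(np.float32)

# Half precision: store the values as 16-bit floats, then read them back.
half = weights.astype(np.float16).astype(np.float32)
half_err = float(np.max(np.abs(weights - half)))

# Symmetric per-tensor int8 quantization: map [-max|w|, max|w|] onto [-127, 127].
scale = float(np.max(np.abs(weights))) / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale
int8_err = float(np.max(np.abs(weights - dequant)))

print(f"max abs error, fp16: {half_err:.2e}")
print(f"max abs error, int8: {int8_err:.2e}")
```

Both errors are small but nonzero, and int8 loses more precision than fp16; this is the kind of loss that option B lets you bound.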

Let's jump to the code.

In [None]:
%env CUDA_VISIBLE_DEVICES=0
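The `%env` magic only works inside Jupyter. In a plain Python script, a rough equivalent is to set the variable through `os.environ` before importing torch (or at least before anything initializes CUDA), since it is read when the CUDA context is created.

```python
import os

# Expose only the first GPU to this process.
# Set this before importing torch / touching CUDA; later changes are ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```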

### Install Speedster

Install Speedster:

In [None]:
!pip install speedster

Install deep learning compilers:

In [None]:
!python -m nebullvm.installers.auto_installer --frameworks torch --compilers all
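After the installer finishes, you may want to see which compiler backends are actually importable in the current environment. The sketch below uses only `importlib.util.find_spec`; the list of module names is an assumption about what the auto-installer may set up, not an official or exhaustive list.

```python
import importlib.util

# Module names of some backends the auto-installer may set up (assumed list, not exhaustive).
CANDIDATE_BACKENDS = ["onnxruntime", "openvino", "tensorrt", "tvm", "torch_tensorrt"]


def available_backends(names=CANDIDATE_BACKENDS):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, ok in available_backends().items():
    print(f"{name:15s} {'installed' if ok else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check is cheap and does not trigger heavy backend initialization.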

### Install and test YOLO

Let's install the YOLOv5 dependencies.

In [None]:
!pip install -r https://raw.githubusercontent.com/ultralytics/yolov5/master/requirements.txt

We start by downloading the pretrained YOLOv5s model from PyTorch Hub.

In [None]:
import copy
import time
import types

import torch

In [None]:
# Load Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True, force_reload=True)

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

## Optimization with Speedster

Now we are ready to optimize YOLOv5 with the `Speedster` function `optimize_model`.

Speedster was built to be very easy to use. To optimize a model, you only need to pass the model and some sample input data. Optionally, you can also choose whether Speedster may try more time-consuming techniques (`optimization_time`) and how much performance loss you are willing to accept (`metric_drop_ths`).

With the latest API, there are two ways to use Speedster:

- Option A: accelerate the model up to ~10 times without any loss in performance (accuracy, precision, etc.)
- Option B: accelerate the model up to ~30 times in exchange for a user-defined maximum loss in performance
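The two options differ mainly in whether you allow a metric drop. The hypothetical helper below collects the keyword arguments for each option, using only the parameters that appear later in this notebook. It assumes that leaving `metric_drop_ths` unset keeps Speedster on accuracy-preserving techniques (option A); check the Speedster readme to confirm the defaults for your version.

```python
def speedster_options(option):
    """Hypothetical helper: keyword arguments for optimize_model.

    Option A: no metric_drop_ths, so only accuracy-preserving techniques
    (assumed default behaviour). Option B: accept a metric drop of up to 0.05,
    the same threshold used later in this notebook.
    """
    if option == "A":
        return {"optimization_time": "unconstrained"}
    if option == "B":
        return {"optimization_time": "unconstrained", "metric_drop_ths": 0.05}
    raise ValueError(f"unknown option: {option!r}")
```

Usage: `optimize_model(model=model, input_data=input_data, **speedster_options("B"))` is equivalent to the call made further down.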
    
To learn more about how to use Speedster, check out the <a href="https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#-speedster" target="_blank" style="text-decoration: none;"> readme on GitHub </a>.

In this example, we provide the code to run option B, which is selected by setting `metric_drop_ths`.

In [None]:
from speedster import optimize_model, save_model, load_model

Let's load some example data to feed to the `optimize_model` function:

In [None]:
from PIL import Image
import requests
import numpy as np

In [None]:
img_name = "zidane.png"
imgs = ['https://ultralytics.com/images/zidane.jpg']  # batch of images
Image.open(requests.get(imgs[0], stream=True).raw).save(img_name)

In [None]:
def read_and_crop(im, original_model, img_size):
    """Load an image (local path or URL) and return a random crop as a normalized BCHW tensor.

    img_size is (height, width) of the crop.
    """
    p = next(original_model.parameters())  # used to match the model's device and dtype
    im = Image.open(requests.get(im, stream=True).raw if str(im).startswith('http') else im).convert('RGB')
    crop_h, crop_w = img_size
    width, height = im.size  # PIL returns (width, height)
    top = np.random.randint(0, height - crop_h + 1)
    left = np.random.randint(0, width - crop_w + 1)
    im = np.array(im.crop((left, top, left + crop_w, top + crop_h)))  # HWC, uint8
    x = np.expand_dims(im, axis=0)  # add batch dimension: BHWC
    x = np.ascontiguousarray(x.transpose((0, 3, 1, 2)))  # BHWC to BCHW
    x = torch.from_numpy(x).to(p.device).type_as(p) / 255  # uint8 to fp16/32 in [0, 1]
    return x
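The cropping and layout conversion inside `read_and_crop` can be checked in isolation on a synthetic image, without downloading anything or loading a model. The sketch below mirrors the same steps with NumPy only; `random_crop_bchw` is a hypothetical name used just for this check.

```python
import numpy as np


def random_crop_bchw(img_hwc, crop_h, crop_w, rng=None):
    """Random crop of an HxWxC uint8 image, returned as a 1xCxHxW float32 array in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    height, width = img_hwc.shape[:2]
    top = rng.integers(0, height - crop_h + 1)   # upper bound is exclusive
    left = rng.integers(0, width - crop_w + 1)
    crop = img_hwc[top:top + crop_h, left:left + crop_w]  # HWC
    x = crop[None].transpose(0, 3, 1, 2)                  # BHWC -> BCHW
    return np.ascontiguousarray(x).astype(np.float32) / 255.0


# Synthetic 720x1280 RGB frame (size chosen arbitrarily for the demo).
img = np.random.default_rng(0).integers(0, 256, size=(720, 1280, 3)).astype(np.uint8)
x = random_crop_bchw(img, 640, 640)
print(x.shape, x.dtype, float(x.min()), float(x.max()))
```

The resulting shape, `(1, 3, 640, 640)`, matches the input size passed to `read_and_crop` in the next cell.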

In [None]:
input_data = [((read_and_crop(img_name, model, (640, 640)),), None) for _ in range(100)]
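Each element of `input_data` pairs a tuple of model inputs with a label; the label is `None` here because no labelled data is used. The hypothetical helper below sanity-checks that structure before launching a long optimization. It only relies on the `.shape` attribute, so it works with both torch tensors and NumPy arrays.

```python
def check_input_data(input_data, expected_shape):
    """Check a list of ((input, ...), label) pairs; return the number of samples.

    Raises TypeError if the inputs are not a tuple and ValueError on a shape mismatch.
    """
    for i, sample in enumerate(input_data):
        inputs, _label = sample  # the label is unused by this check
        if not isinstance(inputs, tuple):
            raise TypeError(f"sample {i}: inputs must be a tuple, got {type(inputs).__name__}")
        for t in inputs:
            if tuple(t.shape) != tuple(expected_shape):
                raise ValueError(f"sample {i}: shape {tuple(t.shape)} != {tuple(expected_shape)}")
    return len(input_data)
```

For this notebook, `check_input_data(input_data, (1, 3, 640, 640))` should return 100.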

In [None]:
model_optimized = optimize_model(
    model=model,
    input_data=input_data,
    optimization_time="unconstrained",
    metric_drop_ths=0.05
)

Let's compare the original model performance with the optimized one:

In [None]:
from nebullvm.tools.benchmark import benchmark

original_model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True, force_reload=True).to(device)  # same device as input_data
print("Benchmark original model")
benchmark(original_model, input_data)

print("Benchmark optimized model")
benchmark(model_optimized, input_data)
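If you want a quick cross-check that does not depend on nebullvm's `benchmark`, a minimal latency measurement can be sketched with `time.perf_counter`. This is a simplified stand-in, not how `benchmark` is implemented: it runs a few warm-up calls, then averages the wall-clock time over repeated calls, optionally synchronizing asynchronous GPU work before reading the clock.

```python
import time


def measure_latency(fn, inputs, warmup=10, repeats=100, sync=None):
    """Return the average seconds per call of fn(*inputs), measured after warm-up runs.

    sync: optional callable (e.g. torch.cuda.synchronize) that waits for
    pending asynchronous GPU work so it is included in the measurement.
    """
    for _ in range(warmup):
        fn(*inputs)
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*inputs)
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / repeats
```

For example, inside `torch.no_grad()`, compare `measure_latency(original_model, input_data[0][0], sync=torch.cuda.synchronize)` with the same call on `model_optimized` when running on a GPU (pass `sync=None` on CPU).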

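Let's check that the optimized model produces nearly the same output as the original one. Since option B allows a small metric drop (`metric_drop_ths=0.05`), the outputs may differ slightly: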

In [None]:
input_tensor = torch.randn(1, 3, 640, 640).to(device)

In [None]:
model(input_tensor)

In [None]:
model_optimized(input_tensor)
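Printing both outputs makes differences hard to spot. A small helper can flatten the (possibly nested) outputs and report the largest absolute difference. This is a sketch that assumes both models return the same number of tensors with matching shapes, which may not hold for every Speedster backend; it raises an error if they do not.

```python
import numpy as np


def flatten_outputs(out):
    """Recursively flatten nested lists/tuples of tensors or arrays into float32 NumPy arrays."""
    if isinstance(out, (list, tuple)):
        return [a for item in out for a in flatten_outputs(item)]
    if hasattr(out, "detach"):  # torch.Tensor: move to CPU before converting
        out = out.detach().cpu().float().numpy()
    return [np.asarray(out, dtype=np.float32)]


def max_abs_diff(out_a, out_b):
    """Largest element-wise absolute difference between two model outputs."""
    a, b = flatten_outputs(out_a), flatten_outputs(out_b)
    if len(a) != len(b):
        raise ValueError(f"different number of outputs: {len(a)} vs {len(b)}")
    for x, y in zip(a, b):
        if x.shape != y.shape:
            raise ValueError(f"shape mismatch: {x.shape} vs {y.shape}")
    return max((float(np.max(np.abs(x - y))) for x, y in zip(a, b)), default=0.0)
```

Usage: `with torch.no_grad(): print(max_abs_diff(model(input_tensor), model_optimized(input_tensor)))`.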

## Save and reload the optimized model

We can easily save the optimized model to disk with the following line:

In [None]:
save_model(model_optimized, "model_save_path")

We can then load the model again:

In [None]:
model_optimized = load_model("model_save_path")


What an amazing result, right?! Stay tuned for more cool content from the Nebuly team :)

<center> 
    <a href="https://discord.com/invite/RbeQMu886J" target="_blank" style="text-decoration: none;"> Join the community </a> |
    <a href="https://nebuly.gitbook.io/nebuly/welcome/questions-and-contributions" target="_blank" style="text-decoration: none;"> Contribute to the library </a>
</center>

<center> 
    <a href="https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#key-concepts" target="_blank" style="text-decoration: none;"> How speedster works </a> •
    <a href="https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#documentation" target="_blank" style="text-decoration: none;"> Documentation </a> •
    <a href="https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#quick-start" target="_blank" style="text-decoration: none;"> Quick start </a> 
</center>