# Introduction

This notebook shows an easy way to speed up models trained with FastAI without losing the functionality provided by the FastAI ecosystem.

# Fine-tune a fastai model

This section is mainly based on the FastAI notebooks for beginners. The goal of this notebook is not to be an in-depth guide to the fastai library, but to show how to use nebullvm to accelerate FastAI models at inference time.

In [None]:
from fastai.vision.all import *

In [None]:
path = untar_data(URLs.PETS)
files = get_image_files(path/"images")

def label_func(f): return f[0].isupper()

dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224), num_workers=0)
dls.show_batch()

The model simply classifies whether the picture contains a cat (`True` label) or a dog (`False` label). Since the aim of this notebook is only to show how to speed up the model, we are not really interested in the meaningfulness or usefulness of the task itself.
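
To make the labelling rule concrete, here is a minimal standalone sketch of the same check applied to plain file-name strings. It relies on the dataset's convention that cat breeds are capitalized in the file names while dog breeds are lowercase. The helper name `is_cat` and the two file names are illustrative assumptions, not part of the notebook's pipeline.

```python
def is_cat(fname: str) -> bool:
    # Hypothetical helper mirroring label_func: cat breeds start with an
    # uppercase letter in the file name, dog breeds with a lowercase one.
    return fname[0].isupper()


# Illustrative file names written in the dataset's naming style (assumed).
examples = ["Abyssinian_1.jpg", "beagle_32.jpg"]
labels = {name: is_cat(name) for name in examples}
print(labels)
```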

In [None]:
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

In [None]:
valid_loss, error = learn.validate()

Now that we have fine-tuned the model, let's measure how long it takes to run a single prediction.

In [None]:
import time

In [None]:
times = []
for _ in range(100):
    st = time.time()
    preds = learn.predict(files[0])
    times.append((time.time()-st)*1000)
fastai_vanilla_time = sum(times)/len(times)
print(f"Prediction time: {fastai_vanilla_time} ms,\nPrediction: {preds}")
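
The same measurement can be wrapped in a small reusable helper. The sketch below is one possible variant, not the notebook's method: it uses `time.perf_counter` (a monotonic, high-resolution clock), discards a few warm-up calls, and reports the median alongside the mean. The name `benchmark` and its parameters are hypothetical.

```python
import statistics
import time


def benchmark(fn, n_runs=100, n_warmup=5):
    """Return (mean_ms, median_ms) of calling fn() n_runs times."""
    # Warm-up calls are executed but not timed.
    for _ in range(n_warmup):
        fn()
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1000)
    return statistics.mean(times), statistics.median(times)


# Stand-in workload so the sketch runs on its own.
mean_ms, median_ms = benchmark(lambda: sum(range(10_000)), n_runs=20)
print(f"mean: {mean_ms:.3f} ms, median: {median_ms:.3f} ms")
```

In this notebook, one could pass `lambda: learn.predict(files[0])` as `fn` to time the model instead of the stand-in workload.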

In [None]:
#learn.save(".")

# Optimize the model with nebullvm

In [None]:
from nebullvm import optimize_torch_model

Let's start by optimizing the model. Using nebullvm is straightforward: you just need to 
specify the model, the batch size, the input sizes (for each input, excluding the batch size) and a directory
where the optimized model will be saved. In this example we choose the same directory where the model is stored.

As you can see, we also pass a further parameter, `use_torch_api`, a boolean flag that enables more of
nebullvm's optimization capabilities.

In [None]:
# Collect the first 10 training batches and concatenate them into single tensors
xs, ys = [], []
for i, (x, y) in enumerate(dls.train):
    if i >= 10:
        break
    xs.append(x)
    ys.append(y)
xs = torch.cat(xs, dim=0)
ys = torch.cat(ys, dim=0)

In [None]:
# One ((inputs,), label) pair per image, each with a leading batch dimension of 1
dl_nebullvm = [((x.unsqueeze(dim=0),), y.unsqueeze(0)) for x, y in zip(xs, ys)]

In [None]:
original_model = learn.model

In [None]:
# Without quantization
# optimized_model = optimize_torch_model(
#     model=original_model,
#     batch_size=1,
#     input_sizes=[(3, 224, 224)],
#     save_dir=".",
#     use_torch_api=True
# )

In [None]:
# With quantization and accuracy as performance metric
optimized_model = optimize_torch_model(
    model=original_model,
    batch_size=1,
    save_dir=".",
    dataloader=dl_nebullvm,
    use_torch_api=True,
    perf_loss_ths=0.001,
    perf_metric="accuracy",
)

In [None]:
# With quantization and default precision as perf_metric
# optimized_model = optimize_torch_model(
#     model=original_model,
#     batch_size=1,
#     save_dir=".",
#     dataloader=dl_nebullvm,
#     use_torch_api=True,
#     perf_loss_ths=3,
# )

In [None]:
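class ModelWrapper(torch.nn.Module):
    """Wrap the optimized model so it can replace `learn.model`."""

    def __init__(self, core):
        super().__init__()
        # Use the model passed in, not the global `optimized_model`
        self.core = core

    def forward(self, *args, **kwargs):
        res = self.core(*args, **kwargs)
        # Unwrap single-element tuples so the single output is returned directly
        if isinstance(res, tuple) and len(res) == 1:
            res = res[0]
        return res

    def parameters(self, *args, **kwargs):
        # Dummy tensor, presumably so code that iterates over the model's parameters still finds one
        yield torch.zeros(100)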

In [None]:
core_model = ModelWrapper(optimized_model)

In [None]:
learn.model = core_model

In [None]:
learn.dls.valid.bs = 1

In [None]:
quant_valid_loss, quant_error = learn.validate(dl=learn.dls.valid)

In [None]:
times = []
for _ in range(100):
    st = time.time()
    preds = learn.predict(files[0])
    times.append((time.time()-st)*1000)
optimized_time = sum(times) / len(times)
print(f"Prediction time: {optimized_time} ms,\nPrediction: {preds}")

In [None]:
print(f"Full precision error: {error}\nQuantization error: {quant_error}")
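
The summary below reports the error change as a percentage of the full-precision error. A minimal sketch of that arithmetic follows, with a guard for a zero baseline that the notebook's inline expression does not have. The function name `relative_increase` and the sample values are hypothetical.

```python
def relative_increase(baseline: float, new: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    if baseline == 0:
        # Avoid dividing by zero when the baseline error is exactly 0.
        return 0.0 if new == 0 else float("inf")
    return (new - baseline) / baseline * 100


# Made-up error rates, used only to exercise the function.
print(round(relative_increase(0.010, 0.011), 1))
```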

## Summary

In [None]:
your_username = "Put your username here"

In [None]:
# Uncomment the following line to install gputil (if you are running on an NVIDIA GPU)
#!pip install gputil

In [None]:
import cpuinfo
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
cpu_info = cpuinfo.get_cpu_info()['brand_raw']
gpu_info = "no"
if torch.cuda.is_available():
    import GPUtil
    gpus = GPUtil.getGPUs()
    gpu_info = list(gpus)[0].name

In [None]:
message = f"""
Hello, I'm {your_username}!
I've tested nebullvm on the following setup:
Hardware: {cpu_info} CPU and {gpu_info} GPU.
Model: {learn.arch.__name__} - FastAI for image classification
Vanilla performance: {round(fastai_vanilla_time, 2)}ms
Optimized performance: {round(optimized_time, 2)}ms
Acceleration: {round(fastai_vanilla_time/optimized_time, 1)}x
With an error increase of {round((quant_error-error)/error*100, 1)}%
"""
print(message)