# How to Use Furiosa SDK from Start to Finish with Processor

This notebook demonstrates how to use the Furiosa SDK with `Processor`.

It is based on this [notebook](https://github.com/furiosa-ai/furiosa-sdk/blob/main/examples/notebooks/HowToUseFuriosaSDKFromStartToFinish.ipynb).

## Prerequisites

The Furiosa SDK must be installed. If it is not, follow the installation instructions at https://furiosa-ai.github.io/docs/latest/ko/ (Korean) or https://furiosa-ai.github.io/docs/latest/en/ (English). The `torchvision` and `scipy` packages are also required for this demonstration.

```console
$ pip install 'furiosa-sdk[quantizer]' torchvision scipy
```

In [1]:
import time

import numpy as np
import onnx
import torch
import torchvision
from torchvision import transforms
import tqdm

from furiosa.optimizer import optimize_model
from furiosa.quantizer import quantize, Calibrator, CalibrationMethod, ModelEditor, TensorType
import furiosa.runtime.sync

libfuriosa_hal.so --- v0.11.0, built @ 43c901f


## Load PyTorch Model

As a running example, we employ the pre-trained ResNet-50 model from Torchvision.

In [2]:
torch_model = torchvision.models.resnet50(weights="DEFAULT")
torch_model = torch_model.eval()  # Set the model to inference mode.

The ResNet-50 model was trained with the preprocessing described at https://pytorch.org/vision/stable/models.html. We will use the same preprocessing for calibration.

For inference, we configure the pipeline to keep image data as u8 instead of converting it to f32.

In [3]:
# dataset preprocess for inference
preprocess = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        # Unlike the calibration pipeline, keep raw uint8 pixels: no scaling to [0, 1] and no normalization.
        transforms.PILToTensor(),
    ]
)

# dataset preprocess for calibration
calbirate_preprocess = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)
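
To make the difference between the two pipelines concrete, the following minimal sketch reproduces in plain Python the arithmetic that `ToTensor` followed by `Normalize` applies to a single 8-bit pixel value, using the mean and standard deviation from the calibration pipeline above. The helper name `normalize_pixel` is hypothetical and exists only for this illustration.

```python
# Per-channel statistics copied from the calibration pipeline above.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(value_u8, channel):
    """Map one 8-bit pixel value to the float the calibration pipeline yields.

    ToTensor scales [0, 255] to [0.0, 1.0]; Normalize then subtracts the
    channel mean and divides by the channel standard deviation.
    """
    scaled = value_u8 / 255.0
    return (scaled - MEAN[channel]) / STD[channel]


# Print the range of normalized values that each channel can take.
for channel, name in enumerate(["R", "G", "B"]):
    low = normalize_pixel(0, channel)
    high = normalize_pixel(255, channel)
    print(f"{name}: [{low:.4f}, {high:.4f}]")
```

The inference pipeline, by contrast, stops at `PILToTensor` and keeps the raw 8-bit values.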

## Export PyTorch Model to ONNX Model

We call the `torch.onnx.export` function to export the PyTorch ResNet-50 model to an ONNX model. The function executes a PyTorch model provided as its first argument, recording a trace of what operators are used during the execution, and then converts those operators into ONNX equivalents. Because `torch.onnx.export` runs the model, we need to provide the function with an input tensor as its second argument, which can be random so long as it satisfies the shape and type of the model's input. As of Furiosa SDK v0.9, ONNX OpSet 13 is the most well-supported version.

In [4]:
# Generate a dummy input of the shape, (1, 3, 224, 224), of the model's input.
dummy_input = (torch.randn(1, 3, 224, 224),)

# Export the PyTorch model into an ONNX model.
torch.onnx.export(
    torch_model,  # PyTorch model to export
    dummy_input,  # model input
    "resnet50.onnx",  # where to save the exported ONNX model
    opset_version=13,  # the ONNX OpSet version to export the model to
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=["input"],  # the ONNX model's input names
    output_names=["output"],  # the ONNX model's output names
)

# Load the exported ONNX model.
onnx_model = onnx.load_model("resnet50.onnx")

## Load Dataset

We will use subsets of the ImageNet dataset for calibration and validation. 

You need to download `ILSVRC2012_img_val.tar` and `ILSVRC2012_devkit_t12.tar.gz` externally and place them in the `imagenet` directory. Torchvision cannot download the ImageNet dataset automatically because it is no longer publicly accessible: https://github.com/pytorch/vision/pull/1457.

Note that it may take several minutes to run this step for the first time because it involves decompressing the archive files. It will take much less time to complete subsequently.

In [5]:
imagenet = torchvision.datasets.ImageNet("imagenet", split="val", transform=preprocess)
imagenet_for_calibrate = torchvision.datasets.ImageNet(
    "imagenet", split="val", transform=calbirate_preprocess
)

## Calibrate

For quick demonstration, a small number of samples randomly chosen from the ImageNet dataset is used for calibration.

In [6]:
calibration_dataset = torch.utils.data.Subset(
    imagenet_for_calibrate, torch.randperm(len(imagenet_for_calibrate))[:100]
)
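
Because `torch.randperm` is called without a seed here, a different set of 100 images is drawn on every run, so the computed ranges can vary slightly between runs. The sketch below shows the idea of a reproducible draw using only the standard library; `pick_subset_indices` is a hypothetical helper, not part of the SDK. With PyTorch, calling `torch.manual_seed` beforehand or passing a `generator` to `torch.randperm` should achieve the same effect.

```python
import random


def pick_subset_indices(dataset_size, subset_size, seed):
    """Return `subset_size` distinct indices drawn from range(dataset_size)."""
    rng = random.Random(seed)  # local generator; leaves global state untouched
    return rng.sample(range(dataset_size), subset_size)


# The ImageNet validation split has 50,000 images; 100 are used for calibration.
first = pick_subset_indices(50_000, 100, seed=0)
second = pick_subset_indices(50_000, 100, seed=0)

# The same seed gives the same indices, so calibration becomes repeatable.
print(first == second)
```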

We call the `optimize_model` function to optimize the ONNX model before calibrating and quantizing it.

In [7]:
onnx_model = optimize_model(onnx_model)

We use `Calibrator` to calibrate the model with one of several `CalibrationMethod` values (e.g. `MIN_MAX_ASYM`, `ENTROPY_ASYM`).

In [8]:
help(Calibrator)

Help on class Calibrator in module furiosa.quantizer.calibrator:

class Calibrator(builtins.object)
 |  Calibrator(model: Union[onnx.onnx_ml_pb2.ModelProto, bytes], calibration_method: furiosa.quantizer.calibrator.CalibrationMethod, *, percentage: float = 99.99)
 |  
 |  Calibrator.
 |  
 |  This collects the values of tensors in an ONNX model and computes
 |  their ranges.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, model: Union[onnx.onnx_ml_pb2.ModelProto, bytes], calibration_method: furiosa.quantizer.calibrator.CalibrationMethod, *, percentage: float = 99.99)
 |      Args:
 |          model (onnx.ModelProto or bytes): An ONNX model to
 |              calibrate.
 |          calibration_method (CalibrationMethod): A calibration
 |              method.
 |          percentage (float): A percentage to use with percentile
 |              calibration. Defaults to 99.99 (i.e. 99.99%-percentile
 |              calibration).
 |  
 |  collect_data(self, calibration_dataset: Iterable

In [9]:
help(CalibrationMethod)

Help on class CalibrationMethod in module furiosa.quantizer.calibrator:

class CalibrationMethod(enum.IntEnum)
 |  CalibrationMethod(value, names=None, *, module=None, qualname=None, type=None, start=1)
 |  
 |  Calibration method.
 |  
 |  Attributes:
 |      MIN_MAX_ASYM (CalibrationMethod):
 |          Min-max calibration (Asymmetric).
 |      MIN_MAX_SYM (CalibrationMethod):
 |          Min-max calibration (Symmetric).
 |      ENTROPY_ASYM (CalibrationMethod):
 |          Entropy calibration (Aymmetric).
 |      ENTROPY_SYM (CalibrationMethod):
 |          Entropy calibration (Symmetric).
 |      PERCENTILE_ASYM (CalibrationMethod):
 |          Percentile calibration (Asymmetric).
 |      PERCENTILE_SYM (CalibrationMethod):
 |          Percentile calibration (Symmetric).
 |      MSE_ASYM (CalibrationMethod):
 |          Mean squared error (MSE) calibration (Asymmetric).
 |      MSE_SYM (CalibrationMethod):
 |          Mean squared error (MSE) calibration (Symmetric).
 |      SQNR_A

Before the Calibrator actually computes the ranges, input data should be collected.

In [10]:
calibrator = Calibrator(onnx_model, CalibrationMethod.MIN_MAX_ASYM)

for calibration_data, _ in tqdm.tqdm(
    calibration_dataset, desc="Calibration", unit="images", mininterval=0.5
):
    cal_input = np.expand_dims(calibration_data.numpy(), axis=0)
    calibrator.collect_data([[cal_input]])

Calibration: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:06<00:00,  1.51images/s]


In [11]:
ranges = calibrator.compute_range()
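
As a conceptual illustration of what "min-max" calibration means, the sketch below tracks the smallest and largest value a tensor takes across several batches and then derives a symmetric variant of that range. This is not the library's implementation; the function names and sample values are made up for illustration, and the real `Calibrator` does this for the tensors of the ONNX model.

```python
def update_min_max(current_range, values):
    """Widen (low, high) so that it covers every value in `values`."""
    low, high = current_range
    return (min(low, min(values)), max(high, max(values)))


def make_symmetric(value_range):
    """Turn (low, high) into a range centered on zero that still covers it."""
    low, high = value_range
    bound = max(abs(low), abs(high))
    return (-bound, bound)


# Pretend these are the values one tensor took across three calibration batches.
batches = [[-0.5, 0.25, 1.0], [-1.5, 0.0, 2.0], [0.1, 0.2, 3.5]]

observed = (float("inf"), float("-inf"))  # empty range before any data is seen
for batch in batches:
    observed = update_min_max(observed, batch)

print("asymmetric:", observed)
print("symmetric: ", make_symmetric(observed))
```

An asymmetric range follows the observed values more tightly, while a symmetric range keeps zero at the center at the cost of leaving part of the range unused when the data is skewed.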

## Use ModelEditor

Now, we can use the ModelEditor to apply optimization to the input and output of the ONNX model. This optimization will transform them into a format that is efficient for our NPU.

In [12]:
furiosa_editor = ModelEditor(onnx_model)

We defined the inference preprocessing above to keep image data as u8.

To feed that data to the model, the model's input tensor must accept u8 instead of f32. Calling `convert_input_type` (or `convert_output_type` for an output) changes the element type of the named tensor accordingly.

In [13]:
# Make the model input tensor named "input" take u8 instead of f32.
furiosa_editor.convert_input_type("input", TensorType.UINT8)
# Make the model output tensor named "output" produce i8 instead of f32.
furiosa_editor.convert_output_type("output", TensorType.INT8)
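
As a rough illustration of what switching a tensor between f32 and an 8-bit type involves, the sketch below implements the conventional affine (scale and zero-point) quantization scheme in plain Python. It is an assumption that this matches what the Furiosa quantizer does internally; the helper names, the range `(-2.0, 6.0)`, and the sample values are made up for illustration.

```python
def affine_params(low, high, qmin, qmax):
    """Derive (scale, zero_point) mapping floats in [low, high] onto [qmin, qmax]."""
    # Extend the range to contain 0.0 so that zero is exactly representable.
    low, high = min(low, 0.0), max(high, 0.0)
    scale = (high - low) / (qmax - qmin)
    zero_point = int(round(qmin - low / scale))
    return scale, zero_point


def quantize_value(x, scale, zero_point, qmin, qmax):
    """Round x to the nearest representable step and clamp to [qmin, qmax]."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize_value(q, scale, zero_point):
    """Map an integer code back to its approximate float value."""
    return (q - zero_point) * scale


# Hypothetical i8 output tensor whose calibrated range is (-2.0, 6.0).
scale, zero_point = affine_params(-2.0, 6.0, qmin=-128, qmax=127)
logits = [-1.3, 0.4, 5.2, 2.9]
quantized = [quantize_value(x, scale, zero_point, -128, 127) for x in logits]
restored = [dequantize_value(q, scale, zero_point) for q in quantized]

print("quantized:", quantized)
print("restored: ", [round(x, 3) for x in restored])
print("same argmax:", quantized.index(max(quantized)) == logits.index(max(logits)))
```

Because the mapping is monotonic, the index of the largest value survives quantization except when rounding produces ties, which is why taking `argmax` directly over an i8 output can still yield the predicted class.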

In [14]:
help(ModelEditor)

Help on class ModelEditor in module furiosa.quantizer.editor:

class ModelEditor(builtins.object)
 |  ModelEditor(model: onnx.onnx_ml_pb2.ModelProto)
 |  
 |  A utility class for manipulating ONNX models.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, model: onnx.onnx_ml_pb2.ModelProto)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  convert_input_type(self, tensor_name: str, tensor_type: furiosa.quantizer.editor.TensorType) -> None
 |      Convert the element type of an input tensor named tensor_name to tensor_type.
 |      
 |      Args:
 |          tensor_name (str): The name of an input tensor to convert.
 |          tensor_type (TensorType): The desired element type.
 |  
 |  convert_output_type(self, tensor_name: str, tensor_type: furiosa.quantizer.editor.TensorType, tensor_range: Optional[Tuple[float, float]] = None) -> None
 |      Convert the element type of an output tensor named tensor_name to tensor_type.
 |      
 |      Args:
 |    

## Quantize ONNX Model

With the ranges computed, we can now quantize the model by calling the `quantize` function.

In [15]:
import json

with open("ranges.json", "w") as f:
    f.write(json.dumps(ranges, indent=4))
with open("ranges.json", "r") as f:
    ranges = json.load(f)

graph = quantize(onnx_model, ranges)

In [16]:
help(quantize)

Help on function quantize in module furiosa.quantizer:

quantize(model: Union[onnx.onnx_ml_pb2.ModelProto, bytes], tensor_name_to_range: Mapping[str, Sequence[float]]) -> bytes
    Quantize an ONNX model on the basis of the range of its tensors.
    
    Args:
        model (onnx.ModelProto or bytes): An ONNX model to quantize.
        tensor_name_to_range (Mapping[str, Sequence[float]]):
            A mapping from a tensor name to a 2-tuple (or list) of the
            tensor's min and max.
    
    Returns:
        bytes: A serialized ONNX model that incorporates quantization
            information.



If you have already calibrated the model once and have the range information, you can save it to a file and load it later to skip the calibration phase, as the cell above does with `ranges.json`.
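
The save-and-reload pattern can be wrapped in a small caching helper, sketched below with only the standard library. `load_or_compute_ranges` and `fake_calibration` are hypothetical names, and the range values are placeholders; in the notebook, the compute step would be the data collection followed by `calibrator.compute_range()`.

```python
import json
import os
import tempfile


def load_or_compute_ranges(path, compute_fn):
    """Return ranges from `path` if it exists; otherwise compute and save them."""
    if os.path.exists(path):
        with open(path, "r") as f:
            return json.load(f)
    ranges = compute_fn()
    with open(path, "w") as f:
        json.dump(ranges, f, indent=4)
    return ranges


def fake_calibration():
    """Stand-in for the slow calibration step (placeholder values)."""
    return {"input": [-2.1, 2.6], "output": [-8.0, 21.0]}


cache_path = os.path.join(tempfile.mkdtemp(), "ranges.json")
first = load_or_compute_ranges(cache_path, fake_calibration)  # computes and saves
second = load_or_compute_ranges(cache_path, lambda: {})  # served from the file
print(first == second)
```

Note that JSON stores tuples as lists; per the `help(quantize)` output above, `tensor_name_to_range` accepts a 2-tuple or a list, so the reloaded mapping can be passed to `quantize` directly.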

## Run Inference with Quantized Model

For a quick demonstration, we use 1,000 randomly chosen samples from the ImageNet dataset for validation.

In [17]:
validation_dataset = torch.utils.data.Subset(
    imagenet, torch.randperm(len(imagenet))[:1000]
)

correct_predictions, total_predictions = 0, 0
elapsed_time = 0
with furiosa.runtime.sync.create_runner(graph) as runner:
    for image, label in tqdm.tqdm(
        validation_dataset, desc="Evaluation", unit="images", mininterval=0.5
    ):
        start = time.perf_counter_ns()
        image = np.expand_dims(image.numpy(), axis=0)
        outputs = runner.run(image)
        elapsed_time += time.perf_counter_ns() - start

        prediction = np.argmax(outputs[0], axis=1)[0]  # postprocessing
        if prediction == label:
            correct_predictions += 1
        total_predictions += 1

2023-10-22T11:41:31.462682Z  WARN furiosa_rt_core::consts::envs: NPU_DEVNAME will be deprecated. Use FURIOSA_DEVICES instead.
2023-10-22T11:41:31.462746Z  INFO furiosa_device::config::env: Using config "npu0pe0-1" from environment variable "NPU_DEVNAME"
2023-10-22T11:41:31.467343Z  INFO furiosa_rt_core::driver::event_driven::coord: FuriosaRT (v0.10.2, rev: a45bb1a0b, built at: 2023-10-12T06:41:21Z) bootstrapping ...
2023-10-22T11:41:31.538585Z  INFO furiosa_rt_core::driver::event_driven::coord: Found furiosa-compiler (v0.10.1, rev: 8b00177, built at: 2023-10-12T06:26:59Z)
2023-10-22T11:41:31.538613Z  INFO furiosa_rt_core::driver::event_driven::coord: Found libhal (type: warboy, v0.11.0, rev: 43c901f built at: 2023-08-08T12:07:35Z)
2023-10-22T11:41:31.538622Z  INFO furiosa_rt_core::driver::event_driven::

[1m[2m[1/6][0m 🔍   Compiling from onnx to dfg
Done in 1.158837s
[1m[2m[2/6][0m 🔍   Compiling from dfg to ldfg
Done in 342.50912s
[1m[2m[3/6][0m 🔍   Compiling from ldfg to cdfg
Done in 0.003204567s
[1m[2m[4/6][0m 🔍   Compiling from cdfg to gir
Done in 0.028225036s
[1m[2m[5/6][0m 🔍   Compiling from gir to lir
Done in 0.007152195s
[1m[2m[6/6][0m 🔍   Compiling from lir to enf
Done in 0.16137625s
✨  Finished in 343.86798s


2023-10-22T11:47:20.185652Z  INFO furiosa_rt_core::driver::event_driven::coord: [Sess-eaa514b9] the model compile is successful (took 346 secs)
2023-10-22T11:47:20.654257Z  INFO furiosa_rt_core::driver::event_driven::coord: [Runtime-0] created 1 NPU threads on npu:0:0-1 (DRAM: 180.0 kiB/16.0 GiB, SRAM: 31.7 MiB/128.0 MiB)


Evaluation: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:14<00:00, 70.45images/s]

2023-10-22T11:47:35.096744Z  INFO furiosa_rt_core::driver::event_driven::coord: [Sess-eaa514b9] terminated
2023-10-22T11:47:35.102254Z  INFO furiosa_rt_core::npu::raw: NPU (npu:0:0-1) has been closed
2023-10-22T11:47:35.107646Z  INFO furiosa_rt_core::driver::event_driven::coord: [Runtime-0] stopped

In [18]:
accuracy = correct_predictions / total_predictions
print(f"Accuracy: {accuracy:%}")

latency = elapsed_time / total_predictions
print(f"Average Latency: {latency / 1_000_000} ms")

Accuracy: 81.200000%
Average Latency: 2.026233351 ms
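
A single average can hide outliers, and the timed region in the loop above covers `np.expand_dims` as well as `runner.run`. The sketch below shows one way to keep per-sample timings and report a few summary statistics using only the standard library. `timed_call` and `summarize_latencies` are hypothetical helpers, and the built-in `sum` merely stands in for `runner.run` so that the example runs without an NPU.

```python
import statistics
import time


def timed_call(fn, *args):
    """Call fn(*args) and return (result, elapsed nanoseconds)."""
    start = time.perf_counter_ns()
    result = fn(*args)
    return result, time.perf_counter_ns() - start


def summarize_latencies(latencies_ns):
    """Return mean, median, and worst-case latency in milliseconds."""
    latencies_ms = [ns / 1_000_000 for ns in latencies_ns]
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


# `sum` stands in for `runner.run`; any callable can be timed this way.
latencies = []
for _ in range(5):
    _, elapsed = timed_call(sum, range(10_000))
    latencies.append(elapsed)

print(summarize_latencies(latencies))
```

Timing only the call of interest, and doing input preparation outside the timed region, keeps host-side preprocessing from being counted as inference latency.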
