# How to Use Furiosa SDK from Start to Finish

This notebook demonstrates how to use Furiosa SDK from start to finish.

## Prerequisites

The Furiosa SDK must be installed. If it is not, follow the installation instructions at https://furiosa-ai.github.io/docs/latest/ko/ (Korean) or https://furiosa-ai.github.io/docs/latest/en/ (English). This demonstration also requires the `torchvision` and `scipy` packages.

```console
$ pip install furiosa-sdk torchvision scipy
```

In [1]:
import time

import numpy as np
import onnx
import torch
import torchvision
from torchvision import transforms
import tqdm

import furiosa.runtime.session
from furiosa.optimizer import optimize_model
from furiosa.quantizer import post_training_quantize, quantize, Calibrator, CalibrationMethod

libfuriosa_hal.so --- v2.0, built @ 5423ba8


## Load PyTorch Model

As a running example, we employ the pre-trained ResNet-50 model from Torchvision.

In [2]:
torch_model = torchvision.models.resnet50(weights='DEFAULT')
torch_model = torch_model.eval()  # Set the model to inference mode.

The ResNet-50 model was trained with the preprocessing described at https://pytorch.org/vision/stable/models.html. We use the same preprocessing for calibration and inference.

In [3]:
preprocess = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)
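
To make the `Normalize` step concrete, it computes `(x - mean) / std` independently for each channel. The NumPy sketch below reproduces only that arithmetic; `normalize_chw` is a hypothetical helper written for illustration, not part of Torchvision.

```python
import numpy as np

# ImageNet channel statistics, as used in the preprocessing above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_chw(image: np.ndarray) -> np.ndarray:
    """Normalize a (3, H, W) float image with values in [0, 1], per channel."""
    # Reshape the statistics to (3, 1, 1) so they broadcast over height and width.
    return (image - MEAN[:, None, None]) / STD[:, None, None]


# A hypothetical image whose pixels all equal the channel means.
image = np.broadcast_to(MEAN[:, None, None], (3, 224, 224)).astype(np.float32)
normalized = normalize_chw(image)
print(normalized.shape)  # (3, 224, 224)
print(np.abs(normalized).max())  # 0.0
```

An image equal to the channel means maps to all zeros, which is why the calibrated input range of the model is roughly centered on zero.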

## Export PyTorch Model to ONNX Model

We call the `torch.onnx.export` function to export the PyTorch ResNet-50 model to an ONNX model. The function executes the PyTorch model given as its first argument, records a trace of the operators used during execution, and converts those operators into their ONNX equivalents. Because `torch.onnx.export` runs the model, its second argument must be an input tensor; random values are fine as long as the tensor matches the shape and type of the model's input. This notebook exports to ONNX OpSet 13, which is also the default `opset_version` of `post_training_quantize`.

In [4]:
# Generate a dummy input with the model's input shape, (1, 3, 224, 224).
dummy_input = (torch.randn(1, 3, 224, 224),)

# Export the PyTorch model into an ONNX model.
torch.onnx.export(
    torch_model,  # PyTorch model to export
    dummy_input,  # model input
    "resnet50.onnx",  # where to save the exported ONNX model
    opset_version=13,  # the ONNX OpSet version to export the model to
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=["input"],  # the ONNX model's input names
    output_names=["output"],  # the ONNX model's output names
)

# Load the exported ONNX model.
onnx_model = onnx.load_model("resnet50.onnx")

## Load Dataset

We will use subsets of the ImageNet dataset for calibration and validation. 

You need to download `ILSVRC2012_img_val.tar` and `ILSVRC2012_devkit_t12.tar.gz` externally and place them in the `imagenet` directory. Torchvision cannot download the ImageNet dataset automatically because it is no longer publicly accessible: https://github.com/pytorch/vision/pull/1457.

Note that it may take several minutes to run this step for the first time because it involves decompressing the archive files. It will take much less time to complete subsequently.

In [5]:
imagenet = torchvision.datasets.ImageNet("imagenet", split="val", transform=preprocess)

## Calibrate and Quantize ONNX Model

For quick demonstration, a small number of samples randomly chosen from the ImageNet dataset is used for calibration.

In [6]:
calibration_dataset = torch.utils.data.Subset(imagenet, torch.randperm(len(imagenet))[:100])
calibration_dataloader = torch.utils.data.DataLoader(calibration_dataset, batch_size=1)

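We call the `optimize_model` function to optimize the ONNX model before calibrating and quantizing it. The optimized model is then serialized to bytes, which is the form that `Calibrator` and `quantize` accept.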

In [7]:
onnx_model = optimize_model(onnx_model)
onnx_model = onnx_model.SerializeToString()

We use `Calibrator` to calibrate the model with one of several `CalibrationMethod` values (e.g. `MIN_MAX`, `ENTROPY`).

In [8]:
help(Calibrator)

Help on class Calibrator in module builtins:

class Calibrator(object)
 |  Calibrator(model, calibration_method, percentage=99.99)
 |  
 |  A calibrator, which collects the values of tensors in an ONNX model
 |  and computes their ranges.
 |  
 |  Args:
 |      model (bytes): An ONNX model to calibrate.
 |      calibration_method (CalibrationMethod): A calibration method.
 |      percentage (float, optional): A percentage to use with
 |          percentile calibration. Defaults to 99.99 (i.e.
 |          99.99%-percentile calibration).
 |  
 |  Methods defined here:
 |  
 |  collect_data(self, calibration_dataset)
 |      Collect the values of tensors that will be used for range
 |      computation.
 |      
 |      This can be called multiple times.
 |      
 |      Args:
 |          calibration_dataset (Iterable[Sequence[numpy.ndarray]]):
 |              An object that provides input data for the model one at
 |              a time.
 |  
 |  compute_range(self, verbose)
 |      Estim

In [9]:
help(CalibrationMethod)

Help on class CalibrationMethod in module furiosa_quantizer_impl:

class CalibrationMethod(enum.IntEnum)
 |  CalibrationMethod(value, names=None, *, module=None, qualname=None, type=None, start=1)
 |  
 |  Calibration method.
 |  
 |  Attributes:
 |      MIN_MAX (CalibrationMethod): Min-max calibration.
 |      ENTROPY (CalibrationMethod): Entropy calibration.
 |      PERCENTILE (CalibrationMethod): Percentile calibration.
 |      MSE (CalibrationMethod): Mean squared error (MSE) calibration.
 |      SQNR (CalibrationMethod): Signal-to-quantization-noise ratio (SQNR) calibration.
 |  
 |  Method resolution order:
 |      CalibrationMethod
 |      enum.IntEnum
 |      builtins.int
 |      enum.Enum
 |      builtins.object
 |  
 |  Data and other attributes defined here:
 |  
 |  ENTROPY = <CalibrationMethod.ENTROPY: 1>
 |  
 |  MIN_MAX = <CalibrationMethod.MIN_MAX: 0>
 |  
 |  MSE = <CalibrationMethod.MSE: 3>
 |  
 |  PERCENTILE = <CalibrationMethod.PERCENTILE: 2>
 |  
 |  SQNR = <Calib

Before the `Calibrator` can compute the ranges, it must collect tensor values from the calibration data.

In [10]:
calibrator = Calibrator(onnx_model, CalibrationMethod.MIN_MAX)

# Feed every calibration sample exactly once. Each item is a list holding one array per model input.
# Equivalent alternative: call collect_data([[calibration_data.numpy()]]) once per sample in a loop.
calibrator.collect_data([calibration_data.numpy()] for calibration_data, _ in calibration_dataloader)

In [11]:
ranges = calibrator.compute_range()

In [12]:
print(ranges)

{'/layer2/layer2.2/relu_1/Relu_output_0': (0.0, 39.90517807006836), '/layer1/layer1.0/conv1/Conv_output_0': (-47.21926498413086, 17.741018295288086), '/layer4/layer4.2/Add_output_0': (-38.632232666015625, 75.21089935302734), '/layer3/layer3.5/conv3/Conv_output_0': (-32.19559097290039, 16.876602172851562), '/layer3/layer3.2/conv2/Conv_output_0': (-30.72959327697754, 26.510149002075195), '/layer1/layer1.0/relu_2/Relu_output_0': (0.0, 24.1777400970459), '/layer2/layer2.2/relu/Relu_output_0': (0.0, 86.73884582519531), '/layer1/layer1.1/conv1/Conv_output_0': (-72.26683044433594, 20.372543334960938), '/layer2/layer2.0/Add_output_0': (-59.57727813720703, 92.01126098632812), '/layer3/layer3.1/conv1/Conv_output_0': (-48.180213928222656, 47.666568756103516), '/layer4/layer4.1/conv2/Conv_output_0': (-33.69670486450195, 18.930652618408203), '/layer3/layer3.1/conv3/Conv_output_0': (-120.35619354248047, 56.40264892578125), 'output': (-2.4086153507232666, 9.489575386047363), '/layer3/layer3.4/relu/Re
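
The printed mapping pairs each tensor name with a `(min, max)` tuple. As a rough illustration of what min-max calibration computes, the sketch below tracks a running minimum and maximum per tensor across batches. It is a generic sketch, not the Furiosa implementation; `min_max_ranges` and the tensor name `relu_out` are hypothetical.

```python
import numpy as np


def min_max_ranges(batches):
    """Track the running (min, max) of each named tensor across batches.

    `batches` is an iterable of dicts mapping a tensor name to an array.
    """
    ranges = {}
    for batch in batches:
        for name, values in batch.items():
            low, high = float(values.min()), float(values.max())
            if name in ranges:
                # Widen the stored range if this batch exceeds it.
                old_low, old_high = ranges[name]
                low, high = min(low, old_low), max(high, old_high)
            ranges[name] = (low, high)
    return ranges


# Two hypothetical batches of activations for a tensor named "relu_out".
batches = [
    {"relu_out": np.array([0.0, 1.5, 3.0])},
    {"relu_out": np.array([0.0, 0.5, 7.25])},
]
print(min_max_ranges(batches))  # {'relu_out': (0.0, 7.25)}
```

Because the result depends only on the extreme values seen, min-max calibration is sensitive to outliers in the calibration set, which is one reason the other calibration methods exist.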

With the ranges computed, we can now quantize the model by calling the `quantize` function.

In [13]:
graph = quantize(onnx_model, ranges)

In [14]:
help(quantize)

Help on built-in function quantize in module furiosa_quantizer_impl.quantizer_impl:

quantize(model, tensor_name_to_range, with_quantize)
    Quantize an ONNX model on the basis of the range of its tensors.
    
    Args:
        model (bytes): An ONNX model to quantize.
        tensor_name_to_range (Mapping[str, Sequence[float]]):
            A mapping from a tensor name to a 2-tuple (or list) of the
            tensor's min and max.
        with_quantize (bool): Whether to put a Quantize operator at the
            beginning of the resulting model. Defaults to True.
    
    Returns:
        Graph: An intermediate representation (IR) of the quantized
            model.
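
To illustrate how a tensor's `(min, max)` range can be turned into integer parameters, the sketch below uses a textbook affine int8 scheme. This is an assumption made for illustration only: the Furiosa quantizer may use a different scheme, and all function names here are hypothetical.

```python
import numpy as np


def quantization_params(low, high, qmin=-128, qmax=127):
    """Derive an int8 scale and zero point from a calibrated (min, max) range."""
    # Extend the range to include zero so that 0.0 is exactly representable.
    low, high = min(low, 0.0), max(high, 0.0)
    if high == low:
        return 1.0, 0  # Degenerate range: any scale works for an all-zero tensor.
    scale = (high - low) / (qmax - qmin)
    zero_point = int(round(qmin - low / scale))
    return scale, zero_point


def quantize_array(x, scale, zero_point, qmin=-128, qmax=127):
    """Map float values to int8 using the affine parameters."""
    return np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)


def dequantize_array(q, scale, zero_point):
    """Map int8 values back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale


# A hypothetical calibrated range for one tensor.
scale, zero_point = quantization_params(-2.0, 6.0)
x = np.linspace(-2.0, 6.0, 50, dtype=np.float32)
restored = dequantize_array(quantize_array(x, scale, zero_point), scale, zero_point)
print(zero_point, float(np.abs(restored - x).max()) <= 1.5 * scale)
```

A wider calibrated range gives a larger `scale` and therefore coarser steps, which is why the choice of calibration method affects accuracy after quantization.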



The whole process above can also be done in one shot with the `post_training_quantize` function.

### Quantization in One Shot

As in the previous section, we first prepare the ONNX model and the calibration dataset.

In [15]:
onnx_model = onnx.load_model("resnet50.onnx")

calibration_dataset = torch.utils.data.Subset(imagenet, torch.randperm(len(imagenet))[:100])
calibration_dataloader = torch.utils.data.DataLoader(calibration_dataset, batch_size=1)

The dataset passed to the `post_training_quantize` function must be of type `Iterable[Sequence[numpy.ndarray]]`.
The outer `Iterable` iterates over the samples in the dataset, and each inner `Sequence` holds the inputs of the model for one sample (a model may require multiple inputs).

In [16]:
# Labels are not needed for calibration.
dataset = ([calibration_data.numpy()] for calibration_data, _ in calibration_dataloader)
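
The shape of this data structure can be checked without the SDK. The sketch below builds an `Iterable[Sequence[numpy.ndarray]]` from synthetic arrays and shows that a generator can be consumed only once. `synthetic_dataset` is a hypothetical helper, and random data only illustrates the type; real calibration needs representative samples.

```python
from typing import Iterator, Sequence

import numpy as np


def synthetic_dataset(num_samples: int, seed: int = 0) -> Iterator[Sequence[np.ndarray]]:
    """Yield one-element lists, each holding a (1, 3, 224, 224) float32 array."""
    rng = np.random.default_rng(seed)
    for _ in range(num_samples):
        # One list entry per model input; ResNet-50 takes a single input.
        yield [rng.standard_normal((1, 3, 224, 224), dtype=np.float32)]


dataset = synthetic_dataset(3)
first_pass = [sample[0].shape for sample in dataset]
second_pass = [sample[0].shape for sample in dataset]  # The generator is now exhausted.
print(first_pass)  # [(1, 3, 224, 224), (1, 3, 224, 224), (1, 3, 224, 224)]
print(second_pass)  # []
```

If the same calibration data must be traversed more than once, materialize it in a list or recreate the generator before each pass.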

We call the `post_training_quantize` function with the ONNX model and the dataset to quantize the model.

In [17]:
graph = post_training_quantize(onnx_model, dataset)

The `post_training_quantize` function wraps the steps explained in the previous section: optimization, serialization, calibration, and quantization. To have full control over the quantization phase, call those functions manually instead.

In [18]:
help(post_training_quantize)

Help on function post_training_quantize in module furiosa.quantizer.utils:

post_training_quantize(model: Union[bytes, str, pathlib.Path, onnx.onnx_ml_pb2.ModelProto], dataset: Iterable[Sequence[numpy.ndarray]], calibration_method: furiosa_quantizer_impl.CalibrationMethod = <CalibrationMethod.MIN_MAX: 0>, opset_version: int = 13, with_quantize: bool = True, verbose: bool = False) -> Graph
    Conduct a quantization with given dataset and calibration method
    Args:
        model (bytes, str, Path, onnx.ModelProto): a byte string containing a model or
            a path string of a onnx model or `onnx.ModelProto`
        dataset: A calibration dataset.
        calibration_method: A calibration method to use. (MIN_MAX, ENTROPY, etc.)
            Defaults to MIN_MAX.
        opset_version: ONNX OperatorSet version to use.
            Defaults to 13.
        with_quantize: Whether to put a Quantize operator at the
            beginning of the resulting model. Defaults to True.
        ver

## Run Inference with Quantized Model

For a quick demonstration, we use 1000 randomly chosen samples from the ImageNet dataset for validation.

In [19]:
validation_dataset = torch.utils.data.Subset(imagenet, torch.randperm(len(imagenet))[:1000])
validation_dataloader = torch.utils.data.DataLoader(validation_dataset, batch_size=1)

correct_predictions, total_predictions = 0, 0
elapsed_time = 0
with furiosa.runtime.session.create(bytes(graph)) as session:
    for image, label in tqdm.tqdm(validation_dataloader, desc="Evaluation", unit="images", mininterval=0.5):
        image = image.numpy()
        start = time.perf_counter_ns()
        outputs = session.run(image)
        elapsed_time += time.perf_counter_ns() - start
        
        prediction = np.argmax(outputs[0].numpy(), axis=1)  # postprocessing  
        if prediction == label.numpy():
            correct_predictions += 1
        total_predictions += 1

2023-01-20T04:18:56.585783Z  INFO nux::npu: Npu (npu0pe0-1) is being initialized
2023-01-20T04:18:56.587575Z  INFO nux: NuxInner create with pes: [PeId(0)]


Saving the compilation log into /root/.local/state/furiosa/logs/compile-20230120131856-oni7r9.log
Using furiosa-compiler 0.9.0-dev (rev: 0e82d35da built at 2023-01-02T06:05:47Z)
[1m[2m[1/5][0m 🔍   Compiling from dfg to ldfg
Done in 71.27173s
[1m[2m[2/5][0m 🔍   Compiling from ldfg to cdfg
Done in 0.003129137s
[1m[2m[3/5][0m 🔍   Compiling from cdfg to gir
Done in 0.029114116s
[1m[2m[4/5][0m 🔍   Compiling from gir to lir
Done in 0.008903365s
[1m[2m[5/5][0m 🔍   Compiling from lir to enf
Done in 0.06838308s
✨  Finished in 71.38182s
Evaluation: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:07<00:00, 130.46images/s]

2023-01-20T04:20:18.032267Z  INFO nux::npu: NPU (npu0pe0-1) has been destroyed
2023-01-20T04:20:18.035431Z  INFO nux::capi: session has been destroyed





In [20]:
accuracy = correct_predictions / total_predictions
print(f"Accuracy: {accuracy:%}")

latency = elapsed_time / total_predictions
print(f"Average Latency: {latency / 1_000_000} ms")

Accuracy: 80.600000%
Average Latency: 2.408617062 ms
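
The timing pattern used in the evaluation loop can be isolated into a small helper, sketched below with only the standard library. `average_latency_ms` is a hypothetical name, and the lambda is a stand-in workload for a call such as `session.run(image)`.

```python
import time


def average_latency_ms(fn, inputs):
    """Return the mean wall-clock time of fn(x) over inputs, in milliseconds."""
    elapsed_ns = 0
    count = 0
    for x in inputs:
        # Time only the call itself, excluding any pre- or postprocessing.
        start = time.perf_counter_ns()
        fn(x)
        elapsed_ns += time.perf_counter_ns() - start
        count += 1
    if count == 0:
        raise ValueError("inputs must contain at least one item")
    return elapsed_ns / count / 1_000_000


# A stand-in workload; replace it with the inference call to be measured.
latency = average_latency_ms(lambda n: sum(range(n)), [10_000] * 100)
print(f"Average Latency: {latency:.3f} ms")
```

Note that the average reported above covers only the `session.run` calls; the roughly 71 seconds of model compilation happen once when the session is created and are not included.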
