# How to Use Furiosa SDK from Start to Finish

This notebook demonstrates how to use Furiosa SDK from start to finish.

## Prerequisites

The Furiosa SDK must be installed. If it is not, follow the installation instructions at https://furiosa-ai.github.io/docs/latest/ko/ (Korean) or https://furiosa-ai.github.io/docs/latest/en/ (English). This demonstration also requires the `torchvision` and `scipy` packages.

```console
$ pip install furiosa-sdk torchvision scipy
```

In [1]:
import time

import numpy as np
import onnx
import torch
import torchvision
from torchvision import transforms
import tqdm

import furiosa.runtime.session
from furiosa.optimizer import optimize_model
from furiosa.quantizer import quantize, Calibrator, CalibrationMethod

libfuriosa_hal.so --- v0.10.0, built @ d6fc64c
libfuriosa_hal.so --- v0.11.0, built @ 43c901f


## Load PyTorch Model

As a running example, we employ the pre-trained ResNet-50 model from Torchvision.

In [2]:
torch_model = torchvision.models.resnet50(weights='DEFAULT')
torch_model = torch_model.eval()  # Set the model to inference mode.

The ResNet-50 model was trained with the preprocessing described at https://pytorch.org/vision/stable/models.html. We will use the same preprocessing for calibration and inference.

In [3]:
preprocess = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)
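To make the last step of this pipeline concrete, the following is a minimal NumPy sketch of what `transforms.Normalize` computes on a single image that `ToTensor` has already converted to channel-first layout with values in [0, 1]. The function name `normalize_chw` and the mid-gray test image are illustrative assumptions, not part of Torchvision or the Furiosa SDK.

```python
import numpy as np

# ImageNet channel statistics, copied from the preprocessing cell above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_chw(image: np.ndarray) -> np.ndarray:
    """Normalize a (3, H, W) float image whose values lie in [0, 1]."""
    # Reshape the statistics to (3, 1, 1) so they broadcast over H and W.
    return (image - MEAN[:, None, None]) / STD[:, None, None]


# A hypothetical mid-gray image: every pixel is 0.5 in all three channels.
gray = np.full((3, 224, 224), 0.5, dtype=np.float32)
normalized = normalize_chw(gray)

# Add the batch axis to match the model's input shape, (1, 3, 224, 224).
batch = normalized[None, ...]
print(batch.shape)
```

Each channel is shifted and scaled independently, so the same gray input produces a different value in each of the three output channels.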

## Export PyTorch Model to ONNX Model

We call the `torch.onnx.export` function to export the PyTorch ResNet-50 model to an ONNX model. The function executes a PyTorch model provided as its first argument, recording a trace of what operators are used during the execution, and then converts those operators into ONNX equivalents. Because `torch.onnx.export` runs the model, we need to provide the function with an input tensor as its second argument, which can be random so long as it satisfies the shape and type of the model's input. As of Furiosa SDK v0.6, ONNX OpSet 12 is the most well-supported version.

In [4]:
# Generate a dummy input that matches the model's input shape, (1, 3, 224, 224).
dummy_input = (torch.randn(1, 3, 224, 224),)

# Export the PyTorch model into an ONNX model.
torch.onnx.export(
    torch_model,  # PyTorch model to export
    dummy_input,  # model input
    "resnet50.onnx",  # where to save the exported ONNX model
    opset_version=13,  # the ONNX OpSet version to export the model to
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=["input"],  # the ONNX model's input names
    output_names=["output"],  # the ONNX model's output names
)

# Load the exported ONNX model.
onnx_model = onnx.load_model("resnet50.onnx")

verbose: False, log level: Level.ERROR



## Load Dataset

We will use subsets of the ImageNet dataset for calibration and validation. 

You need to download `ILSVRC2012_img_val.tar` and `ILSVRC2012_devkit_t12.tar.gz` externally and place them in the `imagenet` directory. Torchvision cannot download the ImageNet dataset automatically because it is no longer publicly accessible: https://github.com/pytorch/vision/pull/1457.

Note that it may take several minutes to run this step for the first time because it involves decompressing the archive files. It will take much less time to complete subsequently.

In [5]:
imagenet = torchvision.datasets.ImageNet("imagenet", split="val", transform=preprocess)

## Calibrate and Quantize ONNX Model

For a quick demonstration, we calibrate with a small number of samples (100) chosen at random from the ImageNet dataset.

In [6]:
calibration_dataset = torch.utils.data.Subset(imagenet, torch.randperm(len(imagenet))[:100])
calibration_dataloader = torch.utils.data.DataLoader(calibration_dataset, batch_size=1)
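Because `torch.randperm` is unseeded here, the calibration subset changes on every run, and it is drawn independently of the validation subset used later in this notebook, so the two may share images. If reproducible and disjoint subsets are preferred, one option is to derive both index lists from a single seeded permutation. The sketch below uses NumPy for this; the seed value and the variable names are assumptions made for illustration.

```python
import numpy as np

NUM_IMAGES = 50_000  # size of the ImageNet validation split

# A fixed seed (an arbitrary choice) makes the permutation reproducible.
rng = np.random.default_rng(seed=0)
permutation = rng.permutation(NUM_IMAGES)

# Slice non-overlapping parts of the same permutation.
calibration_indices = permutation[:100].tolist()
validation_indices = permutation[100:1100].tolist()

print(len(calibration_indices), len(validation_indices))
```

The resulting lists can be passed to `torch.utils.data.Subset(imagenet, calibration_indices)` in place of the `torch.randperm` slice.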

Before calibrating and quantizing the ONNX model, we call the `optimize_model` function to optimize it.

In [7]:
onnx_model = optimize_model(onnx_model)

We use `Calibrator` to calibrate the model. It supports several calibration methods, selected through `CalibrationMethod` (e.g. `MIN_MAX_ASYM`, `ENTROPY_ASYM`).

In [8]:
help(Calibrator)

Help on class Calibrator in module furiosa.quantizer:

class Calibrator(builtins.object)
 |  Calibrator(model: Union[onnx.onnx_ml_pb2.ModelProto, bytes], calibration_method: furiosa.quantizer.CalibrationMethod, *, percentage: float = 99.99)
 |  
 |  Calibrator.
 |  
 |  This collects the values of tensors in an ONNX model and computes
 |  their ranges.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, model: Union[onnx.onnx_ml_pb2.ModelProto, bytes], calibration_method: furiosa.quantizer.CalibrationMethod, *, percentage: float = 99.99)
 |      Args:
 |          model (onnx.ModelProto or bytes): An ONNX model to
 |              calibrate.
 |          calibration_method (CalibrationMethod): A calibration
 |              method.
 |          percentage (float): A percentage to use with percentile
 |              calibration. Defaults to 99.99 (i.e. 99.99%-percentile
 |              calibration).
 |  
 |  collect_data(self, calibration_dataset: Iterable[Sequence[numpy.ndarray]]) -> Non
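The `percentage` argument applies only to percentile calibration. As a rough illustration of the idea, the sketch below clips a tensor's range at a percentile of its absolute values, so that a few extreme outliers do not stretch the range. This is one common symmetric variant written in plain NumPy; it is an assumption for illustration and not necessarily how `furiosa.quantizer` computes percentile ranges. The function name `percentile_range` is hypothetical.

```python
import numpy as np


def percentile_range(values, percentage=99.99):
    """Return a symmetric (min, max) range clipped at the given percentile."""
    magnitudes = np.abs(np.asarray(values, dtype=np.float64)).ravel()
    threshold = float(np.percentile(magnitudes, percentage))
    return -threshold, threshold


# Synthetic activations with one extreme outlier.
rng = np.random.default_rng(0)
activations = rng.normal(size=100_000)
activations[0] = 1_000.0

low, high = percentile_range(activations)
print(f"min-max range:    ({activations.min():.2f}, {activations.max():.2f})")
print(f"percentile range: ({low:.2f}, {high:.2f})")
```

A min-max range would extend to the outlier at 1000, whereas the percentile range stays close to the bulk of the distribution.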

In [9]:
help(CalibrationMethod)

Help on class CalibrationMethod in module furiosa.quantizer:

class CalibrationMethod(enum.IntEnum)
 |  CalibrationMethod(value, names=None, *, module=None, qualname=None, type=None, start=1)
 |  
 |  Calibration method.
 |  
 |  Attributes:
 |      MIN_MAX_ASYM (CalibrationMethod):
 |          Min-max calibration (Asymmetric).
 |      MIN_MAX_SYM (CalibrationMethod):
 |          Min-max calibration (Symmetric).
 |      ENTROPY_ASYM (CalibrationMethod):
 |          Entropy calibration (Aymmetric).
 |      ENTROPY_SYM (CalibrationMethod):
 |          Entropy calibration (Symmetric).
 |      PERCENTILE_ASYM (CalibrationMethod):
 |          Percentile calibration (Asymmetric).
 |      PERCENTILE_SYM (CalibrationMethod):
 |          Percentile calibration (Symmetric).
 |      MSE_ASYM (CalibrationMethod):
 |          Mean squared error (MSE) calibration (Asymmetric).
 |      MSE_SYM (CalibrationMethod):
 |          Mean squared error (MSE) calibration (Symmetric).
 |      SQNR_ASYM (Calibr

Before the `Calibrator` can compute the ranges, it must collect tensor values by running the model on the calibration data.

In [10]:
calibrator = Calibrator(onnx_model, CalibrationMethod.MIN_MAX_ASYM)

for calibration_data, _ in tqdm.tqdm(calibration_dataloader, desc="Calibration", unit="images", mininterval=0.5):
    calibrator.collect_data([[calibration_data.numpy()]])

Calibration: 100%|██████████| 100/100 [00:55<00:00,  1.82images/s]


In [11]:
ranges = calibrator.compute_range()
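Conceptually, min-max calibration keeps a running minimum and maximum of the values observed while data is collected, and reports them as the range. The sketch below shows that idea for a single tensor in plain NumPy. The class `MinMaxTracker` is a hypothetical stand-in written for illustration; the real `Calibrator` tracks the tensors of the whole ONNX model, and its internals are not shown in this notebook.

```python
import numpy as np


class MinMaxTracker:
    """Track the running min and max of one tensor across batches."""

    def __init__(self):
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def collect(self, batch: np.ndarray) -> None:
        # Widen the range whenever a batch exceeds the values seen so far.
        self.minimum = min(self.minimum, float(batch.min()))
        self.maximum = max(self.maximum, float(batch.max()))

    def compute_range(self):
        return (self.minimum, self.maximum)


tracker = MinMaxTracker()
for batch in (np.array([0.1, 0.7]), np.array([-1.5, 0.2]), np.array([2.0, 0.0])):
    tracker.collect(batch)

print(tracker.compute_range())  # (-1.5, 2.0)
```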

With the ranges computed, we can now quantize the model by calling the `quantize` function.

In [12]:
graph = quantize(onnx_model, ranges)
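To show why a (min, max) range is enough to quantize a tensor, here is a minimal sketch of standard asymmetric (affine) quantization to 8-bit unsigned integers: the range determines a scale and a zero point, and each real value is mapped to the nearest integer step. The use of `uint8`, the helper names, and the example range are assumptions for illustration; the notebook does not state which integer format or rounding scheme `quantize` uses internally.

```python
import numpy as np

QMIN, QMAX = 0, 255  # assumed 8-bit unsigned integer range


def affine_params(rmin, rmax):
    """Derive (scale, zero_point) from a calibrated (min, max) range."""
    # Widen the range to include 0.0 so that zero maps to an exact integer.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if rmax == rmin:  # degenerate all-zero tensor
        return 1.0, QMIN
    scale = (rmax - rmin) / (QMAX - QMIN)
    zero_point = int(round(QMIN - rmin / scale))
    return scale, zero_point


def quantize_array(x, scale, zero_point):
    q = np.round(np.asarray(x) / scale) + zero_point
    return np.clip(q, QMIN, QMAX).astype(np.uint8)


def dequantize_array(q, scale, zero_point):
    return (q.astype(np.float64) - zero_point) * scale


scale, zero_point = affine_params(-1.0, 3.0)  # hypothetical calibrated range
x = np.array([-1.0, 0.0, 0.5, 3.0])
q = quantize_array(x, scale, zero_point)
x_hat = dequantize_array(q, scale, zero_point)
print(q, x_hat)
```

In a symmetric scheme the range is forced to the form (-t, t), which fixes the zero point and leaves only the scale to be determined.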

In [13]:
help(quantize)

Help on function quantize in module furiosa.quantizer:

quantize(model: Union[onnx.onnx_ml_pb2.ModelProto, bytes], tensor_name_to_range: Mapping[str, Sequence[float]], *, with_quantize: bool = True, normalized_pixel_outputs: Optional[Sequence[int]] = None) -> Graph
    Quantize an ONNX model on the basis of the range of its tensors.
    
    Args:
        model (onnx.ModelProto or bytes): An ONNX model to quantize.
        tensor_name_to_range (Mapping[str, Sequence[float]]):
            A mapping from a tensor name to a 2-tuple (or list) of the
            tensor's min and max.
        with_quantize (bool): Whether to put a Quantize operator at the
            beginning of the resulting model. Defaults to True.
        normalized_pixel_outputs (Optional[Sequence[int]]):
            A sequence of indices of output tensors in the ONNX model
            that produce pixel values in a normalized format ranging
            from 0.0 to 1.0. If specified, the corresponding output
           

If you have already calibrated the model once, you can save the resulting ranges to a file and load them later to skip the calibration phase.

In [14]:
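import json

# Save the calibrated ranges so that later runs can skip calibration.
with open("ranges.json", "w") as f:
    json.dump(ranges, f, indent=4)

# Load the saved ranges back.
with open("ranges.json", "r") as f:
    ranges = json.load(f)

# Quantize with the loaded ranges.
graph = quantize(onnx_model, ranges)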
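The save-and-load pattern can be wrapped in a small helper that runs calibration only when no saved file exists. The sketch below uses only the standard library. The helper name `load_or_compute_ranges`, the stand-in calibration function, and the tensor names and values in it are hypothetical. Note that JSON has no tuple type, so ranges come back as lists; the helper normalizes them to lists up front so both paths return the same structure.

```python
import json
import os
import tempfile


def load_or_compute_ranges(path, compute):
    """Load ranges from `path` if it exists; otherwise compute and save them."""
    if os.path.exists(path):
        with open(path, "r") as f:
            return json.load(f)
    # Convert to plain floats and lists so the result is JSON-serializable.
    ranges = {name: [float(lo), float(hi)] for name, (lo, hi) in compute().items()}
    with open(path, "w") as f:
        json.dump(ranges, f, indent=4)
    return ranges


calls = []


def fake_calibration():
    # Stand-in for the calibration step; the values are made up.
    calls.append(1)
    return {"input": (-2.1, 2.6), "output": (-8.0, 30.5)}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ranges.json")
    first = load_or_compute_ranges(path, fake_calibration)   # computes and saves
    second = load_or_compute_ranges(path, fake_calibration)  # loads from disk

print(first == second, len(calls))
```

Saved ranges are keyed by tensor name, so they should only be reused with the same optimized model they were computed from.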

## Run Inference with Quantized Model

For a quick demonstration, we validate on 1000 samples chosen at random from the ImageNet dataset.

In [15]:
validation_dataset = torch.utils.data.Subset(imagenet, torch.randperm(len(imagenet))[:1000])
validation_dataloader = torch.utils.data.DataLoader(validation_dataset, batch_size=1)

correct_predictions, total_predictions = 0, 0
elapsed_time = 0
with furiosa.runtime.session.create(graph) as session:
    for image, label in tqdm.tqdm(validation_dataloader, desc="Evaluation", unit="images", mininterval=0.5):
        image = image.numpy()
        start = time.perf_counter_ns()
        outputs = session.run(image)
        elapsed_time += time.perf_counter_ns() - start
        
        prediction = np.argmax(outputs[0].numpy(), axis=1)  # postprocessing  
        if prediction == label.numpy():
            correct_predictions += 1
        total_predictions += 1

Saving the compilation log into /root/.local/state/furiosa/logs/compile-20230412194938-s4t0d6.log
Using furiosa-compiler 0.9.0-dev (rev: 6ad475d33 built at 2023-04-12T00:25:06Z)


2023-04-12T10:49:38.320063Z  INFO nux::npu: Npu (npu3pe0-1) is being initialized
2023-04-12T10:49:38.321904Z  INFO nux: NuxInner create with pes: [PeId(0)]


[1/5]  Compiling from dfg to ldfg
Done in 112.35857s
[2/5]  Compiling from ldfg to cdfg
Done in 0.003420042s
[3/5]  Compiling from cdfg to gir
Done in 0.029426608s
[4/5]  Compiling from gir to lir
Done in 0.006769268s
[5/5]  Compiling from lir to enf
Done in 0.16132618s
:-) Finished in 112.56005s
Evaluation: 100%|██████████| 1000/1000 [00:49<00:00, 20.02images/s]

2023-04-12T10:52:22.486376Z  INFO nux::npu: NPU (npu3pe0-1) has been destroyed
2023-04-12T10:52:22.486723Z  INFO nux::capi: session has been destroyed





In [16]:
accuracy = correct_predictions / total_predictions
print(f"Accuracy: {accuracy:%}")

latency = elapsed_time / total_predictions
print(f"Average Latency: {latency / 1_000_000} ms")

Accuracy: 78.600000%
Average Latency: 9.612101494000001 ms
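The average alone can hide slow outliers, such as a first call that takes longer than the rest. If per-sample timings are kept in a list instead of being summed, percentiles can be reported as well. The sketch below is a standard-library illustration of that idea; the function name `summarize_latencies` and the sample timings are made up and are not measurements from the NPU.

```python
import statistics


def summarize_latencies(latencies_ns):
    """Summarize per-sample latencies (in nanoseconds) as milliseconds."""
    ms = sorted(t / 1_000_000 for t in latencies_ns)

    def percentile(p: int) -> float:
        # Nearest-rank percentile, using integer arithmetic for the rank.
        rank = max(1, (p * len(ms) + 99) // 100)
        return ms[rank - 1]

    return {
        "mean_ms": statistics.fmean(ms),
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
        "max_ms": ms[-1],
    }


# Hypothetical timings: 99 runs at 9 ms and one slow run at 69 ms.
samples_ns = [9_000_000] * 99 + [69_000_000]
summary = summarize_latencies(samples_ns)
print(summary)
```

In the evaluation loop, this would mean appending `time.perf_counter_ns() - start` to a list on each iteration rather than adding it to `elapsed_time`.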
