# How to Use Warboy From Start To Finish

This notebook demonstrates how to use Warboy from start to finish with the YOLOv8n object detection model.

## Prerequisites

### Install Driver, Firmware, and Runtime packages

First, install the driver, firmware, and runtime packages for the NPU device from the APT server. Before doing so, you need to set up the APT server by following the instructions in [Korean](https://developer.furiosa.ai/docs/latest/ko/software/installation.html) or [English](https://developer.furiosa.ai/docs/latest/en/software/installation.html).

After setting up the APT server, you can install the packages using the following command:

```console
$ sudo apt-get update && sudo apt-get install -y furiosa-driver-warboy furiosa-libnux
```

Next, install `furiosa-toolkit` and check the NPU devices in your environment using the following commands:

```console
$ sudo apt-get install -y furiosa-toolkit
$ furiosactl info --full
```


### Install Furiosa Python SDK

To install the Furiosa Python SDK, you need Python 3.8 or higher. First, you can create a virtual environment with Conda using the following command:

```console
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ sh ./Miniconda3-latest-Linux-x86_64.sh
$ source ~/.bashrc
$ conda create -n furiosa-3.9 python=3.9
$ conda activate furiosa-3.9
```


Next, install the Furiosa SDK if it is not installed yet. You can follow the instructions in [Korean](https://furiosa-ai.github.io/docs/latest/ko/) or [English](https://furiosa-ai.github.io/docs/latest/en/).

```console
$ pip install 'furiosa-sdk[full]'
```

### Install Datasets

In this notebook, we will use the COCO dataset. You can download the COCO dataset using the following command:

```console
$ ./coco.sh
```
This will download the COCO dataset and save it in the `datasets/coco` directory.


Also, to run the web demo, you need the demo videos. You can download them using the following command:

```console
$ ./demo_videos.sh
```

This will download the demo videos and save them in the `datasets/demo_videos` directory. The object detection and instance segmentation videos are stored in `datasets/demo_videos/detection`, and the pose estimation videos in `datasets/demo_videos/estimation`.

### Install required packages

You can install the required packages using the following command:

```console
$ pip install -r requirements.txt
```


In [None]:
%pip install -r requirements.txt

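Also, a Jupyter notebook already runs its own event loop, so calling `asyncio.run()` inside a cell raises a `RuntimeError`. To run async functions from the notebook, we use `nest_asyncio`, which patches asyncio to allow nested event loops. You can install `nest_asyncio` using the following command: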

```console
$ pip install nest_asyncio
```


In [None]:
%pip install nest_asyncio
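
To see why the patch is needed, the following minimal sketch reproduces the situation in a plain Python script. Here `outer` plays the role of a notebook cell, since its body executes while an event loop is already running. The function names are illustrative only and are not part of the Warboy code.

```python
import asyncio


async def inner() -> int:
    return 42


async def outer() -> str:
    # An event loop is already running inside this coroutine,
    # which mirrors code executed in a Jupyter cell.
    coro = inner()
    try:
        return str(asyncio.run(coro))
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


message = asyncio.run(outer())
print(message)
```

Calling `nest_asyncio.apply()` before the nested `asyncio.run()` removes this restriction, which is what this notebook does before running inference.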


### Installing a Custom CLI Tool (Optional)

This notebook does not use the custom CLI tool, but if you want to run vision models on Warboy from the command line, you can install it using the following command:

```console
$ pip install .
```
This will install the `warboy-vision` command line tool, which you can use to run models on Warboy.

In [None]:
%pip install .

In [None]:
import os
import torch
import onnx
from typing import List, Dict
import cv2
from tqdm import tqdm
from pathlib import Path
import nest_asyncio
import asyncio
import numpy as np



## Prepare Model

First, you need to prepare the weight file for the model you want to use. In this notebook, we will use the YOLOv8n model.

### Download YOLOv8n Weights

You can download the YOLOv8n weights using the following command:

```console
$ wget https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n.pt
```


### Export ONNX

To run the model on Warboy, you need a quantized ONNX model. First, let's export the YOLOv8n model to ONNX format.



In [None]:
from ultralytics import YOLO

def export_onnx(
    weight_file: str = "yolov8n.pt",
    input_shape: List[int] = [1, 3, 640, 640],
    onnx_path: str = "models/onnx/object_detection/yolov8n.onnx",
):
    """
    Export YOLOv8 model to ONNX format.
    Args:
        weight_file (str): Path to the YOLOv8 model weights file.
        input_shape (List[int]): Shape of the input tensor as [batch, channels, height, width].
        onnx_path (str): Path to save the exported ONNX model.
    """

    print(f"Load PyTorch Model from {weight_file}...")
    # Create the output directory if the path has one and it does not exist yet
    onnx_dir = os.path.dirname(onnx_path)
    if onnx_dir:
        os.makedirs(onnx_dir, exist_ok=True)

    # Load the PyTorch model in inference mode
    torch_model = YOLO(weight_file).model.eval()

    print(f"Export ONNX {onnx_path}...")
    dummy_input = torch.randn(input_shape).to(torch.device("cpu"))
    
    torch.onnx.export(
        torch_model,
        dummy_input,
        onnx_path,
        opset_version=13,
        input_names=["images"],
        output_names=["outputs"],
    )
    
    return


For YOLO models, accuracy drops after quantization because of the concatenation operator, which combines class results and box results along the channel axis at each anchor. To avoid this, we need to modify the model by removing the decoding part from the model output.


In [None]:
def edit_onnx(
    onnx_path: str = "models/onnx/object_detection/yolov8n.onnx", 
    edit_info: Dict[str, List] = None
):
    
    """
    Extract the sub-graph without the decoding part, update its input and
    output shapes, and overwrite the ONNX file.
    Args:
        onnx_path (str): Path to the ONNX model.
        edit_info (dict): Optional. If None, the input/output info is computed automatically.
    """
    
    from onnx.utils import Extractor
    from warboy_vision_models.warboy.tools.utils import get_onnx_graph_info

    onnx_graph = onnx.load(onnx_path)
    input_to_shape, output_to_shape = get_onnx_graph_info(
        "object_detection", "yolov8n", onnx_path, edit_info
    )

    edited_graph = Extractor(onnx_graph).extract_model(
        input_names=list(input_to_shape), output_names=list(output_to_shape)
    )

    for value_info in edited_graph.graph.input:
        del value_info.type.tensor_type.shape.dim[:]
        value_info.type.tensor_type.shape.dim.extend(
            input_to_shape[value_info.name]
        )
    for value_info in edited_graph.graph.output:
        del value_info.type.tensor_type.shape.dim[:]
        value_info.type.tensor_type.shape.dim.extend(
            output_to_shape[value_info.name]
        )

    print(f"Export edited ONNX >> {onnx_path}")
    onnx.save(edited_graph, onnx_path)
    
    return


Let's run the code below to export and edit the YOLOv8n model.


In [None]:
weight_file = "yolov8n.pt"
onnx_path = "models/onnx/object_detection/yolov8n.onnx"
input_shape = (1, 3, 640, 640)

export_onnx(
    weight_file=weight_file,
    input_shape=input_shape,
    onnx_path=onnx_path,
)

edit_onnx(
    onnx_path=onnx_path,
    edit_info=None,
)


### Quantize Model

Next, let's quantize the ONNX model. Quantization converts a high-precision (usually FP32) deep learning model to a lower precision, which reduces model size and memory cost and improves inference speed. A quantized model lets you serve inference more efficiently.
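
As a rough illustration of what "lower precision" means, the sketch below maps floating-point values onto 8-bit integers with a scale and a zero point. This is generic asymmetric (affine) quantization arithmetic, not the Furiosa quantizer's implementation, and all function names are hypothetical.

```python
from typing import Tuple


def affine_params(min_val: float, max_val: float,
                  qmin: int = -128, qmax: int = 127) -> Tuple[float, int]:
    """Derive a scale and zero point so that [min_val, max_val] maps onto [qmin, qmax]."""
    # The range must contain 0.0 so that zero is represented exactly.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    if max_val == min_val:
        return 1.0, 0  # degenerate range: every value is zero
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = round(qmin - min_val / scale)
    return scale, zero_point


def quantize_value(x: float, scale: float, zero_point: int,
                   qmin: int = -128, qmax: int = 127) -> int:
    """Round to the nearest integer step and clamp to the int8 range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize_value(q: int, scale: float, zero_point: int) -> float:
    """Map an integer back to its approximate floating-point value."""
    return (q - zero_point) * scale


scale, zero_point = affine_params(-1.0, 3.0)
q = quantize_value(1.5, scale, zero_point)
print(scale, zero_point, q, dequantize_value(q, scale, zero_point))
```

Values inside the range are recovered with an error of at most half a step (`scale / 2`), while values outside the range are clamped. This is why the choice of range matters for accuracy.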

For details on quantization, see the documentation in [Korean](https://developer.furiosa.ai/docs/latest/ko/software/quantization.html) or [English](https://developer.furiosa.ai/docs/v0.5.0/en/advanced/quantization.html).


In the quantization phase, we need to prepare a calibration dataset, which is used to calibrate the quantization parameters of the model. In this notebook, we will use the COCO dataset for calibration.
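
To make the idea of calibration concrete, here is a minimal sketch of the simplest possible strategy: record the smallest and largest value observed over all calibration samples and use them as the quantization range. The class and its method names are hypothetical. The `SQNR_ASYM` method used in this notebook is a different, more elaborate strategy that this sketch does not reproduce.

```python
from typing import Iterable, Tuple


class MinMaxObserver:
    """Tracks the smallest and largest value seen across calibration batches."""

    def __init__(self) -> None:
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def collect(self, values: Iterable[float]) -> None:
        # Update the running range with one batch of observed values.
        for v in values:
            self.min_val = min(self.min_val, v)
            self.max_val = max(self.max_val, v)

    def compute_range(self) -> Tuple[float, float]:
        if self.min_val > self.max_val:
            raise ValueError("no calibration data collected")
        return self.min_val, self.max_val


observer = MinMaxObserver()
for batch in ([0.1, 0.5], [-0.3, 2.0], [1.2]):
    observer.collect(batch)
calib_range = observer.compute_range()
print(calib_range)
```

The more representative the calibration samples are of real inputs, the better the resulting range fits the values seen at inference time.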


We need to preprocess the COCO dataset before using it for calibration. The same preprocessor is also used to preprocess the input data for inference.

For YOLO models, the input image is resized to 640x640. You can check the `YoloPreProcessor` code in the `warboy_vision_models/warboy/yolo/preprocess.py` file.
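
The sketch below shows the geometry of one common way to reach a fixed 640x640 input: a "letterbox" resize, which keeps the aspect ratio and pads the remainder. Whether `YoloPreProcessor` does exactly this is an assumption, so check `preprocess.py` for the actual behavior. The function name is hypothetical.

```python
from typing import Tuple


def letterbox_geometry(
    img_h: int, img_w: int, new_h: int = 640, new_w: int = 640
) -> Tuple[float, Tuple[int, int], Tuple[int, int, int, int]]:
    """Return (ratio, resized (h, w), padding (top, bottom, left, right))."""
    # Scale by the tighter of the two ratios so the whole image fits.
    ratio = min(new_h / img_h, new_w / img_w)
    resized_h, resized_w = round(img_h * ratio), round(img_w * ratio)
    # Distribute the remaining pixels as padding on both sides.
    pad_h, pad_w = new_h - resized_h, new_w - resized_w
    top, left = pad_h // 2, pad_w // 2
    return ratio, (resized_h, resized_w), (top, pad_h - top, left, pad_w - left)


ratio, resized, padding = letterbox_geometry(1080, 1920)
print(ratio, resized, padding)
```

The ratio and padding are what a postprocessor needs later to map predicted boxes back to the original image coordinates.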


In [None]:
def get_calibration_dataset(
    calibration_data_path: str = "datasets/coco/val2017",
    num_calibration_data: int = 100,
):
    """
    Get calibration dataset for quantization.
    Args:
        calibration_data_path (str): Path to the dataset.
        num_calibration_data (int): Number of images to use for calibration.
    """
    
    import glob
    import imghdr
    import random

    # Keep only image files; the recursive glob also returns directories.
    candidates = [
        path
        for path in glob.glob(calibration_data_path + "/**", recursive=True)
        if os.path.isfile(path) and imghdr.what(path) is not None
    ]

    # Sample without replacement so that no image is used twice.
    calibration_data = random.sample(
        candidates, k=min(num_calibration_data, len(candidates))
    )
    print(calibration_data)
    return calibration_data


In [None]:
def quantize(
    onnx_path: str = "models/onnx/object_detection/yolov8n.onnx",
    onnx_i8_path: str = "models/quantized_onnx/object_detection/yolov8n_i8.onnx",
    calibration_data_path: str = "datasets/coco",
    calibration_method: str = "SQNR_ASYM",
    num_calibration_data: int = 100,
    use_model_editor: bool = True
):
    """
    Quantize the model using the FuriosaAI SDK.

    Args:
        onnx_path (str): Path to the ONNX model.
        onnx_i8_path (str): Path to save the quantized ONNX model.
        calibration_data_path (str): Path to the calibration images.
        calibration_method (str): Calibration method for quantization. See help(CalibrationMethod) for the options.
        num_calibration_data (int): Number of images to use for calibration.
        use_model_editor (bool): Whether to use the model editor for input type conversion.
    """
    if not os.path.exists(os.path.dirname(onnx_i8_path)):
        os.makedirs(os.path.dirname(onnx_i8_path))

    if not os.path.exists(onnx_path):
        raise FileNotFoundError(f"{onnx_path} is not found!")

    from furiosa.optimizer import optimize_model
    from furiosa.quantizer import (
        CalibrationMethod,
        Calibrator,
        ModelEditor,
        TensorType,
        get_pure_input_names,
        quantize,
    )

    # NOTE: `input_shape` is the notebook-level variable set in the export cell.
    new_shape = input_shape[2:]
    onnx_model = onnx.load(onnx_path)
    onnx_model = optimize_model(
        model=onnx_model,
        opset_version=13,
        input_shapes={"images": input_shape},
    )

    calibrator = Calibrator(
        onnx_model, CalibrationMethod._member_map_[calibration_method]
    )
    
    from warboy_vision_models.warboy.yolo.preprocess import YoloPreProcessor

    preprocessor = YoloPreProcessor(new_shape=new_shape, tensor_type="float32")

    for calibration_data in tqdm(
        get_calibration_dataset(
            calibration_data_path=calibration_data_path, 
            num_calibration_data=num_calibration_data
        ), desc="calibration..."
    ):
        input_img = cv2.imread(calibration_data)
        input_, _ = preprocessor(input_img)
        calibrator.collect_data([[input_]])

    if use_model_editor:
        editor = ModelEditor(onnx_model)
        input_names = get_pure_input_names(onnx_model)

        for input_name in input_names:
            editor.convert_input_type(input_name, TensorType.UINT8)

    calib_range = calibrator.compute_range()
    quantized_model = quantize(onnx_model, calib_range)

    with open(onnx_i8_path, "wb") as f:
        f.write(bytes(quantized_model))

    print(f"Quantization completed >> {onnx_i8_path}")
    
    return



In [None]:
onnx_path = "models/onnx/object_detection/yolov8n.onnx"
onnx_i8_path = "models/quantized_onnx/object_detection/yolov8n_i8.onnx"
calibration_data_path = "datasets/coco/val2017"
calibration_method = "SQNR_ASYM"
num_calibration_data = 1
use_model_editor = True

quantize(
    onnx_path=onnx_path,
    onnx_i8_path=onnx_i8_path,
    calibration_data_path=calibration_data_path,
    calibration_method=calibration_method,
    num_calibration_data=num_calibration_data,
    use_model_editor=use_model_editor
)


## Run Inference

We will run inference with the quantized YOLOv8n model on the COCO dataset and then evaluate its accuracy. `MSCOCODataLoader`, defined in the `warboy_vision_models/tests/utils.py` file, loads the COCO dataset, and the `COCOeval` class from `pycocotools.cocoeval` evaluates the detection results.

To preprocess the input data for inference, we reuse the preprocessor from the previous step, which resizes the input image to 640x640. We also need a postprocessor, which decodes the model output and then draws or calculates the bounding boxes. You can check the `ObjDetPostprocess` code in the `warboy_vision_models/warboy/yolo/postprocess.py` file and the `object_detection_anchor_decoder` code in the `warboy_vision_models/warboy/yolo/decoder.py` file. Finally, we use the `xyxy2xywh` function to convert bounding boxes from xyxy format to xywh format, and `YOLO_CATEGORY_TO_COCO_CATEGORY` to convert category IDs from YOLO format to COCO format. Both are in the `warboy_vision_models/tests/utils.py` file.
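
The box conversion mentioned above can be sketched in plain Python as follows. COCO results expect boxes as `[top_left_x, top_left_y, width, height]`. The sketch assumes that an `xyxy2xywh`-style conversion yields center-based boxes, which is why a second step shifts the center to the top-left corner. The helper names here are hypothetical and do not come from the repository.

```python
from typing import List, Sequence


def xyxy_to_center_xywh(box: Sequence[float]) -> List[float]:
    """(x1, y1, x2, y2) -> (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]


def center_to_top_left(box: Sequence[float]) -> List[float]:
    """(center_x, center_y, width, height) -> (top_left_x, top_left_y, width, height)."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, w, h]


corner_box = [10.0, 20.0, 50.0, 80.0]
coco_box = center_to_top_left(xyxy_to_center_xywh(corner_box))
print(coco_box)
```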

This notebook runs inference in a simple way. If you want to optimize the inference speed, please refer to the `warboy_vision_models/warboy/utils/process_pipeline.py` and `warboy_vision_models/warboy/runtime/warboy_runtime.py` code.


In [None]:
async def warboy_inference(model, data_loader, preprocessor, postprocessor):
    from warboy_vision_models.tests.utils import xyxy2xywh, YOLO_CATEGORY_TO_COCO_CATEGORY

    async def task(
        runner, data_loader, preprocessor, postprocessor, worker_id, worker_num
    ):
        results = []
        for idx, (img_path, annotation) in enumerate(data_loader):
            if idx % worker_num != worker_id:
                continue

            img = cv2.imread(str(img_path))
            img0shape = img.shape[:2]
            input_, contexts = preprocessor(img)
            preds = await runner.run([input_])

            outputs = postprocessor(preds, contexts, img0shape)[0]

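            # Convert corner boxes to xywh, then shift x and y by half the
            # box size so that they point to the top-left corner (COCO format).
            bboxes = xyxy2xywh(outputs[:, :4])
            bboxes[:, :2] -= bboxes[:, 2:] / 2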

            for output, bbox in zip(outputs, bboxes):
                results.append(
                    {
                        "image_id": annotation["id"],
                        "category_id": YOLO_CATEGORY_TO_COCO_CATEGORY[int(output[5])],
                        "bbox": [round(x, 3) for x in bbox],
                        "score": round(output[4], 5),
                    }
                )
        return results

    from furiosa.runtime import create_runner

    worker_num = 16
    async with create_runner(model, worker_num=32, compiler_config={"use_program_loading": True}) as runner:
        results = await asyncio.gather(
            *(
                task(
                    runner,
                    data_loader,
                    preprocessor,
                    postprocessor,
                    idx,
                    worker_num,
                )
                for idx in range(worker_num)
            )
        )
    return sum(results, [])
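
The work-sharing pattern used in `warboy_inference` can be seen in isolation in the sketch below: every task iterates over the whole dataset but only processes the items whose index matches its own worker ID, and `asyncio.gather` runs the tasks concurrently. `fake_infer` is a stand-in for the real `runner.run` call, and all names are illustrative.

```python
import asyncio
from typing import List


async def fake_infer(item: int) -> int:
    # Stand-in for an awaitable inference call; yields control to other tasks.
    await asyncio.sleep(0)
    return item * item


async def worker(items: List[int], worker_id: int, worker_num: int) -> List[int]:
    results = []
    for idx, item in enumerate(items):
        if idx % worker_num != worker_id:
            continue  # this item belongs to another worker
        results.append(await fake_infer(item))
    return results


async def run_all(items: List[int], worker_num: int = 4) -> List[int]:
    per_worker = await asyncio.gather(
        *(worker(items, i, worker_num) for i in range(worker_num))
    )
    # Flatten the per-worker lists into a single list.
    return [r for chunk in per_worker for r in chunk]


squares = asyncio.run(run_all(list(range(10))))
print(squares)
```

The flattened results are grouped by worker rather than by input order. That is fine for COCO evaluation, because every result record carries its own `image_id`.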

In [None]:
def test_yolov8n(
    dataset: str = "datasets/coco/val2017",
    annotation: str = "datasets/coco/annotations/instances_val2017.json",
    onnx_i8_path: str = "models/quantized_onnx/object_detection/yolov8n_i8.onnx",
):
    """
    Run inference with the quantized ONNX model and evaluate it on COCO.

    Args:
        dataset (str): Path to the dataset.
        annotation (str): Path to the COCO annotation file.
        onnx_i8_path (str): Path to the quantized ONNX model.
    """

    from warboy_vision_models.warboy.yolo.preprocess import YoloPreProcessor
    from warboy_vision_models.warboy.yolo.anchor_process import object_detection_anchor_decoder
    from warboy_vision_models.tests.utils import MSCOCODataLoader
    from pycocotools.cocoeval import COCOeval

    preprocessor = YoloPreProcessor(
        new_shape=input_shape[2:],
        tensor_type="uint8"
    )
    postprocessor = object_detection_anchor_decoder(
        model_name="yolov8n",
        conf_thres=0.001,   # confidence threshold
        iou_thres=0.7,  # NMS IOU threshold
        anchors=[None],
        use_tracker=False
    )


    data_loader = MSCOCODataLoader(
        Path(dataset),
        Path(annotation),
        preprocessor,
        input_shape,
    )

    results = asyncio.run(
        warboy_inference(onnx_i8_path, data_loader, preprocessor, postprocessor)
    )

    coco_result = data_loader.coco.loadRes(results)
    coco_eval = COCOeval(data_loader.coco, coco_result, "bbox")
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()

    print("mAP: ", coco_eval.stats[0])

    return


In [None]:
nest_asyncio.apply()

test_yolov8n(
    dataset="datasets/coco/val2017",
    annotation="datasets/coco/annotations/instances_val2017.json",
    onnx_i8_path = "models/quantized_onnx/object_detection/yolov8n_i8.onnx",
)
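
The first value reported by `COCOeval` (`stats[0]`) is the average precision averaged over IoU thresholds from 0.50 to 0.95. As a reminder of what IoU measures, here is a minimal sketch for two boxes in COCO `[x, y, width, height]` format. It is an illustration, not the `pycocotools` implementation, and the function name is hypothetical.

```python
from typing import Sequence


def iou_xywh(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two [x, y, width, height] boxes."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Overlap along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


print(iou_xywh([0, 0, 10, 10], [5, 0, 10, 10]))
```

A detection counts as a match at a given threshold only if its IoU with a ground-truth box of the same category reaches that threshold.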


## NPU Profiling

The Furiosa SDK provides a profiling tool for analyzing model performance. You can use it to measure the time taken by each operation in the model and to identify bottlenecks.

After running the cell below, the trace file will be saved under the `models/trace` directory, in a subdirectory named after the task (here, `models/trace/object_detection`). You can visualize the trace using the Chrome web browser's Trace Event Profiling Tool (chrome://tracing), which helps you understand where time is spent and what to optimize.


While the trace file is being written, you may see warning messages such as `OpenTelemetry trace error occurred. cannot send span to the batch span processor because the channel is full`. You can ignore them.


In [None]:
def test_warboy_performance(input_shape, task, device, model, onnx_i8_path):
    from furiosa.runtime.sync import create_runner
    from furiosa.runtime.profiler import profile
    
    trace_dir = os.path.join("models/trace", task)
    os.makedirs(trace_dir, exist_ok=True)

    trace_file = os.path.join(trace_dir, model + "_" + device + ".log")
    # Random uint8 pixels over the full 0-255 range; the values do not matter for timing.
    dummy_input = np.random.randint(0, 256, size=input_shape, dtype=np.uint8)
    
    with open(trace_file, mode="w") as tracing_file:
        with profile(file=tracing_file) as profiler:
            with create_runner(
                onnx_i8_path, device=device, compiler_config={"use_program_loading": True}
            ) as runner:
                for _ in range(30):
                    runner.run([dummy_input])

    return


In [None]:
input_shape = (1, 3, 640, 640)
task = "object_detection"
device = "warboy(1)*1"
model = "yolov8n"
onnx_i8_path = "models/quantized_onnx/object_detection/yolov8n_i8.onnx"

test_warboy_performance(
    input_shape=input_shape,
    task=task,
    device=device,
    model=model,
    onnx_i8_path=onnx_i8_path
)
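
Once a trace file exists, it can also be summarized programmatically instead of only being viewed in the browser. The sketch below assumes the file follows the Chrome Trace Event format, that is, a JSON array of events in which complete events carry `"ph": "X"` and a `dur` field in microseconds. Verify this against an actual trace before relying on it. The event names in the sample are made up for illustration, and the function name is hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple


def total_duration_by_name(events: List[Dict]) -> List[Tuple[str, float]]:
    """Sum the duration of complete events per name, longest first."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        # Only complete events ("X") carry their own duration.
        if event.get("ph") == "X" and "dur" in event:
            totals[event["name"]] += event["dur"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic sample standing in for the content of a real trace file.
sample_events = json.loads(
    """
    [
      {"name": "Inference", "ph": "X", "ts": 0, "dur": 1200},
      {"name": "Inference", "ph": "X", "ts": 1500, "dur": 1300},
      {"name": "Load", "ph": "X", "ts": 0, "dur": 400},
      {"name": "Marker", "ph": "i", "ts": 10}
    ]
    """
)
summary = total_duration_by_name(sample_events)
print(summary)
```

For a real trace, the sample would be replaced by `json.load` on the opened trace file, provided the file is valid JSON in the assumed format.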
