# How to Use Warboy Vision Models

This notebook demonstrates how to use this project with the YOLOv8n object detection model.

## Prerequisites

### Make Python Environment

To follow this tutorial, you need Python 3.8 or higher. If you already have your own Python environment, you can skip this step. Otherwise, you can create a new Python environment using Conda.

First, here are the commands to install Miniconda:
```console
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ sh ./Miniconda3-latest-Linux-x86_64.sh
$ rm -rf Miniconda3-latest-Linux-x86_64.sh
$ source ~/.bashrc
```


After installing Miniconda, you can create and activate a new Python environment using the following commands:

```console
$ conda create -n furiosa-3.9 python=3.9
$ conda activate furiosa-3.9
```
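To confirm that the activated environment meets the Python 3.8 minimum stated above, a quick check like the following can be run inside it. This is only a convenience sketch; the helper name `meets_minimum` is made up for illustration and is not part of this project.

```python
import sys

MIN_VERSION = (3, 8)  # minimum Python version stated in this tutorial


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the interpreter that is currently active
print(sys.version.split()[0], "OK" if meets_minimum() else "too old")
```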


### Install Driver, Firmware, and Runtime packages

The Driver, Firmware, and Runtime packages for the NPU device are installed through the APT server, so you need to set up the APT server first. You can follow the instructions in [Korean](https://developer.furiosa.ai/docs/latest/ko/software/installation.html) or [English](https://developer.furiosa.ai/docs/latest/en/software/installation.html).

After setting up the APT server, you can install the packages using the following command:

```console
$ sudo apt-get update && sudo apt-get install -y furiosa-driver-warboy furiosa-libnux
```


Next, you can check the NPU devices in your environment using the following commands:

```console
$ sudo apt-get install -y furiosa-toolkit
$ furiosactl info --format full
```
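If you prefer to run the same device check from Python (for example, from a notebook cell), a small wrapper along these lines may help. It is a sketch that only assumes `furiosactl` is on your `PATH`; the helper `run_tool` is hypothetical and simply returns `None` when the tool is missing.

```python
import shutil
import subprocess


def run_tool(name, *args):
    """Run a CLI tool if it is on PATH; return its stdout, or None if it is missing."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, *args], capture_output=True, text=True, check=False)
    return result.stdout


# Same invocation as the console command above
output = run_tool("furiosactl", "info", "--format", "full")
print(output if output is not None else "furiosactl not found on PATH")
```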


### Install Furiosa Python SDK

The Furiosa SDK can be installed by following the instructions in [Korean](https://furiosa-ai.github.io/docs/latest/ko/) or [English](https://furiosa-ai.github.io/docs/latest/en/).

```console
$ pip install 'furiosa-sdk[full]'
```
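To verify that the package landed in the active environment, you can query the installed distribution metadata. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is illustrative.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "furiosa-sdk" is the distribution name used in the pip command above
print("furiosa-sdk:", installed_version("furiosa-sdk") or "not installed")
```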

### Install Datasets

If you have your own dataset or already downloaded the dataset, you can skip these steps.


In this notebook, we will use the COCO dataset. You can download the COCO dataset using the following command:

```console
$ ./coco.sh
```
This will download the COCO dataset and save it in the `datasets/coco` directory.


To run the web demo, you also need the demo videos. You can download them using the following command:

```console
$ ./demo_videos.sh
```

This will download the demo videos and save them in the `datasets/demo_videos` directory. The object detection and instance segmentation videos are placed in `datasets/demo_videos/detection`, and the pose estimation videos in `datasets/demo_videos/estimation`.
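Before moving on, it can be useful to confirm that the downloads ended up where the rest of this notebook expects them. The following sketch lists any expected path that is missing. The COCO paths are the ones used by the accuracy test later in this notebook, and the helper `missing_paths` is hypothetical; adjust the root directory to wherever you run it from.

```python
from pathlib import Path

# Paths relative to the datasets directory, taken from this notebook
EXPECTED = [
    "coco/val2017",
    "coco/annotations/instances_val2017.json",
    "demo_videos/detection",
    "demo_videos/estimation",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


for rel in missing_paths("datasets"):
    print("missing:", rel)
```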

### Install required packages

You can install the required packages using the following command:

```console
$ pip install -r requirements.txt
```


### Build Yolo Decoders

In this project, C++ decoders are included for post-processing. You can build the decoders using the following command:

```console
$ ./build.sh
```


### Installing a Custom CLI Tool (Optional)

This notebook does not use the custom CLI tool. If you want to use it to run vision models on Warboy, you can install it using the following command:

```console
$ pip install .
```
This will install the `warboy-vision` command line tool, which you can use to run models on Warboy.

## Prepare Model

First, you need to prepare the configuration file for the model you want to use. In this notebook, we will use the YOLOv8n model, whose configuration is in the `yolov8n.yaml` file.

In [1]:
from src.warboy.tools.onnx_tools import OnnxTools
from src.warboy import get_model_params_from_cfg

cfg = '../cfg/yolov8n.yaml'
onnx_tools = OnnxTools(cfg)
param = get_model_params_from_cfg(cfg)
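
The later cells read `onnx_i8_path`, `input_shape`, and `anchors` from `param`. A small sanity check such as the one below can catch a misconfigured YAML file early. It is only a sketch: it assumes `param` behaves like a dictionary, and the stand-in values are placeholders rather than the real configuration.

```python
# Keys that later cells in this notebook read from `param`
REQUIRED_KEYS = ("onnx_i8_path", "input_shape", "anchors")


def missing_keys(param, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the parameter mapping."""
    return [key for key in required if key not in param]


# Stand-in dictionary with placeholder values, used only to demonstrate the check
example_param = {"onnx_i8_path": "model_i8.onnx", "input_shape": [1, 3, 640, 640]}
print(missing_keys(example_param))  # ['anchors']
```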




### Export ONNX

To run the model on Warboy, you need a quantized ONNX model. First, let's export the YOLOv8n model to ONNX format.


For YOLO models, accuracy drops after quantization because of the concatenation operator, which combines class results and box results along the channel axis at each anchor. To avoid this, we need to modify the model by removing the decoding part from the model output. You can do this by passing `need_edit=True` when exporting the model to ONNX format.
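One plausible way to see why such a concatenation hurts is that box values and class scores live on very different numeric ranges, so a single quantization scale shared by the concatenated tensor leaves almost no resolution for the small class scores. The toy example below illustrates this with symmetric per-tensor int8 fake quantization. That scheme, and all the numbers, are assumptions made for illustration and are not necessarily what the Furiosa quantizer does.

```python
def quantize_dequantize(values, num_bits=8):
    """Symmetric per-tensor fake quantization: one scale shared by all values."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def max_abs_error(original, restored):
    """Largest absolute difference between original and restored values."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Made-up values: class scores lie in [0, 1], box coordinates are in pixels
class_scores = [0.02, 0.10, 0.45, 0.93]
box_coords = [12.0, 48.5, 320.0, 610.0]

# Class scores quantized on their own get a scale that fits their range
err_separate = max_abs_error(class_scores, quantize_dequantize(class_scores))

# After concatenation, the scale is dictated by the large box coordinates
restored = quantize_dequantize(box_coords + class_scores)
err_concat = max_abs_error(class_scores, restored[len(box_coords):])

print(f"class-score error, separate tensors: {err_separate:.4f}")
print(f"class-score error, concatenated:     {err_concat:.4f}")
```

In this toy setting the class scores are rounded to zero once they share a scale with the box coordinates, which is why keeping the two outputs separate and decoding them outside the quantized graph is attractive.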


In [2]:
onnx_tools.export_onnx(need_edit=True)


Load PyTorch Model from ../../models/weight/object_detection/yolov8n.pt...
Export ONNX ../../models/onnx/object_detection/yolov8n.onnx...




Export ONNX for yolov8n >> ../../models/onnx/object_detection/yolov8n.onnx


True

### Quantize Model

Next, let's quantize the ONNX model. Quantization converts a high-precision (usually FP32) deep learning model to a lower precision, which reduces the model size and memory cost and improves inference speed. A quantized model therefore lets you serve inference more efficiently.

The quantization phase requires a calibration dataset, which is used to calibrate the quantization parameters of the model. In this notebook, we will use the COCO dataset for calibration.
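To make the role of calibration concrete, the sketch below shows one simple approach, min-max calibration: it records the smallest and largest values seen across the calibration batches and derives an int8 scale and zero point from that range. This is a generic illustration with made-up numbers. The method actually used here is whatever is configured in `yolov8n.yaml`, and the helper names are hypothetical.

```python
def calibrate_min_max(batches):
    """Track the smallest and largest value seen over all calibration batches."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


def affine_params(lo, hi, qmin=-128, qmax=127):
    """Derive a scale and zero point that map [lo, hi] onto [qmin, qmax]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an empty range
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


# Made-up activation values standing in for three calibration batches
batches = [[-0.5, 0.1, 1.2], [0.0, 2.0, 0.7], [-1.0, 0.3, 1.5]]
lo, hi = calibrate_min_max(batches)
scale, zero_point = affine_params(lo, hi)
print(f"range=({lo}, {hi}) scale={scale:.6f} zero_point={zero_point}")
```

More calibration data generally gives a more representative range, at the cost of a longer calibration run.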

You can change the calibration method and the number of calibration samples in the `yolov8n.yaml` file. For details on the available quantization and calibration options, see the documentation in [Korean](https://developer.furiosa.ai/docs/latest/ko/software/quantization.html) or [English](https://developer.furiosa.ai/docs/v0.5.0/en/advanced/quantization.html).


In [3]:
onnx_tools.quantize()


libfuriosa_hal.so --- v0.11.0, built @ 43c901f


calibration...:   2%|▏         | 2/100 [00:02<01:57,  1.20s/it]

## Run Inference

### End to End Performance Test

Now, we will run the end-to-end performance test. It runs inference with the quantized model on the COCO dataset and measures the mAP.
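mAP is built on intersection over union (IoU): a predicted box counts as a match for a ground-truth box only when their IoU exceeds a threshold. The sketch below shows the IoU computation for two axis-aligned boxes. It is a standalone illustration, not the evaluation code used by this project.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```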

In [4]:
from src.test_scenarios.e2e.test_object_det import test_warboy_yolo_accuracy_det

test_warboy_yolo_accuracy_det(
    model_name="yolov8n",
    model=param["onnx_i8_path"],
    input_shape=param["input_shape"],
    image_dir="../../datasets/coco/val2017",
    annotation_file="../../datasets/coco/annotations/instances_val2017.json",
    anchors=param["anchors"],
)


loading annotations into memory...
Done (t=0.54s)
creating index...
index created!
0.001 0.7 [None] False
0.001 0.7 [None] False
WarboyQueueRuntime - init
WarboyQueueRuntime - init
2025-04-22T09:48:44.049064Z  INFO furiosa_rt_core::driver::event_driven::coord: FuriosaRT (v0.10.5, rev: 5537afb71-modified, built at: 2025-01-10T02:06:37Z) bootstrapping ...
2025-04-22T09:48:44.052261Z  INFO furiosa_rt_core::driver::event_driven::coord: Found furiosa-compiler (v0.10.1, rev: 8b00177, built at: 2025-01-08T03:45:42Z)
2025-04-22T09:48:44.052269Z  INFO furiosa_rt_core::driver::event_driven::coord: Found libhal (type: warboy, v0.12.0, rev: 56530c0 built at: 2023-11-16T12:34:03Z)
2025-04-22T09:48:44.052276Z  INFO furiosa_rt_core::driver::event_driven::coord: [Runtime-0] detected 1 NPU device(s):
2025-04-22T09:48:44.065045Z  INFO furiosa_rt_core::d


### Web Demo with FastAPI

To run the web demo, you need to prepare the demo configuration file. You can find the configuration in the `demo_config.yaml` file.

After running the web demo, you can access the web demo at `http://localhost:20001` or `http://0.0.0.0:20001`.

If you're using a remote server, you should forward port 20001 to your local machine. You can do this by running the following command **on your local machine**:

```console
$ ssh -L 20001:localhost:20001 <username>@<ip_address>
```
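Once the demo is running and the tunnel is open, you can confirm from a separate Python session on your local machine that something is listening on the forwarded port. The sketch below only tests TCP reachability and says nothing about whether the demo page itself loads; the helper `is_port_open` is illustrative.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 20001 is the web demo port used in this notebook
print("demo reachable:", is_port_open("localhost", 20001))
```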


In [6]:
from src.demo.demo import run_web_demo

demo_cfg_path = '../cfg/demo.yaml'

run_web_demo(
    cfg_path = demo_cfg_path
)


0.25Init ByteTrack! 0.7 [None] True

WarboyQueueRuntime - init


Process Process-26:
Traceback (most recent call last):
  File "/root/miniconda3/envs/furiosa-3.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/furiosa-3.9/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/miniconda3/envs/furiosa-3.9/lib/python3.9/site-packages/src/warboy/viewer.py", line 80, in run_viewer
    uvicorn.run(*args, **kwargs)
  File "/root/miniconda3/envs/furiosa-3.9/lib/python3.9/site-packages/uvicorn/main.py", line 580, in run
    server.run()
  File "/root/miniconda3/envs/furiosa-3.9/lib/python3.9/site-packages/uvicorn/server.py", line 66, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/root/miniconda3/envs/furiosa-3.9/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/root/miniconda3/envs/furi
  File "/root/miniconda3/envs/furiosa-3.9/lib/python3.9/site-packages/uvicorn/importer.py", line 22, in import_from_string
    raise exc from None
  File "/root/miniconda3/envs/furiosa-3.9/lib/python3.9/site-packages/uvicorn/importer.py", line 19, in import_from_string
    module = importlib.import_module(module_str)
  File "/root/miniconda3/envs/furiosa-3.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'warboy'

2025-04-22T09:49:19.704989Z  INFO furiosa_rt_core::driver::event_driven::coord: FuriosaRT (v0.10.5, rev: 5537afb71-modified, built at: 2025-01-10T02:06:37Z) bootstrapping ...
2025-04-22T09:49:19.707581Z  INFO furiosa_rt_core::driver::event_driven::coord: Found furiosa-compiler (v0.10.1, rev: 8b00177, built at: 2025-01-08T03:45:42Z)
2025-04-22T09:49:19.707586Z  INFO furiosa_rt_core::driver::event_driven::coord: Found libhal (type: warboy, v0.12.0, rev: 56530c0 built at: 2023-11-16T12:34:03Z)
2025-04-22T09:49:19.707592Z  INFO furiosa_rt_core::driver::event_driven::coord: [Runtime-0] detected 1 NPU device(s):
2025-04-22T09:49:19.732568Z  INFO furiosa_rt_core::driver::event_driven::coord: - [0] npu:1:0 (warboy-b0, 64dpes, firmware: 1.8.0, 7550956)
2025-04-22T09:49:19.732690Z  INFO furiosa_rt_core::driver::e
2025-04-22T09:49:19.762624Z  INFO furiosa_rt_core::driver::event_driven::coord: [Sess-898f8f3c] compiling the model (target: warboy-b0, 64dpes, file: yolov8n_i8.onnx, size: 12.1 MiB)
2025-04-22T09:49:20.069162Z  INFO furiosa_rt_core::driver::event_driven::coord: [Sess-898f8f3c] the model compile is successful (took 0 secs)
2025-04-22T09:49:20.132606Z  INFO furiosa_rt_core::driver::event_driven::coord: [Runtime-0] created 32 NPU threads on npu:1:0 (DRAM: 85.3 MiB/16.0 GiB, SRAM: 11.3 MiB/64.0 MiB)


KeyboardInterrupt: 

### NPU Profiling

The Furiosa SDK provides a profiling tool for analyzing model performance. It measures the time taken by each operation in the model, which helps you identify bottlenecks.

After running the cell below, the trace file is saved in the `models/trace` directory. You can visualize it with the Chrome web browser's Trace Event Profiling Tool (`chrome://tracing`) to see where the model spends its time and decide what to optimize.


While the trace file is being written, you may see the warning `OpenTelemetry trace error occurred. cannot send span to the batch span processor because the channel is full`. You can ignore it.


In [None]:
from src.test_scenarios.e2e.test_npu_performance import test_warboy_performance

test_warboy_performance(
    cfg = cfg,
    num_device = 1,
)
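
Besides viewing the trace in `chrome://tracing`, you may want a quick textual summary of where the time goes. The sketch below totals the duration of complete (`"ph": "X"`) events per name. It assumes the trace file follows the Chrome Trace Event JSON format, either a bare array or an object with a `traceEvents` key, which has not been verified against the files this project writes. The sample events and their names are made up.

```python
import json
from collections import defaultdict


def load_events(path):
    """Load trace events from a Chrome Trace Event JSON file (array or object form)."""
    with open(path) as f:
        data = json.load(f)
    return data.get("traceEvents", []) if isinstance(data, dict) else data


def total_duration_by_name(events):
    """Sum `dur` (microseconds) of complete events per name, longest first."""
    totals = defaultdict(float)
    for event in events:
        if event.get("ph") == "X":  # only complete events carry a duration
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up events standing in for a real trace file
sample = [
    {"name": "conv", "ph": "X", "ts": 0, "dur": 120},
    {"name": "concat", "ph": "X", "ts": 120, "dur": 30},
    {"name": "conv", "ph": "X", "ts": 150, "dur": 80},
    {"name": "marker", "ph": "i", "ts": 230},
]
for name, dur in total_duration_by_name(sample):
    print(f"{name}: {dur:.0f} us")
```

For a real file, pass the result of `load_events(...)` with the path of a trace under `models/trace` instead of `sample`.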
