# How to Use Warboy Vision Models

This notebook demonstrates how to use this project with the YOLOv8n object detection model.

## Prerequisites

### Create a Python Environment

This tutorial requires Python 3.9 or higher. If you already have such an environment, you can skip this step. Otherwise, you can create a new Python environment using Conda.

If you don't have Conda installed, you can install Miniconda with the following commands:

```console
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ sh ./Miniconda3-latest-Linux-x86_64.sh
$ rm Miniconda3-latest-Linux-x86_64.sh
$ source ~/.bashrc
```


After installing Miniconda, you can create a new Python environment using the following commands:

```console
$ conda create -n furiosa-3.9 python=3.9
$ conda activate furiosa-3.9
```
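
To confirm that the activated environment meets the version requirement, a minimal check along the following lines can be run inside it. The helper name `python_is_supported` is hypothetical and only serves this illustration.

```python
import sys

# Minimum version stated in this tutorial
MIN_VERSION = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


print("Python", sys.version.split()[0], "supported:", python_is_supported())
```

If this prints `supported: False`, the notebook kernel is probably not using the `furiosa-3.9` environment.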


### Install Driver, Firmware, and Runtime Packages

The driver, firmware, and runtime packages for the NPU device are installed through the APT server, so you first need to set up the APT server. You can follow the instructions in [Korean](https://developer.furiosa.ai/docs/latest/ko/software/installation.html) or [English](https://developer.furiosa.ai/docs/latest/en/software/installation.html).

After setting up the APT server, you can install the packages using the following command:

```console
$ sudo apt-get update && sudo apt-get install -y furiosa-driver-warboy furiosa-libnux
```


Next, install `furiosa-toolkit` and use `furiosactl` to check the NPU devices in your environment:

```console
$ sudo apt-get install -y furiosa-toolkit
$ furiosactl info --format full
```


### Install Furiosa Python SDK

The Furiosa SDK can be installed using pip. You can check the details in [Korean](https://furiosa-ai.github.io/docs/latest/ko/) or [English](https://furiosa-ai.github.io/docs/latest/en/).

```console
$ pip install 'furiosa-sdk[full]'
```

### Install Datasets

If you have already downloaded the dataset, you can skip this step. In that case, look for the `# CHECK` comments in the code cells of this notebook and make sure the dataset paths point to your copy.


In this notebook, we will use the COCO dataset. You can download the COCO dataset using the following command:

```console
$ ./coco.sh
```
This will download the COCO dataset and save it in the `datasets/coco` directory.
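
Before moving on, it can help to verify that the files used later in this notebook are in place. The sketch below checks for the `val2017` directory and the `annotations/instances_val2017.json` file referenced in the accuracy test; the helper name `missing_coco_paths` is hypothetical, and the root path assumes you run it from the directory that contains `datasets/`.

```python
from pathlib import Path

# Entries under the dataset root that later cells in this notebook refer to
EXPECTED = ("val2017", "annotations/instances_val2017.json")


def missing_coco_paths(root, expected=EXPECTED):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


# Assumed location; adjust if your dataset lives elsewhere
missing = missing_coco_paths("datasets/coco")
print("missing:", missing if missing else "nothing")
```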


### Install Required Packages

You can install the required packages using the following command:

```console
$ pip install -r requirements.txt
```


### Build YOLO Decoders

In this project, C++ decoders are included for post-processing. You can build the decoders using the following command:

```console
$ ./build.sh
```


### Install the Project

To install the project as a module, you can use the following command:

```console
$ pip install .
```
This will install the project, allowing you to use the `warboy-vision` command line tool to run models on Warboy. You can check the details of the command line tool using the following command:

```console
$ warboy-vision --help
```
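
As a quick sanity check of the prerequisites, the following sketch reports which of the command-line tools installed above cannot be found on `PATH`. It only looks the commands up and does not run them; the helper name `missing_tools` is hypothetical.

```python
import shutil

# Command-line tools installed in the steps above
TOOLS = ("furiosactl", "warboy-vision")


def missing_tools(tools=TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
print("missing tools:", missing if missing else "none")
```

If a tool is reported as missing, check that the corresponding installation step succeeded and that the Conda environment is activated.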


## Prepare Model

First, you need to prepare the configuration file for the model you want to use. In this notebook, we will use the YOLOv8n model, whose configuration file is `tutorials/cfg/yolov8n.yaml`.

In [None]:
from src.warboy.tools.onnx_tools import OnnxTools
from src.warboy import get_model_params_from_cfg

cfg = '../cfg/yolov8n.yaml'
onnx_tools = OnnxTools(cfg)
param = get_model_params_from_cfg(cfg)


### Export ONNX

To run the model on Warboy, you need a quantized ONNX model. First, let's export the YOLOv8n model to ONNX format.


For YOLO models, the concatenation operator that combines class results and box results along the channel axis at each anchor causes a drop in accuracy after quantization. To avoid this, we modify the model by removing the decoding part from the model output. You can do this by passing `need_edit=True` when exporting the model to ONNX format.


In [None]:
onnx_tools.export_onnx(need_edit=True)
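
One plausible way to see why the concatenation hurts accuracy is that the concatenated tensor forces box values and class scores to share a single quantization scale. The sketch below illustrates this with a simplified symmetric per-tensor INT8 round trip. It is not the Furiosa quantizer, and all values and helper names are hypothetical.

```python
def quantize_dequantize(values, max_abs):
    """Symmetric INT8 round trip using one scale for the whole tensor."""
    scale = max_abs / 127.0
    return [max(-127, min(127, round(v / scale))) * scale for v in values]


def max_error(values, restored):
    """Largest absolute difference between original and restored values."""
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical outputs for one anchor: box values in pixels, class scores in [0, 1]
boxes = [12.0, 48.5, 320.0, 610.0]
scores = [0.02, 0.40, 0.91]

# Concatenated output: boxes and scores share one scale
concat = boxes + scores
restored = quantize_dequantize(concat, max(abs(v) for v in concat))
score_err_shared = max_error(scores, restored[len(boxes):])

# Separate outputs: the scores get their own scale
score_err_separate = max_error(scores, quantize_dequantize(scores, max(scores)))

print(f"score error with shared scale:   {score_err_shared:.4f}")
print(f"score error with separate scale: {score_err_separate:.4f}")
```

With the shared scale, one quantization step is several pixels wide, so the class scores lose almost all of their resolution. Keeping the outputs separate and decoding them outside the model avoids this.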


### Quantize Model

Next, let's quantize the ONNX model. Quantization is a technique that converts a high-precision (usually FP32) deep learning model to a lower precision (here, INT8), reducing the model size and memory cost, and improving the inference speed.

Quantization requires a calibration dataset. It is used to calculate the calibration range, that is, the range of values each tensor takes, which determines how FP32 values are mapped to INT8. In this notebook, we will use the COCO dataset for calibration.

You can change the calibration method and the number of calibration samples in the `tutorials/cfg/yolov8n.yaml` file. For details on the quantization and calibration method options, see the documentation in [Korean](https://developer.furiosa.ai/docs/latest/ko/software/quantization.html) or [English](https://developer.furiosa.ai/docs/v0.5.0/en/advanced/quantization.html).


In [None]:
onnx_tools.quantize()
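
To make the role of the calibration range concrete, here is a minimal sketch of min-max calibration followed by asymmetric INT8 quantization. Min-max is only one of several possible calibration methods, the activation values are hypothetical, and the code is an illustration rather than what `onnx_tools.quantize()` does internally.

```python
def calibrate_min_max(batches):
    """Track the smallest and largest value seen across calibration batches."""
    lo = min(min(batch) for batch in batches)
    hi = max(max(batch) for batch in batches)
    return lo, hi


def int8_params(lo, hi):
    """Derive an asymmetric INT8 scale and zero point from a calibration range."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    scale = (hi - lo) / 255.0  # assumes hi > lo
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a float to INT8, clamping to [-128, 127]."""
    return max(-128, min(127, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Map an INT8 value back to an approximate float."""
    return (q - zero_point) * scale


# Hypothetical activations of one tensor for three calibration images
batches = [[-1.0, 0.5, 2.0], [-0.25, 3.0], [0.0, 1.5]]
lo, hi = calibrate_min_max(batches)
scale, zero_point = int8_params(lo, hi)
q = quantize(1.5, scale, zero_point)
print("range:", (lo, hi), "scale:", scale, "zero point:", zero_point)
print("1.5 ->", q, "->", dequantize(q, scale, zero_point))
```

A wider calibration range gives a larger scale and therefore a coarser grid, which is why the choice of calibration method and the number of calibration samples affect accuracy.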


## Run Inference

### End to End Performance Test

Now, we will run the end-to-end test. It runs inference with the quantized model on the COCO validation set (`val2017`) and measures the accuracy as mAP.

In [None]:
from src.test_scenarios.e2e.object_det import test_warboy_yolo_accuracy_det

test_warboy_yolo_accuracy_det(
    cfg=cfg,
    image_dir="../../datasets/coco/val2017",  # CHECK: change this to your own path if needed
    annotation_file="../../datasets/coco/annotations/instances_val2017.json",  # CHECK: change this to your own path if needed
)
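
mAP is built on intersection over union (IoU): a predicted box counts as a match only if its IoU with a ground-truth box exceeds a threshold, and the COCO metric averages over IoU thresholds from 0.50 to 0.95. The sketch below shows the IoU computation on its own, assuming boxes in `(x1, y1, x2, y2)` corner format. It is an illustration and not the evaluation code used by `test_warboy_yolo_accuracy_det`.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical prediction and ground-truth boxes
prediction = (10, 10, 50, 50)
ground_truth = (30, 30, 70, 70)
print(f"IoU = {iou(prediction, ground_truth):.3f}")
```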


### Web Demo with FastAPI

To run the web demo, you need to prepare the demo configuration file: **set `input_path` in the `tutorials/cfg/demo.yaml` file** to the path of the video you want to use for the web demo.

Once the web demo is running, you can access it at `http://localhost:20001` or `http://0.0.0.0:20001`.

If you're using a remote server, you should forward port 20001 to your local machine. You can do this by running the following command **on your local machine**:

```console
$ ssh -L 20001:localhost:20001 <username>@<ip_address>
```
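
Once the demo is running, you can check from another terminal or notebook whether the port is reachable, for example to confirm that the SSH tunnel works. The following is a minimal sketch that uses only the standard library; the helper name `port_is_open` is hypothetical.

```python
import socket


def port_is_open(host="localhost", port=20001, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


print("web demo reachable:", port_is_open())
```

A result of `False` means that nothing accepted the connection, so either the demo has not started yet or the port forwarding is not active.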


In [None]:
from src.demo.demo import run_web_demo

demo_cfg_path = '../cfg/demo.yaml'

run_web_demo(
    cfg_path = demo_cfg_path
)


### NPU Profiling

The Furiosa SDK provides a profiling tool for analyzing the NPU performance of a model. It measures the execution time of each operation and helps identify performance bottlenecks.

After running the cell below, the trace file will be saved in the `tutorials/models/trace` directory. You can visualize the trace using the Chrome web browser's Trace Event Profiling Tool (`chrome://tracing`) to see where the model spends its time and what to optimize.


You may encounter `OpenTelemetry trace error occurred. cannot send span to the batch span processor because the channel is full` warning messages when writing the trace file. However, you can safely ignore them.


In [None]:
from src.test_scenarios.e2e.npu_performance import test_warboy_performance

test_warboy_performance(
    cfg=cfg,
    num_device=1,
    trace_file_dir="../models/trace",
)
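
Besides viewing the trace in the browser, you can summarize it in Python. The sketch below assumes that the trace file follows the Chrome Trace Event format and contains complete (`"ph": "X"`) events with a `dur` field in microseconds. The actual trace written by the profiler may use other event types, and the event names in the sample are hypothetical.

```python
import json
from collections import defaultdict


def total_duration_by_name(trace):
    """Sum `dur` of complete ("X") events per event name, longest first."""
    # The format allows either a bare event list or an object with "traceEvents"
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for event in events:
        if event.get("ph") == "X":
            totals[event["name"]] += event.get("dur", 0)
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Hypothetical trace content; for a real file, load it with json.load instead
sample = json.loads("""
[
  {"name": "concat", "ph": "X", "ts": 120, "dur": 30},
  {"name": "conv", "ph": "X", "ts": 0, "dur": 120},
  {"name": "conv", "ph": "X", "ts": 150, "dur": 80},
  {"name": "marker", "ph": "i", "ts": 230}
]
""")

for name, duration in total_duration_by_name(sample).items():
    print(f"{name}: {duration:.0f} us")
```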
