# Advanced Topics in Inference APIs

This tutorial covers some more advanced topics of the Inference APIs. The main topics are:
* How to specify an NPU device, including *NPU core fusion*
* The asynchronous, non-blocking inference API

## Prerequisites
To follow this tutorial, install the following prerequisites.

First, install the NPU driver, firmware, and runtime by following the [FuriosaAI Driver, Firmware, Runtime Installation Guide](https://furiosa-ai.github.io/docs/latest/ko/software/installation.html).

Then, install the following Python packages:
```sh
pip install furiosa-sdk matplotlib python-mnist
```
Alternatively, run the following command to install the dependencies for all notebook examples at once:
```sh
pip install -r requirements.txt
```

Next, check that your NPU device is ready:

In [1]:
!furiosactl info

[0m+[0m[0m------[0m[0m+[0m[0m--------[0m[0m+[0m[0m----------------[0m[0m+[0m[0m-------[0m[0m+[0m[0m---------[0m[0m+[0m[0m--------------[0m[0m+
[0m[0m[0m|[0m[0m [0m[0m[0m[1mNPU [0m [0m[0m|[0m[0m [0m[0m[0m[1mName  [0m [0m[0m|[0m[0m [0m[0m[0m[1mFirmware      [0m [0m[0m|[0m[0m [0m[0m[0m[1mTemp.[0m [0m[0m|[0m[0m [0m[0m[0m[1mPower  [0m [0m[0m|[0m[0m [0m[0m[0m[1mPCI-BDF     [0m [0m[0m|[0m[0m
[0m[0m[0m+[0m[0m------[0m[0m+[0m[0m--------[0m[0m+[0m[0m----------------[0m[0m+[0m[0m-------[0m[0m+[0m[0m---------[0m[0m+[0m[0m--------------[0m[0m+
[0m[0m[0m|[0m[0m [0m[0m[0mnpu0[0m [0m[0m|[0m[0m [0m[0m[0mwarboy[0m [0m[0m|[0m[0m [0m[0m[0m1.7.8, e9f371e[0m [0m[0m|[0m[0m [0m[0m[0m 42°C[0m [0m[0m|[0m[0m [0m[0m[0m10.60 W[0m [0m[0m|[0m[0m [0m[0m[0m0000:4f:00.0[0m [0m[0m|[0m[0m
[0m[0m[0m+[0m[0m------[0m[0m+[0m[0m--------[0m[0m+[0m[0

Then, make sure that your SDK is ready by running the following command. If you see an error here, follow the instructions at:
* [FuriosaAI Driver, Firmware, Runtime Installation Guide](https://furiosa-ai.github.io/docs/v0.5.0/ko/software/installation.html)
* [Setting up a Python Environment](https://furiosa-ai.github.io/docs/v0.5.0/ko/software/python-sdk.html#python)

In [2]:
!python -c "from furiosa import runtime;print(runtime.__full_version__)"

Furiosa SDK Runtime 0.10.1-release (rev: 23336d5) (furiosa-rt 0.10.3 394c19392 2023-11-22T08:53:04Z)


## How to Specify an NPU Device

You may need to specify an NPU device for your application in the following cases:
* Case A: you have more than one NPU device
* Case B: you want to use individual PEs separately for smaller DNN applications, or use a single fused PE

The FuriosaAI SDK provides a couple of ways to specify the NPU device that your application uses. This section explains them.

### Understanding NPU IDs

NPU IDs are used across all FuriosaAI SDK components, so you need to understand how an NPU device is represented as a single NPU ID string.

`npu0`, `npu1`, ..., `npuN` each represent a single NPU device. The trailing number starts from 0 and increases sequentially as you add more NPUs to your machine. A single NPU device contains 2 individual PEs, which are represented as `pe0` and `pe1`.

An NPU ID can identify both a specific NPU device and specific PE(s) on that device. For example, if you have 2 NPU devices and want to list all available individual PEs, they are represented by:
* `npu0pe0`
* `npu0pe1`
* `npu1pe0`
* `npu1pe1`

On Warboy, you can fuse the 2 PEs that belong to the same NPU. The fused PEs are represented by:
* `npu0pe0-1`
* `npu1pe0-1`
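
The naming scheme above can be sketched in a few lines of Python. This is only an illustration of the string format described in this section; `pe_ids` and `fused_ids` are hypothetical helpers, not part of the FuriosaAI SDK, and the default of 2 PEs per NPU is taken from the Warboy description above.

```python
from typing import List


def pe_ids(num_npus: int, pes_per_npu: int = 2) -> List[str]:
    """Enumerate individual PE IDs such as 'npu0pe0' (hypothetical helper)."""
    return [f"npu{n}pe{p}" for n in range(num_npus) for p in range(pes_per_npu)]


def fused_ids(num_npus: int, pes_per_npu: int = 2) -> List[str]:
    """Enumerate fused IDs such as 'npu0pe0-1', one per NPU (hypothetical helper)."""
    return [f"npu{n}pe0-{pes_per_npu - 1}" for n in range(num_npus)]


print(pe_ids(2))     # ['npu0pe0', 'npu0pe1', 'npu1pe0', 'npu1pe1']
print(fused_ids(2))  # ['npu0pe0-1', 'npu1pe0-1']
```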

### Using a Shell Environment Variable to Specify an NPU Device

All FuriosaAI SDK components recognize the shell environment variable `NPU_DEVNAME`. If you set `NPU_DEVNAME` in your shell, your application uses the NPU device it names. For example, you can specify an NPU device in your shell as follows:

```sh
export NPU_DEVNAME="npu0pe0"
```

Please note that an NPU device is occupied while an application is using it, so you cannot run multiple applications with the same `NPU_DEVNAME` setting at the same time.
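
One way to give each application its own device is to set `NPU_DEVNAME` per process when launching it. The sketch below uses only the Python standard library; `run_with_device` is a hypothetical helper, and the child command is a stand-in that just prints the variable it received rather than a real inference application.

```python
import os
import subprocess
import sys
from typing import List


def run_with_device(device: str, argv: List[str]) -> subprocess.CompletedProcess:
    """Run a command with NPU_DEVNAME set only for that child process."""
    env = dict(os.environ, NPU_DEVNAME=device)
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# Stand-in for an inference application: it only echoes its NPU_DEVNAME.
child = [sys.executable, "-c", "import os; print(os.environ['NPU_DEVNAME'])"]

# Each process gets a different PE, so the two never share a device name.
for device in ("npu0pe0", "npu0pe1"):
    result = run_with_device(device, child)
    print(result.stdout.strip())
```

Because the variable is set in the child's environment only, the parent shell and other processes keep whatever `NPU_DEVNAME` they already had.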

### Using a Function Option

In the Python SDK, `Session` is the core class to run inferences, and it accepts various options. One of the options is `device`, which lets a user specify an NPU device for the session. If you are not familiar with `Session`, you can learn about it from [Getting Started With Python SDK](GettingStartedWithPythonSDK.ipynb).

For example, you can specify an NPU device when you call `create_runner` or `create_queue`, as follows:
```python
from furiosa.runtime import create_runner

# Bind the runner to PE 0 of NPU 0; top-level `async with` works in a notebook cell
async with create_runner('mnist-8.onnx', device="npu0pe0") as runner:
    ...
```

Please note that an NPU device specified through the `device` option overrides the shell environment variable `NPU_DEVNAME`.
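
The precedence between the two mechanisms can be modeled as follows. This is a minimal sketch of the rule stated above, not the SDK's actual implementation; `resolve_device` is a hypothetical helper, and what the runtime does when neither value is set is not covered here.

```python
import os
from typing import Optional


def resolve_device(device: Optional[str] = None) -> Optional[str]:
    """Return the explicit option if given, else NPU_DEVNAME, else None (hypothetical helper)."""
    if device is not None:
        return device
    return os.environ.get("NPU_DEVNAME")


os.environ["NPU_DEVNAME"] = "npu0pe0"
print(resolve_device())             # npu0pe0 (from the environment variable)
print(resolve_device("npu1pe0-1"))  # npu1pe0-1 (the explicit option wins)
```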

## Asynchronous Inference APIs

The asynchronous inference API allows a user application to handle multiple inference requests in a single thread.

To use the asynchronous inference APIs, import from `furiosa.runtime` instead of `furiosa.runtime.sync`.

The asynchronous API also includes a `queue` API, which separates the input queue from the output queue.
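
To make the idea of separate input and output queues concrete, here is a hardware-free sketch built on the standard library's `asyncio.Queue`. It is a stand-in model, not the FuriosaAI implementation: `fake_worker` is a hypothetical replacement for the NPU that simply doubles each payload, and the context value travels with each request so results can be matched back to their inputs.

```python
import asyncio


async def fake_worker(inputs: asyncio.Queue, outputs: asyncio.Queue) -> None:
    """Hypothetical stand-in for the NPU: doubles each payload, keeps its context."""
    while True:
        item = await inputs.get()
        if item is None:  # sentinel: no more requests
            break
        context, payload = item
        await outputs.put((context, payload * 2))


async def main(n: int = 5) -> list:
    inputs: asyncio.Queue = asyncio.Queue(maxsize=100)   # bounded input queue
    outputs: asyncio.Queue = asyncio.Queue(maxsize=100)  # bounded output queue
    worker = asyncio.create_task(fake_worker(inputs, outputs))

    async def submit() -> None:
        # Submitting does not wait for any result to come back.
        for i in range(n):
            await inputs.put((f"req-{i}", i))
        await inputs.put(None)

    async def receive() -> list:
        # Results are collected independently of submission.
        return [await outputs.get() for _ in range(n)]

    _, results = await asyncio.gather(submit(), receive())
    await worker
    return results


results = asyncio.run(main())
print(results)
```

`asyncio.run` is meant for a plain script; inside a notebook cell, where an event loop is already running, `results = await main()` would be used instead.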

In [11]:
# download mnist dataset
!wget www.di.ens.fr/~lelarge/MNIST.tar.gz
!tar -zxvf MNIST.tar.gz

--2024-11-05 14:39:29--  http://www.di.ens.fr/~lelarge/MNIST.tar.gz
Resolving www.di.ens.fr (www.di.ens.fr)... 129.199.99.14
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.di.ens.fr/~lelarge/MNIST.tar.gz [following]
--2024-11-05 14:39:30--  https://www.di.ens.fr/~lelarge/MNIST.tar.gz
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘MNIST.tar.gz’

MNIST.tar.gz            [              <=>   ]  33.20M  6.44MB/s    in 5.2s    

2024-11-05 14:39:36 (6.44 MB/s) - ‘MNIST.tar.gz’ saved [34813078]

MNIST/
MNIST/raw/
MNIST/raw/train-labels-idx1-ubyte
MNIST/raw/t10k-labels-idx1-ubyte.gz
MNIST/raw/t10k-labels-idx1-ubyte
MNIST/raw/t10k-images-idx3-ubyte.gz
MNIST/raw/train-images-idx3-ubyte
MNIST/raw/train-labels-idx1-ubyte.gz
MNIST/raw/t10k-images-idx3-ubyte
MNIST/raw/tra

In [11]:
from furiosa.runtime import create_queue # <- async API
from mnist import MNIST
import numpy as np
import random
import asyncio

model_path = "models/MNIST_MobileNet_v2_uint8_quant_without_avgpool_softmax.tflite"

mndata = MNIST('./MNIST/raw/')
train_images, train_labels = mndata.load_training()

async with create_queue( # <- use `async with` statement to close queue automatically
        model_path,
        worker_num=1,
        # Determine how many asynchronous requests you can submit
        # without blocking.
        input_queue_size=100,
        output_queue_size=100
    ) as (submitter, receiver):

    async def submit_data():
        # Submit the inference requests asynchronously
        for _ in range(5):
            idx = random.randint(0, len(train_images) - 1)
            # One image as a uint8 tensor of shape (1, 28, 28, 1)
            data = np.array(train_images[idx:idx+1], np.uint8).reshape(1, 28, 28, 1)
            # `context` comes back with the result, so it can be matched to this request
            await submitter.submit(data, context=idx)

    async def receive_result():
        # Receive the results asynchronously
        for _ in range(5):
            context, outputs = await receiver.recv()
            print(f"Context: {context}, Predict: {np.argmax(outputs[0])}")

    submitter_task = asyncio.create_task(submit_data())
    receiver_task = asyncio.create_task(receive_result())

    await submitter_task
    print("Submit data is done!")
    await receiver_task
    print("Inference is done!")

2024-11-05T06:19:46.322284Z  INFO furiosa_rt_core::driver::event_driven::coord: FuriosaRT (v0.10.3, rev: 394c19392, built at: 2023-11-22T08:53:04Z) bootstrapping ...
2024-11-05T06:19:46.324742Z  INFO furiosa_rt_core::driver::event_driven::coord: Found furiosa-compiler (v0.10.1, rev: 8b00177, built at: 2024-05-28T06:18:01Z)
2024-11-05T06:19:46.324747Z  INFO furiosa_rt_core::driver::event_driven::coord: Found libhal (type: warboy, v0.12.0, rev: 56530c0 built at: 2023-11-16T12:34:03Z)
2024-11-05T06:19:46.324749Z  INFO furiosa_rt_core::driver::event_driven::coord: [Runtime-8] detected 1 NPU device(s):
2024-11-05T06:19:46.342628Z  INFO furiosa_rt_core::driver::event_driven::coord: - [0] npu:5:0-1 (warboy-b0-2pe, 128dpes, firmware: 1.7.8, e9f371e)
2024-11-05T06:19:46.342814Z  INFO furiosa_rt_core::driver::eve