
# Python Async Inference Tutorial - Multiple Models

This tutorial describes how to run inference with multiple models using the `InferModel` (Async) API, which is the recommended option.


**Requirements:**

* Run the notebook inside the Python virtual environment: ```source hailo_virtualenv/bin/activate```

When inside the ```virtualenv```, use the command ``jupyter-notebook <tutorial-dir>`` to open a Jupyter server that contains the tutorials (default folder on GitHub: ``hailort/libhailort/bindings/python/platform/hailo_tutorials/notebooks/``).

In [None]:
# Optional: define a callback function that will run after the inference job is done
# The callback must have a keyword argument called "completion_info".
# That argument will be passed by the framework.
def example_callback(completion_info, bindings):
    if completion_info.exception:
        # Handle the exception here and skip reading the output of a failed job
        return

    _ = bindings.output().get_buffer()

In [None]:
import numpy as np
from functools import partial
from hailo_platform import VDevice, FormatType

number_of_frames = 4
timeout_ms = 10000

def infer():
    # Create a VDevice
    params = VDevice.create_params()
    # VDevices created with the same group_id share the underlying device
    params.group_id = "SHARED"
    with VDevice(params) as vdevice:

        # Create an infer model from an HEF:
        infer_model = vdevice.create_infer_model('../hefs/resnet_v1_18.hef')

        # Set optional infer model parameters
        infer_model.set_batch_size(2)

        # For a single input / output model, the input / output object 
        # can be accessed with a name parameter ...
        infer_model.input("resnet_v1_18/input_layer1").set_format_type(FormatType.FLOAT32)
        # ... or without
        infer_model.output().set_format_type(FormatType.FLOAT32)

        # Once the infer model is set, configure the infer model
        with infer_model.configure() as configured_infer_model:
            for _ in range(number_of_frames):
                # Create bindings for it and set buffers
                bindings = configured_infer_model.create_bindings()
                bindings.input().set_buffer(np.empty(infer_model.input().shape, dtype=np.float32))
                bindings.output().set_buffer(np.empty(infer_model.output().shape, dtype=np.float32))

                # Wait for the async pipeline to be ready, and start an async inference job
                configured_infer_model.wait_for_async_ready(timeout_ms=timeout_ms)

                # Any callable can be passed as callback (lambda, function, functools.partial), as long
                # as it has a keyword argument "completion_info"
                job = configured_infer_model.run_async([bindings], partial(example_callback, bindings=bindings))

            # Wait for the last job
            job.wait(timeout_ms)
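
The callback contract used above can be tried without a device. The following is a minimal sketch, assuming only what the comments state: the framework passes `completion_info` as a keyword argument, and any other arguments must be bound in advance. `fake_run_async` and the string "bindings" are hypothetical stand-ins for illustration and are not part of `hailo_platform`.

```python
from functools import partial
from types import SimpleNamespace

results = []

def example_callback(completion_info, bindings):
    # Record the outcome instead of reading a real output buffer
    if completion_info.exception:
        results.append(("failed", bindings))
    else:
        results.append(("ok", bindings))

def fake_run_async(callback):
    # Hypothetical stand-in for the framework: it calls the callback
    # with completion_info supplied as a keyword argument
    callback(completion_info=SimpleNamespace(exception=None))

# partial pre-binds "bindings"; the caller supplies "completion_info"
fake_run_async(partial(example_callback, bindings="frame-0"))

# A lambda works the same way, as long as it accepts the keyword
fake_run_async(lambda completion_info: example_callback(completion_info, bindings="frame-1"))
```

Binding `bindings` per call matters in the loop above, because each iteration creates its own bindings object and each callback should see the one that belongs to its job.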

### Running Multiple Models Concurrently

The models can be run concurrently using either multiple `Thread` objects or multiple `Process` objects.

In [None]:
from threading import Thread

pool = [
    Thread(target=infer),
    Thread(target=infer)
]

print('Starting async inference on multiple models using threads')

for thread in pool:
    thread.start()
for thread in pool:
    thread.join()

print('Done inference')
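
The `Process` variant follows the same start/join pattern. The sketch below is illustrative only: `infer_stub` is a hypothetical placeholder for the `infer` function defined earlier, `run_in_processes` is a helper name introduced here, and the `"fork"` start method is assumed to be available (Linux or macOS). Whether separate processes can share one device may depend on additional HailoRT configuration that this sketch does not cover.

```python
import multiprocessing as mp

def infer_stub():
    # Hypothetical placeholder; pass the tutorial's infer function instead
    pass

def run_in_processes(target, count=2):
    # Assumption: the "fork" start method exists on this platform
    ctx = mp.get_context("fork")
    pool = [ctx.Process(target=target) for _ in range(count)]
    for proc in pool:
        proc.start()
    for proc in pool:
        proc.join()
    # An exit code of 0 means the target returned without raising
    return [proc.exitcode for proc in pool]

exit_codes = run_in_processes(infer_stub)
```

Unlike threads, each process has its own interpreter and memory, so an exception in one worker shows up as a non-zero exit code rather than in the parent.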