# ONNX Runtime: Tutorial for TVM execution provider

This notebook shows a simple example of model inference with the TVM execution provider (TVM EP).


#### Tutorial Roadmap:
1. Prerequisites
2. Accuracy check for TVM EP
3. Configuration options
4. Support for precompiled models

## 1. Prerequisites

Make sure you have installed all the dependencies listed in the corresponding section of the documentation.

Also, make sure the `tvm` and `onnxruntime-tvm` packages are installed in your pip environment.

If you add the packages to the `PYTHONPATH` variable instead, make sure it contains the following paths:
* `<path_to_msft_onnxrt>/onnxruntime/cmake/external/tvm_update/python`
* `<path_to_msft_onnxrt>/onnxruntime/build/Linux/Release`
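If you prefer not to modify the shell environment, the same two directories can be added from inside the notebook. The sketch below is a convenience built on assumptions, not part of ONNX Runtime: `tvm_ep_python_paths`, `extend_sys_path` and `missing_modules` are hypothetical helper names, and `onnxrt_root` stands for `<path_to_msft_onnxrt>`. Changes to `sys.path` only affect the current interpreter, so they must run before `import tvm`.

```python
import importlib.util
import os
import sys
from typing import Iterable, List


def tvm_ep_python_paths(onnxrt_root: str) -> List[str]:
    """Hypothetical helper: the two directories listed above, rooted at <path_to_msft_onnxrt>."""
    return [
        os.path.join(onnxrt_root, "onnxruntime", "cmake", "external", "tvm_update", "python"),
        os.path.join(onnxrt_root, "onnxruntime", "build", "Linux", "Release"),
    ]


def extend_sys_path(paths: Iterable[str]) -> None:
    """Prepend each path to sys.path once, keeping the given order (like a PYTHONPATH entry)."""
    for path in reversed(list(paths)):
        if path not in sys.path:
            sys.path.insert(0, path)


def missing_modules(names: Iterable[str] = ("onnx", "onnxruntime", "tvm")) -> List[str]:
    """Return the top-level modules that cannot be found on the current sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, call `extend_sys_path(tvm_ep_python_paths("/path/to/msft_onnxrt"))` (replace the placeholder with your checkout location) and check that `missing_modules()` returns an empty list before running the next cells.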

### Common import

These packages can be installed with standard `pip`.

```python
import os
import onnx
import tempfile
import numpy as np
from typing import List, AnyStr
from onnx import ModelProto, helper, checker, mapping
```

### Specialized import

We recommend building these packages from source, so that you know exactly which features are available in your build.

```python
import onnxruntime

import tvm
import tvm.relay
import tvm.testing
import tvm.runtime
import tvm.runtime.vm
import tvm.relay.backend.vm
import tvm.contrib.download
```
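Because the packages may come either from pip or from a source build on `PYTHONPATH`, it helps to confirm which copy each import resolves to. A minimal sketch using only the standard `importlib` machinery; `report_module_origins` is a hypothetical helper, not part of TVM or ONNX Runtime.

```python
import importlib.util
from typing import Dict, Iterable, Optional


def report_module_origins(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each top-level module name to the file it would be imported from (None if not found)."""
    origins: Dict[str, Optional[str]] = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec is not None else None
    return origins


for module_name, origin in report_module_origins(["onnx", "onnxruntime", "tvm"]).items():
    print(f"{module_name:12s} -> {origin}")
```

A path under `site-packages` points to a pip installation, while a path under `<path_to_msft_onnxrt>` points to the source build.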

### Helper functions for working with ONNX ModelProto

These helper functions extract metadata from an ONNX model: input and output names, input types, and input shapes. The inference and accuracy-check helpers below rely on this information to handle arbitrary ONNX models.

```python
def get_onnx_input_names(model: ModelProto) -> List[AnyStr]:
    # Some models also list their weights (initializers) as graph inputs; exclude them.
    inputs = [node.name for node in model.graph.input]
    initializer = [node.name for node in model.graph.initializer]
    inputs = list(set(inputs) - set(initializer))
    return sorted(inputs)


def get_onnx_output_names(model: ModelProto) -> List[AnyStr]:
    return [node.name for node in model.graph.output]


def get_onnx_input_types(model: ModelProto) -> List[np.dtype]:
    # Types are returned in the same (sorted by name) order as get_onnx_input_names.
    input_names = get_onnx_input_names(model)
    return [
        mapping.TENSOR_TYPE_TO_NP_TYPE[node.type.tensor_type.elem_type]
        for node in sorted(model.graph.input, key=lambda node: node.name) if node.name in input_names
    ]


def get_onnx_input_shapes(model: ModelProto) -> List[List[int]]:
    # Only static dimensions are read: a symbolic dimension (dim_param) yields dim_value == 0.
    input_names = get_onnx_input_names(model)
    return [
        [dv.dim_value for dv in node.type.tensor_type.shape.dim]
        for node in sorted(model.graph.input, key=lambda node: node.name) if node.name in input_names
    ]


def get_random_model_inputs(model: ModelProto) -> List[np.ndarray]:
    input_shapes = get_onnx_input_shapes(model)
    input_types = get_onnx_input_types(model)
    assert len(input_types) == len(input_shapes)
    inputs = [np.random.uniform(size=shape).astype(dtype) for shape, dtype in zip(input_shapes, input_types)]
    return inputs
```
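`get_onnx_input_shapes` reads only `dim_value`. ONNX stores a symbolic dimension (for example, a dynamic batch size declared as `N`) in `dim_param` instead, and reading `dim_value` then yields `0`, so `get_random_model_inputs` would produce an empty tensor. A minimal sketch of one way to handle this, assuming that substituting a fixed size (here `1`) is acceptable for your model; `concretize_shape` is a hypothetical helper:

```python
from typing import List


def concretize_shape(dim_values: List[int], fill_value: int = 1) -> List[int]:
    """Replace unknown dimensions (reported as 0 when only dim_param is set) with fill_value."""
    return [dim if dim > 0 else fill_value for dim in dim_values]


# A shape read from a model whose batch dimension is symbolic:
print(concretize_shape([0, 3, 224, 224]))  # [1, 3, 224, 224]
```

Inside `get_onnx_input_shapes` it would wrap the inner list, as in `concretize_shape([dv.dim_value for dv in node.type.tensor_type.shape.dim])`. Note that a genuinely zero-sized dimension would also be replaced.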

### Wrapper helper functions for Inference

Thin wrappers that run inference with a given ONNX Runtime execution provider. For single-output models, the result is unpacked into one array.

```python
def get_onnxruntime_output(model: ModelProto, inputs: List, provider_name: AnyStr) -> np.ndarray:
    output_names = get_onnx_output_names(model)
    input_names = get_onnx_input_names(model)
    assert len(input_names) == len(inputs)
    input_dict = {input_name: input_value for input_name, input_value in zip(input_names, inputs)}

    inference_session = onnxruntime.InferenceSession(model.SerializeToString(), providers=[provider_name])
    output = inference_session.run(output_names, input_dict)

    # Unpack output if there's only a single value.
    if len(output) == 1:
        output = output[0]
    return output


def get_cpu_onnxruntime_output(model: ModelProto, inputs: List) -> np.ndarray:
    return get_onnxruntime_output(model, inputs, "CPUExecutionProvider")


def get_tvm_onnxruntime_output(model: ModelProto, inputs: List) -> np.ndarray:
    return get_onnxruntime_output(model, inputs, "TvmExecutionProvider")
```
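The wrappers above pass a single provider. ONNX Runtime also accepts an ordered list in `providers` and assigns each node to the first listed provider that supports it, so a common pattern is to list `CPUExecutionProvider` after TVM EP as a fallback. For an accuracy comparison like the one in this notebook, however, a fallback can hide which provider actually ran each node. The helper below is a hypothetical sketch that keeps only the providers that are actually available, in priority order:

```python
from typing import Iterable, List


def choose_providers(preferred: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the preferred providers that are available, keeping the preferred order."""
    preferred = list(preferred)
    available_set = set(available)
    chosen = [name for name in preferred if name in available_set]
    if not chosen:
        raise RuntimeError(f"None of {preferred} is available; found {sorted(available_set)}")
    return chosen
```

For example, `providers = choose_providers(["TvmExecutionProvider", "CPUExecutionProvider"], onnxruntime.get_available_providers())`, followed by `onnxruntime.InferenceSession(model.SerializeToString(), providers=providers)`.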

### Helper function for checking accuracy

This function compares two lists of output tensors with `tvm.testing.assert_allclose` and also checks that their data types match. The tensors obtained with `CPUExecutionProvider` serve as the reference.

If the tensors do not match, an `AssertionError` is raised.

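```python
def verify_outputs(
    lhs: List[np.ndarray],
    rhs: List[np.ndarray],
    rtol: float = 5e-5,
    atol: float = 5e-5
) -> None:
    # The inference wrappers unpack single-output results into a bare array;
    # wrap it in a list so that whole tensors (not their rows) are compared.
    if isinstance(lhs, np.ndarray):
        lhs = [lhs]
    if isinstance(rhs, np.ndarray):
        rhs = [rhs]
    assert len(lhs) == len(rhs), "Different number of outputs"
    for lhs_tensor, rhs_tensor in zip(lhs, rhs):
        tvm.testing.assert_allclose(lhs_tensor, rhs_tensor, rtol=rtol, atol=atol)
        assert lhs_tensor.dtype == rhs_tensor.dtype
    print("Same output, congratulations!")
```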
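`tvm.testing.assert_allclose` delegates to NumPy's `np.testing.assert_allclose`, which accepts a pair of elements when `|actual - desired| <= atol + rtol * |desired|`. The check is asymmetric: `rtol` is scaled by the magnitude of the second argument (`rhs` in `verify_outputs`). The pure-Python sketch below restates that criterion for flat sequences to make the effect of `rtol` and `atol` concrete; `within_tolerance` is a hypothetical helper, and it ignores the NaN/inf handling and shape checks that NumPy performs.

```python
from typing import Sequence


def within_tolerance(actual: Sequence[float], desired: Sequence[float],
                     rtol: float = 5e-5, atol: float = 5e-5) -> bool:
    """Return True if every pair satisfies |actual - desired| <= atol + rtol * |desired|."""
    if len(actual) != len(desired):
        return False
    return all(abs(a - d) <= atol + rtol * abs(d) for a, d in zip(actual, desired))


# Allowed error at desired = 1000.0 with the defaults: 5e-5 + 5e-5 * 1000 = 0.05005
print(within_tolerance([1000.04], [1000.0]))  # True
print(within_tolerance([1000.06], [1000.0]))  # False
```

For values near zero the absolute term dominates, so `atol` controls how much noise is tolerated in small activations.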

```python
def verify_with_ort_with_inputs(
    model,
    inputs,
    opset=None,
    rtol=1e-5,
    atol=1e-5,
):
    # Optionally override the model's opset version (this modifies the model in place).
    if opset is not None:
        model.opset_import[0].version = opset

    # The CPUExecutionProvider output is the reference; the TVM EP output is checked against it.
    ort_out = get_cpu_onnxruntime_output(model, inputs)
    tvm_out = get_tvm_onnxruntime_output(model, inputs)
    verify_outputs(ort_out, tvm_out, rtol, atol)
```

### Helper functions for downloading models

These functions use the TVM API to download models from the ONNX Model Zoo.

```python
BASE_MODEL_URL = "https://github.com/onnx/models/raw/master/"
MODEL_URL_COLLECTION = {
    "ResNet50-v1": "vision/classification/resnet/model/resnet50-v1-7.onnx",
    "ResNet50-v2": "vision/classification/resnet/model/resnet50-v2-7.onnx",
    "SqueezeNet-v1.1": "vision/classification/squeezenet/model/squeezenet1.1-7.onnx",
    "SqueezeNet-v1.0": "vision/classification/squeezenet/model/squeezenet1.0-7.onnx",
    "Inception-v1": "vision/classification/inception_and_googlenet/inception_v1/model/inception-v1-7.onnx",
    "Inception-v2": "vision/classification/inception_and_googlenet/inception_v2/model/inception-v2-7.onnx",
}


def get_model_url(model_name):
    return BASE_MODEL_URL + MODEL_URL_COLLECTION[model_name]


def get_name_from_url(url):
    return url[url.rfind("/") + 1 :].strip()


def find_of_download(model_name):
    model_url = get_model_url(model_name)
    model_file_name = get_name_from_url(model_url)
    return tvm.contrib.download.download_testdata(model_url, model_file_name, module="models")
```
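`tvm.contrib.download.download_testdata` stores downloaded files in TVM's local test-data cache. If you want to fetch the same Model Zoo files without that TVM utility, a minimal standard-library sketch could look like the following. `download_with_cache` and any cache directory you pass to it are hypothetical; only `urllib.request.urlretrieve` is a real API here, and the URL must still resolve for the download to succeed.

```python
import os
import urllib.request


def download_with_cache(url: str, cache_dir: str) -> str:
    """Download url into cache_dir unless a file with the same name is already there."""
    file_name = url[url.rfind("/") + 1 :].strip()
    local_path = os.path.join(cache_dir, file_name)
    if not os.path.exists(local_path):
        os.makedirs(cache_dir, exist_ok=True)
        # Download to a temporary name first, so an interrupted transfer
        # is never mistaken for a complete cached file.
        partial_path = local_path + ".part"
        urllib.request.urlretrieve(url, partial_path)
        os.replace(partial_path, local_path)
    return local_path
```

For example, `onnx.load(download_with_cache(get_model_url("ResNet50-v1"), os.path.expanduser("~/.cache/onnx_models")))`.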

## 2. Accuracy check for TVM EP 

This section checks accuracy by comparing the output tensors produced by `CPUExecutionProvider` and `TvmExecutionProvider` for the same inputs. See `verify_with_ort_with_inputs`, defined above.


### Check for simple architectures

```python
def get_two_input_model(op_name: AnyStr) -> ModelProto:
    dtype = "float32"
    in_shape = [1, 2, 3, 3]
    in_type = mapping.NP_TYPE_TO_TENSOR_TYPE[np.dtype(dtype)]
    out_shape = in_shape
    out_type = in_type

    layer = helper.make_node(op_name, ["in1", "in2"], ["out"])
    graph = helper.make_graph(
        [layer],
        "two_input_test",
        inputs=[
            helper.make_tensor_value_info("in1", in_type, in_shape),
            helper.make_tensor_value_info("in2", in_type, in_shape),
        ],
        outputs=[
            helper.make_tensor_value_info(
                "out", out_type, out_shape
            )
        ],
    )
    model = helper.make_model(graph, producer_name="two_input_test")
    checker.check_model(model, full_check=True)
    return model
```

```python
onnx_model = get_two_input_model("Add")
inputs = get_random_model_inputs(onnx_model)
verify_with_ort_with_inputs(onnx_model, inputs)
print("****************** Success! ******************")
```

```text
Same output, congratulations!
****************** Success! ******************
```


### Check for DNN architectures 

```python
def get_onnx_model(model_name):
    model_path = find_of_download(model_name)
    onnx_model = onnx.load(model_path)
    return onnx_model
```

```python
model_name = "ResNet50-v1"

onnx_model = get_onnx_model(model_name)
inputs = get_random_model_inputs(onnx_model)
verify_with_ort_with_inputs(onnx_model, inputs)
print("****************** Success! ******************")
```

```text
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.

Same output, congratulations!
****************** Success! ******************
```


## 3. Configuration options

This section shows how to configure TVM EP with custom options. For details on each option, see the corresponding section of the documentation.

```python
provider_name = "TvmExecutionProvider"
provider_options = dict(
    target="llvm -mtriple=x86_64-linux-gnu",
    target_host="llvm -mtriple=x86_64-linux-gnu",
    opt_level=3,
    freeze_weights=True,
    tuning_file_path="",
    tuning_type="Ansor",
)
```
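When several sessions use different settings, it can help to keep the options in one typed structure. The dataclass below is a hypothetical convenience, not an ONNX Runtime or TVM API. Its field names are exactly the option keys used in this notebook (including `executor` and `so_folder` from the precompiled-model section), and any field left as `None` is simply not passed to TVM EP.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class TvmEpOptions:
    """Hypothetical container for the TVM EP option keys used in this tutorial."""
    target: Optional[str] = None
    target_host: Optional[str] = None
    opt_level: Optional[int] = None
    freeze_weights: Optional[bool] = None
    tuning_file_path: Optional[str] = None
    tuning_type: Optional[str] = None
    executor: Optional[str] = None
    so_folder: Optional[str] = None

    def to_provider_options(self) -> Dict[str, Any]:
        """Return a dict for InferenceSession(provider_options=[...]), dropping unset fields."""
        return {key: value for key, value in asdict(self).items() if value is not None}


ep_options = TvmEpOptions(
    target="llvm -mtriple=x86_64-linux-gnu",
    target_host="llvm -mtriple=x86_64-linux-gnu",
    opt_level=3,
    freeze_weights=True,
    tuning_file_path="",
    tuning_type="Ansor",
)
```

`ep_options.to_provider_options()` produces the same dict as `provider_options` above and can be passed to `InferenceSession` in its place.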

```python
model_name = "ResNet50-v1"
onnx_model = get_onnx_model(model_name)
input_dict = {
    input_name: input_value for input_name, input_value in zip(
        get_onnx_input_names(onnx_model),
        get_random_model_inputs(onnx_model),
    )
}
output_names = get_onnx_output_names(onnx_model)
```

```python
tvm_session = onnxruntime.InferenceSession(
    onnx_model.SerializeToString(),
    providers=[provider_name],
    provider_options=[provider_options],
)
output = tvm_session.run(output_names, input_dict)[0]
print(f"****************** Output shape: {output.shape} ******************")
```

```text
****************** Output shape: (1, 1000) ******************
```


## 4. Support for precompiled models

The following wrapper functions compile an ONNX model with TVM's Relay `VirtualMachine` compiler and save the result to a directory that TVM EP can load as a precompiled model.

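```python
def compile_virtual_machine(model: onnx.ModelProto, target_str: AnyStr) -> tvm.runtime.vm.Executable:
    # Import the ONNX model into Relay. With freeze_params=True the weights are embedded
    # as constants, so the returned params dict is not needed afterwards.
    ir_mod, params = tvm.relay.frontend.from_onnx(
        model,
        opset=model.opset_import[0].version,
        freeze_params=True,
    )
    # Use the same target string for the device and the host.
    target = tvm.target.Target(target=target_str, host=target_str)
    return tvm.relay.backend.vm.compile(ir_mod, target)


def serialize_virtual_machine(vm_exec: tvm.runtime.vm.Executable) -> AnyStr:
    # Note: the temporary directory is not deleted automatically.
    temp_directory = tempfile.mkdtemp()
    # Store large constants (size threshold: byte_limit=256 bytes) in a separate "consts" file.
    path_consts = os.path.join(temp_directory, "consts")
    vm_exec.move_late_bound_consts(path_consts, byte_limit=256)
    lib_path = os.path.join(temp_directory, "model.so")
    code_path = os.path.join(temp_directory, "model.ro")
    # save() returns the serialized VM bytecode and the compiled operator library.
    code, lib = vm_exec.save()
    lib.export_library(lib_path)
    with open(code_path, "wb") as fo:
        fo.write(code)
    return temp_directory
```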
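`serialize_virtual_machine` writes three artifacts into the returned directory: the late-bound constants file `consts`, the compiled operator library `model.so`, and the VM bytecode `model.ro`. A quick sanity check before handing the directory to TVM EP can catch an incomplete export. This is a hypothetical helper that only checks for the file names produced by the function above; it does not validate their contents.

```python
import os
from typing import List

# File names written by serialize_virtual_machine in this notebook.
EXPECTED_VM_ARTIFACTS = ("consts", "model.so", "model.ro")


def missing_vm_artifacts(so_folder: str) -> List[str]:
    """Return the expected artifact names that do not exist in so_folder."""
    return [
        name for name in EXPECTED_VM_ARTIFACTS
        if not os.path.exists(os.path.join(so_folder, name))
    ]
```

For example, `assert not missing_vm_artifacts(so_folder)` right after calling `serialize_virtual_machine`.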

Prepare the ONNX model and random inputs.

```python
model_name = "ResNet50-v1"
onnx_model = get_onnx_model(model_name)
input_dict = {
    input_name: input_value for input_name, input_value in zip(
        get_onnx_input_names(onnx_model),
        get_random_model_inputs(onnx_model),
    )
}
output_names = get_onnx_output_names(onnx_model)
```

Compile the ONNX model with the TVM `VirtualMachine`.

```python
compiled_vm_exec = compile_virtual_machine(onnx_model, target_str="llvm")
```

```python
so_folder = serialize_virtual_machine(compiled_vm_exec)
```

Prepare the provider options and run inference with TVM EP.

To use a precompiled model, you only need to pass two options:
* **executor**: must be set to `vm` (`VirtualMachine`); this functionality is not supported for `GraphExecutor`;
* **so_folder**: the path to the directory that contains the files of the precompiled model.
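These two requirements can be enforced before a session is created, which gives a clearer error than a failure inside TVM EP. A minimal sketch; `precompiled_tvm_options` is a hypothetical helper that only checks that the directory exists and always sets `executor` to `vm`:

```python
import os
from typing import Dict


def precompiled_tvm_options(so_folder: str) -> Dict[str, str]:
    """Build TVM EP options for a precompiled VirtualMachine model stored in so_folder."""
    if not os.path.isdir(so_folder):
        raise FileNotFoundError(f"so_folder does not exist or is not a directory: {so_folder}")
    # Precompiled models are supported only with the VirtualMachine executor, not GraphExecutor.
    return {"executor": "vm", "so_folder": so_folder}
```

The returned dict matches the `provider_options` built in the next cell.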

```python
provider_name = "TvmExecutionProvider"
provider_options = dict(
    executor="vm",
    so_folder=so_folder,
)
```

```python
tvm_session = onnxruntime.InferenceSession(
    onnx_model.SerializeToString(),
    providers=[provider_name],
    provider_options=[provider_options],
)
tvm_output = tvm_session.run(output_names, input_dict)
```

Let's make sure that the output values match those obtained with `CPUExecutionProvider`:

```python
verify_outputs(
    tvm_output[0],
    get_cpu_onnxruntime_output(
        onnx_model,
        input_dict.values()
    ),
)
```

```text
Same output, congratulations!
```
