# ONNX Runtime: Tutorial for TVM execution provider

This notebook shows a simple example of model inference with the TVM execution provider (TVM EP).


#### Tutorial Roadmap:
1. Prerequisites
2. Accuracy check for TVM EP
3. Configuration options

## 1. Prerequisites

Make sure that you have installed all the necessary dependencies described in the corresponding paragraph of the documentation.

Also make sure that the `tvm` and `onnxruntime-tvm` packages are installed in your pip environment.

If you extend the `PYTHONPATH` environment variable rather than installing the packages, make sure it contains the following paths: `<path_to_msft_onnxrt>/onnxruntime/cmake/external/tvm_update/python` and `<path_to_msft_onnxrt>/onnxruntime/build/Linux/Release`.
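A quick way to confirm the setup is to ask Python whether the packages can be located at all. The following is a minimal sketch, not part of the original notebook: `check_packages` is a hypothetical helper, and it assumes that the `onnxruntime-tvm` distribution is imported under the name `onnxruntime`, as the imports later in this notebook suggest.

```python
import importlib.util


def check_packages(names=("tvm", "onnxruntime")):
    """Return a mapping from top-level package name to whether it can be located.

    Hypothetical helper: it only looks the packages up on the import path
    (including PYTHONPATH entries) and does not import them.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A package reported as missing usually means that the pip environment or the `PYTHONPATH` entries listed above are not the ones the current interpreter sees.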

### Common import

These packages can be installed with standard `pip`.

In [1]:
import onnx
import numpy as np
from typing import List, AnyStr
from onnx import ModelProto, helper, checker, mapping

### Specialized import

It is better to build these packages from source so that you know exactly which features are available to you.

In [2]:
import tvm.testing
from tvm.contrib.download import download_testdata
import onnxruntime.providers.tvm   # necessary to register tvm_onnx_import_and_compile and others

### Helper functions for working with ONNX ModelProto

These helper functions extract metadata from a model: input and output names, input types, and input shapes. This information is needed to process ONNX models generically.

In [3]:
def get_onnx_input_names(model: ModelProto) -> List[AnyStr]:
    inputs = [node.name for node in model.graph.input]
    initializer = [node.name for node in model.graph.initializer]
    inputs = list(set(inputs) - set(initializer))
    return sorted(inputs)


def get_onnx_output_names(model: ModelProto) -> List[AnyStr]:
    return [node.name for node in model.graph.output]


def get_onnx_input_types(model: ModelProto) -> List[np.dtype]:
    input_names = get_onnx_input_names(model)
    return [
        mapping.TENSOR_TYPE_TO_NP_TYPE[node.type.tensor_type.elem_type]
        for node in sorted(model.graph.input, key=lambda node: node.name) if node.name in input_names
    ]


def get_onnx_input_shapes(model: ModelProto) -> List[List[int]]:
    input_names = get_onnx_input_names(model)
    return [
        [dv.dim_value for dv in node.type.tensor_type.shape.dim]
        for node in sorted(model.graph.input, key=lambda node: node.name) if node.name in input_names
    ]


def get_random_model_inputs(model: ModelProto) -> List[np.ndarray]:
    input_shapes = get_onnx_input_shapes(model)
    input_types = get_onnx_input_types(model)
    assert len(input_types) == len(input_shapes)
    inputs = [np.random.uniform(size=shape).astype(dtype) for shape, dtype in zip(input_shapes, input_types)]
    return inputs
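One caveat with the shape helper above: when a model declares a symbolic dimension (for example a dynamic batch size), ONNX stores the name in `dim_param` and leaves `dim_value` at its default of 0, so the random input would be an empty tensor. The sketch below shows one possible workaround. `resolve_dynamic_dims` is a hypothetical helper that is not part of the original notebook, and replacing unknown dimensions with 1 is an assumption that may not suit every model.

```python
from typing import List


def resolve_dynamic_dims(shape: List[int], default: int = 1) -> List[int]:
    """Replace non-positive (symbolic or unknown) dimensions with a default size.

    Hypothetical helper: a dimension reported as 0 is treated as dynamic
    and substituted with `default`.
    """
    return [dim if dim > 0 else default for dim in shape]


# A shape as get_onnx_input_shapes might report it for a dynamic-batch model.
print(resolve_dynamic_dims([0, 3, 224, 224]))
```

Applying such a helper to each shape before calling `np.random.uniform` would keep the generated inputs non-empty.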

### Wrapper helper functions for Inference

These wrappers run model inference through a given ONNX Runtime execution provider.

In [4]:
def get_onnxruntime_output(model: ModelProto, inputs: List, provider_name: AnyStr) -> np.ndarray:
    output_names = get_onnx_output_names(model)
    input_names = get_onnx_input_names(model)
    assert len(input_names) == len(inputs)
    input_dict = {input_name: input_value for input_name, input_value in zip(input_names, inputs)}

    inference_session = onnxruntime.InferenceSession(model.SerializeToString(), providers=[provider_name])
    output = inference_session.run(output_names, input_dict)

    # Unpack output if there's only a single value.
    if len(output) == 1:
        output = output[0]
    return output


def get_cpu_onnxruntime_output(model: ModelProto, inputs: List) -> np.ndarray:
    return get_onnxruntime_output(model, inputs, "CPUExecutionProvider")


def get_tvm_onnxruntime_output(model: ModelProto, inputs: List) -> np.ndarray:
    return get_onnxruntime_output(model, inputs, "TvmExecutionProvider")
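Before requesting a provider by name, it can help to check that the installed ONNX Runtime build actually exposes it. The following is a minimal sketch using `onnxruntime.get_available_providers()`. `is_provider_available` is a hypothetical helper, and it simply returns `False` when `onnxruntime` cannot be imported.

```python
def is_provider_available(provider_name: str) -> bool:
    """Return True if the installed ONNX Runtime build lists the given provider.

    Hypothetical helper: a missing onnxruntime installation is treated
    the same as a missing provider.
    """
    try:
        import onnxruntime
    except ImportError:
        return False
    return provider_name in onnxruntime.get_available_providers()


if __name__ == "__main__":
    for name in ("CPUExecutionProvider", "TvmExecutionProvider"):
        print(f"{name}: {is_provider_available(name)}")
```

If `TvmExecutionProvider` is reported as unavailable, the installed `onnxruntime` package was most likely built without TVM support.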

### Helper function for checking accuracy

This function uses the TVM API to compare two output tensors. The tensor obtained using the `CPUExecutionProvider` is used as a reference.

If the tensors do not match within the given tolerances, an exception is raised.

In [5]:
def verify_with_ort_with_inputs(
    model,
    inputs,
    out_shape=None,
    opset=None,
    freeze_params=False,
    dtype="float32",
    rtol=1e-5,
    atol=1e-5,
    opt_level=1,
):
    if opset is not None:
        model.opset_import[0].version = opset

    ort_out = get_cpu_onnxruntime_output(model, inputs)
    tvm_out = get_tvm_onnxruntime_output(model, inputs)
    for tvm_val, ort_val in zip(tvm_out, ort_out):
        tvm.testing.assert_allclose(ort_val, tvm_val, rtol=rtol, atol=atol)
        assert ort_val.dtype == tvm_val.dtype
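The `rtol` and `atol` parameters of `verify_with_ort_with_inputs` control how strict the comparison is. The sketch below spells out the element-wise criterion, assuming that `tvm.testing.assert_allclose` follows the semantics of `numpy.testing.assert_allclose`. `within_tolerance` is a hypothetical scalar helper for illustration only.

```python
def within_tolerance(actual: float, desired: float,
                     rtol: float = 1e-5, atol: float = 1e-5) -> bool:
    """Scalar form of the assumed criterion: |actual - desired| <= atol + rtol * |desired|."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


# A difference of 1e-6 passes with the default tolerances; a difference of 0.1 does not.
print(within_tolerance(1.0, 1.000001))
print(within_tolerance(1.0, 1.1))
```

The absolute term dominates for values near zero, while the relative term dominates for large values, so both tolerances matter when comparing raw network outputs.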

### Helper functions for downloading models

These functions use the TVM API to download models from the ONNX Model Zoo.

In [6]:
BASE_MODEL_URL = "https://github.com/onnx/models/raw/master/"
MODEL_URL_COLLECTION = {
    "ResNet50-v1": "vision/classification/resnet/model/resnet50-v1-7.onnx",
    "ResNet50-v2": "vision/classification/resnet/model/resnet50-v2-7.onnx",
    "SqueezeNet-v1.1": "vision/classification/squeezenet/model/squeezenet1.1-7.onnx",
    "SqueezeNet-v1.0": "vision/classification/squeezenet/model/squeezenet1.0-7.onnx",
    "Inception-v1": "vision/classification/inception_and_googlenet/inception_v1/model/inception-v1-7.onnx",
    "Inception-v2": "vision/classification/inception_and_googlenet/inception_v2/model/inception-v2-7.onnx",
}


def get_model_url(model_name):
    return BASE_MODEL_URL + MODEL_URL_COLLECTION[model_name]


def get_name_from_url(url):
    return url[url.rfind("/") + 1 :].strip()


def find_of_download(model_name):
    model_url = get_model_url(model_name)
    model_file_name = get_name_from_url(model_url)
    return download_testdata(model_url, model_file_name, module="models")
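To make the URL handling above concrete, the sketch below reproduces the same composition with the standard library only. `build_url` and `file_name_from_url` are hypothetical names used for illustration, and parsing the URL path before taking the base name is a small variation on `get_name_from_url`, not the notebook's own method.

```python
import posixpath
from urllib.parse import urljoin, urlparse

BASE = "https://github.com/onnx/models/raw/master/"
RELATIVE = "vision/classification/resnet/model/resnet50-v1-7.onnx"


def build_url(base: str, relative: str) -> str:
    """Join a base URL ending in '/' with a relative model path."""
    return urljoin(base, relative)


def file_name_from_url(url: str) -> str:
    """Return the last path component of a URL, ignoring any query string."""
    return posixpath.basename(urlparse(url).path)


url = build_url(BASE, RELATIVE)
print(url)
print(file_name_from_url(url))  # resnet50-v1-7.onnx
```

The file name derived this way is what `download_testdata` receives as the local name for the downloaded model.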

## 2. Accuracy check for TVM EP 

This section checks the accuracy of TVM EP by comparing the output tensors produced by `CPUExecutionProvider` and `TvmExecutionProvider`. See the description of the `verify_with_ort_with_inputs` function above.


### Check for simple architectures

In [7]:
def get_two_input_model(op_name: AnyStr) -> ModelProto:
    dtype = "float32"
    in_shape = [1, 2, 3, 3]
    in_type = mapping.NP_TYPE_TO_TENSOR_TYPE[np.dtype(dtype)]
    out_shape = in_shape
    out_type = in_type

    layer = helper.make_node(op_name, ["in1", "in2"], ["out"])
    graph = helper.make_graph(
        [layer],
        "two_input_test",
        inputs=[
            helper.make_tensor_value_info("in1", in_type, in_shape),
            helper.make_tensor_value_info("in2", in_type, in_shape),
        ],
        outputs=[
            helper.make_tensor_value_info(
                "out", out_type, out_shape
            )
        ],
    )
    model = helper.make_model(graph, producer_name="two_input_test")
    checker.check_model(model, full_check=True)
    return model

In [8]:
onnx_model = get_two_input_model("Add")
inputs = get_random_model_inputs(onnx_model)
verify_with_ort_with_inputs(onnx_model, inputs)
print("****************** Success! ******************")

/home/onnxruntime/onnxruntime/core/providers/tvm/tvm_execution_provider.cc:480: TVM EP options:
target: llvm -mcpu=skylake-avx512
target_host: llvm -mcpu=skylake-avx512
opt level: 3
freeze weights: 1
tuning file path: 
tuning type: AutoTVM
convert layout to NHWC: 0
input tensor names: 
input tensor shapes: 


****************** Success! ******************


### Check for DNN architectures 

In [9]:
def get_onnx_model(model_name):
    model_path = find_of_download(model_name)
    onnx_model = onnx.load(model_path)
    return onnx_model

In [10]:
model_name = "ResNet50-v1"

onnx_model = get_onnx_model(model_name)
inputs = get_random_model_inputs(onnx_model)
verify_with_ort_with_inputs(onnx_model, inputs)
print("****************** Success! ******************")

/home/onnxruntime/onnxruntime/core/providers/tvm/tvm_execution_provider.cc:480: TVM EP options:
target: llvm -mcpu=skylake-avx512
target_host: llvm -mcpu=skylake-avx512
opt level: 3
freeze weights: 1
tuning file path: 
tuning type: AutoTVM
convert layout to NHWC: 0
input tensor names: 
input tensor shapes: 
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.


****************** Success! ******************


## 3. Configuration options

This section shows how to configure TVM EP with custom options. For more details on the options used, see the corresponding section of the documentation.

In [11]:
provider_name = "TvmExecutionProvider"
provider_options = dict(target="llvm -mtriple=x86_64-linux-gnu",
                        target_host="llvm -mtriple=x86_64-linux-gnu",
                        opt_level=3,
                        freeze_weights=True,
                        tuning_file_path="",
                        tuning_type="Ansor",
)
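When options like these are assembled in several places, a small builder can catch typos early. The following is a minimal sketch under stated assumptions: `make_tvm_provider_options` and `KNOWN_TUNING_TYPES` are hypothetical names, and the accepted tuning types are limited to the two values that appear in this notebook (`AutoTVM` and `Ansor`), which may not be the full set the provider supports.

```python
from typing import Dict, Union

# Assumption: only the two tuning types seen in this notebook are accepted here.
KNOWN_TUNING_TYPES = ("AutoTVM", "Ansor")


def make_tvm_provider_options(
    target: str,
    tuning_type: str = "AutoTVM",
    tuning_file_path: str = "",
    opt_level: int = 3,
    freeze_weights: bool = True,
) -> Dict[str, Union[str, int, bool]]:
    """Build an options dict in the same shape as the cell above.

    Hypothetical helper: the same target string is reused for target_host.
    """
    if tuning_type not in KNOWN_TUNING_TYPES:
        raise ValueError(f"unexpected tuning_type: {tuning_type!r}")
    return dict(
        target=target,
        target_host=target,
        opt_level=opt_level,
        freeze_weights=freeze_weights,
        tuning_file_path=tuning_file_path,
        tuning_type=tuning_type,
    )


options = make_tvm_provider_options("llvm -mtriple=x86_64-linux-gnu", tuning_type="Ansor")
print(options)
```

The resulting dictionary has the same keys as `provider_options` above and could be passed to `InferenceSession` in the same way.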

In [12]:
model_name = "ResNet50-v1"
onnx_model = get_onnx_model(model_name)
input_dict = {input_name: input_value for input_name, input_value in zip(get_onnx_input_names(onnx_model),
                                                                         get_random_model_inputs(onnx_model))}
output_names = get_onnx_output_names(onnx_model)

In [13]:
tvm_session = onnxruntime.InferenceSession(onnx_model.SerializeToString(),
                                           providers=[provider_name],
                                           provider_options=[provider_options]
                                          )
output = tvm_session.run(output_names, input_dict)[0]
print(f"****************** Output shape: {output.shape} ******************")

/home/onnxruntime/onnxruntime/core/providers/tvm/tvm_execution_provider.cc:480: TVM EP options:
target: llvm -mtriple=x86_64-linux-gnu
target_host: llvm -mtriple=x86_64-linux-gnu
opt level: 3
freeze weights: 1
tuning file path: 
tuning type: Ansor
convert layout to NHWC: 0
input tensor names: 
input tensor shapes: 


****************** Output shape: (1, 1000) ******************
