# MicroKWS deployment using TVM's Python API

While the `tvmc` utility explained in `tutorial_tvmc.md` is very easy to use, in some situations it is more straightforward to interface with TVM directly via a Python script. `tutorial_tvmc.md` is a step-by-step guide to getting started with TVM by compiling on the command line; this Jupyter notebook introduces TVM's Python API, which can be used analogously to the `tvmc` command line utility. More information on the related TVMC Python interface can be found at https://tvm.apache.org/docs/tutorial/tvmc_python.html

Only the flow for generating kernels for an embedded device is covered at the moment. The executor and features used here are aligned with the examples in `tutorial_tvmc.md`.

## Disclaimer

This tutorial is heavily inspired by the official "microTVM with TFLite Models" How-To in the TVM documentation (https://tvm.apache.org/docs/how_to/work_with_microtvm/micro_tflite.html).

## Setting up the dependencies

Please follow the instructions at the top of `install_tvm.md` to
- Install required software
- Set up and activate a virtual Python environment
- Install TVM
  - via the `tlcpack` Python package, or
  - by building it manually from source (See https://tvm.apache.org/docs/install/from_source.html)

Make sure to activate the virtual environment before launching the Jupyter kernel!

The following cell is only required for custom TVM builds:

In [None]:
# import sys
# sys.path.append("/PATH/TO/TVM/python")

Import Python dependencies 

In [None]:
import os
import json
import tarfile
import pathlib
import tempfile
import numpy as np
from pathlib import Path

import tflite
import tvm
import tvm.micro  # explicit import for tvm.micro.export_model_library_format used below
from tvm import relay, transform

## MicroKWS Flow using TVM

### Load and prepare the Pre-Trained Model

First, define the path to the TFLite Model

In [None]:
TFLITE_MODEL = "data/micro_kws_student_quantized.tflite"

Next, load the file into a binary buffer.

In [None]:
# Read the model inside a context manager so the file handle is closed again.
with open(TFLITE_MODEL, "rb") as f:
    tflite_model_buf = f.read()
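
Before handing the buffer to the parser, it can be worth checking that the file really is a TFLite model. The sketch below relies on the FlatBuffers file identifier, which for TFLite files is `TFL3` at byte offset 4. The helper name `looks_like_tflite` is hypothetical and not part of TVM or the `tflite` package.

```python
def looks_like_tflite(buf: bytes) -> bool:
    """Return True if the buffer carries the TFLite FlatBuffers file identifier."""
    # Bytes 0..3 of a FlatBuffer hold the root table offset; bytes 4..7 hold
    # an optional file identifier, which is "TFL3" for TFLite models.
    return len(buf) >= 8 and buf[4:8] == b"TFL3"
```

In this notebook it could be used as `assert looks_like_tflite(tflite_model_buf)` right after reading the file.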

Initialize the TFLite model

In [None]:
tflite_model = tflite.Model.GetRootAsModel(tflite_model_buf, 0)

Provide information on the input tensors (Name, DataType and Shape)

In [None]:
input_tensor = "serving_default_input:0"
input_shape = (1, 1960)
input_dtype = "int8"
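
The shape and data type together determine how many bytes the target software has to provide for one input. A minimal sketch of that arithmetic follows; the helper `tensor_nbytes` and the size table are illustrative assumptions and are not taken from TVM.

```python
import math

# Element sizes in bytes for a few common tensor data types (illustrative subset).
DTYPE_SIZES = {"int8": 1, "uint8": 1, "int16": 2, "int32": 4, "float32": 4}


def tensor_nbytes(shape, dtype):
    """Number of bytes needed to store a dense tensor of the given shape and dtype."""
    return math.prod(shape) * DTYPE_SIZES[dtype]


# The input tensor used in this notebook: 1 x 1960 int8 values.
input_nbytes = tensor_nbytes((1, 1960), "int8")
print(input_nbytes)  # 1960
```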

Convert TFLite Model to Relay IR

In [None]:
mod, params = relay.frontend.from_tflite(
    tflite_model, shape_dict={input_tensor: input_shape}, dtype_dict={input_tensor: input_dtype}
)

### Defining the runtime, target and executor

The target device used here is a generic MicroTVM target. We are using the CRT runtime in combination with the AoT executor, as it is more lightweight than the full C++ runtime.

In [None]:
TARGET = tvm.target.target.micro("host")
RUNTIME = tvm.relay.backend.Runtime("crt", {"system-lib": False})
EXECUTOR = tvm.relay.backend.Executor(
    "aot", {"interface-api": "c", "unpacked-api": True, "link-params": True}
)

### Define pass configurations

These options will be passed to the `relay.build()` function in a later step.

In [None]:
cfg = {
    "tir.disable_vectorize": True,
    "tir.usmp.enable": True,
    "tir.usmp.algorithm": "hill_climb",
}

For a more detailed explanation of these options, see the `--pass-config` flags in `tutorial_tvmc.md`!
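
To make the relation to the command line explicit, the following sketch renders such a dictionary as `--pass-config name=value` arguments. The helper `pass_config_flags` is hypothetical, and writing booleans as `1`/`0` is an assumption about what `tvmc` accepts, so compare the result with the flags shown in `tutorial_tvmc.md`.

```python
def pass_config_flags(cfg):
    """Render a pass configuration dict as a list of tvmc-style arguments."""
    flags = []
    for key, value in cfg.items():
        if isinstance(value, bool):
            # Assumption: boolean options are written as 1/0 on the command line.
            value = int(value)
        flags += ["--pass-config", f"{key}={value}"]
    return flags


# Same options as in the cell above, repeated to keep the sketch self-contained.
example_cfg = {
    "tir.disable_vectorize": True,
    "tir.usmp.enable": True,
    "tir.usmp.algorithm": "hill_climb",
}
example_flags = pass_config_flags(example_cfg)
print(" ".join(example_flags))
```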

### Apply Transformations to the model

TFLite models typically use the `NHWC` layout for the tensors of a convolutional layer. However, in some situations (especially when performing autotuning) a schedule using an `NCHW` layout can be more efficient. The following code therefore applies passes to the Relay module that convert the layout where possible.

In [None]:
desired_layout = "NCHW"
desired_layouts = {
    "nn.conv2d": [desired_layout, "default"],
    "nn.conv2d_transpose": [desired_layout, "default"],
    "qnn.conv2d": [desired_layout, "default"],
}

# Convert the layout of the graph where possible.
seq = transform.Sequential(
    [
        relay.transform.RemoveUnusedFunctions(),
        relay.transform.ConvertLayout(desired_layouts),
    ]
)

with transform.PassContext(opt_level=3):
    mod = seq(mod)
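
What a layout conversion means for a tensor can be illustrated without TVM: going from `NHWC` to `NCHW` is a permutation of the axes. The sketch below only computes that permutation and the resulting shape. The helper names and the example shape are hypothetical, and this is not how `ConvertLayout` is implemented.

```python
def layout_permutation(src, dst):
    """Axis order that turns a tensor in layout `src` into layout `dst`."""
    return tuple(src.index(axis) for axis in dst)


def convert_shape(shape, src, dst):
    """Shape of the tensor after moving it from layout `src` to layout `dst`."""
    return tuple(shape[i] for i in layout_permutation(src, dst))


perm = layout_permutation("NHWC", "NCHW")
print(perm)  # (0, 3, 1, 2)

# Hypothetical activation shape: batch 1, height 32, width 16, 8 channels.
nchw_shape = convert_shape((1, 32, 16, 8), "NHWC", "NCHW")
print(nchw_shape)  # (1, 8, 32, 16)
```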

### Build the Model

While this step looks pretty simple, it actually invokes the whole compilation pipeline provided by TVM. Depending on the complexity of the model and the enabled features, it might take a couple of seconds to complete.

In [None]:
with tvm.transform.PassContext(opt_level=3, config=cfg):
    module = relay.build(mod, target=TARGET, runtime=RUNTIME, executor=EXECUTOR, params=params)

### Export codegen artifacts

For MicroTVM targets we are interested in the Model Library Format (MLF) artifact as it contains the sources required to build our target software.

In [None]:
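model_library_format_tar_path = Path("gen/mlf")
# Create the output directory if it does not exist yet.
model_library_format_tar_path.parent.mkdir(parents=True, exist_ok=True)
tvm.micro.export_model_library_format(module, f"{model_library_format_tar_path}.tar")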
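
The exported archive can be inspected without unpacking it. The sketch below uses only the standard library and assumes the archive contains a `metadata.json` at its root and generated `.c` sources, which is the usual MLF layout. The helper name `summarize_mlf` is hypothetical.

```python
import json
import tarfile


def summarize_mlf(tar_path):
    """Return (metadata dict or None, sorted list of C source members) of an archive."""
    with tarfile.open(tar_path) as tar:
        names = [member.name for member in tar.getmembers() if member.isfile()]
        metadata = None
        for name in names:
            # Member names may or may not carry a leading "./".
            if name.lstrip("./") == "metadata.json":
                metadata = json.load(tar.extractfile(name))
        sources = sorted(name for name in names if name.endswith(".c"))
    return metadata, sources
```

Calling `summarize_mlf` with the path of the archive written above returns the parsed metadata and the list of generated C files, which is a quick way to confirm that the export produced the expected sources.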

### Optional: Use provided autotuning logs

Supply the tuning records (see tvm/data/ directory) like this and rebuild the model:

In [None]:
tuning_records_file = Path("data/micro_kws_student_tuning_log_nchw_best.txt")

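# Rebuild with the best tuning records applied. The pass configuration of the
# untuned build is reused so that both builds use the same options.
with tvm.autotvm.apply_history_best(str(tuning_records_file)):
    with tvm.transform.PassContext(opt_level=3, config=cfg):
        module_tuned = relay.build(
            mod, target=TARGET, runtime=RUNTIME, executor=EXECUTOR, params=params
        )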

model_library_format_tar_path_tuned = Path("gen/mlf_tuned")
tvm.micro.export_model_library_format(module_tuned, f"{model_library_format_tar_path_tuned}.tar")
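
To get a feeling for what a tuning log contains, the records can be counted per task with plain Python. The sketch assumes the AutoTVM JSON log format, with one JSON object per line, the task name at `input[1]` and the error code at `result[1]`. The helper `count_tuning_records` is hypothetical.

```python
import json


def count_tuning_records(lines):
    """Count error-free tuning records per task name."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        record = json.loads(line)
        task_name = record["input"][1]  # assumed: (target, task name, args, kwargs)
        error_no = record["result"][1]  # assumed: (costs, error_no, all_cost, timestamp)
        if error_no == 0:
            counts[task_name] = counts.get(task_name, 0) + 1
    return counts
```

It accepts any iterable of lines, so an opened log file can be passed directly, for example `with open(tuning_records_file) as f: print(count_tuning_records(f))`.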

Extract the MLF archive

In [None]:
model_library_format_tar_path_tuned.mkdir(parents=True, exist_ok=True)
# Open the archive in a context manager so it is closed after extraction.
with tarfile.open(f"{model_library_format_tar_path_tuned}.tar") as tar:
    tar.extractall(model_library_format_tar_path_tuned)

### Support for physical hardware?

MicroTVM supports a set of hardware boards, for which it can directly compile, flash and run the target software using a built model. However, the ESP32C3 target is currently not supported. Thus, the approach for the lab exercises is currently independent of the TVM framework.