# Introduction To Relax

In this tutorial we take a Relay model and convert it into a Relax model using the [Relay -> Relax converter](https://github.com/tlc-pack/relax/blob/relax/python/tvm/relax/testing/relay_translator.py). We then walk through the Relax IR representation of the model to introduce some basic concepts, and finally build and run it.


In [1]:
from __future__ import annotations
import tvm
import tvm.testing  # provides assert_allclose, used in the final cell
from tvm.relay import testing
from tvm import relax, relay
from tvm.relax.testing import relay_translator
from tvm.runtime import vm as vm_rt
import numpy as np

### Import a Relay Model
Let's get a Relay model from the Relay testing library. For this tutorial we have chosen MLP, but feel free to use any other Relay model of your choice. We import the model with an unknown batch dimension by passing `batch_size=relay.Any()`.

We can dump the Relay IR representation of the model. It is built from 10 calls to Relay operators such as `nn.batch_flatten`, `nn.dense`, and `nn.relu`. Since we imported the model with a dynamic batch dimension, the `%data` input has shape `(?, 1, 28, 28)` and the output has shape `(?, 10)`, because the model has 10 classes.

In [2]:
dshape = (1, 28, 28)
relay_mod, params_dict = testing.mlp.get_workload(batch_size=relay.Any())
print(relay_mod)



def @main(%data: Tensor[(?, 1, 28, 28), float32] /* ty=Tensor[(?, 1, 28, 28), float32] */, %fc1_weight: Tensor[(128, 784), float32] /* ty=Tensor[(128, 784), float32] */, %fc1_bias: Tensor[(128), float32] /* ty=Tensor[(128), float32] */, %fc2_weight: Tensor[(64, 128), float32] /* ty=Tensor[(64, 128), float32] */, %fc2_bias: Tensor[(64), float32] /* ty=Tensor[(64), float32] */, %fc3_weight: Tensor[(10, 64), float32] /* ty=Tensor[(10, 64), float32] */, %fc3_bias: Tensor[(10), float32] /* ty=Tensor[(10), float32] */) -> Tensor[(?, 10), float32] {
  %0 = nn.batch_flatten(%data) /* ty=Tensor[(?, 784), float32] */;
  %1 = nn.dense(%0, %fc1_weight, units=128) /* ty=Tensor[(?, 128), float32] */;
  %2 = nn.bias_add(%1, %fc1_bias, axis=-1) /* ty=Tensor[(?, 128), float32] */;
  %3 = nn.relu(%2) /* ty=Tensor[(?, 128), float32] */;
  %4 = nn.dense(%3, %fc2_weight, units=64) /* ty=Tensor[(?, 64), float32] */;
  %5 = nn.bias_add(%4, %fc2_bias, axis=-1) /* ty=Tensor[(?, 64), float32] */;
  %6 = nn.relu
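
The shape annotations in the dump above can be reproduced by hand. The following is a minimal sketch in plain Python, not TVM code: the helper names (`batch_flatten`, `dense`, `mlp_output_shape`, `show`) are hypothetical, and `None` is assumed to stand in for Relay's unknown dimension `?`.

```python
from typing import Optional, Tuple

Dim = Optional[int]      # None plays the role of Relay's unknown dimension "?"
Shape = Tuple[Dim, ...]


def batch_flatten(shape: Shape) -> Shape:
    """Keep the batch dimension and multiply the remaining (known) ones."""
    flat = 1
    for dim in shape[1:]:
        flat *= dim
    return (shape[0], flat)


def dense(shape: Shape, units: int) -> Shape:
    """A dense layer replaces the last dimension with the number of units."""
    return shape[:-1] + (units,)


def mlp_output_shape(data_shape: Shape) -> Shape:
    shape = batch_flatten(data_shape)
    for units in (128, 64, 10):
        # bias_add and relu do not change the shape, so only dense matters here.
        shape = dense(shape, units)
    return shape


def show(shape: Shape) -> str:
    """Format a shape the way the Relay printer does, with "?" for unknowns."""
    return "(" + ", ".join("?" if d is None else str(d) for d in shape) + ")"


print(show(batch_flatten((None, 1, 28, 28))))      # (?, 784)
print(show(mlp_output_shape((None, 1, 28, 28))))   # (?, 10)
```

The unknown batch dimension is simply carried through every layer, which is why it shows up again in the output type.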

### Convert Relay IR to Relax IR

Now let's convert the imported Relay module to a Relax module. Relax provides a simple utility, the [Relay -> Relax converter](https://github.com/tlc-pack/relax/blob/relax/python/tvm/relax/testing/relay_translator.py), to convert any legacy Relay module into Relax. The converter directly lowers the Relay operations to TIR implementations, so we need to provide a `target` to the converter. Here we use the `llvm` target.

We can dump the Relax IR module and take a look.

The Relay to Relax converter translates the Relay model into a dataflow block with calls to the TIR implementations of the Relay operators. So the Relax IR module has one high-level function (`main`) and a set of TensorIR functions, one for each of the operators.

In [3]:
target = tvm.target.Target("llvm")
relax_mod = relay_translator.from_relay(relay_mod["main"], target)

# To look at the entire Relax IR module you can dump it using the following code.
# print(relax_mod)


@add = primfn(var_rxplaceholder: handle, var_rxplaceholder_1: handle, var_T_add: handle) -> ()
  attr = {"global_symbol": "add", "tir.noalias": True}
  buffers = {rxplaceholder: Buffer(rxplaceholder_2: Pointer(global float32), float32, [d: int64, 128i64], []),
             rxplaceholder_1: Buffer(rxplaceholder_3: Pointer(global float32), float32, [1, 128i64], []),
             T_add: Buffer(T_add_1: Pointer(global float32), float32, [d, 128i64], [])}
  buffer_map = {var_rxplaceholder: rxplaceholder, var_rxplaceholder_1: rxplaceholder_1, var_T_add: T_add} {
  block([], "root") {
    tir.reads([])
    tir.writes([])
    for (i0: int64, 0i64, d) {
      for (i1: int64, 0i64, 128i64) {
        block([d, 128i64], "T_add") as [ax0, ax1] {
          bind(ax0, i0)
          bind(ax1, i1)
          tir.reads([rxplaceholder[ax0, ax1], rxplaceholder_1[0i64, ax1]])
          tir.writes([T_add[ax0, ax1]])
          T_add[ax0, ax1] = (rxplaceholder[ax0, ax1] + rxplaceholder_1[0i64, ax1])
      }
   

### Relax and TensorIR Functions
Whoa! That's a big IR module, but don't worry, it will all make sense very soon. We can observe that the IR module contains a number of TIR functions and one Relax function. The Relax function has the decorator `@relax.function`, and the TIR functions start with `@<func_name> = primfn(...`.

Let's look at the Relax function more closely.

In [4]:
print(relax_mod["main"])

@relax.function
def main(data: Tensor((d, 1, 28, 28), "float32"), fc1_weight: Tensor((128, 784), "float32"), fc1_bias: Tensor((128,), "float32"), fc2_weight: Tensor((64, 128), "float32"), fc2_bias: Tensor((64,), "float32"), fc3_weight: Tensor((10, 64), "float32"), fc3_bias: Tensor((10,), "float32")) -> Tensor(None, "float32", ndim = 2):
    # block 0
    with relax.dataflow():
        lv = relax.call_tir(batch_flatten, (data,), (d, 784), dtype="float32")
        lv1 = relax.call_tir(dense, (lv, fc1_weight), (d, 128), dtype="float32")
        lv2 = relax.call_tir(expand_dims, (fc1_bias,), (1, 128), dtype="float32")
        lv3 = relax.call_tir(add, (lv1, lv2), (d, 128), dtype="float32")
        lv4 = relax.call_tir(relu, (lv3,), (d, 128), dtype="float32")
        lv5 = relax.call_tir(dense1, (lv4, fc2_weight), (d, 64), dtype="float32")
        lv6 = relax.call_tir(expand_dims1, (fc2_bias,), (1, 64), dtype="float32")
        lv7 = relax.call_tir(add1, (lv5, lv6), (d, 64), dtype="float32"

Relax functions in a Relax IR module are decorated with `@relax.function`. The function signature contains the shapes and dtypes of the parameters and of the output. This is all pretty similar to a Relay module so far. However, Relax has some fundamental new features, which we briefly discuss below. They are covered in greater detail in future tutorials.

### Dataflow Block

You can observe that the model code is encapsulated in a `with relax.dataflow()` construct. Relax enforces some guarantees within this construct: all the operations under the dataflow block are side-effect-free and do not contain advanced control flow (such as if-then-else) or nested scopes. A dataflow block can effectively be viewed as a computational graph embedded in the program. Note that most of the binding variables (`lv`, `lv1`, `lv2`, `lv3`) within the dataflow block are "local", which means they are only visible within the block. These variables can be viewed as "internal nodes" of the computational graph. We can mark a variable as an output (`gv`), in which case the variable is visible in later parts of the program. These output variables can be viewed as output nodes of the computational graph.

Note that `return gv` is outside of the dataflow block. Everything outside of a dataflow block can have side effects, so we cannot perform optimizations such as reordering these bindings according to topological order unless we do more careful analysis. We expect most optimizations to happen at the dataflow block level. These optimizations can be done by ML engineers who are familiar with the computational graph concept. The ability to isolate and represent effectful components also provides opportunities for more advanced optimizations in the places that need them.
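
To make the "computational graph" view concrete, here is a toy model in plain Python. It is only a sketch and does not use the Relax API: `run_dataflow` and the three small functions are hypothetical stand-ins. It assumes every binding is a pure function of earlier variables, which is the guarantee a dataflow block gives, and shows that the listing order of such bindings does not affect the result while only the variables marked as outputs escape the block.

```python
def run_dataflow(bindings, outputs, inputs):
    """Evaluate pure bindings, in whatever order they are listed, and
    return only the variables that were marked as outputs."""
    env = dict(inputs)
    pending = list(bindings)
    while pending:
        # A binding is ready once all of its arguments have been computed.
        ready = [name for name in pending
                 if all(arg in env for arg in bindings[name][1])]
        if not ready:
            raise ValueError("unresolved bindings: %s" % pending)
        for name in ready:
            fn, args = bindings[name]
            env[name] = fn(*(env[arg] for arg in args))
            pending.remove(name)
    # Local variables (lv, lv1) stay inside the block; only outputs leave it.
    return {name: env[name] for name in outputs}


def double(x):
    return 2 * x


def add_one(x):
    return x + 1


def add(a, b):
    return a + b


program = {
    "lv": (double, ("data",)),
    "lv1": (add_one, ("lv",)),
    "gv": (add, ("lv", "lv1")),
}
# The same bindings listed in a different order.
shuffled = {name: program[name] for name in ("gv", "lv1", "lv")}

result = run_dataflow(program, ("gv",), {"data": 3})
result_shuffled = run_dataflow(shuffled, ("gv",), {"data": 3})
print(result, result_shuffled)  # {'gv': 13} {'gv': 13}
```

If any of the functions printed to the screen or mutated shared state, the two runs could differ, which is why the same freedom is not available outside a dataflow block without further analysis.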

### Direct Interaction with TensorIR

In Relax, the high-level IR can directly interact with and call into lower-level TensorIR (and also PackedFunc, but this example does not cover that). For example, `lv = relax.call_tir(batch_flatten, (data,), (d, 784), dtype="float32")` calls into the TensorIR function `batch_flatten`. The argument to the TensorIR function is `data`, and the output is expected to be a tensor of shape `(d, 784)` and dtype `float32`. This unlocks multiple opportunities, including, but not limited to:

* Incrementally lower different parts of the program using different strategies.
* Allow automation to take a `call_tir` to TensorIR, perform optimization, and rewrite it into multiple `call_tir` nodes that inform layout-rewriting decisions at the high level.
* Bring the BYOC flow in as a natural part of transformation (by transforming part of the graph into calls to opaque packed functions).
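
The calling convention behind `call_tir` can be illustrated with a small sketch. It is plain Python rather than TVM code, and it assumes a destination-passing convention: the caller allocates the output buffer from the given shape and the low-level function writes its result into that buffer as the last argument. The names `alloc`, `call_tir`, `add`, and `relu` below are hypothetical stand-ins that use nested lists in place of tensors.

```python
def alloc(shape, fill=0.0):
    """Allocate a 2-D buffer as nested lists (a stand-in for a real tensor)."""
    rows, cols = shape
    return [[fill] * cols for _ in range(rows)]


def call_tir(prim_func, args, out_shape):
    """Allocate the output, then let the low-level function fill it in."""
    out = alloc(out_shape)
    prim_func(*args, out)
    return out


def add(x, bias, out):
    # Mirrors the `add` primfn: bias has shape (1, n) and is read at row 0.
    for i in range(len(out)):
        for j in range(len(out[0])):
            out[i][j] = x[i][j] + bias[0][j]


def relu(x, out):
    for i in range(len(out)):
        for j in range(len(out[0])):
            out[i][j] = max(x[i][j], 0.0)


lv1 = [[1.0, -2.0, 3.0], [-4.0, 5.0, -6.0]]
lv2 = [[0.5, 0.5, 0.5]]
lv3 = call_tir(add, (lv1, lv2), (2, 3))
lv4 = call_tir(relu, (lv3,), (2, 3))
print(lv4)  # [[1.5, 0.0, 3.5], [0.0, 5.5, 0.0]]
```

Because the high-level program only sees a function name, its inputs, and an output shape, each low-level function can be optimized or replaced independently of the code that calls it.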


### Symbolic Shape Dimensions
The unknown dimension in the Relay module has been replaced with a symbolic dimension in the Relax module. `%data: Tensor[(?, 1, 28, 28), float32]` in Relay was translated to `data: Tensor((d, 1, 28, 28), "float32")` in Relax. The unknown batch dimension is captured by the symbolic integer TIR variable `d`. Symbolic dimensions are not limited to the batch dimension in Relax and can be used in place of any dimension in the shape of any tensor.

The benefit of symbolic dimensions over unknown (`?`) dimensions is that they can express relationships between different tensors in the model. For example, in the Relax program we know that `data` and `lv` share the same first dimension `d`. This can lead to better memory planning for dynamic-shape models. This information is lost in the Relay program with unknown shapes. We will cover symbolic shapes in more detail in a future tutorial.
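
The difference between a symbolic dimension and an unknown one can be sketched in a few lines of plain Python. This is an illustration only, not how Relax represents shapes internally: `Sym`, `provably_equal`, `match_shape`, and `substitute` are hypothetical names, and `None` is assumed to play the role of Relay's `?`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """A named symbolic dimension such as `d`."""
    name: str


UNKNOWN = None  # stand-in for Relay's "?"


def provably_equal(a, b):
    """Two dimensions are known to be equal only if neither is unknown."""
    if a is UNKNOWN or b is UNKNOWN:
        return False
    return a == b


def match_shape(symbolic, concrete):
    """Bind each symbol to the value seen in a concrete runtime shape."""
    if len(symbolic) != len(concrete):
        raise ValueError("rank mismatch")
    env = {}
    for sym_dim, value in zip(symbolic, concrete):
        if isinstance(sym_dim, Sym):
            if env.setdefault(sym_dim, value) != value:
                raise ValueError("conflicting values for %s" % sym_dim.name)
        elif sym_dim != value:
            raise ValueError("expected %s, got %s" % (sym_dim, value))
    return env


def substitute(symbolic, env):
    """Turn a symbolic shape into a concrete one using the bindings."""
    return tuple(env[dim] if isinstance(dim, Sym) else dim for dim in symbolic)


d = Sym("d")
data_shape = (d, 1, 28, 28)
lv_shape = (d, 784)

print(provably_equal(data_shape[0], lv_shape[0]))  # True
print(provably_equal(UNKNOWN, UNKNOWN))            # False

env = match_shape(data_shape, (2, 1, 28, 28))
print(substitute(lv_shape, env))                   # (2, 784)
```

Once `d` is bound from the input at runtime, the shape of every intermediate tensor that mentions `d` is known before it is computed, which is the kind of information a memory planner can use.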



The other functions in the Relax module are TensorIR functions. For example, take a look at `batch_flatten` using `print(relax_mod["batch_flatten"])`.

In [5]:
print(relax_mod["batch_flatten"])

primfn(var_rxplaceholder: handle, var_tensor: handle) -> ()
  attr = {"global_symbol": "batch_flatten", "tir.noalias": True}
  buffers = {rxplaceholder: Buffer(rxplaceholder_1: Pointer(global float32), float32, [d: int64, 1i64, 28i64, 28i64], []),
             tensor: Buffer(tensor_1: Pointer(global float32), float32, [d, 784i64], [])}
  buffer_map = {var_rxplaceholder: rxplaceholder, var_tensor: tensor} {
  block([], "root") {
    tir.reads([])
    tir.writes([])
    for (i0: int64, 0i64, d) {
      for (i1: int64, 0i64, 784i64) {
        block([d, 784i64], "tensor") as [ax0, ax1] {
          bind(ax0, i0)
          bind(ax1, i1)
          tir.reads([rxplaceholder[ax0, 0i64, floordiv(floormod(ax1, 784i64), 28i64), floormod(ax1, 28i64)]])
          tir.writes([tensor[ax0, ax1]])
          tensor[ax0, ax1] = rxplaceholder[ax0, 0i64, floordiv(floormod(ax1, 784i64), 28i64), floormod(ax1, 28i64)]
      }
    }
}
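
The index arithmetic in the `tensor` block above is easier to follow in plain Python. The sketch below is not generated by TVM; `source_index` and `row_major_offset` are hypothetical helper names. It assumes the input is stored in row-major order and checks that flat position `ax1` of the output reads exactly the element at the same row-major offset within one `(1, 28, 28)` sample.

```python
def source_index(ax1):
    """Map a flat output position to (channel, row, column) in the input,
    following the expression in the TIR block: Python's // and % behave like
    floordiv and floormod for the non-negative indices used here."""
    return (0, (ax1 % 784) // 28, ax1 % 28)


def row_major_offset(index, shape=(1, 28, 28)):
    """Offset of a multi-dimensional index in a row-major buffer."""
    offset = 0
    for i, extent in zip(index, shape):
        offset = offset * extent + i
    return offset


print(source_index(0))    # (0, 0, 0)
print(source_index(29))   # (0, 1, 1)
print(source_index(783))  # (0, 27, 27)

# Every flat position maps back to itself, so the flatten is a pure reshape.
all_match = all(row_major_offset(source_index(k)) == k for k in range(784))
print(all_match)          # True
```

The batch index `ax0` is passed through unchanged, so the symbolic extent `d` never enters this arithmetic.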



## Compile Relax Module

Relax has a simple API to compile a Relax module to a VM executable. We can print the executable's statistics with `ex.stats()` and dump it as text with `ex.as_text()`.

In [6]:
# Get params and input for the module
batch_size = 2
shape = (batch_size, *dshape)
data = tvm.nd.array(np.random.rand(*shape).astype(np.float32))
params = list(params_dict.values())

# Build the Relax IRModule
ex = relax.vm.build(relax_mod, target)

print(ex.stats())
print(ex.as_text())

Relax VM executable statistics:
  Constant pool (# 91): [shapetuple[30], shapetuple[0, 1, 2, 3], shapetuple[4, 5], shapetuple[6], shapetuple[7, 8], shapetuple[9], shapetuple[10, 11], shapetuple[12], shapetuple[13], float32, shapetuple[0, 14], shapetuple[0, 14], float32, shapetuple[0, 14], shapetuple[0, 14], shapetuple[15], float32, shapetuple[0, 4], shapetuple[0, 4], float32, shapetuple[0, 4], shapetuple[0, 4], shapetuple[512], float32, shapetuple[1, 128], float32, shapetuple[18], float32, shapetuple[0, 4], shapetuple[0, 4], float32, shapetuple[0, 4], shapetuple[0, 4], shapetuple[19], float32, shapetuple[0, 4], shapetuple[0, 4], float32, shapetuple[0, 4], shapetuple[0, 4], shapetuple[20], float32, shapetuple[0, 7], shapetuple[0, 7], float32, shapetuple[0, 7], shapetuple[0, 7], shapetuple[256], float32, shapetuple[1, 64], float32, shapetuple[23], float32, shapetuple[0, 7], shapetuple[0, 7], float32, shapetuple[0, 7], shapetuple[0, 7], shapetuple[24], float32, shapetuple[0, 7], shapetupl

## Execute Relax IR module

Now the compiled Relax VM executable can be run, and we can compare its results with Relay's to check correctness.

In [7]:
vm = relax.VirtualMachine(ex, tvm.cpu())
res = vm["main"](data, *params)

# check correctness by comparing with relay result
exe = relay.vm.compile(relay_mod, target)
relay_vm = vm_rt.VirtualMachine(exe, tvm.cpu())
inputs = [data] + params
expected_output = relay_vm.run(*inputs)
tvm.testing.assert_allclose(res.numpy(), expected_output.numpy())

Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', ('TENSOR', ({any_dim|any_dim>=0}, 784), 'float32'), ('TENSOR', (128, 784), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', ('TENSOR', ({any_dim|any_dim>=0}, 128), 'float32'), ('TENSOR', (64, 128), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', ('TENSOR', ({any_dim|any_dim>=0}, 64), 'float32'), ('TENSOR', (10, 64), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
  "target_host parameter is going to be deprecated. "
