# Introduction To Relax

In this tutorial, we will load a Relay model and convert it into a Relax model using the [Relay -> Relax converter](https://github.com/tlc-pack/relax/blob/relax/python/tvm/relax/testing/relay_translator.py). Along the way, we will look at the Relax IR at a high level and introduce some of its basic concepts.

By the end, you will know how to bring your Relay model into Relax and run it. You will also understand the basics of how Relax represents the model.


```python
from __future__ import annotations
import tvm
from tvm.relay import testing
from tvm import relax, relay
from tvm.relax.testing import relay_translator
from tvm.runtime import vm as vm_rt
import numpy as np
```

### Import a Relay Model
Let us begin with a Relay model. For this tutorial, we will use a multilayer perceptron (MLP) because it is compact and simple, but feel free to try any Relay model you like. We import the model with a dynamic batch dimension by passing `batch_size=relay.Any()`. Below, we print the model's representation.

```python
dshape = (1, 28, 28)
relay_mod, params_dict = testing.mlp.get_workload(batch_size=relay.Any())
print(relay_mod)
```

```
def @main(%data: Tensor[(?, 1, 28, 28), float32] /* ty=Tensor[(?, 1, 28, 28), float32] */, %fc1_weight: Tensor[(128, 784), float32] /* ty=Tensor[(128, 784), float32] */, %fc1_bias: Tensor[(128), float32] /* ty=Tensor[(128), float32] */, %fc2_weight: Tensor[(64, 128), float32] /* ty=Tensor[(64, 128), float32] */, %fc2_bias: Tensor[(64), float32] /* ty=Tensor[(64), float32] */, %fc3_weight: Tensor[(10, 64), float32] /* ty=Tensor[(10, 64), float32] */, %fc3_bias: Tensor[(10), float32] /* ty=Tensor[(10), float32] */) -> Tensor[(?, 10), float32] {
  %0 = nn.batch_flatten(%data) /* ty=Tensor[(?, 784), float32] */;
  %1 = nn.dense(%0, %fc1_weight, units=128) /* ty=Tensor[(?, 128), float32] */;
  %2 = nn.bias_add(%1, %fc1_bias, axis=-1) /* ty=Tensor[(?, 128), float32] */;
  %3 = nn.relu(%2) /* ty=Tensor[(?, 128), float32] */;
  %4 = nn.dense(%3, %fc2_weight, units=64) /* ty=Tensor[(?, 64), float32] */;
  %5 = nn.bias_add(%4, %fc2_bias, axis=-1) /* ty=Tensor[(?, 64), float32] */;
  %6 = nn.relu
```


Let's inspect the Relay IR representation of the model. It uses 10 Relay operators, such as `nn.batch_flatten`, `nn.dense`, and `nn.relu`, applied in a feedforward fashion: the output of each operator is the input to the next. Because we imported the model with a dynamic batch dimension, the `%data` input has shape `(?, 1, 28, 28)` and the model's output has shape `(?, 10)`, corresponding to the 10 output classes.
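The NumPy sketch below shows how the shapes flow through the same feedforward chain. It is our own reference, not code that TVM generates. The weight shapes come from the `@main` signature above. Because the printed IR is cut off after `%6`, the third layer is assumed to follow the same `nn.dense` + `nn.bias_add` pattern as the first two, and any later operators are omitted. `make_params` and `mlp_logits` are hypothetical helper names.

```python
import numpy as np

def dense(x, w):
    # nn.dense multiplies by the transposed weight: w has shape (units, in_features)
    return x @ w.T

def make_params(rng):
    # Random weights with the shapes shown in the printed @main signature
    shapes = {
        "fc1_weight": (128, 784), "fc1_bias": (128,),
        "fc2_weight": (64, 128), "fc2_bias": (64,),
        "fc3_weight": (10, 64), "fc3_bias": (10,),
    }
    return {name: rng.random(s, dtype=np.float32) for name, s in shapes.items()}

def mlp_logits(data, p):
    n = data.shape[0]                               # the dynamic "?" batch dimension
    x = data.reshape(n, -1)                         # nn.batch_flatten -> (n, 784)
    x = dense(x, p["fc1_weight"]) + p["fc1_bias"]   # nn.dense + nn.bias_add -> (n, 128)
    x = np.maximum(x, 0)                            # nn.relu
    x = dense(x, p["fc2_weight"]) + p["fc2_bias"]   # nn.dense + nn.bias_add -> (n, 64)
    x = np.maximum(x, 0)                            # nn.relu
    # Assumption: a third dense + bias_add produces (n, 10); operators after this
    # point are cut off in the printed IR and are deliberately omitted here.
    return dense(x, p["fc3_weight"]) + p["fc3_bias"]

rng = np.random.default_rng(0)
params = make_params(rng)
for n in (1, 2, 5):
    out = mlp_logits(rng.random((n, 1, 28, 28), dtype=np.float32), params)
    print(n, out.shape)  # the batch size varies; the trailing dimension is always 10
```

With different batch sizes, the leading `?` dimension passes unchanged through every layer, and the trailing dimension always ends at 10.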

### Converting Relay to Relax

Now let's convert the imported Relay module into a Relax module. We can do this automatically with the built-in [Relay -> Relax converter](https://github.com/tlc-pack/relax/blob/relax/python/tvm/relax/testing/relay_translator.py) utility. The resulting Relax module contains a Relax function corresponding to the Relay function above. It also contains TIR implementations of the operators that the Relay model calls, since Relax can call TIR functions directly. These TIR implementations must be generated for a specific target, so the converter takes a target as an argument. In this case, we lower to LLVM.

Let's inspect the resulting module, starting with the entrypoint function, `main()`.

```python
target = tvm.target.Target("llvm")
relax_mod = relay_translator.from_relay(relay_mod["main"], target)

# To look at the entire Relax IR module you can dump it using the following code.
# print(relax_mod)

print(relax_mod["main"])
```

```
@relax.function
def main(data: Tensor((d, 1, 28, 28), "float32"), fc1_weight: Tensor((128, 784), "float32"), fc1_bias: Tensor((128,), "float32"), fc2_weight: Tensor((64, 128), "float32"), fc2_bias: Tensor((64,), "float32"), fc3_weight: Tensor((10, 64), "float32"), fc3_bias: Tensor((10,), "float32")) -> Tensor(None, "float32", ndim = 2):
    # block 0
    with relax.dataflow():
        lv = relax.call_tir(batch_flatten, (data,), (d, 784), dtype="float32")
        lv1 = relax.call_tir(dense, (lv, fc1_weight), (d, 128), dtype="float32")
        lv2 = relax.call_tir(expand_dims, (fc1_bias,), (1, 128), dtype="float32")
        lv3 = relax.call_tir(add, (lv1, lv2), (d, 128), dtype="float32")
        lv4 = relax.call_tir(relu, (lv3,), (d, 128), dtype="float32")
        lv5 = relax.call_tir(dense1, (lv4, fc2_weight), (d, 64), dtype="float32")
        lv6 = relax.call_tir(expand_dims1, (fc2_bias,), (1, 64), dtype="float32")
        lv7 = relax.call_tir(add1, (lv5, lv6), (d, 64), dtype="float32"
```

#### Relax Functions

The function `main()` is printed in TVMScript format and begins with the `@relax.function` decorator. Despite the different format, the Relax `main()` function resembles its Relay counterpart in many ways, but it also shows off some fundamental new features. The following sections discuss several of these differences and what they mean. Many of them will be covered in more detail in upcoming tutorials.

#### Dataflow Block

Notice that the model code is wrapped in a `with relax.dataflow()` construct. Relax enforces certain guarantees within a dataflow block. Every operation inside the block is side-effect free, and the block contains no control flow constructs (e.g., if-then-else) and no nested scopes. The original Relay model has no side effects, branching control flow, or nested scopes, so all of its functionality can safely live in this block.

A dataflow block can effectively be viewed as a computational graph embedded in the program. Most of the variables bound within the dataflow block (`lv`, `lv1`, `lv2`, `lv3`, and so on) are "local", which means they are only visible within the block. These variables can be viewed as the internal nodes of the computational graph. We can mark a variable as an output, as is done with `gv` (in the part of the block cut off in the output above). An output variable remains visible in later parts of the program, and output variables can be viewed as the output nodes of the computational graph.
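The graph view can be illustrated with a toy model in plain Python. The encoding below is hypothetical and is not a Relax API. Each binding records the variable it defines and the values it consumes. For illustration, the block is cut after the first layer, and the `relu` result is renamed `gv` and marked as the block output.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Binding:
    var: str     # variable bound inside the block, e.g. "lv1"
    op: str      # callee, e.g. "dense"
    args: tuple  # names of the values it consumes

# Hypothetical toy version of the first layer of the block above;
# "gv" stands in for the block's output variable.
block = [
    Binding("lv", "batch_flatten", ("data",)),
    Binding("lv1", "dense", ("lv", "fc1_weight")),
    Binding("lv2", "expand_dims", ("fc1_bias",)),
    Binding("lv3", "add", ("lv1", "lv2")),
    Binding("gv", "relu", ("lv3",)),
]
outputs = {"gv"}

def graph_edges(block):
    """Producer -> consumer edges between values bound inside the block.

    Function parameters such as `data` come from outside the block, so they are
    inputs to the graph rather than producers of edges.
    """
    bound = {b.var for b in block}
    return {(a, b.var) for b in block for a in b.args if a in bound}

def split_nodes(block, outputs):
    """Separate local (internal) nodes from output nodes visible after the block."""
    local = [b.var for b in block if b.var not in outputs]
    exported = [b.var for b in block if b.var in outputs]
    return local, exported

print(sorted(graph_edges(block)))
print(split_nodes(block, outputs))
```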

Note that `return gv` is outside the dataflow block. Code outside a dataflow block can have side effects, so further analysis is needed to decide whether optimizations such as reordering bindings are safe. We expect most optimizations to be implemented on dataflow blocks, where they can rely on these safety guarantees. ML engineers who are familiar with computational graphs can write such optimizations. The ability to isolate and represent effectful components also opens up more advanced optimizations in the places that need them.
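The toy functions below show why purity matters. They are hypothetical and not part of Relax. They perform two rewrites that are only safe inside a dataflow block. The first drops a binding whose result is never used. The second checks whether a reordering still defines every value before it is used.

```python
# Toy bindings as (var, op, args) tuples: a hypothetical encoding, not Relax's API.
block = [
    ("lv", "batch_flatten", ("data",)),
    ("lv1", "dense", ("lv", "w1")),
    ("lv2", "expand_dims", ("b1",)),
    ("unused", "relu", ("lv",)),      # computed but never consumed
    ("lv3", "add", ("lv1", "lv2")),
    ("gv", "relu", ("lv3",)),
]

def eliminate_dead_bindings(block, outputs):
    """Drop bindings whose results cannot reach an output.

    This is sound only because every binding is side-effect free. Outside a
    dataflow block, an "unused" call might still perform an observable effect.
    """
    live, kept = set(outputs), []
    for var, op, args in reversed(block):
        if var in live:
            kept.append((var, op, args))
            live.update(args)
    return kept[::-1]

def is_valid_order(block, inputs):
    """For pure bindings, any order is acceptable if each use follows its definition."""
    defined = set(inputs)
    for var, _, args in block:
        if not set(args) <= defined:
            return False
        defined.add(var)
    return True

pruned = eliminate_dead_bindings(block, {"gv"})
print([var for var, _, _ in pruned])
```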

#### Symbolic Shape Dimensions

Another difference between the two models appears in the type signatures. The "any" dimension in the Relay module has been replaced with a symbolic dimension in the Relax module: `%data: Tensor[(?, 1, 28, 28), float32]` in Relay became `data: Tensor((d, 1, 28, 28), "float32")` in Relax. This example only has a symbolic batch dimension, but Relax allows any dimension in a tensor shape to be symbolic.

Relax's symbolic dimensions have an important advantage over Relay's single wildcard `?` dimension: they can express relationships between the shapes of different tensors in the model. For example, the Relax program tells us that `data` and `lv` share the same first dimension `d`. In Relay, by contrast, each `?` is independent, so the type system cannot tell whether two `?` dimensions are equal. This more precise information can enable better memory planning for models with dynamic shapes, and it can express more correctness properties than Relay's "any" dimension. We will cover symbolic shapes in more detail in a future tutorial.
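A small shape matcher shows the benefit of named dimensions. This is a hypothetical illustration, not Relax's actual shape inference. Strings such as `"d"` act like Relax symbolic dimensions and must take the same value everywhere they appear. `"?"` models Relay's wildcard: it matches any value and records nothing.

```python
def match_shape(pattern, concrete, env):
    """Match a shape pattern against a concrete shape, binding symbolic dims.

    "?" matches anything (like Relay's Any); other strings are symbolic variables
    (like Relax's `d`) that must agree with earlier bindings; ints must match exactly.
    Returns an updated copy of `env`, or raises ValueError on a mismatch.
    """
    if len(pattern) != len(concrete):
        raise ValueError(f"rank mismatch: {pattern} vs {concrete}")
    env = dict(env)
    for p, c in zip(pattern, concrete):
        if p == "?":
            continue                      # wildcard: no relationship is recorded
        if isinstance(p, str):
            if env.setdefault(p, c) != c:
                raise ValueError(f"{p} is already bound to {env[p]}, got {c}")
        elif p != c:
            raise ValueError(f"expected {p}, got {c}")
    return env

def resolve(pattern, env):
    """Substitute bound symbolic dims to get a concrete shape."""
    return tuple(env[p] if isinstance(p, str) else p for p in pattern)

env = match_shape(("d", 1, 28, 28), (2, 1, 28, 28), {})  # `data` binds d = 2
lv_shape = resolve(("d", 784), env)                       # `lv` must then be (2, 784)
print(env, lv_shape, lv_shape[0] * lv_shape[1] * 4, "bytes as float32")
```

Once `d` is bound, the size of every buffer that depends on `d` is known, and this is the kind of information memory planning can use. With `"?"` in its place, the matcher would accept a mismatched batch size without complaint.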

#### Direct Interaction with TensorIR

One last difference is that the Relay model calls into a set of predefined operators, whereas the Relax model calls directly into TensorIR implementations of the corresponding operators. In Relax, the high-level IR can interact directly with the lower-level TensorIR and call into it. It can also invoke arbitrary `PackedFunc`s, although this example does not use that capability.

For example, the line `lv = relax.call_tir(batch_flatten, (data,), (d, 784), dtype="float32")` calls the TensorIR function `batch_flatten` directly. The argument to the TensorIR function is `data`, and the output is expected to be a tensor of shape `(d, 784)` with dtype `float32`. As shown below, the Relax module does contain a TIR definition of `batch_flatten`.
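One way to read `call_tir` is as destination-passing style. The caller allocates an output tensor with the given shape and dtype and passes it to the TIR function after the inputs. The TIR function then writes its result into that tensor. This reading matches the `batch_flatten` PrimFunc printed below, which takes both an input buffer and an output buffer as parameters. The NumPy sketch below models these semantics only. `call_tir_sketch` and `batch_flatten_prim` are hypothetical names, not TVM functions.

```python
import numpy as np

def call_tir_sketch(prim_func, args, out_shape, dtype):
    """NumPy model of call_tir's destination-passing convention (not TVM's code)."""
    out = np.empty(out_shape, dtype=dtype)  # the caller allocates the result
    prim_func(*args, out)                   # the callee fills it in place and returns nothing
    return out

def batch_flatten_prim(x, out):
    # Hypothetical stand-in for the TIR `batch_flatten`: writes into `out`
    out[...] = x.reshape(out.shape)

data = np.random.rand(2, 1, 28, 28).astype(np.float32)
d = data.shape[0]
lv = call_tir_sketch(batch_flatten_prim, (data,), (d, 784), "float32")
print(lv.shape, lv.dtype)
```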

```python
print(relax_mod["batch_flatten"])
```

```
primfn(var_rxplaceholder: handle, var_tensor: handle) -> ()
  attr = {"global_symbol": "batch_flatten", "tir.noalias": True}
  buffers = {rxplaceholder: Buffer(rxplaceholder_1: Pointer(global float32), float32, [d: int64, 1i64, 28i64, 28i64], []),
             tensor: Buffer(tensor_1: Pointer(global float32), float32, [d, 784i64], [])}
  buffer_map = {var_rxplaceholder: rxplaceholder, var_tensor: tensor} {
  block([], "root") {
    tir.reads([])
    tir.writes([])
    for (i0: int64, 0i64, d) {
      for (i1: int64, 0i64, 784i64) {
        block([d, 784i64], "tensor") as [ax0, ax1] {
          bind(ax0, i0)
          bind(ax1, i1)
          tir.reads([rxplaceholder[ax0, 0i64, floordiv(floormod(ax1, 784i64), 28i64), floormod(ax1, 28i64)]])
          tir.writes([tensor[ax0, ax1]])
          tensor[ax0, ax1] = rxplaceholder[ax0, 0i64, floordiv(floormod(ax1, 784i64), 28i64), floormod(ax1, 28i64)]
      }
    }
}
```



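Relax functions are printed in TVMScript with the `@relax.function` decorator. TIR functions are printed in TIR's text format instead and start with `primfn(...`. When the whole module is printed, each one appears as `@<func_name> = primfn(...`.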
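The loop nest is easier to follow when rewritten as ordinary Python. `floordiv` and `floormod` correspond to Python's `//` and `%`, and the `bind` lines simply name the loop variables as block axes. The transcription below is our own sketch, and `batch_flatten_loops` is a hypothetical name. Comparing its output with `reshape` confirms that the index arithmetic is a row-major flatten of the `(d, 1, 28, 28)` input.

```python
import numpy as np

def batch_flatten_loops(rxplaceholder):
    """Pure-Python transcription of the TIR loop nest printed above."""
    d = rxplaceholder.shape[0]
    tensor = np.empty((d, 784), dtype=rxplaceholder.dtype)
    for i0 in range(d):              # for (i0: int64, 0i64, d)
        for i1 in range(784):        # for (i1: int64, 0i64, 784i64)
            ax0, ax1 = i0, i1        # bind(ax0, i0); bind(ax1, i1)
            # floordiv(floormod(ax1, 784), 28) picks the row, floormod(ax1, 28) the column
            tensor[ax0, ax1] = rxplaceholder[ax0, 0, (ax1 % 784) // 28, ax1 % 28]
    return tensor

x = np.random.rand(3, 1, 28, 28).astype(np.float32)
print(np.array_equal(batch_flatten_loops(x), x.reshape(3, 784)))
```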

The ability to call TensorIR functions directly is very convenient for experimenting with new operators, compared to the relative complexity of registering new operators in Relay. It also provides many further advantages in compilation, such as the following capabilities (among others):

* Incrementally lowering different parts of the program using different strategies
* Communicating layout rewriting and transformations at the TIR level back to the high-level IR to inform further optimizations
* Easier incorporation of the BYOC compilation flow into the lower levels of the stack (by transforming part of the graph into calls to opaque packed functions)

### Compiling a Relax Module

Relax has a simple API for compiling a Relax module into a VM executable, similar to the existing VM compilation for Relay modules. We can inspect the executable with `ex.stats()`, which summarizes its contents (such as the constant pool), and with `ex.as_text()`, which dumps it as text.

```python
# Get params and input for the module
batch_size = 2
shape = (batch_size, *dshape)
data = tvm.nd.array(np.random.rand(*shape).astype(np.float32))
params = list(params_dict.values())

# Build the Relax IRModule
ex = relax.vm.build(relax_mod, target)

print(ex.stats())
print(ex.as_text())
```

```
Relax VM executable statistics:
  Constant pool (# 91): [shapetuple[30], shapetuple[0, 1, 2, 3], shapetuple[4, 5], shapetuple[6], shapetuple[7, 8], shapetuple[9], shapetuple[10, 11], shapetuple[12], shapetuple[13], float32, shapetuple[0, 14], shapetuple[0, 14], float32, shapetuple[0, 14], shapetuple[0, 14], shapetuple[15], float32, shapetuple[0, 4], shapetuple[0, 4], float32, shapetuple[0, 4], shapetuple[0, 4], shapetuple[512], float32, shapetuple[1, 128], float32, shapetuple[18], float32, shapetuple[0, 4], shapetuple[0, 4], float32, shapetuple[0, 4], shapetuple[0, 4], shapetuple[19], float32, shapetuple[0, 4], shapetuple[0, 4], float32, shapetuple[0, 4], shapetuple[0, 4], shapetuple[20], float32, shapetuple[0, 7], shapetuple[0, 7], float32, shapetuple[0, 7], shapetuple[0, 7], shapetuple[256], float32, shapetuple[1, 64], float32, shapetuple[23], float32, shapetuple[0, 7], shapetuple[0, 7], float32, shapetuple[0, 7], shapetuple[0, 7], shapetuple[24], float32, shapetuple[0, 7], shapetupl
```

### Execute the Relax IR Module

Let's run both models and compare the results so we can be certain that the conversion has preserved the semantics.

```python
import tvm.testing  # make the tvm.testing submodule available explicitly

vm = relax.VirtualMachine(ex, tvm.cpu())
res = vm["main"](data, *params)

# Check correctness by comparing against the Relay VM result
exe = relay.vm.compile(relay_mod, target)
relay_vm = vm_rt.VirtualMachine(exe, tvm.cpu())
inputs = [data] + params
expected_output = relay_vm.run(*inputs)
tvm.testing.assert_allclose(res.numpy(), expected_output.numpy())
```

```
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', ('TENSOR', ({any_dim|any_dim>=0}, 784), 'float32'), ('TENSOR', (128, 784), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', ('TENSOR', ({any_dim|any_dim>=0}, 128), 'float32'), ('TENSOR', (64, 128), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', ('TENSOR', ({any_dim|any_dim>=0}, 64), 'float32'), ('TENSOR', (10, 64), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
  "target_host parameter is going to be deprecated. "
```
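The final check passes only if every element of the Relax result is close to the Relay result. The two paths can use different kernels; the warnings above, for example, report a fallback `dense_pack.x86` configuration on the Relay side. For that reason, exact float32 equality is not something to rely on in general. The sketch below spells out the elementwise criterion `|actual - desired| <= atol + rtol * |desired|` that NumPy documents for `numpy.testing.assert_allclose`. It assumes that `tvm.testing.assert_allclose` applies the same rule. `within_tolerance` is a hypothetical helper, it ignores NumPy's NaN handling, and the sample values are made up.

```python
import numpy as np

def within_tolerance(actual, desired, rtol, atol):
    """Elementwise check |actual - desired| <= atol + rtol * |desired|.

    The relative term scales with `desired` only, so the test is asymmetric.
    NaN handling (which NumPy's assert_allclose supports) is not modeled here.
    """
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    if actual.shape != desired.shape:
        return False
    return bool(np.all(np.abs(actual - desired) <= atol + rtol * np.abs(desired)))

relay_like = np.array([0.1, 0.7, 0.2], dtype=np.float32)  # made-up values
print(within_tolerance(relay_like + np.float32(1e-7), relay_like, rtol=1e-5, atol=1e-6))
print(within_tolerance(relay_like + np.float32(1e-3), relay_like, rtol=1e-5, atol=1e-6))
```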
