# JIT Engine: Tensor + Tensor

This example will go over how to compile MLIR code to a function callable from Python.

The example MLIR code we’ll use here performs element-wise tensor addition.

Let’s first import some necessary modules and generate an instance of our JIT engine.

In [1]:
import mlir_graphblas
import numpy as np

engine = mlir_graphblas.MlirJitEngine()

We'll use the same set of passes to optimize and compile all of our examples below.

In [2]:
passes = [
    "--linalg-bufferize",
    "--func-bufferize",
    "--tensor-bufferize",
    "--tensor-constant-bufferize",
    "--finalizing-bufferize",
    "--convert-linalg-to-loops",
    "--convert-scf-to-std",
    "--convert-memref-to-llvm",
    "--convert-std-to-llvm",
]
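
Each entry in `passes` is written in the style of a command-line flag for MLIR's `mlir-opt` tool, and the passes are applied in list order: the bufferization passes come first, followed by the lowering to loops and then to LLVM. As a rough illustration only, the sketch below builds a comparable `mlir-opt` command line. The helper `build_mlir_opt_command` and the file name `input.mlir` are hypothetical, and whether the engine invokes `mlir-opt` in exactly this way is an assumption. The flag names also depend on the MLIR version in use.

```python
import shlex

passes = [
    "--linalg-bufferize",
    "--func-bufferize",
    "--tensor-bufferize",
    "--tensor-constant-bufferize",
    "--finalizing-bufferize",
    "--convert-linalg-to-loops",
    "--convert-scf-to-std",
    "--convert-memref-to-llvm",
    "--convert-std-to-llvm",
]


def build_mlir_opt_command(passes, input_path="input.mlir"):
    """Hypothetical helper: assemble an mlir-opt argument list, keeping pass order."""
    return ["mlir-opt", *passes, input_path]


command = build_mlir_opt_command(passes)
# Print the command as it would be typed in a shell.
print(shlex.join(command))
```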

## Fixed-Size Tensor Addition

Here’s some MLIR code to add two 32-bit floating point tensors with the shape 2x3.

In [3]:
mlir_text = """
#trait_add = {
 indexing_maps = [
   affine_map<(i, j) -> (i, j)>,
   affine_map<(i, j) -> (i, j)>,
   affine_map<(i, j) -> (i, j)>
 ],
 iterator_types = ["parallel", "parallel"]
}

func @matrix_add_f32(%arga: tensor<2x3xf32>, %argb: tensor<2x3xf32>) -> tensor<2x3xf32> {
  %answer = linalg.generic #trait_add
    ins(%arga, %argb: tensor<2x3xf32>, tensor<2x3xf32>)
    outs(%arga: tensor<2x3xf32>) {
      ^bb(%a: f32, %b: f32, %s: f32):
        %sum = addf %a, %b : f32
        linalg.yield %sum : f32
  } -> tensor<2x3xf32>
  return %answer : tensor<2x3xf32>
}
"""

Let's compile our MLIR code. 

In [4]:
engine.add(mlir_text, passes)

['matrix_add_f32']
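
For intuition, the `linalg.generic` op in `matrix_add_f32` uses identity indexing maps and two parallel iterators, so it behaves like a doubly nested loop that adds matching elements. The following is a minimal pure-Python model of that computation on nested lists. The name `matrix_add_reference` is hypothetical, and this is only a sketch of the semantics, not of the code the compiler generates.

```python
def matrix_add_reference(a, b):
    """Pure-Python model of the linalg.generic body: out[i][j] = a[i][j] + b[i][j]."""
    rows, cols = len(a), len(a[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):        # first "parallel" iterator
        for j in range(cols):    # second "parallel" iterator
            # Corresponds to `addf %a, %b` followed by `linalg.yield %sum`.
            out[i][j] = a[i][j] + b[i][j]
    return out


a = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
b = [[100.0] * 3 for _ in range(2)]
print(matrix_add_reference(a, b))
# [[100.0, 101.0, 102.0], [103.0, 104.0, 105.0]]
```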

Let's try out our compiled function. 

In [5]:
# grab our callable
matrix_add_f32 = engine.matrix_add_f32

# generate inputs
a = np.arange(6, dtype=np.float32).reshape([2, 3])
b = np.full([2, 3], 100, dtype=np.float32)

# generate output
result = matrix_add_f32(a, b)

In [6]:
result

array([[100., 101., 102.],
       [103., 104., 105.]], dtype=float32)

Let's verify that our function works as expected.

In [7]:
np.all(result == np.add(a, b))

True

## Arbitrary-Size Tensor Addition

The above example created a function to add two matrices of size 2x3. This function won't work if we want to add two matrices of size 4x5 or any other size. 

In [8]:
a = np.arange(20, dtype=np.float32).reshape([4, 5])
b = np.full([4, 5], 100, dtype=np.float32)
matrix_add_f32(a, b)

ValueError: array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.]], dtype=float32) is expected to have size 2 in the 0th dimension but has size 4.

While it's nice that the JIT engine is able to detect that there's a size mismatch, it'd be nicer to have a function that can add two tensors of arbitrary size. 

We'll now show how to create such a function for matrices of 32-bit integers.

In [9]:
mlir_text = """
#trait_add = {
 indexing_maps = [
   affine_map<(i, j) -> (i, j)>,
   affine_map<(i, j) -> (i, j)>,
   affine_map<(i, j) -> (i, j)>
 ],
 iterator_types = ["parallel", "parallel"]
}

func @matrix_add_i32(%arga: tensor<?x?xi32>, %argb: tensor<?x?xi32>) -> tensor<?x?xi32> {
  // Find the max dimensions of both args
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %arga_dim0 = tensor.dim %arga, %c0 : tensor<?x?xi32>
  %arga_dim1 = tensor.dim %arga, %c1 : tensor<?x?xi32>
  %argb_dim0 = tensor.dim %argb, %c0 : tensor<?x?xi32>
  %argb_dim1 = tensor.dim %argb, %c1 : tensor<?x?xi32>
  %dim0_gt = cmpi "ugt", %arga_dim0, %argb_dim0 : index
  %dim1_gt = cmpi "ugt", %arga_dim1, %argb_dim1 : index
  %output_dim0 = select %dim0_gt, %arga_dim0, %argb_dim0 : index
  %output_dim1 = select %dim1_gt, %arga_dim1, %argb_dim1 : index
  %output_memref = memref.alloca(%output_dim0, %output_dim1) : memref<?x?xi32>
  %output_tensor = memref.tensor_load %output_memref : memref<?x?xi32>
  
  // Perform addition
  %answer = linalg.generic #trait_add
    ins(%arga, %argb: tensor<?x?xi32>, tensor<?x?xi32>)
    outs(%output_tensor: tensor<?x?xi32>) {
      ^bb(%a: i32, %b: i32, %s: i32):
        %sum = addi %a, %b : i32
        linalg.yield %sum : i32
    } -> tensor<?x?xi32>
 return %answer : tensor<?x?xi32>
}
"""

The compilation of this MLIR code will be the same as in our first example. The main difference is in how we wrote our MLIR code (notice the use of `?x?` when denoting the shapes of tensors).
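
To make the difference between a fixed shape such as `2x3` and a dynamic shape such as `?x?` concrete, here is a small sketch that models a declared shape as a tuple in which `None` stands for `?`. The helper `shape_matches` is hypothetical and is not how the JIT engine performs its checks. It only illustrates which inputs each kind of signature can accept.

```python
def shape_matches(actual_shape, declared_shape):
    """Return True if actual_shape fits declared_shape, where None stands for '?'."""
    if len(actual_shape) != len(declared_shape):
        return False
    return all(
        declared is None or declared == actual
        for actual, declared in zip(actual_shape, declared_shape)
    )


fixed = (2, 3)          # models tensor<2x3xf32>
dynamic = (None, None)  # models tensor<?x?xi32>

print(shape_matches((2, 3), fixed))    # True
print(shape_matches((4, 5), fixed))    # False
print(shape_matches((4, 5), dynamic))  # True
```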

In [10]:
# compile
engine.add(mlir_text, passes)
matrix_add_i32 = engine.matrix_add_i32

# generate inputs
a = np.arange(20, dtype=np.int32).reshape([4, 5])
b = np.full([4, 5], 100, dtype=np.int32)

# generate output
result = matrix_add_i32(a, b)

In [11]:
result

array([[100, 101, 102, 103, 104],
       [105, 106, 107, 108, 109],
       [110, 111, 112, 113, 114],
       [115, 116, 117, 118, 119]], dtype=int32)

In [12]:
assert np.all(result == np.add(a, b))

Note that we also get some type safety: the engine raises an exception if we pass in tensors with the wrong dtype.

In [13]:
matrix_add_i32(a, b.astype(np.int64))

TypeError: array([[100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100]]) is expected to have dtype <class 'numpy.int32'>

Note that in the MLIR code, each of our output tensor's dimensions is the max of each dimension of our inputs. 

A consequence of this is that our function doesn't enforce that our inputs are the same shape.

In [14]:
# generate differently shaped inputs
a = np.arange(6, dtype=np.int32).reshape([2, 3])
b = np.full([4, 5], 100, dtype=np.int32)

# generate output
result = matrix_add_i32(a, b)

In [15]:
result.shape

(4, 5)
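
The `(4, 5)` shape follows from the `cmpi`/`select` pairs in the MLIR code, which take the larger of the two inputs' sizes along each dimension. A minimal Python sketch of that calculation is shown here. The name `output_shape` is hypothetical and the function only mirrors the shape logic, not the addition itself.

```python
def output_shape(a_shape, b_shape):
    """Per-dimension max of two equal-rank shapes, mirroring the cmpi/select pairs."""
    return tuple(max(x, y) for x, y in zip(a_shape, b_shape))


print(output_shape((2, 3), (4, 5)))  # (4, 5)
```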

In [16]:
result

array([[       100,        101,        102, 1073743867,          6],
       [       103,        104,        105,          0,        101],
       [1852990827, 1714318437,  962869552,  761619769,  842032697],
       [1667511341,  926428470,  808280369, 1714829668,  946157106]],
      dtype=int32)

This result is somewhat unexpected. The odd numbers we see (the zeros and the large numbers) come from the uninitialized memory backing our output (i.e. `%output_memref`).

This is a flaw in how we wrote our MLIR code: nothing enforces that both inputs have the same shape. Special care must be taken when dealing with arbitrarily sized tensors, or else we might get bugs or unexpected results like the ones shown here.
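
One way to guard against this from the Python side is to check shapes before calling the compiled function. The sketch below is only an illustration under stated assumptions: `require_same_shape` is a hypothetical helper, not part of `mlir_graphblas`, and it assumes both arguments expose a `.shape` attribute, as NumPy arrays do.

```python
import functools


def require_same_shape(func):
    """Hypothetical guard: reject calls whose two arguments differ in shape."""

    @functools.wraps(func)
    def wrapper(a, b):
        # Assumes both inputs expose a `.shape` attribute (e.g. NumPy arrays).
        if a.shape != b.shape:
            raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
        return func(a, b)

    return wrapper
```

With this in place, `safe_matrix_add_i32 = require_same_shape(matrix_add_i32)` would raise a `ValueError` for the `2x3` and `4x5` inputs above instead of returning uninitialized values. The check could instead be written in the MLIR code itself, which would keep the function safe for callers that bypass the wrapper.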