# Groq API - Matrix Multiplication Tutorial

In this example, we'll show how to perform a matrix multiplication in the MXM module of the GroqChip. In the Adding Tensors and the Buffered Scopes tutorials, we created our own components (Add and Mul); this time we'll use the premade matmul component included with the Groq API Neural Net Library (NN).

By the end of this tutorial, you should feel comfortable with the following concepts:
* Matrix Multiplication on Groq hardware
* MXM: Matrix Execution Module
* Groq API Neural Net Library

Before starting this tutorial, you should have read the Intro to Matrix Multiplication section of the Groq API Tutorial Guide.

## Build a Program and Compile with Groq API
Begin by importing the following packages. For this example, in addition to the Groq API, we import the Neural Net library from the Groq API as `nn`.

In [None]:
import groq.api as g
import groq.api.nn as nn
from groq.runner import tsp
import numpy as np
print("Python packages imported successfully")

Create two input tensors as placeholders for the data we're going to multiply. The matrix multiply in the Neural Net library expects two rank-2 tensors and supports the int8 and float16 data types, as well as a special case for mixed float16/float32 (see the API Reference Guide for more). The API implicitly transposes the second tensor before performing the matmul operation. It also requires that the inner dimension of both memory tensors be the same.

In [None]:
matrix1 = g.input_tensor(shape=(120, 120), dtype=g.float16, name="matrix1", layout="H1(W), -1, S2")
matrix2 = g.input_tensor(shape=(120, 120), dtype=g.float16, name="matrix2", layout="H1(W), -1, S16(4-38)")

You'll see above that we include a layout for each input tensor. While not required for this simple example, specifying a layout is a best practice. The details of memory layouts are explained in the Multi Matmul Tutorial; for float16, the layouts shown above are recommended. See the API Reference Guide section on `nn.matmul()` for guidance.
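To make the implicit transpose and the inner-dimension requirement concrete, here is a minimal NumPy sketch that runs without Groq hardware. `reference_matmul` is a hypothetical helper written for illustration, not part of the Groq API, and it assumes that "inner dimension" means the last dimension of each rank-2 shape.

```python
import numpy as np

def reference_matmul(a, b):
    """NumPy stand-in (hypothetical) for a matmul that transposes its second operand."""
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError("both operands must be rank-2")
    # Assumption: the inner dimension is the last dimension of each shape.
    if a.shape[1] != b.shape[1]:
        raise ValueError(f"inner dimensions differ: {a.shape[1]} vs {b.shape[1]}")
    # Compute in float32, mirroring the float32 result for two float16 inputs.
    return np.matmul(a.astype(np.float32), b.astype(np.float32).T)

a = np.ones((3, 5), dtype=np.float16)  # shape (M, K)
b = np.ones((4, 5), dtype=np.float16)  # shape (N, K), transposed inside the helper
out = reference_matmul(a, b)
print(out.shape, out.dtype)  # (3, 4) float32
```

With shapes (M, K) and (N, K), the result has shape (M, N), which is why the two 120 x 120 inputs in this tutorial yield a 120 x 120 result.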

The following instantiates the Neural Net matmul component inside your top-level component. It is recommended to give your matmul operation a name to help with any future debugging.

In [None]:
class TopLevel(g.Component):  # Create our top level component
    def __init__(self):
        super().__init__()
        self.mm = nn.MatMul(name="MyMatMul", buffer_output=True, arith_mode_warmup=True)      #Matmul: using the nn.MatMul() component.

    def build(self, mat1_mt, mat2_mt, time=0):   #Provide input matrices and a default time
        with g.ResourceScope(name="mmscope", time=0) as mmscope:
            result_mt = self.mm(mat1_mt, mat2_mt, time=0)
            result_mt.name = "mm_result"
            result_mt.layout = "H1(W), -1, S4" #recommended layout for the matmul result (float32)
        return result_mt

In [None]:
top = TopLevel()    # instantiate the top level component
result = top(matrix1, matrix2, time=0)    # call into the instance of the top level, providing your inputs and time

Now that we've instantiated and built the MatMul component, we can compile our program.

In [None]:
iop_file = g.compile(base_name="matmul_tutorial", result_tensor=result)

## GroqView
GroqView can be used to view the instructions of your program in the GroqChip. This section assumes you are familiar with the GroqView tool (see the "GroqView User Guide"). You may skip viewing the program in GroqView and move to the "Prepare Data for Program" section.

Using the following command, we can create a .json file that can be used to view the program in hardware. This will show:
* which instructions occur,
* where on the chip they take place, and
* when in time (cycles) each instruction occurs.

In [None]:
g.write_visualizer_data("matmul")

To launch GroqView, uncomment and run the following command. Remember that you still need to create a tunnel to the server running the GroqView tool in order to load it in another window.

In [None]:
#!groqview matmul/visdata.json

<b>Note:</b> before proceeding to the next section, you'll want to stop the above cell. 

## Prepare Data for Program
Generate random float16 data matching the shape and data type of the two input tensors.

In [None]:
t1_data = np.random.rand(120, 120).astype(np.float16)
t2_data = np.random.rand(120, 120).astype(np.float16)

## Run on Hardware
Program the GroqChip with the binary file of the Matrix Multiply program.

In [None]:
program = tsp.create_tsp_runner(iop_file)

Provide the input data to the GroqChip, which will return the results of the matrix multiplication.

In [None]:
result = program(matrix1=t1_data, matrix2=t2_data)

## Check Results
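Note that the oracle value is float32 because the output of the MXM matrix multiply is float32 for two float16 inputs. The second matrix is transposed in the oracle to mirror the implicit transpose performed by the API.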

In [None]:
oracle = np.matmul(t1_data, t2_data.transpose(), dtype=np.float32)

In [None]:
print("Matrix Multiplication for input tensors of size {} x {}.  Results are: ".format(t1_data.shape, t2_data.shape))
print(np.allclose(oracle, result['mm_result'], rtol=1e-1, atol=1e-1, equal_nan=True))
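A single True or False from `np.allclose` hides how close the two results actually are. The sketch below reports the maximum and mean absolute error instead. `error_report` is a hypothetical helper, and `simulated` is only a stand-in for `result['mm_result']` (the float32 product rounded through float16) so that the example runs without hardware; it says nothing about the error you will see on a GroqChip.

```python
import numpy as np

def error_report(expected, actual):
    """Summarize the element-wise absolute error between two arrays (hypothetical helper)."""
    diff = np.abs(np.asarray(expected, dtype=np.float64) - np.asarray(actual, dtype=np.float64))
    return {"max_abs": float(diff.max()), "mean_abs": float(diff.mean())}

rng = np.random.default_rng(0)
a = rng.random((120, 120)).astype(np.float16)
b = rng.random((120, 120)).astype(np.float16)

# float32 reference with the second operand transposed, as in the oracle above.
expected = np.matmul(a.astype(np.float32), b.astype(np.float32).T)

# Stand-in for result['mm_result']: rounding through float16 injects a small error.
simulated = expected.astype(np.float16).astype(np.float32)

report = error_report(expected, simulated)
print(report)
print(np.allclose(expected, simulated, rtol=1e-1, atol=1e-1))
```

On hardware, you could pass `oracle` and `result['mm_result']` to the same helper to see how much headroom the `rtol` and `atol` tolerances leave.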

## Back to Back Computations
The GroqChip is still programmed with the matmul program, so we can continue to provide input data and it will return the results of the matmul. Now let's look at how we can call the same program repeatedly with different input tensors.

In [None]:
for i in range(3):
    print(f"Matrix Multiply {i}")
    t1_data = np.random.rand(120, 120).astype(np.float16)
    t2_data = np.random.rand(120, 120).astype(np.float16)
    result = program(matrix1=t1_data, matrix2=t2_data)
    oracle = np.matmul(t1_data, t2_data.transpose(), dtype=np.float32)
    print("For input tensors of size {} x {}. Results are: ".format(t1_data.shape, t2_data.shape))
    print(np.allclose(oracle, result['mm_result'], rtol=1e-1, atol=1e-1, equal_nan=True))

### Optional
Try different-sized matmuls and see what happens in the hardware using the GroqView tool.
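When experimenting with other sizes, it can help to generate the input data and the oracle from one set of dimensions. The following is a sketch using NumPy only; `make_case` is a hypothetical helper, and whether a given shape is supported by `nn.MatMul` or by the layouts used above is not confirmed here, so check the API Reference Guide. The shapes of the `g.input_tensor` placeholders would need to be changed to match, and the program recompiled.

```python
import numpy as np

def make_case(m, k, n, seed=0):
    """Build float16 inputs of shape (m, k) and (n, k) plus a float32 oracle of shape (m, n)."""
    rng = np.random.default_rng(seed)
    t1 = rng.random((m, k)).astype(np.float16)
    t2 = rng.random((n, k)).astype(np.float16)
    # The second operand is transposed, matching the oracle used in this tutorial.
    oracle = np.matmul(t1, t2.transpose(), dtype=np.float32)
    return t1, t2, oracle

# Example sizes chosen for illustration only.
for m, k, n in [(120, 120, 120), (64, 120, 32), (8, 16, 8)]:
    t1, t2, oracle = make_case(m, k, n)
    print(t1.shape, t2.shape, "->", oracle.shape, oracle.dtype)
```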