# Groq API - Linear Tutorial

This tutorial implements a Linear Layer consisting of:
    Matmul -> Bias Add -> ReLU

By the end of this tutorial, you should feel comfortable with the following concepts:
* VXM Chaining

It is expected that you have finished reading the Linear Layer section of the Groq API Tutorial Guide prior to going through this tutorial. 

## Build Your Program
Begin by importing the following packages. Since we'll be using the matmul and ReLU components from the Neural Net library, we import groq.api.nn as well. 

In [None]:
import groq.api as g
import groq.api.nn as nn
import numpy as np
from groq.runner import tsp

print("Python packages imported successfully")

## Prepare Data for Program
First, we'll set up our input data. We've selected a size of 320 so that we can show a full MXM matrix multiplication. To fully exercise our linear layer, we need data with both positive and negative numbers, so we start with a random sample between 0 and 1 and subtract 0.5 to get a range between -0.5 and 0.5. This ensures the ReLU has a visible effect: it sets negative values to zero and passes positive values through unchanged. 

In [None]:
size = 320
v_shape = (1, size)       # Vector shape for the bias add
mat_shape = (size, size)  # Size for Input data and weights

input_data = np.random.random_sample(size=mat_shape).astype(np.float16) - 0.5

bias_data = np.random.random_sample(size=v_shape).astype(np.float32)  # Bias is FP32 as the matmul from the MXM returns results in FP32
weights_data = np.random.random_sample(size=mat_shape).astype(np.float16) - 0.5
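
As a quick sanity check on the data described above, the following NumPy-only sketch (it does not touch the Groq API) confirms that shifting the samples by 0.5 yields values of both signs, so the ReLU will have something to clip. The names `sample` and `clipped` and the fixed seed are illustrative assumptions and are not part of the tutorial's program.

```python
import numpy as np

mat_shape = (320, 320)  # same shape as the tutorial's input matrix

np.random.seed(0)  # fixed seed only so the sketch is repeatable (an assumption)
sample = np.random.random_sample(size=mat_shape).astype(np.float16) - 0.5

# After the shift the values lie in [-0.5, 0.5]. The upper bound can be reached
# because the FP16 cast may round a sample just below 1.0 up to 1.0.
print("min:", sample.min(), "max:", sample.max())

# Count how many entries a ReLU would zero out versus pass through.
clipped = np.maximum(sample, 0)
print("negative entries:", int((sample < 0).sum()))
print("positive entries:", int((sample > 0).sum()))
print("entries zeroed by ReLU:", int(((sample < 0) & (clipped == 0)).sum()))
```

Roughly half of the entries should be negative, which is what makes the ReLU stage observable in the final result.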

Next, we'll create an input data placeholder for the activations of the matmul. We use the recommended layout for the matmul activations for FP16. This is discussed in the Multi-Matmul Tutorial or can be referenced in the Groq API Reference Guide for nn.matmul(). 

In [None]:
input_mt = g.input_tensor(
    shape=mat_shape,
    name="input_matrix",            # always name when possible
    layout="H1(W), -1, S2(4-38)",   # layout to match the activation matrix. 
    dtype=g.float16,
)

The following instantiates the top level for the Linear Layer. Prior to implementation, it's important to make a VXM chaining plan. The Groq API Tutorial Guide supplements the information provided here:

* nn.matmul() reserves ALUs \[4] and \[5] to finish the matmul operation and the data leaves the West MXM, arriving at the VXM on StreamGroups 3E and 4E for the lower and upper byte planes (Byte Plane 0 and Byte Plane 1, respectively).
    * Matmul results are combined in ALU\[4] 
    * and accumulated in ALU\[5]

Since we're implementing a Linear Layer that includes Bias and ReLU operations, we need two more StreamGroups for the following:
 * bias -> Add tensor
 * relu (max) -> Zero tensor (which will check for values less than zero and mask with the zero tensor)
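
In NumPy terms, the two extra operands can be pictured as follows. This is only a host-side analogy, under the assumption that the bias broadcasts across rows and that the ReLU is an elementwise max against a tensor of zeros. The array names are hypothetical and no Groq API calls are involved.

```python
import numpy as np

size = 4  # small illustrative size (the tutorial uses 320)

# Stand-in for the FP32 matmul result: every row holds the same four values.
product = np.array([[-1.0, 0.5, 2.0, -0.25]] * size, dtype=np.float32)
bias = np.full((1, size), 0.25, dtype=np.float32)  # plays the role of the add tensor
zeros = np.zeros((1, size), dtype=np.float32)      # plays the role of the zero tensor

biased = product + bias           # the (1, size) bias broadcasts over every row
relu = np.maximum(biased, zeros)  # elementwise max against the zero tensor

# First row after bias add and ReLU holds the values 0, 0.75, 2.25, 0.
print(relu[0])
```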

The `max` instruction within the ReLU requires a large ALU. The `add` for the `bias_add` can be performed in either a large or small ALU. 
Thus we'll choose StreamGroup 2E for the add tensor and StreamGroup 5W for the zero tensor, where E and W represent the direction the data is heading (eastward/westward). We will also account for casting the final result from FP32 to FP16 before returning it to the host, which requires a large ALU. To summarize, we have the following StreamGroups in use:

 * 2E for the bias-add tensor
 * 3E for the MXM results from the lower byte plane (Byte Plane 0)
 * 4E for the MXM results from the upper byte plane (Byte Plane 1)
 * 5W for the zero tensor, which we plan to bring in from the east MEM
 * 2W for the result

And the following ALUs:

 * \[4] MXM combining
 * \[5] MXM accumulation (used when Matrix size is larger than 320)
 * \[6] Bias Add
 * \[7] ReLU
 * \[2] Cast FP32 to FP16

The VXM chaining plan is shown in the following figure:

 ![title](LinearVXMChain.jpg)
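
One way to keep a chaining plan like this honest before writing any Groq API code is to record it as plain data and check it for conflicts. The sketch below is ordinary Python bookkeeping, not part of the Groq API. The dictionary names and the `find_duplicates` helper are hypothetical.

```python
from collections import Counter

# StreamGroup assignments from the plan above, written as (index, direction).
stream_plan = {
    "bias_add_tensor": (2, "E"),
    "mxm_result_sg3": (3, "E"),
    "mxm_result_sg4": (4, "E"),
    "zero_tensor": (5, "W"),
    "result": (2, "W"),
}

# ALU assignments from the plan above.
alu_plan = {
    "mxm_combine": 4,
    "mxm_accumulate": 5,
    "bias_add": 6,
    "relu": 7,
    "cast_fp32_to_fp16": 2,
}

def find_duplicates(plan):
    """Return the resources that are claimed by more than one role."""
    counts = Counter(plan.values())
    return sorted(resource for resource, n in counts.items() if n > 1)

print("StreamGroup conflicts:", find_duplicates(stream_plan))
print("ALU conflicts:", find_duplicates(alu_plan))
```

This check only flags two roles listed on the same resource. It does not model time, so a deliberate reuse of a StreamGroup at a later stage of the chain would need to be recorded separately.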

In [None]:
class LinearTopLevel(g.Component):
    def __init__(self, weights, bias, **kwargs):
        super().__init__(**kwargs)

        self.weights_data = weights
        self.bias_data = bias

        self.add_alus = [6]     # ALU[6]: see diagram above
        self.relu_alus = [7]    # ALU[7]
        self.cast_alus = [2]    # ALU[2]

        self.matmul = nn.MatMul(
            # Assign an output StreamGroup so the backend API does not pick
            # something different from the VXM chaining plan.
            name="matmul",
            planes=[0, 1],
            use_vxm_accum=True,
            out_strm_rq=g.SG4_E[3],
            arith_mode_warmup=True,
        )
        self.relu = nn.ReLU(
            alus=self.relu_alus,
            input_stream_req=g.SG4_E[2],    # bias_add result will be on StreamGroup 2E
            output_stream_req=g.SG4_W[2],   # output ReLU result on StreamGroup 2W
            zero_stream_req=g.SG4_W[5],     # bring the zero tensor from the east MEM location heading west
        )

    def build(self, input_mt, time=0, **kwargs):
        super().build(**kwargs)

        weights_mt = g.from_data(
            data=self.weights_data,  
            name="weights_mt",
            layout="H1(W), -1, S16(4-38)",  #layout for matmul weights
        )

        bias_mt = g.from_data(
            data=self.bias_data, name="bias_mt", layout="H1(W), -1, S4"  # we're using an eastbound stream, so we ensure the bias tensor is in west MEM  
        )
        bias_st = bias_mt.read(streams=g.SG4_E[2])  # Read bias tensor. 

        product_st = self.matmul(input_mt, weights_mt, time=time)  # forward the time passed to build()

        # Add bias_add tensor with the matmul results
        sum_st = product_st.add(
            bias_st,
            input_streams=[g.SG4_E[3], g.SG4_E[2]],  # Use StreamGroups 2E and 3E to input to the ALU
            output_streams=g.SG4_E[2],               # Output result onto StreamGroup 2E
            alus=self.add_alus,     #ALU6
        )
        relu_st = self.relu(sum_st) # perform ReLU
        result_st = g.cast(relu_st, dtype=g.float16, fp16_inf=False, alus=self.cast_alus) #Cast results to FP16

        result_mt = result_st.write(name="result", layout="H1(W), -1, S2")  #Write to MEM in West hemisphere. 

        return result_mt

In [None]:
unit = LinearTopLevel(weights_data, bias_data)    # instantiate the top level component
result_mt = unit(input_mt, time=0)    # call into the instance of the top level, providing your inputs and time

Now that we've instantiated and built the Linear Top Level, we can compile our program.

In [None]:
iop_file = g.compile(
    base_name="linear_test", result_tensor=[result_mt]
)

## GroqView
GroqView can be used to view the instructions of your program in the GroqChip. Note: it is expected that you are familiar with the GroqView tool (see the "GroqView User Guide") for this section of the tutorial. You may skip viewing the program in GroqView and move to the "Run on Hardware" section.

Using the following command, we can create a .json file that can be used to view the program in hardware. This will show:
* what instructions occur
* where on the chip they take place, as well as 
* when in time (cycles) each instruction occurs.

To launch GroqView, uncomment and run the following command. Remember, you still need to create a tunnel to the server running the GroqView tool to load in another window. 

In [None]:
g.write_visualizer_data("linear")
#!groqview linear/visdata.json

<b>Note:</b> before proceeding to the next section, you'll want to stop the above cell. 

## Run on Hardware
Program the GroqChip with the binary file of the Linear Layer program.

In [None]:
program = tsp.create_tsp_runner(iop_file)

Provide the input data to the GroqChip, which will return the results of the linear layer.

In [None]:
result = program(input_matrix=input_data)

## Check Results

In [None]:
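def linear_np(activations, weights, bias):
    # NumPy reference: matmul against the transposed weights, bias add, ReLU, cast to FP16
    product = np.matmul(activations, weights.transpose())
    biased = product + bias
    result = np.maximum(biased, 0).astype(dtype=np.float16)
    return result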

oracle = linear_np(input_data, weights_data, bias_data)

In [None]:
print("Linear layer with all dimensions set to {}.  Results are: ".format(size))
print(np.allclose(oracle, result['result'], rtol=1e-1, atol=1e-1, equal_nan=True))
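
`np.allclose` only reports pass or fail. When a comparison fails, or when choosing tolerances, it can help to look at the size of the error directly. The sketch below is self-contained NumPy with synthetic stand-in data. `describe_mismatch` is a hypothetical helper, and the FP32 variant of the reference is an assumption used here only to produce a second array to compare against, not a model of the hardware's arithmetic.

```python
import numpy as np

def describe_mismatch(expected, actual, rtol=1e-1, atol=1e-1):
    """Summarize how far `actual` is from `expected` (finite values assumed)."""
    expected = expected.astype(np.float32)
    actual = actual.astype(np.float32)
    abs_err = np.abs(actual - expected)
    # np.allclose(expected, actual) scales rtol by its second argument.
    within = abs_err <= atol + rtol * np.abs(actual)
    return {
        "max_abs_err": float(abs_err.max()),
        "num_outside_tol": int((~within).sum()),
        "all_close": bool(within.all()),
    }

np.random.seed(0)  # fixed seed only for repeatability
size = 320
x = np.random.random_sample((size, size)).astype(np.float16) - 0.5
w = np.random.random_sample((size, size)).astype(np.float16) - 0.5
b = np.random.random_sample((1, size)).astype(np.float32)

# Same steps as linear_np: the matmul of FP16 operands returns an FP16 array,
# so the product is rounded to FP16 before the bias is added.
ref_fp16 = np.maximum(np.matmul(x, w.transpose()) + b, 0).astype(np.float16)

# Variant that upcasts the operands to FP32 before the matmul.
ref_fp32 = np.maximum(
    np.matmul(x.astype(np.float32), w.astype(np.float32).transpose()) + b, 0
).astype(np.float16)

print(describe_mismatch(ref_fp16, ref_fp32))
```

In the tutorial itself, the same helper could be called as `describe_mismatch(oracle, result['result'])` to see how much headroom the chosen `rtol` and `atol` leave.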