# Groq API - Buffered Resource Scopes

In this example, we'll build on the Adding Tensors Tutorial by adding a multiplication operation. We'll read the first two input tensors from memory, add them together, and write the sum back to memory. We'll then read that sum from memory and multiply it element-wise by a third tensor, computing Result = (A + B) x C. For this design, we'll use Buffered Resource Scopes, which help with allocating and sharing resources on the chip. The flow of operations looks like this:

(Read Tensors) -> Add -> (Write Result) -> (Read Tensors) -> Mul -> (Write Result)
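
As a host-side reference for the flow above, the same two-stage arithmetic can be sketched in NumPy. This only illustrates the math (element-wise add, then element-wise multiply). The names `a`, `b`, `c`, and `tmp` are placeholders chosen for this sketch, and nothing here runs on the GroqChip.

```python
import numpy as np

# Placeholder inputs with the shape and dtype used later in this tutorial
a = np.full((2, 32), 1.5, dtype=np.float16)
b = np.full((2, 32), 2.0, dtype=np.float16)
c = np.full((2, 32), 4.0, dtype=np.float16)

# Stage 1: add, then keep the intermediate (the "write result" step)
tmp = a + b

# Stage 2: read the intermediate back and multiply element-wise
result = tmp * c

print(result.shape, result.dtype)
```

Note that `*` on NumPy arrays is element-wise, which mirrors the element-wise multiplication this tutorial performs, not a matrix product.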

By the end of this tutorial, you should feel comfortable with the following concepts:
* VXM Multiplication (Element Wise)
* Buffered Resource Scopes

It is expected that you have finished reading the Intro to Buffered Scopes section of the Groq API Tutorial Guide prior to going through this tutorial. 

## Build a program and compile with Groq API

Begin by importing the Groq API main module.

In [None]:
import groq.api as g
from groq.runner import tsp
import numpy as np
print("Python Packages imported successfully!")

We first declare the three tensors as input tensors. 

In [None]:
t1_mt = g.input_tensor(shape=(2, 32), dtype=g.float16, name="tensor_1")
t2_mt = g.input_tensor(shape=(2, 32), dtype=g.float16, name="tensor_2")
t3_mt = g.input_tensor(shape=(2, 32), dtype=g.float16, name="tensor_3")

### Create Buffered Resource Scopes 

Next, we'll reuse the component we created in the Adding Tensors Tutorial. This time, however, we'll include two Resource Scopes in the build step:

1. add_scope: wraps the Read -> Add -> Write
2. mul_scope: wraps the Read -> Mul -> Write

In both Resource Scopes we pass `is_buffered=True`, which is how a Resource Scope is marked as buffered. Because the multiplication must occur after the addition, we pass the addition's buffered resource scope in the `predecessors` list of the multiplication scope. This tells the API not to schedule the mul_scope until the add_scope has completed, which allows you to set `time=None`. Generally speaking, setting an argument to `None` asks the API to choose the value of that argument for you, which works as long as the API has sufficient information (here, the predecessors list).
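
To make the role of `predecessors` concrete, here is a toy model in plain Python. It is a conceptual sketch only: `ToyScope`, its `duration` field, and the cycle counts are hypothetical and are not part of the Groq API, whose actual scheduling logic is not described in this tutorial. The sketch assumes the simplest possible rule, namely that a scope with `time=None` starts once all of its predecessors have ended.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyScope:
    """Hypothetical stand-in for a scope; not part of the Groq API."""
    name: str
    duration: int                      # made-up length of the scope in cycles
    time: Optional[int] = None         # None means "let the scheduler decide"
    predecessors: List["ToyScope"] = field(default_factory=list)

    def start(self) -> int:
        # An explicit start time always wins
        if self.time is not None:
            return self.time
        # Without a time, the predecessors are the only information available
        if not self.predecessors:
            raise ValueError(f"{self.name}: time=None needs at least one predecessor")
        # Start once every predecessor has finished
        return max(p.end() for p in self.predecessors)

    def end(self) -> int:
        return self.start() + self.duration


add_scope = ToyScope(name="add_scope", duration=40, time=0)
mul_scope = ToyScope(name="mul_scope", duration=40, predecessors=[add_scope])

print(mul_scope.start())
```

The `ValueError` branch mirrors the point made above: leaving a value as `None` only works when there is enough other information to derive it.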

In [None]:
class TopLevel(g.Component):
    def __init__(self):
        super().__init__()
    def build(self, in1_mt, in2_mt, in3_mt, time=0):
        with g.ResourceScope(name="add_scope", is_buffered=True, time=time) as add_scope:
            add_st = g.add(in1_mt, in2_mt, time=0)  # time here is relative to the start time of the scope
            add_mt = add_st.write(name="tmp")
        with g.ResourceScope(name="mul_scope", is_buffered=True, time=None, predecessors=[add_scope]) as mul_scope:
            result_st = g.mul(add_mt, in3_mt, time=0)  # time here is relative to the start time of the scope
            result_mt = result_st.write(name="result")
        return result_mt

Now that we have our buffered scopes created, we can instantiate our top-level component, then call into it with the input data and a start time.

In [None]:
top = TopLevel()
total_result = top(t1_mt, t2_mt, t3_mt, time=0)

With the design complete, we can compile it for the GroqChip, creating an IOP binary file used to program the device.

In [None]:
iop_file = g.compile(base_name="buffered_scopes", result_tensor=total_result)
print(iop_file)

`buffered_scopes.iop` contains the binary compiled program. This will be used to program the GroqChip with the desired functionality of our program. 

## GroqView
GroqView can be used to view the instructions of your program in the GroqChip. Note: it is expected that you are familiar with the GroqView tool (See "GroqView User Guide") for this section of this tutorial. You may skip viewing the program in GroqView and move to the "Prepare Data for Program" section.

Using the following command, we can create a .json file that can be used to view the program in hardware. This will show:
* what instructions occur
* where on the chip they take place, as well as 
* when in time (cycles) each instruction occurs. 

In [None]:
g.write_visualizer_data("buffered_scopes")

To launch GroqView, uncomment and run the command below. Remember that you still need to create a tunnel to the server running the GroqView tool in order to load it in another window.

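In GroqView, you should see the first two input tensors being read from memory, streamed to the VXM where they are added together, and the result written back into memory. You should then see that result read back from memory, multiplied by the third tensor, and the final result written into memory.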

In [None]:
#!groqview buffered_scopes/visdata.json

<b>Note:</b> before proceeding to the next section, you'll want to stop the above cell. 

## Prepare Data for Program

Using NumPy, we'll create three randomly generated tensors as inputs. We ensure that the shape of the tensors and the data type match what the GroqChip is programmed for: (2, 32) and float16.

In [None]:
t1_data = np.random.rand(2, 32).astype(np.float16)
t2_data = np.random.rand(2, 32).astype(np.float16)
t3_data = np.random.rand(2, 32).astype(np.float16)
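
Because the shape and data type must match what the program was compiled for, it can help to check arrays on the host before sending them. The helper below is a minimal sketch: `check_input`, `EXPECTED_SHAPE`, and `EXPECTED_DTYPE` are names invented for this example and are not part of the Groq API.

```python
import numpy as np

EXPECTED_SHAPE = (2, 32)        # matches the g.input_tensor declarations
EXPECTED_DTYPE = np.float16


def check_input(name, data):
    """Raise early if an array does not match the compiled tensor layout."""
    if data.shape != EXPECTED_SHAPE:
        raise ValueError(f"{name}: expected shape {EXPECTED_SHAPE}, got {data.shape}")
    if data.dtype != EXPECTED_DTYPE:
        raise TypeError(f"{name}: expected dtype float16, got {data.dtype}")
    return data


good = check_input("tensor_1", np.random.rand(2, 32).astype(np.float16))

# np.random.rand returns float64, so forgetting .astype is caught here
try:
    check_input("tensor_2", np.random.rand(2, 32))
except TypeError as err:
    print(err)
```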

## Add Program to GroqChip

The next step is to use the create_tsp_runner function to load the program on the GroqChip. This will use the binary executable we compiled earlier to program the chip. 

In [None]:
add_program = tsp.create_tsp_runner(iop_file)

At this point the binary program `buffered_scopes.iop` has been loaded on the GroqChip. We can now start sending data from the host to the program. To do this, we provide the NumPy tensors we created as inputs. The program will return the result of adding the first two tensors together and multiplying that sum by the third input tensor. The arguments to `add_program` are the input tensors for the loaded program. For example, `tensor_1` is the name of the input tensor previously constructed with that name, and `t1_data` is the tensor containing the input data.

In [None]:
result = add_program(tensor_1=t1_data, tensor_2=t2_data, tensor_3=t3_data)

Let's check that the result from the GroqChip is correct by comparing it with the result calculated via NumPy:

In [None]:
numpy_result = (t1_data+t2_data)*t3_data

np.array_equal(result['result'], numpy_result)
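
`np.array_equal` only reports whether every element matches exactly. If a comparison ever returns `False`, the size of the mismatch is usually the next thing to look at. The sketch below uses a hypothetical helper, `compare`, and a NumPy-computed stand-in for `result['result']`, so it runs without hardware.

```python
import numpy as np


def compare(actual, expected):
    """Return (exact_match, max_abs_diff) for two arrays of the same shape."""
    exact = bool(np.array_equal(actual, expected))
    # Widen to float32 so the subtraction itself does not round in float16
    diff = np.abs(actual.astype(np.float32) - expected.astype(np.float32))
    return exact, float(diff.max())


rng = np.random.default_rng(0)
a, b, c = (rng.random((2, 32)).astype(np.float16) for _ in range(3))

expected = (a + b) * c
# Stand-in for result['result']; on hardware this would come from the program
actual = expected.copy()

print(compare(actual, expected))
```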

## Back to Back Computations

The GroqChip is still loaded with the program, so we can continue to provide input data and it will return the result of computing (A + B) x C for each set of inputs. Now let's look at how we can call the same program repeatedly with different input tensors.

In [None]:
for i in range(3):
    print(f"Inference {i}")
    t1_data = np.random.rand(2, 32).astype(np.float16)
    t2_data = np.random.rand(2, 32).astype(np.float16)
    t3_data = np.random.rand(2, 32).astype(np.float16)
    result = add_program(tensor_1=t1_data, tensor_2=t2_data, tensor_3=t3_data)
    numpy_result = (t1_data+t2_data)*t3_data
    # Compare against the NumPy reference and report the outcome
    print(f"Validated {i} -->", np.array_equal(result['result'], numpy_result))
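
The loop above can also be wrapped in a function that records which runs failed instead of only printing. This is a sketch under stated assumptions: `run_and_validate` and `numpy_program` are hypothetical names, and `numpy_program` is a NumPy stand-in that copies the keyword names and output key used in this tutorial so the example runs without a GroqChip.

```python
import numpy as np


def numpy_program(tensor_1, tensor_2, tensor_3):
    """Stand-in for the loaded program: same keyword names, same output key."""
    return {"result": (tensor_1 + tensor_2) * tensor_3}


def run_and_validate(program, runs, seed=0):
    """Call `program` `runs` times with fresh inputs; return the failed run indices."""
    rng = np.random.default_rng(seed)
    failures = []
    for i in range(runs):
        # Keys must match the names given to g.input_tensor
        inputs = {
            name: rng.random((2, 32)).astype(np.float16)
            for name in ("tensor_1", "tensor_2", "tensor_3")
        }
        expected = (inputs["tensor_1"] + inputs["tensor_2"]) * inputs["tensor_3"]
        if not np.array_equal(program(**inputs)["result"], expected):
            failures.append(i)
    return failures


print(run_and_validate(numpy_program, runs=3))
```

With hardware attached, `add_program` from the cells above could be passed in place of `numpy_program`, since it is called with the same keyword arguments.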