# VWR2A Simulator
This notebook illustrates how to use the simulator both to decode existing hexadecimal VWR2A kernels into assembly and to write your own kernels by writing assembly and generating its bitstream. At the end, we develop a working kernel that adds two vectors together.

In [6]:
# Imports
import pandas as pd
from random import randint
from src import *
from src.simulator import SIMULATOR

## ISAs for specialized slots
First, we set up objects for each specialized slot of the VWR2A (i.e. Load Store Unit, Reconfigurable Cells, etc.) and show some examples. For detailed descriptions of the assembly ISA or the underlying hexadecimal encoding, please visit the docs section.

### Loop Control Unit
This unit controls the loops of the code.

In [7]:
# --------------------------------------------
#         Loop Control Unit (LCU)
# --------------------------------------------
lcu = LCU()

instr_list = ["NOP", "EXIT", "SADD R1, ZERO, LAST", "SADD R1, SRF(3), LAST", "SADD R1, 7, ONE", "SSUB SRF(4), SRF(4), SRF(4)", "JUMP 7, ONE", "BGEPD ZERO, ONE, 5"]

for instr in instr_list:
    _, _, imem_word = lcu.asmToHex(instr)
    print("ASM : " + instr + " --> Hex: " + imem_word.get_word_in_hex())

ASM : NOP --> Hex: 0x0
ASM : EXIT --> Hex: 0x1c00
ASM : SADD R1, ZERO, LAST --> Hex: 0xd4340
ASM : SADD R1, SRF(3), LAST --> Hex: 0x94340
ASM : SADD R1, 7, ONE --> Hex: 0xfc347
ASM : SSUB SRF(4), SRF(4), SRF(4) --> Hex: 0x90400
ASM : JUMP 7, ONE --> Hex: 0xfda07
ASM : BGEPD ZERO, ONE, 5 --> Hex: 0xdd605


### Load-Store Unit
This unit controls the movement of data between the SPM and the VWRs.

In [8]:
# --------------------------------------------
#             Load-Store Unit (LSU)
# --------------------------------------------
lsu = LSU()

instr_list = ["SADD R0, ONE, ONE/LD.VWR VWR_A", "SADD R0, SRF(5), ONE/SH.IL.UP", "SADD SRF(5), SRF(5), ONE/LD.VWR SRF"]

for instr in instr_list:
    _, _, imem_word = lsu.asmToHex(instr)
    print("ASM : " + instr + " --> Hex: " + imem_word.get_word_in_hex())

ASM : SADD R0, ONE, ONE/LD.VWR VWR_A --> Hex: 0x45538
ASM : SADD R0, SRF(5), ONE/SH.IL.UP --> Hex: 0xc4538
ASM : SADD SRF(5), SRF(5), ONE/LD.VWR SRF --> Hex: 0x5c530


### Reconfigurable Cells
These units perform the computations, as an ALU would on a CPU.

In [9]:
# --------------------------------------------
#         Reconfigurable Cells (RCs)
# --------------------------------------------
rc = RC()

instr_list = ["NOP", "SADD VWR_A, VWR_A, VWR_B", "SADD VWR_A, SRF(3), VWR_B", "LOR R0, RCB, MIN_INT", "MUL.FXP R0, RCB, MIN_INT"]

for instr in instr_list:
    _, _, _, imem_word = rc.asmToHex(instr)
    print("ASM : " + instr + " --> Hex: " + imem_word.get_word_in_hex())

ASM : NOP --> Hex: 0x0
ASM : SADD VWR_A, VWR_A, VWR_B --> Hex: 0x420
ASM : SADD VWR_A, SRF(3), VWR_B --> Hex: 0xc420
ASM : LOR R0, RCB, MIN_INT --> Hex: 0x1f522
ASM : MUL.FXP R0, RCB, MIN_INT --> Hex: 0x1f5a2


### Multiplexer Control Unit
This unit manages all the indexes of the SRF and the VWRs that are accessed for loads or stores.

In [10]:
# --------------------------------------------
#      Multiplexer Control Unit (MXCU)
# --------------------------------------------
mxcu = MXCU()

instr_list = ["NOP", "SADD R1, ONE, LAST", "LOR R1, ONE, SRF(3)"]

for instr in instr_list:
    imem_word = mxcu.asmToHex(instr, -1, 0, 0, [0,0,0,0], 0)
    print("ASM : " + instr + " --> Hex: " + imem_word.get_word_in_hex())

ASM : NOP --> Hex: 0x0
ASM : SADD R1, ONE, LAST --> Hex: 0x5699000
ASM : LOR R1, ONE, SRF(3) --> Hex: 0x54690c0


## App Example
Now, let's see an example of a real program that adds two vectors.
Adding two vectors simply means adding their elements pairwise, one position at a time, until the end.
Let's assume the vectors have 128 elements, so each one fits in one line of the SPM.
First, we store the values in the SPM.
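
As a point of reference, the computation the kernel has to reproduce can be written in a few lines of plain Python. This is only a software model, not simulator code: `N_WORDS = 128` is an assumption taken from the vector length stated above, and `add_vectors` is a hypothetical helper name.

```python
# Software reference for the kernel: element-wise addition of two vectors.
N_WORDS = 128  # assumed vector length (one SPM line, per the text above)


def add_vectors(a, b):
    """Return the element-wise sum of two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


vector_a = list(range(N_WORDS))
vector_b = list(range(N_WORDS))
expected = add_vectors(vector_a, vector_b)
print(expected[:4], expected[-1])  # first four sums and the last one
```

A list like `expected` is what the SPM line holding the kernel result should contain once the kernel has run.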

## Generating code for kernels

### Process an existing kernel

Load an existing kernel (a CSV spreadsheet where each row is a clock cycle and each column is a specialized slot) and use the simulator to understand what each slot is doing at a given clock cycle.

In [11]:
kernel_path = "kernels/mf_q64_erosion/"
df = pd.read_csv(kernel_path + "instructions_hex.csv")
print("The instruction memory has {0} entries.".format(len(df)))
df.head()

The instruction memory has 512 entries.


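| Unnamed: 0 | LCU | LSU | MXCU | RC0 | RC1 | RC2 | RC3 | KMEM |
|---|---|---|---|---|---|---|---|---|
| 0 | 0x0 | 0x5c49f | 0x0 | 0x0 | 0x0 | 0x0 | 0x0 | 0x0 |
| 1 | 0x9c500 | 0x43d3f | 0x180 | 0x0 | 0x0 | 0x0 | 0x0 | 0x802b |
| 2 | 0x98fc0 | 0x4bd3f | 0x40 | 0x0 | 0x0 | 0x0 | 0x0 | 0x0 |
| 3 | 0xf8f43 | 0x53c98 | 0x0 | 0x0 | 0x0 | 0x0 | 0x0 | 0x0 |
| 4 | 0xb8f80 | 0x539 | 0x4ce9000 | 0x0 | 0x0 | 0x0 | 0x0 | 0x0 |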
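
To make the row-per-cycle, column-per-slot layout concrete, here is a minimal sketch that reads the five rows shown above with the standard `csv` module and counts, per slot, how many of those cycles hold a non-zero word. The inlined `SAMPLE` string and the empty header for the index column are assumptions made so the example is self-contained; the real file may differ in detail.

```python
import csv
import io

# The first five rows of the table above, inlined so the example runs on its own.
# Assumption: the index column has an empty header (pandas shows it as "Unnamed: 0").
SAMPLE = """\
,LCU,LSU,MXCU,RC0,RC1,RC2,RC3,KMEM
0,0x0,0x5c49f,0x0,0x0,0x0,0x0,0x0,0x0
1,0x9c500,0x43d3f,0x180,0x0,0x0,0x0,0x0,0x802b
2,0x98fc0,0x4bd3f,0x40,0x0,0x0,0x0,0x0,0x0
3,0xf8f43,0x53c98,0x0,0x0,0x0,0x0,0x0,0x0
4,0xb8f80,0x539,0x4ce9000,0x0,0x0,0x0,0x0,0x0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
slots = [name for name in rows[0] if name]  # skip the unnamed index column

# Count, per slot, how many of the sampled cycles hold a non-zero word.
busy = {slot: sum(int(row[slot], 16) != 0 for row in rows) for slot in slots}
print(busy)
```

In the earlier examples `0x0` is the encoding of `NOP` for the LCU, RCs and MXCU, so a zero count for RC0 to RC3 suggests that these first cycles only set up loads and registers.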


Let's generate the assembly for these hexadecimal instructions so we can better understand what is going on.

In [12]:
sim = SIMULATOR()
sim.compileHexToAsm(kernel_path)

Hex to ASM
Processing file: kernels/mf_q64_erosion/instructions_hex.csv...
Creating file: kernels/mf_q64_erosion/instructions_asm.csv


In [13]:
df = pd.read_csv(kernel_path + "instructions_asm.csv")
print("The instruction memory has {0} entries.".format(len(df)))
df.head()

The instruction memory has 512 entries.


| Unnamed: 0 | LCU | LSU | MXCU | RC0 | RC1 | RC2 | RC3 |
|---|---|---|---|---|---|---|---|
| 0 | NOP | LOR R7, SRF(0), ZERO/LD.VWR SRF | NOP | NOP | NOP | NOP | NOP |
| 1 | SSUB R0, SRF(6), ONE | SADD R7, R7, ONE/LD.VWR VWR_A | NOP | NOP | NOP | NOP | NOP |
| 2 | LOR R3, SRF(1), ZERO | SADD R7, R7, ONE/LD.VWR VWR_B | NOP | NOP | NOP | NOP | NOP |
| 3 | LOR R1, 3, ZERO | LOR R0, R7, ZERO/LD.VWR VWR_C | NOP | NOP | NOP | NOP | NOP |
| 4 | LOR R2, LAST, ZERO | SADD R1, R0, ONE/NOP | LOR R1, ZERO, ZERO | NOP | NOP | NOP | NOP |


For example, let's make sure that the last instruction of the LCU is an EXIT. For this we need some information about the kernel, which the hexadecimal file provides in the KMEM column. The kernel number corresponds to the row that holds the KMEM word.

In [14]:
kernel_number = 1 # Assign a number to the kernel (consistent with the KMEM)
hex_word = "0x802b"
nInstr, _, _, _ = KMEM_WORD(hex_word=hex_word).decode_word()
print("Last instruction for LCU is: " + df.iloc[nInstr]['LCU'])

Last instruction for LCU is: EXIT


## Load a kernel

To load a kernel into the IMEM of the VWR2A, we need to know the following about it:
 - The kernel number
 - How many and which columns it uses
 - How many instructions per column it has
 - The position where it starts in the IMEM
 - The position in the SPM where the SRF initial values are stored
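
The items above can be grouped into a small record with a few sanity checks. The sketch below is purely illustrative: `KernelConfig` and its methods are hypothetical names, not part of the simulator. The limits are taken from this notebook (512 IMEM entries, kernel numbers from 1 to 15), and the IMEM footprint formula is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

IMEM_ENTRIES = 512  # the instruction files in this notebook have 512 entries
MAX_KERNELS = 15    # kernel numbers run from 1 to 15 in this notebook


@dataclass(frozen=True)
class KernelConfig:  # hypothetical helper, not part of the simulator
    kernel_number: int
    column_usage: Tuple[bool, bool]
    n_instr_per_col: int
    imem_start: int
    srf_spm_line: int

    def imem_words(self) -> int:
        # Assumption: each used column takes n_instr_per_col consecutive words.
        return self.n_instr_per_col * sum(self.column_usage)

    def validate(self) -> None:
        if not 1 <= self.kernel_number <= MAX_KERNELS:
            raise ValueError("kernel_number must be between 1 and 15")
        if not any(self.column_usage):
            raise ValueError("at least one column must be used")
        if self.n_instr_per_col < 1:
            raise ValueError("a kernel needs at least one instruction")
        if self.imem_start < 0 or self.imem_start + self.imem_words() > IMEM_ENTRIES:
            raise ValueError("kernel does not fit in the IMEM")


cfg = KernelConfig(1, (True, False), 37, 0, 0)
cfg.validate()
print(cfg.imem_words())
```

For the two-column FFT kernel in the appendix (39 instructions per column), `imem_words()` gives 78, which agrees with the IMEM start address reported there for the following kernel.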

In [15]:
kernel_path = "kernels/add_vectors/"
kernel_number = 1 # Kernel number (from 1 to 15)
column_usage = [True, False] # Columns to use
nInstrPerCol = 37 # Number of instructions per column
imem_add_start = 0 # Start address on imem for this kernel
srf_spm_addres = 0 # Line of the SPM with the initial data for the SRF

sim = SIMULATOR()
sim.kernel_config(column_usage, nInstrPerCol, imem_add_start, srf_spm_addres, kernel_number)

Now, let's generate assembly for it so we clearly see the adds.

In [16]:
sim.compileHexToAsm(kernel_path)
df = pd.read_csv(kernel_path + "instructions_asm.csv")
print("The instruction memory has {0} entries.".format(len(df)))
df.head(10)

Hex to ASM
Processing file: kernels/add_vectors/instructions_hex.csv...
Creating file: kernels/add_vectors/instructions_asm.csv
The instruction memory has 512 entries.


| Unnamed: 0 | LCU | LSU | MXCU | RC0 | RC1 | RC2 | RC3 |
|---|---|---|---|---|---|---|---|
| 0 | NOP | LOR R7, ONE, ZERO/LD.VWR SRF | NOP | NOP | NOP | NOP | NOP |
| 1 | NOP | SADD R7, ONE, R7/LD.VWR VWR_A | NOP | NOP | NOP | NOP | NOP |
| 2 | NOP | SADD R7, ONE, R7/LD.VWR VWR_B | LOR R0, ZERO, ZERO | NOP | NOP | NOP | NOP |
| 3 | NOP | NOP/NOP | SADD R0, ONE, R0 | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B |
| 4 | NOP | NOP/NOP | SADD R0, ONE, R0 | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B |
| 5 | NOP | NOP/NOP | SADD R0, ONE, R0 | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B |
| 6 | NOP | NOP/NOP | SADD R0, ONE, R0 | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B |
| 7 | NOP | NOP/NOP | SADD R0, ONE, R0 | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B |
| 8 | NOP | NOP/NOP | SADD R0, ONE, R0 | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B |
| 9 | NOP | NOP/NOP | SADD R0, ONE, R0 | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B | SADD VWR_C, VWR_A, VWR_B |


We can see that the same action, adding two elements, is performed over and over again. Let's handle that with a loop.

### Using loops
Now it's your turn. Try to use a loop to reduce the number of instructions.
(You can work it out yourself or check the solution in the provided assembly instructions, version 2.)
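
Before writing the assembly, it can help to picture the loop in plain Python. The sketch below is an analogy, not the assembly solution, and it rests on assumptions that should be checked against the docs: a VWR holds 128 words, each of the four RCs works on its own contiguous quarter of 32 words, and a single shared index selects the element inside each quarter. `add_with_loop_model` is a hypothetical name.

```python
N_WORDS = 128              # assumed VWR width in words
N_RCS = 4                  # RC0..RC3, as in the tables above
SLICE = N_WORDS // N_RCS   # assumed number of words handled by each RC


def add_with_loop_model(vwr_a, vwr_b):
    """Model the looped kernel: one shared index, four RCs active per step."""
    vwr_c = [0] * N_WORDS
    steps = 0
    for idx in range(SLICE):        # the index a loop counter would provide
        for rc in range(N_RCS):     # conceptually, the four RCs act in parallel
            pos = rc * SLICE + idx
            vwr_c[pos] = vwr_a[pos] + vwr_b[pos]
        steps += 1                  # one repetition of the same add instruction
    return vwr_c, steps


vwr_c, steps = add_with_loop_model(list(range(N_WORDS)), list(range(N_WORDS)))
print(steps, vwr_c[:4])
```

Under these assumptions the same add instruction repeats once per index value, which is the repetition a hardware loop can express with a single instruction row plus a branch.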

## Running code

Now, let's run an application to see its outputs and check whether the result is as expected.
We will use the vector addition example once again, so let's load it.

In [17]:
sim = SIMULATOR()

# --------------------------------------------
#               KERNEL CONFIGURATION
# --------------------------------------------
kernel_path = './kernels/add_vectors/'
kernel_number = 1 
column_usage = [True, False] 
nInstrPerCol = 6 
imem_add_start = 0 
srf_spm_addres = 0 
version="_v2"

sim.kernel_config(column_usage, nInstrPerCol, imem_add_start, srf_spm_addres, kernel_number)

Now, we need to populate the SPM with the values of our vectors.

In [18]:
# --------------------------------------------
#                LOAD SPM DATA
# --------------------------------------------
# Load vector A
vector_A = [i for i in range(SPM_NWORDS)]
nline = 1
sim.setSPMLine(nline, vector_A)
# Load vector B
vector_B = [i for i in range(SPM_NWORDS)]
nline = 2
sim.setSPMLine(nline, vector_B)

sim.displaySPMLine(1)
sim.displaySPMLine(2)

SPM 1: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, ]
SPM 2: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 11

Now, let's compile the assembly to hexadecimal since it's needed to run the code.

In [19]:
# --------------------------------------------
#              COMPILE ASM TO HEX
# --------------------------------------------
sim.compileAsmToHex(kernel_path, kernel_number, version=version)

ASM to Hex
Processing file: ./kernels/add_vectors/instructions_asm_v2.csv...
Creating file: ./kernels/add_vectors/dsip_bitstream.h
Creating file: ./kernels/add_vectors/instructions_hex_v2_autogen.csv


Finally, we load the kernel into the internal memory of the specialized units and run it.

In [20]:
# --------------------------------------------
#                 LOAD KERNEL
# --------------------------------------------

# This needs the hex instructions; if you don't have them, generate them by compiling the ASM first
sim.kernel_load(kernel_path, version=version + "_autogen", kernel_number=kernel_number)

# --------------------------------------------
#               SIMULATE EXECUTION
# --------------------------------------------
show_lcu = []
show_srf = []
show_lsu = []
show_rcs = [[],[],[],[]]
show_mxcu = []
display_ops = [show_lcu, show_lsu, show_mxcu, show_rcs, show_srf]

sim.run(kernel_number, display_ops=display_ops)

Processing file: ./kernels/add_vectors/instructions_hex_v2_autogen.csv...
---------------------
       PC: 0
---------------------
LSU: LOR R7, ONE, ZERO/LD.VWR SRF --> ALU res = 1
RC0: NOP --> ALU res = 0
RC1: NOP --> ALU res = 0
RC2: NOP --> ALU res = 0
RC3: NOP --> ALU res = 0
MXCU: LOR R5, LAST, ZERO (VWR selected: 0, not writting SRF) --> ALU res = 31
LCU: NOP --> ALU res = 0
---------------------
       PC: 1
---------------------
LSU: SADD R7, ONE, R7/LD.VWR VWR_A --> ALU res = 2
RC0: NOP --> ALU res = 0
RC1: NOP --> ALU res = 0
RC2: NOP --> ALU res = 0
RC3: NOP --> ALU res = 0
MXCU: LOR R6, LAST, ZERO (VWR selected: 0, not writting SRF) --> ALU res = 31
LCU: NOP --> ALU res = 0
---------------------
       PC: 2
---------------------
LSU: NOP/NOP --> ALU res = 0
RC0: NOP --> ALU res = 0
RC1: NOP --> ALU res = 0
RC2: NOP --> ALU res = 0
RC3: NOP --> ALU res = 0
MXCU: LOR R7, LAST, ZERO (VWR selected: 0, not writting SRF) --> ALU res = 31
LCU: NOP --> ALU res = 0
----------------

Let's check that we have the correct output in the SPM line just by looking at it.

In [21]:
sim.displaySPMLine(1)
sim.displaySPMLine(2)
sim.displaySPMLine(3)

SPM 1: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, ]
SPM 2: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 11

We can check it more rigorously by computing the same function in Python and verifying that its output matches the CGRA output.

In [22]:
sim.displaySPMLine(3)
vwr2a_res = sim.getSPMLine(3)
errors_idx = []
for i in range(len(vector_A)):
    if vector_A[i] + vector_B[i] != vwr2a_res[i]:
        errors_idx.append(i)
if len(errors_idx) == 0:
    print("The result is correct!")
else:
    print("Oops, something went wrong. There are " + str(len(errors_idx)) + " errors.")
    print(errors_idx)


SPM 3: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, ]
The result is correct!


Now it's your turn to play!

In [23]:
#Let's do it!

# Appendix
Running the remaining examples requires some kernel configuration information, which can be decoded from the hexadecimal words in the KMEM column.

### Exit

In [24]:
# --------------------------------------------
#               KERNEL CONFIGURATION
# --------------------------------------------
kernel_path = './kernels/exit/' 
kernel_number = 1
column_usage = [True, False] # Columns to use
nInstrPerCol = 1 # Number of instructions per column
imem_add_start = 0 # Start address on imem for this kernel
srf_spm_addres = 0 # Line of the SPM with the initial data for the SRF
version = ""

### FFT

In [25]:
# Add the words to the KMEM and decode them
kmem = KMEM()

kmem_pos_1 = 1
kmem_word_1 = 0x18026

kmem_pos_2 = 2
kmem_word_2 = 0x393b0

kmem.imem.set_word(kmem_word_1, kmem_pos_1)
kmem.imem.set_word(kmem_word_2, kmem_pos_2)
print("Kernel 1")
kmem.imem.get_kernel_info(kmem_pos_1)
print("Kernel 2")
kmem.imem.get_kernel_info(kmem_pos_2)

Kernel 1
This kernel uses 39 instruction words starting at IMEM address 0.
It uses column(s): both.
The SRF is located in SPM bank 0.
Kernel 2
This kernel uses 49 instruction words starting at IMEM address 78.
It uses column(s): both.
The SRF is located in SPM bank 1.
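
The three KMEM words used in this notebook (`0x802b`, `0x18026` and `0x393b0`) are consistent with a simple bit-field layout, which the sketch below decodes by hand. The layout is inferred only from these examples and their printed kernel info, so treat it as an assumption and rely on the docs for the authoritative format. `KernelInfo` and `decode_kmem_word` are hypothetical names, not simulator functions.

```python
from typing import NamedTuple


class KernelInfo(NamedTuple):  # hypothetical container for the decoded fields
    n_instr: int        # instructions per column
    imem_start: int     # first IMEM address of the kernel
    columns: int        # column mask (1 = first column only, 3 = both)
    srf_spm_bank: int   # SPM bank holding the SRF initial values


def decode_kmem_word(word: int) -> KernelInfo:
    """Decode a KMEM word with the field layout inferred from this notebook."""
    n_instr = (word & 0x3F) + 1        # bits 5..0: instruction count minus one
    imem_start = (word >> 6) & 0x1FF   # bits 14..6: IMEM start address
    columns = (word >> 15) & 0x3       # bits 16..15: column usage mask
    srf_spm_bank = word >> 17          # bits 17 and up: SPM bank of the SRF
    return KernelInfo(n_instr, imem_start, columns, srf_spm_bank)


for word in (0x802B, 0x18026, 0x393B0):
    print(hex(word), decode_kmem_word(word))
```

Note that `n_instr` here is the instruction count, so the index of the last instruction row is `n_instr - 1`; that is the value used earlier to look up the final LCU instruction with `df.iloc[nInstr]`.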


In [26]:
# --------------------------------------------
#               KERNEL CONFIGURATION
# --------------------------------------------
kernel_path = './kernels/fft/' 
kernel_number = 1 # Kernel number (from 1 to 15)
column_usage = [True, True] # Columns to use
nInstrPerCol = 39 # Number of instructions per column
imem_add_start = 0 # Start address on imem for this kernel
srf_spm_addres = 0 # Line of the SPM with the initial data for the SRF
version=""

sim = SIMULATOR()
sim.kernel_config(column_usage, nInstrPerCol, imem_add_start, srf_spm_addres, kernel_number)
sim.compileHexToAsm(kernel_path)
sim.compileAsmToHex(kernel_path, kernel_number, version=version)

Hex to ASM
Processing file: ./kernels/fft/instructions_hex.csv...
Creating file: ./kernels/fft/instructions_asm.csv
ASM to Hex
Processing file: ./kernels/fft/instructions_asm.csv...
Creating file: ./kernels/fft/dsip_bitstream.h
Creating file: ./kernels/fft/instructions_hex_autogen.csv


### MF_Q64_EROSION

In [27]:
# --------------------------------------------
#               KERNEL CONFIGURATION
# --------------------------------------------
kernel_path = './kernels/mf_q64_erosion/' 
kernel_number = 1 # Kernel number (from 1 to 15)
column_usage = [True, False] # Columns to use
nInstrPerCol = 44 # Number of instructions per column
imem_add_start = 0 # Start address on imem for this kernel
srf_spm_addres = 0 # Line of the SPM with the initial data for the SRF
version=""

sim = SIMULATOR()
sim.kernel_config(column_usage, nInstrPerCol, imem_add_start, srf_spm_addres, kernel_number)
sim.compileHexToAsm(kernel_path)
sim.compileAsmToHex(kernel_path, kernel_number, version=version)

Hex to ASM
Processing file: ./kernels/mf_q64_erosion/instructions_hex.csv...
Creating file: ./kernels/mf_q64_erosion/instructions_asm.csv
ASM to Hex
Processing file: ./kernels/mf_q64_erosion/instructions_asm.csv...
Creating file: ./kernels/mf_q64_erosion/dsip_bitstream.h
Creating file: ./kernels/mf_q64_erosion/instructions_hex_autogen.csv
