# Develop a new VWR2A kernel using the VWR2A simulator
This notebook illustrates how to use the simulator both for decoding existing VWR2A kernels (using instructions from the morphological filter erosion example) and for writing your own kernels by translating human-readable variables into hexadecimal ISA instructions for each specialized slot. At the end, we develop a working kernel that adds two vectors together.

Note: For now, only one-column kernels are supported.

In [1]:
%load_ext autoreload
%autoreload 2
import numpy as np
import pandas as pd 
import sys, os
from src import *
from helpers import *

# ISAs for specialized slots
First, we set up objects for each specialized slot of the VWR2A (i.e. kernel configuration, Load Store Unit, Reconfigurable Cells, etc.) and show how the instructions are decoded and encoded. For a detailed description of what each ISA component does in each specialized slot, please see ``vwr2a_docs/vwr2a_ISA.docx``.

### KERNEL CONFIGURATION
Set up an application to be accelerated by loading its kernel configurations into the kernel memory.

In [2]:
# Load an existing kernel into memory
kmem_pos = 1
kmem_word = 0x802b

kmem = KMEM()
kmem.imem.set_word(kmem_word, kmem_pos)
kmem.imem.get_kernel_info(kmem_pos)

This kernel uses 44 instruction words starting at IMEM address 0.
It uses column(s): 0.
The SRF is located in SPM bank 0.


In [3]:
# Create a new kernel
kmem_pos = 2

# Kernel configuration parameters
num_instructions=1
imem_add_start=0
col_one_hot=1
srf_spm_addres=0

kmem.imem.set_params(num_instructions, imem_add_start, col_one_hot, srf_spm_addres, kmem_pos)
print("Hex representation: " + kmem.imem.get_word_in_hex(kmem_pos))
kmem.imem.get_kernel_info(kmem_pos)

Hex representation: 0x8000
This kernel uses 1 instruction words starting at IMEM address 0.
It uses column(s): 0.
The SRF is located in SPM bank 0.
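The two words above (`0x802b` for 44 instructions, `0x8000` for 1 instruction) are consistent with a packed bit-field layout in which the instruction count is stored minus one. The sketch below illustrates that kind of packing. The field widths and positions are assumptions inferred from those two words only, not taken from `vwr2a_docs/vwr2a_ISA.docx`, and `pack_kmem_word` / `unpack_kmem_word` are hypothetical helpers rather than simulator functions.

```python
# Hypothetical layout inferred from the two example words; verify against the ISA document.
N_INSTR_BITS = 6      # number of instructions minus one
IMEM_START_BITS = 9   # first IMEM address used by the kernel
COL_BITS = 2          # one-hot column mask
SRF_BITS = 4          # SPM line with the initial SRF data (width is a guess)


def pack_kmem_word(num_instructions, imem_add_start, col_one_hot, srf_spm_address):
    """Pack the kernel parameters into one integer, lowest field first."""
    word = num_instructions - 1
    shift = N_INSTR_BITS
    word |= imem_add_start << shift
    shift += IMEM_START_BITS
    word |= col_one_hot << shift
    shift += COL_BITS
    word |= srf_spm_address << shift
    return word


def unpack_kmem_word(word):
    """Inverse of pack_kmem_word: recover the four parameters."""
    num_instructions = (word & ((1 << N_INSTR_BITS) - 1)) + 1
    word >>= N_INSTR_BITS
    imem_add_start = word & ((1 << IMEM_START_BITS) - 1)
    word >>= IMEM_START_BITS
    col_one_hot = word & ((1 << COL_BITS) - 1)
    word >>= COL_BITS
    srf_spm_address = word & ((1 << SRF_BITS) - 1)
    return num_instructions, imem_add_start, col_one_hot, srf_spm_address


print(hex(pack_kmem_word(44, 0, 1, 0)))  # 0x802b
print(hex(pack_kmem_word(1, 0, 1, 0)))   # 0x8000
print(unpack_kmem_word(0x802b))          # (44, 0, 1, 0)
```

Both example words have a start address and SRF address of zero, so the sketch cannot confirm where those two fields actually sit.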


### Loop Control Unit IMEM 

In [4]:
# Load an existing imem word and decode it
imem_pos = 1
imem_word = 0xd9c00

lcu_imem = LCU_IMEM()
lcu_imem.set_word(imem_word, imem_pos)
asm = lcu_imem.get_instruction_asm(imem_pos)
print("Assembly representation: " + asm)

Assembly representation: EXIT


In [5]:
# Create a new instruction
imem_pos = 2

# Define instruction parameters
imm=3
rf_wsel=2
rf_we=1
alu_op=LCU_ALU_OPS.SADD
br_mode=0
muxb_sel=LCU_MUXB_SEL.ONE
muxa_sel=LCU_MUXA_SEL.ZERO

lcu_imem.set_params(imm, rf_wsel, rf_we, alu_op, br_mode, muxb_sel, muxa_sel, imem_pos)
print("Hex representation: " + lcu_imem.get_word_in_hex(imem_pos))
asm = lcu_imem.get_instruction_asm(imem_pos)
print("Assembly representation: " + asm)
lcu_imem.get_instruction_info(imem_pos)


Hex representation: 0xdc383
Assembly representation: SADD R2, ONE, ZERO
Immediate value: 3
LCU is in loop control mode
Performing ALU operation SADD between operands ZERO and ONE
Writing ALU result to LCU register 2


### Load Store Unit IMEM 

In [6]:
# Load an existing imem word and decode it
imem_pos = 1
imem_word = 0x4c80

lsu_imem = LSU_IMEM()
lsu_imem.set_word(imem_word, imem_pos)
print("Hex representation: " + lsu_imem.get_word_in_hex(imem_pos))
asm = lsu_imem.get_instruction_asm(imem_pos)
print("Assembly representation: " + asm)
lsu_imem.get_instruction_info(imem_pos)

Hex representation: 0x4c80
Assembly representation: LAND SRF(?), ZERO, ZERO/NOP
No loading, storing, or shuffling taking place
Performing ALU operation LAND between operands ZERO and ZERO
No LSU registers are being written


In [17]:
# Create a new instruction
imem_pos = 2

# Define instruction parameters
rf_wsel=2
rf_we=1
alu_op=LSU_ALU_OPS.SRL
muxb_sel=LSU_MUX_SEL.R5
muxa_sel=LSU_MUX_SEL.TWO
vwr_shuf_op=SHUFFLE_SEL.CONCAT_BITREV_UPPER
vwr_shuf_sel=LSU_MEM_OP.SHUFFLE

lsu_imem.set_params(rf_wsel, rf_we, alu_op, muxb_sel, muxa_sel, vwr_shuf_op, vwr_shuf_sel, imem_pos)
print("Hex representation: " + lsu_imem.get_word_in_hex(imem_pos))
asm = lsu_imem.get_instruction_asm(imem_pos)
print("Assembly representation: " + asm)
lsu_imem.get_instruction_info(imem_pos)

Hex representation: 0xe5aea
Assembly representation: SRL R2, TWO, R5/SH.BRE.UP
No loading, storing, or shuffling taking place
Performing ALU operation SRL between operands TWO and R5
Writing ALU result to LSU register 2


### Multiplexer Control Unit IMEM

In [8]:
# Load an existing imem word and decode it
imem_pos = 1
imem_word = 0x40

mxcu_imem = MXCU_IMEM()
mxcu_imem.set_word(imem_word, imem_pos)
mxcu_imem.get_instruction_info(imem_pos)

Not writing to VWRs
Reading from SRF index 1
Performing ALU operation NOP between operands R0 and R0
No MXCU registers are being written


In [9]:
# Create a new instruction
imem_pos = 2

# Define instruction parameters
vwr_row_we = [1, 1, 1, 0]
vwr_sel = MXCU_VWR_SEL.VWR_B.value
srf_sel = 3
alu_srf_write = ALU_SRF_WRITE.MXCU
srf_we = 1
rf_wsel = 0 
rf_we = 0 
alu_op =  MXCU_ALU_OPS.SADD
muxb_sel = MXCU_MUX_SEL.R0
muxa_sel = MXCU_MUX_SEL.TWO

mxcu_imem.set_params(vwr_row_we, vwr_sel, srf_sel, alu_srf_write, srf_we, rf_wsel, rf_we, alu_op, muxb_sel, muxa_sel, imem_pos)
mxcu_imem.get_instruction_info(imem_pos)

Writing to VWR rows [1 2 3] of VWR_B
Writing from MXCU ALU to SRF register 3
Performing ALU operation SADD between operands TWO and R0
No MXCU registers are being written


### Reconfigurable Cell IMEM

In [10]:
# Load an existing imem word and decode it
imem_pos = 1
imem_word = 0xe923

rc_imem = RC_IMEM()
rc_imem.set_word(imem_word, imem_pos)
rc_imem.get_instruction_info(imem_pos)

Performing ALU operation LOR between operands SRF and ZERO
ALU is performing operations with 32-bit precision
Writing ALU result to RC register 1


In [11]:
# Create a new instruction
imem_pos = 2

# Define instruction parameters
rf_wsel = 1 
rf_we = 1 
muxf_sel = RC_MUXF_SEL.RCT 
alu_op =  RC_ALU_OPS.INB_SF_INA
op_mode = 0 # Always keep this at zero; 16-bit mode is not supported yet
muxb_sel =  RC_MUX_SEL.VWR_A
muxa_sel = RC_MUX_SEL.VWR_B

rc_imem.set_params(rf_wsel, rf_we, muxf_sel, alu_op, op_mode, muxb_sel, muxa_sel, imem_pos)
rc_imem.get_instruction_info(imem_pos)

Output VWR_B if sign flag of RCT == 1, else output VWR_A
Writing ALU result to RC register 1


## Putting it all together: Instruction memory

### Process existing kernel
Load an existing kernel (in the form of an Excel sheet where each row is a clock cycle and each column is a specialized slot) and use the simulator to understand what is going on in each element at a given clock cycle.

In [12]:
from src.simulator import *
sim = SIMULATOR()

# --------------------------------------------
#               KERNEL CONFIGURATION
# --------------------------------------------
kernel_path = './kernels/mf_q64_erosion/' # Path
kernel_number = 1 # Kernel number (from 1 to 15)
column_usage = [True, False] # Columns to use
nInstrPerCol = 44 # Number of instructions per column
imem_add_start = 0 # Start address on imem for this kernel
srf_spm_addres = 0 # Line of the SPM with the initial data for the SRF

sim.kernel_config(column_usage, nInstrPerCol, imem_add_start, srf_spm_addres, kernel_number) # Kernel info

# --------------------------------------------
#                 LOAD KERNEL
# --------------------------------------------

# This needs the hex instructions; if you don't provide them, generate them by compiling the asm
sim.kernel_load(kernel_path, kernel_number=kernel_number)

Processing file: ./kernels/mf_q64_erosion/instructions_hex.csv...


In [13]:
# Make sure that the last kernel value is the exit instruction
nInstrPerCol, imem_add_start, col_one_hot, srf_spm_address = sim.vwr2a.kmem.imem.get_params(kernel_number)
nInstrPerCol+=1

if col_one_hot == 3:
    asm_word = sim.vwr2a.imem.lcu_imem[2*nInstrPerCol + imem_add_start -1].get_word_in_asm()
else:
    asm_word = sim.vwr2a.imem.lcu_imem[nInstrPerCol + imem_add_start -1].get_word_in_asm()
print(str(nInstrPerCol + imem_add_start) + ": " + asm_word)

# for i in range(nInstrPerCol):
#     asm_word = sim.vwr2a.imem.lcu_imem[i].get_word_in_asm()
#     print(str(i) +": " + asm_word)


44: EXIT
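The index arithmetic in the cell above can be isolated in a small helper. This is a sketch that mirrors the cell's own logic; `exit_index` is a hypothetical name, not part of the simulator. The value returned by `get_params` is treated as the instruction count minus one, since the cell adds 1 to it before use.

```python
def exit_index(stored_count, imem_add_start, col_one_hot):
    """Return the LCU IMEM index expected to hold EXIT, mirroring the cell above."""
    n_instr = stored_count + 1               # the cell adds 1 to the value from get_params
    n_cols = 2 if col_one_hot == 3 else 1    # 3 means both columns are in use
    return n_cols * n_instr + imem_add_start - 1


print(exit_index(43, 0, 1))  # 43
print(exit_index(38, 0, 3))  # 77
```

The cell prints the label `44` because it displays `nInstrPerCol + imem_add_start`, which is one more than the zero-based index it reads.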


### Load a 2-column kernel
Kernels can use either one column of the CGRA, or both in parallel. The FFT example uses both.
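The `column_usage` list passed to `kernel_config` and the `col_one_hot` value used earlier describe the same information. As a sketch, assuming column 0 maps to the least significant bit (consistent with `col_one_hot=1` reporting column 0 and `col_one_hot == 3` being treated as both columns), the conversion could look like this. `columns_to_one_hot` is a hypothetical helper, not a simulator function.

```python
def columns_to_one_hot(column_usage):
    """Turn a list of booleans (one per column) into a bit mask, column 0 in bit 0."""
    mask = 0
    for col, used in enumerate(column_usage):
        if used:
            mask |= 1 << col
    return mask


print(columns_to_one_hot([True, False]))  # 1
print(columns_to_one_hot([True, True]))   # 3
```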

In [14]:
sim = SIMULATOR()

# --------------------------------------------
#               KERNEL CONFIGURATION
# --------------------------------------------
kernel_path = "kernels/fft/"
kernel_number = 1 # Kernel number (from 1 to 15)
column_usage = [True, True] # Columns to use
nInstrPerCol = 39 # Number of instructions per column
imem_add_start = 0 # Start address on imem for this kernel
srf_spm_addres = 0 # Line of the SPM with the initial data for the SRF

sim.kernel_config(column_usage, nInstrPerCol, imem_add_start, srf_spm_addres, kernel_number)

# --------------------------------------------
#                 LOAD KERNEL
# --------------------------------------------

# This needs the hex instructions; if you don't provide them, generate them by compiling the asm
sim.kernel_load(kernel_path, kernel_number=kernel_number)

Processing file: kernels/fft/instructions_hex.csv...


In [15]:
# Get the info of the kernel
sim.vwr2a.kmem.imem.get_kernel_info(kernel_number)

This kernel uses 39 instruction words starting at IMEM address 0.
It uses column(s): both.
The SRF is located in SPM bank 0.


In [16]:
# Note that now, the second column is populated with non-default instructions
lcu_asm_words = []
for i in range(nInstrPerCol, nInstrPerCol*2):
    lcu_asm_words.append(sim.vwr2a.imem.lcu_imem[i].get_word_in_asm())
print("LCU:")
print(lcu_asm_words)

LCU:
['NOP', 'LOR R0, ZERO, SRF(?)', 'SADD R2, ONE, R0', 'NOP', 'NOP', 'NOP', 'NOP', 'NOP', 'NOP', 'LOR R3, ZERO, LAST', 'NOP', 'BGEPD ZERO, R3, 10', 'LOR R3, ZERO, LAST', 'LOR R1, ZERO, ZERO', 'BGEPD ZERO, R3, 14', 'LOR R3, ZERO, LAST', 'LOR R1, ONE, ZERO', 'NOP', 'BGEPD ZERO, R3, 17', 'NOP', 'NOP', 'NOP', 'LOR R1, ZERO, ZERO', 'SRL R2, ONE, R2', 'SSUB R3, ONE, R2', 'LOR R3, ZERO, R0', 'LOR R1, ONE, ZERO', 'NOP', 'BEQ ZERO, R1, 32', 'NOP', 'NOP', 'BEQ ZERO, ZERO, 33', 'NOP', 'BGEPD ZERO, R3, 27', 'NOP', 'NOP', 'BEQ ONE, R1, 38', 'BEQ ZERO, ZERO, 14', 'BEQ ZERO, ZERO, 17']
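The loop bounds `range(nInstrPerCol, nInstrPerCol*2)` suggest that the second column's instructions are stored directly after the first column's. The sketch below expresses that layout, assuming each column occupies a contiguous block of `nInstrPerCol` words beginning at `imem_add_start`. This is an inference from the cells above, and `column_window` is a hypothetical helper.

```python
def column_window(col, n_instr_per_col, imem_add_start=0):
    """Return the range of IMEM indices assumed to belong to column `col`."""
    first = imem_add_start + col * n_instr_per_col
    return range(first, first + n_instr_per_col)


print(column_window(0, 39))  # range(0, 39)
print(column_window(1, 39))  # range(39, 78)
```

With a helper like this, the loop above could iterate over `column_window(1, nInstrPerCol, imem_add_start)` so that it also covers kernels that do not start at IMEM address 0.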


### Now, your turn!
Either use the example above to make a new kernel, or see how we can optimize this one. Can you think of a way to avoid repeating the add instruction 32 times? (HINT: use the LCU.)