<br />
<div align="center">
  <a href="https://deepwok.github.io/">
    <img src="../imgs/deepwok.png" alt="Logo" width="160" height="160">
  </a>

  <h1 align="center">Lab 4 for Advanced Deep Learning Systems (ADLS) - Hardware Stream</h1>

  <p align="center">
    ELEC70109/EE9-AML3-10/EE9-AO25
    <br />
		Written by
    <a href="https://aaron-zhao123.github.io/">Aaron Zhao, Pedro Gimenes </a>
  </p>
</div>

# General introduction

In this lab, you will learn how to emit SystemVerilog code for a neural network that has been transformed and optimized by MASE. You will then design hardware for a new PyTorch layer and simulate the design with your new module.

# The Hardware Emit pass

The `emit_verilog` transform pass generates a top-level RTL file and a testbench file from the `MaseGraph`. The RTL includes a hardware implementation of each layer in the network. The top-level file instantiates modules from the `components` library in MASE and, when internal components are not available, modules generated using [HLS](https://en.wikipedia.org/wiki/High-level_synthesis). The hardware can then be simulated using [Verilator](https://www.veripool.org/verilator/), or deployed on an FPGA.

First, add Machop to your system PATH (if you haven't already done so) and import the required libraries.

In [1]:
import os, sys
import torch
torch.manual_seed(0)

from chop.ir.graph.mase_graph import MaseGraph

from chop.passes.graph.analysis import (
    init_metadata_analysis_pass,
    add_common_metadata_analysis_pass,
    add_hardware_metadata_analysis_pass,
    add_software_metadata_analysis_pass,
    report_node_type_analysis_pass,
)

from chop.passes.graph.transforms import (
    emit_verilog_top_transform_pass,
    emit_internal_rtl_transform_pass,
    emit_bram_transform_pass,
    emit_cocotb_transform_pass,
    quantize_transform_pass,
)

from chop.tools.logger import set_logging_verbosity

set_logging_verbosity("debug")

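import toml
import torch.nn as nn

# Local workaround: make a Homebrew-installed Verilator visible to the notebook.
# Adjust or remove this line to match your own installation.
os.environ["PATH"] = "/opt/homebrew/bin:" + os.environ["PATH"]
!verilator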

INFO     Set logging level to debug


Usage:
        verilator --help
        verilator --version
        verilator --binary -j 0 [options] [source_files.v]... [opt_c_files.cpp/c/cc/a/o/so]
        verilator --cc [options] [source_files.v]... [opt_c_files.cpp/c/cc/a/o/so]
        verilator --sc [options] [source_files.v]... [opt_c_files.cpp/c/cc/a/o/so]
        verilator --lint-only -Wall [source_files.v]...



Now, define the neural network. We use a toy fully connected model of the kind used for digit classification on the MNIST dataset, reduced here to a single 16-to-16 linear layer followed by a ReLU.

In [2]:
class MLP(torch.nn.Module):
    """
    Toy FC model for digit recognition on MNIST
    """

    def __init__(self) -> None:
        super().__init__()

        self.fc1 = nn.Linear(16, 16)

    def forward(self, x):
        x = torch.flatten(x, start_dim=1, end_dim=-1)
        x = torch.nn.functional.relu(self.fc1(x),inplace=False)
        # x = torch.nn.functional.leaky_relu(self.fc1(x),negative_slope=0.5,inplace=False)
        return x

In [3]:
import numpy as np
def float_to_fixed(x, scaling_factor, total_bits=8):
    """
    Convert a floating-point numpy array to fixed-point representation.

    Parameters:
      x            : numpy array of floats.
      scaling_factor: Factor to scale the float values (2^frac_bits).
      total_bits   : Total number of bits (default is 8).

    Returns:
      A numpy array of type np.int8 representing the fixed-point values.
    """
    # Scale the floating-point values.
    x_scaled = x * scaling_factor

    # Truncate toward zero (np.trunc drops the fractional part; it does not
    # round to the nearest integer).
    x_fixed = np.trunc(x_scaled)

    # Define the representable range for signed integers of width total_bits.
    min_val = -2**(total_bits - 1)       # -128 for 8 bits
    max_val = 2**(total_bits - 1) - 1      #  127 for 8 bits

    # Clip values that exceed the representable range.
    x_fixed = np.clip(x_fixed, min_val, max_val)

    # Return as np.int8 (only valid for total_bits <= 8).
    return x_fixed.astype(np.int8)


In [4]:

frac_bits = 8  # for example
scale = 2 ** frac_bits



In [5]:
# import numpy as np

# weight_np = np.array([
#     [-0.0037,  0.2682, -0.4115, -0.3680],
#     [-0.1926,  0.1341, -0.0099,  0.3964],
#     [-0.0444,  0.1323, -0.1511, -0.0983],
#     [-0.4777, -0.3311, -0.2061,  0.0185],
#     [ 0.1977,  0.3000, -0.3390, -0.2177],
#     [ 0.1816,  0.4152, -0.1029,  0.3742],
#     [-0.0806,  0.0529,  0.4527, -0.4638],
#     [-0.3148, -0.1266, -0.1949,  0.4320]
# ])  # shape (8,4)

# bias_np = np.array([-0.3241, -0.2302, -0.3493, -0.4683,
#                     -0.2919,  0.4298,  0.2231,  0.2423])  # shape (8,)

# # Convert them to PyTorch tensors of the correct shape and dtype
# weight_torch = torch.tensor(weight_np, dtype=torch.float32)
# bias_torch   = torch.tensor(bias_np,   dtype=torch.float32)


# model = MLP()

# # Overwrite the linear layer’s weight/bias data
# # Note: PyTorch layers store weight/bias in .weight and .bias
# with torch.no_grad():
#     model.fc1.weight.copy_(weight_torch)
#     model.fc1.bias.copy_(bias_torch)

# # Input as a NumPy array, shape = (1, 4)
# input_np = np.array([[0.4963, 0.7682, 0.0885, 0.1320]])

# # Convert to PyTorch tensor
# input_torch = torch.tensor(input_np, dtype=torch.float32)

# # Forward pass
# output = model(input_torch)
# print("Output:\n", output)

In [6]:
import torch

print("Supported quantized engines:", torch.backends.quantized.supported_engines)
print("Current quantized engine:   ", torch.backends.quantized.engine)

Supported quantized engines: ['qnnpack', 'none', 'onednn']
Current quantized engine:    onednn


Now, we'll generate a MaseGraph and add metadata. 

In [7]:
mlp = MLP()
mg = MaseGraph(model=mlp)

# Provide a dummy input so the graph can be traced
batch_size = 1
x = torch.randn((batch_size, 4, 4))
print(x)
dummy_in = {"x": x}

mg, _ = init_metadata_analysis_pass(mg, None)
mg, _ = add_common_metadata_analysis_pass(
    mg, {"dummy_in": dummy_in, "add_value": False}
)

DEBUG    graph():
    %x : [num_users=1] = placeholder[target=x]
    %flatten : [num_users=1] = call_function[target=torch.flatten](args = (%x,), kwargs = {start_dim: 1, end_dim: -1})
    %fc1 : [num_users=1] = call_module[target=fc1](args = (%flatten,), kwargs = {})
    %relu : [num_users=1] = call_function[target=torch.nn.functional.relu](args = (%fc1,), kwargs = {inplace: False})
    return relu


tensor([[[-0.2340,  0.7073,  0.5800,  0.2683],
         [-2.0589,  0.5340, -0.5354, -0.8637],
         [-0.0235,  1.1717,  0.3987, -0.1987],
         [-1.1559, -0.3167,  0.9403, -1.1470]]])
graph_model:  GraphModule(
  (fc1): Linear(in_features=16, out_features=16, bias=True)
)



def forward(self, x):
    flatten = torch.flatten(x, start_dim = 1, end_dim = -1);  x = None
    fc1 = self.fc1(flatten);  flatten = None
    relu = torch.nn.functional.relu(fc1, inplace = False);  fc1 = None
    return relu
    
# To see more debug info, please use `graph_module.print_readable()`

Before running `emit_verilog`, we'll quantize the model to fixed-point precision. Refer back to [lab 3](https://deepwok.github.io/mase/modules/labs_2023/lab3.html) if you've forgotten how this works. After quantization, check that the data type of each node is correct.

In [8]:
config_file = os.path.join(
    os.path.abspath(""),
    "..",
    "..",
    "configs",
    "tests",
    "quantize",
    "fixed.toml",
)
with open(config_file, "r") as f:
    quan_args = toml.load(f)["passes"]["quantize"]
mg, _ = quantize_transform_pass(mg, quan_args)

_ = report_node_type_analysis_pass(mg)

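# Update the metadata: mark every argument and result as fixed point
# with a total width of 8 bits and 5 fractional bits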
for node in mg.fx_graph.nodes:
    for arg, arg_info in node.meta["mase"]["common"]["args"].items():
        if isinstance(arg_info, dict):
            arg_info["type"] = "fixed"
            arg_info["precision"] = [8, 5]
    for result, result_info in node.meta["mase"]["common"]["results"].items():
        if isinstance(result_info, dict):
            result_info["type"] = "fixed"
            result_info["precision"] = [8, 5]

INFO     Inspecting graph [add_common_node_type_analysis_pass]
INFO
Node name    Fx Node op     Mase type            Mase op      Value type
-----------  -------------  -------------------  -----------  ------------
x            placeholder    placeholder          placeholder  NA
flatten      call_function  implicit_func        flatten      float
fc1          call_module    module_related_func  linear       fixed
relu         call_function  module_related_func  relu         fixed
output       output         output               output       NA


placeholder
flatten
node_config:  {'name': 'fixed', 'data_in_width': 8, 'data_in_frac_width': 5, 'weight_width': 8, 'weight_frac_width': 5, 'bias_width': 8, 'bias_frac_width': 5, 'data_out_width': 8, 'data_out_frac_width': 5, 'floor': True}
node_config:  {'name': 'fixed', 'data_in_width': 8, 'data_in_frac_width': 5, 'weight_width': 8, 'weight_frac_width': 5, 'bias_width': 8, 'bias_frac_width': 5, 'data_out_width': 8, 'data_out_frac_width': 5, 'floor': True}
output
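
To make the `[8, 5]` precision concrete: the first entry is the total width in bits and the second is the number of fractional bits, matching `data_in_width` and `data_in_frac_width` in the `node_config` printed above. The sketch below computes the representable range and step size of such a format. `fixed_point_range` is a hypothetical helper written for this illustration, not part of MASE, and it assumes a signed two's-complement representation.

```python
def fixed_point_range(width: int, frac_width: int):
    """Return (min, max, step) of a signed two's-complement fixed-point format."""
    step = 2.0 ** -frac_width                 # value of one least-significant bit
    min_val = -(2 ** (width - 1)) * step      # most negative code
    max_val = (2 ** (width - 1) - 1) * step   # most positive code
    return min_val, max_val, step


lo, hi, step = fixed_point_range(8, 5)
print(lo, hi, step)  # -4.0 3.96875 0.03125
```

Under this assumption, values outside roughly [-4, 4) saturate, which is worth keeping in mind when comparing simulation results against the floating-point model.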


At this point, it's important to run the `add_hardware_metadata` analysis pass. This adds all the required metadata which is later used by the `emit_verilog` pass, including:

1. The node's toolchain, which defines whether we use internal Verilog modules from the `components` library or the HLS flow.
2. The Verilog parameters associated with each node.

> **_TASK:_** Read [this page](https://deepwok.github.io/mase/modules/chop/analysis/add_metadata.html#add-hardware-metadata-analysis-pass) for more information on the hardware metadata pass.

In [9]:
mg, _ = add_hardware_metadata_analysis_pass(mg)
for node in mg.nodes:
    mase_op = node.meta["mase"]["common"]["mase_op"]
    print("mase_op:", mase_op)
    print("common:", node.meta["mase"]["common"])
    print("hardware:", node.meta["mase"]["hardware"])

for node in mg.fx_graph.nodes:
    # Skip implicit nodes
    if node.meta["mase"].parameters["hardware"]["is_implicit"]:
        continue
    # Only modules have internal parameters
    if node.meta["mase"].module is None:
        continue
    # print(node.meta["mase"].parameters["hardware"])
    # Only inspect nodes whose toolchain entry contains "INTERNAL"
    if "INTERNAL" in node.meta["mase"].parameters["hardware"]["toolchain"]:
        for param_name, parameter in node.meta["mase"].module.named_parameters():
            print("param_name in CNN.jynb:", param_name)
            print("parameter in CNN.jynb:", parameter)

"""
weights and bias in the Conv/linear in Maze
param_data = node.meta["mase"].module.get_parameter(param_name).data
print ("param_data: ", param_data)
"""

mase_op: linear
2222
mase_op: relu
2222
vp:  {}
arg in add_verilog_param: data_in_0
vp after for:  {'DATA_IN_0_PRECISION_0': 8, 'DATA_IN_0_PRECISION_1': 5, 'DATA_IN_0_TENSOR_SIZE_DIM_0': 16, 'DATA_IN_0_PARALLELISM_DIM_0': 4, 'DATA_IN_0_TENSOR_SIZE_DIM_1': 1, 'DATA_IN_0_PARALLELISM_DIM_1': 1}
arg in add_verilog_param: weight
vp after for:  {'DATA_IN_0_PRECISION_0': 8, 'DATA_IN_0_PRECISION_1': 5, 'DATA_IN_0_TENSOR_SIZE_DIM_0': 16, 'DATA_IN_0_PARALLELISM_DIM_0': 4, 'DATA_IN_0_TENSOR_SIZE_DIM_1': 1, 'DATA_IN_0_PARALLELISM_DIM_1': 1, 'WEIGHT_PRECISION_0': 8, 'WEIGHT_PRECISION_1': 5, 'WEIGHT_TENSOR_SIZE_DIM_0': 16, 'WEIGHT_PARALLELISM_DIM_0': 4, 'WEIGHT_TENSOR_SIZE_DIM_1': 16, 'WEIGHT_PARALLELISM_DIM_1': 4}
arg in add_verilog_param: bias
vp after for:  {'DATA_IN_0_PRECISION_0': 8, 'DATA_IN_0_PRECISION_1': 5, 'DATA_IN_0_TENSOR_SIZE_DIM_0': 16, 'DATA_IN_0_PARALLELISM_DIM_0': 4, 'DATA_IN_0_TENSOR_SIZE_DIM_1': 1, 'DATA_IN_0_PARALLELISM_DIM_1': 1, 'WEIGHT_PRECISION_0': 8, 'WEIGHT_PRECISION_1': 5,

'\nweights and bias in the Conv/linear in Maze\nparam_data = node.meta["mase"].module.get_parameter(param_name).data\nprint ("param_data: ", param_data)\n'
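
The `vp after for` dictionaries in the debug output follow a regular naming scheme: each argument contributes its precision entries, then a tensor size and a parallelism value per dimension. Judging from the logged values for the `(1, 16)` input, dimension 0 is the last tensor dimension. The sketch below reproduces that scheme for illustration only. `verilog_params` is a hypothetical helper, not the function MASE uses, and the parallelism values are copied from the log rather than derived.

```python
def verilog_params(arg_name, precision, shape, parallelism):
    """Build a Verilog parameter dictionary for one tensor argument."""
    prefix = arg_name.upper()
    params = {}
    # PRECISION_0 is the total width, PRECISION_1 the fractional width.
    for i, value in enumerate(precision):
        params[f"{prefix}_PRECISION_{i}"] = value
    # Dimension 0 in the parameter names is the last tensor dimension.
    pairs = zip(reversed(shape), reversed(parallelism))
    for dim, (size, par) in enumerate(pairs):
        params[f"{prefix}_TENSOR_SIZE_DIM_{dim}"] = size
        params[f"{prefix}_PARALLELISM_DIM_{dim}"] = par
    return params


data_in = verilog_params("data_in_0", precision=[8, 5], shape=(1, 16), parallelism=(1, 4))
for name, value in data_in.items():
    print(name, value)
```

With these inputs the result matches the first `vp after for` dictionary in the log: 16 elements in dimension 0, processed 4 at a time.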

In [10]:
mg_soft, _ = add_software_metadata_analysis_pass(mg)
print (mg_soft)

<chop.ir.graph.mase_graph.MaseGraph object at 0x7ffff81e96d0>


Next, run the `emit_verilog_top` and `emit_internal_rtl` transform passes to generate the SystemVerilog files.

In [11]:
mg, _ = emit_verilog_top_transform_pass(mg)
mg, _ = emit_internal_rtl_transform_pass(mg)

INFO     Emitting Verilog...
INFO     Emitting internal components...


Parameter_map in VerilogEmitter: {'fc1_DATA_IN_0_PRECISION_0': 8, 'fc1_DATA_IN_0_PRECISION_1': 5, 'fc1_DATA_IN_0_TENSOR_SIZE_DIM_0': 16, 'fc1_DATA_IN_0_PARALLELISM_DIM_0': 4, 'fc1_DATA_IN_0_TENSOR_SIZE_DIM_1': 1, 'fc1_DATA_IN_0_PARALLELISM_DIM_1': 1, 'fc1_WEIGHT_PRECISION_0': 8, 'fc1_WEIGHT_PRECISION_1': 5, 'fc1_WEIGHT_TENSOR_SIZE_DIM_0': 16, 'fc1_WEIGHT_PARALLELISM_DIM_0': 4, 'fc1_WEIGHT_TENSOR_SIZE_DIM_1': 16, 'fc1_WEIGHT_PARALLELISM_DIM_1': 4, 'fc1_BIAS_PRECISION_0': 8, 'fc1_BIAS_PRECISION_1': 5, 'fc1_BIAS_TENSOR_SIZE_DIM_0': 16, 'fc1_BIAS_PARALLELISM_DIM_0': 4, 'fc1_BIAS_TENSOR_SIZE_DIM_1': 1, 'fc1_BIAS_PARALLELISM_DIM_1': 1, 'fc1_DATA_OUT_0_PRECISION_0': 8, 'fc1_DATA_OUT_0_PRECISION_1': 5, 'fc1_DATA_OUT_0_TENSOR_SIZE_DIM_0': 16, 'fc1_DATA_OUT_0_PARALLELISM_DIM_0': 4, 'fc1_DATA_OUT_0_TENSOR_SIZE_DIM_1': 1, 'fc1_DATA_OUT_0_PARALLELISM_DIM_1': 1, 'relu_DATA_IN_0_PRECISION_0': 8, 'relu_DATA_IN_0_PRECISION_1': 5, 'relu_DATA_IN_0_TENSOR_SIZE_DIM_0': 16, 'relu_DATA_IN_0_PARALLELISM_DIM_0

The generated files should now be found under `top/hardware` inside your MASE output directory (`/root/.mase/top/hardware` in the run logged in this notebook).

> **_TASK:_** Read through `top/hardware/rtl/top.sv` and make sure you understand how our MLP model maps to this hardware design. 

You will notice the following instantiated modules:

* `fixed_linear`: this is found under `components/linear/fixed_linear.sv` and implements each Linear layer in the model.
* `fc<layer number>_weight/bias_source`: these are [BRAM](https://nandland.com/lesson-15-what-is-a-block-ram-bram/) memories which drive the weights and biases into the linear layers for computation.
* `fixed_relu`: found under `components/activations/fixed_relu.sv`, implements the ReLU activation.
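
To build intuition for what a fixed-point linear layer followed by a ReLU computes, the following NumPy sketch models the arithmetic on integer codes. It assumes inputs, weights, and biases all use 5 fractional bits, that the accumulator is floored back to the output format with an arithmetic right shift, and that the result saturates to 8 bits. `fixed_linear_relu` is a hypothetical reference model, not a transcription of `fixed_linear.sv`, and the RTL's rounding and saturation behavior may differ.

```python
import numpy as np


def fixed_linear_relu(x_q, w_q, b_q, frac_width=5, width=8):
    """Integer model of y = relu(x @ W.T + b) on fixed-point codes."""
    x_q = np.asarray(x_q, dtype=np.int64)
    w_q = np.asarray(w_q, dtype=np.int64)
    b_q = np.asarray(b_q, dtype=np.int64)

    # Products carry 2 * frac_width fractional bits, so align the bias first.
    acc = x_q @ w_q.T + (b_q << frac_width)

    # Drop the extra fractional bits (arithmetic shift, i.e. floor).
    out = acc >> frac_width

    # Saturate to the signed output width, then apply ReLU.
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    out = np.clip(out, lo, hi)
    return np.maximum(out, 0)


# x = [1.0, -0.5], W = [[0.5, 1.0], [-1.0, 0.0]], b = [0.25, 0.0] in [8, 5] format
y_q = fixed_linear_relu([32, -16], [[16, 32], [-32, 0]], [8, 0])
print(y_q, y_q / 32)
```

A model like this can serve as a golden reference when checking the waveforms or testbench outputs of the generated design.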

We can't run a simulation of the model yet because the memory components haven't been generated. To do this, run the `emit_bram` transform pass as follows, which generates the memory initialization files and the SystemVerilog modules that drive weights and biases into the linear layers. Then, the `emit_cocotb` transform pass generates the testbench files.


In [12]:
mg, _ = emit_bram_transform_pass(mg)

INFO     Emitting BRAM...
DEBUG    Emitting DAT file for node: fc1, parameter: weight


1 in emit_bram_transform_pass
/root/.mase/top/hardware/rtl
param_name in emit_bram_handshake: weight
parameter in emit_bram_handshake: Parameter containing:
tensor([[-0.0019,  0.1341, -0.2058, -0.1840, -0.0963,  0.0670, -0.0050,  0.1982,
         -0.0222,  0.0662, -0.0756, -0.0491, -0.2388, -0.1656, -0.1031,  0.0093],
        [ 0.0988,  0.1500, -0.1695, -0.1089,  0.0908,  0.2076, -0.0515,  0.1871,
         -0.0403,  0.0265,  0.2264, -0.2319, -0.1574, -0.0633, -0.0974,  0.2160],
        [-0.1620, -0.1151, -0.1747, -0.2341, -0.1459,  0.2149,  0.1116,  0.1212,
          0.0131, -0.1282,  0.0423, -0.2334, -0.1806, -0.1289,  0.1577,  0.1466],
        [-0.1109, -0.0090,  0.1599,  0.2485,  0.0992,  0.0338,  0.1676, -0.1472,
          0.0466, -0.1938, -0.1733, -0.1291,  0.1131,  0.1005, -0.1481,  0.0755],
        [ 0.1372, -0.0316,  0.0095,  0.0579,  0.1551,  0.2400, -0.1927, -0.0916,
          0.0983,  0.2071,  0.2176,  0.2206,  0.0498, -0.2174,  0.0230, -0.1564],
        [-0.2330,  0.2221,  

DEBUG    ROM module weight successfully written into /root/.mase/top/hardware/rtl/fc1_weight_source.sv


out_size in emit_parameters_in_mem_internal: 16
fc1
param_data in emit_parameters_in_dat_internal:  tensor([[-0.0019,  0.1341, -0.2058, -0.1840, -0.0963,  0.0670, -0.0050,  0.1982,
         -0.0222,  0.0662, -0.0756, -0.0491, -0.2388, -0.1656, -0.1031,  0.0093],
        [ 0.0988,  0.1500, -0.1695, -0.1089,  0.0908,  0.2076, -0.0515,  0.1871,
         -0.0403,  0.0265,  0.2264, -0.2319, -0.1574, -0.0633, -0.0974,  0.2160],
        [-0.1620, -0.1151, -0.1747, -0.2341, -0.1459,  0.2149,  0.1116,  0.1212,
          0.0131, -0.1282,  0.0423, -0.2334, -0.1806, -0.1289,  0.1577,  0.1466],
        [-0.1109, -0.0090,  0.1599,  0.2485,  0.0992,  0.0338,  0.1676, -0.1472,
          0.0466, -0.1938, -0.1733, -0.1291,  0.1131,  0.1005, -0.1481,  0.0755],
        [ 0.1372, -0.0316,  0.0095,  0.0579,  0.1551,  0.2400, -0.1927, -0.0916,
          0.0983,  0.2071,  0.2176,  0.2206,  0.0498, -0.2174,  0.0230, -0.1564],
        [-0.2330,  0.2221,  0.1901, -0.2494,  0.0468, -0.0421, -0.0411, -0.1144,
    

DEBUG    Init data weight successfully written into /root/.mase/top/hardware/rtl/fc1_weight_rom.dat
DEBUG    Emitting DAT file for node: fc1, parameter: bias
DEBUG    ROM module bias successfully written into /root/.mase/top/hardware/rtl/fc1_bias_source.sv
DEBUG    Init data bias successfully written into /root/.mase/top/hardware/rtl/fc1_bias_rom.dat


data_buff:  0004f9fafd020006ff02fefef8fbfd00
0305fbfd0307fe06ff0107f9fbfefd07
fbfcfaf9fb07040400fc01f9fafc0505
fc000508030105fb01fafafc0403fb02
04ff00020508fafd0307070702f901fb
f90706f801fffffc03fb03040603f8fb
0402fafb0805fdfef800fafa0001fd05
fb0705f9fe0001020300fc04f8fbfefc
fdf9fe02fb0006ff00ff0205080508ff
f9fc0500fcfaf9f9fe0404f805fafefd
fefef9f9ff00fc03f9ff07fd0703f905
fffc06fa01fe06020403fefef9040605
fa00fafcfb03fb000005fafbfb06fd07
030100fe01fe0001fafc06faff080300
f904fafefdff0007010705fb04fafd02
0306fd0004f806f900fffc0107fefbfd

param_name in emit_bram_handshake: bias
parameter in emit_bram_handshake: Parameter containing:
tensor([-0.2478,  0.1128, -0.1201, -0.1668, -0.1440,  0.1437,  0.1324,  0.1919,
         0.0907, -0.0835, -0.0699,  0.0739,  0.2055,  0.0680, -0.1183, -0.1175],
       requires_grad=True)
out_size in emit_parameters_in_mem_internal: 4
fc1
param_data in emit_parameters_in_dat_internal:  tensor([-0.2478,  0.1128, -0.1201, -0.1668, -0.1440,  0.1437,  0.1324,  0.19
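
The `data_buff` dump above can be related back to the printed weights. The sketch below scales the first eight weights of row 0 by 2^5, rounds to the nearest integer, and prints each value as a two's-complement byte in hex. `to_fixed_hex` is a hypothetical helper for this check, not the code MASE uses. It assumes round-to-nearest and the element order shown in the log, and the inputs are the 4-decimal values printed in the notebook, not the exact parameters.

```python
import numpy as np


def to_fixed_hex(values, width=8, frac_width=5):
    """Encode floats as concatenated two's-complement hex codes."""
    scaled = np.round(np.asarray(values, dtype=np.float64) * 2**frac_width)
    lo, hi = -(2 ** (width - 1)), 2 ** (width - 1) - 1
    codes = np.clip(scaled, lo, hi).astype(np.int64)
    digits = width // 4                      # hex digits per value
    mask = (1 << width) - 1                  # wrap negatives into unsigned range
    return "".join(format(int(c) & mask, f"0{digits}x") for c in codes)


row0 = [-0.0019, 0.1341, -0.2058, -0.1840, -0.0963, 0.0670, -0.0050, 0.1982]
print(to_fixed_hex(row0))  # 0004f9fafd020006
```

The printed string matches the first 16 hex digits of the first `data_buff` line, which suggests that each byte in the initialization file holds one weight in the `[8, 5]` format.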

In [13]:
mg, _ = emit_cocotb_transform_pass(mg)


INFO     Emitting testbench...


> **_TASK:_** Now, you're ready to launch a simulation by calling the simulate action as follows.

In [16]:
from chop.actions import simulate

simulate(skip_build=False, skip_test=False)

INFO: Running command perl /usr/local/bin/verilator -cc --exe -Mdir /workspace/docs/labs/sim_build -DCOCOTB_SIM=1 --top-module top --vpi --public-flat-rw --prefix Vtop -o top -LDFLAGS '-Wl,-rpath,/usr/local/lib/python3.11/dist-packages/cocotb/libs -L/usr/local/lib/python3.11/dist-packages/cocotb/libs -lcocotbvpi_verilator' -Wno-fatal -Wno-lint -Wno-style --trace-fst --trace-structs --trace-depth 3 -I/root/.mase/top/hardware/rtl -I/workspace/src/mase_components/interface/rtl -I/workspace/src/mase_components/language_models/rtl -I/workspace/src/mase_components/memory/rtl -I/workspace/src/mase_components/vivado/rtl -I/workspace/src/mase_components/convolution_layers/rtl -I/workspace/src/mase_components/cast/rtl -I/workspace/src/mase_components/systolic_arrays/rtl -I/workspace/src/mase_components/scalar_operators/rtl -I/workspace/src/mase_components/transformer_layers/rtl -I/workspace/src/mase_components/common/rtl -I/workspace/src/mase_components/hls/rtl -I/workspace/src/mase_components/v

INFO     Build finished. Time taken: 7.35s


make: Leaving directory '/workspace/docs/labs/sim_build'
cmd: [['/workspace/docs/labs/sim_build/top']]
INFO: Running command /workspace/docs/labs/sim_build/top in directory /workspace/docs/labs/sim_build
     -.--ns INFO     gpi                                ..mbed/gpi_embed.cpp:76   in set_program_name_in_venv        Did not detect Python virtual environment. Using system-wide Python interpreter
     -.--ns INFO     gpi                                ../gpi/GpiCommon.cpp:101  in gpi_print_registered_impl       VPI registered
     0.00ns INFO     cocotb                             Running on Verilator version 5.020 2024-01-01
     0.00ns INFO     cocotb                             Running tests with cocotb v1.8.0 from /usr/local/lib/python3.11/dist-packages/cocotb
     0.00ns INFO     cocotb                             Seeding Python random module with 1741204017
     0.00ns INFO     cocotb.regression                  Found test mase_top_tb.test.test
     0.00ns INFO     cocotb.regres

  self._thread = cocotb.scheduler.add(self._send_thread())
  self._thread = cocotb.scheduler.add(self._recv_thread())


in_tensors:  (1, 2, 3, 3)
GraphModule(
  (fc1): LinearInteger(in_features=16, out_features=16, bias=True)
)



def forward(self, x):
    flatten = torch.flatten(x, start_dim = 1, end_dim = -1);  x = None
    fc1 = self.fc1(flatten);  flatten = None
    mul = fc1 * 32;  fc1 = None
    round_1 = mul.round();  mul = None
    clamp = round_1.clamp(min = 0, max = 255);  round_1 = None
    truediv = clamp / 32;  clamp = None
    relu = torch.nn.functional.relu(truediv, inplace = False);  truediv = None
    return relu
    
# To see more debug info, please use `graph_module.print_readable()`
load_drivers in emit_tb.py
    40.00ns INFO     cocotb.regression                  test failed
                                                        Traceback (most recent call last):
                                                          File "/root/.mase/top/hardware/test/mase_top_tb/test.py", line 26, in test
                                                            tb.load_drivers(in_tensors)
 

INFO     Test finished. Time taken: 11.21s


- :0: Verilog $finish
INFO: Results file: /workspace/docs/labs/sim_build/results.xml


The `simulate` action writes a waveform trace of the simulation to the `sim_build` directory, named `dump.vcd` or, when Verilator is run with `--trace-fst` as in the command logged above, `dump.fst`. The waveforms can be opened with a viewer such as GTKWave.
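
If you are unsure which trace file your run produced, a short script can list the candidates. This is a minimal sketch: `find_waveforms` is a hypothetical helper, and it assumes the traces are named `dump.vcd` or `dump.fst` and sit directly inside the simulation build directory.

```python
from pathlib import Path


def find_waveforms(sim_dir="sim_build"):
    """Return waveform dumps (VCD or FST) found directly inside sim_dir."""
    root = Path(sim_dir)
    return sorted(p for p in root.glob("dump.*") if p.suffix in {".vcd", ".fst"})


for path in find_waveforms():
    print(path)
```

Pass the printed path to your waveform viewer to open the trace.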

> **_TASK:_** Follow the instructions [here](https://gtkwave.sourceforge.net/) to install GTKWave on your platform, then open the generated trace file to inspect the signals in the simulation.

# Main Task

PyTorch provides a number of layers that users can combine to define neural network models. At the moment, `emit_verilog` supports generating Verilog for models that use Linear layers and the ReLU activation.

> **_MAIN TASK:_** Choose another layer type from the [PyTorch list](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) and write a SystemVerilog file to implement that layer in hardware. Then, change the generated `top.sv` file to inject that layer within the design. For example, you may replace the ReLU activations with [Leaky ReLU](https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html#torch.nn.LeakyReLU). Re-run the simulation and observe the effect on latency and accuracy.
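
When verifying a new activation module, it helps to have a bit-accurate software model to compare the simulated outputs against. The sketch below models a fixed-point Leaky ReLU on integer codes. It assumes the negative slope is itself a fixed-point constant and that the product is floored back to the input format with an arithmetic right shift. `leaky_relu_fixed` and its parameters are hypothetical names chosen for illustration. If your hardware rounds differently, adjust the model to match.

```python
import numpy as np


def leaky_relu_fixed(x_q, slope_q, slope_frac_width, width=8):
    """Integer model of Leaky ReLU: x if x >= 0, else slope * x."""
    x_q = np.asarray(x_q, dtype=np.int64)

    # Scale negative inputs by the slope, then drop the slope's fractional bits.
    scaled = (x_q * slope_q) >> slope_frac_width
    out = np.where(x_q >= 0, x_q, scaled)

    # Saturate to the signed output width.
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return np.clip(out, lo, hi)


# A slope of 0.5 is the code 1 with one fractional bit.
print(leaky_relu_fixed([-7, -6, 0, 10], slope_q=1, slope_frac_width=1))
```

Choosing a power-of-two slope such as 0.5 reduces the negative branch to a shift, which is one reason it is a convenient first target for a hardware implementation.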