# SOC Design

# Lab 4-1 Execute Code in User Memory

Group no: 5

# Members:

M11207415 陳謝鎧

M11207002 陳泊佑

M11107426 廖千慧

M11207328 吳奕帆

# Prepare firmware code & RTL:

Generate data in header file – fir.h:

➤ Define the tap coefficients (taps) and inputsignal in the header file, using the same values as in Lab 3.

```
// file's name: fir.h

#ifndef __FIR_H__
#define __FIR_H__

#define N 11

int taps[N] = {0,-10,-9,23,56,63,56,23,-9,-10,0};
int inputbuffer[N];
int inputsignal[N] = {1,2,3,4,5,6,7,8,9,10,11};
int outputsignal[N];

#endif
```

#### C code – fir.c:

➤ Implement the FIR function in C code.

```
// file's name : fir.c
#include "fir.h"

void __attribute__((section(".mprjram"))) initfir()
{
    // initial your fir
    for (int i = 0; i < N; i++) {
        inputbuffer[i] = 0;
        outputsignal[i] = 0;
    }
}

int *__attribute__((section(".mprjram"))) fir()
{
    initfir();
    // write down your fir
    for (int i = 0; i < N; i++) {
        int data_get = inputsignal[i]; // get data from axi-stream
        inputbuffer[i] = data_get;     // store data to bram
        for (int j = 0; j <= i; j++)
            outputsignal[i] += inputbuffer[j] * taps[i - j];
    }
    return outputsignal;
}
```
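To cross-check the firmware result on a host machine, the same convolution can be computed with a small reference model. This is only a sketch: `fir_reference` and `print_reference` are hypothetical helper names that are not part of the lab sources, and the arrays are copied from fir.h above.

```c
#include <stdio.h>

#define N 11

/* Same data as fir.h in this report. */
static const int taps[N] = {0, -10, -9, 23, 56, 63, 56, 23, -9, -10, 0};
static const int inputsignal[N] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};

/* Hypothetical helper (not in the lab sources): direct-form FIR,
 * out[i] = sum over j = 0..i of x[j] * h[i - j]. */
void fir_reference(const int *x, const int *h, int *out)
{
    for (int i = 0; i < N; i++) {
        int acc = 0;
        for (int j = 0; j <= i; j++)
            acc += x[j] * h[i - j];
        out[i] = acc;
    }
}

/* Hypothetical helper: print the expected outputs for comparison
 * with the values produced by the firmware. */
void print_reference(void)
{
    int out[N];
    fir_reference(inputsignal, taps, out);
    for (int i = 0; i < N; i++)
        printf("y[%d] = %d\n", i, out[i]);
}
```

With the data above, the model gives y[0] = 0, y[4] = 35 and y[10] = 1098, which can be compared against the outputsignal array returned by fir().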

### Firmware management in main(): (Already designed)

➤ In testbench/counter\_la\_fir.c, reg\_mprj\_xfer is first set to 1 to apply the GPIO configuration. The firmware then waits until the register reads back 0, and the FIR does not start before that.

```
reg_mprj_io_31 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_30 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_29 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_28 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_27 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_26 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_25 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_24 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_23 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_22 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_21 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_20 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_19 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_18 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_17 = GPIO_MODE_MGMT_STD_OUTPUT;
reg_mprj_io_16 = GPIO_MODE_MGMT_STD_OUTPUT;
// Now, apply the configuration
reg_mprj_xfer = 1;
while (reg_mprj_xfer == 1);
```

### Linker for address arrangement: (Already designed)

In firmware/sections.lds, the mprjram region is our BRAM: its origin address is 0x38000000 and its size is 4 KB.
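As a rough illustration of what this address arrangement means, the sketch below checks whether an address falls inside the region. The origin and size are the values quoted in this report, and `in_mprjram` and `mprjram_offset` are hypothetical helpers, not functions from the lab firmware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in this report; adjust them if the linker script differs. */
#define MPRJRAM_ORIGIN 0x38000000u
#define MPRJRAM_LENGTH 0x1000u /* 4 KB, as stated in the text */

/* Hypothetical helper: true if addr lies inside the mprjram region. */
bool in_mprjram(uint32_t addr)
{
    return addr >= MPRJRAM_ORIGIN && (addr - MPRJRAM_ORIGIN) < MPRJRAM_LENGTH;
}

/* Hypothetical helper: byte offset of addr from the region origin.
 * Only meaningful when in_mprjram(addr) is true. */
uint32_t mprjram_offset(uint32_t addr)
{
    return addr - MPRJRAM_ORIGIN;
}
```

Functions tagged with the .mprjram section attribute, such as initfir() and fir(), are the ones the linker places in this range.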

### Design BRAM in user\_project

➤ Estimate the required size of the RAM.

```
// file's name : bram.v
module bram(
    CLK,
    WE0,
    EN0,
    Di0,
    Do0,
    A0
);

    input  wire        CLK;
    input  wire [3:0]  WE0;
    input  wire        EN0;
    input  wire [31:0] Di0;
    output reg  [31:0] Do0;
    input  wire [31:0] A0;

    // 16KB (// 4 kB)
    // 32 bit = 4 byte -> 16KB = 4 byte * 2 ^ N
    parameter N = 12;
    (* ram_style = "block" *) reg [31:0] RAM[0:2**N-1];

    always @(posedge CLK)
        if (EN0) begin
            Do0 <= RAM[A0[N-1:0]];
            if (WE0[0]) RAM[A0[N-1:0]][7:0]   <= Di0[7:0];
            if (WE0[1]) RAM[A0[N-1:0]][15:8]  <= Di0[15:8];
            if (WE0[2]) RAM[A0[N-1:0]][23:16] <= Di0[23:16];
            if (WE0[3]) RAM[A0[N-1:0]][31:24] <= Di0[31:24];
        end
        else
            Do0 <= 32'b0;
endmodule
```
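The size estimate and the byte-wise write enable can be sketched as a small software model. This is an illustration under stated assumptions, not the RTL itself: `BRAM_ADDR_BITS` stands in for the parameter N of bram.v, and `bram_size_bytes` and `bram_write_word` are hypothetical names.

```c
#include <stdint.h>

#define BRAM_ADDR_BITS 12 /* assumed equal to parameter N in bram.v */

/* Capacity of a 32-bit-wide RAM with 2^BRAM_ADDR_BITS words, in bytes. */
uint32_t bram_size_bytes(void)
{
    return 4u << BRAM_ADDR_BITS;
}

/* Hypothetical model of one write: byte lane k of `data` replaces
 * byte lane k of `old` only when bit k of the 4-bit `we` mask is set. */
uint32_t bram_write_word(uint32_t old, uint32_t data, unsigned we)
{
    uint32_t mask = 0;
    for (int lane = 0; lane < 4; lane++) {
        if (we & (1u << lane))
            mask |= 0xFFu << (8 * lane);
    }
    return (old & ~mask) | (data & mask);
}
```

For example, writing 0x11223344 over 0xAABBCCDD with we = 4'b0101 updates only byte lanes 0 and 2, giving 0xAA22CC44. With 12 address bits the capacity is 16384 bytes, which agrees with the 16KB comment in the listing.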

➤ Design the controller that connects the BRAM to the Wishbone bus. The ack response must be asserted only after a delay of DELAYS (10) cycles.

```
// file's name : user_proj_example.counter.v
`define MPRJ_IO_PADS_1 19 /* number of user GPIO pads on user1 side */
`define MPRJ_IO_PADS_2 19 /* number of user GPIO pads on user2 side */
`define MPRJ_IO_PADS (`MPRJ_IO_PADS_1 + `MPRJ_IO_PADS_2)

`default_nettype wire

module user_proj_example #(
    parameter BITS = 32,
    parameter DELAYS=10
)(
`ifdef USE_POWER_PINS
    inout vccd1, // User area 1 1.8V supply
    inout vssd1, // User area 1 digital ground
`endif
    // Wishbone Slave ports (WB MI A)
    input wb_clk_i,
    input wb_rst_i,
    input wbs_stb_i,
    input wbs_cyc_i,
    input wbs_we_i,
    input [3:0] wbs_sel_i,
    input [31:0] wbs_dat_i,
    input [31:0] wbs_adr_i,
    output wbs_ack_o,
    output [31:0] wbs_dat_o,

    // Logic Analyzer Signals
    input [127:0] la_data_in,
    output [127:0] la_data_out,
    input [127:0] la_oenb,

    input [`MPRJ_IO_PADS-1:0] io_in,
    output [`MPRJ_IO_PADS-1:0] io_out,
    output [`MPRJ_IO_PADS-1:0] io_oeb,

    // IRQ
    output [2:0] irq
);
    wire clk;
    wire rst;

    assign clk = wb_clk_i;
    assign rst = wb_rst_i;

    //wire [`MPRJ_IO_PADS-1:0] io_in;
    //wire [`MPRJ_IO_PADS-1:0] io_out;
    //wire [`MPRJ_IO_PADS-1:0] io_oeb;

    // input to ram
    wire ram_en;
    wire [3:0] ram_we;
    wire [31:0] ram_adr;
    wire [31:0] ram_data;

    bram user_bram (
        .CLK(clk),
        .WE0(ram_we),
        .EN0(ram_en),
        .Di0(wbs_dat_i),
        .Do0(wbs_dat_o),
        .A0(wbs_adr_i)
    );

    // write data to on_chip ram only when request_sig assert
    wire request_sig;
    assign request_sig = wbs_cyc_i & wbs_stb_i;
    assign ram_adr = (request_sig==1'b1)? wbs_adr_i : 32'b0;
    assign ram_data = (request_sig==1'b1)? wbs_dat_i : 32'b0;
    assign ram_we = (request_sig==1'b1)? ({4{wbs_we_i}} & wbs_sel_i) : 4'b0;
    assign ram_en = (request_sig==1'b1)? (wbs_cyc_i & wbs_sel_i): 1'b0;

    reg wbs_ack_o;
    reg [3:0] delay_cnt; // delay = 10 (DELAYS) < 2^4

    always @ (posedge clk) begin
        if (rst) begin
            wbs_ack_o <= 0;
            delay_cnt <= 0;
        end
        else if (request_sig == 1'b1) begin
            if (delay_cnt == DELAYS) begin
                wbs_ack_o <= 1'b1;
                delay_cnt <= 0;
            end
            else begin
                wbs_ack_o <= 1'b0;
                delay_cnt <= delay_cnt + 1;
            end
        end
        else
            wbs_ack_o <= 1'b0;
    end
endmodule
`default_nettype wire
```
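The timing of the delayed ack can be checked with a cycle-by-cycle software model of the counter. This is a minimal sketch that mirrors the always block of the controller, assuming the request stays asserted until the ack arrives. `wb_ctrl_t`, `wb_ctrl_step` and `edges_until_ack` are hypothetical names introduced only for this illustration.

```c
#include <stdbool.h>

#define DELAYS 10 /* same value as the DELAYS parameter of the module */

/* Hypothetical software copy of the two registers in the controller. */
typedef struct {
    bool ack;           /* models wbs_ack_o */
    unsigned delay_cnt; /* models delay_cnt */
} wb_ctrl_t;

/* One rising clock edge, following the branches of the always block. */
void wb_ctrl_step(wb_ctrl_t *s, bool rst, bool request)
{
    if (rst) {
        s->ack = false;
        s->delay_cnt = 0;
    } else if (request) {
        if (s->delay_cnt == DELAYS) {
            s->ack = true;
            s->delay_cnt = 0;
        } else {
            s->ack = false;
            s->delay_cnt++;
        }
    } else {
        s->ack = false;
    }
}

/* Count the clock edges from the first requested cycle until ack is
 * asserted, with the request held high the whole time. */
int edges_until_ack(void)
{
    wb_ctrl_t s = {false, 0};
    int edges = 0;

    wb_ctrl_step(&s, true, false); /* one reset edge */
    do {
        wb_ctrl_step(&s, false, true);
        edges++;
    } while (!s.ack);
    return edges;
}
```

In this model, ten edges count delay_cnt from 0 up to DELAYS and the following edge raises the ack, so edges_until_ack() returns DELAYS + 1.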

### Compilation

➤ Run\_clean

```
# file's name: run_clean
rm -rf ./gdb.debug ./gdbwave.debug
rm -f *.vcd *.hex
rm -f *.s *.o *.i *.out *.map
```

➤ Run\_sim

```
# file's name: run_sim
rm -f counter_la_fir.hex
# Given script to compile
riscv32-unknown-elf-gcc -Wl,--no-warn-rwx-segments -g \
    --save-temps \
    -Xlinker -Map=output.map \
    -I../../firmware \
    -march=rv32i -mabi=ilp32 -D__vexriscv__ \
    -Wl,-Bstatic,-T,../../firmware/sections.lds,--strip-discarded \
    -ffreestanding -nostartfiles -o counter_la_fir.elf ../../firmware/crt0_vex.S ../../firmware/isr.c fir.c counter_la_fir.c
# Transfer .elf to .hex
riscv32-unknown-elf-objcopy -O verilog counter_la_fir.elf counter_la_fir.hex
# Export assembly code for debugging
riscv32-unknown-elf-objdump -D counter_la_fir.elf > counter_la_fir.out
# to fix flash base address
sed -ie 's/@10/@00/g' counter_la_fir.hex
iverilog -Ttyp -DFUNCTIONAL -DSIM -DUNIT_DELAY=#1 \
    -f./include.rtl.list -o counter_la_fir.vvp counter_la_fir_tb.v
vvp counter_la_fir.vvp
rm -f counter_la_fir.vvp counter_la_fir.elf counter_la_fir.hexe
```

➤ Compilation and simulation log

```
ubuntu@ubuntu2004:~/lab-exmem_fir_Emma/testbench/counter_la_fir$ source run_clean
ubuntu@ubuntu2004:~/lab-exmem_fir_Emma/testbench/counter_la_fir$ source run_sim
Reading counter_la_fir.hex
counter_la_fir.hex loaded into memory
Memory 5 bytes = 0x6f 0x00 0x00 0x0b 0x13
VCD info: dumpfile counter_la_fir.vcd opened for output.
LA Test 1 started
LA Test 2 passed
```

# Synthesis & Verification

➤ Waveform – the Wishbone ack is asserted after a 10-cycle delay when writing.



➤ Timing report

```
Path Group: (none)
From Clock:
To Clock:

Max Delay Paths

Slack:             inf
  Source:          user_bram/RAM_reg_2/CLKBWRCLK
                     (rising edge-triggered cell RAMB36E2)
  Destination:     wbs_dat_o[17]
                     (output port)
  Path Group:      (none)
  Path Type:       Max at Slow Process Corner
  Data Path Delay: 5.782ns (logic 4.091ns (70.752%) route 1.691ns (29.248%))
  Logic Levels:    2 (OBUF=1 RAMB36E2=1)

Slack:             inf
  Source:          user_bram/RAM_reg_0/CLKBWRCLK
                     (rising edge-triggered cell RAMB36E2)
  Destination:     wbs_dat_o[1]
                     (output port)
  Path Group:      (none)
  Path Type:       Max at Slow Process Corner
  Data Path Delay: 5.782ns (logic 4.091ns (70.752%) route 1.691ns (29.248%))
  Logic Levels:    2 (OBUF=1 RAMB36E2=1)

    Location  Delay type  Incr(ns)  Path(ns)  Netlist Resource(s)
    (per-stage delay values not recoverable; resources along the path:)
      user_bram/RAM_reg_0/CLKBWRCLK
      user_bram/RAM_reg_0/DOUTBDOUT[1]
      wbs_dat_o_OBUF[1]
      wbs_dat_o_OBUF[1]_inst/I
      wbs_dat_o_OBUF[1]_inst/O
      wbs_dat_o[1]
```

➤ Synthesis report

Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.

| Tool Version : Vivado v.2022.1 (lin64) Build 3526262 Mon Apr 18 15:47:01 MDT 2022 |
| Date : Wed Nov 8 12:26:34 2023 |
| Host : ubuntu2004 running 64-bit Ubuntu 20.04.4 LTS |
| Command : report\_utilization -file /home/ubuntu/Desktop/utilization\_rpt.txt -name utilization\_2 |
| Device : xck26-sfvc784-2LV-c |
| Speed File : -2LV |
| Design State : Synthesized |

Utilization Design Information

#### 1. CLB Logic

| Site Type             | Used | Fixed | Prohibited | Available | Util% |
|-----------------------|------|-------|------------|-----------|-------|
| CLB LUTs*             | 7    | 0     | 0          | 117120    | <0.01 |
| LUT as Logic          | 7    | 0     | 0          | 117120    | <0.01 |
| LUT as Memory         | 0    | 0     | 0          | 57600     | 0.00  |
| CLB Registers         | 5    | 0     | 0          | 234240    | <0.01 |
| Register as Flip Flop | 5    | 0     | 0          | 234240    | <0.01 |
| Register as Latch     | 0    | 0     | 0          | 234240    | 0.00  |
| CARRY8                | 0    | 0     | 0          | 14640     | 0.00  |
| F7 Muxes              | 0    | 0     | 0          | 58560     | 0.00  |
| F8 Muxes              | 0    | 0     | 0          | 29280     | 0.00  |
| F9 Muxes              | 0    | 0     | 0          | 14640     | 0.00  |

\* Warning! The Final LUT count, after physical optimizations and full implementation, is typically lower. Run opt\_design after synthesis, if not already completed, for a more realistic count.

#### 1.1 Summary of Registers by Type

| Total | Clock Enable | Synchronous | Asynchronous |
|-------|--------------|-------------|--------------|
| 0     | _            | -           | -            |
| 0     | _            | -           | Set          |
| 0     | _            | -           | Reset        |
| 0     | _            | Set         | -            |
| 0     | _            | Reset       | -            |
| 0     | Yes          | -           | -            |
| 0     | Yes          | -           | Set          |
| 0     | Yes          | -           | Reset        |
| 0     | Yes          | Set         | -            |
| 5     | Yes          | Reset       | -            |

#### 2. BLOCKRAM

| Site Type      | Used | Fixed | Prohibited | Available | Util% |
|----------------|------|-------|------------|-----------|-------|
| Block RAM Tile | 4    | 0     | 0          | 144       | 2.78  |
| RAMB36/FIFO*   | 4    | 0     | 0          | 144       | 2.78  |
| RAMB36E2 only  | 4    |       |            |           |       |
| RAMB18         | 0    | 0     | 0          | 288       | 0.00  |
| URAM           | 0    | 0     | 0          | 64        | 0.00  |

#### 3. ARITHMETIC

| Site Type | Used | Fixed | Prohibited | Available | Util% |
|-----------|------|-------|------------|-----------|-------|
|           |      |       |            | 1248      | 0.00  |

#### 4. I/O

| Site Type  | Used | Fixed | Prohibited | Available | Util%  |
|------------|------|-------|------------|-----------|--------|
| Bonded IOB | 293  | 0     | 0          | 189       | 155.03 |

#### 5. CLOCK

| Site Type            | Used | Fixed | Prohibited | Available | Util% |
|----------------------|------|-------|------------|-----------|-------|
| GLOBAL CLOCK BUFFERS | 1    | 0     | 0          | 352       | 0.28  |
| BUFGCE               | 1    | 0     | 0          | 112       |       |
| BUFGCE_DIV           | 0    | 0     | 0          | 16        | 0.00  |
| BUFG_GT              | 0    | 0     | 0          | 96        | 0.00  |
| BUFG_PS              | 0    | 0     | 0          | 96        | 0.00  |
| BUFGCTRL*            | 0    | 0     | 0          | 32        | 0.00  |
| MMCM                 | 0    | 0     | 0          | 4         | 0.00  |

\* Note: Each used BUFGCTRL counts as two GLOBAL CLOCK BUFFERs. This table does not include global clocking resources, only buffer cell usage. See the clock Utilization Report (report\_clock\_utilization) for detailed accounting of global clocking resource availability.

#### 6. ADVANCED

| Site Type       | Used | Fixed | Prohibited | Available | Util% |
|-----------------|------|-------|------------|-----------|-------|
| GTHE4_CHANNEL   | 0    | 0     | 0          | 4         | 0.00  |
| GTHE4_COMMON    | 0    | 0     | 0          | 1         | 0.00  |
| OBUFDS_GTE4     | 0    | 0     | 0          |           | 0.00  |
| OBUFDS_GTE4_ADV | 0    | 0     | 0          | 2         | 0.00  |
| PCIE40E4        | 0    | 0     | 0          | 2         | 0.00  |
| PS8             | 0    | 0     | 0          | 1         | 0.00  |
| SYSMONE4        | 0    | 0     | 0          | 1         | 0.00  |
| VCU             | 0    | 0     | 0          | 1         | 0.00  |

#### 7. CONFIGURATION

| Site Type   | Used | Fixed | Prohibited | Available | Util% |
|-------------|------|-------|------------|-----------|-------|
| BSCANE2     | 0    | 0     | 0          | 4         | 0.00  |
| DNA_PORTE2  | 0    | 0     | 0          | 1         | 0.00  |
| EFUSE_USR   | 0    | 0     | 0          | 1         | 0.00  |
| FRAME_ECCE4 | 0    | 0     | 0          | 1         | 0.00  |
| ICAPE3      | 0    | 0     | 0          | 2         | 0.00  |
| MASTER_JTAG | 0    | 0     | 0          | 1         | 0.00  |
| STARTUPE3   | 0    | 0     | 0          | 1         | 0.00  |

#### 8. Primitives

| Ref Name | Used | Functional Category |
|----------|------|---------------------|
| OBUFT    | 207  | I/O                 |
| INBUF    | 53   | I/O                 |
| IBUFCTRL | 53   | Others              |
| OBUF     | 33   | I/O                 |
| LUT4     | 7    | CLB                 |
| FDRE     | 5    | Register            |
| RAMB36E2 | 4    | BLOCKRAM            |
| LUT3     | 3    | CLB                 |
| LUT6     | 1    | CLB                 |
| LUT2     | 1    | CLB                 |
| BUFGCE   | 1    | Clock               |

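# Reflections:

In Lab 4-0 we practiced Caravel SoC simulation. In Lab 4-1 we learned how to design firmware code (exmem-fir) that runs on the Caravel RISC-V core, and how to integrate RAM into the user project so that it can interact with the firmware and the testbench.

During compilation in the design process we ran into "Time out, Test LA failed (RTL)". Someone had asked about the same problem on GitHub, so we followed the suggestion in the replies and checked whether the ports of our BRAM module were correctly connected to the Wishbone bus. They were not, so we quickly located the problem and fixed it.

Later, when we ran synthesis of bram.v and user\_proj\_example.counter.v in Vivado, errors appeared as well: "use of undefined macro 'MPRJ\_IO\_PADS' in user\_proj\_example.counter.v." and "net type must be explicitly specified for 'wb\_clk\_i' when default\_nettype is none." Again, other students had already run into this problem on GitHub, and classmates had answered it kindly and in detail. MPRJ\_IO\_PADS is defined in rtl/header/defines.v, but we used only bram.v and user\_proj\_example.counter.v for synthesis, so we had to define MPRJ\_IO\_PADS ourselves in user\_proj\_example.counter.v. The second error was caused by \`default\_nettype none, so we changed none to wire, after which synthesis completed successfully.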