



#### **Outline**

- Problem 1 - cache
  - Specification
  - Verification
- Problem 2 - booting
  - Specification
  - Verification
- Submission rule





## Problem 1 - cache

**Specification** 







### **System Architecture**







### Specification(1/7)







### Specification(2/7)

| Module   | Signal     | Type   | Bits | Description                                                                    |
|----------|------------|--------|------|--------------------------------------------------------------------------------|
| L1C_data | clk        | input  | 1    | clock                                                                          |
|          | rst        | input  | 1    | reset (active high)                                                            |
|          | core_addr  | input  | 32   | address from CPU                                                               |
|          | core_req   | input  | 1    | memory access request from CPU                                                 |
|          | core_write | input  | 1    | write signal from CPU                                                          |
|          | core_in    | input  | 32   | data from CPU                                                                  |
|          | core_type  | input  | 3    | write/read byte, half word, or word (listed in include/def.svh) from CPU       |
|          | D_out      | input  | 32   | data from CPU wrapper                                                          |
|          | D_wait     | input  | 1    | wait signal from CPU wrapper                                                   |
|          | core_out   | output | 32   | data to CPU                                                                    |
|          | core_wait  | output | 1    | wait signal to CPU                                                             |
|          | D_req      | output | 1    | request to CPU wrapper                                                         |
|          | D_addr     | output | 32   | address to CPU wrapper                                                         |
|          | D_write    | output | 1    | write signal to CPU wrapper                                                    |
|          | D_in       | output | 32   | write data to CPU wrapper                                                      |
|          | D_type     | output | 3    | write/read byte, half word, or word (listed in include/def.svh) to CPU wrapper |
|          | valid      | logic  | 64   | valid bits of each cache line                                                  |
|          | index      | logic  | 6    | address to tag array/data array                                                |
|          | TA_write   | logic  | 1    | write signal to tag_array                                                      |
|          | TA_read    | logic  | 1    | read signal to tag_array                                                       |
|          | TA_in      | logic  | 22   | write data to tag_array                                                        |
|          | TA_out     | logic  | 22   | read data from tag_array                                                       |
|          | DA_write   | logic  | 16   | write signal to data_array                                                     |
|          | DA_read    | logic  | 1    | read signal to data_array                                                      |
|          | DA_in      | logic  | 128  | write data to data_array                                                       |
|          | DA_out     | logic  | 128  | read data from data_array                                                      |

The inputs and outputs of modules L1C\_data and L1C\_inst are declared in src/L1C\_data.sv and src/L1C\_inst.sv. You may modify the input and output bit widths if needed; if you do, update the corresponding bit widths in /sva/cache\_props.sva as well.





### Specification(3/7)







### Specification(4/7)

| Module   | Signal     | Type   | Bits | Description                                                                    |
|----------|------------|--------|------|--------------------------------------------------------------------------------|
| L1C_inst | clk        | input  | 1    | clock                                                                          |
|          | rst        | input  | 1    | reset (active high)                                                            |
|          | core_addr  | input  | 32   | address from CPU                                                               |
|          | core_req   | input  | 1    | memory access request from CPU                                                 |
|          | core_write | input  | 1    | write signal from CPU                                                          |
|          | core_in    | input  | 32   | data from CPU                                                                  |
|          | core_type  | input  | 3    | write/read byte, half word, or word (listed in include/def.svh) from CPU       |
|          | I_out      | input  | 32   | data from CPU wrapper                                                          |
|          | I_wait     | input  | 1    | wait signal from CPU wrapper                                                   |
|          | core_out   | output | 32   | data to CPU                                                                    |
|          | core_wait  | output | 1    | wait signal to CPU                                                             |
|          | I_req      | output | 1    | request to CPU wrapper                                                         |
|          | I_addr     | output | 32   | address to CPU wrapper                                                         |
|          | I_write    | output | 1    | write signal to CPU wrapper                                                    |
|          | I_in       | output | 32   | write data to CPU wrapper                                                      |
|          | I_type     | output | 3    | write/read byte, half word, or word (listed in include/def.svh) to CPU wrapper |
|          | valid      | logic  | 64   | valid bits of each cache line                                                  |
|          | index      | logic  | 6    | address to tag array/data array                                                |
|          | TA_write   | logic  | 1    | write signal to tag_array                                                      |
|          | TA_read    | logic  | 1    | read signal to tag_array                                                       |
|          | TA_in      | logic  | 22   | write data to tag_array                                                        |
|          | TA_out     | logic  | 22   | read data from tag_array                                                       |
|          | DA_write   | logic  | 16   | write signal to data_array                                                     |
|          | DA_read    | logic  | 1    | read signal to data_array                                                      |
|          | DA_in      | logic  | 128  | write data to data_array                                                       |
|          | DA_out     | logic  | 128  | read data from data_array                                                      |

The inputs and outputs of modules L1C\_data and L1C\_inst are declared in src/L1C\_data.sv and src/L1C\_inst.sv. You may modify the input and output bit widths if needed; if you do, update the corresponding bit widths in /sva/cache\_props.sva as well.







### Specification(5/7)

Table 1-1: Cache actions under different conditions

| condition | core read               | core write                       |
|-----------|-------------------------|----------------------------------|
| hit       | transmit data into core | write data into cache and memory |
| miss      | read a line from memory | only write data into memory      |

Table 1-2: Array characteristics

| Type       | Lines | Words per line | Bytes per word | Bits per line | Writing mode |
|------------|-------|----------------|----------------|---------------|--------------|
| data_array | 64    | 4              | 4              | 128           | Byte write   |
| tag_array  | 64    | 1              |                | 22            | Word write   |
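Tables 1-1 and 1-2 fully determine the cache's observable behavior, so a small software model can serve as a reference when checking read hit/miss and write hit/miss waveforms. The C sketch below is hypothetical (it is not the required SystemVerilog design): it models a direct-mapped cache of 64 lines x 4 words that refills a whole line on a read miss, writes both cache and memory on a write hit (write through), and writes only memory on a write miss (write around, no allocation). It handles word accesses only; the toy memory size, the function names, and counting writes in the hit/miss counters are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical behavioral model of Table 1-1: direct mapped, 64 lines x 4 words,
   write through on a hit, write around (no allocation) on a miss. */
#define NUM_LINES      64
#define WORDS_PER_LINE 4
#define MEM_WORDS      4096              /* toy backing memory size (assumption) */

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];             /* addr[31:10], 22 bits */
    uint32_t data[NUM_LINES][WORDS_PER_LINE];
    unsigned hits, misses;               /* reads and writes both counted (assumption) */
} cache_t;

static uint32_t mem[MEM_WORDS];          /* word-addressed backing memory */

static uint32_t index_of(uint32_t addr) { return (addr >> 4) & 0x3Fu; } /* addr[9:4] */
static uint32_t tag_of(uint32_t addr)   { return addr >> 10; }          /* addr[31:10] */
static uint32_t word_of(uint32_t addr)  { return (addr >> 2) & 0x3u; }  /* addr[3:2] */
static uint32_t mem_word(uint32_t addr) { return (addr >> 2) % MEM_WORDS; }

void cache_init(cache_t *c) { memset(c, 0, sizeof *c); }

bool cache_hit(const cache_t *c, uint32_t addr) {
    uint32_t i = index_of(addr);
    return c->valid[i] && c->tag[i] == tag_of(addr);
}

uint32_t cache_read(cache_t *c, uint32_t addr) {
    uint32_t i = index_of(addr);
    if (cache_hit(c, addr)) {
        c->hits++;                                 /* read hit: data comes from the cache */
    } else {
        c->misses++;                               /* read miss: fetch the whole line */
        for (uint32_t w = 0; w < WORDS_PER_LINE; w++)
            c->data[i][w] = mem[mem_word((addr & ~0xFu) + 4u * w)];
        c->tag[i] = tag_of(addr);
        c->valid[i] = true;
    }
    return c->data[i][word_of(addr)];
}

void cache_write(cache_t *c, uint32_t addr, uint32_t value) {
    if (cache_hit(c, addr)) {
        c->hits++;                                 /* write hit: update the cache ... */
        c->data[index_of(addr)][word_of(addr)] = value;
    } else {
        c->misses++;                               /* write miss: cache left untouched */
    }
    mem[mem_word(addr)] = value;                   /* ... and memory in both cases */
}
```

A read of 0x14 followed by a read of 0x10 gives one miss (the line refill) and then one hit, because both addresses fall in line 1 with tag 0.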






### Specification(6/7)

tag\_array uses word write, while data\_array uses byte write.
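The difference can be modeled in a few lines of C. This sketch assumes bit i of a 16-bit byte enable (matching the width of DA_write) selects byte i of the 128-bit line and is active high; the real SRAM macro may use an active-low enable, so check its port list. The helper names are hypothetical.

```c
#include <stdint.h>

#define BYTES_PER_LINE 16 /* 128 bits = 4 words x 4 bytes */

/* Byte write (data_array): only bytes whose enable bit is set change.
   Assumption: bit i of byte_en corresponds to byte i and is active high. */
void data_array_write(uint8_t line[BYTES_PER_LINE],
                      const uint8_t in[BYTES_PER_LINE], uint16_t byte_en) {
    for (int i = 0; i < BYTES_PER_LINE; i++)
        if (byte_en & (1u << i))
            line[i] = in[i];
}

/* Word write (tag_array): the whole 22-bit entry is replaced at once. */
void tag_array_write(uint32_t *entry, uint32_t tag_in) {
    *entry = tag_in & 0x3FFFFFu; /* keep the low 22 bits */
}

/* Byte enables for writing one 32-bit word at word offset w (0..3) of the line. */
uint16_t word_enable_mask(unsigned w) {
    return (uint16_t)(0xFu << (4u * w));
}
```

For a word write hit at word offset w, only the four enables of that word are asserted, so the other 12 bytes of the line are preserved; byte and half-word stores would assert one or two enables instead.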







### Specification(7/7)

Table 1-2: Cache specification

| Type  | Size | Line Bits | Associativity | Write Policy                          |
|-------|------|-----------|---------------|---------------------------------------|
| Cache | 1 KB | 128       | Direct mapped | Hit: write through; Miss: write around |
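The field widths used throughout the specification follow from these parameters, assuming 32-bit byte addresses (core_addr is 32 bits):

$$\text{lines} = \frac{1\ \text{KB}}{128\ \text{bits}} = \frac{1024\ \text{B}}{16\ \text{B}} = 64, \qquad \text{index} = \log_2 64 = 6\ \text{bits}$$

$$\text{offset} = \log_2 16 = 4\ \text{bits}, \qquad \text{tag} = 32 - 6 - 4 = 22\ \text{bits}$$

So a core address splits into tag = core_addr[31:10], index = core_addr[9:4], and byte offset = core_addr[3:0], where core_addr[3:2] selects the word within the line. These widths match the 6-bit index and the 22-bit TA_in/TA_out signals.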

Table 1-3: Cache actions under different conditions

| condition | core read               | core write                       |
|-----------|-------------------------|----------------------------------|
| hit       | transmit data into core | write data into cache and memory |
| miss      | read a line from memory | only write data into memory      |





## Problem 1 - cache

Verification





### **Verification(1/2)**







### Verification(2/2)

- You can modify cache\_props.sva if needed:
  - You can add assumptions on your inputs.
  - You can modify an assertion if it is inconsistent with your design intent; explain every such modification in your report.
- Use JasperGold to verify the L1 data cache (L1C\_data.sv) with the command "make cache".
- There should be no assertion violations in the result.
- Add performance counters to your design, and use them to report your cache hit rate and IPC.







- Proper explanation of your design is required for full credit.
- Block diagrams shall be drawn to depict your designs.
- Show the JasperGold results with all properties fully proven, and explain any assertion you modified.
- Show the hit rates of your instruction cache and data cache.
- Show the IPC (instructions per cycle) of your CPU.

$$\text{cache hit rate} = \frac{\text{Number of cache hits}}{\text{Number of cache hits} + \text{Number of cache misses}}$$

$$\text{IPC} = \frac{\text{Number of instructions executed by the CPU}}{\text{Total cycles}}$$
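Both metrics are simple ratios of counter values. A minimal C sketch, assuming the counts have been read out of hypothetical hardware counters (one hit/miss pair per cache, plus executed instructions and total cycles for the CPU); the struct and function names are illustrative:

```c
#include <stdint.h>

/* Hypothetical snapshot of performance counters after a simulation run. */
typedef struct {
    uint64_t hits;         /* cache hits */
    uint64_t misses;       /* cache misses */
    uint64_t instructions; /* instructions executed by the CPU */
    uint64_t cycles;       /* total clock cycles */
} perf_counters_t;

/* hits / (hits + misses); returns 0 if the cache was never accessed. */
double hit_rate(const perf_counters_t *p) {
    uint64_t accesses = p->hits + p->misses;
    return accesses ? (double)p->hits / (double)accesses : 0.0;
}

/* instructions / cycles; returns 0 if no cycles were counted. */
double ipc(const perf_counters_t *p) {
    return p->cycles ? (double)p->instructions / (double)p->cycles : 0.0;
}
```

Keeping separate counters for L1C_inst and L1C_data lets you report the two hit rates independently, as the report requires.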





## Problem 2 - booting

Specification





### **Slave Configuration**

#### **Table 1-5: Slave configuration**

| NAME        | Number  | Start address | End address |
|-------------|---------|---------------|-------------|
| ROM         | Slave 0 | 0x0000_0000   | 0x0000_1FFF |
| IM          | Slave 1 | 0x0001_0000   | 0x0001_FFFF |
| DM          | Slave 2 | 0x0002_0000   | 0x0002_FFFF |
| sensor_ctrl | Slave 3 | 0x1000_0000   | 0x1000_03FF |
| DRAM        | Slave 4 | 0x2000_0000   | 0x201F_FFFF |

You must design the wrappers for all slaves except Slave 3 (sensor_ctrl)!
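The bus that connects the master to these slaves has to decode each address into one of the ranges in Table 1-5. A minimal C sketch of that range check, useful as a reference when writing or testing the decoder; the function name is hypothetical, and returning -1 for an unmapped address is an assumption (a hardware bus might instead route such accesses to a default slave that returns an error response).

```c
#include <stdint.h>

/* Returns the slave number from Table 1-5, or -1 for an unmapped address. */
int decode_slave(uint32_t addr) {
    if (addr <= 0x00001FFFu)                        return 0; /* ROM */
    if (addr >= 0x00010000u && addr <= 0x0001FFFFu) return 1; /* IM */
    if (addr >= 0x00020000u && addr <= 0x0002FFFFu) return 2; /* DM */
    if (addr >= 0x10000000u && addr <= 0x100003FFu) return 3; /* sensor_ctrl */
    if (addr >= 0x20000000u && addr <= 0x201FFFFFu) return 4; /* DRAM */
    return -1;                                                /* unmapped */
}
```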





### **ROM**

| Module | Signal           | Type   | Bits | Description                 |
|--------|------------------|--------|------|-----------------------------|
| ROM    | *System signals* |        |      |                             |
|        | CK               | input  | 1    | System clock                |
|        | *Memory ports*   |        |      |                             |
|        | DO               | output | 32   | ROM data output             |
|        | OE               | input  | 1    | Output enable (active high) |
|        | CS               | input  | 1    | Chip select (active high)   |
|        | A                | input  | 12   | ROM address input           |
|        | *Memory space*   |        |      |                             |
|        | Memory_byte0     | reg    | 8    | Size: [0:4095]              |
|        | Memory_byte1     | reg    | 8    | Size: [0:4095]              |
|        | Memory_byte2     | reg    | 8    | Size: [0:4095]              |
|        | Memory_byte3     | reg    | 8    | Size: [0:4095]              |






### **DRAM**

| Module | Signal           | Type   | Bits | Description                                |
|--------|------------------|--------|------|--------------------------------------------|
| DRAM   | *System signals* |        |      |                                            |
|        | CK               | input  | 1    | System clock                               |
|        | RST              | input  | 1    | System reset (active high)                 |
|        | *Memory ports*   |        |      |                                            |
|        | CSn              | input  | 1    | DRAM Chip Select (active low)              |
|        | WEn              | input  | 4    | DRAM Write Enable (active low)             |
|        | RASn             | input  | 1    | DRAM Row Access Strobe (active low)        |
|        | CASn             | input  | 1    | DRAM Column Access Strobe (active low)     |
|        | A                | input  | 11   | DRAM Address input                         |
|        | D                | input  | 32   | DRAM data input                            |
|        | Q                | output | 32   | DRAM data output                           |
|        | VALID            | output | 1    | DRAM data output valid                     |
|        | *Memory space*   |        |      |                                            |
|        | Memory_byte0     | reg    | 8    | Size: [0:2097151]                          |
|        | Memory_byte1     | reg    | 8    | Size: [0:2097151]                          |
|        | Memory_byte2     | reg    | 8    | Size: [0:2097151]                          |
|        | Memory_byte3     | reg    | 8    | Size: [0:2097151]                          |

- Row address is 11 bits and column address is 10 bits.
- Row address = araddr/awaddr[22:12]
- Column address = araddr/awaddr[11:2]
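These bit slices are easy to get off by one. The C sketch below extracts them from a byte address; the function names are hypothetical, and dram_word_index assumes the DRAM model stores word {row, col} at entry row x 1024 + col, which is consistent with the 2,097,152-entry (2^21) memory arrays but is not stated in the table.

```c
#include <stdint.h>

/* Row address: addr[22:12], 11 bits. */
uint32_t dram_row(uint32_t addr) { return (addr >> 12) & 0x7FFu; }

/* Column address: addr[11:2], 10 bits (addr[1:0] is the byte offset within a word). */
uint32_t dram_col(uint32_t addr) { return (addr >> 2) & 0x3FFu; }

/* Hypothetical word index {row, col} into Memory_byte0..3, range 0..2097151.
   Assumption: the DRAM model stores the word at row * 1024 + col. */
uint32_t dram_word_index(uint32_t addr) {
    return (dram_row(addr) << 10) | dram_col(addr);
}
```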





### **System Architecture**







## Problem 2 - booting

Verification





### **Verification(1/7)**

- prog0
  - Tests 37 instructions (provided by the TA)
- prog1
  - Sorting algorithm on half-words
- prog2
  - Grayscale conversion
- prog3
  - Matrix multiplication



### **Verification (2/7)**

- Booting
  - The boot program is stored in ROM.
  - It moves data from DRAM to IM and DM.

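```c
/* Section boundary symbols used by boot.c (declared extern; defined outside the C source). */

/* Instruction section: copied from DRAM to IM */
extern unsigned int _dram_i_start;
extern unsigned int _dram_i_end;
extern unsigned int __imem_start;

/* sdata section: copied from DRAM to DM */
extern unsigned int _sdata_start;
extern unsigned int _sdata_end;
extern unsigned int __sdata_paddr_start;

/* data section: copied from DRAM to DM */
extern unsigned int _data_start;
extern unsigned int _data_end;
extern unsigned int _data_paddr_start;
```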



# LPHPLMB VLSI Design LAB

### Verification (3/7)

- \_dram\_i\_start = instruction start address in DRAM.
- \_dram\_i\_end = instruction end address in DRAM.
- \_\_imem\_start = instruction start address in IM







### **Verification (4/7)**

- \_data\_start = Main\_data start address in DM.
- \_data\_end = Main\_data end address in DM.
- \_data\_paddr\_start = Main\_data start address in DRAM







### **Verification (5/7)**

- \_sdata\_start = sdata start address in DM.
- \_sdata\_end = sdata end address in DM.
- \_\_sdata\_paddr\_start = sdata start address in DRAM.
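The copy work in boot.c reduces to one word-copy loop applied to three sections. A minimal sketch: copy_words is a hypothetical helper, the symbol spellings follow the definitions above, and the lengths in the comment assume each end symbol points one word past its section, so check your linker script and add 1 if the end symbol marks the last word instead.

```c
#include <stddef.h>

/* Copy n 32-bit words from src to dst. volatile keeps the compiler from
   dropping or merging the stores to memory-mapped IM/DM. */
void copy_words(volatile unsigned int *dst,
                const volatile unsigned int *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Illustrative use inside boot() (exclusive end symbols assumed):
 *   copy_words(&__imem_start, &_dram_i_start,
 *              (size_t)(&_dram_i_end - &_dram_i_start));      DRAM -> IM
 *   copy_words(&_data_start, &_data_paddr_start,
 *              (size_t)(&_data_end - &_data_start));          DRAM -> DM
 *   copy_words(&_sdata_start, &__sdata_paddr_start,
 *              (size_t)(&_sdata_end - &_sdata_start));        DRAM -> DM
 */
```

Note that for the instruction section the length comes from the DRAM range, while for the data sections it comes from the DM range, matching which addresses each group of symbols describes.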





### **Verification(6/7)**



*(Figure: hex dump of a BMP file. Annotations mark the header (54 bytes), the size of the BMP file in bytes, the number of bits per pixel, the byte offset at which the bitmap image starts, and the repeated pixel bytes `25 1f 12`, stored in the order b, g, r.)*

Each pixel is converted to a gray value

$$y = 0.11\,b + 0.59\,g + 0.3\,r$$

For the pixel $(b, g, r) = (\mathrm{0x25}, \mathrm{0x1f}, \mathrm{0x12})$:

$$y = 0.11 \times 37 + 0.59 \times 31 + 0.3 \times 18 = 27.76$$
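A C sketch of the per-pixel conversion, usable as a reference for prog2. It uses the integer form (11b + 59g + 30r) / 100, which equals the formula above truncated to an integer and avoids floating point on the CPU. Truncating rather than rounding, writing y back to all three channels, and assuming 24-bit pixels with no row padding are assumptions about prog2, not statements of its actual behavior; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BMP_HEADER_BYTES 54 /* header size shown in the dump above */

/* y = 0.11*b + 0.59*g + 0.3*r, computed exactly in integers and truncated. */
uint8_t gray_pixel(uint8_t b, uint8_t g, uint8_t r) {
    return (uint8_t)((11u * b + 59u * g + 30u * r) / 100u); /* max 25500/100 = 255 */
}

/* Convert every (b, g, r) pixel after the header in place.
   Assumes 24-bit pixels and no row padding. */
void bmp_to_gray(uint8_t *file, size_t file_size) {
    for (size_t p = BMP_HEADER_BYTES; p + 2 < file_size; p += 3) {
        uint8_t y = gray_pixel(file[p], file[p + 1], file[p + 2]);
        file[p] = file[p + 1] = file[p + 2] = y;
    }
}
```

For the example pixel, (11 x 37 + 59 x 31 + 30 x 18) / 100 = 2776 / 100, which truncates to 27 (0x1b).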

Reference for BMP file parsing:





### **Verification(7/7)**

#### Matrix Multiplication



$$\begin{pmatrix} \textit{data}_{1} & \textit{data}_{2} & \textit{data}_{3} \\ \textit{data}_{4} & \textit{data}_{5} & \textit{data}_{6} \end{pmatrix} \cdot \begin{pmatrix} \textit{data}_{7} & \textit{data}_{8} & \textit{data}_{9} & \textit{data}_{10} \\ \textit{data}_{11} & \textit{data}_{12} & \textit{data}_{13} & \textit{data}_{14} \\ \textit{data}_{15} & \textit{data}_{16} & \textit{data}_{17} & \textit{data}_{18} \end{pmatrix} = \begin{pmatrix} \textit{result}_{1} & \textit{result}_{2} & \textit{result}_{3} & \textit{result}_{4} \\ \textit{result}_{5} & \textit{result}_{6} & \textit{result}_{7} & \textit{result}_{8} \end{pmatrix}$$
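Each result entry is a row-by-column dot product, e.g. $\textit{result}_1 = \textit{data}_1\,\textit{data}_7 + \textit{data}_2\,\textit{data}_{11} + \textit{data}_3\,\textit{data}_{15}$. A C reference model for checking the simulation output, assuming int elements stored row-major (the actual element type and memory layout used by prog3 may differ):

```c
#define ROWS_A 2 /* rows of the left matrix */
#define COLS_A 3 /* columns of the left matrix = rows of the right matrix */
#define COLS_B 4 /* columns of the right matrix */

/* result (2x4) = a (2x3) * b (3x4) */
void matmul(int a[ROWS_A][COLS_A], int b[COLS_A][COLS_B],
            int result[ROWS_A][COLS_B]) {
    for (int i = 0; i < ROWS_A; i++)
        for (int j = 0; j < COLS_B; j++) {
            int sum = 0;
            for (int k = 0; k < COLS_A; k++)
                sum += a[i][k] * b[k][j]; /* row i of a times column j of b */
            result[i][j] = sum;
        }
}
```

With data values 1 through 18 in order, this gives result_1 = 74 and result_8 = 218.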





### **Report Requirements**

- Proper explanation of your design is required for full credit.
- Block diagrams shall be drawn to depict your designs.
- Show screenshots of the waveforms and of the terminal simulation results for each test case, and demonstrate that your results are correct. Use waveforms to explain cache read hit, read miss, write hit, and write miss.
- Explain your code for program1, program2, and program3.
- Explain your code in boot.c.
- Report the number of lines of your RTL code, the final results of running Superlint, and the 3 to 5 most frequent warnings/errors in your code. Describe how you modified your code to comply with Superlint.





### **Report Requirements**

Report the post-synthesis simulation time of prog0 to prog3 and the total cell area of your design, with screenshots. 15% of the homework credit is based on your design's performance and area.

Performance (P) and area (A) are defined as

$$P = \sum_{i=0}^{3} \text{simulation time of prog}_i, \qquad A = \text{total cell area of your design}$$

and the credit you get is

$$\text{Credit} = \min\left(\frac{(P \times A)_{\text{TA's design}}}{(P \times A)_{\text{your design}}} \times 15,\ 15\right)$$

rounded to the nearest integer.

|    | Prog0 (ns) | Prog1 (ns) | Prog2 (ns) | Prog3 (ns) | Area    |
|----|------------|-----------|-----------|-----------|---------|
| TA | 458680     | 4096390   | 10990320  | 452970    | 6530417 |
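To estimate the score before submitting, the formula can be evaluated directly. In this sketch the function names are hypothetical, and rounding half up is one reading of "round off after the decimal".

```c
/* P: sum of the four post-synthesis simulation times (ns) of prog0..prog3. */
double perf_sum(const double sim_ns[4]) {
    double p = 0.0;
    for (int i = 0; i < 4; i++)
        p += sim_ns[i];
    return p;
}

/* Credit = (P*A)_TA / (P*A)_yours * 15, capped at 15.
   Rounding half up to an integer is an interpretation of the rule. */
int credit(double p_ta, double a_ta, double p_you, double a_you) {
    double c = (p_ta * a_ta) / (p_you * a_you) * 15.0;
    if (c > 15.0)
        c = 15.0;
    return (int)(c + 0.5); /* c is non-negative, so this rounds half up */
}
```

Plugging in the TA's own numbers gives P = 15,998,360 ns and a credit of 15; a design with the same area but three times the total simulation time would get 5.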





### **Submission rule**





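- Use the Submission Cover attached to the homework files.
- Do not paste code into the .docx (you may use screenshots of your program code when explaining it).
  - Put the .sv files in the archive; do not paste screenshots of them into the .docx.
- A Summary and Lessons Learned are required (put the summary table on the second page, clearly listing which parts are completed and which are not).
- For a two-person team, state each member's contribution (put it on the second page).
  - Ex: A(N26071234) 55%, B(N26075678) 45%
  - Total: 100%
  - Not required for a one-person team.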





### **Specification**

Module names must follow the table below.

| Category   | File                  | Module             | Instance     |
|------------|-----------------------|--------------------|--------------|
| RTL        | top.sv                | top                | TOP          |
| Gate-Level | top_syn.v             | top                | TOP          |
| RTL        | L1C_inst.sv           | L1C_inst           | L1CI         |
| RTL        | L1C_data.sv           | L1C_data           | L1CD         |
| RTL        | SRAM_wrapper.sv       | SRAM_wrapper       | IM1          |
| RTL        | SRAM_wrapper.sv       | SRAM_wrapper       | DM1          |
| RTL        | SRAM_rtl.sv           | SRAM               | i_SRAM       |
| RTL        | tag_array_wrapper.sv  | tag_array_wrapper  | TA           |
| RTL        | tag_array_rtl.sv      | tag_array          | i_tag_array  |
| RTL        | data_array_wrapper.sv | data_array_wrapper | DA           |
| RTL        | data_array_rtl.sv     | data_array         | i_data_array |
| Behavior   | ROM.v                 | ROM                | i_ROM        |
| Behavior   | DRAM.v                | DRAM               | i_DRAM       |

Name everything exactly as required; otherwise the testbench cannot find the correct names.



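- Compress your files into ".tar" format following the required file structure.
  - Running make tar in the homework main folder (N260XXXXX) produces a tar file that meets this requirement.
- Follow the file structure given in the homework description.
- Do not include files that the file structure does not require.
  - Running make clean in the homework main folder (N260XXXXX) removes the unnecessary files.
- Make sure your submission compiles on the SoC Lab workstations and functions correctly.
- A submission that fails to compile receives 0 points.
- Do not use a code generator to produce code and then modify it.
- Plagiarism is prohibited.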





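- Only one member of each team needs to upload the homework to Moodle.
  - If two or more members upload, points may be deducted at the instructor's discretion.
- The archive name, main folder name, report name, and the student ID in the StudentID file must all be the uploader's student ID; the other members write their own student IDs on the Submission Cover.
  - Ex: A(N26101234) uploads, and the teammate is B(N26105678):
  - N26101234.tar (archive), N26101234 (main folder), N26101234.docx (report; write both student IDs on the cover)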





### Deadline

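- Upload before 14:00 on 2021/12/01 (Wed.)
  - Late submissions are not accepted; please pay close attention to the time.
  - Moodle keeps only your most recent upload. Name the file "N260XXXXX.tar"; no version number is needed.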





# Thanks for your participation and attendance!!



