

# Computer Architecture Final Project: Single Cycle CPU

TA1: Chun-Yen Yao (姚鈞嚴)

r08943003@ntu.edu.tw

TA2: Ya-Cheng Ko (柯亞承)

ycko@eecs.ee.ntu.edu.tw



#### **Announcement**

- ◆ 1 ~ 3 people / group
- Please find a representative to fill out the google form before 8:00, 5/25 (Tue.)
  - https://forms.gle/MjAAk5jwLtCTjq216
- TA will help you find group members if you can not find any partner
  - ◆ Select "徴隊友" in the form
- The final member list will be announced before 23:59, 5/28 (Fri.)
  - Those who do not response will be regarded as one people in one group



#### Goal

- Implement a single cycle CPU
- Add multiplication/division unit (mulDiv) to CPU (HW2)
- Handle multi-cycle operations
- Get more familiar with assembly and Verilog
- Run your own assembly in HW1-1 on your CPU (Bonus)



### **Specification**





#### **Port Definition**

| Name        | I/O | Width | Description                                                            |
|-------------|-----|-------|------------------------------------------------------------------------|
| clk         | 1   | 1     | Positive edge-triggered clock                                          |
| rst_n       | I   | 1     | Asynchronous negative edge reset                                       |
| mem_wen_D   | 0   | 1     | 0: Read data from data/stack memory 1: Write data to data/stack memory |
| mem_addr_D  | 0   | 32    | Address of data/stack memory                                           |
| mem_wdata_D | 0   | 32    | Data written to data/stack memory                                      |
| mem_rdata_D | I   | 32    | Data read from data/stack memory                                       |
| mem_addr_l  | 0   | 32    | Address of instruction (text) memory                                   |
| mem_rdata_I | I   | 32    | Instruction read from instruction (text) memory                        |



## **Memory Layout**

- In Jupiter simulator
- Text
  - Program code
- Data
  - Variables, arrays, etc.
- Stack
  - Automatic storage





## Relate Memory to Testbench (1/4)

Instruction (text) memory





## Relate Memory to Testbench (2/4)

Data/stack memory





define SIZE\_TEXT 32

## Relate Memory to Testbench (3/4)

 Reduce size of memory blocks to improve simulation speed

 Define offset address for each memory block

 Define high impedance to avoid output conflict

Not synthesizable coding style!

```
define SIZE_DATA 32
             define SIZE_STACK 32
module memory #(
        parameter BITS = 32,
        parameter word_depth = 32
        clk,
        rst n.
        wen,
        offset
```

```
always @(*) begin
    q = {(BITS-1){1'bz}};
    for (i=0; i<word_depth; i=i+1) begin
        if (mem_addr[i] == a)
          q = mem[i];
    end
end</pre>
```



## Relate Memory to Testbench (4/4)

In Jupiter



In Testbench





#### **Architecture**

Not complete (does not include jal, jalr, ...)





### **Supporting Instructions**

- Your design must <u>at least</u> support
  - ◆ auipc, jal, jalr
  - beq, lw, sw
  - addi, slti, add, sub
  - mul
- For bonus challengers
  - srai, slli, ... (observe which instructions do you use)
- See "Instruction\_Set\_Listings.pdf" for more information of machine code



## Supplement: Instruction "auipc"

| 31                  | 12 11 | 7 6   | 0 |
|---------------------|-------|-------|---|
| imm[31:12]          | rd    | opcod | e |
| 20                  | 5     | 7     |   |
| U-immediate [31:12] | dest  | AUIPO |   |

- Add upper immediate to PC, and store the result to rd
  - auipc rd, U-immediate
- ◆ Example: auipc x5, 1 (PC = 0x0001001c)
  - $\bullet$  0x0001001c + 0x00001000 = 0x0001101c
  - Store 0x0001101c in x5



#### Supplement: Instruction "mul"

| 3 | 1 25   | 24 20                       | 0 19 15      | 14 12        | 11 7             | 6      | 0 |
|---|--------|-----------------------------|--------------|--------------|------------------|--------|---|
|   | funct7 | rs2                         | rs1          | funct3       | $^{\mathrm{rd}}$ | opcode |   |
|   | 7      | 5                           | 5            | 3            | 5                | 7      |   |
|   | MULDIV | $\operatorname{multiplier}$ | multiplicand | MUL/MULH[[S] | [U] dest         | OP     |   |

- Not included in RV32I
- Store the lower 32-b result (rs1 x rs2) to rd
- ◆ Example: mul x10, x10, x6
  - $\star$  x10 = 0x00000001, x6 = 0x00000002
  - $\bullet$  0x00000001  $\times$  0x00000002 = 0x00000002
  - Store 0x00000002 in x10
- Your mulDiv can support this instruction!



## **Multi-Cycle Operation**

- Once CPU decodes mul operation, issue valid to your mulDiv
- Once CPU receives ready, store the lower 32-b result to rd
- You might have to design FSM in your CPU





#### **Test Pattern 1: Leaf Example**

- Modified from lecture slides
- The procedure loads a,b,c,d from 0x00010064– 0x00010070, and stores the result to 0x00010074
- Run simulation:
  - \$ ncverilog Final\_tb.v +define+leaf +access+r

```
def leaf(a,b,c,d):
    f = (a+b) - (c+d)
    return f
```







#### **Test Pattern 2: Fact**

- Modified from lecture slides
- ◆ The procedure loads n from 0x0001006c, and stores the result to 0x00010070
- Run simulation:
  - \$ ncverilog Final\_tb.v +define+fact +access+r

```
def fact(n):
    if n < 1:
        return 1
    else:
        return n*fact(n-1)</pre>
```







## (Bonus) Test Pattern 3: HW1 (1/3)

Design your assembly first (hw1.s)

$$T(n) = \begin{cases} 8T\left(\left\lfloor \frac{n}{2}\right\rfloor\right) + 4n, & n \ge 2\\ 7, & n = 1 \end{cases}$$

- Example: T(7) = 572, T(77) = 2187508
- Use recursive function

```
# TOdo: define your own function in HW1

# Do NOT modify this part!!!
__start:
    la t0, n
    lw x10, 0(t0)
    jal x1,FUNCTION
    la t0, n
    sw x10, 4(t0)
    addi a0,x0,10
    ecall
```



## (Bonus) Test Pattern 3: HW1 (2/3)

- Go to simulator
- Dump code → binary file

1 0x00000317 2 0x00830067 3 0x00000297 4 0x02428293 5 0x0002a503 6 0xff5ff0ef 7 0x00000297 8 0x01428293 9 0x00a2a223 10 0x00a00513 11 0x00000073

|      | <b>D</b>    | н            |                         | dump code        |
|------|-------------|--------------|-------------------------|------------------|
| Bkpt | ext address | Machine Code | used inst.              | Source Code      |
|      | 0x00010000  | 0x00000317   | auipc x6, 0             | auipc жб, 0      |
|      | 0x00010004  | 0x00830067   | jalr x0, x6, 8          | jalr x0, x6, 8   |
|      | 0x00010008  | 0x00000297   | auipc x5, 0             | la t0, n         |
|      | 0x0001000c  | 0x02428293   | addi x5, x5, 36         | la t0, n         |
|      | 0x00010010  | 0x0002a503   | lw x10, x5, 0           | lw x10, 0(t0)    |
|      | 0x00010014  | 0xff5ff0ef   | jal x1, −12             | jal x1, FUNCTION |
|      | 0x00010018  | 0x00000297   | auipc x5, 0             | la t0, n         |
|      | 0x0001001c  | 0x01428293   | addi x5, x5, <b>2</b> 0 | la t0, n         |
|      | 0x00010020  | 0x00a2a223   | sw x5, x10, 4           | sw x10, 4(t0)    |
|      | 0x00010024  | 0x00a00513   | addi x10, x0, 10        | addi a0, x0, 10  |
|      | 0x00010028  | 0x00000073   | ecall                   | ecall            |



## (Bonus) Test Pattern 3: HW1 (3/3)

- Modify the code and save as: ./Verilog/hw1/hw1\_text.txt
- Test pattern generation: ./Verilog/hw1/hw1\_gen.py
- Run simulation:
  - \$ ncverilog Final\_tb.v +define+hw1 +access+r

#### **Delete**

| 1  | 0x00000317 |
|----|------------|
| 2  | 0x00830067 |
| 3  | 0x00000297 |
| 4  | 0x02428293 |
| 5  | 0x0002a503 |
| 6  | 0xff5ff0ef |
| 7  | 0x00000297 |
| 8  | 0x01428293 |
| 9  | 0x00a2a223 |
| 10 | 0x00a00513 |
| 11 | 0x00000013 |



| 1 | 00000317 |
|---|----------|
| 2 | 00830067 |
| 3 | 00000297 |
| 4 | 02428293 |
| 5 | 0002a503 |
| 6 | ff5ff0ef |
| 7 | 00000297 |
| 8 | 01428293 |
| 9 | 00a2a223 |
|   |          |



#### **Pattern Generation**

- Three python codes provided:
  - leaf\_gen.py
  - fact\_gen.py
  - hw1\_gen.py
- ◆ TA will change the variables in \*\_gen.py to generate new test patterns when testing your CPU design



### **Coding Style Check**

| Register Name                                       | Туре                                                   | Width                    | 1 | Bus         | I | МВ          | I | AR               | Ī | AS          | Ī | SR          | I | SS          | Ī | ST          |
|-----------------------------------------------------|--------------------------------------------------------|--------------------------|---|-------------|---|-------------|---|------------------|---|-------------|---|-------------|---|-------------|---|-------------|
| alu_in_reg<br>counter_reg<br>shreg_reg<br>state reg | Flip-flop<br>  Flip-flop<br>  Flip-flop<br>  Flip-flop | 32<br>  5<br>  64<br>  2 |   | Y<br>Y<br>Y |   | N<br>N<br>N |   | Y<br>Y<br>Y<br>Y |   | N<br>N<br>N |   | N<br>N<br>N |   | N<br>N<br>N |   | N<br>N<br>N |

- All sequential elements must be flip-flops
- Check by Design Compiler
- Command:
  - \$ dv -no\_gui
  - design\_vision> read\_verilog CHIP.v
- Exit:
  - design\_vision> exit



#### Report

- Briefly describe your CPU architecture
- Describe how you design the data path of instructions not referred in the lecture slides (jal, jalr, auipc, ...)
- Describe how you handle multi-cycle instructions (mul)
- Record total simulation time (CYCLE = 10 ns)
  - ◆ Leaf: a = 0, b = 6, c = 1, d = 5
  - ◆ Fact: n = 3
  - (Bonus) HW1: n = 7

Simulation complete via \$finish(1) at time 4795 NS + 0

- Describe your observation
- Snapshot the "Register table" in Design Compiler (p. 22)
- List a work distribution table



#### **Submission**

- ◆ Deadline: 6/15 (Tue.) 8:00 am
  - Late submission: 20 % reduction per day
- Upload Final\_group\_<group\_id>.zip to ceiba
  - Final\_group\_<group\_id>.zip
    - Final \_group\_<group\_id>/
    - Final \_group\_<group\_id>/CHIP.v
    - Final \_group\_<group\_id>/hw1.s (bonus)
    - Final \_group\_<group\_id>/hw1\_text.txt (bonus)
    - Final \_group\_<group\_id>/report.pdf

#### Example

```
[r08943003@eda1 ~]$ unzip Final_group_0.zip
Archive: Final_group_0.zip
    creating: Final_group_0/
    inflating: Final_group_0/CHIP.v
    inflating: Final_group_0/hw1.s
    inflating: Final_group_0/hw1_text.txt
    extracting: Final_group_0/report.pdf
```



#### Score

- ◆ Simulation: 70 % (+ bonus 20 %)
  - Leaf
    - Default: 15 %
    - Change test pattern: 15 %
  - Fact
    - Default: 20 %
    - > Change test pattern: 20 %
  - HW1 (bonus)
    - Default: 10 %
    - Change test pattern: 10 %
- ◆ Report: 30 %
  - Content: 20 %
  - Snapshots: 5 %
  - Work distribution: 5 %