# Computer Organization Lab 4: Pipelined CPU

**Due: 2021/6/6** 

# 1. Goal

Based on your Lab 3 CPU design, implement a pipelined CPU.

# 2. Homework Requirement

- a. Please use ModelSim or Xilinx as your HDL simulator.
- b. Please attach student IDs as comments at the top of each file.
- c. Please zip the Verilog files and the report into one archive named "ID.zip".
- d. You must use the supplied Reg\_File.v.
- e. In the top module, set N to the total bit width of all input signals (data and control) of each pipeline register, for example:

Pipe\_Reg #(.size(N)) ID\_EX

(Hint: search for "Verilog parameter" or "parameterized modules".)
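As a rough illustration of how N is obtained, the sketch below bundles two 32-bit signals into one 64-bit pipeline register. The module `Pipe_Reg_Sketch`, its port names, and its reset behavior are assumptions made for this example; they are not the interface of the lab's own Pipe\_Reg, which you should check before wiring it up.

```verilog
// Hypothetical parameterized pipeline register (NOT the supplied Pipe_Reg).
// Port names and the active-low synchronous reset are assumptions.
module Pipe_Reg_Sketch #(
    parameter size = 1                 // total width, in bits, of the bundled signals
) (
    input  wire            clk_i,
    input  wire            rst_n,      // active-low synchronous reset (assumed)
    input  wire [size-1:0] data_i,
    output reg  [size-1:0] data_o
);
    always @(posedge clk_i) begin
        if (!rst_n)
            data_o <= {size{1'b0}};    // clear every bit on reset
        else
            data_o <= data_i;          // latch all bundled signals together
    end
endmodule

// Example use: an IF/ID register carrying PC+4 (32 bits) and the
// instruction (32 bits), so N = 32 + 32 = 64.
module IF_ID_Example (
    input  wire        clk_i,
    input  wire        rst_n,
    input  wire [31:0] pc_plus4_i,
    input  wire [31:0] instr_i,
    output wire [31:0] pc_plus4_o,
    output wire [31:0] instr_o
);
    Pipe_Reg_Sketch #(.size(64)) IF_ID (
        .clk_i  (clk_i),
        .rst_n  (rst_n),
        .data_i ({pc_plus4_i, instr_i}),   // concatenate inputs, MSB first
        .data_o ({pc_plus4_o, instr_o})    // split outputs in the same order
    );
endmodule
```

For ID\_EX, N grows accordingly: add the width of every data signal and every control bit that must travel to the later stages.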

- f. Your CPU needs to support the following instructions: (90%)
  - i. ADD
  - ii. ADDI
  - iii. SUB
  - iv. AND
  - v. OR
  - vi. SLT
  - vii. SLTI
  - viii. LW
  - ix. SW
  - x. BEQ
  - xi. MULT
mult rd, rs, rt; // rd = rs \* rt

| op | rs | rt | rd | shamt | funct |
|----|----|----|----|-------|-------|
| 0  | rs | rt | rd | 0     | 24    |
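One possible way to recognize and execute this instruction is sketched below. The module name and ports are hypothetical; in your design the same logic would probably live in the decoder, the ALU control, and the ALU rather than in one module.

```verilog
// Hypothetical sketch: detect MULT (op = 0, funct = 24) and compute rd = rs * rt.
// Module and port names are assumptions for illustration only.
module Mult_Decode_Sketch (
    input  wire [31:0] instr_i,
    input  wire [31:0] rs_data_i,
    input  wire [31:0] rt_data_i,
    output wire        is_mult_o,
    output wire [31:0] result_o
);
    wire [5:0] opcode = instr_i[31:26];
    wire [5:0] funct  = instr_i[5:0];

    // R-type instruction with funct field 24 (6'b011000)
    assign is_mult_o = (opcode == 6'd0) && (funct == 6'd24);

    // Only the low 32 bits of the product fit in rd; these bits are the
    // same for signed and unsigned operands.
    assign result_o = rs_data_i * rt_data_i;
endmodule

// Encoding example: mult $3,$1,$2
//   op=000000 rs=00001 rt=00010 rd=00011 shamt=00000 funct=011000
//   = 32'h00221818
```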

- g. Testbench ("CO\_P4\_test\_1.txt"): use this testbench to test the basic instructions:

```
begin:
addi  $1,$0,3;        // a = 3
addi  $2,$0,4;        // b = 4
addi  $3,$0,1;        // c = 1
sw    $1,4($0);       // A[1] = 3
add   $4,$1,$1;       // $4 = 2*a
or    $6,$1,$2;       // e = a | b
and   $7,$1,$3;       // f = a & c
sub   $5,$4,$2;       // d = 2*a - b
slt   $8,$1,$2;       // g = a < b
beq   $1,$2,begin
lw    $10,4($0);      // i = A[1]
```

- h. Bonus: answer the question below in your report. (20%)

"CO\_P4\_test\_2.txt" contains data dependencies between I1/I2, I5/I6, and I8/I9. Resolve the resulting data hazards by modifying only the machine code of the testbench, then run it on your pipelined CPU. (Write down the machine code and show the execution result in your report.)

| Label | Opcode | Operands        |
|-------|--------|-----------------|
| I1:   | addi   | **\$1**,\$0,16  |
| I2:   | addi   | \$2,\$1,4       |
| I3:   | addi   | \$3,\$0,8       |
| I4:   | sw     | \$1,4(\$0)      |
| I5:   | lw     | **\$4**,4(\$0)  |
| I6:   | sub    | \$5,\$4,\$3     |
| I7:   | add    | \$6,\$3,\$1     |
| I8:   | addi   | **\$7**,\$1,10  |
| I9:   | and    | \$8,\$7,\$3     |
| I10:  | addi   | \$9,\$0,100     |

Hint: You may (1) insert NOPs or (2) reorder instructions.
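To check in simulation whether enough NOPs or independent instructions separate a producer from its consumer, a small monitor like the sketch below may help. It is only a debugging aid under stated assumptions: there is no forwarding, the register file lets a read in ID see a value written in the same cycle by WB (verify this against the supplied Reg\_File.v; if it does not hold, the WB stage must be checked as well), and all module and port names are hypothetical.

```verilog
// Hypothetical simulation aid: flags a read-after-write hazard between the
// instruction in ID and the instructions currently in EX and MEM.
// All names are assumptions; adapt them to your own pipeline signals.
module Raw_Hazard_Check_Sketch (
    input  wire [4:0] id_rs_i,         // rs field of the instruction in ID
    input  wire [4:0] id_rt_i,         // rt field of the instruction in ID
    input  wire       id_reads_rt_i,   // 1 when rt is a source (R-type, SW, BEQ)
    input  wire [4:0] ex_dst_i,        // destination register of the instruction in EX
    input  wire       ex_regwrite_i,   // RegWrite of the instruction in EX
    input  wire [4:0] mem_dst_i,       // destination register of the instruction in MEM
    input  wire       mem_regwrite_i,  // RegWrite of the instruction in MEM
    output wire       hazard_o         // 1 when ID would read a stale register value
);
    // Every instruction in this lab reads rs; rt is a source only for some.
    // Register $0 is skipped on the assumption that it always reads as zero.
    wire ex_hit  = ex_regwrite_i  && (ex_dst_i  != 5'd0) &&
                   ((ex_dst_i  == id_rs_i) ||
                    (id_reads_rt_i && (ex_dst_i  == id_rt_i)));

    wire mem_hit = mem_regwrite_i && (mem_dst_i != 5'd0) &&
                   ((mem_dst_i == id_rs_i) ||
                    (id_reads_rt_i && (mem_dst_i == id_rt_i)));

    assign hazard_o = ex_hit || mem_hit;
endmodule
```

If `hazard_o` never rises while the modified testbench runs, every consumer is far enough behind its producer under the assumptions above.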

# 3. Architecture Diagram



You may modify this design freely. For example, the logic of **Mux3** in the diagram is counterintuitive: it selects the memory output when the **MemtoReg** control signal is 0. Selecting the memory output when **MemtoReg** is 1 is more intuitive, so you may add an inverter, revise the logic in the decoder, or swap the multiplexer inputs as needed. Describe any modifications to the design in your report.
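As a minimal sketch of the "swap the multiplexer inputs" option, the write-back multiplexer could be written so that **MemtoReg** = 1 selects the memory data. The module and port names here are assumptions, not names taken from the diagram.

```verilog
// Hypothetical write-back mux with the more intuitive polarity:
// MemtoReg = 1 selects the data read from memory,
// MemtoReg = 0 selects the ALU result.
module WB_Mux_Sketch (
    input  wire [31:0] alu_result_i,
    input  wire [31:0] mem_data_i,
    input  wire        MemtoReg_i,
    output wire [31:0] write_data_o
);
    assign write_data_o = MemtoReg_i ? mem_data_i : alu_result_i;
endmodule
```

Whichever polarity you choose, the decoder must drive **MemtoReg** consistently with it: with this version, LW sets the signal to 1 and the ALU instructions set it to 0.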

# 4. Report

- a. Your Architecture
- b. Hardware Module Analysis
- c. Problems You Met and Solutions
- d. Result
- e. Summary

# 5. Grade

- a. **Total:** 120 points (90 for the required instructions + 10 for the report + 20 bonus). Plagiarism will receive 0 points.
- b. **Report:** 10 points (please use **PDF format**)
- c. **Late submission:** Score \* 0.8 if submitted before 6/13. Submissions after 6/13 receive 0 points.

# 6. Q&A

If you have any questions, please ask in the Facebook discussion forum.