# **HW2 Solution**

#### Note to Students:

If you have a different solution to a particular problem or if you have other things that you would like the TAs to be aware of during grading, please contact the corresponding TAs below: Aashish will grade P1 (and extra), Paris will grade P2, and Tingwei will grade P3.

### P1.



Nop (5 pts)

nop (2 no-ops to wait until writeback stage finish)

**ADD** 

Nop

Nop (5 pts)

**XOR** 

SUB 1.3)

**ADD** 

**XOR** 

There is no need to add no-op into these instructions since all dependencies can be solved properly by full forwarding. (10 pts)

#### 1.4)

Timeline without any forwarding:

| 1  | 2   | 3   | 4   | 5   | 6   | 7   | 8  | 9  | 10  | 11 |
|----|-----|-----|-----|-----|-----|-----|----|----|-----|----|
| IF | ID  | EX  | MEM | WB  |     |     |    |    |     |    |
|    | Nop | 0   | 0   | 0   | 0   |     |    |    |     |    |
|    |     | Nop | 0   | 0   | 0   | 0   |    |    |     |    |
|    |     |     | IF  | ID  | EX  | MEM | WB |    |     |    |
|    |     |     |     | Nop | 0   | 0   | 0  | 0  |     |    |
|    |     |     |     |     | Nop | 0   | 0  | 0  | 0   |    |
|    |     |     |     |     |     | IF  | ID | EX | MEM | WB |

Timeline with full forwarding:

| 1  | 2  | 3  | 4   | 5   | 6   | 7  |
|----|----|----|-----|-----|-----|----|
| IF | ID | EX | MEM | WB  |     |    |
|    | IF | ID | EX  | MEM | WB  |    |
|    |    | IF | ID  | EX  | MEM | WB |

Clock Time without forwarding: 11 cycles x 250 = 2750 ps (3 pts) Clock Time with full forwarding: 7 cycles x 300 = 2100 ps (2 pts)

Speedup of B over A is simply T\_A / T\_B.

Therefore, the speedup is 1.31 (5 pts)

# P2.

# 2.1) **Table (5 pts)**

|    |    | <u> </u> |     |    |     |     |    |    |     |     |    |
|----|----|----------|-----|----|-----|-----|----|----|-----|-----|----|
| 1  | 2  | 3        | 4   | 5  | 6   | 7   | 8  | 9  | 10  | 11  | 12 |
| IF | ID | EX       | MEM | WB |     |     |    |    |     |     |    |
|    | 0  | IF       | ID  | EX | MEM | WB  |    |    |     |     |    |
|    |    |          | IF  | ID | EX  | MEM | WB |    |     |     |    |
|    |    |          |     | 0  | 0   | IF  | ID | EX | MEM | WB  |    |
|    |    |          |     |    |     |     | IF | ID | EX  | MEM | WB |

# 12 cycles (5 pts)

### 2.2) **Table (3 pts)**

|    | (- 1 |    |     |    |     |     |    |     |     |    |
|----|------|----|-----|----|-----|-----|----|-----|-----|----|
| 1  | 2    | 3  | 4   | 5  | 6   | 7   | 8  | 9   | 10  | 11 |
| IF | ID   | EX | MEM | WB |     |     |    |     |     |    |
|    | 0    | IF | ID  | EX | MEM | WB  |    |     |     |    |
|    |      |    | IF  | ID | EX  | MEM | WB |     |     |    |
|    |      |    |     | 0  | IF  | ID  | EX | MEM | WB  |    |
|    |      |    |     |    |     | IF  | ID | EX  | MEM | WB |

11 cycles (2 pts)

For Branch determined in EX stage,

The time needed to complete above instructions is 12 \* 280 ps = 3360 ps;

For Branch determined in ID stage,

The time needed to complete above instructions is 11 \* 280 ps = **3080 ps**;

#### So the speedup should be 12/11 = 1.091 (5 pts)

2.3) The clock cycle time for the second one need to be changed to 200\*1.5 = **300 ps (2 pts)**For Branch determined in EX stage,

The time needed to complete above instructions is 12 \* 280 ps = **3360 ps**; **(1 pts)** For Branch determined in ID stage,

The time needed to complete above instructions is 11 \* 300 ps = **3300 ps**; **(2 pts)** 

3360/3300 = 1.018. (5 pts)

#### 2.4) **Table (5 pts)**

| 1  | 2  | 3  | 4   | 5  | 6   | 7  | 8  | 9   | 10 | 11  | 12  | 13 |
|----|----|----|-----|----|-----|----|----|-----|----|-----|-----|----|
| IF | ID | EX | MEM | WB |     |    |    |     |    |     |     |    |
|    | 0  | IF | ID  | EX | MEM | WB |    |     |    |     |     |    |
|    |    |    | 0   | 0  | IF  | ID | EX | MEM | WB |     |     |    |
|    |    |    |     |    |     | 0  | IF | ID  | EX | MEM | WB  |    |
|    |    |    |     |    |     |    |    | IF  | ID | EX  | MEM | WB |

13 cycles (5 pts)

### P3.

#### **Explanation:**

- 1. With pipeline architecture, each instruction needs 1 cycle to complete.
- **2.** Since there is full-forwarding path in the pipeline, load-use cases need additional 1-cycle latency.
- **3.** Branch needs data in EX, and the result is known in MEM. It takes 3 extra cycles to know if the pipeline needs to be flushed.
- 3.1) On a match

SEARCH: LW R5,0(R3) (1)

BNE R5,R2,NOMATCH (1+1stall)

ADDI R1,R1,#1 (1)

NOMATCH: ADDI R3,R3,#4 (1)

BNE R4,R3,SEARCH (1+3 flushes)

### 9 cycles (5 pts)

| LW                | IF | ID  | EX | MEM | WB |     |     |     |     |    |   |   |   |
|-------------------|----|-----|----|-----|----|-----|-----|-----|-----|----|---|---|---|
|                   |    | Nop | 0  | 0   | 0  | 0   |     |     |     |    |   |   |   |
| BNE               |    |     | IF | ID  | EX | МЕМ | WB  |     |     |    |   |   |   |
| ADDI              |    |     |    | IF  | ID | EX  | MEM | WB  |     |    |   |   |   |
| ADDI              |    |     |    |     | IF | ID  | EX  | MEM | WB  |    |   |   |   |
| BNE               |    |     |    |     |    | IF  | ID  | EX  | MEM | WB |   |   |   |
|                   |    |     |    |     |    |     | Nop | 0   | 0   | 0  | 0 |   |   |
|                   |    |     |    |     |    |     |     | Nop | 0   | 0  | 0 | 0 |   |
|                   |    |     |    |     |    |     |     |     | Nop | 0  | 0 | 0 | 0 |
| Next<br>Iteration |    |     |    |     |    |     |     |     |     | IF |   |   |   |

# 3.2) On no match

SEARCH: LW R5,0(R3) (1)

BNE R5,R2,NOMATCH (1+1stall before + 3 flushes after)

NOMATCH: ADDI R3,R3,#4 (1)

BNE R4,R3,SEARCH (1+3 flushes)

### 11 cycles (5 pts)

|      | <b>-</b> | 3 (3 pt. | -, |     |     |     |    |    |    |         |         |    |  |
|------|----------|----------|----|-----|-----|-----|----|----|----|---------|---------|----|--|
| LW   | IF       | ID       | EX | MEM | WB  |     |    |    |    |         |         |    |  |
|      |          | Nop      | 0  | 0   | 0   | 0   |    |    |    |         |         |    |  |
| BNE  |          |          | IF | ID  | EX  | MEM | WB |    |    |         |         |    |  |
|      |          |          |    | Nop | 0   | 0   | 0  | 0  |    |         |         |    |  |
|      |          |          |    |     | Nop | 0   | 0  | 0  | 0  |         |         |    |  |
|      |          |          |    |     |     | Nop | 0  | 0  | 0  | 0       |         |    |  |
| ADDI |          |          |    |     |     |     | IF | ID | EX | ME<br>M | WB      |    |  |
| BNE  |          |          |    |     |     |     |    | IF | ID | EX      | ME<br>M | WB |  |

|                   |  |  |  |  | Nop | 0   | 0   | 0  | 0 |
|-------------------|--|--|--|--|-----|-----|-----|----|---|
|                   |  |  |  |  |     | Nop | 0   | 0  | 0 |
|                   |  |  |  |  |     |     | Nop | 0  | 0 |
| Next<br>Iteration |  |  |  |  |     |     |     | IF |   |

For 3.3) and 3.4), When the Branch result is determined in the ID stage (data is also needed in ID), it takes 1 extra cycle to know if the pipeline needs to be flushed.

3.3) On a match

SEARCH: LW R5,0(R3) (1)

BNE R5,R2,NOMATCH (1+2stall)

ADDI R1,R1,#1 (1)

NOMATCH: ADDI R3,R3,#4 (1)

BNE R4,R3,SEARCH (1+1stall + 1 flushes)

9 cycles (5 pts)

|                   | yolos | (5 pts | ')  |     |    |    |     |     |     |    |     |    |
|-------------------|-------|--------|-----|-----|----|----|-----|-----|-----|----|-----|----|
| LW                | IF    | ID     | EX  | МЕМ | WB |    |     |     |     |    |     |    |
|                   |       | Nop    | 0   | 0   | 0  | 0  |     |     |     |    |     |    |
|                   |       |        | Nop | 0   | 0  | 0  | 0   |     |     |    |     |    |
| BNE               |       |        |     | IF  | ID | EX | MEM | WB  |     |    |     |    |
| ADDI              |       |        |     |     | IF | ID | EX  | MEM | WB  |    |     |    |
| ADDI              |       |        |     |     |    | IF | ID  | EX  | MEM | WB |     |    |
|                   |       |        |     |     |    |    | Nop | 0   | 0   | 0  | 0   |    |
| BNE               |       |        |     |     |    |    |     | IF  | ID  | EX | MEM | WB |
|                   |       |        |     |     |    |    |     |     | Nop | 0  | 0   | 0  |
| Next<br>Iteration |       |        |     |     |    |    |     |     |     | IF |     |    |

### 3.4) On no match

SEARCH: LW R5,0(R3) (1)

BNE R5,R2,NOMATCH (1+2stall+1 flushes)

NOMATCH: ADDI R3,R3,#4 (1)

BNE R4,R3,SEARCH (1+1stall + 1 flushes)

9 cycles (5 pts)

| LW                | IF | ID  | EX  | MEM | WB  |    |     |    |     |    |     |    |
|-------------------|----|-----|-----|-----|-----|----|-----|----|-----|----|-----|----|
|                   |    | Nop | 0   | 0   | 0   | 0  |     |    |     |    |     |    |
|                   |    |     | Nop | 0   | 0   | 0  | 0   |    |     |    |     |    |
| BNE               |    |     |     | IF  | ID  | EX | MEM | WB |     |    |     |    |
|                   |    |     |     |     | Nop | 0  | 0   | 0  | 0   |    |     |    |
| ADDI              |    |     |     |     |     | IF | ID  | EX | MEM | WB |     |    |
|                   |    |     |     |     |     |    | Nop | 0  | 0   | 0  | 0   |    |
| BNE               |    |     |     |     |     |    |     | IF | ID  | EX | MEM | WB |
|                   |    |     |     |     |     |    |     |    | Nop | 0  | 0   | 0  |
| Next<br>Iteration |    |     |     |     |     |    |     |    |     | IF |     |    |

# Extra Credits: (10 pts)

Note that there is no forwarding from EX/MEM pipeline register to the input of EX in any case. Therefore, delaying by one cycle is not sufficient because R2 is not available then.

Timeline with only ALU-ALU forwarding:

| 1  | 2  | 3   | 4   | 5   | 6  | 7  | 8   | 9  |
|----|----|-----|-----|-----|----|----|-----|----|
| IF | ID | EX  | MEM | WB  |    |    |     |    |
|    | IF | ID  | EX  | MEM | WB |    |     |    |
|    |    | Nop | 0   | 0   | 0  | 0  |     |    |
|    |    |     | Nop | 0   | 0  | 0  | 0   |    |
|    |    |     |     | IF  | ID | EX | MEM | WB |