## Dynamically Scheduled Instruction Processing: Tomasulo

#### **Problems?**

- How do we prevent WAR and WAW hazards?
- How do we deal with variable latency?

|       |           |    | Clock Cycle Number |    |     |       |    |       |            |       |       |       |       |       |       |       |    |    |
|-------|-----------|----|--------------------|----|-----|-------|----|-------|------------|-------|-------|-------|-------|-------|-------|-------|----|----|
| Ins   | truction  | 1  | 2                  | 3  | 4   | 5     | 6  | 7     | 8          | 9     | 10    | 11    | 12    | 13    | 14    | 15    | 16 | 17 |
| LD    | F6,34(R2) | IF | ID                 | EX | MEM | WB    |    |       |            |       |       |       |       |       |       |       |    |    |
| LD    | F2,45(R3) |    | IF                 | ID | EX  | MEM   | WB |       |            |       |       |       |       |       |       | RA    | W  |    |
| MULTD | F0,F2,F4  |    |                    | IF | ID  | stall | M1 | M2    | M3         | M4    | M5    | M6    | M7    | M8    | M9    | M10   |    | WB |
| SUBD  | F8,F6,F2  |    |                    |    | IF  | ID    | A1 | A2    | MEM        | WB    |       |       |       |       |       |       |    |    |
| DIVD  | F10,F0,F6 |    |                    |    |     | IF    | ID | stall | stall      | stall | stall | stall | stall | stall | stall | stall | D1 | D2 |
| ADDD  | F6,F8,F2  |    |                    |    |     |       | IF | ID    | <b>A</b> 1 | A2    | MEM   | WB    | +     |       | W     | AD    |    |    |

## What do registers offer?

- Short, absolute name for a recently computed (or frequently used) value
- Fast, high bandwidth storage in the data path
- Means of broadcasting a computed value to set of instructions that use the value

## **Broadcasting result value**

- Series of instructions issued and waiting for value to be produced by logically preceding instruction.
- CDC6600 has each come back and read the value once it is placed in register file
- Alternative: broadcast value and reg # to all the waiting instructions

## Tomasulo Algorithm vs. Scoreboard

- Control & buffers <u>distributed</u> with Function Units (FU) vs. centralized in scoreboard;
  - FU buffers called "reservation stations"; have pending operands
- Registers in instructions replaced by values or pointers to reservation stations (RS); called register renaming;
  - avoids WAR, WAW hazards
  - More reservation stations than registers, so it can do optimizations compilers can't
- Results to FU from RS, not through registers. Over Common Data Bus that broadcasts results to all FUs
- Load and Stores treated as FUs with RSs as well

### The basic structure of a MIPS floatingpoint unit using Tomasulo's algorithm



#### Load and store buffer

#### Load buffer

- Hold the components of the effective address until it is computed
- Track outstanding loads that are waiting on the memory
- Hold the results of completed loads

#### Store buffer

- Hold the components of the effective address until it is computed
- Hold the destination memory addresses of outstanding stores that are waiting for data value to store
- Hold the address and value to store until the memory unit is available

## **Three Stages of Tomasulo Algorithm**

Each stage could take an arbitrary number of clock cycles.

## 1. Issue stage—get instruction from FP Op Queue

- FIFO order to ensure the correct data flow
- If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).
- If no empty reservation: structural hazard, stall until a station or buffer is freed
- if operands are not in the registers, keep track of the functional units that will produce the operands.
- Eliminate WAR and WAW hazards

### **Execution Stage of Tomasulo Algorithm**

#### 2. Execution—operate on operands (EX)

- If both operands ready then execute;
- if not ready, watch Common Data Bus for result (Resolve RAW)
- several instructions could become ready in the same clock cycle
- independent functional units could begin execution in the same clock cycle for different instructions
- if more than one instruction is waiting for a single functional unit, the unit will choose among them.

### Write result Stage of Tomasulo Algorithm

#### 3. Write result —finish execution (WB)

Write on Common Data Bus to all awaiting units; mark reservation station available

- Normal data bus: data + destination ("go to" bus)
- Common data bus: data + source ("come from" bus)
  - 64 bits of data + 4 bits of Functional Unit source address
  - Write if matches expected Functional Unit (produces result)
  - Does the broadcast

#### **Reservation Station: Seven Fields**

- **Op:** Operation to perform in the unit
- Vj, Vk: Value of Source operands
  - -Store buffers has V field, result to be stored
- Qj, Qk: Reservation stations producing source registers (source registers to be written by other instructions)
  - –Note: Qj, Qk=0 indicates source is already in Vj or Vk (or is unnecessary)
- **Busy:** Indicates reservation station or FU is busy
- A: Hold information for memory address calculation for
  - a load store
  - initially: immediate field of the instruction
  - after address calculation: effective address

## Register result status: one field Qi

- Indicates the number of the reservation station (which functional unit) will write the register, if one exists.
- Blank when no pending instructions that will write that register.

## Load and store buffers each have a field: A

 Holds the result of the effective address once the first step of execution has been completed

#### **Notations**

- r : Reservation Station or buffer that the instruction is assigned to
- rs, rt : source register number
- rd : destination register number
- RS : Reservation Station
- RegStat : Register status data structure
- Regs : Register file

## **Tomasulo's Algorithm: The Details**

| Instruction state         | Wait until                  | Action/bookkeeping                                                                                                                                                                                                                                                                                                                                                                                                      |
|---------------------------|-----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Issue<br>ALU<br>operation | Reservation Station r empty | if (RegStat[rs].Qi $\neq$ 0)<br>{RS[r].Qj $\leftarrow$ RegStat[rs].Qi}<br>else { RS[r].Vj $\leftarrow$ Regs[rs]; RS[r].Qj $\leftarrow$ 0}<br>if (RegStat[rt].Qi $\neq$ 0)<br>{RS[r].Qk $\leftarrow$ RegStat[rt].Qi}<br>else { RS[r].Vk $\leftarrow$ Regs[rt]; RS[r].Qk $\leftarrow$ 0}<br>RS[r].busy $\leftarrow$ yes;<br>RegStat[rd].Qi $\leftarrow$ r;                                                                |
| Issue<br>Load or<br>store | Buffer r empty              | if (RegStat[rs].Qi $\neq$ 0)<br>{RS[r].Qj $\leftarrow$ RegStat[rs].Qi}<br>else { RS[r].Vj $\leftarrow$ Regs[rs]; RS[r].Qj $\leftarrow$ 0}<br>RS[r].A $\leftarrow$ imm;<br>RS[r].busy $\leftarrow$ yes;<br>RegStat[rt].Qi $\leftarrow$ r; // for load only<br>If (RegStat[rt].Qi $\neq$ 0) // for store only<br>{RS[r].Qk $\leftarrow$ RegStat[rt].Qi}<br>Else {RS[r].Vk $\leftarrow$ Regs[rt]; RS[r].Qk $\leftarrow$ 0} |

## **Tomasulo's Algorithm: The Details**

| Instruction state               | Wait until                                 | Action/bookkeeping                        |
|---------------------------------|--------------------------------------------|-------------------------------------------|
| Execute<br>ALU<br>operation     | RS[r].Qj=0 and RS[r].Qk=0                  | Compute result: operands are in Vj and Vk |
| Execute<br>Load-store<br>step 1 | RS[r].Qj=0 & r is head of load-store queue | RS[r].A ← RS[r].Vj + RS[r].A;             |
| Execute<br>Load step 2          | Load step1 complete                        | Read from Mem[RS[r].A]                    |

## **Tomasulo's Algorithm: The Details**

| Instruction state | Wait until           | Action/bookkeeping                  |
|-------------------|----------------------|-------------------------------------|
| Write Result      | Execution complete   | ∀ x ( if(RegStat[x].Qi=r) )         |
| ALU operation     | at r & CDB available | { Regs[x]←result ; RegStat[x].Qi←0} |
| or                |                      | ∀ x ( if(RS[x].Qj=r) )              |
| load              |                      | { RS[x].Vj←result ; RS[x].Qj←0}     |
|                   |                      | ∀ x ( if(RS[x].Qk=r) )              |
|                   |                      | { RS[x].Vk←result ; RS[x].Qk←0}     |
|                   |                      | RS[r].busy ← no;                    |
| Write Result      | Execution complete   | Mem[RS[r].A] ← RS[r].Vk;            |
| Store             | at r & RS[r].Qk=0    | RS[r].busy ← no;                    |
|                   |                      |                                     |
|                   |                      |                                     |
|                   |                      |                                     |

## **Tomasulo Example**



#### Register result status:

Clock F0 F2 F4 F6 F8 F10 F12 ... F30



Instruction kLD R2 F6 34+**R3** LD F2 45 +**MULTD** F0 F2 F4 **SUBD** F8 F2 F6 **DIVD** F10 F0 F6 **ADDD** F6 F8 F2

Exec Write



SI

*S*2

RS

Qj

Load1 Yes 34+R2
Load2 No
Load3 No

#### Reservation Stations:

Time Name Busy Op Vj Vk

Add1 No
Add2 No
Add3 No
Mult1 No
Mult2 No

#### Register result status:

Clock

1

Γ

FU

F0

*F2 F4* 

F4 F6 Load1

F8

RS

Qk

F10

*F12* 

... *F30* 



Exec Write





#### Reservation Stations:

S1 S2 RS RS

| Гіте | Name  | Busy | Op | Vj | Vk | Qj | Qk |
|------|-------|------|----|----|----|----|----|
|      | Add1  | No   |    |    |    |    |    |
|      | Add2  | No   |    |    |    |    |    |
|      | Add3  | No   |    |    |    |    |    |
|      | Mult1 | No   |    |    |    |    |    |
|      | Mult2 | No   |    |    |    |    |    |

#### Register result status:

Clock

2



#### Instruction status: Exec Write Comp Result Instruction kIssue Busy Address Yes 34 + R2LD F6 34 +R21 Load1 LDF2 45 +**R**3 45 + R3Load2 Yes **MULTD** F2 F4 F0 Load3 No **SUBD** F8 F6 F2 **DIVD** F10 F6 F0 **ADDD** F8 F2 F6 Reservation Stations: SI*S*2 RS RS VkTime Name Busy Op $V_{j}$ Qj QkAdd1 No Add2 No Add3 Yes MULTD R(F4) Load2 Mult1 Mult2

#### Register result status:

Clock F0 F2 F4 F6 F8 F10 F12 ... F30

Mult1 Load2 Load1

- Note: registers names are removed ("renamed") in Reservation Stations;
- Load1 completing; what is waiting for Load1?

#### Instruction status: Exec Write Comp Result Address Instruction kIssue Busy 3 LD F6 34 +R21 4 Load1 No LD F2 45 +**R**3 4 Yes Load2 45 + R3**MULTD** F4 FO F2 Load3 No SUBD F8 F6 F2 4 **DIVD** F10 F6 F0 **ADDD** F8 F2 F6 Reservation Stations: SI*S*2 RS RS $V_{j}$ Time Name Busy OpVk $Q_{j}$ QkAdd1 Yes SUBD M(A1) Load2 Add2 No Add3 No Yes MULTD R(F4) Load2 Mult1 Mult2 No

#### Register result status:

Clock F0 F2 F4 F6 F8 F10 F12 ... F30 4 FU Mult1 Load2 M(A1) Add1

Load2 completing; what is waiting for Load2?

#### Instruction status: Exec Write Issue Comp Result Busy Address Instruction kLD34 +3 No F6 R2 1 4 Load1 45 +5 LD F2 **R**3 4 Load2 No **MULTD** FO F2 F4 Load3 No **SUBD** F8 F6 F2 4 **DIVD** F10 F<sub>0</sub> **F6**

#### Reservation Stations:

F6

F8

F2

**ADDD** 

| Time | Name  | Busy | Op    | Vj    | Vk    | Qj    | Qk |
|------|-------|------|-------|-------|-------|-------|----|
| 2    | Add1  | Yes  | SUBD  | M(A1) | M(A2) |       |    |
|      | Add2  | No   |       |       |       |       |    |
|      | Add3  | No   |       |       |       |       |    |
| 10   | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | M(A1) | Mult1 |    |

SI

#### Register result status:

| Clock |    | FO    | F2    | <i>F4</i> | <i>F6</i> | F8   | F10   | F12 | ••• | F30 |
|-------|----|-------|-------|-----------|-----------|------|-------|-----|-----|-----|
| 5     | FU | Mult1 | M(A2) |           | M(A1)     | Add1 | Mult2 |     |     |     |

*S*2

RS

RS

#### Instruction status:

Exec Write

| Instructio | n   | $\dot{j}$ | k          | Issue | Comp | Result |
|------------|-----|-----------|------------|-------|------|--------|
| LD         | F6  | 34+       | R2         | 1     | 3    | 4      |
| LD         | F2  | 45+       | <b>R</b> 3 | 2     | 4    | 5      |
| MULTD      | FO  | F2        | F4         | 3     |      |        |
| SUBD       | F8  | F6        | F2         | 4     |      |        |
| DIVD       | F10 | F0        | F6         | 5     |      |        |
| ADDD       | F6  | F8        | F2         | 6     |      |        |

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

#### Reservation Stations:

SI*S*2 RS RS  $V_i$ V/L  $O_i$  $O^{k}$ 

| Time Name | Busy | Op    | Vj    | Vk    | Qj    | Qk |
|-----------|------|-------|-------|-------|-------|----|
| 1 Add1    | Yes  | SUBD  | M(A1) | M(A2) |       |    |
| Add2      | Yes  | ADDD  |       | M(A2) | Add1  |    |
| Add3      | No   |       |       |       |       |    |
| 9 Mult1   | Yes  | MULTD | M(A2) | R(F4) |       |    |
| Mult2     | Yes  | DIVD  |       | M(A1) | Mult1 |    |

#### Register result status:

Clock

6

FU

F0*F2*  F4

*F6* F8

F10

*F12* 

F30

Mult1 M(A2)

Add2 Add1

Mult2

| Instructio  | nstruction status: |        |            |       |       |            |       |       |      |         |
|-------------|--------------------|--------|------------|-------|-------|------------|-------|-------|------|---------|
| Instruction | on                 | j      | k          | Issue | Comp  | Result     |       |       | Busy | Address |
| LD          | F6                 | 34+    | R2         | 1     | 3     | 4          |       | Load1 | No   |         |
| LD          | F2                 | 45+    | R3         | 2     | 4     | 5          |       | Load2 | No   |         |
| MULTD       | F0                 | F2     | <b>F</b> 4 | 3     |       |            |       | Load3 | No   |         |
| SUBD        | F8                 | F6     | F2         | 4     | 7     |            |       |       |      |         |
| DIVD        | F10                | F0     | <b>F6</b>  | 5     |       |            |       |       |      |         |
| ADDD        | F6                 | F8     | F2         | 6     |       |            |       |       |      |         |
| Reservation | on St              | ations |            |       | S1    | <i>S</i> 2 | RS    | RS    |      |         |
|             | Time               | Name   | Busy       | Op    | Vj    | Vk         | Qj    | Qk    |      |         |
|             | O                  | Add1   | Yes        | SUBD  | M(A1) | M(A2)      |       |       |      |         |
|             |                    | Add2   | Yes        | ADDD  |       | M(A2)      | Add1  |       |      |         |
|             |                    | Add3   | No         |       |       |            |       |       |      |         |
|             | 8                  | Mult1  | Yes        | MULTD | M(A2) | R(F4)      |       |       |      |         |
|             |                    | Mult2  | Yes        | DIVD  |       | M(A1)      | Mult1 |       |      |         |

#### Register result status:

| Clock |    | FO    | F2    | <i>F4</i> | <i>F6</i> | F8   | F10   | F12 | ••• | F30 |
|-------|----|-------|-------|-----------|-----------|------|-------|-----|-----|-----|
| 7     | FU | Mult1 | M(A2) |           | Add2      | Add1 | Mult2 |     |     |     |

· Add1 completing; what is waiting for it?

| In | struction  | n stai | tus:   |                  |       | Exec  | Write     |    |       |      |         |
|----|------------|--------|--------|------------------|-------|-------|-----------|----|-------|------|---------|
|    | Instructio | n      | j      | $\boldsymbol{k}$ | Issue | Comp  | Result    |    |       | Busy | Address |
|    | LD         | F6     | 34+    | R2               | 1     | 3     | 4         |    | Load1 | No   |         |
|    | LD         | F2     | 45+    | R3               | 2     | 4     | 5         |    | Load2 | No   |         |
|    | MULTD      | FO     | F2     | F4               | 3     |       |           |    | Load3 | No   |         |
|    | SUBD       | F8     | F6     | F2               | 4     | 7     | 8         |    |       |      |         |
|    | DIVD       | F10    | FO     | <b>F6</b>        | 5     |       |           |    |       |      |         |
|    | ADDD       | F6     | F8     | F2               | 6     |       |           |    |       |      |         |
| Re | eservatio  | on Sto | ations | s:               |       | S1    | <i>S2</i> | RS | RS    |      |         |
|    |            | Time   | Name   | Busy             | Op    | Vj    | Vk        | Qj | Qk    | _    |         |
|    |            |        | Add1   | No               |       |       |           |    |       |      |         |
|    |            | 2      | Add2   | Yes              | ADDD  | (M-M) | M(A2)     |    |       |      |         |
|    |            |        | Add3   | No               |       |       |           |    |       |      |         |

Yes MULTD M(A2) R(F4)

**DIVD** 

#### Register result status:

Yes

7 Mult1

Mult2

| Clock |    | FO    | F2    | F4 | <i>F6</i> | F8    | F10   | F12 | ••• | F30 |
|-------|----|-------|-------|----|-----------|-------|-------|-----|-----|-----|
| 8     | FU | Mult1 | M(A2) |    | Add2      | (M-M) | Mult2 |     |     |     |

M(A1) Mult1

| In | structio    | n stai | tus:   |                  |       | Exec  | Write     |    |       |      |         |
|----|-------------|--------|--------|------------------|-------|-------|-----------|----|-------|------|---------|
|    | Instruction | n      | j      | $\boldsymbol{k}$ | Issue | Comp  | Result    |    |       | Busy | Address |
|    | LD          | F6     | 34+    | R2               | 1     | 3     | 4         |    | Load1 | No   |         |
|    | LD          | F2     | 45+    | <b>R</b> 3       | 2     | 4     | 5         |    | Load2 | No   |         |
|    | MULTD       | FO     | F2     | F4               | 3     |       |           |    | Load3 | No   |         |
|    | SUBD        | F8     | F6     | F2               | 4     | 7     | 8         |    |       |      |         |
|    | DIVD        | F10    | FO     | <b>F6</b>        | 5     |       |           |    |       |      |         |
|    | ADDD        | F6     | F8     | F2               | 6     |       |           |    |       |      |         |
| Re | eservatio   | on St  | ations | s:               |       | S1    | <i>S2</i> | RS | RS    |      |         |
|    |             | Time   | Name   | Busy             | Op    | Vj    | Vk        | Qj | Qk    | _    |         |
|    |             |        | Add1   | No               |       |       |           |    |       |      |         |
|    |             | 1      | Add2   | Yes              | ADDD  | (M-M) | M(A2)     |    |       |      |         |
|    |             |        | Add3   | No               |       |       |           |    |       |      |         |

Yes MULTD M(A2) R(F4)

**DIVD** 

#### Register result status:

6 Mult1

Mult2 Yes

| Clock |    | F0    | F2    | F4 | <i>F6</i> | F8    | F10   | F12 | ••• | F30 |
|-------|----|-------|-------|----|-----------|-------|-------|-----|-----|-----|
| 9     | FU | Mult1 | M(A2) |    | Add2      | (M-M) | Mult2 |     |     |     |

M(A1) Mult1

```
Instruction status:
                                         Write
                                   Exec
   Instruction
                        k
                             Issue Comp Result
                                                              Busy
                                                                    Address
                                      3
   LD
            F6
                  34+
                        R2
                               1
                                             4
                                                       Load1
                                                               No
   LD
            F2
                  45 +
                        R3
                                      4
                                             5
                                                       Load2
                                                               No
   MULTD
                  F2
                        F4
                                                       Load3
            FO
                                                               No
   SUBD
            F8
                        F2
                               4
                                      7
                                             8
                  F6
   DIVD
            F10
                  F0
                        F6
   ADDD
            F6
                  F8
                        F2
                                      10
Reservation Stations:
                                     SI
                                            S2
                                                  RS
                                                         RS
           Time Name Busy
                                     V_i
                                            Vk
                              Op
                                                  Q_{j}
                                                         Qk
                 Add1
                        No
               0 \text{ Add} 2
                        Yes ADDD (M-M) M(A2)
                 Add3
                        No
               5 Mult1
                        Yes MULTD M(A2) R(F4)
                 Mult2
                       Yes
                             DIVD
                                          M(A1) Mult1
```

#### Register result status:

| Clock |    | F0    | F2    | F4 | <i>F6</i> | F8    | F10   | <i>F12</i> | ••• | F30 |
|-------|----|-------|-------|----|-----------|-------|-------|------------|-----|-----|
| 10    | FU | Mult1 | M(A2) |    | Add2      | (M-M) | Mult2 |            |     |     |

#### Add2 completing;

```
Instruction status:
                                     Exec
                                            Write
   Instruction
                          \boldsymbol{k}
                              Issue
                                     Comp Result
                                                                 Busy
                                                                       Address
                                        3
   LD
             F6
                   34 +
                         R2
                                 1
                                               4
                                                          Load1
                                                                  No
   LD
             F2
                  45 +
                         R3
                                        4
                                               5
                                                                  No
                                                          Load2
   MULTD
                   F2
                         F4
             FO
                                                          Load3
                                                                  No
   SUBD
             F8
                                4
                                        7
                                               8
                   F6
                         F2
   DIVD
             F10
                   FO
                         F6
   ADDD
                   F8
                         F2
                                       10
             F6
                                              11
Reservation Stations:
                                       SI
                                              S2
                                                    RS
                                                           RS
                                              Vk
            Time Name Busy
                               Op
                                       V_{j}
                                                     Q_{i}
                                                           Qk
                 Add1
                         No
                 Add2
                         No
                 Add3
                         No
                         Yes MULTD M(A2) R(F4)
                4 Mult1
                 Mult2
                        Yes
                                            M(A1) Mult1
                              DIVD
```

#### Register result status:

| Clock |    | F0    | F2    | <i>F4</i> | <i>F6</i> | F8    | F10   | <i>F12</i> | • • • | F30 |
|-------|----|-------|-------|-----------|-----------|-------|-------|------------|-------|-----|
| 11    | FU | Mult1 | M(A2) |           | (M-M+N)   | (M-M) | Mult2 |            |       |     |

- Write result of ADDD here vs. scoreboard?
- · All quick instructions complete in this cycle!

| Instructio  | n sta | tus:   |      |       | Exec  | Write      |       |       |      |         |
|-------------|-------|--------|------|-------|-------|------------|-------|-------|------|---------|
| Instruction | on    | j      | k    | Issue | Comp  | Result     |       |       | Busy | Address |
| LD          | F6    | 34+    | R2   | 1     | 3     | 4          |       | Load1 | No   |         |
| LD          | F2    | 45+    | R3   | 2     | 4     | 5          |       | Load2 | No   |         |
| MULTD       | FO    | F2     | F4   | 3     |       |            |       | Load3 | No   |         |
| SUBD        | F8    | F6     | F2   | 4     | 7     | 8          |       |       |      |         |
| DIVD        | F10   | FO     | F6   | 5     |       |            |       |       |      |         |
| ADDD        | F6    | F8     | F2   | 6     | 10    | 11         |       |       |      |         |
| Reservation | on St | ations | s:   |       | S1    | <i>S</i> 2 | RS    | RS    |      |         |
|             | Time  | Name   | Busy | Op    | Vj    | Vk         | Qj    | Qk    |      |         |
|             |       | Add1   | No   |       |       |            |       |       |      |         |
|             |       | Add2   | No   |       |       |            |       |       |      |         |
|             |       | Add3   | No   |       |       |            |       |       |      |         |
|             | 3     | Mult1  | Yes  | MULTI | M(A2) | R(F4)      |       |       |      |         |
|             |       | Mult2  | Yes  | DIVD  |       | M(A1)      | Mult1 |       |      |         |

#### Register result status:

| Clock |    | F0    | F2    | <i>F4</i> | <i>F6</i> | F8    | F10   | <i>F12</i> | ••• | F30 |
|-------|----|-------|-------|-----------|-----------|-------|-------|------------|-----|-----|
| 12    | FU | Mult1 | M(A2) | (         | M-M+N     | (M-M) | Mult2 |            |     |     |

| Instructio  | n sta | tus:   |                  |       | Exec  | Write      |       |       |      |         |
|-------------|-------|--------|------------------|-------|-------|------------|-------|-------|------|---------|
| Instruction | on    | j      | $\boldsymbol{k}$ | Issue | Comp  | Result     |       |       | Busy | Address |
| LD          | F6    | 34+    | R2               | 1     | 3     | 4          |       | Load1 | No   |         |
| LD          | F2    | 45+    | R3               | 2     | 4     | 5          |       | Load2 | No   |         |
| MULTD       | FO    | F2     | F4               | 3     |       |            |       | Load3 | No   |         |
| SUBD        | F8    | F6     | F2               | 4     | 7     | 8          |       |       |      |         |
| DIVD        | F10   | FO     | <b>F6</b>        | 5     |       |            |       |       |      |         |
| ADDD        | F6    | F8     | F2               | 6     | 10    | 11         |       |       |      |         |
| Reservation | on St | ations | S.:              |       | S1    | <i>S</i> 2 | RS    | RS    |      |         |
|             | Time  | Name   | Busy             | Op    | Vj    | Vk         | Qj    | Qk    |      |         |
|             |       | Add1   | No               |       |       |            |       |       |      |         |
|             |       | Add2   | No               |       |       |            |       |       |      |         |
|             |       | Add3   | No               |       |       |            |       |       |      |         |
|             | 2     | Mult1  | Yes              | MULTE | M(A2) | R(F4)      |       |       |      |         |
|             |       | Mult2  | Yes              | DIVD  |       | M(A1)      | Mult1 |       | ]    |         |

#### Register result status:

Clock

13 F0 F2 F4 F6 F8 F10 F12 ... F30

| Mult | M(A2) | (M-M+N (M-M) | Mult | Mult

| Instructio  | n sta | tus:   |                  |       | Exec  | Write      |       |       |      |         |
|-------------|-------|--------|------------------|-------|-------|------------|-------|-------|------|---------|
| Instruction | on    | j      | $\boldsymbol{k}$ | Issue | Comp  | Result     |       |       | Busy | Address |
| LD          | F6    | 34+    | R2               | 1     | 3     | 4          |       | Load1 | No   |         |
| LD          | F2    | 45+    | R3               | 2     | 4     | 5          |       | Load2 | No   |         |
| MULTD       | FO    | F2     | F4               | 3     |       |            |       | Load3 | No   |         |
| SUBD        | F8    | F6     | F2               | 4     | 7     | 8          |       |       |      |         |
| DIVD        | F10   | FO     | <b>F6</b>        | 5     |       |            |       |       |      |         |
| ADDD        | F6    | F8     | F2               | 6     | 10    | 11         |       |       |      |         |
| Reservation | on St | ations | 7.               |       | S1    | <i>S</i> 2 | RS    | RS    |      |         |
|             | Time  | Name   | Busy             | Op    | Vj    | Vk         | Qj    | Qk    |      |         |
|             |       | Add1   | No               |       |       |            |       |       |      |         |
|             |       | Add2   | No               |       |       |            |       |       |      |         |
|             |       | Add3   | No               |       |       |            |       |       |      |         |
|             | 1     | Mult1  | Yes              | MULTE | M(A2) | R(F4)      |       |       |      |         |
|             |       | Mult2  | Yes              | DIVD  |       | M(A1)      | Mult1 |       |      |         |

#### Register result status:

| Clock |    | F0    | F2    | <i>F4</i> | <i>F6</i> | F8    | F10   | <i>F12</i> | ••• | F30 |
|-------|----|-------|-------|-----------|-----------|-------|-------|------------|-----|-----|
| 14    | FU | Mult1 | M(A2) | (         | M-M+N     | (M-M) | Mult2 |            |     |     |

| Instructio  | n sta | tus:   |      |       | Exec  | Write      |       |       |      |         |
|-------------|-------|--------|------|-------|-------|------------|-------|-------|------|---------|
| Instruction | on    | j      | k    | Issue | Comp  | Result     |       |       | Busy | Address |
| LD          | F6    | 34+    | R2   | 1     | 3     | 4          |       | Load1 | No   |         |
| LD          | F2    | 45+    | R3   | 2     | 4     | 5          |       | Load2 | No   |         |
| MULTD       | F0    | F2     | F4   | 3     | 15    |            |       | Load3 | No   |         |
| SUBD        | F8    | F6     | F2   | 4     | 7     | 8          |       |       |      |         |
| DIVD        | F10   | FO     | F6   | 5     |       |            |       |       |      |         |
| ADDD        | F6    | F8     | F2   | 6     | 10    | 11         |       |       |      |         |
| Reservati   | on St | ations |      |       | S1    | <i>S</i> 2 | RS    | RS    |      |         |
|             | Time  | Name   | Busy | Op    | Vj    | Vk         | Qj    | Qk    |      |         |
|             |       | Add1   | No   |       |       |            |       |       |      |         |
|             |       | Add2   | No   |       |       |            |       |       |      |         |
|             |       | Add3   | No   |       |       |            |       |       |      |         |
|             | C     | Mult1  | Yes  | MULTE | M(A2) | R(F4)      |       |       |      |         |
|             |       | Mult2  | Yes  | DIVD  |       | M(A1)      | Mult1 |       |      |         |

#### Register result status:

Clock F0 F2 F4 F6 F8 F10 F12 ... F30 15 FU Mult1 M(A2) (M-M+N(M-M)) Mult2

| Instructio  | n sta | tus:      |      |       | Exec | Write      |    |       |      |         |
|-------------|-------|-----------|------|-------|------|------------|----|-------|------|---------|
| Instruction | on    | $\dot{j}$ | k    | Issue | Comp | Result     |    |       | Busy | Address |
| LD          | F6    | 34+       | R2   | 1     | 3    | 4          |    | Load1 | No   |         |
| LD          | F2    | 45+       | R3   | 2     | 4    | 5          |    | Load2 | No   |         |
| MULTD       | FO    | F2        | F4   | 3     | 15   | 16         |    | Load3 | No   |         |
| SUBD        | F8    | F6        | F2   | 4     | 7    | 8          |    |       |      |         |
| DIVD        | F10   | FO        | F6   | 5     |      |            |    |       |      |         |
| ADDD        | F6    | F8        | F2   | 6     | 10   | 11         |    |       |      |         |
| Reservation | on St | ations    | :    |       | S1   | <i>S</i> 2 | RS | RS    |      |         |
|             | Time  | Name      | Busy | Op    | Vj   | Vk         | Qj | Qk    |      |         |
|             |       | Add1      | No   |       |      |            |    |       |      |         |
|             |       | Add2      | No   |       |      |            |    |       |      |         |
|             |       | Add3      | No   |       |      |            |    |       |      |         |
|             |       | Mult1     | No   |       |      |            |    |       |      |         |
|             | 40    | Mult2     | Yes  | DIVD  | M*F4 | M(A1)      |    |       |      |         |

#### Register result status:

| Clock |    | F0   | F2    | <i>F4</i> | <i>F6</i> | F8    | F10   | F12 | ••• | F30 |
|-------|----|------|-------|-----------|-----------|-------|-------|-----|-----|-----|
| 16    | FU | M*F4 | M(A2) | (         | M-M+N     | (M-M) | Mult2 |     |     |     |

## Faster than light computation (skip a couple of cycles)

| Instructio  | n sta | tus:      |                  |       | Exec | Write     |    |       |      |         |
|-------------|-------|-----------|------------------|-------|------|-----------|----|-------|------|---------|
| Instruction | on    | $\dot{j}$ | $\boldsymbol{k}$ | Issue | Comp | Result    |    |       | Busy | Address |
| LD          | F6    | 34+       | R2               | 1     | 3    | 4         |    | Load1 | No   |         |
| LD          | F2    | 45+       | R3               | 2     | 4    | 5         |    | Load2 | No   |         |
| MULTD       | FO    | F2        | F4               | 3     | 15   | 16        |    | Load3 | No   |         |
| SUBD        | F8    | F6        | F2               | 4     | 7    | 8         |    |       |      |         |
| DIVD        | F10   | FO        | F6               | 5     |      |           |    |       |      |         |
| ADDD        | F6    | F8        | F2               | 6     | 10   | 11        |    |       |      |         |
| Reservation | on St | ations    | <b>:</b>         |       | S1   | <i>S2</i> | RS | RS    |      |         |
|             | Time  | Name      | Busy             | Op    | Vj   | Vk        | Qj | Qk    |      |         |
|             |       | Add1      | No               |       |      |           |    |       |      |         |
|             |       | Add2      | No               |       |      |           |    |       |      |         |
|             |       | Add3      | No               |       |      |           |    |       |      |         |
|             |       | Mult1     | No               |       |      |           |    |       |      |         |
|             | 1     | Mult2     | Yes              | DIVD  | M*F4 | M(A1)     |    |       |      |         |

#### Register result status:

| Clock |    | FO   | F2    | <i>F4</i> | <i>F6</i> | F8    | F10   | <i>F12</i> | ••• | F30 |
|-------|----|------|-------|-----------|-----------|-------|-------|------------|-----|-----|
| 55    | FU | M*F4 | M(A2) | (         | M-M+N     | (M-M) | Mult2 |            |     |     |

| Instructio  | n sta | tus:      |      |       | Exec | Write      |    |       |      |         |
|-------------|-------|-----------|------|-------|------|------------|----|-------|------|---------|
| Instruction | on    | $\dot{j}$ | k    | Issue | Comp | Result     |    |       | Busy | Address |
| LD          | F6    | 34+       | R2   | 1     | 3    | 4          |    | Load1 | No   |         |
| LD          | F2    | 45+       | R3   | 2     | 4    | 5          |    | Load2 | No   |         |
| MULTD       | FO    | F2        | F4   | 3     | 15   | 16         |    | Load3 | No   |         |
| SUBD        | F8    | F6        | F2   | 4     | 7    | 8          |    |       |      |         |
| DIVD        | F10   | FO        | F6   | 5     | 56   |            |    |       |      |         |
| ADDD        | F6    | F8        | F2   | 6     | 10   | 11         |    |       |      |         |
| Reservation | on St | ations    | S.:  |       | S1   | <i>S</i> 2 | RS | RS    |      |         |
|             | Time  | Name      | Busy | Op    | Vj   | Vk         | Qj | Qk    |      |         |
|             |       | Add1      | No   |       |      |            |    |       |      |         |
|             |       | Add2      | No   |       |      |            |    |       |      |         |
|             |       | Add3      | No   |       |      |            |    |       |      |         |
|             |       | Mult1     | No   |       |      |            |    |       |      |         |
|             | 0     | Mult2     | Yes  | DIVD  | M*F4 | M(A1)      |    |       |      |         |

#### Register result status:

· Mult2 is completing; what is waiting for it?



#### Register result status:

 Once again: In-order issue, out-of-order execution and completion.

## **Compare to Scoreboard Cycle 62**

| Instruction | n sta | Read | Exec             | Write   | 2    |      |      |      |    |
|-------------|-------|------|------------------|---------|------|------|------|------|----|
| Instructio  | n     | j    | $\boldsymbol{k}$ | $I_{s}$ | ssue | Oper | Comp | Resu | lt |
| LD          | F6    | 34+  | R2               |         | 1    | 2    | 3    | 4    |    |
| LD          | F2    | 45+  | R3               |         | 5    | 6    | 7    | 8    |    |
| MULTD       | F0    | F2   | F4               |         | 6    | 9    | 19   | 20   |    |
| SUBD        | F8    | F6   | F2               |         | 7    | 9    | 11   | 12   |    |
| DIVD        | F10   | F0   | F6               |         | 8    | 21   | 61   | 62   |    |
| ADDD        | F6    | F8   | F2               |         | 13   | 14   | 16   | 22   |    |

| Exec Writ         |  |    |  |    |   |  |  |  |  |  |  |
|-------------------|--|----|--|----|---|--|--|--|--|--|--|
| Issue Comp Result |  |    |  |    |   |  |  |  |  |  |  |
| 1                 |  | 3  |  | 4  |   |  |  |  |  |  |  |
| 2                 |  | 4  |  | 5  | Ш |  |  |  |  |  |  |
| 3                 |  | 15 |  | 16 | Ш |  |  |  |  |  |  |
| 4                 |  | 7  |  | 8  | Ш |  |  |  |  |  |  |
| 5                 |  | 56 |  | 57 |   |  |  |  |  |  |  |
| 6                 |  | 10 |  | 11 |   |  |  |  |  |  |  |

## Why is Tomasulo's scheme widely adopted in multiple-issue processors after 1990s

- Achieving high performance without requiring the compiler to target code to a specific pipeline structure.
- With the presence of cache, the delays are unpredictable (miss). Out-of-order execution allows the processors to continue executing instructions while awaiting the completion of a cache miss (hiding cache miss penalty).
- Processors becoming more aggressive in issue capability demands register renaming and dynamic scheduling.
- Dynamic scheduling is a key component of speculation, it was adopted along with hardware speculation in the mid-1990s.

### **Hardware-Based Speculation**

- Speculation: fetch, issue, and execute instructions as if the branch predictions were always correct
- key ideas of Hardware-based speculation
  - Dynamic branch prediction
  - Allow execution of instructions before the control dependences are resolved
  - Dynamic scheduling of different combinations of basic blocks
- Perform update that cannot be undone only when the instruction is no longer speculative
  - Update the register file or memory: instruction commit
- Execute out of order but commit in order

## **Hardware-Based Speculation**

- Allow instruction to execute out of order but commit in order
- Separate execution complete from instruction commit
- Require additional hardware buffers (Reorder buffer, ROB): hold the results (finished execution but have not committed)

- Tomasulo's algorithm with speculation
  - 4 steps
  - Issue, execute, write result, commit

#### Reorder Buffer

 Reorder buffer – holds the result of instruction between completion and commit

#### Four fields:

- Instruction type: branch/store/register
- Destination field: register number
- Value field: output value
- Ready field: completed execution?

# Differences between Tomasulo's algorithm with and without speculation

### Without speculation

- Only partially overlaps basic blocks because it requires that a branch to be resolved before actually executing any instructions in the successor basic block
- Once an instruction writes its result, any subsequently issued instructions will find the result in the register file

### With speculation

- Dynamic scheduling of different combinations of basic blocks
- The register file is not updated until the instruction commits
- integrate the function of store buffer into ROB

## Four steps of hardware-based speculation

#### 1. Issue

- Get the instruction from the instruction queue
- Issue if there is an empty reservation and empty slot in ROB (stall if no empty entry)
- Send the operands to reservation station if they are available
- Update the control entries
- The number of the ROB entry allocated for the result is also sent to the reservation station

#### 2. Execute

- If one or more operands is not yet available, monitor the CDB
- When both operands are available at a reservation station, execute the operation (eliminate RAW)

## Four steps of hardware-based speculation

#### 3. Write result

 When the result is available, write it on the CDB, any reservation stations waiting for this result, and from the CDB to ROB

#### 4. Commit

 Branch with an incorrect prediction: the speculation is wrong, so the ROB is flushed and the execution is restarted at the correct successor of the branch

(if the branch is correctly predicted, the branch is finished)

- Store: when an instruction reaches the head of ROB, update the memory with the result and removes the instruction from the ROB
- Any other instruction: when an instruction reaches the head of ROB, update the register with the result and removes the instruction from the ROB

#### **Reorder Buffer**

