
LW R1, 0(R0)

LW R2, 4(R0)

ADD R3, R1, R2 ;a=b+e

SW R3, 12(R0)

LW R4, 8(R0)

ADD R5, R1, R4 ;c=b+f

SW R5, 16(R0)


|                | 1  | 2  | 3  | 4   | 5   | 6  | 7  | 8  | 9   |
|----------------|----|----|----|-----|-----|----|----|----|-----|
| LW R1, 0(R0)   | IF | ID | EX | MEM | WB  |    |    |    |     |
| LW R2, 4(R0)   |    | IF | ID | EX  | MEM | WB |    |    |     |
| ADD R3, R1, R2 |    |    | IF | ID  | ID  | ID | ID | EX | MEM |
| SW R3, 12(R0)  |    |    |    | IF  | IF  | IF | IF | ID | EX  |
| LW R4, 8(R0)   |    |    |    |     |     |    |    | IF | ID  |
| ADD R5, R1, R4 |    |    |    |     |     |    |    |    | IF  |
| SW R5, 16(R0)  |    |    |    |     |     |    |    |    |     |

|                | 10  | 11  | 12   | 13  | 14  | 15 |
|----------------|-----|-----|------|-----|-----|----|
| LW R1, 0(R0)   |     |     |      |     |     |    |
| LW R2, 4(R0)   |     |     |      |     |     |    |
| ADD R3, R1, R2 | WB  |     |      |     |     |    |
| SW R3, 12(R0)  | MEM | WB  |      |     |     |    |
| LW R4, 8(R0)   | EX  | MEM | WB   |     |     |    |
| ADD R5, R1, R4 | ID  | ID  | EX   | MEM | WB  |    |
| SW R5, 16(R0)  | IF  | IF  | ID   | EX  | MEM | WB |

Total: 15 clock cycles.

LW R1, 0(R0)

LW R2, 4(R0)

LW R4, 8(R0)

ADD R5, R1, R4 ;c=b+f

SW R5, 16(R0)

ADD R3, R1, R2 ;a=b+e

SW R3, 12(R0)

|                | 1  | 2  | 3  | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12 |
|----------------|----|----|----|-----|-----|-----|-----|-----|-----|-----|-----|----|
| LW R1, 0(R0)   | IF | ID | EX | MEM | WB  |     |     |     |     |     |     |    |
| LW R2, 4(R0)   |    | IF | ID | EX  | MEM | WB  |     |     |     |     |     |    |
| LW R4, 8(R0)   |    |    | IF | ID  | EX  | MEM | WB  |     |     |     |     |    |
| ADD R5, R1, R4 |    |    |    | IF  | ID  | ID  | EX  | MEM | WB  |     |     |    |
| SW R5, 16(R0)  |    |    |    |     | IF  | IF  | ID  | EX  | MEM | WB  |     |    |
| ADD R3, R1, R2 |    |    |    |     |     |     | IF  | ID  | EX  | MEM | WB  |    |
| SW R3, 12(R0)  |    |    |    |     |     |     |     | IF  | ID  | EX  | MEM | WB |

This order takes 12 clock cycles.

The best reordering saves 13 - 11 = 2 clock cycles over the original order, whose 13-cycle timing is:

|                | 1  | 2  | 3  | 4   | 5   | 6  | 7   | 8   | 9   | 10 | 11  | 12  | 13 |
|----------------|----|----|----|-----|-----|----|-----|-----|-----|----|-----|-----|----|
| LW R1, 0(R0)   | IF | ID | EX | MEM | WB  |    |     |     |     |    |     |     |    |
| LW R2, 4(R0)   |    | IF | ID | EX  | MEM | WB |     |     |     |    |     |     |    |
| ADD R3, R1, R2 |    |    | IF | ID  | ID  | EX | MEM | WB  |     |    |     |     |    |
| SW R3, 12(R0)  |    |    |    | IF  | IF  | ID | EX  | MEM | WB  |    |     |     |    |
| LW R4, 8(R0)   |    |    |    |     |     | IF | ID  | EX  | MEM | WB |     |     |    |
| ADD R5, R1, R4 |    |    |    |     |     |    | IF  | ID  | ID  | EX | MEM | WB  |    |
| SW R5, 16(R0)  |    |    |    |     |     |    |     | IF  | IF  | ID | EX  | MEM | WB |

LW R1, 0(R0)

LW R2, 4(R0)

LW R4, 8(R0)

ADD R3, R1, R2 ;a=b+e

ADD R5, R1, R4 ;c=b+f

SW R3, 12(R0)

SW R5, 16(R0)

This order takes 11 clock cycles, 2 fewer than the original order.
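The cycle counts for the two orderings can be checked with a small model. The sketch below is illustrative only: the function name `cycles_with_forwarding` and the tuple encoding of instructions are assumptions made for this example, and the model assumes full forwarding (an ALU result feeds the next EX stage, a loaded value is usable one cycle later) and conservatively treats store data as an EX operand.

```python
def cycles_with_forwarding(program):
    """Total clock cycles on a 5-stage IF/ID/EX/MEM/WB pipeline.

    Each instruction is (opcode, destination or None, source registers).
    Assumed model: full forwarding, in-order issue, one instruction per cycle.
    """
    ready = {}      # register -> first cycle in which an EX stage may use it
    prev_ex = 2     # chosen so that the first instruction executes in cycle 3
    for opcode, dest, sources in program:
        ex = prev_ex + 1
        for reg in sources:
            ex = max(ex, ready.get(reg, 0))
        if dest is not None:
            # A load delivers its value after MEM, an ALU op after EX.
            ready[dest] = ex + 2 if opcode == "LW" else ex + 1
        prev_ex = ex
    return prev_ex + 2  # MEM and WB follow the last EX


LW_B = ("LW", "R1", ["R0"])
LW_E = ("LW", "R2", ["R0"])
LW_F = ("LW", "R4", ["R0"])
ADD_A = ("ADD", "R3", ["R1", "R2"])   # a = b + e
ADD_C = ("ADD", "R5", ["R1", "R4"])   # c = b + f
SW_A = ("SW", None, ["R3", "R0"])
SW_C = ("SW", None, ["R5", "R0"])

original = [LW_B, LW_E, ADD_A, SW_A, LW_F, ADD_C, SW_C]
scheduled = [LW_B, LW_E, LW_F, ADD_A, ADD_C, SW_A, SW_C]

print(cycles_with_forwarding(original))   # 13
print(cycles_with_forwarding(scheduled))  # 11
```

Hoisting all three loads ahead of the first ADD removes both load-use stalls, which is where the two saved cycles come from.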


MIPS code:

LW R1, 0(R0) ; load A

LW R2, 4(R0) ; load B

ADD R3, R1, R2 ; A=A+B

SUB R4, R3, R2 ; C=A-B

SW R3, 0(R0) ; store A

SW R4, 8(R0) ; store C

|                | 1  | 2  | 3  | 4   | 5   | 6  | 7  | 8  | 9   | 10 | 11 | 12 | 13  | 14  | 15 |
|----------------|----|----|----|-----|-----|----|----|----|-----|----|----|----|-----|-----|----|
| LW R1, 0(R0)   | IF | ID | EX | MEM | WB  |    |    |    |     |    |    |    |     |     |    |
| LW R2, 4(R0)   |    | IF | ID | EX  | MEM | WB |    |    |     |    |    |    |     |     |    |
| ADD R3, R1, R2 |    |    | IF | ID  | ID  | ID | ID | EX | MEM | WB |    |    |     |     |    |
| SUB R4, R3, R2 |    |    |    | IF  | IF  | IF | IF | ID | ID  | ID | ID | EX | MEM | WB  |    |
| SW R3, 0(R0)   |    |    |    |     |     |    |    | IF | IF  | IF | IF | ID | EX  | MEM | WB |
| SW R4, 8(R0)   |    |    |    |     |     |    |    |    |     |    |    | IF | ID  | ID  | ID |

|                | 16 | 17  | 18 |
|----------------|----|-----|----|
| LW R1, 0(R0)   |    |     |    |
| LW R2, 4(R0)   |    |     |    |
| ADD R3, R1, R2 |    |     |    |
| SUB R4, R3, R2 |    |     |    |
| SW R3, 0(R0)   |    |     |    |
| SW R4, 8(R0)   | EX | MEM | WB |
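The 18-cycle total in this diagram follows from one rule that can be read off the table: a dependent instruction leaves ID no earlier than the cycle after its producer's WB. A minimal sketch of that rule is given below; the function name `cycles_without_forwarding` and the `(destination, sources)` encoding are assumptions made for illustration, not part of the original solution.

```python
def cycles_without_forwarding(program):
    """Total cycles when operands are only read from the register file.

    Assumed model, matching the diagram above: an instruction completes ID
    no earlier than the cycle after its producer's WB, and ID is in-order.
    """
    written = {}   # register -> cycle of the WB stage that writes it
    prev_id = 1    # chosen so that the first instruction finishes ID in cycle 2
    wb = 0
    for dest, sources in program:
        id_done = prev_id + 1
        for reg in sources:
            id_done = max(id_done, written.get(reg, 0) + 1)
        wb = id_done + 3          # EX, MEM and WB follow ID
        if dest is not None:
            written[dest] = wb
        prev_id = id_done
    return wb


program = [
    ("R1", ["R0"]),         # LW  R1, 0(R0)
    ("R2", ["R0"]),         # LW  R2, 4(R0)
    ("R3", ["R1", "R2"]),   # ADD R3, R1, R2
    ("R4", ["R3", "R2"]),   # SUB R4, R3, R2
    (None, ["R3", "R0"]),   # SW  R3, 0(R0)
    (None, ["R4", "R0"]),   # SW  R4, 8(R0)
]

print(cycles_without_forwarding(program))  # 18
```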


| Label | Instruction      | Comment         |
|-------|------------------|-----------------|
|       | ADDIU R3, R0, N  | ; N             |
|       | SLL R3, R3, 2    | ; N*4           |
|       | ADDU R3, R1, R3  | ; R3=R1+N*4     |
| Loop: | LWC1 F1, 0(R1)   | ; load X(i)     |
|       | LWC1 F2, 0(R2)   | ; load Y(i)     |
|       | MUL.S F3, F0, F1 | ; a*X(i)        |
|       | ADD.S F4, F2, F3 | ; a*X(i)+Y(i)   |
|       | ADDIU R1, R1, 4  | ; i++ for X     |
|       | ADDIU R2, R2, 4  | ; i++ for Y     |
|       | BNE R1,R3, Loop  |                 |
|       | SWC1 F4, -4(R1)  | ; store to X(i) |

In the timing diagram below, MUL.S spends four cycles in EX (EX1 to EX4) and ADD.S three (EX1 to EX3).

|                  | 1  | 2  | 3  | 4   | 5   | 6   | 7   | 8   | 9   |
|------------------|----|----|----|-----|-----|-----|-----|-----|-----|
| LWC1 F1, 0(R1)   | IF | ID | EX | MEM | WB  |     |     |     |     |
| LWC1 F2, 0(R2)   |    | IF | ID | EX  | MEM | WB  |     |     |     |
| MUL.S F3, F0, F1 |    |    | IF | ID  | EX1 | EX2 | EX3 | EX4 | MEM |
| ADD.S F4, F2, F3 |    |    |    | IF  | ID  | ID  | ID  | ID  | EX1 |
| ADDIU R1, R1, 4  |    |    |    |     | IF  | IF  | IF  | IF  | ID  |
| ADDIU R2, R2, 4  |    |    |    |     |     |     |     |     | IF  |
| BNE R1,R3, Loop  |    |    |    |     |     |     |     |     |     |
| SWC1 F4, -4(R1)  |    |    |    |     |     |     |     |     |     |

|                  | 10  | 11  | 12  | 13  | 14  | 15  | 16  | 17 |
|------------------|-----|-----|-----|-----|-----|-----|-----|----|
| LWC1 F1, 0(R1)   |     |     |     |     |     |     |     |    |
| LWC1 F2, 0(R2)   |     |     |     |     |     |     |     |    |
| MUL.S F3, F0, F1 | WB  |     |     |     |     |     |     |    |
| ADD.S F4, F2, F3 | EX2 | EX3 | MEM | WB  |     |     |     |    |
| ADDIU R1, R1, 4  | ID  | ID  | EX  | MEM | WB  |     |     |    |
| ADDIU R2, R2, 4  | IF  | IF  | ID  | EX  | MEM | WB  |     |    |
| BNE R1,R3, Loop  |     |     | IF  | ID  | EX  | MEM | WB  |    |
| SWC1 F4, -4(R1)  |     |     |     | IF  | ID  | EX  | MEM | WB |
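The 17-cycle length of one pass through the loop body can be reproduced with a latency-aware model. This is a sketch under stated assumptions: the names `EX_CYCLES`, `LOADS` and `fp_loop_cycles` are hypothetical, the EX latencies (4 for MUL.S, 3 for ADD.S) are read off the diagram, and the model assumes full forwarding with a single in-order EX unit that later instructions cannot enter until the previous one has left it.

```python
EX_CYCLES = {"MUL.S": 4, "ADD.S": 3}   # all other opcodes: 1 EX cycle
LOADS = {"LWC1"}


def fp_loop_cycles(body):
    """Cycles for one pass through the loop body on the assumed pipeline."""
    ready = {}    # register -> first cycle in which an EX stage may use it
    ex_end = 2    # chosen so that the first instruction enters EX in cycle 3
    for opcode, dest, sources in body:
        start = ex_end + 1                      # EX is occupied in order
        for reg in sources:
            start = max(start, ready.get(reg, 0))
        ex_end = start + EX_CYCLES.get(opcode, 1) - 1
        if dest is not None:
            # Loads forward after MEM, everything else after the last EX cycle.
            ready[dest] = ex_end + (2 if opcode in LOADS else 1)
    return ex_end + 2                           # MEM and WB of the last one


body = [
    ("LWC1", "F1", ["R1"]),
    ("LWC1", "F2", ["R2"]),
    ("MUL.S", "F3", ["F0", "F1"]),
    ("ADD.S", "F4", ["F2", "F3"]),
    ("ADDIU", "R1", ["R1"]),
    ("ADDIU", "R2", ["R2"]),
    ("BNE", None, ["R1", "R3"]),
    ("SWC1", None, ["F4", "R1"]),
]

print(fp_loop_cycles(body))  # 17
```

Most of the lost cycles come from ADD.S waiting in ID for the four-cycle multiply, which suggests moving the two ADDIU instructions between MUL.S and ADD.S when scheduling the loop.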


(1)

|                 | 1  | 2  | 3  | 4   | 5  | 6  | 7   | 8  | 9  | 10  | 11  |
|-----------------|----|----|----|-----|----|----|-----|----|----|-----|-----|
| LD R1, 0(R2)    | IF | ID | EX | MEM | WB |    |     |    |    |     |     |
| DADDI R1, R1, 4 |    | IF | ID | ID  | ID | EX | MEM | WB |    |     |     |
| SD R1, 0(R2)    |    |    | IF | IF  | IF | ID | ID  | ID | EX | MEM | WB  |
| DADDI R2, R2, 4 |    |    |    |     |    | IF | IF  | IF | ID | EX  | MEM |
| DSUB R4, R3, R2 |    |    |    |     |    |    |     |    | IF | ID  | ID  |
| BNEZ R4, Loop   |    |    |    |     |    |    |     |    |    | IF  | IF  |

```
(3) Scheduled loop (timing shown for a taken branch)

Loop:
    LD      R1, 0(R2)
    DADDI   R2, R2, #4
    DSUB    R4, R3, R2
    DADDI   R1, R1, #4
    BNEZ    R4, Loop
    SD      -4(R2), R1       ; branch delay slot

                     1    2    3    4    5    6    7    8    9    10   11
LD    R1, 0(R2)      IF   ID   EX   MEM  WB
DADDI R2, R2, 4           IF   ID   EX   MEM  WB
DSUB  R4, R3, R2               IF   ID   EX   MEM  WB
DADDI R1, R1, 4                     IF   ID   EX   MEM  WB
BNEZ  R4, Loop                           IF   ID   EX   MEM  WB
SD    -4(R2), R1                              IF   ID   EX   MEM  WB
LD    R1, 0(R2)                                    IF   ID   EX   MEM  WB

Total: 6 * 99 + 10 = 604 clock cycles
```
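The total for the scheduled loop is simple arithmetic: a stall-free pass of six instructions through five stages occupies 6 + 5 - 1 = 10 cycles, and a new pass starts every 6 cycles. The sketch below takes the figures 6, 99 and 10 exactly as they appear in the solution; the helper names `stall_free_cycles` and `loop_cycles` are assumptions for this example.

```python
def stall_free_cycles(instructions, stages=5):
    """Cycles for a stall-free instruction sequence to drain the pipeline."""
    return instructions + stages - 1


def loop_cycles(interval, repeats, final_pass):
    """Passes started every `interval` cycles, plus the full final pass."""
    return interval * repeats + final_pass


final_pass = stall_free_cycles(6)        # 6 instructions, 5 stages -> 10
print(final_pass)                        # 10
print(loop_cycles(6, 99, final_pass))    # 604
```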


| Label | Opcode | Operands   | Comment            |
|-------|--------|------------|--------------------|
| L:    | L.D    | F0, 0(R1)  | ; load X[i]        |
|       | L.D    | F6, -8(R1) | ; load X[i-1]      |
|       | MUL.D  | F0, F0, F2 | ; a*X[i]           |
|       | MUL.D  | F6, F6, F2 | ; a*X[i-1]         |
|       | L.D    | F4, 0(R2)  | ; load Y[i]        |
|       | L.D    | F8, -8(R2) | ; load Y[i-1]      |
|       | ADD.D  | F0, F0, F4 | ; a*X[i] + Y[i]    |
|       | ADD.D  | F6, F6, F8 | ; a*X[i-1]+ Y[i-1] |
|       | DSUBUI | R2, R2, 16 |                    |
|       | DSUBUI | R1, R1, 16 |                    |
|       | S.D    | F0, 16(R2) | ; store Y[i]       |
|       | S.D    | F6, 8(R2)  | ; store Y[i-1]     |
|       | BNEZ   | R1, L      |                    |
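The listing above handles two elements, i and i-1, per pass. The following Python sketch illustrates the same transformation at source level and checks that it leaves the result unchanged. The function names are hypothetical, and the explicit cleanup step is an assumption added here: it is needed whenever the element count is odd, for example 101 elements when i runs from 100 down to 0.

```python
def saxpy_rolled(a, x, y):
    """Reference loop: Y[i] = a*X[i] + Y[i], from the last index down to 0."""
    y = list(y)
    for i in range(len(x) - 1, -1, -1):
        y[i] = a * x[i] + y[i]
    return y


def saxpy_unrolled_by_two(a, x, y):
    """Same computation with the loop body duplicated for i and i-1."""
    y = list(y)
    i = len(x) - 1
    while i >= 1:
        y[i] = a * x[i] + y[i]
        y[i - 1] = a * x[i - 1] + y[i - 1]
        i -= 2                      # one index update serves two elements
    if i == 0:                      # leftover element when the count is odd
        y[0] = a * x[0] + y[0]
    return y


x = [float(i) for i in range(101)]
y = [float(2 * i) for i in range(101)]
print(saxpy_rolled(3.0, x, y) == saxpy_unrolled_by_two(3.0, x, y))  # True
```

Unrolling halves the number of index updates and branches per element, and it gives the scheduler independent multiply and add chains to interleave.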


| Load/store and integer instruction | FP instruction       |
|------------------------------------|----------------------|
| L.D F0, 0(R1)                      |                      |
| L.D F6, -8(R1)                     |                      |
| L.D F10, -16(R1)                   | MUL.D F0, F0, F2     |
| L.D F14, -24(R1)                   | MUL.D F6, F6, F2     |
| L.D F4, 0(R2)                      | MUL.D F10, F10, F2   |
| L.D F8, -8(R2)                     | MUL.D F14, F14, F2   |
| L.D F12, -16(R2)                   | ADD.D F0, F0, F4     |
| L.D F16, -24(R2)                   | ADD.D F6, F6, F8     |
| DSUBUI R2, R2, 32                  | ADD.D F10, F10, F12  |
| DSUBUI R1, R1, 32                  | ADD.D F14, F14, F16  |
| S.D F0, 32(R2)                     |                      |
| S.D F6, 24(R2)                     |                      |
| S.D F10, 16(R2)                    |                      |
| BNEZ R1, L                         |                      |

```
#define N 101                     /* indices 0..100, as used by the loop */

static double X[N], Y[N];         /* element type assumed; not given in the original */
static const double a = 2.0;      /* placeholder value; not given in the original */

int main(void)
{
    /* initialize X[i] and Y[i] here */
    int i;
    for (i = N - 1; i >= 0; i--)
        Y[i] = a * X[i] + Y[i];
    return 0;
}
```

Compile with gcc at -O0 and at -O1, -O2, and -O3.

5. Expressions in n, m, and t2: n*t2 and n*t2 + m.

6.




7. Intel Nehalem

| Topic                      | Terms and figures                                                                                          |
|----------------------------|------------------------------------------------------------------------------------------------------------|
| Microarchitecture families | P6, Netburst, Core, Nehalem, Sandy Bridge                                                                  |
| Instruction fetch          | 16 bytes from the instruction cache                                                                        |
| Decode                     | x86 instructions into micro-instructions; instruction queue                                                |
| Out-of-order core          | Scheduler; ROB with 128 in-flight micro-instructions; Unified Reservation Station with 36 entries; commit |
| Memory ordering            | MOB (Memory Order Buffer): load buffer with 48 entries, store buffer with 36 entries; 10 fill buffers; load/store forwarding |
| Caches                     | 32 KB, 256 KB, 8 MB                                                                                        |
| Branch handling            | BTB; Branch Prediction Unit (BPU)                                                                          |
| Loop handling              | Nehalem Loop Stream Detection (LSD); Netburst trace cache                                                  |