```
1 # OrderRewrite_0
2 #
    arguments:
3 #
        $a0: order
4 #
        $a1: base
5 #
        $a2: output
6 #
7 OrderRewrite_0:
8
       add
                $t0, $0, $0
                                      # i <- 0
9
       addi
                $t1, $0, -1
                                      # t1 <- sentinel
10 next:
11
                $t2, $t0, 2
                                      # t2 <- 4*i
       sll
                $t3, $a0, $t2
12
       add
                                      # t3 <- &(order[i])
13
                $t4, 0($t3)
                                       # t4 <- order[i]
       lw
14
                $t4, $t1, done
                                      # if order[i] == sentinel
       beq
                $t5, $t4, 2
15
       sll
                                       # t5 <- 4*order[i]
                $t6, $a1, $t5
                                      # t6 <- &(base[order[i]])
16
       add
17
                $t7, 0($t6)
                                      # t7 <- base[order[i]]</pre>
       lw
18
       add
                $t5, $a2, $t2
                                      # t5 <- &(base[i])
19
                $t7, 0($t5)
                                      # output <- base[order[i]]</pre>
       SW
                $t0, $t0, 1
20
       addi
                                      # i <- i+1
21
       j
                next
22 done:
23
       jr
                $ra
```

Figure 1: OrderRewrite — Optimization level 0.

Question 3 (20 points): The new MIPS processor also has store instructions with the same addressing mode and similar assembly syntax as the load instructions, in particular it has a store word indexed instruction swi. In this question we will study the performance of the subroutine OrderRewrite that receives three arguments that are the base address of three vectors: order, base, and output. The vector order contains a list of positions of the vector base. A sentinel value of -1 signals the end of this list. OrderRewrite writes into the vector output the values found in the vector base in the order indicated by the vector order. Assume that the sentinel value -1 appears in position N of the vector order. Whenever necessary, provide answers to the questions below in terms of N.

You will analyze two versions of OrderRewrite. The first version, called OrderRewrite\_O, shown in Figure 1, was produced by a basic compiler that is unaware of the existence of the lwi and swi instructions. This compiler also does not perform sophisticated optimizations. The second version, OrderRewrite\_3, shown in Figure 2, was produced by a newer optimizing compiler that is able to use lwi and swi. In this version the compiler also restructured the code to avoid the additional jump instruction at the end of the loop.

The average number of clock cycles needed to execute each of the instructions used in the two versions of OrderRewrite is given in Table 1.

a. (4 points) Complete the table below. To compute the CPI, assume that N is very large:

```
1 # OrderRewrite_3
2 # arguments:
3 #
        $a0: order
4 #
        $a1: base
5 #
        $a2: output
6 #
7 OrderRewrite_3:
8
       addi
               $t1, $0, -1
                                     # tm <- sentinel
9
       add
               $t0, $0, $0
                                     # index <- 0
               $t2, 0($a0)
                                     # t2 <- order[0]
10
       lw
11
       beq
               $t2, $t1, done
                                     # if order[0] == sentinel
12 next:
13
       sll
               $t3, $t2, 2
                                     # t3 <- 4*(order[i])
14
       lwi
               $t4, ($t3,0)($a1)
                                     # t4 <- base[order[i]]</pre>
15
               $t4, ($t0,0)($a2)
                                     # output <- base[order[i]]</pre>
       swi
16
       addi
               $t0, $t0, 4
                                     # index <- index + 4</pre>
               $t2, ($t0,0)($a0)
17
       lwi
                                     # t2 <- order[i]
                                     # if order[i] == sentinel
18
       bne
               $t2, $t1, next
19 done:
20
       jr
               $ra
```

Figure 2: OrderRewrite — Optimization level 3.

| Instruction | Cycles | Instruction | Cycles | Instruction | Cycles |
|-------------|--------|-------------|--------|-------------|--------|
| add         | 1      | lw          | 4      | beq         | 3      |
| addi        | 1      | sw          | 4      | bne         | 3      |
| sll         | 1      | swi         | 4      | j           | 2      |
| sub         | 1      | lwi         | 4      | jr          | 2      |

Table 1: Average number of clocks to execute instructions of each type in both versions of the MIPS processor.

| Subroutine     | Number of Instructions Executed | Average Cycles per Instruction (CPI) |
|----------------|---------------------------------|--------------------------------------|
| OrderRewrite_O |                                 |                                      |
| OrderRewrite_3 |                                 |                                      |



c. (4 points) For a given execution of OrderRewrite\_3 N=1,000,000 and it takes 9.375 milliseconds to execute the subroutine (1 millisecond = 0.001 second). What is the frequency of execution of this processor expressed in GHz (1 GHz =  $10^9$  Hz).

d. (4 points) Still assuming that both subroutines are running on the same processor, for which value of N will the execution of OrderRewrite\_O take 9.375 milliseconds?