# COMP 230: Computer Architecture and Organization December 03, 2017 EXAM 3 (PRACTICE)

#### Instructions:

- This practice exam is meant to give you the flavor of questions that will be asked on the exam. Do not expect the real exam to be the same questions with only the numbers or MIPS instructions changed.
- The final exam is cumulative, so there are too many topics to be represented on any reasonably-sized practice exam. Expect that there will be topics on the final exam that are barely covered or not covered at all in this practice exam. For example, virtual memory is only briefly covered on this practice exam...
- Your real exam will be open book and notes. You are allowed to use calculators but not laptops, cell phones, or other electronics.
- If you do not show your work, do not expect partial credit for incorrect answers.
- If you believe a problem is incorrectly or incompletely specified, make a reasonable assumption and solve the problem. The assumption should not result in a trivial solution.
- In all cases, clearly state any assumptions you make in your answers.

|--|

| Part  | Description      | Points Possible | Grade |
|-------|------------------|-----------------|-------|
| 1     | Multiple choice  | 15              |       |
| 2     | Short Answer     | 25              |       |
| 3     | Arithmetic       | 20              |       |
| 4     | Pipelines        | 20              |       |
| 5     | Memory Hierarchy | 40              |       |
| 6     | Parallelism      | 10              |       |
| Total |                  | 130             |       |

## 1 Multiple Choice (15 points)

| 1. | [3 points] Processor A and Processor B have the same I help you decide which processor is faster?                 | SA and clock speed. What additional metric(s) can     |
|----|-------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------|
|    | (a) instructions per second                                                                                       |                                                       |
|    | (b) IPC                                                                                                           | Your Answer                                           |
|    | (c) Either A or B                                                                                                 | Tour Answer                                           |
|    | (d) None of the above                                                                                             |                                                       |
| 2. | [3 points] Processor A has twice the clock speed (half the same ISA and compiler. What information do you need to | - ,                                                   |
|    | (a) The number of instructions executed ( $dynamic$ instruction count)                                            |                                                       |
|    | (b) IPC                                                                                                           | Your Answer                                           |
|    | (c) Both A and B                                                                                                  |                                                       |
|    | (d) We already have enough information to decide.                                                                 |                                                       |
| 3. | [3 points] Which of the following techniques would $not$ be experiences?                                          | nelp reduce the number of conflict misses your cache  |
|    | (a) Increasing the block size                                                                                     |                                                       |
|    | (b) Increasing the cache capacity                                                                                 | Your Answer                                           |
|    | (c) Increasing the cache associativity                                                                            | Totti Aliswei                                         |
|    | (d) All of the above reduce conflict misses                                                                       |                                                       |
| 4. | [3 points] Which of the following does a cache designer $n$                                                       | ot have control over?                                 |
|    | (a) The number of bits in the offset                                                                              |                                                       |
|    | (b) The number of bits in the page offset                                                                         | Your Answer                                           |
|    | (c) The number of bits in the index                                                                               | Totti Aliswei                                         |
|    | (d) The associativity of the cache                                                                                |                                                       |
| 5. | [3 points] x86 is a very complicated ISA that is quite difficult this complexity in their implementations?        | icult to implement efficiently. How does Intel reduce |
|    | (a) They use a really long pipeline with no forwarding                                                            |                                                       |
|    | (b) They decrease the clock speed                                                                                 |                                                       |
|    | (c) They don't; they just hire really good chip designers and pay them a lot of money.                            | Your Answer                                           |
|    | (d) During execution they translate complex instructions into simpler instructions.                               |                                                       |

#### 2 Short Answer (25 points)

6. [5 points] You are told that processor A has a CPI of 1.5 when running program 1. When you run program 2 on processor A, the CPI is 2.0. What might account for the difference?

7. [5 points] Do TLB misses and page faults occur independently of one another? Explain.

8. [5 points] With regard to multiple-issue processors, give one reason why static scheduling is better than dynamic scheduling. Only your first answer will be graded.

| 9. | . [5 points] The MIPS ISA is relatively simple to implement with a pipeline by design — all Al | LU operations |
|----|------------------------------------------------------------------------------------------------|---------------|
|    | work only on register operands. Give one way in which our typical                              |               |

$$\text{Fetch} \rightarrow \text{Decode} \rightarrow \text{Execute} \rightarrow \text{Memory} \rightarrow \text{Write-back}$$

pipeline must change if want to perform ALU operations directly on memory operands.

10. [5 points] Give two reasons to use virtual memory.

## 3 Arithmetic (20 points)

| 11. | [10 points] Write MIPS assembly code to for a procedure vmult that takes in three pointers (to arrays of integers all the same size) and an integer (the size of the arrays). This procedure should multiply each element of the first array be the corresponding element of the second array, and put the result into the corresponding element in the third array. In other words, if $A, B$ , and $C$ represent out arrays, our procedure sets $C[i] = A[i] * B[i]$ for all valid $i$ . |
|-----|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|     | This procedure should return an integer. If any of the multiplications overflow a 32-bit register, the procedure should immediately return 0. Otherwise, return 1.                                                                                                                                                                                                                                                                                                                         |
|     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| 12. | Consider an 8-bit <i>fixed</i> point representation for fractional numbers. The first 3 bits represent the (signed integer part, while the last 5 bits represent the fractional component.                                                                                                                                                                                                                                                                                                 |
|     | (a) [3 points] How many different objects can be represented?                                                                                                                                                                                                                                                                                                                                                                                                                              |
|     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|     | (b) [3 points] How many different <i>real</i> numbers are represented in this format?                                                                                                                                                                                                                                                                                                                                                                                                      |
|     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|     | (c) [4 points] What is the smallest positive (and non-zero) number that can be represented?                                                                                                                                                                                                                                                                                                                                                                                                |
|     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |

### 4 Pipelines (20 points)

13. [10 points] Figure 1 shows the datapath for our MIPS pipeline, minus the instruction fetch stage. I've shown some forwarding writes in color, allowing us to remove some stalls. Below each stage is an instruction currently in progress in that stage.



Figure 1: MIPS pipeline stages with forwarding.

For each input specified below, state where the instruction gets its data — one of the forwarding lines (write the letter of the line shown in figure 1), or the register file (write RF).

- 14. [10 points] Give a series of instructions that would require a pipeline stall, i.e. the forwarding shown above would not help resolve the dependency.

## 5 Memory Hierarchy (40 points)

15. [13 points] Complete the following trace of a 2-way set associative cache with 2B cache lines and an LRU replacement policy. The LRU column stores which way was most recently accessed.

**Pay Attention:** For an access on row n, first state on row n whether the access results in a hit or miss. If a miss occurs, say which type (capacity, conflict, or cold). Then, if the access changes the contents of the cache, show that change on row n+1. Update only the fields in the cache that change.

The accesses before the question started were the addresses of the data: N, U, G, L, in that order.

|     |      | Set 0 |     |      | Set 1 Cache acce |      |     | ne access |      |       |      |        |        |
|-----|------|-------|-----|------|------------------|------|-----|-----------|------|-------|------|--------|--------|
| wa  | ay 0 |       | wa  | ay 1 | wa               | ay 0 |     | wa        | ay 1 |       |      |        |        |
| tag | data | LRU   | tag | data | tag              | data | LRU | tag       | data | addr  | data | H or M | M type |
| 011 | M N  | 0     | 101 | UV   | 010              | KL   | 1   | 001       | GH   | 00001 | В    |        |        |
|     |      |       |     |      |                  |      |     |           |      | 10111 | X    |        |        |
|     |      |       |     |      |                  |      |     |           |      | 00111 | Н    |        |        |
|     |      |       |     |      |                  |      |     |           |      | 01101 | N    |        |        |

Data in memory:

| address | data |
|---------|------|
| 00000   | A    |
| 00001   | В    |
| 00010   | С    |
| 00011   | D    |

| 00100 | E |
|-------|---|
| 00101 | F |
| 00110 | G |
| 00111 | Н |
| 01000 | I |

| 01001 | J |
|-------|---|
| 01010 | K |
| 01011 | L |
| 01100 | M |
| 01101 | N |

| 01110 | О |
|-------|---|
| 01111 | Р |
| 10000 | Q |
| 10001 | R |
| 10010 | S |

| 10011 | Τ |
|-------|---|
| 10100 | U |
| 10101 | V |
| 10110 | X |
|       |   |

| W |
|---|
| Y |
| Z |
|   |

16. [12 points] The following table gives the parameters for a number of different caches. For each cache, fill in the missing fields in the table. Let m be the number of physical address bits, C be the total number of bytes in the cache, B be the block size (in bytes), E be the associativity, E be the number of cache sets, E be the number of tag bits, E be the number of block offset bits.

| $\underline{m}$ | C    | B  | E   | S | t | s | b |
|-----------------|------|----|-----|---|---|---|---|
| 32              | 1024 | 4  | 4   |   |   |   |   |
| 32              | 1024 | 4  | 256 |   |   |   |   |
| 32              | 1024 | 8  | 1   |   |   |   |   |
| 32              | 1024 | 8  | 128 |   |   |   |   |
| 32              | 1024 | 32 | 1   |   |   |   |   |
| 32              | 1024 | 32 | 4   |   |   |   |   |

17. Consider a web server that records a log of each access to a particular web page. Each entry uses a structure as shown on the left of figure 2, and each entry is later processed with the function described on the right of figure 2.

Figure 2: Web server code.

- (a) [5 points] Given a cache with 64-byte blocks, how many cache misses does latest10\_hits incur for each entry? Assume each entry is accessed exactly once.
- (b) [10 points] How can you reorganize the entry data structure to reduce cache misses? Show your new structure definition code. (You do NOT need to calculate the number of cache misses with this change.)

## 6 Parallelism (10 points)

18. [10 points] Your coworker has developed a new (sequential) algorithm for matrix multiplication, but it is not fast enough. You are able to parallelize  $\frac{3}{4}$  of its execution time, independent of the size of the input matrices. How many cores do you need to achieve a 3x speedup over a single-core execution?