CS/ECE 472/572 Midterm 2

ID#: 931615649

Name: Geoffrey Corey

1. (15 points) Use LaTeX to typeset this document.

Answer: Done.

2. (10 points) [9.23] When a CPU writes to the cache, both the item in the cache and the corresponding item in the memory must be updated. If data is not in the cache, it must be fetched from memory and loaded into the cache. If $t_1$ is the time taken to reload the cache on a miss, show that the effective average access time of the memory system is given by:

$$t_{ave} = ht_c + (1-h)t_m + (1-h)t_1$$

Answer: meow? I have no idea about this problem.

3. (10 points) [9.26] A system has a level 1 cache and a level 2 cache. The hit rate of the level 1 cache is 90%, and the hit rate of the level 2 cache is 80%. An access of level 1 cache requires one cycle, an access of level 2 cache requires four cycles, and an access to main memory requires 50 cycles. What is the average access time?

**Answer:** AMAT = $1_{cyc} + 0.1(4_{cyc} + 0.2(50_{cyc})) = 2.4$ cycles.
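The arithmetic above can be checked with a minimal Python sketch. It assumes the 80% level 2 hit rate applies only to accesses that miss in level 1, and that each level is accessed only after the previous one misses. The function name `amat` and its parameter names are illustrative, not from the textbook.

```python
def amat(l1_time, l1_hit, l2_time, l2_hit, mem_time):
    """Average memory access time (in cycles) for a two-level cache.

    Assumption: l2_hit is a local hit rate, i.e. the fraction of
    level 1 misses that hit in level 2.
    """
    l1_miss = 1.0 - l1_hit
    l2_miss = 1.0 - l2_hit
    # Every access pays the L1 time; L1 misses also pay the L2 time;
    # L2 misses additionally pay the main memory time.
    return l1_time + l1_miss * (l2_time + l2_miss * mem_time)


if __name__ == "__main__":
    print(f"AMAT = {amat(1, 0.90, 4, 0.80, 50):.2f} cycles")
```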

4. (10 points) [9.35] A 64-bit processor has an 8-MB, four-way set-associative cache with 32-byte lines. How is the address arranged in terms of set, line, and offset bits?

**Answer:**

Assumption: the memory address is a full 64 bits.

The cache holds $2^{23}/2^{5} = 2^{18}$ lines, and with four ways that is $2^{16}$ lines per way. So offset = 5 bits since each line is 32 bytes, line = 16 bits, and set = 64 - 16 - 5 = 43 bits.

| Field | Set     | Line   | Offset |
|-------|---------|--------|--------|
| Bits  | 63 - 21 | 20 - 5 | 4 - 0  |
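The field widths can be derived mechanically. The following is a minimal Python sketch, assuming power-of-two sizes and using this answer's field names, where "set" is the uppermost field of the address. The helper names `field_widths` and `split_address` are hypothetical.

```python
def field_widths(cache_bytes, line_bytes, ways, address_bits):
    """Return (set_bits, line_bits, offset_bits) for the address layout.

    Assumption: cache_bytes, line_bytes and ways are all powers of two.
    """
    offset_bits = line_bytes.bit_length() - 1        # log2(line size)
    lines_per_way = cache_bytes // (line_bytes * ways)
    line_bits = lines_per_way.bit_length() - 1       # log2(lines per way)
    set_bits = address_bits - line_bits - offset_bits
    return set_bits, line_bits, offset_bits


def split_address(addr, line_bits, offset_bits):
    """Split an address into (set, line, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    line = (addr >> offset_bits) & ((1 << line_bits) - 1)
    set_field = addr >> (offset_bits + line_bits)
    return set_field, line, offset


if __name__ == "__main__":
    set_bits, line_bits, offset_bits = field_widths(8 * 2**20, 32, 4, 64)
    print(set_bits, line_bits, offset_bits)
    print(split_address(0x1234_5678_9ABC_DEF0, line_bits, offset_bits))
```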

5. (7 points) Assume a 64-bit virtual address and a 64-bit physical address. The page size is 4KB. How many entries are there in the page table? Express your answer in powers of 2.

**Answer:**

Assume the full 64-bit address space is in use (the maximum 64-bit physical memory is installed) and a single-level page table.

Number of entries = size of address space / page size = $2^{64}/(4 \times 1024) = 2^{64}/2^{12} = 2^{52}$.
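As a quick check of the exponent, here is a minimal Python sketch. It assumes a single-level page table with one entry per page and a power-of-two page size. The function name is illustrative.

```python
def page_table_entries_log2(address_bits, page_size_bytes):
    """Return n such that a single-level page table has 2**n entries.

    Assumption: page_size_bytes is a power of two.
    """
    page_offset_bits = page_size_bytes.bit_length() - 1  # log2(page size)
    return address_bits - page_offset_bits


if __name__ == "__main__":
    n = page_table_entries_log2(64, 4 * 1024)
    print(f"entries = 2^{n}")
```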

6. (10 points) [9.8] For the following systems that use a clocked microprocessor, calculate the maximum speedup ratio you could expect to see as h approaches 100%.

a. $t_{cyc} = 20$ ns, $t_m = 75$ ns, $t_c = 15$ ns

b. $t_{cyc} = 20$ ns, $t_m = 75$ ns, $t_c = 25$ ns

c. $t_{cyc} = 10$ ns, $t_m = 75$ ns, $t_c = 15$ ns

**Answer:**

- a. Since all operations take place in units of 20 ns, cache access takes 20 ns and main store access takes 80 ns. $k_{effective} = 20/80 = 0.25$, $S = 1/(h \times 0.25 + 1 - h) = 4/(4 - 3h)$; as $h \to 100\%$, $S = 4$.
- b. (same argument as above), $k_{effective} = 40/80 = 0.50$, $S = 1/(h \times 0.50 + 1 - h) = 2/(2 - h)$; as $h \to 100\%$, $S = 2$.
- c. (same logic as a), $k_{effective} = 20/80 = 0.25$, $S = 1/(h \times 0.25 + 1 - h) = 4/(4 - 3h)$; as $h \to 100\%$, $S = 4$.
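
The rounding argument used in these three cases can be sketched in Python. This assumes every access time is rounded up to a whole number of clock cycles, as in the answer. The function names are illustrative.

```python
import math


def effective_times(t_cyc, t_m, t_c):
    """Round memory and cache access times up to whole clock cycles (ns)."""
    memory_time = math.ceil(t_m / t_cyc) * t_cyc
    cache_time = math.ceil(t_c / t_cyc) * t_cyc
    return memory_time, cache_time


def speedup(h, k):
    """Speedup ratio S = 1 / (h*k + 1 - h), where k = cache time / memory time."""
    return 1.0 / (h * k + 1.0 - h)


def max_speedup(t_cyc, t_m, t_c):
    """Limit of the speedup ratio as the hit ratio h approaches 100%."""
    memory_time, cache_time = effective_times(t_cyc, t_m, t_c)
    return speedup(1.0, cache_time / memory_time)


if __name__ == "__main__":
    for label, params in (("a", (20, 75, 15)), ("b", (20, 75, 25)), ("c", (10, 75, 15))):
        print(label, max_speedup(*params))
```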

7. (10 points) [9.12] How is data in main store mapped on to each of the following?
   - a. A direct mapped cache
   - b. A fully associative cache
   - c. A set-associative cache

**Answer:**

- a. Set Line Word
- b. Key Index
- c. Set Line Word

8. (10 points) [6.6] Why is clock rate a poor metric of computer performance? What are the relative strengths and weaknesses of clock speed as a performance metric?

**Answer:** Using clock rate as a metric of performance is only meaningful when comparing two closely related models from the same generation (such as the Intel Core i7-970 vs. the Intel Core i7-980). Clock speed only says how fast the clock is oscillating, not how fast the CPU can execute its instructions or perform integer operations, so it cannot be used to compare the performance of two unlike CPUs.

9. (10 points) [6.13] A computer has a set of parameters (from the book). If the average performance of the computer (in terms of CPI) is to be increased by 20% while executing the same instruction mix, what target must be achieved for the cycles per conditional branch instruction?

**Answer:**

| Op                 | Frequency | Cycles | CPI contribution |
|--------------------|-----------|--------|------------------|
| Arith + Logic      | 65%       | 1      | 0.65             |
| Register Load      | 10%       | 5      | 0.5              |
| Register Store     | 5%        | 2      | 0.1              |
| Conditional Branch | 20%       | 8      | 1.6              |

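Current CPI = 0.65 + 0.5 + 0.1 + 1.6 = 2.85

CPI Target = 2.85 × (1 - 0.2) = 2.85 × 0.8 = 2.28

2.28 = 0.65 + 0.5 + 0.1 + 0.2 × x, where x is the new target cycles per conditional branch, so x = (2.28 - 1.25)/0.2 = 5.15.

$\lfloor x \rfloor$ = 5 cycles per conditional branch.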
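The calculation can be reproduced with a minimal Python sketch. It assumes, as the answer does, that a 20% performance increase means the average CPI falls to 80% of its current value. The dictionary keys and function names are illustrative.

```python
import math

# Instruction mix from the table: name -> (frequency, cycles)
MIX = {
    "arith_logic": (0.65, 1),
    "register_load": (0.10, 5),
    "register_store": (0.05, 2),
    "conditional_branch": (0.20, 8),
}


def average_cpi(mix):
    """Frequency-weighted average cycles per instruction."""
    return sum(freq * cycles for freq, cycles in mix.values())


def branch_cycle_target(mix, improvement):
    """Cycles per conditional branch needed to reach the target CPI.

    Assumption: the target CPI is the current CPI scaled by (1 - improvement).
    """
    target_cpi = average_cpi(mix) * (1.0 - improvement)
    branch_freq, _ = mix["conditional_branch"]
    other = sum(
        freq * cycles
        for name, (freq, cycles) in mix.items()
        if name != "conditional_branch"
    )
    return (target_cpi - other) / branch_freq


if __name__ == "__main__":
    x = branch_cycle_target(MIX, 0.20)
    print(f"x = {x:.2f}, rounded down to {math.floor(x)} cycles")
```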

10. (8 points) [6.16] In a particular system, a CPU is used for 78% of the time and a disk drive for 22% of the time. A designer has two options:
    - a. Improve the disk performance by 40% and the CPU performance by 20%
    - b. Improve disk performance by 10% and CPU performance by 80%.

    Which is a better option and why?

Answer: It is far easier to increase the performance of a CPU than the performance of storage media. There are two cases to this question: is the system limited by the CPU, or is it limited by storage media access? If the limiting factor is the speed of the CPU, the best option is b. If the limiting factor is media access, then option a would be the better (but also less feasible) choice, since it would allow the system to use the CPU more efficiently.
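A rough quantitative check of the two options is sketched below in Python. It rests on two assumptions not stated in the question: CPU time and disk time do not overlap, and "improve by x%" means the component becomes (1 + x) times faster. The function name is illustrative.

```python
def overall_speedup(cpu_frac, disk_frac, cpu_gain, disk_gain):
    """Overall speedup when the CPU and disk portions are sped up separately.

    Assumptions: the two portions do not overlap in time, and a gain of
    0.20 means that portion runs 1.20 times faster.
    """
    new_time = cpu_frac / (1.0 + cpu_gain) + disk_frac / (1.0 + disk_gain)
    return 1.0 / new_time


if __name__ == "__main__":
    option_a = overall_speedup(0.78, 0.22, cpu_gain=0.20, disk_gain=0.40)
    option_b = overall_speedup(0.78, 0.22, cpu_gain=0.80, disk_gain=0.10)
    print(f"option a: {option_a:.3f}x, option b: {option_b:.3f}x")
```

Under these assumptions option b comes out ahead, which matches the CPU-limited case, since the CPU accounts for 78% of the time.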

11. (10 points) [6.18] A system has a single core processor that costs \$150. Suppose that adding more cores to the chip costs \$10 per additional processor. (*Note:* For this system, the value of $f_s$ is 0.10.) If it is considered worthwhile adding cores until the incremental speedup ratio increases by less than 5% over the original (unmodified) performance, what is the optimum number of processors? What percentage increase in cost is required to achieve this performance?

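**Answer:**

Speedups are computed as $S = 1/(f_s + (1 - f_s)/p)$ with $f_s = 0.10$.

Case: Unmodified = original single core.

| Number of Cores (p) | Speedup (S) | Difference (over original single core) |
|---------------------|-------------|----------------------------------------|
| 1                   | 1           | +0%                                    |
| 2                   | 1.818       | +81.8%                                 |

Adding only a second core achieves this result. Percentage cost increase: \$10/\$150 = 6.7%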

Case: Unmodified = previous core.

| Number of Cores (p) | Speedup (S) | Difference (over previous number of cores) |
|---------------------|-------------|--------------------------------------------|
| 1                   | 1           | +0%                                        |
| 2                   | 1.818       | +81.8%                                     |
| 3                   | 2.5         | +37.5%                                     |
| 4                   | 3.077       | +23.08%                                    |
| 5                   | 3.5714      | +16.07%                                    |
| 6                   | 4.0         | +12%                                       |
| 7                   | 4.375       | +9.375%                                    |
| 8                   | 4.706       | +7.56%                                     |
| 9                   | 5.0         | +6.25%                                     |
| 10                  | 5.263       | +5.26%                                     |
| 11                  | 5.5         | +4.5%                                      |

The gain first drops below 5% when going from 10 to 11 cores, so the optimum is 10 cores (9 added to the original). Percentage cost increase: 9 × \$10 / \$150 = 60%
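
The "previous core" case can be tabulated with a minimal Python sketch. It assumes the speedup follows $S = 1/(f_s + (1 - f_s)/p)$, that the gain is measured relative to the speedup with one fewer core, and that cost is counted only for cores added beyond the first. The function names and the `max_cores` search limit are illustrative.

```python
def amdahl_speedup(p, f_s):
    """Speedup on p cores when a fraction f_s of the work is serial."""
    return 1.0 / (f_s + (1.0 - f_s) / p)


def optimum_cores(f_s, threshold, max_cores=1000):
    """Largest core count whose last added core still gains >= threshold.

    The gain of core p is measured relative to the speedup with p - 1 cores.
    max_cores is an arbitrary search limit (an assumption of this sketch).
    """
    for p in range(2, max_cores + 1):
        gain = amdahl_speedup(p, f_s) / amdahl_speedup(p - 1, f_s) - 1.0
        if gain < threshold:
            return p - 1
    return max_cores


def cost_increase_percent(cores, base_cost, cost_per_extra_core):
    """Percentage cost increase, counting only cores beyond the first."""
    return 100.0 * (cores - 1) * cost_per_extra_core / base_cost


if __name__ == "__main__":
    best = optimum_cores(f_s=0.10, threshold=0.05)
    print(best, cost_increase_percent(best, 150, 10))
```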