| Full Name: |
|------------|
| A-number:  |

# ECE 5720, Fall 2020

## **Take Home 3**

Due: November 3, 2020 (3:00 PM)

#### **Instructions:**

- Write your A-number on top of every sheet.
- Make sure that your exam is not missing any sheets, then write your full name on the front.
- The exam has a maximum score of 20 points. You must show your steps clearly to get any credit. Good luck!

| 1 (10):     |  |
|-------------|--|
| 2 (10):     |  |
| TOTAL (20): |  |

## Problem 1. (4+6 points):

This problem is on a machine with a 5-stage pipeline and a cycle time of 50 ns. Assume that you are executing a program in which a fraction $f$ of all instructions immediately follow a load on which they depend. Also assume that forwarding is enabled in this pipeline, so each such instruction causes a 1-cycle penalty.

**Part A.** What is the total execution time of $N$ instructions, in terms of $f$?
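One way to model Part A, sketched under stated assumptions: the pipeline starts empty, so the first instruction completes after 5 cycles and one instruction completes per cycle after that. Each load-dependent instruction adds one bubble. This gives $(N + 4 + fN) \times 50$ ns, which is about $50N(1+f)$ ns for large $N$. The function name `exec_time_ns` is hypothetical and used only for illustration.

```python
# Sketch: total execution time of n instructions on the 5-stage pipeline.
# ASSUMPTIONS (not stated in the problem): the pipeline starts empty, and the
# 1-cycle load-use stall is the only source of bubbles.

CYCLE_NS = 50.0  # cycle time from the problem statement


def exec_time_ns(n: int, f: float, cycle_ns: float = CYCLE_NS, stages: int = 5) -> float:
    """Return total time in ns for n instructions when a fraction f of them
    immediately follow a load they depend on (1 stall cycle each)."""
    fill_cycles = stages - 1        # extra cycles before the first instruction completes
    ideal_cycles = n + fill_cycles  # afterwards, one instruction completes per cycle
    stall_cycles = f * n            # one bubble per load-dependent instruction
    return (ideal_cycles + stall_cycles) * cycle_ns


if __name__ == "__main__":
    # For large n the fill term is negligible, so the time approaches 50 * n * (1 + f).
    n = 1_000_000
    print(exec_time_ns(n, 0.2), 50.0 * n * (1 + 0.2))
```

Dropping `fill_cycles` gives the common large-$N$ approximation.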

**Part B.** In a different implementation of the above pipeline, the MEMORY stage, along with its pipeline registers, needs 55 ns (compared to 50 ns for each of the other stages). There are now two options:

- add another MEMORY stage, so that there are MEM1 and MEM2 stages and the cycle time remains 50 ns; or
- increase the cycle time to 55 ns so that the MEMORY stage fits within one cycle and the number of pipeline stages remains unchanged.

For a program mix in which a fraction $f$ of instructions *immediately* follow a load on which they depend, when is the first option better than the second? Base your answer on the value of $f$.
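A sketch of how the two options can be compared. It ignores pipeline fill (large $N$) and makes one loudly stated assumption: once MEM is split into MEM1 and MEM2, load data can only be forwarded after MEM2, so an immediately dependent instruction stalls 2 cycles instead of 1. Setting $50(1+2f) = 55(1+f)$ gives the break-even point. The helper names below are hypothetical.

```python
# Sketch comparing the two Part B options, ignoring pipeline fill (large n).
# ASSUMPTION: with MEM split into MEM1/MEM2, load data is forwarded only after
# MEM2, so an instruction immediately dependent on a load stalls 2 cycles.


def option1_time_ns(n: float, f: float) -> float:
    """Six stages (MEM1, MEM2), 50 ns cycle, assumed 2-cycle load-use stall."""
    return n * (1 + 2 * f) * 50.0


def option2_time_ns(n: float, f: float) -> float:
    """Five stages, 55 ns cycle, 1-cycle load-use stall as in Part A."""
    return n * (1 + f) * 55.0


def breakeven_f(c1: float = 50.0, p1: int = 2, c2: float = 55.0, p2: int = 1) -> float:
    """Solve c1 * (1 + p1 * f) = c2 * (1 + p2 * f) for f."""
    return (c2 - c1) / (c1 * p1 - c2 * p2)


if __name__ == "__main__":
    print(f"break-even f = {breakeven_f():.4f}")
    for f in (0.05, 0.20):
        better = "option 1" if option1_time_ns(1, f) < option2_time_ns(1, f) else "option 2"
        print(f, better)
```

Under the 2-cycle assumption, option 1 is faster whenever $f < 1/9$. If the split MEM stage somehow kept a 1-cycle penalty, option 1 would win for every $f$, so the stall assumption is what drives the answer.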

## Problem 2. (10 points):

A. Assume a dual-issue, in-order, 5-stage pipelined microprocessor. This machine is identical to the one we covered in class, except that two instructions can proceed through every stage of the pipeline at once. The ideal CPI of this machine is 0.5. This pipelined architecture suffers performance penalties from four distinct sources:

- Cache misses: these affect load and store instructions (instructions needing memory access). Assume that each miss takes 250 processor cycles, so 250 bubbles are inserted for every such cache miss.
- Load-use hazards: these affect dependent instructions behind a load. Assume a 1-cycle penalty for each.
- Branch mispredictions: assume a 2-cycle penalty for each misprediction.
- Return instructions: assume a 3-cycle penalty for each return instruction.

Now consider a program composed of 10 billion instructions. Among all instructions, 20% are loads, 8% are stores, 15% are branches, and 2% are return instructions. Among load instructions, assume that a fraction $\alpha$ misses in the cache, while a fraction $\beta$ incurs the load-use penalty. Among store instructions, a fraction $Z$ misses in the cache. Furthermore, the branch-misprediction rate is $\delta$. Calculate the CPI of the machine described above while executing this program.
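A sketch of one common CPI model for this setup. It rests on several assumptions. Every penalty cycle stalls both issue slots, so it adds a full cycle rather than half a cycle. Penalties never overlap. $\delta$ is a fraction of branches. Every return instruction pays its penalty. The store-miss fraction $Z$ is named `z` in the code. With these assumptions, $\text{CPI} = 0.5 + 0.20\cdot 250\,\alpha + 0.08\cdot 250\,Z + 0.20\,\beta + 0.15\cdot 2\,\delta + 0.02\cdot 3$. This simplifies to $0.56 + 50\alpha + 20Z + 0.2\beta + 0.3\delta$. The function names and the sample rates in the usage lines are hypothetical.

```python
# Sketch of a CPI model for the dual-issue pipeline in Problem 2.
# ASSUMPTIONS: each penalty cycle stalls both issue slots (adds a full cycle),
# penalties never overlap, delta is a fraction of branches, and every return
# instruction pays its penalty. z is the store-miss fraction.


def cpi(alpha: float, beta: float, z: float, delta: float,
        ideal_cpi: float = 0.5,
        load_frac: float = 0.20, store_frac: float = 0.08,
        branch_frac: float = 0.15, return_frac: float = 0.02,
        miss_penalty: int = 250, load_use_penalty: int = 1,
        mispredict_penalty: int = 2, return_penalty: int = 3) -> float:
    """Ideal CPI plus average stall cycles per instruction."""
    stall_cycles_per_instr = (
        load_frac * alpha * miss_penalty            # load cache misses
        + store_frac * z * miss_penalty             # store cache misses
        + load_frac * beta * load_use_penalty       # load-use hazards
        + branch_frac * delta * mispredict_penalty  # branch mispredictions
        + return_frac * return_penalty              # return instructions
    )
    return ideal_cpi + stall_cycles_per_instr


def total_cycles(instr_count: float, **rates: float) -> float:
    """Cycles for the whole program: CPI times instruction count."""
    return cpi(**rates) * instr_count


if __name__ == "__main__":
    # Hypothetical rates, chosen only to exercise the model.
    print(cpi(alpha=0.1, beta=0.5, z=0.1, delta=0.1))
    print(total_cycles(10e9, alpha=0.1, beta=0.5, z=0.1, delta=0.1))
```

The instruction count does not affect the CPI itself. It only scales the total cycle count.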