#### **ECE 341 Final Exam**

|                           | Name: |  |
|---------------------------|-------|--|
| Time allowed: 110 minutes |       |  |
| Total Points: 100         |       |  |
| Points Scored:            |       |  |

### Problem No. 1 (10 points)

For each of the following statements, indicate whether the statement is **TRUE** or **FALSE**. Each correct answer carries 2 points. The answer for the last statement counts towards extra credit.

- (a) It is undesirable to use latches for building circuits that involve counters and shift registers.
- (b) Using operand forwarding in the 5-stage pipelined processor eliminates all the pipeline stalls caused by data hazards.
- (c) A write-back cache typically requires more writes to the main memory than a write-through cache.
- (d) A Translation Lookaside Buffer (TLB) acts as a cache for the page table.
- (e) Booth algorithm is most efficient when the multiplier has an alternating sequence of 1s and 0s.
- (f) **(Extra credit question)** When using port-mapped I/O, I/O devices typically need fewer address lines as compared to using memory-mapped I/O.

## Problem No. 2 (12 points)

Multiple possible answers are provided for each of the following questions. Only one answer is correct in each case. Encircle the correct answer for each question. Each correct answer carries  $\underline{4}$  points. The answer for the last question counts towards extra credit.

- (a) Consider a 16M x 128 memory built by using 512K x 16 memory chips. How many rows of memory chips are needed?
  - i. 8
  - ii. 32
  - iii. 64
  - iv. 256
- (b) A 16-bit **blocked carry-lookahead adder (CLA)** composed of four 4-bit CLA blocks is used to add two numbers X ( $x_{15}x_{14}x_{13}$ ..... $x_1x_0$ ) and Y ( $y_{15}y_{14}y_{13}$ .... $y_1y_0$ ). Under which of the following conditions is the carry-out  $c_{12}$  equal to 0?
  - i.  $c_0 = 0$ , all the CLA blocks generate a carry, but none of the CLA blocks propagate a carry
  - ii.  $c_0 = 0$ , all the CLA blocks propagate a carry, but none of the CLA blocks generates a carry
  - iii.  $c_0 = 1$ , all the CLA blocks propagate a carry, but none of the CLA blocks generates a carry

- (c) A processor uses 46-bit virtual addresses with 2 MB pages. Which bits in the virtual address correspond to the "offset" field?
  - i. The most significant 34 bits
  - ii. The most significant 25 bits
  - iii. The least significant 12 bits
  - iv. The least significant 21 bits
- (d) **(Extra credit question)** A **non-pipelined** 5-stage RISC processor running at 3 GHz is used to execute a program P<sub>1</sub>. 40% of the instructions in P<sub>1</sub> are Load or Store instructions. Assume that the processor does not use a cache. All the instruction fetch and data read/write requests are served by main memory with a fixed latency of 6 clock cycles. How many instructions are completed by the processor in 10 seconds?
  - i. 2.5 billion
  - ii. 1.5 billion
  - iii. 100 million

#### Problem No. 3 (18 points)

Consider the following sequence of instructions being processed on the **pipelined** 5-stage RISC processor discussed in class:

Add R4, R2, R3 Store R5, #100(R4) Load R6, #200(R4) Subtract R7, R5, R6

(a) **(6 points)** Identify all the data dependencies in the above instruction sequence. For each dependency, indicate the two instructions and the register that causes the dependency.

| (b) | (6 points) Assume that the pipeline does not use operand forwarding. Also assume that the only   |
|-----|--------------------------------------------------------------------------------------------------|
|     | sources of pipeline stalls are the data hazards. Draw a diagram that represents instruction flow |
|     | through the pipeline during each clock cycle. How long does it take for the instruction sequence |
|     | to complete?                                                                                     |
|     |                                                                                                  |

(c) **(6 points)** Now, assume that the pipeline <u>uses</u> operand forwarding. There are separate forwarding paths from the outputs of stage-3 and stage-4 to the input of stage-3. Draw a diagram that represents the flow of instructions through the pipeline during each clock cycle. Indicate operand forwarding by arrows.

#### **Problem No. 4 (12 points)**

(a) **(6 points)** The 5-stage RISC processor discussed in class is used to execute the following sequence of instructions, one after the other:

Instruction 1: Add R4, R2, R3
Instruction 2: Load R5, #100(R4)
Instruction 3: Call\_Register R7

The processor data path with all the control signals is shown in the following figure:



Write down the values of the following control signals for each of the three instructions:

- I. **Y\_select** during stage-4 of instruction processing
- II. **RF\_write** during stage-5 of instruction processing
- III. **C\_select** during stage-5 of instruction processing

(b) (6 points) Assume that the initial contents of registers R2, R3 and R4 are 200, 400 and 500, respectively. Also assume that the initial contents of memory addresses 500, 600 and 700 are 10, 20 and 30, respectively. Write down the contents of <u>inter-stage register RZ</u> after the completion of stage-3 for instructions 1 and 2. Also write down the contents of <u>register RS</u> after the completion of above instruction sequence.

# Problem No. 5 (15 points)

(a) **(6 points)** You are required to design a 2-bit synchronous counter by using a finite state machine. The counter has three external input signals *r*, *x*, and *y*, which dictate the operation of the counter as follows: (i) If exactly one of the three inputs is a zero, the counter counts <u>up</u>, (ii) If exactly two of the three inputs are zeroes, the counter counts <u>down</u>, (iii) If all the three inputs have identical values, the counter is set to a count of "2", irrespective of its previous state. The counter also has an output signal *z*, which is equal to 1 only if the present value of the counter is an odd number. Draw the state diagram for this state machine.

(b) **(4 points)** A 4-bit carry-looakhead adder (CLA) is used to add two numbers X = 1101 and Y = 0101 with an external carry-in  $c_0 = 1$ . Compute the value of  $c_4$  by using the carry-lookahead equations.

(c) **(5 points)** The input waveforms for a <u>positive edge-triggered</u> JK flip flop are shown in the following figure. Assume that each waveform starts at time = 0 and output Q of the flip-flop has a logic value of "0" at time = 0. What is the logic value of output Q at the following points of time: (i) 18 ns, (ii) 35 ns, (iii) 50 ns?



### Problem No. 6 (15 points)

A 5-stage pipelined RISC processor  $C_I$  running at 3 GHz is used to execute a program  $P_I$ . The instruction statistics for  $P_I$  are as follows:

Branches: 20% Loads: 20% Stores: 10%

Arithmetic Instructions: 50%

Assume that the program  $P_I$  has no data dependencies.  $C_I$  uses a dynamic branch predictor and a branch target buffer to predict the branch instructions. The computation of actual branch outcomes is carried out in the "compute" stage (stage-3). Also assume that  $C_I$  uses a cache such that 100% of the instruction fetches and data writes hit in the cache, whereas 95 % of the data reads hit in the cache. The penalty to access the main memory for a cache miss is 15 cycles.

To execute the program completely, the processor  $C_I$  needs to run 12 billion instructions. A customer requires that the program must be completed in less than 5 seconds. What is the minimum branch prediction accuracy which would satisfy this requirement?

# Problem No. 7 (18 points)

(a) (6 points) A computer system uses 36-bit memory addresses. It has a 16M-byte 16-way set-associative cache, with 128 bytes per cache block. Assume that the size of each memory word is 1 byte. Calculate the number of bits in each of the *Tag*, *Set*, and *Word* fields of the memory address.

(b) (12 points) A processor that uses 10-bit memory addresses has a small 2-way set-associative cache capable of holding four cache blocks. Each cache block consists of 16 words. The cache uses the least-recently-used (LRU) replacement algorithm. Assume that the initial tag values for each cache block are as follows:

| Set   | 0     |         | 1       |       |
|-------|-------|---------|---------|-------|
| Block | 0     | 1       | 0       | 1     |
| Tag   | 00001 | Invalid | Invalid | 00010 |

The processor reads data sequentially from the following decimal addresses: 36, 10, 80, 64, 36

For each of the above addresses, indicate whether the cache access will result in a hit or a miss.