# Department of Electrical and Computer Engineering The University of Texas at Austin

EE 460N Spring 2014 Aater Suleman, Instructor Stephen Pruett, Jay Patel, Chirag Sakhuja, TAs Exam 1 February 26, 2013

| Name:                                                                                                                                                                           |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                                                                                 |
| Problem 1 (25 points):                                                                                                                                                          |
| Problem 2 (15 points):                                                                                                                                                          |
| Problem 3 (10 points):                                                                                                                                                          |
| Problem 4 (25 points):                                                                                                                                                          |
| Problem 5 (25 points):                                                                                                                                                          |
| Total (100 points):                                                                                                                                                             |
|                                                                                                                                                                                 |
|                                                                                                                                                                                 |
| You have 75 minutes to take this exam.  Note: Please be sure that your answers to all questions (and all supporting work that is required) are contained in the space provided. |
| Note: Please be sure your name is recorded on each sheet of the exam.                                                                                                           |
| Please sign the following. I have not given nor received any unauthorized help on this exam.                                                                                    |
| Signature:                                                                                                                                                                      |

| Name:                                                                                                                                                                                                                                                                                                                                                                      |     |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| Problem 1 (25 points):                                                                                                                                                                                                                                                                                                                                                     |     |
| <b>Part a (5 points):</b> On a chip you designed, the two longest paths, P1 and P2, take 7 ns and 8 ns respectively. The reamining paths take only 2 ns each. For the same amount of engineering effort, you can <i>either</i> shrink P2 alone by ns <i>or</i> shrink both P1 and P2 by 2 ns each. Which optimization will provide higher performance? Explain your answer | 5   |
|                                                                                                                                                                                                                                                                                                                                                                            |     |
| Part b (5 points): An aggie hired by Little Computer Inc to design the memory circuit for LC-3b made an error: F connected MAR[3] to memory address pin 15 on the memory and connected MAR[15] to memory address pin 3 of                                                                                                                                                  |     |
| the memory. Will this microarchitecture work?                                                                                                                                                                                                                                                                                                                              | /11 |
| Circle one: Yes / No                                                                                                                                                                                                                                                                                                                                                       |     |
| Specify the range of memory addresses, if any, for which this microarchitecture will fail.                                                                                                                                                                                                                                                                                 |     |
|                                                                                                                                                                                                                                                                                                                                                                            |     |
|                                                                                                                                                                                                                                                                                                                                                                            |     |
| Part c (5 points): Software can choose to ignore a particular interrupt by clearing the                                                                                                                                                                                                                                                                                    | oit |
| corresponding to the interrupt.                                                                                                                                                                                                                                                                                                                                            |     |

| Name:                                                                                                                                                                                                                 |                 |                 |              |                |               |          |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|-----------------|--------------|----------------|---------------|----------|
| Problem 1 continued                                                                                                                                                                                                   |                 |                 |              |                |               |          |
| Part d (5 points): When an exception is detected, the exception service routine can execute. An aggie wants to embark on a project to optimize exception he move on to the other pending tasks? Explain your answers. | worki<br>andlir | ng at Little    | Computer Inc | finds this     | rollback wast | eful and |
|                                                                                                                                                                                                                       |                 |                 |              |                |               |          |
|                                                                                                                                                                                                                       |                 |                 |              |                |               |          |
|                                                                                                                                                                                                                       |                 |                 |              |                |               |          |
|                                                                                                                                                                                                                       |                 |                 |              |                |               |          |
|                                                                                                                                                                                                                       |                 |                 |              |                |               |          |
|                                                                                                                                                                                                                       |                 | ĺ               |              |                |               |          |
| Part e (5 points): Although a                                                                                                                                                                                         | cache is pro    | one to cache co | onsistency i | ssues, such is | sues can      |          |
| be prevented if we can ensure that the sum of the numb                                                                                                                                                                | oer of          |                 |              | bits and       |               |          |
| bits is less than or equal to the number of                                                                                                                                                                           |                 |                 | bits.        | _              |               |          |

| Name:                                                                                                                                                               |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Problem 2 (15 points):                                                                                                                                              |
| Answer parts (a-d) for a PIPT, write-back cache. Assume:                                                                                                            |
| • Physcial address is 16-bits.                                                                                                                                      |
| • The tag store entry is 1 byte.                                                                                                                                    |
| • There are no unused bits in the tag store entry.                                                                                                                  |
| • Cache block size is 8 bytes.                                                                                                                                      |
| • The cache uses a tree replacement policy that requires 7 bits.  Hint: In the tree replacement policy, the replacement bits are not a part of the tag store entry. |
| Part a (2 points): What is the cache's associativity?                                                                                                               |
|                                                                                                                                                                     |
|                                                                                                                                                                     |
| Part b (3 points): How many bits of the physical address are used as the index?                                                                                     |
|                                                                                                                                                                     |
| Part c (5 points): How big is the data store? Answer in bytes.                                                                                                      |
|                                                                                                                                                                     |
| Part d (5 points): How big is the tag store? Answer in bits.                                                                                                        |
|                                                                                                                                                                     |
|                                                                                                                                                                     |

| Name: |
|-------|
|-------|

## Problem 3 (10 points):

We show two slightly different memory systems below. Please refer to them when answering the questions on the next page.

Note: The CE signal indicates whether a chip should be enabled for an access or not.



Figure 1: Memory system M1



Figure 2: Memory system M2

| Name:                                                                                                                           |
|---------------------------------------------------------------------------------------------------------------------------------|
| Problem 3 continued:                                                                                                            |
| Assume:                                                                                                                         |
| • The logic blocks contain all the necessary gates and latches to complete the memory accesses.                                 |
| • The logic blocks are implemented to maximize performance.                                                                     |
| • Each memory chip takes 100 cycles to access.                                                                                  |
| • The system only performs reads (and no write operations).                                                                     |
| Part a (5 points): How many cycles will it take to execute the data load (not instruction fetch) of LDW R0, x3000 c M1 and M2?  |
| Cycles with M1:                                                                                                                 |
| Explain in less than 20 words.                                                                                                  |
|                                                                                                                                 |
| Cycles with M2:                                                                                                                 |
| Explain in less than 20 words.                                                                                                  |
|                                                                                                                                 |
| Part b (5 points): How many cycles will it take to execute the data load (not instruction fetch) of LDW R0, x3001 of M1 and M2? |
| Cycles with M1:                                                                                                                 |
| Explain in less than 20 words.                                                                                                  |
|                                                                                                                                 |
| Cycles with M2:  Explain in less than 20 words.                                                                                 |
| Explain in less than 20 words.                                                                                                  |

#### Problem 4 (25 points):

We have added an 8-entry, 2-way write-through PIPT cache and an x86 style 2-level virtual-to-physical translation scheme to the LC-3b. Each virtual address is 16 bits. The most significant 5-bits, VA[15:11], are used to access the system page table, the next 5 bits, VA[10:6], are used to access the process page table, and least significant 6-bits, VA[5:0], are the the byte in page bits. The PTE only contains a valid bit as its most significant bit and the PFN as its least significant bits. All remaining bits are reserved and always set to zeros.

| VA  |   | 5       |  | 5 | I |   | T   | Τ | 6 | Τ |  |
|-----|---|---------|--|---|---|---|-----|---|---|---|--|
|     |   |         |  |   |   |   |     |   |   |   |  |
| PTE | V | 0 0 0 . |  |   | 0 | ] | PFN |   |   |   |  |

#### **Notes:**

- 1. Physical address is 10 bits.
- 2. The cache implements a perfect LRU replacement policy.
- 3. The cache block size is 4 bytes.
- 4. Instructions and data share the same cache.
- 5. The cache is initially empty.
- 6. The memory accesses to system page table, the user page table, and user pages are **all** cached.
- 7. No page faults happen during the execution of this instruction.
- 8. Only for the sake of this problem, assume that the LC-3b is **big endian** (most significant byte is in least significant address) and does not permit unaligned accesses.

The PC is loaded with address x3184 and one instruction is run to completion. The figure below shows the contents of the cache after the end of this instruction. **Your job:** Use this data to answer the questions on the next next page.

| LRU | V0 | Tag0   | V1 | Tag1   |
|-----|----|--------|----|--------|
| 0   | 0  | 101100 | 0  | 010010 |
| 1   | 1  | 011100 | 0  | 011100 |
| 0   | 0  | 011000 | 0  | 110110 |
| 0   | 1  | 111100 | 1  | 110000 |

|     | Da  | ata0 |     |     | Dat | ta1 |     |
|-----|-----|------|-----|-----|-----|-----|-----|
| x80 | x70 | xAE  | xFD | x80 | x06 | x80 | x77 |
| x60 | x43 | x59  | xAE | xFD | xE9 | x46 | x57 |
| x53 | x74 | x65  | x70 | x68 | x65 | х6Е | x21 |
| x80 | x0C | x00  | x0F | x80 | x07 | x80 | x06 |

#### PROBLEM IS CONTINUED ON THE NEXT PAGE!!!

<sup>\*</sup>Cache after execution of first instruction

<sup>\*</sup>The order of bytes in the data store is byte0 (00), byte1 (01), byte2 (10), byte3 (11).

<sup>\*</sup>V0, Tag0, and Data0 correspond to Way 0 and V0, Tag1, and Data1 correspond to Way 1.

<sup>\*</sup>If the LRU bit is 0, the data in Way 0 will be evicted next.

| Name:                                           |                                   |                        |                          |                |
|-------------------------------------------------|-----------------------------------|------------------------|--------------------------|----------------|
| Problem 4 continued                             |                                   |                        |                          |                |
| Part a (2 points): How bi                       | g is each page (in bytes)?        |                        |                          |                |
|                                                 |                                   | Answer:                |                          |                |
| Part b (3 points): What is                      | s the size of a PTE (in bytes)?   |                        |                          |                |
|                                                 |                                   | Answer:                |                          |                |
| Part c (5 points): What is                      | s the value of SBR?               |                        |                          |                |
|                                                 |                                   | Answer:                |                          |                |
| Part d (10 points): Fill in be left blank.      | the table in the order of physica | al memory accesses. N  | lote that some rows in t | his table may  |
| Physical Address                                | Data                              | Explanation of         | data                     |                |
|                                                 |                                   |                        |                          |                |
|                                                 |                                   |                        |                          |                |
|                                                 |                                   |                        |                          |                |
|                                                 |                                   |                        |                          |                |
|                                                 |                                   |                        |                          |                |
|                                                 |                                   |                        |                          |                |
|                                                 |                                   |                        |                          |                |
|                                                 |                                   |                        |                          |                |
|                                                 |                                   |                        |                          |                |
|                                                 |                                   |                        |                          |                |
|                                                 |                                   |                        |                          |                |
|                                                 |                                   |                        |                          |                |
|                                                 |                                   |                        |                          |                |
| Part e (5 points): What i will get full credit) | s the value of R0 and R1? (The    | re are two possible an | swers to this question a | and either one |
|                                                 |                                   | R0:                    |                          |                |
|                                                 |                                   | R1:                    |                          |                |

| Name: |  |
|-------|--|
|       |  |

## Problem 5 (25 points):

We are adding a new instruction, ADD32, to the LC-3b ISA. ADD32 adds two 32-bit values. One of the operands is loaded from memory and the other operand is obtained by concatenating two registers. The final answer is stored to memory. ADD32 works as follows:

#### **Assembler Format**

ADD32 DR, SR1, SR2H, SR2L

#### **Notation**

SR1: Pointer to a source; one of R0..R7 which species the *address* of the memory location from which the first 32 bit operand is obtained.

SR2H: Source register; one of R0..R7 which specifies the register from which the high 16-bits of the second operand are obtained SR2L: Source register; one of R0..R7 which specifies the register from which the low 16-bits of the second operand are obtained DR: Pointer to destination; one of R0..R7 which species the *address* of the memory location where the result of the 32-bit add must be written to.

## **Encodings**



## **Operation**

$$\label{eq:mem} \begin{split} \text{MEM}[DR+2]@\text{MEM}[DR] &= \text{MEM}[SR1+2]@\text{MEM}[SR1] + SR2H@SR2L;\\ \text{setCC();} \end{split}$$

### PROBLEM IS CONTINUED ON THE NEXT PAGE!!!

<sup>\*</sup>The @ symbolizes bit concatenation.

## **Problem 5 continued:**

Your job Implement ADD32 by modifying the datapath and the state diagram.

**Part a (12 points):** Complete the state diagram to implement ADD32 by filling in the shaded bubbles and state assignment for the last state.

**HINT:** You may find it helpful to look at part b while solving part a.



Figure 3: State diagram for the LC-3b

| Name:  |  |  |
|--------|--|--|
| ranic. |  |  |

## **Problem 5 continued:**

**Part b (13 points):** Below is the relevant portion of the LC-3b datapath. Two new blocks X and Y have been added. Additionally, the ALU has been modified to output a carry bit. The carry bit is saved in the "C" condition code register. **Note:** the ALU is still only 16-bits wide.



# **Problem 5, Part b continued:**

Complete the logic for blocks X and Y below by filling in *only* the three shaded boxes.







Figure 4: LC-3b Instruction Encodings

Table 1: Data path control signals

|                                                                                      | Table 1: Data path control signals                                                                           |                                                                                                    |  |  |  |
|--------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|--|--|--|
| Signal Name                                                                          | Signal Values                                                                                                |                                                                                                    |  |  |  |
| LD.MAR/1:<br>LD.MDR/1:<br>LD.IR/1:<br>LD.BEN/1:<br>LD.REG/1:<br>LD.CC/1:<br>LD.PC/1: | NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD( | 1)<br>1)<br>1)<br>1)<br>1)<br>1)                                                                   |  |  |  |
| GatePC/1:<br>GateMDR/1:<br>GateALU/1:<br>GateMARMUX/1:<br>GateSHF/1:                 | NO(0), YES(1)<br>NO(0), YES(1)<br>NO(0), YES(1)<br>NO(0), YES(1)<br>NO(0), YES(1)                            |                                                                                                    |  |  |  |
| PCMUX/2:                                                                             | PC+2(0)<br>BUS(1)<br>ADDER(2)                                                                                | ;select pc+2<br>;select value from bus<br>;select output of address adder                          |  |  |  |
| DRMUX/1:                                                                             | 11.9(0)<br>R7(1)                                                                                             | ;destination IR[11:9]<br>;destination R7                                                           |  |  |  |
| SR1MUX/1:                                                                            | 11.9(0)<br>8.6(1)                                                                                            | ;source IR[11:9]<br>;source IR[8:6]                                                                |  |  |  |
| ADDR1MUX/1:                                                                          | PC(0), BaseR(1                                                                                               | )                                                                                                  |  |  |  |
| ADDR2MUX/2:                                                                          | ZERO(0)<br>offset6(1)<br>PCoffset9(2)<br>PCoffset11(3)                                                       | ;select the value zero<br>;select SEXT[IR[5:0]]<br>;select SEXT[IR[8:0]]<br>;select SEXT[IR[10:0]] |  |  |  |
| MARMUX/1:                                                                            | 7.0(0)<br>ADDER(1)                                                                                           | ;select LSHF(ZEXT[IR[7:0]],1) ;select output of address adder                                      |  |  |  |
| ALUK/2:                                                                              | ADD(0), AND(                                                                                                 | 1), XOR(2), PASSA(3)                                                                               |  |  |  |
| MIO.EN/1:<br>R.W/1:<br>DATA.SIZE/1:<br>LSHF1/1:                                      | NO(0), YES(1)<br>RD(0), WR(1)<br>BYTE(0), WOR<br>NO(0), YES(1)                                               | RD(1)                                                                                              |  |  |  |

Table 2: Microsequencer control signals

| Signal Name     | Signal Values                                                                    |                                                                |  | Signal Values |  |
|-----------------|----------------------------------------------------------------------------------|----------------------------------------------------------------|--|---------------|--|
| J/6:<br>COND/2: | COND <sub>0</sub><br>COND <sub>1</sub><br>COND <sub>2</sub><br>COND <sub>3</sub> | ;Unconditional<br>;Memory Ready<br>;Branch<br>;Addressing Mode |  |               |  |
| IRD/1:          | NO. YES                                                                          |                                                                |  |               |  |



Figure 5: A state machine for the LC-3b



Figure 6: The LC-3b data path



Figure 7: The microsequencer of the LC-3b base machine



Figure 8: The microsequencer of the LC-3b base machine