**Question #1** (4 points)

Let’s consider the Hardware-Based Speculation.

You are requested to:

1. Describe what the ReOrder Buffer (ROB) is;
2. Describe in detail each of the fields that make up each of the ROB entries;
3. Give the definition of precise exceptions;
4. Discuss how precise exceptions are handled in presence of the ROB.

Write your answer here.

1-Reorder Buffer allows in order commitment while execution in done out of order. It is FIFo and the oldest instruction is the first to be committed.

2-

status

3-precise exception are the exceptions that can be resumed after the branch or the instruction after the exception can be done as its normal order

4-Functional units providing results write the results in the CDB which can goes everywhere except the reload buffer and if the exception is not handled completely, then the content of the ROB is flushed and thus the wrong values are not considered any more

**Question 2** (4 points)

Let's consider a MIPS architecture using a *Branch History Table* (BHT) composed of *8 1-bit entries*. Let's assume that this architecture executes the following code: it counts the number of values in the vector *vec* that are equal to or multiples of 2 and 3 and then writes the results into the variables *mul2* and *mul3*, respectively. The calculation of the modulus, an operation used here to understand whether a value is a multiple of another, is done using the Barrett reduction, which can be described as follows:

*a mod n = a – [a/n]n*

In the code the module is calculated twice: the first time to check if the previously loaded value is a multiple of 2, the second time to check if it is a multiple of 3. In general, *a* is the value loaded from *vec* and *n* is first 2 and then 3. **Please note that some numbers may be multiples of both values.**

For every instruction, the hexadecimal address of the memory cell storing the instruction is reported.

Assuming that before executing the code fragment the BHT is full of null values (corresponding to the prediction Not Taken), you are asked to compute:

* The number of mispredicted branches during the execution of the code.
* The BHT content when the execution finishes (using the third table).

For all computations, it is suggested to use the two tables on the next page. Write in the highlighted cells whether the result of the prediction of the current branch and the real behavior (result) of the software is *Taken* (T) or *Not* *Taken* (NT). Then, report the results on the third table.

*Hint: To calculate the BHT entry corresponding to each branch instruction, remember that you should exclude the last two bits from the instruction address as they are always equal to 0.*

|  |  |  |  |
| --- | --- | --- | --- |
| ADDR | CODE | | |
|  | *.data* |  |  |
|  | vec: | .byte 15, 12, 9, 8, 6, 4, 3, 2 | # input vector |
|  | mul2: | .space 1 | # number of values equal to or multiples of 2 |
|  | mul3: | .space 1 | # number of values equal to or multiples of 3 |
|  | *.text* |  |  |
| 0x0000 |  | daddui r1, r0, 2 | # initialize the first value of n (2) |
| 0x0004 |  | daddui r2, r0, 3 | # initialize the second value of n (3) |
| 0x0008 |  | daddui r3, r0, 8 | # initialize the value used as a comparator |
| 0x000c |  | daddui r4, r0, 0 | # initialize the pointer |
| 0x0010 |  | daddui r5, r0, 0 | # initialize the counter of values equal to or multiples of 2 |
| 0x0014 |  | daddui r6, r0, 0 | # initialize the counter of values equal to or multiples of 3 |
| 0x0018 | cyc: | lbu r7, vec(r4) | # load an element from *vec* |
| 0x001c |  | ddiv r8, r7, r1 | # Barrett reduction (modulus calculation), *n = 2* |
| 0x0020 |  | dmulu r8, r8, r1 | # Barrett reduction (modulus calculation), *n = 2* |
| 0x0024 |  | dsubu r8, r7, r8 | # Barrett reduction (modulus calculation), *n = 2* |
| 0x0028 |  | bnez r8, m3 | # jump to *m3* if the modulus is not equal to zero |
| 0x002c |  | daddui r5, r5, 1 | # increment the counter of values equal to or multiples of 2 |
| 0x0030 | m3: | ddiv r8, r7, r2 | # Barrett reduction (modulus calculation), *n = 3* |
| 0x0034 |  | dmulu r8, r8, r2 | # Barrett reduction (modulus calculation), *n = 3* |
| 0x0038 |  | dsubu r8, r7, r8 | # Barrett reduction (modulus calculation), *n = 3* |
| 0x003c |  | bnez r8, nxt | # jump to *nxt* if the modulus is not equal to zero |
| 0x0040 |  | daddui r6, r6, 1 | # increment the counter of values equal to or multiples of 3 |
| 0x0044 | nxt: | daddui r4, r4, 1 | # increment the pointer |
| 0x0048 |  | bne r3, r4, cyc | # condition for exiting the cycle |
| 0x004c | term: | sb r5, mul2(r0) | # store the number of values equal to or multiples of 2 |
| 0x0050 |  | sb r6, mul3(r0) | # store the number of values equal to or multiples of 3 |
| 0x0054 |  | halt | # termination of the program |

|  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Address | Code | | BHT | Iteration #1 | | Iteration #2 | | Iteration #3 | | Iteration #4 | | Iteration #5 | |
|  |  | | entry # | pred | result | pred | result | pred | result | pred | result | pred | result |
| 0x0000 |  | daddui r1, r0, 2 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0004 |  | daddui r2, r0, 3 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0008 |  | daddui r3, r0, 8 |  |  |  |  |  |  |  |  |  |  |  |
| 0x000c |  | daddui r4, r0, 0 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0010 |  | daddui r5, r0, 0 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0014 |  | daddui r6, r0, 0 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0018 | cyc: | lbu r7, vec(r4) |  |  |  |  |  |  |  |  |  |  |  |
| 0x001c |  | ddiv r8, r7, r1 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0020 |  | dmulu r8, r8, r1 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0024 |  | dsubu r8, r7, r8 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0028 |  | bnez r8, m3 | 2 | NT | T | T | NT | NT | T | T | NT | NT | NT |
| 0x002c |  | daddui r5, r5, 1 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0030 | m3: | ddiv r8, r7, r2 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0034 |  | dmulu r8, r8, r2 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0038 |  | dsubu r8, r7, r8 |  |  |  |  |  |  |  |  |  |  |  |
| 0x003c |  | bnez r8, nxt | 7 | NT | NT | NT | NT | NT | NT | NT | T | T | NT |
| 0x0040 |  | daddui r6, r6, 1 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0044 | nxt: | daddui r4, r4, 1 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0048 |  | bne r3, r4, cyc | 2 | NT | T | T | T | T | T | T | T | T | T |
| 0x004c | term: | sb r5, mul2(r0) |  |  |  |  |  |  |  |  |  |  |  |
| 0x0050 |  | sb r6, mul3(r0) |  |  |  |  |  |  |  |  |  |  |  |
| 0x0054 |  | halt |  |  |  |  |  |  |  |  |  |  |  |

|  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Address | Code | | BHT | Iteration #6 | | Iteration #7 | | Iteration #8 | | Iteration #9 | | Iteration #10 | |
|  |  | | entry # | pred | result | pred | result | pred | result | pred | result | pred | result |
| 0x0000 |  | daddui r1, r0, 2 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0004 |  | daddui r2, r0, 3 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0008 |  | daddui r3, r0, 8 |  |  |  |  |  |  |  |  |  |  |  |
| 0x000c |  | daddui r4, r0, 0 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0010 |  | daddui r5, r0, 0 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0014 |  | daddui r6, r0, 0 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0018 | cyc: | lbu r7, vec(r4) |  |  |  |  |  |  |  |  |  |  |  |
| 0x001c |  | ddiv r8, r7, r1 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0020 |  | dmulu r8, r8, r1 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0024 |  | dsubu r8, r7, r8 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0028 |  | bnez r8, m3 | 2 | NT | NT | NT | T | T | NT |  |  |  |  |
| 0x002c |  | daddui r5, r5, 1 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0030 | m3: | ddiv r8, r7, r2 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0034 |  | dmulu r8, r8, r2 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0038 |  | dsubu r8, r7, r8 |  |  |  |  |  |  |  |  |  |  |  |
| 0x003c |  | bnez r8, nxt | 7 | NT | T | T | NT | NT | T |  |  |  |  |
| 0x0040 |  | daddui r6, r6, 1 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0044 | nxt: | daddui r4, r4, 1 |  |  |  |  |  |  |  |  |  |  |  |
| 0x0048 |  | bne r3, r4, cyc | 2 | T | T | T | T | T | NT |  |  |  |  |
| 0x004c | term: | sb r5, mul2(r0) |  |  |  |  |  |  |  |  |  |  |  |
| 0x0050 |  | sb r6, mul3(r0) |  |  |  |  |  |  |  |  |  |  |  |
| 0x0054 |  | halt |  |  |  |  |  |  |  |  |  |  |  |

The number of mispredicted branches during the execution of the code is: \_13\_\_\_\_\_

**BHT - Final content**

|  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- |
| Entry 0 | 0 |  | Entry 4 | 0 |  |
| Entry 1 | 0 |  | Entry 5 | 0 |  |
| Entry 2 | 0 |  | Entry 6 | 0 |  |
| Entry 3 | 0 |  | Entry 7 | 1 |  |

**Question 3** (6 points)

A PREMISE: this 8086 exercise is focused on “brute force” approach, i.e., processing a (very) large number of cases/operations. Please focus on the “brute force” approach and **DO NOT TRY to find a FULLY OPTIMIZED algorithm**. Minor optimizations will be acceptable (but will not lead to increases in the score), but please keep in mind that a non-brute-force approach (WHICH IS NOT REQUESTED) could result in a much more difficult algorithm and implementation, i.e., well above what is requested in today’s exam.

Let us assume that to have two **UNSIGNED** values A and B, both greater than zero, stored in DL and BL, respectively. By using a brute force approach, please compute the value of the **product A\*B** and have it finally stored in **AX** (note: it is guaranteed that there is no overflow as AX is on 16 bits). **Please also return in BH the value 0 if AX is not greater than 255, and 1 otherwise.**

**Kindly note that, for this exercise, all registers are available, but remember not to overwrite the inputs!**

Hint: the brute force approach is based on the basic definition of multiplication, i.e.,

**the value of A contributing to the final result B times**

Please **clearly & concisely** explain your algorithm. Please assume that A and B are already known inside your program. Failure to provide an explanation and relevant comments to the most important instructions/parts of the program, will result in severe score penalties.

**Write your code in a file saved in the 8086 folder.**

Click on the following link to open a web page with the 8086 instruction set:

<http://www.jegerlehner.ch/intel/IntelCodeTable.pdf>

There’s 4 possible options:

1-if the first number =even, second numbers=odd => then shift and add with the first number

2-if the first number=even, second number is=even =>shift as much as the number of dividened of the number to 2

3-if the first number=odd, second number=even=>shift as much as the number of dividened by division of the number to 2

4-if the first number=odd, second number=odd=>shift and add with the first number

**Question 4** (8 points)

A maze is saved in a NUM\_ROW \* NUM\_COL matrix of bytes. The character ‘X’ indicates a wall, and the space indicates a passage. It is guaranteed that all the borders of the maze are walls. The entrance is represented with ‘e’ and the exit with the integer number 0.

Example: a maze matrix with NUM\_ROW = 9 and NUM\_COL = 8 is shown below:

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| X | X | X | X | X | X | X | X |
| X | 0 | e | e | e | e | e | X |
| X | e | X | X | X | X | e | X |
| X | e | e | e | e | e | e | X |
| X | X |  | X | e | X | X | X |
| X |  |  | X | e | e | e | X |
| X |  | X | X |  | X | X | X |
| X |  |  |  |  |  |  | X |
| X | X | X | X | X | X | X | X |

Write the shortestPath subroutine in ARM assembly language to find the shortest path from the entrance to the exit. The subroutine receives the following parameters (in the order indicated):

* number of rows
* number of columns
* address of the *maze* matrix

The subroutine scans all the cells of the maze: if a passage is connected to the exit (i.e, it is at its right, left, top or bottom), then the content of the cell becomes the integer value 1. The loop is repeated: this time all neighbors of a cell with the value 1 assume the value 2. The loop is repeated again: it stops when the entrance becomes connected to a cell with an integer value.

**Example:** Iteration 1 Iteration 2 Iteration 3

|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| X | X | X | X | X | X | X | X |  |  |  |  | X | X | X | X | X | X | X | X |  |  |  |  | X | X | X | X | X | X | X | X |
| X | 0 | 1 |  |  |  |  | X |  |  |  |  | X | 0 | 1 | 2 |  |  |  | X |  |  |  |  | X | 0 | 1 | 2 | 3 |  |  | X |
| X | 1 | X | X | X | X |  | X |  |  |  |  | X | 1 | X | X | X | X |  | X |  |  |  |  | X | 1 | X | X | X | X |  | X |
| X |  |  |  |  |  |  | X |  |  |  |  | X | 2 |  |  |  |  |  | X |  |  |  |  | X | 2 | 3 |  |  |  |  | X |
| X | X |  | X |  | X | X | X |  |  |  |  | X | X |  | X |  | X | X | X |  |  |  |  | X | X |  | X |  | X | X | X |
| X |  |  | X |  |  | e | X |  |  |  |  | X |  |  | X |  |  | e | X |  |  |  |  | X |  |  | X |  |  | e | X |
| X |  | X | X |  | X | X | X |  |  |  |  | X |  | X | X |  | X | X | X |  |  |  |  | X |  | X | X |  | X | X | X |
| X |  |  |  |  |  |  | X |  |  |  |  | X |  |  |  |  |  |  | X |  |  |  |  | X |  |  |  |  |  |  | X |
| X | X | X | X | X | X | X | X |  |  |  |  | X | X | X | X | X | X | X | X |  |  |  |  | X | X | X | X | X | X | X | X |

Iteration 4 Iteration 5 Iteration 8

|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| X | X | X | X | X | X | X | X |  |  |  |  | X | X | X | X | X | X | X | X |  |  |  |  | X | X | X | X | X | X | X | X |
| X | 0 | 1 | 2 | 3 | 4 |  | X |  |  |  |  | X | 0 | 1 | 2 | 3 | 4 | 5 | X |  |  |  |  | X | 0 | 1 | 2 | 3 | 4 | 5 | X |
| X | 1 | X | X | X | X |  | X |  |  |  |  | X | 1 | X | X | X | X |  | X |  |  |  |  | X | 1 | X | X | X | X | 6 | X |
| X | 2 | 3 | 4 |  |  |  | X |  |  |  |  | X | 2 | 3 | 4 | 5 |  |  | X |  |  |  |  | X | 2 | 3 | 4 | 5 | 6 | 7 | X |
| X | X | 4 | X |  | X | X | X |  |  |  |  | X | X | 4 | X |  | X | X | X | … | | | | X | X | 4 | X | 6 | X | X | X |
| X |  |  | X |  |  | e | X |  |  |  |  | X |  | 5 | X |  |  | e | X |  |  |  |  | X | 6 | 5 | X | 7 | 8 | e | X |
| X |  | X | X |  | X | X | X |  |  |  |  | X |  | X | X |  | X | X | X |  |  |  |  | X | 7 | X | X | 8 | X | X | X |
| X |  |  |  |  |  |  | X |  |  |  |  | X |  |  |  |  |  |  | X |  |  |  |  | X | 8 |  |  |  |  |  | X |
| X | X | X | X | X | X | X | X |  |  |  |  | X | X | X | X | X | X | X | X |  |  |  |  | X | X | X | X | X | X | X | X |

The procedure returns the length of the shortest path, i.e., the number of iterations executed. In the example, the return value is 8.

Important notes:

1. Create a new project with Keil inside the “ARM” directory and write your code there. The “ARM” directory contains some subdirectories that you can add to your project if you need them. **It also contains the startup\_LPC17xx.s file with the Reset\_Handler procedure and the declaration of the memory areas.**
2. The assembly subroutine must comply with the ARM Architecture Procedure Call Standard (AAPCS) standard (in terms of parameter passing, returned value, callee-saved registers).
3. Click on the following links to open web pages with the ARM instruction set

https://developer.arm.com/documentation/dui0473/m/preface

<https://developer.arm.com/documentation/ddi0337/e/Introduction/Instruction-set-summary?lang=en>

**Question 5** (5 points)

The following matrix of chars is declared in C:

char maze[9][8] = {"XXXXXXXX",

{'X', 0, ' ', ' ', ' ', ' ', ' ', 'X'},

"X XXXX X",

"X X",

"XX X XXX",

"X X eX",

"X XX XXX",

"X X",

"XXXXXXXX"};

Pass this matrix to the shortestPath subroutine to find the shortest path in the maze.

Then, show the sequence of movements to arrive to the exit using the leds. In details, you start from the entrance (represented with ‘e’) and look for the cell *x* containing the value *k* returned by the shortestPath subroutine:

* if *x* is at the right of the entrance, switch on led 4
* if *x* is at the bottom of the entrance, switch on led 5
* if *x* is at the left of the entrance, switch on led 6
* if *x* is at the top of the entrance, switch on led 7

Then, you look for the cell containing the value *k* – 1 and switch on the corresponding led. The loop continues as long as *k* >= 0.

In the example, the sequence of leds are: led 6, led 6, led 7, led 7, led 6, led 6, led 6, led 7, led 7.

Each led remains lit for 0.5 s. Then, you have to wait 0.5 s before switching on the next led. (Suggestion: you need a timer generate interrupts every 0.5 s.)

**Notes about the leds.** The pins of leds 4-11 are P2.7 – P2.0. The function LED\_init (included in the provided template) initializes the pins as GPIO Port 2.0 (LPC\_GPIO2). You have to switch on the required leds by means of the following accessible registers:

* FIODIR: Fast GPIO Port Direction control register. This register individually controls the direction of each port pin.
* FIOMASK: Fast Mask register for port. Writes, sets, clears, and reads to port (done via writes to FIOPIN, FIOSET, and FIOCLR, and reads of FIOPIN) alter or return only the bits enabled by zeros in this register.
* FIOPIN: Fast Port Pin value register using FIOMASK. The current state of digital port pins can be read from this register. The value read is masked by ANDing with inverted FIOMASK. Writing to this register places corresponding values in all bits enabled by zeros in FIOMASK.
* FIOSET: Fast Port Output Set register using FIOMASK. This register controls the state of output pins. Writing 1s produces highs at the corresponding port pins. Writing 0s has no effect. Reading this register returns the current contents of the port output register. Only bits enabled by 0 in FIOMASK can be altered.
* FIOCLR: Fast Port Output Clear register using FIOMASK. This register controls the state of output pins. Writing 1s produces lows at the corresponding port pins. Writing 0s has no effect. Only bits enabled by 0 in FIOMASK can be altered.