# Architetture dei Sistemi di Elaborazione 02GOLOV [M-Z] Laboratory 1

Expected delivery of lab\_01.zip including:

- program\_1.s
- lab\_01.pdf (fill and export this file to pdf)

Configure the winMIPS64 processor architecture with the following *Base Configuration*:

- Integer ALU: 1 clock cycle
- Data memory: 1 clock cycle
- Branch delay slot: 1 clock cycle
- Code address bus: 12
- Data address bus: 12
- Pipelined FP arithmetic unit (latency): 6 stages
- Pipelined FP multiplier unit (latency): 8 stages
- FP divider unit (latency): not pipelined unit, 28 clock cycles
- Forwarding optimization is disabled
- Branch prediction is disabled
- Branch delay slot optimization is disabled.

#### Use the Configure menu:

- Remove the check marks from the Enable options that are active.
- Open the Architecture menu and modify the default architectural parameters (where needed).
- Verify in the Pipeline window that the configuration is effective.



1) Exercise your assembly skills and learn by example about pipeline optimizations. Write and execute an assembly program for the *MIPS64* architecture called **program\_1.s** (to be delivered).

The program must:

1. Given two arrays a and b, compute c[i] = 4 \* (a[i] + b[i]). Each array contains 30 **16-bit integer numbers**.
2. Search for **both** the maximum and the minimum in the array c. The program saves the obtained values in two variables allocated in memory, called max and min respectively.
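Before stepping through the assembly version, it can help to have a reference result to compare against the Data window. The following is a minimal Python sketch of the two requirements above; the function names and the sample arrays are hypothetical and are not part of the assignment. It uses Python's unbounded integers, so it also checks whether each c[i] would still fit in 16 bits, which matters only if c is stored as 16-bit values.

```python
def fits_int16(value):
    """Return True if value is representable as a signed 16-bit integer."""
    return -32768 <= value <= 32767


def reference_model(a, b):
    """Return (c, maximum, minimum) with c[i] = 4 * (a[i] + b[i])."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    c = [4 * (x + y) for x, y in zip(a, b)]
    return c, max(c), min(c)


# Hypothetical sample data: 30 values per array, all within the 16-bit range.
a = [i for i in range(30)]           # 0, 1, ..., 29
b = [2 * i - 20 for i in range(30)]  # -20, -18, ..., 38

c, c_max, c_min = reference_model(a, b)
all_fit = all(fits_int16(v) for v in c)

print("max =", c_max, "min =", c_min, "all c[i] fit in 16 bits:", all_fit)
```

With these sample arrays c[i] = 12\*i - 80, so the expected contents of max and min are 268 and -80. Using the same values in the .data section of program\_1.s gives a quick way to confirm the final memory contents.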

Identify and use the main components of the simulator:

- a. Run the WinMIPS simulator
  - Launch the graphical interface ...\winMIPS64\winmips64.exe
- b. Assemble and check your program:
  - Load the program from the **File→Open** menu (*CTRL-O*). In case of errors, you may use the following command in the command line to assemble the program and check the errors:
  - ...\winMIPS64\asm program\_1.s
- c. Run your program step by step (F7), identifying the whole processor behavior in the six simulator windows: Pipeline, Code, Data, Register, Cycles and Statistics
- d. Enable one at a time the optimization features that were initially disabled and collect statistics to fill the following table (fill all required data in the table before exporting this file to pdf format to be delivered).

Table 1: **Program performance for different processor configurations** (number of clock cycles)

| Program   | No optimization | Forwarding | Branch Target Buffer | Delay Slot |
|-----------|-----------------|------------|----------------------|------------|
| program_1 | 677             | 467        | 619                  | 677        |

#### 2) Perform execution time measurements.

Find the following benchmark programs in the winMIPS64 folder:

- a. isort.s
- b. mult.s
- c. series.s
- d. program\_1.s (your program)

Starting from the base configuration with no optimizations, compute by simulation the number of cycles required to execute these programs. In this initial scenario, assume that every program has the same weight (25%) and that the processor frequency is 15 MHz.
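One way to write the weighted execution time asked for here, as a sketch: the symbols $N_i$ (cycle count of program $i$ reported by the simulator), $w_i$ (weight of program $i$) and $f$ (clock frequency) are notation introduced for this note and do not appear in the original handout.

$$
T_{\text{total}} = \sum_{i} w_i \, \frac{N_i}{f}, \qquad \sum_{i} w_i = 1
$$

With equal weights, $w_i = 0.25$ for each of the four programs, and $f = 15\,\text{MHz}$ means that one clock cycle lasts $1/f \approx 66.7\,\text{ns}$.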

Then, change the processor configuration and vary the program weights as follows. Compute the performance again for every case and fill in the table below (fill all required data in the table before exporting this file to pdf format to be delivered):

#### 1) Configuration 1

- a. Enable Forwarding
- b. Disable branch target buffer
- c. Disable Delay Slot

Assume that the weight of all programs is the same (25%).

#### 2) Configuration 2

- a. Enable Forwarding
- b. Enable branch target buffer
- c. Disable Delay Slot

Assume that the weight of all programs is the same (25%).

#### 3) Configuration 3

Configuration 1, but assume that the weight of your program (program\_1.s) is 50%.

#### 4) Configuration 4

Configuration 1, but assume that the weight of the program series.s is 50%.

Table 2: Processor performance for different weighted programs

| Program     | No opt        | Conf. 1      | Conf. 2      | Conf. 3       | Conf. 4       |
|-------------|---------------|--------------|--------------|---------------|---------------|
| isort.s     | 46041*(1/f)   | 33277*(1/f)  | 31039*(1/f)  | 33277*(1/f)   | 33277*(1/f)   |
|             | = 3.1 ms      | = 2.2 ms     | = 2.1 ms     | = 2.2 ms      | = 2.2 ms      |
| mult.s      | 1880*(1/f)    | 980*(1/f)    | 922*(1/f)    | 980*(1/f)     | 980*(1/f)     |
|             | = 125 µs      | = 65 µs      | = 62 µs      | = 65 µs       | = 65 µs       |
| series.s    | 550*(1/f)     | 233*(1/f)    | 234*(1/f)    | 233*(1/f)     | 233*(1/f)     |
|             | = 37 µs       | = 16 µs      | = 16 µs      | = 16 µs       | = 16 µs       |
| program_1.s | 677*(1/f)     | 467*(1/f)    | 409*(1/f)    | 467*(1/f)     | 467*(1/f)     |
|             | = 45 µs       | = 31 µs      | = 27 µs      | = 31 µs       | = 31 µs       |
| TOTAL TIME  | (0.25*3.1ms)  | (0.25*2.2ms) | (0.25*2.1ms) | (0.167*2.2ms) | (0.167*2.2ms) |
|             | +(0.25*125µs) | +(0.25*65µs) | +(0.25*62µs) | +(0.167*65µs) | +(0.167*65µs) |
|             | +(0.25*37µs)  | +(0.25*16µs) | +(0.25*16µs) | +(0.167*16µs) | +(0.50*16µs)  |
|             | +(0.25*45µs)  | +(0.25*31µs) | +(0.25*27µs) | +(0.50*31µs)  | +(0.167*31µs) |
|             | = 827 µs      | = 578 µs     | = 551 µs     | = 396 µs      | = 391 µs      |

For time computations, use a clock frequency of 15MHz.
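The entries of Table 2 can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the original handout: the function names are hypothetical, the cycle counts are the Conf. 1 values from Table 2, and the Conf. 3 weights assume that the remaining 50% is split equally among the other three programs (the 0.167 used in the table). Because the sketch works from unrounded cycle counts, its totals differ by a few microseconds from the table, which sums per-program times already rounded to two significant digits.

```python
CLOCK_HZ = 15_000_000  # 15 MHz, as stated in the handout


def exec_time_us(cycles, clock_hz=CLOCK_HZ):
    """Execution time in microseconds for a given number of clock cycles."""
    return cycles / clock_hz * 1e6


def weighted_time_us(cycles_by_program, weights, clock_hz=CLOCK_HZ):
    """Weighted sum of per-program execution times, in microseconds."""
    if cycles_by_program.keys() != weights.keys():
        raise ValueError("cycle counts and weights must name the same programs")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(
        weights[name] * exec_time_us(cycles, clock_hz)
        for name, cycles in cycles_by_program.items()
    )


# Cycle counts for Configuration 1 (forwarding only), taken from Table 2.
conf1_cycles = {"isort.s": 33277, "mult.s": 980, "series.s": 233, "program_1.s": 467}

# Configuration 1: every program weighs 25%.
equal_weights = {name: 0.25 for name in conf1_cycles}
conf1_total = weighted_time_us(conf1_cycles, equal_weights)

# Configuration 3: program_1.s weighs 50%; the rest is split equally (assumption).
conf3_weights = {name: 1 / 6 for name in conf1_cycles}
conf3_weights["program_1.s"] = 0.5
conf3_total = weighted_time_us(conf1_cycles, conf3_weights)

print(f"Conf. 1 total: {conf1_total:.1f} us")
print(f"Conf. 3 total: {conf3_total:.1f} us")
```

Changing the dictionary of cycle counts or the weights reproduces the other columns in the same way.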

# Appendix: winMIPS64 Instruction Set

WinMIPS64

The following assembler directives are supported:

| Directive            | Description                           |
|----------------------|---------------------------------------|
| .data                | start of data segment                 |
| .text                | start of code segment                 |
| .code                | start of code segment (same as .text) |
| .org \<n\>           | start address                         |
| .space \<n\>         | leave n empty bytes                   |
| .asciiz \<s\>        | enters zero terminated ascii string   |
| .ascii \<s\>         | enter ascii string                    |
| .align \<n\>         | align to n-byte boundary              |
| .word \<n1\>,\<n2\>  | enters word(s) of data (64-bits)      |
| .byte \<n1\>,\<n2\>  | enter bytes                           |
| .word32 \<n1\>,\<n2\> | enters 32 bit number(s)              |
| .word16 \<n1\>,\<n2\> | enters 16 bit number(s)              |
| .double \<n1\>,\<n2\> | enters floating-point number(s)      |

where \<n\> denotes a number like 24, \<s\> denotes a string like "fred", and \<n1\>,\<n2\> denotes numbers separated by commas.

The following instructions are supported:

| Instruction | Description                                            |
|-------------|--------------------------------------------------------|
| lb          | load byte                                              |
| lbu         | load byte unsigned                                     |
| sb          | store byte                                             |
| lh          | load 16-bit half-word                                  |
| lhu         | load 16-bit half-word unsigned                         |
| sh          | store 16-bit half-word                                 |
| lw          | load 32-bit word                                       |
| lwu         | load 32-bit word unsigned                              |
| sw          | store 32-bit word                                      |
| ld          | load 64-bit double-word                                |
| sd          | store 64-bit double-word                               |
| l.d         | load 64-bit floating-point                             |
| s.d         | store 64-bit floating-point                            |
| halt        | stops the program                                      |
| daddi       | add immediate                                          |
| daddui      | add immediate unsigned                                 |
| andi        | logical and immediate                                  |
| ori         | logical or immediate                                   |
| xori        | exclusive or immediate                                 |
| lui         | load upper half of register immediate                  |
| slti        | set if less than immediate                             |
| sltiu       | set if less than immediate unsigned                    |
| beq         | branch if pair of registers are equal                  |
| bne         | branch if pair of registers are not equal              |
| beqz        | branch if register is equal to zero                    |
| bnez        | branch if register is not equal to zero                |
| j           | jump to address                                        |
| jr          | jump to address in register                            |
| jal         | jump and link to address (call subroutine)             |
| jalr        | jump and link to address in register (call subroutine) |
| dsll        | shift left logical                                     |
| dsrl        | shift right logical                                    |
| dsra        | shift right arithmetic                                 |
| dsllv       | shift left logical by variable amount                  |
| dsrlv       | shift right logical by variable amount                 |
| dsrav       | shift right arithmetic by variable amount              |
| movn        | move if register not equal to zero                     |
| nop         | no operation                                           |
| and         | logical and                                            |
| or          | logical or                                             |
| xor         | logical xor                                            |
| slt         | set if less than                                       |
| sltu        | set if less than unsigned                              |
| dadd        | add integers                                           |
| daddu       | add integers unsigned                                  |
| dsub        | subtract integers                                      |
| dsubu       | subtract integers unsigned                             |
| add.d       | add floating-point                                     |
| sub.d       | subtract floating-point                                |
| mul.d       | multiply floating-point                                |
| div.d       | divide floating-point                                  |
| mov.d       | move floating-point                                    |
| cvt.d.l     | convert 64-bit integer to a double FP format           |
| cvt.l.d     | convert double FP to a 64-bit integer format           |
| c.lt.d      | set FP flag if less than                               |
| c.le.d      | set FP flag if less than or equal to                   |
| c.eq.d      | set FP flag if equal to                                |
| bc1f        | branch to address if FP flag is FALSE                  |
| bc1t        | branch to address if FP flag is TRUE                   |
| mtc1        | move data from integer register to FP register         |
| mfc1        | move data from FP register to integer register         |