#### Lab 0

#### Johannes Lemonde and Lucas Streit

September 25, 2018

#### 3 Questions to chapter 3

#### 3.1.1 Flow diagram for the delay subroutine

See at the end.

#### 3.1.2 Calculating delaycount in order to achieve 1 ms in delay's inner-loop

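With a 50 MHz clock and 4 operations per loop iteration, we have $\frac{50 \cdot 10^{6}~[\frac{op}{s}]}{4~[\frac{op}{loop}] \cdot 1000~[\frac{ms}{s}]} = 12'500~[\frac{loops}{ms}]$, so in theory we would set delaycount to 12'500. Experimentally, however, we found that it has to be set to approximately 860, which would mean that one iteration actually takes about $\frac{50'000}{860} \approx 58$ clock cycles rather than 4 (why?).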
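The arithmetic above can be sketched in C as follows. The function names and parameters are hypothetical and only illustrate the calculation; the fixed cost per iteration is the assumption made in the answer, not a measured property of the board.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: inner-loop iterations needed for 1 ms, assuming
 * every iteration costs a fixed number of clock cycles. */
uint32_t delay_count_for_1ms(uint32_t clock_hz, uint32_t cycles_per_loop)
{
    assert(cycles_per_loop != 0u);
    uint32_t cycles_per_ms = clock_hz / 1000u;  /* 50 MHz -> 50 000 cycles/ms */
    return cycles_per_ms / cycles_per_loop;
}

/* Hypothetical inverse: cycles one iteration really takes (rounded down),
 * given the count that was found experimentally to last 1 ms. */
uint32_t cycles_per_loop_measured(uint32_t clock_hz, uint32_t measured_count)
{
    assert(measured_count != 0u);
    return (clock_hz / 1000u) / measured_count;
}
```

Calling `delay_count_for_1ms(50000000u, 4u)` reproduces the theoretical value, while `cycles_per_loop_measured(50000000u, 860u)` gives the effective cost per iteration implied by the experiment.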

#### 3.2.1 What is BSP and what does it contain?

BSP means Board Support Package. It is the layer of software containing hardware-specific drivers and other routines that allow a particular operating system to function. It initialises the processor, bus, interrupts, clock, RAM, amongst others, and runs the boot loader.

The BSP project in Eclipse contains, among other things, all the files necessary to access the LEDs, switches, buttons, etc.

#### 3.2.2 What is a soft-core processor? What type of processor is present? Features?

We speak of a soft-core processor when reprogrammable logic, such as an FPGA, is configured to function as a processor. For these laboratories, we use the NIOS II soft processor, which is a soft microprocessor core for Intel FPGAs. Its features: clock frequency of 50 MHz; no data or instruction caches for NIOS II/e; no pipeline or branch prediction for NIOS II/e.

#### 3.2.3 How does the processor access peripherals? What is the main mechanism and what main piece of hardware stands between the processor and a peripheral?

Peripherals are accessed through the Avalon Switch Fabric, which lets several masters of the core operate at the same time. In other words, it handles the requests from the masters to the slaves (peripherals).

#### 3.2.4 What peripherals/IP cores are present in the provided architecture? Identify their symbolic names and base addresses.

Buttons, switches, green and red LEDs, seven segment displays, ... For the symbolic names and base addresses, see system.h (within the BSP project in Eclipse).

#### 3.2.5 What types of memory are present in the provided architecture? Identify their symbolic names, base addresses and sizes. What are the typical access times for each of these memories?

On-chip memory: ONCHIP\_MEMORY\_BASE, base address: 0x0, size: 25600 words, typical access time: 1 cycle at 50 MHz.

SDRAM: SDRAM\_BASE, base address: 0x1000000, size: 8388608 words, typical access time (CAS latency): 3 cycles at 50 MHz.

#### 3.3.1 What macro is used to write to a parallel I/O port? How about to read from one?

We can use the macros IOWR\_ALTERA\_AVALON\_PIO\_DATA(base, data) and IORD\_ALTERA\_AVALON\_PIO\_DATA(base), defined in \$BSP\_PROJECT/drivers/inc/altera\_avalon\_pio\_regs.h, where base is the address of the port and data is the data to write.

#### 3.3.2 What command will you use to turn on all red LEDs?

We have to use the write macro from the previous question with DE2\_PIO\_REDLED18\_BASE as the base and  $(2^{18} - 1)$  as the data, so that all 18 bits are set. It would look like this: IOWR\_ALTERA\_AVALON\_PIO\_DATA(DE2\_PIO\_REDLED18\_BASE, 0x3FFFF);
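The value 0x3FFFF is simply the mask with the lowest 18 bits set. A minimal sketch of how such a mask could be computed for a port of any width is given below; `pio_all_ones_mask` is a hypothetical helper, not part of the BSP, and on the board its result would be passed as the data argument of the write macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the lowest `width` bits set (1..32).
 * For the 18 red LEDs this yields 2^18 - 1 = 0x3FFFF. */
uint32_t pio_all_ones_mask(unsigned width)
{
    assert(width >= 1u && width <= 32u);
    /* Shifting a 32-bit value by 32 is undefined, so handle it separately. */
    return (width == 32u) ? 0xFFFFFFFFu : ((UINT32_C(1) << width) - 1u);
}
```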

#### 3.3.3 What command would you use to read the current status of all push-buttons?

IORD\_ALTERA\_AVALON\_PIO\_DATA(D2\_PIO\_KEYS4\_BASE); The macro returns a 4-bit number in which each bit corresponds to the status of one push-button, 0 meaning currently pushed and 1 meaning released.
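Because the buttons are active-low, the value that is read usually has to be decoded before use. The sketch below shows one way to do this on a plain integer; `key_is_pressed`, `pressed_keys` and `NUM_KEYS` are hypothetical names, and `raw` stands for the value returned by the read macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_KEYS 4u  /* four push-buttons, one bit each */

/* Hypothetical helper: true if push-button `key` (0..3) is pressed.
 * A bit value of 0 means pressed, 1 means released. */
bool key_is_pressed(uint32_t raw, unsigned key)
{
    assert(key < NUM_KEYS);
    return ((raw >> key) & 1u) == 0u;
}

/* Hypothetical helper: bit mask of the pressed keys, where 1 = pressed. */
uint32_t pressed_keys(uint32_t raw)
{
    return (~raw) & 0xFu;  /* invert, then keep only the 4 key bits */
}
```

For example, a raw value of 0xE (binary 1110) means that only button 0 is pressed.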

#### 3.3.4 Which header file do you need to include in your program to access the symbolic names? What about the read/write macros?

Basically, "system.h" defines the base-addresses of the peripherals (symbolic names), such as for instance D2\_PIO\_KEYS4\_BASE, and "altera\_avalon\_pio\_regs.h" defines the macros to read/write.

#### 3.6.1 What is the IRQ level for buttons?

#define D2\_PIO\_KEYS4\_IRQ 8 in the system.h file.

#### 3.6.2 Which other peripherals support interrupts?

The toggles (IRQ 6), the JTAG UART (IRQ 5), Timer 0 (IRQ 7) and Timer 1 (IRQ 9), as found in system.h.

#### 5.1.1 How are the timestamp timer and the performance counter functioning? Describe the main steps.

The timestamp is basically a counter which gets incremented – in this case – at every tick of the clock.
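Since the counter advances once per clock tick, a difference between two readings can be converted to a time by dividing by the tick frequency. The sketch below shows this conversion only; `ticks_to_us` is a hypothetical helper, and on the board the tick count and frequency would come from the HAL timestamp functions (`alt_timestamp()` and `alt_timestamp_freq()`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a tick difference into microseconds.
 * `ticks_per_second` is the counter frequency (50 MHz in this lab). */
uint64_t ticks_to_us(uint64_t ticks, uint32_t ticks_per_second)
{
    assert(ticks_per_second != 0u);
    /* Multiply first to keep precision; 64 bits avoid overflow here. */
    return (ticks * 1000000u) / ticks_per_second;
}
```

At 50 MHz, 2 711 100 ticks correspond to 54 222 µs, i.e. 54.222 ms.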

The performance counter is a peripheral containing several counters, which make it possible to count the number of ticks spent within a section of code chosen for analysis, without

Results:

|                     | Time (ms) | Error (ms) |
|---------------------|-----------|------------|
| Timestamp timer     | 54.222    | 0.273      |
| Performance counter | 54.202    | 0.258      |

| Flag | Time (ms) | Size (KB) (ELF file) |
|------|-----------|----------------------|
| -O0  | 50.195    | 80.876               |
| -O2  | 10.898    | 80.356               |
| -Os  | 10.950    | 80.308               |

#### 5.3.1 Why SDRAM rather than SRAM?

SDRAM is cheaper than SRAM, even though it is slower. In many cases, this is the deciding point. Moreover, SDRAM makes much bigger blocks of memory possible, so more data that is used together can be kept close together, which reduces access times when a cache is used.

| Memory        | Typical access time |
|---------------|---------------------|
| SDRAM         | about 10 ns         |
| External SRAM | 0.5 - 5 ns          |
| On-chip RAM   | 5 ns                |
| Cache         | 0.1 - 5 ns          |

#### 5.3.2 How could cache memory enhance our main function?

If we used a cache large enough to hold the whole matrix (or a significant part of it, taking care to loop in a clever order during the sum), then the function could be faster, since a cache has better access times than SDRAM.

But anyway, at 50 MHz, there would be no difference between the two approaches, because the CPU clock period is 20 ns, which is greater than both typical access times.
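The comparison with the clock period can be checked with a small calculation. This is only a sketch under a simplified model in which an access takes as many whole clock periods as its raw access time requires; it ignores memory controller latencies such as the CAS latency mentioned in 3.2.5. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clock period in picoseconds (50 MHz -> 20 000 ps). */
uint64_t clock_period_ps(uint32_t clock_hz)
{
    assert(clock_hz != 0u);
    return UINT64_C(1000000000000) / clock_hz;
}

/* Hypothetical helper: whole clock cycles needed for one access,
 * i.e. the access time divided by the period, rounded up. */
uint64_t access_cycles(uint64_t access_time_ps, uint32_t clock_hz)
{
    uint64_t period = clock_period_ps(clock_hz);
    assert(period != 0u);  /* holds for any clock up to 1 THz */
    return (access_time_ps + period - 1u) / period;
}
```

Under this model, both a 10 ns and a 5 ns access fit into a single 20 ns cycle at 50 MHz, which is the point made above.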

#### 5.4 Caches not used to their full advantage?

Caches are not used to their full advantage if the program loops through more data than the cache can hold (the cache is too small), so that there are many cache misses. [On the other hand, a cache is also not used to its full advantage if the CPU clock frequency is low (as seen in the previous question).]

As for the software design, let's take an example: if the function that sums the matrix went through it column by column (with i and j switched), there would be needlessly many cache misses.
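The two loop orders can be sketched as follows for a matrix stored row by row in one array. This is an illustration only, not the lab's actual sum function; the names and the flat array layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Row-by-row traversal: the inner loop walks adjacent addresses,
 * so each cache line that is fetched gets fully used. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long sum = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += m[i * cols + j];
    return sum;
}

/* Column-by-column traversal (i and j switched): the inner loop jumps
 * `cols` elements at a time, which tends to cause more cache misses. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long sum = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += m[i * cols + j];
    return sum;
}
```

Both functions return the same sum; only the order of the memory accesses differs.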

#### 6 Questions to ask

Optimisations: -O0, -O2, -Os

What is SIZE?

Into which memory is the program loaded? (How do we choose?)

What do they mean by "plot" (at the very end)?

Performance counter: how to use it. Note to self: check that it is not an overflow.

