

# Design of Embedded Hardware and Firmware

# Introduction on "System On Programmable Chip" NIOS II – Avalon Bus

Andres Upegui Laboratoire de Systèmes Numériques hepia/HES-SO





### **Embedded system on Altera FPGA**

### Goals:

- To understand the architecture of an embedded system on FPGA
- To be able to identify the components required for building a specific interface
- To have an overview of the construction of a full system based on a standard softcore bus in a FPGA with block modules

### NIOS II -

#### **Embedded system NIOSII/Avalon Architecture**

Note: The same principles are available for Altera, Xilinx, Lattice or others FPGA



### NIOS II

- Softcore Processor from Altera
  - A processor implemented with Logic Elements (LUT+DFF) in a FPGA
  - A processor synthesized by a compiler and placed & routed on the FPGA
  - A processor described by a HDL langage(VHDL/Verilog/...)
- 32 bits Architecture
- 256 instructions available for user implementation
- 2 versions (... or 3...)
- Operating systems supported: linux, eCos, embOS, Euros RTOS, oSCAN, uCLinux, ThreadX, VxWorks





### **NIOS II Processor:**

### 3 basic configurable architectures

|                   | Nios II / f  | Nios II / s  | Nios II / e |
|-------------------|--------------|--------------|-------------|
| Pipeline          | 6 stages     | 5 stages     | None        |
| Branch prediction | Dynamic      | Static       | None        |
| Instruction Cache | Configurable | Configurable | None        |
| Data Cache        | Configurable | None         | None        |
| MMU               | Yes          | No           | No          |
| Hard Multiplier   | 1 cycle      | 3 cycles     | None        |
| Hard divider      | Yes          | Yes          | No          |
| Resources         | Env 2500 LEs | < 1400 LEs   | < 700 LEs   |
| Max freq          | 290 MHz      | 270 MHz      | 340 MHz     |





### **NIOS II Processor performance**

| <b>Processor Category</b>    | Cost- and Po     | wer-Sensitive | Real-Time Processors |                     |                 |
|------------------------------|------------------|---------------|----------------------|---------------------|-----------------|
| Devices                      | ARM<br>Cortex-M1 | V1 ColdFire   | Nios II<br>Economy   | Nios II<br>Standard | Nios II<br>Fast |
| Cyclone III                  | 80 at            | 84 at         | 30 at                | 90 at               | 195 at          |
| (MIPS <sup>(1)</sup> at MHz) | 145              | 90            | 215                  | 145                 | 175             |
| Cyclone III LS               |                  | 65 at         | 20 at                | 70 at               | 160 at          |
| (MIPS <sup>(1)</sup> at MHz) |                  | 70            | 150                  | 110                 | 140             |
| Cyclone IV GX                | -                | 70 at         | 30 at                | 70 at               | 190 at          |
| (MIPS <sup>(1)</sup> at MHz) |                  | 75            | 175                  | 110                 | 165             |
| Arria II GX                  |                  | 84 at         | 45 at                | 115 at              | 270 at          |
| (MIPS <sup>(1)</sup> at MHz) |                  | 90            | 300                  | 180                 | 240             |
| Stratix III                  | 150 at           | 104 at        | 48 at                | 140 at              | 340 at          |
| (MIPS <sup>(1)</sup> at MHz) | 230              | 112           | 340                  | 230                 | 290             |
| Stratix IV                   | -                | 135 at        | 50 at                | 155 at              | 340 at          |
| (MIPS <sup>(1)</sup> at MHz) |                  | 145           | 340                  | 240                 | 290             |
| Stratix V                    |                  | 135 at        | 50 at                | 170 at              | 320 at          |
| (MIPS <sup>(1)</sup> at MHz) |                  | 145           | 330                  | 270                 | 280             |



### **Accelerating computation: Custom user instructions**

 The ALU can be extended by user own instructions, until 256.



Instructions can be:

- Combinational
- Multi-cycle, synchronized by clk and stall
- Parameterized

They can have access to all the FPGA resources !!!





### Improving performance: hardware accelerator

- For cycles consuming operations, a hardware accelerator can be included/developed
- A Master unit which has access to Memory and Programmable Interfaces for accelerated operations or with hard real time constrains



### Classic Computer architecture (general and simplified)

In single master, address bus is totem-pole, data bus is tri-state





### Classic Computer architecture (general and simplified)

- Classical architecture
  - **≻**Processor
  - ➤ Memories
  - ➤ Input/Output (programmable) interface
  - >Address bus
  - ➤ Data Bus (tri-state)
  - ➤ General decoder

### SoPC architecture: Avalon bus

Multi-Master

. Avalon Bus Module Block Diagram - An Example System

**Nios CPU** 

Arbitrage « slave-side »

**Concurrent Master-Slave Access** 

Synchronous transfers



Slave

**DMA** Controller

### SoPC architecture

- Processor
- Memories
- Input/Output (programmable) interface
- Address bus
- Separated Data Bus In/Out → multiplexers
- Local decoder on the Avalon bus
- Bus transfers size adaptation is done at Avalon bus level
- Master/Slave
- Synchronous bus
- Wait state by configuration or dynamic
- Hold / Set up available





# Avalon « slave » signals

| Signal Type | Width | Direction | Required | Description                                                                                          |
|-------------|-------|-----------|----------|------------------------------------------------------------------------------------------------------|
| clk         | 1     | In        | (No)     | Global clk for system module and Avalon bus modules. All transactions synchronous to clk rising edge |
| nReset      | 1     | In        | No       | Global Reset of the system                                                                           |
| address     | 132   | In        | No       | Address for Avalon bus modules                                                                       |
| chipselect  | 1     | In        | Yes      | Selection of the Avalon bus module                                                                   |
| read        | 1     | In        | No       | Read request to the slave                                                                            |
| readdata    | 132   | Out       | No       | Read data from the slave module                                                                      |
| write       | 1     | In        | No       | Write request to the slave                                                                           |
| writedata   | 132   | In        | No       | Data from Master to Slave module                                                                     |
| irq         | 1     | Out       | No       | Interrupt request to the master                                                                      |



### « slave » signals

- The ChipSelect is generated by the Avalon bus and selects the module
- The Address[n .. 0] is used to access a specific register/memory position in the selected module.
   Only the minimum number of address is necessary.
- The Read and Write signals specified the direction of the transfers. They are provided by a Master and received by the slave modules
- ReadData(..) and WriteData(..) bus transfers the Datas from (read)/ to (write) the Slaves





### « slave » signals

- The Read and Write signals specified the direction of the transfers.
- They are provided by a Master and received by the slave modules
- The direction is the view of the Master unit
- ReadData(..) and WriteData(..) bus transfers the Datas from (read)/ to (write) the Slaves

# Avalon « slave » signals

| Signal Type                       | Width              | Direction | Required | Description                                                                                   |
|-----------------------------------|--------------------|-----------|----------|-----------------------------------------------------------------------------------------------|
| WaitRequest/<br>WaitRequest_n     | 1                  | Out       | No       | Assert by the slave when it is not able to answer in this clock cycle to read or write access |
| ByteEnable/<br>ByteEnable_n       | 1, 2, 4,<br>8,,128 | In        | No       | The bytes to transfer                                                                         |
| BeginTransfer                     | 1                  | In        | No       | Inserted by Avalon fabric at and only at first clock of each transfer                         |
| ReadDataValid/<br>ReadDataValid_n | 1                  | Out       | No       | For read transfer with <b>variable latency</b> , means data are valid to master               |
| BurstCount                        | 111                | In        | No       | Number of burst transfers                                                                     |
| BeginBurstTransfer                | 1                  | In        | No       | First cycle of a burst transfer, valid for 1 clock cycle                                      |



### « slave » signals

- BE (Byte Enable) signals indicate the bytes to transfer.
  - The number of BE activated are always a power of 2
  - They start at a multiple of the size to transfer
- A master address is a byte address
- A slave address is a word address
- The Avalon bus make the translation and performs multiple accesses if required.



# Avalon byte enable

| byteenable_n[3:0] | Write action        |
|-------------------|---------------------|
| 0000              | Write full 32-bits  |
| 1100              | Write lower 2 bytes |
| 0011              | Write upper 2 bytes |
| 1110              | Write byte 0 only   |
| 1011              | Write byte 2 only   |

Specify bytes to be transferred Active low signals in this representation





**Avalon** 

### Master to slave addresses: Master 32 bits, Slave 8 bits



### Master to slave addresses: Master 32 bits, Slave 16 bits



### Master to slave addresses: Master 32 bits, Slave 32 bits









### Master to slave addresses: Master 32 bits, Slave 64 bits





### **Avalon Bus**

### Slave view of transfers

- Transfers are synchronous on the rising edge of the Clk
- Between Clk, the timing relation between signals are NOT relevant

### **Avalon slave** read, 0 wait, asynchronous peripheral





# Avalon slave read, 1 wait

### Wait cycle specified by design



# Avalon read slave, 2 wait





# Avalon read slave, 1 set up and 1 wait





### Avalon read slave, wait request generated by slave device





## Avalon read slave, burst of 4 from Master A, 2 from master B



### Pipeline of master access

ReadDataValid activated by slave for each data



# Avalon write slave, 0 wait



# **Avalon write** slave, 1 wait





# Avalon write slave, wait request generated by slave





# Avalon write slave, 1 set up, 1 hold, 0 wait





# Avalon write burst transfer of 4, wait request generated by slave





### **Bus avalon**

### **Master view**

- The master starts a transfer (read or write)
- It provides the Addresses (32 bits on NIOSII)
- It waits on WaitRequest signal to resume the transfer

# **Avalon** master signals (1)

| Signal Type | Width     | Direction | Required | Description                                                                                                                                           |
|-------------|-----------|-----------|----------|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| clk         | 1         | in        | yes      | Global clock signal for the system module and<br>Avalon bus module. All bus transactions are<br>synchronous to c1k.                                   |
| reset       | 1         | in        | no       | Global reset signal. Implementation is peripheral-<br>specific.                                                                                       |
| address     | 1 - 32    | out       | yes      | Address lines from the Avalon bus module. All<br>Avalon masters are required to drive a byte<br>address on their address output port.                 |
| byteenable  | 0, 2, 4   | out       | no       | Byte-enable signals to enable specific byte lane(s) during transfers to memories of width greater than 8 bits. Implementation is peripheral-specific. |
| read        | 1         | out       | no       | Read request signal from master port. Not required if master never performs read transfers. If used, readdata must also be used.                      |
| readdata    | 8, 16, 32 | in        | no       | Data lines from the Avalon bus module for read transfers. Not required if the master never performs read transfers. If used, read must also be used.  |
| write       | 1         | out       | no       | Write request signal from master port. Not<br>required if the master never performs write<br>transfers. If used, writedata must also be used          |



# Avalon master signals (2)

| Signal Type   | Width     | Direction | Required | Description                                                                                                                                                                             |
|---------------|-----------|-----------|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| writedata     | 8, 16, 32 | out       | no       | Data lines to the Avalon bus module for write transfers. Not required if the master never performs write transfers. If used, write must also be used.                                   |
| waitrequest   | 1         | in        | yes      | Forces the master port to wait until the Avalon bus module is ready to proceed with the transfer.                                                                                       |
| irq           | 1         | in        | no       | Interrupt request has been flagged by one or more slave ports.                                                                                                                          |
| irqnumber     | 6         | in        | no       | The interrupt priority of the interrupting slave port.<br>Lower value has higher priority.                                                                                              |
| endofpacket   | 1         | in        | no       | Signal for streaming transfers. May be used to<br>indicate an end of packet condition from the slave<br>to the master port. Implementation is peripheral-<br>specific.                  |
| readdatavalid | 1         | in        | no       | Signal for read transfers with latency and is for a master only. Indicates that valid data from a slave port is present on the readdata lines. Required if the master is latency-aware. |
| flush         | 1         | out       | no       | Signal for read transfers with latency. Master can<br>clear any pending latent read transfers by<br>asserting flush.                                                                    |



### **Avalon Master Basic fundamental transfers**





# Avalon read master, 0 wait



## Avalon read master, wait generated by slave/Avalon bus



# Avalon write master, 0 wait





# Avalon write master, wait generated by slave



### **Avalon read** transfers with latency (ex. 2 cycles)



Wait request here means:

delay address cycle Fixed latency (here 2)







### **Avalon read**

transfers with latency, and *readdatavalid* generated by slave



### Readdatavalid specify when data are ready



### **Avalon burst Write**



Address and BurstCount available for the whole transfer Write can be deactivated by the master The number of burstcount needs to be generated





### **Embedded System on FPGA (example)**





IN ENGINEERING

et d'architecture de Genève

### Conclusion

| So | me positive points of a softcore architecture                                                        |
|----|------------------------------------------------------------------------------------------------------|
|    | Fast implementation                                                                                  |
|    | Modular Architecture                                                                                 |
|    | Affordable system complexity                                                                         |
|    | Good documentation                                                                                   |
|    | Reuse of existing IP cores                                                                           |
|    | Ease of development of our own programmable interface on internal bus (i.e. Avalon in VHDL, Verilog) |
|    | Full system on FPGA, easily adaptable                                                                |
|    | Operating Systems available                                                                          |
| So | me negative points                                                                                   |
|    | Several complex tools to develop a system: hardware and software specific                            |
|    | Several debugging levels to deal with for hardware and software.                                     |

