# Loopy Driver Generator Documentation

Thomas Fischer

September 25, 2013

# Contents

- 1 Introduction
  - 1.1 Origins and Goals
  - 1.2 Terminology
- 2 Getting Started
  - 2.1 Setup
  - 2.2 Board Description Language
  - 2.3 Generation Backends
  - 2.4 Process
    - 2.4.1 Checking out the Generator
    - 2.4.2 Building the Generator
    - 2.4.3 Running the Generator
    - 2.4.4 Executing your Application
- 3 Driver Description
  - 3.1 API
    - 3.1.1 Component
    - 3.1.2 Port
    - 3.1.3 State
  - 3.2
    - 3.2.1
    - 3.2.2
    - 3.2.3
    - 3.2.4
  - 3.3 Protocol
    - 3.3.1 Control Flow
    - 3.3.2
  - 3.4
    - 3.4.1
    - 3.4.2
    - 3.4.3
- 4 Generator
  - 4.1 Used Libraries
    - 4.1.1 JFlex & CUP
    - 4.1.2 Katja
    - 4.1.3 Apache Commons
  - 4.2 Generation Process
  - 4.3 Models
    - 4.3.1 Board Model
    - 4.3.2 C/C++ Model
    - 4.3.3 MHS Model
  - 4.4 Extensions
    - 4.4.1 Adding Host Drivers
    - 4.4.2 Adding Workflows and Boards
    - 4.4.3 Utility Classes

# Chapter 1

# Introduction

This document gives an overview of the Loopy Driver Generator Framework for both users and developers. The first chapters are addressed mainly to users of the generator, the later ones mainly to developers.

The Loopy Driver Generator itself is intended to provide embedded system developers with a simple, convenient API to communicate with their designed hardware components. This interface provides methods for synchronous as well as asynchronous communication over different communication media, for example Ethernet.

To our knowledge, no such interface for communication with VHDL components has been provided so far, though extensive research exists concerning efficient communication between PCs and FPGAs over different communication channels, mainly Ethernet [3, 1, 2].

The architecture has been prototyped with a C++ frontend communicating with a Xilinx Virtex 6 ML-605 FPGA over Ethernet. The design of the driver generator enables easy extension to support other frontend languages, FPGA boards or transport media.

## 1.1 Origins and Goals

The HOPP Driver Generator was originally developed, and is still being developed, by the Software Technology Group of the University of Kaiserslautern in cooperation with the Microelectronic Design Research Group of the University of Kaiserslautern.

The non-functional design goals of this project are:

- Probably better filled out by the EIT department;)
- An easy-to-use driver and driver generator
- Well-documented API
- Modularity and extensibility
- Meaningful error description

## 1.2 Terminology

This section explains the terminology used in this document and the project in general.

**Driver** The complete software product is called the *driver*. It enables programmatic, software-side communication with the hardware platform.

**Board / board-side** The driver is split into two parts, one of which has to be uploaded to and executed on the hardware platform itself. This part of the driver is referred to as the *board-side* driver. Sometimes, the terms *server* and *server-side* might be used instead, since this part acts similarly to a server.

**Host / host-side** In contrast to the board-side driver, the *host-side* driver is located on the communicating computer. This part contains the actual API that embedded developers will work with. Sometimes, the terms *client* and *client-side* might be used instead, since this part acts more like a client.

# Chapter 2

# Getting Started

The purpose of this chapter is to explain how to properly build and use the generator. This includes generation of board-side and host-side drivers for the specified board.

## 2.1 Setup

In the following, the tools required to build and execute the driver generator are introduced. Please note that all tools have to be executable from the command line. This requires, for example, Windows users to adjust their PATH variable.

**Java** The generator is implemented in Java due to tool support and the environment of the Software Technology Group. Consequently, a JDK version 6 or above is required.

**Gradle** Gradle<sup>1</sup> is a build tool, similar to *Maven* or *Ant* (like *Make*, but - for the most part - easier to manage within larger projects). It supports the dependency management from Maven while retaining the flexibility of Ant. Plugins required by the build process are automatically downloaded by Gradle.

**Mercurial** Mercurial<sup>2</sup> is a distributed version control tool, comparable with Git, Bazaar, or (to some degree) Subversion. The sources of the driver generator are located in a Mercurial repository. If you acquired the sources (and this document) through other means, Mercurial is not required.

**Doxygen** Doxygen<sup>3</sup> is used for generation of a Java API-like HTML description of the driver API. While this generation is not required for the driver, it is highly recommended for easier integration of the generated driver.

<sup>1</sup> available at http://www.gradle.org/

<sup>2</sup> available at http://mercurial.selenic.com/

<sup>3</sup> available at http://www.doxygen.org/

**Xilinx Toolsuite** In order to generate a .bit and an .elf file that can be used to program the FPGA, the Xilinx toolsuite is required. This includes ISE for generating IPCores out of VHDL files, XPS for composing these and synthesising the .bit file, and SDK for generation of the .elf file.

Interaction with these tools is reduced to a minimum by using the command line interface provided by Xilinx whenever possible. This in turn implies that the tools cannot be used for the actual board design, but only for the synthesis process.

Both XPS and the Xilinx SDK have to be accessible from the console using the commands `xps` and `xsdk` respectively.

**C/C++11 Compiler** The currently provided host-side driver is written in C++11. Hence, a C/C++11 compiler is required as well. For the design of the host-side application built on top of the API, a development environment might also be desirable. The generated code has been successfully compiled using gcc 4.7.2. As development environment, Eclipse 4.2 (Juno) with CDT 8.1 or above is recommended.

## 2.2 Board Description Language

In this section, the *board description language* (abbreviated: *bdl*) is introduced. The language is used to specify a board, for which a driver and project files should be generated.

A board specification file consists of several *declarations* that may appear in arbitrary order. For improved readability of the board description file, it is advisable to provide declarations in the same order as they are explained in this document.

In these declarations, *blocks* and *code blocks* are used. A block wraps additional properties of a specific declaration. It is surrounded by curly brackets {}. A code block is a block that is inserted directly into the driver, i.e., it contains code in the target language of the driver. The code is surrounded by curly brackets and colons {::}.

A board description is considered to be *correct* if and only if all required declarations are made, all referenced declarations occur and match, and no duplicates are encountered. The following explanations of BDL declarations contain further information on which properties have to hold in order for the board description to be correct. Naturally, syntax errors will also result in an incorrect board description.

Please note that code within code blocks is not analysed and therefore not used for the determination of correctness. A board description is considered to be correct even if a code block contains errors. It is therefore advisable to use code blocks cautiously.

#### Import

Import declarations reference additional board description files that should also be used to generate the driver. The driver generator will collect all imported files recursively and compile one large driver out of all these files. As a result, a file is considered to contain no imports but all declarations of the imported files. Circular imports are ignored. Correctness analysis is only performed on the complete, composed board model, not on individual files.

```
import "some/samplefile.bdf"
```

The declaration consists of the keyword import followed by a string containing the path of the file to be imported.
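The recursive collection of imports, including the handling of circular imports, can be pictured with the following minimal C++ sketch. It is not the generator's actual implementation (the generator is written in Java). The in-memory `ImportGraph` and the function names are assumptions made only for illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory model: file path -> paths it imports.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first collection. A file that was already visited is skipped, so
// circular imports terminate instead of recursing forever.
void collect(const std::string& file, const ImportGraph& graph,
             std::set<std::string>& seen, std::vector<std::string>& order) {
    if (!seen.insert(file).second) return;  // already collected
    order.push_back(file);
    auto it = graph.find(file);
    if (it == graph.end()) return;          // file without imports
    for (const auto& imported : it->second) {
        collect(imported, graph, seen, order);
    }
}

// Returns every file reachable from `root`, each exactly once.
std::vector<std::string> collectAll(const std::string& root,
                                    const ImportGraph& graph) {
    std::set<std::string> seen;
    std::vector<std::string> order;
    collect(root, graph, seen, order);
    return order;
}
```

In this sketch, two files that import each other are both collected once, and the declarations of all collected files would then be merged into a single board model before any correctness analysis takes place.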

#### Medium

The medium declaration describes over which medium board and host should communicate with each other. There has to be exactly one medium definition in a complete board description.

```
medium ethernet { ... }
```

It consists of the keyword medium followed by the medium identifier and a block describing medium-specific properties. Depending on the chosen medium, several properties are possible and required.

**Ethernet** The Ethernet medium is selected using the keyword ethernet. The following block has to specify the MAC address, IP address, subnet mask, standard gateway and port number.

```
medium ethernet {
    mac "00:0a:35:00:01:02"
    ip "192.168.2.10"
    mask "255.255.255.0"
    gate "192.168.2.1"
    port 8844
}
```

All these properties are specified in a rather intuitive way. Omitting any of them is an error, since there are no default values defined.

**USB/UART** Connection over USB/UART is done with the keyword uart. No property block is provided for this communication medium as no further configuration is required.

```
medium uart
```

#### Scheduler

Using the scheduler declaration, it is possible to override the default scheduler on the board. This can improve general driver performance for specific applications.

```
schedule {: ... :}
```

The declaration consists of the keyword schedule followed by a code block containing the code of the user-defined scheduler.

Note that no guarantees can be given for a user-defined scheduler. For a more detailed description of the default scheduler and the actions required by a user-defined scheduler, see the board-side control flow graphs in Section 3.3.1.

#### Options

Several global properties of the board are specified directly without using distinct blocks. At this time, this includes the debug flag and global queue sizes (see Section 3.2.1 for more details about the different queue types). Queue sizes must not be negative, but may be 0.

```
debug
swqueue 128
hwqueue 32
```

The debug flag results in additional console output of the generated driver. Note that this output is sent over UART and therefore significantly slows down the driver.

**Important note:** Debugging over UART **vastly** slows down the board-side driver. The performance impact is so large that it can even lead to ICMP timeouts and a complete breakdown of the Ethernet communication.

If not specified otherwise, debugging is disabled and the queue sizes are set to default (1024 for the software queue, 64 for the hardware queue).
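The defaults and the example declarations above can be summarised in a small C++ sketch. The struct `BoardOptions` and the function `exampleOptions` are hypothetical names chosen for illustration. They do not reflect the generator's real data structures, which are not shown in this document.

```cpp
#include <cstddef>

// Hypothetical mirror of the global board options.
struct BoardOptions {
    bool debug = false;          // debugging is disabled by default
    std::size_t swqueue = 1024;  // default software queue size
    std::size_t hwqueue = 64;    // default hardware queue size
};

// Options after applying the example declarations `debug`, `swqueue 128`
// and `hwqueue 32` on top of the defaults.
BoardOptions exampleOptions() {
    BoardOptions options;
    options.debug = true;
    options.swqueue = 128;
    options.hwqueue = 32;
    return options;
}
```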

#### GPIO

The GPIO declaration is used to add GPIO devices to the board design. These devices are integrated deeper in the board design and require explicit treatment.

```
gpio in buttons {:
  if(state == 1) reset();
  else if(state == 2) setLEDState(10);
:}
```

Like all other declarations, the GPIO declaration begins with a keyword. The next part of the declaration is a direction specifier, telling the generator whether the component is an input, output or dual component. Afterwards, the identifier of the component is required. A GPIO component with this identifier has to be available on the board the driver is built for. Finally, an optional code block can be used to override the board-side standard behaviour of the GPIO component. This code block is only valid for input components (or the input part of a dual component). The default behaviour for input components is forwarding of the value to the host-side driver.

#### Core

A core is a template for components on the board. It contains a behaviour specification in the form of VHDL files and a matching description of the interface.

```
core adder 1.00.a {
    ...
}
```

A core is declared by the keyword core followed by the name of the core and a version string. Together, these two values form the unique identifier of the core, so each combination may occur at most once within a board description. Properties of the core are described within a following declaration block. This block consists of two parts. The first is a list of sources that describe the implementation of the core.

```
source "cores/adder_1_00_a.vhd"
```

Source references are very similar to import declarations and only differ in the used keyword and allowed occurrences within the bdl file.

**Important note:** To build a driver with the Xilinx ISE workflow, the driver generator requires the top-level source file to contain a VHDL entity with the same name as the specified core. It is further recommended to use the same name for the source file itself, so that core, source file and entity share the same name. This property is not checked by the driver generator, since checking it would require analysis of the provided VHDL code, which is not implemented yet. A mismatch may therefore only surface as a failure of the synthesis process. VHDL analysis will be a feature of a later version.

The second part of the core declaration block describes the interface of the core.

```
port in in1, in2, in3
port out out1 {
   width 16
}
```

An interface description consists of several port declarations. A port is either in-going, out-going or dual. The port specification looks somewhat similar to the GPIO declaration. The keyword port is followed by a direction specifier and an identifier. Identifiers have to be locally unique and may therefore occur only once within a core. It is possible to provide several comma-separated identifiers to declare several ports sharing the same properties. These properties are described in another block following the port declaration. This currently only includes the bitwidth of the port. The block can be omitted, in which case default values are used. The default bitwidth of a port is 32 bits.

Once available, VHDL analysis could also be used to simplify the specification of the core interface, since the interface is already fully specified in the VHDL file.

**Important note:** The ports described using this syntax have to be AXI4 stream compliant and the ports of the VHDL component have to follow a strict naming pattern. The four ports that make up an AXI4 stream have to be suffixed \_data, \_ready, \_valid and \_last accordingly. For example, an in-going AXI4 stream port inA requires the basic ports inA\_data, inA\_ready and so on. Also carefully check the directions and widths of these ports: since the VHDL code is not analysed, mistakes are detected only very late in the process.
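The naming pattern can be made explicit with a short C++ helper. This is only an illustrative sketch. The function name `axiStreamPorts` is an assumption and not part of the generator or the generated driver.

```cpp
#include <array>
#include <string>

// Derives the names of the four VHDL ports that make up the AXI4 stream
// belonging to the BDL port identifier `port`.
std::array<std::string, 4> axiStreamPorts(const std::string& port) {
    return {{port + "_data", port + "_ready", port + "_valid", port + "_last"}};
}
```

For the port `inA` from the example above, the helper yields `inA_data`, `inA_ready`, `inA_valid` and `inA_last`, which are the names the VHDL entity is expected to declare.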

Finally, the core requires bindings for the implicit clock and reset ports. These may be interleaved with the "normal" AXI stream ports.

```
clk aclk 100
rst aresetn 0
```

The keywords clk and rst begin these bindings. Following the keyword is the identifier of the port in the VHDL component. For the clock port, the frequency (in MHz) can be specified afterwards. The reset port requires a polarity flag, indicating at which signal level the component is reset.

#### Instance

The instance declaration instantiates a core on the board. This effectively creates a component on the board with the behaviour and interface specified by the core. The instance declaration itself also connects this component with other components.

```
instance adder 1.00.a adder1 {
   ...
}
```

An instance declaration begins with the keyword **instance**, followed by a reference to the used core and the identifier of the instance itself. The referenced cores are required to be declared within this board description, and used instance identifiers may occur at most once.

A property block is used to connect the ports of the core instance with other core instances or with the board-side driver.

```
bind in1 myAxis
cpu in2, in3 {
    swqueue 10
    hwqueue 5
}
cpu out1 {
    poll 2
}
```

The bind keyword connects a port with an axis. An axis is basically a connection between exactly two ports. Using the same axis identifier for more than two ports is illegal. The keyword cpu connects the specified port to the board-side driver. The host-side driver will provide methods for direct communication with these ports of the component. Again, a block is used for properties of these driver-attached ports. Specifiable properties include queue sizes (see Section 3.2.1) and automatic value forwarding to the host-side driver (this is explained in more detail in Section 3.2.2). If not specified otherwise, the global queue sizes of the board are used and forwarding is enabled.

For both bindings, the referenced port has to exist within the core declaration.
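The axis rule can be illustrated with a small C++ check over all bind entries of a board. This is a sketch under assumptions: the `Binding` struct and the function `invalidAxes` are hypothetical, and the sketch also reports an axis used by only a single port, which goes slightly beyond the rule stated above.

```cpp
#include <map>
#include <string>
#include <vector>

// One `bind <port> <axis>` entry of an instance (hypothetical model).
struct Binding {
    std::string instance;
    std::string port;
    std::string axis;
};

// Returns the identifiers of all axes that are not used by exactly two ports.
std::vector<std::string> invalidAxes(const std::vector<Binding>& bindings) {
    std::map<std::string, int> uses;
    for (const auto& binding : bindings) {
        ++uses[binding.axis];
    }
    std::vector<std::string> invalid;
    for (const auto& use : uses) {
        if (use.second != 2) invalid.push_back(use.first);
    }
    return invalid;
}
```

An empty result means that every axis connects exactly two ports. Any returned identifier points at a bind declaration that should be checked.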

## 2.3 Generation Backends

The driver generator relies on so-called *generation backends* (or simply *backends*) for code generation. All relevant resources and steps to create the desired output are performed by the backend.

Three different kinds of backends exist:

- workflow backends describe the workflow of synthesis tools. The output of such a backend consists of all files comprising the board-side driver, which can be uploaded to a corresponding board. Depending on the workflow, different files are generated. For example, a Xilinx ISE workflow usually generates a .bit and an .elf file. For certain boards, additional files are required, e.g. the ps7\_init.tcl for the ZedBoard.
- host backends are responsible for generation of the host-side API. They provide all sources required for communication with the board-side driver and an easy-to-use interface.
- board backends model the target boards of the driver generator. They are different from the other generation backends, since they do not generate code themselves. Rather, they provide board-specific attributes to the actually generating backends.

## 2.4 Process

If the tools described in Section 2.1 are correctly installed, the following steps should provide you with a working version of the driver generator.

### 2.4.1 Checking out the Generator

First of all, the project has to be checked out (if you are reading this document, you have probably done that already). As of now, the project is only available at the Mercurial repository of the Softech Group of the University of Kaiserslautern.<sup>4</sup> The complete command for the initial checkout is:

`hg clone https://softech.informatik.uni-kl.de/hg/public/loopy/ <target dir>`

Later updates can be performed using `hg pull -u` (which will pull and immediately update the local copy afterwards).

<sup>4</sup> The repository is located at https://softech.informatik.uni-kl.de/hg/public/loopy/.

### 2.4.2 Building the Generator

### 2.4.3 Running the Generator

The HOPP Driver Generator can be called using a command line interface (CLI). The CLI offers several parameters to further configure the run of the generator, listed in Tables 2.1 and 2.2. Note that individual backends may provide additional parameters that are not listed in these tables. The CLI parameters are exclusively used to modify the behaviour of the driver generator. The behaviour of the driver itself is only influenced by the provided .bdl file.

Additionally, a board description file is required by the generator. A complete call therefore looks like the following line:

`java -jar driverGenerator.jar [OPTIONS] <bdl file>`

The output of the generator consists of three groups of files. The first group contains all files that make up the host side of the driver, that can be used to wrap communication with the board and its components. This usually is some sort of library in the chosen target language. The second group of files are those that make up the board side driver. So far, this is a single bit file, which already is initialised with the elf file. A third group consists of temporary files required for generation of the board-side driver. These files usually do not require user interaction, but can be used for debugging in case of errors.

In addition to these sources, documentation for both host- and board-side sources is generated using Doxygen.

**Logging** In addition to the usual logs, the driver generator aggregates the output of the generation process in log files. These can be found in the temporary directory and should be consulted in case of an error.

| Short | Long | Description |
|-------|------|-------------|
| -s | server, board | Select the board backend. The board backend is responsible for generation of the board-side driver and is dependent on the target platform. |
| -c | client, host | Select the host backend. The host backend is responsible for generation of the host-side driver. Usually, a single backend is provided per target language. |
| -S | serverDir, boardDir | Specifies the board directory. The board-side driver, i.e., the .bit and .elf files, will be generated in this directory. If none is specified, a directory "server" will be generated inside the current working directory. |
| -C | clientDir, hostDir | Specifies the host directory. All files for the host-side driver will be generated into the specified directory. If none is specified, a directory "client" will be generated inside the current working directory. |
| -t | temp | Specifies a directory for temporary files created by the generator. This primarily involves files used by the Xilinx toolsuite to create the .bit and .elf files. If none is specified, a directory "temp" will be generated inside the current working directory. |
| -q | quiet | Sets the log level to quiet. In this mode, the generator will not provide any console output, except for exceptions preventing generation. |
| -v | verbose | Sets the log level to verbose. The driver generator will print additional console output that can be used to track errors in .bdl files and (potentially) the generator. |
| -d | debug | Sets the log level to debug. The driver generator will print even more console output, which is used solely for debugging the driver generator. |

Table 2.1: Summary of current, global CLI parameters, Part I

| Short | Long | Description |
|-------|------|-------------|
| -p | parseonly | Only executes the frontend of the generator, checking for simple parser errors in the provided .bdl files. Can be used to debug the board description. |
| -n | dryrun | Also performs analysis steps of the backends, but does not generate the actual drivers. Also does not run the Xilinx toolsuite. Can be used to further debug the board description. |
|  | nogen | Skips generation phases of the backends, but still deploys all source files. This is useful for debugging the generated drivers and for rebuilding only the host-side driver. |
|  | sdkonly | Executes only the Xilinx SDK run building the .elf file, without generation or deployment steps for the .bit file. Use this flag when changing parameters that only influence the software part of the board, e.g. software queue sizes or polling flags. |
|  | gui | Shows the GUI of the generator instead of running directly with the provided parameters. The generator will still parse the parameters provided and configure these in the interface. |
|  | config | Runs the generator using a provided configuration file. Other values given on the command line override entries from the configuration file. Using the GUI parameter in addition will start the GUI with the parameters from the configuration file. |
| -h | help | Lists all CLI parameters with a short explanation. The generator will abort after parsing this parameter and not generate anything. |

Table 2.2: Summary of current, global CLI parameters, Part II

<sup>5</sup> This should result in the usage help and an error, since no `.bdl` file has been specified.

### 2.4.4 Executing your Application

Now the regular design process from Xilinx can be continued. The next step would be writing a host application that uses the generated host-side driver and API (see Section 3.1 for an overview of this API). The board-side driver is represented by the .bit and .elf files resulting from the command line calls performed by the driver generator. After the FPGA has been programmed using these files, your program should be able to communicate with VHDL components on the FPGA through the API provided by the host-side driver.

Note that the startup method has to be called before any data is sent over the driver. It is also recommended to shut the driver threads down properly by using the provided shutdown method. Both methods are contained within api/setup.h.
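One way to make sure that startup and shutdown are always paired is a small RAII guard in the host application. The following C++11 sketch uses stand-in functions in a hypothetical namespace `sketch`. The real methods live in api/setup.h and their exact signatures may differ, so the stand-ins are assumptions and would have to be replaced by the generated ones.

```cpp
namespace sketch {

// Stand-ins for the startup and shutdown methods from api/setup.h.
// They only record the driver state so that the guard can be demonstrated.
static bool driverRunning = false;
void startup()  { driverRunning = true; }
void shutdown() { driverRunning = false; }

// RAII guard: starts the driver on construction and shuts it down when the
// guard leaves its scope, including when an exception unwinds the stack.
class DriverSession {
public:
    DriverSession() { startup(); }
    ~DriverSession() { shutdown(); }
    DriverSession(const DriverSession&) = delete;
    DriverSession& operator=(const DriverSession&) = delete;
};

}  // namespace sketch
```

A host application would create one `sketch::DriverSession` at the beginning of `main` and perform all communication while it is alive.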

# Chapter 3

# Driver Description

This chapter provides a conceptual overview of the driver parts. For pure users of the driver generator, only parts of the client side are really relevant. Developers might also be interested in the server part.

For a more detailed documentation of the code and provided methods, Javadoc-style comments are provided, which can be transformed into an HTML or TeX representation similar to the Java API specification using Doxygen (see Chapter 2 for details on how to enable documentation generation).



Figure 3.1: A high-level view of the data flow from a host application to VHDL components on the board through the generated driver

The overall architecture of the driver is depicted by Figure 3.1. Data is sent from an embedding host application to the host-side driver. This driver communicates over some transport medium with the board-side driver, which in turn distributes the received data to corresponding VHDL components on the FPGA. Results are sent back through the same chain.

## 3.1 API

This section describes what the host-side API, against which hardware designers program their applications, should look like. This helps designers in writing their applications and describes how host-side drivers in other languages should be implemented in general.



Figure 3.2: Exposed classes of the host part

The exposed architecture of the host-side driver is depicted in Figure 3.2. A board is described using multiple *components*. A component is a designed hardware unit, which can have several *ports*, over which data can be sent to or from the component.

### 3.1.1 Component

A component is a piece of designed hardware. It has multiple ports over which communication can take place, i.e. data is sent from or to the component. Usually components receive data, process it and send back some results, typically on a different port.

The host driver contains an abstract generic component, describing components in general. For each user-defined core, a new subclass is created, which contains the ports specified in the core description. For each instance of a core on the board, an object of the core's subclass together with all its ports is instantiated in the driver. Communication with the board's components happens through these port objects.

Details about the implementation of components can be found in Section 3.4.1.

**GPIO components** These components are specialised, predefined I/O components, used to directly input or output a signal on the board. The GPIO representation in the driver enables the host-side application to write or read this signal. A GPIO component is used either as an input or as an output component. An input component enables input of a signal. An example of such a component is the pushbutton component on the Virtex 6. An output component does the opposite and displays a signal on the board. The Virtex 6 has an LED component, which is classified as an output component. The signal to be displayed can be written to such a GPIO component.

Since such components only hold a single state and perform no processing, communication is not handled via ports. Instead, it is possible to directly read or write the state of the component.

#### 3.1.2 Port

A port marks an AXI stream interface used to send data to or receive data from components. A port is always assigned to a single component, but a component can have multiple ports. Ports can be sending (in-going), receiving (out-going) or bi-directional (dual). These designations seem counter-intuitive at first, since they do not describe the ports on the host-side, but the ports of the driver itself. The user should work with the driver as if it were the actual board. Consequently, data is sent to an in-going port and received from an out-going port. Values sent to or received from a port have an arbitrary, but fixed bitwidth. Ports are the only way to communicate with components or the board in general (besides the aforementioned GPIO components).

Ports allow *synchronous* as well as *asynchronous* communication. A synchronous write to a port waits for the message to be delivered to the component. A synchronous read waits for a value to be received. Asynchronous operations do not wait, but return immediately. Instead, a *task* is scheduled for the operation, which will be performed asynchronously. While the order between tasks and therefore values to a single port is maintained, the order between tasks executed at different ports may differ from the order they were scheduled in.

**Important note:** Synchronous writes are currently not fully supported. The driver does not block until the component has received the written value, but only until the board-side driver has acknowledged and stored the value. <sup>1</sup>

#### 3.1.3 State

Another important part of the API is the *state*, which describes the current progress of a read or write operation. A state is returned by asynchronous operations and can be used to keep track of the operation's progress. A *write state* indicates how many values have already been written to the board (and how many have not); a *read state* indicates how many values have already been read into the given memory area (and how many still remain). Synchronous operations do not return a state, since such an operation is always finished once it returns.

<sup>1</sup>To understand the reason for this, one has to consider the hardware queues described in the following chapters. The board-side driver is not directly connected to the target component. It only knows that a value was successfully stored in the hardware queue. A software queue acknowledgement is used instead of a hardware queue acknowledgement, since the latter would provide no additional value for the user. Real component acknowledgement would require feedback of the empty flag of the hardware queue itself, either over another AXI stream port (reducing the total number of usable ports to 15 for the MicroBlaze) or via interrupt flags (which, from a software engineer's point of view, is a horrible way to do things).
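As a minimal sketch of how the progress tracking of such a state could look, assume a hypothetical `WriteState` class. The class name and all of its methods are illustrative assumptions and are not taken from the generated driver.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical write state: tracks how many of `total` values have been
// written to the board so far. Not the class used by the actual driver.
class WriteState {
public:
    explicit WriteState(std::size_t total) : total_(total), processed_(0) {}

    // Called by the (assumed) I/O layer once `count` more values are acknowledged.
    void advance(std::size_t count) {
        assert(processed_ + count <= total_);
        processed_ += count;
    }

    std::size_t processed() const { return processed_; }           // already written
    std::size_t remaining() const { return total_ - processed_; }  // still pending
    bool finished() const { return processed_ == total_; }

private:
    std::size_t total_;
    std::size_t processed_;
};
```

An application holding such a state after an asynchronous write could query `finished()` or `remaining()` at any time without blocking.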

### 3.2 Architecture

This section provides an overview of the driver's architecture, which should be followed by all implementations.



Figure 3.3: General architecture of the host-side driver

The architecture of the host-side driver is depicted in Figure 3.3. The host-side API consists of the components and ports described in Section 3.1, as well as modules handling communication with the medium, i.e., a reader and a writer module. An application communicates with the driver through components and ports.

In-going ports are connected to the writer, while out-going ports are connected to the reader. Writer and reader are not exposed to the application and steps necessary for communication are performed internally.

The architecture of the board-side driver is depicted in Figure 3.4. The board-side driver is basically a thread running on the processor of the board. Like the host-side driver, it handles communication with the medium through a writer and a reader module. Another module handles communication with the components implemented on the board. In our case, the components are attached with AXI stream interfaces, but using other interfaces is generally possible.



Figure 3.4: General architecture of the board-side driver

#### 3.2.1 Queueing

Due to restricted resources on the board and desired parallelism between the different components and ports, queueing is an important aspect of the driver and different kinds of queues are introduced.

The first and most important type of queues are the host-side *task queues*. Such a queue is introduced for each port and stores operations performed by the user. A write operation does not immediately result in a message sent to the board.

An asynchronous write returns without its values being sent to the board. Even worse, the board may not be able to receive (all) values at this time and may require the host to (re)transmit values at a later time. To enable delayed transmission or retransmission of values without stalling the application, queueing of these values is necessary. This is done within the so-called *task queues*. For in-going ports, these queues store write operations, all their values, and the number of values that have already been acknowledged by the board. Out-going ports also have a task queue, storing the number of already read values and the memory area to which they have been read.

The write task queues provide a first mechanism of *flow control*. Values are queued up at the host when the board buffers have been filled and no more values can be received, until the board signals that it can receive more values. Additionally, they offer *congestion avoidance* to a small degree, since several small write operations can be packaged together in one message. This reduces the overall number of messages being sent and, accordingly, the communication overhead produced.

Host-side out-going ports define a second queue, the *value queue*. These queues cache values received from the board. Host-side caching of values reduces read cycles by the application, since values are readily available when a read operation is performed. Resource consumption on the board can be lowered, since only a small number of values have to be cached board-side. Read queues are usually assumed to be infinitely long and values are *forwarded* from the board into these queues automatically (cf. Section 3.2.2).

With only host-side queues, the board-side driver would only be capable of receiving and processing single values at a time, which have to be directly written to a component. Naturally, sending each value individually results in an increase of messages and therefore communication overhead. To enable reception of multiple values at once, a board-side queue (or software queue) is introduced for each in-going port on the board. Received values are stored in these queues first and forwarded to the target component later on, if the component can process further values. In contrast to the client-side queues, they are size-restricted and can only hold a certain number of values, specified in the board description file. Similar queues are introduced for out-going ports, to reduce traffic caused by values being sent back to the host-side driver. These queues cache results from components, which are then sent in one message.

Since the FPGA usually only has a single processor and is not capable of multi-threading, the board-side driver can only perform a single operation at a time. If a component has finished processing and requires a new value to continue, it has to wait until the driver serves the corresponding in-going port. To reduce this downtime and increase overall throughput, a queue is implemented in hardware and placed in between the component and the stream interface of the processor. These hardware queues provide the component with a new value as soon as required. Re-filling of the hardware queue may take several cycles, but values can then be written en bloc without regard to the computation speed of the attached component. Since hardware queues are actually less efficient than software queues in terms of resource consumption on the board, they should be even smaller in size. Again, similar queues on out-going ports cache results, so the component does not have to wait for the driver thread to read the value from the port before the next computation.

#### 3.2.2 Forwarding

Values generated on out-going ports on the board are usually automatically forwarded to the read queues of the host-side driver. This leads to shorter read cycles, since values are usually available at the host-side driver and no polling cycle has to be performed. In addition, board-side queues at out-going ports can be smaller, since values can immediately be sent as soon as available without waiting for a poll. They are instead stored on the host, which is assumed to have more memory available, until the application actually requests the computed values. However, depending on the application and the specific port, forwarding can also lead to problems:

- ports generating a constant, infinite stream of values, e.g. random number generators, will never stop sending values
- considering status ports, a read operation should return the current status, not some stored value
- forwarding in general can lead to unnecessary traffic, since values are sent without being required and as soon as available

To counter these problems, Loopy also supports traditional polling ports. These are basically ports without value forwarding. Instead, the host-side driver explicitly requests values as soon as a read operation is performed on such a port. While this generally results in longer response cycles compared to forwarding ports, values are only sent if required. This obviously is imperative for status ports or ports generating an infinite amount of values. In other scenarios, polling ports may reduce medium traffic in certain cases, but they also result in poll messages being sent by the host-side driver, increasing traffic in other cases. Therefore, switching such a port into polling mode should be considered carefully.

While it does not always make sense (e.g. in the case of status ports), polling ports may still have a read queue. This is especially useful in the scenario of a random number generator, where values are endlessly generated without additional input and are not constantly required, but are still to be processed en bloc. In contrast to forwarding ports, the read queue of a polling port is limited. The driver will automatically keep the read queue filled. Reading a set of values will cause the host-side driver to request exactly this number of values, re-filling the queue.

In contrast to the usual propagation of queue size parameters, all queue sizes of polling ports (including board-side queues) are set to 0. If other queue sizes are desired, they have to be explicitly declared in the port instance definition. This is done to prevent user errors due to incorrect board-side queue sizes and for convenience, since the most common application of polling ports is status ports.

Note, that forwarding and polling have no effect on the host-side API. Only the response times and memory behaviour of the driver may change.

#### 3.2.3 I/O Threads

As stated in Section 3.1.2, the driver does support asynchronous writes and reads. The order of messages to the same port has to be maintained, while the order of messages to different ports can be changed. The processing of operations at ports is independent from other ports. In particular, a synchronous operation on a port should not influence an asynchronous operation on another port.

Consider the following example to understand why earlier, asynchronous operations should still be processed during a synchronous operation. A board specification contains an adder component with two in-going ports and an out-going port. If values are written to both ports, they will be consumed and a result will be returned at the out-going port. Consider further that a value is currently stored at port A. The user now writes another value to port A asynchronously, followed by writing three values to port B synchronously. The value for port A cannot be written directly, since port A is still blocked with a value. However, the first value for port B can be written, and both values are consumed. Now the second value for port B can be written, but it cannot be consumed before another value arrives at port A. If write operations to different ports are performed sequentially, the asynchronous write to port A will never reach this port, since the application is still blocked by the write to port B. Allowing asynchronous writes to be processed in parallel resolves this situation. The value for port A can be written in between the writes to port B or at any point afterwards.

Parallel, independent processing of operations can be realised by dedicated I/O threads, handling communication between host-side and board-side driver. These threads modify the client-side queues described above in Section 3.2.1.

The most intuitive solution to enable independent processing of requests on ports is the introduction of a single thread for each port. However, this results in a possibly large number of threads. Depending on the host platform, a smaller number of threads might be preferable. It is not required to parallelise all ports, but only continue reading and writing of values while the API is blocked by a synchronous operation. This can be achieved using a single, dedicated I/O thread, that handles all communication between host-side and board-side driver. Since reading from the medium involves longer periods of listening for messages, which would unnecessarily delay write operations, it is preferable to introduce separate threads for writing and reading. Consequently, three threads are running on the host: the application thread itself calling the API, and the two I/O threads handling communication between the host-side and board-side driver.

The threads have to be started with a special method startup() before using the API and should be shut down afterwards with another method shutdown().

#### 3.2.4 Bitwidth Translation

As described in Section 3.1.2, ports allow arbitrary bitwidths. As the size of the AXI stream interface directly connected to the FPGA's processor is usually fixed, width translation has to occur somewhere in between. Performing this translation on-board consumes limited board resources. Doing the translation on the client, however, increases communication overhead, since padding bits have to be sent for non-32-bit-aligned values. Depending on the application, both solutions can make sense.

Generally, the translation is performed by splitting and padding of values. If the width of the value is not a multiple of the target bitwidth, it is padded with leading zeroes up to the next multiple. The value is then split into fragments with the target bitwidth. Reassembly of the value always has to occur in hardware after the values have been sent over the FPGA processor's AXI stream interface. To increase memory efficiency of the hardware queues, reassembly is performed before these queues.

Since bitwidths are expected to mostly be multiples of the width of the processor's interface (32 bit in the case of a Virtex-6 MicroBlaze), we chose the approach of host-side translation. In this case, the bitwidth is hidden from the I/O threads and the write and read states provide a view in 32-bit words as well as a view in their actual bitwidth.
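The padding and splitting step can be sketched as follows. This is only an illustration under stated assumptions: the function names `to_words` and `from_words` are hypothetical, values are limited to 64 bits for brevity, and the most significant fragment is emitted first, which the text above does not specify.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Splits the low `bitwidth` bits of `value` into 32-bit words.
// Assumptions: 1 <= bitwidth <= 64, most significant fragment first.
// The actual driver may order fragments differently.
std::vector<std::uint32_t> to_words(std::uint64_t value, unsigned bitwidth) {
    assert(bitwidth >= 1 && bitwidth <= 64);
    // Mask off bits beyond the declared width, so the padding bits are zero.
    if (bitwidth < 64) value &= (std::uint64_t{1} << bitwidth) - 1;
    unsigned words = (bitwidth + 31) / 32;  // round up to a multiple of 32 bits
    std::vector<std::uint32_t> out;
    for (unsigned i = words; i-- > 0;) {
        out.push_back(static_cast<std::uint32_t>(value >> (32 * i)));
    }
    return out;
}

// Reassembles the fragments produced by to_words (at most two words here).
std::uint64_t from_words(const std::vector<std::uint32_t>& words) {
    assert(words.size() <= 2);
    std::uint64_t value = 0;
    for (std::uint32_t w : words) value = (value << 32) | w;
    return value;
}
```

For a 33-bit value, two words are sent, of which 31 bits are padding. This is the communication overhead mentioned above for values that are not 32-bit aligned.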

### 3.3 Protocol

This section covers the transmission protocol between client and server applications.

The protocol is based on the assumptions made above about client and server, specifically the used buffers and client I/O threads allowing parallelism with sending and receiving messages. Furthermore, the underlying medium and protocol are expected to be reliable and order-preserving.

#### 3.3.1 Control Flow

This section describes the state of the driver and summarises messages that are sent during each transition. To simplify the control flow graphs, the following variables and functions are introduced:

- v: Values of some sort (e.g. an array of integers).
- a: Memory addresses (e.g. an array of addresses).
- n: The size of the board-side queue (statically known).
- i: An unsigned (i.e. non-negative) integer number, smaller than n.
- size(q): The number of values currently held by queue q.
- empty(q): true if size(q) == 0, false otherwise.
- full(q): true if the queue is full, false otherwise.
- store(q, v): Stores the values v in queue q.
- take(q): Returns the first value of queue q and removes it from the queue.
- drop(q, i): Drops the first i values of q.
- peek(q, i): Returns the first i values of q without removing them. Returns fewer values if fewer are available, i.e., if size(q) < i.
- asgn(a, v): Assigns a value v to an address a.
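The queue operations listed above can be sketched in C++ as free functions over a small queue type. The `Queue` struct, its `capacity` member and the convention that a capacity of 0 means "unbounded" are assumptions made for this illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical queue type; capacity 0 is treated here as "unbounded".
struct Queue {
    std::deque<int> items;
    std::size_t capacity = 0;
};

std::size_t size(const Queue& q) { return q.items.size(); }
bool empty(const Queue& q) { return size(q) == 0; }
bool full(const Queue& q) { return q.capacity != 0 && size(q) >= q.capacity; }

// store(q, v): appends all values of v to q.
void store(Queue& q, const std::vector<int>& v) {
    q.items.insert(q.items.end(), v.begin(), v.end());
}

// take(q): returns and removes the first value.
int take(Queue& q) {
    assert(!empty(q));
    int front = q.items.front();
    q.items.pop_front();
    return front;
}

// drop(q, i): removes the first i values.
void drop(Queue& q, std::size_t i) {
    assert(i <= size(q));
    q.items.erase(q.items.begin(), q.items.begin() + static_cast<std::ptrdiff_t>(i));
}

// peek(q, i): copies the first i values, or fewer if size(q) < i.
std::vector<int> peek(const Queue& q, std::size_t i) {
    std::size_t count = std::min(i, size(q));
    return std::vector<int>(q.items.begin(),
                            q.items.begin() + static_cast<std::ptrdiff_t>(count));
}
```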

Messages are exchanged between the application and the client-side driver, as well as between client-side and board-side driver. The following messages are the most important ones influencing the state of the driver:

- wrt(v): Request from the application to write values v to the port.
- read(a): Request from the application to read values to the addresses a.
- notify: Notifies the application that all tasks queued up at a port have been processed.
- data(v): A data message containing values v for or from a port.
- ack(i): Acknowledges successful reception of i values.
- poll: A request message for additional data.

The communication between host-side and board-side driver usually happens in the context of a port. Since messages sent to or from ports are independent from each other, each port maintains its own state and can perform transitions independent of the other ports. The following paragraphs describe the control flow of these ports. Note, that the host-side and board-side state of the same port directly interact with each other, i.e., exchange messages with each other.

The state of a bi-directional port is represented by the concatenation of the corresponding states for in-going and out-going ports. The overall state of the host-side, or board-side driver respectively, is constructed by concatenation of the individual states of all ports.

As stated in Section 3.2.3, not every port has its own thread. This is especially true for the board-side driver application, which consists of only a single thread. Instead, a scheduler decides which port may perform a transition. Usually, the schedulers for the individual threads simply iterate over all ports as long as there are values (with some upper bound). The only exception is the client-side reader thread, which consumes messages as soon as they arrive but in turn does not send messages on its own.

#### **In-going Ports**

On the host, the state of an in-going port is physically represented by two variables q and s. q denotes the task queue, while s stores the number of values in transit, i.e. those that have been sent but not yet acknowledged. A more abstract view of the state of an in-going port is provided in Figure 3.5. Each state represents a combination of these variables. Changes to these variables are not explicitly denoted in the diagram, but should be intuitive considering the operations described above.

If a write from the application occurs (triggered by the user performing a write operation), the write operation and its values are stored in the task queue. If the task queue is not empty, the writer thread will take a peek at the first n values and send them to the board-side driver, also setting the transit counter correctly. It then waits for an acknowledgement of these values. Acknowledged values get removed from the task queue. If all sent values got acknowledged in one go, the writer continues with the next set of values. If not all values got acknowledged, the board-side queue has been filled. In this case, the writer thread waits for a board-side data poll. Once the task queue has been cleared, the application is notified. If the last performed operation on this port was a blocking one, the application may now continue.
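The behaviour described above can be sketched as a small state holder with one method per event. This is a simplified illustration, not the driver's implementation: the struct `InPortHost` and its method names are hypothetical, and the task queue is flattened to plain values instead of write operations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical host-side state of one in-going port.
struct InPortHost {
    std::deque<int> q;              // task queue, flattened to plain values
    std::size_t s = 0;              // values in transit (sent, not yet acknowledged)
    bool waiting_for_poll = false;  // set after a partial acknowledgement
    std::size_t n;                  // board-side queue size

    explicit InPortHost(std::size_t board_queue_size) : n(board_queue_size) {}

    // wrt(v): the application queues values.
    void write(const std::vector<int>& v) { q.insert(q.end(), v.begin(), v.end()); }

    // Writer step: peek at up to n values and mark them as in transit.
    // Returns the payload of the data message, or nothing if the port must wait.
    std::vector<int> next_data() {
        if (s != 0 || waiting_for_poll || q.empty()) return {};
        s = std::min(n, q.size());
        return std::vector<int>(q.begin(), q.begin() + static_cast<std::ptrdiff_t>(s));
    }

    // ack(i): drop the acknowledged values. Returns true when the task queue
    // is empty, i.e. when the application would be notified.
    bool on_ack(std::size_t i) {
        assert(i <= s);
        q.erase(q.begin(), q.begin() + static_cast<std::ptrdiff_t>(i));
        waiting_for_poll = (i < s);  // partial ack: the board-side queue is full
        s = 0;
        return q.empty();
    }

    // poll: the board can receive values again.
    void on_poll() { waiting_for_poll = false; }
};
```

Values that were sent but not acknowledged stay in `q` and are simply sent again after the poll, which corresponds to the retransmission described in Section 3.2.1.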



Figure 3.5: Host-side control flow graph of an in-going port

Several loop transitions have been left out in order to simplify the graph. Messages ack or poll in states other than those specified in the graph will simply be ignored. Application writes in any state other than the ones explicitly marked will result in the values being appended to q. Reception of a debug message in any state results in immediate printing of the message to the configured logger.

The state of a port on the board is represented by the corresponding software queue q and hardware queue r.

An in-going port, as shown in Figure 3.6, sends acknowledgements for received data packages and stores received values. If values have been stored and the hardware queue is not already full, the scheduler might switch the port to consuming messages. In this state, values are shifted from the software queue to the hardware queue, until either the hardware queue is filled or the software queue is emptied. If the software queue was full before shifting the first value, a poll is sent in addition. After shifting all values possible, the port switches back to listening for more values. Note that these states are not actually represented within the ports themselves, but only by the current position of the scheduler.
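A rough sketch of this board-side behaviour is given below. It is written in C++ for consistency with the other examples and is not the board-side code itself. The struct `InPortBoard`, its members and the use of a `std::deque` as a stand-in for the hardware queue are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical board-side state of one in-going port.
struct InPortBoard {
    std::deque<int> q;  // software queue, holds at most n values
    std::deque<int> r;  // stand-in for the hardware queue, holds at most m values
    std::size_t n;
    std::size_t m;

    InPortBoard(std::size_t software_size, std::size_t hardware_size)
        : n(software_size), m(hardware_size) {}

    // data(v): store as many values as fit; the return value is the ack count.
    std::size_t on_data(const std::vector<int>& v) {
        assert(q.size() <= n);
        std::size_t accepted = std::min(v.size(), n - q.size());
        q.insert(q.end(), v.begin(), v.begin() + static_cast<std::ptrdiff_t>(accepted));
        return accepted;
    }

    // Consuming state: shift values until r is full or q is empty.
    // Returns true if a poll should be sent to the host, i.e. if the software
    // queue was full beforehand and at least one value was shifted.
    bool consume() {
        bool was_full = (q.size() == n);
        std::size_t moved = 0;
        while (!q.empty() && r.size() < m) {
            r.push_back(q.front());
            q.pop_front();
            ++moved;
        }
        return was_full && moved > 0;
    }
};
```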



Figure 3.6: Board-side control flow graph of an in-going port

#### **Out-going Ports**

The state of an out-going port on the host is represented by two queues q and r. q is the read task queue, which contains memory addresses, where values should be read to. r is the value queue, which stores values read from the medium but not requested from the application so far.

Similar to in-going ports, read requests from the application are stored in the task queue. Values received from the board are stored in the value queue. While both are not empty, values are taken from r and written to the addresses stored in q. The application is notified once all read requests have been served (see Figure 3.7). If the port is forwarding, values generated on the board are automatically forwarded to the host-side driver and the host-side queues are assumed to be unbounded. Consequently, no form of flow control is required.
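The interplay of the two host-side queues can be illustrated with the following sketch. The struct `OutPortHost` and its methods are hypothetical names, and values are plain `int`s for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical host-side state of one forwarding out-going port.
struct OutPortHost {
    std::deque<int*> q;  // read task queue: addresses still waiting for a value
    std::deque<int> r;   // value queue: received values not yet requested

    // read(a): queue `count` consecutive addresses starting at `a`.
    // Returns true if all read requests have been served (notify).
    bool read(int* a, std::size_t count) {
        assert(a != nullptr || count == 0);
        for (std::size_t i = 0; i < count; ++i) q.push_back(a + i);
        return serve();
    }

    // data(v): cache values received from the board.
    bool on_data(const std::vector<int>& v) {
        r.insert(r.end(), v.begin(), v.end());
        return serve();
    }

private:
    // While both queues are non-empty, assign the next value to the next address.
    bool serve() {
        while (!q.empty() && !r.empty()) {
            *q.front() = r.front();
            q.pop_front();
            r.pop_front();
        }
        return q.empty();
    }
};
```

Values that arrive before any read request simply stay in `r`, so a later read can be served immediately without a round trip to the board.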

On the board, out-going ports are represented by their software and hardware queues q and r, similar to in-going ports. The board representation is comparatively simple (Figure 3.8), since no incoming messages have to be processed. Instead, values are simply shifted to the software queue as long as possible, i.e., until either the hardware queue is emptied or the software queue is filled. All collected values are then transmitted to the host-side driver.

#### **Polling Ports**

Polling ports mark a more complex out-going port, that does not automatically forward values. Instead, values are only forwarded once a poll message is received.



Figure 3.7: Host-side control flow graph of a forwarding, out-going port

The change for a host-side port is negligible. A poll is sent to the board-side driver, whenever a read request is dispatched from the application. Flow control is completely handled on the sending side using these poll messages.

The board-side graph gets slightly more complex (Figure 3.9), since it now also has to process an incoming message. The state of the port is augmented with a *poll counter* s, storing the number of requested, yet unserviced values. The counter is initialized with the host-side queue size. Poll requests are simply added to the counter. If the poll counter is greater than zero, the port may send values. The sending state is comparable to the forwarding port. It shifts values until either the software queue is full, the hardware queue is empty, or the poll counter reaches 0.

The port is only allowed to send as many values as requested, but is also restricted by the software queue size n. Consequently, it can send at most min(s,n) values per step.
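The bound of min(s,n) values per step follows directly from the three stop conditions of the sending state, as the following sketch shows. The struct `PollingOutPortBoard` and its members are hypothetical, and a `std::deque` again stands in for the hardware queue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical board-side state of one polling out-going port.
struct PollingOutPortBoard {
    std::deque<int> r;   // stand-in for the hardware queue filled by the component
    std::vector<int> q;  // software queue, holds at most n values
    std::size_t n;       // software queue size
    std::size_t s;       // poll counter: requested, yet unserviced values

    PollingOutPortBoard(std::size_t software_size, std::size_t host_queue_size)
        : n(software_size), s(host_queue_size) {}

    // poll with a count: poll requests are simply added to the counter.
    void on_poll(std::size_t count) { s += count; }

    // Sending state: shift values until q is full, r is empty, or s reaches 0.
    // Returns the values of the resulting data message (possibly none).
    std::vector<int> send_step() {
        std::size_t budget = std::min(s, n);
        while (q.size() < n && !r.empty() && s > 0) {
            q.push_back(r.front());
            r.pop_front();
            --s;
        }
        assert(q.size() <= budget);  // never more than min(s, n) values per step
        std::vector<int> message;
        message.swap(q);  // transmit all collected values, leaving q empty
        return message;
    }
};
```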

#### 3.3.2 Message Encoding

This section covers the translation of the above messages into messages on the communication medium. Messages can be split into header and payload. The header describes the payload to follow, the payload contains a number of 32-bit values that are sent to or from a component.



Figure 3.8: Board-side control flow graph of a forwarding, out-going port



Figure 3.9: Board-side control flow graph of a polling, out-going port

The first 8 bits of the header are reserved for the *protocol version* the message was encoded with. The only version currently available is version 1.

**Protocol Version 1** This version reserves the next 4 bits of the header for the *type* field, which determines which kind of message is represented. Depending on the type, the 4-bit *ID* field is used as an identifier for either ports, GPIO components or error types. Finally, the 16-bit field *size* marks either the size of the payload in 32-bit values or, for messages with only small data content, is used directly to store the data without utilizing the payload field, i.e. with a payload size of 0. This leads to messages that generally look as depicted in Figure 3.10.
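The header layout can be sketched as packing the four fields into one 32-bit word. The field widths are taken from the text. The placement of the version in the most significant bits, followed by type, ID and size, as well as the names `Header`, `encode` and `decode`, are assumptions made for illustration. The actual bit order on the medium is the one defined by Figure 3.10.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical in-memory view of a version 1 header.
struct Header {
    std::uint8_t version;  // 8 bits
    std::uint8_t type;     // 4 bits
    std::uint8_t id;       // 4 bits
    std::uint16_t size;    // 16 bits
};

// Packs the fields into one word: version | type | id | size (assumed order).
std::uint32_t encode(const Header& h) {
    assert(h.type < 16 && h.id < 16);
    return (std::uint32_t{h.version} << 24) | (std::uint32_t{h.type} << 20) |
           (std::uint32_t{h.id} << 16) | h.size;
}

// Inverse of encode.
Header decode(std::uint32_t word) {
    return Header{static_cast<std::uint8_t>(word >> 24),
                  static_cast<std::uint8_t>((word >> 20) & 0xF),
                  static_cast<std::uint8_t>((word >> 16) & 0xF),
                  static_cast<std::uint16_t>(word & 0xFFFF)};
}
```

Under these assumptions, a version 1 data message (type 1001) for port 3 with five payload words would be encoded as `0x01930005`.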

Values in this protocol version are expected to be properly aligned with 32-bit, meaning that, as described in Section 3.2.4, values are padded host-side to a multiple of 32-bit and then split into 32-bit blocks. Note, that due to the size being only a 16-bit field, only  $2^{16}-1$  such values can be written or read with a single message. The API has no such restriction and host-side queues are not bound by any size constraint either. This scenario has to be treated in the I/O thread, by splitting tasks accordingly.
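The splitting of large tasks in the I/O thread could look roughly as follows. The function `split_sizes` and the constant `kMaxPayloadWords` are hypothetical names. The sketch only computes the payload sizes of the individual messages, not the messages themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Largest payload size representable in the 16-bit size field: 2^16 - 1.
constexpr std::size_t kMaxPayloadWords = 65535;

// Returns the payload sizes of the messages needed to transfer `total`
// 32-bit words when a single message carries at most `limit` words.
std::vector<std::size_t> split_sizes(std::size_t total,
                                     std::size_t limit = kMaxPayloadWords) {
    assert(limit > 0);
    std::vector<std::size_t> sizes;
    while (total > 0) {
        std::size_t chunk = total < limit ? total : limit;
        sizes.push_back(chunk);
        total -= chunk;
    }
    return sizes;
}
```

In practice, the board-side queue size n usually limits a message long before the size field does, so this limit mainly matters for ports with very large board-side queues.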



Figure 3.10: Bitorder of the message on the medium

The control flow graphs in Section 3.3.1 specify what types of messages are required. A short overview of the messages together with their type encoding and their fields is provided in Table 3.1. The following paragraphs provide a more detailed description of the messages and their meaning.

| Message | Type | ID       | Size         | Payload |
|---------|------|----------|--------------|---------|
| Data    | 1001 | Port ID  | Payload Size | Yes     |
| GPIO    | 1110 | GPIO ID  | GPIO State   | No      |
| Ack     | 1111 | Port ID  | Ack Count    | No      |
| Poll    | 1010 | Port ID  | Poll Count   | No      |
| Reset   | 0000 | Unused   | Unused       | No      |
| Debug   | 0111 | Severity | Payload Size | Yes     |

Table 3.1: Overview of messages in protocol version 1
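The type encodings of Table 3.1 can be captured in a small enumeration. The bit patterns are taken from the table. The names `MessageType`, `parse_type` and `has_payload` are hypothetical and not necessarily those used in the driver.

```cpp
#include <cassert>
#include <cstdint>

// Message types of protocol version 1 with their 4-bit encodings (Table 3.1).
enum class MessageType : std::uint8_t {
    Reset = 0x0,  // 0000
    Debug = 0x7,  // 0111
    Data  = 0x9,  // 1001
    Poll  = 0xA,  // 1010
    Gpio  = 0xE,  // 1110
    Ack   = 0xF,  // 1111
};

// Converts the 4-bit type field; other bit patterns are not defined in version 1.
MessageType parse_type(std::uint8_t bits) {
    assert(bits == 0x0 || bits == 0x7 || bits == 0x9 ||
           bits == 0xA || bits == 0xE || bits == 0xF);
    return static_cast<MessageType>(bits);
}

// Only data and debug messages are followed by a payload; all other types
// carry their information in the size field.
bool has_payload(MessageType type) {
    return type == MessageType::Data || type == MessageType::Debug;
}
```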

**Data Message** A data message marks either a set of values for a specific component being sent from the host to the board or a set of values from a specific component being sent from the board to the host. As such, it requires an identifier for the target or source component as well as the size of the contained payload in words (i.e. 32-bit values). Values sent with a data message are expected to always be full 32-bit values.

The direction the data message is sent in, determines if it is addressed at an in-going or out-going port. Data messages from the host are directed at in-going ports, data messages from the board are directed at out-going ports. Consequently, 16 in-going and 16 out-going ports can be addressed using a 4-bit ID field.

**Acknowledgement** The acknowledgement confirms reception of a number of values by a specific component. For this purpose, no payload is required. Instead, the number of acknowledged values is encoded within the *size* field.

**Data Request (Poll)** A data request is used to inform the host that additional values can now be received. This is necessary if the queue was full beforehand, i.e. a partial acknowledgement was (most likely) sent. This poll requires neither a payload nor a size, but only the identifier of the component that can now receive values.

The data request is also used at polling ports. Here, it notifies the board that the host requires values from the port. The size field is used to specify how many values are requested.

As with data messages, the direction of the poll message determines if an in-going or out-going port is addressed. However, the translation is inverse to the data message. A poll from the host is addressed at an out-going port, a poll from the board at an in-going port.

**GPIO Message** A GPIO message is a special type of data message, addressed to a GPIO component. GPIO components use their own address space, disjoint from the addresses used by "normal" components. They are not connected via AXI stream interfaces but via direct memory addresses; consequently, they do not influence any port restrictions.

Furthermore, GPIO components only store their current state and perform no calculation like VHDL components. The state is represented by an 8-bit value and is encoded directly in the size field instead of the payload. Storing only the current state also means that no queues exist for GPIO components and no acknowledgements are required. The new state is simply written into (or read from) memory.

**Reset Message** This message is not specified in the protocol above. A reset message sent by the host-side driver resets the state of the board-side driver, clearing all queues and setting the reset flag for all components. The board-side driver acknowledges a successful reset by answering with a reset message. The ID and size fields are unused by the reset message. Reset messages are not implemented so far. The board requires a manual reset, sometimes even a full reboot.

**Debug Message** This message is also not specified in the protocol above. It marks a notification of some sort, sent by the board. This can be debug output of the driver running on the board, a warning message about skipped messages or an unhandled error that occurred on the board. For these messages, the payload contains a string. The ID bits are used to differentiate between the debug message types. The bit encoding is shown in Table 3.2. Values in between have been left unused for future use (e.g. finer-grained warnings or info messages).

While it is possible to provide debug output over the JTAG cable with which the board is programmed, this quickly slows down computation with larger debug outputs. Using the Ethernet connection for debug output as well vastly accelerates such computations.

Debug messages are generally disabled in the driver due to board-side memory issues when sending too many debug messages. Instead, debug messages are sent over UART despite the occurring slowdown.

| Severity | Encoding |
|----------|----------|
| Info     | 0011     |
| Warning  | 1000     |
| Error    | 1101     |

Table 3.2: Debug message encoding

### 3.4 Current Driver Implementations

The current implementations include a C++ host-side driver and a board-side driver for the Virtex-6 ML605 board. These implementations are described in detail here, giving driver developers an idea of what an implementation of this architecture can look like.

#### 3.4.1 C++ Host-Side Driver

The following sections highlight selected, important aspects of the C++ implementation of the host-side driver. For a more detailed description of individual classes and methods, please refer to the API specification generated by doxygen (remove the EXCLUDE\_SYMBOLS parameter in doxygen.cfg and re-run doxygen to get documentation for all classes) and to the code-level documentation.

#### Structure

The host-side driver is structured roughly into three groups of files and classes, located in different folders. The folder *api* contains everything the application should have access to, i.e., components and ports. The folder *io* contains files handling communication between the host-side driver and the medium. These files are not required directly by an application. The third group of files contains utility classes used by both io and api. These files are simply located in the root folder of the driver.

The api files used by the application have already been described in some detail in Section 3.1. A detailed description of the methods generated for writing and reading values can be found in the API specification of a generated driver. It is advisable to provide these operations for single values and for groups of multiple values. How these values are grouped depends on the language. The C++ implementation provides these operations for arrays (together with a size parameter) and for std::vectors of values.

#### I/O Handler

The driver implements two separate threads for write and read operations, which was presented as the preferred solution in Section 3.2.3. The reading thread uses the select call of the socket API, which waits for incoming messages without consuming CPU resources. The writing thread iterates over all in-going ports and sleeps afterwards. A global writer lock ensures that the writer is notified correctly when new data, an acknowledgement or a poll arrives.
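The waiting behaviour of the reading thread can be sketched as follows. This is a minimal illustration in C using the POSIX `select` call, not the code of the C++ driver; the function names `wait_readable` and `receive_bytes` are hypothetical. The sketch works on any file descriptor, including a connected TCP socket.

```c
#include <sys/select.h>
#include <unistd.h>

/* Block until fd is readable or timeout_ms has elapsed.
 * Returns 1 if data is available, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms) {
    fd_set readfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* the calling thread sleeps inside select, it does not spin */
    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0) return -1;
    return (rc > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}

/* Wait for data, then read up to len bytes into buf.
 * Returns the number of bytes read, 0 on timeout, -1 on error. */
long receive_bytes(int fd, void *buf, size_t len, int timeout_ms) {
    int ready = wait_readable(fd, timeout_ms);
    if (ready <= 0) return ready;
    return (long) read(fd, buf, len);
}
```

A reading thread would call `receive_bytes` in a loop and hand each received message to the protocol decoder. Note that a return value of 0 from `read` on a closed connection is not distinguished from a timeout in this sketch.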

#### Communication medium

The communication medium is part of the I/O handler of the driver and wraps lower-level communication (essentially the transport layer and below) between host and board driver. The communication medium abstracts from the technology actually used and provides a homogeneous API for the I/O threads. Initialisation specific to the network interface is generated as well and does not have to be written by the user (other than annotating configuration details in the board description).

Currently, three communication media are envisioned:

- Ethernet Lite, which is already implemented,
- USB/UART, which is considered as a second interface, but is not implemented yet, and
- PCI Express, which will not be implemented in the initial driver generator at all.

#### 3.4.2 Virtex 6 ML 605 Board-Side Driver

The following sections highlight selected, important aspects of the implementation of the Virtex-6 ML605 board-side driver. For a more detailed description of individual classes and methods, please refer to the API specification generated by doxygen and to the code-level documentation.

#### Structure

The board-side driver is structured similarly to the host-side driver. There are three groups of files, located in dedicated folders.

The first group, files handling communication over the medium, is located in the *medium* folder. The content of this folder varies depending on the medium the board should be attached to. Medium-specific setup and communication is handled here. A contained folder *protocol* includes files for protocol encoding and decoding (independent of the attached medium). An incoming message is passed to the protocol decoder and subsequently delegated according to its header.

A second folder *components* contains files concerning hardware components of the board and communication with these components. This includes VHDL components attached with AXI stream interfaces as well as GPIO components or the interrupt controller of the board.

The third group of files contains utility functions and structures, used by both other groups. These files are located directly in the root folder. This includes software queues and the setup required for these.

The main method of the board-side driver thread is located in main.c. It calls the initialisation procedures of the medium and of all components and starts the scheduling loop.

#### Communication Medium

**Ethernet Lite** Communication over Ethernet is based on the lightweight IP (lwIP) stack, originally developed by Adam Dunkels<sup>2</sup>. It is executed completely on the CPU of the board.

It is possible to implement more efficient Ethernet communication using dedicated VHDL components instead of running the lwIP stack on the general-purpose CPU. Such a component could easily be integrated in the form of a new Communication Interface. An example of a dedicated VHDL communication component together with a client API is described in [1, 2]. This introduces additional communication between the programmable logic and the processor, but might still be faster than the current software solution (especially on boards using a comparably slow soft processor). It might also increase network throughput in general, since there are components capable of Gigabit Ethernet connections, while our current implementation only supports 100 Mbit/s.

**USB/UART** Communication over USB/UART is rather slow, but it provides a simple method of communication that most boards are capable of. USB/UART support has been deferred to a later version, since support of the ZedBoard has been given higher priority.

**PCI Express** Communication using a PCI Express interface offers the highest bandwidth of the proposed media. However, it is only available on few boards. While loopy can easily be extended to support additional interfaces and we encourage developers to do so, we do not support PCIe out of the box. According to [1], there exists a Xilinx wrapper for PCIe communication. This component can probably be used to implement PCIe communication in a manner similar to the dedicated Ethernet components described above.

#### Scheduler

The default scheduling loop performs the following operations:

```
unsigned int pid;
unsigned int i;

while(1) {
    // receive a package from the interface
    // stores data packages in sw queue
    medium_read();
```

<sup>2</sup> The lwIP stack is documented in a wiki available at http://lwip.wikia.com/wiki/LwIP\_Wiki

First, it checks for incoming messages and processes their contents. Most of the time, this includes storing values in the in-going software queue and acknowledging them. More details about message handling can be found in Section 3.3.

```
    // write data from sw queue to hw queue (if possible)
    for(pid = 0; pid < IN_STREAM_COUNT; pid++) {
        for(i = 0; i < inQueue[pid]->cap; i++) {
            // go to next port if the sw queue is empty
            if(inQueue[pid]->size == 0) break;

            // try to write the first value, skip if the hw queue is full
            if(axi_write(peek(inQueue[pid]), pid)) break;

            // remove the written value from the sw queue
            take(inQueue[pid]);

            // if the queue was full beforehand, poll
            if(inQueue[pid]->size == inQueue[pid]->cap-1) send_poll(pid);
        }
    }
```

The loop also shifts messages from in-going software queues to in-going hardware queues and vice versa from out-going hardware queues to the out-going software queue.

```
    // read data from hw queue (if available) and cache in sw queue
    // flush sw queue afterwards
    for(pid = 0; pid < OUT_STREAM_COUNT; pid++) {
        for(i = 0; i < outQueueCap[pid] && ((!isPolling[pid]) ||
                pollCount[pid] > 0); i++) {
            // try to read, break, if it fails
            if(axi_read(&outQueue[outQueueSize], pid)) break;

            // otherwise increment size counter
            outQueueSize++;

            // decrement the poll counter, if the port was polling
            if(isPolling[pid]) pollCount[pid]--;
        }

        // flush sw queue
        flush_queue(pid);
        outQueueSize = 0;
    }
} // end of while(1)
```

Finally, once the out-going queue is filled or no more values are available, it wraps the values into a message and writes this message to the medium.

The loop can be overridden by the user, but is required to perform all these operations at some point for the driver to work correctly. Overriding the default scheduler can increase the performance of the generated driver for specific applications.
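The scheduling loop relies on a software queue with the fields `size` and `cap` and the operations `peek` and `take`. A minimal sketch of such a queue as a ring buffer is given below. Only these four names appear in the listing above; the struct layout, the element type `int`, and the helpers `queue_create`, `put` and `queue_destroy` are assumptions made for illustration. Like `axi_write` in the loop, `put` returns a non-zero value on failure.

```c
#include <stdlib.h>

/* Fixed-capacity FIFO; size and cap are used as in the scheduling loop. */
typedef struct {
    unsigned int cap;   /* maximum number of stored values */
    unsigned int size;  /* current number of stored values */
    unsigned int head;  /* index of the oldest value */
    int *values;        /* storage for cap values */
} Queue;

/* Allocate an empty queue; returns NULL if cap is 0 or memory is exhausted. */
Queue *queue_create(unsigned int cap) {
    Queue *q;
    if (cap == 0) return NULL;
    q = malloc(sizeof *q);
    if (q == NULL) return NULL;
    q->values = malloc(cap * sizeof *q->values);
    if (q->values == NULL) { free(q); return NULL; }
    q->cap = cap;
    q->size = 0;
    q->head = 0;
    return q;
}

/* Append a value; returns 1 if the queue is full, 0 on success. */
int put(Queue *q, int value) {
    if (q->size == q->cap) return 1;
    q->values[(q->head + q->size) % q->cap] = value;
    q->size++;
    return 0;
}

/* Return the oldest value without removing it; the queue must not be empty. */
int peek(const Queue *q) {
    return q->values[q->head];
}

/* Remove and return the oldest value; the queue must not be empty. */
int take(Queue *q) {
    int value = q->values[q->head];
    q->head = (q->head + 1) % q->cap;
    q->size--;
    return value;
}

/* Release the queue and its storage. */
void queue_destroy(Queue *q) {
    free(q->values);
    free(q);
}
```

Separating `peek` from `take` lets the loop leave a value in the software queue when the hardware queue rejects it, and the check `size == cap-1` after a `take` detects that the queue was full just before.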

#### **Bitwidth Translation**

As explained in Section 3.2 and Section 3.3, transmitted values are padded to a multiple of 32 bits for transmission and have to be translated back at the board-side driver. Since the AXI stream ports of the MicroBlaze processor on the Virtex 6 (as well as the general-purpose AXI ports of the Zynq platform) are fixed at 32 bits, this translation has to occur after the software part of the board-side driver. Consequently, a bit-translator component is placed between the hardware queues on the board and the interface to the MicroBlaze. This translator is omitted if the values expected by the attached component are indeed 32-bit values.
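The arithmetic behind the padding can be sketched as follows. The function names are hypothetical; the sketch only assumes what is stated above, namely that values are padded to the next multiple of 32 bits and that no translator is needed for values that are exactly 32 bits wide.

```c
/* Number of 32-bit words needed to transmit one value of the given width. */
unsigned int words_for_bitwidth(unsigned int bitwidth) {
    return (bitwidth + 31u) / 32u;
}

/* Width of the value after padding to a multiple of 32 bits. */
unsigned int padded_bitwidth(unsigned int bitwidth) {
    return words_for_bitwidth(bitwidth) * 32u;
}

/* A translator is only omitted for ports that are exactly 32 bits wide. */
int needs_translator(unsigned int bitwidth) {
    return bitwidth != 32u;
}
```

For example, a 12-bit port is transmitted as one 32-bit word, while a 40-bit port occupies two words (64 bits) and therefore also requires a translator.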

#### **Port Count**

The MicroBlaze on the Virtex 6 only allows 16 AXI master interfaces and 16 AXI slave interfaces. As a result, at most 16 in-going and 16 out-going ports can be specified when generating a Virtex 6 board driver. This restriction can be circumvented by implementing a multiplexer which delegates values to one of several components, but this is left to the user.

#### 3.4.3 ZedBoard

The second supported board is the Avnet ZedBoard. In contrast to the Virtex 6, this board has its own processor, a dual-core ARM, and does not depend on a MicroBlaze soft processor to run the software. Most peripheral devices, like the Ethernet adapter, are connected directly in the processing system outside the programmable logic.

While removing these components from the design certainly frees some FPGA resources, the ARM processor does not have the same number of connections to the programmable logic as the MicroBlaze. Communication between programmable logic and processing system is generally performed through two general-purpose AXI slave ports and two general-purpose AXI master ports. In addition, there are four high-performance slave ports for fast, memory-based transactions.

The board-side driver of the ZedBoard uses a CDMA connected to a single general-purpose AXI master and a high-performance AXI slave. This CDMA component delegates data to AXI-to-AXI-stream converters when a write occurs at the processor. These components in turn take up FPGA space on the board.

Despite consuming memory, a similar implementation may be worth considering for the Virtex 6. An arbitrary number of slaves connected to the CDMA increases flexibility in terms of the port count, component acknowledgements for synchronised writes can be implemented easily without losing ports for user components, and the CDMA keyhole feature could improve write performance for larger blocks of values.

There are still several issues in the ZedBoard implementation. GPIO components break the driver, and the generated architecture using a CDMA is not complete. Ethernet communication is working, and the complete board-side driver code can be executed without attached programmable logic.

# Chapter 4

# Generator

This chapter explains the code generator itself. The information is intended for future developers of the driver generator, **not** for users (you are welcome to read it anyway if you are interested, but it will not provide additional usage information).

## 4.1 Used Libraries

This section gives a short overview of the libraries used and explains what they are used for and why. These libraries are included in the repository and in the built jar file and require no further user interaction. Still, as the generator code depends on them, they are introduced here for future developers.

#### 4.1.1 JFlex & CUP

JFlex is a scanner generator, CUP a parser generator. Together with a few wrapping Java classes, they make up the frontend of the generator and are used to parse .bdl files into abstract syntax trees.

#### 4.1.2 Katja

The driver generator uses the Katja tool, developed by the Software Technology Group of the University of Kaiserslautern. This tool generates the models for the frontend and backend of the Loopy generator<sup>1</sup>. To be more specific, it provides the AST built by the CUP parser as well as several models used by the generation backends. The models are described in detail in Section 4.3. Katja is available under GPLv3.

<sup>1</sup> Since these models will be described using their Katja specifications, it is strongly recommended to read through the Katja specification provided in the form of three technical reports at https://softech.informatik.uni-kl.de/Homepage/Katja

#### 4.1.3 Apache Commons

The Apache Commons libraries provide useful features for all kinds of applications that are not already integrated in the Java API, for example easy-to-use file copy operations. Loopy uses two packages of this project. The IO library is used for file operations and provides methods for working with file names. The Lang library provides extended functionality for Java's base classes, especially Strings. Apache uses its own license<sup>2</sup> for Apache libraries. While this license is not fully compatible with the GPLv3, Apache-licensed products can still be used within GPL projects<sup>3</sup>.

## 4.2 Generation Process

The overall process of the driver generator is described by Figure 4.1. The source .bdl file describing the system board design is first translated into an internal representation of the board, i.e. an abstract syntax tree. This AST is used as input for the generator, which in turn outputs models of all source files. These models are translated into files by the respective unparsers.

Several backends exist to create different types of models. Host backends generate models for source code running on a host machine, board backends generate models for source code running on a board respectively. These backends implement a Visitor, which visits all components of the board, and manipulate the source model accordingly. Currently, the only available backends are the C++ host backend and the Virtex 6 ML 605 board backend.

## 4.3 Models

The driver generator defines several models, which are used to define input and output artefacts. All models are generated from Katja grammars. These grammars are used in the following sections to explain the models.

#### 4.3.1 Board Model

The board model is the model underlying the board description language. It is generated from a .bdl file using a JFlex scanner and a CUP parser. The board model is the only input available to the driver generator and is translated by backends into other models. If the board description used is not correct, the frontend will abort and provide the cause in an error message. The required properties for correctness are described in Section 2.2.

```
BDLFile (Imports imports, Options opts, Cores cores, GPIOs gpios, Instances insts, Medium medium, Scheduler scheduler)
Position (String filename, Integer line)
```

<sup>2</sup> http://www.apache.org/licenses/LICENSE-2.0

<sup>3</sup> See http://www.apache.org/licenses/GPL-compatibility.html for details about GPL/Apache compatibility



Figure 4.1: A rough sketch of the translation process so far

The board model is comparably flat, containing most elements at the top level. Most of the components of the BDL model are augmented with a position, storing the document and the line within the document in which they occur. This makes it possible to report not only the reason for an error when parsing and analysing .bdl files, but also the position of the error within the file.

```
Imports * Import
Import (String file, Position pos)

Options * Option
Option = HWQUEUE (Position pos, Integer qsize)
       | SWQUEUE (Position pos, Integer qsize)
       | BITWIDTH(Position pos, Integer bit)
       | POLL    (Position pos, Integer count)
       | DEBUG   (Position pos)
```

Imports reference another file by a string. These files are concatenated into a single, large file before processing. Caching of file names resolves circular imports. Note that files do not necessarily have to be complete on their own: a file may lack several required parts and only become complete when it is imported by another file.
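The idea of resolving circular imports by caching file names can be illustrated with a short sketch. The generator itself is written in Java; the C code below is not taken from it, and all names (`ImportCache`, `import_once`, `BdlFile`, `resolve_imports`) as well as the in-memory file table are hypothetical.

```c
#include <string.h>

#define MAX_FILES   16
#define MAX_IMPORTS 4

/* Names of the files that have already been processed (the cache). */
typedef struct {
    const char *names[MAX_FILES];
    int count;
} ImportCache;

/* Record a file name. Returns 1 if the file is new,
 * 0 if it was seen before or the cache is full. */
int import_once(ImportCache *cache, const char *name) {
    int i;
    for (i = 0; i < cache->count; i++)
        if (strcmp(cache->names[i], name) == 0) return 0;
    if (cache->count == MAX_FILES) return 0;
    cache->names[cache->count++] = name;
    return 1;
}

/* A file together with the names of the files it imports (NULL-terminated). */
typedef struct {
    const char *name;
    const char *imports[MAX_IMPORTS];
} BdlFile;

/* Visit a file and, recursively, everything it imports, each file only once. */
void resolve_imports(const BdlFile *files, int n, const char *name,
                     ImportCache *cache) {
    int i, j;
    if (!import_once(cache, name)) return; /* already handled: breaks cycles */
    for (i = 0; i < n; i++) {
        if (strcmp(files[i].name, name) != 0) continue;
        for (j = 0; j < MAX_IMPORTS && files[i].imports[j] != NULL; j++)
            resolve_imports(files, n, files[i].imports[j], cache);
    }
}
```

After the traversal, the cache lists every reachable file exactly once, in the order in which the files were first encountered, even if two files import each other.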

Options specify several parameters to configure the drivers on both the board side and the host side. The queue options HWQUEUE and SWQUEUE specify the size of hardware and software queues. The POLL parameter marks a polling port. Its integer parameter can be used to specify the size of a host-side queue for caching a few values. See Section 3.2.1 for an introduction to the different queue types. BITWIDTH specifies the width of a port, and DEBUG enables debug mode for the generated driver.

The queue and debug options are available on top level, others can be used later on in the document.

```
Cores * Core
Core (String name, String version, Position pos, Imports source,
Ports ports)

Ports * Port
Port = CLK(String name, Position pos, Integer frequency)
| RST(String name, Position pos, Boolean polarity)
| AXI(String name, Position pos, Direction direction, Options opts)

Direction = IN() | OUT() | DUAL()
```

A core is identified by a name and version String. It marks a template for components on the board and therefore has to specify its VHDL sources and its port interface. The interface is already described in the VHDL sources and could also be deduced in a later driver generator version by parsing these sources instead of only copying them.

A port is either a clock port, a reset port, or an AXI stream port. The first two declare how the respective ports in the VHDL sources are named and provide additional parameters for these ports. Both clock and reset ports are used by the board-side driver for control flow and are never visible from the host-side API. AXI stream ports use the name as identifier. Such a port has a direction, which is either in-going, out-going or both. Note that bi-directional ports are not supported by the AXI stream interface this driver is designed for; they exist in the frontend only for completeness, and no backend currently supports them. Ports can be configured further with options, though currently the only meaningful option in the context of a port is the bitwidth option. Queue options might make sense here as a declaration that is more general than instance bindings but more specific than the global options: all bindings to the port would then get the queue size automatically. The same applies to the poll option, i.e. it might be useful to flag the port itself as polling, so that all instances of the port are polling.

The Instance marks the instantiation of a core as a component on the board. It also has a name and position, and references the core by name and version id. Existence of the core is checked in the frontend. BINDINGS are used to connect ports of instantiated cores to each other. The normal AXIS is used to connect instances of user-defined cores. The axis identifier marks the name of a direct connection between two AXI stream ports. An axis identifier may occur at most once, though it is allowed to leave a connection open or not to connect the port at all (however, this will result in a warning from the frontend). The bitwidths of ports connected this way are required to be identical. A CPUAxis marks the connection of a port directly to the processor on the board. If an axis is connected that way, it can be written to or read from by the driver, and methods are generated in the host-side API. Additional options can be used for CPU connections. This includes the queue options as well as the poll switch. If specified, these values are preferred as the queue size for the connection. Otherwise, more global values are used. Also, bitwidth translation for a CPU-connected port is performed automatically.

Note that for both bindings, the referenced port has to exist. If it doesn't, the frontend will stop and return an error.
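The precedence of queue sizes described above (a value given at the CPU connection wins over a more global one) can be sketched as a small lookup. The marker `UNSPECIFIED`, the function name and the final fallback to a built-in default are assumptions for illustration; the source does not state whether the generator uses such a default.

```c
/* Marker for "option not given at this level" (hypothetical convention). */
#define UNSPECIFIED (-1)

/* Pick the most specific queue size: the option at the CPU binding first,
 * then the global option, then a built-in default (assumed to exist). */
int resolve_queue_size(int binding_size, int global_size, int default_size) {
    if (binding_size != UNSPECIFIED) return binding_size;
    if (global_size != UNSPECIFIED) return global_size;
    return default_size;
}
```

With a global size of 32, a binding that specifies 8 resolves to 8, while a binding without its own option resolves to 32.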

```
GPIOs * GPIO
GPIO (String name, Direction direction, Position pos, Code callback)
Scheduler (Position pos, Code code)
Code = DEFAULT() | USER_DEFINED(Strings content)
```

GPIO declarations specify whether a certain GPIO device is available on the board design. They are somewhat similar to instances, but do not require a core declaration (this is provided by the board design company) and can only be instantiated once. They also have a direction specifier. By default, out-going GPIO components can be written to from the host-side API, while the state of an in-going GPIO is transmitted to the host whenever it changes. It is, however, possible to override the behaviour in case of a state change by supplying the GPIO declaration with user-defined callback code.

Similar to the callback method, the default scheduler behaviour can be overridden.

```
Medium = NONE() | DefinedMedium
DefinedMedium = ETHERNET(Position pos, MOptions opts)
              | UART    (Position pos, MOptions opts)
              | PCIE    (Position pos, MOptions opts)
MOptions * MOption
MOption = MAC   (Position pos, String val)
        | IP    (Position pos, String val)
        | MASK  (Position pos, String val)
        | GATE  (Position pos, String val)
        | PORTID(Position pos, Integer val)
```

The Medium describes how the board-side and host-side drivers are connected to each other. This can be done via Ethernet, UART or PCIE. Medium options specify several medium-specific properties. It is possible that no medium is specified within a file, and that the file is instead imported by another .bdl file. Still, a medium has to be defined in a .bdl file structure, either directly in the top-level .bdl file or in an imported file. Note further that there are no default values for a concrete medium, i.e. the corresponding medium options are required as well.

#### 4.3.2 C/C++ Model

The C/C++ model provides data types representing a C/C++ program. Note that the model is neither complete nor always valid, i.e. not all C programs can be described using this model, and it is possible to specify a model that does not translate into valid C. Still, the model simplifies the process of code generation. The model is used for generating C as well as C++ code. C model files are generated by the C++ host backend as well as by the ISE project backend.

```
MFile ( MDocumentation doc, String name, MDefinitions defs,
    MStructs structs, MEnums enums, MAttributes attributes,
    MMethods methods, MClasses classes )

MClass ( MDocumentation doc, MModifiers modifiers, String name,
    MTypes extend, MStructs structs, MEnums enums, MAttributes
    attributes, MMethods methods, MClasses nested )

MModifier = PRIVATE() | PUBLIC() | CONSTANT() | STATIC() | INLINE()
```

First of all, files can be documented using an MDocumentation element. If the documentation is not empty, an @file tag is attached to mark it as file documentation for doxygen. A file has a name and consists of several definitions, structures, enums, attributes and methods. Files can also contain several classes. These classes again have a name and can contain all these components, including other classes. In addition, classes can have modifiers and inherit components from other classes. Classes can also be documented. The allowed modifiers are private, public, constant, static, and inline. Note that not all of these modifiers are class modifiers, and several combinations of modifiers are invalid (e.g. private and public). The model relies on the developer to choose modifiers that fit the modified program part.

Definitions, structs and enums mark rather trivial tuple productions. A definition simply assigns a name to a value. Enums list a number of possible values. All three can be documented.

```
MAttribute ( MDocumentation doc, MModifiers modifiers, MAnyType type, String name, MCodeFragment initial )
MCodeFragment ( String part, MIncludes needed )
```

Attributes are similar to definitions, but are typed and may also be left unassigned by using an empty code fragment. The MIncludes element is required if the type of the attribute is not defined within the C file itself. Attributes can also be documented.

```
MMethod ( MDocumentation doc, MModifiers modifiers, MReturnType returnType, String name, MParameters parameter, MCode body )

MReturnType = MAnyType | MVoid() | MNone()

MParameter ( MParamType refType, MAnyType type, String name )

MParamType = VALUE() | REFERENCE() | CONSTREF()

MCode ( Strings lines, MIncludes needed )
```

Methods have a return type and a list of parameters. The method body is also more complex than a simple code fragment and can consist of several lines, which are not checked any further in this model. The return type can be any C type as well as void. For constructors in C++, the return type MNone is used. Parameters also have a type and name. Furthermore, the mode of parameter passing has to be specified.

```
MAnyType = MType ( String name )
| MArrayType ( MAnyType type , Integer length )
| MPointerType ( MAnyType type )
| MConstPointerType ( MAnyType type )
```

The type system of the model supports arrays as well as (const) pointers. The basic type is the MType, which consists only of a string that has to reference an existing C type, e.g. int, struct student or enum day. This type can then be extended using pointer or array types. So MArrayType(MType("int"), 5) would mark an integer array of length 5. Note that these types are nested semantically rather than syntactically. Consequently, a pointer type of a const pointer type of type integer will be translated into int \*const\* a.
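To make the nesting concrete, the following C fragment shows the declarations that the three model types mentioned above stand for. The variable and function names are chosen for illustration only.

```c
/* MArrayType(MType("int"), 5): an integer array of length 5. */
int values[5] = { 1, 2, 3, 4, 5 };

/* MConstPointerType(MType("int")): a const pointer to a modifiable int. */
int *const first = &values[0];

/* MPointerType(MConstPointerType(MType("int"))):
 * a pointer to a const pointer to int. */
int *const *a = &first;

/* Writing through both levels is allowed; only the inner pointer is fixed. */
void set_first(int v) {
    **a = v;
}
```

Reading the declaration from the inside out mirrors the nesting of the model: `a` is a pointer to a constant pointer to an `int`. An assignment such as `*a = &values[1]` would be rejected by the compiler, while `**a = 7` is valid.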

For generation of the API, Javadoc-style documentation has to be added to the model. An MDocumentation element contains documentation for the following block as well as possibly several tags. Tags can be parameter or return value descriptions of a method/procedure, descriptions of exception behaviour, deprecation descriptions or references to other elements of the code. The JAVADOC\_AUTOBRIEF option is set in the generated doxygen config files, so that the first sentence (concluded by a dot and a following space or newline) is used as the short description of the documented program part. Since neither the driver generator nor doxygen performs sanity checks, we rely on the user to only introduce meaningful tags for a documentation element. Note further that the order of tags influences the order of elements in the generated API description.

#### Unparser

Different unparsers are used to translate the C/C++ model into actual code. For each instance of the model, a header file and a corresponding source file, either C or C++, have to be generated. These unparsers are called depending on the particular instance of the model. Files intended to be loaded onto the board have to be plain C; files intended for the host side can also be C++ files. The unparsing itself is realized using the visitor pattern. Each visit method appends code to a string buffer depending on the visited element.

**Header Unparser** The header unparser is used for unparsing both C and C++ code. Consequently, it doesn't filter any constructs, but accepts everything that can be specified with the model.

This unparser generates only signatures for all methods and only the declarations of attributes and enums. However, the header file will contain all includes referenced within the model.

**Plain C Unparser** Since plain C doesn't have any concept of classes, using a model with classes in this unparser will result in exceptions. Otherwise, all components and combinations are accepted. Source files will automatically include their corresponding header file. Furthermore, all includes flagged as private are unparsed.

**C++ Unparser** For the C++ unparser, as with the header unparser, all components and more combinations are allowed. Still, some combinations of elements are forbidden (e.g. a combination of different visibility modifiers for the same procedure). Includes are handled identically to the C unparser.

#### 4.3.3 MHS Model

The MHS model encapsulates several project files required by Xilinx workflows, namely .mpd files describing IP cores, the .mhs file describing the overall board design in XPS projects and the .mss file describing the contents of a board support package in the Xilinx SDK. The model is comparably simple and, like the C model, does not guarantee the correctness of files created using the model. However, the model is sufficient for the purpose of this driver generator.

```
MHSFile (Attributes attributes, Blocks blocks)
Block (String name, Attributes attributes)
Attributes * Attribute
Blocks * Block
```

An .mhs file consists of a list of attributes and blocks. A block in turn has a name and contains attributes of the block.

```
Attribute ( Type type , Assignments assign )
Type = OPTION( ) | BUS_IF( ) | PARAMETER( ) | PORT( )
Assignments * Assignment
```

An attribute consists of an attribute type and a list of assignments. Available types are options, bus interfaces, parameters and ports.

```
Assignment ( String name, Expression exp )
Expression = Value | AndExp
AndExp * Value
```

An assignment assigns an expression to a name. Such an expression is either directly a value, or a list of values concatenated using &.

```
Value = Ident   ( String val )
      | STR     ( String val )
      | MemAddr ( Integer val )
      | Number  ( Integer val )
      | Range   ( Integer u, Integer l )
```

A value can be one of the above types. Idents are used to reference other elements of the file, mostly ports or bus interfaces. Strings are similar, but are automatically surrounded with quotation marks. Numbers are simple integer values used either as boolean flag (with only 0 and 1 as valid values) or counter. In contrast, a memory address is printed as hexadecimal number in the file and adequately prefixed with "0x". A range uses an upper and lower bound and is usually used to mark the size of a bit vector.

These types are only used to make model generation clearer and improve type safety a little bit for changing existing mhs entries in the generator. Otherwise, an ident value could be used for all the other types.
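How the different value types could be turned into text is sketched below. The generator's actual unparser is written in Java; this C sketch is not taken from it. The type and function names are hypothetical, and the exact output formats are assumptions: addresses are zero-padded to eight hexadecimal digits, ranges are written as `[u:l]`, and the values of an AndExp are joined with ` & `.

```c
#include <stdio.h>

typedef enum { IDENT, STR, MEMADDR, NUMBER, RANGE } ValueKind;

typedef struct {
    ValueKind kind;
    const char *text; /* used by IDENT and STR */
    unsigned int a;   /* NUMBER or MEMADDR value, or upper bound of RANGE */
    unsigned int b;   /* lower bound of RANGE */
} Value;

/* Write the textual form of one value into buf.
 * Returns the number of characters the full text needs (as snprintf does). */
int print_value(char *buf, size_t len, const Value *v) {
    switch (v->kind) {
    case IDENT:   return snprintf(buf, len, "%s", v->text);
    case STR:     return snprintf(buf, len, "\"%s\"", v->text);
    case MEMADDR: return snprintf(buf, len, "0x%08x", v->a);
    case NUMBER:  return snprintf(buf, len, "%u", v->a);
    case RANGE:   return snprintf(buf, len, "[%u:%u]", v->a, v->b);
    }
    return -1;
}

/* Write n values joined by " & ", as for an AndExp.
 * Returns the length of the text, or -1 if buf is too small. */
int print_expression(char *buf, size_t len, const Value *values, int n) {
    size_t used = 0;
    int i, w;
    if (len == 0) return -1;
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        if (i > 0) {
            w = snprintf(buf + used, len - used, " & ");
            if (w < 0 || (size_t) w >= len - used) return -1;
            used += (size_t) w;
        }
        w = print_value(buf + used, len - used, &values[i]);
        if (w < 0 || (size_t) w >= len - used) return -1;
        used += (size_t) w;
    }
    return (int) used;
}
```

Keeping one case per value type shows where the typed model pays off: quoting and the hexadecimal prefix are applied in exactly one place instead of by every caller that builds an entry.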

## 4.4 Extensions

So far, the driver generator can only generate C++ host APIs and only works with a Virtex 6 board. To support more languages on the host side or different boards, extensions are necessary. The driver generator is designed with such extensibility in mind. The steps necessary for generation of a specific driver (be it host or board side) are wrapped inside a backend. These backends usually incorporate the translation of the board model into some output model and the unparsing of said output model. An example of such a backend is the C++ host backend. This section describes the steps necessary to extend the driver generator with additional backends and introduces some utility classes that can make this task easier.

There are three kinds of backends in the driver generator: host backends, board backends and workflow backends. Host backends, like the C++ host backend, generate a host-side driver, which only depends on the interface of the board, but not on the architecture. Additional host backends can provide a different user API, which can also be written in another language. Workflow backends are responsible for generation of the .bit file with which the FPGA is programmed. An example is the ISE 14.1 workflow backend. This backend uses the Xilinx ISE workflow to create the .bit file. In order to use other workflows, additional backends have to be defined. The third group of backends, i.e. board backends, support the workflow backends in their generation process by providing board-specific information required by the chosen workflow. A board backend may provide information for multiple compatible workflows. For example, a .bit file for the ZedBoard can be generated using an ISE workflow as well as a Vivado workflow. Consequently, the ZedBoard backend should provide information for both workflows. To support additional boards, new board backends providing this board-specific information and files have to be created.

#### 4.4.1 Adding Host Drivers

To add a host backend to the generator, the backend class is required to implement the respective interface (de.hopp.generator.host.HostBackend). The interface declares all methods necessary for project generation and clean integration of the backend. The recommended way to do this is extending the AbstractHostBackend, which describes a backend without parameters. If additional parameters are required, the provided methods can be overridden.

The getName() method should simply return the name of the backend. This name is used as the identifier for backend selection in the command line interface of the generator. printUsage prints the CLI parameters that configure the backend. parseParameters is called by the generator during parameter parsing and can remove parameters from the list. The remaining parameters have to be stored in the configuration before it is returned; the abstract host backend (AbstractHostBackend) already does this. Finally, generate transforms the model and generates files. It is recommended to split transformation and generation into separate classes.

```
public class Java extends AbstractHostBackend {
  public String getName() {
    return "java";
  }
  public void generate(...) {
    ...
  }
}
```

The above code fragment describes a possible Java client backend. For actual model translation and file generation the generate method has to be implemented. Refer to the code of the provided C/C++ backend to see how this can be done.
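For a backend that takes its own parameters, the parsing step could look roughly like the following sketch. The interface shown here is a simplified, hypothetical stand-in for the real HostBackend interface: the method signatures, the --package flag and the class names are assumptions made for illustration. In particular, the sketch returns the remaining parameters directly, whereas the real method stores them in the configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the generator's host backend interface.
interface SketchHostBackend {
    String getName();
    void printUsage();
    // Consumes the parameters this backend understands and returns the rest.
    List<String> parseParameters(List<String> args);
    void generate();
}

// Hypothetical backend accepting one optional "--package <name>" parameter.
class JavaSketchBackend implements SketchHostBackend {
    private String packageName = "driver";

    public String getName() {
        return "java";
    }

    public void printUsage() {
        System.out.println("  --package <name>   package of the generated classes");
    }

    public List<String> parseParameters(List<String> args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.size(); i++) {
            if (args.get(i).equals("--package") && i + 1 < args.size()) {
                packageName = args.get(++i);  // take flag and value off the list
            } else {
                remaining.add(args.get(i));   // leave everything else untouched
            }
        }
        return remaining;
    }

    public void generate() {
        // Model transformation and file generation would be delegated
        // to separate classes at this point.
        System.out.println("generating Java driver in package " + packageName);
    }

    String getPackageName() {
        return packageName;
    }
}
```

Parameters the backend does not recognise stay in the returned list, so later parsing stages can still process them.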

After defining the backend, it has to be added to the list of backends known to the generator. This is done by adding an instance of the backend to the client backend enum (de.hopp.generator.frontend.Host).

```
public enum Host {
   CPP(new CPP()),
   Java(new Java());

   // the rest of the file remains unchanged
   ...
}
```

Figure 4.2: Backend Architecture of the Driver Generator
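How such an enum can serve backend selection by name is sketched below. This is a self-contained illustration, not the generator's actual code: the types NamedBackend, CppSketch, JavaSketch and HostSketch, the method fromName and the identifier "c++" are all hypothetical.

```java
// Hypothetical minimal backend type; the real interface declares more methods.
interface NamedBackend {
    String getName();
}

class CppSketch implements NamedBackend {
    public String getName() {
        return "c++";  // assumed identifier, for illustration only
    }
}

class JavaSketch implements NamedBackend {
    public String getName() {
        return "java";
    }
}

// Each constant wraps one backend instance.
enum HostSketch {
    CPP(new CppSketch()),
    JAVA(new JavaSketch());

    private final NamedBackend instance;

    HostSketch(NamedBackend instance) {
        this.instance = instance;
    }

    NamedBackend getInstance() {
        return instance;
    }

    // Returns the backend whose name matches the identifier, or null if none does.
    static NamedBackend fromName(String name) {
        for (HostSketch host : values()) {
            if (host.instance.getName().equalsIgnoreCase(name)) {
                return host.instance;
            }
        }
        return null;
    }
}
```

With this layout, registering a new backend only requires a new enum constant; the lookup loop picks it up automatically.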

## 4.4.2 Adding Workflows and Boards

Introducing a new board backend differs from introducing host or workflow backends. While a generic interface for board backends exists, concrete board backends should not implement it directly. Rather, each workflow provides a board interface specifically designed to supply the information required by that workflow. A concrete board backend implements such a board interface for every workflow it should support. Conversely, these board interfaces are used to evaluate the compatibility of the selected board and workflow. Integration into the board enum works the same way as for host backends.

Adding a workflow backend is nearly identical to adding a host backend. Naturally, the enum and the interface to subtype differ. Furthermore, as described above, a specialised board interface should be introduced for each new workflow. This interface provides all board-specific information required by the workflow and integrates the workflow into the compatibility check of the generator.
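The compatibility check described above can be pictured as a simple type test. The following sketch is an assumption about how such a check might be structured, not the generator's actual implementation; all type and method names are hypothetical, and which boards implement which interfaces is chosen purely for illustration.

```java
// Generic board interface; concrete boards do not implement it directly.
interface BoardSketch {
    String getName();
}

// Workflow-specific board interfaces (illustrative names).
interface IseBoardSketch extends BoardSketch { }
interface VivadoBoardSketch extends BoardSketch { }

// A workflow names the board interface it requires.
interface WorkflowSketch {
    Class<? extends BoardSketch> requiredBoardInterface();
}

class IseWorkflowSketch implements WorkflowSketch {
    public Class<? extends BoardSketch> requiredBoardInterface() {
        return IseBoardSketch.class;
    }
}

class VivadoWorkflowSketch implements WorkflowSketch {
    public Class<? extends BoardSketch> requiredBoardInterface() {
        return VivadoBoardSketch.class;
    }
}

// A board supports a workflow by implementing that workflow's board interface.
class Virtex6Sketch implements IseBoardSketch {
    public String getName() { return "virtex6"; }
}

class ZedBoardSketch implements IseBoardSketch, VivadoBoardSketch {
    public String getName() { return "zedboard"; }
}

class CompatibilitySketch {
    // A board is compatible if it implements the interface the workflow requires.
    static boolean isCompatible(BoardSketch board, WorkflowSketch workflow) {
        return workflow.requiredBoardInterface().isInstance(board);
    }
}
```

In this arrangement a board that supports several workflows simply implements several board interfaces, and no separate compatibility table has to be maintained.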

Figure 4.2 shows the integration of the Virtex6 board and the ISE 14.1 workflow in this structure. Generally, a workflow uses a specialised board interface and GPIO components tailored to the workflow, which together provide all necessary information. Since many similar ISE versions exist, an intermediate layer is introduced comprising code common to all ISE backends. This also applies to the board and GPIO interfaces. Future versions of Loopy supporting more diverse ISE versions could make the introduction of more hierarchy levels necessary. Currently, the abstract ISE row models only ISE major version 14.

For ISE 14.1, a dedicated workflow backend and board interface are introduced using the core versions available with this ISE release. A dedicated GPIO component or interface is not required here, since implementation of these components only differs in the XPS core and SDK driver versions, which are both maintained by the board backend. Finally, the Virtex 6 board backend implements the ISE 14.1 board interface. GPIO classes of the Virtex 6 implement the more abstract ISE GPIO component interface.

All current interfaces and enums are summarised in Table 4.1.

| Backend      | Type      | Location                                    |
|--------------|-----------|---------------------------------------------|
| Host         | Interface | de.hopp.generator.backends.host.HostBackend |
|              | Enum      | de.hopp.generator.frontend.Host             |
| Workflow     | Interface | backends.workflow.WorkflowBackend           |
|              | Enum      | ...frontend.Workflow                        |
| Board        | Interface | backends.board.BoardBackend                 |
|              | Enum      | frontend.Board                              |
| ISE Board    | Interface | workflow.ise.ISEBoard                       |
| ISE GPIO     | Interface | workflow.ise.gpio.GPIOComponent             |
| Vivado Board | Interface | workflow.vivado.VivadoBoard                 |

Table 4.1: Overview of Classes and Enums required for Backend Extension

As an example, the integration of ISE 14.1 support for the Virtex 6 board is discussed in more detail here. For this purpose, the Virtex 6 board backend is required to implement the ISE 14.1 board interface, as depicted in Figure 4.2.

The ISE board interface requires the implementing board to provide visitors for generating an .mhs file for hardware synthesis using XPS and an .mss file for building the board-side software. For both, an abstract visitor is provided that requires only small board-specific adjustments. The abstract visitor also relies on the board backend to provide a GPIO object with additional information for each supported GPIO component. It is possible to override the implementation of GPIO components completely in the subclassed visitors, yet providing a GPIO object for each component installed on the board is deemed much easier.

While providing a lot of board-independent definitions, neither the abstract .mhs generator nor the abstract SDK generator is complete. Both require some board-specific additions. For the .mhs generator, this concerns hardware parts for the selected medium and design-independent but board-specific building blocks. In the case of the Virtex 6, which does not have a processor, this includes the addition of a MicroBlaze soft processor and related components. In the SDK generator, communication with components attached via AXI4 stream interfaces has to be specified explicitly. The MicroBlaze of the Virtex 6 provides commands for communication over its integrated stream interfaces. Hard processors usually do not have these interfaces, and consequently no such methods are provided. Instead, a memory-mapped approach is chosen for such boards.
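The split between board-independent and board-specific parts can be pictured as a template method. The sketch below is a deliberately simplified stand-in that builds a string instead of visiting a model; the class names, the generate and boardSpecificBlocks methods and the emitted placeholder blocks are assumptions for illustration and do not reflect the real visitor classes.

```java
import java.util.List;

// Hypothetical, simplified stand-in for the abstract .mhs generator.
abstract class AbstractMhsSketch {
    // Emits the board-independent part, then the board-specific additions.
    final String generate(List<String> designCores) {
        StringBuilder out = new StringBuilder();
        for (String core : designCores) {
            out.append("BEGIN ").append(core).append("\nEND\n");
        }
        out.append(boardSpecificBlocks());
        return out.toString();
    }

    // Hook for design-independent but board-specific building blocks.
    protected abstract String boardSpecificBlocks();
}

// Illustrative Virtex 6 variant: the board has no hard processor,
// so a soft processor block is appended to every design.
class Virtex6MhsSketch extends AbstractMhsSketch {
    @Override
    protected String boardSpecificBlocks() {
        return "BEGIN microblaze\nEND\n";
    }
}
```

A board with a hard processor would supply a different subclass and leave the soft processor block out.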

## 4.4.3 Utility Classes

There are several utility classes which supply basic methods for file generation and model translation.

The class de.hopp.generator.utils.BoardUtils contains utility methods for model analysis. These include getters for referenced elements (e.g. getting the port referenced by a binding) and for specific attributes that are not explicitly declared at the component but can be derived from the context (e.g. the size of the queue of a binding).

Methods for printing several model file types, i.e. C/C++ and MHS models, are provided in the utility class de.hopp.generator.backends.BackendUtils. These methods make use of the unparsers described in Section 4.3.2. Another utility class for working with the C/C++ model is de.hopp.generator.utils.CPPUtils. This class allows manipulation of the model by adding components to an existing component. Similarly, de.hopp.generator.backends.workflow.ise.xps provides such methods for the MHS model.

Finally, de.hopp.generator.util.Files wraps the deployment of files in order to provide a basic implementation of incremental builds. A file is only deployed if it does not already exist or if the contents to be deployed differ from the old contents. This is feasible only because a small number of new files is deployed. If no new files are deployed, the modification dates of all source files remain untouched, so the underlying .bit file generation either does nothing or can be skipped directly by the generator.
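The deploy-only-if-changed idea can be sketched with the standard java.nio.file API as follows. This is a minimal illustration under stated assumptions, not the code of the generator's own Files class; the class DeploySketch and the method deployIfChanged are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class DeploySketch {
    // Writes content to target only if the file is missing or its contents differ.
    // Returns true if the file was written, false if it was left untouched.
    static boolean deployIfChanged(Path target, String content) {
        try {
            byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
            if (Files.exists(target) && Arrays.equals(Files.readAllBytes(target), bytes)) {
                return false;  // unchanged: keep the old file and its modification time
            }
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            Files.write(target, bytes);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because an unchanged file is never rewritten, its modification time stays the same, which is what allows a timestamp-based downstream build to skip work.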

# **Bibliography**

- [1] Alachiotis, N., Berger, S.A., Stamatakis, A.: Efficient PC-FPGA Communication over Gigabit Ethernet. In: Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology. pp. 1727–1734. CIT '10, IEEE Computer Society, Washington, DC, USA (2010), http://dx.doi.org/10.1109/CIT.2010.302
- [2] Alachiotis, N., Berger, S.A., Stamatakis, A.: A Versatile UDP/IP based PC-FPGA Communication Platform. In: ReConFig 2012 (2012)
- [3] Lofgren, A., Lodesten, L., Sjoholm, S., Hansson, H.: An Analysis of FPGA-based UDP/IP Stack Parallelism for Embedded Ethernet Connectivity. In: NORCHIP Conference, 2005. 23rd. pp. 94 97 (November 2005)