Hybrid BFS on Xilinx Zynq

by Yaman Umuroglu (yamanu@idi.ntnu.no)

License and "Academic Disclaimer"

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Academic software written to get publishable results before a deadline has a tendency to be far from pretty, and this work is no exception. Furthermore, it is a hardware-software system that must be built using proprietary FPGA synthesis tools (from Xilinx) and deployed/run on a particular FPGA-CPU platform (the ZedBoard) that might all but disappear from the face of the Earth in a couple of years. All this makes it unlikely that the source code here will be actively maintained or supported.

Overview

The "gist" of the hybrid method is to take advantage of the varying amount of parallelism (due to exponential growth and decay of frontier size) during breadth-first search on a small-world network: (very) few nodes are visited during the first and last few steps, and not much parallelism is available, so these steps are carried out on the CPU. The middle steps visit many more nodes and have greater parallelism, and hence are explored on a high-throughput FPGA accelerator. Since the Zynq offers both Cortex-A9 CPU cores and FPGA fabric on the same chip (and sharing the same DRAM), there is little penalty associated with switching execution modes.

The hardware accelerator itself comes in two flavors (called "dense frontier" and "sparse frontier"), which perform the exact same function in different ways. The sparse frontier version is quite similar to traditional BFS with current/next queues, adding only unvisited nodes to the next queue. The dense frontier version visits all nodes and edges in the graph at every BFS step. Although this may sound quite wasteful, it has a much simpler DRAM access pattern and can utilize most of the available DRAM bandwidth, which lets it actually outperform the sparse version (see the paper for more details).
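As a software model of what the dense-frontier variant computes, the sketch below performs one BFS step over a graph in Compressed Sparse Column form (the function and variable names are assumptions, not the RTL). Every edge is streamed regardless of the frontier bit, which is what gives the hardware its simple, sequential DRAM access pattern:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* One dense-frontier BFS step over a CSC graph: colPtr[c]..colPtr[c+1]
     * indexes the rows (edge targets) of column c. nextFrontier must be
     * cleared by the caller before the step. Returns the number of newly
     * visited nodes, i.e. the size of the next frontier. */
    size_t dense_frontier_step(size_t numNodes, const uint32_t *colPtr,
                               const uint32_t *rowInd, const bool *frontier,
                               bool *nextFrontier, uint32_t *distance,
                               uint32_t level) {
        size_t visited = 0;
        for (size_t c = 0; c < numNodes; c++) {       /* every node... */
            bool active = frontier[c];
            for (uint32_t e = colPtr[c]; e < colPtr[c + 1]; e++) { /* ...and edge */
                uint32_t dst = rowInd[e];
                /* Boolean product: propagate only from frontier columns
                 * to nodes that have not been visited yet */
                if (active && distance[dst] == UINT32_MAX) {
                    distance[dst] = level;
                    nextFrontier[dst] = true;
                    visited++;
                }
            }
        }
        return visited;
    }

Read as linear algebra, one such step is essentially a Boolean matrix-vector product of the adjacency matrix with the frontier vector, masked by the set of unvisited nodes.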

Building the Hardware

Most of the hardware is described in Chisel (https://chisel.eecs.berkeley.edu/) and can be turned directly into synthesizable Verilog. Vivado 2014.1 was used to wrap the Verilog modules as IP blocks and to create the complete accelerator system by adding Xilinx IP blocks (for the interconnect, FIFOs and result memory BRAM). Due to the mixed use of Chisel and Xilinx IP blocks, building the project from scratch is quite involved. Two pre-packaged Vivado projects (for the dense and sparse variants) are available from TODO for convenience. To build from scratch or make changes, follow these steps:

  1. Generate Verilog sources for each component using Chisel. There is currently no Makefile or similar, so each component must be generated by editing the Main.scala file and uncommenting its corresponding chiselMain line.
  2. Import the Verilog sources as IP blocks into Vivado. This is rather straightforward using the built-in functionality in Vivado (Tools -> Create and Package IP). The AXI MM and AXI stream interfaces use the naming convention expected by Vivado, so the "auto-infer ports" command can be used. All Chisel-generated components use active-high, synchronous resets, but Vivado insists on inferring an active-low reset, so this should also be fixed during the import. (I'm also considering writing a blog post on how the Chisel-to-Vivado-IP process has been working out, to explain this in more detail.) Alternatively, you can use the readily-imported versions of all the components from the ip-cores/ folders.
  3. Put together the complete Zynq hardware system in Vivado. The block-diagrams/ folders contain PDF files that describe how the IP blocks are connected (the pre-packaged projects can also be used for reference). The hardware system should now be ready for synthesis.

Running

The hardware accelerator uses memory-mapped registers for control and has direct access to the Zynq's DRAM via the high-performance AXI ports. See the driver/ folders for an example bare-metal application (with serial-port I/O for interaction) that benchmarks the hybrid BFS. Graphs are provided to the accelerator in Compressed Sparse Column (CSC) format (see the example under example-data/) and can be placed in DRAM by any desired method. The bare-metal example code also supports reading the graph files/matrices directly from the ZedBoard's SD card.
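As a sketch of what driving such a memory-mapped control interface looks like from bare-metal code, the fragment below writes the DRAM addresses of the CSC arrays into control registers and polls for completion. The base address, register offsets and names here are invented for illustration; the actual register map is defined by the sources in the driver/ folders:

    #include <stdint.h>

    /* Hypothetical register map -- offsets and names are illustrative
     * only; see the driver/ folders for the real interface. */
    #define ACCEL_BASE       0x43C00000u  /* example AXI-Lite base address */
    #define REG_COLPTR_ADDR  0x00u        /* DRAM address of CSC colPtr */
    #define REG_ROWIND_ADDR  0x04u        /* DRAM address of CSC rowInd */
    #define REG_RESULT_ADDR  0x08u        /* DRAM address of result buffer */
    #define REG_START        0x0Cu        /* write 1 to launch a BFS step */
    #define REG_DONE         0x10u        /* reads nonzero when finished */

    static inline void reg_write(uint32_t off, uint32_t val) {
        *(volatile uint32_t *)(uintptr_t)(ACCEL_BASE + off) = val;
    }

    static inline uint32_t reg_read(uint32_t off) {
        return *(volatile uint32_t *)(uintptr_t)(ACCEL_BASE + off);
    }

    /* Point the accelerator at a CSC graph already placed in DRAM (e.g.
     * loaded from the SD card), start one step and busy-wait until done. */
    void run_accel_step(uint32_t colPtrAddr, uint32_t rowIndAddr,
                        uint32_t resultAddr) {
        reg_write(REG_COLPTR_ADDR, colPtrAddr);
        reg_write(REG_ROWIND_ADDR, rowIndAddr);
        reg_write(REG_RESULT_ADDR, resultAddr);
        reg_write(REG_START, 1);
        while (reg_read(REG_DONE) == 0) {
            /* poll until the accelerator signals completion */
        }
    }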
