Hybrid BFS on Xilinx Zynq

by Yaman Umuroglu (yamanu@idi.ntnu.no)

License and "Academic Disclaimer"

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Academic software written to get publishable results before a deadline has a tendency to be far from pretty, and this work is no exception. Furthermore, it is a hardware-software system that must be built using proprietary FPGA synthesis tools (from Xilinx) and deployed/run on a particular FPGA-CPU platform (the ZedBoard) that might all but disappear from the face of the Earth in a couple of years. All this makes it unlikely that the source code here will be actively maintained or supported.

Overview

The "gist" of the hybrid method is to take advantage of the varying amount of parallelism (due to exponential growth and decay of frontier size) during breadth-first search on a small-world network: (very) few nodes are visited during the first and last few steps, and not much parallelism is available, so these steps are carried out on the CPU. The middle steps visit many more nodes and have greater parallelism, and hence are explored on a high-throughput FPGA accelerator. Since the Zynq offers both Cortex-A9 CPU cores and FPGA fabric on the same chip (and sharing the same DRAM), there is little penalty associated with switching execution modes.

The hardware accelerator itself comes in two flavors (called "dense frontier" and "sparse frontier"), which perform the exact same function in different ways. The sparse frontier version is quite similar to traditional BFS with current/next queues, adding only unvisited nodes to the next queue. The dense frontier version visits all nodes and edges in the graph at every BFS step. Although this may sound quite wasteful, it has a much simpler DRAM access pattern and can utilize most of the available DRAM bandwidth, which lets it actually outperform the sparse version (see the paper for more details).
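As a software model of what the dense-frontier variant computes, the sketch below performs one BFS step over a graph in Compressed Sparse Column form (the function and variable names are assumptions, not the RTL). Every edge is streamed regardless of the frontier bit, which is what gives the hardware its simple, sequential DRAM access pattern:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* One dense-frontier BFS step over a CSC graph: colPtr[c]..colPtr[c+1]
     * indexes the rows (edge targets) of column c. nextFrontier must be
     * cleared by the caller before the step. Returns the number of newly
     * visited nodes, i.e. the size of the next frontier. */
    size_t dense_frontier_step(size_t numNodes, const uint32_t *colPtr,
                               const uint32_t *rowInd, const bool *frontier,
                               bool *nextFrontier, uint32_t *distance,
                               uint32_t level) {
        size_t visited = 0;
        for (size_t c = 0; c < numNodes; c++) {       /* every node... */
            bool active = frontier[c];
            for (uint32_t e = colPtr[c]; e < colPtr[c + 1]; e++) { /* ...and edge */
                uint32_t dst = rowInd[e];
                /* Boolean product: propagate only from frontier columns
                 * to nodes that have not been visited yet */
                if (active && distance[dst] == UINT32_MAX) {
                    distance[dst] = level;
                    nextFrontier[dst] = true;
                    visited++;
                }
            }
        }
        return visited;
    }

Read as linear algebra, one such step is essentially a Boolean matrix-vector product of the adjacency matrix with the frontier vector, masked by the set of unvisited nodes.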

Building the Hardware

Most of the hardware is described in Chisel (https://chisel.eecs.berkeley.edu/) and can be turned directly into synthesizable Verilog. Vivado 2014.1 was used to wrap the Verilog modules as IP blocks and to create the complete accelerator system by adding Xilinx IP blocks (for the interconnect, FIFOs and result memory BRAM). Due to the mixed use of Chisel and Xilinx IP blocks, building the project from scratch is quite involved. Two pre-packaged Vivado projects (for the dense and sparse variants) are available from TODO for convenience. To build from scratch or make changes, follow these steps:

  1. Generate Verilog sources for each component using Chisel. There is currently no Makefile or similar, so each component must be generated by editing the Main.scala file and uncommenting its corresponding chiselMain line.
  2. Import the Verilog sources as IP blocks into Vivado. This is rather straightforward using the built-in functionality in Vivado (Tools -> Create and Package IP). The AXI MM and AXI stream interfaces use the naming convention expected by Vivado, so the "auto-infer ports" command can be used. All Chisel-generated components use active-high, synchronous resets, but Vivado insists on inferring an active-low reset, so this should also be fixed during the import. (I'm also considering writing a blog post on how the Chisel-to-Vivado-IP process has been working out, to explain this in more detail.) Alternatively, you can use the readily-imported versions of all the components from the ip-cores/ folders.
  3. Put together the complete Zynq hardware system in Vivado. The block-diagrams/ folders contain PDF files that describe how the IP blocks are connected (the pre-packaged projects can also be used for reference). The hardware system should now be ready for synthesis.

Running

The hardware accelerator uses memory-mapped registers for control and has direct access to the Zynq's DRAM via the high-performance AXI ports. See the driver/ folders for an example bare-metal application (with serial-port I/O for interaction) that benchmarks the hybrid BFS. Graphs are provided to the accelerator in Compressed Sparse Column (CSC) format (see the example under example-data/) and can be placed in DRAM by any desired method. The bare-metal example code also supports reading the graph files/matrices directly from the ZedBoard's SD card.
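As a sketch of what driving such a memory-mapped control interface looks like from bare-metal code, the fragment below writes the DRAM addresses of the CSC arrays into control registers and polls for completion. The base address, register offsets and names here are invented for illustration; the actual register map is defined by the sources in the driver/ folders:

    #include <stdint.h>

    /* Hypothetical register map -- offsets and names are illustrative
     * only; see the driver/ folders for the real interface. */
    #define ACCEL_BASE       0x43C00000u  /* example AXI-Lite base address */
    #define REG_COLPTR_ADDR  0x00u        /* DRAM address of CSC colPtr */
    #define REG_ROWIND_ADDR  0x04u        /* DRAM address of CSC rowInd */
    #define REG_RESULT_ADDR  0x08u        /* DRAM address of result buffer */
    #define REG_START        0x0Cu        /* write 1 to launch a BFS step */
    #define REG_DONE         0x10u        /* reads nonzero when finished */

    static inline void reg_write(uint32_t off, uint32_t val) {
        *(volatile uint32_t *)(uintptr_t)(ACCEL_BASE + off) = val;
    }

    static inline uint32_t reg_read(uint32_t off) {
        return *(volatile uint32_t *)(uintptr_t)(ACCEL_BASE + off);
    }

    /* Point the accelerator at a CSC graph already placed in DRAM (e.g.
     * loaded from the SD card), start one step and busy-wait until done. */
    void run_accel_step(uint32_t colPtrAddr, uint32_t rowIndAddr,
                        uint32_t resultAddr) {
        reg_write(REG_COLPTR_ADDR, colPtrAddr);
        reg_write(REG_ROWIND_ADDR, rowIndAddr);
        reg_write(REG_RESULT_ADDR, resultAddr);
        reg_write(REG_START, 1);
        while (reg_read(REG_DONE) == 0) {
            /* poll until the accelerator signals completion */
        }
    }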
