`matmul`

matmul is a simple matrix multiplier for AMD / Xilinx Alveo boards fully written in Chisel. This is a proof-of-concept to demonstrate a Vitis-less, RTL-only flow written in Chisel that fully exploits the available resources on the Alveo boards, enables multi-clock designs and applies advanced RTL design techniques to Chisel.

The matmul repo is open-source (GPL3) and shipped with this article submission.

Details and kudos

MatMul can use two different floating point arithmetic packages :

a self-aligned format (SAF) for float computations, based on Tarek Ould-Bachir and Jean-Pierre David's article on the subject. A big part of the SAF-IEEE754 conversions and SAF computations were stolen from CuFP.
Berkeley's hardfloat Chisel package which uses an custom internal recoded floating point format.

The VCD generation hack was taken on edwardcwang's repo.

Usage

Building the project

Dependencies

OpenJDK17
SBT (Scala Build Tool)
C libs: GSL with CBLAS support
Vivado 2024.1
Xilinx's dma_ip_drivers
GCC
make

Building `dma_ip_drivers`

Clone the dma_ip_drivers repository and build the kernel module (refer to the README and pull requests / issues).

Tested commit: 0b9793ec13cab9c7631910e7f911713b68b272ed

Building `matmul`

Build the Chisel project: make [VAR=VAL...] hw
Build the host code: make [VAR=VAL...] host
Build the bitstream: make [VAR=VAL...] bitstream
All at once: make [VAR=VAL...] all Note that the default target is equivalent to make host.

Refer to the Make variables section to customize the build.

Make variables

Several make variables are exposed to allow you to customize the build:

VAR	Default value	Description
`BUILDDIR`	build	Top-level build directory
`CHISEL_OUTDIR`	matmul-x_-<FREQ_MHZ>	Chisel outputs directory (under `$(BUILDDIR)`/)
`M_HEIGHT`	16	Matrix height (number of PE)
`M_WIDTH`	16	Matrix width (PE memory size)
`FLOAT`	saf	Float implementation (`saf` or `hardfloat`)
`PLL_MULT`	9	Multiplication coefficient for the base frequency (156.25)
`PLL_DIV`	10	Division coefficient for the base frequency (156.25)
`OOC`	1	Enable (1) or disable (0) Vivado out-of-context synthesis
`DCP`	dcp	Name of the subdirectory of `$(BUILDDIR)/$(CHISEL_OUTDIR)` that will contain design checkpoints
`RPT`	rpt	Same as above for Vivado reports
`LOG`	log	Same as above for Vivado logs
`VIVADO_PART`	`xcu200-fsgd2104-2-e`	Vivado part (the default is the only one tested and the XDMA IP is configured for it)
`NRPOC`	`$(nproc)`	Number of processors to use
`SBT_MEM`	65535	Amount of memory to lend to SBT
`EXE_NAME`	matmul-host	Name of the output host executable

Running the project

Flashing the board

The Alveo board's JTAG USB port must be plugged to one of the host's USB ports. Also, Vivado's cable drivers must be installed.

Once the bistream has been built, it can be flashed using the convenience script scripts/flash.sh. Just call it passing the bitstream path as argument:

./scripts/flash.sh <path_to_bitstream>/TopLevel.bit

This script has only been tested on one machine, the bitstream can be flashed using the Vivado GUI if the script doesn't work.

The host will probably need a hot restart to correctly rescan the PCIe bus. To do so, just type sudo reboot in a terminal. Don't unplug or turn off the board or the host's power supply in order to keep the Alveo powered and prevent it from losing its configuration.

Running the host code

Note that since /dev belongs to root, running the host code requires root priviledges, or setting up a udev rule to allow the user to interact with the /dev/xdma0_* device files.

Once the bitstream has been flashed and the PCIe rescanned, the hardware can be checked with

shell# ./$(BUILDDIR)/$(CHISEL_OUTDIR)/sw/matmul-host -w

This command should ask the matrix's size and the floating point implementation used to the matmul hardware, and print this information formatted as "<MH>x<MW>_<FLT>". For example, the default configuration yields 16x16_saf.

The article's benchmark results can be reproduced using this executable. See the next section for available parameters and options.

Host executable flags

All toggle flags (with "-" in the default value column) are disabled by default.

Flag	Default	Description
`-m <arg>`	1	Number of different matrices to generate
`-n <arg>`	1	Number of different vectors to generate for each matrix
`-s <arg>`	`time(NULL)`	Random seed
`-w`	-	Read hardware info in hardware, print it and exit
`-d`	-	Dry-run: don't use hardware, just print matrices
`-p`	-	Print result vectors while benchmarking
`-h`	-	Print help

Work in progress (that will probably never be done)

The SAF implementation doesn't handle float over/underflow or NaNs
The SAF implementation doesn't handle rounding or error conditions
The core processing element pipeline cannot stall: the output FIFO must have enough space for outgoing data when sending a vector, or the hardware may enter an unpredictible state

Copyright notice

This project is released under the GNU GPL3 (see LICENSE).

Name		Name	Last commit message	Last commit date
Latest commit History 86 Commits
external		external
scripts		scripts
src		src
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
build.sbt		build.sbt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

`matmul`

Details and kudos

Usage

Building the project

Dependencies

Building `dma_ip_drivers`

Building `matmul`

Make variables

Running the project

Flashing the board

Running the host code

Host executable flags

Work in progress (that will probably never be done)

Copyright notice

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

matmul

Details and kudos

Usage

Building the project

Dependencies

Building dma_ip_drivers

Building matmul

Make variables

Running the project

Flashing the board

Running the host code

Host executable flags

Work in progress (that will probably never be done)

Copyright notice

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`matmul`

Building `dma_ip_drivers`

Building `matmul`

Packages