olajep assembly-opt: Fixes for Pubuntu / ESDK 2016.11
Single-core version works. Can't test multicore version.

assembly-opt/
	* build.sh: Don't create SREC. Replace e-as use with e-gcc to
	get preprocessor.
	* build_multi.sh: Likewise.
	* run.sh: Don't use sudo.
	* run_multi.sh: Likewise.
	* src/host_main.c (main): Load elf file instead of srec. Don't
	call e_reset_core.
	* src/host_multi.c: Don't call e_reset_core. Load elf file.
	* src/matmul_assembly.S: Adjust for user label prefix change
	from "_" to "" in ESDK 2016.3
	* src/matmul_internal.ldf: Likewise.
	* src/matmul_internal_multi.ldf: Likewise.

Signed-off-by: Ola Jeppsson <ola@adapteva.com>
Latest commit d1546d9 Nov 15, 2016

Matrix Multiplication

Parallel multiplication of two matrices using Cannon's algorithm.

Implementation

The implementation is based on the examples provided by Adapteva.

Host side:

  • Initializes the operand matrices and transfers them to shared memory
  • When the device signals that execution is complete, reads the result matrix from shared memory

Device side:

  • Reads the operand matrices from shared memory and distributes them among the device cores
  • The per-core matrix multiplication is written in hand-tuned assembly using the Epiphany instruction set
  • Cannon's algorithm assigns blocks of the operand matrices to the cores; the blocks are then rotated along the rows and columns of the core grid
  • For block sizes smaller than 32 x 32, double buffering is used. For 32 x 32 blocks, an alternate buffering scheme is used because per-core memory is limited
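
The block rotation described above can be sketched in plain C on the host. This is a minimal illustration of Cannon's algorithm, not the code in this repository: the grid size `P`, the block size `BS`, the `core_t` struct and every function name are hypothetical, and the "cores" are simulated as array elements rather than Epiphany cores. Each simulated core starts with a skewed pair of blocks, multiplies them, then passes its A block to the left neighbour and its B block to the upper neighbour.

```c
#include <string.h>

/* All names and sizes below are hypothetical, chosen for illustration. */
#define P  4            /* simulated cores per grid side */
#define BS 2            /* block side held by each core  */
#define N  (P * BS)     /* side of the full matrices     */

typedef struct { float a[BS][BS], b[BS][BS], c[BS][BS]; } core_t;

/* Every core passes its A block to its left neighbour (with wrap-around). */
static void rotate_a_left(core_t g[P][P]) {
    for (int i = 0; i < P; i++) {
        float tmp[BS][BS];
        memcpy(tmp, g[i][0].a, sizeof tmp);
        for (int j = 0; j < P - 1; j++)
            memcpy(g[i][j].a, g[i][j + 1].a, sizeof tmp);
        memcpy(g[i][P - 1].a, tmp, sizeof tmp);
    }
}

/* Every core passes its B block to its upper neighbour (with wrap-around). */
static void rotate_b_up(core_t g[P][P]) {
    for (int j = 0; j < P; j++) {
        float tmp[BS][BS];
        memcpy(tmp, g[0][j].b, sizeof tmp);
        for (int i = 0; i < P - 1; i++)
            memcpy(g[i][j].b, g[i + 1][j].b, sizeof tmp);
        memcpy(g[P - 1][j].b, tmp, sizeof tmp);
    }
}

/* C = A * Bm using Cannon's algorithm on a simulated P x P grid. */
void cannon_matmul(float A[N][N], float Bm[N][N], float C[N][N]) {
    static core_t g[P][P];

    /* Initial skew: core (i,j) gets A block (i,k) and B block (k,j),
       where k = (i + j) mod P. */
    for (int i = 0; i < P; i++)
        for (int j = 0; j < P; j++) {
            int k = (i + j) % P;
            for (int r = 0; r < BS; r++)
                for (int c = 0; c < BS; c++) {
                    g[i][j].a[r][c] = A[i * BS + r][k * BS + c];
                    g[i][j].b[r][c] = Bm[k * BS + r][j * BS + c];
                    g[i][j].c[r][c] = 0.0f;
                }
        }

    /* P rounds: local block multiply-accumulate, then rotate. */
    for (int step = 0; step < P; step++) {
        for (int i = 0; i < P; i++)
            for (int j = 0; j < P; j++)
                for (int r = 0; r < BS; r++)
                    for (int c = 0; c < BS; c++)
                        for (int m = 0; m < BS; m++)
                            g[i][j].c[r][c] += g[i][j].a[r][m] * g[i][j].b[m][c];
        rotate_a_left(g);
        rotate_b_up(g);
    }

    /* Gather the per-core result blocks into C. */
    for (int i = 0; i < P; i++)
        for (int j = 0; j < P; j++)
            for (int r = 0; r < BS; r++)
                for (int c = 0; c < BS; c++)
                    C[i * BS + r][j * BS + c] = g[i][j].c[r][c];
}

/* Returns 1 if the Cannon result equals a naive triple-loop product.
   Small integer inputs keep all float arithmetic exact. */
int cannon_matches_reference(void) {
    static float A[N][N], Bm[N][N], C[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j]  = (float)((i + 2 * j) % 5);
            Bm[i][j] = (float)((3 * i + j) % 7);
        }
    cannon_matmul(A, Bm, C);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float ref = 0.0f;
            for (int k = 0; k < N; k++)
                ref += A[i][k] * Bm[k][j];
            if (ref != C[i][j])
                return 0;
        }
    return 1;
}
```

Calling `cannon_matches_reference()` from a small `main` checks the rotation scheme against a naive product. On real hardware each core would hold only its own blocks and exchange them with its neighbours; here the exchange is modelled with `memcpy` between array elements.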

Further implementation details can be found at: http://arxiv.org/abs/1410.8772

Tested on the Epiphany-IV evaluation module.

Building

Single-core version

Set the parameters in src/defs.h, then run:

$ make single

Multi-core version

Set the parameters in src/defs_multi.h, then run:

$ make multi

Usage

Single-core version

$ ./run.sh

Multi-core version

$ ./run_multi.sh

The result matrix is written to output/

License

GPL v3

Author

Contributed by Anish Varghese (Built upon example code by Yaniv Sapir)