olajep assembly-opt: Fixes for Pubuntu / ESDK 2016.11
Single-core version works. Can't test multicore version.

	* Don't create SREC. Replace e-as use with e-gcc to
	get preprocessor.
	* Likewise.
	* Don't use sudo.
	* Likewise.
	* src/host_main.c (main): Load elf file instead of srec. Don't
	call e_reset_core.
	* src/host_multi.c: Don't call e_reset_core. Load elf file.
	* src/matmul_assembly.S: Adjust for user label prefix change
	from "_" to "" in ESDK 2016.3
	* src/matmul_internal.ldf: Likewise.
	* src/matmul_internal_multi.ldf: Likewise.

Signed-off-by: Ola Jeppsson <>
Latest commit d1546d9 Nov 15, 2016
Type Name Latest commit message Commit time
Debug/src Formatting Aug 14, 2015
output Formatting Aug 14, 2015
src assembly-opt: Fixes for Pubuntu / ESDK 2016.11 Nov 15, 2016
makefile Formatting Aug 14, 2015

Matrix Multiplication

Parallel Matrix multiplication of two matrices following Cannon's algorithm


Implementation is based on the examples provided by Adapteva

Host side:

  • Initializes the operand matrices and transfers them to shared memory
  • When the device signals that execution is complete, the host reads the result matrix from shared memory

Device side:

  • Reads the operand matrices from shared memory and distributes them among the device-side cores
  • Per-core matrix multiplication is written in hand-tuned assembly using the Epiphany instruction set
  • Cannon's algorithm is used for allocation of blocks of operand matrices to the cores and the blocks are rotated around rows and columns of cores
  • For block sizes less than 32 x 32, double buffering is used. For blocks of size 32 x 32, an alternate buffering scheme is implemented due to limited per-core memory
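
The block allocation and rotation described above can be illustrated with a small host-only simulation in plain C. This is a sketch under stated assumptions, not the code in this repository: the grid size `P`, block size `BS`, the `core_t` struct and all function names are hypothetical, the "cores" are just array elements, and the per-core multiply is ordinary C rather than Epiphany assembly. Core (r, c) starts with block (r, (r+c) mod P) of A and block ((r+c) mod P, c) of B; after each multiply-accumulate the A blocks move one core to the left and the B blocks move one core up.

```c
#include <string.h>

#define P  2            /* cores per side of the grid (hypothetical value) */
#define BS 2            /* side length of the block held by each core (hypothetical value) */
#define N  (P * BS)     /* side length of the full matrices */

/* Local storage of one simulated core: one block each of A, B and C. */
typedef struct {
    float a[BS][BS];
    float b[BS][BS];
    float c[BS][BS];
} core_t;

/* Copy block (bi, bj) of an N x N matrix into a BS x BS buffer. */
static void load_block(float dst[BS][BS], float src[N][N], int bi, int bj)
{
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++)
            dst[i][j] = src[bi * BS + i][bj * BS + j];
}

/* Per-core kernel: c += a * b on the blocks currently held by the core. */
static void block_mac(core_t *k)
{
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++)
            for (int l = 0; l < BS; l++)
                k->c[i][j] += k->a[i][l] * k->b[l][j];
}

/* Compute C = A * B with Cannon's algorithm on a simulated P x P grid. */
void cannon_multiply(float A[N][N], float B[N][N], float C[N][N])
{
    static core_t grid[P][P], next[P][P];

    /* Initial skewed distribution of the operand blocks. */
    for (int r = 0; r < P; r++)
        for (int c = 0; c < P; c++) {
            load_block(grid[r][c].a, A, r, (r + c) % P);
            load_block(grid[r][c].b, B, (r + c) % P, c);
            memset(grid[r][c].c, 0, sizeof grid[r][c].c);
        }

    for (int step = 0; step < P; step++) {
        /* Every core multiplies the blocks it currently holds. */
        for (int r = 0; r < P; r++)
            for (int c = 0; c < P; c++)
                block_mac(&grid[r][c]);

        /* Rotate: A blocks move one core left, B blocks move one core up. */
        for (int r = 0; r < P; r++)
            for (int c = 0; c < P; c++) {
                next[r][c] = grid[r][c];   /* keeps the accumulated C block */
                memcpy(next[r][c].a, grid[r][(c + 1) % P].a, sizeof next[r][c].a);
                memcpy(next[r][c].b, grid[(r + 1) % P][c].b, sizeof next[r][c].b);
            }
        memcpy(grid, next, sizeof grid);
    }

    /* Gather the per-core C blocks into the result matrix. */
    for (int r = 0; r < P; r++)
        for (int c = 0; c < P; c++)
            for (int i = 0; i < BS; i++)
                for (int j = 0; j < BS; j++)
                    C[r * BS + i][c * BS + j] = grid[r][c].c[i][j];
}

/* Returns 1 if the simulated result equals a plain triple-loop product. */
int cannon_self_check(void)
{
    float A[N][N], B[N][N], C[N][N];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (float)(i + j);
            B[i][j] = (float)(i - j + 1);
        }

    cannon_multiply(A, B, C);

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float ref = 0.0f;
            for (int k = 0; k < N; k++)
                ref += A[i][k] * B[k][j];
            if (C[i][j] != ref)
                return 0;
        }
    return 1;
}
```

The simulation copies into a second grid (`next`) before overwriting, which loosely mirrors why the device code needs a second buffer per operand: a core cannot overwrite a block that its neighbour has not yet received.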

Further details of implementation can be found in:

Tested on the Epiphany-IV evaluation module


Building

Single-core version

Configure the parameters accordingly in src/defs.h and run:

$ make single

Multi-core version

Configure the parameters accordingly in src/defs_multi.h and run:

$ make multi


Running

Single-core version

$ ./

Multi-core version

$ ./

The result matrix is written to output/


GPL v3


Contributed by Anish Varghese (Built upon example code by Yaniv Sapir)
