

## Parallel Sparse Deep Learning Operators on Lightweight RISC-V Processors

Marco Bertuletti<sup>1</sup>, Tim Fischer<sup>1</sup>, Yichao Zhang<sup>1</sup>, Luca Benini<sup>1,2</sup>

<sup>1</sup>Integrated Systems Laboratory, ETH Zurich; <sup>2</sup>Università di Bologna, Italy





## 2. Parallelization of sparse kernels







The sparse kernels are parallelized by input channel: each core processes a disjoint subset of the input channels.
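The following is a minimal Python model of input-channel parallelization. It is not the authors' RISC-V kernel, and it rests on several assumptions. The sparse input feature map is stored in CSR with one row per input channel. The operator is a 1x1 (pointwise) convolution. Cores are simulated one after another. Because different input channels contribute to the same output positions, each simulated core accumulates into a private partial output, and a final reduction sums those partials. The names `dense_to_csr`, `channel_slices` and `pointwise_conv_csr` are hypothetical and used only for illustration. The per-core nonzero count gives a simplified load-balance bound on speedup. It ignores the reduction cost, but it shows why near-ideal scaling depends on the nonzeros being evenly spread across channel chunks.

```python
from typing import List, Tuple

# Hypothetical CSR layout for a sparse input feature map: one row per input
# channel, column index = flattened spatial position.
Csr = Tuple[List[int], List[int], List[float]]  # (row_ptr, col_idx, val)


def dense_to_csr(x: List[List[float]]) -> Csr:
    """Compress a dense [channels][positions] map into CSR, dropping zeros."""
    row_ptr: List[int] = [0]
    col_idx: List[int] = []
    val: List[float] = []
    for row in x:
        for p, v in enumerate(row):
            if v != 0.0:
                col_idx.append(p)
                val.append(v)
        row_ptr.append(len(val))
    return row_ptr, col_idx, val


def channel_slices(c_in: int, n_cores: int) -> List[range]:
    """Split input channels into contiguous chunks of near-equal size, one per core."""
    chunk = -(-c_in // n_cores)  # ceiling division
    return [range(min(i * chunk, c_in), min((i + 1) * chunk, c_in)) for i in range(n_cores)]


def pointwise_conv_csr(x: Csr, w: List[List[float]], n_pos: int,
                       n_cores: int) -> Tuple[List[List[float]], List[int]]:
    """1x1 convolution out[k][p] = sum_c w[k][c] * x[c][p] on a CSR input.

    Each simulated core walks only its own input channels and accumulates into a
    private partial output, because cores would otherwise write the same outputs.
    Returns the output and the number of nonzeros each core processed.
    """
    row_ptr, col_idx, val = x
    c_out = len(w)
    partials: List[List[List[float]]] = []
    work: List[int] = []
    for chans in channel_slices(len(row_ptr) - 1, n_cores):  # concurrent on real cores
        part = [[0.0] * n_pos for _ in range(c_out)]
        nnz = 0
        for c in chans:
            for j in range(row_ptr[c], row_ptr[c + 1]):
                nnz += 1
                for k in range(c_out):
                    part[k][col_idx[j]] += w[k][c] * val[j]
        partials.append(part)
        work.append(nnz)
    # Reduction step (after a barrier on real hardware): sum the per-core partials.
    out = [[sum(p[k][q] for p in partials) for q in range(n_pos)] for k in range(c_out)]
    return out, work


x = [[1.0, 0.0, 0.0, 2.0],
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 3.0, 4.0, 0.0],
     [0.0, 0.0, 0.0, 5.0]]
w = [[1.0, 0.0, 2.0, 1.0],
     [0.5, 1.0, 1.0, -1.0]]
out, work = pointwise_conv_csr(dense_to_csr(x), w, n_pos=4, n_cores=2)
print(out)                    # [[1.0, 6.0, 8.0, 7.0], [0.5, 3.0, 4.0, -4.0]]
print(work)                   # [2, 3]
# Simplified bound: speedup limited by the busiest core (reduction cost ignored).
print(sum(work) / max(work))  # 1.6666666666666667
```

Here core 0 receives channels 0 and 1 (2 nonzeros) and core 1 receives channels 2 and 3 (3 nonzeros). Under this simplified model, the imbalance alone caps the 2-core speedup at about 1.67 rather than 2. With many channels and evenly distributed nonzeros, the bound approaches the core count.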

## 3. Sparse kernel speedup



*[Figure: speedup as a function of input density]*

Software implementation of the sparse kernels:

- Quasi-ideal parallel speedup.
- CSR vs. dense speedup: x7.3, x3.0 and x1.7 on 1 core; x7.0, x1.1 and x1.5 on 8 cores.
- When the input density is high, we still benefit from a reduced memory footprint.
- Further improvement via prefetching.
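Whether CSR keeps a footprint advantage at high input density depends on the value and index widths, which are not stated here. The sketch below estimates storage for a dense array and for its CSR form. It also computes the break-even density: ignoring the row-pointer array, CSR is smaller than dense when $d < v/(v+i)$, where $v$ is the value width in bytes and $i$ is the index width in bytes. The helper names, the 64 x 256 shape, and the 32-bit value, 16-bit index and 16-bit pointer widths are all hypothetical and used only for illustration.

```python
def dense_bytes(rows: int, cols: int, val_bytes: int) -> int:
    """Storage of a dense rows x cols array."""
    return rows * cols * val_bytes


def csr_bytes(rows: int, nnz: int, val_bytes: int, idx_bytes: int, ptr_bytes: int) -> int:
    """CSR storage: one value and one column index per nonzero, plus rows + 1 row pointers."""
    return nnz * (val_bytes + idx_bytes) + (rows + 1) * ptr_bytes


def break_even_density(val_bytes: int, idx_bytes: int) -> float:
    """Density below which CSR is smaller than dense (row-pointer array ignored)."""
    return val_bytes / (val_bytes + idx_bytes)


# Hypothetical sizes: 64 channels x 256 positions, 32-bit values, 16-bit indices and pointers.
rows, cols = 64, 256
for density in (0.25, 0.5, 0.75):
    nnz = int(density * rows * cols)
    print(density, dense_bytes(rows, cols, 4), csr_bytes(rows, nnz, 4, 2, 2))
# 0.25 65536 24706
# 0.5 65536 49282
# 0.75 65536 73858
print(break_even_density(4, 2))  # 0.6666666666666666
```

With these assumed widths, CSR is still about 25% smaller than dense at 50% density but grows larger than dense above roughly 67%. Narrower indices or wider values push the break-even point higher.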