Benchmarking Analysis of Vision Kernels on Embedded CPU, GPU and FPGA

This repository contains a benchmark framework for measuring and comparing the energy efficiency of vision kernels on embedded platforms. It aims to give the computer vision community an easy tool for analyzing the performance of vision kernels on different hardware architectures, and to help determine which architecture is most suitable for a given kind of vision application.

Repository Structure

This repository consists of:

├── FPGATests
│   └──
├── GPUTests
│   ├── Geometric Transforms 
│   ├── Image Analysis  
│   ├── Image Arithmatic 
│   ├── Image Features 
│   ├── Image Filters 
│   ├── Input Processing 
│   ├── Optical Flow & Depth 
│   └── 
├── PYNQ-ComputerVision
│   ├── applicationCode 
│   │   ├── overlayTests 
│   │   └── unitTests
│   ├── boards
│   │   ├── Pynq-Z1
│   │   ├── Pynq-Z2
│   │   ├── Ultra96
│   │   └── ZCU104
│   ├── components 
│   └── frameworks
│       ├── cmakeModules 
│       └── utilities    

Hardware and Software Environments

List of Vision Kernels

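| Input Processing | Image Arithmetic       | Filters     | Image Analysis | Geometric Transforms | Composite Kernels    |
|------------------|------------------------|-------------|----------------|----------------------|----------------------|
| combine          | AbsDiff                | filter2D    | calcHist       | affine warp          | canny                |
| extract          | accumulate             | box filter  | equalizeHist   | perspective warp     | fast                 |
| convertTo        | accumulate squared     | dilate      | integral image | resize               | harris               |
| cvtConvert       | accumulate weighted    | erode       | mean std dev   | remap                | optical flow pyramid |
| table lookup     | add/subtract           | median      | min/max loc    |                      | stereoBM             |
|                  | multiply               | pyramidUp   |                |                      |                      |
|                  | threshold              | pyramidDown |                |                      |                      |
|                  | bitwise and,or,xor,not |             |                |                      |                      |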
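
For readers unfamiliar with the kernel names, the following is a minimal pure-Python sketch of what two of the listed kernels, AbsDiff and threshold, compute per pixel. It is an illustration only, not the repository's implementation: the function names `abs_diff` and `threshold_binary` and the tiny 2x2 sample frames are hypothetical, and the threshold shown assumes the common binary variant (pixels strictly above the threshold become the maximum value, all others become 0).

```python
def abs_diff(a, b):
    """Per-pixel absolute difference of two equally sized grayscale images."""
    return [[abs(pa - pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def threshold_binary(img, thresh, maxval=255):
    """Binary threshold: pixels strictly above thresh become maxval, others 0."""
    return [[maxval if p > thresh else 0 for p in row] for row in img]


# Hypothetical 2x2 8-bit frames, used only to show the semantics.
frame_a = [[10, 200], [90, 40]]
frame_b = [[12, 100], [90, 250]]

diff = abs_diff(frame_a, frame_b)   # [[2, 100], [0, 210]]
mask = threshold_binary(diff, 50)   # [[0, 255], [0, 255]]
```

Chaining the two, as above, gives a simple change mask between frames; the benchmarked versions operate on full images using each platform's optimized libraries.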


To clone the repository with PYNQ-ComputerVision submodules, open a terminal and execute:

git clone --recursive

Build Test Codes

The steps required to build and run the unit tests are described in:

Results Summary

In our experiment, we evaluated the performance of vision kernels on two popular platforms for deploying embedded vision applications:

  • Nvidia Jetson TX2 (256-core Pascal GPU + ARM Cortex-A57 CPU).
  • Xilinx Zynq UltraScale+ ZCU102 (XCZU9EG FPGA + ARM Cortex-A53 CPU).

The figures below show the energy/frame (in mJ/f) comparison results.
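
As a reading aid for the mJ/f unit, the sketch below shows one way energy per frame can be derived, assuming it is the average power drawn while running a kernel divided by the kernel's throughput in frames per second. The function name `energy_per_frame_mj` and the sample numbers are hypothetical and are not measurements from this project.

```python
def energy_per_frame_mj(avg_power_w, frames_per_second):
    """Energy per frame in millijoules.

    Power in watts is joules per second, so dividing by frames per second
    gives joules per frame; the factor of 1000 converts to millijoules.
    """
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return 1000.0 * avg_power_w / frames_per_second


# Hypothetical example: 5 W average power at 100 frames/s.
example_mj = energy_per_frame_mj(5.0, 100.0)  # 50.0 mJ/f
```

Under this definition, a platform that draws more power can still be the more energy-efficient choice for a kernel if its throughput is proportionally higher.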

  title={Analyzing the Energy-Efficiency of Vision Kernels on Embedded CPU, GPU and FPGA Platforms},
  author={Qasaimeh, Murad and Denolf, Kristof and Lo, Jack and Vissers, Kees and Zambreno, Joseph and Jones, Phillip H},
  booktitle={2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)},


The source for this project is licensed under the 3-Clause BSD License.

