Benchmarking Analysis of Vision Kernels on Embedded CPU, GPU and FPGA

This repository contains a benchmark framework for measuring and comparing the energy efficiency of different vision kernels on embedded platforms. It aims to give the computer vision community an easy tool for analyzing the performance of vision kernels on different hardware architectures, and to help determine which architecture is most suitable for different kinds of vision applications.

Repository Structure

This repository consists of:

├── FPGATests
│   └──
├── GPUTests
│   ├── Geometric Transforms 
│   ├── Image Analysis  
│   ├── Image Arithmatic 
│   ├── Image Features 
│   ├── Image Filters 
│   ├── Input Processing 
│   ├── Optical Flow & Depth 
│   └── 
├── PYNQ-ComputerVision
│   ├── applicationCode 
│   │   ├── overlayTests 
│   │   └── unitTests
│   ├── boards
│   │   ├── Pynq-Z1
│   │   ├── Pynq-Z2
│   │   ├── Ultra96
│   │   └── ZCU104
│   ├── components 
│   └── frameworks
│       ├── cmakeModules 
│       └── utilities    

Hardware and Software Environments

List of Vision Kernels

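| Input Processing | Image Arithmetic        | Filters     | Image Analysis | Geometric Transforms | Composite Kernels    |
|------------------|-------------------------|-------------|----------------|----------------------|----------------------|
| combine          | AbsDiff                 | filter2D    | calcHist       | affine warp          | canny                |
| extract          | accumulate              | box filter  | equalizeHist   | perspective warp     | fast                 |
| convertTo        | accumulate squared      | dilate      | integral image | resize               | harris               |
| cvtConvert       | accumulate weighted     | erode       | mean std dev   | remap                | optical flow pyramid |
| table lookup     | add/subtract            | median      | min/max loc    |                      | stereoBM             |
|                  | multiply                | pyramidUp   |                |                      |                      |
|                  | threshold               | pyramidDown |                |                      |                      |
|                  | bitwise and, or, xor, not |           |                |                      |                      |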
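
To give a sense of what timing one of the kernels listed above involves, here is a minimal sketch in pure Python. It is not the repository's harness: the names `abs_diff` and `time_kernel`, the frame size, and the warm-up and run counts are all assumptions chosen for illustration, and a real measurement would call the platform's optimized kernel instead of this reference loop.

```python
import time
from statistics import median


def abs_diff(frame_a, frame_b):
    """Reference AbsDiff: per-pixel |a - b| of two equally sized grayscale frames."""
    return [[abs(pa - pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def time_kernel(kernel, args, warmup=2, runs=10):
    """Return the median wall-clock time of kernel(*args) in milliseconds.

    The warm-up calls are discarded so one-time costs do not skew the result.
    """
    for _ in range(warmup):
        kernel(*args)
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        kernel(*args)
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return median(samples_ms)


if __name__ == "__main__":
    # Synthetic 64x48 test frames (hypothetical size, chosen to keep the run short).
    width, height = 64, 48
    frame_a = [[(x + y) % 256 for x in range(width)] for y in range(height)]
    frame_b = [[(2 * x) % 256 for x in range(width)] for y in range(height)]
    ms = time_kernel(abs_diff, (frame_a, frame_b))
    print(f"AbsDiff {width}x{height}: {ms:.3f} ms/frame")
```

Taking the median rather than the mean makes the reported time less sensitive to occasional scheduling hiccups, which matter on small embedded boards.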


To clone the repository with PYNQ-ComputerVision submodules, open a terminal and execute:

git clone --recursive

Build Test Codes

The steps required to build and run the unit tests are described in:

Results Summary

In our experiment, we evaluated the performance of vision kernels on two popular platforms for deploying embedded vision applications:

  • Nvidia Jetson TX2 (256-core Pascal GPU + ARM Cortex-A57 CPU).
  • Xilinx Zynq UltraScale+ ZCU102 (XCZU9EG FPGA + ARM Cortex-A53 CPU).

The figures below show the energy/frame (in mJ/f) comparison results.

[Figure: energy/frame (mJ/f) comparison results]
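
For readers unfamiliar with the energy/frame metric, the sketch below shows how it can be derived from an average power reading and a processing time. It relies only on the identity energy = power × time (1 W × 1 ms = 1 mJ). The function names are hypothetical and the numbers in the usage section are placeholders, not measurements from this work.

```python
def energy_per_frame_mj(avg_power_w, frame_time_ms):
    """Energy per frame in mJ from average power (W) and per-frame time (ms)."""
    if avg_power_w < 0 or frame_time_ms < 0:
        raise ValueError("power and time must be non-negative")
    # watts x milliseconds = millijoules
    return avg_power_w * frame_time_ms


def energy_per_frame_from_run_mj(avg_power_w, total_time_s, num_frames):
    """Energy per frame in mJ from a whole run: total energy divided by frame count."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    total_energy_mj = avg_power_w * total_time_s * 1e3  # joules -> millijoules
    return total_energy_mj / num_frames


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(energy_per_frame_mj(5.0, 4.0))                # 5 W for 4 ms per frame
    print(energy_per_frame_from_run_mj(5.0, 2.0, 500))  # 5 W over 2 s, 500 frames
```

Both calls describe the same hypothetical workload (500 frames in 2 s is 4 ms per frame), so they yield the same energy per frame. A platform that draws more power can still win on this metric if it finishes each frame proportionally faster.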


  title={Analyzing the Energy-Efficiency of Vision Kernels on Embedded CPU, GPU and FPGA Platforms},
  author={Qasaimeh, Murad and Denolf, Kristof and Lo, Jack and Vissers, Kees and Zambreno, Joseph and Jones, Phillip H},
  booktitle={2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)},


The source for this project is licensed under the 3-Clause BSD License.
