OpenCL Performance on FPGA and GPU

This article is a continuation of our FPGA research.

There are two common programming models for FPGAs. The traditional approach, also called RTL design, requires implementing algorithms in a hardware description language (HDL) such as Verilog or VHDL. The other approach is to use a higher-level language such as OpenCL to implement the accelerated function (the kernel) and compile it for the target hardware (CPU, GPU, or FPGA). OpenCL provides a C-based programming language for developing code that runs across different platforms, and it is a portable, open, royalty-free standard.

OpenCL is supported by GPU vendors (NVIDIA, AMD) and FPGA vendors (Altera, Xilinx). In particular, Xilinx supports OpenCL development through the SDAccel™ development environment, which is also available in the AWS cloud.

With SDAccel™ and Amazon AWS F1 instances, it is possible to design, test, and run custom algorithms on FPGAs in the cloud.

Previously, we compared FPGA and GPU performance on neural network inference. In most of those tests, the GPU delivered better performance per dollar on AWS than the FPGA-based F1 instance.

What about other applications, such as unsupervised learning and computer vision? We took two algorithms, naive k-means clustering and Sobel edge detection, and tested accelerated implementations of each.

k-means clustering

We used a benchmark based on Rodinia. The FPGA implementation is code customized for Xilinx OpenCL devices.

The task is to cluster 1,000,000 records with 34 features into 5 clusters. Times are averaged over 10 runs.

OpenCL is used for the kernel implementation on both platforms, and the same code is compiled for the GPU and the FPGA.

Hardware	Device	Total time (s)	Power consumption (W)	Throughput per watt (records/s/W)
FPGA	Xilinx UltraScale+™ VU9P	5.6	40 (estimated)	4460
GPU	NVIDIA® Tesla® K80	3.9	300	850
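To make the benchmark workload concrete, here is a minimal pure-Python sketch of Lloyd's k-means iteration. It is not the Rodinia or FPGA code; the first-k initialization, the fixed iteration count, and the names `kmeans` and `nearest_center` are our own illustrative assumptions. The assignment step compares every record with every center, which for the benchmark parameters is about 1,000,000 × 5 × 34 ≈ 1.7 × 10^8 squared-difference terms per iteration, and each record can be processed independently. That data-parallel step is the natural candidate for an OpenCL kernel.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest_center(point: Point, centers: List[List[float]]) -> int:
    """Return the index of the center closest to `point` (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, center in enumerate(centers):
        dist = sum((p - c) ** 2 for p, c in zip(point, center))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def kmeans(points: List[Point], k: int, iterations: int = 10) -> Tuple[List[List[float]], List[int]]:
    """Naive k-means sketch: alternate assignment and update for a fixed number of iterations."""
    # Assumption: the first k points serve as the initial centers.
    centers = [list(p) for p in points[:k]]
    dims = len(points[0])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: independent per point (the data-parallel part).
        labels = [nearest_center(p, centers) for p in points]
        # Update step: move each center to the mean of the points assigned to it.
        sums = [[0.0] * dims for _ in range(k)]
        counts = [0] * k
        for p, label in zip(points, labels):
            counts[label] += 1
            for d in range(dims):
                sums[label][d] += p[d]
        for c in range(k):
            if counts[c]:  # keep the previous center if its cluster is empty
                centers[c] = [s / counts[c] for s in sums[c]]
    return centers, labels
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], k=2)` converges to labels `[0, 0, 1, 1]` with centers `[[0.0, 0.5], [10.0, 10.5]]`.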

Edge detection

The Sobel edge detection algorithm was tested on 1024x1024 images. For this task, the GPU implementation is CUDA-based. Execution time was measured for processing 1,000 images.

Hardware	Device	Total time (s)	Power consumption (W)	Throughput per watt (images/s/W)
FPGA	Xilinx UltraScale+™ VU9P	0.330	40 (estimated)	75
GPU	NVIDIA® Tesla® K80	0.320	300	10
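The Sobel operator convolves each pixel's 3x3 neighborhood with two fixed kernels to estimate the horizontal and vertical intensity gradients. Below is a minimal pure-Python reference sketch; it is not the CUDA or FPGA code used in the test. Leaving border pixels at 0, approximating the gradient magnitude as |Gx| + |Gy| instead of sqrt(Gx² + Gy²), and clamping to 255 are our assumptions. Because every output pixel depends only on its own neighborhood, each one can be computed by an independent work-item, which is what makes the algorithm a good fit for both GPUs and FPGAs.

```python
from typing import List

# 3x3 Sobel kernels for the horizontal (GX) and vertical (GY) gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(image: List[List[int]]) -> List[List[int]]:
    """Return |Gx| + |Gy| clamped to 255 for each interior pixel; borders stay 0 (assumption)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            # Convolve the 3x3 neighborhood centered on (y, x) with both kernels.
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += GX[dy][dx] * pixel
                    gy += GY[dy][dx] * pixel
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out
```

For example, a 4x4 image whose left two columns are 0 and right two columns are 10 yields 40 at each of the four interior pixels, since the vertical step gives |Gx| = (1 + 2 + 1) × 10 and Gy = 0.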

Conclusion

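In these experiments, the GPU outperforms the FPGA in absolute speed (by about 1.4x on k-means and only about 3% on edge detection), but the FPGA delivers roughly 5x to 7x more throughput per watt than the GPU, based on the estimated 40 W FPGA power draw.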
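The per-watt figures can be recomputed from the tables. The sketch below assumes that one transaction is one record (k-means) or one image (edge detection) and that the metric is throughput divided by power; this is our reading of the numbers, which it reproduces, rather than a definition stated by the benchmark. The helper name `throughput_per_watt` is hypothetical.

```python
def throughput_per_watt(items: int, seconds: float, watts: float) -> float:
    """Items processed per second divided by power draw, i.e. items per joule."""
    return items / seconds / watts


# Figures from the two tables; the 40 W FPGA power draw is an estimate.
benchmarks = {
    "k-means": {"fpga": (1_000_000, 5.6, 40), "gpu": (1_000_000, 3.9, 300)},
    "edge detection": {"fpga": (1_000, 0.330, 40), "gpu": (1_000, 0.320, 300)},
}

for name, runs in benchmarks.items():
    fpga = throughput_per_watt(*runs["fpga"])
    gpu = throughput_per_watt(*runs["gpu"])
    print(f"{name}: FPGA {fpga:.0f}, GPU {gpu:.0f}, ratio {fpga / gpu:.1f}x")
```

This prints about 4464 vs 855 (5.2x) for k-means and about 76 vs 10 (7.3x) for edge detection, consistent with the rounded table values and with the 5x to 7x range.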

Author: Oleksandr Sukholeyster
