Complete test: Simple OpenCL math benchmark

Overview

Intel's Broadwell chips (and later generations) all have an onboard GPU that is OpenCL-compatible. The GPU won't win any speed awards, but it really does work by default, and makes a nice test platform.

Floating-point powers and logarithms are surprisingly expensive. Here I've written a simple worker function (i.e., kernel) that evaluates floating-point "elementary functions" that cancel each other out, so the net result is a simple increment each time through the loop.
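
As a rough illustration of the idea, here is a minimal sketch of such a worker in plain C++. It is not the code from straight.cpp: the choice of exp/log as the cancelling pair, the function names worker and run, and the way the_dim and max_iter are used are all assumptions made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical worker: exp(log(x)) cancels back to x (up to rounding),
// so each pass through the loop amounts to x = x + 1.
float worker(float x, int max_iter) {
    for (int i = 0; i < max_iter; ++i) {
        x = std::exp(std::log(x)) + 1.0f;
    }
    return x;
}

// Apply the worker to every element of a buffer of length the_dim
// (parameter names borrowed from const.h; their use here is assumed).
std::vector<float> run(std::size_t the_dim, int max_iter) {
    std::vector<float> data(the_dim, 1.0f);
    for (float& x : data) {
        x = worker(x, max_iter);
    }
    return data;
}
```

Calling run(8, 100) should leave every element close to 101, with a small rounding residue from the exp/log round trip. The expensive part is the pair of transcendental calls, not the increment.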

I've tested the same worker function with vanilla C++/STL and with OpenCL (using the C++ wrappers). The vanilla straight.cpp implementation is very simple and easy to read. The opencl.cpp version carries a tremendous amount of boilerplate, argument passing, etc. But I find the latter to be ~20x faster, which is on par with the number of compute units my GPU has (24). So the benefit seems quite real. Here's hoping C++17 and SYCL will simplify the logistics for us soon.

Credits: OpenCL example

Prereqs

  • System (as written):
    • Debian-based Linux distro
    • Intel GPU (Broadwell or later)
  • Install prereqs: sudo apt-get install opencl-clhpp-headers beignet-dev beignet-opencl-icd intel-gpu-tools
  • Boost.Compute - header-only
    • Makefile assumes the Boost.Compute repo resides under $HOME/src
  • As written, the Boost.Compute example depends on Boost (which is huge):
    • sudo apt-get install libboost-all-dev
  • Run with: make; time ./opencl; time ./boost; time ./straight

Testing

  • Tested with an Intel i5-5300U (Broadwell) on Debian Jessie
  • Play with values in const.h
    • the_dim and max_iter determine job size
    • opencl fails for small values of the_dim, and for some values of the_grp
  • Try an alternate kernel (uncomment) to see how results can differ between C++ STL and OpenCL, presumably caused by differing implementations of floating-point log/pow
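
On the the_dim / the_grp failures noted above: one common cause of launch failures in OpenCL 1.x is a global work size that is not an exact multiple of the work-group size. Whether that explains the failures seen here is an assumption, not something verified against opencl.cpp. A minimal sketch of the usual workaround, padding the global size up to the next multiple, could look like this (round_up and its parameter names are made up for this example):

```cpp
#include <cstddef>

// Round global_size up to the next multiple of group_size.
// Hypothetical helper; group_size must be nonzero.
constexpr std::size_t round_up(std::size_t global_size, std::size_t group_size) {
    return ((global_size + group_size - 1) / group_size) * group_size;
}
```

For example, round_up(1000, 64) gives 1024. If the launch is padded this way, the kernel needs a bounds check so that the extra work-items past the real data length do nothing.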