guruofquality edited this page Apr 28, 2013 · 19 revisions
http://i.imgur.com/U4OVEC2.png

The OpenCL block allows a user to interface with an OpenCL-compatible device, such as a GPU. The block handles most of the complications of using the OpenCL API: all the user has to do is feed the block a .cl file with the kernel source and click run! The block makes use of GRAS's special buffer model, so memory allocated by OpenCL can be written directly by upstream blocks and read directly by downstream blocks.

Setup and install

The first step is to install an OpenCL development environment. This step is specific to your hardware or GPU, so refer to your vendor's OpenCL SDK installation instructions. I found this step very easy on an Ubuntu machine with an Nvidia GPU: installing the nvidia-opencl-dev package took care of everything.

After installing the OpenCL development environment, install GRAS according to the build instructions here:

During the cmake configuration step, you should see output similar to this:

Found OpenCL: /usr/lib/libOpenCL.so

If cmake cannot find the OpenCL development files, you can point it at the OpenCL headers and libraries manually by setting the following cmake variables:

  • OPENCL_LIBRARIES
  • OPENCL_INCLUDE_DIRS

Using OpenCL block

Implementation notes

The OpenCL block makes use of GRAS's advanced buffering model. Using the GRAS API, the OpenCL block swaps out its input and output buffer queues, and replaces these with a custom queue that uses OpenCL's buffer allocators. Therefore, blocks upstream of the OpenCL block write into memory allocated by OpenCL, and blocks downstream of the OpenCL block read from memory allocated by OpenCL.
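The queue-swapping idea above can be sketched in plain C. This is a hypothetical model, not the GRAS API: `buffer_allocator`, `input_queue`, `queue_get_buffer`, `device_alloc`, and the two toy blocks are invented names, and `malloc` stands in for the OpenCL-backed allocator. It shows that once a block swaps in its own allocator, the upstream block writes straight into memory that allocator produced, and the downstream block reads that same memory with no intermediate copy.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator interface (not the real GRAS API): a block may
 * replace the allocator behind its input queue with its own. */
typedef struct {
    void *(*alloc)(size_t nbytes, void *ctx);
    void (*release)(void *ptr, void *ctx);
    void *ctx;
} buffer_allocator;

/* Stand-in for an OpenCL-backed allocator. It uses malloc and counts
 * calls through ctx so a caller can see which allocator was used. */
static void *device_alloc(size_t nbytes, void *ctx) {
    ++*(int *)ctx;
    return malloc(nbytes);
}

static void device_release(void *ptr, void *ctx) {
    (void)ctx;
    free(ptr);
}

/* Input queue owned by the consuming block; its allocator is swappable. */
typedef struct {
    buffer_allocator allocator;
} input_queue;

/* The scheduler asks the consumer's queue for a buffer, so the producer
 * writes directly into memory the consumer's allocator provided. */
static void *queue_get_buffer(input_queue *q, size_t nbytes) {
    return q->allocator.alloc(nbytes, q->allocator.ctx);
}

/* Toy upstream block: fills the buffer it was handed. */
static void upstream_work(float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = (float)i;
    }
}

/* Toy downstream block: reads the same memory, no copy in between. */
static float downstream_sum(const float *in, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += in[i];
    }
    return sum;
}
```

For example, a queue built as `input_queue q = { { device_alloc, device_release, &count } };` hands out buffers from `device_alloc`, so both toy blocks operate on that one allocation. The allocator is a function-pointer table only to keep the sketch small; it says nothing about how GRAS itself represents buffer queues.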

Specifically, the OpenCL buffers are allocated with the CL_MEM_ALLOC_HOST_PTR flag. On a PCI Express graphics card, these buffers may be allocated in pinned (page-locked) host memory and transferred over the PCIe bus by DMA. The DMA transfers are executed via the enqueueMapBuffer and enqueueUnmapMemObject OpenCL API calls. Nvidia's OpenCL documentation recommends this method for best performance.
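The map/unmap handoff can be sketched in a similarly hedged way. The C API counterparts of the calls named above are `clCreateBuffer` (with `CL_MEM_ALLOC_HOST_PTR`), `clEnqueueMapBuffer`, and `clEnqueueUnmapMemObject`. The block below does not call them, since that would need an OpenCL device; instead it models the usual discipline with `malloc` and a flag: map before the host touches the buffer, unmap before a kernel uses it. The `model_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Host-only model. A real implementation would create the buffer with
 * clCreateBuffer(..., CL_MEM_ALLOC_HOST_PTR, ...), get a host pointer
 * with clEnqueueMapBuffer, and release it with clEnqueueUnmapMemObject. */
typedef struct {
    float *storage;  /* stands in for the (possibly pinned) allocation */
    size_t n;
    bool mapped;     /* true while the host owns the buffer */
} model_buffer;

static bool model_buffer_init(model_buffer *b, size_t n) {
    b->storage = malloc(n * sizeof *b->storage);
    b->n = n;
    b->mapped = false;
    return b->storage != NULL;
}

/* Analogue of clEnqueueMapBuffer: hand the host a pointer it may touch.
 * Simplification: real OpenCL allows overlapping maps; this rejects them. */
static float *model_map(model_buffer *b) {
    if (b->mapped) {
        return NULL;
    }
    b->mapped = true;
    return b->storage;
}

/* Analogue of clEnqueueUnmapMemObject: give the buffer back to the device. */
static bool model_unmap(model_buffer *b, float *ptr) {
    if (!b->mapped || ptr != b->storage) {
        return false;
    }
    b->mapped = false;
    return true;
}

/* Stand-in for a kernel launch that scales every element by k.
 * The model refuses to run while the host still has the buffer mapped. */
static bool model_run_scale(model_buffer *b, float k) {
    if (b->mapped) {
        return false;
    }
    for (size_t i = 0; i < b->n; ++i) {
        b->storage[i] *= k;
    }
    return true;
}
```

In use, the host maps the buffer, fills it, unmaps it, lets the "kernel" run, and maps it again to read the result. In the real API the map and unmap steps are where a DMA over PCIe may happen; the model only tracks who owns the memory at each point.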
