OpenCL
The OpenCL block allows a user to interface with an OpenCL-compatible device, like a GPU. This block handles most of the complications of using the OpenCL API. All the user has to do is feed the block a .cl file with the kernel source and click run! This block makes use of GRAS's special buffer model, so memory allocated from OpenCL can be directly written by upstream blocks and read by downstream blocks.
The first step is to install an OpenCL development environment. This part is specific to the hardware or GPU in question, so please refer to your vendor's SDK installation instructions for OpenCL. I personally found this step very easy on an Ubuntu machine with an Nvidia GPU: I simply had to install the nvidia-opencl-dev package and everything was taken care of.
After installing the OpenCL development environment, you should install GRAS according to the build instructions here:
During the cmake configuration step, you should see output similar to this:
Found OpenCL: /usr/lib/libOpenCL.so
If the cmake configuration cannot find the OpenCL development files, the development directories for OpenCL headers and libraries can also be manually set via the following variables in cmake:
- OPENCL_LIBRARIES
- OPENCL_INCLUDE_DIRS
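If detection fails, the two variables can be passed on the cmake command line as `-D` cache entries. The sketch below only assembles such a command. The library path, the header path, and the `..` source directory are placeholder assumptions for a typical Linux layout, so substitute the locations from your vendor's SDK.

```python
import shlex

def opencl_cmake_command(library, include_dir, source_dir=".."):
    """Build a cmake command line that sets the OpenCL locations by hand."""
    args = [
        "cmake",
        source_dir,
        # Both paths are placeholders; use the ones from your OpenCL SDK.
        f"-DOPENCL_LIBRARIES={library}",
        f"-DOPENCL_INCLUDE_DIRS={include_dir}",
    ]
    # shlex.join quotes any argument that contains spaces or shell characters.
    return shlex.join(args)

print(opencl_cmake_command("/usr/lib/libOpenCL.so", "/usr/include"))
```

Run the printed command from your build directory in place of the plain cmake invocation.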
The OpenCL block can be used in C++, Python, or GNU Radio Companion environments. The user has to know surprisingly little about the host side of the OpenCL API, the part that revolves around buffer allocation, kernel compilation, device detection, and so on. The OpenCL block wraps all of that for you. The user's only concern is implementing a kernel in a .cl file.
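To give an idea of how small that piece can be, here is a minimal sketch that saves a kernel to a .cl file from Python. The kernel name `scale_by_two`, the file name, and the one-input, one-output argument layout are assumptions made for this example, not something the block is known to require; check the block's documentation for the signature it actually expects.

```python
from pathlib import Path

# OpenCL C source: each work item doubles one float sample.
# The argument layout (one input, one output) is an assumption.
KERNEL_SOURCE = """\
__kernel void scale_by_two(__global const float *in, __global float *out)
{
    const size_t i = get_global_id(0);  /* index of this work item */
    out[i] = 2.0f * in[i];
}
"""

def write_kernel(path="scale_by_two.cl"):
    """Write the kernel source to a .cl file and return its path."""
    kernel_path = Path(path)
    kernel_path.write_text(KERNEL_SOURCE)
    return kernel_path

if __name__ == "__main__":
    print(write_kernel())
```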
First, I would like to note that the OpenCL API encompasses a great deal, and it would be impossible for this block to cover all of it. So far, this block handles linear arrays of data in and out, and exposes hooks to control linear work groups and work dimensions. I think this makes sense for GNU Radio applications, which are often based on processing linear buffers of samples.
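For readers new to the terminology, the following is a plain-Python emulation of how a one-dimensional launch walks a linear buffer. It does not use OpenCL or the GRAS API at all. The helper `run_1d` and the stand-in kernel `scale_by_two` are hypothetical names for illustration, and the multiple-of-local-size check mirrors the OpenCL 1.x rule.

```python
def run_1d(kernel, global_size, local_size, *buffers):
    """Emulate a one-dimensional OpenCL launch in plain Python.

    Work items are grouped into work groups of local_size items; the
    global id of an item is group_id * local_size + local_id.
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            kernel(group_id * local_size + local_id, *buffers)

def scale_by_two(gid, src, dst):
    # Python stand-in for a kernel body: out[gid] = 2 * in[gid]
    dst[gid] = 2.0 * src[gid]

samples = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
result = [0.0] * len(samples)

# Eight work items split into two work groups of four.
run_1d(scale_by_two, len(samples), 4, samples, result)
print(result)
```

On a real device the work items run in parallel rather than in loop order, which is why a kernel should only touch the elements selected by its own global id.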
The OpenCL block makes use of GRAS's advanced buffering model. Using the GRAS API, the OpenCL block swaps out its input and output buffer queues, and replaces these with a custom queue that uses OpenCL's buffer allocators. Therefore, blocks upstream of the OpenCL block write into memory allocated by OpenCL, and blocks downstream of the OpenCL block read from memory allocated by OpenCL.
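The idea can be pictured with a toy model in plain Python. This is not the GRAS API: `BufferQueue`, `device_allocate`, `upstream_produce`, and `downstream_consume` are hypothetical names, and a `bytearray` stands in for memory that would really come from OpenCL. The point is only that the producer fills memory chosen by the consumer's allocator, so no intermediate copy is needed.

```python
from collections import deque

class BufferQueue:
    """Toy pool of reusable buffers handed out to whoever produces data."""

    def __init__(self, allocate, count, nbytes):
        # The allocator decides where the memory comes from.
        self._free = deque(allocate(nbytes) for _ in range(count))

    def pop(self):
        return self._free.popleft()

    def push(self, buf):
        self._free.append(buf)

def device_allocate(nbytes):
    # Stand-in for an OpenCL allocation; here it is just host memory.
    return bytearray(nbytes)

def upstream_produce(queue, payload):
    """Upstream block: take a free buffer and fill it in place."""
    buf = queue.pop()
    if len(payload) > len(buf):
        raise ValueError("payload does not fit in the buffer")
    buf[:len(payload)] = payload
    return buf

def downstream_consume(queue, buf):
    """Consumer: read the same memory, then return it to the pool."""
    data = bytes(buf)
    queue.push(buf)
    return data

# The consumer installs a queue backed by its own allocator.
pool = BufferQueue(device_allocate, count=2, nbytes=4)
filled = upstream_produce(pool, b"\x01\x02\x03\x04")
print(downstream_consume(pool, filled))
```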
Specifically, the OpenCL buffers are allocated with the CL_MEM_ALLOC_HOST_PTR flag. On a PCI Express graphics card, buffers may be allocated in pinned memory and DMA'd over the PCIe interface. DMAs are executed via the enqueueMapBuffer and enqueueUnmapMemObject OpenCL API calls. Nvidia's documentation recommends this method for best performance.