CUDA Streams and OpenACC async interoperability

Sometimes you want to combine OpenACC with explicit CUDA kernels written in CUDA C or CUDA Fortran, or with a CUDA library such as CUBLAS or CUFFT. In CUDA, the default stream is stream zero, also called the NULL stream. In PGI OpenACC, the default stream is NOT stream zero. This means that if an OpenACC data movement or kernel launch is followed by a CUFFT call, the two will not be on the same stream by default, so they may not execute in the intended order.

There are two solutions. The simplest is to link your program with -Mcuda. With -Mcuda at link time, the PGI OpenACC runtime will use stream zero by default (unless you use async clauses).

The other is to get the default OpenACC stream by calling acc_get_cuda_stream(acc_async_sync), and then either set it as the stream for CUFFT or CUBLAS, or pass it as the stream argument to your CUDA kernel launch or asynchronous memory transfer.

If you are using async clauses and want the CUFFT or CUBLAS operations on the same stream as your async operations, call acc_get_cuda_stream(i), where 'i' is the async value you used, and either set the CUFFT or other library stream to the result or use it for your own CUDA operations.
