
Suggestion for the CUDA stream module #11606

Open
nglee opened this issue May 28, 2018 · 0 comments
Labels
category: gpu/cuda (contrib) OpenCV 4.0+: moved to opencv_contrib RFC


nglee commented May 28, 2018

Hi, I'd like to suggest something related to the CUDA stream module.

It seems that the cv::cuda::Stream class encapsulates a memory-allocation feature built on the StackAllocator and MemoryPool classes.

This feature is described in detail in the documentation for BufferPool. In short, it appears to be designed to bypass CUDA memory allocation API calls and thereby improve performance, as follows:

  • When a Stream class is constructed, some amount of CUDA memory is pre-allocated and assigned to the Stream instance.
  • If an OpenCV algorithm is run on that stream, the algorithm internally uses the BufferPool class to obtain buffer memory (if needed) from the pre-allocated area, rather than calling the CUDA memory allocation API.
  • This reduces overhead.
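The idea behind the per-stream pool can be modeled with a minimal bump/stack allocator in plain C++. This is a simplified sketch of the concept only; SimpleStackAllocator is a hypothetical name, not OpenCV's actual StackAllocator implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a per-stream stack allocator: one large block is
// reserved up front, and each "allocation" just bumps an offset, so no
// further allocation API calls are made while the stream is in use.
class SimpleStackAllocator {
public:
    explicit SimpleStackAllocator(std::size_t capacity)
        : pool_(capacity), top_(0) {}

    // Returns a pointer into the pre-allocated pool, or nullptr if exhausted.
    void* allocate(std::size_t bytes) {
        if (top_ + bytes > pool_.size()) return nullptr;
        void* p = pool_.data() + top_;
        top_ += bytes;
        return p;
    }

    // Stack discipline: only the most recent allocation may be released.
    void deallocate(std::size_t bytes) { top_ -= bytes; }

    std::size_t used() const { return top_; }

private:
    std::vector<unsigned char> pool_;
    std::size_t top_;
};
```

Because an allocation is just an offset bump, requesting even a few bytes of scratch memory costs almost nothing compared to a real CUDA allocation call.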

Since there are many cases where CUDA algorithms need some amount of GPU memory (sometimes as little as a few bytes) for internal buffers, it seems reasonable to take this memory pre-allocation approach. I suppose that is the original reason behind this design, and the feature was enabled by default before #10751. Since that PR it is disabled by default, but users can still turn it on with cv::cuda::setBufferPoolUsage(true);.

However, in my opinion, this feature should not be used, since enabling it can lead to several problems.

The problems are:

  1. The current design does not interact well with the device reset function. The pre-allocated CUDA memory is freed when the CUDA context is destroyed by cv::cuda::resetDevice(), but the Stream module is not notified of this, leading to failures. The following code snippets fail to run because of this.

  2. It is error-prone, since users have to keep the deallocation order in mind. The following code shows different images for a seemingly unchanged variable.

  3. The Stream module is not safe to use in multi-threaded applications.
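To make problem 1 concrete without GPU code, here is a simplified plain-C++ model of the hazard (DeviceContext and StreamModel are hypothetical names, not OpenCV's implementation): the stream captures a handle to the pool at construction time, and a later context reset frees the pool without the stream ever learning about it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the resetDevice() hazard (not OpenCV code).
struct DeviceContext {
    std::vector<unsigned char>* pool;
    DeviceContext() : pool(new std::vector<unsigned char>(1024)) {}
    // Models cv::cuda::resetDevice(): all device allocations are destroyed.
    void reset() {
        delete pool;
        pool = nullptr;
    }
};

struct StreamModel {
    // The stream caches the pool pointer at construction and is never
    // notified when the context is reset, so the handle can go stale.
    std::vector<unsigned char>* cached_pool;
    explicit StreamModel(DeviceContext& ctx) : cached_pool(ctx.pool) {}
};
```

After ctx.reset(), the stream still holds the old pointer; any allocation routed through it would touch freed memory, which matches the failure mode described above.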
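Problem 2 follows from the stack discipline: releasing a buffer rewinds the stack top, so releasing out of order silently recycles a still-live buffer. A minimal plain-C++ simulation (StackPool is a hypothetical model, not OpenCV's code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stack-pool model: deallocation rewinds the top offset, which
// is only safe when buffers are released in reverse (LIFO) order.
struct StackPool {
    std::vector<unsigned char> mem;
    std::size_t top = 0;
    explicit StackPool(std::size_t n) : mem(n) {}
    unsigned char* alloc(std::size_t n) {
        unsigned char* p = mem.data() + top;
        top += n;
        return p;
    }
    // Releasing a buffer rewinds the stack to its start offset, implicitly
    // "freeing" every buffer allocated after it as well.
    void release_at(std::size_t offset) { top = offset; }
};
```

If buffer A is released before buffer B (non-LIFO), a fresh allocation can land on top of B's bytes, so B appears to change even though it was never written to again.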
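Problem 3 can be illustrated by simulating, sequentially, one possible interleaving of two threads bumping an unsynchronized top offset (again a sketch of the race, not OpenCV code): both threads can read the same offset and receive the same buffer.

```cpp
#include <cstddef>

// Sequential simulation of a lost update on an unsynchronized bump
// allocator shared by two threads. Each thread conceptually executes:
//     p = pool + top; top += 16;
// Under the interleaving below, both loads of `top` happen before either
// store, so both threads receive the same buffer.
inline bool interleaving_aliases() {
    static unsigned char pool[64];
    std::size_t top = 0;

    std::size_t loadA = top;            // thread A reads top == 0
    std::size_t loadB = top;            // thread B reads top == 0 (lost update)
    unsigned char* pA = pool + loadA;
    unsigned char* pB = pool + loadB;
    top = loadA + 16;                   // thread A stores 16
    top = loadB + 16;                   // thread B stores 16 again
    (void)top;

    return pA == pB;                    // true: the two buffers alias
}
```

Avoiding this would require synchronizing every pool operation, which the current design does not do.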

All of the code snippets mentioned above run without error when the memory pre-allocation mechanism is disabled, either by replacing setBufferPoolUsage(true); with setBufferPoolUsage(false); or simply by deleting the setBufferPoolUsage(true); line.

So I am suggesting that the memory pre-allocation feature of the Stream module be removed. To achieve this, the StackAllocator, MemoryPool, and BufferPool classes would have to be deprecated or removed, and existing CUDA memory allocations that go through BufferPool would have to be replaced with ordinary GpuMat allocations using the DefaultAllocator.

Since all of these changes may be too radical, another option I can think of is to leave things as they are (the feature disabled by default) and warn users who want to enable it by rewriting the documentation of the setBufferPoolUsage function.
