Hi, I'd like to suggest something related to the CUDA stream module.
It seems that the cv::cuda::Stream class encapsulates a CUDA memory allocation feature built on the StackAllocator and MemoryPool classes.
This feature is described in detail in the documentation for BufferPool. In short, it is designed to bypass CUDA memory allocation API calls to improve performance, as follows:
When a Stream instance is constructed, some amount of CUDA memory is pre-allocated and assigned to that instance.
If an OpenCV algorithm is run on that stream, the algorithm internally uses the BufferPool class to get buffer memory (if needed) from the pre-allocated area, rather than calling the CUDA memory allocation API.
This reduces overhead.
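Based on the BufferPool documentation, the steps above can be sketched roughly as follows. This is a minimal illustration, and the stack size and count passed to setBufferPoolConfig are illustrative values, not recommendations:

```cpp
#include <opencv2/core/cuda.hpp>

using namespace cv::cuda;

int main()
{
    // Both calls must happen before the first Stream is constructed.
    setBufferPoolUsage(true);
    setBufferPoolConfig(getDevice(), 64 * 1024 * 1024, 2); // 2 stacks of 64 MB each

    Stream stream;           // pre-allocates one stack of device memory for this stream
    BufferPool pool(stream); // lightweight view into that stream's stack

    // Served from the pre-allocated stack; no CUDA allocation API call happens here.
    GpuMat buf1 = pool.getBuffer(1024, 1024, CV_8UC1);
    GpuMat buf2 = pool.getBuffer(1024, 1024, CV_32FC1);

    return 0;
}
```

Requires a CUDA-capable device and OpenCV built with CUDA support.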
Since there are many cases where CUDA algorithms need some amount of GPU memory (sometimes as small as a few bytes) for internal buffers, this memory pre-allocation approach seems reasonable. I suppose that's the original reason behind the design, and this feature was enabled by default before #10751. It is now disabled by default by the mentioned PR, although users can still turn it on with cv::cuda::setBufferPoolUsage(true);.
However, in my opinion, this feature should not be used, since enabling it can lead to the following problems:
The current design does not work well with the device reset function. The allocated CUDA memory is deallocated when the CUDA context is destroyed with cv::cuda::resetDevice(), but the Stream module is not notified of this, leading to failures. The following code snippets fail to run because of this.
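The original snippets are not reproduced here, but based on the description above, a minimal sketch of this failure mode might look like the following. This is an illustrative reconstruction under the stated assumption (the Stream's pre-allocated memory does not survive the context reset), not the author's original code:

```cpp
#include <opencv2/core/cuda.hpp>

using namespace cv::cuda;

int main()
{
    setBufferPoolUsage(true);

    Stream stream;   // pre-allocates device memory for this stream's pool
    resetDevice();   // destroys the CUDA context, implicitly freeing that memory

    // The Stream module is unaware that its pre-allocated memory is gone,
    // so subsequent use of the pool (or the stream's destructor) fails.
    BufferPool pool(stream);
    GpuMat buf = pool.getBuffer(256, 256, CV_8UC1); // fails at this point

    return 0;
}
```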
It is error-prone, since users must consider the deallocation order of pooled buffers. The following code shows different images for a seemingly unchanged variable.
The Stream module becomes unsafe to use in multi-threaded applications. That Stream itself is not thread-safe is already mentioned in the documentation: https://docs.opencv.org/master/d9/df3/classcv_1_1cuda_1_1Stream.html. The following snippet demonstrates the problem: https://github.com/nglee/opencv_test/blob/e6e7aab4202d285965d62dd79ff66e410fecc85c/cuda_stream_master/cuda_stream_master.cpp#L81-L107
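To illustrate the deallocation-order hazard: the pool's StackAllocator assumes buffers are released in LIFO order, and the sketch below shows what a violation of that assumption can look like. This is a hypothetical illustration, not the author's original snippet:

```cpp
#include <opencv2/core/cuda.hpp>

using namespace cv::cuda;

int main()
{
    setBufferPoolUsage(true);

    Stream stream;
    BufferPool pool(stream);

    // The stack allocator expects buffers to be released in LIFO order.
    GpuMat a = pool.getBuffer(100, 100, CV_8UC1); // bottom of the stack
    GpuMat b = pool.getBuffer(100, 100, CV_8UC1); // top of the stack

    a.release(); // out-of-order release: violates the LIFO assumption
    b.release();

    // From here on, the pool's internal state is inconsistent; a later
    // getBuffer() can hand out memory that overlaps a buffer the caller
    // still considers live, so its contents appear to change spontaneously.
    return 0;
}
```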
All code snippets mentioned above run without error when the memory pre-allocation mechanism is disabled, either by replacing setBufferPoolUsage(true); with setBufferPoolUsage(false); or simply by deleting the setBufferPoolUsage(true); line.
So I'm suggesting that the memory pre-allocation feature of the Stream module be removed. To achieve this, the StackAllocator, MemoryPool, and BufferPool classes would have to be disabled or removed, and existing CUDA memory allocations that go through BufferPool replaced by ordinary GpuMat allocations using the DefaultAllocator.
Since all these changes may be too radical, another option I can think of is to keep the current behavior (feature disabled by default) and warn users who want to enable it by rewriting the documentation of the setBufferPoolUsage function.