Arm Mali/OpenCL - OpenCL error CL_OUT_OF_RESOURCES #13108

ayaromenok · 2018-11-11T10:29:59Z

_update: this bug is specific to Mali Midgard(T8xx\T7xx\T6xx) GPU only. More resent Mali Bifrost(Gxx) series not affected

System information (version)

OpenCV => 4.0beta
Operating System / Platform => Ubuntu 18.04/arm64
Compiler => Ubuntu/Linaro 7.3.0-27
OpenCL => OpenCL 1.2 v1.r14p0-01rel0-git(966ed26)
GPU => ARM Mali T860(Midgard 4gen)

Detailed description

It is specific bug to arm\mali hardware in OpenCL, which related to limited resources on this mobile device.
OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('stage1_with_sobel', dims=2, globalsize=4096x4096x1, localsize=32x8x1) sync=false

reducing maxWorkGoupSize_ value of cl::Device from default 256 to 128 (to give more resources to kernel) remove that particular bug in cv::GausianBlur with size 3x3(and may be many more), but bigger parameters, like size '5x5' or other kernel, like described in #11503 issue, still lead to CL_OUT_OF_RESOURCES

I am continue to evaluate another CL-acceletared methods on Arm Mali-t860 to figure out another reasons of CL_OUT_OF_RESOURCES.

Steps to reproduce

#include "opencv2/opencv.hpp"
using namespace cv;
 
int main(int argc, char** argv)
{
    UMat img, gray;
    imread("lena.jpg", IMREAD_COLOR).copyTo(img); //use lena.jpg from CV examples
    cvtColor(img, gray, COLOR_BGR2GRAY);
    GaussianBlur(gray, gray,Size(3, 3), 1.5);
    Canny(gray, gray, 0, 50);   
    return 0;
}

The text was updated successfully, but these errors were encountered:

tomoaki0705 · 2018-11-11T13:59:46Z

You're too quickly jumping to the conclusion.
Did you check the result of GaussianBlur and Canny ? Was it really broken ?

You can't expect that the device has an infinite memory.
If the device is not capable of computation, such as insufficient memory, it should notice the user, and that's what exactly you're seeing.
The message OpenCL error CL_OUT_OF_RESOURCES looks like an error, but in general case, OpenCV has a fallback CPU version implementation.
So at the end, you get a correct result, and also know that you gave a task which was too heavy for Mali.

Is there any other reason you think this case is a bug ?

ayaromenok · 2018-11-11T15:16:12Z

Well, I believe, that a bug in specific combination of hardware/vendor driver and OpenCV/OpenCL (maybe even worse - graphic system, which is X11 and arm64 architecture/or kernel 4.4).

While result may be correct, fallback from hardware implementation(CL/GPU) to software implementation(CPU or CL over CPU) in case, if GPU hardware able to work with such kernels/algorithms - it's bug.

Proposed solution to decrease a maximum size of maxWorkGroupSize_ for arm/mali works for test case from above (soon I will prepare a patch), but for other test case with CL_OUT_OF_RESOURCES is not.

Creating this ticket, I am trying to evaluate (by receive some feedback from people, who get similar errors on arm/mali/opencl), how many issues can be solve with such solution (my expectation is about 5%, may be even less. looks like for more flexible Nvidia GPU it works much better).

I am continue to evaluate, how many CL kernels have issues on that particular hardware, so will update this ticket, probably for a long time :)

PS: can't assign ticket to myself (I don't won't to disturb core dev's with that issue)

tomoaki0705 · 2018-11-11T15:41:56Z

FYI, if you run entire opencv test suite, you'll get more than thousand lines of CL_OUT_OF_RESOURCES
Here's a grep from my local test result.

$ grep CL_OUT_OF_RESOURCES arm-firefly-rk3399-opencv_test_* | wc -l
2312

Firefly RK3399 has Mali T860.

Just for curiosity, how much memory does your Mali has ?

ayaromenok · 2018-11-11T17:53:55Z

number of CL_OUT_OF_RESOURCES errors depends from algorithm - it can happens in every line for 2D algorithm or even at more complicated template. so, it's not representative - only number of tests, which is passed/or not. especially opencv_perf_*.

While my issues happens on RK3399/T860/4 too, with driver/CV version from above, without yours detailed information about driver/window system/OS + OpenCV version, information from your grep is useless for this ticket:(

tomoaki0705 · 2018-11-12T02:37:44Z

I'm not following the situation.
Does the test fails in your situation ?

It passed on my firefly
Here's my log of opencv_test_imgproc and opencv_perf_imgproc

BTW, my Firefly has

GCC 5.4.0
Ubunntu 16.04
Driver version is OpenCL 1.2 v1.r13p0-00rel0-git(a4271c9).04dadb82d3612c978f88c00109101694
So, it's different platform. Just Mali T860 is common

Now your explanation is getting uncertain. Are you talking about specific test failure ? or are you just talking about printed log CL_OUT_OF_RESOURCES ?

…on of opencv#13108 - OpenCL error CL_OUT_OF_RESOURCES issue

ayaromenok · 2018-11-26T17:58:08Z

sorry for delay - just return to RK3399 board ;)

start test with opencv_test_core:

with workgroup size 256(default for T860): 598 CL_OUT_OF_RESOURCES errors
with workgroup size 128(default/2 for T860): 13 CL_OUT_OF_RESOURCES errors

Just for curiosity, how much memory does your Mali has ?

from my board: Max memory allocation size = 493 MB 19 KB. in your logs it's a 494 MB 34 KB

ayaromenok · 2018-11-26T18:18:47Z

removing duplicate error messages(multiply call of FFT, for example) from opencv_test_core:
for (default) 256

62:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('gemm', dims=2, globalsize=16x16x1, localsize=16x16x1) sync=false
6926:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('reduce', dims=1, globalsize=1024x1x1, localsize=256x1x1) sync=true
6946:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('meanStdDev', dims=1, globalsize=1024x1x1, localsize=256x1x1) sync=false
7833:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=1024x1x1, localsize=256x1x1) sync=true
8251:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('reduce', dims=1, globalsize=1024x1x1, localsize=256x1x1) sync=true
8624:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=1024x1x1, localsize=256x1x1) sync=true
8892:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('reduce', dims=1, globalsize=1024x1x1, localsize=256x1x1) sync=true
9780:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=1024x1x1, localsize=256x1x1) sync=true 
9786:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('reduce', dims=1, globalsize=1024x1x1, localsize=256x1x1) sync=true
12761:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('fft_multi_radix_rows', dims=2, globalsize=160x768x1, localsize=160x1x1) sync=false
12899:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('ifft_multi_radix_rows', dims=2, globalsize=160x768x1, localsize=160x1x1) sync=false
13167:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('gemm', dims=2, globalsize=16x80x1, localsize=16x16x1) sync=false

for (default/2) 128

7162:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('meanStdDev', dims=1, globalsize=512x1x1, localsize=128x1x1) sync=false
12370:OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('ifft_multi_radix_cols', dims=2, globalsize=641x96x1, localsize=1x96x1) sync=false

so, looks like solution to decrease size of working group is working and useful, but required additional testing

tomoaki0705 · 2018-11-26T20:42:48Z

Are you talking about specific test failure ? or are you just talking about printed log CL_OUT_OF_RESOURCES ?

:(

ayaromenok · 2018-11-27T09:42:11Z

Are you talking about specific test failure ? or are you just talking about printed log CL_OUT_OF_RESOURCES ?

CL_OUT_OF_RESOURCES may inform about different issues inside of OpenCL subsystem, like out_of_memory, out_of_stack_memory, out_of_heap_memory, to much nesting loops, out_of_registers and many more. Only detailed evaluation of every error with OpenCL debugger may give certain reason.

So it's not a specific test failure, since it's appear in different algorithms, like reduce, fft, ifft. And it's not ALL issues(only some of them), which described by CL_OUT_OF_RESOURCES in printed log.

Issue, which can not be fixed by proposed reducing WGSize(nesting loops), required more complicated evaluation/fixes, like rewriting OpenCL kernel and/or change data flow in C++ code.

BTW, my board if NanoPI M4 with Ubuntu 18.04/arm64 and slightly new driver, than yours. I will install a Ubuntu 16.04/arm32 to have more comparable results. Also I am trying to build OpenCV with OpenCL for ARM Bifrost(Mali G71) for android to make some test too.

alalek · 2018-11-27T10:03:00Z

BTW, There is one place in OpenCL with "dynamic" adjusting of OpenCL group size / kernel re-compilation for selected group size: filter2D implementation.
But this code doesn't look straightforward.

ayaromenok · 2018-11-27T10:39:22Z

There is one place in OpenCL with "dynamic" adjusting of OpenCL group size / kernel re-compilation for selected group size: filter2D implementation.

Thank you for this info - it can very helpful in future.

At present moment my idea is to detect a ARM Mali during initialization of CL device, and decrease a WorgGroup Size only for algorithms, where it may help (like gemm and reduce, but not for ifft) - to use "static" setup. Or even use one reduced workgroup size for all (if performance penalty would be small and\or would be even performance boost).

And need a lot of test before decide which solution is better.

ayaromenok · 2018-12-12T11:24:57Z

some updates:

this bug affect only older ARM Midgard GPU(T6xx/T7xx/T8xx), while latest ARM Bifrost(Gxx)(which have a bit different hardware technically) works just fine;
It takes a lot of time for me(due to issue with I/O on my ARM board) to run perf test on OpenCL mode with different WorkGroupSize: 256(default), 192, 128, 64, 32. While I continue to analyze results, looks like 128(default/2) is optimal value.

…Mali-Gxx/Bifrost(all OK) opencv#13108

ayaromenok · 2019-01-02T13:26:26Z

finally, I get stability from my board and can return to this issue.
I make a test with different size of working group(default 256, 192, 128, 64, 32) and get follow results:

core, imgproc, calib3d, stitch modules tested (with both test and perf);
20 unique OpenCL method(with different arguments) affected;
up to 9 of them can fixed by reduced maxWorkGroupSize;
while reduced size not really affect a performance on typical configuration of Arm Midgard GPU(4 cores, like on tested Rk3399), it can have bigger performance influence on very wide(8 cores - like Exynos 7420) or very narrow(1 core - like Exynos 7870)

full table

WG size	256	192	128	64	32
total methods	20	19	13	11	11
`stereoBM`	1	1	1	1	1
`stage1_with_sobel`	4	4	OK	OK	OK
`fft_multi_radix_rows`	4	2	OK	OK	OK
`ifft_multi_radix_rows`	4	2	OK	OK	OK
`ifft_multi_radix_cols`	3	2	1	OK	OK
`gemm`	6	OK	OK	OK	OK
`meanStdDev`	1	1	OK	OK	OK
`reduce`	1	1	OK	OK	OK
`pyrDown`	8	8	8	8	8
`row_filter_C1_D0`	6	6	6	6	6
`morph`	5	5	5	5	5
`row_filter`	6	6	6	6	6
`laplacian`	5	5	OK	OK	OK
`pyrUp`	5	5	5	5	5
`minmaxloc`	1	1	OK	OK	OK
`medianFilter3`	1	1	1	1	1
`medianFilter5`	5	5	5	5	5
`sobel3`	6	6	6	OK	OK
`sobel5`	1	1	1	1	1
`sobel7`	4	4	4	4	4

I am evaluating every method from above on target hardware(with some successful fixes already) an will update with results soon.

liyingchi · 2020-08-26T02:03:12Z

Thank you for your evaluation on this topic.
I would like to ask how to change the maxWorkGroupSize in the system by using python or C++ ?

tomoaki0705 · 2020-08-26T03:36:25Z

@liyingchi #11797 (comment)

Dithermaster · 2023-11-13T20:45:29Z

Are these CL_RGB images (and not CL_RGBA)? If so, check clGetSupportedImageFormats for your device, because CL_RGB is not often supported. Strangely, driver will let you create the unsupported image, compile the kernel, set the argument, but when you call enqueueNDRangeKernel it will return a status of -5 "CL_OUT_OF_RESOURCES". I have seen this on Windows NVIDIA and Intel GPU drivers; I did not test on AMD, or Mac.

Sorry for the very late answer, but I just ran into this today and this thread came up in my searching. Hat tip to Pragmataraxia at this link who solved it: https://forums.developer.nvidia.com/t/what-else-causes-cl-out-of-resources/11426/2

ayaromenok added a commit to ayaromenok/opencv that referenced this issue Nov 26, 2018

CL/maxWorkingGroupSize for ARM Mali decreased by 2 times for evaluati…

8b544a8

…on of opencv#13108 - OpenCL error CL_OUT_OF_RESOURCES issue

ayaromenok added a commit to ayaromenok/testData that referenced this issue Nov 27, 2018

added test data for opencv/opencv#13108

861f38d

ayaromenok mentioned this issue Nov 27, 2018

OpenCV/OpenCL/Mali data ayaromenok/testData#3

Open

ayaromenok added a commit to ayaromenok/opencv that referenced this issue Nov 27, 2018

correct ARM Mali detection for opencv#13108

58c4b72

alalek mentioned this issue Dec 11, 2018

CL_INVALID_WORK_GROUP_SIZE issue #13414

Open

ayaromenok added a commit to ayaromenok/opencv that referenced this issue Dec 17, 2018

added detection of ARM Mali-Txx/Mildgard(CL workaround required) and …

131b140

…Mali-Gxx/Bifrost(all OK) opencv#13108

alalek mentioned this issue Dec 17, 2018

core(ocl): add parameter to limit device max workgroup size #13464

Merged

ayaromenok added a commit to ayaromenok/opencv that referenced this issue Jan 2, 2019

sobel3/sobel5/sobel7 on Arm Midgard fix from opencv#13108

1a55915

ayaromenok added a commit to ayaromenok/opencv that referenced this issue Jan 2, 2019

row_filter_C1_D0 on Arm Midgard fix from opencv#13108

ed31fff

alalek mentioned this issue Aug 17, 2021

ocl::OpenCLExecutionContext::create(Context&, Device&) fails when device is created independently #20537

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Arm Mali/OpenCL - OpenCL error CL_OUT_OF_RESOURCES #13108

Arm Mali/OpenCL - OpenCL error CL_OUT_OF_RESOURCES #13108

ayaromenok commented Nov 11, 2018 •

edited

tomoaki0705 commented Nov 11, 2018

ayaromenok commented Nov 11, 2018 •

edited

tomoaki0705 commented Nov 11, 2018

ayaromenok commented Nov 11, 2018 •

edited

tomoaki0705 commented Nov 12, 2018

ayaromenok commented Nov 26, 2018 •

edited

ayaromenok commented Nov 26, 2018 •

edited

tomoaki0705 commented Nov 26, 2018

ayaromenok commented Nov 27, 2018 •

edited

alalek commented Nov 27, 2018

ayaromenok commented Nov 27, 2018

ayaromenok commented Dec 12, 2018

ayaromenok commented Jan 2, 2019 •

edited

liyingchi commented Aug 26, 2020

tomoaki0705 commented Aug 26, 2020

Dithermaster commented Nov 13, 2023 •

edited

Arm Mali/OpenCL - OpenCL error CL_OUT_OF_RESOURCES #13108

Arm Mali/OpenCL - OpenCL error CL_OUT_OF_RESOURCES #13108

Comments

ayaromenok commented Nov 11, 2018 • edited

System information (version)

Detailed description

Steps to reproduce

tomoaki0705 commented Nov 11, 2018

ayaromenok commented Nov 11, 2018 • edited

tomoaki0705 commented Nov 11, 2018

ayaromenok commented Nov 11, 2018 • edited

tomoaki0705 commented Nov 12, 2018

ayaromenok commented Nov 26, 2018 • edited

ayaromenok commented Nov 26, 2018 • edited

tomoaki0705 commented Nov 26, 2018

ayaromenok commented Nov 27, 2018 • edited

alalek commented Nov 27, 2018

ayaromenok commented Nov 27, 2018

ayaromenok commented Dec 12, 2018

ayaromenok commented Jan 2, 2019 • edited

liyingchi commented Aug 26, 2020

tomoaki0705 commented Aug 26, 2020

Dithermaster commented Nov 13, 2023 • edited

ayaromenok commented Nov 11, 2018 •

edited

ayaromenok commented Nov 11, 2018 •

edited

ayaromenok commented Nov 11, 2018 •

edited

ayaromenok commented Nov 26, 2018 •

edited

ayaromenok commented Nov 26, 2018 •

edited

ayaromenok commented Nov 27, 2018 •

edited

ayaromenok commented Jan 2, 2019 •

edited

Dithermaster commented Nov 13, 2023 •

edited