CL_INVALID_WORK_GROUP_SIZE on calling OpenCL kernels minmaxloc, reduce and some others #11797

Open
apolotsk opened this issue Jun 20, 2018 · 6 comments

Comments

@apolotsk

apolotsk commented Jun 20, 2018

System information (version)
  • OpenCV => 3.4.0; 3.4.1
    • 3.4.0 was cross-compiled by Yocto.
    • 3.4.1 was natively-compiled.
  • Operating System / Platform => Yocto Linux 2.4 / i.MX 8M QUAD EVK
  • Compiler => g++ 7.3.0
Description

When running opencv_perf_core, some tests output one of the following lines up to a hundred times:

OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('fft_multi_radix_rows', dims=2, globalsize=160x720x1, localsize=160x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('fft_multi_radix_rows', dims=2, globalsize=240x1080x1, localsize=240x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('fft_multi_radix_rows', dims=2, globalsize=256x2048x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('gemm', dims=2, globalsize=160x640x1, localsize=32x32x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('gemm', dims=2, globalsize=320x1280x1, localsize=32x32x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('gemm', dims=2, globalsize=320x640x1, localsize=16x16x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('gemm', dims=2, globalsize=640x1280x1, localsize=16x16x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('ifft_multi_radix_cols', dims=2, globalsize=1025x256x1, localsize=1x256x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('ifft_multi_radix_cols', dims=2, globalsize=961x135x1, localsize=1x135x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('ifft_multi_radix_rows', dims=2, globalsize=160x720x1, localsize=160x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('ifft_multi_radix_rows', dims=2, globalsize=240x1080x1, localsize=240x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('ifft_multi_radix_rows', dims=2, globalsize=256x2048x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('meanStdDev', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=true
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('reduce', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('reduce', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=true
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('reduce_horz_opt', dims=2, globalsize=32x1088x1, localsize=32x32x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('reduce_horz_opt', dims=2, globalsize=32x2176x1, localsize=32x32x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('reduce_horz_opt', dims=2, globalsize=32x480x1, localsize=32x32x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('reduce_horz_opt', dims=2, globalsize=32x736x1, localsize=32x32x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('stage1_with_sobel', dims=2, globalsize=1920x1088x1, localsize=32x32x1) sync=false

The test result, however, is successful.
Are these errors expected?

Detailed description

A complete output:

# opencv_perf_core --gtest_filter=OCL_MinMaxLocFixture_MinMaxLoc.MinMaxLoc/0
Time compensation is 0
CTEST_FULL_OUTPUT
OpenCV version: 3.4.1
OpenCV VCS version: unknown
Build type: release
Parallel framework: pthreads
CPU features: neon fp16
[ INFO:0] Initialize OpenCL runtime...
OpenCL Platforms: 
    Vivante OpenCL Platform
        iGPU: Vivante OpenCL Device GC7000L.6214.0000 (OpenCL 1.2 )
Current OpenCL device: 
    Type = iGPU
    Name = Vivante OpenCL Device GC7000L.6214.0000
    Version = OpenCL 1.2 
    Driver version = OpenCL 1.2 V6.2.4.p1.150331
    Address bits = 32
    Compute units = 4
    Max work group size = 1024
    Local memory size = 32 KB
    Max memory allocation size = 128 MB
    Double support = No
    Host unified memory = Yes
    Device extensions:
        cl_khr_byte_addressable_store
        cl_khr_global_int32_base_atomics
        cl_khr_global_int32_extended_atomics
        cl_khr_local_int32_base_atomics
        cl_khr_local_int32_extended_atomics
        cl_khr_gl_sharing
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 4
    Preferred vector width short = 4
    Preferred vector width int = 4
    Preferred vector width long = 4
    Preferred vector width float = 4
    Preferred vector width double = 0
Note: Google Test filter = OCL_MinMaxLocFixture_MinMaxLoc.MinMaxLoc/0
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from OCL_MinMaxLocFixture_MinMaxLoc
[ RUN      ] OCL_MinMaxLocFixture_MinMaxLoc.MinMaxLoc/0, where GetParam() = (640x480, 8UC1)
[ INFO:0] Successfully initialized OpenCL cache directory: /home/root/.cache/opencv/3.4.1/opencl_cache/
[ INFO:0] Preparing OpenCL cache configuration for context: 32-bit--Vivante_Corporation--Vivante_OpenCL_Device_GC7000L_6214_0000--OpenCL_1_2_V6_2_4_p1_150331
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=true
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=true
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=true
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=true
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=true
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=true
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=true
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=true
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=true
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('minmaxloc', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=true
[ PERFSTAT ]    (samples=10   mean=132.07   median=131.35   min=130.96   stddev=2.12 (1.6%))
[       OK ] OCL_MinMaxLocFixture_MinMaxLoc.MinMaxLoc/0 (1327 ms)
[----------] 1 test from OCL_MinMaxLocFixture_MinMaxLoc (1327 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1327 ms total)
[  PASSED  ] 1 test.

Device info:

root@imx8mqevk:/opt/imx-gpu-sdk/OpenCL/Info# ./Info 
Dumping platform info for 1 platforms.
  *** Platform #0 ***
  Platform version: 1.2
  CL_PLATFORM_PROFILE: FULL_PROFILE
  CL_PLATFORM_VERSION: OpenCL 1.2 V6.2.4.p1.150331
  CL_PLATFORM_NAME: Vivante OpenCL Platform
  CL_PLATFORM_VENDOR: Vivante Corporation
  CL_PLATFORM_EXTENSIONS: cl_khr_icd


Dumping detailed device info for 1 platforms.
  *** Platform #0 ***
  Platform version: 1.2
  CL_PLATFORM_PROFILE: FULL_PROFILE
  CL_PLATFORM_VERSION: OpenCL 1.2 V6.2.4.p1.150331
  CL_PLATFORM_NAME: Vivante OpenCL Platform
  CL_PLATFORM_VENDOR: Vivante Corporation
  CL_PLATFORM_EXTENSIONS: cl_khr_icd
  Enumerating devices of type: CL_DEVICE_TYPE_CPU
    - Not supported
  Enumerating devices of type: CL_DEVICE_TYPE_GPU
    --- Device #0 ---
    Device version: 1.2
    CL_DEVICE_ADDRESS_BITS: 32
    CL_DEVICE_AVAILABLE: 1
    CL_DEVICE_BUILT_IN_KERNELS: 
    CL_DEVICE_COMPILER_AVAILABLE: 1
    CL_DEVICE_DOUBLE_FP_CONFIG: 0
    CL_DEVICE_ENDIAN_LITTLE: 1
    CL_DEVICE_ERROR_CORRECTION_SUPPORT: 1
    CL_DEVICE_EXECUTION_CAPABILITIES: 1
    CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing 
    CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 8192
    CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: 2
    CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 8589934656
    CL_DEVICE_GLOBAL_MEM_SIZE: 268435456
    CL_DEVICE_HALF_FP_CONFIG: 0
    CL_DEVICE_HOST_UNIFIED_MEMORY: 1
    CL_DEVICE_IMAGE_SUPPORT: 1
    CL_DEVICE_IMAGE2D_MAX_HEIGHT: 8192
    CL_DEVICE_IMAGE2D_MAX_WIDTH: 8192
    CL_DEVICE_IMAGE3D_MAX_DEPTH: 8192
    CL_DEVICE_IMAGE3D_MAX_HEIGHT: 8192
    CL_DEVICE_IMAGE3D_MAX_WIDTH: 8192
    CL_DEVICE_IMAGE_MAX_BUFFER_SIZE: 65536
    CL_DEVICE_IMAGE_MAX_ARRAY_SIZE: 8192
    CL_DEVICE_LINKER_AVAILABLE: 1
    CL_DEVICE_LOCAL_MEM_SIZE: 32768
    CL_DEVICE_LOCAL_MEM_TYPE: 2
    CL_DEVICE_MAX_CLOCK_FREQUENCY: 500
    CL_DEVICE_MAX_COMPUTE_UNITS: 4
    CL_DEVICE_MAX_CONSTANT_ARGS: 9
    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536
    CL_DEVICE_MAX_MEM_ALLOC_SIZE: 134217728
    CL_DEVICE_MAX_PARAMETER_SIZE: 1024
    CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
    CL_DEVICE_MAX_SAMPLERS: 16
    CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
    CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024, 1024, 1024, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
    CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
    CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
    CL_DEVICE_NAME: Vivante OpenCL Device GC7000L.6214.0000
    CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR: 4
    CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT: 4
    CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: 4
    CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: 4
    CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: 4
    CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: 0
    CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF: 0
    CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2 
    CL_DEVICE_PARENT_DEVICE: 0
    CL_DEVICE_PARTITION_MAX_SUB_DEVICES: 0
    CL_DEVICE_PARTITION_PROPERTIES: 0, 0, 0, 0
    CL_DEVICE_PARTITION_AFFINITY_DOMAIN: 0
    CL_DEVICE_PARTITION_TYPE: 0, 0, 0, 0
    CL_DEVICE_PLATFORM: 0xffff7a08eea0
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 4
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 4
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 4
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 4
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 4
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 0
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF: 0
    CL_DEVICE_PRINTF_BUFFER_SIZE: 1048576
    CL_DEVICE_PREFERRED_INTEROP_USER_SYNC: 1
    CL_DEVICE_PROFILE: FULL_PROFILE
    CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1000
    CL_DEVICE_QUEUE_PROPERTIES: 3
    CL_DEVICE_REFERENCE_COUNT: 1
    CL_DEVICE_SINGLE_FP_CONFIG: 14
    CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
    CL_DEVICE_VENDOR: Vivante Corporation
    CL_DEVICE_VENDOR_ID: 5654870
    CL_DEVICE_VERSION: OpenCL 1.2 
    CL_DRIVER_VERSION: OpenCL 1.2 V6.2.4.p1.150331
  Enumerating devices of type: CL_DEVICE_TYPE_ACCELERATOR
    - Not supported
  Enumerating devices of type: CL_DEVICE_TYPE_CUSTOM
    - Not supported
  Enumerating devices of type: CL_DEVICE_TYPE_ALL
    --- Device #0 ---
    Device version: 1.2
    CL_DEVICE_ADDRESS_BITS: 32
    CL_DEVICE_AVAILABLE: 1
    CL_DEVICE_BUILT_IN_KERNELS: 
    CL_DEVICE_COMPILER_AVAILABLE: 1
    CL_DEVICE_DOUBLE_FP_CONFIG: 0
    CL_DEVICE_ENDIAN_LITTLE: 1
    CL_DEVICE_ERROR_CORRECTION_SUPPORT: 1
    CL_DEVICE_EXECUTION_CAPABILITIES: 1
    CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing 
    CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 8192
    CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: 2
    CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 8589934656
    CL_DEVICE_GLOBAL_MEM_SIZE: 268435456
    CL_DEVICE_HALF_FP_CONFIG: 0
    CL_DEVICE_HOST_UNIFIED_MEMORY: 1
    CL_DEVICE_IMAGE_SUPPORT: 1
    CL_DEVICE_IMAGE2D_MAX_HEIGHT: 8192
    CL_DEVICE_IMAGE2D_MAX_WIDTH: 8192
    CL_DEVICE_IMAGE3D_MAX_DEPTH: 8192
    CL_DEVICE_IMAGE3D_MAX_HEIGHT: 8192
    CL_DEVICE_IMAGE3D_MAX_WIDTH: 8192
    CL_DEVICE_IMAGE_MAX_BUFFER_SIZE: 65536
    CL_DEVICE_IMAGE_MAX_ARRAY_SIZE: 8192
    CL_DEVICE_LINKER_AVAILABLE: 1
    CL_DEVICE_LOCAL_MEM_SIZE: 32768
    CL_DEVICE_LOCAL_MEM_TYPE: 2
    CL_DEVICE_MAX_CLOCK_FREQUENCY: 500
    CL_DEVICE_MAX_COMPUTE_UNITS: 4
    CL_DEVICE_MAX_CONSTANT_ARGS: 9
    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536
    CL_DEVICE_MAX_MEM_ALLOC_SIZE: 134217728
    CL_DEVICE_MAX_PARAMETER_SIZE: 1024
    CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
    CL_DEVICE_MAX_SAMPLERS: 16
    CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
    CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
    CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024, 1024, 1024, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
    CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
    CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
    CL_DEVICE_NAME: Vivante OpenCL Device GC7000L.6214.0000
    CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR: 4
    CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT: 4
    CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: 4
    CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: 4
    CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: 4
    CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: 0
    CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF: 0
    CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2 
    CL_DEVICE_PARENT_DEVICE: 0
    CL_DEVICE_PARTITION_MAX_SUB_DEVICES: 0
    CL_DEVICE_PARTITION_PROPERTIES: 0, 0, 0, 0
    CL_DEVICE_PARTITION_AFFINITY_DOMAIN: 0
    CL_DEVICE_PARTITION_TYPE: 0, 0, 0, 0
    CL_DEVICE_PLATFORM: 0xffff7a08eea0
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 4
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 4
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 4
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 4
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 4
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 0
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF: 0
    CL_DEVICE_PRINTF_BUFFER_SIZE: 1048576
    CL_DEVICE_PREFERRED_INTEROP_USER_SYNC: 1
    CL_DEVICE_PROFILE: FULL_PROFILE
    CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1000
    CL_DEVICE_QUEUE_PROPERTIES: 3
    CL_DEVICE_REFERENCE_COUNT: 1
    CL_DEVICE_SINGLE_FP_CONFIG: 14
    CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
    CL_DEVICE_VENDOR: Vivante Corporation
    CL_DEVICE_VENDOR_ID: 5654870
    CL_DEVICE_VERSION: OpenCL 1.2 
    CL_DRIVER_VERSION: OpenCL 1.2 V6.2.4.p1.150331

Cmake output:

# cmake -D CMAKE_BUILD_TYPE=Release -D BUILD_TESTS=ON -D INSTALL_TESTS=ON ..
-- Looking for ccache - found (/usr/bin/ccache)
-- Found ZLIB: /usr/lib/libz.so (found suitable version "1.2.11", minimum required is "1.2.3") 
-- Could NOT find Jasper (missing:  JASPER_LIBRARIES JASPER_INCLUDE_DIR) 
-- Found ZLIB: /usr/lib/libz.so (found version "1.2.11") 
-- Checking for module 'gtk+-3.0'
--   No package 'gtk+-3.0' found
-- Checking for module 'gtk+-2.0'
--   No package 'gtk+-2.0' found
-- Checking for module 'gthread-2.0'
--   Found gthread-2.0, version 2.52.3
-- Checking for module 'gstreamer-base-1.0'
--   Found gstreamer-base-1.0, version 1.12.2
-- Checking for module 'gstreamer-video-1.0'
--   Found gstreamer-video-1.0, version 1.12.2
-- Checking for module 'gstreamer-app-1.0'
--   Found gstreamer-app-1.0, version 1.12.2
-- Checking for module 'gstreamer-riff-1.0'
--   Found gstreamer-riff-1.0, version 1.12.2
-- Checking for module 'gstreamer-pbutils-1.0'
--   Found gstreamer-pbutils-1.0, version 1.12.2
-- Checking for module 'libdc1394-2'
--   No package 'libdc1394-2' found
-- Checking for module 'libdc1394'
--   No package 'libdc1394' found
-- Looking for linux/videodev.h
-- Looking for linux/videodev.h - not found
-- Looking for linux/videodev2.h
-- Looking for linux/videodev2.h - found
-- Looking for sys/videoio.h
-- Looking for sys/videoio.h - not found
-- Checking for modules 'libavcodec;libavformat;libavutil;libswscale'
--   No package 'libavcodec' found
--   No package 'libavformat' found
--   No package 'libavutil' found
--   No package 'libswscale' found
-- Checking for module 'libavresample'
--   No package 'libavresample' found
-- Checking for module 'libgphoto2'
--   Found libgphoto2, version 2.5.8
-- Could not find OpenBLAS include. Turning OpenBLAS_FOUND off
-- Could not find OpenBLAS lib. Turning OpenBLAS_FOUND off
-- Could NOT find Atlas (missing:  Atlas_CBLAS_INCLUDE_DIR Atlas_CLAPACK_INCLUDE_DIR Atlas_CBLAS_LIBRARY Atlas_BLAS_LIBRARY Atlas_LAPACK_LIBRARY) 
-- A library with BLAS API not found. Please specify library location.
-- LAPACK requires BLAS
-- A library with LAPACK API not found. Please specify library location.
-- Could NOT find JNI (missing:  JAVA_AWT_LIBRARY JAVA_JVM_LIBRARY JAVA_INCLUDE_PATH JAVA_INCLUDE_PATH2 JAVA_AWT_INCLUDE_PATH) 
-- Could NOT find Matlab (missing:  MATLAB_MEX_SCRIPT MATLAB_INCLUDE_DIRS MATLAB_ROOT_DIR MATLAB_LIBRARIES MATLAB_LIBRARY_DIRS MATLAB_MEXEXT MATLAB_ARCH MATLAB_BIN) 
-- VTK is not found. Please set -DVTK_DIR in CMake to VTK build directory, or to VTK install subdirectory with VTKConfig.cmake file
-- Excluding from source files list: modules/core/src/convert.avx2.cpp
-- Excluding from source files list: modules/core/src/convert.sse4_1.cpp
-- Excluding from source files list: modules/imgproc/src/corner.avx.cpp
-- Excluding from source files list: modules/imgproc/src/filter.avx2.cpp
-- Excluding from source files list: modules/imgproc/src/imgwarp.avx2.cpp
-- Excluding from source files list: modules/imgproc/src/imgwarp.sse4_1.cpp
-- Excluding from source files list: modules/imgproc/src/resize.avx2.cpp
-- Excluding from source files list: modules/imgproc/src/resize.sse4_1.cpp
-- Excluding from source files list: modules/imgproc/src/undistort.avx2.cpp
-- Excluding from source files list: modules/objdetect/src/haar.avx.cpp
-- Excluding from source files list: <BUILD>/modules/dnn/layers/layers_common.avx.cpp
-- Excluding from source files list: <BUILD>/modules/dnn/layers/layers_common.avx2.cpp
-- Excluding from source files list: <BUILD>/modules/dnn/layers/layers_common.avx512_skx.cpp
-- Excluding from source files list: modules/features2d/src/fast.avx2.cpp
-- 
-- General configuration for OpenCV 3.4.1 =====================================
--   Version control:               unknown
-- 
--   Platform:
--     Timestamp:                   2018-06-20T07:07:40Z
--     Host:                        Linux 4.9.88-imx_4.9.88_2.0.0_ga+g5e23f9d aarch64
--     CMake:                       3.8.2
--     CMake generator:             Unix Makefiles
--     CMake build tool:            /usr/bin/make
--     Configuration:               Release
-- 
--   CPU/HW features:
--     Baseline:                    NEON FP16
--       required:                  NEON
--       disabled:                  VFPV3
-- 
--   C/C++:
--     Built as dynamic libs?:      YES
--     C++11:                       YES
--     C++ Compiler:                /usr/bin/c++  (ver 7.3.0)
--     C++ flags (Release):         -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -WundG
--     C++ flags (Debug):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -WundG
--     C Compiler:                  /usr/bin/cc
--     C flags (Release):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -WmisG
--     C flags (Debug):             -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -WmisG
--     Linker flags (Release):      
--     Linker flags (Debug):        
--     ccache:                      YES
--     Precompiled headers:         NO
--     Extra dependencies:          dl m pthread rt
--     3rdparty dependencies:
-- 
--   OpenCV modules:
--     To be built:                 calib3d core dnn features2d flann highgui imgcodecs imgproc java_bindings_generator ml objdetect photo python_bindings_generator shape stitching superres ts video videob
--     Disabled:                    js world
--     Disabled by dependency:      -
--     Unavailable:                 cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev java python2 python3 viz
--     Applications:                tests perf_tests apps
--     Documentation:               NO
--     Non-free algorithms:         NO
-- 
--   GUI: 
--     GTK+:                        NO
--     VTK support:                 NO
-- 
--   Media I/O: 
--     ZLib:                        /usr/lib/libz.so (ver 1.2.11)
--     JPEG:                        /usr/lib/libjpeg.so (ver )
--     WEBP:                        /usr/lib/libwebp.so (ver encoder: 0x020e)
--     PNG:                         /usr/lib/libpng.so (ver 1.6.31)
--     TIFF:                        /usr/lib/libtiff.so (ver 42 / 4.0.8)
--     JPEG 2000:                   build (ver 1.900.1)
--     OpenEXR:                     build (ver 1.7.1)
-- 
--   Video I/O:
--     DC1394:                      NO
--     FFMPEG:                      NO
--       avcodec:                   NO
--       avformat:                  NO
--       avutil:                    NO
--       swscale:                   NO
--       avresample:                NO
--     GStreamer:                   
--       base:                      YES (ver 1.12.2)
--       video:                     YES (ver 1.12.2)
--       app:                       YES (ver 1.12.2)
--       riff:                      YES (ver 1.12.2)
--       pbutils:                   YES (ver 1.12.2)
--     libv4l/libv4l2:              NO
--     v4l/v4l2:                    linux/videodev2.h
--     gPhoto2:                     YES
-- 
--   Parallel framework:            pthreads
-- 
--   Trace:                         YES (built-in)
-- 
--   Other third-party libraries:
--     Lapack:                      NO
--     Eigen:                       NO
--     Custom HAL:                  YES (carotene (ver 0.0.1))
--     Protobuf:                    build (3.5.1)
-- 
--   NVIDIA CUDA:                   NO
-- 
--   OpenCL:                        YES (no extra features)
--     Include path:                /home/root/opencv-3.4.1/3rdparty/include/opencl/1.2
--     Link libraries:              Dynamic load
-- 
--   Python (for build):            /usr/bin/python2.7
-- 
--   Java:                          
--     ant:                         NO
--     JNI:                         NO
--     Java wrappers:               NO
--     Java tests:                  NO
-- 
--   Matlab:                        NO
-- 
--   Install to:                    /usr
-- -----------------------------------------------------------------
-- 
-- Configuring done
-- Generating done
-- Build files have been written to: /home/root/opencv-3.4.1/build

apolotsk changed the title from "CL_INVALID_WORK_GROUP_SIZE on calling OpenCL kernels minmaxloc and reduce" to "CL_INVALID_WORK_GROUP_SIZE on calling OpenCL kernels minmaxloc, reduce and some others" on Jun 20, 2018

@alalek
Member

alalek commented Jun 21, 2018

localsize=1024x1x1

because of this device setting:

CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024

Tests don't fail because OpenCV falls back to CPU code if the OpenCL runtime reports an error (or if OpenCL is not available).

Perhaps OpenCV should query kernel-specific limits (but there is a circular dependency, because these kernels are built for a specific group size) or limit the maximum group size globally as a workaround.

Another question is the efficiency / performance of such large groups (it is platform-specific, so there is no straightforward answer); we don't run OpenCL kernels on devices with groups of more than 256 items.

Could you try to override maxWorkGroupSize_ values (=256) in ocl.cpp and try to rerun tests in your configuration?
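
To illustrate the difference between a device-wide and a kernel-specific limit, here is a minimal sketch in plain C++ (no OpenCL calls). `Dims` and `launchFits` are hypothetical names, not OpenCV or OpenCL APIs, and the check covers only two conditions: each global size must be a multiple of the matching local size, and the number of work-items per group must not exceed the given limit. On a real device the kernel-specific limit would come from `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`; any concrete kernel limit used with this sketch is an assumption, since the report does not state one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper type: global or local sizes for up to three dimensions.
using Dims = std::array<std::size_t, 3>;

// Hypothetical helper, not part of OpenCV or OpenCL.
// Returns true if the launch satisfies both conditions checked here:
//  1. every global size is a multiple of the corresponding local size;
//  2. the work-items per group (product of local sizes) fit within maxGroupItems.
bool launchFits(const Dims& global, const Dims& local, std::size_t maxGroupItems)
{
    assert(maxGroupItems > 0);  // a limit of zero would make every launch invalid
    std::size_t items = 1;
    for (std::size_t i = 0; i < global.size(); ++i) {
        if (local[i] == 0 || global[i] % local[i] != 0)
            return false;  // global size is not evenly divisible by local size
        items *= local[i];
    }
    return items <= maxGroupItems;
}
```

For the 'gemm' launch from the log (globalsize=320x640x1, localsize=16x16x1), `launchFits` returns true against the reported device limit of 1024, but false against an assumed kernel limit of 128. A kernel limit below the device limit would therefore be one possible explanation for launches that look valid against `CL_DEVICE_MAX_WORK_GROUP_SIZE` yet are still rejected.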

@apolotsk
Author

Could you try to override maxWorkGroupSize_ values (=256) in ocl.cpp and try to rerun tests in your configuration?

With that override, running opencv_perf_core still produces many errors like these:

OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('fft_multi_radix_rows', dims=2, globalsize=160x720x1, localsize=160x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('fft_multi_radix_rows', dims=2, globalsize=240x1080x1, localsize=240x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('fft_multi_radix_rows', dims=2, globalsize=256x2048x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('gemm', dims=2, globalsize=160x640x1, localsize=16x16x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('gemm', dims=2, globalsize=320x1280x1, localsize=16x16x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('ifft_multi_radix_cols', dims=2, globalsize=1025x256x1, localsize=1x256x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('ifft_multi_radix_cols', dims=2, globalsize=961x135x1, localsize=1x135x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('ifft_multi_radix_rows', dims=2, globalsize=160x720x1, localsize=160x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('ifft_multi_radix_rows', dims=2, globalsize=240x1080x1, localsize=240x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('ifft_multi_radix_rows', dims=2, globalsize=256x2048x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('meanStdDev', dims=1, globalsize=1024x1x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('reduce', dims=1, globalsize=1024x1x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('reduce_horz_opt', dims=2, globalsize=32x1080x1, localsize=32x8x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('reduce_horz_opt', dims=2, globalsize=32x2160x1, localsize=32x8x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('reduce_horz_opt', dims=2, globalsize=32x480x1, localsize=32x8x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('reduce_horz_opt', dims=2, globalsize=32x720x1, localsize=32x8x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('stage1_with_sobel', dims=2, globalsize=1920x1080x1, localsize=32x8x1) sync=false

@5p00kk

5p00kk commented Jan 16, 2020

Hi,

Have you managed to solve that problem?

@apolotsk
Author

apolotsk commented Jan 17, 2020

Hi Szymon. Unfortunately, not. However, issue #13414 seems to be related and is more active.

@alalek
Member

alalek commented Jan 17, 2020

Try overriding the work-group size in OpenCV through the environment variable OPENCV_OPENCL_DEVICE_MAX_WORK_GROUP_SIZE.
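
As a minimal sketch, assuming a POSIX system, the variable can also be set from inside a program rather than in the shell. `setWorkGroupSizeOverride` is a hypothetical helper, not an OpenCV API. It is an assumption here that the variable must be set before OpenCV first initializes its OpenCL runtime in order to take effect.

```cpp
#include <cstddef>
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // setenv (POSIX)
#include <string>

// Hypothetical helper, not an OpenCV API: exports the override variable
// for the current process. Returns true if the variable was set.
bool setWorkGroupSizeOverride(std::size_t maxItems)
{
    const std::string value = std::to_string(maxItems);
    // The last argument (1) tells setenv to overwrite an existing value.
    return setenv("OPENCV_OPENCL_DEVICE_MAX_WORK_GROUP_SIZE", value.c_str(), 1) == 0;
}
```

Calling `setWorkGroupSizeOverride(64)` at the very start of `main` would export the value 64. That number is only an example, and a suitable value for this device is not established in this thread.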

@iscemy

iscemy commented Sep 30, 2022

We are having the same issue with i.MX8MP SoCs. Is there any update on this issue?

Thanks in advance.
