SEGFAULT in cv::hal_AVX op_add #1909

Open
AronRubin opened this issue Nov 16, 2018 · 1 comment

@AronRubin

#0 cv::hal_AVX2::v256_load_aligned (ptr=0x540afa40)
at ../modules/core/include/opencv2/core/hal/intrin_avx.hpp:376
#1 0x0000000001fddb30 in cv::hal_AVX2::simd256::vx_load_aligned (
ptr=0x540afa40) at ../modules/core/include/opencv2/core/hal/intrin.hpp:342
#2 0x0000000001f3a6a4 in cv::hal::opt_AVX2::bin_loader<cv::hal::opt_AVX2::op_add, double, cv::hal_AVX2::v_float64x4>::la (src1=0x540afa40, src2=0x5720cea0,
dst=0x542dd340)
at C:/Users/arubin/git/opencv/modules/core/src/arithm.simd.hpp:344
#3 0x0000000001f5bb91 in cv::hal::opt_AVX2::bin_loop<cv::hal::opt_AVX2::op_add, double, cv::hal_AVX2::v_float64x4> (src1=0x540afa40, step1=0,
src2=0x5720cea0, step2=0, dst=0x542dd340, step=0, width=128, height=0)
at C:/Users/arubin/git/opencv/modules/core/src/arithm.simd.hpp:417
#4 0x0000000001f4b77f in cv::hal::opt_AVX2::add64f (src1=0x540afa40,
step1=1, src2=0x5720cea0, step2=1, dst=0x542dd340, step=1, width=128,
height=1)
at C:/Users/arubin/git/opencv/modules/core/src/arithm.simd.hpp:535
#5 0x0000000001f2fb37 in cv::hal::add64f (src1=0x540afa40, step1=1,
src2=0x5720cea0, step2=1, dst=0x542dd340, step=1, width=128, height=1)
at ../modules/core/src/arithm.simd.hpp:535
#6 0x000000000207538c in cv::arithm_op (_src1=..., _src2=..., _dst=...,
_mask=..., dtype=14, tab=0x21c1ea0 cv::getAddTab()::addTab,
muldiv=false, usrdata=0x0, oclop=0) at ../modules/core/src/arithm.cpp:857
#7 0x0000000001ecdd5c in cv::add (src1=..., src2=..., dst=..., mask=...,
dtype=-1) at ../modules/core/src/arithm.cpp:930
#8 0x00000000020e399c in cv::MatOp_AddEx::assign (
this=0x21b9450 cv::g_MatOp_AddEx, e=..., m=..., _type=-1)
at ../modules/core/src/matrix_expressions.cpp:1251
#9 0x00000000058b2c15 in cv::MatExpr::operator cv::Mat (this=0x175eb30)
at C:/Users/arubin/git/opencv/modules/core/include/opencv2/core/mat.inl.hpp:3416
#10 0x0000000005889b00 in cv::face::FacemarkLBFImpl::fitImpl (
this=0x447c4ff0, image=..., landmarks=...)
at C:/Users/arubin/git/opencv_contrib/modules/face/src/facemarkLBF.cpp:432
#11 0x0000000005889485 in cv::face::FacemarkLBFImpl::fit (this=0x447c4ff0,
image=..., roi=..., _landmarks=...)
at C:/Users/arubin/git/opencv_contrib/modules/face/src/facemarkLBF.cpp:386

Where src1 is a Mat (originally a UMat)
$3 = {flags = 1124024334, dims = 2, rows = 68, cols = 1,
data = 0x540afa40 "X3\023p▒▒L@r▒HVp▒f@*\b8/ZN@yh▒▒R\200l@▒\003\064#i▒P@x/0=e\016q@&n▒▒▒HS@▒▒\023▒▒\024t@X7>▒▒qX@:6▒ph▒v@br\032\064▒I@",
datastart = 0x540afa40 "X3\023p▒▒L@r▒HVp▒f@*\b8/ZN@yh▒▒R\200l@▒\003\064#i▒P@x/0=e\016q@&n▒▒▒HS@▒▒\023▒▒\024t@X7>▒▒qX@:6▒ph▒v@br\032\064▒I@",
dataend = 0x540afe80 "651125,3hXX\r▒▒\022",
datalimit = 0x540afe80 "651125,3hXX\r▒▒\022", allocator = 0x0,
u = 0x56270cf0, size = {p = 0x175ddc8}, step = {p = 0x175de10, buf = {16,
16}}}

And src2 is a Scalar/Matx
$4 = {flags = 1124024326, dims = 2, rows = 4, cols = 1, data = 0x175ec70 "",
datastart = 0x175ec70 "", dataend = 0x175ec90 "", datalimit = 0x175ec90 "",
allocator = 0x0, u = 0x0, size = {p = 0x175dd68}, step = {p = 0x175ddb0,
buf = {8, 8}}}

System information follows:
$ opencv_version_win32d

General configuration for OpenCV 4.0.0-pre =====================================
Version control: 4.0.0-rc-25-gf81370232-dirty

Extra modules:
Location (extra): C:/Users/arubin/git/opencv_contrib/modules
Version control (extra): 4.0.0-rc-2-gec4d5c85-dirty

Platform:
Timestamp: 2018-09-14T16:15:53Z
Host: Windows 10.0.17134 AMD64
CMake: 3.13.0-rc3
CMake generator: Ninja
CMake build tool: C:/msys64/mingw64/bin/ninja.exe
Configuration: Debug

CPU/HW features:
Baseline: SSE SSE2 SSE3
requested: SSE3
Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (5 files): + SSSE3 SSE4_1
SSE4_2 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2
FP16 (0 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (4 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
AVX2 (11 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2

C/C++:
Built as dynamic libs?: YES
C++ Compiler: C:/msys64/mingw64/bin/g++.exe (ver 8.2.0)
C++ flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -fomit-frame-pointer -ffast-math -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG -g1
C++ flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -fomit-frame-pointer -ffast-math -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
C Compiler: C:/msys64/mingw64/bin/gcc.exe
C flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -fomit-frame-pointer -ffast-math -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG -g1
C flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -fomit-frame-pointer -ffast-math -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release): -Wl,--gc-sections
Linker flags (Debug): -Wl,--gc-sections
ccache: NO
Precompiled headers: YES
Extra dependencies: opengl32 glu32 C:/msys64/opt/halide/lib/libHalide.a z
3rdparty dependencies:

OpenCV modules:
To be built: aruco bgsegm bioinspired calib3d ccalib core cvv datasets dnn dnn_objdetect dpm face features2d flann fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc java_bindings_generator line_descriptor ml objdetect optflow phase_unwrapping photo plot reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab viz xfeatures2d ximgproc xobjdetect xphoto
Disabled: ovis python3 python_bindings_generator world
Disabled by dependency: -
Unavailable: cnn_3dobj cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev freetype java js matlab python2 python2 sfm
Applications: apps
Documentation: NO
Non-free algorithms: YES

Windows RT support: NO

GUI:
QT: YES (ver 5.11.2)
QT OpenGL support: YES (Qt5::OpenGL 5.11.2)
Win32 UI: YES
OpenGL support: YES (opengl32 glu32)
VTK support: YES (ver 8.1.1)

Media I/O:
ZLib: build (ver 1.2.11)
JPEG: build-libjpeg-turbo (ver 1.5.3-62)
WEBP: build (ver encoder: 0x020e)
PNG: build (ver 1.6.35)
TIFF: build (ver 42 - 4.0.9)
JPEG 2000: build (ver 1.900.1)
OpenEXR: build (ver 1.7.1)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES

Video I/O:
DC1394: NO
FFMPEG: YES (prebuilt binaries)
avcodec: YES (ver 58.35.100)
avformat: YES (ver 58.20.100)
avutil: YES (ver 56.22.100)
swscale: YES (ver 5.3.100)
avresample: YES (ver 4.0.0)
GStreamer:
base: YES (ver 1.0)
video: YES (ver 1.0)
app: YES (ver 1.0)
riff: YES (ver 1.0)
pbutils: YES (ver 1.0)
DirectShow: YES

Parallel framework: none

Trace: YES (built-in)

Other third-party libraries:
Lapack: YES (C:/msys64/mingw64/lib/libopenblas.dll.a)
Halide: YES (C:/msys64/opt/halide/lib/libHalide.a C:/msys64/opt/halide/include)
Eigen: YES (ver 3.3.5)
Custom HAL: NO
Protobuf: build (3.5.1)

OpenCL: YES (SVM)
Include path: C:/Users/arubin/git/opencv/3rdparty/include/opencl/1.2
Link libraries: Dynamic load

Python (for build): C:/msys64/mingw64/bin/python2.7.exe

Java:
ant: NO
JNI: C:/Program Files/Java/jdk1.8.0_161/include C:/Program Files/Java/jdk1.8.0_161/include/win32 C:/Program Files/Java/jdk1.8.0_161/include
Java wrappers: NO
Java tests: NO

Install to: C:/msys64/mingw64

[ INFO:0] Initialize OpenCL runtime...
OpenCL Platforms:
Intel(R) OpenCL
iGPU: Intel(R) HD Graphics 630 (OpenCL 2.1 NEO )
CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz (OpenCL 2.1 (Build 611))
NVIDIA CUDA
dGPU: GeForce GTX 1070 (OpenCL 1.2 CUDA)
Current OpenCL device:
Type = iGPU
Name = Intel(R) HD Graphics 630
Version = OpenCL 2.1 NEO
Driver version = 23.20.16.4973
Address bits = 64
Compute units = 24
Max work group size = 256
Local memory size = 64 KB
Max memory allocation size = 3 GB 179 MB 418 KB
Double support = Yes
Host unified memory = Yes
Device extensions:
cl_khr_3d_image_writes
cl_khr_byte_addressable_store
cl_khr_fp16
cl_khr_depth_images
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_icd
cl_khr_image2d_from_buffer
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_intel_subgroups
cl_intel_required_subgroup_size
cl_intel_subgroups_short
cl_khr_spir
cl_intel_accelerator
cl_intel_media_block_io
cl_intel_driver_diagnostics
cl_intel_device_side_avc_motion_estimation
cl_khr_priority_hints
cl_khr_subgroups
cl_khr_il_program
cl_khr_fp64
cl_intel_planar_yuv
cl_intel_packed_yuv
cl_intel_motion_estimation
cl_intel_advanced_motion_estimation
cl_khr_gl_sharing
cl_khr_gl_depth_images
cl_khr_gl_event
cl_khr_gl_msaa_sharing
cl_intel_dx9_media_sharing
cl_khr_dx9_media_sharing
cl_khr_d3d10_sharing
cl_khr_d3d11_sharing
cl_intel_d3d11_nv12_media_sharing
cl_intel_simultaneous_sharing
Has AMD Blas = No
Has AMD Fft = No
Preferred vector width char = 16
Preferred vector width short = 8
Preferred vector width int = 4
Preferred vector width long = 1
Preferred vector width float = 1
Preferred vector width double = 1
OpenCV's HW features list:
ID= 1 (MMX) -> ON
ID= 2 (SSE) -> ON
ID= 3 (SSE2) -> ON
ID= 4 (SSE3) -> ON
ID= 5 (SSSE3) -> ON
ID= 6 (SSE4.1) -> ON
ID= 7 (SSE4.2) -> ON
ID= 8 (POPCNT) -> ON
ID= 9 (FP16) -> ON
ID= 10 (AVX) -> ON
ID= 11 (AVX2) -> ON
ID= 12 (FMA3) -> ON
Total available: 12

@berak
Contributor

berak commented Nov 17, 2018

@AronRubin, hello to a fellow mingw user ;)
It seems that the AVX2 instructions used in the opencv code don't compile correctly with mingw64.

I'm still using 7.2.0 and you're on 8.2.0, so things might have improved in the meantime, but I had to reduce the
cmake CPU_DISPATCH option to an empty list to compile all of the opencv + contrib modules properly.

(Also remember that the devs gave up on mingw way back, after opencv 2.4.6. There were far too many different versions of it to support sustainably, and that is still true now, sad as it is.)
