Blender 2.80 segfaults when attempting to initialize NEO #194

Open
akien-mga opened this issue Aug 1, 2019 · 14 comments
Labels
bug

Comments

@akien-mga akien-mga commented Aug 1, 2019

Cross-posting for original Blender bug report: https://developer.blender.org/T68052

System Information
Operating system: Linux-5.1.20-desktop-2.mga7-x86_64-with-mageia-7-Official 64 Bits
Graphics card 1 (integrated): Mesa DRI Intel(R) HD Graphics 630 (Kaby Lake GT2) Intel Open Source Technology Center 4.5 (Core Profile) Mesa 19.1.3
Graphics card 2 (discrete): AMD VEGAM (DRM 3.30.0, 5.1.20-desktop-2.mga7, LLVM 8.0.0) X.Org 4.5 (Core Profile) Mesa 19.1.3

Blender Version
2.80 (sub 75), branch: master, commit date: 2019-07-29 14:47, hash: rBf6cb5f54494e

Intel Compute Runtime Version

-rw-r--r-- 1 akien akien    86852 Jul 31 15:05 intel-gmmlib-19.2.3-1.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien   108884 Jul 31 15:05 intel-gmmlib-devel-19.2.3-1.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien  5754780 Aug  1 10:56 intel-igc-core-1.0.10-2.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien 60140276 Aug  1 10:58 intel-igc-core-debuginfo-1.0.10-2.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien    35052 Aug  1 10:57 intel-igc-debuginfo-1.0.10-2.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien  8262740 Aug  1 10:57 intel-igc-debugsource-1.0.10-2.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien    64508 Aug  1 10:56 intel-igc-opencl-1.0.10-2.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien   552148 Aug  1 10:58 intel-igc-opencl-debuginfo-1.0.10-2.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien    72332 Aug  1 10:56 intel-igc-opencl-devel-1.0.10-2.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien   698812 Aug  1 11:05 intel-opencl-19.28.13502-1.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien   400436 Jul 31 15:03 intel-opencl-clang-8.0.72-1.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien     9736 Jul 31 15:03 intel-opencl-clang-devel-8.0.72-1.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien  8778080 Aug  1 11:05 intel-opencl-debuginfo-19.28.13502-1.mga7.x86_64.rpm
-rw-r--r-- 1 akien akien   679452 Aug  1 11:05 intel-opencl-debugsource-19.28.13502-1.mga7.x86_64.rpm

Custom built on Mageia 7 from @JacekDanecki's Fedora Rawhide SRPMs at https://copr.fedorainfracloud.org/coprs/jdanecki/intel-opencl/ (builds from 2019-07-23)
Mageia is not Fedora-based, but it is also RPM-based, and the source RPMs rebuilt without issue.


My HP Spectre x360 laptop has an Intel HD Graphics 630 IGP and an AMD Radeon RX Vega M GL dGPU.
The AMD card uses Mesa's Clover OpenCL platform, which Blender does not seem to support.

I built and installed NEO as outlined above, and confirmed that it works fine for an example OpenCL program.

$ ./clinfo --list
Platform #0: Clover
 `-- Device #0: AMD VEGAM (DRM 3.30.0, 5.1.20-desktop-2.mga7, LLVM 8.0.0)
Platform #1: Intel(R) OpenCL HD Graphics
 `-- Device #0: Intel(R) Gen9 HD Graphics NEO

In Blender 2.80, when I open Edit > Preferences > System to review which OpenCL platforms were detected, Blender segfaults:

Thread 1 "blender" received signal SIGSEGV, Segmentation fault.
0x00007fffffffb164 in ?? ()

(gdb) bt
#0  0x00007fffffffb164 in ?? ()
#1  0x00007ffff7f836d7 in __pthread_once_slow () from /lib64/libpthread.so.0
#2  0x00007fffb0a84234 in __gthread_once (__func=<optimized out>, __once=0x7fffb1ffefcc <InitializeCheckInstrTypesPassFlag>) at /usr/include/c++/8.3.1/x86_64-mageia-linux-gnu/bits/gthr-default.h:699
#3  std::call_once<void* (&)(llvm::PassRegistry&), std::reference_wrapper<llvm::PassRegistry> > (
    __f=@0x7fffb0a83e70: {void *(llvm::PassRegistry &)} 0x7fffb0a83e70 <initializeCheckInstrTypesPassOnce(llvm::PassRegistry&)>, __once=...) at /usr/include/c++/8.3.1/mutex:684
#4  llvm::call_once<void* (&)(llvm::PassRegistry&), std::reference_wrapper<llvm::PassRegistry> > (
    F=@0x7fffb0a83e70: {void *(llvm::PassRegistry &)} 0x7fffb0a83e70 <initializeCheckInstrTypesPassOnce(llvm::PassRegistry&)>, flag=...) at /usr/include/llvm/Support/Threading.h:102
#5  initializeCheckInstrTypesPass (Registry=...) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/Compiler/CISACodeGen/CheckInstrTypes.cpp:49
#6  IGC::CheckInstrTypes::CheckInstrTypes (this=<optimized out>, instrList=0x7fffffffb164) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/Compiler/CISACodeGen/CheckInstrTypes.cpp:56
#7  0x00007fffb0924be0 in IGC::unify_opt_PreProcess (pContext=pContext@entry=0x7fffffffb0d0) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/Compiler/CISACodeGen/ShaderCodeGen.cpp:1051
#8  0x00007fffb082401f in IGC::CommonOCLBasedPasses (pContext=0x7fffffffb0d0, BuiltinGenericModule=std::unique_ptr<llvm::Module> = {...}, BuiltinSizeModule=std::unique_ptr<llvm::Module> = {...})
    at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/UnifyIROCL.cpp:186
#9  0x00007fffb082530f in IGC::UnifyIRSPIR(IGC::OpenCLProgramContext*, std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >) ()
    at /usr/include/c++/8.3.1/ext/new_allocator.h:116
#10 0x00007fffb07f1464 in TC::TranslateBuild (pInputArgs=pInputArgs@entry=0x7fffffffb800, pOutputArgs=pOutputArgs@entry=0x7fffffffb7d0, 
    inputDataFormatTemp=inputDataFormatTemp@entry=TC::TB_DATA_FORMAT_LLVM_TEXT, IGCPlatform=..., profilingTimerResolution=<optimized out>) at /usr/include/c++/8.3.1/bits/move.h:74
#11 0x00007fffb07f2278 in TC::TranslateBuild (pInputArgs=pInputArgs@entry=0x7fffffffb800, pOutputArgs=pOutputArgs@entry=0x7fffffffb7d0, inputDataFormatTemp=TC::TB_DATA_FORMAT_LLVM_TEXT, IGCPlatform=..., 
    profilingTimerResolution=<optimized out>) at /usr/include/c++/8.3.1/ext/new_allocator.h:86
#12 0x00007fffb089ffe7 in IGC::IgcOclTranslationCtx<0ul>::Impl::Translate (this=<optimized out>, outVersion=<optimized out>, src=<optimized out>, specConstantsIds=specConstantsIds@entry=0x0, 
    specConstantsValues=specConstantsValues@entry=0x0, options=<optimized out>, internalOptions=<optimized out>, tracingOptions=<optimized out>, tracingOptionsCount=<optimized out>, gtPinInput=<optimized out>)
    at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/ocl_igc_interface/impl/igc_ocl_translation_ctx_impl.h:230
#13 0x00007fffb08a29a3 in IGC::IgcOclTranslationCtx<1ul>::TranslateImpl (this=<optimized out>, outVersion=<optimized out>, src=<optimized out>, options=<optimized out>, internalOptions=<optimized out>, 
    tracingOptions=<optimized out>, tracingOptionsCount=0) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/ocl_igc_interface/igc_ocl_translation_ctx.h:45
#14 0x00007fffb75631ca in IGC::IgcOclTranslationCtx<1ul>::Translate<IGC::OclTranslationOutput<1ul> > (tracingOptionsCount=0, tracingOptions=0x0, internalOptions=<optimized out>, options=<optimized out>, 
    src=<optimized out>, this=<optimized out>) at /usr/include/igc/ocl_igc_interface/igc_ocl_translation_ctx.h:51
#15 NEO::translate<IGC::IgcOclTranslationCtx<3ul> > (internalOptions=<optimized out>, options=<optimized out>, src=<optimized out>, tCtx=<optimized out>)
    at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/compiler_interface/compiler_interface.inl:27
#16 NEO::CompilerInterface::getSipKernelBinary (this=0x7fffc6ef8ee0, kernel=<optimized out>, device=..., retBinary=std::vector of length 0, capacity 0) at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/compiler_interface/compiler_interface.cpp:348
#17 0x00007fffb7547f54 in std::call_once<NEO::BuiltIns::getSipKernel(NEO::SipKernelType, NEO::Device&)::{lambda()#1}&>(std::once_flag&, NEO::BuiltIns::getSipKernel(NEO::SipKernelType, NEO::Device&)::{lambda()#1}&)::{lambda()#2}::_FUN() () at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/built_ins/built_ins.cpp:101
#18 0x00007ffff7f836d7 in __pthread_once_slow () from /lib64/libpthread.so.0
#19 0x00007fffb7548d02 in NEO::BuiltIns::getSipKernel (this=0x7fffc6f21c00, type=<optimized out>, device=...) at /usr/include/c++/8.3.1/x86_64-mageia-linux-gnu/bits/gthr-default.h:699
#20 0x00007fffb75b4c2b in NEO::Platform::initialize (this=0x7fffc6fca700) at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/platform/platform.cpp:181
#21 0x00007fffb7528d38 in clGetPlatformIDs (numEntries=<optimized out>, platforms=<optimized out>, numPlatforms=<optimized out>) at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/api/api.cpp:78
#22 0x00007fffc3a520ba in _find_and_check_platforms (num_icds=2) at ocl_icd_loader.c:451
#23 __initClIcd () at ocl_icd_loader.c:652
#24 _initClIcd_real () at ocl_icd_loader.c:702
#25 0x00007fffc3a542a4 in _initClIcd () at ocl_icd_loader.c:724
#26 clGetPlatformIDs (num_entries=0, platforms=0x0, num_platforms=0x7fffffffc71c) at ocl_icd_loader.c:846
#27 0x0000000003220776 in ccl::device_opencl_info(ccl::vector<ccl::DeviceInfo, ccl::GuardedAllocator<ccl::DeviceInfo> >&) ()
#28 0x00000000031e87c6 in ccl::Device::available_devices(unsigned int) ()
#29 0x0000000001548699 in ?? ()
#30 0x000000000174f969 in _PyMethodDef_RawFastCallKeywords ()
#31 0x000000000174fa25 in _PyCFunction_FastCallKeywords ()
#32 0x000000000114f33f in _PyEval_EvalFrameDefault ()
#33 0x0000000001803c23 in _PyEval_EvalCodeWithName ()
#34 0x000000000174f4a6 in _PyFunction_FastCallKeywords ()
#35 0x000000000114dfa3 in _PyEval_EvalFrameDefault ()
#36 0x0000000001145530 in ?? ()
#37 0x000000000114dfa3 in _PyEval_EvalFrameDefault ()
#38 0x0000000001145530 in ?? ()
#39 0x000000000114dfa3 in _PyEval_EvalFrameDefault ()
#40 0x0000000001145530 in ?? ()
#41 0x000000000174f3e6 in _PyFunction_FastCallDict ()
#42 0x00000000013e69ce in ?? ()
#43 0x00000000014f2c2f in ?? ()
#44 0x0000000002d48623 in ?? ()
#45 0x0000000002d4ab86 in ED_region_panels_layout_ex ()
#46 0x0000000002d4b06d in ED_region_panels_ex ()
#47 0x0000000002d4ebf1 in ED_region_do_draw ()
#48 0x0000000001517513 in wm_draw_update ()
#49 0x0000000001514c30 in WM_main ()
#50 0x00000000010c0abe in main ()
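
Frames #20 to #26 place the crash inside clGetPlatformIDs, while the ICD loader initializes each platform and before Blender queries any device. The following is a minimal sketch, not part of the original report, that issues the same call (num_entries=0, platforms=NULL) from Python through ctypes. It assumes the loader can be located with ctypes.util.find_library; the function name count_opencl_platforms is hypothetical.

```python
import ctypes
import ctypes.util


def count_opencl_platforms():
    """Return the number of OpenCL platforms, or None if no loader is found."""
    # find_library returns None when no libOpenCL is installed.
    name = ctypes.util.find_library("OpenCL")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None

    # cl_int clGetPlatformIDs(cl_uint, cl_platform_id *, cl_uint *)
    lib.clGetPlatformIDs.restype = ctypes.c_int32
    lib.clGetPlatformIDs.argtypes = [
        ctypes.c_uint32,
        ctypes.c_void_p,
        ctypes.POINTER(ctypes.c_uint32),
    ]

    count = ctypes.c_uint32(0)
    # Same arguments as frame #26: num_entries=0, platforms=NULL.
    status = lib.clGetPlatformIDs(0, None, ctypes.byref(count))
    if status != 0:  # CL_SUCCESS is 0
        return 0
    return count.value


if __name__ == "__main__":
    print("OpenCL platforms:", count_opencl_platforms())
```

Since a standalone OpenCL program worked on the reporting machine, this call is expected to succeed there outside Blender; comparing the two can help separate a driver-level failure from one specific to the Blender process.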

(Packaging side note for @JacekDanecki: you should build your RPMs with RelWithDebInfo and let RPM strip the debug information from the resulting binaries. You would then end up with proper Release binaries and additional -debuginfo and -debugsource RPMs which then provide the relevant debug symbols used to get the above backtrace.)

Author

@akien-mga akien-mga commented Aug 1, 2019

Note that I'm relatively new to both OpenCL and Blender. From what I could find so far it seems Blender only supports OpenCL for AMD on Windows: https://git.blender.org/gitweb/gitweb.cgi/blender.git/blob/7c5838cfd66a656e3ad422ddbe0f23b31dcff1e3:/intern/cycles/device/opencl/opencl_util.cpp#l731

So I assume it would be expected that NEO doesn't work in Blender yet, but I believe the segfault is worth investigating nevertheless (especially since it happens in IGC).

Contributor

@JacekDanecki JacekDanecki commented Aug 1, 2019

I've tried Blender 2.80 (a few beta versions, the last being 2.80-rc3), and each time I selected "Edit > Preferences > System", Blender hung (without a segfault) when NEO was enabled.
For Blender 2.79, we have another issue reported here

Author

@akien-mga akien-mga commented Aug 1, 2019

I've tried blender 2.80 (few beta versions, last was 2.80-rc3), and each time when I selected "Edit > Preferences > System" blender hung (without segfault) when Neo was enabled.

Indeed, I've seen this behavior too if I uninstall Mesa's Clover, which is used by my AMD dGPU. So it seems that Blender crashes when both Clover and NEO are installed, and only hangs when NEO is installed alone.
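
To compare the Clover + NEO and NEO-only cases without uninstalling packages, it helps to know which ICDs the loader will pick up. The sketch below lists the .icd files in a vendors directory and the library each one names. It assumes /etc/OpenCL/vendors, the conventional default for the ocl-icd loader that appears in the backtrace (ocl_icd_loader.c); the function name list_icds is hypothetical.

```python
from pathlib import Path


def list_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the vendor library it names."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}
    icds = {}
    for icd_file in sorted(directory.glob("*.icd")):
        lines = icd_file.read_text().splitlines()
        # An .icd file names the vendor library; take the first non-empty line.
        library = next((ln.strip() for ln in lines if ln.strip()), "")
        icds[icd_file.name] = library
    return icds


if __name__ == "__main__":
    for icd_name, library in list_icds().items():
        print(f"{icd_name}: {library}")
```

With ocl-icd, the OCL_ICD_VENDORS environment variable can, as far as its documentation describes, point at an alternative directory or a single .icd file, which would allow testing one platform at a time.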

Author

@akien-mga akien-mga commented Aug 1, 2019

Edit: Moved details about this other Blender segfault to a dedicated issue: #195.

Follow-up comments from here down to #194 (comment) focused on that other issue.

BTW, I tried compiling Blender from source on my distro, and then it neither crashes nor hangs. It still doesn't list the Intel device as a valid entry by default though, probably because it's not whitelisted.

With this hack to Blender's code, I can see the device in Edit > Preferences > System:

diff --git a/intern/cycles/device/opencl/opencl.h b/intern/cycles/device/opencl/opencl.h
index 82b961b8de7..04f07f30365 100644
--- a/intern/cycles/device/opencl/opencl.h
+++ b/intern/cycles/device/opencl/opencl.h
@@ -90,7 +90,7 @@ class OpenCLInfo {
   static bool device_version_check(cl_device_id device, string *error = NULL);
   static string get_hardware_id(const string &platform_name, cl_device_id device_id);
   static void get_usable_devices(vector<OpenCLPlatformDevice> *usable_devices,
-                                 bool force_all = false);
+                                 bool force_all = true);
 
   /* ** Some handy shortcuts to low level cl*GetInfo() functions. ** */
 

[Screenshot: Screenshot_20190801_133830]

Trying to use it for a render triggers another segfault:

I0801 13:36:31.114132 20742 util_task.cpp:329] Creating pool of 8 threads.
I0801 13:36:31.114143 20742 util_task.cpp:241] Detected 8 processors in active group.
I0801 13:36:31.114147 20742 util_task.cpp:251] Not setting thread group affinity.
[New Thread 0x7fffa45fa700 (LWP 20752)]
[New Thread 0x7fffa4dfb700 (LWP 20753)]
[New Thread 0x7fffa55fc700 (LWP 20754)]
[New Thread 0x7fffa5dfd700 (LWP 20755)]
[New Thread 0x7fffb1ba6700 (LWP 20756)]
[New Thread 0x7fffb13a5700 (LWP 20757)]
[New Thread 0x7fffa6dff700 (LWP 20758)]
[New Thread 0x7fffa65fe700 (LWP 20759)]
[New Thread 0x7fffa3df9700 (LWP 20760)]
I0801 13:36:31.115921 20742 opencl_split.cpp:632] Creating new Cycles device for OpenCL platform Intel(R) OpenCL HD Graphics, device Intel(R) Gen9 HD Graphics NEO.
[New Thread 0x7fffa1478700 (LWP 20761)]
I0801 13:36:31.117763 20761 session.cpp:753] Requested features:
Experimental features: Off
Max nodes group: 0
Nodes features: 0
Use Hair: False
Use Object Motion: False
Use Camera Motion: False
Use Baking: False
Use Subsurface: False
Use Volume: False
Use Branched Integrator: False
Use Patch Evaluation: False
Use Transparent Shadows: False
Use Principled BSDF: True
Use Denoising: False
Use Displacement: False
Use Background Light: True
I0801 13:36:31.117825 20761 opencl_split.cpp:761] Loading kernels for platform Intel(R) OpenCL HD Graphics, device Intel(R) Gen9 HD Graphics NEO.
I0801 13:36:31.117941 20761 opencl_util.cpp:297] OpenCL program base not found in cache.
I0801 13:36:31.134737 20761 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_CL_KHR_FP16__ '.
I0801 13:36:31.135056 20761 opencl_util.cpp:297] Loaded program from /home/akien/.cache/cycles/kernels/cycles_kernel_base_46A2F28B047F7077207C6B26AD34C51F_66F686122DEAFAABB0CCA9DBEB37273C.clbin.
I0801 13:36:31.135148 20761 opencl_util.cpp:297] OpenCL program background not found in cache.
I0801 13:36:31.187186 20761 opencl_util.cpp:297] OpenCL program background not found on disk.
I0801 13:36:31.187569 20753 opencl_util.cpp:297] OpenCL program background not found in cache.
Cycles: compiling OpenCL program background...
I0801 13:36:31.250113 20753 opencl_util.cpp:297] Build flags: -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_TRANSPARENT__ -D__NO_SHADOW_TRICKS__ -D__NO_DENOISING__ -D__NO_SHADER_RAYTRACE__
[Detaching after vfork from child process 20763]
AL lib: (WW) GetSymbol: Failed to load jack_error_callback: /lib64/libjack.so.0: undefined symbol: jack_error_callback
AL lib: (WW) jack_msg_handler: Cannot connect to server socket err = No such file or directory
AL lib: (WW) jack_msg_handler: Cannot connect to server request channel
AL lib: (WW) jack_msg_handler: jack server is not running or cannot be started
AL lib: (WW) jack_msg_handler: JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
AL lib: (WW) jack_msg_handler: JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
AL lib: (WW) ALCjackBackendFactory_init: jack_client_open() failed, 0x11
AL lib: (WW) alc_initconfig: Failed to initialize backend "jack"
sh: line 1: 20764 Segmentation fault      (core dumped) "/home/akien/tmp/blender/build/bin/blender" "--background" "--factory-startup" "--python-expr" "import _cycles; _cycles.opencl_compile(r'1', r'Intel(R) Gen9 HD Graphics NEO', r'Intel(R) OpenCL HD Graphics', r'-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_CL_KHR_FP16__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_TRANSPARENT__ -D__NO_SHADOW_TRICKS__ -D__NO_DENOISING__ -D__NO_SHADER_RAYTRACE__', r'kernel_background.cl', r'/home/akien/.cache/cycles/kernels/cycles_kernel_background_3A063A327B9939998F9E7BFA0349B693_13AC40ED2FE033D81EB1496FB3C61F4D.clbin')" > /dev/null
I0801 13:36:34.637307 20753 opencl_util.cpp:297] Separate-process building of /home/akien/.cache/cycles/kernels/cycles_kernel_background_3A063A327B9939998F9E7BFA0349B693_13AC40ED2FE033D81EB1496FB3C61F4D.clbin failed, will fall back to regular building.
Cycles: compiling OpenCL program background...
I0801 13:36:34.667415 20753 opencl_util.cpp:297] Build flags: -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_TRANSPARENT__ -D__NO_SHADOW_TRICKS__ -D__NO_DENOISING__ -D__NO_SHADER_RAYTRACE__
I0801 13:36:34.667428 20753 opencl_util.cpp:324] Build options passed to clBuildProgram: '-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_CL_KHR_FP16__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_TRANSPARENT__ -D__NO_SHADOW_TRICKS__ -D__NO_DENOISING__ -D__NO_SHADER_RAYTRACE__'.

Thread 60 "blender" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffa4dfb700 (LWP 20753)]
0x00007fffbcd034e4 in ?? () from /lib64/libLLVM-8.so

(gdb) bt
#0  0x00007fffbcd034e4 in  () at /lib64/libLLVM-8.so
#1  0x00007fffbcd07a5d in  () at /lib64/libLLVM-8.so
#2  0x00007fffbcd07f5c in llvm::InstructionCombiningPass::runOnFunction(llvm::Function&) () at /lib64/libLLVM-8.so
#3  0x00007fffbc417e28 in llvm::FPPassManager::runOnFunction(llvm::Function&) () at /lib64/libLLVM-8.so
#4  0x00007fffbc417ee3 in llvm::FPPassManager::runOnModule(llvm::Module&) () at /lib64/libLLVM-8.so
#5  0x00007fffbc4173da in llvm::legacy::PassManagerImpl::run(llvm::Module&) () at /lib64/libLLVM-8.so
#6  0x00007fffacc44e15 in IGC::CommonOCLBasedPasses(IGC::OpenCLProgramContext*, std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >) (pContext=0x7fffa4df7af0, BuiltinGenericModule=std::unique_ptr<llvm::Module> = {...}, BuiltinSizeModule=std::unique_ptr<llvm::Module> = {...})
    at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/UnifyIROCL.cpp:464
#7  0x00007fffacc4530f in IGC::UnifyIRSPIR(IGC::OpenCLProgramContext*, std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >) ()
    at /usr/include/c++/8.3.1/ext/new_allocator.h:86
#8  0x00007fffacc11464 in TC::TranslateBuild(TC::STB_TranslateInputArgs const*, TC::STB_TranslateOutputArgs*, TC::TB_DATA_FORMAT, IGC::CPlatform const&, float)
    (pInputArgs=pInputArgs@entry=0x7fffa4df8220, pOutputArgs=pOutputArgs@entry=0x7fffa4df81f0, inputDataFormatTemp=inputDataFormatTemp@entry=TC::TB_DATA_FORMAT_SPIR_V, IGCPlatform=..., profilingTimerResolution=<optimized out>) at /usr/include/c++/8.3.1/bits/move.h:74
#9  0x00007fffacc12278 in TC::TranslateBuild(TC::STB_TranslateInputArgs const*, TC::STB_TranslateOutputArgs*, TC::TB_DATA_FORMAT, IGC::CPlatform const&, float)
    (pInputArgs=pInputArgs@entry=0x7fffa4df8220, pOutputArgs=pOutputArgs@entry=0x7fffa4df81f0, inputDataFormatTemp=TC::TB_DATA_FORMAT_SPIR_V, IGCPlatform=..., profilingTimerResolution=<optimized out>)
    at /usr/include/c++/8.3.1/ext/new_allocator.h:86
#10 0x00007fffaccbffe7 in IGC::IgcOclTranslationCtx<0ul>::Impl::Translate(unsigned long, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, unsigned int, void*) const
    (this=<optimized out>, outVersion=<optimized out>, src=<optimized out>, specConstantsIds=<optimized out>, specConstantsValues=<optimized out>, options=<optimized out>, internalOptions=<optimized out>, tracingOptions=<optimized out>, tracingOptionsCount=<optimized out>, gtPinInput=<optimized out>) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/ocl_igc_interface/impl/igc_ocl_translation_ctx_impl.h:230
#11 0x00007fffb38d2c22 in IGC::IgcOclTranslationCtx<3ul>::Translate<IGC::OclTranslationOutput<1ul> >(CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, unsigned int, void*)
    (gtPinInput=<optimized out>, tracingOptionsCount=0, tracingOptions=0x0, internalOptions=<optimized out>, options=<optimized out>, specConstantsValues=<optimized out>, specConstantsIds=<optimized out>, src=0x7fff9ac10260, this=<optimized out>) at /usr/include/igc/ocl_igc_interface/igc_ocl_translation_ctx.h:103
#12 0x00007fffb38d2c22 in NEO::translate<IGC::IgcOclTranslationCtx<3ul> >(IGC::IgcOclTranslationCtx<3ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, void*)
    (gtpinInit=<optimized out>, internalOptions=<optimized out>, options=<optimized out>, specConstantsValues=<optimized out>, specConstantsIds=<optimized out>, src=0x7fff9ac10260, tCtx=<optimized out>)
    at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/compiler_interface/compiler_interface.inl:76
#13 0x00007fffb38d2c22 in NEO::CompilerInterface::build(NEO::Program&, NEO::TranslationArgs const&, bool) (this=this@entry=0x7fffc02a17a0, program=..., inputArgs=..., enableCaching=enableCaching@entry=true)
    at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/compiler_interface/compiler_interface.cpp:135
#14 0x00007fffb392689f in NEO::Program::build(unsigned int, _cl_device_id* const*, char const*, void (*)(_cl_program*, void*), void*, bool)
    (this=0x7fff9ace2000, numDevices=<optimized out>, deviceList=<optimized out>, buildOptions=<optimized out>, funcNotify=0x0, userData=0x0, enableCaching=true)
    at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/program/build.cpp:105
#15 0x00007fffb38a07e5 in clBuildProgram(cl_program, cl_uint, cl_device_id const*, char const*, void (*)(cl_program, void*), void*)
    (program=<optimized out>, numDevices=<optimized out>, deviceList=<optimized out>, options=<optimized out>, funcNotify=<optimized out>, userData=<optimized out>)
    at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/api/api.cpp:1369
#16 0x00007ffff189ae4b in clBuildProgram
    (program=0x7fff9ace2010, num_devices=0, device_list=0x0, options=0x7fff9aab6000 "-cl-no-signed-zeros -cl-mad-enable -D__KERNEL_CL_KHR_FP16__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -"..., pfn_notify=0x0, user_data=0x0) at ocl_icd_loader_gen.c:387
#17 0x000000000158cab1 in ccl::OpenCLDevice::OpenCLProgram::build_kernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*) ()
#18 0x000000000158dabf in ccl::OpenCLDevice::OpenCLProgram::compile_kernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*) ()
#19 0x000000000158efd1 in ccl::OpenCLDevice::OpenCLProgram::compile() ()
#20 0x00000000020c0603 in ccl::TaskScheduler::thread_run(int) ()
#21 0x00000000020c200e in ccl::thread::run(void*) ()
#22 0x00007ffff61de07f in std::execute_native_thread_routine(void*) (__p=0x7fffaa43c760) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#23 0x00007ffff6cec04c in start_thread () at /lib64/libpthread.so.0
#24 0x00007ffff64edbcf in clone () at /lib64/libc.so.6

(I'd be happy to move this to a dedicated issue if it's better.)

Contributor

@lwesiers lwesiers commented Aug 1, 2019

Hello there,

I see that you are using a debug version of the driver and compiler. Could you please set this flag

export IGC_ShaderDumpEnableAll=1

before launching Blender, and send me (via mail) the shader dumps from the folder /tmp/IntelIGC/?
I will check the shaders on our side.
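
The request above can be scripted. This is a minimal sketch, assuming only what the comment states: the IGC_ShaderDumpEnableAll=1 environment variable and the /tmp/IntelIGC/ dump folder. The function name run_with_shader_dumps is hypothetical, and the command to launch (for example a path to the Blender binary) is supplied by the caller.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_with_shader_dumps(command, dump_dir="/tmp/IntelIGC"):
    """Run command with shader dumps enabled; return (exit code, dump files)."""
    # Copy the current environment and add the flag from the comment above.
    env = dict(os.environ, IGC_ShaderDumpEnableAll="1")
    result = subprocess.run(command, env=env)

    # Collect whatever ended up in the dump folder, if it exists at all.
    root = Path(dump_dir)
    if root.is_dir():
        dumps = sorted(str(p) for p in root.rglob("*") if p.is_file())
    else:
        dumps = []
    return result.returncode, dumps


if __name__ == "__main__" and len(sys.argv) > 1:
    code, files = run_with_shader_dumps(sys.argv[1:])
    print(f"exit code {code}, {len(files)} dump file(s)")
    for path in files:
        print(path)
```

A negative exit code from subprocess.run indicates that the child was killed by a signal, so a segfault in the launched process shows up as -11 on Linux.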

Author

@akien-mga akien-mga commented Aug 1, 2019

I'm actually using RelWithDebInfo builds, so they are release builds with debug symbols. But I'll do a Debug build to provide more info.

Contributor

@lwesiers lwesiers commented Aug 1, 2019

Actually, this target can also produce shader dumps, so you can try with it too.

Author

@akien-mga akien-mga commented Aug 1, 2019

I tried, but I didn't see any IntelIGC folder in /tmp or ~/tmp. I have debug builds in progress, so I'll be able to test with those soon.

Author

@akien-mga akien-mga commented Aug 1, 2019

I get a linking failure for IGC:

[ 74%] Linking CXX shared library Debug/libigc.so
/usr/bin/ld: Debug/libCompiler.a(PreRAScheduler.cpp.o): in function `IGC::PreRAScheduler::dumpDDGContents()':
/home/akien/Projects/mageia/Sandbox/_rpm/BUILD/intel-graphics-compiler-c7dec76146e3a18b9ed9f489d033e65ff224e869/IGC/Compiler/CISACodeGen/PreRAScheduler.cpp:252: undefined reference to `llvm::Value::dump() const'
/usr/bin/ld: Debug/libCompiler.a(PreRAScheduler.cpp.o): in function `IGC::PreRAScheduler::dumpPriorityQueueContents()':
/home/akien/Projects/mageia/Sandbox/_rpm/BUILD/intel-graphics-compiler-c7dec76146e3a18b9ed9f489d033e65ff224e869/IGC/Compiler/CISACodeGen/PreRAScheduler.cpp:855: undefined reference to `llvm::Value::dump() const'
/usr/bin/ld: /home/akien/Projects/mageia/Sandbox/_rpm/BUILD/intel-graphics-compiler-c7dec76146e3a18b9ed9f489d033e65ff224e869/IGC/Compiler/CISACodeGen/PreRAScheduler.cpp:876: undefined reference to `llvm::Value::dump() const'
/usr/bin/ld: /home/akien/Projects/mageia/Sandbox/_rpm/BUILD/intel-graphics-compiler-c7dec76146e3a18b9ed9f489d033e65ff224e869/IGC/Compiler/CISACodeGen/PreRAScheduler.cpp:888: undefined reference to `llvm::Value::dump() const'
/usr/bin/ld: /home/akien/Projects/mageia/Sandbox/_rpm/BUILD/intel-graphics-compiler-c7dec76146e3a18b9ed9f489d033e65ff224e869/IGC/Compiler/CISACodeGen/PreRAScheduler.cpp:902: undefined reference to `llvm::Value::dump() const'
/usr/bin/ld: Debug/libCompiler.a(LexicalScopes.cpp.o): in function `IGC::LexicalScope::dump(unsigned int) const':
/home/akien/Projects/mageia/Sandbox/_rpm/BUILD/intel-graphics-compiler-c7dec76146e3a18b9ed9f489d033e65ff224e869/IGC/Compiler/DebugInfo/LexicalScopes.cpp:324: undefined reference to `llvm::Metadata::dump() const'
collect2: error: ld returned 1 exit status

Apparently the dump() methods are only exposed in Debug builds of LLVM, and my distro package is a RelWithDebInfo build too... I can try to rebuild LLVM in debug mode but that will take a while.
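
Whether a given LLVM shared library exports these methods can be checked before starting a long rebuild. The sketch below looks for the two symbols from the linker errors in the output of `nm -D --defined-only`. The mangled names follow the Itanium C++ ABI for `llvm::Value::dump() const` and `llvm::Metadata::dump() const`; the function names are hypothetical, and the library path is left to the caller. As far as I know, LLVM also compiles these methods in when LLVM_ENABLE_DUMP is defined, which may be an alternative to a full Debug build, but that is an assumption to verify against the LLVM version in use.

```python
import shutil
import subprocess

# Mangled symbol -> readable name, taken from the linker errors above.
DUMP_SYMBOLS = {
    "_ZNK4llvm5Value4dumpEv": "llvm::Value::dump() const",
    "_ZNK4llvm8Metadata4dumpEv": "llvm::Metadata::dump() const",
}


def missing_dump_symbols(nm_output):
    """Return the readable names of dump() symbols absent from nm output."""
    defined = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # `nm -D --defined-only` prints: address, type letter, symbol name.
        if len(fields) == 3:
            # Drop a symbol version suffix such as "@@LLVM_8" if present.
            defined.add(fields[2].split("@")[0])
    return sorted(name for sym, name in DUMP_SYMBOLS.items() if sym not in defined)


def check_library(path):
    """Run nm on a shared library; return None when nm is unavailable."""
    if shutil.which("nm") is None:
        return None
    completed = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        capture_output=True, text=True, check=False,
    )
    return missing_dump_symbols(completed.stdout)
```

Calling check_library on the distribution's LLVM library (here that would be /lib64/libLLVM-8.so) returns the list of missing methods; an empty list means both are exported.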

@alalek alalek commented Aug 1, 2019

/lib64/libLLVM-8.so

Not sure whether IGC is meant to use functions from this LLVM binary (instead of libopencl-clang.so).
See #122
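
One way to see which LLVM-related libraries a running process has actually loaded is to read its memory map. The sketch below filters the text of /proc/<pid>/maps (Linux only) for file names that mention LLVM, IGC or opencl-clang. The keyword list and function names are assumptions chosen for illustration, and the result does not by itself confirm or rule out a symbol conflict.

```python
def llvm_mappings(maps_text):
    """Return distinct mapped files whose name mentions LLVM, IGC or clang."""
    keywords = ("libLLVM", "libigc", "libopencl-clang", "libclang")
    paths = set()
    for line in maps_text.splitlines():
        # /proc/<pid>/maps columns: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            file_name = path.rsplit("/", 1)[-1]
            if any(keyword in file_name for keyword in keywords):
                paths.add(path)
    return sorted(paths)


def llvm_mappings_for_pid(pid):
    """Read /proc/<pid>/maps and filter it with llvm_mappings."""
    with open(f"/proc/{pid}/maps") as handle:
        return llvm_mappings(handle.read())
```

A copy of LLVM linked statically into an executable does not appear as a separate mapping, so this only reveals shared libraries.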

Author

@akien-mga akien-mga commented Aug 1, 2019

I'll move the issue described in #194 (comment) and follow-ups to a new issue, as otherwise it's going to divert the attention from the original bug described in the OP.

I should have done this from the start, sorry about that.

Author

@akien-mga akien-mga commented Aug 1, 2019

Back to the original issue: I should point out that the crash (or hang with only IGC) when accessing Edit > Preferences > System is reproducible in official release builds (in my case https://www.blender.org/download/Blender2.80/blender-2.80-linux-glibc217-x86_64.tar.bz2/), but I can't reproduce it on a self-compiled debug build.

I checked if export IGC_ShaderDumpEnableAll=1 would produce any useful information about this crash, but I don't get any dump. I guess no shaders are involved at this stage, only OpenCL init.

I do have a better backtrace now that I installed a debug version of LLVM and IGC though (running through gdb with ./blender --debug-cycles):

I0801 20:22:45.364782 16503 blender_python.cpp:180] Debug flags initialized to:
CPU flags:
  AVX2       : True
  AVX        : True
  SSE4.1     : True
  SSE3       : True
  SSE2       : True
  BVH layout : BVH8
  Split      : False
CUDA flags:
 Adaptive Compile: False
OpenCL flags:
  Device type    : ALL
  Debug          : False
  Memory limit   : 0
I0801 20:22:48.992769 16503 device_opencl.cpp:48] CLEW initialization succeeded.
Missing separate debuginfo for /usr/lib64/gallium-pipe/pipe_radeonsi.so
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/69/ea4637c4c1a241e065be93277ee7b56df5bed6.debug
Or try: urpmi  /usr/lib/debug/.build-id/69/ea4637c4c1a241e065be93277ee7b56df5bed6.debug
Missing separate debuginfo for /usr/lib64/gallium-pipe/pipe_swrast.so
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/34/c7d456fe1a3bb9a88acb8160c66098112b7d99.debug
Or try: urpmi  /usr/lib/debug/.build-id/34/c7d456fe1a3bb9a88acb8160c66098112b7d99.debug
[New Thread 0x7fffbcf8e700 (LWP 16597)]
[New Thread 0x7fffbc64c700 (LWP 16598)]
[New Thread 0x7fffbbe4b700 (LWP 16599)]
[New Thread 0x7fffbb64a700 (LWP 16600)]
[New Thread 0x7fffbae49700 (LWP 16601)]
[New Thread 0x7fffba648700 (LWP 16602)]
[New Thread 0x7fffb9e47700 (LWP 16603)]
[New Thread 0x7fffb9646700 (LWP 16604)]
[New Thread 0x7fffb8e45700 (LWP 16605)]
[New Thread 0x7fffb8644700 (LWP 16606)]
[New Thread 0x7fffb7e43700 (LWP 16607)]
[New Thread 0x7fffb7642700 (LWP 16608)]
[New Thread 0x7fffb6a41700 (LWP 16609)]
[New Thread 0x7fffb6240700 (LWP 16610)]
[New Thread 0x7fffb5a3f700 (LWP 16611)]
[New Thread 0x7fffb523e700 (LWP 16612)]
[New Thread 0x7fffb4a3d700 (LWP 16613)]
[New Thread 0x7fffb423c700 (LWP 16614)]
[New Thread 0x7fffb3a3b700 (LWP 16615)]
[New Thread 0x7fffb323a700 (LWP 16616)]
[Thread 0x7fffb323a700 (LWP 16616) exited]
[Thread 0x7fffb3a3b700 (LWP 16615) exited]
[Thread 0x7fffb423c700 (LWP 16614) exited]
[Thread 0x7fffb4a3d700 (LWP 16613) exited]
[Thread 0x7fffb523e700 (LWP 16612) exited]
[Thread 0x7fffb5a3f700 (LWP 16611) exited]
[Thread 0x7fffb6240700 (LWP 16610) exited]
[Thread 0x7fffb6a41700 (LWP 16609) exited]
[New Thread 0x7fffb323a700 (LWP 16617)]
[New Thread 0x7fffb3a3b700 (LWP 16618)]

Thread 1 "blender" received signal SIGSEGV, Segmentation fault.
0x00007ffff7f80980 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0

(gdb) bt
#0  0x00007ffff7f80980 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00007fffbfa7fac1 in llvm::sys::RWMutexImpl::writer_acquire (this=this@entry=0x7fffca4c0640) at ../lib/Support/RWMutex.cpp:104
#2  0x00007fffbfbcd558 in llvm::sys::SmartRWMutex<true>::lock (this=0x7fffca4c0640, this@entry=0x7fffb1f61650) at ../lib/IR/PassRegistry.cpp:62
#3  llvm::sys::SmartScopedWriter<true>::SmartScopedWriter (m=..., this=<synthetic pointer>) at ../include/llvm/Support/RWMutex.h:166
#4  llvm::PassRegistry::registerPass (this=this@entry=0x7fffca4c0640, PI=..., ShouldFree=ShouldFree@entry=true) at ../lib/IR/PassRegistry.cpp:59
#5  0x00007fffc0af25eb in initializeLoopInfoWrapperPassPassOnce (Registry=...) at ../lib/Analysis/LoopInfo.cpp:772
#6  0x00007ffff7f836d7 in __pthread_once_slow () from /lib64/libpthread.so.0
#7  0x00007fffaed691e8 in __gthread_once (__once=0x7fffb1174960 <InitializeCheckInstrTypesPassFlag>, __func=0x7ffff1496300 <std::__once_proxy()>)
    at /usr/include/c++/8.3.1/x86_64-mageia-linux-gnu/bits/gthr-default.h:699
#8  0x00007fffaed75d0a in std::call_once<void* (&)(llvm::PassRegistry&), std::reference_wrapper<llvm::PassRegistry> > (__once=..., __f=
    @0x7fffaf256ae2: {void *(llvm::PassRegistry &)} 0x7fffaf256ae2 <initializeCheckInstrTypesPassOnce(llvm::PassRegistry&)>, __args#0=...) at /usr/include/c++/8.3.1/mutex:684
#9  0x00007fffaed70e18 in llvm::call_once<void* (&)(llvm::PassRegistry&), std::reference_wrapper<llvm::PassRegistry> > (flag=..., 
    F=@0x7fffaf256ae2: {void *(llvm::PassRegistry &)} 0x7fffaf256ae2 <initializeCheckInstrTypesPassOnce(llvm::PassRegistry&)>, ArgList#0=...) at /usr/include/llvm/Support/Threading.h:102
#10 0x00007fffaf256be8 in initializeCheckInstrTypesPass (Registry=...) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/Compiler/CISACodeGen/CheckInstrTypes.cpp:49
#11 0x00007fffaf256c43 in IGC::CheckInstrTypes::CheckInstrTypes (this=0x7fffb24646e0, instrList=0x7fffffffaf24) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/Compiler/CISACodeGen/CheckInstrTypes.cpp:56
#12 0x00007fffaf020b35 in IGC::unify_opt_PreProcess (pContext=0x7fffffffae90) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/Compiler/CISACodeGen/ShaderCodeGen.cpp:1051
#13 0x00007fffaee35e3e in IGC::CommonOCLBasedPasses (pContext=0x7fffffffae90, BuiltinGenericModule=std::unique_ptr<llvm::Module> = {...}, BuiltinSizeModule=std::unique_ptr<llvm::Module> = {...})
    at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/UnifyIROCL.cpp:186
#14 0x00007fffaee370ed in IGC::UnifyIRSPIR (pContext=0x7fffffffae90, BuiltinGenericModule=std::unique_ptr<llvm::Module> = {...}, BuiltinSizeModule=std::unique_ptr<llvm::Module> = {...})
    at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/UnifyIROCL.cpp:494
#15 0x00007fffaed97ff2 in TC::TranslateBuild (pInputArgs=0x7fffffffc090, pOutputArgs=0x7fffffffbc80, inputDataFormatTemp=TC::TB_DATA_FORMAT_LLVM_TEXT, IGCPlatform=..., profilingTimerResolution=83)
    at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/dllInterfaceCompute.cpp:983
#16 0x00007fffaef1ffd3 in IGC::IgcOclTranslationCtx<0ul>::Impl::Translate (this=0x7fffc5e71250, outVersion=1, src=0x7fffc5e7f680, specConstantsIds=0x0, specConstantsValues=0x0, options=0x7fffc5e7f6a0, 
    internalOptions=0x7fffc5e7f6c0, tracingOptions=0x0, tracingOptionsCount=0, gtPinInput=0x0)
    at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/ocl_igc_interface/impl/igc_ocl_translation_ctx_impl.h:230
#17 0x00007fffaef20284 in IGC::IgcOclTranslationCtx<1ul>::TranslateImpl (this=0x7fffc5e7f760, outVersion=1, src=0x7fffc5e7f680, options=0x7fffc5e7f6a0, internalOptions=0x7fffc5e7f6c0, tracingOptions=0x0, 
    tracingOptionsCount=0) at /usr/src/debug/intel-igc-1.0.10-2.mga7.x86_64/IGC/AdaptorOCL/ocl_igc_interface/impl/igc_ocl_translation_ctx_impl.cpp:41
#18 0x00007fffb678f1ca in IGC::IgcOclTranslationCtx<1ul>::Translate<IGC::OclTranslationOutput<1ul> > (tracingOptionsCount=0, tracingOptions=0x0, internalOptions=<optimized out>, options=<optimized out>, src=<optimized out>, this=<optimized out>) at /usr/include/igc/ocl_igc_interface/igc_ocl_translation_ctx.h:51
#19 NEO::translate<IGC::IgcOclTranslationCtx<3ul> > (internalOptions=<optimized out>, options=<optimized out>, src=<optimized out>, tCtx=<optimized out>) at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/compiler_interface/compiler_interface.inl:27
#20 NEO::CompilerInterface::getSipKernelBinary (this=0x7fffc7f0ae20, kernel=<optimized out>, device=..., retBinary=std::vector of length 0, capacity 0) at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/compiler_interface/compiler_interface.cpp:348
#21 0x00007fffb6773f54 in std::call_once<NEO::BuiltIns::getSipKernel(NEO::SipKernelType, NEO::Device&)::{lambda()#1}&>(std::once_flag&, NEO::BuiltIns::getSipKernel(NEO::SipKernelType, NEO::Device&)::{lambda()#1}&)::{lambda()#2}::_FUN() () at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/built_ins/built_ins.cpp:101
#22 0x00007ffff7f836d7 in __pthread_once_slow () from /lib64/libpthread.so.0
#23 0x00007fffb6774d02 in NEO::BuiltIns::getSipKernel (this=0x7fffc7fe5d80, type=<optimized out>, device=...) at /usr/include/c++/8.3.1/x86_64-mageia-linux-gnu/bits/gthr-default.h:699
#24 0x00007fffb67e0c2b in NEO::Platform::initialize (this=0x7fffd3acbb00) at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/platform/platform.cpp:181
#25 0x00007fffb6754d38 in clGetPlatformIDs (numEntries=<optimized out>, platforms=<optimized out>, numPlatforms=<optimized out>) at /usr/src/debug/intel-opencl-19.28.13502-1.mga7.x86_64/runtime/api/api.cpp:78
#26 0x00007fffc4a800ba in _find_and_check_platforms (num_icds=2) at ocl_icd_loader.c:451
#27 __initClIcd () at ocl_icd_loader.c:652
#28 _initClIcd_real () at ocl_icd_loader.c:702
#29 0x00007fffc4a822a4 in _initClIcd () at ocl_icd_loader.c:724
#30 clGetPlatformIDs (num_entries=0, platforms=0x0, num_platforms=0x7fffffffc71c) at ocl_icd_loader.c:846
#31 0x0000000003220776 in ccl::device_opencl_info(ccl::vector<ccl::DeviceInfo, ccl::GuardedAllocator<ccl::DeviceInfo> >&) ()
#32 0x00000000031e87c6 in ccl::Device::available_devices(unsigned int) ()
#33 0x0000000001548699 in ?? ()
#34 0x000000000174f969 in _PyMethodDef_RawFastCallKeywords ()
#35 0x000000000174fa25 in _PyCFunction_FastCallKeywords ()
#36 0x000000000114f33f in _PyEval_EvalFrameDefault ()
#37 0x0000000001803c23 in _PyEval_EvalCodeWithName ()
#38 0x000000000174f4a6 in _PyFunction_FastCallKeywords ()
#39 0x000000000114dfa3 in _PyEval_EvalFrameDefault ()
#40 0x0000000001145530 in ?? ()
#41 0x000000000114dfa3 in _PyEval_EvalFrameDefault ()
#42 0x0000000001145530 in ?? ()
#43 0x000000000114dfa3 in _PyEval_EvalFrameDefault ()
#44 0x0000000001145530 in ?? ()
#45 0x000000000174f3e6 in _PyFunction_FastCallDict ()
#46 0x00000000013e69ce in ?? ()
#47 0x00000000014f2c2f in ?? ()
#48 0x0000000002d48623 in ?? ()
#49 0x0000000002d4ab86 in ED_region_panels_layout_ex ()
#50 0x0000000002d4b06d in ED_region_panels_ex ()
#51 0x0000000002d4ebf1 in ED_region_do_draw ()
#52 0x0000000001517513 in wm_draw_update ()
#53 0x0000000001514c30 in WM_main ()
#54 0x00000000010c0abe in main ()

@akien-mga akien-mga commented Aug 2, 2019

See comment from Brecht Van Lommel at https://developer.blender.org/T68052#742627:

We've had problems in the past where there was a conflict between LLVM symbols, with Blender using a different LLVM version than the driver.

We try to hide these symbols on the Blender side (using source/creator/blender.map), but may have missed some.

If your build did not include OSL / LLVM, that could explain why you couldn't reproduce it.

The latter is correct and explains why I couldn't reproduce it in a self-compiled build. However, after rebuilding with WITH_LLVM=ON and WITH_CYCLES_OSL=ON, I still can't reproduce the crash. I suspect that the official Blender binaries link LLVM statically, hence the possible symbol mismatch that I can't reproduce with my dynamically linked Blender (which uses the same LLVM as IGC).

Conflicts between LLVM symbols are a likely scenario, and the ongoing debugging in #195 suggests that packaging issues may make them worse. I'll have to try the official COPR builds on Fedora Rawhide to verify that it's not just my custom builds that introduced the issues.
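
To test the symbol-conflict theory, one rough check is to list the dynamic symbols a Blender binary exports and look for LLVM names that blender.map was supposed to hide. The sketch below is only an illustration: the helper names `exported_llvm_symbols` and `scan_binary` are hypothetical, and it assumes GNU `nm` is installed and that any exported symbol whose mangled name contains "llvm" is a candidate for a clash with the LLVM used by IGC.

```python
import subprocess


def exported_llvm_symbols(nm_output):
    """Return symbol names mentioning LLVM from `nm -D --defined-only` output.

    Hypothetical helper: each defined symbol is expected on a line of the
    form "<address> <type> <name>"; anything else is skipped.
    """
    hits = []
    for line in nm_output.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue  # undefined symbols or blank lines have fewer fields
        _address, _sym_type, name = parts
        # Mangled C++ names in the llvm namespace contain "4llvm".
        if "llvm" in name.lower():
            hits.append(name)
    return hits


def scan_binary(path):
    """Run nm on a binary and return its exported LLVM-looking symbols."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return exported_llvm_symbols(result.stdout)
```

Calling `scan_binary` on the official Blender executable (its path depends on where the archive was unpacked) and getting a non-empty list would suggest that some LLVM symbols are still visible to the OpenCL driver. An empty list would not rule out other causes.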


@JacekDanecki JacekDanecki commented Aug 5, 2019

@akien-mga The hang during Neo initialization was related to the std::call_once function, which worked incorrectly, and the getCompilerInterface function waited on a mutex that was already taken.
When I replaced std::call_once with test code, I observed a segfault in IGC, so I recompiled Blender with the patch you provided to enable Neo. With the recompiled Blender I was not able to reproduce the hang, but I did reproduce the segfault in IGC.
When I recompiled IGC, opencl-clang, and spirv-llvm-translator with the llvm/clang 8 sources in Debug mode, I reproduced the assert mentioned in issue 195:

Cycles: compiling OpenCL program background...
[Detaching after fork from child process 23736]
blender: /media/data/sources/open-source/neo/build-igc/llvm/src/projects/llvm-spirv/lib/SPIRV/SPIRVWriter.cpp:1174: SPIRV::SPIRVValue* SPIRV::LLVMToSPIRV::transIntrinsicInst(llvm::IntrinsicInst*, SPIRV::SPIRVBasicBlock*): Assertion `cast<MemCpyInst>(II)->getSourceAlignment() == cast<MemCpyInst>(II)->getDestAlignment() && "Alignment mismatch!"' failed.