CHECK failed: tsan_interceptors_posix.cpp:1987 "((thr->slot)) != (0)" / segfault running Godot 4 with TSAN #1647

Closed
rcorre opened this issue Apr 30, 2023 · 8 comments


rcorre commented Apr 30, 2023

I'm compiling and running Godot 4.0.2 like so:

scons compiledb=yes debug_symbols=yes linker=mold use_llvm=yes use_static_cpp=no target=editor use_tsan=yes
bin/godot.linuxbsd.editor.x86_64.llvm.san example/project.godot

TSAN fails a check and segfaults:

ThreadSanitizer: CHECK failed: tsan_interceptors_posix.cpp:1987 "((thr->slot)) != (0)" (0x0, 0x0) (tid=17590)

Thread 17 "godot.linuxbsd." received signal SIGSEGV, Segmentation fault.
0x000055555822b281 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator32<__sanitizer::AP32>, __sanitizer::LargeMmapAllocatorPtrArrayStatic>::Allocate(__sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*, unsigned long, unsigned long) ()
(gdb) bt
#0  0x000055555822b281 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator32<__sanitizer::AP32>, __sanitizer::LargeMmapAllocatorPtrArrayStatic>::Allocate(__sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*, unsigned long, unsigned long) ()
#1  0x0000555558228dce in __sanitizer::InternalAlloc(unsigned long, __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*, unsigned long) ()
#2  0x00005555582e579f in __tsan::PrintCurrentStackSlow(unsigned long) ()
#3  0x00005555582ca742 in __tsan::CheckUnwind() ()
#4  0x000055555823f0c1 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ()
#5  0x000055555826982e in __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*) ()
#6  0x00005555582699a8 in sighandler(int, __sanitizer::__sanitizer_siginfo*, void*) ()
#7  <signal handler called>
#8  0x00005555582c5b44 in __tsan::user_alloc_internal(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long, bool) ()
#9  0x00005555582c5f9e in __tsan::user_calloc(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long) ()
#10 0x0000555558278fd2 in calloc ()
#11 0x00007fffe57103f1 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#12 0x00007fffe57105e2 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#13 0x00007fffe5711624 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#14 0x00007ffff7a9ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444
#15 0x00007ffff7b20d90 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
$ clang --version
clang version 15.0.7
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

I saw another ticket about a different CHECK FAILED (#950), so I figured this could be a bug in TSAN. If this indicates a bug in Godot instead, my bad, feel free to close!

Contributor

dvyukov commented May 2, 2023

It looks like the problem is here:

#13 0x00007fffe5711624 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#14 0x00007ffff7a9ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444

A new thread calls into libnvidia-glcore.so, which calls calloc before tsan has initialized its state for that thread.
Tsan assumes that its thread function wrapper runs before any "user" code.

What do you have at pthread_create.c:444? How/why is it calling into libnvidia-glcore.so?
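
To make the assumption above concrete, here is a minimal sketch of how a wrapper around thread creation can set up per-thread state before user code runs, and what user code sees when a thread is started without going through that wrapper. This is a toy model, not ThreadSanitizer's actual implementation: ThreadState, StartArgs, wrapped_create, thread_start_trampoline and state_ready_in_new_thread are hypothetical names made up for illustration, and the real runtime's data structures differ.

```cpp
// Toy model of "the wrapper must run before user code". NOT TSAN's real code.
// Every name below except the pthread_* calls is hypothetical.
#include <pthread.h>
#include <cassert>

// Per-thread runtime state; stays null until the wrapper initializes it.
struct ThreadState { int slot; };
static thread_local ThreadState* g_thr = nullptr;

// Carries the user's start routine and argument to the trampoline.
struct StartArgs {
    void* (*user_fn)(void*);
    void* user_arg;
};

// Runs first on the new thread: set up state, then hand over to user code.
static void* thread_start_trampoline(void* p) {
    StartArgs* args = static_cast<StartArgs*>(p);
    ThreadState state{1};
    g_thr = &state;
    void* ret = args->user_fn(args->user_arg);
    g_thr = nullptr;
    delete args;
    return ret;
}

// Stand-in for an intercepted pthread_create.
int wrapped_create(pthread_t* t, void* (*fn)(void*), void* arg) {
    StartArgs* args = new StartArgs{fn, arg};
    int rc = pthread_create(t, nullptr, thread_start_trampoline, args);
    if (rc != 0) delete args;  // thread never started, so we still own args
    return rc;
}

// Stand-in for user code that reaches an intercepted function (e.g. calloc);
// it records whether the per-thread state was ready at that point.
static void* user_code(void* out) {
    *static_cast<bool*>(out) = (g_thr != nullptr && g_thr->slot != 0);
    return nullptr;
}

// Starts one thread, either through the wrapper or bypassing it, and
// returns whether user code observed initialized per-thread state.
bool state_ready_in_new_thread(bool through_wrapper) {
    bool ready = false;
    pthread_t t;
    int rc = through_wrapper ? wrapped_create(&t, user_code, &ready)
                             : pthread_create(&t, nullptr, user_code, &ready);
    assert(rc == 0);
    pthread_join(t, nullptr);
    return ready;
}
```

In this model, state_ready_in_new_thread(true) returns true, while state_ready_in_new_thread(false) returns false. The second case corresponds to the situation in the backtrace, where the thread's first frame above start_thread is in libnvidia-glcore.so rather than in the tsan wrapper.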

Author

rcorre commented May 2, 2023

> What do you have at pthread_create.c:444

I'm on glibc, so pthread_create.c:444 is ret = pd->start_routine (pd->arg); (source: https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_create.c;h=a3619da1e216190bb4679936e105d418f683222a;hb=e6a252758cbadb13654e66e1f2445ef6f8a4dea0#l444)

> How/why is it calling into libnvidia-glcore.so?

I'm not sure yet; I'll keep trying to track down what spawns that thread.

It looks like there are a few threads like that:

Thread 19 (Thread 0x7fffde2d06c0 (LWP 44732) "[vkcf] Analysis"):
#0  futex_wait (private=0, expected=2, futex_word=0x7fffe6d53c68) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7fffe6d53c68, private=0) at lowlevellock.c:49
#2  0x00007ffff7aa1efa in lll_mutex_lock_optimized (mutex=0x7fffe6d53c68) at pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7fffe6d53c68) at pthread_mutex_lock.c:128
#4  0x00007fffe570ed63 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#5  0x00007fffe7139e31 in ?? () from /usr/lib/libGLX_nvidia.so.0
#6  0x00007fffe5710716 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#7  0x00007fffe5711624 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#8  0x00007ffff7a9ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444
#9  0x00007ffff7b20d90 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 18 (Thread 0x7fffdead16c0 (LWP 44731) "[vkrt] Analysis"):
#0  futex_wait (private=0, expected=2, futex_word=0x7fffe6d53c68) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7fffe6d53c68, private=0) at lowlevellock.c:49
#2  0x00007ffff7aa1efa in lll_mutex_lock_optimized (mutex=0x7fffe6d53c68) at pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7fffe6d53c68) at pthread_mutex_lock.c:128
#4  0x00007fffe570ed63 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#5  0x00007fffe7139e31 in ?? () from /usr/lib/libGLX_nvidia.so.0
#6  0x00007fffe5710716 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#7  0x00007fffe5711624 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#8  0x00007ffff7a9ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444
#9  0x00007ffff7b20d90 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 17 (Thread 0x7fffdf2d26c0 (LWP 44730) "godot.linuxbsd."):
#0  0x0000555555a641c1 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator32<__sanitizer::AP32>, __sanitizer::LargeMmapAllocatorPtrArrayStatic>::Allocate(__sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*, unsigned long, unsigned long) ()
#1  0x0000555555a61d0e in __sanitizer::InternalAlloc(unsigned long, __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*, unsigned long) ()
#2  0x0000555555b1e66f in __tsan::PrintCurrentStackSlow(unsigned long) ()
#3  0x0000555555b03612 in __tsan::CheckUnwind() ()
#4  0x0000555555a77ff1 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ()
#5  0x0000555555aa271e in __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*) ()
#6  0x0000555555aa2898 in sighandler(int, __sanitizer::__sanitizer_siginfo*, void*) ()
#7  <signal handler called>
#8  0x0000555555afea14 in __tsan::user_alloc_internal(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long, bool) ()
#9  0x0000555555afee6e in __tsan::user_calloc(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long) ()
#10 0x0000555555ab1ec2 in calloc ()
#11 0x00007fffe57103f1 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#12 0x00007fffe57105e2 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#13 0x00007fffe5711624 in ?? () from /usr/lib/libnvidia-glcore.so.530.41.03
#14 0x00007ffff7a9ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444
#15 0x00007ffff7b20d90 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The remaining threads all appear to start in TSAN's __tsan_thread_start_func before running any other code:

#19 0x0000555555a9f823 in __tsan_thread_start_func ()
#20 0x00007ffff7a9ebb5 in start_thread (arg=<optimized out>) at pthread_create.c:444
#21 0x00007ffff7b20d90 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

This is where the main thread is:

#20 0x00007fffeabacc06 in terminator_CreateDevice (physicalDevice=<optimized out>, pCreateInfo=<optimized out>, pAllocator=<optimized out>, pDevice=<optimized out>) at /usr/src/debug/vulkan-icd-loader/Vulkan-Loader-1.3.245/loader/loader.c:6079
#21 0x00007fffeaba917b in loader_create_device_chain (pd=pd@entry=0x7b0800079940, pCreateInfo=pCreateInfo@entry=0x7fffffffbdf8, pAllocator=pAllocator@entry=0x0, inst=inst@entry=0x7b840000b400, dev=dev@entry=0x7b8c00013400, callingLayer=callingLayer@entry=0x0, layerNextGDPA=<optimized out>) at /usr/src/debug/vulkan-icd-loader/Vulkan-Loader-1.3.245/loader/loader.c:5227
#22 0x00007fffeabaaae6 in loader_layer_create_device (instance=<optimized out>, physicalDevice=<optimized out>, pCreateInfo=<optimized out>, pAllocator=<optimized out>, pDevice=<optimized out>, layerGIPA=<optimized out>, nextGDPA=<optimized out>) at /usr/src/debug/vulkan-icd-loader/Vulkan-Loader-1.3.245/loader/loader.c:4602
#23 0x00007fffeabc2ccb in vkCreateDevice (physicalDevice=0x7b0800079880, pCreateInfo=pCreateInfo@entry=0x7fffffffbdf8, pAllocator=pAllocator@entry=0x0, pDevice=pDevice@entry=0x7b740000a370) at /usr/src/debug/vulkan-icd-loader/Vulkan-Loader-1.3.245/loader/trampoline.c:858
#24 0x0000555557c4067e in VulkanContext::_create_device (this=this@entry=0x7b740000a010) at drivers/vulkan/vulkan_context.cpp:1475
#25 0x0000555557c40929 in VulkanContext::_initialize_queues (this=this@entry=0x7b740000a010, p_surface=p_surface@entry=0x7b14000f9920) at drivers/vulkan/vulkan_context.cpp:1528
#26 0x0000555557c418b4 in VulkanContext::_window_create (this=0x7b740000a010, p_window_id=0, p_vsync_mode=DisplayServer::VSYNC_ENABLED, p_surface=0x7b14000f9920, p_width=1152, p_height=648) at drivers/vulkan/vulkan_context.cpp:1688
#27 0x0000555555ba5923 in VulkanContextX11::window_create (this=0x7b740000a010, p_window_id=0, p_vsync_mode=DisplayServer::VSYNC_ENABLED, p_window=16777218, p_display=<optimized out>, p_width=1152, p_height=648) at platform/linuxbsd/x11/vulkan_context_x11.cpp:56
#28 0x0000555555b52033 in DisplayServerX11::_create_window (this=<optimized out>, this@entry=0x7b7000000010, p_mode=<optimized out>, p_mode@entry=DisplayServer::WINDOW_MODE_MAXIMIZED, p_vsync_mode=<optimized out>, p_vsync_mode@entry=DisplayServer::VSYNC_ENABLED, p_flags=<optimized out>, p_flags@entry=0, p_rect=...) at platform/linuxbsd/x11/display_server_x11.cpp:5144
#29 0x0000555555b6b5bc in DisplayServerX11::DisplayServerX11 (this=0x7b7000000010, this@entry=0x7fffffffcd10, p_rendering_driver=..., p_mode=p_mode@entry=DisplayServer::WINDOW_MODE_MAXIMIZED, p_vsync_mode=p_vsync_mode@entry=DisplayServer::VSYNC_ENABLED, p_flags=p_flags@entry=0, p_position=p_position@entry=0x0, p_resolution=..., p_screen=<optimized out>, r_error=@0x7fffffffcd10: OK) at platform/linuxbsd/x11/display_server_x11.cpp:5551
#30 0x0000555555b69922 in DisplayServerX11::create_func (p_rendering_driver=..., p_mode=DisplayServer::WINDOW_MODE_MAXIMIZED, p_vsync_mode=DisplayServer::VSYNC_ENABLED, p_flags=0, p_position=0x0, p_resolution=..., p_screen=<optimized out>, r_error=<optimized out>) at platform/linuxbsd/x11/display_server_x11.cpp:4845
#31 0x000055555b18c4f5 in DisplayServer::create (p_index=<optimized out>, p_rendering_driver=..., p_mode=DisplayServer::WINDOW_MODE_MAXIMIZED, p_vsync_mode=DisplayServer::VSYNC_ENABLED, p_flags=0, p_position=0x0, p_resolution=..., p_screen=<optimized out>, r_error=<optimized out>) at servers/display_server.cpp:904
#32 0x0000555555bdd069 in Main::setup2 (p_main_tid_override=p_main_tid_override@entry=0) at main/main.cpp:2001
#33 0x0000555555bd5b61 in Main::setup (execpath=0x7fffffffde1d "/home/rcorre/src/godot/godot/bin/godot.linuxbsd.editor.x86_64.llvm.san", argc=<optimized out>, argv=<optimized out>, p_second_phase=true) at main/main.cpp:1879
#34 0x0000555555b26bf0 in main (argc=<optimized out>, argv=<optimized out>) at platform/linuxbsd/godot_linuxbsd.cpp:61

Contributor

dvyukov commented May 2, 2023

> I'm on glibc, so pthread_create.c:444 is ret = pd->start_routine (pd->arg);

Oh, this is the thread start routine. So, yes, libnvidia-glcore.so somehow escapes the tsan pthread_create interceptor that should initialize the thread.

Perhaps it uses a raw clone syscall or something similar to create the thread.
However, if it did that, the thread also wouldn't have its TLS initialized by glibc...
I can't find the source for it.

Debian says it's a "non-free" package:
https://packages.debian.org/sid/libnvidia-glcore
Does it mean there are no sources available?

Author

rcorre commented May 2, 2023

> Does it mean there are no sources available?

Yes, unfortunately these are the proprietary NVIDIA drivers. I'll try to repro with the free Nouveau drivers instead.

Author

rcorre commented May 3, 2023

Turns out Nouveau doesn't support my GPU, so I can only use the proprietary drivers :(


Calinou commented May 11, 2023

Maybe look into using lavapipe or SwiftShader, which won't use the GPU at all to render Godot. The thread synchronization issue should still occur even if software Vulkan emulation is used.

Author

rcorre commented May 23, 2023

Good idea @Calinou, lavapipe worked. @dvyukov, I'll leave it to you whether you want to close this. Thanks for all the help!

Contributor

dvyukov commented May 23, 2023

I don't think it's actionable on the sanitizer side, and the history will be kept, so closing for now.

dvyukov closed this as completed May 23, 2023