Tsan hangs in fork before exec #1400

Closed
gusev-p opened this issue Apr 26, 2021 · 3 comments

gusev-p commented Apr 26, 2021

Backtrace at the moment of the hang:

Thread 1 (Thread 0x7fea94371440 (LWP 89779)):
#0  __sanitizer::internal_sched_yield () at /place/sandbox-data/tasks/3/8/846665983/__FUSE/mount_point_9f140102-bea8-427b-86c7-ebfcbf46138c/contrib/libs/clang11-rt/lib/sanitizer_common/sanitizer_linux.cpp:422
#1  0x00000000014cd1f5 in __sanitizer::StaticSpinMutex::LockSlow (this=0x35eaa30 <__sanitizer::thePersistentAllocator>) at /place/sandbox-data/tasks/3/8/846665983/__FUSE/mount_point_9f140102-bea8-427b-86c7-ebfcbf46138c/contrib/libs/clang11-rt/lib/sanitizer_common/sanitizer_mutex.h:54
#2  0x00000000014dc7e1 in __sanitizer::StaticSpinMutex::Lock (this=0x35eaa30 <__sanitizer::thePersistentAllocator>) at /place/sandbox-data/tasks/3/8/846665983/__FUSE/mount_point_9f140102-bea8-427b-86c7-ebfcbf46138c/contrib/libs/clang11-rt/lib/sanitizer_common/sanitizer_mutex.h:31
#3  __sanitizer::GenericScopedLock<__sanitizer::StaticSpinMutex>::GenericScopedLock (this=<optimized out>, mu=0x35eaa30 <__sanitizer::thePersistentAllocator>) at /place/sandbox-data/tasks/3/8/846665983/__FUSE/mount_point_9f140102-bea8-427b-86c7-ebfcbf46138c/contrib/libs/clang11-rt/lib/sanitizer_common/sanitizer_mutex.h:183
#4  __sanitizer::PersistentAllocator::alloc (this=<optimized out>, size=160) at /place/sandbox-data/tasks/3/8/846665983/__FUSE/mount_point_9f140102-bea8-427b-86c7-ebfcbf46138c/contrib/libs/clang11-rt/lib/sanitizer_common/sanitizer_persistent_allocator.h:51
#5  __sanitizer::PersistentAlloc (sz=160) at /place/sandbox-data/tasks/3/8/846665983/__FUSE/mount_point_9f140102-bea8-427b-86c7-ebfcbf46138c/contrib/libs/clang11-rt/lib/sanitizer_common/sanitizer_persistent_allocator.h:66
#6  __sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1, 20>::Put (this=0x35eaa78 <__sanitizer::theDepot>, args=..., inserted=inserted@entry=0x0) at /place/sandbox-data/tasks/3/8/846665983/__FUSE/mount_point_9f140102-bea8-427b-86c7-ebfcbf46138c/contrib/libs/clang11-rt/lib/sanitizer_common/sanitizer_stackdepotbase.h:125
#7  0x00000000014dc057 in __sanitizer::StackDepotPut (stack=...) at /place/sandbox-data/tasks/3/8/846665983/__FUSE/mount_point_9f140102-bea8-427b-86c7-ebfcbf46138c/contrib/libs/clang11-rt/lib/sanitizer_common/sanitizer_stackdepot.cpp:98
#8  0x0000000001546235 in __tsan::CurrentStackId (thr=thr@entry=0x7fea94331800, pc=pc@entry=21955725) at /place/sandbox-data/tasks/3/8/846665983/__FUSE/mount_point_9f140102-bea8-427b-86c7-ebfcbf46138c/contrib/libs/clang11-rt/lib/tsan/rtl/tsan_rtl.cpp:564
#9  0x00000000014e8a0a in __tsan::init (thr=0x7fea94331800, thr@entry=0x1, pc=21955725, pc@entry=140645485451264, fd=fd@entry=1, s=0x3ded038 <__tsan::fdctx+8200>, write=false) at /place/sandbox-data/tasks/3/8/846665983/__FUSE/mount_point_9f140102-bea8-427b-86c7-ebfcbf46138c/contrib/libs/clang11-rt/lib/tsan/rtl/tsan_fd.cpp:112
#10 0x00000000014e8ba5 in __tsan::FdDup (thr=thr@entry=0x7fea94331800, pc=pc@entry=21955725, oldfd=oldfd@entry=19, newfd=newfd@entry=1, write=<optimized out>) at /place/sandbox-data/tasks/3/8/846665983/__FUSE/mount_point_9f140102-bea8-427b-86c7-ebfcbf46138c/contrib/libs/clang11-rt/lib/tsan/rtl/tsan_fd.cpp:233
#11 0x00000000014f0574 in __interceptor_dup2 (oldfd=19, newfd=1) at /place/sandbox-data/tasks/3/8/846665983/__FUSE/mount_point_9f140102-bea8-427b-86c7-ebfcbf46138c/contrib/libs/clang11-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1569
#12 0x00000000017f9b59 in TFileHandle::Duplicate2Posix (this=0x7fff40296620, dstHandle=1) at /place/sandbox-data/srcdir/arcadia_cache/util/system/file.cpp:580
#13 0x0000000002d846ce in NUnifiedAgent::TOSProcess::TOSProcess (this=0x7b10000193a0, cmd=..., args=..., env=..., stdoutFilePath=..., stderrFilePath=...) at /place/sandbox-data/srcdir/arcadia_cache/logbroker/unified_agent/common/os_process.cpp:72
#14 0x0000000002d7fee3 in NUnifiedAgent::TAgentProcess::TAgentProcess (this=0x7b1000019380, cmd=..., args=TVector (length=4, capacity=4) = {...}, env=THashMap of length 3 = {...}, statusPort=16664) at /place/sandbox-data/srcdir/arcadia_cache/logbroker/unified_agent/tests/integration/lib/agent_process.cpp:36

Relevant code:

TOSProcess::TOSProcess(...) {
    Pid = syscall(SYS_clone, SIGCHLD, 0);
    if (Pid == 0) {
        if (stdoutFile.Duplicate2Posix(STDOUT_FILENO) != STDOUT_FILENO) {
            _exit(41);
        }
        if (stderrFile.Duplicate2Posix(STDERR_FILENO) != STDERR_FILENO) {
            _exit(42);
        }

        ::execve(execArgs[0], execArgs.data(), execEnv.data());

        _exit(43);
    }
    Y_VERIFY(Pid != -1, "SYS_clone failed, errno [%d]", errno);
}

It seems that we forked while this spin lock was held (https://github.com/llvm-project/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_persistent_allocator.h#L29), so the child spins on it forever. This looks similar to the ASan problem in #774.
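
To illustrate the suspected mechanism, here is a minimal sketch of how a spin lock that is taken at fork time stays taken in the child. It is not the sanitizer's code: `gLock` and `ChildSeesLockHeld` are hypothetical names, and for simplicity the forking thread holds the lock itself, whereas in the hang above it would presumably be another thread.

```cpp
#include <atomic>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a process-wide spin lock such as the one
// guarding the persistent allocator. Not the real sanitizer type.
std::atomic_flag gLock = ATOMIC_FLAG_INIT;

// Forks while the lock is taken and reports whether the child found
// its copy of the lock already held.
bool ChildSeesLockHeld() {
    // Take the lock. In the real hang another thread would hold it.
    gLock.test_and_set(std::memory_order_acquire);

    pid_t pid = ::fork();
    if (pid == 0) {
        // The child gets a copy of memory with the flag still set and
        // no thread that will ever clear it. Try once instead of
        // spinning so the demo terminates.
        bool alreadyHeld = gLock.test_and_set(std::memory_order_acquire);
        _exit(alreadyHeld ? 1 : 0);
    }

    // Releasing in the parent does not affect the child's copy.
    gLock.clear(std::memory_order_release);
    if (pid == -1) {
        return false;  // fork failed, nothing to observe
    }

    int status = 0;
    if (::waitpid(pid, &status, 0) != pid) {
        return false;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 1;
}
```

A child that spun on the lock instead of trying once would loop in the same way as frame #1 of the backtrace.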

dvyukov commented Apr 27, 2021

Hi Petr,

TSan does not support the raw clone syscall.
It could, but that would require changes to TSan, and the code would still need to be annotated with the sanitizer syscall annotations from <sanitizer/linux_syscall_hooks.h>.

A much easier route is to use fork/vfork when building with tsan. It should fix the deadlock.
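
As a rough sketch of that route, the child setup from the issue could be written with `fork` instead of `syscall(SYS_clone, ...)`. The helper name `SpawnWithStdout`, its parameters, and the use of plain `open`/`dup2` in place of `TFileHandle::Duplicate2Posix` are assumptions made for illustration. The exit codes mirror the excerpt above.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, not taken from the original code base.
// Runs argv[0] with stdout redirected to stdoutPath and returns the
// child's exit code, or -1 on failure.
int SpawnWithStdout(const char* stdoutPath, char* const argv[], char* const envp[]) {
    // Open the target file in the parent so the child only needs dup2.
    int outFd = ::open(stdoutPath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (outFd == -1) {
        return -1;
    }

    // fork goes through libc, so the sanitizer runtime can see it,
    // unlike a raw syscall(SYS_clone, ...).
    pid_t pid = ::fork();
    if (pid == 0) {
        if (::dup2(outFd, STDOUT_FILENO) != STDOUT_FILENO) {
            _exit(41);
        }
        ::execve(argv[0], argv, envp);
        _exit(43);  // reached only if execve failed
    }

    int forkErrno = errno;
    ::close(outFd);
    if (pid == -1) {
        errno = forkErrno;
        return -1;
    }

    int status = 0;
    if (::waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The sketch waits for the child only to keep the example self-contained. A long-lived process wrapper like the one in the issue would keep the pid and wait later.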

gusev-p commented Apr 27, 2021

Thank you for the reply!

The clone syscall was introduced into this code some time ago as an attempt to work around hangs under sanitizers. I'll try vfork again and reopen this issue if there are problems.

gusev-p closed this as completed Apr 27, 2021

dvyukov commented Apr 27, 2021

TSan intercepts vfork, so it's intended to work. If something doesn't work, it's at least possible to fix things, which is not possible with clone because TSan is simply not aware of the call.
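
If the raw clone call has to stay for regular builds, one possible arrangement is to switch the primitive at compile time. The sketch below is an assumption, not something proposed in this thread. `BUILT_WITH_TSAN` and `CreateChild` are hypothetical names, and detection relies on Clang's `__has_feature(thread_sanitizer)` and GCC's `__SANITIZE_THREAD__`. It is Linux-only because of `SYS_clone`.

```cpp
#include <csignal>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Detect a ThreadSanitizer build. BUILT_WITH_TSAN is a hypothetical
// macro name introduced for this sketch.
#if defined(__has_feature)
#  if __has_feature(thread_sanitizer)
#    define BUILT_WITH_TSAN 1
#  endif
#endif
#if !defined(BUILT_WITH_TSAN) && defined(__SANITIZE_THREAD__)
#  define BUILT_WITH_TSAN 1
#endif
#ifndef BUILT_WITH_TSAN
#  define BUILT_WITH_TSAN 0
#endif

// Hypothetical wrapper: returns 0 in the child, the child's pid in the
// parent, and -1 on failure, like fork.
pid_t CreateChild() {
#if BUILT_WITH_TSAN
    // Goes through libc so the sanitizer runtime can observe it.
    return ::fork();
#else
    // Same raw call as in the excerpt from the issue.
    return static_cast<pid_t>(::syscall(SYS_clone, SIGCHLD, 0));
#endif
}
```

Callers then use `CreateChild()` wherever the original code used `syscall(SYS_clone, SIGCHLD, 0)` and keep the same `Pid == 0` and `Pid != -1` checks.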
