llvm_shutdown called by lld::exitLld(1) is racy #66974
@llvm/issue-subscribers-lld-elf
`lld::fatal` calls `exitLld(1)`. `lld::error` when the error number reaches the limit calls `exitLld(1)` as well.
```
// In exitLld:
llvm_shutdown();
llvm::sys::Process::Exit(val, /*NoCleanup=*/true);
```
If a worker thread created by … I have measured flakiness for ELF/invalid-eh-frame5.s and ELF/invalid-eh-frame.s. @nga888
@MaskRay, sorry, I am aware of this issue but I'm very busy with other work and do not have time to investigate in depth right now. TBH, the way LLD can exit via a worker thread has always been a concern, but the patch you reference above fixed crashes and deadlocks on Windows, which were a far more prevalent issue at that time. Also, the "exit" handling has become a bit more complex since that patch was committed. The scenario you describe does sound strange because of the exit code mismatch, which would also imply that the error state is not consistent between the threads. In any case, avoiding a race to "exit" would definitely be preferable.
If we (or whoever takes on this flakiness issue) could find another way to sync the worker threads to prevent multiple attempts to exit concurrently, I'd be in favor of that. I worry that … There's still an open question as to why the main thread tries to exit with a 0 exit status; we don't have an answer to that question.
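To make the "prevent multiple attempts to exit concurrently" idea concrete, here is a minimal sketch of a first-caller-wins guard. It is not LLD code: `ExitLatch`, `tryClaim`, and `simulateRace` are hypothetical names, and the sketch only records which exit code would win instead of actually terminating the process.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Sentinel meaning "no thread has requested an exit yet" (assumption of
// this sketch; real exit codes are non-negative here).
constexpr int kUnclaimed = -1;

// Hypothetical helper, not an LLD or LLVM API.
// The first thread to claim the latch fixes the process exit code.
class ExitLatch {
public:
  // Returns true only for the first caller. Every later caller gets false
  // and must not run the shutdown path.
  bool tryClaim(int code) {
    int expected = kUnclaimed;
    return exitCode.compare_exchange_strong(expected, code);
  }
  int code() const { return exitCode.load(); }

private:
  std::atomic<int> exitCode{kUnclaimed};
};

// Simulates `workers` threads that all hit a fatal error (exit code 1),
// followed by the main thread reaching its normal exit (exit code 0).
// Returns the exit code that won the latch.
int simulateRace(int workers) {
  ExitLatch latch;
  std::atomic<int> winners{0};
  std::vector<std::thread> pool;
  for (int i = 0; i < workers; ++i)
    pool.emplace_back([&] {
      if (latch.tryClaim(1)) // worker requests exit(1)
        ++winners;
    });
  for (std::thread &t : pool)
    t.join();
  if (latch.tryClaim(0)) // main thread requests exit(0)
    ++winners;
  assert(winners.load() == 1); // exactly one thread may own the exit path
  return latch.code();
}
```

In this toy model the main thread joins the workers before claiming, so a failing worker always wins and the recorded code is 1. A real fix would also have to decide what the losing threads do (for example, block until the winner has terminated the process), which this sketch does not address.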
Hi @MaskRay, I've at last found some time to have a look at this issue. So far, I have been unable to reproduce the issue with these tests on Ubuntu 22.04.3 with LLVM built from a26aa79, using both LLVM 17.0.6 and GCC 12.3.0. Do you have details of the build configuration that you were using when you saw these issues? Looking at the current code, I don't think there should be a "race to exit" for these particular tests, although the possibility of a "race to exit" in LLD certainly does exist...
Sorry for my slow response. I have replicated
I cannot find any either...
`lld::fatal` calls `exitLld(1)`. `lld::error` calls `exitLld(1)` as well when the error count reaches the limit.

If a worker thread created by `llvm/lib/Support/Parallel.cpp` calls `exitLld(1)`, `llvm_shutdown` will destroy the `ManagedStatic` instance in `Executor::getDefaultExecutor`. If the main thread is blocked on `L.sync()` in `TaskGroup::~TaskGroup`, it is possible that the main thread finishes first and somehow exits with code 0 (it is not exactly clear how this happens given that `errorCount()` is non-zero), before the worker thread calls `_Exit(1)`.

When this happens, the exit code of LLD will be 0 instead of the expected 1 (therefore `RUN: not ld.lld --eh-frame-hdr %t -o /dev/null 2>&1` fails).

To trigger test failures with the race condition, we need to ensure that a worker thread calls exit/fatal and the concurrency is at least 2. Not many tests satisfy the condition. I have measured flakiness for ELF/invalid-eh-frame5.s and ELF/invalid-eh-frame.s. Sometimes the tests may exhibit 6 failures in 1000 runs, sometimes only 5 in 5000 runs.

If I comment out `llvm_shutdown` or add an `_Exit(val)` above `llvm_shutdown`, the flakiness goes away.

@nga888 https://reviews.llvm.org/D70447
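For illustration only, the invariant the failing `RUN: not ld.lld ...` line depends on (a non-zero error count must produce exit status 1) can be modeled with a toy program in which workers never exit themselves: they only record the error, and the calling thread derives the exit code after waiting for all of them. This is a sketch under stated assumptions, not how LLD is structured; `toyErrorCount` and `runAndGetExitCode` are hypothetical names and do not correspond to `errorCount()` or any LLVM API.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Toy stand-in for an error counter (hypothetical, not LLD's errorCount()).
std::atomic<int> toyErrorCount{0};

// Runs every task on its own thread, waits for all of them, and only then
// derives the exit code on the calling thread. A task returns false to
// signal an error; no worker ever terminates the process itself.
int runAndGetExitCode(const std::vector<std::function<bool()>> &tasks) {
  toyErrorCount = 0;
  std::vector<std::thread> pool;
  for (const auto &task : tasks)
    pool.emplace_back([&task] {
      if (!task())
        ++toyErrorCount; // record the error instead of exiting
    });
  assert(pool.size() == tasks.size());
  for (std::thread &t : pool)
    t.join(); // analogous to waiting for the task group to finish
  return toyErrorCount.load() != 0 ? 1 : 0;
}
```

Because the exit code is computed in one place after all workers have finished, this toy model cannot report 0 while an error has been recorded. It says nothing about why the real main thread was observed to exit with 0.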