Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler-rt] Also consider SIGPROF as a synchronous signal #85188

Merged

Conversation

serge-sans-paille
Copy link
Collaborator

Blocking that signal causes inter-blocking for profilers that monitor threads through that signal.
Fix #83844 and #83561

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 14, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (serge-sans-paille)

Changes

Blocking that signal causes inter-blocking for profilers that monitor threads through that signal.
Fix #83844 and #83561


Full diff: https://github.com/llvm/llvm-project/pull/85188.diff

1 Files Affected:

  • (modified) compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp (+2-1)
diff --git a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
index 8ffc703b05eace..80dda58aacbf2d 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
@@ -2168,7 +2168,8 @@ static bool is_sync_signal(ThreadSignalContext *sctx, int sig,
     return false;
 #endif
   return sig == SIGSEGV || sig == SIGBUS || sig == SIGILL || sig == SIGTRAP ||
-         sig == SIGABRT || sig == SIGFPE || sig == SIGPIPE || sig == SIGSYS;
+         sig == SIGABRT || sig == SIGFPE || sig == SIGPIPE || sig == SIGSYS ||
+         sig == SIGPROF;
 }
 
 void sighandler(int sig, __sanitizer_siginfo *info, void *ctx) {

Blocking that signal causes inter-blocking for profilers that monitor
threads through that signal.
Fix llvm#83844 and llvm#83561
@canova
Copy link
Contributor

canova commented Mar 14, 2024

Awesome, this indeed does fix our problem with the Firefox Profiler. Thanks!

@serge-sans-paille serge-sans-paille merged commit 6f3f659 into llvm:main Mar 14, 2024
3 of 4 checks passed
@serge-sans-paille
Copy link
Collaborator Author

thanks @melver for the fast review \o

@cHackz18
Copy link

Hello there! I believe this change may have caused the following build bot to start failing, are you able to take a look?

https://lab.llvm.org/buildbot/#/builders/247/builds/15279

@PiJoules
Copy link
Contributor

Hi. I think this change is causing some tsan tests to fail on our builders (https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8753487020222109921/overview):

FAIL: ThreadSanitizer-x86_64 :: signal_thread.cpp (4648 of 15687)
******************** TEST 'ThreadSanitizer-x86_64 :: signal_thread.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /b/s/w/ir/x/w/llvm_build/./bin/clang  --driver-mode=g++ -fsanitize=thread -Wall    -msse4.2   -gline-tables-only -I/b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/b/s/w/ir/x/w/llvm_build/runtimes/runtimes-x86_64-unknown-linux-gnu-bins/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1 -O1 /b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/tsan/signal_thread.cpp -o /b/s/w/ir/x/w/llvm_build/runtimes/runtimes-x86_64-unknown-linux-gnu-bins/compiler-rt/test/tsan/X86_64Config/Output/signal_thread.cpp.tmp &&  /b/s/w/ir/x/w/llvm_build/runtimes/runtimes-x86_64-unknown-linux-gnu-bins/compiler-rt/test/tsan/X86_64Config/Output/signal_thread.cpp.tmp 2>&1 | FileCheck /b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/tsan/signal_thread.cpp
+ /b/s/w/ir/x/w/llvm_build/./bin/clang --driver-mode=g++ -fsanitize=thread -Wall -msse4.2 -gline-tables-only -I/b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/b/s/w/ir/x/w/llvm_build/runtimes/runtimes-x86_64-unknown-linux-gnu-bins/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1 -O1 /b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/tsan/signal_thread.cpp -o /b/s/w/ir/x/w/llvm_build/runtimes/runtimes-x86_64-unknown-linux-gnu-bins/compiler-rt/test/tsan/X86_64Config/Output/signal_thread.cpp.tmp
+ /b/s/w/ir/x/w/llvm_build/runtimes/runtimes-x86_64-unknown-linux-gnu-bins/compiler-rt/test/tsan/X86_64Config/Output/signal_thread.cpp.tmp
+ FileCheck /b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/tsan/signal_thread.cpp
/b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/tsan/signal_thread.cpp:52:11: error: CHECK: expected string not found in input
// CHECK: DONE
          ^
<stdin>:1:1: note: scanning from here
ThreadSanitizer: CHECK failed: tsan_interceptors_posix.cpp:2075 "((thr->slot)) != (0)" (0x0, 0x0) (tid=2104100)
^

Input file: <stdin>
Check file: /b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/tsan/signal_thread.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: ThreadSanitizer: CHECK failed: tsan_interceptors_posix.cpp:2075 "((thr->slot)) != (0)" (0x0, 0x0) (tid=2104100) 
check:52     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
>>>>>>

--

Could you take a look and send out a fix or revert? Thanks.

dyung added a commit to dyung/llvm-project that referenced this pull request Mar 14, 2024
dyung added a commit that referenced this pull request Mar 14, 2024
@serge-sans-paille
Copy link
Collaborator Author

Go tit, some tests where using SIGPROF as a way to test some signals, switched to another signal and that works correctly

vitalybuka added a commit that referenced this pull request Mar 15, 2024
@vitalybuka
Copy link
Collaborator

Something special about powerpc, can you please take a look?
https://lab.llvm.org/buildbot/#/builders/18/builds/15903
https://lab.llvm.org/buildbot/#/builders/19/builds/25447

@serge-sans-paille
Copy link
Collaborator Author

hell, I'll have a look on monday

@vitalybuka
Copy link
Collaborator

@serge-sans-paille
Copy link
Collaborator Author

yeah, reproducing the issue on a power machine is taking me some time.

@melver
Copy link
Contributor

melver commented Mar 19, 2024

yeah, reproducing the issue on a power machine is taking me some time.

You may revert the change until then, and then recommit with the fix. This is our usual practice if it takes too long to come up with the fix.

@serge-sans-paille
Copy link
Collaborator Author

yeah, reproducing the issue on a power machine is taking me some time.

You may revert the change until then, and then recommit with the fix. This is our usual practice if it takes too long to come up with the fix.

ok. I have a fix that works locally, but we're never sure. I'll sumit it and if it doesn't fixes the test case on our buildbot I'll revert.

serge-sans-paille added a commit that referenced this pull request Mar 19, 2024
serge-sans-paille added a commit that referenced this pull request Mar 19, 2024
@serge-sans-paille
Copy link
Collaborator Author

It seems to be back to normal \o/

@vitalybuka
Copy link
Collaborator

It seems to be back to normal \o/

Thank you!

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Mar 21, 2024
…t builds r=sergesanspaille,glandium

Due to SIGPROF being async, it was hanging on some cases because some functions
were incorrectly marked non-blocking. This patch is merge to LLVM in:
llvm/llvm-project#85188 But we want to patch our clang
here to start benefiting from that quickly.

We are also patching our rustc here. Even though they are not used by default
during our normal builds, this custom rustc is needed for building and
running TSan already:
https://firefox-source-docs.mozilla.org/tools/sanitizer/tsan.html#llvm-clang-rust

Differential Revision: https://phabricator.services.mozilla.com/D204631
@alexfh
Copy link
Contributor

alexfh commented Mar 22, 2024

The recommit (ddcbab3) causes the compiler-rt/test/tsan/signal_sync2.cpp test to be flaky on x86-64 linux (~25 failures in 1000 runs).

@vitalybuka
Copy link
Collaborator

vitalybuka commented Mar 22, 2024

I'll fix it.

vitalybuka added a commit that referenced this pull request Mar 22, 2024
Otherwise it may crash too early.

This is followup to #85188
chencha3 pushed a commit to chencha3/llvm-project that referenced this pull request Mar 23, 2024
chencha3 pushed a commit to chencha3/llvm-project that referenced this pull request Mar 23, 2024
@vitalybuka
Copy link
Collaborator

@dvyukov @alexfh After running a random internal tests TSAN, not just compiler-rt tests, is still very broken after this patch.

@dvyukov
Copy link
Collaborator

dvyukov commented Mar 23, 2024

This looks wrong. SIGPROF is generally not synchronous. Handling it synchronously will cause memory corruptions.
The right way to deal with such deadlocks is to fix why the thread is not delivering it later. Usually a thread is blocked in some syscall that is not marked as BLOCK_REAL in tsan runtime.

@vitalybuka
Copy link
Collaborator

@dvyukov Thanks for confirming that. I will revert this and related patches.

@canova
Copy link
Contributor

canova commented Mar 24, 2024

@dvyukov

This looks wrong. SIGPROF is generally not synchronous. Handling it synchronously will cause memory corruptions. The right way to deal with such deadlocks is to fix why the thread is not delivering it later. Usually a thread is blocked in some syscall that is not marked as BLOCK_REAL in tsan runtime.

Marking functions as blocking with BLOCK_REAL is the exact thing I wanted to do initially (in #83844 and #83561). And I actually sent a PR (#84162) to fix one of the issues I encountered, but it got no attention/feedback so far. I also asked a question on #83844 to learn on how to intercept syscalls (as oppose to intercepting glibc calls, we need to intercept FUTEX syscalls for this).

@dvyukov
Copy link
Collaborator

dvyukov commented Mar 25, 2024

I am sorry for your experience.
@vitalybuka do you know if there are any docs on how to find appropriate reviewers for PRs? OR what's the process?
In syzkaller we use CODEOWNERS thing to auto-assign people to PRs based on path wildcards.

@dvyukov
Copy link
Collaborator

dvyukov commented Mar 25, 2024

For raw syscalls sanitizers provide these hooks, but they need to be inserted manually into code.

Potentially tsan can be made to delay signals only for the duration of a single runtime callback, but that's a large endeavor and it's not completely clear how to do it.

@canova
Copy link
Contributor

canova commented Mar 25, 2024

@dvyukov Thanks for the answer!

For raw syscalls sanitizers provide these hooks, but they need to be inserted manually into code.

Oh, do you mean that they need to be inserted by the program that's calling the syscall (Rust stdlib internals in our case) or be inserted somewhere in the compiler-rt? I actually tried adding these but couldn't make them work. I can give it another try.

Potentially tsan can be made to delay signals only for the duration of a single runtime callback, but that's a large endeavor and it's not completely clear how to do it.

Yeah, tbh even though this PR wasn't "technically correct", I was happy to have it because current async signal implementation in TSan looks pretty flaky. We have encountered these 2 functions that are marked incorrectly while testing the Firefox Profiler, but there is no guarantee that we would not encounter another one in the future (or when these are solved).

Currently nearly all the Firefox Profiler tests are disabled on TSan builds in CI because of hangs and timeouts. And I wanted to enable them to give us more confidence but that proved to be a big challenge.

I'm happy to help to make it better but indeed this suggested mechanism sounds like a big change.

@dvyukov
Copy link
Collaborator

dvyukov commented Mar 25, 2024

Oh, do you mean that they need to be inserted by the program that's calling the syscall

Yes, unfortunately, in the program code. Currently sanitizers don't have any other means to intercept syscalls.
(there were some discussions re using some special kernel hooks or seccomp, but it never materialized)

@dvyukov
Copy link
Collaborator

dvyukov commented Mar 25, 2024

Yeah, tbh even though this PR wasn't "technically correct", I was happy to have it because current async signal implementation in TSan looks pretty flaky. We have encountered these 2 functions that are marked incorrectly while testing the Firefox Profiler, but there is no guarantee that we would not encounter another one in the future (or when these are solved).

I would say signal handling is the single major source of problems and fixes for tsan.

ErichDonGubler pushed a commit to ErichDonGubler/firefox that referenced this pull request Mar 25, 2024
…t builds r=sergesanspaille,glandium

Due to SIGPROF being async, it was hanging on some cases because some functions
were incorrectly marked non-blocking. This patch is merge to LLVM in:
llvm/llvm-project#85188 But we want to patch our clang
here to start benefiting from that quickly.

We are also patching our rustc here. Even though they are not used by default
during our normal builds, this custom rustc is needed for building and
running TSan already:
https://firefox-source-docs.mozilla.org/tools/sanitizer/tsan.html#llvm-clang-rust

Differential Revision: https://phabricator.services.mozilla.com/D204631
@vitalybuka
Copy link
Collaborator

I am sorry for your experience. @vitalybuka do you know if there are any docs on how to find appropriate reviewers for PRs? OR what's the process? In syzkaller we use CODEOWNERS thing to auto-assign people to PRs based on path wildcards.

Probably "first time commiters" can't pick reviewers anyway.

I guess we suppose to see some notifications, I remember @MaskRay set ed up something. I assumes it will auto-assign at some point?

This patch is in https://github.com/orgs/llvm/teams/pr-subscribers-compiler-rt-sanitizer and I guess at least ~12 people have it in mailbox, but didn't act on that yet.

@MaskRay
Copy link
Member

MaskRay commented Mar 25, 2024

I am sorry for your experience. @vitalybuka do you know if there are any docs on how to find appropriate reviewers for PRs? OR what's the process? In syzkaller we use CODEOWNERS thing to auto-assign people to PRs based on path wildcards.

Probably "first time commiters" can't pick reviewers anyway.

I guess we suppose to see some notifications, I remember @MaskRay set ed up something. I assumes it will auto-assign at some point?

This patch is in github.com/orgs/llvm/teams/pr-subscribers-compiler-rt-sanitizer and I guess at least ~12 people have it in mailbox, but didn't act on that yet.

https://llvm.org/docs/Contributing.html#how-to-submit-a-patch provides some information about reviewer selection.
If a user cannot add reviewers, @user0 @user1 is a good alternative.

llvm-project notifications are of huge volume and I think nobody can keep track of everything... I personally use quite a few filters and often miss "Team mention" ones (lot of times "Mention" as well; a lot of work is volunteer work).

Apply label "Mention": from:(notifications@github.com) to:(mention@noreply.github.com)
Apply label "Team mention": from:(notifications@github.com) to:(team_mention@noreply.github.com)

dvyukov pushed a commit that referenced this pull request Mar 26, 2024
…tsan (#86537)

Fixes #83844.

This PR adds callbacks to mark futex syscalls as blocking. Unfortunately
we didn't have a mechanism before to mark syscalls as a blocking call,
so I had to implement it, but it mostly reuses the `BlockingCall`
implementation
[here](https://github.com/llvm/llvm-project/blob/96819daa3d095cf9f662e0229dc82eaaa25480e8/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp#L362-L380).

The issue includes some information but this issue was discovered
because Rust uses futexes directly. So most likely we need to update
Rust as well to use these callbacks.

Also see the latest comments in #85188 for some context.
I also sent another PR #84162 to mark `pthread_*_lock` calls as
blocking.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
10 participants