Add ThreadSanitizer support#12114
Conversation
|
Thanks! That was a great read. I have some naïve question as someone Performance impactMy first question was about the performance impact. This is answered in the PR discussion, but at the very end:
Follow-up question: is there a way to run a TSan-instrumented program with the TSan runtime disabled? In this mode I would expect the performance overhead to be very small. Does this exist, what is the cost? The question comes from the fact that having to move back and forth between two switches for a given problem is a pain, so being able to work within one given switch with a good enough "fast" mode would be very convenient in practice for race-hunting-and-also-normal-development sessions. Stack unwinding
Thinking out loud: If we used separate C stack fragments for C calls Out of curiosity: how do weird-stack runtimes such as Go do it, do (I don't see a big issue with relying on libunwind, I was just curious Instrumenting the runtime C codeDo we really want to instrument the runtime C code? Naively I would
For the GC code in particular, this is maybe 20-30% of our program (More precisely, an "instrument everything" mode is useful to catch (P.S.: This is briefly mentioned as future work.) false positives and false negatives
One could read between the line that your approach has false volatile
I should probably know as I was involved as a non-expert in a volatile (I was generally of the impression that the Multicore GC design was made precisely to not require any change to FFI code.) |
|
Many thanks to all those who have been involved in this work which I
find absolutely impressive and great.
I few comments about the `Add tsan configure flag` commit.
Re: the `WITH_TSAN` variable. Currently it's defined in
`Makefile.config.in` so its configured value will be in
`Makefile.config`, which will be installed. Since I am hoping that we
will reach a point where `Makefile.config` is notused by anybody and we
then can stop installingit, it would feel better to me not to provide
even more variables to users, with the risk that they start to use them
and then have to stop doing so. My suggestion would thus be tomove this
definition to `Makefile.build_config.in, which is private.
Also, regardingits name: configure takes options of the forms --with-ffo
and --enable-foo. You use the --enable-foo form which seems the right
choice to me but then it feels confusing to me that the build variable
has a `WITH_` in his name. How a bout calling itjust `TSAN`, especially
given that it's already how it's called in the configure script.
Regarding the logic that detects whether TSAn can be supported, I am
wonderingwhether it needs to be that complexor whether it could be
simplified a bit? For instance, to me the cases for thelinux and MacOS
systems look very similar, just the error messages seem different to me
and I don't even know whether this differenceis relevant or accidental.
Couldthese two cases be merged?
One alternative way to test these conditions could be to introduce 3
boolean variables: `arch_supports_sanitizer`, `cc_supports_sanitizer`
and `os_supports_sanitizer`. These variables oculd be initialized to
false and set to true as soon as we know on whicharch and OS we are and
which C compiler is used. You could then use an AS_CASE in the three
variables at the same time, given that the only combination which would
be okay would be true-true-true. In theelse clause, you could either
write a generic error message, or you could investigate what's missing
and be more precise.
It's jsut a suggestion, I am not sayingit will necessarily give a
better-lookingcode but you may want to try.
|
I’m not sure I understand what you mean by whole stack fragments. OCaml code can switch stacks several times in its life. It starts in a C stack from Every time a function is entered or exited, TSan must be notified ( To do this, we use frame descriptors to unwind OCaml stacks, and when on top of "chunk", if the stack has switched from a C stack, we use libuwind to unwind the C stack chunk, and so on. So we unwind "fragment/chunk" by "fragment/chunk", jumping to/from C to/from OCaml stacks where required. |
|
Thank you @gasche for your questions. Performance impact
It looks like there is no way to disable the TSan runtime in a way that makes the program faster, or at least no documented way that we could find. The only thing that exists is to add In principle one could link with a mock runtime (the TSan runtime is dynamically linked with the instrumented program) whose instrumentation calls do nothing. The program would still pay the non-negligible cost of external calls, but it would presumably run faster than with the actual TSan runtime (tough it’s hard to predict how much faster). Stack unwindingI also am not sure of how to understand stack fragments, do you suggest to spawn new fibers for calls from OCaml to C? Looking at how Go did it is a good idea! Instrumenting the runtime C code
Regarding the first kind of function, showing to TSan only an abstract view instead of instrumenting them would require some sort of annotation functions provided by TSan to signal synchronizing events. We did not find such functions in the TSan ABI (to make things harder, it is entirely undocumented), but I can check. Regarding your second point, I agree that it is desirable; we already un-instrument a number of functions for performance, and there is certainly more to be done this way. We see it as improvements that can be made incrementally, either in this PR if the reviewers deem it necessary, or later. To avoid introducing too many TSan-related annotations in the codebase, we could consider removing instrumentation of an entire C file where possible. false positives and false negatives
TSan can indeed miss races for two reasons. The first one is that, being a runtime detector, TSan will not detect races on code paths that are not visited. The second one is that TSan remembers a limited number of accesses for each memory word. (The default is 4 and can be increased to 7 via an env variable, at the cost of more RAM usage.) This can also, in principle, cause TSan to miss races. I edited the PR message to clarify this. volatile
I am referring to the discussions around the The important part above is “the use of |
|
Another naive question: it looks like both gcc and clang offer a thread sanitizer. Which one is used here? Can one use both? Why would one favor one over the other? |
|
It’s the same runtime library: initially developed as part of LLVM, it is now provided by GCC as well. Both gcc and clang vendor it as a dynamic library placed in their install directory. There should be no difference between a gcc build and a clang build in terms of race detection. |
|
Gabriel Scherer (2023/03/17 13:26 -0700):
Another naive question: it looks like both gcc and clang offer a
thread sanitizer. Which one is used here? Can one use both? Why would
one favor one over the other?
I started to review the code and my understanding is that the TS which
will be used depends on the detected compiler. If ocnfigure detects gcc
then its thread sanitizer will be used, same for clang. I will let the
authors of the PR correct me if I am wrong.
|
|
That is correct. |
8dfcd65 to
bd84d57
Compare
|
It looks like there is a lot of work that has gone into getting correct backtraces. How important is having the backtraces when a race is detected? I don't have a good sense of their utility when races are detected, and I'm genuinely curious. Have you found that having backtraces was useful in debugging the races that you discovered.
I also don't understand the relationship between frame pointer compilation mode and the use of frametable to unwind on the OCaml side and the use of Has this approach been considered and not pursued? |
|
Now that #12108 has been merged, you may want to rebase this PR on latest trunk. |
bd84d57 to
d69a3a5
Compare
|
@shindere thanks for the review, we will address your comments shortly!
In my experience with C and C++ programs having a full backtrace (at both the write and read/write points) is very useful. In a very simple program, such as a library test, it may not be necessary but in a larger scope program it quickly becomes indispensable.
The backtrace shown on both sides of a race are generated from the calls to The use of the Most of the function instrumentation for TSan backtrace is straightforward: it's done at the CMM level, adding call to |
|
Thanks @fabbing for the answers.
Indeed. It is imperative to have backtraces here. (Digressing a bit) It is surprising that TSan requires |
|
It is TSan’s way of producing backtraces of execution points in the past. A data race involves two memory accesses, and while the backtrace of the most recent access could be computed by simply unwinding the stack at that program point, that method does not work for the older, conflicting memory access. In addition, computing and storing the backtrace of every memory access would be prohibitively costly. That’s why TSan resorts to these calls to |
|
Thanks for the explanation @OlivierNicole. |
damiendoligez
left a comment
There was a problem hiding this comment.
After an interactive reviewing session with @fabbing and @OlivierNicole, I left a few cryptic comments as reminders for them.
988b2d2 to
63fb28d
Compare
|
Many thanks to @damiendoligez for his review and to @shindere for reviewing the build system aspects. We have addressed all the points raised in this first round of reviews. The build system changes requested by @shindere were directly squashed into the first 3 commits of the PR, while the changes requested by @damiendoligez have been added as atomic commits on top of the PR. We also rebased this PR to resolve conflicts. |
|
@damiendoligez have the new changes addressed your concerns? |
|
Reg |
|
No, |
You are right. They are inserted with |
gadmm
left a comment
There was a problem hiding this comment.
Here and in other related PRs, we see that there is a non-trivial interaction between the present PR and the current issues of the memory model.
- We do not know yet whether something like
Load_fieldwill be introduced; if this is important for the soundness of ThreadSanitizer implementation, would it be useful to make progress on #10992 first? - In another PR you told me the strategy to deal with initializing writes (which is to not instrument them) does not work currently when those initializing writes are performed from C code. Is there a plan to remove instrumentation for those C-side initializations in the runtime? In addition, there might remain a problem for C-side initializations when done from user C code, as permitted by the FFI documentation.
- What kind of examples should trigger the false positives you mentioned? I think these (non-)races can happen legitimately, and if they do not trigger false positives, it is probably interesting investigating why. I am not convinced yet that they will remain fringe occurrences.
One comment below is about references to #11040 as justifications to apply CAMLno_user_tsan; I do not understand why those bugs are not simply fixed (separately).
| /* Macro used to deactivate thread and address sanitizers on some | ||
| functions. */ | ||
| #define CAMLno_tsan | ||
| #define CAMLno_asan |
There was a problem hiding this comment.
You are moving this outside of the CAML_INTERNALS guard. Are you doing this because you expect programmers to use CAMLno_user_tsan?
There was a problem hiding this comment.
We pushed a commit that puts CAMLno_asan back to where it was, guarded by CAML_INTERNALS. Regarding CAMLno_user_tsan, it is used by Hd_val defined in mlvalues.h, which is not guarded by CAML_INTERNALS, so we are forced to expose CAMLno_user_tsan and CAMLno_tsan as well.
| } | ||
|
|
||
|
|
||
| CAMLno_user_tsan /* Disable TSan reports from this function (see #11040) */ |
There was a problem hiding this comment.
What is the best solution to keep track of the CAMLno_user_tsans that should eventually be removed after fixing the programming errros? Perhaps with an issue with a check list? Or TODO comments in the code?
To be precise about what can be meant by the word “soundness” that you use here: we are talking about false positives. False negatives from TSan have two well-defined sources—mentioned in the PR message—and I don’t think the problems with macros like So far we only have examples of false positives we have found are in the OCaml runtime. For instance the GC marking function We just pushed a new version of this PR that treats
Yes, I just pushed a commit that removes instrumentation from
That’s true. I don’t see a way to handle those properly. So this is a potential source of false positives. |
|
What do we want to do with the commit history here? I would be in favor of not squashing (keeping separate commits when meaningful), but the current history is a bit messy. |
|
Agree that the current history is messy showing how the PR evolved over time. I'm not against squashing everything into a single commit. |
|
Personally, I would prefer not to squash everything into a single commit. Most of the commits result from the review process. If there are too many, maybe we can squash the small, non-critical changes into one of the big 5 initial commits, and keep separate the commits fixing bugs or adding features. |
|
Indeed, for all commits that can be easily fixupped into one of the "main" commit, and tweaks the main commit instead of adding something new -- the right way to read the code is to pretend it had been there from the start -- fixupping them into the main commits is the right move. Then there are commits that make sense on their own (if you retold the story from scratch today, you could leave them out until later), and those can be kept. Finally, sometimes there are review commits that touch too many different things, or conflict with earlier commits you want to keep and it would be too painful to merge them properly; in this case it can be a compromise to keep them as-is and tolerate some history cruft to reduce your rebase work. You decide. |
This flag is unused for now. Co-authored-by: Olivier Nicole <olivier@chnik.fr>
Restore libunwind detection when TSan is enabled at configure time. This is based on the commit b694e84 by Seb Hinderer (@shindere), which was undone by ocaml#9948, with significant changes: - libunwind detection is only attempted when tsan is enabled - if tsan is enabled, libunwind is requested and not optional - libunwind_include_dirs and libunwind_link_dirs become libunwind_cppflags and libunwind_ldflags, respectively
ThreadSanitizer support is in two parts: instrumentation of the generated native code, and runtime support. The Cmm instrumentation pass is in asmcomp/thread_sanitizer.ml[i]. The new C file runtime/tsan.c handles the task of letting TSan know about function entries and exits when raising exceptions or handling effects. Finally, some of the instrumentation calls have to be inserted directly into the assembly routines of runtime/amd64.S. Co-authored-by: Fabrice Buoro <fabrice@tarides.com> Co-authored-by: Anmol Sahoo <anmol.sahoo25@gmail>
- Add a new set of tests in testsuite/tsan/ - A small number of tests have to be disabled when --enable-tsan is set, due to the fact that a call tree of depth >64k causes the ThreadSanitizer runtime to crash. (This is a limitation on the ThreadSanitizer side.) - Add the no-tsan action to ocamltest, in order to test whether --enable-tsan is set. Co-authored-by: Olivier Nicole <olivier@chnik.fr>
TSan warns about data races on these functions. We reported those warnings in ocaml#11040, and silence them to avoid facing users of TSan with data race reports that are not related to their code. This is done in two ways: either un-instrumenting those functions, or adding their name in __tsan_default_suppressions in tsan.c. Co-authored-by: Fabrice Buoro <fabrice@tarides.com>
The functions that we un-instrument are called often but should not contain data races with user code. Co-authored-by: Olivier Nicole <olivier@chnik.fr>
With ocamltest compiled with clang and TSan enabled, the second call to wait never returns causing ocamltest to hang forever.
|
We've squashed the commits that addressed the reviews comments into the 6 main commits, which are atomic in their subject. |
|
Merged! Congratulations @OlivierNicole and @fabbing, this is tons of really nice work. Thanks also @damiendoligez for the wizard review. |
|
A change in this PR breaks the CI that uses the address sanitizer and the UB sanitizer: Before this PR, |
|
Indeed, sorry about that! diff --git a/runtime/caml/misc.h b/runtime/caml/misc.h
index f0f131ac70..4b6f9c2e5b 100644
--- a/runtime/caml/misc.h
+++ b/runtime/caml/misc.h
@@ -566,7 +566,7 @@ CAMLextern int caml_snwprintf(wchar_t * buf,
#if defined(__has_feature)
# if __has_feature(address_sanitizer)
# undef CAMLno_asan
-# define CAMLno_asan __attribute__((disable_sanitizer_instrumentation))
+# define CAMLno_asan __attribute__((no_sanitize_address))
# endif
#else
# if __SANITIZE_ADDRESS__ |
|
OK for |
|
I agree, |
|
I remember making that change to
And I changed |
FWIW it looks like newer GCCs (like GCC14) have now learned about Switching the order of ifdefs seems to make it work on both gcc and clang, although if See https://discuss.ocaml.org/t/dune-3-15-3-ocaml-5-2-data-race-false-positive/14906/4?u=edwin for details, I'll probably open a PR to fix |
This PR introduces a new configure-time flag
--enable-tsanto enable compilation with ThreadSanitizer (TSan) instrumentation. This is a joint work with Fabrice Buoro (@fabbing), based on the work of Anmol Sahoo (@anmolsahoo25). We also had help from @jhjourdan and @maranget on memory models, @shindere on the build system, and are grateful to @art-w and @gadmm for their useful feedback.TSan is an approach developed by Google to locate data races in code bases. It works by instrumenting your executables to keep a history of previous memory accesses (at a certain performance cost), in order to detect data races, even when they have no visible effect on the execution. TSan instrumentation has been implemented in various compilers (gcc, clang, as well as the Go and Swift compilers), and has proved very effective in detecting hundreds of concurrency bugs in large projects.
Executables instrumented with TSan report data races without false positives. However, data races in code paths that are not visited will not be detected.
Example
A data race consists in two accesses to the same location, at least one of them being a write, without any order being enforced between them:
Compiling this program with a TSan-enabled compiler is done by the usual command
ocamlopt -g -I +unix unix.cmxa program.ml(-gallows to print line numbers in the reports). Running it will output a report looking like this:Many other examples can be found in the test suite. More information about ThreadSanitizer usage are available in the readme of the ocaml-tsan repo, which has been released as a fork based on 5.0.0 (release date December 16, 2022). This approach already allowed to detect a number of data races in OCaml libraries:
The generated executables are not impacted by this change when
--enable-tsanis not set. Compilation times are the same as before without--enable-tsan.Caveats
How TSan works
Executables are instrumented with calls to the TSan runtime library, which tracks accesses to shared data and reports races.
Internally the runtime library associates with each word of application memory at least 2 "shadow words". Each shadow word contains information about a recent memory access to that word, including a "scalar clock". Those clocks serve to establish a happens-before relation.
This information is maintained as a "shadow state" in a separate memory region, and updated at every (instrumented) memory access. A data race is reported every time two memory accesses are made to overlapping memory regions, and:
Instrumentation pass
Instrumentation calls are inserted in several places:
Memory accesses are instrumented with calls to the TSan runtime, with functions of the form
__tsan_read8,__tsan_atomic_store; those calls are inserted into theCmmrepresentation of code. We exploit mutability information to improve performance: immutable loads are translated to non-instrumented memory reads.Function entries and exits are instrumented with calls to
__tsan_func_entryand__tsan_func_exit. This is used by TSan to provide backtraces in data race reports (including backtraces of past memory accesses).The C runtime is instrumented as well, using the built-in ThreadSanitizer support of GCC or Clang. This is necessary as
Enabling(edit: no longer true). The OCaml compiler'stsanalso causes C code to be built with-fno-omit-frame-pointers--enable-frame-pointersconfigure flag is however independent of--enable-tsan.Instrumentation of function entries and exits is simple in C or C++ (where raising an exception always unwinds the stack), but challenging in OCaml which has non-local control flow due to exceptions and effect handlers. In order to keep correct backtraces in all circumstances, it is necessary to emit ad hoc calls to
__tsan_func_entryand/or__tsan_func_exitwhen an exception is raised, an effect is performed, or a continuation is resumed, in order to keep TSan’s view of the call stack up-to-date.Doing this requires unwinding the stack. When possible, we use the already-present mechanism of
frame_descrdescriptors, embedded in the executable, to do it. However, exceptions can be raised across C stack frames, and frame descriptors only exist for OCaml frames. In order to unwind the C parts of the stack, we use the libunwind library. As a consequence, one must install libunwind to build OCaml with--enable-tsan.__tsan_func_entrytakes as an argument the return address in the current stack frame. This required adding a constructorCreturn_addrtoCmmexpressions andIreturn_addrto the typeMach.operation, whose semantics is to fetch the return address from the stack frame.Embedding of the OCaml memory model into the C11 one
TSan’s underlying memory model is the C11 model. Therefore, we have to map OCaml memory operations to C11 operations. The relations between the two models have been the subject of many discussions (#10995, #10992, #10972 (comment)), which we took as a basis for our work. Conceptually, the instrumentation translates OCaml memory operations to C11 operations. Race-free OCaml programs are translated to race-free C programs, while OCaml programs containing races (in the OCaml sense) are translated to C programs that are racy (in the C11 sense).
The resulting runtime analysis does not report false positives, in the sense that all races reported by TSan are indeed data races in the OCaml sense1.
As for false negatives, does TSan have any? Yes, of two kinds. Being a runtime detector, it will not report data races on code paths that are not visited. The second limitation is that TSan remembers a limited number of accesses for each memory word. (The default is 4 and can be increased to 7 via an env variable, at the cost of more RAM usage.) This can also, in principle, cause TSan to miss races.
The mapping between OCaml memory operations and C11 operations signaled to TSan can be found here. In the same thread, the previous post contains a justification of the absence of false positives. In a meeting, we presented these to @maranget and @jhjourdan, who agreed that it should have the correctness properties we claim it has.
Performance cost of the instrumentation
First of all, there is no compilation or runtime cost associated with the change unless the compiler is configured with
--enable-tsan.When TSan instrumentation is enabled, it incurs a slowdown and increases memory consumption. Preliminary benchmarks show a slowdown in the range
7x-13x(edit: now 2x-7x) and a memory consumption increase in the range2x-7x(edit: now 4x-7x). Some pathological cases exist, such astestsuite/tests/lf_skiplist/test_parallel.mlfor which time and memory usage blow up by 23x and 27x, respectively. (We haven’t investigated further yet.)The memory consumption increase is better than for C/C++/Go (this conference talk reports 5x-10x). The slowdown is generally not as big as the one observed in C/C++/Go (5x-15x), which is expected as OCaml programs manipulate less mutable values than C on average. There may still be remaining low-hanging fruits in terms of optimization. As future work we can remove instrumentation from the runtime where it is not strictly necessary to correctly detect races in user programs. We can also investigate the aforementioned pathological case.
Organization of the changes
This PR is best reviewed commit by commit.
It is based on the commits of #12108 which introduces native-only C compiler flags. As a consequence, the changes actually belonging to this PR start at the 10th commit, “Add tsan configure flag”.Edit: #12108 has been merged.More than 60 % of the diff for this PR is due to a new set of tests in
testsuite/tsan/. A small number of tests have to be disabled when--enable-tsanis set, due to the fact that a call tree of depth >64k causes the ThreadSanitizer runtime to crash. (This is a limitation on the ThreadSanitizer side.)In the main “Add ThreadSanitizer support” commit, big changes are limited to a few files: the
Cmminstrumentation pass is inasmcomp/thread_sanitizer.ml[i]. The new C fileruntime/tsan.chandles the task of letting TSan know about function entries and exits when raising exceptions or handling effects. Finally, some of the instrumentation calls have to be inserted directly into the assembly routines ofruntime/amd64.S.In a separate commit, we disable TSan on a number of functions in the runtime. This is because TSan warns about data races on these functions. We reported those warnings in #11040, and silence them so that TSan users are not faced with data race reports that are not related to their code. To save performance, we also un-instrument some functions that are executed often, but should not create races with user code.
(Edit: fixed the command to compile the example, added mention of false negatives.)
Work left to do on this PR
--enable-tsanin case they reveal a failure in the implementation (without necessarily aiming for complete x86 macOS support)Footnotes
There is one exception, namely when at least one of the memory accesses in done from C (either from the OCaml runtime or from an FFI function). In that case, the definition of data races currently admitted by the OCaml maintainers is more restrictive than that of C11. See OCaml multicore memory model and C (runtime, FFI, VM) #10992. The takeaway is that in some rare cases TSan may report a race when we consider there is none, either due to a data dependency, or because we consider
volatilewrites to be like relaxed atomic stores. ↩