Skip to content

Add ThreadSanitizer support#12114

Merged
gasche merged 8 commits into
ocaml:trunkfrom
fabbing:tsan_patch
Jul 12, 2023
Merged

Add ThreadSanitizer support#12114
gasche merged 8 commits into
ocaml:trunkfrom
fabbing:tsan_patch

Conversation

@OlivierNicole

@OlivierNicole OlivierNicole commented Mar 16, 2023

Copy link
Copy Markdown
Contributor

This PR introduces a new configure-time flag --enable-tsan to enable compilation with ThreadSanitizer (TSan) instrumentation. This is a joint work with Fabrice Buoro (@fabbing), based on the work of Anmol Sahoo (@anmolsahoo25). We also had help from @jhjourdan and @maranget on memory models, @shindere on the build system, and are grateful to @art-w and @gadmm for their useful feedback.

TSan is an approach developed by Google to locate data races in code bases. It works by instrumenting your executables to keep a history of previous memory accesses (at a certain performance cost), in order to detect data races, even when they have no visible effect on the execution. TSan instrumentation has been implemented in various compilers (gcc, clang, as well as the Go and Swift compilers), and has proved very effective in detecting hundreds of concurrency bugs in large projects.

Executables instrumented with TSan report data races without false positives. However, data races in code paths that are not visited will not be detected.

Example

A data race consists in two accesses to the same location, at least one of them being a write, without any order being enforced between them:

type t = { mutable x : int }

let v = { x = 0 }

let () =
  let t1 = Domain.spawn (fun () -> v.x <- 10; Unix.sleep 1) in
  let t2 = Domain.spawn (fun () -> ignore v.x; Unix.sleep 1) in
  Domain.join t1;
  Domain.join t2

Compiling this program with a TSan-enabled compiler is done by the usual command ocamlopt -g -I +unix unix.cmxa program.ml (-g allows to print line numbers in the reports). Running it will output a report looking like this:

==================
WARNING: ThreadSanitizer: data race (pid=4170290)
  Read of size 8 at 0x7f072bbfe498 by thread T4 (mutexes: write M0):
    #0 camlSimpleRace__fun_524 /tmp/simpleRace.ml:7 (simpleRace.exe+0x43dc9d)
    #1 camlStdlib__Domain__body_696 /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/stdlib/domain.ml:202 (simpleRace.exe+0x47b5dc)
    #2 caml_start_program ??:? (simpleRace.exe+0x4f51c3)
    #3 caml_callback_exn /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/callback.c:168 (simpleRace.exe+0x4c2b93)
    #4 caml_callback /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/callback.c:256 (simpleRace.exe+0x4c36e3)
    #5 domain_thread_func /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/domain.c:1093 (simpleRace.exe+0x4c6ad1)

  Previous write of size 8 at 0x7f072bbfe498 by thread T1 (mutexes: write M1):
    #0 camlSimpleRace__fun_520 /tmp/simpleRace.ml:6 (simpleRace.exe+0x43dc45)
    #1 camlStdlib__Domain__body_696 /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/stdlib/domain.ml:202 (simpleRace.exe+0x47b5dc)
    #2 caml_start_program ??:? (simpleRace.exe+0x4f51c3)
    #3 caml_callback_exn /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/callback.c:168 (simpleRace.exe+0x4c2b93)
    #4 caml_callback /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/callback.c:256 (simpleRace.exe+0x4c36e3)
    #5 domain_thread_func /home/olivier/.opam/5.0.0+tsan/.opam-switch/build/ocaml-variants.5.0.0+tsan/runtime/domain.c:1093 (simpleRace.exe+0x4c6ad1)

  Mutex M0 (0x000000567ad8) created at:
    #0 pthread_mutex_init /home/olivier/other_projects/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1316 (libtsan.so.0+0x3cafb)
    [...]

SUMMARY: ThreadSanitizer: data race /tmp/simpleRace.ml:7 in camlSimpleRace__fun_524
==================
ThreadSanitizer: reported 1 warnings

Many other examples can be found in the test suite. More information about ThreadSanitizer usage are available in the readme of the ocaml-tsan repo, which has been released as a fork based on 5.0.0 (release date December 16, 2022). This approach already allowed to detect a number of data races in OCaml libraries:

The generated executables are not impacted by this change when --enable-tsan is not set. Compilation times are the same as before without --enable-tsan.

Caveats

  • We inherit the portability limitations of ThreadSanitizer of not being available on Windows.
  • Currently, only instrumentation of x86_64 executables is supported; future work could include ARM64 support.

How TSan works

Executables are instrumented with calls to the TSan runtime library, which tracks accesses to shared data and reports races.

Internally the runtime library associates with each word of application memory at least 2 "shadow words". Each shadow word contains information about a recent memory access to that word, including a "scalar clock". Those clocks serve to establish a happens-before relation.

This information is maintained as a "shadow state" in a separate memory region, and updated at every (instrumented) memory access. A data race is reported every time two memory accesses are made to overlapping memory regions, and:

  • one of them is a write, and
  • there is no established happens-before relation between them.

Instrumentation pass

Instrumentation calls are inserted in several places:

  • Memory accesses are instrumented with calls to the TSan runtime, with functions of the form __tsan_read8, __tsan_atomic_store; those calls are inserted into the Cmm representation of code. We exploit mutability information to improve performance: immutable loads are translated to non-instrumented memory reads.

  • Function entries and exits are instrumented with calls to __tsan_func_entry and __tsan_func_exit. This is used by TSan to provide backtraces in data race reports (including backtraces of past memory accesses).

  • The C runtime is instrumented as well, using the built-in ThreadSanitizer support of GCC or Clang. This is necessary as

    1. many relevant memory accesses are made from the runtime, and
    2. the runtime handles thread-synchronizing operations such as locking and unlocking, operations which TSan needs to know about.

    Enabling tsan also causes C code to be built with -fno-omit-frame-pointers (edit: no longer true). The OCaml compiler's --enable-frame-pointers configure flag is however independent of --enable-tsan.

Instrumentation of function entries and exits is simple in C or C++ (where raising an exception always unwinds the stack), but challenging in OCaml which has non-local control flow due to exceptions and effect handlers. In order to keep correct backtraces in all circumstances, it is necessary to emit ad hoc calls to __tsan_func_entry and/or __tsan_func_exit when an exception is raised, an effect is performed, or a continuation is resumed, in order to keep TSan’s view of the call stack up-to-date.

Doing this requires unwinding the stack. When possible, we use the already-present mechanism of frame_descr descriptors, embedded in the executable, to do it. However, exceptions can be raised across C stack frames, and frame descriptors only exist for OCaml frames. In order to unwind the C parts of the stack, we use the libunwind library. As a consequence, one must install libunwind to build OCaml with --enable-tsan.

__tsan_func_entry takes as an argument the return address in the current stack frame. This required adding a constructor Creturn_addr to Cmm expressions and Ireturn_addr to the type Mach.operation, whose semantics is to fetch the return address from the stack frame.

Embedding of the OCaml memory model into the C11 one

TSan’s underlying memory model is the C11 model. Therefore, we have to map OCaml memory operations to C11 operations. The relations between the two models have been the subject of many discussions (#10995, #10992, #10972 (comment)), which we took as a basis for our work. Conceptually, the instrumentation translates OCaml memory operations to C11 operations. Race-free OCaml programs are translated to race-free C programs, while OCaml programs containing races (in the OCaml sense) are translated to C programs that are racy (in the C11 sense).

The resulting runtime analysis does not report false positives, in the sense that all races reported by TSan are indeed data races in the OCaml sense1.

As for false negatives, does TSan have any? Yes, of two kinds. Being a runtime detector, it will not report data races on code paths that are not visited. The second limitation is that TSan remembers a limited number of accesses for each memory word. (The default is 4 and can be increased to 7 via an env variable, at the cost of more RAM usage.) This can also, in principle, cause TSan to miss races.

The mapping between OCaml memory operations and C11 operations signaled to TSan can be found here. In the same thread, the previous post contains a justification of the absence of false positives. In a meeting, we presented these to @maranget and @jhjourdan, who agreed that it should have the correctness properties we claim it has.

Performance cost of the instrumentation

First of all, there is no compilation or runtime cost associated with the change unless the compiler is configured with --enable-tsan.

When TSan instrumentation is enabled, it incurs a slowdown and increases memory consumption. Preliminary benchmarks show a slowdown in the range 7x-13x (edit: now 2x-7x) and a memory consumption increase in the range 2x-7x (edit: now 4x-7x). Some pathological cases exist, such as testsuite/tests/lf_skiplist/test_parallel.ml for which time and memory usage blow up by 23x and 27x, respectively. (We haven’t investigated further yet.)

The memory consumption increase is better than for C/C++/Go (this conference talk reports 5x-10x). The slowdown is generally not as big as the one observed in C/C++/Go (5x-15x), which is expected as OCaml programs manipulate less mutable values than C on average. There may still be remaining low-hanging fruits in terms of optimization. As future work we can remove instrumentation from the runtime where it is not strictly necessary to correctly detect races in user programs. We can also investigate the aforementioned pathological case.

Organization of the changes

This PR is best reviewed commit by commit.

It is based on the commits of #12108 which introduces native-only C compiler flags. As a consequence, the changes actually belonging to this PR start at the 10th commit, “Add tsan configure flag”. Edit: #12108 has been merged.

More than 60 % of the diff for this PR is due to a new set of tests in testsuite/tsan/. A small number of tests have to be disabled when --enable-tsan is set, due to the fact that a call tree of depth >64k causes the ThreadSanitizer runtime to crash. (This is a limitation on the ThreadSanitizer side.)

In the main “Add ThreadSanitizer support” commit, big changes are limited to a few files: the Cmm instrumentation pass is in asmcomp/thread_sanitizer.ml[i]. The new C file runtime/tsan.c handles the task of letting TSan know about function entries and exits when raising exceptions or handling effects. Finally, some of the instrumentation calls have to be inserted directly into the assembly routines of runtime/amd64.S.

In a separate commit, we disable TSan on a number of functions in the runtime. This is because TSan warns about data races on these functions. We reported those warnings in #11040, and silence them so that TSan users are not faced with data race reports that are not related to their code. To save performance, we also un-instrument some functions that are executed often, but should not create races with user code.

(Edit: fixed the command to compile the example, added mention of false negatives.)

Work left to do on this PR

  • Finish fixing the test suite for clang
  • For one disabled test the reason for the disabling is unclear (see https://github.com/ocaml-multicore/ocaml-tsan#31), address it
  • The macOS build seems broken in our CI due to a wrong flag, address it
  • Add documentation comments to explain what the Cmm instrumentation pass does and why, and explain the modifications made in the assembly stubs.
  • Investigate test failure on macOS x86 with --enable-tsan in case they reveal a failure in the implementation (without necessarily aiming for complete x86 macOS support)

Footnotes

  1. There is one exception, namely when at least one of the memory accesses in done from C (either from the OCaml runtime or from an FFI function). In that case, the definition of data races currently admitted by the OCaml maintainers is more restrictive than that of C11. See OCaml multicore memory model and C (runtime, FFI, VM) #10992. The takeaway is that in some rare cases TSan may report a race when we consider there is none, either due to a data dependency, or because we consider volatile writes to be like relaxed atomic stores.

@gasche

gasche commented Mar 16, 2023

Copy link
Copy Markdown
Member

Thanks! That was a great read. I have some naïve question as someone
who knows nothing about TSan.

Performance impact

My first question was about the performance impact. This is answered in the PR discussion, but at the very end:

  • 7x-13x speed slowdown (pathological cases at 23x),
  • 2x-7x memory increase (pathological cases at 27x).

Follow-up question: is there a way to run a TSan-instrumented program with the TSan runtime disabled? In this mode I would expect the performance overhead to be very small. Does this exist, what is the cost?

The question comes from the fact that having to move back and forth between two switches for a given problem is a pain, so being able to work within one given switch with a good enough "fast" mode would be very convenient in practice for race-hunting-and-also-normal-development sessions.

Stack unwinding

Doing this requires unwinding the stack. When possible, we use the
already-present mechanism of frame_descr descriptors, embedded in
the executable, to do it. However, exceptions can be raised across
C stack frames, and frame descriptors only exist for OCaml
frames. In order to unwind the C parts of the stack, we use the
libunwind library.

Thinking out loud: If we used separate C stack fragments for C calls
that are separated by OCaml calls, then we could express an OCaml
exception (either raised from C or traversing a C stack fragment) as
just unwinding the whole stack fragment.

Out of curiosity: how do weird-stack runtimes such as Go do it, do
they also rely on libunwind?

(I don't see a big issue with relying on libunwind, I was just curious
about this.)

Instrumenting the runtime C code

Do we really want to instrument the runtime C code? Naively I would
expect that for many function provided by the runtime, one of the
following holds:

  • there is an "abstract" view of the effect of the function that could
    be taught to the OCaml instrumentation pass, which could be simpler
    than instrumenting the implementation itself (typically: Mutex or
    Atomic primitives.)

  • assuming that the runtime code is correct, no instrumentation should
    be necessary (typically: the GC code)

For the GC code in particular, this is maybe 20-30% of our program
time that is spent reading and mutating memory constantly, doing
relaxed reads. I would expect the performance impact of
instrumentation to be extremely high, for something that (assuming the
runtime is correct) we assume is never necessary.

(More precisely, an "instrument everything" mode is useful to catch
synchronization bugs in the runtime, but it generates reports that are
confusing to users and that they are helpless to fix, so maybe we
don't want it by default.)

(P.S.: This is briefly mentioned as future work.)

false positives and false negatives

The resulting runtime analysis does not report false positives, in
the sense that all races reported by TSan are indeed data races in
the OCaml sense.

One could read between the line that your approach has false
negatives, that is, it misses some data races in the OCaml sense. Is
this true? (If yes, then maybe you should state it clearly and
ideally explain how this can happen.)

volatile

False positives due to volatile can only appear in legacy FFI code,
not written with parallelism in mind.

I should probably know as I was involved as a non-expert in a volatile
discussion already, but I don't. What is this "legacy FFI code not
written with parallelism in mind", and what is the correct way to
update it to "modern FFI code written with parallelism in mind"? Is this explained/documented somewhere?

(I was generally of the impression that the Multicore GC design was made precisely to not require any change to FFI code.)

@shindere

shindere commented Mar 17, 2023 via email

Copy link
Copy Markdown
Contributor

@fabbing

fabbing commented Mar 17, 2023

Copy link
Copy Markdown
Contributor

Stack unwinding

Thinking out loud: If we used separate C stack fragments for C calls that are separated by OCaml calls, then we could express an OCaml exception (either raised from C or traversing a C stack fragment) as just unwinding the whole stack fragment.

Out of curiosity: how do weird-stack runtimes such as Go do it, do they also rely on libunwind?

(I don't see a big issue with relying on libunwind, I was just curious about this.)

I’m not sure I understand what you mean by whole stack fragments.

OCaml code can switch stacks several times in its life. It starts in a C stack from main to caml_program, then switches to an OCaml stack (a fiber), and the flow of execution may cause it to switch back to the C stack to call primitives, GC operations... and maybe back to the C stack if Callbacks are involved.
Effects also cause switching to another fiber.

Every time a function is entered or exited, TSan must be notified (__tsan_func_entry and __tsan_func_exit) to be able to give a backtrace where a race occurred (on both sides of the race). TSan doesn't know where stack switching happens, they are invisible to TSan. When an exception is raised, an effect is performed or a continuation is resumed, the standard execution flow is broken and we inform TSan of functions that we leave definitely (in case of an exception) or maybe temporarily (in case of performing an effect) or re-enter (in case of resuming a continuation), so that new function calls happen at the correct place in terms of stacktrace.

To do this, we use frame descriptors to unwind OCaml stacks, and when on top of "chunk", if the stack has switched from a C stack, we use libuwind to unwind the C stack chunk, and so on. So we unwind "fragment/chunk" by "fragment/chunk", jumping to/from C to/from OCaml stacks where required.

@OlivierNicole

Copy link
Copy Markdown
Contributor Author

Thank you @gasche for your questions.

Performance impact

is there a way to run a TSan-instrumented program with the TSan runtime disabled? In this mode I would expect the performance overhead to be very small. Does this exist, what is the cost?

It looks like there is no way to disable the TSan runtime in a way that makes the program faster, or at least no documented way that we could find. The only thing that exists is to add report_bugs=0 to the TSAN_OPTIONS environment variable, which disables warnings, but does not accelerate the program.

In principle one could link with a mock runtime (the TSan runtime is dynamically linked with the instrumented program) whose instrumentation calls do nothing. The program would still pay the non-negligible cost of external calls, but it would presumably run faster than with the actual TSan runtime (tough it’s hard to predict how much faster).

Stack unwinding

I also am not sure of how to understand stack fragments, do you suggest to spawn new fibers for calls from OCaml to C?

Looking at how Go did it is a good idea!

Instrumenting the runtime C code

Naively I would expect that for many function provided by the runtime, one of the following holds:

  • there is an "abstract" view of the effect of the function that could be taught to the OCaml instrumentation pass, which could be simpler than instrumenting the implementation itself (typically: Mutex or Atomic primitives.)

  • assuming that the runtime code is correct, no instrumentation should be necessary (typically: the GC code)

Regarding the first kind of function, showing to TSan only an abstract view instead of instrumenting them would require some sort of annotation functions provided by TSan to signal synchronizing events. We did not find such functions in the TSan ABI (to make things harder, it is entirely undocumented), but I can check.

Regarding your second point, I agree that it is desirable; we already un-instrument a number of functions for performance, and there is certainly more to be done this way. We see it as improvements that can be made incrementally, either in this PR if the reviewers deem it necessary, or later. To avoid introducing too many TSan-related annotations in the codebase, we could consider removing instrumentation of an entire C file where possible.

false positives and false negatives

One could read between the line that your approach has false
negatives, that is, it misses some data races in the OCaml sense. Is
this true? (If yes, then maybe you should state it clearly and
ideally explain how this can happen.)

TSan can indeed miss races for two reasons. The first one is that, being a runtime detector, TSan will not detect races on code paths that are not visited. The second one is that TSan remembers a limited number of accesses for each memory word. (The default is 4 and can be increased to 7 via an env variable, at the cost of more RAM usage.) This can also, in principle, cause TSan to miss races. I edited the PR message to clarify this.

volatile

What is this "legacy FFI code not
written with parallelism in mind", and what is the correct way to
update it to "modern FFI code written with parallelism in mind"? Is this explained/documented somewhere?

(I was generally of the impression that the Multicore GC design was made precisely to not require any change to FFI code.)

I am referring to the discussions around the Field macro in #10992. My understanding of the current situation is the following: the use of Field as an l-value (i.e., for reading) in a multicore program should not cause crashes in practice, but is discouraged in new code and a new macro Load_field should be preferred (although as of now this macro hasn’t been added to OCaml). In sequential code it is okay to use Field.

The important part above is “the use of Field should not cause crashes in practice”. This is achieved via the volatile qualifier and TSan does not support this out of the box, so it may report such uses of Field as races, even though in practice it’s okay. Maybe TSan support can be improved in the future to understand volatile which would suppress those edge cases. Such false positives have not show up yet and I believe they will remain fringe occurrences.

@gasche

gasche commented Mar 17, 2023

Copy link
Copy Markdown
Member

Another naive question: it looks like both gcc and clang offer a thread sanitizer. Which one is used here? Can one use both? Why would one favor one over the other?

@OlivierNicole

Copy link
Copy Markdown
Contributor Author

It’s the same runtime library: initially developed as part of LLVM, it is now provided by GCC as well. Both gcc and clang vendor it as a dynamic library placed in their install directory. There should be no difference between a gcc build and a clang build in terms of race detection.

@shindere

shindere commented Mar 17, 2023 via email

Copy link
Copy Markdown
Contributor

@OlivierNicole

Copy link
Copy Markdown
Contributor Author

That is correct.

@kayceesrk

kayceesrk commented Mar 18, 2023

Copy link
Copy Markdown
Contributor

It looks like there is a lot of work that has gone into getting correct backtraces. How important is having the backtraces when a race is detected? I don't have a good sense of their utility when races are detected, and I'm genuinely curious. Have you found that having backtraces was useful in debugging the races that you discovered.

Enabling tsan also causes C code to be built with -fno-omit-frame-pointers. The OCaml compiler's --enable-frame-pointers configure flag is however independent of --enable-tsan.

I also don't understand the relationship between frame pointer compilation mode and the use of frametable to unwind on the OCaml side and the use of libunwind on the C side. Given that TSan is an optional mode, would it be easier if you compiled with frame pointers both on the OCaml and the C side when TSan is enabled. In my naive understanding of how things work with frame pointers, if you had frame pointers, then you wouldn't need to do any extra work in the OCaml compiler and the TSan library should be able to produce backtraces just using the framepointers. Is this correct?

Has this approach been considered and not pursued?

@shindere

Copy link
Copy Markdown
Contributor

Now that #12108 has been merged, you may want to rebase this PR on latest trunk.

@fabbing

fabbing commented Mar 20, 2023

Copy link
Copy Markdown
Contributor

@shindere thanks for the review, we will address your comments shortly!

@kayceesrk

It looks like there is a lot of work that has gone into getting correct backtraces. How important is having the backtraces when a race is detected? I don't have a good sense of their utility when races are detected, and I'm genuinely curious. Have you found that having backtraces was useful in debugging the races that you discovered.

In my experience with C and C++ programs having a full backtrace (at both the write and read/write points) is very useful. In a very simple program, such as a library test, it may not be necessary but in a larger scope program it quickly becomes indispensable.
For example, most mutations in OCaml code are done through the C function caml_modify. Without backtrace you would only be informed of a race at some memory address in caml_modify making it difficult to link to a specific code.

I also don't understand the relationship between frame pointer compilation mode and the use of frametable to unwind on the OCaml side and the use of libunwind on the C side. Given that TSan is an optional mode, would it be easier if you compiled with frame pointers both on the OCaml and the C side when TSan is enabled. In my naive understanding of how things work with frame pointers, if you had frame pointers, then you wouldn't need to do any extra work in the OCaml compiler and the TSan library should be able to produce backtraces just using the framepointers. Is this correct?

Has this approach been considered and not pursued?

The backtrace shown on both sides of a race are generated from the calls to __tsan_func_entry and __tsan_func_exit.
If we didn't emit these calls, at the very least the older side of a race would have no a associated backtrace, as this information is stored by TSan, even if you could get the current access backtrace from DWARF.

The use of the -fno-omit-frame-pointers flag was required in previous documentation, but this no longer seems to be the case (ThreadSanitizer) and the TSan testsuite passes without it, so we will remove this flag from the patch.
In summary, from now on, there should be no relationship between enabling TSan and frame-pointer business.

Most of the function instrumentation for TSan backtrace is straightforward: it's done at the CMM level, adding call to __tsan_func_entry at entry and __tsan_func_exit on exit (with some special consideration for tail calls).
We only do the stack unwinding ourselves in the special control flow case of exceptions and effects, in order to emit the necessary calls to entry/exit. Without this, TSan backtraces would become unbalanced and quickly run into backtracce under or overflow, causing crashes.

@kayceesrk

kayceesrk commented Mar 20, 2023

Copy link
Copy Markdown
Contributor

Thanks @fabbing for the answers.

For example, most mutations in OCaml code are done through the C function caml_modify. Without backtrace you would only be informed of a race at some memory address in caml_modify making it difficult to link to a specific code.

Indeed. It is imperative to have backtraces here.

(Digressing a bit) It is surprising that TSan requires __tsan_func_entry and __tsan_fun_exit. Is it useful for anything other than producing backtraces.

@OlivierNicole

Copy link
Copy Markdown
Contributor Author

It is TSan’s way of producing backtraces of execution points in the past. A data race involves two memory accesses, and while the backtrace of the most recent access could be computed by simply unwinding the stack at that program point, that method does not work for the older, conflicting memory access. In addition, computing and storing the backtrace of every memory access would be prohibitively costly.

That’s why TSan resorts to these calls to __tsan_func_entry and __tsan_func_exit. Function entry and exit events are stored in a large event log (implemented as a ring buffer), from which past backtraces can be reconstructed. This is also used to show backtraces of mutex locking points and thread creation points. So __tsan_func_entry and __tsan_func_exit are only used to produce backtraces, but they are quite necessary to do this.

@kayceesrk

Copy link
Copy Markdown
Contributor

Thanks for the explanation @OlivierNicole.

@damiendoligez damiendoligez left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After an interactive reviewing session with @fabbing and @OlivierNicole, I left a few cryptic comments as reminders for them.

Comment thread asmcomp/amd64/emit.mlp
Comment thread asmcomp/cmm_helpers.ml Outdated
Comment thread asmcomp/cmm_helpers.ml Outdated
Comment thread asmcomp/thread_sanitizer.ml Outdated
Comment thread asmcomp/thread_sanitizer.ml Outdated
Comment thread configure.ac Outdated
Comment thread runtime/amd64.S Outdated
Comment thread runtime/amd64.S Outdated
Comment thread runtime/tsan.c Outdated
Comment thread testsuite/tests/lf_skiplist/test_parallel.ml Outdated
@gasche gasche added this to the 5.1 milestone Apr 21, 2023
@OlivierNicole

OlivierNicole commented Apr 27, 2023

Copy link
Copy Markdown
Contributor Author

Many thanks to @damiendoligez for his review and to @shindere for reviewing the build system aspects. We have addressed all the points raised in this first round of reviews. The build system changes requested by @shindere were directly squashed into the first 3 commits of the PR, while the changes requested by @damiendoligez have been added as atomic commits on top of the PR.

We also rebased this PR to resolve conflicts.

@kayceesrk

Copy link
Copy Markdown
Contributor

@damiendoligez have the new changes addressed your concerns?

@kayceesrk

Copy link
Copy Markdown
Contributor

Reg __tsan_func_entry and __tsan_func_exit, these functions are directly called on the OCaml stack, if I am reading correctly. How do I know whether I have enough space on the OCaml stack to execute these C functions?

@OlivierNicole

Copy link
Copy Markdown
Contributor Author

No, __tsan_func_entry and __tsan_func_exit should be executed on the C stack.

@kayceesrk

Copy link
Copy Markdown
Contributor

No, __tsan_func_entry and __tsan_func_exit should be executed on the C stack.

You are right. They are inserted with Cextcall in thread_sanitizer.ml. Thanks for the clarification.

@gadmm gadmm left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in other related PRs, we see that there is a non-trivial interaction between the present PR and the current issues of the memory model.

  • We do not know yet whether something like Load_field will be introduced; if this is important for the soundness of ThreadSanitizer implementation, would it be useful to make progress on #10992 first?
  • In another PR you told me the strategy to deal with initializing writes (which is to not instrument them) does not work currently when those initializing writes are performed from C code. Is there a plan to remove instrumentation for those C-side initializations in the runtime? In addition, there might remain a problem for C-side initializations when done from user C code, as permitted by the FFI documentation.
  • What kind of examples should trigger the false positives you mentioned? I think these (non-)races can happen legitimately, and if they do not trigger false positives, it is probably interesting investigating why. I am not convinced yet that they will remain fringe occurrences.

One comment below is about references to #11040 as justifications to apply CAMLno_user_tsan; I do not understand why those bugs are not simply fixed (separately).

Comment thread runtime/caml/mlvalues.h Outdated
/* Macro used to deactivate thread and address sanitizers on some
functions. */
#define CAMLno_tsan
#define CAMLno_asan

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are moving this outside of the CAML_INTERNALS guard. Are you doing this because you expect programmers to use CAMLno_user_tsan?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We pushed a commit that puts CAMLno_asan back to where it was, guarded by CAML_INTERNALS. Regarding CAMLno_user_tsan, it is used by Hd_val defined in mlvalues.h, which is not guarded by CAML_INTERNALS, so we are forced to expose CAMLno_user_tsan and CAMLno_tsan as well.

Comment thread runtime/fiber.c Outdated
}


CAMLno_user_tsan /* Disable TSan reports from this function (see #11040) */

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My take-away from #11040 is that those TSan reports hide actual programming errors. In this case, the solution would not be to use CAMLno_user_tsan, but to fix them in a separate PR. (Idem for other mentions of #11040 below.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I failed to make this clear. Silencing those reports is only a temporary measure until the problems in #11040 are fixed, which we attempt to start doing in #12030.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the best solution to keep track of the CAMLno_user_tsans that should eventually be removed after fixing the programming errros? Perhaps with an issue with a check list? Or TODO comments in the code?

@OlivierNicole

OlivierNicole commented May 11, 2023

Copy link
Copy Markdown
Contributor Author

Here and in other related PRs, we see that there is a non-trivial interaction between the present PR and the current issues of the memory model.

  • We do not know yet whether something like Load_field will be introduced; if this is important for the soundness of ThreadSanitizer implementation, would it be useful to make progress on OCaml multicore memory model and C (runtime, FFI, VM) #10992 first?
    […]
  • What kind of examples should trigger the false positives you mentioned? I think these (non-)races can happen legitimately, and if they do not trigger false positives, it is probably interesting investigating why. I am not convinced yet that they will remain fringe occurrences.

To be precise about what can be meant by the word “soundness” that you use here: we are talking about false positives. False negatives from TSan have two well-defined sources—mentioned in the PR message—and I don’t think the problems with macros like Field interfere with this.

So far we only have examples of false positives we have found are in the OCaml runtime. For instance the GC marking function mark_stack_push_block can read block fields using the Field macro concurrently with another domain writing to the same field via caml_modify. At your instigation, the compiler maintainers (at least for now) have made Field perform a volatile read and do not consider this a race; if TSan does not take volatile into account, it will report a race there.

We just pushed a new version of this PR that treats volatile reads as relaxed atomic loads. This should remove these false positives.

  • In another PR you told me the strategy to deal with initializing writes (which is to not instrument them) does not work currently when those initializing writes are performed from C code. Is there a plan to remove instrumentation for those C-side initializations in the runtime?

Yes, I just pushed a commit that removes instrumentation from caml_initialize.

In addition, there might remain a problem for C-side initializations when done from user C code, as permitted by the FFI documentation.

That’s true. I don’t see a way to handle those properly. So this is a potential source of false positives.

@gasche

gasche commented Jul 12, 2023

Copy link
Copy Markdown
Member

What do we want to do with the commit history here? I would be in favor of not squashing (keeping separate commits when meaningful), but the current history is a bit messy.

@kayceesrk

Copy link
Copy Markdown
Contributor

Agree that the current history is messy showing how the PR evolved over time. I'm not against squashing everything into a single commit.

@OlivierNicole

Copy link
Copy Markdown
Contributor Author

Personally, I would prefer not to squash everything into a single commit.

Most of the commits result from the review process. If there are too many, maybe we can squash the small, non-critical changes into one of the big 5 initial commits, and keep separate the commits fixing bugs or adding features.

@gasche

gasche commented Jul 12, 2023

Copy link
Copy Markdown
Member

Indeed, for all commits that can be easily fixupped into one of the "main" commit, and tweaks the main commit instead of adding something new -- the right way to read the code is to pretend it had been there from the start -- fixupping them into the main commits is the right move. Then there are commits that make sense on their own (if you retold the story from scratch today, you could leave them out until later), and those can be kept. Finally, sometimes there are review commits that touch too many different things, or conflict with earlier commits you want to keep and it would be too painful to merge them properly; in this case it can be a compromise to keep them as-is and tolerate some history cruft to reduce your rebase work. You decide.

fabbing and others added 8 commits July 12, 2023 12:10
This flag is unused for now.

Co-authored-by: Olivier Nicole <olivier@chnik.fr>
Restore libunwind detection when TSan is enabled at configure time. This
is based on the commit b694e84 by Seb Hinderer (@shindere), which was
undone by ocaml#9948, with significant changes:

- libunwind detection is only attempted when tsan is enabled
- if tsan is enabled, libunwind is requested and not optional
- libunwind_include_dirs and libunwind_link_dirs become
  libunwind_cppflags and libunwind_ldflags, respectively
ThreadSanitizer support is in two parts: instrumentation of the
generated native code, and runtime support.

The Cmm instrumentation pass is in asmcomp/thread_sanitizer.ml[i]. The
new C file runtime/tsan.c handles the task of letting TSan know about
function entries and exits when raising exceptions or handling effects.
Finally, some of the instrumentation calls have to be inserted directly
into the assembly routines of runtime/amd64.S.

Co-authored-by: Fabrice Buoro <fabrice@tarides.com>
Co-authored-by: Anmol Sahoo <anmol.sahoo25@gmail>
- Add a new set of tests in testsuite/tsan/
- A small number of tests have to be disabled when --enable-tsan is set,
  due to the fact that a call tree of depth >64k causes the
  ThreadSanitizer runtime to crash. (This is a limitation on the
  ThreadSanitizer side.)
- Add the no-tsan action to ocamltest, in order to test whether
  --enable-tsan is set.

Co-authored-by: Olivier Nicole <olivier@chnik.fr>
TSan warns about data races on these functions. We reported those
warnings in ocaml#11040, and silence them to avoid facing users of
TSan with data race reports that are not related to their code.
This is done in two ways: either un-instrumenting those functions, or
adding their name in __tsan_default_suppressions in tsan.c.

Co-authored-by: Fabrice Buoro <fabrice@tarides.com>
The functions that we un-instrument are called often but should not
contain data races with user code.

Co-authored-by: Olivier Nicole <olivier@chnik.fr>
With ocamltest compiled with clang and TSan enabled, the second call to wait
never returns causing ocamltest to hang forever.
@fabbing

fabbing commented Jul 12, 2023

Copy link
Copy Markdown
Contributor

We've squashed the commits that addressed the reviews comments into the 6 main commits, which are atomic in their subject.
There is also a small commit for ocamltest that fixes an issue that arose when testing TSan, it was reviewed and improved by @shindere.

@gasche gasche merged commit c717b6b into ocaml:trunk Jul 12, 2023
@gasche

gasche commented Jul 12, 2023

Copy link
Copy Markdown
Member

Merged! Congratulations @OlivierNicole and @fabbing, this is tons of really nice work. Thanks also @damiendoligez for the wizard review.

@xavierleroy

Copy link
Copy Markdown
Contributor

A change in this PR breaks the CI that uses the address sanitizer and the UB sanitizer:

./configure CC=clang-13 CFLAGS=-O1 -fno-omit-frame-pointer -fsanitize=address -fsanitize-trap=bool,builtin,bounds,enum,nonnull-attribute,nullability,object-size,pointer-overflow,returns-nonnull-attribute,shift-exponent,unreachable --disable-stdlib-manpages --enable-dependency-generation
[...]
runtime/fail_nat.c:94:1: error: unknown attribute 'disable_sanitizer_instrumentation' ignored [-Werror,-Wunknown-attributes]
CAMLno_asan
^~~~~~~~~~~
./runtime/caml/misc.h:568:40: note: expanded from macro 'CAMLno_asan'
#    define CAMLno_asan __attribute__((disable_sanitizer_instrumentation))
                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before this PR, CAMLno_asan was defined as __attribute__((no_sanitize("address"))).

@fabbing

fabbing commented Jul 18, 2023

Copy link
Copy Markdown
Contributor

Indeed, sorry about that!
disable_sanitizer_instrumentation (https://releases.llvm.org/14.0.0/tools/clang/docs/AttributeReference.html#id356) is only available as of clang 14. And we should probably stick with no_sanitize_address for consistency with gcc and to prevent false positive reports.

diff --git a/runtime/caml/misc.h b/runtime/caml/misc.h
index f0f131ac70..4b6f9c2e5b 100644
--- a/runtime/caml/misc.h
+++ b/runtime/caml/misc.h
@@ -566,7 +566,7 @@ CAMLextern int caml_snwprintf(wchar_t * buf,
 #if defined(__has_feature)
 #  if __has_feature(address_sanitizer)
 #    undef CAMLno_asan
-#    define CAMLno_asan __attribute__((disable_sanitizer_instrumentation))
+#    define CAMLno_asan __attribute__((no_sanitize_address))
 #  endif
 #else
 #  if __SANITIZE_ADDRESS__

@xavierleroy

Copy link
Copy Markdown
Contributor

OK for no_sanitize_address. It's been supported by Clang since version 8, just like no_sanitize. But again the original definition __attribute__((no_sanitize("address"))) was just fine, right? Why change it?

@fabbing

fabbing commented Jul 18, 2023

Copy link
Copy Markdown
Contributor

I agree, no_sanitize("address") and no_sanitize_address are synonymous, so no_sanitize("address") was fine.
I couldn't find the reason for this change while digging through our commit history...

@OlivierNicole

OlivierNicole commented Jul 18, 2023

Copy link
Copy Markdown
Contributor Author

I remember making that change to CAMLno_tsan because it disabled any instrumentation (which is more efficient), whereas no_sanitize("thread") might not.

This is not the same as __attribute__((no_sanitize(...))), which depending on the tool may still insert instrumentation to prevent false positive reports.
Source: https://releases.llvm.org/14.0.0/tools/clang/docs/AttributeReference.html#disable-sanitizer-instrumentation

And I changed CAMLno_asan too, probably for consistency. I didn’t know that it only worked on recent clangs. Apologies. We should revert it.

@edwintorok

Copy link
Copy Markdown
Contributor

I agree, no_sanitize("address") and no_sanitize_address are synonymous, so no_sanitize("address") was fine.
I couldn't find the reason for this change while digging through our commit history...

FWIW it looks like newer GCCs (like GCC14) have now learned about __has_feature, but haven't implemented all of clang's attributes, so the defined (__has_feature) == clang assumption in the header is now wrong, and CAMLno_tsan doesn't work on recent GCC.

Switching the order of ifdefs seems to make it work on both gcc and clang, although if clang also supports the no_sanitize_address and no_sanitize_thread then perhaps dropping the clang/gcc ifdef machinery would future proof this (e.g. if clang decides to emulate GCC by defining __SANITIZE_ADDRESS__).

See https://discuss.ocaml.org/t/dune-3-15-3-ocaml-5-2-data-race-false-positive/14906/4?u=edwin for details, I'll probably open a PR to fix CAMLno_tsan (and asan) on GCC once I've done some more testing on various compiler versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.