Add manual chapter about ThreadSanitizer support #12802

OlivierNicole · 2023-12-01T17:02:01Z

This adds documentation of the compiler support for ThreadSanitizer, its usage, what can be expected in terms of correctness, the guidelines to follow when linking with external code, and a few caveats to be aware of.

It attempts to be complete (documenting some facts that the sparse documentation available about TSan does not specify) and to give a few examples.

Co-authored-by: Fabrice Buoro <fabrice@tarides.com>

manual/src/cmds/tsan.etex

Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com>

shindere · 2023-12-01T17:38:36Z

I find the work really great, thanks! This sentence may be wrong: Suppressing data races is useful for intentional races or libraries that cannot be fixed Shouldn't it say something like suppressing reports, rather than suppressing data rraces?

shindere · 2023-12-01T17:42:23Z

When you say C code interacting with OCaml should always be built through the \texttt{ocamlc} or \texttt{ocamlopt} commands, I'd suggest to refer only to ocamlopt because since only native programs can be instrumented, it makes no sense to me to mention ocamlc here. Especially givne that, hopefully, only ocamlopt will use the TSan related C flags.

shindere · 2023-12-01T17:44:41Z

Finally, IIUC the history size is related to what is mentionned earlier in the chapter about TSan remembering only the 4 last accesses, but hte documentation does not make that link clear. Perhaps it could?

fabbing

(Reviewed synchronously with @OlivierNicole)

gasche

There is no discussion of the performance overhead at all in this chapter, I think there should be one. What I expect as a reader is to get a ballpark of performance overhead: is this thing designed to make my program 20% slower, or 5x, or 40x? You could tell me about an actual overhead observed on a real program, also maybe an interval of slowdown observed in extremal (but still representative of real program) cases, and say a word on what sort of workloads get a smaller or larger overhead.
What I vaguely remember from our discussions is "typically 3-5x slower, but there is no overhead if you don't use mutable state and it can be up to 40x slower if you use it a lot". Something like that, but with numbers that I didn't just make up, would be useful to guide people decisions on when and how to use this feature.

Relatedly: what is the recommended development workflow for users? Is the idea that they typically do all their multicore hacking in a TSan-enabled switch, or that they typically program in a non-TSan switch and then occasionally move to a TSan switch for debugging?

gasche · 2023-12-02T14:32:28Z

manual/src/cmds/tsan.etex

+\end{verbatim}
+
+Increasing the history size can sometimes be necessary to obtain the second
+stack trace.


(Is there a performance cost to doing that?)

Yes, I added a mention to that effect in cbe7759.

gasche · 2023-12-02T14:44:54Z

Other questions:

If I build a program with a TSan-enabled compiler, is it possible to run it without any alarms at all? Is there a way to do this that reduces the performance overhead to something very small?
We found that it was useful in the runtime codebase to systematically add a comment on each CAMLno_tsan function to explain why instrumentation is disabled. Maybe you could suggest this as well? I would be interested also in a short explanation of why one may want to disable instrumentation for certain functions. (I guess: "to reduce the performance overhead in certain corner cases" and "to silence alarms about real but already-known issues to avoid hiding new alarms in the noise"; is "benine races" ever a valid reason or should people use relaxed atomics instead?)
Is there a way to disable instrumentation in the OCaml code? (An attribute?) If not, I think it would be useful to say so explicitly.

OlivierNicole · 2023-12-04T14:52:28Z

Thanks to all the reviewers for their reviews. I pushed cbe7759 to take your comments into account.

Finally, IIUC the history size is related to what is mentionned earlier
in the chapter about TSan remembering only the 4 last accesses, but hte
documentation does not make that link clear. Perhaps it could?

In fact, these are two different things. I tried to clarify this in the paragraph discussing history_size.

I added a section about performance implications.

Relatedly: what is the recommended development workflow for users? Is the idea that they typically do all their multicore hacking in a TSan-enabled switch, or that they typically program in a non-TSan switch and then occasionally move to a TSan switch for debugging?

From what I observed, it depends on how much performance matters during development, but a typical workflow is to develop on a regular switch and periodically run the test suite in a TSan-enabled switch.

If I build a program with a TSan-enabled compiler, is it possible to run it without any alarms at all? Is there a way to do this that reduces the performance overhead to something very small?

TSAN_OPTIONS=report_bugs=0 turns off alarms entirely, but there is no way to reduce the performance overhead.

We found that it was useful in the runtime codebase to systematically add a comment on each CAMLno_tsan function to explain why instrumentation is disabled. Maybe you could suggest this as well? I would be interested also in a short explanation of why one may want to disable instrumentation for certain functions. (I guess: "to reduce the performance overhead in certain corner cases" and "to silence alarms about real but already-known issues to avoid hiding new alarms in the noise";

I try to explain this in the new version.

is "benine races" ever a valid reason or should people use relaxed atomics instead?)

People should use relaxed atomics unless they are comfortable with UB.

manual/src/cmds/tsan.etex

Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com>

Co-authored-by: Antonin Décimo <antonin@tarides.com>

manual/src/cmds/tsan.etex

kayceesrk · 2023-12-06T05:54:29Z

manual/src/cmds/tsan.etex

+OCaml, since version 5.0, allows shared-memory parallelism and thus mutation of
+shared data. This creates the possibility of data races, i.e., unordered
+accesses to the same data (one of them being a write). Data races can be
+dangerous as they are easy to miss and capable of causing unexpected results.


I don't like the use of the word "dangerous". Data races are "dangerous" in C/C++ because the semantics is undefined. In OCaml, data races only lead to more non-determinism than found under SC. Without data races, the program will have SC semantics but still have non-determinism due to thread interleaving. Would you call this non-determinism "dangerous"?

How about, "In OCaml, data races are easy to introduce, and the behaviour of programs with data races can be unintuitive -- the observed behaviours cannot be explained by simply interleaving operations from different concurrent threads".

I have a different take on this.

Currently most non-persistent data structures in OCaml are not thread-safe or domain-safe, so using them in a multithreaded context is a bug that is very likely to (silent) incorrect program behavior. Embedding protections against this in the data-structure themselves -- so that it fails loudly on such incorrect usage -- is not easy and not standard practice. TSan is currently our best chance of detecting when we mistakenly do this.

So I would treat a data race as "dangerous" because it very likely indicates that a part of the program is used outside its specification, in the "unspecified behavior" or even possibly "a segfault because people used unsafe programming patterns that were fine for OCaml 4 and not correct for Multicore" (as we have seen in Buffer, Dynarray, etc.).

If I understand correctly, your mental model is that benign races on non-atomic locations are fine, because they are not UB. I have a stricter mental model in mind where they signal a programming bug (despite preserving safety) and should be avoided. A counterpoint is that we don't have relaxed atomics to express benign races -- I think that we could and probably should have something like that, some way for programmers to express their intent to perform benign races. But benign races are very rare anyway, probably almost non-existent in idiomatic high-level code.

I agree in part with both of you. Data races produce unintuitive behaviour, and that unintuitive behaviour can definitely be dangerous if it is unexpected. That being said,

"In OCaml, data races are easy to introduce, and the behaviour of programs with data races can be unintuitive -- the observed behaviours cannot be explained by simply interleaving operations from different concurrent threads".

looks fine to me.

A counterpoint is that we don't have relaxed atomics to express benign races -- I think that we could and probably should have something like that, some way for programmers to express their intent to perform benign races.

I disagree on this point, as to me non-atomic accesses in OCaml already fill that role, as I have expressed in a previous discussion. A non-atomic access, properly commented and marked as [@no_tsan]—if one day we introduce that attribute—is what I’d want to use.

That being said,

"In OCaml, data races are easy to introduce, and the behaviour of programs with data races can be unintuitive -- the observed behaviours cannot be explained by simply interleaving operations from different concurrent threads".

looks fine to me.

I use this phrasing in the new version, let me know if this works for everyone!

I agree with the first 2 paragraphs, but that does not mean "data races are dangerous". This phrase has a particular connotation for C/C++ (and Rust) programmers, where having a data race is a catastrophic bug by itself -- your program has undefined semantics. This is not the case with OCaml. However, as you suggest, the presence of a data race indicates that a part of the program is likely used outside of its specification. Here data race itself is not at fault, but is an indication of issue elsewhere.

The question is whether we need to highlight this distinction here. I feel that we should. TSan is a dynamic data race detector. I don't want the reader to take away that "Data races are dangerous and OCaml TSan does not rule out all data races. So parallel OCaml programs in practice are likely to have data races. So most parallel OCaml program have undefined semantics due to data races."

We've worked hard to establish a result where we provide well-defined semantics for programs with data races. Calling "data races are dangerous" seems to throw this away entirely.

manual/src/cmds/tsan.etex

kayceesrk

I've left a number of comments. Ok to merge when they're addressed.

OlivierNicole · 2023-12-06T11:50:49Z

Thank you very much for your thorough review, I believe I have addressed it in my last commits.

Add manual page about ThreadSanitizer support

2bdfb02

Co-authored-by: Fabrice Buoro <fabrice@tarides.com>

dustanddreams approved these changes Dec 1, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

Improve wording

6aea866

Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com>

OlivierNicole changed the title ~~Add manual page about ThreadSanitizer support~~ Add manual chapter about ThreadSanitizer support Dec 1, 2023

kayceesrk added the thread-sanitizer Related to data races, thread sanitizer label Dec 2, 2023

fabbing approved these changes Dec 2, 2023

View reviewed changes

gasche reviewed Dec 2, 2023

View reviewed changes

Address comments from reviewers

cbe7759

dustanddreams reviewed Dec 4, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

OlivierNicole and others added 2 commits December 4, 2023 16:11

Fix typo

2e4ecb3

Co-authored-by: Miod Vallat <118974489+dustanddreams@users.noreply.github.com>

Fix typo

515913e

Co-authored-by: Antonin Décimo <antonin@tarides.com>

OlivierNicole force-pushed the tsan_manual branch from 5848113 to 515913e Compare December 5, 2023 13:50

Add Changes entry

6093e1d

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Outdated Show resolved Hide resolved

kayceesrk reviewed Dec 6, 2023

View reviewed changes

manual/src/cmds/tsan.etex Show resolved Hide resolved

kayceesrk approved these changes Dec 6, 2023

View reviewed changes

OlivierNicole added 2 commits December 6, 2023 12:42

Address review by @kayceesrk

7d7f47e

Update Changes

4993e1e

gasche merged commit f8d4eb2 into ocaml:trunk Dec 6, 2023
9 checks passed

OlivierNicole deleted the tsan_manual branch December 6, 2023 14:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add manual chapter about ThreadSanitizer support #12802

Add manual chapter about ThreadSanitizer support #12802

OlivierNicole commented Dec 1, 2023 •

edited

shindere commented Dec 1, 2023 via email

shindere commented Dec 1, 2023 via email

shindere commented Dec 1, 2023 via email

fabbing left a comment

gasche left a comment •

edited

gasche Dec 2, 2023

OlivierNicole Dec 4, 2023

gasche commented Dec 2, 2023

OlivierNicole commented Dec 4, 2023

kayceesrk Dec 6, 2023

gasche Dec 6, 2023

OlivierNicole Dec 6, 2023

OlivierNicole Dec 6, 2023

kayceesrk Dec 6, 2023 •

edited

kayceesrk left a comment

OlivierNicole commented Dec 6, 2023

Add manual chapter about ThreadSanitizer support #12802

Add manual chapter about ThreadSanitizer support #12802

Conversation

OlivierNicole commented Dec 1, 2023 • edited

shindere commented Dec 1, 2023 via email

shindere commented Dec 1, 2023 via email

shindere commented Dec 1, 2023 via email

fabbing left a comment

Choose a reason for hiding this comment

gasche left a comment • edited

Choose a reason for hiding this comment

gasche Dec 2, 2023

Choose a reason for hiding this comment

OlivierNicole Dec 4, 2023

Choose a reason for hiding this comment

gasche commented Dec 2, 2023

OlivierNicole commented Dec 4, 2023

kayceesrk Dec 6, 2023

Choose a reason for hiding this comment

gasche Dec 6, 2023

Choose a reason for hiding this comment

OlivierNicole Dec 6, 2023

Choose a reason for hiding this comment

OlivierNicole Dec 6, 2023

Choose a reason for hiding this comment

kayceesrk Dec 6, 2023 • edited

Choose a reason for hiding this comment

kayceesrk left a comment

Choose a reason for hiding this comment

OlivierNicole commented Dec 6, 2023

OlivierNicole commented Dec 1, 2023 •

edited

gasche left a comment •

edited

kayceesrk Dec 6, 2023 •

edited