-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve backtrace abstractions inside runtime #12383
Conversation
86a961a
to
10f435b
Compare
I tried to review this PR quickly and gave up / timed out because there are many changes together: there are various code style changes/improvements all over the place (which make for a relatively large diff), plus some actual API changes. Here is what I understood by looking at the current PR and also #12379:
My gut feeling for how this PR could be made easier to review:
|
I will refactor this branch into a few commits as you suggest. Although it is true that in #12379 I change the strategy of storing the backtrace for an allocation callback (from a shared-heap allocation to a C heap allocation), that's largely independent of this PR, which is about adding two pieces of functionality not present on trunk but needed for statmemprof:
In adding functionality (1) I chose to also refactor the code at the heart of backtracing, so that it is shared between the new function and the two existing use cases (a) and (b), and so that it avoids the ugly and potentially slow double stack iteration (in the traditional way, using a dynamic size-doubling array). This is similar to the way it was done for statmemprof in 4.12 by @stedolan, but some additional complications in multicore (such as fibers) mean that code doesn't port directly across (and result in the complex "candy-machine" interface in this PR). The implementation of (2) is a direct steal from 4.14 (although I couldn't do it with a cherry-pick as too much else has changed). 4.14 does (1) using a single large backtrace buffer in As I say, I'll restructure this PR into separate commits for these distinct items (and maybe lose some of the minor stylistic changes on the way). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I wrote some comments while looking at the code, but forgot to commit/publish them, sorry.)
That sounds good. I'm not asking for the code to be exactly similar to 4.14 (it's not like it was perfect in the first place), but I was planning to review actual behavior changes by doing a three-way compare between 4.14, trunk and your PR: it helps if they are clearly separated and, when you did things differently from 4.14, you can briefly mention why (in the PR discussion, this needs not be a code comment). |
(I'm generally supportive of changes that add comments and clarify the codebase in a way that makes it easier for future people to understand. It's best if they are clearly marked as such and separated from the rest, because it is a very different review mindset from behavior changes. Then some things such as whether we use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have previously reviewed the "trial PR" proposing these changes (NickBarnes#1), and I think this refactoring looks good.
I have since been informed that it breaks compilation with --enable-tsan
, so I propose these minor changes to make it compatible:
diff --git a/runtime/backtrace_nat.c b/runtime/backtrace_nat.c
index 6a376ffcab..62b0c91c43 100644
--- a/runtime/backtrace_nat.c
[fabrice@t15p ocaml]$ git diff > diff.txt
[fabrice@t15p ocaml]$ cat diff.txt
diff --git a/runtime/backtrace_nat.c b/runtime/backtrace_nat.c
index 6a376ffcab..62b0c91c43 100644
--- a/runtime/backtrace_nat.c
+++ b/runtime/backtrace_nat.c
@@ -34,7 +34,7 @@
/* Returns the next frame descriptor (or NULL if none is available),
and updates *pc and *sp to point to the following one. */
-static frame_descr *next_frame_descriptor
+frame_descr *next_frame_descriptor
(caml_frame_descrs fds, uintnat *pc, char **sp, struct stack_info *stack)
{
frame_descr *d;
diff --git a/runtime/caml/frame_descriptors.h b/runtime/caml/frame_descriptors.h
index 71142a5550..4d10b54263 100644
--- a/runtime/caml/frame_descriptors.h
+++ b/runtime/caml/frame_descriptors.h
@@ -149,7 +149,7 @@ caml_frame_descrs caml_get_frame_descrs(void);
frame_descr* caml_find_frame_descr(caml_frame_descrs fds, uintnat pc);
-frame_descr * caml_next_frame_descriptor
+frame_descr * next_frame_descriptor
(caml_frame_descrs fds, uintnat * pc, char ** sp, struct stack_info* stack);
#endif /* CAML_INTERNALS */
diff --git a/runtime/tsan.c b/runtime/tsan.c
index 348d761b99..0e10d63ed3 100644
--- a/runtime/tsan.c
+++ b/runtime/tsan.c
@@ -130,7 +130,7 @@ void caml_tsan_exit_on_raise(uintnat pc, char* sp, char* trapsp)
/* iterate on each frame */
while (1) {
- frame_descr* descr = caml_next_frame_descriptor(fds, &next_pc, &sp,
+ frame_descr* descr = next_frame_descriptor(fds, &next_pc, &sp,
domain_state->current_stack);
if (descr == NULL) {
@@ -214,7 +214,7 @@ void caml_tsan_exit_on_perform(uintnat pc, char* sp)
/* iterate on each frame */
while (1) {
- frame_descr* descr = caml_next_frame_descriptor(fds, &next_pc, &sp, stack);
+ frame_descr* descr = next_frame_descriptor(fds, &next_pc, &sp, stack);
caml_tsan_debug_log_pc("forced__tsan_func_exit for", pc);
__tsan_func_exit(NULL);
@@ -243,7 +243,7 @@ CAMLreally_no_tsan void caml_tsan_entry_on_resume(uintnat pc, char* sp,
caml_frame_descrs fds = caml_get_frame_descrs();
uintnat next_pc = pc;
- caml_next_frame_descriptor(fds, &next_pc, &sp, (struct stack_info*)stack);
+ next_frame_descriptor(fds, &next_pc, &sp, (struct stack_info*)stack);
if (next_pc == 0) {
stack = stack->handler->parent;
if (!stack) {
10f435b
to
a5c4de9
Compare
I've redone this whole PR, breaking it into a few separate commits with distinct effects. I've dropped the candy-machine version of |
Building the compiler with ThreadSanitizer and running the testsuite caused too many reports in OCaml 5 and was disabled (see ocaml#11040). Since then, the work on TSan support for OCaml programs has led to fix a number of those data races and temporarily silence the ones that are waiting to be investigated (see ocaml#11040 again). As a result, running the testsuite with `--enable-tsan` is now a cheap and effective way of detecting new data races that may be introduced in the runtime . A second good reason to restore the TSan CI is that it will detect early if a recent change has accidentally broken TSan instrumentation (as has happened before as an accidental consequence of removing a symbol ocaml#12383 (review)), or other issues (e.g. a new test revealed a TSan limitation with signals ocaml#12561 (comment)). Adding this test to the Github Actions CI arguably lengthens the runs a bit much (a GHA run on amd64 Linux takes about 50 minutes). A good compromise seems to be the Inria CI which is run on every merge.
…caused too many reports in OCaml 5 and was disabled (see ocaml#11040). Since then, the work on TSan support for OCaml programs has led to fix a number of those data races and temporarily silence the ones that are waiting to be investigated (see ocaml#11040 again). As a result, running the testsuite with `--enable-tsan` is now a cheap and effective way of detecting new data races that may be introduced in the runtime. A second good reason to restore the TSan CI is that it will detect early if a recent change has accidentally broken TSan instrumentation (as has happened before as an accidental consequence of removing a symbol ocaml#12383 (review)), or other issues (e.g. a new test revealed a TSan limitation with signals ocaml#12561 (comment)). Adding this test to the Github Actions CI arguably lengthens the runs (a GHA run on amd64 Linux with TSan takes about 50 minutes). This PR therefore suggests the compromise of enabling it on the Inria CI which is run on every merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get all the details, but the new changes look good to me... and also, TSan still compiles!
…caused too many reports in OCaml 5 and was disabled (see ocaml#11040). Since then, the work on TSan support for OCaml programs has led to fix a number of those data races and temporarily silence the ones that are waiting to be investigated (see ocaml#11040 again). As a result, running the testsuite with `--enable-tsan` is now a cheap and effective way of detecting new data races that may be introduced in the runtime. A second good reason to restore the TSan CI is that it will detect early if a recent change has accidentally broken TSan instrumentation (as has happened before as an accidental consequence of removing a symbol ocaml#12383 (review)), or other issues (e.g. a new test revealed a TSan limitation with signals ocaml#12561 (comment)). Adding this test to the Github Actions CI arguably lengthens the runs (a GHA run on amd64 Linux with TSan takes about 50 minutes). This PR therefore suggests the compromise of enabling it on the Inria CI which is run on every merge.
9686aa7
to
9a019bb
Compare
I've rebased this, and it's approved by @fabbing. Can we merge it? |
* Building the compiler with ThreadSanitizer and running the testsuite caused too many reports in OCaml 5 and was disabled (see #11040). Since then, the work on TSan support for OCaml programs has led to fix a number of those data races and temporarily silence the ones that are waiting to be investigated (see #11040 again). As a result, running the testsuite with `--enable-tsan` is now a cheap and effective way of detecting new data races that may be introduced in the runtime. A second good reason to restore the TSan CI is that it will detect early if a recent change has accidentally broken TSan instrumentation (as has happened before as an accidental consequence of removing a symbol #12383 (review)), or other issues (e.g. a new test revealed a TSan limitation with signals #12561 (comment)). Adding this test to the Github Actions CI arguably lengthens the runs (a GHA run on amd64 Linux with TSan takes about 50 minutes). This PR therefore suggests the compromise of enabling it on the Inria CI which is run on every merge. * Disable tests parallel/catch_break with tsan * CI sanitizers: Use clang 14 clang 13 thread sanitizer produces different, less precise traces. Also, clang 14 is the default version in Ubuntu 22.04 LTS. --------- Co-authored-by: Xavier Leroy <xavier.leroy@college-de-france.fr>
c14cd70
to
d4290e9
Compare
rebased. |
comballoc allocations.
iterate over the stack once instead of twice, dynamically reallocating the backtrace as required. Allow it to take an existing backtrace buffer and allocated size in arguments. Together, this will allow statmemprof to collect backtraces efficiently and reuse the same buffer for multiple backtraces without statically allocating a large buffer per domain. Also tidied up some variable names for consistency ("backtrace" and "slots", not "frames" - the distinction is important for the native backend; changed the bytecode backend for consistency).
runtimes. This may be used by statmemprof to efficiently capture stacks at allocation points.
b753991
to
1ab959d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks @NickBarnes
In the trunk runtime, the backtrace API allows backtraces to be obtained either as (a) a single per-domain buffer, reserved for the "backtrace at the last exception raise", or (b) an object on the Caml heap, reflecting the current backtrace. This is insufficient for statmemprof, which needs the current backtrace at arbitrary allocation points, when Caml heap allocation might not be possible (or might change the backtrace, by garbage collection). So this branch changes the
backtrace.h
abstraction by addingcaml_get_callstack()
.I took the opportunity to also improve the code which actually iterates over the stack gathering backtrace entries—in each runtime (native-code and bytecode) there's now a single piece of code which does this, and it only iterates over the stack once. This key function (
get_callstack()
in bothbacktrace_nat.c
andbacktrace_byt.c
) has a rather ugly ("candy machine") interface, because it turns out that the semantics of this are pretty twisty, depending for instance on whether the callstack should trace up into parent stack fibers or just to the current fiber, which depends (in the native-code runtime) on exactly why we want a backtrace. I'm not sure the current semantics are all that great but I believe I've preserved them.I was tempted to improve the abstraction further, by making a new type (
callstack_t
) of C-heap-allocated backtrace objects which have slots for some metadata such as frame count (number of live entries) and size (for re-sizing), and support a small number of operations. That would allow this code to be further improved, removing some duplicated code in each runtime and also moving some source code frombacktrace_nat.c
andbacktrace_byt.c
intobacktrace.c
. However, I've deferred that for now.This is one of a number of small PRs working towards restoring statmemprof on trunk. The large (unmergeable) PR #12379 shows where these are heading.