Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve backtrace abstractions inside runtime #12383

Merged
merged 6 commits into from
Nov 24, 2023

Conversation

NickBarnes
Copy link
Contributor

@NickBarnes NickBarnes commented Jul 17, 2023

In the trunk runtime, the backtrace API allows backtraces to be obtained either as (a) a single per-domain buffer, reserved for the "backtrace at the last exception raise", or (b) an object on the Caml heap, reflecting the current backtrace. This is insufficient for statmemprof, which needs the current backtrace at arbitrary allocation points, when Caml heap allocation might not be possible (or might change the backtrace, by garbage collection). So this branch changes the backtrace.h abstraction by adding caml_get_callstack().

I took the opportunity to also improve the code which actually iterates over the stack gathering backtrace entries—in each runtime (native-code and bytecode) there's now a single piece of code which does this, and it only iterates over the stack once. This key function (get_callstack() in both backtrace_nat.c and backtrace_byt.c) has a rather ugly ("candy machine") interface, because it turns out that the semantics of this are pretty twisty, depending for instance on whether the callstack should trace up into parent stack fibers or just to the current fiber, which depends (in the native-code runtime) on exactly why we want a backtrace. I'm not sure the current semantics are all that great but I believe I've preserved them.

I was tempted to improve the abstraction further, by making a new type (callstack_t) of C-heap-allocated backtrace objects which have slots for some metadata such as frame count (number of live entries) and size (for re-sizing), and support a small number of operations. That would allow this code to be further improved, removing some duplicated code in each runtime and also moving some source code from backtrace_nat.c and backtrace_byt.c into backtrace.c. However, I've deferred that for now.

This is one of a number of small PRs working towards restoring statmemprof on trunk. The large (unmergeable) PR #12379 shows where these are heading.

@kayceesrk kayceesrk added the statmemprof PRs and issues related to statmemprof label Jul 18, 2023
@gasche
Copy link
Member

gasche commented Aug 21, 2023

I tried to review this PR quickly and gave up / timed out because there are many changes together: there are various code style changes/improvements all over the place (which make for a relatively large diff), plus some actual API changes.

Here is what I understood by looking at the current PR and also #12379:

  1. the 4.14 statmemprof implementation solves the problem of "sometimes you cannot call caml_alloc" by exposing a variant that calls caml_alloc_shr (allocates on the major heap, does not run a GC). Statmemprof for multicore [do not merge] #12379 uses a different strategy where the variant calls caml_stat_alloc, and then we copy the data to a caml_alloc block at the point where we decide to call the callback that was postponed. I am not sure why you are using a different strategy (is the previous strategy not valid anymore? is it related to some performance concerns? an issue with the domain calling the callback not necessarily the same as the one recording the callstack? sometehing else?)

  2. This PR contains minor refactorings but also cherry-picking of changes of 4.14 that were not applied in the Multicore branch -- for example various Slot_* macros and the alloc_idx business. Sometimes the changes are directly copied from the 4.14 branch, sometimes they are reimplemented differently.

My gut feeling for how this PR could be made easier to review:

  • no minor style changes, or minor style changes in separate commits that are clearly marked as preserving the behavior exactly (we can review them without thinking much)
  • make it easy to review the new non-minor-change code by comparing to 4.14 -- you can point out the places where changes were needed and briefly mention why

@NickBarnes
Copy link
Contributor Author

I will refactor this branch into a few commits as you suggest.

Although it is true that in #12379 I change the strategy of storing the backtrace for an allocation callback (from a shared-heap allocation to a C heap allocation), that's largely independent of this PR, which is about adding two pieces of functionality not present on trunk but needed for statmemprof:

  1. obtaining a backtrace from the current callstack which is neither (a) stored in the single (per-domain) most-recent-exception backtrace buffer (as caml_stash_backtrace does) nor (b) allocated on the Caml minor heap (as caml_get_current_callstack and caml_get_continuation_callstack do); and
  2. allowing a backtrace to include the debuginfo for an allocation point, as the slot at the hot end of the backtrace, without changing the Printexc Caml interface. This allows statmemprof to accurately profile "Comballoc" allocations.

In adding functionality (1) I chose to also refactor the code at the heart of backtracing, so that it is shared between the new function and the two existing use cases (a) and (b), and so that it avoids the ugly and potentially slow double stack iteration (in the traditional way, using a dynamic size-doubling array). This is similar to the way it was done for statmemprof in 4.12 by @stedolan, but some additional complications in multicore (such as fibers) mean that code doesn't port directly across (and result in the complex "candy-machine" interface in this PR).

The implementation of (2) is a direct steal from 4.14 (although I couldn't do it with a cherry-pick as too much else has changed).

4.14 does (1) using a single large backtrace buffer in memprof.c (called callstack_buffer). Obviously this isn't going to work for multicore, so rather than have a large per-domain static buffer I have a per-domain dynamically-grown buffer. But that's for a later PR.

As I say, I'll restructure this PR into separate commits for these distinct items (and maybe lose some of the minor stylistic changes on the way).

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I wrote some comments while looking at the code, but forgot to commit/publish them, sorry.)

runtime/backtrace_nat.c Outdated Show resolved Hide resolved
runtime/backtrace_nat.c Show resolved Hide resolved
@gasche
Copy link
Member

gasche commented Aug 22, 2023

That sounds good. I'm not asking for the code to be exactly similar to 4.14 (it's not like it was perfect in the first place), but I was planning to review actual behavior changes by doing a three-way compare between 4.14, trunk and your PR: it helps if they are clearly separated and, when you did things differently from 4.14, you can briefly mention why (in the PR discussion, this needs not be a code comment).

@gasche
Copy link
Member

gasche commented Aug 22, 2023

(I'm generally supportive of changes that add comments and clarify the codebase in a way that makes it easier for future people to understand. It's best if they are clearly marked as such and separated from the rest, because it is a very different review mindset from behavior changes. Then some things such as whether we use !d or d != NULL are mostly noise from my perspective: I would support keeping the codebase consistent in this respect and I can tell that you have the C experience to make informed changes there, but I'm not terribly excited when they show up in a review diff.)

Copy link
Contributor

@fabbing fabbing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have previously reviewed the "trial PR" proposing these changes (NickBarnes#1), and I think this refactoring looks good.

I have since been informed that it breaks compilation with --enable-tsan, so I propose these minor changes to make it compatible:

diff --git a/runtime/backtrace_nat.c b/runtime/backtrace_nat.c
index 6a376ffcab..62b0c91c43 100644
--- a/runtime/backtrace_nat.c
[fabrice@t15p ocaml]$ git diff > diff.txt
[fabrice@t15p ocaml]$ cat diff.txt 
diff --git a/runtime/backtrace_nat.c b/runtime/backtrace_nat.c
index 6a376ffcab..62b0c91c43 100644
--- a/runtime/backtrace_nat.c
+++ b/runtime/backtrace_nat.c
@@ -34,7 +34,7 @@
 
 /* Returns the next frame descriptor (or NULL if none is available),
    and updates *pc and *sp to point to the following one.  */
-static frame_descr *next_frame_descriptor
+frame_descr *next_frame_descriptor
     (caml_frame_descrs fds, uintnat *pc, char **sp, struct stack_info *stack)
 {
   frame_descr *d;
diff --git a/runtime/caml/frame_descriptors.h b/runtime/caml/frame_descriptors.h
index 71142a5550..4d10b54263 100644
--- a/runtime/caml/frame_descriptors.h
+++ b/runtime/caml/frame_descriptors.h
@@ -149,7 +149,7 @@ caml_frame_descrs caml_get_frame_descrs(void);
 frame_descr* caml_find_frame_descr(caml_frame_descrs fds, uintnat pc);
 
 
-frame_descr * caml_next_frame_descriptor
+frame_descr * next_frame_descriptor
     (caml_frame_descrs fds, uintnat * pc, char ** sp, struct stack_info* stack);
 
 #endif /* CAML_INTERNALS */
diff --git a/runtime/tsan.c b/runtime/tsan.c
index 348d761b99..0e10d63ed3 100644
--- a/runtime/tsan.c
+++ b/runtime/tsan.c
@@ -130,7 +130,7 @@ void caml_tsan_exit_on_raise(uintnat pc, char* sp, char* trapsp)
 
   /* iterate on each frame  */
   while (1) {
-    frame_descr* descr = caml_next_frame_descriptor(fds, &next_pc, &sp,
+    frame_descr* descr = next_frame_descriptor(fds, &next_pc, &sp,
         domain_state->current_stack);
 
     if (descr == NULL) {
@@ -214,7 +214,7 @@ void caml_tsan_exit_on_perform(uintnat pc, char* sp)
 
   /* iterate on each frame  */
   while (1) {
-    frame_descr* descr = caml_next_frame_descriptor(fds, &next_pc, &sp, stack);
+    frame_descr* descr = next_frame_descriptor(fds, &next_pc, &sp, stack);
 
     caml_tsan_debug_log_pc("forced__tsan_func_exit for", pc);
     __tsan_func_exit(NULL);
@@ -243,7 +243,7 @@ CAMLreally_no_tsan void caml_tsan_entry_on_resume(uintnat pc, char* sp,
   caml_frame_descrs fds = caml_get_frame_descrs();
   uintnat next_pc = pc;
 
-  caml_next_frame_descriptor(fds, &next_pc, &sp, (struct stack_info*)stack);
+  next_frame_descriptor(fds, &next_pc, &sp, (struct stack_info*)stack);
   if (next_pc == 0) {
     stack = stack->handler->parent;
     if (!stack) {

@NickBarnes
Copy link
Contributor Author

I've redone this whole PR, breaking it into a few separate commits with distinct effects. I've dropped the candy-machine version of get_callstack(), which was only really present to support caml_stash_callstack() - because in that instance (callstack for the most recent exception) there are several unique requirements (it doesn't necessarily start at the hot end of the stack, it doesn't necessarily start at the beginning of the returned buffer, and it doesn't trace up into parent fibers).

OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Oct 9, 2023
Building the compiler with ThreadSanitizer and running the testsuite caused too
many reports in OCaml 5 and was disabled (see ocaml#11040).

Since then, the work on TSan support for OCaml programs has led to fix a number
of those data races and temporarily silence the ones that are waiting to be
investigated (see ocaml#11040 again). As a result, running the testsuite with
`--enable-tsan` is now a cheap and effective way of detecting new data races
that may be introduced in the runtime .

A second good reason to restore the TSan CI is that it will detect early if a
recent change has accidentally broken TSan instrumentation (as has happened
before as an accidental consequence of removing a symbol
ocaml#12383 (review)), or
other issues (e.g. a new test revealed a TSan limitation with signals
ocaml#12561 (comment)).

Adding this test to the Github Actions CI arguably lengthens the runs a bit
much (a GHA run on amd64 Linux takes about 50 minutes). A good compromise seems
to be the Inria CI which is run on every merge.
OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Oct 9, 2023
…caused too

many reports in OCaml 5 and was disabled (see ocaml#11040).

Since then, the work on TSan support for OCaml programs has led to fix a number
of those data races and temporarily silence the ones that are waiting to be
investigated (see ocaml#11040 again). As a result, running the testsuite with
`--enable-tsan` is now a cheap and effective way of detecting new data races
that may be introduced in the runtime.

A second good reason to restore the TSan CI is that it will detect early if a
recent change has accidentally broken TSan instrumentation (as has happened
before as an accidental consequence of removing a symbol
ocaml#12383 (review)), or
other issues (e.g. a new test revealed a TSan limitation with signals
ocaml#12561 (comment)).

Adding this test to the Github Actions CI arguably lengthens the runs (a GHA
run on amd64 Linux with TSan takes about 50 minutes). This PR therefore
suggests the compromise of enabling it on the Inria CI which is run on every
merge.
Copy link
Contributor

@fabbing fabbing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get all the details, but the new changes look good to me... and also, TSan still compiles!

OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Oct 13, 2023
…caused too

many reports in OCaml 5 and was disabled (see ocaml#11040).

Since then, the work on TSan support for OCaml programs has led to fix a number
of those data races and temporarily silence the ones that are waiting to be
investigated (see ocaml#11040 again). As a result, running the testsuite with
`--enable-tsan` is now a cheap and effective way of detecting new data races
that may be introduced in the runtime.

A second good reason to restore the TSan CI is that it will detect early if a
recent change has accidentally broken TSan instrumentation (as has happened
before as an accidental consequence of removing a symbol
ocaml#12383 (review)), or
other issues (e.g. a new test revealed a TSan limitation with signals
ocaml#12561 (comment)).

Adding this test to the Github Actions CI arguably lengthens the runs (a GHA
run on amd64 Linux with TSan takes about 50 minutes). This PR therefore
suggests the compromise of enabling it on the Inria CI which is run on every
merge.
@NickBarnes
Copy link
Contributor Author

I've rebased this, and it's approved by @fabbing. Can we merge it?

xavierleroy added a commit that referenced this pull request Oct 17, 2023
* Building the compiler with ThreadSanitizer and running the testsuite caused too
many reports in OCaml 5 and was disabled (see #11040).

Since then, the work on TSan support for OCaml programs has led to fix a number
of those data races and temporarily silence the ones that are waiting to be
investigated (see #11040 again). As a result, running the testsuite with
`--enable-tsan` is now a cheap and effective way of detecting new data races
that may be introduced in the runtime.

A second good reason to restore the TSan CI is that it will detect early if a
recent change has accidentally broken TSan instrumentation (as has happened
before as an accidental consequence of removing a symbol
#12383 (review)), or
other issues (e.g. a new test revealed a TSan limitation with signals
#12561 (comment)).

Adding this test to the Github Actions CI arguably lengthens the runs (a GHA
run on amd64 Linux with TSan takes about 50 minutes). This PR therefore
suggests the compromise of enabling it on the Inria CI which is run on every
merge.

* Disable tests parallel/catch_break with tsan

* CI sanitizers: Use clang 14

clang 13 thread sanitizer produces different, less precise traces.
Also, clang 14 is the default version in Ubuntu 22.04 LTS.

---------

Co-authored-by: Xavier Leroy <xavier.leroy@college-de-france.fr>
@NickBarnes
Copy link
Contributor Author

rebased.

NickBarnes added a commit to NickBarnes/ocaml that referenced this pull request Nov 7, 2023
iterate over the stack once instead of twice, dynamically
reallocating the backtrace as required. Allow it to take an existing
backtrace buffer and allocated size in arguments. Together, this
will allow statmemprof to collect backtraces efficiently and reuse the
same buffer for multiple backtraces without statically allocating a
large buffer per domain. Also tidied up some variable names for
consistency ("backtrace" and "slots", not "frames" - the distinction
is important for the native backend; changed the bytecode backend for
consistency).
runtimes. This may be used by statmemprof to efficiently capture
stacks at allocation points.
NickBarnes added a commit to NickBarnes/ocaml that referenced this pull request Nov 24, 2023
Copy link
Contributor

@sadiqj sadiqj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks @NickBarnes

@sadiqj sadiqj merged commit 70d7f4e into ocaml:trunk Nov 24, 2023
9 checks passed
NickBarnes added a commit to NickBarnes/ocaml that referenced this pull request Nov 25, 2023
NickBarnes added a commit to NickBarnes/ocaml that referenced this pull request Nov 27, 2023
NickBarnes added a commit to NickBarnes/ocaml that referenced this pull request Nov 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
statmemprof PRs and issues related to statmemprof
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants