Better call stacks when a C call is involved in byte code mode. #8641

jhjourdan · 2019-04-24T12:53:24Z

The previous mechanism only worked when the C call in question raised an exception. The new mechanism pushes the PC onto the interpreter stack, so that the backtrace mechanism always sees it.

The result is that the callstack includes the PC that actually performed the C call, contrary to the previous behavior, where this PC was ignored. I would argue that this is the right behavior: there is no reason the top frame should be ignored. However, if an external is declared in the .ml file and exposed in the .mli file as a val, then ocamlc generates a non-localized wrapper, which adds a spurious entry to the callstack. In this PR, this change in behavior results in the re-declaration of Printexc.get_callstack as an external instead of a val, so that the spurious stack frame does not appear in callstacks obtained from Printexc.get_callstack.

@xavierleroy This changes Setup_for_c_call/Restore_after_c_call, of which, judging by the git history, you are the only expert. I am far from sure that this change has no impact on other parts of the runtime, such as the debugger. Also, I removed the following two lines from the interpreter:

if (pc != NULL) pc += 2;
    /* +2 adjustment for the sole purpose of backtraces */

I do not understand their purpose, and it seems backtraces are working without them. But I am not sure I did not actually break anything.
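The idea described above, saving the calling PC on the interpreter stack so that a stack walker can always find it, can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in runtime/interp.c: the names `toy_stack`, `toy_get_callstack` and `toy_c_call` are hypothetical, bytecode addresses are modeled as plain integers in a fixed range, and any stack word in that range is treated as a saved PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not the OCaml runtime: bytecode "addresses" are integers in
   [CODE_START, CODE_END); any stack word in that range counts as a saved PC. */
enum { CODE_START = 1000, CODE_END = 2000, STACK_SIZE = 64 };

typedef struct {
  uintptr_t slots[STACK_SIZE];
  size_t sp;            /* number of slots in use; the top is slots[sp - 1] */
} toy_stack;

static void push(toy_stack *s, uintptr_t word) {
  assert(s->sp < STACK_SIZE);
  s->slots[s->sp++] = word;
}

static int is_code_pointer(uintptr_t word) {
  return word >= CODE_START && word < CODE_END;
}

/* Walk the stack from the most recent slot and copy up to max saved PCs
   into out. Returns the number of frames found. */
size_t toy_get_callstack(const toy_stack *s, uintptr_t *out, size_t max) {
  size_t n = 0;
  for (size_t i = s->sp; i > 0 && n < max; i--) {
    if (is_code_pointer(s->slots[i - 1])) out[n++] = s->slots[i - 1];
  }
  return n;
}

/* Enter a "C primitive" from the instruction at pc. With push_pc == 0 the
   calling PC exists only as a local value, so the walker cannot see it;
   with push_pc != 0 it is saved on the stack before the call. */
size_t toy_c_call(toy_stack *s, uintptr_t pc, int push_pc,
                  uintptr_t *out, size_t max) {
  size_t saved_sp = s->sp;
  if (push_pc) push(s, pc);
  size_t n = toy_get_callstack(s, out, max);  /* stands in for the primitive */
  s->sp = saved_sp;                           /* restore after the call */
  return n;
}
```

In this model, a stack holding one outer return address yields a one-entry callstack when the PC is not pushed, and a two-entry callstack headed by the calling PC when it is.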

jhjourdan · 2019-04-24T20:15:49Z

I actually just found another issue related to this one: in bytecode mode, when an exception is thrown in a callback across a C stack frame, the backtrace usually contains only the part of the stack that is older than the C call, which is pointless (we cannot even know where the exception was thrown!).

I have some ideas about how to fix this, but it may require other changes to the Setup_for_xxx/Restore_after_xxx macros. As explained above, I am not sure whether this could break anything. In particular, this comment in interp.c is enigmatic to me:

/* An event frame must look like accu + a C_CALL frame + a RETURN 1 frame */

Why must it look like that? Is that related to the debugger? I think that even if this PR does not get merged, some more comments or refactoring would be welcome to make this part of the interpreter easier to read.

xavierleroy

Looks globally good to me. Two suggestions and one question below.

runtime/interp.c

stdlib/printexc.mli

xavierleroy · 2019-04-26T16:37:31Z

Mystery number 1: the "+2 adjustment for the sole purpose of backtraces" is partially explained in my comment below. The idea is that the PC stored in the backtrace should preferably point "after" the C_CALL instruction, to help event_for_location find the corresponding debug event. Depending on the C_CALL instruction used, the next instruction is at +1 or +2. But there is some tolerance built into event_for_location that might explain why +2 was always good.
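To make the notion of "tolerance" in a PC-to-event lookup concrete, here is a minimal sketch. It is not the runtime's event_for_location: the `toy_event` type, the function name, and the rule used (accept the closest event within a given distance on either side of the PC) are all assumptions chosen for illustration, and the real function may use a different rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical debug event: the code offset it is attached to and a
   source line number. */
typedef struct {
  long ev_pc;
  int line;
} toy_event;

/* Return the event closest to pc among those at distance <= tolerance,
   or NULL when no event is close enough. An exact match has distance 0
   and therefore always wins over a near miss. */
const toy_event *toy_event_for_location(const toy_event *events, size_t n,
                                        long pc, long tolerance) {
  assert(tolerance >= 0);
  const toy_event *best = NULL;
  long best_dist = tolerance + 1;
  for (size_t i = 0; i < n; i++) {
    long dist = events[i].ev_pc > pc ? events[i].ev_pc - pc
                                     : pc - events[i].ev_pc;
    if (dist < best_dist) {
      best = &events[i];
      best_dist = dist;
    }
  }
  return best;
}
```

With a tolerance of 1, a stored PC that is off by one code unit still resolves to the intended event, which is the kind of slack that could hide an imprecise adjustment.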

xavierleroy · 2019-04-26T16:42:08Z

Mystery number 2: the reason why "An event frame must look like accu + a C_CALL frame + a RETURN 1 frame" was the now-defunct VM thread library. A context switch could occur either cooperatively, as a result of calling a "yield" primitive, or preemptively, as a result of receiving a timeout signal. Stacks for suspended threads should have the same shape in these two cases, so that they can be restarted as if the "yield" primitive returned.

Now that VM threads are gone, I actually don't know what the minimal requirements are for Setup_for_event and Restore_after_event. Why don't you keep the existing code? Add a comment "for VM threads purposes" if you want.

jhjourdan · 2019-05-16T16:50:58Z

@xavierleroy, thanks for answering my questions!

I added a few comments to demystify the part of the code that I found hard to understand, updated Changes, rebased/squashed my commits on top of trunk, and fixed the related statmemprof tests.

Anything left to do before merging?

jhjourdan · 2019-05-28T12:39:29Z

I rebased on top of trunk.

Are we waiting for something before merging?

gasche · 2019-09-03T12:14:45Z

@xavierleroy gentle ping: I'm not sure who else would be qualified to review/approve of the PR (I just tried with Damien).

xavierleroy

This code is tough :-) but I read it again and it makes sense to me. I hope we have enough tests to exercise all relevant code paths.

jhjourdan · 2019-09-23T15:42:26Z

I fixed the statmemprof intern.ml test. CI should pass now.

By the way, this uncovered another (independent) issue: no debugging information is inserted when a C call is placed in tail-call position, even though it does correspond to a stack slot. This could be the subject of another PR on bytegen.ml.

jhjourdan · 2019-10-11T14:22:14Z

Wow. This is now getting weird. A test fails because of issues with line endings on Windows.

I have made almost no changes to this test in this PR, so I am very surprised.

@dra27, is this a known issue? Is there some generic fix I can apply?

dra27 · 2019-10-12T13:37:37Z

I can't reproduce the AppVeyor failure, but it might warrant some more thought, as it has nothing to do with line endings (ocamltest compares the files in a line-ending-agnostic way, but the displayed diff is produced with diff, which is not; I propose adding --strip-trailing-cr in #8983).

So the actual problem is this:

@@ -7,6 +7,7 @@
 check_distrib 100000 10 0.900000
 check_callstack
 Raised by primitive operation at file "lists_in_minor.ml", line 14, characters 11-33
+Called from unknown location
 Called from file "lists_in_minor.ml", line 69, characters 2-26
 Called from file "lists_in_minor.ml", line 76, characters 2-20
 OK !

which might be a bit more serious?
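The line-ending-agnostic comparison mentioned above can be sketched as follows. This is an illustration only, not ocamltest's implementation: the function names are hypothetical, and the sketch assumes each line is a NUL-terminated string ending in at most one "\n", optionally preceded by one "\r".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of line once a trailing "\n" and then a trailing "\r" are dropped. */
static size_t toy_content_length(const char *line) {
  assert(line != NULL);
  size_t len = strlen(line);
  if (len > 0 && line[len - 1] == '\n') len--;
  if (len > 0 && line[len - 1] == '\r') len--;
  return len;
}

/* Returns 1 when both lines are equal up to their line ending, else 0. */
int toy_same_line(const char *a, const char *b) {
  size_t la = toy_content_length(a);
  size_t lb = toy_content_length(b);
  return la == lb && memcmp(a, b, la) == 0;
}
```

Under this rule "OK !\r\n" and "OK !\n" compare equal, whereas a plain diff without --strip-trailing-cr would report them as different, which is how a display can show line-ending noise even when the comparison itself ignored it.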

jhjourdan · 2019-10-12T14:44:05Z

@dra27, I guess you used the wrong version. I already fixed the issue you are speaking about in bc11373. This was just a matter of updating the reference file.

The issue CI had in bc11373 was truly just a line ending issue.

Anyway, I just rebased on top of trunk and apparently CI is passing. So I guess this is just a Heisenbug, which should be fixed by #8983. So this is now ready to merge, @xavierleroy, unless you have any other objection.

dra27 · 2019-10-12T16:22:08Z

@jhjourdan - I didn't use any wrong version, I'm looking at the AppVeyor log - see L4852 of the log for merging bc11373, which has that extra line?

xavierleroy · 2019-10-12T16:22:34Z

I ran a round of "precheck" on Inria's CI, just to make sure. No problems reported.

jhjourdan · 2019-10-12T22:10:56Z

@dra27 You're right, there is indeed an extra line in this CI log, even though it should not appear. I have no idea what happened; this is very weird. I tried executing the program many times on my PC, resetting statmemprof's seed, and I never reproduced the issue. I hope this is not some non-deterministic bug that will appear once every thousand executions... We will see...

dra27 · 2019-10-12T22:18:53Z

I’ve left a machine cycling that one test on mingw32... 1782 successful runs so far, so hopefully that AppVeyor log is just an unlucky fluke!

xavierleroy · 2019-10-20T08:28:59Z

I ran into another wrong backtrace in bytecode. Below, we have one extra entry in the backtrace (line 51 of thread.ml) corresponding to the call to Thread.preempt that implements signal-based preemption. So, it looks like signal handling leaves stuff on the bytecode interpreter stack, stuff that is picked up when building a backtrace.

Here is the log:

Running tests from 'tests/backtrace' ...
[...]
 ... testing 'callstack.ml' with 1.1.2 (bytecode) => failed (program output /home/barsac/ci/builds/workspace/extra-checks/testsuite/tests/backtrace/_ocamltest/tests/backtrace/callstack/ocamlc.byte/callstack.byte.output differs from reference /home/barsac/ci/builds/workspace/extra-checks/testsuite/tests/backtrace/callstack.reference: 
--- /home/barsac/ci/builds/workspace/extra-checks/testsuite/tests/backtrace/callstack.reference	2019-10-12 18:26:12.625956918 +0200
+++ /home/barsac/ci/builds/workspace/extra-checks/testsuite/tests/backtrace/_ocamltest/tests/backtrace/callstack/ocamlc.byte/callstack.byte.output	2019-10-19 22:42:55.968463798 +0200
@@ -11,4 +11,5 @@
 Called from file "callstack.ml", line 15, characters 27-32
 Called from file "thread.ml", line 39, characters 8-14
 Raised by primitive operation at file "callstack.ml", line 12, characters 38-66
+Called from file "thread.ml", line 51, characters 21-28
 Called from file "callstack.ml", line 23, characters 2-18
)

This comes from "extra-checks" CI, using clang-6.0 -fsanitize=thread.

jhjourdan · 2019-10-21T12:03:03Z

> I ran into another wrong backtrace in bytecode.

See #9063. I don't think this is a bug in the runtime, but rather in the test itself.

Fix tests/backtrace/callstack.ml by changing the order of the content of that file. (ocaml#9063) The original test had a race condition between finalization and thread preemption. That was probably the cause for the wrong backtrace observed here: ocaml#8641 (comment)

jhjourdan force-pushed the bytecode_c_call_backtrace branch from 21a124f to 5205384 Compare April 24, 2019 12:57

stedolan self-assigned this Apr 25, 2019

xavierleroy reviewed Apr 26, 2019

stedolan mentioned this pull request May 3, 2019

Statistical memory profiling, part 1: blocks allocated in the major heap #8634

Merged

jhjourdan force-pushed the bytecode_c_call_backtrace branch 2 times, most recently from aae531a to 7d6523e Compare May 9, 2019 13:14

jhjourdan force-pushed the bytecode_c_call_backtrace branch from 7d6523e to 5507be3 Compare May 16, 2019 16:34

jhjourdan force-pushed the bytecode_c_call_backtrace branch 3 times, most recently from cc79753 to aa98088 Compare May 20, 2019 13:40

jhjourdan force-pushed the bytecode_c_call_backtrace branch from aa98088 to 0f392ba Compare May 28, 2019 12:37

jhjourdan force-pushed the bytecode_c_call_backtrace branch from 0f392ba to 20dbc7c Compare June 6, 2019 14:28

jhjourdan force-pushed the bytecode_c_call_backtrace branch from 20dbc7c to cb36edc Compare August 27, 2019 18:00

jhjourdan force-pushed the bytecode_c_call_backtrace branch from cb36edc to 2f768c8 Compare September 5, 2019 15:04

xavierleroy approved these changes Sep 23, 2019

jhjourdan force-pushed the bytecode_c_call_backtrace branch from 2f768c8 to bc11373 Compare September 23, 2019 15:37

Better call stacks when a C call is involved in byte code mode.

fed828b

The previous mechanism only worked in the case the C call in question raises an exception.

jhjourdan force-pushed the bytecode_c_call_backtrace branch from bc11373 to fed828b Compare October 12, 2019 11:35

xavierleroy merged commit 23e5bfa into ocaml:trunk Oct 12, 2019

jhjourdan deleted the bytecode_c_call_backtrace branch October 21, 2019 12:00

jhjourdan mentioned this pull request Oct 21, 2019

Fix tests/backtrace/callstack.ml by changing the order of the content of that file. #9063

Merged

jhjourdan mentioned this pull request Jan 24, 2020

Memprof support for native allocations #9230

Merged

stedolan mentioned this pull request Jan 28, 2020

Fix bytecode backtrace generation when large integers are present #9268

Merged

ctk21 mentioned this pull request Jun 24, 2020

Implementation of trunk PR8641 for multicore ocaml-multicore/ocaml-multicore#363

Merged
