Better call stacks when a C call is involved in byte code mode. #8641
I actually just found that there is another issue related to that one: in bytecode mode, when an exception is thrown in a callback across a C stack frame, the backtrace usually contains only the part of the stack that is older than the C call, which is pointless (we cannot even know where the exception was thrown!). I have some ideas about how to fix this, but it may require other changes to the
Why must it look like that? Is that related to the debugger? I think even if this PR does not get merged, some more comments/refactoring would be welcome to make this part of the interpreter easier to read.
Looks globally good to me. Two suggestions and one question below.
Mystery number 1: the "+2 adjustment for the sole purpose of backtraces" is partially explained in my comment below. The idea is that the PC stored in the backtrace should better point to "after" the
Mystery number 2: the reason why "An event frame must look like accu + a C_CALL frame + a RETURN 1 frame" was the now-defunct VM thread library. A context switch could occur either cooperatively, as a result of calling a "yield" primitive, or preemptively, as a result of receiving a timeout signal. Stacks for suspended threads should have the same shape in these two cases, so that they can be restarted as if the "yield" primitive returned. Now that VM threads are gone, I actually don't know what the minimal requirements are for
@xavierleroy, thanks for answering my questions! I added a few comments to demystify the part of the code that I found hard to understand, updated Changes, rebased/squashed my commits on top of trunk, and fixed the related statmemprof tests. Anything left to do before merging?
I rebased on top of trunk. Are we waiting for something before merging?
@xavierleroy gentle ping: I'm not sure who else would be qualified to review/approve the PR (I just tried with Damien).
This code is tough :-) but I read it again and it makes sense to me. I hope we have enough tests to exercise all relevant code paths.
I fixed the statmemprof By the way, this uncovered another (independent) issue: no debugging information is inserted when a C call is placed in tail-call position, even though it does correspond to a stack slot. This could be the subject of another PR on
Wow. This is now getting weird. A test fails because of issues with line endings on Windows. I have made almost no changes to this test in this PR, so I am very surprised. @dra27, is this a known issue? Is there some generic fix I can apply?
I can't reproduce the AppVeyor failure, but it might warrant some more thought, as it's nothing to do with line-endings ( So the actual problem is this:
@@ -7,6 +7,7 @@
check_distrib 100000 10 0.900000
check_callstack
Raised by primitive operation at file "lists_in_minor.ml", line 14, characters 11-33
+Called from unknown location
Called from file "lists_in_minor.ml", line 69, characters 2-26
Called from file "lists_in_minor.ml", line 76, characters 2-20
OK !
which might be a bit more serious?
@dra27, I guess you used the wrong version. I already fixed the issue you are speaking about in bc11373. This was just a matter of updating the reference file. The issue CI had in bc11373 was truly just a line-ending issue. Anyway, I just rebased on top of trunk and apparently CI is passing. So I guess this is just a Heisenbug which should be fixed by #8983. So now this is ready to merge, @xavierleroy, unless you have any other objection.
@jhjourdan - I didn't use any wrong version, I'm looking at the AppVeyor log - see L4852 of the log for merging bc11373, which has that extra line?
I ran a round of "precheck" on Inria's CI, just to make sure. No problems reported.
@dra27 You're right, there is indeed an extra line in this CI log, even though it should not appear. I have no idea what happened, this is very weird. I tried executing the program many times on my PC, resetting statmemprof's seed, and I never reproduced the issue. I hope this is not some non-deterministic bug that will appear once every thousand executions... We will see...
I’ve left a machine cycling that one test on mingw32... 1782 successful runs so far, so hopefully that AppVeyor log is just an unlucky fluke!
I ran into another wrong backtrace in bytecode. Below, we have one extra entry in the backtrace (line 51 of thread.ml) corresponding to the call to Here is the log:
This comes from "extra-checks" CI, using
See #9063. I don't think this is a bug in the runtime, but rather in the test itself.
Fix tests/backtrace/callstack.ml by changing the order of the content of that file. (#9063) The original test had a race condition between finalization and thread preemption. That was probably the cause for the wrong backtrace observed here: #8641 (comment)
The previous mechanism only worked when the C call in question raised an exception. The new mechanism actually pushes the PC onto the interpreter stack, so that the backtrace mechanism always sees it.
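To make the idea concrete, here is a minimal sketch in C of why pushing the PC matters. It is a toy model and not the code of the OCaml interpreter: the types `slot` and `vm_stack`, the explicit `is_code_addr` tag, and the helpers `c_call_without_pc` and `c_call_with_pc` are all hypothetical names introduced for illustration. The assumption is only that a backtrace walker scans the interpreter stack for saved code addresses, so a call site that was never pushed cannot be reported.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical, not the OCaml runtime): a stack slot holds either
   a plain value or a saved bytecode address. The explicit tag is a
   simplification of however a real walker recognizes code addresses. */
typedef struct { int is_code_addr; int payload; } slot;

#define STACK_MAX 64
typedef struct { slot data[STACK_MAX]; size_t depth; } vm_stack;

static void push_slot(vm_stack *s, int is_code_addr, int payload) {
  assert(s->depth < STACK_MAX);
  s->data[s->depth].is_code_addr = is_code_addr;
  s->data[s->depth].payload = payload;
  s->depth++;
}
static void push_value(vm_stack *s, int v) { push_slot(s, 0, v); }
static void push_pc(vm_stack *s, int pc)   { push_slot(s, 1, pc); }
static void pop(vm_stack *s) { assert(s->depth > 0); s->depth--; }

/* Walk the stack from the most recent slot to the oldest and collect every
   saved code address, up to max entries. Returns the number collected. */
static size_t collect_backtrace(const vm_stack *s, int *out, size_t max) {
  size_t n = 0;
  for (size_t i = s->depth; i > 0 && n < max; i--) {
    if (s->data[i - 1].is_code_addr) out[n++] = s->data[i - 1].payload;
  }
  return n;
}

/* C call that does NOT save the current PC: a backtrace captured during
   the call cannot contain the call site. */
static size_t c_call_without_pc(vm_stack *s, int pc, int *out, size_t max) {
  (void)pc; /* the call site is simply not recorded anywhere */
  return collect_backtrace(s, out, max);
}

/* C call that pushes the current PC first and pops it afterwards: the
   call site becomes the most recent entry of the backtrace. */
static size_t c_call_with_pc(vm_stack *s, int pc, int *out, size_t max) {
  push_pc(s, pc);
  size_t n = collect_backtrace(s, out, max);
  pop(s); /* restore the stack to its shape before the call */
  return n;
}
```

With a stack holding saved addresses 100 and 200 and a C call issued at PC 300, the first variant reports 200, 100 while the second reports 300, 200, 100, and both leave the stack depth unchanged.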
The result is that the callstack includes the PC which actually performed the C call, contrary to the previous behavior, where this PC was ignored. I would argue that this is the right behavior: there is no reason the top frame should be ignored. However, if an external is declared in the .ml file and exposed in the .mli file as a val, then ocamlc generates a non-localized wrapper, which adds a spurious entry in the stack. In this PR, this change in behavior results in the re-declaration of Printexc.get_callstack as an external instead of a val, so that the spurious stack frame does not appear in callstacks obtained from Printexc.get_callstack.
@xavierleroy This changes Setup_for_c_call/Restore_after_c_call, on which, given the git history, you are the only expert. I am far from sure that changing this has no impact on some other part of the runtime, such as the debugger. Also, I removed the following two lines in the interpreter:
I do not understand their purpose, and it seems backtraces are working without them. But I am not sure I did not actually break anything.