New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix delivery of effect-related exceptions, take 2 (#12486) #12535
Conversation
FYI, elsewhere with @dustanddreams we've refined the reproducer initially mentioned in #12486 (comment) Running with a reduced heap is sufficient to segfault reliably (5/5 runs on my local machine):
(you don't even need the seed |
I've built a local opam switch with frame pointers enabled
and can confirm that 10/10 runs of both |
@jmid can you also test with |
This is with the following
@OlivierNicole @fabbing - I am |
I gave it a try but both
|
runtime/amd64.S
Outdated
@@ -1177,7 +1178,10 @@ CFI_STARTPROC | |||
UPDATE_BASE_POINTER(%rcx) | |||
SWITCH_OCAML_STACKS | |||
jmp *(%rbx) | |||
2: TSAN_ENTER_FUNCTION(0) | |||
2: ENTER_FUNCTION | |||
TSAN_SAVE_CALLER_REGS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lack of TSAN_SAVE/RESTORE_CALLER_REGS
here was deliberate for performance reasons: we found out taht saving too many unnecessary registers would hurt performance with TSan.
Considering that what we need to do here is almost exactly what is done in caml_c_call
prologue, and contrary to what I said over there, I suggest we go the same way you did with #12530 and rather jmp
to caml_c_call
prologue with the GCALL
label.
There will be a small performance cost, and hopefully this won't happen often, but also has the big benefit of unifying the amd64
code with s390x
.
@@ -1178,12 +1178,8 @@ CFI_STARTPROC
UPDATE_BASE_POINTER(%rcx)
SWITCH_OCAML_STACKS
jmp *(%rbx)
-2: ENTER_FUNCTION
- TSAN_SAVE_CALLER_REGS
- TSAN_ENTER_FUNCTION(0)
- TSAN_RESTORE_CALLER_REGS
- LEA_VAR(caml_raise_continuation_already_resumed, %rax)
- jmp LBL(caml_c_call)
+2: LEA_VAR(caml_raise_continuation_already_resumed, %rax)
+ jmp GCALL(caml_c_call)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there used to be TSAN_ENTER_FUNCTION(0)
in this code path, I only added ENTER_FUNCTION
to set up the stack frame correctly and TSAN_{SAVE,RESTORE}_CALLER_REGS
to wrap the TSAN_ENTER_FUNCTION
call.
However I notice that caml_ml_array_bound_error
also uses TSAN_ENTER_FUNCTION
without the SAVE/RESTORE
dance.
I am pushing a new version of this code which mimics caml_ml_array_bound_error
in this case.
21fa63f
to
1f8e697
Compare
If we want to add a regression test case I've cooked up the following, which crashes my local open Effect
type _ t += Yield : unit t
let rec burn l =
if List.hd l > 12 then ()
else
burn (l @ l |> List.map (fun x -> x + 1))
let foo l =
burn l;
perform Yield
let bar i = foo [i]
let _ =
for _ = 1 to 10_000 do
try bar 8
with Unhandled _ -> ()
done
|
1f8e697
to
9618858
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that the change makes the code strictly more correct, so I am approving. (It only makes a different with frame pointers enabled.)
I think it would be useful if the TSan people could leave an explicit trace in the code of the register-saving decisions: it would have made the work on this PR easier, and it may show up again in the future. (It will also be useful documentation if someone less knowledgeable tries to port the TSan runtime support to another backend.)
@dustanddreams could you add the testcase proposed by @jmid (it's a good testcase because it runs instantly) to the testsuite, for example as a new program (To run a new test file you can just run |
9618858
to
28a13ea
Compare
Sure, let me know if the file looks good to you. |
We've been able to reproduce the crash on |
@@ -0,0 +1,37 @@ | |||
(* TEST | |||
frame_pointers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really want to restrict the test to only setups with frame pointers enabled? The current fix is for a bug that only occurs with frame pointers, but I assume that the test may have caught the non-frame-pointers-specific s390x issue, and could be helpful in the test harness of new backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(To clarify: my gut feeling is that we do not want the frame_pointers
condition here, and I am waiting to see what @dustanddreams thinks before merging, with or without the condition.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right; although the code was only incorrect with frame pointers enabled, this was not the case on s390x. But then, should it really belong to the tests/frame-pointers
directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then, should it really belong to the tests/frame-pointers directory?
Honestly I don't really care, please choose the test directory that you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to effects
, then.
I agree; I intend to do it in our fix PR. |
We may have found one of the issues, the one involving the combination of |
28a13ea
to
d94dcc7
Compare
This repairs operation when the compiler is built with --enable-frame-pointers and either Effect.Unhandled or Effect.Continuation_already_resumed needs to be raised.
d94dcc7
to
c04755a
Compare
As a change for the fp and TSAN mode, it seems better to me to cherry-pick it to 5.1 (if there are no objections from reviewers) in prevision of the upcoming 5.1 release. |
That makes sense. Another fix for TSan + fp is #12561. |
TSan won't be part of new 5.1.x, will it? |
Ah, yes, it had slipped my mind that TSan won’t be in 5.1.1. What @fabbing said. |
Fix delivery of effect-related exceptions, take 2 (ocaml#12486) (cherry picked from commit 01f737a)
This is similar to #12530, but on amd64, in order to repair operation when the runtime is built with
--enable-frame-pointers
.Tail calls to
caml_c_call
are adjusted to always haveENTER_FUNCTION
as well as TSan instrumentation, so that the stack layout matches what is expected.