Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ARM64 bounds exception backtraces #11436

Merged
merged 5 commits into from Jul 25, 2022

Conversation

ctk21
Copy link
Contributor

@ctk21 ctk21 commented Jul 14, 2022

This PR fixes the ARM64 backend to correctly handle backtraces in leaf functions that throw bounds check exceptions. This ARM64 bug in the testcase is present in the 5.0 series and at least 4.14.

This PR implements mark_c_tailcall for the ARM64 backend and marks the function as not a leaf call if in debug mode. This is already done on AMD64 but for alignment purposes (see #6239 (comment)). In this PR we are doing this to save the ARM64 link register x30 to the stack such that caml_stash_backtrace can correctly walk the stack when the bounds exception is raised.

(This PR fixes the issue observed here)

@nojb
Copy link
Contributor

nojb commented Jul 14, 2022

From a distance this looks like a general issue that should apply to every native-code backend... can someone explain why this behaviour should not apply in general?

@xavierleroy
Copy link
Contributor

@nojb is correct: this is not specific to ARM64. I was able to reproduce the problem on ARM (32 bits) with 4.14...

To get accurate backtraces, when compiling with -g, all functions that perform bounds checking must save their return address on the stack. It doesn't matter for the two x86 ports, since the call instruction always pushes the return address on the stack. But all other ports save the return address only if the function has contains_calls = true.

So, in -g mode, all functions that perform bound checking must have contains_calls = true. It should suffice to have this as the default implementation of method mark_c_tailcall in asmcomp/selectgen.ml... Let me give this a try.

@xavierleroy
Copy link
Contributor

I made a branch against 4.14 to test this approach on as many platforms as possible: https://github.com/xavierleroy/ocaml/tree/fix-bounds-exn-backtrace . It's running through CI precheck right now.

… platforms

To this end, the leaf function optimization must be turned off for each
function that performs bounds checking and is compiled in -g mode,
so that the return address of the function is saved on stack.
@xavierleroy
Copy link
Contributor

Precheck with 4.14 was successful, I think this is the correct fix not just for ARM64. I pushed the corresponding code to this branch. Let me know what you think.

@ctk21 ctk21 force-pushed the fix-arm64-bounds-exn-backtrace branch from 14ec302 to b1a18af Compare July 15, 2022 14:35
@ctk21
Copy link
Contributor Author

ctk21 commented Jul 15, 2022

Thanks for looking into the other platforms and testing them. I don't see an issue with moving the change to be the default, but I wouldn't claim to be highly qualified on platforms beyond ARM64 & x86.

@xavierleroy
Copy link
Contributor

I wouldn't claim to be highly qualified on platforms beyond ARM64 & x86.

Oh, except x86-32 (which is odd in every possible ways) and x86-64 (which still has some idiosyncrasies), all the ocamlopt-supported platforms look quite the same...

@xavierleroy
Copy link
Contributor

This is a bug fix, and it still needs reviewing.

@mshinwell
Copy link
Contributor

This looks ok, modulo a couple of minor things about the test.

However I was looking at mark_instr in Selectgen and I wonder if we should change something there at the same time. I don't really understand the fix for #6239 in that function -- it seems that amd64/selection.ml overrides mark_c_tailcall to set Proc.contains_calls. So why would changing from mark_c_tailcall to mark_call make any difference? Maybe at the time of this PR the behaviour in amd64/selection.ml was different.

Additionally, this change from #6239 seems to have been required to avoid a stack misalignment and crash, not just for getting a "good stack backtrace" as the comment says.

Anyway, with the proposed fix here, maybe we could remove the comment and go back to calling mark_c_tailcall? That seems simpler and I think it is correct. Does anyone agree?

@xavierleroy
Copy link
Contributor

However I was looking at mark_instr in Selectgen and I wonder if we should change something there at the same time. I don't really understand the fix for #6239 in that function -- it seems that amd64/selection.ml overrides mark_c_tailcall to set Proc.contains_calls. So why would changing from mark_c_tailcall to mark_call make any difference? Maybe at the time of this PR the behaviour in amd64/selection.ml was different.

The original fix for #6239 is here: 948d520 . It predates the mark_call / mark_c_tailcall business.

Globally I agree that a raise (other than raise-notrace) should be treated as mark_c_tailcall once the default behavior of mark_c_tailcall is to mark the function as non-leaf if in debug mode. As long as mark_c_tailcall does nothing by default, using it would not produce a nice backtrace, since the leaf function that raises wouldn't show up in the trace.

However, I'd rather change this in a separate PR, as there is a small risk of a regression.

Additionally, this change from #6239 seems to have been required to avoid a stack misalignment and crash, not just for getting a "good stack backtrace" as the comment says.

The comment is part of commit fa0f96a and is probably wrong, yes.

Stack misalignment is an x86-specific issue (it's the only platform where every function is entered with a misaligned stack...), but should be handled properly by this PR.

@mshinwell
Copy link
Contributor

mshinwell commented Jul 21, 2022

Globally I agree that a raise (other than raise-notrace) should be treated as mark_c_tailcall once the default behavior of mark_c_tailcall is to mark the function as non-leaf if in debug mode. As long as mark_c_tailcall does nothing by default, using it would not produce a nice backtrace, since the leaf function that raises wouldn't show up in the trace.

However, I'd rather change this in a separate PR, as there is a small risk of a regression.

I'll make a separate PR for this. I tried to push the other minor fixes to this PR's branch so we can merge, but permission denied, so we'll need to wait for Tom.

@xavierleroy xavierleroy merged commit a9d4b4b into ocaml:trunk Jul 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants