Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pseudo-instruction Ladjust_trap_depth #2322

Merged
merged 5 commits into from Jun 24, 2019

Conversation

gretay-js
Copy link
Contributor

Ladjust_trap_depth replaces dummy Lpushtrap generated in linearize of Iexit to notify assembler generation about updates to the stack. Ladjust_trap_depth is used to keep the virtual stack pointer in sync and emit dwarf information, without emitting any assembly instructions. It therefore avoids generating dead code.

There was already a comment in linearize.ml saying that this should be done.

This PR is based on #1482 @lthls. This PR also includes a more aggressive version of discard_dead_code in linearize, to eliminate dead code that results from Iexit at the end a try-with block, a common situation in large programs. This change along with #2321 eliminate 99% of dead code at the level of compliation unit found in our benchmarks.

This PR passes all tests in Inria's precheck (build # 218).

Here is an example program:

let r = ref 0

let bar t =
  ignore (Sys.opaque_identity ( 2 * t + 7));
  Printf.printf "%d\n" t

let foo t =
  match bar t with
  | () ->
    r := 3
  | exception exn ->
    r := 7;
    raise exn

Generated assembly (amd64, compiled with flambda) with dead code after the first "jmp .LL104"

camlTest5__foo_43:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_adjust_cfa_offset 8
.L106:
        .cfi_adjust_cfa_offset 16
        subq    $16, %rsp
        movq    %r14, (%rsp)
        leaq    .L105(%rip), %r14
        movq    %r14, 8(%rsp)
        movq    %rsp, %r14
        call    camlTest5__bar_15@PLT
.L103:
        popq    %r14
        .cfi_adjust_cfa_offset -8
        addq    $8, %rsp
        .cfi_adjust_cfa_offset -8
        jmp     .L104
        .cfi_adjust_cfa_offset 16
        subq    $16, %rsp
        movq    %r14, (%rsp)
        leaq    .L104(%rip), %r14
        movq    %r14, 8(%rsp)
        movq    %rsp, %r14
        popq    %r14
        .cfi_adjust_cfa_offset -8
        addq    $8, %rsp
        .cfi_adjust_cfa_offset -8
        jmp     .L104
        .align  4
.L105:
        movq    camlTest5__Pmakeblock_71@GOTPCREL(%rip), %rbx
        movq    (%rbx), %rbx
        movq    $15, (%rbx)
        movq    %r14, %rsp
        popq    %r14
        popq    %r11
        jmp     *%r11
        .align  4
.L104:
        movq    camlTest5__Pmakeblock_71@GOTPCREL(%rip), %rax
        movq    (%rax), %rax
        movq    $7, (%rax)
        movq    $1, %rax
        addq    $8, %rsp
        .cfi_adjust_cfa_offset -8
        ret

Copy link
Contributor

@lthls lthls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gretay-js for taking care of this.
I've looked at this and found it correct (with a few typos in comments).
It's only a small subset of the original #1482 PR, and should be uncontroversial.
I think it deserves a Changes entry, but otherwise should be good to go (I'm a bit biased on this, so an additional review wouldn't hurt though).

asmcomp/linearize.ml Outdated Show resolved Hide resolved
asmcomp/linearize.ml Outdated Show resolved Hide resolved
@gretay-js
Copy link
Contributor Author

Added Changes entry and fixed typos.

@mshinwell
Copy link
Contributor

I think @xavierleroy should sign off on this, although I suspect he is in favour, as he probably wrote the original comment about this.

@lthls To confirm: have you checked really carefully the part about removing dead code? I've always found that rather subtle.

@lthls
Copy link
Contributor

lthls commented Mar 13, 2019

I did check carefully, and I don't think it's that subtle. Dead code can be discarded without question, except that code which can modify the stack pointer has compile-time side-effects that need to be preserved, and need to emit the correct cfi instructions. By replacing the actual trap instructions by an adjust_trap_depth instruction, we can update correctly the stack offsets and generate the correct cfi directives without having to also emit unused instructions.
There is actually one other instruction which can affect stack offsets: Lprologue. It can never happen inside dead code, so the code is correct without the special case, but one could make a point of special casing it in discard_dead_code with assert false for safety and/or documentation.

The only part where I feel a bit less confident is that the old code used to keep all instructions after a dead Lpoptrap or Lpushtrap instruction, and I don't know why. I believe that this is simply a missed optimisation opportunity, and a quick check through the history of the function reinforces this opinion, but it would still be better to get some more informed arguments.

@gretay-js
Copy link
Contributor Author

The only part where I feel a bit less confident is that the old code used to keep all instructions after a dead Lpoptrap or Lpushtrap instruction, and I don't know why. I believe that this is simply a missed optimisation opportunity, and a quick check through the history of the function reinforces this opinion, but it would still be better to get some more informed arguments.

Prior to PR #2237, I think there was no other dead code possible between encountering the start of dead poptraps / pushtraps and the next label. The change to try-with introduced a jmp from the end of the body of the try-with to its continuation. Previously, there was a fallthrough. That new jmp which follows Lpoptrap becomes dead when the last instruction in the body of try-with is Iexit.

@lthls
Copy link
Contributor

lthls commented Mar 18, 2019

Thanks for the explanation. This confirms my understanding that this patch is correct.

@gretay-js
Copy link
Contributor Author

@xavierleroy Do you have any objections to this change?

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. One minor suggestion below.

in
if delta_traps = 0 then next
else cons_instr (Ladjust_trap_depth { delta_traps }) next

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code could be simplified by the magic of recursion.

let rec adjust_trap_depth delta_traps next =
  match next.desc with
  | Ladjust_trap_depth { delta_traps = k } ->
      adjust_trap_depth (delta_traps + k) next.next
  | _ ->
      if delta_traps = 0 then next
      else cons_instr (Ladjust_trap_depth { delta_traps }) next

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Rebased and magicked.

Ladjust_trap_depth replaces dummy Lpushtrap generated in linearize of
Iexit to notify assembler generation about updates to the
stack. Ladjust_trap_depth is used to keep the virtual stack pointer in
sync and emit dwarf information, without emitting any assembly
instructions. It therefore avoids generating dead code.

This patch is extract from PR1482 @lthls
Replace Lpushtrap and Lpoptrap with the curresponding Ladjust_trap_depth.
@lpw25
Copy link
Contributor

lpw25 commented Jun 24, 2019

Merging based on Xavier's approval.

@lpw25 lpw25 merged commit d8b6a17 into ocaml:trunk Jun 24, 2019
lthls pushed a commit to lthls/ocaml that referenced this pull request Jul 17, 2019
Ladjust_trap_depth replaces dummy Lpushtrap generated in linearize of
Iexit to notify assembler generation about updates to the
stack. Ladjust_trap_depth is used to keep the virtual stack pointer in
sync and emit dwarf information, without emitting any assembly
instructions. It therefore avoids generating dead code.

This patch is extract from PR1482
gretay-js added a commit to gretay-js/ocaml that referenced this pull request Aug 26, 2019
Ladjust_trap_depth replaces dummy Lpushtrap generated in linearize of
Iexit to notify assembler generation about updates to the
stack. Ladjust_trap_depth is used to keep the virtual stack pointer in
sync and emit dwarf information, without emitting any assembly
instructions. It therefore avoids generating dead code.

This patch is extract from PR1482 @lthls
gretay-js added a commit to gretay-js/ocaml that referenced this pull request Nov 21, 2019
Ladjust_trap_depth replaces dummy Lpushtrap generated in linearize of
Iexit to notify assembler generation about updates to the
stack. Ladjust_trap_depth is used to keep the virtual stack pointer in
sync and emit dwarf information, without emitting any assembly
instructions. It therefore avoids generating dead code.

This patch is extract from PR1482 @lthls
gretay-js added a commit to gretay-js/ocaml that referenced this pull request Nov 21, 2019
Ladjust_trap_depth replaces dummy Lpushtrap generated in linearize of
Iexit to notify assembler generation about updates to the
stack. Ladjust_trap_depth is used to keep the virtual stack pointer in
sync and emit dwarf information, without emitting any assembly
instructions. It therefore avoids generating dead code.

This patch is extract from PR1482 @lthls
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants