Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BOLT][AArch64] Fixes assertion errors occurred when perf2bolt was executed #83394

Merged
merged 6 commits into from
Jul 5, 2024

Conversation

kaadam
Copy link
Contributor

@kaadam kaadam commented Feb 29, 2024

BOLT only checks for the most common indirect branch pattern during the branch analyzation.
Extended the logic with two other indirect patterns which slightly differ from the expected one.
Those patterns may be hit when statically linking libc (pattern 2 requires 'lld' linker).

As a workaround mark them as UNKNOWN branch for now.

Fixes: #83114

Copy link

github-actions bot commented Feb 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@yota9
Copy link
Member

yota9 commented Mar 4, 2024

Hello, thank you for a change.
But aren't these changes related to JT? We're loading table address, offset, add them and jump - I'm not completely sure why we want to skip it.
Also please add proper test.

@kaadam
Copy link
Contributor Author

kaadam commented Mar 26, 2024

Hi @yota9, Sorry for the delayed response.
Started to work on to handle these two patterns maybe you can give me some pointers about them.
Preprocessing indirect branch we should extract PCRelBase, and JumpTable (label address) + Offset if it is possible.

Considering this pattern, this adr instruction its address could be PCRelbase, as I tried to follow the original logic maybe it could be "JT" as well. But we only have the address of the entry in sigall_set "section", should we know about the label address of the sigall_set?

      //   adr     x6, 0x219fb0 <sigall_set+0x88> => PCRelBase, JT?
      //   add     x6, x6, x14, lsl #2
      //   ldr     w7, [x6]
      //   add     x6, x6, w7, sxtw
      //   br      x6

a snippet of the print-cfg:

.Ltmp7286 (10 instructions, align : 1)
  CFI State : 0
  Predecessors: .LFT5416
    00000268:   ldp     q5, q6, [x1], #0x20
    0000026c:   stp     q1, q2, [x3], #0x20
    00000270:   and     x3, x3, #0xfffffffffffffff0
    00000274:   sub     x2, x2, #0x20
    00000278:   hint    #0x0 # NOP: 1
    0000027c:   adr     x6, #-365276
    00000280:   add     x6, x6, x14, lsl #2
    00000284:   ldr     w7, [x6]
    00000288:   add     x6, x6, w7, sxtw
    0000028c:   br      x6 # UNKNOWN CONTROL FLOW
  CFI State: 0

Binary attached:
bubblesort_arm64_static.zip

@yota9
Copy link
Member

yota9 commented Mar 26, 2024

Hi @kaadam ! The signal_set is just closest (from above) symbol in the binary, probably it has nothing to do with JT. Consider 0x219fb0 as a base address.

@paschalis-mpeis
Copy link
Member

Two cases are identified in this patch, and TMU the first of which ((ShiftVal != 2)) is hit by users more often. Virtually any stage-0 binary that is statically linked should hit this.

With the patch as is (ie returning 'false'), BOLT won't be able to build the CFG for such functions, meaning less optimizations but a stable binary regardless.

@kaadam has made some good progress locally, to properly parse these patterns, but it's still incomplete.

So I would suggest that we do this on 2 steps:

1) have a quick workaround in place:

Merge for now something like the latest commit, plus some added test:

Any .c file, statically linked with libc triggers the first error.
Maybe we can emit some additional warning before returning false, or use --print-cfg or -v=1 and FileCheck that.

test.c:

int main() { return 1; }

in lit test:

clang -O0 test.c -static -o out
llvm-bolt out -o out.bolt

2) use follow-up patches for a proper solution:

Parse the 2 patterns, extract any relevant info (JumpTable/Offset/Scale/Base), and adjust tests as needed.
The DefJTBaseAdd being an ADR would need an extra test. There's already good local progress on this direction..

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 11, 2024

@llvm/pr-subscribers-bolt

Author: Ádám Kallai (kaadam)

Changes

BOLT only checks for the most common indirect branch pattern during the branch analyzation. Extended the logic with two other indirect patterns which slightly differ from the expected one.

Since these patterns are not related to JT, mark them as UNKNOWN branch.

Fixes: #83114


Full diff: https://github.com/llvm/llvm-project/pull/83394.diff

2 Files Affected:

  • (modified) bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp (+29-2)
  • (added) bolt/test/AArch64/test-indirect-branch.c (+33)
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index a74eda8e4a566..04296e3c263ed 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -706,8 +706,21 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
     unsigned ShiftVal = AArch64_AM::getArithShiftValue(OperandExtension);
     AArch64_AM::ShiftExtendType ExtendType =
         AArch64_AM::getArithExtendType(OperandExtension);
-    if (ShiftVal != 2)
-      llvm_unreachable("Failed to match indirect branch! (fragment 2)");
+    if (ShiftVal != 2) {
+      // TODO: handle that case where ShiftVal != 2.
+      // The following code sequence below has no shift amount,
+      // the range could be 0 to 4.
+      // The pattern comes from libc, it occurs when the binary is static.
+      //   adr     x6, 0x219fb0 <sigall_set+0x88>
+      //   add     x6, x6, x14, lsl #2
+      //   ldr     w7, [x6]
+      //   add     x6, x6, w7, sxtw => no shift amount
+      //   br      x6
+      LLVM_DEBUG(
+        dbgs() << "Failed to match indirect branch: ShiftVAL != 2 \n"
+      );
+      return false;
+    }
 
     if (ExtendType == AArch64_AM::SXTB)
       ScaleValue = 1LL;
@@ -752,6 +765,20 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
       return true;
     }
 
+    if (DefJTBaseAdd->getOpcode() == AArch64::ADR) {
+      // TODO: handle that case when we do not have adrp/add pair.
+      // It also occurs when the binary is static.
+      //  adr     x13, 0x215a18 <_nl_value_type_LC_COLLATE+0x50>
+      //  ldrh    w13, [x13, w12, uxtw #1]
+      //  adr     x12, 0x247b30 <__gettextparse+0x5b0>
+      //  add     x13, x12, w13, sxth #2
+      //  br      x13
+      LLVM_DEBUG(
+        dbgs() << "Failed to match indirect branch: nop/adr instead of adrp/add \n"
+      );
+      return false;
+    }
+
     assert(DefJTBaseAdd->getOpcode() == AArch64::ADDXri &&
            "Failed to match jump table base address pattern! (1)");
 
diff --git a/bolt/test/AArch64/test-indirect-branch.c b/bolt/test/AArch64/test-indirect-branch.c
new file mode 100644
index 0000000000000..74ad8c288a9c3
--- /dev/null
+++ b/bolt/test/AArch64/test-indirect-branch.c
@@ -0,0 +1,33 @@
+// Test how BOLT handles indirect branch sequence of instructions in
+// AArch64MCPlus builder.
+// This test checks that case when we have no shift amount after add instruction.
+// This pattern comes from libc, so needs  to build '-static' binary to
+// reproduce the issue easily.
+//
+//   adr     x6, 0x219fb0 <sigall_set+0x88>
+//   add     x6, x6, x14, lsl #2
+//   ldr     w7, [x6]
+//   add     x6, x6, w7, sxtw => no shift amount
+//   br      x6
+// It also tests another case when we use '-fuse-ld=lld' along with '-static'
+// which produces the following sequence of intsructions:
+//
+//  nop   => nop/adr instead of adrp/add
+//  adr     x13, 0x215a18 <_nl_value_type_LC_COLLATE+0x50>
+//  ldrh    w13, [x13, w12, uxtw #1]
+//  adr     x12, 0x247b30 <__gettextparse+0x5b0>
+//  add     x13, x12, w13, sxth #2
+//  br      x13
+
+
+// REQUIRES: system-linux
+// RUN: %clang %s -o %t.exe -Wl,-q -static -fuse-ld=lld --target=aarch64-linux
+// RUN: llvm-bolt %t.exe -o %t.bolt --print-cfg \
+// RUN:  --debug 2>&1 | FileCheck --match-full-lines %s
+
+// CHECK: Failed to match indirect branch: nop/adr instead of adrp/add
+// CHECK: Failed to match indirect branch: ShiftVAL != 2
+
+int main() {
+  return 42;
+}

bolt/test/AArch64/test-indirect-branch.c Outdated Show resolved Hide resolved
bolt/test/AArch64/test-indirect-branch.c Outdated Show resolved Hide resolved
bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp Outdated Show resolved Hide resolved
@kaadam kaadam force-pushed the indirect_branch branch 2 times, most recently from a2a220e to 3c4f4f3 Compare June 11, 2024 13:21
@kaadam
Copy link
Contributor Author

kaadam commented Jun 13, 2024

Seems CIs are happy for now. @yota9 could you take a look at it again please? What do you think about it? Thanks!

@yota9
Copy link
Member

yota9 commented Jun 13, 2024

Hi! I don't really mind to add stubs like this, although I would prefer asm test, since it is too deeply relies on the compiler code generation. Let also @rafaelauler tell his opinion on this :)

Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding tests:

I would prefer asm test, since it is too deeply relies on the compiler code generation.

I don't have a strong opinion, but I do agree that an asm specific test would be a better test.

Regarding the safety of relaxation of assertions to BOLT-WARNINGS:

With some small local tests, I've seen that BOLT simply won't modify such functions, so we won't have any correctness issues (hence the initial suggestion). But I wanted to be sure we weren't missing any corner cases, so I've discussed this yesterday during BOLT 'Office Hours' with Rafael and Maks.

From the discussion, such workaround would still allow moving the function around as a single piece, but it would not allow modifications within the function itself, which is what we want. The nops elimination pass was mentioned in particular as a potential corner case (there wasn't 100% certainty).

Testing with BOLT shows that after all analyses are complete, when we are ready to runAllPasses, any relevant BinaryFunction would return false for its hasCFG() member function (reflecting just it's BinaryFunction::State).
And such BinaryFunction would have an empty list of basic blocks, meaning RemoveNops or any other intra-procedural pass wouldn't perform any changes that could potentially affect jump table offsets.

So it looks be safe.


Other than my above discussion on tests, it LGTM.

bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp Outdated Show resolved Hide resolved
@kaadam
Copy link
Contributor Author

kaadam commented Jun 13, 2024

Thanks the review for you all, I create a minimized assembly test and will upload it. I cannot argue with that it deeply depend on how compiler generates the code.

@kaadam
Copy link
Contributor Author

kaadam commented Jun 19, 2024

Hi! I don't really mind to add stubs like this, although I would prefer asm test, since it is too deeply relies on the compiler code generation. Let also @rafaelauler tell his opinion on this :)

Hi @yota9, I've updated the PR. I enclosed an asm test instead of C one. Could you take a look at it again, please? Thanks Adam.

@yota9
Copy link
Member

yota9 commented Jun 25, 2024

LGTM, but maybe handle nop+adr case in this patch? It shouldn't be difficult

@kaadam
Copy link
Contributor Author

kaadam commented Jun 26, 2024

LGTM, but maybe handle nop+adr case in this patch? It shouldn't be difficult

Hi @yota9, Thanks for your comment. So You mean that to also detect 'nop' after adr instruction in 'analyzeIndirectBranchFragment' following these patterns below? Or are you thinking of a completely different pattern?

  nop
  adr     x6, 0x219fb0 <sigall_set+0x88>
  add     x6, x6, x14, lsl #2
  ldr     w7, [x6]
  add     x6, x6, w7, sxtw
  br      x6

  nop   => nop/adr instead of adrp/add
  adr     x13, 0x215a18 <_nl_value_type_LC_COLLATE+0x50>
  ldrh    w13, [x13, w12, uxtw #1]
  adr     x12, 0x247b30 <__gettextparse+0x5b0>
  add     x13, x12, w13, sxth #2
  br      x13

bolt/test/AArch64/test-indirect-branch.s Outdated Show resolved Hide resolved
// add x6, x6, w7, sxtw => no shift amount
// br x6
//
// It also tests another case where there is no adrp/add pair.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: replace

// It also tests another case where there is no adrp/add pair.
// The pattern also come from libc, and it only represents in the binary
// if the lld linker is used to create the static binary.
// It doesn't occur with GCC ld linker.

with something like:

// Pattern 2: nop/adr pair is used in place of adrp/add

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @kaadam ,
thanks for the changes, looks good.
Applying the latest suggestion from @yota9 and then we are good to go!

@kaadam
Copy link
Contributor Author

kaadam commented Jul 4, 2024

@yota9 @paschalis-mpeis I've minimized the test case (just reorganized and removed superfluous instructions). Updated the PR (commit) description as well. What do you think?

@yota9
Copy link
Member

yota9 commented Jul 4, 2024

@kaadam Hi, could you please reformat test to smth like:

  .section .text
  .align 4
  .globl _start
  .type  _start, %function
_start:
  bl test1
  bl test2

 .globl test1
 .type test2, %function
test1:
  adr     x3, jump_table
  ldrh    w3, [x3, x1, lsl #1]
  adr     x1, .test1_0
  add     x3, x1, w3, sxth #2
  br      x3

test1_0:
  ret
test1_1:
  ret

 .globl test2
 .type test2, %function
test2:
  nop
  adr     x3, jump_table
  ldrh    w3, [x3, x1, lsl #1]
  adr     x1, .case0
  add     x3, x1, w3, sxth #2

  .section .rodata,"a",@progbits
jump_table:
  .hword  (.case0-.case0)>>2
  .hword  (.case1-.case0)>>2

test2_0:
  ret
rest2_1:
  ret
test2_2:
  ret

datatable:
  .word test2_0-datatable
  .word test2_1-datatable
  .word test2_2-datatable

So they won't be connected, this way it would be much more readable

…ecuted

BOLT only checks for the most common indirect branch pattern during
the branch analyzation. Needs to extend the current logic with other cases
which slightly differs from the expected one.
This instruction sequences comes from libc,
it occurs when the binary is static.

As a workaround mark it as UNKNOWN branch and add support for them
in a follow up PR later.

Fixes: llvm#83114
@kaadam
Copy link
Contributor Author

kaadam commented Jul 5, 2024

@yota9 @paschalis-mpeis Thanks for your review. I've updated the test based on yota suggestion. I had to pass '--strict' option to to llvm-bolt. (Since I got 'BOLT-ERROR: Cannot relax adr in non-simple function test1/test2', and test failed). Could you take a look at it again, please?

Copy link
Member

@yota9 yota9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Thanks for test changing. I'm approving this tmp patch since handling the functions with IndirectBranchType::UNKNOWN is still better than aborting on these cases.

@kaadam
Copy link
Contributor Author

kaadam commented Jul 5, 2024

Hi! Thanks for test changing. I'm approving this tmp patch since handling the functions with IndirectBranchType::UNKNOWN is still better than aborting on these cases.

Thanks @yota9!
I'm not allowed to submit the PR, @yota9, @paschalis-mpeis may I ask one of you to merge the PR, if the PR has met the criteria for merging?

@yota9 yota9 merged commit e2cee2c into llvm:main Jul 5, 2024
6 checks passed
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
…ecuted (llvm#83394)

BOLT only checks for the most common indirect branch pattern during the
branch analyzation.
Extended the logic with two other indirect patterns which slightly
differ from the expected one.
Those patterns may be hit when statically linking libc (pattern 2
requires 'lld' linker).

As a workaround mark them as UNKNOWN branch for now. 

Fixes: llvm#83114
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BOLT] Perf2bolt is failing with assertion on AArch64
4 participants