Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MachineOutliner] Sort by Benefit to Cost Ratio #90264

Merged
merged 4 commits into from
Jun 7, 2024

Conversation

xuanzhang816
Copy link
Contributor

@xuanzhang816 xuanzhang816 commented Apr 26, 2024

This PR depends on #90260

We changed the order in which functions are outlined in Machine Outliner.

The formula for priority is found via a black-box Bayesian optimization toolbox. Using this formula for sorting consistently reduces the uncompressed size of large real-world mobile apps. We also ran a few benchmarks using LLVM test suites, and showed that sorting by priority consistently reduces the text segment size.

run (CTMark/)   baseline (1) priority (2) diff (1 -> 2)
lencod          349624      349264      -0.1030%    
SPASS           219672      219480      -0.0874%    
kc              271956      251200      -7.6321%    
sqlite3         223920      223708      -0.0947%    
7zip-benchmark  405364      402624      -0.6759%    
bullet          139820      139500      -0.2289%    
consumer-typeset 295684      290196      -1.8560%    
pairlocalalign  72236       72092       -0.1993%    
tramp3d-v4      189572      189292      -0.1477%    

This is part of an enhanced version of machine outliner -- see RFC.

@xuanzhang816 xuanzhang816 changed the title Sort by benefit to cost ratio Sort by Benefit to Cost Ratio Apr 29, 2024
@xuanzhang816 xuanzhang816 changed the title Sort by Benefit to Cost Ratio [MachineOutliner] Sort by Benefit to Cost Ratio May 2, 2024
@xuanzhang816 xuanzhang816 force-pushed the sort-by-benefit-to-cost-ratio branch from 5b8fb07 to f1ddd0b Compare June 3, 2024 14:46
@xuanzhang816 xuanzhang816 force-pushed the sort-by-benefit-to-cost-ratio branch from f1ddd0b to 4046e8e Compare June 3, 2024 15:07
@xuanzhang816 xuanzhang816 marked this pull request as ready for review June 3, 2024 15:09
@llvmbot
Copy link
Collaborator

llvmbot commented Jun 3, 2024

@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-backend-aarch64

Author: Xuan Zhang (xuanzh-meta)

Changes

This PR depends on #90260

We changed the order in which functions are outlined in Machine Outliner.

The formula for priority is found via a black-box Bayesian optimization toolbox. Using this formula for sorting consistently reduces the uncompressed size of large real-world mobile apps. We also ran a few benchmarks using LLVM test suites, and showed that sorting by priority consistently reduces the text segment size.

run (CTMark/)   baseline (1) priority (2) diff (1 -> 2)
lencod          349624      349264      -0.1030%    
SPASS           219672      219480      -0.0874%    
kc              271956      251200      -7.6321%    
sqlite3         223920      223708      -0.0947%    
7zip-benchmark  405364      402624      -0.6759%    
bullet          139820      139500      -0.2289%    
consumer-typeset 295684      290196      -1.8560%    
pairlocalalign  72236       72092       -0.1993%    
tramp3d-v4      189572      189292      -0.1477%    

This is part of an enhanced version of machine outliner -- see RFC.


Patch is 29.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/90264.diff

6 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineOutliner.cpp (+4-2)
  • (added) llvm/test/CodeGen/AArch64/machine-outliner-sort-per-priority.ll (+96)
  • (modified) llvm/test/CodeGen/ARM/machine-outliner-calls.mir (+28-52)
  • (modified) llvm/test/CodeGen/ARM/machine-outliner-default.mir (+15-18)
  • (modified) llvm/test/CodeGen/ARM/machine-outliner-stack-fixup-arm.mir (+24-32)
  • (modified) llvm/test/CodeGen/ARM/machine-outliner-stack-fixup-thumb.mir (+27-47)
diff --git a/llvm/lib/CodeGen/MachineOutliner.cpp b/llvm/lib/CodeGen/MachineOutliner.cpp
index 626e577a30bf3..9e2e6316dc6d1 100644
--- a/llvm/lib/CodeGen/MachineOutliner.cpp
+++ b/llvm/lib/CodeGen/MachineOutliner.cpp
@@ -828,10 +828,12 @@ bool MachineOutliner::outline(Module &M,
                     << "\n");
   bool OutlinedSomething = false;
 
-  // Sort by benefit. The most beneficial functions should be outlined first.
+  // Sort by priority where priority := getNotOutlinedCost / getOutliningCost.
+  // The function with highest priority should be outlined first.
   stable_sort(FunctionList,
               [](const OutlinedFunction &LHS, const OutlinedFunction &RHS) {
-                return LHS.getBenefit() > RHS.getBenefit();
+                return LHS.getNotOutlinedCost() * RHS.getOutliningCost() >
+                       RHS.getNotOutlinedCost() * LHS.getOutliningCost();
               });
 
   // Walk over each function, outlining them as we go along. Functions are
diff --git a/llvm/test/CodeGen/AArch64/machine-outliner-sort-per-priority.ll b/llvm/test/CodeGen/AArch64/machine-outliner-sort-per-priority.ll
new file mode 100644
index 0000000000000..00efc3c6e71c8
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/machine-outliner-sort-per-priority.ll
@@ -0,0 +1,96 @@
+; This tests the order in which functions are outlined in MachineOutliner
+; There are TWO key OutlinedFunction in FunctionList
+;
+; ===================== First One =====================
+;   ```
+;     mov     w0, #1
+;     mov     w1, #2
+;     mov     w2, #3
+;     mov     w3, #4
+;     mov     w4, #5
+;   ```
+; It has:
+;   - `SequenceSize=20` and `OccurrenceCount=6`
+;   - each Candidate has `CallOverhead=12` and `FrameOverhead=4`
+;   - `NotOutlinedCost=20*6=120` and `OutliningCost=12*6+20+4=96`
+;   - `Benefit=120-96=24` and `Priority=120/96=1.25`
+;
+; ===================== Second One =====================
+;   ```
+;     mov     w6, #6
+;     mov     w7, #7
+;     b
+;   ```
+; It has:
+;   - `SequenceSize=12` and `OccurrenceCount=4`
+;   - each Candidate has `CallOverhead=4` and `FrameOverhead=0`
+;   - `NotOutlinedCost=12*4=48` and `OutliningCost=4*4+12+0=28`
+;   - `Benefit=120-96=20` and `Priority=48/28=1.71`
+;
+; Note that the first one has higher benefit, but lower priority.
+; Hence, when outlining per priority, the second one will be outlined first.
+
+; RUN: llc %s -enable-machine-outliner=always -filetype=obj -o %t
+; RUN: llvm-objdump -d %t | FileCheck %s --check-prefix=CHECK-SORT-BY-PRIORITY
+
+; RUN: llc %s -enable-machine-outliner=always -outliner-benefit-threshold=22 -filetype=obj -o %t
+; RUN: llvm-objdump -d %t | FileCheck %s --check-prefix=CHECK-THRESHOLD
+
+
+target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+target triple = "arm64-apple-macosx14.0.0"
+
+declare i32 @_Z3fooiiii(i32 noundef, i32 noundef, i32 noundef, i32 noundef, i32 noundef, i32 noundef, i32 noundef, i32 noundef)
+
+define i32 @_Z2f1v() minsize {
+  %1 = tail call i32 @_Z3fooiiii(i32 noundef 1, i32 noundef 2, i32 noundef 3, i32 noundef 4, i32 noundef 5, i32 noundef 11, i32 noundef 6, i32 noundef 7)
+  ret i32 %1
+}
+
+define i32 @_Z2f2v() minsize {
+  %1 = tail call i32 @_Z3fooiiii(i32 noundef 1, i32 noundef 2, i32 noundef 3, i32 noundef 4, i32 noundef 5, i32 noundef 12, i32 noundef 6, i32 noundef 7)
+  ret i32 %1
+}
+
+define i32 @_Z2f3v() minsize {
+  %1 = tail call i32 @_Z3fooiiii(i32 noundef 1, i32 noundef 2, i32 noundef 3, i32 noundef 4, i32 noundef 5, i32 noundef 13, i32 noundef 6, i32 noundef 7)
+  ret i32 %1
+}
+
+define i32 @_Z2f4v() minsize {
+  %1 = tail call i32 @_Z3fooiiii(i32 noundef 1, i32 noundef 2, i32 noundef 3, i32 noundef 4, i32 noundef 5, i32 noundef 14, i32 noundef 6, i32 noundef 7)
+  ret i32 %1
+}
+
+define i32 @_Z2f5v() minsize {
+  %1 = tail call i32 @_Z3fooiiii(i32 noundef 1, i32 noundef 2, i32 noundef 3, i32 noundef 4, i32 noundef 5, i32 noundef 15, i32 noundef 8, i32 noundef 9)
+  ret i32 %1
+}
+
+define i32 @_Z2f6v() minsize {
+  %1 = tail call i32 @_Z3fooiiii(i32 noundef 1, i32 noundef 2, i32 noundef 3, i32 noundef 4, i32 noundef 5, i32 noundef 16, i32 noundef 9, i32 noundef 8)
+  ret i32 %1
+}
+
+; CHECK-SORT-BY-PRIORITY: <_OUTLINED_FUNCTION_0>:
+; CHECK-SORT-BY-PRIORITY-NEXT: mov     w6, #0x6
+; CHECK-SORT-BY-PRIORITY-NEXT: mov     w7, #0x7
+; CHECK-SORT-BY-PRIORITY-NEXT: b
+
+; CHECK-SORT-BY-PRIORITY: <_OUTLINED_FUNCTION_1>:
+; CHECK-SORT-BY-PRIORITY-NEXT: mov     w0, #0x1
+; CHECK-SORT-BY-PRIORITY-NEXT: mov     w1, #0x2
+; CHECK-SORT-BY-PRIORITY-NEXT: mov     w2, #0x3
+; CHECK-SORT-BY-PRIORITY-NEXT: mov     w3, #0x4
+; CHECK-SORT-BY-PRIORITY-NEXT: mov     w4, #0x5
+; CHECK-SORT-BY-PRIORITY-NEXT: ret
+
+; CHECK-THRESHOLD: <_OUTLINED_FUNCTION_0>:
+; CHECK-THRESHOLD-NEXT: mov     w0, #0x1
+; CHECK-THRESHOLD-NEXT: mov     w1, #0x2
+; CHECK-THRESHOLD-NEXT: mov     w2, #0x3
+; CHECK-THRESHOLD-NEXT: mov     w3, #0x4
+; CHECK-THRESHOLD-NEXT: mov     w4, #0x5
+; CHECK-THRESHOLD-NEXT: ret
+
+; CHECK-THRESHOLD-NOT: <_OUTLINED_FUNCTION_1>:
diff --git a/llvm/test/CodeGen/ARM/machine-outliner-calls.mir b/llvm/test/CodeGen/ARM/machine-outliner-calls.mir
index a92c9dd28be5a..7634ecd6e863a 100644
--- a/llvm/test/CodeGen/ARM/machine-outliner-calls.mir
+++ b/llvm/test/CodeGen/ARM/machine-outliner-calls.mir
@@ -26,15 +26,15 @@ body:             |
   ; CHECK:   frame-setup CFI_INSTRUCTION def_cfa_offset 8
   ; CHECK:   frame-setup CFI_INSTRUCTION offset $lr, -4
   ; CHECK:   frame-setup CFI_INSTRUCTION offset $r4, -8
-  ; CHECK:   BL @OUTLINED_FUNCTION_0
+  ; CHECK:   BL @OUTLINED_FUNCTION_2
   ; CHECK: bb.1:
-  ; CHECK:   BL @OUTLINED_FUNCTION_0
+  ; CHECK:   BL @OUTLINED_FUNCTION_2
   ; CHECK: bb.2:
-  ; CHECK:   BL @OUTLINED_FUNCTION_0
+  ; CHECK:   BL @OUTLINED_FUNCTION_2
   ; CHECK: bb.3:
-  ; CHECK:   BL @OUTLINED_FUNCTION_0
+  ; CHECK:   BL @OUTLINED_FUNCTION_2
   ; CHECK: bb.4:
-  ; CHECK:   BL @OUTLINED_FUNCTION_0
+  ; CHECK:   BL @OUTLINED_FUNCTION_2
   ; CHECK: bb.5:
   ; CHECK:   $sp = frame-destroy LDMIA_UPD $sp, 14 /* CC::al */, $noreg, def $r4, def $lr
   ; CHECK:   BX_RET 14 /* CC::al */, $noreg
@@ -139,13 +139,13 @@ body:             |
   ; CHECK:   frame-setup CFI_INSTRUCTION def_cfa_offset 8
   ; CHECK:   frame-setup CFI_INSTRUCTION offset $lr, -4
   ; CHECK:   frame-setup CFI_INSTRUCTION offset $r4, -8
-  ; CHECK:   BL @OUTLINED_FUNCTION_1
+  ; CHECK:   BL @OUTLINED_FUNCTION_0
   ; CHECK: bb.1:
-  ; CHECK:   BL @OUTLINED_FUNCTION_1
+  ; CHECK:   BL @OUTLINED_FUNCTION_0
   ; CHECK: bb.2:
-  ; CHECK:   BL @OUTLINED_FUNCTION_1
+  ; CHECK:   BL @OUTLINED_FUNCTION_0
   ; CHECK: bb.3:
-  ; CHECK:   BL @OUTLINED_FUNCTION_1
+  ; CHECK:   BL @OUTLINED_FUNCTION_0
   ; CHECK: bb.4:
   ; CHECK:   $sp = frame-destroy LDMIA_UPD $sp, 14 /* CC::al */, $noreg, def $r4, def $lr
   ; CHECK:   BX_RET 14 /* CC::al */, $noreg
@@ -245,19 +245,19 @@ body:             |
   ; CHECK:   frame-setup CFI_INSTRUCTION offset $lr, -4
   ; CHECK:   frame-setup CFI_INSTRUCTION offset $r4, -8
   ; CHECK:   BL @"\01mcount", csr_aapcs, implicit-def dead $lr, implicit $sp
-  ; CHECK:   BL @OUTLINED_FUNCTION_2
+  ; CHECK:   BL @OUTLINED_FUNCTION_1
   ; CHECK: bb.1:
   ; CHECK:   BL @"\01mcount", csr_aapcs, implicit-def dead $lr, implicit $sp
-  ; CHECK:   BL @OUTLINED_FUNCTION_2
+  ; CHECK:   BL @OUTLINED_FUNCTION_1
   ; CHECK: bb.2:
   ; CHECK:   BL @"\01mcount", csr_aapcs, implicit-def dead $lr, implicit $sp
-  ; CHECK:   BL @OUTLINED_FUNCTION_2
+  ; CHECK:   BL @OUTLINED_FUNCTION_1
   ; CHECK: bb.3:
   ; CHECK:   BL @"\01mcount", csr_aapcs, implicit-def dead $lr, implicit $sp
-  ; CHECK:   BL @OUTLINED_FUNCTION_2
+  ; CHECK:   BL @OUTLINED_FUNCTION_1
   ; CHECK: bb.4:
   ; CHECK:   BL @"\01mcount", csr_aapcs, implicit-def dead $lr, implicit $sp
-  ; CHECK:   BL @OUTLINED_FUNCTION_2
+  ; CHECK:   BL @OUTLINED_FUNCTION_1
   ; CHECK: bb.5:
   ; CHECK:   $sp = frame-destroy LDMIA_UPD $sp, 14 /* CC::al */, $noreg, def $r4, def $lr
   ; CHECK:   BX_RET 14 /* CC::al */, $noreg
@@ -307,38 +307,17 @@ body:             |
   bb.0:
     BX_RET 14, $noreg
 
-
   ; CHECK-LABEL: name: OUTLINED_FUNCTION_0
   ; CHECK: bb.0:
-  ; CHECK:   liveins: $r11, $r10, $r9, $r8, $r7, $r6, $r5, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8, $lr
-  ; CHECK:   early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
-  ; CHECK:   frame-setup CFI_INSTRUCTION def_cfa_offset 8
-  ; CHECK:   frame-setup CFI_INSTRUCTION offset $lr, -8
-  ; CHECK:   BL @bar, implicit-def dead $lr, implicit $sp
-  ; CHECK:   $r0 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK:   $r1 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK:   $r2 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK:   $r3 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK:   $r4 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK:   $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
-  ; CHECK:   MOVPCLR 14 /* CC::al */, $noreg
-
-  ; CHECK-LABEL: name: OUTLINED_FUNCTION_1
-  ; CHECK: bb.0:
-  ; CHECK:   liveins: $r11, $r10, $r9, $r8, $r7, $r6, $r5, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8, $lr
-  ; CHECK:   early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
-  ; CHECK:   frame-setup CFI_INSTRUCTION def_cfa_offset 8
-  ; CHECK:   frame-setup CFI_INSTRUCTION offset $lr, -8
-  ; CHECK:   BL @bar, implicit-def dead $lr, implicit $sp
+  ; CHECK:   liveins: $r11, $r10, $r9, $r8, $r7, $r6, $r5, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
   ; CHECK:   $r0 = MOVi 2, 14 /* CC::al */, $noreg, $noreg
   ; CHECK:   $r1 = MOVi 2, 14 /* CC::al */, $noreg, $noreg
   ; CHECK:   $r2 = MOVi 2, 14 /* CC::al */, $noreg, $noreg
   ; CHECK:   $r3 = MOVi 2, 14 /* CC::al */, $noreg, $noreg
   ; CHECK:   $r4 = MOVi 2, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK:   $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
   ; CHECK:   TAILJMPd @bar, implicit $sp
 
-  ; CHECK-LABEL: name: OUTLINED_FUNCTION_2
+  ; CHECK-LABEL: name: OUTLINED_FUNCTION_1
   ; CHECK: bb.0:
   ; CHECK:   liveins: $r11, $r10, $r9, $r8, $r7, $r6, $r5, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
   ; CHECK:   $r0 = MOVi 3, 14 /* CC::al */, $noreg, $noreg
@@ -348,31 +327,28 @@ body:             |
   ; CHECK:   $r4 = MOVi 3, 14 /* CC::al */, $noreg, $noreg
   ; CHECK:   MOVPCLR 14 /* CC::al */, $noreg
 
+  ; CHECK-LABEL: name: OUTLINED_FUNCTION_2
+  ; CHECK: bb.0:
+  ; CHECK:   liveins: $r11, $r10, $r9, $r8, $r7, $r6, $r5, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
+  ; CHECK:   $r0 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
+  ; CHECK:   $r1 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
+  ; CHECK:   $r2 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
+  ; CHECK:   $r3 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
+  ; CHECK:   $r4 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
+  ; CHECK:   MOVPCLR 14 /* CC::al */, $noreg
+
   ; CHECK-LABEL: name: OUTLINED_FUNCTION_3
   ; CHECK: bb.0:
-  ; CHECK:   liveins: $r11, $r10, $r9, $r8, $r6, $r5, $r4, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8, $lr
-  ; CHECK:   early-clobber $sp = frame-setup t2STR_PRE killed $lr, $sp, -8, 14 /* CC::al */, $noreg
-  ; CHECK:   frame-setup CFI_INSTRUCTION def_cfa_offset 8
-  ; CHECK:   frame-setup CFI_INSTRUCTION offset $lr, -8
-  ; CHECK:   tBL 14 /* CC::al */, $noreg, @bar, implicit-def dead $lr, implicit $sp
+  ; CHECK:   liveins: $r11, $r10, $r9, $r8, $r6, $r5, $r4, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
   ; CHECK:   $r0 = t2MOVi 2, 14 /* CC::al */, $noreg, $noreg
   ; CHECK:   $r1 = t2MOVi 2, 14 /* CC::al */, $noreg, $noreg
   ; CHECK:   $r2 = t2MOVi 2, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK:   $lr, $sp = frame-destroy t2LDR_POST $sp, 8, 14 /* CC::al */, $noreg
   ; CHECK:   tTAILJMPdND @bar, 14 /* CC::al */, $noreg, implicit $sp
 
   ; CHECK-LABEL: name: OUTLINED_FUNCTION_4
   ; CHECK: bb.0:
-  ; CHECK:   liveins: $r11, $r10, $r9, $r8, $r6, $r5, $r4, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8, $lr
-  ; CHECK:   early-clobber $sp = frame-setup t2STR_PRE killed $lr, $sp, -8, 14 /* CC::al */, $noreg
-  ; CHECK:   frame-setup CFI_INSTRUCTION def_cfa_offset 8
-  ; CHECK:   frame-setup CFI_INSTRUCTION offset $lr, -8
-  ; CHECK:   tBL 14 /* CC::al */, $noreg, @bar, implicit-def dead $lr, implicit $sp
+  ; CHECK:   liveins: $r11, $r10, $r9, $r8, $r6, $r5, $r4, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
   ; CHECK:   $r0 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
   ; CHECK:   $r1 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
   ; CHECK:   $r2 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK:   $lr, $sp = frame-destroy t2LDR_POST $sp, 8, 14 /* CC::al */, $noreg
   ; CHECK:   tBX_RET 14 /* CC::al */, $noreg
-
-
-
diff --git a/llvm/test/CodeGen/ARM/machine-outliner-default.mir b/llvm/test/CodeGen/ARM/machine-outliner-default.mir
index 6d0218dbfe636..de2b8f5596976 100644
--- a/llvm/test/CodeGen/ARM/machine-outliner-default.mir
+++ b/llvm/test/CodeGen/ARM/machine-outliner-default.mir
@@ -19,17 +19,17 @@ body:             |
   ; CHECK: bb.0:
   ; CHECK:   liveins: $lr
   ; CHECK:   early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
-  ; CHECK:   BL @OUTLINED_FUNCTION_0
+  ; CHECK:   BL @OUTLINED_FUNCTION_1
   ; CHECK:   $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
   ; CHECK: bb.1:
   ; CHECK:   liveins: $lr, $r6, $r7, $r8, $r9, $r10, $r11
   ; CHECK:   early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
-  ; CHECK:   BL @OUTLINED_FUNCTION_0
+  ; CHECK:   BL @OUTLINED_FUNCTION_1
   ; CHECK:   $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
   ; CHECK: bb.2:
   ; CHECK:   liveins: $lr, $r6, $r7, $r8, $r9, $r10, $r11
   ; CHECK:   early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
-  ; CHECK:   BL @OUTLINED_FUNCTION_0
+  ; CHECK:   BL @OUTLINED_FUNCTION_1
   ; CHECK:   $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
   ; CHECK: bb.3:
   ; CHECK:   liveins: $lr, $r6, $r7, $r8, $r9, $r10, $r11
@@ -73,17 +73,17 @@ body:             |
   ; CHECK: bb.0:
   ; CHECK:   liveins: $lr
   ; CHECK:   early-clobber $sp = frame-setup t2STR_PRE killed $lr, $sp, -8, 14 /* CC::al */, $noreg
-  ; CHECK:   tBL 14 /* CC::al */, $noreg, @OUTLINED_FUNCTION_1
+  ; CHECK:   tBL 14 /* CC::al */, $noreg, @OUTLINED_FUNCTION_0
   ; CHECK:   $lr, $sp = frame-destroy t2LDR_POST $sp, 8, 14 /* CC::al */, $noreg
   ; CHECK: bb.1:
   ; CHECK:   liveins: $lr, $r4, $r5, $r6, $r7, $r8, $r9, $r10, $r11
   ; CHECK:   early-clobber $sp = frame-setup t2STR_PRE killed $lr, $sp, -8, 14 /* CC::al */, $noreg
-  ; CHECK:   tBL 14 /* CC::al */, $noreg, @OUTLINED_FUNCTION_1
+  ; CHECK:   tBL 14 /* CC::al */, $noreg, @OUTLINED_FUNCTION_0
   ; CHECK:   $lr, $sp = frame-destroy t2LDR_POST $sp, 8, 14 /* CC::al */, $noreg
   ; CHECK: bb.2:
   ; CHECK:   liveins: $lr, $r4, $r5, $r6, $r7, $r8, $r9, $r10, $r11
   ; CHECK:   early-clobber $sp = frame-setup t2STR_PRE killed $lr, $sp, -8, 14 /* CC::al */, $noreg
-  ; CHECK:   tBL 14 /* CC::al */, $noreg, @OUTLINED_FUNCTION_1
+  ; CHECK:   tBL 14 /* CC::al */, $noreg, @OUTLINED_FUNCTION_0
   ; CHECK:   $lr, $sp = frame-destroy t2LDR_POST $sp, 8, 14 /* CC::al */, $noreg
   ; CHECK: bb.3:
   ; CHECK:   liveins: $lr, $r4, $r5, $r6, $r7, $r8, $r9, $r10, $r11
@@ -114,6 +114,15 @@ body:             |
 
   ; CHECK-LABEL: name: OUTLINED_FUNCTION_0
   ; CHECK: bb.0:
+  ; CHECK:   liveins: $lr, $r4, $r5, $r6, $r7, $r8, $r9, $r10, $r11
+  ; CHECK:   $r0 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
+  ; CHECK:   $r1 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
+  ; CHECK:   $r2 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
+  ; CHECK:   $r3 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
+  ; CHECK:   tBX_RET 14 /* CC::al */, $noreg
+
+  ; CHECK-LABEL: name: OUTLINED_FUNCTION_1
+  ; CHECK: bb.0:
   ; CHECK:   liveins: $lr, $r6, $r7, $r8, $r9, $r10, $r11
   ; CHECK:   $r0 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
   ; CHECK:   $r1 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
@@ -122,15 +131,3 @@ body:             |
   ; CHECK:   $r4 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
   ; CHECK:   $r5 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
   ; CHECK:   MOVPCLR 14 /* CC::al */, $noreg
-
-  ; CHECK-LABEL: name: OUTLINED_FUNCTION_1
-  ; CHECK: bb.0:
-  ; CHECK:   liveins: $lr, $r4, $r5, $r6, $r7, $r8, $r9, $r10, $r11
-  ; CHECK:   $r0 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK:   $r1 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK:   $r2 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK:   $r3 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
-  ; CHECK:   tBX_RET 14 /* CC::al */, $noreg
-
-
-
diff --git a/llvm/test/CodeGen/ARM/machine-outliner-stack-fixup-arm.mir b/llvm/test/CodeGen/ARM/machine-outliner-stack-fixup-arm.mir
index ae5caa5b7c06d..e71edc8ceb3f6 100644
--- a/llvm/test/CodeGen/ARM/machine-outliner-stack-fixup-arm.mir
+++ b/llvm/test/CodeGen/ARM/machine-outliner-stack-fixup-arm.mir
@@ -18,6 +18,7 @@ body:             |
     liveins: $r0
     ; CHECK-LABEL: name:           CheckAddrMode_i12
     ; CHECK: $r1 = MOVr killed $r0, 14 /* CC::al */, $noreg, $noreg
+    ; CHECK-NEXT: BL @foo, implicit-def dead $lr, implicit $sp
     ; CHECK-NEXT: BL @OUTLINED_FUNCTION_[[I12:[0-9]+]]
     ; CHECK-NEXT: $r6 = LDRi12 $sp, 4088, 14 /* CC::al */, $noreg
     $r1 = MOVr killed $r0, 14, $noreg, $noreg
@@ -47,6 +48,7 @@ body:             |
     liveins: $r1
     ; CHECK-LABEL: name:           CheckAddrMode3
     ; CHECK: $r0 = MOVr killed $r1, 14 /* CC::al */, $noreg, $noreg
+    ; CHECK-NEXT: BL @foo, implicit-def dead $lr, implicit $sp
     ; CHECK-NEXT: BL @OUTLINED_FUNCTION_[[I3:[0-9]+]]
     ; CHECK-NEXT: $r6 = LDRSH $sp, $noreg, 248, 14 /* CC::al */, $noreg
     $r0 = MOVr killed $r1, 14, $noreg, $noreg
@@ -76,6 +78,7 @@ body:             |
     liveins: $r2
     ; CHECK-LABEL: name:           CheckAddrMode5
     ; CHECK: $r0 = MOVr killed $r2, 14 /* CC::al */, $noreg, $noreg
+    ; CHECK-NEXT: BL @foo, implicit-def dead $lr, implicit $sp
     ; CHECK-NEXT: BL @OUTLINED_FUNCTION_[[I5:[0-9]+]]
     ; CHECK-NEXT: $d5 = VLDRD $sp, 254, 14 /* CC::al */, $noreg
     $r0 = MOVr killed $r2, 14, $noreg, $noreg
@@ -110,6 +113,7 @@ body:             |
     liveins: $r3
     ; CHECK-LABEL: name:           CheckAddrMode5FP16
     ; CHECK: $r0 = MOVr killed $r3, 14 /* CC::al */, $noreg, $noreg
+    ; CHECK-NEXT: BL @foo, implicit-def dead $lr, implicit $sp
     ; CHECK-NEXT: BL @OUTLINED_FUNCTION_[[I5FP16:[0-9]+]]
     ; CHECK-NEXT: $s6 = VLDRH $sp, 252, 14, $noreg
     $r0 = MOVr killed $r3, 14, $noreg, $noreg
@@ -146,41 +150,29 @@ body:             |
     BX_RET 14, $noreg
 
     ;CHECK: name:           OUTLINED_FUNCTION_[[I5]]
-    ;CHECK: early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
-    ;CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 8
-    ;CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $lr, -8
-    ;CHECK-NEXT: BL @foo, implicit-def dead $lr, implicit $sp
-    ;CHECK-NEXT: $d0 = VLDRD $sp, 2, 14 /* CC::al */, $noreg
-    ;CHECK-NEXT: $d1 = VLDRD $sp, 10, 14 /* CC::al */, $noreg
-    ;CHECK-NEXT: $d4 = VLDRD $sp, 255, 14 /* CC::al */, $noreg
-    ;CHECK-NEXT: $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
+    ;CHECK: liveins: $r10, $r9, $r8, $r7, $r6, $r5, $r4, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
+    ;CHECK: $d0 = VLDRD $sp, 0, 14 /* CC::al */, $noreg
+    ;CHECK-NEXT: $d1 = VLDRD $sp, 8, 14 /* CC::al */, $noreg
+    ;CHECK-NEXT: $d4 = VLDRD $sp, 253, 14 /* CC::al */, $noreg
+    ;CHECK-NEXT: BX_RET 14 /* CC::al */, $noreg
 
     ;CHECK: name:           OUTLINED_FUNCTION_[[I5FP16]]
-    ;CHECK: early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
-    ;CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 8
-    ;CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $lr, -8
-    ;CHECK-NEXT: BL @foo, implicit-def dead $lr, implicit $sp
-    ;CHECK-NEXT: $s1 = VLDRH $sp, 4, 14, $noreg
-    ;CHECK-NEXT: $s2 = VLDRH $sp, 12, 14, $noreg
-    ;CHECK-NEXT: $s5 = VLDRH $sp, 244, 14, $noreg
-    ;CHECK-NEXT: $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
+    ;CHECK: liveins: $r10, $r9, $r8, $r7, $r6, $r5, $r4, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
+    ;CHECK: $s1 = VLDRH $sp, 0, 14, $noreg
+    ;CHECK-NEXT: $s2 = VLDRH $...
[truncated]

Copy link
Contributor

@kyulee-com kyulee-com left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a MIR test to clearly demonstrate the actual instruction sequences? This would help ensure the implementation aligns with our expectations with regard to priority computation.

Copy link
Contributor

@kyulee-com kyulee-com left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but please wait a bit for others' comments if any.

@kyulee-com kyulee-com merged commit 3b16630 into llvm:main Jun 7, 2024
7 checks passed
@HerrCai0907 HerrCai0907 mentioned this pull request Jun 13, 2024
kyulee-com pushed a commit that referenced this pull request Jun 18, 2024
This PR  depends on #90264

In the current implementation, only leaf children of each internal node
in the suffix tree are included as candidates for outlining. But all
leaf descendants are outlining candidates, which we include in the new
implementation. This is enabled on a flag `outliner-leaf-descendants`
which is default to be true.

The reason for _enabling this on a flag_ is because machine outliner is
not the only pass that uses suffix tree.

The reason for _having this default to be true_ is because including all
leaf descendants show consistent size win.
* For Clang/LLD, it shows around 3% reduction in text segment size when
compared to the baseline `-Oz` linker binary.
 * For selected benchmark tests in LLVM test suite 
 
| run (CTMark/) | only leaf children | all leaf descendants | reduction
% |

|------------------|--------------------|----------------------|-------------|
| lencod | 349624 | 348564 | -0.2004% |
| SPASS | 219672 | 218440 | -0.4738% |
| kc | 271956 | 250068 | -0.4506% |
| sqlite3 | 223920 | 222484 | -0.5471% |
| 7zip-benchmark | 405364 | 401244 | -0.3428% |
| bullet | 139820 | 138340 | -0.8315% |
| consumer-typeset | 295684 | 286628 | -1.2295% |
| pairlocalalign | 72236 | 71936 | -0.2164% |
| tramp3d-v4 | 189572 | 183676 | -2.9668% |

This is part of an enhanced version of machine outliner -- see
[RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
clayborg pushed a commit to clayborg/llvm-project that referenced this pull request Jun 20, 2024
This PR  depends on llvm#90264

In the current implementation, only leaf children of each internal node
in the suffix tree are included as candidates for outlining. But all
leaf descendants are outlining candidates, which we include in the new
implementation. This is enabled on a flag `outliner-leaf-descendants`
which is default to be true.

The reason for _enabling this on a flag_ is because machine outliner is
not the only pass that uses suffix tree.

The reason for _having this default to be true_ is because including all
leaf descendants show consistent size win.
* For Clang/LLD, it shows around 3% reduction in text segment size when
compared to the baseline `-Oz` linker binary.
 * For selected benchmark tests in LLVM test suite 
 
| run (CTMark/) | only leaf children | all leaf descendants | reduction
% |

|------------------|--------------------|----------------------|-------------|
| lencod | 349624 | 348564 | -0.2004% |
| SPASS | 219672 | 218440 | -0.4738% |
| kc | 271956 | 250068 | -0.4506% |
| sqlite3 | 223920 | 222484 | -0.5471% |
| 7zip-benchmark | 405364 | 401244 | -0.3428% |
| bullet | 139820 | 138340 | -0.8315% |
| consumer-typeset | 295684 | 286628 | -1.2295% |
| pairlocalalign | 72236 | 71936 | -0.2164% |
| tramp3d-v4 | 189572 | 183676 | -2.9668% |

This is part of an enhanced version of machine outliner -- see
[RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants