Skip to content

Conversation

aemerson
Copy link
Contributor

This optimization works well for most cases but implicitly assumes that the
backend will be able to generate jump tables (see "And lookup table optimization
should convert it into add %arg 1." in PR #67275 description).

When -fno-jump-tables is used however this regresses code size, which is
especially painful in targets like armv7 where there's already high register
pressure.

rdar://157858055

… and successor BB" optimization when not using jump tables.

This optimization works well for most cases but implicitly assumes that the
backend will be able to generate jump tables (see "And lookup table optimization
should convert it into add %arg 1." in PR #67275 description).

When -fno-jump-tables is used however this regresses code size, which is
especially painful in targets like armv7 where there's already high register
pressure.

rdar://157858055
Copy link
Contributor Author

@aemerson aemerson requested review from XChy and nikic September 24, 2025 17:01
@aemerson aemerson marked this pull request as ready for review September 24, 2025 17:01
@llvmbot
Copy link
Member

llvmbot commented Sep 24, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Amara Emerson (aemerson)

Changes

This optimization works well for most cases but implicitly assumes that the
backend will be able to generate jump tables (see "And lookup table optimization
should convert it into add %arg 1." in PR #67275 description).

When -fno-jump-tables is used however this regresses code size, which is
especially painful in targets like armv7 where there's already high register
pressure.

rdar://157858055


Full diff: https://github.com/llvm/llvm-project/pull/160560.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/Local.cpp (+6)
  • (modified) llvm/test/Transforms/SimplifyCFG/branch-fold.ll (+34)
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index 123881e276584..1a3d543160137 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -1159,6 +1159,12 @@ bool llvm::TryToSimplifyUncondBranchFromEmptyBlock(BasicBlock *BB,
   bool BBPhisMergeable = BBKillable || CanRedirectPredsOfEmptyBBToSucc(
                                            BB, Succ, BBPreds, CommonPred);
 
+  // If the function has the "no-jump-tables" attribute, merging phis
+  // can make size worse at -Oz.
+  auto *MF = BB->getParent();
+  if (MF->hasMinSize() && MF->hasFnAttribute("no-jump-tables"))
+    BBPhisMergeable = false;
+
   if ((!BBKillable && !BBPhisMergeable) || introduceTooManyPhiEntries(BB, Succ))
     return false;
 
diff --git a/llvm/test/Transforms/SimplifyCFG/branch-fold.ll b/llvm/test/Transforms/SimplifyCFG/branch-fold.ll
index 8e7b91ea172be..2f4c5d9590e0e 100644
--- a/llvm/test/Transforms/SimplifyCFG/branch-fold.ll
+++ b/llvm/test/Transforms/SimplifyCFG/branch-fold.ll
@@ -145,8 +145,42 @@ Succ:
   ret i8 %phi2
 }
 
+define i8 @common_pred_no_jt(i8 noundef %arg, i1 %c1, i1 %c2) #0 {
+; CHECK-LABEL: @common_pred_no_jt(
+; CHECK-NEXT:  Pred:
+; CHECK-NEXT:    call void @dummy()
+; CHECK-NEXT:    br i1 [[C1:%.*]], label [[COMMONPRED:%.*]], label [[SUCC:%.*]]
+; CHECK:       CommonPred:
+; CHECK-NEXT:    call void @dummy()
+; CHECK-NEXT:    br i1 [[C2:%.*]], label [[SUCC1:%.*]], label [[SUCC]]
+; CHECK:       BB:
+; CHECK-NEXT:    [[PHI1:%.*]] = phi i8 [ 0, [[PRED:%.*]] ], [ 1, [[COMMONPRED]] ]
+; CHECK-NEXT:    br label [[SUCC1]]
+; CHECK:       Succ:
+; CHECK-NEXT:    [[PHI2:%.*]] = phi i8 [ [[PHI1]], [[SUCC]] ], [ 4, [[COMMONPRED]] ]
+; CHECK-NEXT:    ret i8 [[PHI2]]
+;
+Pred:
+call void @dummy()
+  br i1 %c1, label %CommonPred, label %BB
+
+CommonPred:
+call void @dummy()
+  br i1 %c2, label %Succ, label %BB
+
+BB:
+  %phi1 = phi i8 [ 0, %Pred ], [1, %CommonPred]
+  br label %Succ
+
+Succ:
+  %phi2 = phi i8 [ %phi1, %BB ], [ 4, %CommonPred ]
+  ret i8 %phi2
+}
 declare void @dummy()
 
+attributes #0 = { minsize "no-jump-tables"="true" }
+
+
 !0 = !{!"branch_weights", i32 3, i32 7}
 !1 = !{!"branch_weights", i32 11, i32 4}
 ;.

@nikic
Copy link
Contributor

nikic commented Sep 24, 2025

This optimization works well for most cases but implicitly assumes that the backend will be able to generate jump tables (see "And lookup table optimization should convert it into add %arg 1." in PR #67275 description).

When -fno-jump-tables is used however this regresses code size, which is especially painful in targets like armv7 where there's already high register pressure.

I'm not really clear about what no-jump-tables has to do with this transform. The specific example from #67275 does benefit from a switch to arithmetic conversion -- which, since recently, is not actually affected by "no-jump-tables" anymore (see https://llvm.godbolt.org/z/s7GrWxGnn for it being applied to that example in the presence of "no-jump-tables"). But the transform itself isn't really specific to that situation.

I also checked the codegen for you test case with and without this transform on armv7, and it seems to be the same: https://llvm.godbolt.org/z/673dbfGKs This particular example doesn't even generate a code structure where jump tables would be relevant.

@aemerson
Copy link
Contributor Author

I also checked the codegen for you test case with and without this transform on armv7, and it seems to be the same: https://llvm.godbolt.org/z/673dbfGKs This particular example doesn't even generate a code structure where jump tables would be relevant.

Right, sorry I should have made it clearer that the size issue isn't demonstrated by the test case, that was just trying to ensure that the general transform doesn't happen without jump tables.

I've got a reduced test case that shows a significant (7%+) size difference when compiling with and without this change. I'm attaching the interestingness script used to reduce this down.
interestingness.sh

GitHub won't let me attach IR so I've put it here: https://godbolt.org/z/7baoK9j1a

Compiling that with:

clang -c -arch armv6m -ffreestanding -static -fstack-protector -fno-PIC -Oz -nostdlibinc -ffreestanding -fomit-frame-pointer -mllvm -inline-threshold=10 -fno-jump-tables input.ll

should show the issue.

@aemerson
Copy link
Contributor Author

Since this probably isn't the right solution I'm going to close this.

@aemerson aemerson closed this Sep 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants