SimplifyCFG: Enable switch replacements in more cases #156477
Currently, some switch optimizations are only possible if lookup table support is enabled. However, some of these optimizations do not produce lookup tables at all, but a short series of simple instructions such as mul, add, or shift. These tests show how a future change will enable these other kinds of optimization even if lookup tables are not supported.
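For illustration, the case results in the `linear_transform_with_default` test (1, 4, 7, 10) reduce to `x * 3 + 1`, while the results in `linear_transform_with_outlier` (0, 3, 6, 9, 13) do not fit any such form. The following is a minimal standalone sketch of recognizing that pattern. `LinearMap`, `fitLinearMap`, and `demoLinearMap` are hypothetical names; this is not the SimplifyCFG implementation, and it ignores overflow and bit-width handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical description of a result of the form Multiplier * x + Offset.
struct LinearMap {
  std::int64_t Multiplier;
  std::int64_t Offset;
};

// Values[i] is the result for case i (cases assumed to be 0, 1, 2, ...).
// Returns the linear form if every consecutive difference is identical.
inline std::optional<LinearMap>
fitLinearMap(const std::vector<std::int64_t> &Values) {
  if (Values.size() < 2)
    return std::nullopt;
  const std::int64_t Offset = Values[0];
  const std::int64_t Multiplier = Values[1] - Values[0];
  for (std::size_t I = 2; I < Values.size(); ++I)
    if (Values[I] - Values[I - 1] != Multiplier)
      return std::nullopt;
  return LinearMap{Multiplier, Offset};
}

// Usage with the values taken from the two tests named above.
inline void demoLinearMap() {
  const auto Fit = fitLinearMap({1, 4, 7, 10});
  assert(Fit.has_value());
  assert(Fit->Multiplier == 3 && Fit->Offset == 1);
  // The outlier (13 instead of 12) breaks the constant stride.
  assert(!fitLinearMap({0, 3, 6, 9, 13}).has_value());
}
```

A replacement of this shape needs only a mul and an add, so no table has to be emitted.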
Check if the SwitchReplacement is a lookup table or not.
llvm/test/Transforms/SimplifyCFG/switch-transformations-no-lut.ll
The option prevents any switch replacements from happening earlier in the compilation pipeline, even though it should only apply to lookup tables. Delay the check until we are sure that we would create a lookup table. If not, continue with the replacement.
Only check the target support for LUTs if we would create one. Otherwise, proceed with the optimization.
Allow the replacement of switches with anything other than a lookup table, even if the "no-jump-table" attribute is set.
fe0691a to eb89aab
// pipeline, because it would otherwise result in some
// difficult-to-analyze code and make pruning branches much harder.
// This is a problem if the switch expression itself can be restricted
// by inlining or CVP.
Based on the llvm-opt-benchmark results, we should also guard BitMapKind behind ConvertSwitchToLookupTable. Similar to lookup tables, this representation may make further optimization harder.
(I'd also be okay with dropping the ConvertSwitchToLookupTable change from this PR to not block the rest of the changes on flushing out all the phase ordering issues.)
Thanks for looking into the benchmark and reviewing. I decided to guard the bitmap creation with ConvertSwitchToLookupTable for now instead of completely removing the ConvertSwitchToLookupTable change. Hopefully the new benchmark results will show no other regressions, so that the other kinds of transformations can still be applied earlier without issues. Otherwise, I will remove it for now.
Early bitmap creation can, like early lookup table creation, cause missed optimizations because the resulting code is more difficult to analyze. Delay the creation until later in the pipeline, when the option `switch-to-lookup` is set in SimplifyCFG.
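To make the bitmap form concrete: in the `bitmap_no_default` test, the i4 results 0, 2, 4, 8 for cases 0 to 3 are packed into a single 16-bit constant, which is the `lshr i16 -31712` seen in the TTINOLUT check lines. The standalone sketch below reproduces that arithmetic. `packBitMap`, `lookupBitMap`, and `demoBitMap` are hypothetical names chosen for this example and are not taken from SimplifyCFG.

```cpp
#include <cassert>
#include <cstdint>

// Pack four 4-bit results into one 16-bit constant; entry I occupies
// bits [4*I, 4*I + 3].
constexpr std::uint16_t packBitMap(const std::uint8_t (&Values)[4]) {
  std::uint16_t Map = 0;
  for (unsigned I = 0; I < 4; ++I)
    Map |= static_cast<std::uint16_t>((Values[I] & 0xFu) << (4 * I));
  return Map;
}

// Mirrors the checked IR: shift the constant right by 4 * X, then keep the
// low four bits (the trunc to i4).
inline std::uint8_t lookupBitMap(std::uint16_t Map, unsigned X) {
  assert(X < 4 && "index must be range-checked first");
  return static_cast<std::uint8_t>((Map >> (4 * X)) & 0xFu);
}

constexpr std::uint8_t CaseValues[4] = {0, 2, 4, 8};
constexpr std::uint16_t Packed = packBitMap(CaseValues);

// 0x8420 is the same 16-bit pattern that prints as -31712 when signed.
static_assert(Packed == 0x8420, "unexpected packed constant");
static_assert(Packed == static_cast<std::uint16_t>(-31712),
              "bit pattern should match the constant in the test");

// Usage: each case index recovers its original result.
inline void demoBitMap() {
  for (unsigned X = 0; X < 4; ++X)
    assert(lookupBitMap(Packed, X) == CaseValues[X]);
}
```

The original value of `x` can no longer be read off a branch structure once it is folded into a shift amount, which is the kind of difficult-to-analyze code the delay is meant to avoid early in the pipeline.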
@llvm/pr-subscribers-llvm-transforms
Author: Jessica Del (OutOfCache)
Changes
In some cases, we can replace a switch with simpler instructions or a lookup table. However, lookup tables are not always supported. This PR enables the other kinds of replacements, even if lookup tables are not supported. Originally, lookup table creation was delayed until late stages of the compilation pipeline, because
Patch is 20.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/156477.diff
3 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 02d6393dd5815..7485230127723 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -6466,6 +6466,12 @@ class SwitchReplacement {
/// Return the default value of the switch.
Constant *getDefaultValue();
+ /// Return true if the replacement is a lookup table.
+ bool isLookupTable();
+
+ /// Return true if the replacement is a bitmap.
+ bool isBitMap();
+
private:
// Depending on the switch, there are different alternatives.
enum {
@@ -6753,6 +6759,10 @@ static bool isTypeLegalForLookupTable(Type *Ty, const TargetTransformInfo &TTI,
Constant *SwitchReplacement::getDefaultValue() { return DefaultValue; }
+bool SwitchReplacement::isLookupTable() { return Kind == LookupTableKind; }
+
+bool SwitchReplacement::isBitMap() { return Kind == BitMapKind; }
+
static bool isSwitchDense(uint64_t NumCases, uint64_t CaseRange) {
// 40% is the default density for building a jump table in optsize/minsize
// mode. See also TargetLoweringBase::isSuitableForJumpTable(), which this
@@ -6918,16 +6928,12 @@ static void reuseTableCompare(
/// lookup tables.
static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
DomTreeUpdater *DTU, const DataLayout &DL,
- const TargetTransformInfo &TTI) {
+ const TargetTransformInfo &TTI,
+ bool ConvertSwitchToLookupTable) {
assert(SI->getNumCases() > 1 && "Degenerate switch?");
BasicBlock *BB = SI->getParent();
Function *Fn = BB->getParent();
- // Only build lookup table when we have a target that supports it or the
- // attribute is not set.
- if (!TTI.shouldBuildLookupTables() ||
- (Fn->getFnAttribute("no-jump-tables").getValueAsBool()))
- return false;
// FIXME: If the switch is too sparse for a lookup table, perhaps we could
// split off a dense part and build a lookup table for that.
@@ -7086,6 +7092,34 @@ static bool simplifySwitchLookup(SwitchInst *SI, IRBuilder<> &Builder,
PhiToReplacementMap.insert({PHI, Replacement});
}
+ bool AnyLookupTables = any_of(
+ PhiToReplacementMap, [](auto &KV) { return KV.second.isLookupTable(); });
+
+ // A few conditions prevent the generation of lookup tables:
+ // 1. Not setting the ConvertSwitchToLookupTable option
+ // This option prevents the LUT creation until a later stage in the
+ // pipeline, because it would otherwise result in some
+ // difficult-to-analyze code and make pruning branches much harder.
+ // This is a problem if the switch expression itself can be restricted
+ // by inlining or CVP.
+ // 2. The target does not support lookup tables.
+ // 3. The "no-jump-tables" function attribute is set.
+ // However, these objections do not apply to other switch replacements, like
+ // the bitmap, so we only stop here if any of these conditions are met and we
+ // want to create a LUT. Otherwise, continue with the switch replacement.
+ if (AnyLookupTables &&
+ (!ConvertSwitchToLookupTable || !TTI.shouldBuildLookupTables() ||
+ Fn->getFnAttribute("no-jump-tables").getValueAsBool()))
+ return false;
+
+ bool AnyBitMaps = any_of(PhiToReplacementMap,
+ [](auto &KV) { return KV.second.isBitMap(); });
+
+ // Bitmaps can also cause missed optimizations due to difficult-to-analyze
+ // code. Delay the creation of bitmaps until later in the pipeline.
+ if (AnyBitMaps && !ConvertSwitchToLookupTable)
+ return false;
+
Builder.SetInsertPoint(SI);
// TableIndex is the switch condition - TableIndexOffset if we don't
// use the condition directly
@@ -7727,13 +7761,8 @@ bool SimplifyCFGOpt::simplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) {
if (Options.ForwardSwitchCondToPhi && forwardSwitchConditionToPHI(SI))
return requestResimplify();
- // The conversion from switch to lookup tables results in difficult-to-analyze
- // code and makes pruning branches much harder. This is a problem if the
- // switch expression itself can still be restricted as a result of inlining or
- // CVP. Therefore, only apply this transformation during late stages of the
- // optimisation pipeline.
- if (Options.ConvertSwitchToLookupTable &&
- simplifySwitchLookup(SI, Builder, DTU, DL, TTI))
+ if (simplifySwitchLookup(SI, Builder, DTU, DL, TTI,
+ Options.ConvertSwitchToLookupTable))
return requestResimplify();
if (simplifySwitchOfPowersOfTwo(SI, Builder, DL, TTI))
diff --git a/llvm/test/CodeGen/Thumb2/bti-indirect-branches.ll b/llvm/test/CodeGen/Thumb2/bti-indirect-branches.ll
index c6ffb92d60d8d..bb1545deebe16 100644
--- a/llvm/test/CodeGen/Thumb2/bti-indirect-branches.ll
+++ b/llvm/test/CodeGen/Thumb2/bti-indirect-branches.ll
@@ -9,6 +9,7 @@ define internal i32 @table_switch(i32 %x) "branch-target-enforcement" {
; CHECK-NEXT: cmp r1, #3
; CHECK-NEXT: bhi .LBB0_6
; CHECK-NEXT: @ %bb.1: @ %entry
+; CHECK-NEXT: movs r0, #3
; CHECK-NEXT: .LCPI0_0:
; CHECK-NEXT: tbb [pc, r1]
; CHECK-NEXT: @ %bb.2:
@@ -22,7 +23,7 @@ define internal i32 @table_switch(i32 %x) "branch-target-enforcement" {
; CHECK-NEXT: movs r0, #2
; CHECK-NEXT: bx lr
; CHECK-NEXT: .LBB0_4: @ %bb3
-; CHECK-NEXT: movs r0, #3
+; CHECK-NEXT: movs r0, #1
; CHECK-NEXT: bx lr
; CHECK-NEXT: .LBB0_5: @ %bb4
; CHECK-NEXT: movs r0, #4
@@ -51,7 +52,7 @@ sw.epilog:
br label %return
return:
- %ret = phi i32 [ 0, %sw.epilog ], [ 1, %bb1 ], [ 2, %bb2 ], [ 3, %bb3 ], [ 4, %bb4 ]
+ %ret = phi i32 [ 0, %sw.epilog ], [ 3, %bb1 ], [ 2, %bb2 ], [ 1, %bb3 ], [ 4, %bb4 ]
ret i32 %ret
}
diff --git a/llvm/test/Transforms/SimplifyCFG/switch-transformations-no-lut.ll b/llvm/test/Transforms/SimplifyCFG/switch-transformations-no-lut.ll
new file mode 100644
index 0000000000000..01200136b6d40
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/switch-transformations-no-lut.ll
@@ -0,0 +1,467 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes='simplifycfg' < %s | FileCheck %s --check-prefix=OPTNOLUT
+; RUN: %if amdgpu-registered-target %{ opt -mtriple=amdgcn--amdpal -S -passes='simplifycfg<switch-to-lookup>' < %s | FileCheck %s --check-prefix=TTINOLUT %}
+;
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+
+define i32 @linear_transform_with_default(i32 %x) {
+; OPTNOLUT-LABEL: define i32 @linear_transform_with_default(
+; OPTNOLUT-SAME: i32 [[X:%.*]]) {
+; OPTNOLUT-NEXT: [[ENTRY:.*:]]
+; OPTNOLUT-NEXT: [[TMP0:%.*]] = icmp ult i32 [[X]], 4
+; OPTNOLUT-NEXT: [[SWITCH_IDX_MULT:%.*]] = mul nsw i32 [[X]], 3
+; OPTNOLUT-NEXT: [[SWITCH_OFFSET:%.*]] = add nsw i32 [[SWITCH_IDX_MULT]], 1
+; OPTNOLUT-NEXT: [[IDX:%.*]] = select i1 [[TMP0]], i32 [[SWITCH_OFFSET]], i32 13
+; OPTNOLUT-NEXT: ret i32 [[IDX]]
+;
+; TTINOLUT-LABEL: define i32 @linear_transform_with_default(
+; TTINOLUT-SAME: i32 [[X:%.*]]) {
+; TTINOLUT-NEXT: [[ENTRY:.*]]:
+; TTINOLUT-NEXT: [[TMP0:%.*]] = icmp ult i32 [[X]], 4
+; TTINOLUT-NEXT: br i1 [[TMP0]], label %[[SWITCH_LOOKUP:.*]], label %[[END:.*]]
+; TTINOLUT: [[SWITCH_LOOKUP]]:
+; TTINOLUT-NEXT: [[SWITCH_IDX_MULT:%.*]] = mul nsw i32 [[X]], 3
+; TTINOLUT-NEXT: [[SWITCH_OFFSET:%.*]] = add nsw i32 [[SWITCH_IDX_MULT]], 1
+; TTINOLUT-NEXT: br label %[[END]]
+; TTINOLUT: [[END]]:
+; TTINOLUT-NEXT: [[IDX:%.*]] = phi i32 [ 13, %[[ENTRY]] ], [ [[SWITCH_OFFSET]], %[[SWITCH_LOOKUP]] ]
+; TTINOLUT-NEXT: ret i32 [[IDX]]
+;
+entry:
+ switch i32 %x, label %end [
+ i32 0, label %case0
+ i32 1, label %case1
+ i32 2, label %case2
+ i32 3, label %case3
+ ]
+
+case0:
+ br label %end
+case1:
+ br label %end
+case2:
+ br label %end
+case3:
+ br label %end
+
+end:
+ %idx = phi i32 [ 1, %case0 ], [ 4, %case1 ], [ 7, %case2 ], [ 10, %case3 ], [ 13, %entry ]
+ ret i32 %idx
+}
+
+define i32 @linear_transform_with_outlier(i32 %x) {
+; OPTNOLUT-LABEL: define i32 @linear_transform_with_outlier(
+; OPTNOLUT-SAME: i32 [[X:%.*]]) {
+; OPTNOLUT-NEXT: [[ENTRY:.*]]:
+; OPTNOLUT-NEXT: switch i32 [[X]], label %[[END:.*]] [
+; OPTNOLUT-NEXT: i32 0, label %[[CASE0:.*]]
+; OPTNOLUT-NEXT: i32 1, label %[[CASE1:.*]]
+; OPTNOLUT-NEXT: i32 2, label %[[CASE2:.*]]
+; OPTNOLUT-NEXT: i32 3, label %[[CASE3:.*]]
+; OPTNOLUT-NEXT: i32 4, label %[[CASE4:.*]]
+; OPTNOLUT-NEXT: ]
+; OPTNOLUT: [[CASE0]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[CASE1]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[CASE2]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[CASE3]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[CASE4]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[END]]:
+; OPTNOLUT-NEXT: [[IDX:%.*]] = phi i32 [ 0, %[[CASE0]] ], [ 3, %[[CASE1]] ], [ 6, %[[CASE2]] ], [ 9, %[[CASE3]] ], [ 13, %[[CASE4]] ], [ 12, %[[ENTRY]] ]
+; OPTNOLUT-NEXT: ret i32 [[IDX]]
+;
+; TTINOLUT-LABEL: define i32 @linear_transform_with_outlier(
+; TTINOLUT-SAME: i32 [[X:%.*]]) {
+; TTINOLUT-NEXT: [[ENTRY:.*]]:
+; TTINOLUT-NEXT: switch i32 [[X]], label %[[END:.*]] [
+; TTINOLUT-NEXT: i32 0, label %[[CASE0:.*]]
+; TTINOLUT-NEXT: i32 1, label %[[CASE1:.*]]
+; TTINOLUT-NEXT: i32 2, label %[[CASE2:.*]]
+; TTINOLUT-NEXT: i32 3, label %[[CASE3:.*]]
+; TTINOLUT-NEXT: i32 4, label %[[CASE4:.*]]
+; TTINOLUT-NEXT: ]
+; TTINOLUT: [[CASE0]]:
+; TTINOLUT-NEXT: br label %[[END]]
+; TTINOLUT: [[CASE1]]:
+; TTINOLUT-NEXT: br label %[[END]]
+; TTINOLUT: [[CASE2]]:
+; TTINOLUT-NEXT: br label %[[END]]
+; TTINOLUT: [[CASE3]]:
+; TTINOLUT-NEXT: br label %[[END]]
+; TTINOLUT: [[CASE4]]:
+; TTINOLUT-NEXT: br label %[[END]]
+; TTINOLUT: [[END]]:
+; TTINOLUT-NEXT: [[IDX:%.*]] = phi i32 [ 0, %[[CASE0]] ], [ 3, %[[CASE1]] ], [ 6, %[[CASE2]] ], [ 9, %[[CASE3]] ], [ 13, %[[CASE4]] ], [ 12, %[[ENTRY]] ]
+; TTINOLUT-NEXT: ret i32 [[IDX]]
+;
+entry:
+ switch i32 %x, label %end [
+ i32 0, label %case0
+ i32 1, label %case1
+ i32 2, label %case2
+ i32 3, label %case3
+ i32 4, label %case4
+ ]
+
+case0:
+ br label %end
+case1:
+ br label %end
+case2:
+ br label %end
+case3:
+ br label %end
+case4:
+ br label %end
+
+end:
+ %idx = phi i32 [ 0, %case0 ], [ 3, %case1 ], [ 6, %case2 ], [ 9, %case3 ], [ 13, %case4 ], [ 12, %entry ]
+ ret i32 %idx
+}
+
+define i32 @linear_transform_no_default(i32 %x) {
+; OPTNOLUT-LABEL: define i32 @linear_transform_no_default(
+; OPTNOLUT-SAME: i32 [[X:%.*]]) {
+; OPTNOLUT-NEXT: [[ENTRY:.*:]]
+; OPTNOLUT-NEXT: [[SWITCH_IDX_MULT:%.*]] = mul nsw i32 [[X]], 3
+; OPTNOLUT-NEXT: ret i32 [[SWITCH_IDX_MULT]]
+;
+; TTINOLUT-LABEL: define i32 @linear_transform_no_default(
+; TTINOLUT-SAME: i32 [[X:%.*]]) {
+; TTINOLUT-NEXT: [[ENTRY:.*:]]
+; TTINOLUT-NEXT: [[IDX:%.*]] = mul nsw i32 [[X]], 3
+; TTINOLUT-NEXT: ret i32 [[IDX]]
+;
+entry:
+ switch i32 %x, label %default [
+ i32 0, label %case0
+ i32 1, label %case1
+ i32 2, label %case2
+ i32 3, label %case3
+ i32 4, label %case4
+ ]
+
+case0:
+ br label %end
+case1:
+ br label %end
+case2:
+ br label %end
+case3:
+ br label %end
+case4:
+ br label %end
+default:
+ unreachable
+
+end:
+ %idx = phi i32 [ 0, %case0 ], [ 3, %case1 ], [ 6, %case2 ], [ 9, %case3 ], [ 12, %case4 ]
+ ret i32 %idx
+}
+
+define i4 @bitmap_no_default(i32 %x) {
+; OPTNOLUT-LABEL: define i4 @bitmap_no_default(
+; OPTNOLUT-SAME: i32 [[X:%.*]]) {
+; OPTNOLUT-NEXT: [[ENTRY:.*]]:
+; OPTNOLUT-NEXT: switch i32 [[X]], label %[[DEFAULT:.*]] [
+; OPTNOLUT-NEXT: i32 0, label %[[END:.*]]
+; OPTNOLUT-NEXT: i32 1, label %[[CASE1:.*]]
+; OPTNOLUT-NEXT: i32 2, label %[[CASE2:.*]]
+; OPTNOLUT-NEXT: i32 3, label %[[CASE3:.*]]
+; OPTNOLUT-NEXT: ]
+; OPTNOLUT: [[CASE1]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[CASE2]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[CASE3]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[DEFAULT]]:
+; OPTNOLUT-NEXT: unreachable
+; OPTNOLUT: [[END]]:
+; OPTNOLUT-NEXT: [[SWITCH_MASKED:%.*]] = phi i4 [ 2, %[[CASE1]] ], [ 4, %[[CASE2]] ], [ -8, %[[CASE3]] ], [ 0, %[[ENTRY]] ]
+; OPTNOLUT-NEXT: ret i4 [[SWITCH_MASKED]]
+;
+; TTINOLUT-LABEL: define i4 @bitmap_no_default(
+; TTINOLUT-SAME: i32 [[X:%.*]]) {
+; TTINOLUT-NEXT: [[ENTRY:.*:]]
+; TTINOLUT-NEXT: [[SWITCH_CAST:%.*]] = trunc i32 [[X]] to i16
+; TTINOLUT-NEXT: [[SWITCH_SHIFTAMT:%.*]] = mul nuw nsw i16 [[SWITCH_CAST]], 4
+; TTINOLUT-NEXT: [[SWITCH_DOWNSHIFT:%.*]] = lshr i16 -31712, [[SWITCH_SHIFTAMT]]
+; TTINOLUT-NEXT: [[IDX:%.*]] = trunc i16 [[SWITCH_DOWNSHIFT]] to i4
+; TTINOLUT-NEXT: ret i4 [[IDX]]
+;
+entry:
+ switch i32 %x, label %default [
+ i32 0, label %case0
+ i32 1, label %case1
+ i32 2, label %case2
+ i32 3, label %case3
+ ]
+
+case0:
+ br label %end
+case1:
+ br label %end
+case2:
+ br label %end
+case3:
+ br label %end
+default:
+ unreachable
+
+end:
+ %idx = phi i4 [ 0, %case0 ], [ 2, %case1 ], [ 4, %case2 ], [ 8, %case3 ]
+ ret i4 %idx
+}
+
+define i4 @bitmap_with_default(i32 %x) {
+; OPTNOLUT-LABEL: define i4 @bitmap_with_default(
+; OPTNOLUT-SAME: i32 [[X:%.*]]) {
+; OPTNOLUT-NEXT: [[ENTRY:.*]]:
+; OPTNOLUT-NEXT: switch i32 [[X]], label %[[DEFAULT:.*]] [
+; OPTNOLUT-NEXT: i32 0, label %[[END:.*]]
+; OPTNOLUT-NEXT: i32 1, label %[[CASE1:.*]]
+; OPTNOLUT-NEXT: i32 2, label %[[CASE2:.*]]
+; OPTNOLUT-NEXT: i32 3, label %[[CASE3:.*]]
+; OPTNOLUT-NEXT: ]
+; OPTNOLUT: [[CASE1]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[CASE2]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[CASE3]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[DEFAULT]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[END]]:
+; OPTNOLUT-NEXT: [[IDX:%.*]] = phi i4 [ 2, %[[CASE1]] ], [ 4, %[[CASE2]] ], [ -8, %[[CASE3]] ], [ -1, %[[DEFAULT]] ], [ 0, %[[ENTRY]] ]
+; OPTNOLUT-NEXT: ret i4 [[IDX]]
+;
+; TTINOLUT-LABEL: define i4 @bitmap_with_default(
+; TTINOLUT-SAME: i32 [[X:%.*]]) {
+; TTINOLUT-NEXT: [[ENTRY:.*]]:
+; TTINOLUT-NEXT: [[TMP0:%.*]] = icmp ult i32 [[X]], 4
+; TTINOLUT-NEXT: br i1 [[TMP0]], label %[[SWITCH_LOOKUP:.*]], label %[[END:.*]]
+; TTINOLUT: [[SWITCH_LOOKUP]]:
+; TTINOLUT-NEXT: [[SWITCH_CAST:%.*]] = trunc i32 [[X]] to i16
+; TTINOLUT-NEXT: [[SWITCH_SHIFTAMT:%.*]] = mul nuw nsw i16 [[SWITCH_CAST]], 4
+; TTINOLUT-NEXT: [[SWITCH_DOWNSHIFT:%.*]] = lshr i16 -31712, [[SWITCH_SHIFTAMT]]
+; TTINOLUT-NEXT: [[SWITCH_MASKED:%.*]] = trunc i16 [[SWITCH_DOWNSHIFT]] to i4
+; TTINOLUT-NEXT: br label %[[END]]
+; TTINOLUT: [[END]]:
+; TTINOLUT-NEXT: [[IDX:%.*]] = phi i4 [ [[SWITCH_MASKED]], %[[SWITCH_LOOKUP]] ], [ -1, %[[ENTRY]] ]
+; TTINOLUT-NEXT: ret i4 [[IDX]]
+;
+entry:
+ switch i32 %x, label %default [
+ i32 0, label %case0
+ i32 1, label %case1
+ i32 2, label %case2
+ i32 3, label %case3
+ ]
+
+case0:
+ br label %end
+case1:
+ br label %end
+case2:
+ br label %end
+case3:
+ br label %end
+default:
+ br label %end
+
+end:
+ %idx = phi i4 [ 0, %case0 ], [ 2, %case1 ], [ 4, %case2 ], [ 8, %case3 ], [15, %default]
+ ret i4 %idx
+}
+
+define i32 @single_value_no_default(i32 %x) {
+; OPTNOLUT-LABEL: define i32 @single_value_no_default(
+; OPTNOLUT-SAME: i32 [[X:%.*]]) {
+; OPTNOLUT-NEXT: [[ENTRY:.*:]]
+; OPTNOLUT-NEXT: ret i32 2
+;
+; TTINOLUT-LABEL: define i32 @single_value_no_default(
+; TTINOLUT-SAME: i32 [[X:%.*]]) {
+; TTINOLUT-NEXT: [[ENTRY:.*:]]
+; TTINOLUT-NEXT: ret i32 2
+;
+entry:
+ switch i32 %x, label %default [
+ i32 0, label %case0
+ i32 1, label %case1
+ i32 2, label %case2
+ i32 3, label %case3
+ i32 4, label %case4
+ ]
+
+case0:
+ br label %end
+case1:
+ br label %end
+case2:
+ br label %end
+case3:
+ br label %end
+case4:
+ br label %end
+default:
+ unreachable
+
+end:
+ %idx = phi i32 [ 2, %case0 ], [ 2, %case1 ], [ 2, %case2 ], [ 2, %case3 ], [ 2, %case4 ]
+ ret i32 %idx
+}
+
+define i32 @single_value_withdefault(i32 %x) {
+; OPTNOLUT-LABEL: define i32 @single_value_withdefault(
+; OPTNOLUT-SAME: i32 [[X:%.*]]) {
+; OPTNOLUT-NEXT: [[ENTRY:.*:]]
+; OPTNOLUT-NEXT: [[TMP0:%.*]] = icmp ult i32 [[X]], 5
+; OPTNOLUT-NEXT: [[DOT:%.*]] = select i1 [[TMP0]], i32 2, i32 3
+; OPTNOLUT-NEXT: ret i32 [[DOT]]
+;
+; TTINOLUT-LABEL: define i32 @single_value_withdefault(
+; TTINOLUT-SAME: i32 [[X:%.*]]) {
+; TTINOLUT-NEXT: [[ENTRY:.*:]]
+; TTINOLUT-NEXT: [[TMP0:%.*]] = icmp ult i32 [[X]], 5
+; TTINOLUT-NEXT: [[IDX:%.*]] = select i1 [[TMP0]], i32 2, i32 3
+; TTINOLUT-NEXT: ret i32 [[IDX]]
+;
+entry:
+ switch i32 %x, label %default [
+ i32 0, label %case0
+ i32 1, label %case1
+ i32 2, label %case2
+ i32 3, label %case3
+ i32 4, label %case4
+ ]
+
+case0:
+ br label %end
+case1:
+ br label %end
+case2:
+ br label %end
+case3:
+ br label %end
+case4:
+ br label %end
+default:
+ br label %end
+
+end:
+ %idx = phi i32 [ 2, %case0 ], [ 2, %case1 ], [ 2, %case2 ], [ 2, %case3 ], [ 2, %case4 ], [ 3, %default ]
+ ret i32 %idx
+}
+
+define i32 @single_value_no_jump_tables(i32 %x) "no-jump-tables"="true" {
+; OPTNOLUT-LABEL: define i32 @single_value_no_jump_tables(
+; OPTNOLUT-SAME: i32 [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+; OPTNOLUT-NEXT: [[ENTRY:.*:]]
+; OPTNOLUT-NEXT: [[TMP0:%.*]] = icmp ult i32 [[X]], 5
+; OPTNOLUT-NEXT: [[IDX:%.*]] = select i1 [[TMP0]], i32 2, i32 3
+; OPTNOLUT-NEXT: ret i32 [[IDX]]
+;
+; TTINOLUT-LABEL: define i32 @single_value_no_jump_tables(
+; TTINOLUT-SAME: i32 [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+; TTINOLUT-NEXT: [[ENTRY:.*:]]
+; TTINOLUT-NEXT: [[TMP0:%.*]] = icmp ult i32 [[X]], 5
+; TTINOLUT-NEXT: [[IDX:%.*]] = select i1 [[TMP0]], i32 2, i32 3
+; TTINOLUT-NEXT: ret i32 [[IDX]]
+;
+entry:
+ switch i32 %x, label %default [
+ i32 0, label %case0
+ i32 1, label %case1
+ i32 2, label %case2
+ i32 3, label %case3
+ i32 4, label %case4
+ ]
+
+case0:
+ br label %end
+case1:
+ br label %end
+case2:
+ br label %end
+case3:
+ br label %end
+case4:
+ br label %end
+default:
+ br label %end
+
+end:
+ %idx = phi i32 [ 2, %case0 ], [ 2, %case1 ], [ 2, %case2 ], [ 2, %case3 ], [ 2, %case4 ], [ 3, %default ]
+ ret i32 %idx
+}
+
+define i32 @lookup_table(i32 %x) {
+; OPTNOLUT-LABEL: define i32 @lookup_table(
+; OPTNOLUT-SAME: i32 [[X:%.*]]) {
+; OPTNOLUT-NEXT: [[ENTRY:.*]]:
+; OPTNOLUT-NEXT: switch i32 [[X]], label %[[END:.*]] [
+; OPTNOLUT-NEXT: i32 0, label %[[CASE0:.*]]
+; OPTNOLUT-NEXT: i32 1, label %[[CASE1:.*]]
+; OPTNOLUT-NEXT: i32 2, label %[[CASE2:.*]]
+; OPTNOLUT-NEXT: i32 3, label %[[CASE3:.*]]
+; OPTNOLUT-NEXT: ]
+; OPTNOLUT: [[CASE0]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[CASE1]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[CASE2]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[CASE3]]:
+; OPTNOLUT-NEXT: br label %[[END]]
+; OPTNOLUT: [[END]]:
+; OPTNOLUT-NEXT: [[IDX:%.*]] = phi i32 [ 13, %[[CASE0]] ], [ 3, %[[CASE1]] ], [ 11, %[[CASE2]] ], [ 8, %[[CASE3]] ], [ 24, %[[ENTRY]] ]
+; OPTNOLUT-NEXT: ret i32 [[IDX]]
+;
+; TTINOLUT-LABEL: define i32 @lookup_table(
+; TTINOLUT-SAME: i32 [[X:%.*]]) {
+; TTINOLUT-NEXT: [[ENTRY:.*]]:
+; TTINOLUT-NEXT: switch i32 [[X]], label %[[END:.*]] [
+; TTINOLUT-NEXT: i32 0, label %[[CASE0:.*]]
+; TTINOLUT-NEXT: i32 1, label %[[CASE1:.*]]
+; TTINOLUT-NEXT: i32 2, label %[[CASE2:.*]]
+; TTINOLUT-NEXT: i32 3, label %[[CASE3:.*]]
+; TTINOLUT-NEXT: ]
+; TTINOLUT: [[CASE0]]:
+; TTINOLUT-NEXT: br label %[[END]]
+; TTINOLUT: [[CASE1]]:
+; TTINOLUT-NEXT: br label %[[END]]
+; TTINOLUT: [[CASE2]]:
+; TTINOLUT-NEXT: br label %[[END]]
+; TTINOLUT: [[CASE3]]:
+; TTINOLUT-NEXT: br label %[[END]]
+; TTINOLUT: [[END]]:
+; TTINOLUT-NEXT: [[IDX:%.*]] = phi i32 [ 13, %[[CASE0]] ], [ 3, %[[CASE1]] ], [ 11, %[[CASE2]] ], [ 8, %[[CASE3]] ], [ 24, %[[ENTRY]] ]
+; TTINOLUT-NEXT: ret i32 [[IDX]]
+;
+entry:
+ switch i32 %x, label %end [
+ i32 0, label %case0
+ i32 1, label %case1
+ i32 2, label %case2
+ i32 3, label %case3
+ ]
+
+c...
[truncated]
We should move the update of statistics like NumBitMaps from the analysis phase to the transform phase, to avoid over-reporting if we bail out of the transform.
Since the SwitchReplacement is created before we decide to replace the switch, only increase the table counters when we actually create them.
LGTM
The test exposes existing root profile propagation issues
…157961) The test exposes existing root profile propagation issues
While we do not want to form actual lookup tables early, we do want to perform some optimizations, as they may enable inlining of the much simpler form. Builds on llvm#156477, which originally included this change as well.
In some cases, we can replace a switch with simpler instructions or a lookup table.
For instance, if every case results in the same value, we can simply replace the switch
with that single value.
However, lookup tables are not always supported.
Targets, function attributes and compiler options can deactivate lookup table creation.
Currently, even simpler switch replacements like the single value optimization do not
get applied, because we only enable these transformations if lookup tables are enabled.
This PR enables the other kinds of replacements, even if lookup tables are not supported.
First, it checks if the potential replacements are lookup tables.
If they are, then check if lookup tables are supported and whether to continue.
If they are not, then we can apply the other transformations.
Originally, lookup table creation was delayed until late stages of the compilation pipeline, because
it can result in difficult-to-analyze code and prevent other optimizations.
As a side effect of this change, we can also enable the simpler optimizations much earlier in the
compilation process.
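The resulting check order, including the bitmap guard added during review, can be sketched as a small standalone function. This mirrors the conditions in the `simplifySwitchLookup` hunk of the patch, but `ReplacementKind`, `GateInputs`, `mayReplaceSwitch`, and `demoGate` are hypothetical names, and the three booleans merely stand in for `Options.ConvertSwitchToLookupTable`, `TTI.shouldBuildLookupTables()`, and the `"no-jump-tables"` function attribute.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative stand-in for the kinds of replacement a switch PHI can get.
enum class ReplacementKind { SingleValue, LinearMap, BitMap, LookupTable };

struct GateInputs {
  bool ConvertSwitchToLookupTable; // set late in the pipeline
  bool TargetBuildsLookupTables;   // assumed result of the TTI query
  bool NoJumpTablesAttr;           // assumed value of "no-jump-tables"
};

// Returns true if the switch replacement may proceed.
inline bool mayReplaceSwitch(const std::vector<ReplacementKind> &Kinds,
                             const GateInputs &In) {
  const auto Has = [&Kinds](ReplacementKind K) {
    return std::find(Kinds.begin(), Kinds.end(), K) != Kinds.end();
  };
  // Lookup tables need the late-pipeline option, target support, and no
  // "no-jump-tables" attribute.
  if (Has(ReplacementKind::LookupTable) &&
      (!In.ConvertSwitchToLookupTable || !In.TargetBuildsLookupTables ||
       In.NoJumpTablesAttr))
    return false;
  // Bitmaps are only delayed; target support and the attribute do not matter.
  if (Has(ReplacementKind::BitMap) && !In.ConvertSwitchToLookupTable)
    return false;
  return true;
}

inline void demoGate() {
  const GateInputs Early{false, true, false};
  const GateInputs LateNoLutTarget{true, false, false};
  // Single-value and linear replacements go ahead even early.
  assert(mayReplaceSwitch({ReplacementKind::SingleValue}, Early));
  assert(mayReplaceSwitch({ReplacementKind::LinearMap}, Early));
  // Bitmaps and lookup tables wait for the late option.
  assert(!mayReplaceSwitch({ReplacementKind::BitMap}, Early));
  assert(!mayReplaceSwitch({ReplacementKind::LookupTable}, Early));
  // Late, on a target without lookup tables: bitmap yes, table no.
  assert(mayReplaceSwitch({ReplacementKind::BitMap}, LateNoLutTarget));
  assert(!mayReplaceSwitch({ReplacementKind::LookupTable}, LateNoLutTarget));
}
```

This matches the check lines in the new test file: the `simplifycfg` run without `switch-to-lookup` keeps the bitmap switches intact, while the AMDGPU run with `switch-to-lookup` forms the bitmap but leaves `lookup_table` as a switch.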