
[CodeGen][AArch64] Set min jump table entries to 13 for AArch64 targets #71166

Merged
merged 5 commits into llvm:main on Nov 14, 2023

Conversation

david-arm
Contributor

There are some workloads that are negatively impacted by using jump tables when the number of entries is small. The SPEC2017 perlbench benchmark is one example of this, where increasing the threshold to around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum threshold based on empirical evidence rather than science, and just manually increased the threshold until I got the best performance without impacting other workloads. For neoverse-v1 I saw a ~0.2% improvement in the SPEC2017 integer geomean, and no overall change for neoverse-n1. If we find issues with this threshold later on, we can always revisit this.

The most significant SPEC2017 score changes on neoverse-v1 were:

500.perlbench_r: +1.6%
520.omnetpp_r: +0.6%

and the rest saw changes < 0.5%.

I updated CodeGen/AArch64/min-jump-table.ll to reflect the new threshold. For most of the affected tests I manually set the min number of entries back to 4 on the RUN line because the tests seem to rely upon this behaviour.

@llvmbot
Collaborator

llvmbot commented Nov 3, 2023

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: David Sherwood (david-arm)

Changes

There are some workloads that are negatively impacted by using jump tables when the number of entries is small. The SPEC2017 perlbench benchmark is one example of this, where increasing the threshold to around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum threshold based on empirical evidence rather than science, and just manually increased the threshold until I got the best performance without impacting other workloads. For neoverse-v1 I saw a ~0.2% improvement in the SPEC2017 integer geomean, and no overall change for neoverse-n1. If we find issues with this threshold later on, we can always revisit this.

The most significant SPEC2017 score changes on neoverse-v1 were:

500.perlbench_r: +1.6%
520.omnetpp_r: +0.6%

and the rest saw changes < 0.5%.

I updated CodeGen/AArch64/min-jump-table.ll to reflect the new threshold. For most of the affected tests I manually set the min number of entries back to 4 on the RUN line because the tests seem to rely upon this behaviour.


Patch is 23.65 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71166.diff

17 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+6)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/arm64-jumptable.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/bti-branch-relaxation.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/jump-table-32.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/jump-table-exynos.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/jump-table.ll (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/max-jump-table.ll (+5-5)
  • (modified) llvm/test/CodeGen/AArch64/min-jump-table.ll (+92-4)
  • (modified) llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/redundant-mov-from-zero-extend.ll (+23-18)
  • (modified) llvm/test/CodeGen/AArch64/switch-unreachable-default.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/win64-jumptable.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/wineh-bti.ll (+1-1)
  • (modified) llvm/test/CodeGen/Generic/machine-function-splitter.ll (+4-4)
  • (modified) llvm/test/DebugInfo/COFF/jump-table.ll (+1-1)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 291f0c8c5d991c6..002b95c68272f5c 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -144,6 +144,10 @@ static cl::opt<bool> EnableExtToTBL("aarch64-enable-ext-to-tbl", cl::Hidden,
 static cl::opt<unsigned> MaxXors("aarch64-max-xors", cl::init(16), cl::Hidden,
                                  cl::desc("Maximum of xors"));
 
+static cl::opt<unsigned> AArch64MinimumJumpTableEntries(
+    "aarch64-min-jump-table-entries", cl::init(13), cl::Hidden,
+    cl::desc("Set minimum number of entries to use a jump table on AArch64"));
+
 /// Value type used for condition codes.
 static const MVT MVT_CC = MVT::i32;
 
@@ -1653,6 +1657,8 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
   PredictableSelectIsExpensive = Subtarget->predictableSelectIsExpensive();
 
   IsStrictFPEnabled = true;
+
+  setMinimumJumpTableEntries(AArch64MinimumJumpTableEntries);
 }
 
 void AArch64TargetLowering::addTypeForNEON(MVT VT) {
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll
index 4e4297b7a5e22f8..9371edd439fe49c 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-switch.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-; RUN: llc -global-isel -mtriple aarch64 -O0 -aarch64-enable-atomic-cfg-tidy=0 -stop-after=irtranslator -simplify-mir -verify-machineinstrs %s -o - 2>&1 | FileCheck %s
+; RUN: llc -global-isel -mtriple aarch64 -aarch64-min-jump-table-entries=4 -O0 -aarch64-enable-atomic-cfg-tidy=0 -stop-after=irtranslator -simplify-mir -verify-machineinstrs %s -o - 2>&1 | FileCheck %s
 
 define i32 @switch(i32 %argc) {
   ; CHECK-LABEL: name: switch
diff --git a/llvm/test/CodeGen/AArch64/arm64-jumptable.ll b/llvm/test/CodeGen/AArch64/arm64-jumptable.ll
index d4ac9e72b28ff62..7d9adf92a6a95c5 100644
--- a/llvm/test/CodeGen/AArch64/arm64-jumptable.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-jumptable.ll
@@ -1,5 +1,5 @@
-; RUN: llc -mtriple=arm64-apple-ios < %s | FileCheck %s
-; RUN: llc -mtriple=arm64-linux-gnu < %s | FileCheck %s --check-prefix=CHECK-LINUX
+; RUN: llc -mtriple=arm64-apple-ios -aarch64-min-jump-table-entries=4 < %s | FileCheck %s
+; RUN: llc -mtriple=arm64-linux-gnu -aarch64-min-jump-table-entries=4 < %s | FileCheck %s --check-prefix=CHECK-LINUX
 ; <rdar://11417675>
 
 define void @sum(i32 %a, ptr %to, i32 %c) {
diff --git a/llvm/test/CodeGen/AArch64/bti-branch-relaxation.ll b/llvm/test/CodeGen/AArch64/bti-branch-relaxation.ll
index d9ed54fb86bf151..e3e7f42526f785c 100644
--- a/llvm/test/CodeGen/AArch64/bti-branch-relaxation.ll
+++ b/llvm/test/CodeGen/AArch64/bti-branch-relaxation.ll
@@ -1,4 +1,4 @@
-; RUN: llc %s -o - | FileCheck %s
+; RUN: llc %s -aarch64-min-jump-table-entries=4 -o - | FileCheck %s
 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 target triple = "aarch64"
 
diff --git a/llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll b/llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll
index 73b180dc4ae765f..0f208f8ed905241 100644
--- a/llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll
+++ b/llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
-; RUN: llc -mtriple=arm64-apple-ios < %s | FileCheck %s
+; RUN: llc -aarch64-min-jump-table-entries=4 -mtriple=arm64-apple-ios < %s | FileCheck %s
 
 ; Check there's no assert in spilling from implicit-def operands on an
 ; IMPLICIT_DEF.
diff --git a/llvm/test/CodeGen/AArch64/jump-table-32.ll b/llvm/test/CodeGen/AArch64/jump-table-32.ll
index d8572e901af29ec..e65813de8943e3b 100644
--- a/llvm/test/CodeGen/AArch64/jump-table-32.ll
+++ b/llvm/test/CodeGen/AArch64/jump-table-32.ll
@@ -1,4 +1,4 @@
-; RUN: llc -verify-machineinstrs -o - %s -mtriple=arm64_32-apple-ios7.0 -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
+; RUN: llc -verify-machineinstrs -o - %s -aarch64-min-jump-table-entries=4 -mtriple=arm64_32-apple-ios7.0 -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
 
 define i32 @test_jumptable(i32 %in) {
 ; CHECK: test_jumptable
diff --git a/llvm/test/CodeGen/AArch64/jump-table-exynos.ll b/llvm/test/CodeGen/AArch64/jump-table-exynos.ll
index b5b400ecfbffcac..61b0df5de2af3a4 100644
--- a/llvm/test/CodeGen/AArch64/jump-table-exynos.ll
+++ b/llvm/test/CodeGen/AArch64/jump-table-exynos.ll
@@ -1,5 +1,5 @@
-; RUN: llc -o - %s -mtriple=aarch64-none-linux-gnu -mattr=+force-32bit-jump-tables -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
-; RUN: llc -o - %s -mtriple=aarch64-none-linux-gnu -mcpu=exynos-m3 -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
+; RUN: llc -o - %s -mtriple=aarch64-none-linux-gnu -mattr=+force-32bit-jump-tables -aarch64-min-jump-table-entries=4 -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
+; RUN: llc -o - %s -mtriple=aarch64-none-linux-gnu -mcpu=exynos-m3 -aarch64-min-jump-table-entries=4 -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
 
 ; Exynos doesn't want jump tables to be compressed for now.
 
diff --git a/llvm/test/CodeGen/AArch64/jump-table.ll b/llvm/test/CodeGen/AArch64/jump-table.ll
index 6c32444657542db..bb6c74b3b9df6c7 100644
--- a/llvm/test/CodeGen/AArch64/jump-table.ll
+++ b/llvm/test/CodeGen/AArch64/jump-table.ll
@@ -1,9 +1,9 @@
-; RUN: llc -no-integrated-as -verify-machineinstrs -o - %s -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
-; RUN: llc -no-integrated-as -code-model=large -verify-machineinstrs -o - %s -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-LARGE %s
-; RUN: llc -no-integrated-as -code-model=large -relocation-model=pic -o - %s -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-PIC %s
-; RUN: llc -no-integrated-as -mtriple=aarch64-none-linux-gnu -verify-machineinstrs -relocation-model=pic -aarch64-enable-atomic-cfg-tidy=0 -o - %s | FileCheck --check-prefix=CHECK-PIC %s
-; RUN: llc -no-integrated-as -verify-machineinstrs -o - %s -mtriple=arm64-apple-ios -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-IOS %s
-; RUN: llc -no-integrated-as -code-model=tiny -verify-machineinstrs -o - %s -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-TINY %s
+; RUN: llc -no-integrated-as -verify-machineinstrs -o - %s -aarch64-min-jump-table-entries=4 -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck %s
+; RUN: llc -no-integrated-as -code-model=large -verify-machineinstrs -o - %s -aarch64-min-jump-table-entries=4 -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-LARGE %s
+; RUN: llc -no-integrated-as -code-model=large -relocation-model=pic -o - %s -aarch64-min-jump-table-entries=4 -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-PIC %s
+; RUN: llc -no-integrated-as -mtriple=aarch64-none-linux-gnu -verify-machineinstrs -relocation-model=pic -aarch64-min-jump-table-entries=4 -aarch64-enable-atomic-cfg-tidy=0 -o - %s | FileCheck --check-prefix=CHECK-PIC %s
+; RUN: llc -no-integrated-as -verify-machineinstrs -o - %s -mtriple=arm64-apple-ios -aarch64-min-jump-table-entries=4 -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-IOS %s
+; RUN: llc -no-integrated-as -code-model=tiny -verify-machineinstrs -o - %s -aarch64-min-jump-table-entries=4 -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 | FileCheck --check-prefix=CHECK-TINY %s
 
 define i32 @test_jumptable(i32 %in) {
 ; CHECK: test_jumptable
diff --git a/llvm/test/CodeGen/AArch64/max-jump-table.ll b/llvm/test/CodeGen/AArch64/max-jump-table.ll
index d01924a9a54272e..1268f5e6167ad8d 100644
--- a/llvm/test/CodeGen/AArch64/max-jump-table.ll
+++ b/llvm/test/CodeGen/AArch64/max-jump-table.ll
@@ -1,8 +1,8 @@
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40                         -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK0  < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -max-jump-table-size=4  -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK4  < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -max-jump-table-size=8  -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK8  < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -max-jump-table-size=16 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK16 < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -mcpu=exynos-m3         -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECKM3 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -aarch64-min-jump-table-entries=4 -jump-table-density=40                         -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK0  < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -aarch64-min-jump-table-entries=4 -jump-table-density=40 -max-jump-table-size=4  -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK4  < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -aarch64-min-jump-table-entries=4 -jump-table-density=40 -max-jump-table-size=8  -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK8  < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -aarch64-min-jump-table-entries=4 -jump-table-density=40 -max-jump-table-size=16 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK16 < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -aarch64-min-jump-table-entries=4 -jump-table-density=40 -mcpu=exynos-m3         -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECKM3 < %t
 
 declare void @ext(i32, i32)
 
diff --git a/llvm/test/CodeGen/AArch64/min-jump-table.ll b/llvm/test/CodeGen/AArch64/min-jump-table.ll
index bf6bddf7b9c4f73..3b3f79746de0ab7 100644
--- a/llvm/test/CodeGen/AArch64/min-jump-table.ll
+++ b/llvm/test/CodeGen/AArch64/min-jump-table.ll
@@ -1,7 +1,9 @@
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -min-jump-table-entries=0 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK0  < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -min-jump-table-entries=2 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK2  < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -min-jump-table-entries=4 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK4  < %t
-; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -min-jump-table-entries=8 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK8  < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -aarch64-min-jump-table-entries=0 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK0  < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -aarch64-min-jump-table-entries=2 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK2  < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -aarch64-min-jump-table-entries=4 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK4  < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -aarch64-min-jump-table-entries=8 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK8  < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -aarch64-min-jump-table-entries=12 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK12  < %t
+; RUN: llc %s -O2 -print-after-isel -mtriple=aarch64-linux-gnu -jump-table-density=40 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK-DEFAULT  < %t
 
 declare void @ext(i32, i32)
 
@@ -16,6 +18,8 @@ entry:
 ; CHECK2-NEXT: Jump Tables:
 ; CHECK4-NOT: {{^}}Jump Tables:
 ; CHECK8-NOT: {{^}}Jump Tables:
+; CHECK12-NOT: {{^}}Jump Tables:
+; CHECK-DEFAULT-NOT: {{^}}Jump Tables:
 
 bb1: tail call void @ext(i32 1, i32 0) br label %return
 bb2: tail call void @ext(i32 2, i32 2) br label %return
@@ -36,6 +40,8 @@ entry:
 ; CHECK2-NEXT: Jump Tables:
 ; CHECK4-NEXT: Jump Tables:
 ; CHECK8-NOT: {{^}}Jump Tables:
+; CHECK12-NOT: {{^}}Jump Tables:
+; CHECK-DEFAULT-NOT: {{^}}Jump Tables:
 
 bb1: tail call void @ext(i32 1, i32 0) br label %return
 bb2: tail call void @ext(i32 3, i32 2) br label %return
@@ -58,6 +64,83 @@ entry:
     i32 9, label %bb8
   ]
 ; CHECK-LABEL: function jt8:
+; CHECK0-NEXT: Jump Tables:
+; CHECK2-NEXT: Jump Tables:
+; CHECK4-NEXT: Jump Tables:
+; CHECK8-NEXT: Jump Tables:
+; CHECK12-NOT: Jump Tables:
+; CHECK-DEFAULT-NOT: {{^}}Jump Tables:
+
+bb1: tail call void @ext(i32 1, i32 0) br label %return
+bb2: tail call void @ext(i32 2, i32 2) br label %return
+bb3: tail call void @ext(i32 3, i32 4) br label %return
+bb4: tail call void @ext(i32 4, i32 6) br label %return
+bb5: tail call void @ext(i32 5, i32 8) br label %return
+bb6: tail call void @ext(i32 6, i32 10) br label %return
+bb7: tail call void @ext(i32 7, i32 12) br label %return
+bb8: tail call void @ext(i32 8, i32 14) br label %return
+
+return: ret i32 %b
+}
+
+define i32 @jt12(i32 %a, i32 %b) {
+entry:
+  switch i32 %a, label %return [
+    i32 1, label %bb1
+    i32 2, label %bb2
+    i32 3, label %bb3
+    i32 4, label %bb4
+    i32 5, label %bb5
+    i32 6, label %bb6
+    i32 7, label %bb7
+    i32 8, label %bb8
+    i32 9, label %bb9
+    i32 10, label %bb10
+    i32 11, label %bb11
+    i32 12, label %bb12
+  ]
+; CHECK-LABEL: function jt12:
+; CHECK0-NEXT: Jump Tables:
+; CHECK2-NEXT: Jump Tables:
+; CHECK4-NEXT: Jump Tables:
+; CHECK8-NEXT: Jump Tables:
+; CHECK12-NEXT: Jump Tables:
+; CHECK-DEFAULT-NOT: {{^}}Jump Tables:
+
+bb1: tail call void @ext(i32 1, i32 0) br label %return
+bb2: tail call void @ext(i32 2, i32 2) br label %return
+bb3: tail call void @ext(i32 3, i32 4) br label %return
+bb4: tail call void @ext(i32 4, i32 6) br label %return
+bb5: tail call void @ext(i32 5, i32 8) br label %return
+bb6: tail call void @ext(i32 6, i32 10) br label %return
+bb7: tail call void @ext(i32 7, i32 12) br label %return
+bb8: tail call void @ext(i32 8, i32 14) br label %return
+bb9: tail call void @ext(i32 9, i32 16) br label %return
+bb10: tail call void @ext(i32 10, i32 18) br label %return
+bb11: tail call void @ext(i32 11, i32 20) br label %return
+bb12: tail call void @ext(i32 12, i32 22) br label %return
+
+return: ret i32 %b
+}
+
+define i32 @jt13(i32 %a, i32 %b) {
+entry:
+  switch i32 %a, label %return [
+    i32 1, label %bb1
+    i32 2, label %bb2
+    i32 3, label %bb3
+    i32 4, label %bb4
+    i32 5, label %bb5
+    i32 6, label %bb6
+    i32 7, label %bb7
+    i32 8, label %bb8
+    i32 9, label %bb9
+    i32 10, label %bb10
+    i32 11, label %bb11
+    i32 12, label %bb12
+    i32 13, label %bb13
+  ]
+; CHECK-LABEL: function jt13:
 ; CHECK-NEXT: Jump Tables:
 
 bb1: tail call void @ext(i32 1, i32 0) br label %return
@@ -68,6 +151,11 @@ bb5: tail call void @ext(i32 5, i32 8) br label %return
 bb6: tail call void @ext(i32 6, i32 10) br label %return
 bb7: tail call void @ext(i32 7, i32 12) br label %return
 bb8: tail call void @ext(i32 8, i32 14) br label %return
+bb9: tail call void @ext(i32 9, i32 16) br label %return
+bb10: tail call void @ext(i32 10, i32 18) br label %return
+bb11: tail call void @ext(i32 11, i32 20) br label %return
+bb12: tail call void @ext(i32 12, i32 22) br label %return
+bb13: tail call void @ext(i32 13, i32 24) br label %return
 
 return: ret i32 %b
 }
diff --git a/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll b/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
index 15657730c2cdcb6..106f6bb856b6b8a 100644
--- a/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
+++ b/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
@@ -1,4 +1,4 @@
-; RUN: llc -mtriple=aarch64 %s -o - | FileCheck %s
+; RUN: llc -mtriple=aarch64 -aarch64-min-jump-table-entries=4 %s -o - | FileCheck %s
 
 define void @f0() "patchable-function-entry"="0" "branch-target-enforcement"="true" {
 ; CHECK-LABEL: f0:
diff --git a/llvm/test/CodeGen/AArch64/redundant-mov-from-zero-extend.ll b/llvm/test/CodeGen/AArch64/redundant-mov-from-zero-extend.ll
index c150cb889313ac9..caa28b31a6f6539 100644
--- a/llvm/test/CodeGen/AArch64/redundant-mov-from-zero-extend.ll
+++ b/llvm/test/CodeGen/AArch64/redundant-mov-from-zero-extend.ll
@@ -11,33 +11,37 @@ define i32 @test(i32 %input, i32 %n, i32 %a) {
 ; CHECK-NEXT:  .LBB0_2: // %bb.0
 ; CHECK-NEXT:    add w8, w0, w1
 ; CHECK-NEXT:    mov w0, #100 // =0x64
-; CHECK-NEXT:    cmp w8, #4
-; CHECK-NEXT:    b.hi .LBB0_5
+; CHECK-NEXT:    cmp w8, #1
+; CHECK-NEXT:    b.le .LBB0_7
 ; CHECK-NEXT:  // %bb.3: // %bb.0
-; CHECK-NEXT:    adrp x9, .LJTI0_0
-; CHECK-NEXT:    add x9, x9, :lo12:.LJTI0_0
-; CHECK-NEXT:    adr x10, .LBB0_4
-; CHECK-NEXT:    ldrb w11, [x9, x8]
-; CHECK-NEXT:    add x10, x10, x11, lsl #2
-; CHECK-NEXT:    br x10
-; CHECK-NEXT:  .LBB0_4: // %sw.bb
-; CHECK-NEXT:    add w0, w2, #1
-; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB0_5: // %bb.0
+; CHECK-NEXT:    cmp w8, #2
+; CHECK-NEXT:    b.eq .LBB0_10
+; CHECK-NEXT:  // %bb.4: // %bb.0
+; CHECK-NEXT:    cmp w8, #4
+; CHECK-NEXT:    b.eq .LBB0_11
+; CHECK-NEXT:  // %bb.5: // %bb.0
 ; CHECK-NEXT:    cmp w8, #200
-; CHECK-NEXT:    b.ne .LBB0_9
+; CHECK-NEXT:    b.ne .LBB0_12
 ; CHECK-NEXT:  // %bb.6: // %sw.bb7
 ; CHECK-NEXT:    add w0, w2, #7
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB0_7: // %sw.bb3
+; CHECK-NEXT:  .LBB0_7: // %bb.0
+; CHECK-NEXT:    cbz w8, .LBB0_13
+; CHECK-NEXT:  // %bb.8: // %bb.0
+; CHECK-NEXT:    cmp w8, #1
+; CHECK-NEXT:    b.ne .LBB0_12
+; CHECK-NEXT:  // %bb.9: // %sw.bb1
+; CHECK-NEXT:    add w0, w2, #3
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB0_10: // %sw.bb3
 ; CHECK-NEXT:    add w0, w2, #4
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB0_8: // %sw.bb5
+; CHECK-NEXT:  .LBB0_11: // %sw.bb5
 ; CHECK-NEXT:    add w0, w2, #5
-; CHECK-NEXT:  .LBB0_9: // %return
+; CHECK-NEXT:  .LBB0_12: // %return
 ; CHECK-NEXT:    ret
-; CHECK-NEXT:  .LBB0_10: // %sw.bb1
-; CHECK-NEXT:    add w0, w2, #3
+; CHECK-NEXT:  .LBB0_13: // %sw.bb
+; CHECK-NEXT:    add w0, w2, #1
 ; CHECK-NEXT:    ret
 entry:
   %b = add nsw i32 %input, %n
@@ -77,3 +81,4 @@ return:
   %retval.0 = phi i32 [ %add8, %sw.bb7 ], [ %add6, %sw.bb5 ], [ %add4, %sw.bb3 ], [ %add2, %sw.bb1 ], [ %add, %sw.bb ], [ 100, %bb.0 ], [ 0, %entry ]
   ret i32 %retval.0
 }
+
diff --git a/llvm/test/CodeGen/AArch64/switch-unreachable-default.ll b/llvm/test/CodeGen/AArch64/switch-unreachable-default.ll
index 9acc5a150b630f2..949485bf52f2e8d 100644
--- a/llvm/test/CodeGen/AArch64/switch-unreachable-default.ll
+++ b/llvm/test/CodeGen/AArch64/switch-unreachable-default.ll
@@ -1,4 +1,4 @@
-; RUN: llc -O3 -o - %s | FileCheck %s
+; RUN: llc -O3 -aarch64-min-jump-table-entries=4 -o - %s | FileCheck %s
 
 ; Test that the output in the presence of an unreachable default does not have
 ; a compare and branch at the top of the switch to handle the default case.
diff --git a/llvm/test/CodeGen/AArch64/win64-jumptable.ll b/llvm/test/CodeGen/AArch64/win64-jumptable.ll
index 0b9b7deceae1138..f9f2b0bf0ca5cf5 100644
--- a/llvm/test/CodeGen/AArch64/win64-jumptable.ll
+++ b/llvm/test/CodeGen/AArch64/win64-jumptable.ll
@@ -1,5 +1,5 @@
-; RUN: llc -o - %s -mtriple=aarch64-windows -aarch64-enable-compress-jump-tables=0 | FileCheck %s
-; RUN: llc -o - %s -mtriple=aarch64-windows -aarch64-enable-compress-jump-tables=0 -filetype=obj | llvm-readobj --unwind - | FileCheck %s -check-prefix=UNWIND
+; RUN: llc -o - %s -mtriple=aarch64-windows -aarch64-min-jump-table-entries=4 -aarch64-enable-compress-jump-tables=0 | FileCheck %s
+; RUN: llc -o - %s -mtriple=aarch64-windows -aarch64-min-jump-table-entries=4 -aarch64-enable-compress-jump-tables=0 -filetype=obj | llvm-readobj --...
[truncated]

@davemgreen
Collaborator

As with last time this came up, I have no strong opinion on whether one value is better than another, so this sounds OK.

What happens with minsize? Should that have a lower limit?

I've changed getMinimumJumpTableEntries to now take a Function
parameter so that we can query the minsize attribute and only
increase the threshold if minsize is absent.
@paulwalker-arm left a comment
Collaborator

Now that you're making MinimumJumpTableEntries a per-function property, perhaps it's better stored in AArch64Subtarget; then you don't need to change the existing getMinimumJumpTableEntries interface. This will be more in keeping with how other per-function settings are handled.

Instead of querying the function attributes every time, we can
let the subtarget decide once at construction.
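A minimal sketch of that arrangement, assuming a HasMinSize flag threaded into the subtarget constructor (illustrative names only, not the exact code merged in this PR):

#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Sketch only: the subtarget resolves the threshold once, at construction.
static cl::opt<unsigned> AArch64MinimumJumpTableEntries(
    "aarch64-min-jump-table-entries", cl::init(13), cl::Hidden,
    cl::desc("Set minimum number of entries to use a jump table on AArch64"));

class AArch64SubtargetSketch {
  unsigned MinimumJumpTableEntries;

public:
  // HasMinSize: whether the function being compiled carries the minsize
  // attribute.
  explicit AArch64SubtargetSketch(bool HasMinSize) {
    // Under minsize, keep the old default of 4 unless the user explicitly
    // forced a value on the command line.
    if (HasMinSize && AArch64MinimumJumpTableEntries.getNumOccurrences() == 0)
      MinimumJumpTableEntries = 4;
    else
      MinimumJumpTableEntries = AArch64MinimumJumpTableEntries;
  }

  unsigned getMinimumJumpTableEntries() const {
    return MinimumJumpTableEntries;
  }
};

The lowering code can then query the subtarget's value instead of reading the function attributes on every call.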
@david-arm
Contributor Author

Now that you're making MinimumJumpTableEntries a per-function property, perhaps it's better stored in AArch64Subtarget; then you don't need to change the existing getMinimumJumpTableEntries interface. This will be more in keeping with how other per-function settings are handled.

Done!


github-actions bot commented Nov 13, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@davemgreen
Collaborator

Thanks. It looks like some of the formatting could be redone? The failing test at least looks unrelated.

I see the Arm backend does the same thing with adding MinSize to the subtarget. This sounds OK to me if the formatting is happy.

@david-arm
Contributor Author

I ran make check-all using HEAD of LLVM and tests pass.

@david-arm merged commit bdc0afc into llvm:main on Nov 14, 2023
2 of 3 checks passed
@nikic
Contributor

nikic commented Nov 14, 2023

How does this affect performance in cases where the last switch case is usually taken?

@david-arm
Contributor Author

How does this affect performance in cases where the last switch case is usually taken?

Honestly speaking, I don't know! As mentioned in the commit message, I just changed this number until I got the best performance results, so it's quite possible that this makes things worse in some benchmarks I've not tested. If we discover issues later on, we can revisit this and perhaps refine the algorithm.

@davemgreen
Collaborator

How does this affect performance in cases where the last switch case is usually taken?

If I understand what you are saying, it should generate a tree of branches, so the positions of the cases shouldn't matter a huge amount. It can depend on the differences between the cases and the branch targets, though; they might not match.

Let us know if you have reason to think this would be worse in some way; it's not a very easy thing to measure accurately. A load from a constant pool into an indirect jump can be worse than a series of branches that can each be independently branch predicted.
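For intuition, here are the two lowering strategies written out by hand in C++ (purely illustrative; LLVM makes this choice during instruction selection, not at the source level):

// Illustrative only: the two shapes a small switch can lower to.
static int on0(int a) { return a + 1; }
static int on1(int a) { return a + 3; }
static int on2(int a) { return a + 4; }
static int on3(int a) { return a + 5; }

// Jump-table style: one table load feeding a single indirect branch.
// All cases share one indirect prediction site, and the table load
// itself can miss in the cache.
int dispatch_table(unsigned x, int a) {
  using Handler = int (*)(int);
  static const Handler Table[4] = {on0, on1, on2, on3};
  if (x >= 4)
    return 100;       // default case
  return Table[x](a); // load + indirect branch
}

// Branch-tree style: a balanced tree of compares. Each conditional
// branch is predicted independently, and no table load is needed.
int dispatch_tree(unsigned x, int a) {
  if (x < 2)
    return x == 0 ? on0(a) : on1(a);
  if (x == 2)
    return on2(a);
  return x == 3 ? on3(a) : 100; // default case
}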

@nikic
Contributor

nikic commented Nov 14, 2023

How does this affect performance in cases where the last switch case is usually taken?

If I understand what you are saying, it should generate a tree of branches, so the positions of the cases shouldn't matter a huge amount. It can depend on the differences between the cases and the branch targets, though; they might not match.

Let us know if you have reason to think this would be worse in some way; it's not a very easy thing to measure accurately. A load from a constant pool into an indirect jump can be worse than a series of branches that can each be independently branch predicted.

I was mainly just wondering about what the worst-case impact here would be for an unlucky switch. If I understand correctly, previously the switch (without jump table) would need at most two branches, and now it needs at most four. Does that sound correct? (I was thinking in terms of a linear chain previously -- with a binary tree this doesn't sound so bad.)
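A rough sanity check of those numbers, assuming a dense case range lowered as a balanced compare tree (SelectionDAG's actual case clustering can differ):

\[ \text{worst-case compares} \approx \lceil \log_2 n \rceil, \qquad \lceil \log_2 3 \rceil = 2, \qquad \lceil \log_2 12 \rceil = 4, \]

where n = 3 is the largest switch kept out of a jump table under the old threshold of 4, and n = 12 under the new threshold of 13.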

@sjoerdmeijer
Collaborator

FWIW: this gives < 0.5% on the Neoverse V2 for Perlbench. Might even be noise because I haven't done a lot of runs.

@david-arm
Contributor Author

FWIW: this gives < 0.5% on the Neoverse V2 for Perlbench. Might even be noise because I haven't done a lot of runs.

That's interesting, and an improvement, even if minor, is still good news! It is also possible to tweak the flag per CPU if a higher or lower number would help.

@sjoerdmeijer
Collaborator

I think it's actually noise; I don't think I see improvements on the V2. Perhaps that's still okay because it shows improvements on the V1. But now I am actually wondering if the V1 improvement is an actual improvement, or do we get lucky with secondary effects such as different function/loop/block alignments?

@david-arm
Contributor Author

I think it's actually noise; I don't think I see improvements on the V2. Perhaps that's still okay because it shows improvements on the V1. But now I am actually wondering if the V1 improvement is an actual improvement, or do we get lucky with secondary effects such as different function/loop/block alignments?

I ran this quite a few times on neoverse-v1 hardware and consistently saw an improvement, particularly with #72132 applied as well, although I appreciate there is a lot of noise in perlbench. From observation it seems that GCC also takes a similar approach when deciding to create jump tables for perlbench.

@david-arm
Contributor Author

I think it's actually noise; I don't think I see improvements on the V2. Perhaps that's still okay because it shows improvements on the V1. But now I am actually wondering if the V1 improvement is an actual improvement, or do we get lucky with secondary effects such as different function/loop/block alignments?

What I also saw was a gradual improvement in performance as I increased the threshold for creating jump tables. For example, I'd get consistently better results with a minimum threshold of 6, then even better with 10, then the best around 13. When I saw that trend of improving performance with a higher threshold, it made me think this wasn't just luck or coincidence, and that there is a genuine correlation.

@sjoerdmeijer
Collaborator

I think it's actually noise; I don't think I see improvements on the V2. Perhaps that's still okay because it shows improvements on the V1. But now I am actually wondering if the V1 improvement is an actual improvement, or do we get lucky with secondary effects such as different function/loop/block alignments?

What I also saw was a gradual improvement in performance as I increased the threshold for creating jump tables. For example, I'd get consistently better results with a minimum threshold of 6, then even better with 10, then the best around 13. When I saw that trend of improving performance with a higher threshold, it made me think this wasn't just luck or coincidence, and that there is a genuine correlation.

Ok, that sounds good.

@Wilco1

Wilco1 commented Nov 20, 2023

I think it's actually noise; I don't think I see improvements on the V2. Perhaps that's still okay because it shows improvements on the V1. But now I am actually wondering if the V1 improvement is an actual improvement, or do we get lucky with secondary effects such as different function/loop/block alignments?

Check the instruction count, cache misses and branch misses. Much of the gain comes from a huge reduction in instruction count; the rest comes from avoiding the cache misses incurred by loading the table entries, and from avoiding indirect branch misses. As indirect branch prediction improves, a lower value may be slightly better on future cores; however, I don't believe it ever makes sense to use a table with fewer than 8 cases.
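On Linux, those counters can be compared directly between the two builds with perf (an illustrative invocation; the binary name and workload arguments are placeholders):

perf stat -e instructions,branch-misses,cache-misses ./perlbench_r <workload args>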

zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
…ts (llvm#71166)

wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this pull request Nov 23, 2023
This is like what AArch64 has done in llvm#71166 except that we don't
handle `HasMinSize` case now.
wangpc-pp added a commit that referenced this pull request Nov 23, 2023
This is like what AArch64 has done in #71166 except that we don't
handle `HasMinSize` case now.