[TailDuplicator] Do not restrict the computed gotos #114990
Conversation
@llvm/pr-subscribers-backend-x86

Author: DianQK (DianQK)

Changes

Fixes #106846.

This is what I learned from GCC: it duplicates blocks that end in a computed goto, but not blocks whose indirect jump goes through a jump table. I believe GCC has provided a clear explanation here:

> Duplicate the blocks containing computed gotos. This basically unfactors computed gotos that were factored early on in the compilation process to speed up edge based data flow. We used to not unfactor them again, which can seriously pessimize code with many computed jumps in the source code, such as interpreters.

Full diff: https://github.com/llvm/llvm-project/pull/114990.diff

3 Files Affected:
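For context, here is a minimal sketch (not part of this PR) of the computed-goto interpreter pattern the quote refers to, written with the GNU labels-as-values extension that GCC and Clang support; the opcodes are invented for illustration. Each handler ends in its own indirect `goto`, which is exactly the branch this patch lets tail duplication replicate:

```c++
#include <cstdio>

// Tiny dispatch interpreter: every handler re-dispatches through its own
// computed goto instead of jumping back to one shared dispatch block.
int run(const unsigned char *code) {
  static void *dispatch[] = {&&op_halt, &&op_inc, &&op_dec};
  int acc = 0;
  goto *dispatch[*code++]; // initial dispatch
op_inc:
  ++acc;
  goto *dispatch[*code++]; // per-handler re-dispatch (the computed goto)
op_dec:
  --acc;
  goto *dispatch[*code++];
op_halt:
  return acc;
}

int main() {
  const unsigned char program[] = {1, 1, 2, 0}; // inc, inc, dec, halt
  std::printf("%d\n", run(program));            // prints 1
}
```

Without duplication, all handlers funnel through a single shared dispatch block, so the hardware indirect-branch predictor sees one highly polymorphic branch instead of one per-handler dispatch site.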
diff --git a/llvm/include/llvm/CodeGen/MachineInstr.h b/llvm/include/llvm/CodeGen/MachineInstr.h
index ead6bbe1d5f641..6d268628ec14b4 100644
--- a/llvm/include/llvm/CodeGen/MachineInstr.h
+++ b/llvm/include/llvm/CodeGen/MachineInstr.h
@@ -986,8 +986,12 @@ class MachineInstr
/// Return true if this is an indirect branch, such as a
/// branch through a register.
- bool isIndirectBranch(QueryType Type = AnyInBundle) const {
- return hasProperty(MCID::IndirectBranch, Type);
+ bool isIndirectBranch(QueryType Type = AnyInBundle,
+ bool IncludeJumpTable = true) const {
+ return hasProperty(MCID::IndirectBranch, Type) &&
+ (IncludeJumpTable || !llvm::any_of(operands(), [](const auto &Op) {
+ return Op.isJTI();
+ }));
}
/// Return true if this is a branch which may fall
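A hedged usage sketch of the revised query; only the signature and defaults shown in the hunk above are assumed:

```c++
// Existing callers are unaffected: with the default IncludeJumpTable=true,
// any MCID::IndirectBranch instruction still reports true.
bool AnyIndirect = MI.isIndirectBranch();

// Passing IncludeJumpTable=false additionally requires that no operand is a
// jump-table index, filtering out lowered `switch` dispatches.
bool IsComputedGoto = MI.isIndirectBranch(MachineInstr::AnyInBundle,
                                          /*IncludeJumpTable=*/false);
```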
diff --git a/llvm/lib/CodeGen/TailDuplicator.cpp b/llvm/lib/CodeGen/TailDuplicator.cpp
index 3f2e1511d403a0..988c58beac20f6 100644
--- a/llvm/lib/CodeGen/TailDuplicator.cpp
+++ b/llvm/lib/CodeGen/TailDuplicator.cpp
@@ -603,17 +603,19 @@ bool TailDuplicator::shouldTailDuplicate(bool IsSimple,
TailBB.canFallThrough())
return false;
- // If the target has hardware branch prediction that can handle indirect
- // branches, duplicating them can often make them predictable when there
- // are common paths through the code. The limit needs to be high enough
- // to allow undoing the effects of tail merging and other optimizations
- // that rearrange the predecessors of the indirect branch.
-
- bool HasIndirectbr = false;
+ // Only duplicate the blocks containing computed gotos. This basically
+ // unfactors computed gotos that were factored early on in the compilation
+ // process to speed up edge based data flow. If we do not unfactor them again,
+ // it can seriously pessimize code with many computed jumps in the source
+ // code, such as interpreters.
+ bool HasComputedGoto = false;
if (!TailBB.empty())
- HasIndirectbr = TailBB.back().isIndirectBranch();
+ HasComputedGoto = TailBB.back().isIndirectBranch(
+ /*Type=*/MachineInstr::AnyInBundle,
+ // Jump tables are not considered computed gotos.
+ /*IncludeJumpTable=*/false);
- if (HasIndirectbr && PreRegAlloc)
+ if (HasComputedGoto && PreRegAlloc)
MaxDuplicateCount = TailDupIndirectBranchSize;
// Check the instructions in the block to determine whether tail-duplication
@@ -685,7 +687,7 @@ bool TailDuplicator::shouldTailDuplicate(bool IsSimple,
}
}
- if (HasIndirectbr && PreRegAlloc)
+ if (HasComputedGoto && PreRegAlloc)
return true;
if (IsSimple)
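The distinction the new flag relies on shows up in the test below: a computed goto (`JMP64m ... @computed_goto.dispatch`) loads its target from an ordinary global, while a jump-table dispatch (`JMP64m ... %jump-table.0`) carries a jump-table-index operand. A minimal standalone sketch of that operand check, assuming LLVM's `MachineInstr`/`MachineOperand` API:

```c++
#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"

// True for indirect branches lowered from a `switch` jump table; false for
// computed gotos, whose operands never include a jump-table index.
static bool isJumpTableDispatch(const llvm::MachineInstr &MI) {
  return MI.isIndirectBranch() &&
         llvm::any_of(MI.operands(), [](const llvm::MachineOperand &Op) {
           return Op.isJTI();
         });
}
```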
diff --git a/llvm/test/CodeGen/X86/tail-dup-computed-goto.mir b/llvm/test/CodeGen/X86/tail-dup-computed-goto.mir
new file mode 100644
index 00000000000000..b1c699c11f4619
--- /dev/null
+++ b/llvm/test/CodeGen/X86/tail-dup-computed-goto.mir
@@ -0,0 +1,310 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=x86_64-unknown-linux-gnu -run-pass=early-tailduplication -tail-dup-size=0 %s -o - | FileCheck %s
+# Check that only the computed goto is duplicated.
+--- |
+ declare i64 @f0()
+ declare i64 @f1()
+ declare i64 @f2()
+ declare i64 @f3()
+ declare i64 @f4()
+ declare i64 @f5()
+ @computed_goto.dispatch = external global [5 x ptr]
+ define void @computed_goto() { ret void }
+ define void @jump_table() { ret void }
+...
+---
+name: computed_goto
+alignment: 16
+tracksRegLiveness: true
+noPhis: false
+isSSA: true
+noVRegs: false
+hasFakeUses: false
+debugInstrRef: true
+registers:
+ - { id: 0, class: gr64 }
+ - { id: 1, class: gr64 }
+ - { id: 2, class: gr64 }
+ - { id: 3, class: gr64 }
+ - { id: 4, class: gr64 }
+ - { id: 5, class: gr64_nosp }
+ - { id: 6, class: gr64 }
+ - { id: 7, class: gr64 }
+ - { id: 8, class: gr64 }
+ - { id: 9, class: gr64 }
+ - { id: 10, class: gr64 }
+frameInfo:
+ maxAlignment: 1
+ adjustsStack: true
+ hasCalls: true
+machineFunctionInfo:
+ amxProgModel: None
+body: |
+ ; CHECK-LABEL: name: computed_goto
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f0, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:gr64 = COPY $rax
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:gr64_nosp = COPY [[COPY]]
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:gr64_nosp = COPY [[COPY1]]
+ ; CHECK-NEXT: JMP64m $noreg, 8, [[COPY1]], @computed_goto.dispatch, $noreg
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.1(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f1, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:gr64 = COPY $rax
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:gr64_nosp = COPY [[COPY3]]
+ ; CHECK-NEXT: [[COPY5:%[0-9]+]]:gr64_nosp = COPY [[COPY4]]
+ ; CHECK-NEXT: JMP64m $noreg, 8, [[COPY4]], @computed_goto.dispatch, $noreg
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: successors: %bb.1(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f2, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: [[COPY6:%[0-9]+]]:gr64 = COPY $rax
+ ; CHECK-NEXT: [[COPY7:%[0-9]+]]:gr64_nosp = COPY [[COPY6]]
+ ; CHECK-NEXT: [[COPY8:%[0-9]+]]:gr64_nosp = COPY [[COPY7]]
+ ; CHECK-NEXT: JMP64m $noreg, 8, [[COPY7]], @computed_goto.dispatch, $noreg
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.3:
+ ; CHECK-NEXT: successors: %bb.1(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f3, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: [[COPY9:%[0-9]+]]:gr64 = COPY $rax
+ ; CHECK-NEXT: [[COPY10:%[0-9]+]]:gr64_nosp = COPY [[COPY9]]
+ ; CHECK-NEXT: [[COPY11:%[0-9]+]]:gr64_nosp = COPY [[COPY10]]
+ ; CHECK-NEXT: JMP64m $noreg, 8, [[COPY10]], @computed_goto.dispatch, $noreg
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.4:
+ ; CHECK-NEXT: successors: %bb.1(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f4, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: [[COPY12:%[0-9]+]]:gr64 = COPY $rax
+ ; CHECK-NEXT: [[COPY13:%[0-9]+]]:gr64_nosp = COPY [[COPY12]]
+ ; CHECK-NEXT: [[COPY14:%[0-9]+]]:gr64_nosp = COPY [[COPY13]]
+ ; CHECK-NEXT: JMP64m $noreg, 8, [[COPY13]], @computed_goto.dispatch, $noreg
+ bb.0:
+ ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ CALL64pcrel32 target-flags(x86-plt) @f0, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ %6:gr64 = COPY $rax
+ %0:gr64 = COPY %6
+ JMP_1 %bb.5
+
+ bb.1:
+ ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ CALL64pcrel32 target-flags(x86-plt) @f1, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ %10:gr64 = COPY $rax
+ %1:gr64 = COPY %10
+ JMP_1 %bb.5
+
+ bb.2:
+ ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ CALL64pcrel32 target-flags(x86-plt) @f2, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ %9:gr64 = COPY $rax
+ %2:gr64 = COPY %9
+ JMP_1 %bb.5
+
+ bb.3:
+ ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ CALL64pcrel32 target-flags(x86-plt) @f3, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ %8:gr64 = COPY $rax
+ %3:gr64 = COPY %8
+ JMP_1 %bb.5
+
+ bb.4:
+ ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ CALL64pcrel32 target-flags(x86-plt) @f4, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ %7:gr64 = COPY $rax
+ %4:gr64 = COPY %7
+
+ bb.5:
+ successors: %bb.1, %bb.2, %bb.3, %bb.4
+
+ %5:gr64_nosp = PHI %0, %bb.0, %4, %bb.4, %3, %bb.3, %2, %bb.2, %1, %bb.1
+ JMP64m $noreg, 8, %5, @computed_goto.dispatch, $noreg
+
+...
+---
+name: jump_table
+alignment: 16
+tracksRegLiveness: true
+noPhis: false
+isSSA: true
+noVRegs: false
+hasFakeUses: false
+debugInstrRef: true
+registers:
+ - { id: 0, class: gr64 }
+ - { id: 1, class: gr64 }
+ - { id: 2, class: gr64 }
+ - { id: 3, class: gr64 }
+ - { id: 4, class: gr64 }
+ - { id: 5, class: gr64 }
+ - { id: 6, class: gr64 }
+ - { id: 7, class: gr64 }
+ - { id: 8, class: gr64_nosp }
+ - { id: 9, class: gr64 }
+ - { id: 10, class: gr64 }
+ - { id: 11, class: gr64 }
+ - { id: 12, class: gr64 }
+ - { id: 13, class: gr64 }
+frameInfo:
+ maxAlignment: 1
+ adjustsStack: true
+ hasCalls: true
+machineFunctionInfo:
+ amxProgModel: None
+jumpTable:
+ kind: block-address
+ entries:
+ - id: 0
+ blocks: [ '%bb.2', '%bb.3', '%bb.4', '%bb.5', '%bb.6' ]
+body: |
+ ; CHECK-LABEL: name: jump_table
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f0, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:gr64 = COPY $rax
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:gr64 = COPY [[COPY]]
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[PHI:%[0-9]+]]:gr64 = PHI [[COPY1]], %bb.0, %6, %bb.7, %5, %bb.6, %4, %bb.5, %3, %bb.4, %2, %bb.3
+ ; CHECK-NEXT: [[DEC64r:%[0-9]+]]:gr64_nosp = DEC64r [[PHI]], implicit-def dead $eflags
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: successors: %bb.3(0x1999999a), %bb.4(0x1999999a), %bb.5(0x1999999a), %bb.6(0x1999999a), %bb.7(0x1999999a)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: JMP64m $noreg, 8, [[DEC64r]], %jump-table.0, $noreg :: (load (s64) from jump-table)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.3:
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f1, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:gr64 = COPY $rax
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:gr64 = COPY [[COPY2]]
+ ; CHECK-NEXT: JMP_1 %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.4:
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f2, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:gr64 = COPY $rax
+ ; CHECK-NEXT: [[COPY5:%[0-9]+]]:gr64 = COPY [[COPY4]]
+ ; CHECK-NEXT: JMP_1 %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f3, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: [[COPY6:%[0-9]+]]:gr64 = COPY $rax
+ ; CHECK-NEXT: [[COPY7:%[0-9]+]]:gr64 = COPY [[COPY6]]
+ ; CHECK-NEXT: JMP_1 %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.6:
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f4, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: [[COPY8:%[0-9]+]]:gr64 = COPY $rax
+ ; CHECK-NEXT: [[COPY9:%[0-9]+]]:gr64 = COPY [[COPY8]]
+ ; CHECK-NEXT: JMP_1 %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.7:
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f5, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ ; CHECK-NEXT: [[COPY10:%[0-9]+]]:gr64 = COPY $rax
+ ; CHECK-NEXT: [[COPY11:%[0-9]+]]:gr64 = COPY [[COPY10]]
+ ; CHECK-NEXT: JMP_1 %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.8:
+ bb.0:
+ ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ CALL64pcrel32 target-flags(x86-plt) @f0, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ %7:gr64 = COPY $rax
+ %0:gr64 = COPY %7
+
+ bb.1:
+ %1:gr64 = PHI %0, %bb.0, %6, %bb.6, %5, %bb.5, %4, %bb.4, %3, %bb.3, %2, %bb.2
+ %8:gr64_nosp = DEC64r %1, implicit-def dead $eflags
+
+ bb.8:
+ successors: %bb.2(0x1999999a), %bb.3(0x1999999a), %bb.4(0x1999999a), %bb.5(0x1999999a), %bb.6(0x1999999a)
+
+ JMP64m $noreg, 8, %8, %jump-table.0, $noreg :: (load (s64) from jump-table)
+
+ bb.2:
+ ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ CALL64pcrel32 target-flags(x86-plt) @f1, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ %13:gr64 = COPY $rax
+ %2:gr64 = COPY %13
+ JMP_1 %bb.1
+
+ bb.3:
+ ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ CALL64pcrel32 target-flags(x86-plt) @f2, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ %12:gr64 = COPY $rax
+ %3:gr64 = COPY %12
+ JMP_1 %bb.1
+
+ bb.4:
+ ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ CALL64pcrel32 target-flags(x86-plt) @f3, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ %11:gr64 = COPY $rax
+ %4:gr64 = COPY %11
+ JMP_1 %bb.1
+
+ bb.5:
+ ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ CALL64pcrel32 target-flags(x86-plt) @f4, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ %10:gr64 = COPY $rax
+ %5:gr64 = COPY %10
+ JMP_1 %bb.1
+
+ bb.6:
+ ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ CALL64pcrel32 target-flags(x86-plt) @f5, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
+ ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
+ %9:gr64 = COPY $rax
+ %6:gr64 = COPY %9
+ JMP_1 %bb.1
+
+ bb.7:
+
+...
I will revert #78582 after this PR is merged.
Please don't link to gcc source code in commit messages. I think it might make sense to try to pursue a more general solution here... can we simplify the CFG while still keeping the benefit of taildup? Maybe introduce a fake BB into the CFG or something. But special-casing indirectbr seems okay short-term.
Removed.
I think the current IR is already as you want: https://llvm.godbolt.org/z/o5YT191P9.
Currently, taildup means we end up with O(N^2) edges, even in the indirectbr case. This patch is just deciding that we're willing to pay that cost in the indirectbr case, because it's a strong hint that the code is important for performance. The question is, can we rework the backend representation to allow taildup without the explosion in the number of edges?
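As a rough worked example of that growth: an interpreter with N handlers has one dispatch block with N outgoing edges plus N edges back into it, about 2N in total. After tail duplication gives every handler its own copy of the dispatch, each of the N blocks has N successors, i.e. N × N edges; for N = 256 opcodes that is 65,536 CFG edges instead of roughly 512.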
Makes sense to me. I can try to address it, although the process may be slow. https://www.ajla-lang.cz/tutorial.html pointed out several issues:
ping :p
The idea sounds good. Maybe outlined as =>
I don't fully understand the details of this improvement yet, but it shouldn't block this PR. So... ping
Just getting back to this, sorry! Is this the latest version of the patch that should fix the regressions? I tried the patch when building Python on macOS: it improves performance by ~0.5%, while with #116072 performance increases by 2-3%.
I don't have a CPU like the i7-2640M, but I can observe a significant decrease in . This should help CPUs with weaker branch prediction capabilities. I will rebase this PR after #116072 lands.
Rebased.
Ping ~ (From #106846 (comment)) |
The vendored patches were produced from the latest versions of the following PRs:

* llvm/llvm-project#114990
* llvm/llvm-project#120267

The first improves codegen for computed gotos. There was a regression in LLVM 19 causing a ~10% performance drop in CPython. The second enables BOLT to work with computed gotos. This enables BOLT to accomplish more on CPython.
LGTM
Fixes llvm#106846.

This is what I learned from GCC: it duplicates blocks that end in a computed goto, but not blocks whose indirect jump goes through a jump table. I believe GCC has provided a clear explanation here:

> Duplicate the blocks containing computed gotos. This basically unfactors computed gotos that were factored early on in the compilation process to speed up edge based data flow. We used to not unfactor them again, which can seriously pessimize code with many computed jumps in the source code, such as interpreters.

(cherry picked from commit dd21aac)