[GlobalISel] Disable CSE in IRTranslator & Legalizer at O0 #197000
c-rhodes wants to merge 5 commits into

This is showing positive results: a -0.31% geomean improvement on aarch64-O0-g with no change in code size. @aemerson I noticed you turned this on a while back (946b124), but only for constants at O0. Is this result unexpected? It breaks tons of tests of course, and I've not updated them yet, hence the draft; I just wanted to check before I post a full PR.

I am quite surprised at these results, but I guess we may have added the same CSE combines into the O0 prelegalizer combiner anyway. In principle, if this doesn't regress anything I'm not opposed to it, and a 0.3% compile-time saving is quite significant.
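
For readers unfamiliar with the mechanism under discussion, the following is a minimal sketch of what a CSE-ing instruction builder does and where its cost comes from. It is a toy model and not LLVM code: `ToyBuilder`, `ToyInst`, `CSEPolicy`, `emitFieldAddress` and `buildTwoFieldAddresses` are hypothetical names, and the `ConstantsOnly` policy is only loosely modeled on the constants-only O0 behaviour mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy instruction: an opcode plus untyped integer operands
// (immediates or virtual register numbers).
struct ToyInst {
  std::string Opcode;
  std::vector<std::int64_t> Ops;
};

// Which opcodes the builder tries to deduplicate.
enum class CSEPolicy { None, ConstantsOnly, Full };

class ToyBuilder {
  CSEPolicy Policy;
  std::vector<ToyInst> Insts;
  // Maps (opcode, operands) to the vreg that already holds that value.
  std::map<std::pair<std::string, std::vector<std::int64_t>>, std::int64_t> Seen;
  std::size_t Lookups = 0;

  bool shouldCSE(const std::string &Opcode) const {
    switch (Policy) {
    case CSEPolicy::None:
      return false;
    case CSEPolicy::ConstantsOnly:
      return Opcode == "G_CONSTANT";
    case CSEPolicy::Full:
      return true;
    }
    return false;
  }

public:
  explicit ToyBuilder(CSEPolicy P) : Policy(P) {}

  // Emits an instruction and returns the vreg defining its result. With CSE
  // enabled for this opcode, every call pays for a map lookup, hit or miss.
  std::int64_t build(const std::string &Opcode,
                     const std::vector<std::int64_t> &Ops) {
    if (shouldCSE(Opcode)) {
      ++Lookups;
      auto Key = std::make_pair(Opcode, Ops);
      auto It = Seen.find(Key);
      if (It != Seen.end())
        return It->second; // reuse the existing value, emit nothing
      Insts.push_back({Opcode, Ops});
      std::int64_t VReg = static_cast<std::int64_t>(Insts.size()) - 1;
      Seen.emplace(std::move(Key), VReg);
      return VReg;
    }
    Insts.push_back({Opcode, Ops});
    return static_cast<std::int64_t>(Insts.size()) - 1;
  }

  std::size_t numInsts() const { return Insts.size(); }
  std::size_t numLookups() const { return Lookups; }
};

// Emits "Base + Offset" the way a naive translator would: a fresh constant
// followed by a pointer add.
std::int64_t emitFieldAddress(ToyBuilder &B, std::int64_t Base,
                              std::int64_t Offset) {
  assert(Base >= 0 && "Base must be a previously built vreg");
  std::int64_t C = B.build("G_CONSTANT", {Offset});
  return B.build("G_PTR_ADD", {Base, C});
}

// Computes the address of the same field twice from the same base pointer.
void buildTwoFieldAddresses(ToyBuilder &B) {
  std::int64_t Base = B.build("COPY", {0});
  emitFieldAddress(B, Base, 4);
  emitFieldAddress(B, Base, 4);
}
```

In this model, building the same field address twice emits 5 instructions with no CSE, 4 with constants-only CSE and 3 with full CSE, at the price of 0, 2 and 5 map lookups respectively. That is the trade-off the measurements above are probing: fewer instructions downstream versus hashing work on every build call.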
Force-pushed from 7958550 to 98bfcb7.
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed.

Also disabling it for IRTranslator is a further improvement, still with no impact on code size: geomean -0.54%, sqlite -1.30%.
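
As a reminder of how a geomean figure like the one above is typically derived, here is a small sketch. It assumes the reported number is the geometric mean of per-benchmark after/before ratios expressed as a percent change; `geomeanPercentChange` is a hypothetical helper, and any inputs passed to it here are illustrative, not the actual CTMark measurements.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of the per-benchmark ratios After[i] / Before[i], returned
// as a percent change (negative means fewer instructions, i.e. faster).
double geomeanPercentChange(const std::vector<double> &Before,
                            const std::vector<double> &After) {
  assert(!Before.empty() && Before.size() == After.size());
  double LogSum = 0.0;
  for (std::size_t I = 0; I < Before.size(); ++I) {
    assert(Before[I] > 0.0 && After[I] > 0.0);
    // Summing logs avoids overflow from multiplying many ratios together.
    LogSum += std::log(After[I] / Before[I]);
  }
  double Geomean = std::exp(LogSum / static_cast<double>(Before.size()));
  return (Geomean - 1.0) * 100.0;
}
```

Because the geometric mean averages ratios, one benchmark with a larger improvement (such as sqlite here) moves the overall figure less than it would in an arithmetic mean weighted by absolute instruction counts.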
Force-pushed from a05963a to 2cd7f4f.

@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-backend-aarch64
Author: Cullen Rhodes (c-rhodes)
Changes
CTMark -0.54% geomean improvement on stage1-aarch64-O0-g [1] with no change to code-size [2]. Sqlite is -1.30%.
I also measured without -g locally and geomean is -0.62%.
This was enabled for constants only in 946b124 and improved compile-time and code-size.
Assisted-by: codex
[1] https://llvm-compile-time-tracker.com/compare.php?from=f0c84b3ac80c4c594045aff9b4f88ba564614361&to=98bfcb7aa08ba8ecf5ed5a89e64d2e7f0878be5b&stat=instructions%3Au
[2] https://llvm-compile-time-tracker.com/compare.php?from=f0c84b3ac80c4c594045aff9b4f88ba564614361&to=98bfcb7aa08ba8ecf5ed5a89e64d2e7f0878be5b&stat=size-total
Patch is 6.07 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/197000.diff
140 Files Affected:
diff --git a/llvm/lib/CodeGen/TargetPassConfig.cpp b/llvm/lib/CodeGen/TargetPassConfig.cpp
index 096e9a6f2b1dc..c3af2acc622ba 100644
--- a/llvm/lib/CodeGen/TargetPassConfig.cpp
+++ b/llvm/lib/CodeGen/TargetPassConfig.cpp
@@ -1587,7 +1587,7 @@ bool TargetPassConfig::reportDiagnosticWhenGlobalISelFallback() const {
}
bool TargetPassConfig::isGISelCSEEnabled() const {
- return true;
+ return getOptLevel() != CodeGenOptLevel::None;
}
std::unique_ptr<CSEConfigBase> TargetPassConfig::getCSEConfig() const {
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc.ll
index 373b040ebec65..7c08cf1273b32 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc.ll
@@ -231,8 +231,8 @@ define dso_local i128 @load_atomic_i128_aligned_unordered(ptr %ptr) {
; -O0-LABEL: load_atomic_i128_aligned_unordered:
; -O0: ldxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stxp w8, x10, x11, [x9]
; -O0: stxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_unordered:
@@ -246,8 +246,8 @@ define dso_local i128 @load_atomic_i128_aligned_unordered_const(ptr readonly %pt
; -O0-LABEL: load_atomic_i128_aligned_unordered_const:
; -O0: ldxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stxp w8, x10, x11, [x9]
; -O0: stxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_unordered_const:
@@ -261,8 +261,8 @@ define dso_local i128 @load_atomic_i128_aligned_monotonic(ptr %ptr) {
; -O0-LABEL: load_atomic_i128_aligned_monotonic:
; -O0: ldxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stxp w8, x10, x11, [x9]
; -O0: stxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_monotonic:
@@ -276,8 +276,8 @@ define dso_local i128 @load_atomic_i128_aligned_monotonic_const(ptr readonly %pt
; -O0-LABEL: load_atomic_i128_aligned_monotonic_const:
; -O0: ldxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stxp w8, x10, x11, [x9]
; -O0: stxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_monotonic_const:
@@ -291,8 +291,8 @@ define dso_local i128 @load_atomic_i128_aligned_acquire(ptr %ptr) {
; -O0-LABEL: load_atomic_i128_aligned_acquire:
; -O0: ldaxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stxp w8, x10, x11, [x9]
; -O0: stxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_acquire:
@@ -306,8 +306,8 @@ define dso_local i128 @load_atomic_i128_aligned_acquire_const(ptr readonly %ptr)
; -O0-LABEL: load_atomic_i128_aligned_acquire_const:
; -O0: ldaxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stxp w8, x10, x11, [x9]
; -O0: stxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_acquire_const:
@@ -321,8 +321,8 @@ define dso_local i128 @load_atomic_i128_aligned_seq_cst(ptr %ptr) {
; -O0-LABEL: load_atomic_i128_aligned_seq_cst:
; -O0: ldaxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stlxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stlxp w8, x10, x11, [x9]
; -O0: stlxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_seq_cst:
@@ -336,8 +336,8 @@ define dso_local i128 @load_atomic_i128_aligned_seq_cst_const(ptr readonly %ptr)
; -O0-LABEL: load_atomic_i128_aligned_seq_cst_const:
; -O0: ldaxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stlxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stlxp w8, x10, x11, [x9]
; -O0: stlxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_seq_cst_const:
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-v8a.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-v8a.ll
index 045e080983d5f..8ba7f6c235696 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-v8a.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-v8a.ll
@@ -231,8 +231,8 @@ define dso_local i128 @load_atomic_i128_aligned_unordered(ptr %ptr) {
; -O0-LABEL: load_atomic_i128_aligned_unordered:
; -O0: ldxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stxp w8, x10, x11, [x9]
; -O0: stxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_unordered:
@@ -246,8 +246,8 @@ define dso_local i128 @load_atomic_i128_aligned_unordered_const(ptr readonly %pt
; -O0-LABEL: load_atomic_i128_aligned_unordered_const:
; -O0: ldxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stxp w8, x10, x11, [x9]
; -O0: stxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_unordered_const:
@@ -261,8 +261,8 @@ define dso_local i128 @load_atomic_i128_aligned_monotonic(ptr %ptr) {
; -O0-LABEL: load_atomic_i128_aligned_monotonic:
; -O0: ldxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stxp w8, x10, x11, [x9]
; -O0: stxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_monotonic:
@@ -276,8 +276,8 @@ define dso_local i128 @load_atomic_i128_aligned_monotonic_const(ptr readonly %pt
; -O0-LABEL: load_atomic_i128_aligned_monotonic_const:
; -O0: ldxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stxp w8, x10, x11, [x9]
; -O0: stxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_monotonic_const:
@@ -291,8 +291,8 @@ define dso_local i128 @load_atomic_i128_aligned_acquire(ptr %ptr) {
; -O0-LABEL: load_atomic_i128_aligned_acquire:
; -O0: ldaxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stxp w8, x10, x11, [x9]
; -O0: stxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_acquire:
@@ -306,8 +306,8 @@ define dso_local i128 @load_atomic_i128_aligned_acquire_const(ptr readonly %ptr)
; -O0-LABEL: load_atomic_i128_aligned_acquire_const:
; -O0: ldaxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stxp w8, x10, x11, [x9]
; -O0: stxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_acquire_const:
@@ -321,8 +321,8 @@ define dso_local i128 @load_atomic_i128_aligned_seq_cst(ptr %ptr) {
; -O0-LABEL: load_atomic_i128_aligned_seq_cst:
; -O0: ldaxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stlxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stlxp w8, x10, x11, [x9]
; -O0: stlxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_seq_cst:
@@ -336,8 +336,8 @@ define dso_local i128 @load_atomic_i128_aligned_seq_cst_const(ptr readonly %ptr)
; -O0-LABEL: load_atomic_i128_aligned_seq_cst_const:
; -O0: ldaxp x0, x1, [x9]
; -O0: cmp x0, x10
-; -O0: cmp x1, x10
-; -O0: stlxp w8, x10, x10, [x9]
+; -O0: cmp x1, x11
+; -O0: stlxp w8, x10, x11, [x9]
; -O0: stlxp w8, x0, x1, [x9]
;
; -O1-LABEL: load_atomic_i128_aligned_seq_cst_const:
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-store-rcpc_immo.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-store-rcpc_immo.ll
index 8c958459d93e8..3369db8b5f7d8 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-store-rcpc_immo.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-store-rcpc_immo.ll
@@ -421,12 +421,11 @@ define void @store_atomic_i128_from_gep() {
; GISEL-LABEL: store_atomic_i128_from_gep:
; GISEL: bl init
; GISEL: dmb ish
-; GISEL: stp x8, x8, [x9, #16]
+; GISEL: stp x8, x9, [x10, #16]
;
; SDAG-LABEL: store_atomic_i128_from_gep:
; SDAG: bl init
; SDAG: dmb ish
-; SDAG: stp xzr, xzr, [sp, #16]
%a = alloca [3 x i128]
call void @init(ptr %a)
%arrayidx = getelementptr [3 x i128], ptr %a, i64 0, i64 1
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic-128.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic-128.ll
index be51210882eaa..04a76ffba9e2b 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic-128.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic-128.ll
@@ -556,15 +556,16 @@ define void @atomic_load_relaxed(i64, i64, ptr %p, ptr %p2) {
; CHECK-LLSC-O0-LABEL: atomic_load_relaxed:
; CHECK-LLSC-O0: // %bb.0:
; CHECK-LLSC-O0-NEXT: mov x11, xzr
+; CHECK-LLSC-O0-NEXT: mov x12, xzr
; CHECK-LLSC-O0-NEXT: .LBB4_1: // =>This Inner Loop Header: Depth=1
; CHECK-LLSC-O0-NEXT: ldxp x9, x8, [x2]
; CHECK-LLSC-O0-NEXT: cmp x9, x11
; CHECK-LLSC-O0-NEXT: cset w10, ne
-; CHECK-LLSC-O0-NEXT: cmp x8, x11
+; CHECK-LLSC-O0-NEXT: cmp x8, x12
; CHECK-LLSC-O0-NEXT: cinc w10, w10, ne
; CHECK-LLSC-O0-NEXT: cbnz w10, .LBB4_3
; CHECK-LLSC-O0-NEXT: // %bb.2: // in Loop: Header=BB4_1 Depth=1
-; CHECK-LLSC-O0-NEXT: stxp w10, x11, x11, [x2]
+; CHECK-LLSC-O0-NEXT: stxp w10, x11, x12, [x2]
; CHECK-LLSC-O0-NEXT: cbnz w10, .LBB4_1
; CHECK-LLSC-O0-NEXT: b .LBB4_4
; CHECK-LLSC-O0-NEXT: .LBB4_3: // in Loop: Header=BB4_1 Depth=1
@@ -585,10 +586,10 @@ define void @atomic_load_relaxed(i64, i64, ptr %p, ptr %p2) {
; CHECK-OUTLINE-LLSC-O0-NEXT: .cfi_offset w30, -16
; CHECK-OUTLINE-LLSC-O0-NEXT: mov x4, x2
; CHECK-OUTLINE-LLSC-O0-NEXT: str x3, [sp, #8] // 8-byte Spill
+; CHECK-OUTLINE-LLSC-O0-NEXT: mov x2, xzr
; CHECK-OUTLINE-LLSC-O0-NEXT: mov x3, xzr
-; CHECK-OUTLINE-LLSC-O0-NEXT: mov x0, x3
+; CHECK-OUTLINE-LLSC-O0-NEXT: mov x0, x2
; CHECK-OUTLINE-LLSC-O0-NEXT: mov x1, x3
-; CHECK-OUTLINE-LLSC-O0-NEXT: mov x2, x3
; CHECK-OUTLINE-LLSC-O0-NEXT: bl __aarch64_cas16_relax
; CHECK-OUTLINE-LLSC-O0-NEXT: ldr x3, [sp, #8] // 8-byte Reload
; CHECK-OUTLINE-LLSC-O0-NEXT: // implicit-def: $q0
@@ -601,10 +602,11 @@ define void @atomic_load_relaxed(i64, i64, ptr %p, ptr %p2) {
;
; CHECK-CAS-O0-LABEL: atomic_load_relaxed:
; CHECK-CAS-O0: // %bb.0:
+; CHECK-CAS-O0-NEXT: mov x4, xzr
; CHECK-CAS-O0-NEXT: mov x8, xzr
-; CHECK-CAS-O0-NEXT: mov x0, x8
+; CHECK-CAS-O0-NEXT: mov x0, x4
; CHECK-CAS-O0-NEXT: mov x1, x8
-; CHECK-CAS-O0-NEXT: mov x4, x8
+; CHECK-CAS-O0-NEXT: // kill: def $x4 killed $x4 def $x4_x5
; CHECK-CAS-O0-NEXT: mov x5, x8
; CHECK-CAS-O0-NEXT: casp x0, x1, x4, x5, [x2]
; CHECK-CAS-O0-NEXT: mov x9, x0
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-gep.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-gep.ll
index 269597cbd730b..71cddad8c904d 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-gep.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-gep.ll
@@ -17,10 +17,11 @@ define i32 @cse_gep(ptr %ptr, i32 %idx) {
; O0-NEXT: [[PTR_ADD:%[0-9]+]]:_(p0) = nusw inbounds G_PTR_ADD [[COPY]], [[MUL]](i64)
; O0-NEXT: [[COPY2:%[0-9]+]]:_(p0) = COPY [[PTR_ADD]](p0)
; O0-NEXT: [[LOAD:%[0-9]+]]:_(i32) = G_LOAD [[COPY2]](p0) :: (load (i32) from %ir.gep1)
- ; O0-NEXT: [[MUL1:%[0-9]+]]:_(i64) = nsw G_MUL [[SEXT]], [[C]]
+ ; O0-NEXT: [[C1:%[0-9]+]]:_(i64) = G_CONSTANT i64 16
+ ; O0-NEXT: [[MUL1:%[0-9]+]]:_(i64) = nsw G_MUL [[SEXT]], [[C1]]
; O0-NEXT: [[PTR_ADD1:%[0-9]+]]:_(p0) = nusw inbounds G_PTR_ADD [[COPY]], [[MUL1]](i64)
- ; O0-NEXT: [[C1:%[0-9]+]]:_(i64) = G_CONSTANT i64 4
- ; O0-NEXT: [[PTR_ADD2:%[0-9]+]]:_(p0) = nuw nusw inbounds G_PTR_ADD [[PTR_ADD1]], [[C1]](i64)
+ ; O0-NEXT: [[C2:%[0-9]+]]:_(i64) = G_CONSTANT i64 4
+ ; O0-NEXT: [[PTR_ADD2:%[0-9]+]]:_(p0) = nuw nusw inbounds G_PTR_ADD [[PTR_ADD1]], [[C2]](i64)
; O0-NEXT: [[LOAD1:%[0-9]+]]:_(i32) = G_LOAD [[PTR_ADD2]](p0) :: (load (i32) from %ir.gep2)
; O0-NEXT: [[ADD:%[0-9]+]]:_(i32) = G_ADD [[LOAD]], [[LOAD1]]
; O0-NEXT: $w0 = COPY [[ADD]](i32)
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
index 8548f63bd1150..8af250766de74 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
@@ -465,7 +465,7 @@ next:
; CHECK-LABEL: name: constant_int_start
; CHECK: [[TWO:%[0-9]+]]:_(i32) = G_CONSTANT i32 2
; CHECK: [[ANSWER:%[0-9]+]]:_(i32) = G_CONSTANT i32 42
-; CHECK: [[RES:%[0-9]+]]:_(i32) = G_CONSTANT i32 44
+; CHECK: [[RES:%[0-9]+]]:_(i32) = G_ADD [[TWO]], [[ANSWER]]
define i32 @constant_int_start() {
%res = add i32 2, 42
ret i32 %res
@@ -605,7 +605,8 @@ define ptr @test_constant_null() {
; CHECK: [[GEP1:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD [[ADDR]], [[CST1]](i64)
; CHECK: [[VAL2:%[0-9]+]]:_(i32) = G_LOAD [[GEP1]](p0) :: (load (i32) from %ir.addr + 4)
; CHECK: G_STORE [[VAL1]](i8), [[ADDR]](p0) :: (store (i8) into %ir.addr, align 4)
-; CHECK: [[GEP2:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD [[ADDR]], [[CST1]](i64)
+; CHECK: [[CST1B:%[0-9]+]]:_(i64) = G_CONSTANT i64 4
+; CHECK: [[GEP2:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD [[ADDR]], [[CST1B]](i64)
; CHECK: G_STORE [[VAL2]](i32), [[GEP2]](p0) :: (store (i32) into %ir.addr + 4)
define void @test_struct_memops(ptr %addr) {
%val = load { i8, i32 }, ptr %addr
@@ -832,7 +833,8 @@ define i32 @test_extractvalue(ptr %addr) {
; CHECK: [[GEP3:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST3]](i64)
; CHECK: [[LD4:%[0-9]+]]:_(i32) = G_LOAD [[GEP3]](p0) :: (load (i32) from %ir.addr + 12)
; CHECK: G_STORE [[LD2]](i8), %1(p0) :: (store (i8) into %ir.addr2, align 4)
-; CHECK: [[GEP4:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %1, [[CST1]](i64)
+; CHECK: [[CST4:%[0-9]+]]:_(i64) = G_CONSTANT i64 4
+; CHECK: [[GEP4:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %1, [[CST4]](i64)
; CHECK: G_STORE [[LD3]](i32), [[GEP4]](p0) :: (store (i32) into %ir.addr2 + 4)
define void @test_extractvalue_agg(ptr %addr, ptr %addr2) {
%struct = load %struct.nested, ptr %addr
@@ -866,11 +868,14 @@ define void @test_trivial_extract_ptr([1 x ptr] %s, i8 %val) {
; CHECK: [[GEP3:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST3]](i64)
; CHECK: [[LD4:%[0-9]+]]:_(i32) = G_LOAD [[GEP3]](p0) :: (load (i32) from %ir.addr + 12)
; CHECK: G_STORE [[LD1]](i8), %0(p0) :: (store (i8) into %ir.addr, align 4)
-; CHECK: [[GEP4:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST1]](i64)
+; CHECK: [[CST1B:%[0-9]+]]:_(i64) = G_CONSTANT i64 4
+; CHECK: [[GEP4:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST1B]](i64)
; CHECK: G_STORE [[LD2]](i8), [[GEP4]](p0) :: (store (i8) into %ir.addr + 4, align 4)
-; CHECK: [[GEP5:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST2]](i64)
+; CHECK: [[CST2B:%[0-9]+]]:_(i64) = G_CONSTANT i64 8
+; CHECK: [[GEP5:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST2B]](i64)
; CHECK: G_STORE %1(i32), [[GEP5]](p0) :: (store (i32) into %ir.addr + 8)
-; CHECK: [[GEP6:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST3]](i64)
+; CHECK: [[CST3B:%[0-9]+]]:_(i64) = G_CONSTANT i64 12
+; CHECK: [[GEP6:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST3B]](i64)
; CHECK: G_STORE [[LD4]](i32), [[GEP6]](p0) :: (store (i32) into %ir.addr + 12)
define void @test_insertvalue(ptr %addr, i32 %val) {
%struct = load %struct.nested, ptr %addr
@@ -905,7 +910,8 @@ define [1 x ptr] @test_trivial_insert_ptr([1 x ptr] %s, ptr %val) {
; CHECK: [[GEP1:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %1, [[CST1]](i64)
; CHECK: [[LD2:%[0-9]+]]:_(i32) = G_LOAD [[GEP1]](p0) :: (load (i32) from %ir.addr2 + 4)
; CHECK: [[LD3:%[0-9]+]]:_(i8) = G_LOAD %0(p0) :: (load (i8) from %ir.addr, align 4)
-; CHECK: [[GEP2:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST1]](i64)
+; CHECK: [[CST2:%[0-9]+]]:_(i64) = G_CONSTANT i64 4
+; CHECK: [[GEP2:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST2]](i64)
; CHECK: [[LD4:%[0-9]+]]:_(i8) = G_LOAD [[GEP2]](p0) :: (load (i8) from %ir.addr + 4, align 4)
; CHECK: [[CST3:%[0-9]+]]:_(i64) = G_CONSTANT i64 8
; CHECK: [[GEP3:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST3]](i64)
@@ -914,11 +920,14 @@ define [1 x ptr] @test_trivial_insert_ptr([1 x ptr] %s, ptr %val) {
; CHECK: [[GEP4:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST4]](i64)
; CHECK: [[LD6:%[0-9]+]]:_(i32) = G_LOAD [[GEP4]](p0) :: (load (i32) from %ir.addr + 12)
; CHECK: G_STORE [[LD3]](i8), %0(p0) :: (store (i8) into %ir.addr, align 4)
-; CHECK: [[GEP5:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST1]](i64)
+; CHECK: [[CST2B:%[0-9]+]]:_(i64) = G_CONSTANT i64 4
+; CHECK: [[GEP5:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST2B]](i64)
; CHECK: G_STORE [[LD1]](i8), [[GEP5]](p0) :: (store (i8) into %ir.addr + 4, align 4)
-; CHECK: [[GEP6:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST3]](i64)
+; CHECK: [[CST3B:%[0-9]+]]:_(i64) = G_CONSTANT i64 8
+; CHECK: [[GEP6:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST3B]](i64)
; CHECK: G_STORE [[LD2]](i32), [[GEP6]](p0) :: (store (i32) into %ir.addr + 8)
-; CHECK: [[GEP7:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST4]](i64)
+; CHECK: [[CST4B:%[0-9]+]]:_(i64) = G_CONSTANT i64 12
+; CHECK: [[GEP7:%[0-9]+]]:_(p0) = nuw inbounds G_PTR_ADD %0, [[CST4B]](i64)
; CHECK: G_STORE [[LD6]](i32), [[GEP7]](p0) :: (store (i32) into %ir.addr + 12)
define void @test_insertvalue_agg(ptr %addr, ptr %addr2) {
%smallstruct = load {i8, i32}, ptr %addr2
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-ios.ll b/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-ios.ll
index d5ffcb2b9b556..7f37a07a387b5 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-ios.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-ios.ll
@@ -70,8 +70,9 @@ define void @take_128bit_struct(ptr %ptr, [2 x i64] %in) {
; CHECK: [[ADDR:%[0-9]+]]:_(p0) = G_PTR_ADD [[SP]], [[OFF]](i64)
; CHECK: G_STORE [[LD1]](i64), [[ADDR]](p0) :: (store (i64) into stack, align 1)
-; CHECK: [[ADDR:%[0-9]+]]:_(p0) = G_PTR_ADD [[SP]], [[CST]]
-; CHECK: G_STORE [[LD2]](i64), [[ADDR]](p0) :: (store (i64) into stack + 8, align 1)
+; CHECK: [[CST2:%[0-9]+]]:_(i64) = G_CONSTANT i64 8
+; CHECK: [[ADDR2:%[0-9]+]]:_(p0) = G_PTR_ADD [[SP]], [[CST2]](i64)
+; CHECK: G_STORE [[LD2]](i64), [[ADDR2]](p0) :: (store (i64) into stack + 8, align 1)
define void @test_split_struct(ptr %ptr) {
%struct = load [2 x i64], ptr %ptr
call void @take_split_struct(ptr null, i64 1, i64 2, i64 3,
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/call-translator.ll b/llvm/test/CodeGen/AArch64/GlobalISel/call-translator.ll
index 25baf6a295b14..d8b83c951b0c3 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/call-translator.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/call-translator.ll
@@ -293,7 +293,8 @@ define void @take_128bit_struct(ptr %ptr, [2 x i64] %in) {
; CHECK: [[CST2:%[0-9]+]]:_(i64) = G_CONSTANT i64 0
; CHECK: [[GEP2:%[0-9]+]]:_(p0) = G_PTR_ADD [[SP]], [[CST2]](i64)
; CHECK: G_STORE [[LO]](i64), [[GEP2]](p0) :: (store (i64) into stack, align 1)
-; CHECK: [[GEP3:%[0-9]+]]:_(p0) = G_PTR_ADD [[SP]], [[CST]](i64)
+; CHECK: [[CST3:%[0-9]+]]:_(i64) = G_CONSTANT i64 8
+; CHECK: [[GEP3:%[0-9]+]]:_(p0) = G_PTR_ADD [[SP]], [[CST3]](i64)
; CHECK: G_STORE [[HI]](i64), [[GEP3]](p0) :: (store (i64) into stack + 8, align 1)
define void @test_split_struct(ptr %ptr) {
%struct = load [2 x i64], ptr %ptr
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-bitcast.ll b/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-bitcast.ll
index 6749a32e237db..4f676283cd62d 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-bitcast.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-bitcast.ll
@@ -37,22 +37,22 @@ define i32 @test_bitcast_invalid_vreg() {
; CHECK-NEXT: [[C28:%[0-9]+]]:_(i32) = G_CONSTANT i32 29
; CHECK-NEXT: [[C29:%[0-9]+]]:_(i32) = G_CONSTANT i32 30
; CHECK-NEXT: [[C30:%[0-9]+]]:_(i32) = G_CONSTANT i32 100
- ; CHECK-NEXT: [[C31:%[0-9]+]]:_(i32) = G_CONSTANT i32 3
- ; CHECK-NEXT: [[C32:%[0-9]+]]:_(i32) = G_CONSTANT i32 7
- ; CHECK-NEXT: [[C33:%[0-9]+]]:_(i32) = G_CONSTANT i32 11
- ; CHE...
[truncated]

arsenm left a comment:
I thought the whole reason we had the CSE builder was to improve compile time. Is it worth enabling at all? Especially in all these legalizer tests where the instruction count multiplies in the end.
Do you mean at other opt levels? Looking at https://reviews.llvm.org/D52803, which added it, I don't get the impression it was added to improve compile time; if anything it's framed as the opposite (a small compile-time regression), but perhaps @aemerson knows the history better.
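
To make the "other opt levels" question concrete, this sketch models which CSE mode each optimization level ends up with before and after the patch. It is a simplification under stated assumptions: `ToyOptLevel`, `ToyCSEMode`, `cseModeBefore`, `cseModeAfter` and `checkOnlyO0Changes` are hypothetical names, the "constants only at O0" row comes from the discussion of 946b124 above, and the "full CSE everywhere else" row is my reading of that remark rather than something the diff shows.

```cpp
#include <cassert>

// Hypothetical stand-ins for the codegen optimization levels.
enum class ToyOptLevel { None, Less, Default, Aggressive };

// What the instruction builder deduplicates.
enum class ToyCSEMode { Off, ConstantsOnly, Full };

// Before the patch: CSE is always enabled, but O0 is assumed to use a
// constants-only configuration.
ToyCSEMode cseModeBefore(ToyOptLevel Level) {
  return Level == ToyOptLevel::None ? ToyCSEMode::ConstantsOnly
                                    : ToyCSEMode::Full;
}

// After the patch: the enable check becomes "opt level is not None", so O0
// gets no CSE at all and the other levels are untouched.
ToyCSEMode cseModeAfter(ToyOptLevel Level) {
  bool Enabled = Level != ToyOptLevel::None;
  return Enabled ? ToyCSEMode::Full : ToyCSEMode::Off;
}

// Sanity check of the model: only O0 behaves differently after the patch.
void checkOnlyO0Changes() {
  assert(cseModeBefore(ToyOptLevel::None) != cseModeAfter(ToyOptLevel::None));
  assert(cseModeBefore(ToyOptLevel::Less) == cseModeAfter(ToyOptLevel::Less));
  assert(cseModeBefore(ToyOptLevel::Default) ==
         cseModeAfter(ToyOptLevel::Default));
  assert(cseModeBefore(ToyOptLevel::Aggressive) ==
         cseModeAfter(ToyOptLevel::Aggressive));
}
```

Whether CSE still pays for itself at the optimizing levels is a separate question that this PR does not change or measure.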
aemerson left a comment:
It's definitely possible that the environment is just different now than it was 8 years ago when it was implemented. For one thing, the combiners are just more fleshed out.
I'm somewhat nervous about removing it wholesale, but again, if the data suggests it's not useful anymore, it's fine with me. It is nice, though, that for testing purposes it creates simplified MIR for dumb folds.
Anyway, for this PR specifically at -O0 I think it's fine to do. I do have a question on one of the test changes...
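
The "simplified MIR for dumb folds" point is visible in the `constant_int_start` test change in the diff, where `add i32 2, 42` used to translate to `G_CONSTANT i32 44` and now translates to a `G_ADD` of two constants. A minimal sketch of that difference, using a toy builder rather than LLVM's: `ToyFoldingBuilder`, `ToyValue` and `translateAdd2And42` are hypothetical names, and the fold-on-build behaviour is modeled only to the extent that the test diff shows it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy value: the opcode that defines it, plus its immediate.
struct ToyValue {
  std::string Opcode; // "G_CONSTANT" or "G_ADD"
  std::uint32_t Imm;  // meaningful only for G_CONSTANT
};

class ToyFoldingBuilder {
  bool FoldConstants;
  std::vector<ToyValue> Values;

  std::size_t append(const std::string &Opcode, std::uint32_t Imm) {
    Values.push_back({Opcode, Imm});
    return Values.size() - 1;
  }

public:
  explicit ToyFoldingBuilder(bool Fold) : FoldConstants(Fold) {}

  std::size_t buildConstant(std::uint32_t Imm) {
    return append("G_CONSTANT", Imm);
  }

  std::size_t buildAdd(std::size_t LHS, std::size_t RHS) {
    // Copy the operands first: appending below may reallocate the vector.
    const ToyValue L = Values.at(LHS);
    const ToyValue R = Values.at(RHS);
    if (FoldConstants && L.Opcode == "G_CONSTANT" && R.Opcode == "G_CONSTANT")
      return buildConstant(L.Imm + R.Imm); // unsigned add wraps like i32
    return append("G_ADD", 0);
  }

  const ToyValue &value(std::size_t VReg) const { return Values.at(VReg); }
};

// Models translating "%res = add i32 2, 42" and returns the value defining
// %res.
ToyValue translateAdd2And42(bool Fold) {
  ToyFoldingBuilder B(Fold);
  std::size_t Two = B.buildConstant(2);
  std::size_t Answer = B.buildConstant(42);
  std::size_t Res = B.buildAdd(Two, Answer);
  assert(Res == 2 && "two constants precede the result in both modes");
  return B.value(Res);
}
```

With folding on, the result is a single `G_CONSTANT` holding 44; with it off, the `G_ADD` survives and is left for a later combine or for instruction selection, which is why so many O0 test expectations change in this patch.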
CTMark -0.54% geomean improvement on stage1-aarch64-O0-g [1] with no change to
code-size [2]. Sqlite is -1.30%.
I also measured without -g locally and geomean is -0.62%.
This was enabled for constants only in 946b124 and improved compile-time
and code-size.
Assisted-by: codex
[1] https://llvm-compile-time-tracker.com/compare.php?from=f0c84b3ac80c4c594045aff9b4f88ba564614361&to=98bfcb7aa08ba8ecf5ed5a89e64d2e7f0878be5b&stat=instructions%3Au
[2] https://llvm-compile-time-tracker.com/compare.php?from=f0c84b3ac80c4c594045aff9b4f88ba564614361&to=98bfcb7aa08ba8ecf5ed5a89e64d2e7f0878be5b&stat=size-total