-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MachineLICM] Fix incorrect CSE on hoisted const load #73007
Conversation
@llvm/pr-subscribers-backend-aarch64 Author: Igor Kirillov (igogo-x86) ChangesWhen hoisting an invariant load, we should not combine it with an existing load through common subexpression elimination (CSE). This is because there might be memory-changing instructions between the existing load and the end of the block entering the loop. Fixes #72855 Full diff: https://github.com/llvm/llvm-project/pull/73007.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp
index 3f3d7e53fb0e4e6..efc19f8fdbf8c2e 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -1422,6 +1422,11 @@ bool MachineLICMBase::EliminateCSE(
if (MI->isImplicitDef())
return false;
+ // Do not CSE normal loads because between them could be store instructions
+ // that change the loaded value
+ if (MI->mayLoad() && !MI->isDereferenceableInvariantLoad())
+ return false;
+
if (MachineInstr *Dup = LookForDuplicate(MI, CI->second)) {
LLVM_DEBUG(dbgs() << "CSEing " << *MI << " with " << *Dup);
@@ -1475,6 +1480,9 @@ bool MachineLICMBase::EliminateCSE(
/// Return true if the given instruction will be CSE'd if it's hoisted out of
/// the loop.
bool MachineLICMBase::MayCSE(MachineInstr *MI) {
+ if (MI->mayLoad() && !MI->isDereferenceableInvariantLoad())
+ return false;
+
unsigned Opcode = MI->getOpcode();
for (auto &Map : CSEMap) {
// Check this CSEMap's preheader dominates MI's basic block.
diff --git a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll
index 01af84ea6922ce3..30123a31cebbe9d 100644
--- a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll
+++ b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll
@@ -448,6 +448,56 @@ for.exit: ; preds = %for.body
ret i64 %spec.select
}
+; See issue https://github.com/llvm/llvm-project/issues/72855
+;
+; When hoisting instruction out of the loop, ensure that loads are not common
+; subexpressions eliminated. In this example pointer %c may alias pointer %b,
+; so when hoisting `%y = load i64, ptr %b` instruction we can't replace it with
+; `%b.val = load i64, ptr %b`
+;
+define i64 @hoisting_no_cse(ptr %a, ptr %b, ptr %c, i64 %N) {
+; CHECK-LABEL: hoisting_no_cse:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: ldr x8, [x1]
+; CHECK-NEXT: add x8, x8, #1
+; CHECK-NEXT: str x8, [x2]
+; CHECK-NEXT: mov x8, xzr
+; CHECK-NEXT: ldr x9, [x1]
+; CHECK-NEXT: .LBB7_1: // %for.body
+; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
+; CHECK-NEXT: ldr x10, [x0], #8
+; CHECK-NEXT: ldr x10, [x10]
+; CHECK-NEXT: cmp x10, x9
+; CHECK-NEXT: cinc x8, x8, eq
+; CHECK-NEXT: subs x3, x3, #1
+; CHECK-NEXT: b.ne .LBB7_1
+; CHECK-NEXT: // %bb.2: // %for.exit
+; CHECK-NEXT: mov x0, x8
+; CHECK-NEXT: ret
+entry:
+ %b.val = load i64, ptr %b
+ %b.val.changed = add i64 %b.val, 1
+ store i64 %b.val.changed, ptr %c
+ br label %for.body
+
+for.body: ; preds = %entry, %for.body
+ %idx = phi i64 [ %inc, %for.body ], [ 0, %entry ]
+ %sum = phi i64 [ %spec.select, %for.body ], [ 0, %entry ]
+ %arrayidx = getelementptr inbounds ptr, ptr %a, i64 %idx
+ %0 = load ptr, ptr %arrayidx, align 8
+ %x = load i64, ptr %0
+ %y = load i64, ptr %b
+ %cmp = icmp eq i64 %x, %y
+ %add = zext i1 %cmp to i64
+ %spec.select = add i64 %sum, %add
+ %inc = add nuw i64 %idx, 1
+ %exitcond = icmp eq i64 %inc, %N
+ br i1 %exitcond, label %for.exit, label %for.body
+
+for.exit: ; preds = %for.body
+ ret i64 %spec.select
+}
+
declare i32 @bcmp(ptr, ptr, i64)
declare i32 @memcmp(ptr, ptr, i64)
declare void @func()
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds OK to me. LGTM
@@ -1422,6 +1422,11 @@ bool MachineLICMBase::EliminateCSE( | |||
if (MI->isImplicitDef()) | |||
return false; | |||
|
|||
// Do not CSE normal loads because between them could be store instructions | |||
// that change the loaded value | |||
if (MI->mayLoad() && !MI->isDereferenceableInvariantLoad()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still guaranteed to be safe for dereferenceable invariant loads? Could we have a situation where we hoist these memory accesses from a loop because they appear invariant:
load a
store b
load a
where store b could alias with load a? In this case we will hoist all three to the preheader - do we also CSE the two loads? If so, that's unsafe too.
Hmm, I guess that shouldn't happen in practice because we only hoist the loads if there are no stores in the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just ignore me. I think I probably misunderstood the problem you're trying to solve. I guess I'm just trying to understand why we don't just always return false for all loads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a particular case where we load constant from memory like this:
fadd float %tmp3.i, 1.000000e+10
Also, if at least one store is in the loop, we don't hoist loads.
When hoisting an invariant load, we should not combine it with an existing load through common subexpression elimination (CSE). This is because there might be memory-changing instructions between the existing load and the end of the block entering the loop. Fixes llvm#72855
20d275a
to
916ae90
Compare
When hoisting an invariant load, we should not combine it with an existing load through common subexpression elimination (CSE). This is because there might be memory-changing instructions between the existing load and the end of the block entering the loop.
Fixes #72855