Skip to content

Conversation

@dtcxzyw
Copy link
Member

@dtcxzyw dtcxzyw commented Nov 28, 2025

When LOps.RootInsert comes after LI2, since we use LI2 as the new insert point, we should make sure the memory region accessed by LOps isn't modified. However, the original implementation passes the bit width LOps.LoadSize as the number of bytes to be accessed, causing BasicAA to return NoAlias:

// If the size of one access is larger than the entire object on the other
// side, then we know such behavior is undefined and can assume no alias.
bool NullIsValidLocation = NullPointerIsDefined(&F);
if ((isObjectSmallerThan(
O2, getMinimalExtentFrom(*V1, V1Size, DL, NullIsValidLocation), DL,
TLI, NullIsValidLocation)) ||
(isObjectSmallerThan(
O1, getMinimalExtentFrom(*V2, V2Size, DL, NullIsValidLocation), DL,
TLI, NullIsValidLocation)))
return AliasResult::NoAlias;

With -aa-trace, we get:

End ptr getelementptr inbounds nuw (i8, ptr @g, i64 4) @ LocationSize::precise(1),   %gep1 = getelementptr i8, ptr %p, i64 4 @ LocationSize::precise(32) = NoAlias

This patch uses getTypeStoreSize to compute the correct access size for LOps. Instead of modifying the MemoryLocation for End (i.e., LOps.RootInsert), it also uses the computed base and AATag for correctness.

Closes #169921.

@llvmbot
Copy link
Member

llvmbot commented Nov 28, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)

Changes

When LOps.RootInsert comes after LI2, since we use LI2 as the new insert point, we should make sure the memory region accessed by LOps isn't modified. However, the original implementation passes the bit width LOps.LoadSize as the number of bytes to be accessed, causing BasicAA to return NoAlias:

// If the size of one access is larger than the entire object on the other
// side, then we know such behavior is undefined and can assume no alias.
bool NullIsValidLocation = NullPointerIsDefined(&F);
if ((isObjectSmallerThan(
O2, getMinimalExtentFrom(*V1, V1Size, DL, NullIsValidLocation), DL,
TLI, NullIsValidLocation)) ||
(isObjectSmallerThan(
O1, getMinimalExtentFrom(*V2, V2Size, DL, NullIsValidLocation), DL,
TLI, NullIsValidLocation)))
return AliasResult::NoAlias;

With -aa-trace, we get:

End ptr getelementptr inbounds nuw (i8, ptr @<!-- -->g, i64 4) @ LocationSize::precise(1),   %gep1 = getelementptr i8, ptr %p, i64 4 @ LocationSize::precise(32) = NoAlias

This patch uses getTypeStoreSize to compute the correct access size for LOps. Instead of modifying the MemoryLocation for End (i.e., LOps.RootInsert), it also uses the computed base and AATag for correctness.

Closes #169921.


Full diff: https://github.com/llvm/llvm-project/pull/169953.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp (+10-2)
  • (modified) llvm/test/Transforms/AggressiveInstCombine/X86/or-load.ll (+53)
diff --git a/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp b/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
index 7ed8fb68f107e..2397133fa61ef 100644
--- a/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
+++ b/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
@@ -710,9 +710,17 @@ static bool foldLoadsRecursive(Value *V, LoadOps &LOps, const DataLayout &DL,
   MemoryLocation Loc;
   if (!Start->comesBefore(End)) {
     std::swap(Start, End);
-    Loc = MemoryLocation::get(End);
+    // If LOps.RootInsert comes after LI2, since we use LI2 as the new insert
+    // point, we should make sure whether the memory region accessed by LOps
+    // isn't modified.
     if (LOps.FoundRoot)
-      Loc = Loc.getWithNewSize(LOps.LoadSize);
+      Loc = MemoryLocation(
+          LOps.Root->getPointerOperand(),
+          LocationSize::precise(DL.getTypeStoreSize(
+              IntegerType::get(LI1->getContext(), LOps.LoadSize))),
+          LOps.AATags);
+    else
+      Loc = MemoryLocation::get(End);
   } else
     Loc = MemoryLocation::get(End);
   unsigned NumScanned = 0;
diff --git a/llvm/test/Transforms/AggressiveInstCombine/X86/or-load.ll b/llvm/test/Transforms/AggressiveInstCombine/X86/or-load.ll
index 46ec9e0a50842..f62a1ca15729b 100644
--- a/llvm/test/Transforms/AggressiveInstCombine/X86/or-load.ll
+++ b/llvm/test/Transforms/AggressiveInstCombine/X86/or-load.ll
@@ -2505,3 +2505,56 @@ entry:
   %or = or disjoint i32 %shl, %conv.2
   ret i32 %or
 }
+
+@g = global i64 1060856922120
+
+; Make sure we use the correct memory location for alias analysis.
+define i64 @loadcombine_consecutive_mayalias(ptr %p) {
+; LE-LABEL: @loadcombine_consecutive_mayalias(
+; LE-NEXT:  entry:
+; LE-NEXT:    [[LOAD3:%.*]] = load i32, ptr [[P:%.*]], align 4
+; LE-NEXT:    [[GEP1:%.*]] = getelementptr i8, ptr [[P]], i64 4
+; LE-NEXT:    store i8 0, ptr getelementptr inbounds nuw (i8, ptr @g, i64 4), align 4
+; LE-NEXT:    [[LOAD2:%.*]] = load i32, ptr [[GEP1]], align 4
+; LE-NEXT:    [[TMP0:%.*]] = zext i32 [[LOAD2]] to i64
+; LE-NEXT:    [[TMP1:%.*]] = shl i64 [[TMP0]], 32
+; LE-NEXT:    [[ZEXT3:%.*]] = zext i32 [[LOAD3]] to i64
+; LE-NEXT:    [[LOAD1:%.*]] = or i64 [[TMP1]], [[ZEXT3]]
+; LE-NEXT:    [[RES:%.*]] = lshr i64 [[LOAD1]], 32
+; LE-NEXT:    ret i64 [[RES]]
+;
+; BE-LABEL: @loadcombine_consecutive_mayalias(
+; BE-NEXT:  entry:
+; BE-NEXT:    [[LOAD1:%.*]] = load i32, ptr [[P:%.*]], align 4
+; BE-NEXT:    [[GEP1:%.*]] = getelementptr i8, ptr [[P]], i64 4
+; BE-NEXT:    [[GEP2:%.*]] = getelementptr i8, ptr [[P]], i64 5
+; BE-NEXT:    store i8 0, ptr getelementptr inbounds nuw (i8, ptr @g, i64 4), align 4
+; BE-NEXT:    [[LOAD2:%.*]] = load i8, ptr [[GEP1]], align 4
+; BE-NEXT:    [[LOAD3:%.*]] = load i24, ptr [[GEP2]], align 1
+; BE-NEXT:    [[ZEXT1:%.*]] = zext i24 [[LOAD3]] to i64
+; BE-NEXT:    [[SHL1:%.*]] = shl i64 [[ZEXT1]], 40
+; BE-NEXT:    [[ZEXT2:%.*]] = zext i8 [[LOAD2]] to i64
+; BE-NEXT:    [[SHL2:%.*]] = shl i64 [[ZEXT2]], 32
+; BE-NEXT:    [[OR1:%.*]] = or i64 [[SHL1]], [[SHL2]]
+; BE-NEXT:    [[ZEXT3:%.*]] = zext i32 [[LOAD1]] to i64
+; BE-NEXT:    [[OR2:%.*]] = or i64 [[OR1]], [[ZEXT3]]
+; BE-NEXT:    [[RES:%.*]] = lshr i64 [[OR2]], 32
+; BE-NEXT:    ret i64 [[RES]]
+;
+entry:
+  %load1 = load i32, ptr %p, align 4
+  %gep1 = getelementptr i8, ptr %p, i64 4
+  %gep2 = getelementptr i8, ptr %p, i64 5
+  store i8 0, ptr getelementptr inbounds nuw (i8, ptr @g, i64 4), align 4
+  %load2 = load i8, ptr %gep1, align 4
+  %load3 = load i24, ptr %gep2, align 1
+  %zext1 = zext i24 %load3 to i64
+  %shl1 = shl i64 %zext1, 40
+  %zext2 = zext i8 %load2 to i64
+  %shl2 = shl i64 %zext2, 32
+  %or1 = or i64 %shl1, %shl2
+  %zext3 = zext i32 %load1 to i64
+  %or2 = or i64 %or1, %zext3
+  %res = lshr i64 %or2, 32
+  ret i64 %res
+}

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dtcxzyw dtcxzyw merged commit c7c6c0a into llvm:main Dec 1, 2025
12 checks passed
@dtcxzyw dtcxzyw deleted the fix-169921 branch December 1, 2025 14:46
@llvm-ci
Copy link
Collaborator

llvm-ci commented Dec 1, 2025

LLVM Buildbot has detected a new failure on builder ppc64le-mlir-rhel-clang running on ppc64le-mlir-rhel-test while building llvm at step 6 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/129/builds/34109

Here is the relevant piece of the build log for the reference
Step 6 (test-build-check-mlir-build-only-check-mlir) failure: 1200 seconds without output running [b'ninja', b'check-mlir'], attempting to kill
...
PASS: MLIR-Unit :: IR/./MLIRIRTests/101/130 (3697 of 3708)
PASS: MLIR :: mlir-runner/verify-entry-point.mlir (3698 of 3708)
PASS: MLIR-Unit :: IR/./MLIRIRTests/38/130 (3699 of 3708)
PASS: MLIR-Unit :: IR/./MLIRIRTests/0/130 (3700 of 3708)
PASS: MLIR-Unit :: Interfaces/./MLIRInterfacesTests/11/22 (3701 of 3708)
PASS: MLIR-Unit :: Interfaces/./MLIRInterfacesTests/12/22 (3702 of 3708)
PASS: MLIR-Unit :: IR/./MLIRIRTests/39/130 (3703 of 3708)
PASS: MLIR-Unit :: Interfaces/./MLIRInterfacesTests/13/22 (3704 of 3708)
PASS: MLIR-Unit :: Pass/./MLIRPassTests/10/13 (3705 of 3708)
PASS: MLIR :: mlir-reduce/dce-test.mlir (3706 of 3708)
command timed out: 1200 seconds without output running [b'ninja', b'check-mlir'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=3284.543171

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AggressiveInstCombine] Miscompilation at O3

5 participants