[ConstantFolding] Do not consider padded-in-memory types as uniform #81854

Merged

bjope merged 1 commit into llvm:main on Feb 15, 2024

Conversation

@bjope (Collaborator) commented Feb 15, 2024

Teach ConstantFoldLoadFromUniformValue that types that are padded in memory can't be considered uniform.

This uses the big hammer: the fold is blocked whenever we load from a constant for which DataLayout::typeSizeEqualsStoreSize would return false.
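
For illustration only, not part of the patch: the guard boils down to comparing a type's exact bit size with its in-memory store size. A minimal sketch, assuming the default data layout and the public DataLayout API (the helper name is made up):

  // Sketch: why i17 fails DataLayout::typeSizeEqualsStoreSize.
  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Type.h"
  #include "llvm/Support/raw_ostream.h"

  void showPaddedType() {
    llvm::LLVMContext Ctx;
    llvm::DataLayout DL("");  // default (empty) data layout string
    llvm::Type *I17 = llvm::Type::getIntNTy(Ctx, 17);
    // The type size is the exact bit width (17), while the store size is
    // rounded up to whole bytes (3 bytes = 24 bits), so the two differ and
    // ConstantFoldLoadFromUniformValue now refuses to fold.
    llvm::errs() << "size: " << DL.getTypeSizeInBits(I17).getFixedValue()
                 << ", store size: "
                 << DL.getTypeStoreSizeInBits(I17).getFixedValue()
                 << ", equal: " << DL.typeSizeEqualsStoreSize(I17) << "\n";
  }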

The main problem being solved looks something like this:

  store i17 -1, ptr %p, align 4
  %v = load i8, ptr %p, align 1

If, for example, the i17 occupies 32 bits in memory, LLVM IR doesn't really say where the padding goes. Even if we assume that the 15 most significant bits are the padding, they should be considered undefined (even though the LLVM backend typically pads with zeroes). In any case, on a big-endian target the i8 load would read some of those most significant bits, which aren't guaranteed to be all ones, so it would be wrong to constant fold the load to -1.

If LLVM IR were more explicit about the placement of padding, we could allow the constant fold of the load in the example, but only for little-endian targets.
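
To make the endianness argument concrete, here is a small standalone sketch, not from the patch, that simulates the example under the charitable assumption that the padding is zeroed:

  // Sketch: i17 -1 zero-padded to 32 bits; what `load i8, ptr %p` would read.
  #include <cstdint>
  #include <cstdio>

  int main() {
    const uint32_t Padded = 0x0001FFFFu; // i17 -1 with 15 zero padding bits
    // The byte at the lowest address is what the i8 load observes.
    uint8_t LE = Padded & 0xFFu;         // little-endian: 0xFF, folding to -1 is OK
    uint8_t BE = (Padded >> 24) & 0xFFu; // big-endian: 0x00, folding to -1 is wrong
    std::printf("LE byte: 0x%02x, BE byte: 0x%02x\n", LE, BE);
    return 0;
  }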

Fixes: #81793

@llvmbot (Collaborator) commented Feb 15, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Björn Pettersson (bjope)



Full diff: https://github.com/llvm/llvm-project/pull/81854.diff

6 Files Affected:

  • (modified) llvm/include/llvm/Analysis/ConstantFolding.h (+2-1)
  • (modified) llvm/lib/Analysis/ConstantFolding.cpp (+11-6)
  • (modified) llvm/lib/Analysis/InstructionSimplify.cpp (+2-2)
  • (modified) llvm/lib/Transforms/IPO/Attributor.cpp (+1-1)
  • (modified) llvm/lib/Transforms/IPO/GlobalOpt.cpp (+1-1)
  • (modified) llvm/test/Transforms/InstSimplify/ConstProp/loads.ll (+34-1)
diff --git a/llvm/include/llvm/Analysis/ConstantFolding.h b/llvm/include/llvm/Analysis/ConstantFolding.h
index 1b194b07e86781..fd885a8c0ea3e5 100644
--- a/llvm/include/llvm/Analysis/ConstantFolding.h
+++ b/llvm/include/llvm/Analysis/ConstantFolding.h
@@ -174,7 +174,8 @@ Constant *ConstantFoldLoadFromConstPtr(Constant *C, Type *Ty,
 /// ones, all undef or all poison), return the corresponding uniform value in
 /// the new type. If the value is not uniform or the result cannot be
 /// represented, return null.
-Constant *ConstantFoldLoadFromUniformValue(Constant *C, Type *Ty);
+Constant *ConstantFoldLoadFromUniformValue(Constant *C, Type *Ty,
+                                           const DataLayout &DL);
 
 /// canConstantFoldCallTo - Return true if its even possible to fold a call to
 /// the specified function.
diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp
index 90da3390eab324..8b7031e7fe4a6f 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -106,7 +106,7 @@ Constant *FoldBitCast(Constant *C, Type *DestTy, const DataLayout &DL) {
          "Invalid constantexpr bitcast!");
 
   // Catch the obvious splat cases.
-  if (Constant *Res = ConstantFoldLoadFromUniformValue(C, DestTy))
+  if (Constant *Res = ConstantFoldLoadFromUniformValue(C, DestTy, DL))
     return Res;
 
   if (auto *VTy = dyn_cast<VectorType>(C->getType())) {
@@ -342,7 +342,7 @@ bool llvm::IsConstantOffsetFromGlobal(Constant *C, GlobalValue *&GV,
 }
 
 Constant *llvm::ConstantFoldLoadThroughBitcast(Constant *C, Type *DestTy,
-                                         const DataLayout &DL) {
+                                               const DataLayout &DL) {
   do {
     Type *SrcTy = C->getType();
     if (SrcTy == DestTy)
@@ -355,7 +355,7 @@ Constant *llvm::ConstantFoldLoadThroughBitcast(Constant *C, Type *DestTy,
 
     // Catch the obvious splat cases (since all-zeros can coerce non-integral
     // pointers legally).
-    if (Constant *Res = ConstantFoldLoadFromUniformValue(C, DestTy))
+    if (Constant *Res = ConstantFoldLoadFromUniformValue(C, DestTy, DL))
       return Res;
 
     // If the type sizes are the same and a cast is legal, just directly
@@ -709,7 +709,7 @@ Constant *llvm::ConstantFoldLoadFromConst(Constant *C, Type *Ty,
     return PoisonValue::get(Ty);
 
   // Try an offset-independent fold of a uniform value.
-  if (Constant *Result = ConstantFoldLoadFromUniformValue(C, Ty))
+  if (Constant *Result = ConstantFoldLoadFromUniformValue(C, Ty, DL))
     return Result;
 
   // Try hard to fold loads from bitcasted strange and non-type-safe things.
@@ -745,7 +745,7 @@ Constant *llvm::ConstantFoldLoadFromConstPtr(Constant *C, Type *Ty,
 
   // If this load comes from anywhere in a uniform constant global, the value
   // is always the same, regardless of the loaded offset.
-  return ConstantFoldLoadFromUniformValue(GV->getInitializer(), Ty);
+  return ConstantFoldLoadFromUniformValue(GV->getInitializer(), Ty, DL);
 }
 
 Constant *llvm::ConstantFoldLoadFromConstPtr(Constant *C, Type *Ty,
@@ -754,11 +754,16 @@ Constant *llvm::ConstantFoldLoadFromConstPtr(Constant *C, Type *Ty,
   return ConstantFoldLoadFromConstPtr(C, Ty, Offset, DL);
 }
 
-Constant *llvm::ConstantFoldLoadFromUniformValue(Constant *C, Type *Ty) {
+Constant *llvm::ConstantFoldLoadFromUniformValue(Constant *C, Type *Ty,
+                                                 const DataLayout &DL) {
   if (isa<PoisonValue>(C))
     return PoisonValue::get(Ty);
   if (isa<UndefValue>(C))
     return UndefValue::get(Ty);
+  // If padding is needed when storing C to memory, then it isn't considered as
+  // uniform.
+  if (!DL.typeSizeEqualsStoreSize(C->getType()))
+    return nullptr;
   if (C->isNullValue() && !Ty->isX86_MMXTy() && !Ty->isX86_AMXTy())
     return Constant::getNullValue(Ty);
   if (C->isAllOnesValue() &&
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 08050becd2df88..201472a3f10c2e 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -6960,8 +6960,8 @@ Value *llvm::simplifyLoadInst(LoadInst *LI, Value *PtrOp,
 
   // If GlobalVariable's initializer is uniform, then return the constant
   // regardless of its offset.
-  if (Constant *C =
-          ConstantFoldLoadFromUniformValue(GV->getInitializer(), LI->getType()))
+  if (Constant *C = ConstantFoldLoadFromUniformValue(GV->getInitializer(),
+                                                     LI->getType(), Q.DL))
     return C;
 
   // Try to convert operand into a constant by stripping offsets while looking
diff --git a/llvm/lib/Transforms/IPO/Attributor.cpp b/llvm/lib/Transforms/IPO/Attributor.cpp
index 5d1a783b2996d7..72a2aadc204ba8 100644
--- a/llvm/lib/Transforms/IPO/Attributor.cpp
+++ b/llvm/lib/Transforms/IPO/Attributor.cpp
@@ -275,7 +275,7 @@ AA::getInitialValueForObj(Attributor &A, const AbstractAttribute &QueryingAA,
     return ConstantFoldLoadFromConst(Initializer, &Ty, Offset, DL);
   }
 
-  return ConstantFoldLoadFromUniformValue(Initializer, &Ty);
+  return ConstantFoldLoadFromUniformValue(Initializer, &Ty, DL);
 }
 
 bool AA::isValidInScope(const Value &V, const Function *Scope) {
diff --git a/llvm/lib/Transforms/IPO/GlobalOpt.cpp b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
index 42828b4f416800..c92b5d82fc85a4 100644
--- a/llvm/lib/Transforms/IPO/GlobalOpt.cpp
+++ b/llvm/lib/Transforms/IPO/GlobalOpt.cpp
@@ -296,7 +296,7 @@ static bool CleanupConstantGlobalUsers(GlobalVariable *GV,
       // A load from a uniform value is always the same, regardless of any
       // applied offset.
       Type *Ty = LI->getType();
-      if (Constant *Res = ConstantFoldLoadFromUniformValue(Init, Ty)) {
+      if (Constant *Res = ConstantFoldLoadFromUniformValue(Init, Ty, DL)) {
         LI->replaceAllUsesWith(Res);
         EraseFromParent(LI);
         continue;
diff --git a/llvm/test/Transforms/InstSimplify/ConstProp/loads.ll b/llvm/test/Transforms/InstSimplify/ConstProp/loads.ll
index 3bc301743a1835..d4c49faf91b091 100644
--- a/llvm/test/Transforms/InstSimplify/ConstProp/loads.ll
+++ b/llvm/test/Transforms/InstSimplify/ConstProp/loads.ll
@@ -407,12 +407,45 @@ define ptr addrspace(2) @load_non_integral_ptr_from_i8_data() {
 
 define i8 @load_i8_from_i1() {
 ; CHECK-LABEL: @load_i8_from_i1(
-; CHECK-NEXT:    ret i8 -1
+; CHECK-NEXT:    [[V:%.*]] = load i8, ptr @g_i1, align 1
+; CHECK-NEXT:    ret i8 [[V]]
 ;
   %v = load i8, ptr @g_i1
   ret i8 %v
 }
 
+@global9 = internal constant i9 -1
+
+; Reproducer for https://github.com/llvm/llvm-project/issues/81793
+define i8 @load_i8_from_i9() {
+; CHECK-LABEL: @load_i8_from_i9(
+; CHECK-NEXT:    [[V:%.*]] = load i8, ptr @global9, align 1
+; CHECK-NEXT:    ret i8 [[V]]
+;
+  %v = load i8, ptr @global9
+  ret i8 %v
+}
+
+define i9 @load_i9_from_i9() {
+; CHECK-LABEL: @load_i9_from_i9(
+; CHECK-NEXT:    ret i9 -1
+;
+  %v = load i9, ptr @global9
+  ret i9 %v
+}
+
+; Reproducer for https://github.com/llvm/llvm-project/issues/81793
+define i16 @load_i16_from_i17_store(ptr %p) {
+; CHECK-LABEL: @load_i16_from_i17_store(
+; CHECK-NEXT:    store i17 -1, ptr [[P:%.*]], align 4
+; CHECK-NEXT:    [[V:%.*]] = load i16, ptr @global9, align 2
+; CHECK-NEXT:    ret i16 [[V]]
+;
+  store i17 -1, ptr %p
+  %v = load i16, ptr @global9
+  ret i16 %v
+}
+
 @global128 = internal constant i128 1125899906842625
 define i128 @load-128bit(){
 ; CHECK-LABEL: @load-128bit(

@nikic (Contributor) left a comment:
LGTM

Another point in favor of forbidding non-byte-sized loads/stores...

bjope merged commit 7677453 into llvm:main on Feb 15, 2024 (7 checks passed).
bjope deleted the pr81793 branch on March 20, 2024.
Closes: Faulty load-store forwarding with non byte-sized types (#81793)