Faulty load-store forwarding with non byte-sized types #81793

Closed
bjope opened this issue Feb 14, 2024 · 1 comment · Fixed by #81854

Comments

bjope (Collaborator) commented Feb 14, 2024

Consider this example with a big-endian data layout:

target datalayout = "E"

define i8 @foo(ptr %p) {
entry:
  store i9 -1, ptr %p, align 1
  %r0 = load i8, ptr %p, align 1
  ret i8 %r0
}

If we run that through instcombine, the result is:

target datalayout = "E"

define i8 @foo(ptr %p) {
  store i9 -1, ptr %p, align 1
  ret i8 -1
}

Neither the LLVM IR reference nor the DataLayout describes where the padding bits go when storing a non byte-sized type like that. In practice LLVM (at least the backends I know about) would (zero) extend up to the TypeStoreSize before storing the i9 to memory. So given a big-endian datalayout (and an 8-bit byte size) the above is a miscompile, as the load depends on the padding bits. For a little-endian datalayout the optimization isn't totally wrong, but I don't think that LLVM should optimize this even for little-endian (afaik we have avoided such optimizations historically).
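
To make the byte layout concrete, here is a hand-written sketch (an assumption about the lowering, not output from any tool) of what the store effectively becomes if the backend zero-extends the i9 up to its 16-bit TypeStoreSize:

target datalayout = "E"

define i8 @foo_lowered(ptr %p) {
entry:
  ; -1 as i9 is 0x1FF; zero-extended to i16 it is 0x01FF (511)
  store i16 511, ptr %p, align 1
  ; big-endian: the byte at %p is the most significant byte, 0x01
  %r0 = load i8, ptr %p, align 1
  ret i8 %r0
}

Under that assumption the unoptimized function returns 1, while the instcombined version returns -1.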

Here is an alive2 link showing that the above transformation is invalid: https://alive2.llvm.org/ce/z/LJpiuS

Assuming that we simply want to skip the transformation for little-endian as well, a patch like this could solve the problem:

diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp
index f1dda97aa32d..b63c5078f88e 100644
--- a/llvm/lib/Analysis/Loads.cpp
+++ b/llvm/lib/Analysis/Loads.cpp
@@ -534,9 +534,11 @@ static Value *getAvailableLoadStore(Instruction *Inst, const Value *Ptr,
 
     TypeSize StoreSize = DL.getTypeSizeInBits(Val->getType());
     TypeSize LoadSize = DL.getTypeSizeInBits(AccessTy);
-    if (TypeSize::isKnownLE(LoadSize, StoreSize))
+    if (DL.typeSizeEqualsStoreSize(Val->getType()) &&
+        TypeSize::isKnownLE(LoadSize, StoreSize)) {
       if (auto *C = dyn_cast<Constant>(Val))
         return ConstantFoldLoadFromConst(C, AccessTy, DL);
+    }
   }
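
For illustration, a regression test along these lines (function names made up for this report, not taken from any patch) would show the intended effect of the guard: the i9 case is no longer folded, while a type whose size equals its store size still is:

target datalayout = "E"

; i9 has padding bits up to its store size, so with the guard the load is
; no longer folded to -1.
define i8 @no_fold_i9(ptr %p) {
  store i9 -1, ptr %p, align 1
  %r = load i8, ptr %p, align 1
  ret i8 %r
}

; i16 has no padding (DataLayout::typeSizeEqualsStoreSize is true), so this
; load can still be folded to -1.
define i8 @still_fold_i16(ptr %p) {
  store i16 -1, ptr %p, align 2
  %r = load i8, ptr %p, align 1
  ret i8 %r
}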

bjope (Collaborator, Author) commented Feb 15, 2024

Note that the fix sketched in the bug description targets the specific fault seen here: it simply avoids calling ConstantFoldLoadFromConst for this case.
There is of course a risk that more bugs are lurking, and that a real fix should instead go inside ConstantFolding. ConstantFoldLoadFromConst uses ConstantFoldLoadFromUniformValue, so it would be good to check whether other callers of ConstantFoldLoadFromUniformValue are already more careful about padding/endianness. The actual fix should perhaps be made in ConstantFoldLoadFromUniformValue itself.

bjope added a commit to bjope/llvm-project that referenced this issue Feb 15, 2024
Teaching ConstantFoldLoadFromUniformValue that types that are padded
in memory can't be considered as uniform.

Using the big hammer to prevent optimizations when loading from a
constant for which DataLayout::typeSizeEqualsStoreSize would return
false.

The main problem solved is something like this:
  store i17 -1, ptr %p, align 4
  %v = load i8, ptr %p, align 1
If, for example, the i17 occupies 32 bits in memory, then LLVM IR
doesn't really say where the padding goes. And even if we assume that
the 15 most significant bits are padding, they should be considered
undefined (even if the LLVM backend typically pads with zeroes). In any
case, for a big-endian target the load would read those most
significant bits, which aren't guaranteed to be ones. So it would be
wrong to constant fold the load as returning -1.

If LLVM IR had been more explicit about the placement of padding,
then we could allow the constant fold of the load in the example,
but only for little-endian.

Fixes: llvm#81793
bjope self-assigned this Feb 15, 2024
bjope added a commit that referenced this issue Feb 15, 2024 (#81854), with the same commit message as above.