Faulty load-store forwarding with non byte-sized types #81793

Closed
bjope opened this issue Feb 14, 2024 · 1 comment · Fixed by #81854

Comments

bjope (Collaborator) commented Feb 14, 2024

Consider this example with a big-endian data layout:

target datalayout = "E"

define i8 @foo(ptr %p) {
entry:
  store i9 -1, ptr %p, align 1
  %r0 = load i8, ptr %p, align 1
  ret i8 %r0
}

If we run that through instcombine, the result is:

target datalayout = "E"

define i8 @foo(ptr %p) {
  store i9 -1, ptr %p, align 1
  ret i8 -1
}

Neither the LLVM IR reference nor the DataLayout describes where the padding bits go when storing a non byte-sized type like that. In practice LLVM (at least the backends I know about) would (zero) extend up to the TypeStoreSize before storing the i9 to memory. So given a big-endian datalayout (and an 8-bit byte size) the above is a miscompile, as the load depends on the padding bits. For a little-endian datalayout the optimization isn't totally wrong, but I don't think that LLVM should optimize this even for little-endian (afaik we have avoided such optimizations historically).
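
To make the byte layout concrete, here is a hand-written sketch (an assumption about the lowering, not output from any tool) of what the store effectively becomes if the backend zero-extends the i9 up to its 16-bit TypeStoreSize:

target datalayout = "E"

define i8 @foo_lowered(ptr %p) {
entry:
  ; -1 as i9 is 0x1FF; zero-extended to i16 it is 0x01FF (511)
  store i16 511, ptr %p, align 1
  ; big-endian: the byte at %p is the most significant byte, 0x01
  %r0 = load i8, ptr %p, align 1
  ret i8 %r0
}

Under that assumption the unoptimized function returns 1, while the instcombined version returns -1.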

Here is an alive2 link showing that the above transformation is invalid: https://alive2.llvm.org/ce/z/LJpiuS

Assuming that we simply want to skip the transformation for little-endian as well, a patch like this could solve the problem:

diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp
index f1dda97aa32d..b63c5078f88e 100644
--- a/llvm/lib/Analysis/Loads.cpp
+++ b/llvm/lib/Analysis/Loads.cpp
@@ -534,9 +534,11 @@ static Value *getAvailableLoadStore(Instruction *Inst, const Value *Ptr,
 
     TypeSize StoreSize = DL.getTypeSizeInBits(Val->getType());
     TypeSize LoadSize = DL.getTypeSizeInBits(AccessTy);
-    if (TypeSize::isKnownLE(LoadSize, StoreSize))
+    if (DL.typeSizeEqualsStoreSize(Val->getType()) &&
+        TypeSize::isKnownLE(LoadSize, StoreSize)) {
       if (auto *C = dyn_cast<Constant>(Val))
         return ConstantFoldLoadFromConst(C, AccessTy, DL);
+    }
   }
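
For illustration, a regression test along these lines (function names made up for this report, not taken from any patch) would show the intended effect of the guard: the i9 case is no longer folded, while a type whose size equals its store size still is:

target datalayout = "E"

; i9 has padding bits up to its store size, so with the guard the load is
; no longer folded to -1.
define i8 @no_fold_i9(ptr %p) {
  store i9 -1, ptr %p, align 1
  %r = load i8, ptr %p, align 1
  ret i8 %r
}

; i16 has no padding (DataLayout::typeSizeEqualsStoreSize is true), so this
; load can still be folded to -1.
define i8 @still_fold_i16(ptr %p) {
  store i16 -1, ptr %p, align 2
  %r = load i8, ptr %p, align 1
  ret i8 %r
}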

bjope (Collaborator, Author) commented Feb 15, 2024

Note that the fix sketched in the bug description targets the specific fault seen here: it simply avoids calling ConstantFoldLoadFromConst for this case.
There is of course a risk that more bugs are lurking, and that a real fix should instead go inside ConstantFolding. ConstantFoldLoadFromConst uses ConstantFoldLoadFromUniformValue, so it would be good to check whether other callers of ConstantFoldLoadFromUniformValue are already more careful about padding/endianness. The actual fix should perhaps be made in ConstantFoldLoadFromUniformValue itself.

bjope added a commit to bjope/llvm-project that referenced this issue Feb 15, 2024
Teaching ConstantFoldLoadFromUniformValue that types that are padded
in memory can't be considered as uniform.

Using the big hammer to prevent optimizations when loading from a
constant for which DataLayout::typeSizeEqualsStoreSize would return
false.

The main problem solved is something like this:
  store i17 -1, ptr %p, align 4
  %v = load i8, ptr %p, align 1
If, for example, the i17 occupies 32 bits in memory, then LLVM IR
doesn't really say where the padding goes. And even if we assume that
the 15 most significant bits are padding, they should be considered
undefined (even if the LLVM backend typically pads with zeroes). In any
case, for a big-endian target the load would read those most
significant bits, which aren't guaranteed to be ones. So it would be
wrong to constant fold the load as returning -1.

If LLVM IR had been more explicit about the placement of padding,
then we could allow the constant fold of the load in the example,
but only for little-endian.

Fixes: llvm#81793
bjope self-assigned this Feb 15, 2024
bjope added a commit that referenced this issue Feb 15, 2024 (#81854), with the same commit message as above.