Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoopInterchange] Forbid interchange if load store type wider than element type #77885

Closed
wants to merge 1 commit into from

Conversation

ShivaChen
Copy link
Collaborator

Consider the following case, interchange is not valid as the store type(int) is wider than the array element type(char).

char p[7];
for (int j = 0; j < 2; ++j)
  for (int i = 0; i < 2; ++i)
    *((int*)&p[2*i+j]) = 2*i+j+1;

…ement type

Consider the following case, interchange is not valid as the store type(int) is
wider than the array element type(char).

  char p[7];
  for (int j = 0; j < 2; ++j)
    for (int i = 0; i < 2; ++i)
      *((int*)&p[2*i+j]) = 2*i+j+1;
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 12, 2024

@llvm/pr-subscribers-llvm-transforms

Author: None (ShivaChen)

Changes

Consider the following case, interchange is not valid as the store type(int) is wider than the array element type(char).

char p[7];
for (int j = 0; j &lt; 2; ++j)
  for (int i = 0; i &lt; 2; ++i)
    *((int*)&amp;p[2*i+j]) = 2*i+j+1;

Full diff: https://github.com/llvm/llvm-project/pull/77885.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopInterchange.cpp (+38)
  • (added) llvm/test/Transforms/LoopInterchange/not-interchanged-wider-store.ll (+46)
diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index 277f530ee25fc1..443644c40cebab 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -82,6 +82,40 @@ static void printDepMatrix(CharMatrix &DepMatrix) {
 }
 #endif
 
+// Consider the following case, interchange is not valid as the store type(int)
+// is wider than the array element type(char).
+//
+// char p[7];
+// for (int j = 0; j < 2; ++j)
+//   for (int i = 0; i < 2; ++i)
+//     *((int*)&p[2*i+j]) = 2*i+j+1;
+//
+// Return true if the load/store type is wider than the element type.
+static bool isWiderLoadStore(Instruction *I) {
+  Value *Ptr = getLoadStorePointerOperand(I);
+  Type *ITy = getLoadStoreType(I);
+
+  if (!ITy->isSingleValueType())
+    return false;
+
+  if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr)) {
+    Type *ElementTy = GEP->getSourceElementType();
+
+    while (isa<ArrayType>(ElementTy)) {
+      ElementTy = ElementTy->getArrayElementType();
+      if (ElementTy->isSingleValueType())
+        if (ITy->getScalarSizeInBits() > ElementTy->getScalarSizeInBits()) {
+          LLVM_DEBUG(dbgs() << "Loads or Stores Type " << *ITy
+                            << " is wider than the element type " << *ElementTy
+                            << "\n");
+          return true;
+        }
+    }
+  }
+
+  return false;
+}
+
 static bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level,
                                      Loop *L, DependenceInfo *DI,
                                      ScalarEvolution *SE) {
@@ -98,10 +132,14 @@ static bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level,
       if (auto *Ld = dyn_cast<LoadInst>(&I)) {
         if (!Ld->isSimple())
           return false;
+        if (isWiderLoadStore(&I))
+          return false;
         MemInstr.push_back(&I);
       } else if (auto *St = dyn_cast<StoreInst>(&I)) {
         if (!St->isSimple())
           return false;
+        if (isWiderLoadStore(&I))
+          return false;
         MemInstr.push_back(&I);
       }
     }
diff --git a/llvm/test/Transforms/LoopInterchange/not-interchanged-wider-store.ll b/llvm/test/Transforms/LoopInterchange/not-interchanged-wider-store.ll
new file mode 100644
index 00000000000000..a50751b1344c78
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/not-interchanged-wider-store.ll
@@ -0,0 +1,46 @@
+; REQUIRES: asserts
+; RUN: opt < %s -passes=loop-interchange -cache-line-size=64 -verify-dom-info -verify-loop-info \
+; RUN:     -S -debug 2>&1 | FileCheck %s
+
+;; Loops should not be interchanged in this case as the store is wider than
+;; array element type
+;;
+;;  char p[7];
+;;  for (int j = 0; j < 2; ++j)
+;;    for (int i = 0; i < 2; ++i)
+;;      *((int*)&p[2*i+j]) = 2*i+j+1;
+
+; CHECK: Loads or Stores Type i32 is wider than the element type i8
+; CHECK: Populating dependency matrix failed
+
+define i32 @main() {
+entry:
+  %p = alloca [7 x i8], align 1
+  br label %for.cond1.preheader
+
+; Loop:
+for.cond1.preheader:                              ; preds = %entry, %for.cond.cleanup3
+  %indvars.iv29 = phi i64 [ 0, %entry ], [ %indvars.iv.next30, %for.cond.cleanup3 ]
+  br label %for.body4
+
+for.cond.cleanup3:                                ; preds = %for.body4
+  %indvars.iv.next30 = add nuw nsw i64 %indvars.iv29, 1
+  %cmp = icmp eq i64 %indvars.iv29, 0
+  br i1 %cmp, label %for.cond1.preheader, label %for.cond.cleanup
+
+for.body4:                                        ; preds = %for.cond1.preheader, %for.body4
+  %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body4 ]
+  %0 = shl nuw nsw i64 %indvars.iv, 1
+  %1 = add nuw nsw i64 %0, %indvars.iv29
+  %2 = add nuw nsw i64 %1, 1
+  %arrayidx = getelementptr inbounds [7 x i8], ptr %p, i64 0, i64 %1
+  %3 = trunc i64 %2 to i32
+  store i32 %3, ptr %arrayidx, align 1
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %cmp2 = icmp eq i64 %indvars.iv, 0
+  br i1 %cmp2, label %for.body4, label %for.cond.cleanup3
+
+; Exit blocks
+for.cond.cleanup:                                 ; preds = %for.cond.cleanup3
+  ret i32 0
+}

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using different type widths is not a sufficient condition for testing overwrites. It could be the same type, but with offsets single bytes apart. There could be other conditions, e.g. function calls/atomics that only partially overwrite the target. Admittedly, DependenceAnalysis currently bails out on those, but in the very least the comment should mention all the necessary conditions and explain why they don't need to be checked. Your comments only explain the what, not the why.

return false;

if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr)) {
Type *ElementTy = GEP->getSourceElementType();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot rely on the GEP data type for semantics, and will probably go away in the future. See https://llvm.org/devmtg/2022-04-03/slides/keynote.Opaque.Pointers.Are.Coming.pdf slide 68.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the information. I was mimicking how the delinearization extract the subscripts. I assume we should not increase the dependency to GEP. Thanks.

// Consider the following case, interchange is not valid as the store type(int)
// is wider than the array element type(char).
//
// char p[7];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of char, I could also have used int and casted it to (char*) before accessing. Or use array base pointers that are single bytes apart (PartialAlias).

@ShivaChen
Copy link
Collaborator Author

Thanks for your kind explanation and bring more cases that could have overwritten.
Another angle could be would the casting between int and char types is well-defined behavior.
If it isn't, should we abandon the PR because the compiler may not be able to guarantee the results?

@Meinersbur
Copy link
Member

Yes, I suggest to abandon this approach. Delinearization by use the GEP data type, but would also have to change if the GEP type goes away. Better not add a dependency on it.

However, it is "safe" for delinerization because passes are required to add runtime checks that check that verify that the indices fall into the expected range (e.g. no negative). In worst case when the delinerization heuristic is completely wrong the runtime check would always fail and the fallback (non-optimized) version to be executed.

@ShivaChen
Copy link
Collaborator Author

Abandon due to the case might be undefined behavior.

@ShivaChen ShivaChen closed this Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants