Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mlir][llvm] Drop unreachable basic block during import #78467

Merged
merged 1 commit into from
Jan 19, 2024

Conversation

gysit
Copy link
Contributor

@gysit gysit commented Jan 17, 2024

This revision updates the LLVM IR import to support unreachable basic blocks. An unreachable block may dominate itself and a value defined inside the block may thus be used before its definition. The import does not support such dependencies. We thus delete the unreachable basic blocks before the import. This is possible since MLIR does not have basic block labels that can be reached using an indirect call and unreachable blocks can indeed be deleted safely.

Additionally, add a small poison constant import test.

@gysit gysit force-pushed the unreachable-blocks branch 2 times, most recently from 06db79b to e2e5051 Compare January 17, 2024 16:46
@gysit gysit requested a review from Dinistro January 17, 2024 16:51
@gysit gysit marked this pull request as ready for review January 17, 2024 18:36
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 17, 2024

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-llvm

Author: Tobias Gysi (gysit)

Changes

This revision updates the LLVM IR import to support unreachable basic blocks. An unreachable block may dominate itself and a value defined inside the block may thus be used before its definition. The import does not support such dependencies. We thus delete the unreachable basic block since MLIR does not have basic block labels that can be reached using an indirect call and unreachable blocks are indeed dead for us.

Additionally, add a small poison constant import test.


Full diff: https://github.com/llvm/llvm-project/pull/78467.diff

4 Files Affected:

  • (modified) mlir/lib/Target/LLVMIR/ModuleImport.cpp (+31-15)
  • (modified) mlir/test/Target/LLVMIR/Import/constant.ll (+10)
  • (modified) mlir/test/Target/LLVMIR/Import/exception.ll (+21-20)
  • (added) mlir/test/Target/LLVMIR/Import/unreachable-blocks.ll (+13)
diff --git a/mlir/lib/Target/LLVMIR/ModuleImport.cpp b/mlir/lib/Target/LLVMIR/ModuleImport.cpp
index e905408a1e0897..928d8077175ccf 100644
--- a/mlir/lib/Target/LLVMIR/ModuleImport.cpp
+++ b/mlir/lib/Target/LLVMIR/ModuleImport.cpp
@@ -26,6 +26,7 @@
 #include "mlir/Interfaces/DataLayoutInterfaces.h"
 #include "mlir/Tools/mlir-translate/Translation.h"
 
+#include "llvm/ADT/DepthFirstIterator.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/ScopeExit.h"
 #include "llvm/ADT/StringSet.h"
@@ -132,18 +133,17 @@ static LogicalResult convertInstructionImpl(OpBuilder &odsBuilder,
   return failure();
 }
 
-/// Get a topologically sorted list of blocks for the given function.
+/// Get a topologically sorted list of blocks for the given basic blocks.
 static SetVector<llvm::BasicBlock *>
-getTopologicallySortedBlocks(llvm::Function *func) {
+getTopologicallySortedBlocks(ArrayRef<llvm::BasicBlock *> basicBlocks) {
   SetVector<llvm::BasicBlock *> blocks;
-  for (llvm::BasicBlock &bb : *func) {
-    if (!blocks.contains(&bb)) {
-      llvm::ReversePostOrderTraversal<llvm::BasicBlock *> traversal(&bb);
+  for (llvm::BasicBlock *basicBlock : basicBlocks) {
+    if (!blocks.contains(basicBlock)) {
+      llvm::ReversePostOrderTraversal<llvm::BasicBlock *> traversal(basicBlock);
       blocks.insert(traversal.begin(), traversal.end());
     }
   }
-  assert(blocks.size() == func->size() && "some blocks are not sorted");
-
+  assert(blocks.size() == basicBlocks.size() && "some blocks are not sorted");
   return blocks;
 }
 
@@ -1859,11 +1859,26 @@ LogicalResult ModuleImport::processFunction(llvm::Function *func) {
   if (func->isDeclaration())
     return success();
 
-  // Eagerly create all blocks.
-  for (llvm::BasicBlock &bb : *func) {
-    Block *block =
-        builder.createBlock(&funcOp.getBody(), funcOp.getBody().end());
-    mapBlock(&bb, block);
+  // Collect the set of basic blocks reachable from the function's entry block.
+  // This step is crucial as LLVM IR can contain unreachable blocks that
+  // self-dominate. As a result, an operation might utilize a variable it
+  // defines, which the import does not support. Given that MLIR lacks block
+  // label support, we can safely remove unreachable blocks, as there are no
+  // indirect branch instructions that could potentially target these blocks.
+  llvm::df_iterator_default_set<llvm::BasicBlock *> reachable;
+  for (llvm::BasicBlock *basicBlock : llvm::depth_first_ext(func, reachable))
+    (void)basicBlock;
+
+  // Eagerly create all reachable blocks.
+  SmallVector<llvm::BasicBlock *> reachableBasicBlocks;
+  for (llvm::BasicBlock &basicBlock : *func) {
+    // Skip unreachable blocks.
+    if (!reachable.contains(&basicBlock))
+      continue;
+    Region &body = funcOp.getBody();
+    Block *block = builder.createBlock(&body, body.end());
+    mapBlock(&basicBlock, block);
+    reachableBasicBlocks.push_back(&basicBlock);
   }
 
   // Add function arguments to the entry block.
@@ -1876,10 +1891,11 @@ LogicalResult ModuleImport::processFunction(llvm::Function *func) {
   // Process the blocks in topological order. The ordered traversal ensures
   // operands defined in a dominating block have a valid mapping to an MLIR
   // value once a block is translated.
-  SetVector<llvm::BasicBlock *> blocks = getTopologicallySortedBlocks(func);
+  SetVector<llvm::BasicBlock *> blocks =
+      getTopologicallySortedBlocks(reachableBasicBlocks);
   setConstantInsertionPointToStart(lookupBlock(blocks.front()));
-  for (llvm::BasicBlock *bb : blocks)
-    if (failed(processBasicBlock(bb, lookupBlock(bb))))
+  for (llvm::BasicBlock *basicBlock : blocks)
+    if (failed(processBasicBlock(basicBlock, lookupBlock(basicBlock))))
       return failure();
 
   // Process the debug intrinsics that require a delayed conversion after
diff --git a/mlir/test/Target/LLVMIR/Import/constant.ll b/mlir/test/Target/LLVMIR/Import/constant.ll
index cd2d00ec0aa783..3c46f5b20c31ca 100644
--- a/mlir/test/Target/LLVMIR/Import/constant.ll
+++ b/mlir/test/Target/LLVMIR/Import/constant.ll
@@ -47,6 +47,16 @@ define void @undef_constant(i32 %arg0) {
 
 ; // -----
 
+; CHECK-LABEL: @poison_constant
+define void @poison_constant(double %arg0) {
+  ; CHECK:  %[[POISON:.+]] = llvm.mlir.poison : f64
+  ; CHECK:  llvm.fadd %[[POISON]], %{{.*}} : f64
+  %1 = fadd double poison, %arg0
+  ret void
+}
+
+; // -----
+
 ; CHECK-LABEL: @null_constant
 define ptr @null_constant() {
   ; CHECK:  %[[NULL:[0-9]+]] = llvm.mlir.zero : !llvm.ptr
diff --git a/mlir/test/Target/LLVMIR/Import/exception.ll b/mlir/test/Target/LLVMIR/Import/exception.ll
index de227645cc15bb..440d89ec147f76 100644
--- a/mlir/test/Target/LLVMIR/Import/exception.ll
+++ b/mlir/test/Target/LLVMIR/Import/exception.ll
@@ -12,34 +12,35 @@ define i32 @invokeLandingpad() personality ptr @__gxx_personality_v0 {
   ; CHECK: %[[a1:[0-9]+]] = llvm.mlir.addressof @_ZTIii : !llvm.ptr
   ; CHECK: %[[a3:[0-9]+]] = llvm.alloca %{{[0-9]+}} x i8 {alignment = 1 : i64} : (i32) -> !llvm.ptr
   %1 = alloca i8
-  ; CHECK: llvm.invoke @foo(%[[a3]]) to ^bb2 unwind ^bb1 : (!llvm.ptr) -> ()
-  invoke void @foo(ptr %1) to label %4 unwind label %2
+  ; CHECK: llvm.invoke @foo(%[[a3]]) to ^[[bb1:.*]] unwind ^[[bb4:.*]] : (!llvm.ptr) -> ()
+  invoke void @foo(ptr %1) to label %bb1 unwind label %bb4
 
-; CHECK: ^bb1:
+; CHECK: ^[[bb1]]:
+bb1:
+  ; CHECK: %{{[0-9]+}} = llvm.invoke @bar(%[[a3]]) to ^[[bb2:.*]] unwind ^[[bb4]] : (!llvm.ptr) -> !llvm.ptr
+  %2 = invoke ptr @bar(ptr %1) to label %bb2 unwind label %bb4
+
+; CHECK: ^[[bb2]]:
+bb2:
+  ; CHECK: llvm.invoke @vararg_foo(%[[a3]], %{{.*}}) to ^[[bb3:.*]] unwind ^[[bb4]] vararg(!llvm.func<void (ptr, ...)>) : (!llvm.ptr, i32) -> ()
+  invoke void (ptr, ...) @vararg_foo(ptr %1, i32 0) to label %bb3 unwind label %bb4
+
+; CHECK: ^[[bb3]]:
+bb3:
+  ; CHECK: llvm.invoke %{{.*}}(%[[a3]], %{{.*}}) to ^[[bb5:.*]] unwind ^[[bb4]] vararg(!llvm.func<void (ptr, ...)>) : !llvm.ptr, (!llvm.ptr, i32) -> ()
+  invoke void (ptr, ...) undef(ptr %1, i32 0) to label %bb5 unwind label %bb4
+
+; CHECK: ^[[bb4]]:
+bb4:
   ; CHECK: %{{[0-9]+}} = llvm.landingpad (catch %{{[0-9]+}} : !llvm.ptr) (catch %[[a1]] : !llvm.ptr) (filter %{{[0-9]+}} : !llvm.array<1 x i1>) : !llvm.struct<(ptr, i32)>
   %3 = landingpad { ptr, i32 } catch ptr @_ZTIi catch ptr @_ZTIii
           filter [1 x i1] [i1 1]
   resume { ptr, i32 } %3
 
-; CHECK: ^bb2:
+; CHECK: ^[[bb5]]:
+bb5:
   ; CHECK: llvm.return %{{[0-9]+}} : i32
   ret i32 1
-
-; CHECK: ^bb3:
-  ; CHECK: %{{[0-9]+}} = llvm.invoke @bar(%[[a3]]) to ^bb2 unwind ^bb1 : (!llvm.ptr) -> !llvm.ptr
-  %6 = invoke ptr @bar(ptr %1) to label %4 unwind label %2
-
-; CHECK: ^bb4:
-  ; CHECK: llvm.invoke @vararg_foo(%[[a3]], %{{.*}}) to ^bb2 unwind ^bb1 vararg(!llvm.func<void (ptr, ...)>) : (!llvm.ptr, i32) -> ()
-  invoke void (ptr, ...) @vararg_foo(ptr %1, i32 0) to label %4 unwind label %2
-
-; CHECK: ^bb5:
-  ; CHECK: llvm.invoke %{{.*}}(%[[a3]], %{{.*}}) to ^bb2 unwind ^bb1 vararg(!llvm.func<void (ptr, ...)>) : !llvm.ptr, (!llvm.ptr, i32) -> ()
-  invoke void (ptr, ...) undef(ptr %1, i32 0) to label %4 unwind label %2
-
-; CHECK: ^bb6:
-  ; CHECK: llvm.return %{{[0-9]+}} : i32
-  ret i32 0
 }
 
 declare i32 @foo2()
diff --git a/mlir/test/Target/LLVMIR/Import/unreachable-blocks.ll b/mlir/test/Target/LLVMIR/Import/unreachable-blocks.ll
new file mode 100644
index 00000000000000..15d7f2f3f09c89
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/Import/unreachable-blocks.ll
@@ -0,0 +1,13 @@
+; RUN: mlir-translate -import-llvm %s | FileCheck %s
+
+; CHECK-LABEL: llvm.func @unreachable_block
+; CHECK-NEXT: llvm.return
+; CHECK-NOT: llvm.fadd
+define void @unreachable_block(float %0) {
+.entry:
+  ret void
+
+unreachable:
+  %1 = fadd float %0, %1
+  br label %unreachable
+}

@joker-eph
Copy link
Collaborator

I'm confused because the title says "Import unreachable basic blocks" but the description says "We thus delete the unreachable basic block"?

@gysit gysit changed the title [mlir][llvm] Import unreachable basic blocks [mlir][llvm] Drop unreachable basic block during import Jan 17, 2024
@gysit
Copy link
Contributor Author

gysit commented Jan 17, 2024

I'm confused because the title says "Import unreachable basic blocks" but the description says "We thus delete the unreachable basic block"?

I tried to improve title and comment. That was indeed a failed attempt of coming up with a short title.

The revision indeed deletes unreachable basic blocks before the import since importing an unreachable block is not possible in the general case. The problem is that instructions can use a value that they define themselves. This is possible since an unreachable block can dominate itself.

The fadd in the following code is an example for such a self dependency.

define void @unreachable_block(float %0) {
.entry:
  ret void

unreachable:
  %1 = fadd float %0, %1
  br label %unreachable
}

@joker-eph
Copy link
Collaborator

The revision indeed deletes unreachable basic blocks before the import since importing an unreachable block is not possible in the general case. The problem is that instructions can use a value that they define themselves. This is possible since an unreachable block can dominate itself.

I know about self-referencing instructions, but I don't quite get why we can't import this in MLIR?

@Dinistro
Copy link
Contributor

Turns out the MLIR also supports this. I was honestly not aware of this, and I fail to see any usecase for this, as the conversion framework doesn't touch unreachable blocks. It seems adding support for that might be possible by relaxing some assumptions, but the canonicalizer will afterwards just remove these unreachable blocks.

Thus, I'm in favor of taking the path of least resistance:
If this can be done easily, lets import these self-referencing operations, otherwise drop these constructs directly in the import, as done in this PR.

@gysit
Copy link
Contributor Author

gysit commented Jan 18, 2024

I know about self-referencing instructions, but I don't quite get why we can't import this in MLIR?

The import in its current form cannot support this since all operand values need to exist before the operation is converted. There is currently no mechanism to create temporary values that then later on are magically replaced. A possibly solution may be to have an LLVM pass before the import that introduces phi instructions for such values. That sounds really complex to get right in the general case though.

Is there a use case for such unreachable blocks in MLIR? Canonicalize immediately deletes such a block since there to the best of my knowledge there is no way to reach an unreachable block in MLIR (since there is no indirect branch or similar). I would thus strongly favor the current solution which is much simpler and leads to the same result.

This revision updates the LLVM IR import to support unreachable basic
blocks. An unreachable block may dominate itself and a value defined
inside the block may thus be used before its definition. The import does
not support such dependencies. We thus delete the unreachable basic
block since MLIR does not have basic block labels that can be reached
using an indirect call and unreachable blocks are indeed dead for us.

Additionally, add a small poison constant import test.
@joker-eph
Copy link
Collaborator

The import in its current form cannot support this since all operand values need to exist before the operation is converted. There is currently no mechanism to create temporary values that then later on are magically replaced.

Sure, but such mechanism could be implemented fairly easily right?

Is there a use case for such unreachable blocks in MLIR?

There may not be actual use-cases to import them in practice. It just makes me always a bit nervous when the translation isn't "faithful" ;)

@gysit
Copy link
Contributor Author

gysit commented Jan 18, 2024

Sure, but such mechanism could be implemented fairly easily right?

It is definitely more complex than the current solution. It requires at least one additional mapping from the initially unknown values to all their uses and some dummy poison values that are used initially. Plus I guess we can only check at the end of the translation if all unknown values have been set. If possible I would like to avoid that complexity.

@gysit
Copy link
Contributor Author

gysit commented Jan 19, 2024

I prototyped a version that imports the code as is without deleting the unreachable blocks. It turns out to be quite tricky to get right since the change interferes with the error handling of the import. In particular, the convertValue method is used in different contexts and only in some of them it is ok to introduce a placeholder value. After the import, the unreachable blocks either way need to be deleted since the conversion infrastructure does not support them AFAIK.

As this PR fixes an existing bug, I suggest to land it (it is definitely an improvement over the status quo). Once we have a concrete use case, we can follow up with an version that introduces placeholder variables to support self-referencing instructions.

Copy link
Contributor

@Dinistro Dinistro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments. I'm fine with fixing this bug and deferring a potential support for unreachable basic blocks to a later revision.

@gysit gysit merged commit 9dd0eb9 into llvm:main Jan 19, 2024
3 of 4 checks passed
@gysit gysit deleted the unreachable-blocks branch January 19, 2024 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants