
Conversation

yliu120
Contributor

@yliu120 yliu120 commented Sep 12, 2023

Splits the cleanup block lowered from AsyncToAsyncRuntime.

The motivation for this change is to clarify the CFG produced by async.coro.suspend.

The async.coro.suspend op branches into three blocks, depending on the state of the coroutine (a sketch of a lowered suspension point follows the list):

  1. suspend
  2. resume
  3. cleanup
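
For reference, a suspension point after lowering looks roughly like the fragment below. This is a hand-written sketch with made-up value and block names, not verbatim output of the pass:

    // Illustrative fragment only; the enclosing function and the bodies of
    // the successor blocks are omitted.
    %id    = async.coro.id
    %hdl   = async.coro.begin %id
    %saved = async.coro.save %hdl
    async.runtime.resume %hdl
    // Control transfers to ^suspend (return from the ramp function),
    // ^resume (the coroutine was resumed), or ^cleanup (the coroutine is
    // destroyed without being resumed).
    async.coro.suspend %saved, ^suspend, ^resume, ^cleanup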

Before this change, once the coroutine is resumed and has completed, it jumps to a cleanup block that is shared with the destroy path and frees the coroutine state. The CFG looks like the following:

   Entry block
     |          \
   Resume        |
     |           |
       Cleanup
          |
         End

This CFG can be problematic because the Cleanup block is shared and is therefore not dominated by Resume. For instance, a pass that wants to attach some resume-specific cleanup logic can mistakenly add it to the shared Cleanup block, and any value it defines in Resume then triggers an "operand does not dominate its use" verifier error, because Cleanup is also reachable along the separate Entry->Cleanup path.
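
To make the failure mode concrete, here is a hypothetical fragment in which a pass defines a value in the resume block and uses it in the shared cleanup block; the "some.*" ops are placeholders, not real async-dialect ops:

    async.coro.suspend %saved, ^suspend, ^resume, ^cleanup
  ^resume:
    %v = "some.resume_specific_op"() : () -> i32   // hypothetical op inserted by a pass
    cf.br ^cleanup
  ^cleanup:  // shared block: also reachable directly from the suspension point
    "some.consumer"(%v) : (i32) -> ()  // error: operand does not dominate its use
    async.coro.free %id, %hdl
    cf.br ^suspend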

After this change, the overall structure of the lowered CFG looks like the following:

   Entry (calling async.coro.suspend)
     |                  \
   Resume            Destroy (duplicate of Cleanup)
     |                  |
   Cleanup              |
     |                 /
    End (ends the coroutine)

In this case, the Cleanup block tied to the Resume block is isolated from the destroy path and is strictly dominated by Resume.
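
A rough sketch of the lowered blocks after this change (again with illustrative names, not verbatim pass output):

    async.coro.suspend %saved, ^suspend, ^resume, ^destroy
  ^resume:
    // resume-specific logic can safely be inserted here; everything below on
    // this path is dominated by ^resume
    cf.br ^cleanup
  ^cleanup:   // dedicated cleanup for the resume path
    async.coro.free %id, %hdl
    cf.br ^suspend
  ^destroy:   // duplicate of ^cleanup for the destroy path
    async.coro.free %id, %hdl
    cf.br ^suspend
  ^suspend:
    async.coro.end %hdl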

…up blocks for resume and destroy are separated.
@yliu120 yliu120 requested a review from a team as a code owner September 12, 2023 18:34
@llvmbot llvmbot added mlir:core MLIR Core Infrastructure mlir mlir:async labels Sep 12, 2023
@llvmbot
Member

llvmbot commented Sep 12, 2023

@llvm/pr-subscribers-mlir-async

Full diff: https://github.com/llvm/llvm-project/pull/66123.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Async/Transforms/AsyncToAsyncRuntime.cpp (+37-9)
  • (modified) mlir/test/Dialect/Async/async-to-async-runtime.mlir (+22-7)
diff --git a/mlir/lib/Dialect/Async/Transforms/AsyncToAsyncRuntime.cpp b/mlir/lib/Dialect/Async/Transforms/AsyncToAsyncRuntime.cpp
index e38904724348a9f..d1a913409ca3663 100644
--- a/mlir/lib/Dialect/Async/Transforms/AsyncToAsyncRuntime.cpp
+++ b/mlir/lib/Dialect/Async/Transforms/AsyncToAsyncRuntime.cpp
@@ -94,8 +94,30 @@ struct CoroMachinery {
   Value coroHandle; // coroutine handle (!async.coro.getHandle value)
   Block *entry;     // coroutine entry block
  std::optional<Block *> setError; // set returned values to error state
-  Block *cleanup;   // coroutine cleanup block
-  Block *suspend;   // coroutine suspension block
+  Block *cleanup;                  // coroutine cleanup block
+
+  // Coroutine cleanup block for destroy after the coroutine is resumed,
+  //   e.g. async.coro.suspend state, [suspend], [resume], [destroy]
+  //
+  // This cleanup block is a duplicate of the cleanup block that follows the
+  // resume block. The purpose of having a duplicate cleanup block for destroy
+  // is to keep the CFG clear so that control flow analyses are not confused.
+  //
+  // The overall structure of the lowered CFG can be the following,
+  //
+  //     Entry (calling async.coro.suspend)
+  //       |                \
+  //     Resume           Destroy (duplicate of Cleanup)
+  //       |                 |
+  //     Cleanup             |
+  //       |                 /
+  //      End (ends the coroutine)
+  //
+  // If there is resume-specific cleanup logic, it should go into the Cleanup
+  // block, not the destroy block; otherwise it may fail the block dominance
+  // check.
+  Block *cleanupForDestroy;
+  Block *suspend; // coroutine suspension block
 };
 } // namespace
 
@@ -183,16 +205,21 @@ static CoroMachinery setupCoroMachinery(func::FuncOp func) {
  builder.create<cf::BranchOp>(originalEntryBlock);
 
   Block *cleanupBlock = func.addBlock();
+  Block *cleanupBlockForDestroy = func.addBlock();
   Block *suspendBlock = func.addBlock();
 
   // ------------------------------------------------------------------------ //
-  // Coroutine cleanup block: deallocate coroutine frame, free the memory.
+  // Coroutine cleanup blocks: deallocate coroutine frame, free the memory.
   // ------------------------------------------------------------------------ //
-  builder.setInsertionPointToStart(cleanupBlock);
-  builder.create<CoroFreeOp>(coroIdOp.getId(), coroHdlOp.getHandle());
+  auto buildCleanupBlock = [&](Block *cb) {
+    builder.setInsertionPointToStart(cb);
+    builder.create<CoroFreeOp>(coroIdOp.getId(), coroHdlOp.getHandle());
 
-  // Branch into the suspend block.
-  builder.create<cf::BranchOp>(suspendBlock);
+    // Branch into the suspend block.
+    builder.create<cf::BranchOp>(suspendBlock);
+  };
+  buildCleanupBlock(cleanupBlock);
+  buildCleanupBlock(cleanupBlockForDestroy);
 
   // ------------------------------------------------------------------------ //
   // Coroutine suspend block: mark the end of a coroutine and return allocated
@@ -227,6 +254,7 @@ static CoroMachinery setupCoroMachinery(func::FuncOp func) {
   machinery.entry = entryBlock;
   machinery.setError = std::nullopt; // created lazily only if needed
   machinery.cleanup = cleanupBlock;
+  machinery.cleanupForDestroy = cleanupBlockForDestroy;
   machinery.suspend = suspendBlock;
   return machinery;
 }
@@ -348,7 +376,7 @@ outlineExecuteOp(SymbolTable &symbolTable, ExecuteOp execute) {
 
     // Add async.coro.suspend as a suspended block terminator.
    builder.create<CoroSuspendOp>(coroSaveOp.getState(), coro.suspend,
-                                  branch.getDest(), coro.cleanup);
+                                  branch.getDest(), coro.cleanupForDestroy);
 
     branch.erase();
   }
@@ -588,7 +616,7 @@ class AwaitOpLoweringBase : public OpConversionPattern<AwaitType> {
       // Add async.coro.suspend as a suspended block terminator.
       builder.setInsertionPointToEnd(suspended);
       builder.create<CoroSuspendOp>(coroSaveOp.getState(), coro.suspend, resume,
-                                    coro.cleanup);
+                                    coro.cleanupForDestroy);
 
       // Split the resume block into error checking and continuation.
       Block *continuation = rewriter.splitBlock(resume, Block::iterator(op));
diff --git a/mlir/test/Dialect/Async/async-to-async-runtime.mlir b/mlir/test/Dialect/Async/async-to-async-runtime.mlir
index 635a86ecdb4bee6..36583b2b94a3c93 100644
--- a/mlir/test/Dialect/Async/async-to-async-runtime.mlir
+++ b/mlir/test/Dialect/Async/async-to-async-runtime.mlir
@@ -25,17 +25,22 @@ func.func @execute_no_async_args(%arg0: f32, %arg1: memref<1xf32>) {
 // CHECK: %[[SAVED:.*]] = async.coro.save %[[HDL]]
 // CHECK: async.runtime.resume %[[HDL]]
 // CHECK: async.coro.suspend %[[SAVED]]
-// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME:.*]], ^[[CLEANUP:.*]]
+// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME:.*]], ^[[DESTROY:.*]]
 
 // Resume coroutine after suspension.
 // CHECK: ^[[RESUME]]:
 // CHECK:   memref.store
 // CHECK:   async.runtime.set_available %[[TOKEN]]
+// CHECK:   cf.br ^[[CLEANUP:.*]]
 
 // Delete coroutine.
 // CHECK: ^[[CLEANUP]]:
 // CHECK:   async.coro.free %[[ID]], %[[HDL]]
 
+// Delete coroutine.
+// CHECK: ^[[DESTROY]]:
+// CHECK:   async.coro.free %[[ID]], %[[HDL]]
+
 // Suspend coroutine, and also a return statement for ramp function.
 // CHECK: ^[[SUSPEND]]:
 // CHECK:   async.coro.end %[[HDL]]
@@ -79,11 +84,15 @@ func.func @nested_async_execute(%arg0: f32, %arg1: f32, %arg2: memref<1xf32>) {
 
 // CHECK: async.runtime.resume %[[HDL]]
 // CHECK: async.coro.suspend
-// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME:.*]], ^[[CLEANUP:.*]]
+// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME:.*]], ^[[DESTROY:.*]]
 
 // CHECK: ^[[RESUME]]:
 // CHECK:   memref.store
 // CHECK:   async.runtime.set_available %[[TOKEN]]
+// CHECK:   cf.br ^[[CLEANUP:.*]]
+
+// CHECK: ^[[CLEANUP]]:
+// CHECK: ^[[DESTROY]]:
 
 // Function outlined from the outer async.execute operation.
 // CHECK-LABEL: func private @async_execute_fn_0
@@ -96,7 +105,7 @@ func.func @nested_async_execute(%arg0: f32, %arg1: f32, %arg2: memref<1xf32>) {
 // Suspend coroutine in the beginning.
 // CHECK: async.runtime.resume %[[HDL]]
 // CHECK: async.coro.suspend
-// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME_0:.*]], ^[[CLEANUP:.*]]
+// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME_0:.*]], ^[[DESTROY_0:.*]]
 
 // Suspend coroutine second time waiting for the completion of inner execute op.
 // CHECK: ^[[RESUME_0]]:
@@ -104,7 +113,7 @@ func.func @nested_async_execute(%arg0: f32, %arg1: f32, %arg2: memref<1xf32>) {
 // CHECK:   %[[SAVED:.*]] = async.coro.save %[[HDL]]
 // CHECK:   async.runtime.await_and_resume %[[INNER_TOKEN]], %[[HDL]]
 // CHECK:   async.coro.suspend %[[SAVED]]
-// CHECK-SAME: ^[[SUSPEND]], ^[[RESUME_1:.*]], ^[[CLEANUP]]
+// CHECK-SAME: ^[[SUSPEND]], ^[[RESUME_1:.*]], ^[[DESTROY_0]]
 
 // Check the error of the awaited token after resumption.
 // CHECK: ^[[RESUME_1]]:
@@ -115,9 +124,11 @@ func.func @nested_async_execute(%arg0: f32, %arg1: f32, %arg2: memref<1xf32>) {
 // CHECK: ^[[CONTINUATION:.*]]:
 // CHECK:   memref.store
 // CHECK:   async.runtime.set_available %[[TOKEN]]
+// CHECK:   cf.br ^[[CLEANUP_0:.*]]
 
 // CHECK: ^[[SET_ERROR]]:
-// CHECK: ^[[CLEANUP]]:
+// CHECK: ^[[CLEANUP_0]]:
+// CHECK: ^[[DESTROY_0]]:
 // CHECK: ^[[SUSPEND]]:
 
 // -----
@@ -354,7 +365,7 @@ func.func @execute_assertion(%arg0: i1) {
 
 // Initial coroutine suspension.
 // CHECK:      async.coro.suspend
-// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME:.*]], ^[[CLEANUP:.*]]
+// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME:.*]], ^[[DESTROY:.*]]
 
 // Resume coroutine after suspension.
 // CHECK: ^[[RESUME]]:
@@ -363,7 +374,7 @@ func.func @execute_assertion(%arg0: i1) {
 // Set coroutine completion token to available state.
 // CHECK: ^[[SET_AVAILABLE]]:
 // CHECK:   async.runtime.set_available %[[TOKEN]]
-// CHECK:   cf.br ^[[CLEANUP]]
+// CHECK:   cf.br ^[[CLEANUP:.*]]
 
 // Set coroutine completion token to error state.
 // CHECK: ^[[SET_ERROR]]:
@@ -374,6 +385,10 @@ func.func @execute_assertion(%arg0: i1) {
 // CHECK: ^[[CLEANUP]]:
 // CHECK:   async.coro.free %[[ID]], %[[HDL]]
 
+// Delete coroutine.
+// CHECK: ^[[DESTROY]]:
+// CHECK:   async.coro.free %[[ID]], %[[HDL]]
+
 // Suspend coroutine, and also a return statement for ramp function.
 // CHECK: ^[[SUSPEND]]:
 // CHECK:   async.coro.end %[[HDL]]

@ezhulenev ezhulenev merged commit af562fd into llvm:main Sep 12, 2023
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023