
Conversation

yliu120
Contributor

@yliu120 yliu120 commented Sep 12, 2023

Splits the cleanup block lowered from AsyncToAsyncRuntime.

The motivation for this change is to clarify the CFG produced by async.coro.suspend.

The async.coro.suspend op branches into three blocks, depending on the state of the coroutine (a sketch of a lowered suspension point follows the list):

  1. suspend
  2. resume
  3. cleanup
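
For reference, a suspension point after lowering looks roughly like the fragment below. This is a hand-written sketch with made-up value and block names, not verbatim output of the pass:

    // Illustrative fragment only; the enclosing function and the bodies of
    // the successor blocks are omitted.
    %id    = async.coro.id
    %hdl   = async.coro.begin %id
    %saved = async.coro.save %hdl
    async.runtime.resume %hdl
    // Control transfers to ^suspend (return from the ramp function),
    // ^resume (the coroutine was resumed), or ^cleanup (the coroutine is
    // destroyed without being resumed).
    async.coro.suspend %saved, ^suspend, ^resume, ^cleanup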

Before this change, once the coroutine is resumed and has completed, it jumps to a cleanup block that is shared with the destroy path and frees the coroutine state. The CFG looks like the following:

   Entry block
     |          \
   Resume        |
     |           |
       Cleanup
          |
         End

This CFG can be problematic because the Cleanup block is shared and is therefore not dominated by Resume. For instance, a pass that wants to attach some resume-specific cleanup logic can mistakenly add it to the shared Cleanup block, and any value it defines in Resume then triggers an "operand does not dominate its use" verifier error, because Cleanup is also reachable along the separate Entry->Cleanup path.
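
To make the failure mode concrete, here is a hypothetical fragment in which a pass defines a value in the resume block and uses it in the shared cleanup block; the "some.*" ops are placeholders, not real async-dialect ops:

    async.coro.suspend %saved, ^suspend, ^resume, ^cleanup
  ^resume:
    %v = "some.resume_specific_op"() : () -> i32   // hypothetical op inserted by a pass
    cf.br ^cleanup
  ^cleanup:  // shared block: also reachable directly from the suspension point
    "some.consumer"(%v) : (i32) -> ()  // error: operand does not dominate its use
    async.coro.free %id, %hdl
    cf.br ^suspend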

After this change, the overall structure of the lowered CFG looks like the following:

   Entry (calling async.coro.suspend)
     |                  \
   Resume            Destroy (duplicate of Cleanup)
     |                  |
   Cleanup              |
     |                 /
    End (ends the coroutine)

In this case, the Cleanup block tied to the Resume block is isolated from the destroy path and is strictly dominated by Resume.
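
A rough sketch of the lowered blocks after this change (again with illustrative names, not verbatim pass output):

    async.coro.suspend %saved, ^suspend, ^resume, ^destroy
  ^resume:
    // resume-specific logic can safely be inserted here; everything below on
    // this path is dominated by ^resume
    cf.br ^cleanup
  ^cleanup:   // dedicated cleanup for the resume path
    async.coro.free %id, %hdl
    cf.br ^suspend
  ^destroy:   // duplicate of ^cleanup for the destroy path
    async.coro.free %id, %hdl
    cf.br ^suspend
  ^suspend:
    async.coro.end %hdl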

…up blocks for resume and destroy are separated.
@yliu120 yliu120 requested a review from a team as a code owner September 12, 2023 18:34
@llvmbot llvmbot added mlir:core MLIR Core Infrastructure mlir mlir:async labels Sep 12, 2023
@llvmbot
Member

llvmbot commented Sep 12, 2023

@llvm/pr-subscribers-mlir-async

Full diff: https://github.com/llvm/llvm-project/pull/66123.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Async/Transforms/AsyncToAsyncRuntime.cpp (+37-9)
  • (modified) mlir/test/Dialect/Async/async-to-async-runtime.mlir (+22-7)
diff --git a/mlir/lib/Dialect/Async/Transforms/AsyncToAsyncRuntime.cpp b/mlir/lib/Dialect/Async/Transforms/AsyncToAsyncRuntime.cpp
index e38904724348a9f..d1a913409ca3663 100644
--- a/mlir/lib/Dialect/Async/Transforms/AsyncToAsyncRuntime.cpp
+++ b/mlir/lib/Dialect/Async/Transforms/AsyncToAsyncRuntime.cpp
@@ -94,8 +94,30 @@ struct CoroMachinery {
   Value coroHandle; // coroutine handle (!async.coro.getHandle value)
   Block *entry;     // coroutine entry block
  std::optional<Block *> setError; // set returned values to error state
-  Block *cleanup;   // coroutine cleanup block
-  Block *suspend;   // coroutine suspension block
+  Block *cleanup;                  // coroutine cleanup block
+
+  // Coroutine cleanup block for destroy after the coroutine is resumed,
+  //   e.g. async.coro.suspend state, [suspend], [resume], [destroy]
+  //
+  // This cleanup block is a duplicate of the cleanup block that follows the
+  // resume block. The purpose of having a duplicate cleanup block for destroy
+  // is to keep the CFG clear so that control flow analyses are not confused.
+  //
+  // The overall structure of the lowered CFG can be the following,
+  //
+  //     Entry (calling async.coro.suspend)
+  //       |                \
+  //     Resume           Destroy (duplicate of Cleanup)
+  //       |                 |
+  //     Cleanup             |
+  //       |                 /
+  //      End (ends the coroutine)
+  //
+  // If there is resume-specific cleanup logic, it should go into the Cleanup
+  // block, not the destroy block; otherwise it may fail the block dominance
+  // check.
+  Block *cleanupForDestroy;
+  Block *suspend; // coroutine suspension block
 };
 } // namespace
 
@@ -183,16 +205,21 @@ static CoroMachinery setupCoroMachinery(func::FuncOp func) {
  builder.create<cf::BranchOp>(originalEntryBlock);
 
   Block *cleanupBlock = func.addBlock();
+  Block *cleanupBlockForDestroy = func.addBlock();
   Block *suspendBlock = func.addBlock();
 
   // ------------------------------------------------------------------------ //
-  // Coroutine cleanup block: deallocate coroutine frame, free the memory.
+  // Coroutine cleanup blocks: deallocate coroutine frame, free the memory.
   // ------------------------------------------------------------------------ //
-  builder.setInsertionPointToStart(cleanupBlock);
-  builder.create<CoroFreeOp>(coroIdOp.getId(), coroHdlOp.getHandle());
+  auto buildCleanupBlock = [&](Block *cb) {
+    builder.setInsertionPointToStart(cb);
+    builder.create<CoroFreeOp>(coroIdOp.getId(), coroHdlOp.getHandle());
 
-  // Branch into the suspend block.
-  builder.create<cf::BranchOp>(suspendBlock);
+    // Branch into the suspend block.
+    builder.create<cf::BranchOp>(suspendBlock);
+  };
+  buildCleanupBlock(cleanupBlock);
+  buildCleanupBlock(cleanupBlockForDestroy);
 
   // ------------------------------------------------------------------------ //
   // Coroutine suspend block: mark the end of a coroutine and return allocated
@@ -227,6 +254,7 @@ static CoroMachinery setupCoroMachinery(func::FuncOp func) {
   machinery.entry = entryBlock;
   machinery.setError = std::nullopt; // created lazily only if needed
   machinery.cleanup = cleanupBlock;
+  machinery.cleanupForDestroy = cleanupBlockForDestroy;
   machinery.suspend = suspendBlock;
   return machinery;
 }
@@ -348,7 +376,7 @@ outlineExecuteOp(SymbolTable &symbolTable, ExecuteOp execute) {
 
     // Add async.coro.suspend as a suspended block terminator.
    builder.create<CoroSuspendOp>(coroSaveOp.getState(), coro.suspend,
-                                  branch.getDest(), coro.cleanup);
+                                  branch.getDest(), coro.cleanupForDestroy);
 
     branch.erase();
   }
@@ -588,7 +616,7 @@ class AwaitOpLoweringBase : public OpConversionPattern<AwaitType> {
       // Add async.coro.suspend as a suspended block terminator.
       builder.setInsertionPointToEnd(suspended);
       builder.create<CoroSuspendOp>(coroSaveOp.getState(), coro.suspend, resume,
-                                    coro.cleanup);
+                                    coro.cleanupForDestroy);
 
       // Split the resume block into error checking and continuation.
       Block *continuation = rewriter.splitBlock(resume, Block::iterator(op));
diff --git a/mlir/test/Dialect/Async/async-to-async-runtime.mlir b/mlir/test/Dialect/Async/async-to-async-runtime.mlir
index 635a86ecdb4bee6..36583b2b94a3c93 100644
--- a/mlir/test/Dialect/Async/async-to-async-runtime.mlir
+++ b/mlir/test/Dialect/Async/async-to-async-runtime.mlir
@@ -25,17 +25,22 @@ func.func @execute_no_async_args(%arg0: f32, %arg1: memref<1xf32>) {
 // CHECK: %[[SAVED:.*]] = async.coro.save %[[HDL]]
 // CHECK: async.runtime.resume %[[HDL]]
 // CHECK: async.coro.suspend %[[SAVED]]
-// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME:.*]], ^[[CLEANUP:.*]]
+// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME:.*]], ^[[DESTROY:.*]]
 
 // Resume coroutine after suspension.
 // CHECK: ^[[RESUME]]:
 // CHECK:   memref.store
 // CHECK:   async.runtime.set_available %[[TOKEN]]
+// CHECK:   cf.br ^[[CLEANUP:.*]]
 
 // Delete coroutine.
 // CHECK: ^[[CLEANUP]]:
 // CHECK:   async.coro.free %[[ID]], %[[HDL]]
 
+// Delete coroutine.
+// CHECK: ^[[DESTROY]]:
+// CHECK:   async.coro.free %[[ID]], %[[HDL]]
+
 // Suspend coroutine, and also a return statement for ramp function.
 // CHECK: ^[[SUSPEND]]:
 // CHECK:   async.coro.end %[[HDL]]
@@ -79,11 +84,15 @@ func.func @nested_async_execute(%arg0: f32, %arg1: f32, %arg2: memref<1xf32>) {
 
 // CHECK: async.runtime.resume %[[HDL]]
 // CHECK: async.coro.suspend
-// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME:.*]], ^[[CLEANUP:.*]]
+// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME:.*]], ^[[DESTROY:.*]]
 
 // CHECK: ^[[RESUME]]:
 // CHECK:   memref.store
 // CHECK:   async.runtime.set_available %[[TOKEN]]
+// CHECK:   cf.br ^[[CLEANUP:.*]]
+
+// CHECK: ^[[CLEANUP]]:
+// CHECK: ^[[DESTROY]]:
 
 // Function outlined from the outer async.execute operation.
 // CHECK-LABEL: func private @async_execute_fn_0
@@ -96,7 +105,7 @@ func.func @nested_async_execute(%arg0: f32, %arg1: f32, %arg2: memref<1xf32>) {
 // Suspend coroutine in the beginning.
 // CHECK: async.runtime.resume %[[HDL]]
 // CHECK: async.coro.suspend
-// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME_0:.*]], ^[[CLEANUP:.*]]
+// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME_0:.*]], ^[[DESTROY_0:.*]]
 
 // Suspend coroutine second time waiting for the completion of inner execute op.
 // CHECK: ^[[RESUME_0]]:
@@ -104,7 +113,7 @@ func.func @nested_async_execute(%arg0: f32, %arg1: f32, %arg2: memref<1xf32>) {
 // CHECK:   %[[SAVED:.*]] = async.coro.save %[[HDL]]
 // CHECK:   async.runtime.await_and_resume %[[INNER_TOKEN]], %[[HDL]]
 // CHECK:   async.coro.suspend %[[SAVED]]
-// CHECK-SAME: ^[[SUSPEND]], ^[[RESUME_1:.*]], ^[[CLEANUP]]
+// CHECK-SAME: ^[[SUSPEND]], ^[[RESUME_1:.*]], ^[[DESTROY_0]]
 
 // Check the error of the awaited token after resumption.
 // CHECK: ^[[RESUME_1]]:
@@ -115,9 +124,11 @@ func.func @nested_async_execute(%arg0: f32, %arg1: f32, %arg2: memref<1xf32>) {
 // CHECK: ^[[CONTINUATION:.*]]:
 // CHECK:   memref.store
 // CHECK:   async.runtime.set_available %[[TOKEN]]
+// CHECK:   cf.br ^[[CLEANUP_0:.*]]
 
 // CHECK: ^[[SET_ERROR]]:
-// CHECK: ^[[CLEANUP]]:
+// CHECK: ^[[CLEANUP_0]]:
+// CHECK: ^[[DESTROY_0]]:
 // CHECK: ^[[SUSPEND]]:
 
 // -----
@@ -354,7 +365,7 @@ func.func @execute_assertion(%arg0: i1) {
 
 // Initial coroutine suspension.
 // CHECK:      async.coro.suspend
-// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME:.*]], ^[[CLEANUP:.*]]
+// CHECK-SAME: ^[[SUSPEND:.*]], ^[[RESUME:.*]], ^[[DESTROY:.*]]
 
 // Resume coroutine after suspension.
 // CHECK: ^[[RESUME]]:
@@ -363,7 +374,7 @@ func.func @execute_assertion(%arg0: i1) {
 // Set coroutine completion token to available state.
 // CHECK: ^[[SET_AVAILABLE]]:
 // CHECK:   async.runtime.set_available %[[TOKEN]]
-// CHECK:   cf.br ^[[CLEANUP]]
+// CHECK:   cf.br ^[[CLEANUP:.*]]
 
 // Set coroutine completion token to error state.
 // CHECK: ^[[SET_ERROR]]:
@@ -374,6 +385,10 @@ func.func @execute_assertion(%arg0: i1) {
 // CHECK: ^[[CLEANUP]]:
 // CHECK:   async.coro.free %[[ID]], %[[HDL]]
 
+// Delete coroutine.
+// CHECK: ^[[DESTROY]]:
+// CHECK:   async.coro.free %[[ID]], %[[HDL]]
+
 // Suspend coroutine, and also a return statement for ramp function.
 // CHECK: ^[[SUSPEND]]:
 // CHECK:   async.coro.end %[[HDL]]

@ezhulenev ezhulenev merged commit af562fd into llvm:main Sep 12, 2023
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023