|
|
@@ -0,0 +1,1218 @@ |
|
|
===================================== |
|
|
Coroutines in LLVM |
|
|
===================================== |
|
|
|
|
|
.. contents:: |
|
|
:local: |
|
|
:depth: 3 |
|
|
|
|
|
.. warning:: |
|
|
This is a work in progress. Compatibility across LLVM releases is not |
|
|
guaranteed. |
|
|
|
|
|
Introduction |
|
|
============ |
|
|
|
|
|
.. _coroutine handle: |
|
|
|
|
|
LLVM coroutines are functions that have one or more `suspend points`_. |
|
|
When a suspend point is reached, the execution of a coroutine is suspended and |
|
|
control is returned back to its caller. A suspended coroutine can be resumed |
|
|
to continue execution from the last suspend point or it can be destroyed. |
|
|
|
|
|
In the following example, we call function `f` (which may or may not be a |
|
|
coroutine itself) that returns a handle to a suspended coroutine |
|
|
(**coroutine handle**) that is used by `main` to resume the coroutine twice and |
|
|
then destroy it: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
define i32 @main() { |
|
|
entry: |
|
|
%hdl = call i8* @f(i32 4) |
|
|
call void @llvm.coro.resume(i8* %hdl) |
|
|
call void @llvm.coro.resume(i8* %hdl) |
|
|
call void @llvm.coro.destroy(i8* %hdl) |
|
|
ret i32 0 |
|
|
} |
|
|
|
|
|
.. _coroutine frame: |
|
|
|
|
|
In addition to the function stack frame which exists when a coroutine is |
|
|
executing, there is an additional region of storage that contains objects that |
|
|
keep the coroutine state when a coroutine is suspended. This region of storage |
|
|
is called **coroutine frame**. It is created when a coroutine is called and |
|
|
destroyed when a coroutine runs to completion or destroyed by a call to |
|
|
the `coro.destroy`_ intrinsic. |
|
|
|
|
|
An LLVM coroutine is represented as an LLVM function that has calls to |
|
|
`coroutine intrinsics`_ defining the structure of the coroutine. |
|
|
After lowering, a coroutine is split into several |
|
|
functions that represent three different ways of how control can enter the |
|
|
coroutine: |
|
|
|
|
|
1. a ramp function, which represents an initial invocation of the coroutine that |
|
|
creates the coroutine frame and executes the coroutine code until it |
|
|
encounters a suspend point or reaches the end of the function; |
|
|
|
|
|
2. a coroutine resume function that is invoked when the coroutine is resumed; |
|
|
|
|
|
3. a coroutine destroy function that is invoked when the coroutine is destroyed. |
|
|
|
|
|
.. note:: Splitting out resume and destroy functions are just one of the |
|
|
possible ways of lowering the coroutine. We chose it for initial |
|
|
implementation as it matches closely the mental model and results in |
|
|
reasonably nice code. |
|
|
|
|
|
Coroutines by Example |
|
|
===================== |
|
|
|
|
|
Coroutine Representation |
|
|
------------------------ |
|
|
|
|
|
Let's look at an example of an LLVM coroutine with the behavior sketched |
|
|
by the following pseudo-code. |
|
|
|
|
|
.. code-block:: C++ |
|
|
|
|
|
void *f(int n) { |
|
|
for(;;) { |
|
|
print(n++); |
|
|
<suspend> // returns a coroutine handle on first suspend |
|
|
} |
|
|
} |
|
|
|
|
|
This coroutine calls some function `print` with value `n` as an argument and |
|
|
suspends execution. Every time this coroutine resumes, it calls `print` again with an argument one bigger than the last time. This coroutine never completes by itself and must be destroyed explicitly. If we use this coroutine with |
|
|
a `main` shown in the previous section. It will call `print` with values 4, 5 |
|
|
and 6 after which the coroutine will be destroyed. |
|
|
|
|
|
The LLVM IR for this coroutine looks like this: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
define i8* @f(i32 %n) { |
|
|
entry: |
|
|
%size = call i32 @llvm.coro.size.i32() |
|
|
%alloc = call i8* @malloc(i32 %size) |
|
|
%hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* null, i8* null) |
|
|
br label %loop |
|
|
loop: |
|
|
%n.val = phi i32 [ %n, %entry ], [ %inc, %loop ] |
|
|
%inc = add nsw i32 %n.val, 1 |
|
|
call void @print(i32 %n.val) |
|
|
%0 = call i8 @llvm.coro.suspend(token none, i1 false) |
|
|
switch i8 %0, label %suspend [i8 0, label %loop |
|
|
i8 1, label %cleanup] |
|
|
cleanup: |
|
|
%mem = call i8* @llvm.coro.free(i8* %hdl) |
|
|
call void @free(i8* %mem) |
|
|
br label %suspend |
|
|
suspend: |
|
|
call void @llvm.coro.end(i8* %hdl, i1 false) |
|
|
ret i8* %hdl |
|
|
} |
|
|
|
|
|
The `entry` block establishes the coroutine frame. The `coro.size`_ intrinsic is |
|
|
lowered to a constant representing the size required for the coroutine frame. |
|
|
The `coro.begin`_ intrinsic initializes the coroutine frame and returns the |
|
|
coroutine handle. The first parameter of `coro.begin` is given a block of memory |
|
|
to be used if the coroutine frame needs to be allocated dynamically. |
|
|
|
|
|
The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic, |
|
|
given the coroutine handle, returns a pointer of the memory block to be freed or |
|
|
`null` if the coroutine frame was not allocated dynamically. The `cleanup` |
|
|
block is entered when coroutine runs to completion by itself or destroyed via |
|
|
call to the `coro.destroy`_ intrinsic. |
|
|
|
|
|
The `suspend` block contains code to be executed when coroutine runs to |
|
|
completion or suspended. The `coro.end`_ intrinsic marks the point where |
|
|
a coroutine needs to return control back to the caller if it is not an initial |
|
|
invocation of the coroutine. |
|
|
|
|
|
The `loop` blocks represents the body of the coroutine. The `coro.suspend`_ |
|
|
intrinsic in combination with the following switch indicates what happens to |
|
|
control flow when a coroutine is suspended (default case), resumed (case 0) or |
|
|
destroyed (case 1). |
|
|
|
|
|
Coroutine Transformation |
|
|
------------------------ |
|
|
|
|
|
One of the steps of coroutine lowering is building the coroutine frame. The |
|
|
def-use chains are analyzed to determine which objects need be kept alive across |
|
|
suspend points. In the coroutine shown in the previous section, use of virtual register |
|
|
`%n.val` is separated from the definition by a suspend point, therefore, it |
|
|
cannot reside on the stack frame since the latter goes away once the coroutine |
|
|
is suspended and control is returned back to the caller. An i32 slot is |
|
|
allocated in the coroutine frame and `%n.val` is spilled and reloaded from that |
|
|
slot as needed. |
|
|
|
|
|
We also store addresses of the resume and destroy functions so that the |
|
|
`coro.resume` and `coro.destroy` intrinsics can resume and destroy the coroutine |
|
|
when its identity cannot be determined statically at compile time. For our |
|
|
example, the coroutine frame will be: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
%f.frame = type { void (%f.frame*)*, void (%f.frame*)*, i32 } |
|
|
|
|
|
After resume and destroy parts are outlined, function `f` will contain only the |
|
|
code responsible for creation and initialization of the coroutine frame and |
|
|
execution of the coroutine until a suspend point is reached: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
define i8* @f(i32 %n) { |
|
|
entry: |
|
|
%alloc = call noalias i8* @malloc(i32 24) |
|
|
%0 = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* null, i8* null) |
|
|
%frame = bitcast i8* %frame to %f.frame* |
|
|
%1 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 0 |
|
|
store void (%f.frame*)* @f.resume, void (%f.frame*)** %1 |
|
|
%2 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 1 |
|
|
store void (%f.frame*)* @f.destroy, void (%f.frame*)** %2 |
|
|
|
|
|
%inc = add nsw i32 %n, 1 |
|
|
%inc.spill.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2 |
|
|
store i32 %inc, i32* %inc.spill.addr |
|
|
call void @print(i32 %n) |
|
|
|
|
|
ret i8* %frame |
|
|
} |
|
|
|
|
|
Outlined resume part of the coroutine will reside in function `f.resume`: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
define internal fastcc void @f.resume(%f.frame* %frame.ptr.resume) { |
|
|
entry: |
|
|
%inc.spill.addr = getelementptr %f.frame, %f.frame* %frame.ptr.resume, i64 0, i32 2 |
|
|
%inc.spill = load i32, i32* %inc.spill.addr, align 4 |
|
|
%inc = add i32 %n.val, 1 |
|
|
store i32 %inc, i32* %inc.spill.addr, align 4 |
|
|
tail call void @print(i32 %inc) |
|
|
ret void |
|
|
} |
|
|
|
|
|
Whereas function `f.destroy` will contain the cleanup code for the coroutine: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
define internal fastcc void @f.destroy(%f.frame* %frame.ptr.destroy) { |
|
|
entry: |
|
|
%0 = bitcast %f.frame* %frame.ptr.destroy to i8* |
|
|
tail call void @free(i8* %0) |
|
|
ret void |
|
|
} |
|
|
|
|
|
Avoiding Heap Allocations |
|
|
------------------------- |
|
|
|
|
|
A particular coroutine usage pattern, which is illustrated by the `main` |
|
|
function in the overview section, where a coroutine is created, manipulated and |
|
|
destroyed by the same calling function, is common for coroutines implementing |
|
|
RAII idiom and is suitable for allocation elision optimization which avoid |
|
|
dynamic allocation by storing the coroutine frame as a static `alloca` in its |
|
|
caller. |
|
|
|
|
|
If a coroutine uses allocation and deallocation functions that are known to |
|
|
LLVM, unused calls to `malloc` and calls to `free` with `null` argument will be |
|
|
removed as dead code. However, if custom allocation functions are used, the |
|
|
`coro.alloc` and `coro.free` intrinsics can be used to enable removal of custom |
|
|
allocation and deallocation code when coroutine does not require dynamic |
|
|
allocation of the coroutine frame. |
|
|
|
|
|
In the entry block, we will call `coro.alloc`_ intrinsic that will return `null` |
|
|
when dynamic allocation is required, and non-null otherwise: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
entry: |
|
|
%elide = call i8* @llvm.coro.alloc() |
|
|
%need.dyn.alloc = icmp ne i8* %elide, null |
|
|
br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc |
|
|
dyn.alloc: |
|
|
%size = call i32 @llvm.coro.size.i32() |
|
|
%alloc = call i8* @CustomAlloc(i32 %size) |
|
|
br label %coro.begin |
|
|
coro.begin: |
|
|
%phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ] |
|
|
%hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* null) |
|
|
|
|
|
In the cleanup block, we will make freeing the coroutine frame conditional on |
|
|
`coro.free`_ intrinsic. If allocation is elided, `coro.free`_ returns `null` |
|
|
thus skipping the deallocation code: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
cleanup: |
|
|
%mem = call i8* @llvm.coro.free(i8* %hdl) |
|
|
%need.dyn.free = icmp ne i8* %mem, null |
|
|
br i1 %need.dyn.free, label %dyn.free, label %if.end |
|
|
dyn.free: |
|
|
call void @CustomFree(i8* %mem) |
|
|
br label %if.end |
|
|
if.end: |
|
|
... |
|
|
|
|
|
With allocations and deallocations represented as described as above, after |
|
|
coroutine heap allocation elision optimization, the resulting main will end up |
|
|
looking just like it was when we used `malloc` and `free`: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
define i32 @main() { |
|
|
entry: |
|
|
call void @print(i32 4) |
|
|
call void @print(i32 5) |
|
|
call void @print(i32 6) |
|
|
ret i32 0 |
|
|
} |
|
|
|
|
|
Multiple Suspend Points |
|
|
----------------------- |
|
|
|
|
|
Let's consider the coroutine that has more than one suspend point: |
|
|
|
|
|
.. code-block:: C++ |
|
|
|
|
|
void *f(int n) { |
|
|
for(;;) { |
|
|
print(n++); |
|
|
<suspend> |
|
|
print(-n); |
|
|
<suspend> |
|
|
} |
|
|
} |
|
|
|
|
|
Matching LLVM code would look like (with the rest of the code remaining the same |
|
|
as the code in the previous section): |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
loop: |
|
|
%n.addr = phi i32 [ %n, %entry ], [ %inc, %loop.resume ] |
|
|
call void @print(i32 %n.addr) #4 |
|
|
%2 = call i8 @llvm.coro.suspend(token none, i1 false) |
|
|
switch i8 %2, label %suspend [i8 0, label %loop.resume |
|
|
i8 1, label %cleanup] |
|
|
loop.resume: |
|
|
%inc = add nsw i32 %n.addr, 1 |
|
|
%sub = xor i32 %n.addr, -1 |
|
|
call void @print(i32 %sub) |
|
|
%3 = call i8 @llvm.coro.suspend(token none, i1 false) |
|
|
switch i8 %3, label %suspend [i8 0, label %loop |
|
|
i8 1, label %cleanup] |
|
|
|
|
|
In this case, the coroutine frame would include a suspend index that will |
|
|
indicate at which suspend point the coroutine needs to resume. The resume |
|
|
function will use an index to jump to an appropriate basic block and will look |
|
|
as follows: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
define internal fastcc void @f.Resume(%f.Frame* %FramePtr) { |
|
|
entry.Resume: |
|
|
%index.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 2 |
|
|
%index = load i8, i8* %index.addr, align 1 |
|
|
%switch = icmp eq i8 %index, 0 |
|
|
%n.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 3 |
|
|
%n = load i32, i32* %n.addr, align 4 |
|
|
br i1 %switch, label %loop.resume, label %loop |
|
|
|
|
|
loop.resume: |
|
|
%sub = xor i32 %n, -1 |
|
|
call void @print(i32 %sub) |
|
|
br label %suspend |
|
|
loop: |
|
|
%inc = add nsw i32 %n, 1 |
|
|
store i32 %inc, i32* %n.addr, align 4 |
|
|
tail call void @print(i32 %inc) |
|
|
br label %suspend |
|
|
|
|
|
suspend: |
|
|
%storemerge = phi i8 [ 0, %loop ], [ 1, %loop.resume ] |
|
|
store i8 %storemerge, i8* %index.addr, align 1 |
|
|
ret void |
|
|
} |
|
|
|
|
|
If different cleanup code needs to get executed for different suspend points, |
|
|
a similar switch will be in the `f.destroy` function. |
|
|
|
|
|
.. note :: |
|
|
|
|
|
Using suspend index in a coroutine state and having a switch in `f.resume` and |
|
|
`f.destroy` is one of the possible implementation strategies. We explored |
|
|
another option where a distinct `f.resume1`, `f.resume2`, etc. are created for |
|
|
every suspend point, and instead of storing an index, the resume and destroy |
|
|
function pointers are updated at every suspend. Early testing showed that the |
|
|
current approach is easier on the optimizer than the latter so it is a |
|
|
lowering strategy implemented at the moment. |
|
|
|
|
|
Distinct Save and Suspend |
|
|
------------------------- |
|
|
|
|
|
In the previous example, setting a resume index (or some other state change that |
|
|
needs to happen to prepare a coroutine for resumption) happens at the same time as |
|
|
a suspension of a coroutine. However, in certain cases, it is necessary to control |
|
|
when coroutine is prepared for resumption and when it is suspended. |
|
|
|
|
|
In the following example, a coroutine represents some activity that is driven |
|
|
by completions of asynchronous operations `async_op1` and `async_op2` which get |
|
|
a coroutine handle as a parameter and resume the coroutine once async |
|
|
operation is finished. |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
void g() { |
|
|
for (;;) |
|
|
if (cond()) { |
|
|
async_op1(<coroutine-handle>); // will resume once async_op1 completes |
|
|
<suspend> |
|
|
do_one(); |
|
|
} |
|
|
else { |
|
|
async_op2(<coroutine-handle>); // will resume once async_op2 completes |
|
|
<suspend> |
|
|
do_two(); |
|
|
} |
|
|
} |
|
|
} |
|
|
|
|
|
In this case, coroutine should be ready for resumption prior to a call to |
|
|
`async_op1` and `async_op2`. The `coro.save`_ intrinsic is used to indicate a |
|
|
point when coroutine should be ready for resumption (namely, when a resume index |
|
|
should be stored in the coroutine frame, so that it can be resumed at the |
|
|
correct resume point): |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
if.true: |
|
|
%save1 = call token @llvm.coro.save(i8* %hdl) |
|
|
call void async_op1(i8* %hdl) |
|
|
%suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false) |
|
|
switch i8 %suspend1, label %suspend [i8 0, label %resume1 |
|
|
i8 1, label %cleanup] |
|
|
if.false: |
|
|
%save2 = call token @llvm.coro.save(i8* %hdl) |
|
|
call void async_op2(i8* %hdl) |
|
|
%suspend2 = call i1 @llvm.coro.suspend(token %save2, i1 false) |
|
|
switch i8 %suspend1, label %suspend [i8 0, label %resume2 |
|
|
i8 1, label %cleanup] |
|
|
|
|
|
.. _coroutine promise: |
|
|
|
|
|
Coroutine Promise |
|
|
----------------- |
|
|
|
|
|
A coroutine author or a frontend may designate a distinguished `alloca` that can |
|
|
be used to communicate with the coroutine. This distinguished alloca is called |
|
|
**coroutine promise** and is provided as a third parameter to the `coro.begin`_ |
|
|
intrinsic. |
|
|
|
|
|
The following coroutine designates a 32 bit integer `promise` and uses it to |
|
|
store the current value produced by a coroutine. |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
define i8* @f(i32 %n) { |
|
|
entry: |
|
|
%promise = alloca i32 |
|
|
%pv = bitcast i32* %promise to i8* |
|
|
%size = call i32 @llvm.coro.size.i32() |
|
|
%alloc = call i8* @malloc(i32 %size) |
|
|
%hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* %pv, i8* null) |
|
|
br label %loop |
|
|
loop: |
|
|
%n.val = phi i32 [ %n, %entry ], [ %inc, %loop ] |
|
|
%inc = add nsw i32 %n.val, 1 |
|
|
store i32 %n.val, i32* %promise |
|
|
%0 = call i8 @llvm.coro.suspend(token none, i1 false) |
|
|
switch i8 %0, label %suspend [i8 0, label %loop |
|
|
i8 1, label %cleanup] |
|
|
cleanup: |
|
|
%mem = call i8* @llvm.coro.free(i8* %hdl) |
|
|
call void @free(i8* %mem) |
|
|
br label %suspend |
|
|
suspend: |
|
|
call void @llvm.coro.end(i8* %hdl, i1 false) |
|
|
ret i8* %hdl |
|
|
} |
|
|
|
|
|
A coroutine consumer can rely on the `coro.promise`_ intrinsic to access the |
|
|
coroutine promise. |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
define i32 @main() { |
|
|
entry: |
|
|
%hdl = call i8* @f(i32 4) |
|
|
%promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false) |
|
|
%promise.addr = bitcast i8* %promise.addr.raw to i32* |
|
|
%val0 = load i32, i32* %promise.addr |
|
|
call void @print(i32 %val0) |
|
|
call void @llvm.coro.resume(i8* %hdl) |
|
|
%val1 = load i32, i32* %promise.addr |
|
|
call void @print(i32 %val1) |
|
|
call void @llvm.coro.resume(i8* %hdl) |
|
|
%val2 = load i32, i32* %promise.addr |
|
|
call void @print(i32 %val2) |
|
|
call void @llvm.coro.destroy(i8* %hdl) |
|
|
ret i32 0 |
|
|
} |
|
|
|
|
|
After example in this section is compiled, result of the compilation will |
|
|
exactly like the result of the very first example: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
define i32 @main() { |
|
|
entry: |
|
|
tail call void @print(i32 4) |
|
|
tail call void @print(i32 5) |
|
|
tail call void @print(i32 6) |
|
|
ret i32 0 |
|
|
} |
|
|
|
|
|
.. _final: |
|
|
.. _final suspend: |
|
|
|
|
|
Final Suspend |
|
|
------------- |
|
|
|
|
|
A coroutine author or a frontend may designate a particular suspend to be final, |
|
|
by setting the second argument of the `coro.suspend`_ intrinsic to `true`. |
|
|
Such a suspend point has two properties: |
|
|
|
|
|
* it is possible to check whether a suspended coroutine is at the final suspend |
|
|
point via `coro.done`_ intrinsic; |
|
|
|
|
|
* a resumption of a coroutine stopped at the final suspend point leads to |
|
|
undefined behavior. The only possible action for a coroutine at a final |
|
|
suspend point is destroying it via `coro.destroy`_ intrinsic. |
|
|
|
|
|
From the user perspective, the final suspend point represents an idea of a |
|
|
coroutine reaching the end. From the compiler perspective, it is an optimization |
|
|
opportunity for reducing number of resume points (and therefore switch cases) in |
|
|
the resume function. |
|
|
|
|
|
The following is an example of a function that keeps resuming the coroutine |
|
|
until the final suspend point is reached after which point the coroutine is |
|
|
destroyed: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
define i32 @main() { |
|
|
entry: |
|
|
%hdl = call i8* @f(i32 4) |
|
|
br label %while |
|
|
while: |
|
|
call void @llvm.coro.resume(i8* %hdl) |
|
|
%done = call i1 @llvm.coro.done(i8* %hdl) |
|
|
br i1 %done, label %end, label %while |
|
|
end: |
|
|
call void @llvm.coro.destroy(i8* %hdl) |
|
|
ret i32 0 |
|
|
} |
|
|
|
|
|
Usually, final suspend point is a frontend injected suspend point that does not |
|
|
correspond to any explicitly authored suspend point of the high level language. |
|
|
For example, for a Python generator that has only one suspend point: |
|
|
|
|
|
.. code-block:: python |
|
|
|
|
|
def coroutine(n): |
|
|
for i in range(n): |
|
|
yield i |
|
|
|
|
|
Python frontend would inject two more suspend points, so that the actual code |
|
|
looks like this: |
|
|
|
|
|
.. code-block:: C |
|
|
|
|
|
void* coroutine(int n) { |
|
|
int current_value; |
|
|
<designate current_value to be coroutine promise> |
|
|
<SUSPEND> // injected suspend point, so that the coroutine starts suspended |
|
|
for (int i = 0; i < n; ++i) { |
|
|
current_value = i; <SUSPEND>; // corresponds to "yield i" |
|
|
} |
|
|
<SUSPEND final=true> // injected final suspend point |
|
|
} |
|
|
|
|
|
and python iterator `__next__` would look like: |
|
|
|
|
|
.. code-block:: C++ |
|
|
|
|
|
int __next__(void* hdl) { |
|
|
coro.resume(hdl); |
|
|
if (coro.done(hdl)) throw StopIteration(); |
|
|
return *(int*)coro.promise(hdl, 4, false); |
|
|
} |
|
|
|
|
|
Intrinsics |
|
|
========== |
|
|
|
|
|
Coroutine Manipulation Intrinsics |
|
|
--------------------------------- |
|
|
|
|
|
Intrinsics described in this section are used to manipulate an existing |
|
|
coroutine. They can be used in any function which happen to have a pointer |
|
|
to a `coroutine frame`_ or a pointer to a `coroutine promise`_. |
|
|
|
|
|
.. _coro.destroy: |
|
|
|
|
|
'llvm.coro.destroy' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
|
|
|
Syntax: |
|
|
""""""" |
|
|
|
|
|
:: |
|
|
|
|
|
declare void @llvm.coro.destroy(i8* <handle>) |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.destroy``' intrinsic destroys a suspended |
|
|
coroutine. |
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
The argument is a coroutine handle to a suspended coroutine. |
|
|
|
|
|
Semantics: |
|
|
"""""""""" |
|
|
|
|
|
When possible, the `coro.destroy` intrinsic is replaced with a direct call to |
|
|
the coroutine destroy function. Otherwise it is replaced with an indirect call |
|
|
based on the function pointer for the destroy function stored in the coroutine |
|
|
frame. Destroying a coroutine that is not suspended leads to undefined behavior. |
|
|
|
|
|
.. _coro.resume: |
|
|
|
|
|
'llvm.coro.resume' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
|
|
|
:: |
|
|
|
|
|
declare void @llvm.coro.resume(i8* <handle>) |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.resume``' intrinsic resumes a suspended coroutine. |
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
The argument is a handle to a suspended coroutine. |
|
|
|
|
|
Semantics: |
|
|
"""""""""" |
|
|
|
|
|
When possible, the `coro.resume` intrinsic is replaced with a direct call to the |
|
|
coroutine resume function. Otherwise it is replaced with an indirect call based |
|
|
on the function pointer for the resume function stored in the coroutine frame. |
|
|
Resuming a coroutine that is not suspended leads to undefined behavior. |
|
|
|
|
|
.. _coro.done: |
|
|
|
|
|
'llvm.coro.done' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
|
|
|
:: |
|
|
|
|
|
declare i1 @llvm.coro.done(i8* <handle>) |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.done``' intrinsic checks whether a suspended coroutine is at |
|
|
the final suspend point or not. |
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
The argument is a handle to a suspended coroutine. |
|
|
|
|
|
Semantics: |
|
|
"""""""""" |
|
|
|
|
|
Using this intrinsic on a coroutine that does not have a `final suspend`_ point |
|
|
or on a coroutine that is not suspended leads to undefined behavior. |
|
|
|
|
|
.. _coro.promise: |
|
|
|
|
|
'llvm.coro.promise' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
|
|
|
:: |
|
|
|
|
|
declare i8* @llvm.coro.promise(i8* <ptr>, i32 <alignment>, i1 <from>) |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.promise``' intrinsic obtains a pointer to a |
|
|
`coroutine promise`_ given a coroutine handle and vice versa. |
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
The first argument is a handle to a coroutine if `from` is false. Otherwise, |
|
|
it is a pointer to a coroutine promise. |
|
|
|
|
|
The second argument is an alignment requirements of the promise. |
|
|
If a frontend designated `%promise = alloca i32` as a promise, the alignment |
|
|
argument to `coro.promise` should be the alignment of `i32` on the target |
|
|
platform. If a frontend designated `%promise = alloca i32, align 16` as a |
|
|
promise, the alignment argument should be 16. |
|
|
This argument only accepts constants. |
|
|
|
|
|
The third argument is a boolean indicating a direction of the transformation. |
|
|
If `from` is true, the intrinsic returns a coroutine handle given a pointer |
|
|
to a promise. If `from` is false, the intrinsics return a pointer to a promise |
|
|
from a coroutine handle. This argument only accepts constants. |
|
|
|
|
|
Semantics: |
|
|
"""""""""" |
|
|
|
|
|
Using this intrinsic on a coroutine that does not have a coroutine promise |
|
|
leads to undefined behavior. It is possible to read and modify coroutine |
|
|
promise of the coroutine which is currently executing. The coroutine author and |
|
|
a coroutine user are responsible to makes sure there is no data races. |
|
|
|
|
|
Example: |
|
|
"""""""" |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
define i8* @f(i32 %n) { |
|
|
entry: |
|
|
%promise = alloca i32 |
|
|
%pv = bitcast i32* %promise to i8* |
|
|
... |
|
|
; the third argument to coro.begin points to the coroutine promise. |
|
|
%hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* %pv, i8* null) |
|
|
... |
|
|
store i32 42, i32* %promise ; store something into the promise |
|
|
... |
|
|
ret i8* %hdl |
|
|
} |
|
|
|
|
|
define i32 @main() { |
|
|
entry: |
|
|
%hdl = call i8* @f(i32 4) ; starts the coroutine and returns its handle |
|
|
%promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false) |
|
|
%promise.addr = bitcast i8* %promise.addr.raw to i32* |
|
|
%val = load i32, i32* %promise.addr ; load a value from the promise |
|
|
call void @print(i32 %val) |
|
|
call void @llvm.coro.destroy(i8* %hdl) |
|
|
ret i32 0 |
|
|
} |
|
|
|
|
|
.. _coroutine intrinsics: |
|
|
|
|
|
Coroutine Structure Intrinsics |
|
|
------------------------------ |
|
|
Intrinsics described in this section are used within a coroutine to describe |
|
|
the coroutine structure. They should not be used outside of a coroutine. |
|
|
|
|
|
.. _coro.size: |
|
|
|
|
|
'llvm.coro.size' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
:: |
|
|
|
|
|
declare i32 @llvm.coro.size.i32() |
|
|
declare i64 @llvm.coro.size.i64() |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.size``' intrinsic returns the number of bytes |
|
|
required to store a `coroutine frame`_. |
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
None |
|
|
|
|
|
Semantics: |
|
|
"""""""""" |
|
|
|
|
|
The `coro.size` intrinsic is lowered to a constant representing the size of |
|
|
the coroutine frame. |
|
|
|
|
|
.. _coro.begin: |
|
|
|
|
|
'llvm.coro.begin' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
:: |
|
|
|
|
|
declare i8* @llvm.coro.begin(i8* <mem>, i32 <align>, i8* <promise>, i8* <fnaddr>) |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.begin``' intrinsic returns an address of the |
|
|
coroutine frame. |
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
The first argument is a pointer to a block of memory in which coroutine frame |
|
|
may use if memory for the coroutine frame needs to be allocated dynamically. |
|
|
|
|
|
The second argument provides information on the alignment of the memory returned |
|
|
by the allocation function and given to `coro.begin` by the first argument. If |
|
|
this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*). |
|
|
This argument only accepts constants. |
|
|
|
|
|
The third argument, if not `null`, designates a particular alloca instruction to |
|
|
be a `coroutine promise`_. |
|
|
|
|
|
The fourth argument is `null` before coroutine is split, and later is replaced |
|
|
to point to a private global constant array containing function pointers to |
|
|
outlined resume and destroy parts of the coroutine. |
|
|
|
|
|
Semantics: |
|
|
"""""""""" |
|
|
|
|
|
Depending on the alignment requirements of the objects in the coroutine frame |
|
|
and/or on the codegen compactness reasons the pointer returned from `coro.begin` |
|
|
may be at offset to the `%mem` argument. (This could be beneficial if |
|
|
instructions that express relative access to data can be more compactly encoded |
|
|
with small positive and negative offsets). |
|
|
|
|
|
Frontend should emit exactly one `coro.begin` intrinsic per coroutine. |
|
|
|
|
|
.. _coro.free: |
|
|
|
|
|
'llvm.coro.free' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
:: |
|
|
|
|
|
declare i8* @llvm.coro.free(i8* <frame>) |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.free``' intrinsic returns a pointer to a block of memory where |
|
|
coroutine frame is stored or `null` if this instance of a coroutine did not use |
|
|
dynamically allocated memory for its coroutine frame. |
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
A pointer to the coroutine frame. This should be the same pointer that was |
|
|
returned by prior `coro.begin` call. |
|
|
|
|
|
Example (custom deallocation function): |
|
|
""""""""""""""""""""""""""""""""""""""" |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
cleanup: |
|
|
%mem = call i8* @llvm.coro.free(i8* %frame) |
|
|
%mem_not_null = icmp ne i8* %mem, null |
|
|
br i1 %mem_not_null, label %if.then, label %if.end |
|
|
if.then: |
|
|
call void @CustomFree(i8* %mem) |
|
|
br label %if.end |
|
|
if.end: |
|
|
ret void |
|
|
|
|
|
Example (standard deallocation functions): |
|
|
"""""""""""""""""""""""""""""""""""""""""" |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
cleanup: |
|
|
%mem = call i8* @llvm.coro.free(i8* %frame) |
|
|
call void @free(i8* %mem) |
|
|
ret void |
|
|
|
|
|
.. _coro.alloc: |
|
|
|
|
|
'llvm.coro.alloc' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
:: |
|
|
|
|
|
declare i8* @llvm.coro.alloc() |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.alloc``' intrinsic returns an address of the memory on the |
|
|
callers frame where coroutine frame of this coroutine can be placed or `null` |
|
|
otherwise. |
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
None |
|
|
|
|
|
Semantics: |
|
|
"""""""""" |
|
|
|
|
|
If the coroutine is eligible for heap elision, this intrinsic is lowered to an |
|
|
alloca storing the coroutine frame. Otherwise, it is lowered to constant `null`. |
|
|
This intrinsic only needs to be used if a custom allocation function is used |
|
|
(i.e. a function not recognized by LLVM as a memory allocation function) and the |
|
|
language rules allow for custom allocation / deallocation to be elided when not |
|
|
needed. |
|
|
|
|
|
Example: |
|
|
"""""""" |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
entry: |
|
|
%elide = call i8* @llvm.coro.alloc() |
|
|
%0 = icmp ne i8* %elide, null |
|
|
br i1 %0, label %coro.begin, label %coro.alloc |
|
|
|
|
|
coro.alloc: |
|
|
%frame.size = call i32 @llvm.coro.size() |
|
|
%alloc = call i8* @MyAlloc(i32 %frame.size) |
|
|
br label %coro.begin |
|
|
|
|
|
coro.begin: |
|
|
%phi = phi i8* [ %elide, %entry ], [ %alloc, %coro.alloc ] |
|
|
%frame = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* null) |
|
|
|
|
|
.. _coro.frame: |
|
|
|
|
|
'llvm.coro.frame' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
:: |
|
|
|
|
|
declare i8* @llvm.coro.frame() |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.frame``' intrinsic returns an address of the coroutine frame of |
|
|
the enclosing coroutine. |
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
None |
|
|
|
|
|
Semantics: |
|
|
"""""""""" |
|
|
|
|
|
This intrinsic is lowered to refer to the `coro.begin`_ instruction. This is |
|
|
a frontend convenience intrinsic that makes it easier to refer to the |
|
|
coroutine frame. |
|
|
|
|
|
.. _coro.end: |
|
|
|
|
|
'llvm.coro.end' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
:: |
|
|
|
|
|
declare void @llvm.coro.end(i8* <handle>, i1 <unwind>) |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.end``' marks the point where execution of the resume part of |
|
|
the coroutine should end and control returns back to the caller. |
|
|
|
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
The first argument should refer to the coroutine handle of the enclosing coroutine. |
|
|
|
|
|
The second argument should be `true` if this coro.end is in the block that is |
|
|
part of the unwind sequence leaving the coroutine body due to exception prior to |
|
|
the first reaching any suspend points, and `false` otherwise. |
|
|
|
|
|
Semantics: |
|
|
"""""""""" |
|
|
The `coro.end`_ intrinsic is a no-op during an initial invocation of the |
|
|
coroutine. When the coroutine resumes, the intrinsic marks the point when |
|
|
coroutine need to return control back to the caller. |
|
|
|
|
|
This intrinsic is removed by the CoroSplit pass when a coroutine is split into |
|
|
the start, resume and destroy parts. In start part, the intrinsic is removed, |
|
|
in resume and destroy parts, it is replaced with `ret void` instructions and |
|
|
the rest of the block containing `coro.end` instruction is discarded. |
|
|
|
|
|
In landing pads it is replaced with an appropriate instruction to unwind to |
|
|
caller. |
|
|
|
|
|
A frontend is allowed to supply null as the first parameter, in this case |
|
|
`coro-early` pass will replace the null with an appropriate coroutine handle |
|
|
value. |
|
|
|
|
|
.. _coro.suspend: |
|
|
.. _suspend points: |
|
|
|
|
|
'llvm.coro.suspend' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
:: |
|
|
|
|
|
declare i8 @llvm.coro.suspend(token <save>, i1 <final>) |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.suspend``' marks the point where execution of the coroutine |
|
|
need to get suspended and control returned back to the caller. |
|
|
Conditional branches consuming the result of this intrinsic lead to basic blocks |
|
|
where coroutine should proceed when suspended (-1), resumed (0) or destroyed |
|
|
(1). |
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
The first argument refers to a token of `coro.save` intrinsic that marks the |
|
|
point when coroutine state is prepared for suspension. If `none` token is passed, |
|
|
the intrinsic behaves as if there were a `coro.save` immediately preceding |
|
|
the `coro.suspend` intrinsic. |
|
|
|
|
|
The second argument indicates whether this suspension point is `final`_. |
|
|
The second argument only accepts constants. If more than one suspend point is |
|
|
designated as final, the resume and destroy branches should lead to the same |
|
|
basic blocks. |
|
|
|
|
|
Example (normal suspend point): |
|
|
""""""""""""""""""""""""""""""" |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
%0 = call i8 @llvm.coro.suspend(token none, i1 false) |
|
|
switch i8 %0, label %suspend [i8 0, label %resume |
|
|
i8 1, label %cleanup] |
|
|
|
|
|
Example (final suspend point): |
|
|
"""""""""""""""""""""""""""""" |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
while.end: |
|
|
%s.final = call i8 @llvm.coro.suspend(token none, i1 true) |
|
|
switch i8 %s.final, label %suspend [i8 0, label %trap |
|
|
i8 1, label %cleanup] |
|
|
trap: |
|
|
call void @llvm.trap() |
|
|
unreachable |
|
|
|
|
|
Semantics: |
|
|
"""""""""" |
|
|
|
|
|
If a coroutine that was suspended at the suspend point marked by this intrinsic |
|
|
is resumed via `coro.resume`_ the control will transfer to the basic block |
|
|
of the 0-case. If it is resumed via `coro.destroy`_, it will proceed to the |
|
|
basic block indicated by the 1-case. To suspend, coroutine proceed to the |
|
|
default label. |
|
|
|
|
|
If suspend intrinsic is marked as final, it can consider the `true` branch |
|
|
unreachable and can perform optimizations that can take advantage of that fact. |
|
|
|
|
|
.. _coro.save: |
|
|
|
|
|
'llvm.coro.save' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
:: |
|
|
|
|
|
declare token @llvm.coro.save(i8* <handle>) |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.save``' marks the point where a coroutine need to update its |
|
|
state to prepare for resumption to be considered suspended (and thus eligible |
|
|
for resumption). |
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
The first argument points to a coroutine handle of the enclosing coroutine. |
|
|
|
|
|
Semantics: |
|
|
"""""""""" |
|
|
|
|
|
Whatever coroutine state changes are required to enable resumption of |
|
|
the coroutine from the corresponding suspend point should be done at the point |
|
|
of `coro.save` intrinsic. |
|
|
|
|
|
Example: |
|
|
"""""""" |
|
|
|
|
|
Separate save and suspend points are necessary when a coroutine is used to |
|
|
represent an asynchronous control flow driven by callbacks representing |
|
|
completions of asynchronous operations. |
|
|
|
|
|
In such a case, a coroutine should be ready for resumption prior to a call to |
|
|
`async_op` function that may trigger resumption of a coroutine from the same or |
|
|
a different thread possibly prior to `async_op` call returning control back |
|
|
to the coroutine: |
|
|
|
|
|
.. code-block:: llvm |
|
|
|
|
|
%save1 = call token @llvm.coro.save(i8* %hdl) |
|
|
call void async_op1(i8* %hdl) |
|
|
%suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false) |
|
|
switch i8 %suspend1, label %suspend [i8 0, label %resume1 |
|
|
i8 1, label %cleanup] |
|
|
|
|
|
.. _coro.param: |
|
|
|
|
|
'llvm.coro.param' Intrinsic |
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
:: |
|
|
|
|
|
declare i1 @llvm.coro.param(i8* <original>, i8* <copy>) |
|
|
|
|
|
Overview: |
|
|
""""""""" |
|
|
|
|
|
The '``llvm.coro.param``' is used by the frontend to mark up the code used to |
|
|
construct and destruct copies of the parameters. If the optimizer discovers that |
|
|
a particular parameter copy is not used after any suspends, it can remove the |
|
|
construction and destruction of the copy by replacing corresponding coro.param |
|
|
with `i1 false` and replacing any use of the `copy` with the `original`. |
|
|
|
|
|
Arguments: |
|
|
"""""""""" |
|
|
|
|
|
The first argument points to an `alloca` storing the value of a parameter to a |
|
|
coroutine. |
|
|
|
|
|
The second argument points to an `alloca` storing the value of the copy of that |
|
|
parameter. |
|
|
|
|
|
Semantics: |
|
|
"""""""""" |
|
|
|
|
|
The optimizer is free to always replace this intrinsic with `i1 true`. |
|
|
|
|
|
The optimizer is also allowed to replace it with `i1 false` provided that the |
|
|
parameter copy is only used prior to control flow reaching any of the suspend |
|
|
points. The code that would be DCE'd if the `coro.param` is replaced with |
|
|
`i1 false` is not considered to be a use of the parameter copy. |
|
|
|
|
|
The frontend can emit this intrinsic if its language rules allow for this |
|
|
optimization. |
|
|
|
|
|
Example: |
|
|
"""""""" |
|
|
Consider the following example. A coroutine takes two parameters `a` and `b` |
|
|
that has a destructor and a move constructor. |
|
|
|
|
|
.. code-block:: C++ |
|
|
|
|
|
struct A { ~A(); A(A&&); bool foo(); void bar(); }; |
|
|
|
|
|
task<int> f(A a, A b) { |
|
|
if (a.foo()) |
|
|
return 42; |
|
|
|
|
|
a.bar(); |
|
|
co_await read_async(); // introduces suspend point |
|
|
b.bar(); |
|
|
} |
|
|
|
|
|
Note that, uses of `b` is used after a suspend point and thus must be copied |
|
|
into a coroutine frame, whereas `a` does not have to, since it never used |
|
|
after suspend. |
|
|
|
|
|
A frontend can create parameter copies for `a` and `b` as follows: |
|
|
|
|
|
.. code-block:: C++ |
|
|
|
|
|
task<int> f(A a', A b') { |
|
|
a = alloca A; |
|
|
b = alloca A; |
|
|
// move parameters to its copies |
|
|
if (coro.param(a', a)) A::A(a, A&& a'); |
|
|
if (coro.param(b', b)) A::A(b, A&& b'); |
|
|
... |
|
|
// destroy parameters copies |
|
|
if (coro.param(a', a)) A::~A(a); |
|
|
if (coro.param(b', b)) A::~A(b); |
|
|
} |
|
|
|
|
|
The optimizer can replace coro.param(a',a) with `i1 false` and replace all uses |
|
|
of `a` with `a'`, since it is not used after suspend. |
|
|
|
|
|
The optimizer must replace coro.param(b', b) with `i1 true`, since `b` is used |
|
|
after suspend and therefore, it has to reside in the coroutine frame. |
|
|
|
|
|
Coroutine Transformation Passes |
|
|
=============================== |
|
|
CoroEarly |
|
|
--------- |
|
|
The pass CoroEarly lowers coroutine intrinsics that hide the details of the |
|
|
structure of the coroutine frame, but, otherwise not needed to be preserved to |
|
|
help later coroutine passes. This pass lowers `coro.frame`_, `coro.done`_, |
|
|
and `coro.promise`_ intrinsics. |
|
|
|
|
|
.. _CoroSplit: |
|
|
|
|
|
CoroSplit |
|
|
--------- |
|
|
The pass CoroSplit buides coroutine frame and outlines resume and destroy parts |
|
|
into separate functions. |
|
|
|
|
|
CoroElide |
|
|
--------- |
|
|
The pass CoroElide examines if the inlined coroutine is eligible for heap |
|
|
allocation elision optimization. If so, it replaces `coro.alloc` and |
|
|
`coro.begin` intrinsic with an address of a coroutine frame placed on its caller |
|
|
and replaces `coro.free` intrinsics with `null` to remove the deallocation code. |
|
|
This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct |
|
|
calls to resume and destroy functions for a particular coroutine where possible. |
|
|
|
|
|
CoroCleanup |
|
|
----------- |
|
|
This pass runs late to lower all coroutine related intrinsics not replaced by |
|
|
earlier passes. |
|
|
|
|
|
Upstreaming sequence (rough plan) |
|
|
================================= |
|
|
#. Add documentation. <= we are here |
|
|
#. Add coroutine intrinsics. |
|
|
#. Add empty coroutine passes. |
|
|
#. Add coroutine devirtualization + tests. |
|
|
#. Add CGSCC restart trigger + tests. |
|
|
#. Add coroutine heap elision + tests. |
|
|
#. Add custom allocation heap elision + tests. |
|
|
#. Add coroutine splitting logic + tests. |
|
|
#. Add simple coroutine frame builder + tests. |
|
|
#. Add the rest of the logic + tests. (Maybe split further as needed). |
|
|
|
|
|
Areas Requiring Attention |
|
|
========================= |
|
|
#. A coroutine frame is bigger than it could be. Adding stack packing and stack |
|
|
coloring like optimization on the coroutine frame will result in tighter |
|
|
coroutine frames. |
|
|
|
|
|
#. Take advantage of the lifetime intrinsics for the data that goes into the |
|
|
coroutine frame. Leave lifetime intrinsics as is for the data that stays in |
|
|
allocas. |
|
|
|
|
|
#. The CoroElide optimization pass relies on coroutine ramp function to be |
|
|
inlined. It would be beneficial to split the ramp function further to |
|
|
increase the chance that it will get inlined into its caller. |
|
|
|
|
|
#. Design a convention that would make it possible to apply coroutine heap |
|
|
elision optimization across ABI boundaries. |
|
|
|
|
|
#. Cannot handle coroutines with `inalloca` parameters (used in x86 on Windows). |
|
|
|
|
|
#. Alignment is ignored by coro.begin and coro.free intrinsics. |
|
|
|
|
|
#. Make required changes to make sure that coroutine optimizations work with |
|
|
LTO. |
|
|
|
|
|
#. More tests, more tests, more tests |