Discussion for Coroutine Theory #1
Comments
Excellent article, thanks! Quick question: could you explain if/how one can customize the way coroutines allocate memory on the heap? Maybe using an allocator of some sort... Thanks! |
The Coroutines TS provides the ability to customise the allocation strategy by overloading I'll cover this in a bit more detail in my next post. Stay tuned! |
Great article, thanks! In the case, the compiler can verify that the co-routine doesn't outlive the caller (e.g. cases like python generators in loop), does the activation frame of the co-routine remain on the stack? (and the stack part of the co-routine is not destroyed? or maybe some of it is destroyed and some of it is not.) |
@skgbanga Where the nested coroutine-frame is placed in the caller's activation frame depends on whether the caller is a coroutine or a normal function and, if it is a coroutine, on whether the lifetime of the nested coroutine spans a suspend-point in the caller. If the caller is a coroutine and the lifetime of the nested coroutine spans a suspend-point then the compiler has to put it in the coroutine-frame; if it doesn't span a suspend-point then the nested coroutine-frame may be placed either in the stack-frame or the coroutine-frame (compiler-dependent). If the caller is a regular function then it must be placed in the stack-frame if allocation is elided. Placing it within the stack-frame in these cases should be safe since the lifetime of the nested coroutine-frame does not outlive the lifetime of the enclosing stack-frame. |
Makes sense. I am looking forward to an example in the series where the allocation is elided. e.g consider this (
I am quite interested to know where does the compiler put the Stage 2 variables on the stack? after the co-routine's activation frame? What if co-routine wants to increase its stack? Let me know if I am not super clear in what I am describing. |
@lewissbaker. Thanks for the post and your cool work, I have 2 questions:
|
@danghvu Great questions!
The size of the coroutine frame is highly dependent on the coroutine body, type of coroutine as well as the compiler and optimisation flags so it's difficult to give an exact answer here. However, we can put a lower-bound on the size of the coroutine frame. The coroutine frame needs to store a number of things:
Clang is currently more effective at trimming down coroutine frame size than MSVC.
Yes, that's right. A coroutine suspends when it The executor resumes the awaiting coroutine by calling |
Thank you for the prompt reply! |
@danghvu The For example, the If you watch @GorNishanov's recent CppCon talk on coroutine interactions with Networking TS he shows how the coroutine handle can be stored in a lambda that is passed into the underlying callback mechanism of the networking TS. |
Great article, clear explanation of the mechanics of stackless coroutines, thanks. From what I understand, if you call a coroutine and give it a pointer parameter to data that's on the stack of the caller (e.g. coroutine(int *integer_on_call_stack)), it's quite possible the coroutine frame will point to invalid data if it's resumed on a different stack frame or thread. I guess you just have to be aware of this? |
@breathe67 Yes, that's right. If you pass a pointer or reference to a value owned by the caller into a coroutine then you need to be aware that the coroutine will continue to hold a reference to that data after the initial call returns. This can be safe if you make sure you |
I think you should link to this video: https://www.youtube.com/watch?v=_fu0gx-xseY |
Would you consider adding a code example? |
Well done! |
@ioquatix I've been thinking about just making a separate post listing some useful coroutine resources. I'll make sure to include that link in there. @JimViebke I'll try to include more code examples into future posts. In the mean-time, check out some code examples on the cppcoro README: https://github.com/lewissbaker/cppcoro |
Good article. Few notes:
|
|
@crusader-mike However it would be harder to look at a function and realize it suspends rather than executes normally. You might often be looking at normal non-waiting functions in your (or someone else's) codebase and wondering if any of them actually suspend. |
@breathe67 this can be addressed in many ways -- naming conventions ( One more point: |
@crusader-mike Thanks for the feedback and the great questions.
The advantage of making coroutine functions look exactly like ordinary functions from the call-site is that it allows you to change the implementation of that function to delegate creation of the coroutine to some other function without needing to change the signature. E.g.
// Change from this...
task<> foo()
{
co_await bar();
}
// To this, without any signature change.
// note that foo() is no longer a coroutine, but it's still a function that returns a task<>
task<> foo()
{
auto x = some_non_async_function();
task<> result = make_task<void>(when_all(bar(), baz(x)));
return result;
}
The Coroutines TS already has provisions for allowing you to hook in custom allocators for the coroutine frame which would allow the caller to specify the allocation strategy. The main limitation here is that you cannot know the coroutine frame allocation size at compile time as the frame size can vary based on backend compiler optimisations. This makes it difficult to allocate the memory for the coroutine on the stack, since you don't know how big to make the buffer. The compiler is in a much better position to be able to determine the size of the allocations and also whether the allocations can be elided.
There are also use-cases where I want to call a coroutine but not actually block the caller while calling it (eg. so that I can execute two coroutines concurrently). e.g. See
|
@lewissbaker I spent few days reading coroutine-related material and watching presentations. Wow... complicated topic. Let me summarize my mental model first (please let me know if I am way off the mark somewhere):
I think this should cover (at a high level) most of what I've learned... Now it is clear that there is a lot of depth to this and I am just only scratching the surface -- you probably went through this line of thinking many times long ago. Therefore, I suggest looking at my comments as an attempt to learn, not criticize -- if my notes contain nothing new, please point me out why I am wrong. If you think given idea is interesting -- you are welcome to steal it. Now to original points: points 1-3,5: it seems that coroutine construction has the same semantics as object construction -- why not use the same approach? (after all it is an object) Smth like this:
As you have pointed out, for smth like this to work the related information (frame size, required arguments, etc) needs to be already available at compile time. Maybe another compilation step where coroutines are converted to objects and co_* keywords -- to related boilerplate? Or a new ABI-lvl object (coroutine) that carries its frame size along with signature -- so that stack allocation (or parameter passed to operator point 4: problem here is that in a coroutine Destroy() can be called in multiple non-obvious places -- i.e. it is hard to look at the code and tell if it is going to kill your program or not. Well... It is always hard in case of throwing destructor, but now it will be even harder :-) point 6: Here is what I have in mind:
point 7: since coroutine body gets split into chunks (with addition of constructor) -- it probably makes sense to forbid (or change semantics) of More notes:
|
I am positively sure this idea is worth consideration... Basically, you write a coroutine, then on translation step (similar to templates) compiler will convert it into object (with associated facilities) and it can be used by the rest of the code in the similar way template instantiations get used.
|
@crusader-mike, re:
as well as @lewissbaker, re:
Are there clear rules for this behavior, or is it a bit compiler-specific, or..? I'm running into weird issues where, for reasons I can't seem to nail down, my function parameters will sometimes survive past the first suspend point without issue, but other times require me to copy or move them into a local variable first in order to access them later. |
@dfct what platform are you running on and can you show me the assembly of the transfer function? |
@ioquatix I'm using macOS & Apple's Xcode-bundled version of clang. I've been trying to extract enough code to have a reproducible test case, and can post the code & assembly then. It has proven a bit challenging due to time constraints & Xcode's unintentional lack of support for indexing & debugging coroutine code. Working with what I've got.. edit: Here's an example: https://wandbox.org/permlink/lQFoC2noGQPaZxTj |
@ioquatix I've been playing with this a bit more, and I should note that using the asio executor is /not/ required to reproduce the issue, though it does seem to make it significantly more apparent.. This code without asio reliably fails within a few runs for me: https://wandbox.org/permlink/lz45rYsmcLz5pub9 |
What's the failure? Have you tried using |
@ioquatix In the coroutine that does not explicitly move the function parameter to a local variable before calling co_await, the function parameter is /sometimes/ invalid when the coroutine resumes. In the first example above, which reproduces on wandbox, you can see it print incorrect values e.g. "1946159296" instead of "1" after the co_await. (Similar for the second link, though that one doesn't reproduce well on wandbox.) I'm not familiar with the use of those sanitizers, I'm afraid... RE: the two quotes above of @lewissbaker & @crusader-mike, it seems like I shouldn't need to explicitly move the function parameter to a local variable for it to persist past a co_await call. |
If you are not careful, those registers get clobbered. It will depend on the exact semantics of The first resume loads the registers with the arguments. The |
I would suggest stepping through with the debugger and after resuming the co_await, check the values of the registers. On x64 it's |
Can you disassemble the function so we can see the code inserted by |
Like this? https://godbolt.org/g/cGy2j6 That code reliably prints gibberish instead of '1' for test_obj.val. Here's the same executing code on wandbox: https://wandbox.org/permlink/XpEmgfWHhcmFoIma
|
Excellent. Can you make a simple example (MVP) that still fails? |
It's interesting that with -O2, it doesn't seem to fail. |
Are you sure? For me when I add -O2 to the last example ( https://wandbox.org/permlink/XpEmgfWHhcmFoIma ), it just goes from printing gibberish to printing 0. But test_obj.val should be (and is before co_await) 1. I'm not sure how to boil the code down further; when I removed asio and launched lambdas in new threads instead it stopped reliably failing on wandbox (though it still fails locally). Not sure what else to try, I'm a bit of a novice here.. For what it may be worth, I just tried running that wandbox code ^ on an arm64 iPad via a basic app, and the same issue occurred. Thank you for spending so much of your time investigating this with me, ioquatix! Very much appreciated.. :) |
Oh, my bad, I thought 0 was the correct output, and it was producing it reliably :p I think the next step, without digging into the assembly too much, is to use the sanitisers to check the code at run time. It sounds like something is going wrong with the reference and it's getting blown away. |
I drifted away from the coroutine topic in the last few months, but aren't you supposed to wait for your Coro to complete somehow before leaving the scope of your lambda (and destroying your objects)?
|
That would make sense - if the Coro goes out of scope, it would be a problem :p |
I've been working on a C library for very low level (stackful) coroutines. If you want something a bit less magic, you could take a look at it: https://github.com/kurocha/coroutine |
It's still a work in progress though. There are other similar libraries, but none that have the same coroutine transfer arguments/return. |
in this case it is Thank you. I'll have a look. |
That's super interesting, sleeping there does seem to fix it. If you've std::move'd the object, though, why would leaving the previous scope matter? Shouldn't the object exist in the coroutine frame, which outlives the scope..? |
because std::move doesn't move anything -- it enables "move" (which is just a fancy copy constructor). Your Coro takes test_obj**&&**, not test_obj |
omg. THANK YOU! I completely misunderstood what std::move and && were doing for me there. Sure enough, removing && from Coro(test_obj&& t) results in the behavior I was expecting. Can't believe this entire saga has boiled down to two misplaced ampersands... |
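The point about `std::move` and `&&` can be checked without any coroutine machinery. The sketch below is an illustration under stated assumptions: `test_obj` is a hypothetical stand-in for the type in the linked examples, and the helper functions are made up for this note. It shows that `std::move` is only a cast, that a `T&&` parameter still refers to the caller's object, and that a by-value parameter is a separate object. In a coroutine this matters because by-value parameters are copied or moved into the coroutine frame, while reference parameters remain references to something the caller owns.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the test_obj type discussed in the thread.
struct test_obj {
    std::string name;
};

// Rvalue-reference parameter: no new object is created here,
// `t` is just another name for the caller's object.
const test_obj* address_seen_by_ref(test_obj&& t) { return &t; }

// By-value parameter: `t` is a distinct object, move-constructed
// from the argument when the caller passes std::move(obj).
bool by_value_is_distinct(test_obj t, const test_obj* caller_obj) {
    return &t != caller_obj;
}

// True when the && parameter aliases the caller's object.
bool ref_param_aliases_caller() {
    test_obj obj{"x"};
    return address_seen_by_ref(std::move(obj)) == &obj;
}

// True when passing through && leaves the caller's object unchanged,
// i.e. std::move on its own moved nothing.
bool ref_param_leaves_caller_untouched() {
    test_obj obj{"x"};
    address_seen_by_ref(std::move(obj));
    return obj.name == "x";
}

// True when the by-value parameter lives at a different address.
bool value_param_is_separate_object() {
    test_obj obj{"x"};
    return by_value_is_distinct(std::move(obj), &obj);
}
```

If the same `&&` parameter belonged to a coroutine that outlives the caller's scope, the reference would dangle once the caller's object is destroyed, which matches the symptom described above.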
Hehe, |
|
Yes, you are right. On most architectures the stack grows backwards. |
Hi, can the article explain how "exception propagation" replaces the "resume" behavior? Thanks //eg. |
Hi there!
Thanks! |
Please add comments to this issue for discussion of the article Coroutine Theory.