status quo: Barbara carefully dismisses embedded Future
#85
Conversation
…t zero-cost after all.
src/vision/status_quo/barbara_carefully_dismisses_embedded_future.md
…native character. I had two statements that disagreed with each other:

1. I could have used Grace instead of Barbara.
2. Grace may have handled the situation differently.

I do think that Barbara's experience with Rust affected how the story played out, so I removed the note stating that I could have used Grace instead. I think this is a bit more accurate, and removes the apparent contradiction.
This is really insightful - thank you @jrvanwhy! There are some things that can be expanded upon to make this right for merging, but this is well on its way!
/// Passes `buffer` to the kernel, and prints it to the console. Returns a
/// future that returns `buffer` when the print is complete. The caller must
/// call kernel_ready_for_callbacks() when it is ready for the future to return.
fn print_buffer(buffer: &'static mut [u8]) -> PrintFuture {
Is the reason Barbara did not use async/await notation relevant for this story?
Yes, I think it is. My understanding is that async/await notation can only block on other futures, and that at the leaf nodes where Rust's async APIs meet an external async API you need handwritten futures. In this case, this driver was interacting directly with the kernel.
I added a sentence to the above paragraph ("Barbara tries to implement... skeleton") explaining this.
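To make the idea of a hand-written leaf future concrete, here is a minimal sketch (not the actual Tock code; `PRINT_DONE` and `fake_kernel_callback` are hypothetical stand-ins for state signalled by a kernel callback). The point is that `poll` can only inspect state the callback stashed earlier, which is why the leaf must be written by hand rather than with async/await:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical stand-in for state set by a kernel callback.
static PRINT_DONE: AtomicBool = AtomicBool::new(false);

// Stand-in for the kernel invoking the completion callback.
fn fake_kernel_callback() {
    PRINT_DONE.store(true, Ordering::Relaxed);
}

// A hand-written leaf future: `poll` only reports state that the
// callback has already stashed; it cannot await anything itself.
struct PrintFuture;

impl Future for PrintFuture {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if PRINT_DONE.load(Ordering::Relaxed) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

// A do-nothing waker, just so the future can be polled in isolation.
fn clone_noop(data: *const ()) -> RawWaker {
    RawWaker::new(data, &NOOP_VTABLE)
}
fn noop(_data: *const ()) {}
static NOOP_VTABLE: RawWakerVTable =
    RawWakerVTable::new(clone_noop, noop, noop, noop);

fn noop_waker() -> Waker {
    // Safe: every vtable entry is a no-op.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &NOOP_VTABLE)) }
}
```

Polling returns `Pending` until the (simulated) kernel callback runs, after which it returns `Ready`.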
is two words in size. It points to a
[`RawWakerVTable`](https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html)
that is provided by the executor. Considering that this code is designed to run
without `alloc`, `RawWakerVTable` seems awfully wasteful. `drop` is a no-op,
It would be great to talk about the impact of this waste. `RawWakerVTable` is only 4 words big. Probably many reading this story won't understand why 4 words is considered wasteful.
(Chiming in from Tock here, so ignore if not useful.) @jrvanwhy: did you mean that the vtable itself (which is 4 words in flash) is wasteful, or the 2 words in memory? The latter seems wasteful because without alloc we could actually know statically exactly what concrete type it would be, and also because we need these 2 words for each instance of async, rather than once for any program that uses futures.

Maybe both?

My interpretation had been the 2 words in memory, but @rylev's comment highlighted to me that I actually am not sure which this is referring to.
If there's a single copy of `RawWakerVTable` in flash, that's not bad. If there's 1 copy in RAM for each executor instance, that's pretty significant.
When I wrote the executor, I tried to avoid dynamic dispatch so the compiler could perform inlining. This meant the executor depended on the future's type, like:

```rust
pub struct Task<F: Future> {
    future: Option<F>,
}
```
Because of this, I needed the `RawWakerVTable` to be specific to the type of `Future`. I implemented this by putting it directly into the executor, i.e.:

```rust
pub struct Task<F: Future> {
    future: Option<F>,
    vtable: RawWakerVTable,
}
```
As a result, I had a `RawWakerVTable` instance in RAM for each `Task`.

Because `future` needs to be mutable, the `Task` gets placed in `.data` in RAM. As a result, the vtable ends up in RAM, and there's one per `Task`.
In retrospect, I could have broken the executor into two parts:
```rust
// Instantiated by client code, lives in flash.
pub struct TaskVTable<F: Future> {
    vtable: RawWakerVTable,
}

// Instantiated by client code, lives in RAM.
pub struct Task<F: Future> {
    future: Option<F>,
    vtable: &'static TaskVTable<F>,
}
```
This would reduce the RAM overhead to 1 word per `Task` and move the overall vtable into flash. I expect you could do the same with future combinators (although most combinators probably don't need their own `RawWakerVTable` anyway).

If you use the `Locator` trait from my async API design, we could remove the `vtable` member from `Task` and replace it with a zero-sized type.
1. `spawn` should be a safe function, because it is called by high-level
   application code. However, that means it can be called by the future it
   contains. If handled naively, this would result in dropping the future while
I'm not sure I fully grok this point. Can you expand a bit more? Why would the future potentially be dropped while `spawn` executes?
`spawn` takes the future to be executed as an argument. Because the executor is designed to work without `alloc`, it can only hold a single future inside. If `spawn` is called while a future is running, I assume that means the caller wants to overwrite the existing future.
I added a note a couple paragraphs up (just before "Barbara identifies several factors...") with an explanation.
I could avoid this cost by making the future fixed (i.e. constructing it in a `const` context) at the expense of flexibility.
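The single-slot behaviour described above can be sketched as follows (`Slot` is a hypothetical name, not the actual executor type from the story): with no allocator there is exactly one place to store a future, so `spawn` necessarily replaces whatever is already there.

```rust
// Hypothetical single-slot storage for a no-alloc executor: it can hold
// exactly one future, so spawn overwrites any future already present.
pub struct Slot<F> {
    future: Option<F>,
}

impl<F> Slot<F> {
    pub const fn new() -> Self {
        Slot { future: None }
    }

    // Stores `f`, returning true if an existing future was overwritten
    // (the hazard that the story's runtime checks guard against).
    pub fn spawn(&mut self, f: F) -> bool {
        self.future.replace(f).is_some()
    }
}
```

The `bool` return makes the overwrite visible, which is one way an implementation could detect the "future drops itself while executing" hazard.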
have tried as hard to use `Future`. Barbara has been paying attention to
Rust long enough to know how significant the `Future` trait is in the Rust
community and ecosystem.
* [Niklaus] would really have struggled. If he asked for help, he probably
This strikes me as an important point. It's fairly "common knowledge" that `Future`s themselves are very lightweight. This story makes it clear that for some resource-constrained environments, the machinery required for getting futures to work can sometimes be too heavyweight. I can imagine there being many who would dismiss Niklaus, saying that `Future`s are lightweight and totally appropriate for embedded development, without realizing that this might not always be true.
In 2016, Aaron Turon described `Future` as zero-cost, but he elaborated on what that meant, and "minimal impact on code size and RAM usage" was not part of that definition. The phrase "zero cost" has been repeated in descriptions of `Future` ever since, generally without the context that Aaron provided.

Most of Rust's "zero cost" abstractions have a much smaller (or zero) impact on code size and RAM usage after inlining, so it's easy to assume that `Future` is lighter weight than it is.
required to implement executors and combinators.
Barbara redesigns the library she is building to use a [different
concept](https://github.com/tock/design-explorations/tree/master/zst_pointer_async)
Can you talk more about what design decisions this implementation made that made it more light-weight?
I added 2 things to address this:

- An analysis of why the `Future`-based app is larger than the no-`Future` app (currently lines 174-200)
- A list below (currently lines 214-220) explaining how I (Barbara) avoided the same costs in my async traits.
current `poll` returns `Poll::Pending`. This requires putting a retry loop
into the executor.
1. A kernel callback may call `Waker::wake` after its future finishes executing,
   and the executor must ignore those wakeups. This duplicates the "ignore
Can you add a sentence on why the executor must ignore the spurious wakeups?
I reworded it to state that the executor should not `poll` a future after the future has returned `Poll::Ready`, which is directly stated in the `Future` docs. Is it clear now?
* `core::future::Future` isn't suitable for every asynchronous API in Rust.
  `Future` has a lot of capabilities, such as the ability to spawn
  dynamically-allocated futures, that are unnecessary in embedded systems.
Is it `Future` itself or rather the wake-based API? Also, are we sure that the tradeoffs in code size experienced here are fundamental to getting `Future` to work? Or is it possible that there can be tweaks to the design that will allow these tradeoffs to go away?
I didn't really distinguish `Future` from `Waker`.

I think there are fundamental tradeoffs at play. The fact that `Future` is demand-driven results in an information loss during callbacks. Kernel callbacks need to stash data somewhere for `poll` to retrieve later, and `poll` has to contain that retrieval logic. That ended up being the majority of the cost.
This was great. I think @rylev identified everything that could use improvement. Thanks for writing this.
I'm not sure what to make out of this. I think there is a lot of truth in the observation that `Future`s are not zero-cost and that this might impact embedded development. Some things that came to my mind here are:

- The lazy nature of `Future`s requires them to be constructed on the stack before being moved to the place where they are executed. Huge `Future`s on the stack have been observed. That could easily blow the memory limits of a constrained device.
- Building an allocator for variably sized `Future`s to support `spawn` might be tricky (if `spawn` only at startup is good enough, it could probably be done with a bump allocator, but that requires a custom `Box`, and allocator APIs are also not fully there yet).
- `Waker`s can outlive `Future`s, and executors need to support the `.wake()` of a `Future` which no longer exists. But reference-counting is not an option, and static state is a bit tricky in combination with the `Send`/`Sync` requirements.

But the experience report here centers a lot more on what had been experienced with a design which is centered around tock-os. In other places, polling `Future`s from arbitrary contexts would likely not be safe.

Maybe this report could provide some input on where documentation on how to build and use `Future`s and executors could be improved?
* **What are the morals of the story?**
  * `core::future::Future` isn't suitable for every asynchronous API in Rust.
    `Future` has a lot of capabilities, such as the ability to spawn
This is not really a capability of `Future`. It's a capability of some `Future` executors. A `Future` itself is just a resumable function.
You can have non-static `Future`s, but you cannot have non-static asynchronous operations with the traits I designed. I think the ability to support non-static futures is a feature of `Future`.
Ah, this is a good call-out @jrvanwhy -- I've been looking for "design decisions" we made along the way that may inform the overall async tenets. There was definitely a push to remove `Send + Sync + 'static` bounds from futures.
`PRINT_WAKER` is stored in `.bss`, which occupies space in RAM but not flash. It
is two words in size. It points to a
[`RawWakerVTable`](https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html)
that is provided by the executor. `RawWakerVTable` does not appear to be
As the person who designed this, I don't feel good about this statement. The whole `RawWaker` mechanism was specifically designed to support no-std environments. It allows creating `Waker`s which don't require any dynamic memory allocations, by using static state. That wasn't possible before `RawWaker`.

This section mainly criticizes a vtable for requiring 2 pointer sizes more than necessary for a certain use case -- in static memory. That should very rarely be an argument which breaks a certain design. And `RawWaker` should also exist once per executor, and not countless times.
> The whole `RawWaker` mechanism was specifically designed to support no-std environments.

I reworded and reorganized this segment significantly, because I definitely didn't convey my thoughts accurately. `RawWaker` was designed to work in both `alloc` and no-`alloc` environments, and it made compromises to support both. I decided to evaluate the overhead of using `Future` when I noticed the compromises in `RawWaker`'s design, in order to see whether other compromises cost more.

Ultimately, the 2 extra words in `RawWaker` are not a deal-breaker.
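For concreteness, the sizes under discussion can be checked directly. This is a sketch for a typical platform where data and function pointers are one word each: `Waker` is two words (a data pointer plus a vtable pointer), and the `RawWakerVTable` it points at is four words (the `clone`, `wake`, `wake_by_ref`, and `drop` function pointers).

```rust
use std::mem::size_of;
use std::task::{RawWakerVTable, Waker};

// A Waker carries a data pointer plus a pointer to a RawWakerVTable.
fn waker_words() -> usize {
    size_of::<Waker>() / size_of::<usize>()
}

// The vtable itself holds four function pointers:
// clone, wake, wake_by_ref, and drop.
fn vtable_words() -> usize {
    size_of::<RawWakerVTable>() / size_of::<usize>()
}
```

On a 32-bit microcontroller those are 8 and 16 bytes respectively, which is why the per-`Task` placement discussed above matters more than the numbers alone suggest.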
available, this executor contains a single future. The executor has a `spawn`
function that accepts a future and starts running that future (overwriting the
existing future in the executor if one is already present). Once started, the
executor runs entirely in kernel callbacks.
I am not sure whether this is a valid design for `Future`s. In general, `Future`s assume they are only called by one executor in a serial fashion, and not from a variety of places. Doing the latter would void all `Send`/`Sync` guarantees. And not only this, it would open the door to a variety of reentrancy problems that the `Future` design avoids.

This might all be valid in the tock-os world, which might make some extra guarantees. But even on a bare-metal microcontroller it wouldn't work. A more typical design, in line with typical kernel and application design, is to have the `Waker` purely wake up the executor in the application space, and make sure that it continues to make forward progress.
> In general, `Future`s assume they are only called by one executor in a serial fashion, and not from a variety of places. Doing the latter would void all `Send`/`Sync` guarantees.

Everything happens in a single thread, so I don't see how `Send`/`Sync` are relevant. That may be the "extra guarantees" you refer to that Tock OS provides, though.
> And not only this, it would open up the door to a variety of reentrancy problems that the `Future` design avoids.

The executor described here will not call `poll` reentrantly. I don't see anywhere else that reentrancy problems can occur.
One example is that a `Future` which is `!Send` can expect all state stored in thread-local storage to be consistently accessible across polls. If one calls a future from arbitrary threads, that is no longer true. In general, an interrupt context is similar to a separate thread. I'm not familiar enough with Tock to say whether this applies or doesn't apply there.
> In general, an interrupt context is similar to a separate thread. I'm not familiar enough with Tock to say whether this applies or doesn't apply there.

Oh, that doesn't apply in Tock. Invoking a callback behaves almost identically to a virtual function call whose address is determined by the kernel.
application code. However, that means it can be called by the future it
contains. If handled naively, this would result in dropping the future while
it executes. Barbara adds runtime checks to identify this situation.
1. `Waker` is `Sync`, so on a multithreaded system, a future could give another
As outlined earlier, this is some kind of semi-legit design. The `Waker`s really should just schedule the application tasks again, not try to poll inline.
I'm having trouble understanding you here. Are you suggesting something like the following?

- `wake` marks a `Future` as ready to run, but does not call `poll`.
- The executor offers a `poll_future` function that polls the future if the future has been awoken (repeatedly, if necessary).
- The application code has to call `poll_future` after invoking kernel callbacks.
Then the application would need to have a main loop, and that loop would look something like the following:
```rust
loop {
    TASK1.poll_future();
    TASK2.poll_future();
    TASK3.poll_future();
    kernel_ready_for_callbacks();
}
```
Yes. But the main loop would typically be something like `executor.run()` -- and you've spawned all your tasks on the executor upfront.
After all, something like a WinAPI/Qt/GTK event loop is also an executor :-)
I see how that would work, but the benefit is not clear to me.
I suppose that you could make all `RawWaker` methods no-ops and instead poll every unfinished `Future` on every iteration, saving some code in the executor at the cost of runtime performance.
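That all-no-ops variant can be sketched like this (hypothetical, not code from the PR): every vtable entry does nothing, so the waker carries no state at all, and the executor compensates by re-polling every unfinished future each time around its loop.

```rust
use std::task::{RawWaker, RawWakerVTable, Waker};

// All four vtable entries do nothing: wake, wake_by_ref, and drop are
// no-ops, and clone just hands back another no-op RawWaker.
fn clone_noop(data: *const ()) -> RawWaker {
    RawWaker::new(data, &NOOP_VTABLE)
}
fn noop(_data: *const ()) {}
static NOOP_VTABLE: RawWakerVTable =
    RawWakerVTable::new(clone_noop, noop, noop, noop);

fn noop_waker() -> Waker {
    // Safe: none of the vtable entries touch the data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &NOOP_VTABLE)) }
}
```

Since every waker shares the same null data pointer and the same vtable, clones compare equal under `Waker::will_wake`, and `wake` calls are free.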
This is e.g. required to avoid the reentrancy problems. Let's say our async task calls a UART driver, which submits some bytes. It only writes half of them as part of an `async fn write(&self, &[u8]) -> Result<usize, Error>`. The amount of bytes written is signalled by the interrupt context, and then forwarded as the result of this sub-`Future` -- which returns `Ready`. From there, another call to the UART driver is made to start another async operation. This operation is now started from an interrupt context instead of the normal application context. In usual environments that is not allowed, because it now requires that resource to be reentrancy-safe.

When the usual `Future` model is implemented, it isn't required to be reentrancy-safe, because the execution model already guarantees that no further operations are started from inside a callback. The only thing which will ever happen is waking up the task in the executor, so that it can later run from a safe context.

This also prevents some other issues, e.g. "what happens if this resource is freed inside the callback?". But maybe that's not too important in your "everything is static and will never get freed" scenario.
> I suppose that you could make all RawWaker methods no-ops and instead poll every unfinished Future on every iteration, saving some code in the executor at the cost of runtime performance.

Yes, that is indeed the most basic executor.

More efficient versions can e.g. block the main thread using an OS synchronization mechanism, or a low-level one like `WFI`, and wake the executor from the interrupt.

https://docs.rs/embedded-executor/0.6.1/embedded_executor/ seems like a decent example of what can be done.
Regarding the dynamic allocs there: you would avoid them by making sure they only run at startup, and size the memory pool used for them accordingly (yes, requires some fiddling 😥). For the list of ready tasks, it's probably doable with a fixed-size list, or even some bitmasks if a well-known number of tasks need to be executed.
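The bitmask idea for the ready list can be sketched as follows (hypothetical names, assuming a statically known set of at most 32 tasks): `wake` sets the task's bit, and the executor drains set bits in index order, so no allocation is ever needed.

```rust
// Fixed-size "ready list" as a bitmask: bit i records that task i has
// been woken and should be polled. One u32 supports up to 32 tasks.
pub struct ReadySet {
    bits: u32,
}

impl ReadySet {
    pub const fn new() -> Self {
        ReadySet { bits: 0 }
    }

    // Called from the waker: mark task `task` (0..32) as ready.
    pub fn wake(&mut self, task: u32) {
        self.bits |= 1 << task;
    }

    // Called from the executor loop: take the lowest-numbered ready
    // task, clearing its bit, or None if nothing is ready.
    pub fn next_ready(&mut self) -> Option<u32> {
        if self.bits == 0 {
            return None;
        }
        let task = self.bits.trailing_zeros();
        self.bits &= !(1 << task);
        Some(task)
    }
}
```

A real interrupt-driven version would need atomic bit operations, but the storage cost stays one word regardless of how many wakeups occur.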
src/vision/status_quo/barbara_carefully_dismisses_embedded_future.md
4. The application future has to look at the state saved in step 1 to determine
   what event happened.
5. The application future handles the event.
But the `Future`-based version provides a lot more guarantees! E.g. it avoids reentrancy problems and makes sure all state associated with the async operation is only modified in a single place (and from a single thread/task). It's a bit of an apples-to-oranges comparison. If a trivial callback is good enough, then `Future`s will likely have overhead. But the places where state has to be persisted in a complex fashion across `.await` points are where `Future`s win, since they don't require encoding complex state machines by hand.
Avoiding reentrancy problems, making async state management easy, and easily representing complex asynchronous operations are all really nice features of `Future`! I don't contest that.

However, they are all nice-to-have features, and it takes an awful lot of nice-to-have features to outweigh one deal-breaker. The amount of code generated by the executor and in `poll` that doesn't exist in the non-`Future` version is probably a deal-breaker for us.
…. RawWakerVTable and executor size. I never intended to imply that RawWakerVTable is a huge issue; rather, it is the item that made Barbara realize that she needs to do some benchmarking. The benchmark results showed that the executor logic and combinator logic is the main cost of `Future` for her use cases.
Building a … It's been over a year since I ran this experiment, so I'm unsure what the documentation looked like at the time. However, searching for documentation on building executors gives a handful of tutorials, all of which require … Ultimately, I realized I had to handle the case where …
One quick note as I read some of the prior comments. Remember that the role of these status quo stories is for people to convey what they experienced. It is explicitly not to propose solutions. Therefore, e.g. when @Matthias247 suggests this:
I would actually not want that at all. I think the report should focus on what people felt and experienced. The stories should definitely be technically accurate -- but in some cases, characters thought things that were wrong, or only partially informed. We should be able to capture that, and the FAQ can be especially useful for providing deeper technical info than the character knew at the time.
Thanks so much for writing this up, @jrvanwhy! I'll second @nikomatsakis that I think this can be merged.

One thing that we were discussing on Zulip is that some of the comments on this PR might make good FAQs.

I'll send the changes in a separate PR.
Rendered
Barbara tries to implement an asynchronous API for embedded systems, and finds that `core::future::Future` has a very high overhead.