status quo: Barbara carefully dismisses embedded Future
#85
Conversation
…t zero-cost after all.
src/vision/status_quo/barbara_carefully_dismisses_embedded_future.md
…native character. I had two statements that disagreed with each other:

1. I could have used Grace instead of Barbara.
2. Grace may have handled the situation differently.

I do think that Barbara's experience with Rust affected how the story played out, so I removed the note stating that I could have used Grace instead. I think this is a bit more accurate, and removes the apparent contradiction.
This is really insightful - thank you @jrvanwhy! There are some things that can be expanded upon to make this right for merging, but this is well on its way!
/// Passes `buffer` to the kernel, and prints it to the console. Returns a
/// future that returns `buffer` when the print is complete. The caller must
/// call kernel_ready_for_callbacks() when it is ready for the future to return.
fn print_buffer(buffer: &'static mut [u8]) -> PrintFuture {
Is the reason Barbara did not use async/await notation relevant for this story?
Yes, I think it is. My understanding is that async/await notation can only block on other futures, and that at the leaf nodes where Rust's async APIs meet an external async API you need handwritten futures. In this case, this driver was interacting directly with the kernel.
I added a sentence to the above paragraph ("Barbara tries to implement... skeleton") explaining this.
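To make the idea of a hand-written leaf future concrete, here is a minimal sketch (not the actual Tock code; `PRINT_DONE` and `fake_kernel_callback` are hypothetical stand-ins for state signalled by a kernel callback). The point is that `poll` can only inspect state the callback stashed earlier, which is why the leaf must be written by hand rather than with async/await:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical stand-in for state set by a kernel callback.
static PRINT_DONE: AtomicBool = AtomicBool::new(false);

// Stand-in for the kernel invoking the completion callback.
fn fake_kernel_callback() {
    PRINT_DONE.store(true, Ordering::Relaxed);
}

// A hand-written leaf future: `poll` only reports state that the
// callback has already stashed; it cannot await anything itself.
struct PrintFuture;

impl Future for PrintFuture {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if PRINT_DONE.load(Ordering::Relaxed) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

// A do-nothing waker, just so the future can be polled in isolation.
fn clone_noop(data: *const ()) -> RawWaker {
    RawWaker::new(data, &NOOP_VTABLE)
}
fn noop(_data: *const ()) {}
static NOOP_VTABLE: RawWakerVTable =
    RawWakerVTable::new(clone_noop, noop, noop, noop);

fn noop_waker() -> Waker {
    // Safe: every vtable entry is a no-op.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &NOOP_VTABLE)) }
}
```

Polling returns `Pending` until the (simulated) kernel callback runs, after which it returns `Ready`.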
is two words in size. It points to a
[`RawWakerVTable`](https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html)
that is provided by the executor. Considering that this code is designed to run
without `alloc`, `RawWakerVTable` seems awfully wasteful. `drop` is a no-op,
It would be great to talk about the impact of this waste. `RawWakerVTable` is only 4 words big. Probably many reading this story won't understand why 4 words is considered wasteful.
(Chiming in from Tock here, so ignore if not useful.) @jrvanwhy: did you mean that the vtable itself (which is 4 words in flash) is wasteful, or the 2 words in memory? The latter seems wasteful because without alloc we could actually know statically exactly what concrete type it would be, and also because we need these 2 words for each instance of async, rather than once for any program that uses futures.

Maybe both?

My interpretation had been the 2 words in memory, but @rylev's comment highlighted to me that I actually am not sure which this is referring to.
If there's a single copy of `RawWakerVTable` in flash, that's not bad. If there's 1 copy in RAM for each executor instance, that's pretty significant.
When I wrote the executor, I tried to avoid dynamic dispatch so the compiler could perform inlining. This meant the executor depended on the future's type, like:

```rust
pub struct Task<F: Future> {
    future: Option<F>,
}
```
Because of this, I needed the `RawWakerVTable` to be specific to the type of `Future`. I implemented this by putting it directly into the executor, i.e.:

```rust
pub struct Task<F: Future> {
    future: Option<F>,
    vtable: RawWakerVTable,
}
```
As a result, I had a `RawWakerVTable` instance in RAM for each `Task`.

Because `future` needs to be mutable, the `Task` gets placed in `.data` in RAM. As a result, the vtable ends up in RAM, and there's one per `Task`.
In retrospect, I could have broken the executor into two parts:
```rust
// Instantiated by client code, lives in flash.
pub struct TaskVTable<F: Future> {
    vtable: RawWakerVTable,
}

// Instantiated by client code, lives in RAM.
pub struct Task<F: Future> {
    future: Option<F>,
    vtable: &'static TaskVTable<F>,
}
```
This would reduce the RAM overhead to 1 word per `Task` and move the overall vtable into flash. I expect you could do the same with future combinators (although most combinators probably don't need their own `RawWakerVTable` anyway).

If you use the `Locator` trait from my async API design, we could remove the `vtable` member from `Task` and replace it with a zero-sized type.
1. `spawn` should be a safe function, because it is called by high-level
   application code. However, that means it can be called by the future it
   contains. If handled naively, this would result in dropping the future while
I'm not sure I fully grok this point. Can you expand a bit more? Why would the future potentially be dropped while `spawn` executes?
`spawn` takes the future to be executed as an argument. Because the executor is designed to work without `alloc`, it can only hold a single future inside. If `spawn` is called while a future is running, I assume that means the caller wants to overwrite the existing future.
I added a note a couple paragraphs up (just before "Barbara identifies several factors...") with an explanation.
I could avoid this cost by making the future fixed (i.e. constructing it in a `const` context) at the expense of flexibility.
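The single-slot behaviour described above can be sketched as follows (`Slot` is a hypothetical name, not the actual executor type from the story): with no allocator there is exactly one place to store a future, so `spawn` necessarily replaces whatever is already there.

```rust
// Hypothetical single-slot storage for a no-alloc executor: it can hold
// exactly one future, so spawn overwrites any future already present.
pub struct Slot<F> {
    future: Option<F>,
}

impl<F> Slot<F> {
    pub const fn new() -> Self {
        Slot { future: None }
    }

    // Stores `f`, returning true if an existing future was overwritten
    // (the hazard that the story's runtime checks guard against).
    pub fn spawn(&mut self, f: F) -> bool {
        self.future.replace(f).is_some()
    }
}
```

The `bool` return makes the overwrite visible, which is one way an implementation could detect the "future drops itself while executing" hazard.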
have tried as hard to use `Future`. Barbara has been paying attention to
Rust long enough to know how significant the `Future` trait is in the Rust
community and ecosystem.
* [Niklaus] would really have struggled. If he asked for help, he probably
This strikes me as an important point. It's fairly "common knowledge" that `Future`s themselves are very lightweight. This story makes it clear that for some resource-constrained environments, the machinery required for getting futures to work can sometimes be too heavyweight. I can imagine there being many who would dismiss Niklaus, saying that `Future`s are lightweight and totally appropriate for embedded development, without realizing that this might not always be true.
In 2016, Aaron Turon described `Future` as zero-cost, but he elaborated on what that meant, and "minimal impact on code size and RAM usage" was not part of that definition. The phrase "zero cost" has been repeated in descriptions of `Future` ever since, generally without the context that Aaron provided.

Most of Rust's "zero cost" abstractions have a much smaller (or zero) impact on code size and RAM usage after inlining, so it's easy to assume that `Future` is lighter weight than it is.
required to implement executors and combinators.
Barbara redesigns the library she is building to use a [different
concept](https://github.com/tock/design-explorations/tree/master/zst_pointer_async)
Can you talk more about what design decisions this implementation made that made it more light-weight?
I added 2 things to address this:

- An analysis of why the `Future`-based app is larger than the no-`Future` app (currently lines 174-200)
- A list below (currently lines 214-220) explaining how I (Barbara) avoided the same costs in my async traits.
current `poll` returns `Poll::Pending`. This requires putting a retry loop
into the executor.
1. A kernel callback may call `Waker::wake` after its future finishes executing,
   and the executor must ignore those wakeups. This duplicates the "ignore
Can you add a sentence on why the executor must ignore the spurious wakeups?
I reworded it to state that the executor should not `poll` a future after the future has returned `Poll::Ready`, which is directly stated in the `Future` docs. Is it clear now?
* `core::future::Future` isn't suitable for every asynchronous API in Rust.
  `Future` has a lot of capabilities, such as the ability to spawn
  dynamically-allocated futures, that are unnecessary in embedded systems.
Is it `Future` itself or rather the wake-based API? Also, are we sure that the tradeoffs in code size experienced here are fundamental to getting `Future` to work? Or is it possible that there can be tweaks to the design that will allow these tradeoffs to go away?
I didn't really distinguish `Future` from `Waker`.

I think there are fundamental tradeoffs at play. The fact that `Future` is demand-driven results in an information loss during callbacks. Kernel callbacks need to stash data somewhere for `poll` to retrieve later, and `poll` has to contain that retrieval logic. That ended up being the majority of the cost.
This was great. I think @rylev identified everything that could use improvement. Thanks for writing this.
I'm not sure what to make out of this. I think there is a lot of truth in the observation that `Future`s are not zero-cost and that this might impact embedded development. Some things that came to my mind here are:

- The lazy nature of `Future`s requires them to be constructed on the stack before being moved to the place where they are executed. Huge `Future`s on the stack have been observed. That could easily blow the memory limits of a constrained device.
- Building an allocator for variably sized `Future`s to support `spawn` might be tricky (if `spawn` only at startup is good enough, it could probably be done with a bump allocator, but that requires a custom `Box`, and allocator APIs are also not fully there yet).
- `Waker`s can outlive `Future`s, and executors need to support the `.wake()` of a `Future` which no longer exists. But reference-counting is not an option, and static state is a bit tricky in combination with the `Send`/`Sync` requirements.

But the experience report here centers a lot more on what had been experienced with a design which is centered around tock-os. In other places, polling `Future`s from arbitrary contexts would likely not be safe.

Maybe this report could provide some input on where documentation on how to build and use `Future`s and executors could be improved?
* **What are the morals of the story?**
  * `core::future::Future` isn't suitable for every asynchronous API in Rust.
    `Future` has a lot of capabilities, such as the ability to spawn
This is not really a capability of `Future`. It's a capability of some `Future` executors. A `Future` itself is just a resumable function.
You can have non-static `Future`s, but you cannot have non-static asynchronous operations with the traits I designed. I think the ability to support non-static futures is a feature of `Future`.
Ah, this is a good call-out @jrvanwhy -- I've been looking for "design decisions" we made along the way that may inform the overall async tenets. There was definitely a push to remove `Send + Sync + 'static` bounds from futures.
`PRINT_WAKER` is stored in `.bss`, which occupies space in RAM but not flash. It
is two words in size. It points to a
[`RawWakerVTable`](https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html)
that is provided by the executor. `RawWakerVTable` does not appear to be
As the person who designed this, I don't feel good about this statement. The whole `RawWaker` mechanism was specifically designed to support no-std environments. It allows creating `Waker`s which don't require any dynamic memory allocations, by using static state. That wasn't possible before `RawWaker`.

This section mainly criticizes a vtable for requiring 2 pointer sizes more than necessary for a certain use case -- in static memory. That should very rarely be an argument which breaks a certain design. And `RawWaker` should also exist once per executor, and not countless times.
> The whole `RawWaker` mechanism was specifically designed to support no-std environments.

I reworded and reorganized this segment significantly, because I definitely didn't convey my thoughts accurately. `RawWaker` was designed to work in both `alloc` and no-`alloc` environments, and it made compromises to support both. I decided to evaluate the overhead of using `Future` when I noticed the compromises in `RawWaker`'s design, in order to see whether other compromises cost more.

Ultimately, the 2 extra words in `RawWaker` are not a deal-breaker.
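For concreteness, the sizes under discussion can be checked directly. This is a sketch for a typical platform where data and function pointers are one word each: `Waker` is two words (a data pointer plus a vtable pointer), and the `RawWakerVTable` it points at is four words (the `clone`, `wake`, `wake_by_ref`, and `drop` function pointers).

```rust
use std::mem::size_of;
use std::task::{RawWakerVTable, Waker};

// A Waker carries a data pointer plus a pointer to a RawWakerVTable.
fn waker_words() -> usize {
    size_of::<Waker>() / size_of::<usize>()
}

// The vtable itself holds four function pointers:
// clone, wake, wake_by_ref, and drop.
fn vtable_words() -> usize {
    size_of::<RawWakerVTable>() / size_of::<usize>()
}
```

On a 32-bit microcontroller those are 8 and 16 bytes respectively, which is why the per-`Task` placement discussed above matters more than the numbers alone suggest.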
available, this executor contains a single future. The executor has a `spawn`
function that accepts a future and starts running that future (overwriting the
existing future in the executor if one is already present). Once started, the
executor runs entirely in kernel callbacks.
I am not sure whether this is a valid design for `Future`s. In general, `Future`s assume they are only called by one executor in a serial fashion, and not from a variety of places. Doing the latter would void all `Send`/`Sync` guarantees. And not only this, it would open the door to a variety of reentrancy problems that the `Future` design avoids.

This might all be valid in the tock-os world, which might make some extra guarantees. But even on a bare-metal microcontroller it wouldn't work. A more typical design, in line with typical kernel and application design, is to have the `Waker` purely wake up the executor in the application space, and make sure that it continues to make forward progress.
> In general, `Future`s assume they are only called by one executor in a serial fashion, and not from a variety of places. Doing the latter would void all `Send`/`Sync` guarantees.

Everything happens in a single thread, so I don't see how `Send`/`Sync` are relevant. That may be the "extra guarantees" you refer to that Tock OS provides, though.
> And not only this, it would open up the door to a variety of reentrancy problems that the `Future` design avoids.

The executor described here will not call `poll` reentrantly. I don't see anywhere else that reentrancy problems can occur.
One example is that a `Future` which is `!Send` can expect all state stored in thread-local storage to be consistently accessible across polls. If one calls a future from arbitrary threads, that is no longer true. In general, an interrupt context is similar to a separate thread. I'm not familiar enough with Tock to say whether this applies or doesn't apply there.
> In general, an interrupt context is similar to a separate thread. I'm not familiar enough with Tock to say whether this applies or doesn't apply there.

Oh, that doesn't apply in Tock. Invoking a callback behaves almost identically to a virtual function call whose address is determined by the kernel.
application code. However, that means it can be called by the future it
contains. If handled naively, this would result in dropping the future while
it executes. Barbara adds runtime checks to identify this situation.
1. `Waker` is `Sync`, so on a multithreaded system, a future could give another
As outlined earlier, this is some kind of semi-legit design. The `Waker`s really should just schedule the application tasks again, not try to poll inline.
I'm having trouble understanding you here. Are you suggesting something like the following?

- `wake` marks a `Future` as ready to run, but does not call `poll`.
- The executor offers a `poll_future` function that polls the future if the future has been awoken (repeatedly, if necessary).
- The application code has to call `poll_future` after invoking kernel callbacks.
Then the application would need to have a main loop, and that loop would look something like the following:
```rust
loop {
    TASK1.poll_future();
    TASK2.poll_future();
    TASK3.poll_future();
    kernel_ready_for_callbacks();
}
```
Yes. But the main loop would typically be something like `executor.run()` -- and you've spawned all your tasks on the executor upfront.
After all, something like a WinAPI/Qt/GTK event loop is also an executor :-)
I see how that would work, but the benefit is not clear to me.
I suppose that you could make all `RawWaker` methods no-ops and instead poll every unfinished `Future` on every iteration, saving some code in the executor at the cost of runtime performance.
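That all-no-ops variant can be sketched like this (hypothetical, not code from the PR): every vtable entry does nothing, so the waker carries no state at all, and the executor compensates by re-polling every unfinished future each time around its loop.

```rust
use std::task::{RawWaker, RawWakerVTable, Waker};

// All four vtable entries do nothing: wake, wake_by_ref, and drop are
// no-ops, and clone just hands back another no-op RawWaker.
fn clone_noop(data: *const ()) -> RawWaker {
    RawWaker::new(data, &NOOP_VTABLE)
}
fn noop(_data: *const ()) {}
static NOOP_VTABLE: RawWakerVTable =
    RawWakerVTable::new(clone_noop, noop, noop, noop);

fn noop_waker() -> Waker {
    // Safe: none of the vtable entries touch the data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &NOOP_VTABLE)) }
}
```

Since every waker shares the same null data pointer and the same vtable, clones compare equal under `Waker::will_wake`, and `wake` calls are free.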
This is e.g. required to avoid the reentrancy problems. Let's say our async task calls a UART driver, which submits some bytes. It only writes half of them as part of an `async fn write(&self, &[u8]) -> Result<usize, Error>`. The amount of bytes written is signalled by the interrupt context, and then forwarded as the result of this sub-`Future` -- which returns `Ready`. From there, another call to the UART driver is made to start another async operation. This operation is now started from an interrupt context instead of the normal application context. In usual environments that is not allowed, because it now requires that resource to be reentrancy-safe.

When the usual `Future` model is implemented, it isn't required to be reentrancy-safe, because the execution model already guarantees that no further operations are started from inside a callback. The only thing which will ever happen is waking up the task in the executor, so that it can later run from a safe context.

This also prevents some other issues, e.g. "what happens if this resource is freed inside the callback?". But maybe that's not too important in your "everything is static and will never get freed" scenario.
> I suppose that you could make all RawWaker methods no-ops and instead poll every unfinished Future on every iteration, saving some code in the executor at the cost of runtime performance.

Yes, that is indeed the most basic executor.

More efficient versions can e.g. block the main thread using an OS synchronization mechanism, or a low-level one like `WFI`, and wake the executor from the interrupt.

https://docs.rs/embedded-executor/0.6.1/embedded_executor/ seems like a decent example of what can be done.
Regarding the dynamic allocs there: you would avoid them by making sure they only run at startup, and size the memory pool used for them accordingly (yes, requires some fiddling 😥). For the list of ready tasks, it's probably doable with a fixed-size list, or even some bitmasks if a well-known number of tasks need to be executed.
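The bitmask idea for the ready list can be sketched as follows (hypothetical names, assuming a statically known set of at most 32 tasks): `wake` sets the task's bit, and the executor drains set bits in index order, so no allocation is ever needed.

```rust
// Fixed-size "ready list" as a bitmask: bit i records that task i has
// been woken and should be polled. One u32 supports up to 32 tasks.
pub struct ReadySet {
    bits: u32,
}

impl ReadySet {
    pub const fn new() -> Self {
        ReadySet { bits: 0 }
    }

    // Called from the waker: mark task `task` (0..32) as ready.
    pub fn wake(&mut self, task: u32) {
        self.bits |= 1 << task;
    }

    // Called from the executor loop: take the lowest-numbered ready
    // task, clearing its bit, or None if nothing is ready.
    pub fn next_ready(&mut self) -> Option<u32> {
        if self.bits == 0 {
            return None;
        }
        let task = self.bits.trailing_zeros();
        self.bits &= !(1 << task);
        Some(task)
    }
}
```

A real interrupt-driven version would need atomic bit operations, but the storage cost stays one word regardless of how many wakeups occur.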
src/vision/status_quo/barbara_carefully_dismisses_embedded_future.md
4. The application future has to look at the state saved in step 1 to determine
   what event happened.
5. The application future handles the event.
But the `Future`-based version provides a lot more guarantees! E.g. it avoids reentrancy problems and makes sure all state associated with the async operation is only modified in a single place (and from a single thread/task). It's a bit of an apples-to-oranges comparison. If a trivial callback is good enough, then `Future`s will likely have overhead. But the places where state has to be persisted in a complex fashion across `.await` points are where `Future`s win, since they don't require encoding complex state machines by hand.
Avoiding reentrancy problems, making async state management easy, and easily representing complex asynchronous operations are all really nice features of `Future`! I don't contest that.

However, they are all nice-to-have features, and it takes an awful lot of nice-to-have features to outweigh one deal-breaker. The amount of code generated by the executor and in `poll` that doesn't exist in the non-`Future` version is probably a deal-breaker for us.
…. RawWakerVTable and executor size. I never intended to imply that RawWakerVTable is a huge issue; rather, it is the item that made Barbara realize that she needs to do some benchmarking. The benchmark results showed that the executor logic and combinator logic is the main cost of `Future` for her use cases.
Building a … It's been over a year since I ran this experiment, so I'm unsure what the documentation looked like at the time. However, searching for documentation on building executors gives a handful of tutorials, all of which require … Ultimately, I realized I had to handle the case where …
One quick note as I read some of the prior comments. Remember that the role of these status quo stories is for people to convey what they experienced. It is explicitly not to propose solutions. Therefore, e.g. when @Matthias247 suggests this:
I would actually not want that at all. I think the report should focus on what people felt and experienced. The stories should definitely be technically accurate -- but in some cases, characters thought things that were wrong, or only partially informed. We should be able to capture that, and the FAQ can be especially useful for providing deeper technical info than the character knew at the time.
Thanks so much for writing this up, @jrvanwhy! I'll second @nikomatsakis that I think this can be merged.

One thing that we were discussing on Zulip is that some of the comments on this PR might make good FAQs.

I'll send the changes in a separate PR.
Rendered
Barbara tries to implement an asynchronous API for embedded systems, and finds that `core::future::Future` has a very high overhead.