
status quo: Barbara carefully dismisses embedded Future #85

Merged (5 commits) on Mar 30, 2021

Conversation

jrvanwhy (Contributor):

Rendered

Barbara tries to implement an asynchronous API for embedded systems, and finds that core::future::Future has a very high overhead.

…native character.

I had two statements that disagreed with each other:
1. I could have used Grace instead of Barbara.
2. Grace may have handled the situation differently.

I do think that Barbara's experience with Rust affected how the story played out, so I removed the note stating that I could have used Grace instead. I think this is a bit more accurate, and removes the apparent contradiction.
@rylev (Member) left a comment:

This is really insightful - thank you @jrvanwhy! There are some things that can be expanded upon to make this ready for merging, but it is well on its way!

/// Passes `buffer` to the kernel, and prints it to the console. Returns a
/// future that returns `buffer` when the print is complete. The caller must
/// call kernel_ready_for_callbacks() when it is ready for the future to return.
fn print_buffer(buffer: &'static mut [u8]) -> PrintFuture {
Member:

Is the reason Barbara did not use async/await notation relevant for this story?

Contributor (author):

Yes, I think it is. My understanding is that async/await notation can only block on other futures, and that at the leaf nodes where Rust's async APIs meet an external async API you need handwritten futures. In this case, this driver was interacting directly with the kernel.

I added a sentence to the above paragraph ("Barbara tries to implement... skeleton") explaining this.
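To illustrate the "handwritten leaf future" point, here is a minimal sketch. All names (PrintFuture, PRINT_DONE, kernel_callback) are invented for illustration and are not the actual Tock driver code; the point is that at the boundary with an external async API, poll inspects driver state directly rather than awaiting another future.

```rust
use std::future::Future;
use std::pin::Pin;
use std::ptr;
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Simulated driver state: set by the kernel callback when the print completes.
static PRINT_DONE: AtomicBool = AtomicBool::new(false);

// Simulates the kernel callback firing.
fn kernel_callback() {
    PRINT_DONE.store(true, Ordering::SeqCst);
}

// A handwritten leaf future: async/await notation can only await other
// futures, so the leaf where Rust's async world meets the kernel API must
// implement poll by hand.
struct PrintFuture {
    buffer: Option<&'static mut [u8]>,
}

impl Future for PrintFuture {
    type Output = &'static mut [u8];

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        if PRINT_DONE.load(Ordering::SeqCst) {
            // Hand the buffer back to the caller exactly once.
            Poll::Ready(self.get_mut().buffer.take().expect("polled after Ready"))
        } else {
            // A real driver would arrange for the kernel callback to invoke
            // Waker::wake here.
            Poll::Pending
        }
    }
}

// Minimal no-op waker, built directly on RawWakerVTable, for polling by hand.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &VTABLE)) }
}
```

Polling it before and after the simulated callback shows the leaf future's state machine: Pending until the kernel reports completion, then Ready with the buffer.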

is two words in size. It points to a
[`RawWakerVTable`](https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html)
that is provided by the executor. Considering that this code is designed to run
without `alloc`, `RawWakerVTable` seems awfully wasteful. `drop` is a no-op,
Member:

It would be great to talk about the impact of this waste. RawWakerVTable is only 4 words big. Probably many reading this story won't understand why 4 words is considered wasteful.


(chiming in from Tock here, so ignore if not useful) @jrvanwhy: did you mean that the vtable itself (which is 4 words in flash) is wasteful, or the 2 words in memory? The latter seems wasteful because without alloc we could know statically exactly what concrete type it would be, and also because we need these 2 words for each instance of async, rather than once for any program that uses futures.

Maybe both?

My interpretation had been the 2 words in memory, but @rylev's comment highlighted to me that I actually am not sure which this is referring to.

Contributor (author):

If there's a single copy of RawWakerVTable in flash, that's not bad. If there's 1 copy in RAM for each executor instance, that's pretty significant.

When I wrote the executor, I tried to avoid dynamic dispatch so the compiler could perform inlining. This meant the executor depended on the future's type, like:

pub struct Task<F: Future> {
    future: Option<F>,
}

Because of this, I needed the RawWakerVTable to be specific to the type of Future. I implemented this by putting it directly into the executor, i.e.:

pub struct Task<F: Future> {
    future: Option<F>,
    vtable: RawWakerVTable,
}

As a result, I had a RawWakerVTable instance in RAM for each Task.

Because future needs to be mutable, the Task gets placed in .data in RAM. As a result, the vtable ends up in RAM, and there's one per Task.

In retrospect, I could have broken the executor into two parts:

// Instantiated by client code, lives in flash.
pub struct TaskVTable<F: Future> {
    vtable: RawWakerVTable
}

// Instantiated by client code, lives in RAM
pub struct Task<F: Future> {
    future: Option<F>,
    vtable: &'static TaskVTable<F>,
}

This would reduce the RAM overhead to 1 word per Task and move the vtable itself into flash. I expect you could do the same with future combinators (although most combinators probably don't need their own RawWakerVTable anyway).

If you use the Locator trait from my async API design, we could remove the vtable member from Task and replace it with a zero-sized type.


1. `spawn` should be a safe function, because it is called by high-level
application code. However, that means it can be called by the future it
contains. If handled naively, this would result in dropping the future while
Member:

I'm not sure I fully grok this point. Can you expand a bit more? Why would the future potentially be dropped while spawn executes?

Contributor (author):

spawn takes the future to be executed as an argument. Because the executor is designed to work without alloc, it can only hold a single future inside. If spawn is called while a future is running I assume that means the caller wants to overwrite the existing future.

I added a note a couple paragraphs up (just before "Barbara identifies several factors...") with an explanation.

I could avoid this cost by making the future fixed (i.e. construct it in a const context) at the expense of flexibility.
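A minimal sketch of such a runtime check, under the same no-alloc constraints. The names (SingleSlot, spawn, poll_once) are invented here, not Barbara's actual executor: the executor records that it is inside poll, and spawn refuses to replace the slot rather than dropping the future while it is executing.

```rust
use std::cell::{Cell, RefCell};
use std::future::Future;
use std::pin::Pin;
use std::ptr;
use std::task::{Context, RawWaker, RawWakerVTable, Waker};

// Single-slot executor: without alloc it can hold exactly one future.
struct SingleSlot<F: Future<Output = ()> + Unpin> {
    future: RefCell<Option<F>>,
    polling: Cell<bool>, // true while poll is running
}

impl<F: Future<Output = ()> + Unpin> SingleSlot<F> {
    fn new() -> Self {
        SingleSlot { future: RefCell::new(None), polling: Cell::new(false) }
    }

    // Safe to call from application code. If the currently running future
    // itself calls spawn, refusing (instead of overwriting the slot) avoids
    // dropping the future while it executes.
    fn spawn(&self, f: F) -> Result<(), F> {
        if self.polling.get() {
            return Err(f);
        }
        *self.future.borrow_mut() = Some(f);
        Ok(())
    }

    // Returns true if the stored future completed on this poll.
    fn poll_once(&self, cx: &mut Context<'_>) -> bool {
        self.polling.set(true);
        let mut slot = self.future.borrow_mut();
        let finished = match slot.as_mut() {
            Some(fut) => Pin::new(fut).poll(cx).is_ready(),
            None => false,
        };
        if finished {
            *slot = None;
        }
        drop(slot);
        self.polling.set(false);
        finished
    }
}

// No-op waker for driving the sketch by hand.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &VTABLE)) }
}
```

Returning the future from spawn (rather than panicking) is one possible policy; a panic or a deferred-overwrite flag would be equally valid runtime checks.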

have tried as hard to use `Future`. Barbara has been paying attention to
Rust long enough to know how significant the `Future` trait is in the Rust
community and ecosystem.
* [Niklaus] would really have struggled. If he asked for help, he probably
Member:

This strikes me as an important point. It's fairly "common knowledge" that Futures themselves are very lightweight. This story makes it clear that for some resource-constrained environments, the machinery required to get futures working can sometimes be too heavyweight. I can imagine many would dismiss Niklaus, saying that Futures are lightweight and totally appropriate for embedded development, without realizing that this might not always be true.

Contributor (author):

In 2016, Aaron Turon described Future as zero cost, but he elaborated on what that meant, and "minimal impact on code size and RAM usage" was not a part of that definition. The phrase "zero cost" has been repeated in descriptions of Future ever since, generally without the context that Aaron provided.

Most of Rust's "zero cost" abstractions have a much smaller (or zero) impact on code size and RAM usage after inlining, so it's easy to assume that Future is lighter weight than it is.

required to implement executors and combinators.

Barbara redesigns the library she is building to use a [different
concept](https://github.com/tock/design-explorations/tree/master/zst_pointer_async)
Member:

Can you talk more about what design decisions this implementation made that made it more light-weight?

Contributor (author):

I added 2 things to address this:

  1. An analysis of why the Future-based app is larger than the no-Future app (currently lines 174-200)
  2. A list below (currently lines 214-220) explaining how I (Barbara) avoided the same costs in my async traits.

current `poll` returns `Poll::Pending`. This requires putting a retry loop
into the executor.
1. A kernel callback may call `Waker::wake` after its future finishes executing,
and the executor must ignore those wakeups. This duplicates the "ignore
Member:

Can you add a sentence on why must the executor ignore the spurious wakeups?

Contributor (author):

I reworded it to state that the executor should not poll a future after the future has returned Poll::Ready, which is directly stated in the Future docs. Is it clear now?
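One way to implement that rule from the Future docs, sketched with invented names: store the future in an Option and clear it on Poll::Ready, so a late Waker::wake finds nothing left to poll.

```rust
use std::future::Future;
use std::pin::Pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

struct Task<F: Future + Unpin> {
    future: Option<F>, // None once the future has returned Poll::Ready
}

impl<F: Future + Unpin> Task<F> {
    // Called in response to Waker::wake. A spurious kernel callback arriving
    // after completion is silently ignored, because the Future contract
    // forbids polling after Poll::Ready.
    fn on_wake(&mut self, cx: &mut Context<'_>) -> Option<F::Output> {
        let fut = self.future.as_mut()?;
        match Pin::new(fut).poll(cx) {
            Poll::Ready(out) => {
                self.future = None;
                Some(out)
            }
            Poll::Pending => None,
        }
    }
}

// No-op waker for driving the sketch by hand.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &VTABLE)) }
}
```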

Comment on lines 183 to 185
* `core::future::Future` isn't suitable for every asynchronous API in Rust.
`Future` has a lot of capabilities, such as the ability to spawn
dynamically-allocated futures, that are unnecessary in embedded system.
Member:

Is it Future itself or rather the wake based API? Also, are we sure that the tradeoffs in code size experienced here are fundamental to getting Future to work? Or is it possible that there can be tweaks to the design that will allow these tradeoffs to go away?

Contributor (author):

I didn't really distinguish Future from Waker.

I think there are fundamental tradeoffs at play. The fact that Future is demand-driven results in an information loss during callbacks. Kernel callbacks need to stash data somewhere for poll to retrieve later, and poll has to contain that retrieval logic. That ended up being the majority of the cost.

@nikomatsakis (Contributor) left a comment:

This was great. I think @rylev identified everything that could use improvement. Thanks for writing this.

@Matthias247 left a comment:

I'm not sure what to make of this. I think there is a lot of truth to the claim that Futures are not zero-cost, and that this can impact embedded development. Some things that came to my mind here:

  • The lazy nature of Futures requires them to be constructed on the stack before being moved to the place where they are executed. Huge Futures on the stack have been observed; that could easily blow the memory limits of a constrained device.
  • Building an allocator for variably sized Futures to support spawn might be tricky (if spawning only at startup is good enough, a bump allocator could probably work, but that requires a custom Box, and allocator APIs are also not fully there yet).
  • Wakers can outlive Futures, and executors need to support .wake() on a Future which no longer exists. But reference counting is not an option, and static state is a bit tricky in combination with the Send/Sync requirements.

But the experience report here centers much more on what was experienced with a design built around tock-os. In other environments, polling Futures from arbitrary contexts would likely not be safe.

Maybe this report could provide some input on where documentation on how to build and use Futures and executors could be improved?


* **What are the morals of the story?**
* `core::future::Future` isn't suitable for every asynchronous API in Rust.
`Future` has a lot of capabilities, such as the ability to spawn

Matthias247:

This is not really a capability of Future. It's a capability of some Future executors. A Future itself is just a resumable function.

Contributor (author):

You can have non-static Futures, but you cannot have non-static asynchronous operations with the traits I designed. I think the ability to support non-static futures is a feature of Future.

Contributor:

Ah, this is a good call out @jrvanwhy -- I've been looking for "design decisions" we made along the way that may inform the overall async tenets. There was definitely a push to remove Send+Sync+'static bounds from future.

`PRINT_WAKER` is stored in `.bss`, which occupies space in RAM but not flash. It
is two words in size. It points to a
[`RawWakerVTable`](https://doc.rust-lang.org/stable/core/task/struct.RawWakerVTable.html)
that is provided by the executor. `RawWakerVTable` does not appear to be

Matthias247:

As the person who designed this, I don't feel good about this statement. The whole RawWaker mechanism was specifically designed to support no-std environments. It allows creating Wakers which don't require any dynamic memory allocation, by using static state. That wasn't possible before RawWaker.

This section mainly criticizes a vtable for requiring 2 pointer sizes more than necessary for a certain use case, in static memory. That should very rarely be the argument that breaks a design. And RawWaker should also exist once per executor, not countless times.

Contributor (author):

The whole RawWaker mechanism was specifically designed to support no-std environments.

I reworded and reorganized this segment significantly, because I definitely didn't convey my thoughts accurately. RawWaker was designed to work in both alloc and no-alloc environments, and it made compromises to support both. When I noticed the compromises in RawWaker's design, I decided to evaluate the overhead of using Future, to see whether the other compromises cost more.

Ultimately, the 2 extra words in RawWaker are not a deal-breaker.

available, this executor contains a single future. The executor has a `spawn`
function that accepts a future and starts running that future (overwriting the
existing future in the executor if one is already present). Once started, the
executor runs entirely in kernel callbacks.

Matthias247:

I am not sure whether this is a valid design for Futures. In general Futures assume they are only called by one executor in a serial fashion, and not from a variety of places. Doing the latter would void all Send/Sync guarantees. And not only this, it would open up the door to a variety of reentrancy problems that the Future design avoids.

This might all be valid in the tock-os world, which might make some extra guarantees. But even on a bare-metal microcontroller it wouldn't work. A more typical design, in line with typical kernel and application design, is to have the Waker purely wake up the executor in application space, and to make sure that it continues to make forward progress.

jrvanwhy (Contributor, author), Mar 25, 2021:

In general Futures assume they are only called by one executor in a serial fashion, and not from a variety of places. Doing the latter would void all Send/Sync guarantees.

Everything happens in a single thread, so I don't see how Send/Sync are relevant. That may be the "extra guarantees" you refer to that Tock OS provides, though.

And not only this, it would open up the door to a variety of reentrancy problems that the Future design avoids.

The executor described here will not call poll reentrantly. I don't see anywhere else reentrancy problems can occur.

Matthias247:

One example is that a Future which is !Send can expect all state which is stored in thread-local storage to be consistently accessible after polls. If one calls a future from arbitrary threads that is no longer true. In general an interrupt context is similar to a separate thread. I'm not familiar enough with tock to say whether this applies or doesn't apply there.

Contributor (author):

In general an interrupt context is similar to a separate thread. I'm not familiar enough with tock to say whether this applies or doesn't apply there.

Oh, that doesn't apply in Tock. Invoking a callback behaves almost identically to a virtual function call whose address is determined by the kernel.

application code. However, that means it can be called by the future it
contains. If handled naively, this would result in dropping the future while
it executes. Barbara adds runtime checks to identify this situation.
1. `Waker` is `Sync`, so on a multithreaded system, a future could give another

Matthias247:

As outlined earlier, this is some kind of semi-legitimate design. The Wakers really should just schedule the application tasks again, not try to poll inline.

Contributor (author):

I'm having trouble understanding you here. Are you suggesting something like the following?

  1. wake marks a Future as ready to run, but does not call poll.
  2. The executor offers a poll_future function that polls the future if the future has been awoken (repeatedly, if necessary).
  3. The application code has to call poll_future after invoking kernel callbacks.

Then the application would need to have a main loop, and that loop would look something like the following:

loop {
    TASK1.poll_future();
    TASK2.poll_future();
    TASK3.poll_future();
    kernel_ready_for_callbacks();
}

Matthias247:

Yes. But the main-loop would typically be something like executor.run() - and you've spawned all your tasks on the executor upfront.

Matthias247:

After all something like a Winapi/QT/GTK eventloop are also executors :-)

Contributor (author):

I see how that would work, but the benefit is not clear to me.

Contributor (author):

I suppose that you could make all RawWaker methods no-ops and instead poll every unfinished Future on every iteration, saving some code in the executor at the cost of runtime performance.
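A sketch of that "most basic executor", with invented names and assuming Unpin futures for simplicity: every RawWaker method is a no-op, and the loop simply re-polls every unfinished future until all are done.

```rust
use std::future::Future;
use std::pin::Pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// All RawWaker methods are no-ops: wakeups carry no information.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &VTABLE)) }
}

// Polls every unfinished task each iteration until all have completed.
// A real embedded version would sleep between iterations (e.g. via WFI)
// instead of spinning.
fn run_all<F: Future<Output = ()> + Unpin>(tasks: &mut [Option<F>]) {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        let mut any_pending = false;
        for slot in tasks.iter_mut() {
            if let Some(fut) = slot.as_mut() {
                match Pin::new(fut).poll(&mut cx) {
                    Poll::Ready(()) => *slot = None, // never poll again
                    Poll::Pending => any_pending = true,
                }
            }
        }
        if !any_pending {
            return;
        }
    }
}

// A toy future that stays Pending for `n` polls, standing in for an
// operation that completes after some number of kernel callbacks.
struct CountDown(u32);

impl Future for CountDown {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 == 0 {
            Poll::Ready(())
        } else {
            self.0 -= 1;
            Poll::Pending
        }
    }
}
```

This trades executor code size for runtime cost, exactly the trade-off described above: no wake bookkeeping, at the price of polling futures that are not ready.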

Matthias247:

This is e.g. required to avoid the reentrancy problems. Let's say our async task calls a UART driver, which submits some bytes. It only writes half of them as part of an async fn write(&self, &[u8]) -> Result<usize, Error>. The number of bytes written is signalled by the interrupt context, and then forwarded as the result of this sub-Future, which returns Ready. From there, another call to the UART driver is made to start another async operation. This operation is now started from an interrupt context instead of the normal application context. In usual environments that is not allowed, because it now requires that resource to be reentrancy-safe.

When the usual Future model is implemented it isn't required to be reentrancy safe, because the execution model already guarantees that from inside a callback no further operations are started. The only thing which will ever happen is waking up the task in the executor, so that it can later run from a safe context.

This also prevents some other issues, e.g. "what happens if this resource is freed inside the callback". But maybe that's not too important in your "everything is static and will never get freed" scenario.

Matthias247:

I suppose that you could make all RawWaker methods no-ops and instead poll every unfinished Future on every iteration, saving some code in the executor at the cost of runtime performance.

Yes, that is indeed the most basic executor.

More efficient versions can e.g. block the main thread using an OS synchronization mechanism or a low level one like WFI, and wake the executor from the interrupt.

https://docs.rs/embedded-executor/0.6.1/embedded_executor/ seems like a decent example on what can be done.

Matthias247, Mar 25, 2021:

Regarding the dynamic allocs there: You would avoid them by making sure they only run at startup, and size the memory pool which is used for it accordingly (yes, requires some fiddling 😥). For the list of ready tasks, it's probably doable with a fixed size list or even some bitmasks if a well known amount of tasks need to be executed.

4. The application future has to look at the state saved in step 1 to determine
what event happened.
5. The application future handles the event.

Matthias247:

But the Futures-based version provides a lot more guarantees! E.g. it avoids reentrancy problems and makes sure all state associated with the async operation is only modified in a single place (and from a single thread/task). It's a bit of an apples-to-oranges comparison. If a trivial callback is good enough, then Futures will likely have overhead. But the places where state has to be persisted in a complex fashion across .await points are where Futures win, since they don't require encoding complex state machines by hand.

Contributor (author):

Avoiding reentrancy problems, making async state management easy, and easily representing complex asynchronous operations are all really nice features of Future! I don't contest that.

However, they are all nice-to-have features, and it takes an awful lot of nice-to-have features to outweigh one deal-breaker. The amount of code generated by the executor and in poll that doesn't exist in the non-Future version is probably a deal-breaker for us.

…. RawWakerVTable and executor size.

I never intended to imply that RawWakerVTable is a huge issue; rather, it is the item that made Barbara realize that she needs to do some benchmarking. The benchmark results showed that the executor logic and combinator logic is the main cost of `Future` for her use cases.
jrvanwhy (Contributor, author):

Maybe this report could provide some input on where documentation on how to build and use Futures and executors could be improved?

Building a Future by hand seems easy enough (except for combinators), but I think there is room for better documentation for building executors.

It's been over a year since I ran this experiment, so I'm unsure what the documentation looked like at the time. However, searching for documentation on building executors gives a handful of tutorials, all of which require alloc (even with "embedded" added to my search). A no-alloc executor ends up looking significantly different from the executors in the tutorials.

Ultimately, I realized I had to handle the case where Waker::wake() is called during poll by thinking about what happens in a multithreaded context. I did not find that requirement in any of the documentation I looked at.
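To make that requirement concrete, here is a hedged sketch (all names invented) of the retry loop: the waker only sets a flag, and if the flag was set while poll was running, the executor polls again rather than losing a wakeup that has already happened.

```rust
use std::future::Future;
use std::pin::Pin;
use std::ptr;
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Set by the waker; checked (and cleared) by the executor.
static WOKEN: AtomicBool = AtomicBool::new(false);

fn flag_waker() -> Waker {
    fn clone(d: *const ()) -> RawWaker { RawWaker::new(d, &VTABLE) }
    fn wake(_: *const ()) { WOKEN.store(true, Ordering::SeqCst); }
    fn drop_fn(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake, drop_fn);
    unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &VTABLE)) }
}

// If Waker::wake fired during poll (e.g. a kernel callback ran), re-poll
// instead of waiting for a wakeup that will never come again.
fn poll_with_retry<F: Future + Unpin>(fut: &mut F) -> Option<F::Output> {
    let waker = flag_waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        WOKEN.store(false, Ordering::SeqCst);
        match Pin::new(&mut *fut).poll(&mut cx) {
            Poll::Ready(v) => return Some(v),
            Poll::Pending if WOKEN.load(Ordering::SeqCst) => continue,
            Poll::Pending => return None, // genuinely pending
        }
    }
}

// A toy future that wakes itself during its first poll, simulating a
// callback that fires while poll is still running.
struct WakesDuringPoll(bool);

impl Future for WakesDuringPoll {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.0 {
            Poll::Ready(7)
        } else {
            self.0 = true;
            cx.waker().wake_by_ref(); // wakeup arrives mid-poll
            Poll::Pending
        }
    }
}
```

Without the retry check, the first Pending would put the task to sleep forever, since the only wakeup already happened before poll returned.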

nikomatsakis (Contributor):

One quick note as I read some of the prior comments. Remember that the role of these status quo stories is for people to convey what they experienced. It is explicitly not to propose solutions. Therefore, e.g. when @Matthias247 suggests this:

Maybe this report could provide some input on where documentation on how to build and use Futures and executors could be improved?

I would actually not want that at all. I think the report should focus on what people felt and experienced.

The stories should definitely be technically accurate -- but in some cases, characters thought things that were wrong, or only partially informed. We should be able to capture that, and the FAQ can be especially useful for providing deeper technical info than the character knew at the time.

tmandry (Member) commented Mar 30, 2021:

Thanks so much for writing this up, @jrvanwhy! I'll second @nikomatsakis that I think this can be merged.

@tmandry tmandry merged commit 871aee6 into rust-lang:master Mar 30, 2021
nikomatsakis (Contributor):

One thing that we were discussing on zulip is that some of the comments on this PR might make good FAQs.

jrvanwhy (Contributor, author):

One thing that we were discussing on zulip is that some of the comments on this PR might make good FAQs.

I'll send the changes in a separate PR.

Labels: status-quo-story-ideas ("Status quo" user story ideas)

7 participants